In Python, a file operation takes place in the following order.
- Open a file
- Read or write (perform operation)
- Close the file
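
The three steps above can be sketched as follows. This is only a minimal illustration: the file name `demo.txt` and the temporary directory are assumptions made so that the snippet runs anywhere without touching real data.

```python
import os
import tempfile

# Hypothetical file in a temporary directory (an assumption for this sketch)
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

f = open(path, "w", encoding="utf-8")   # 1. open the file
f.write("hello\n")                      # 2. perform an operation (write)
f.close()                               # 3. close the file

f = open(path, encoding="utf-8")        # open again, this time for reading
content = f.read()                      # read the whole file
f.close()
print(content)
```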

## How to open a file?
Python has a built-in function open() to open a file. This function **returns a file object**, also called a handle, as it is used to read or modify the file accordingly.

In [3]:
f = open("test.txt")    # open file in current directory


We can specify the mode while opening a file. In mode, we specify whether we want to read 'r', write 'w' or append 'a' to the file. We also specify if we want to open the file in text mode or binary mode.

The default is reading in text mode. In this mode, we get strings when reading from the file.
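
The difference between text mode and binary mode can be seen in a small sketch like the one below. The file `modes.txt` is a hypothetical sample created only for this illustration.

```python
import os
import tempfile

# Hypothetical sample file created only for this illustration
path = os.path.join(tempfile.mkdtemp(), "modes.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("abc")

with open(path, "r", encoding="utf-8") as f:   # text mode ('r' is the same as 'rt')
    text_data = f.read()

with open(path, "rb") as f:                    # binary mode
    binary_data = f.read()

print(type(text_data), text_data)       # text mode gives a str
print(type(binary_data), binary_data)   # binary mode gives bytes
```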


In [7]:
f = open("test.txt")      # equivalent to 'r' or 'rt'
f = open("test_w.txt",'w')  # write in text mode


In Python, unlike some other languages, the character 'a' does not imply the number 97 until it is encoded using ASCII (or another compatible encoding). <br>

Moreover, the default encoding is platform dependent. On Windows it is typically 'cp1252', whereas on Linux it is typically 'utf-8'.  <br>
So we must not rely on the default encoding, or else our code will behave differently on different platforms.  <br>

Hence, when working with files in text mode, it is highly recommended to specify the encoding type.

In [8]:
f = open("test.txt",mode = 'r',encoding = 'utf-8')
f.close()
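
To see why the encoding matters, here is a small sketch that writes a non-ASCII character as UTF-8 and then reads it back with two different encodings. The file name `encoding_demo.txt` and the sample text are assumptions made for the illustration.

```python
import os
import tempfile

# Hypothetical sample file (an assumption for this sketch)
path = os.path.join(tempfile.mkdtemp(), "encoding_demo.txt")

# Write a non-ASCII character using an explicit encoding
with open(path, "w", encoding="utf-8") as f:
    f.write("café")

# Reading with the same encoding returns the original text
with open(path, encoding="utf-8") as f:
    same = f.read()

# Reading with a different encoding raises no error here,
# but the text comes back garbled as 'cafÃ©'
with open(path, encoding="cp1252") as f:
    garbled = f.read()

print(same == "café")            # True
print(garbled == "café")         # False
print(len(same), len(garbled))   # 4 5
```

No exception is raised in the second read, which is exactly why relying on a platform default can go unnoticed.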

## How to close a file using Python?
When we are done with operations on the file, we need to close it properly.
Closing a file frees the resources tied to it and is done using the close() method.
Python has a garbage collector that cleans up unreferenced objects, but we must not rely on it to close the file.

In [12]:
f = open("test.txt",encoding = 'utf-8')
# perform file operations

f.close()

This method is not entirely safe. If an exception occurs when we are performing some operation with the file, the code exits without closing the file.
A safer way is to use a try...finally block.

In [14]:
# open before the try block: if open() itself fails, there is no file to close
f = open("test.txt", encoding='utf-8')
try:
    # perform file operations
    pass
finally:
    f.close()

This way, we are guaranteed that the file is properly closed even if an exception is raised, causing program flow to stop.
The best way to do this is using the with statement. **This ensures that the file is closed when the block inside with is exited.**
We don't need to explicitly call the close() method. It is done internally.

In [16]:
with open("test.txt",encoding = 'utf-8') as f:
    line = f.readline()
    print(line)

my first file
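
To check that the with statement really closes the file, even when an exception is raised inside the block, we can inspect the closed attribute of the file object. The sketch below uses a hypothetical file `with_demo.txt` and a deliberately raised ValueError.

```python
import os
import tempfile

# Hypothetical sample file (an assumption for this sketch)
path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("my first file\n")

try:
    with open(path, encoding="utf-8") as f:
        first_line = f.readline()
        # simulate a failure while working with the file
        raise ValueError("simulated failure")
except ValueError:
    pass

print(f.closed)   # True: the file was closed although an exception was raised
```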



## How to write to a file using Python?
To write to a file in Python, we need to open it in write 'w', append 'a' or exclusive creation 'x' mode.
We need to be careful with the 'w' mode, as it overwrites the file if it already exists. All previous data is erased.
Writing a string (or a sequence of bytes for binary files) is done using the write() method. This method returns the number of characters written to the file (or the number of bytes, in binary mode).

In [17]:
with open("test1.txt",'w',encoding = 'utf-8') as f:
    f.write("my first file\n")
    f.write("This file\n\n")
    f.write("contains three lines\n")
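
The behaviour of the three writing modes can be compared in one short sketch. The file name `modes_demo.txt` is hypothetical, and a temporary directory is used so that no existing file is overwritten.

```python
import os
import tempfile

# Hypothetical sample file (an assumption for this sketch)
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, "w", encoding="utf-8") as f:
    written = f.write("first\n")      # write() returns the number of characters written

with open(path, "a", encoding="utf-8") as f:
    f.write("appended\n")             # 'a' keeps existing content and adds to the end

with open(path, encoding="utf-8") as f:
    after_append = f.read()

with open(path, "w", encoding="utf-8") as f:
    f.write("replaced\n")             # 'w' erases the previous content first

with open(path, encoding="utf-8") as f:
    after_overwrite = f.read()

exclusive_failed = False
try:
    open(path, "x", encoding="utf-8")  # 'x' refuses to open a file that already exists
except FileExistsError:
    exclusive_failed = True

print(written)                 # 6
print(repr(after_append))      # 'first\nappended\n'
print(repr(after_overwrite))   # 'replaced\n'
print(exclusive_failed)        # True
```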

## How to read files in Python?
To read a file in Python, we must open the file in reading mode.
There are various methods available for this purpose. We can use the read(size) method to read up to size characters (or bytes, in binary mode). If the size parameter is not specified, it reads and returns everything up to the end of the file.

In [18]:
f = open("test1.txt",'r',encoding = 'utf-8')
f.read(4)    # read the first 4 characters

'my f'

In [19]:
f.read(4)    # read the next 4 characters

'irst'

In [20]:
f.read()     # read in the rest till end of file

' file\nThis file\n\ncontains three lines\n'

In [21]:
f.read()  # further reading returns an empty string

''

We can see that the read() method returns newlines as '\n'. Once the end of the file is reached, further reads return an empty string.

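**We can change our current file cursor (position) using the seek() method. Similarly, the tell() method returns our current position (a number of bytes in binary mode; in text mode it is an opaque number that is only meaningful when passed back to seek()).**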

In [26]:
f = open("test1.txt",'r',encoding = 'utf-8')
print(f.tell())    # get the current file position
f.seek(0)   # bring file cursor to initial position
print(f.read())  # read the entire file
f.close()    

0
my first file
This file

contains three lines
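
As a further sketch of seek() and tell() working together, the snippet below reads a few bytes, checks the position, and then moves the cursor back to read the same bytes again. It uses binary mode, where the position is a plain byte offset; the file name `seek_demo.txt` is hypothetical.

```python
import os
import tempfile

# Hypothetical sample file (an assumption for this sketch)
path = os.path.join(tempfile.mkdtemp(), "seek_demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("my first file\n")

with open(path, "rb") as f:    # binary mode: positions are plain byte offsets
    first = f.read(2)          # read the first two bytes
    position = f.tell()        # the cursor is now 2 bytes from the start
    f.seek(0)                  # move the cursor back to the start
    again = f.read(2)          # the same two bytes are read again

print(first, position, again)  # b'my' 2 b'my'
```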



We can read a file line-by-line using a for loop. This is both efficient and fast.

In [27]:
f = open("test1.txt",'r',encoding = 'utf-8')
for line in f:
    print(line, end = '')
f.close()    

my first file
This file

contains three lines


Each line in the file already ends with a newline character '\n'.
So we set the end parameter of print() to an empty string to avoid printing two newlines.
Alternatively, we can use the readline() method to read individual lines of a file. This method reads a file up to and including the next newline character.

In [28]:
f = open("test1.txt",'r',encoding = 'utf-8')
f.readline()

'my first file\n'

In [29]:
f.readline()

'This file\n'

In [30]:
f.readline()

'\n'

In [31]:
f.readline()

'contains three lines\n'
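
Because readline() returns an empty string only at the end of the file (a blank line is returned as '\n'), it can drive a loop that reads until the end. The sketch below also compares the result with the readlines() method, which returns all remaining lines as a list. The file `lines_demo.txt` is a hypothetical copy of the sample data used above.

```python
import os
import tempfile

# Hypothetical sample file with the same content as test1.txt above
path = os.path.join(tempfile.mkdtemp(), "lines_demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("my first file\nThis file\n\ncontains three lines\n")

lines = []
with open(path, encoding="utf-8") as f:
    while True:
        line = f.readline()
        if line == "":           # an empty string means end of file
            break
        lines.append(line)       # a blank line arrives as '\n', so it is kept

with open(path, encoding="utf-8") as f:
    all_lines = f.readlines()    # reads every remaining line into a list

print(lines == all_lines)   # True
print(len(lines))           # 4
```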

## Reading CSV Files With csv
Reading from a CSV file is done using the reader object. The CSV file is opened as a text file with Python’s built-in open() function, which returns a file object. This is then passed to the reader, which does the heavy lifting.

The following code reads the employee_birthday.txt file:

In [26]:
import csv

with open('employee_birthday.txt', newline='') as csv_file:  # newline='' is recommended by the csv docs
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            print(f'\t{row[0]} works in the {row[1]} department, and was born in {row[2]}.')
            line_count += 1
    print(f'Processed {line_count} lines.')

Column names are name, department, birthday month
	John Smith works in the Accounting department, and was born in November.
	Erica Meyers works in the IT department, and was born in March.
Processed 3 lines.


**Each row returned by the reader is a list of str elements containing the data found by removing the delimiters.** The first row returned contains the column names, which is handled in a special way.
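
As a self-contained sketch of what the reader returns, the snippet below parses sample data from memory instead of a file. The sample text is reconstructed from the output shown above and is an assumption, not the original file; io.StringIO stands in for a file opened in text mode.

```python
import csv
import io

# Sample data reconstructed from the output above (an assumption, not the original file)
sample = (
    "name,department,birthday month\n"
    "John Smith,Accounting,November\n"
    "Erica Meyers,IT,March\n"
)

# io.StringIO stands in for a file opened in text mode
reader = csv.reader(io.StringIO(sample), delimiter=",")
header = next(reader)   # the first row holds the column names
rows = list(reader)     # the remaining rows hold the data

print(header)   # ['name', 'department', 'birthday month']
print(rows)     # each row is a list of strings
```

Pulling the header off with next() is one alternative to counting lines inside the loop.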

## Writing CSV Files With csv
You can also write to a CSV file using a writer object and the .writerow() method:

In [28]:
import csv

with open('employee_file.csv', mode='w', newline='') as employee_file:  # newline='' avoids extra blank rows on Windows
    employee_writer = csv.writer(employee_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    employee_writer.writerow(['John Smith', 'Accounting', 'November'])
    employee_writer.writerow(['Erica Meyers', 'IT', 'March'])
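
To see what the quoting=csv.QUOTE_MINIMAL setting does, the sketch below writes to an in-memory buffer instead of a file, so the result can be inspected directly. The row containing a comma is a made-up example for this illustration.

```python
import csv
import io

buffer = io.StringIO()   # in-memory stand-in for an opened file
writer = csv.writer(buffer, delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL)
writer.writerow(["Smith, John", "Accounting", "November"])   # first field contains the delimiter
writer.writerow(["Erica Meyers", "IT", "March"])             # no special characters

output = buffer.getvalue()
print(output)
# "Smith, John",Accounting,November
# Erica Meyers,IT,March
```

Only the field that contains the delimiter is wrapped in the quote character; the other fields are written as they are.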

## Parsing CSV Files With the pandas Library

**Of course, the Python CSV library isn’t the only game in town. Reading CSV files is possible in pandas as well. It is highly recommended if you have a lot of data to analyze.**

pandas is an open-source Python library that provides high-performance data analysis tools and easy-to-use data structures. pandas is available for all Python installations, but it is a key part of the Anaconda distribution and works extremely well in Jupyter notebooks to share data, code, analysis results, visualizations, and narrative text.

Installing pandas and its dependencies in Anaconda is easily done:

```$ conda install pandas```

As is using pip/pipenv for other Python installations:

```$ pip install pandas```

We won’t delve into the specifics of how pandas works or how to use it. For an in-depth treatment on using pandas to read and analyze large data sets, check out Shantnu Tiwari’s superb article on working with large Excel files in pandas.

In [32]:
import pandas
df = pandas.read_csv('hrdata.csv')
print(df)

             Name Hire Date   Salary  Sick Days remaining
0  Graham Chapman  03/15/14  50000.0                   10
1     John Cleese  06/01/15  65000.0                    8
2       Eric Idle  05/12/14  45000.0                   10
3     Terry Jones  11/01/13  70000.0                    3
4   Terry Gilliam  08/12/14  48000.0                    7
5   Michael Palin  05/23/13  66000.0                    8


In [33]:
print(type(df['Hire Date'][0]))

<class 'str'>


In [34]:
import pandas
df = pandas.read_csv('hrdata.csv', index_col='Name')
print(df)

               Hire Date   Salary  Sick Days remaining
Name                                                  
Graham Chapman  03/15/14  50000.0                   10
John Cleese     06/01/15  65000.0                    8
Eric Idle       05/12/14  45000.0                   10
Terry Jones     11/01/13  70000.0                    3
Terry Gilliam   08/12/14  48000.0                    7
Michael Palin   05/23/13  66000.0                    8


Next, let’s fix the data type of the Hire Date field. You can force pandas to read data as a date **with the parse_dates optional parameter**, which is defined as a list of column names to treat as dates:

In [35]:
import pandas
df = pandas.read_csv('hrdata.csv', index_col='Name', parse_dates=['Hire Date'])
print(df)

                Hire Date   Salary  Sick Days remaining
Name                                                   
Graham Chapman 2014-03-15  50000.0                   10
John Cleese    2015-06-01  65000.0                    8
Eric Idle      2014-05-12  45000.0                   10
Terry Jones    2013-11-01  70000.0                    3
Terry Gilliam  2014-08-12  48000.0                    7
Michael Palin  2013-05-23  66000.0                    8


In [36]:
print(type(df['Hire Date'].iloc[0]))  # .iloc selects by position; the index now holds names

<class 'pandas._libs.tslibs.timestamps.Timestamp'>


If your CSV file doesn’t have column names in the first line, you can use the **names optional parameter** to provide a list of column names. You can also use this if you want to override the column names provided in the first line. In this case, you must also tell pandas.read_csv() to ignore existing column names using the header=0 optional parameter:

In [37]:
import pandas
df = pandas.read_csv('hrdata.csv', 
            index_col='Employee', 
            parse_dates=['Hired'], 
            header=0, 
            names=['Employee', 'Hired','Salary', 'Sick Days'])
print(df)

                    Hired   Salary  Sick Days
Employee                                     
Graham Chapman 2014-03-15  50000.0         10
John Cleese    2015-06-01  65000.0          8
Eric Idle      2014-05-12  45000.0         10
Terry Jones    2013-11-01  70000.0          3
Terry Gilliam  2014-08-12  48000.0          7
Michael Palin  2013-05-23  66000.0          8


## Writing CSV Files With pandas
Of course, if you can’t get your data out of pandas again, it doesn’t do you much good. Writing a DataFrame to a CSV file is just as easy as reading one in. Let’s write the data with the new column names to a new CSV file:

In [38]:
import pandas
df = pandas.read_csv('hrdata.csv', 
            index_col='Employee', 
            parse_dates=['Hired'],
            header=0, 
            names=['Employee', 'Hired', 'Salary', 'Sick Days'])
df.to_csv('hrdata_modified.csv')