#**Working with Files**

**What will you learn?**
1. **Text Files** : Opening, Reading and Writing
2. **CSV Files** : Opening, Reading and Writing

##**Text Files**

###**Opening Text Files**

####**open()**

The key function for working with files in Python is the open() function.

The open() function takes two main parameters: the filename and the mode.

There are four different modes for opening a file:

1. **"r"** - Read - Default value. Opens a file for reading, error if the file does not exist
2. **"a"** - Append - Opens a file for appending, creates the file if it does not exist
3. **"w"** - Write - Opens a file for writing, creates the file if it does not exist, and erases any existing content if it does
4. **"x"** - Create - Creates the specified file, returns an error if the file exists
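The four modes above can be compared side by side. The following is a minimal sketch, assuming a scratch file called `modes_demo.txt` inside a temporary directory (a hypothetical name, not one of the course files).

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'modes_demo.txt')   ## hypothetical file name

f = open(path, 'x')            ## "x" creates the file, it must not exist yet
f.write('first\n')
f.close()

f = open(path, 'a')            ## "a" keeps the old content and adds to the end
f.write('second\n')
f.close()

f = open(path, 'r')            ## "r" reads the file
after_append = f.read()
f.close()

f = open(path, 'w')            ## "w" empties the file before writing
f.write('third\n')
f.close()

f = open(path, 'r')
after_write = f.read()
f.close()

try:
    open(path, 'x')            ## the file exists now, so "x" fails
    x_failed = False
except FileExistsError:
    x_failed = True

print(repr(after_append))
print(repr(after_write))
print(x_failed)
```

Note how "a" preserved the first line, while "w" replaced everything that was in the file.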

In addition, you can specify whether the file should be handled in text mode or binary mode:

1. **"t"** - Text - Default value. Text mode
2. **"b"** - Binary - Binary mode (e.g. images)
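To see the difference between the two, here is a small sketch that reads the same file in both modes. The file name `mode_demo.txt` and the explicit `encoding='utf-8'` argument are assumptions made for this illustration.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'mode_demo.txt')    ## hypothetical file name

f = open(path, 'w', encoding='utf-8')
f.write('caf\u00e9')           ## 4 characters, the last one needs 2 bytes in UTF-8
f.close()

f = open(path, 'rt', encoding='utf-8')           ## text mode returns str
text = f.read()
f.close()

f = open(path, 'rb')                             ## binary mode returns bytes
raw = f.read()
f.close()

print(type(text), len(text))
print(type(raw), len(raw))
```

Text mode decodes the bytes into characters, so the two lengths differ. Binary mode hands back the raw bytes untouched, which is what you want for images and other non-text data.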

In [None]:
!wget https://files.codingninjas.in/sample-7766.txt                            ## Downloading the sample text file from our server

--2021-02-01 19:14:31--  https://files.codingninjas.in/sample-7766.txt
Resolving files.codingninjas.in (files.codingninjas.in)... 52.84.161.83, 52.84.161.68, 52.84.161.34, ...
Connecting to files.codingninjas.in (files.codingninjas.in)|52.84.161.83|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6737 (6.6K)
Saving to: ‘sample-7766.txt’


2021-02-01 19:14:32 (642 MB/s) - ‘sample-7766.txt’ saved [6737/6737]



In [None]:
file_obj = open('sample-7766.txt', 'r')                    ## To open a file stored elsewhere on your PC, pass its full path instead

###**Reading Text Files**

####**read()**

By default, the read() method returns the entire contents of the file as a single string.

In [None]:
file_data = file_obj.read()
print(type(file_data))
file_data

<class 'str'>


'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus condimentum sagittis lacus, laoreet luctus ligula laoreet ut. Vestibulum ullamcorper accumsan velit vel vehicula. Proin tempor lacus arcu. Nunc at elit condimentum, semper nisi et, condimentum mi. In venenatis blandit nibh at sollicitudin. Vestibulum dapibus mauris at orci maximus pellentesque. Nullam id elementum ipsum. Suspendisse cursus lobortis viverra. Proin et erat at mauris tincidunt porttitor vitae ac dui.\nDonec vulputate lorem tortor, nec fermentum nibh bibendum vel. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Praesent dictum luctus massa, non euismod lacus. Pellentesque condimentum dolor est, ut dapibus lectus luctus ac. Ut sagittis commodo arcu. Integer nisi nulla, facilisis sit amet nulla quis, eleifend suscipit purus. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Aliquam euismod ultrices lorem, sit amet imperdiet est tincidunt vel. Phasellus dictum 

But you can also specify how many characters you want to return:

In [None]:
file_obj = open('sample-7766.txt', 'r')     
file_data = file_obj.read(100)
file_data

'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus condimentum sagittis lacus, laoreet'
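Each call to read() continues from where the previous one stopped. The sketch below illustrates this on a tiny scratch file (`read_demo.txt` is a hypothetical name used only here).

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'read_demo.txt')    ## hypothetical file name

f = open(path, 'w')
f.write('abcdefghij')
f.close()

file_obj = open(path, 'r')
first = file_obj.read(4)       ## first four characters
second = file_obj.read(4)      ## the next four, not the first four again
rest = file_obj.read()         ## whatever is left
at_end = file_obj.read()       ## empty string once the end is reached
file_obj.close()

print(first, second, rest, repr(at_end))
```

This is why the cell above reopens the file before calling read(100): the earlier read() had already moved to the end of the file.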

####**readline()**

You can read one line at a time by using the readline() method:

In [None]:
file_obj = open('sample-7766.txt', 'r')     
file_data = file_obj.readline()     ## Reads first line
print(file_data)

## Reads next two lines
print('Next Two Lines')
print()
print(file_obj.readline())          ## Second Line
print(file_obj.readline())          ## Third

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus condimentum sagittis lacus, laoreet luctus ligula laoreet ut. Vestibulum ullamcorper accumsan velit vel vehicula. Proin tempor lacus arcu. Nunc at elit condimentum, semper nisi et, condimentum mi. In venenatis blandit nibh at sollicitudin. Vestibulum dapibus mauris at orci maximus pellentesque. Nullam id elementum ipsum. Suspendisse cursus lobortis viverra. Proin et erat at mauris tincidunt porttitor vitae ac dui.

Next Two Lines

Donec vulputate lorem tortor, nec fermentum nibh bibendum vel. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Praesent dictum luctus massa, non euismod lacus. Pellentesque condimentum dolor est, ut dapibus lectus luctus ac. Ut sagittis commodo arcu. Integer nisi nulla, facilisis sit amet nulla quis, eleifend suscipit purus. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Aliquam euismod ultrices lorem, sit amet imperdiet est tincidunt vel. Ph
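readline() keeps the newline character at the end of each line, which is why print() shows a blank line after each one. A file object can also be looped over directly, one line per iteration. The sketch below uses io.StringIO as an in-memory stand-in for an opened text file, which is an assumption made to keep the example self-contained.

```python
import io

## StringIO behaves like a text file opened for reading
file_obj = io.StringIO('line one\nline two\nline three\n')

first = file_obj.readline()                        ## includes the trailing '\n'
remaining = [line.rstrip('\n') for line in file_obj]   ## loop continues after line one
end = file_obj.readline()                          ## empty string at end of file

print(repr(first))
print(remaining)
print(repr(end))
```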

####**readlines()**

You can return all lines as a list by using the readlines() method:

In [None]:
file_obj = open('sample-7766.txt', 'r')   
file_data = file_obj.readlines()
print(type(file_data))
print(len(file_data))
print(file_data[3])       ## Printing the 4th line (index 3) from the list of lines
file_obj.close()          ## Closing the file

<class 'list'>
14
Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed feugiat semper velit consequat facilisis. Etiam facilisis justo non iaculis dictum. Fusce turpis neque, pharetra ut odio eu, hendrerit rhoncus lacus. Nunc orci felis, imperdiet vel interdum quis, porta eu ipsum. Pellentesque dictum sem lacinia, auctor dui in, malesuada nunc. Maecenas sit amet mollis eros. Proin fringilla viverra ligula, sollicitudin viverra ante sollicitudin congue. Donec mollis felis eu libero malesuada, et lacinia risus interdum.



In [None]:
file_data = file_obj.readlines(100)        ## Raises a ValueError because we already closed the file

ValueError: ignored
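If you are unsure whether a file is still open, you can check its closed attribute before reading. Here is a minimal sketch on a scratch file (`closed_demo.txt` is a hypothetical name).

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'closed_demo.txt')  ## hypothetical file name

f = open(path, 'w')
f.write('some text\n')
f.close()

file_obj = open(path, 'r')
file_obj.close()
print(file_obj.closed)         ## the closed attribute reports the state

try:
    file_obj.readlines()       ## reading from a closed file is not allowed
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', err)
```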

###**Shorthand for files**

Using the with statement as a shorthand, we no longer need to call close() on the file object ourselves.

In [None]:
with open('sample-7766.txt', 'r') as file_obj :
    file_data = file_obj.readlines()
    
print(file_data[1])

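The file is closed automatically when the 'with' block ends, even if an error occurs inside it, so we do not need to call close() separately. Note that the variable file_obj still exists after the block, but it now refers to a closed file.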
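The following sketch checks this behaviour by raising an error inside the block on purpose. The file name `with_demo.txt` and the simulated error are assumptions made for the demonstration.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'with_demo.txt')    ## hypothetical file name

with open(path, 'w') as f:
    f.write('hello\n')

try:
    with open(path, 'r') as file_obj:
        data = file_obj.readline()
        raise RuntimeError('simulated failure')  ## something goes wrong mid-block
except RuntimeError:
    pass

print(repr(data))
print(file_obj.closed)         ## the file was closed despite the error
```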

##**CSV Files**

We could use the readlines() function to read a CSV file too, but it retrieves each row as one big string, so separating the columns becomes a difficult task.

In [None]:
!wget https://files.codingninjas.in/year2017-7767.csv

In [None]:
with open('year2017-7767.csv', 'r') as file_obj :
    file_data = file_obj.readlines()
    
print(file_data[:5])

To read CSV data more conveniently, we use Python's built-in csv module.
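Splitting each line on commas by hand may look like an easy fix, but it breaks as soon as a value itself contains a comma. The sketch below compares the two approaches on one made-up row (the data is invented for illustration and is not taken from the course file).

```python
import csv
import io

line = '2017,"Kabul, Afghanistan",South Asia\n'   ## made-up row with a quoted comma

naive = line.rstrip('\n').split(',')               ## splits inside the quotes too
parsed = next(csv.reader(io.StringIO(line)))       ## respects the quotes

print(naive)
print(parsed)
```

The manual split produces four pieces and leaves the quote characters in place, while csv.reader() returns the three intended columns.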

###**Opening and Reading CSV Files**

####**csv.reader()**

In [None]:
import csv
with open('year2017-7767.csv') as file_obj :
    file_data = csv.reader(file_obj)
    
    for row in file_data :
        print(row)
    
print(type(file_data))

Now, we are getting each row as a separate list, with each column as a different element inside that list.
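The reader is an iterator, so next() can be used to pull off the header row before looping over the data. The sketch below also passes newline='' to open(), which the csv module documentation recommends for files used with the csv module. The file `reader_demo.csv` and its contents are made up for this example.

```python
import csv
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'reader_demo.csv')  ## hypothetical file name

with open(path, 'w', newline='') as f:
    f.write('Year,Country\n2017,Sudan\n2017,Iraq\n')

with open(path, newline='') as file_obj:
    reader = csv.reader(file_obj)
    header = next(reader)      ## first row holds the column names
    rows = list(reader)        ## everything after the header

print(header)
print(rows)
```

Note that every value comes back as a string, so numeric columns such as Year need to be converted with int() or float() before doing arithmetic.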

####**CSV files with Custom Delimiters**

By default, a comma is used as the delimiter in a CSV file. However, some CSV files use other delimiters. Two popular ones are | and \t (tab).
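As a quick illustration of the tab case, here is a sketch that parses tab-separated text held in memory. The data and the use of io.StringIO in place of a real file are assumptions made for this example.

```python
import csv
import io

## made-up tab-separated data
data = 'Year\tCountry\tCity\n2017\tSudan\tFantaga\n'

rows = list(csv.reader(io.StringIO(data), delimiter='\t'))

for row in rows:
    print(row)
```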

In [None]:
!wget https://files.codingninjas.in/sample_delim-7772.csv

--2021-02-01 19:27:36--  https://files.codingninjas.in/sample_delim-7772.csv
Resolving files.codingninjas.in (files.codingninjas.in)... 52.84.161.83, 52.84.161.34, 52.84.161.46, ...
Connecting to files.codingninjas.in (files.codingninjas.in)|52.84.161.83|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 326
Saving to: ‘sample_delim-7772.csv’


2021-02-01 19:27:37 (77.3 MB/s) - ‘sample_delim-7772.csv’ saved [326/326]



**Without Delimiter**

In [None]:
import csv
with open('sample_delim-7772.csv') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

['Year|   Month|    Day|    Country|    Region|    City']
['2017|   1|    2|    Afghanistan|    Region|    Takhta Pul']
['2017|   1|    3|    Sudan|    Sub-Saharan Africa|    Fantaga']
['2017|   1|    1|    Democratic Republic of the Congo|    Region|    Sabako']
['2017|   1|    1|    Democratic Republic of the Congo|    Region|    Bialee']


Each row is retrieved as a single string because the reader is still splitting on commas. We do not want that.

**With Delimiter**

In [None]:
import csv
with open('sample_delim-7772.csv') as file:
    reader = csv.reader(file, delimiter = '|')
    for row in reader:
        print(row)

['Year', '   Month', '    Day', '    Country', '    Region', '    City']
['2017', '   1', '    2', '    Afghanistan', '    Region', '    Takhta Pul']
['2017', '   1', '    3', '    Sudan', '    Sub-Saharan Africa', '    Fantaga']
['2017', '   1', '    1', '    Democratic Republic of the Congo', '    Region', '    Sabako']
['2017', '   1', '    1', '    Democratic Republic of the Congo', '    Region', '    Bialee']


Now we have separated each column successfully. But notice that there are leading spaces before each column entry. Let's see how we can remove them.

####**CSV files with initial spaces**

Some CSV files can have a space character after a delimiter. When we use the default csv.reader() function to read these CSV files, we will get spaces in the output as well.

To remove these initial spaces, we need to pass an additional parameter called skipinitialspace.

In [None]:
import csv
with open('sample_delim-7772.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter = '|', skipinitialspace=True)
    for row in reader:
        print(row)

['Year', 'Month', 'Day', 'Country', 'Region', 'City']
['2017', '1', '2', 'Afghanistan', 'Region', 'Takhta Pul']
['2017', '1', '3', 'Sudan', 'Sub-Saharan Africa', 'Fantaga']
['2017', '1', '1', 'Democratic Republic of the Congo', 'Region', 'Sabako']
['2017', '1', '1', 'Democratic Republic of the Congo', 'Region', 'Bialee']
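Keep in mind that skipinitialspace only removes spaces at the start of a field. Spaces at the end of a field are kept. The sketch below shows this on made-up data and cleans up the remainder with str.strip(), which is one possible approach rather than a feature of the csv module.

```python
import csv
import io

## made-up data with spaces on both sides of the delimiter
data = 'Year |  Country \n2017 |  Sudan \n'

rows = list(csv.reader(io.StringIO(data), delimiter='|', skipinitialspace=True))
print(rows)                    ## trailing spaces are still present

cleaned = [[field.strip() for field in row] for row in rows]
print(cleaned)
```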


####**Dialects in CSV module**

So far, we have passed multiple parameters (delimiter and skipinitialspace) to the csv.reader() function.

This is acceptable when dealing with one or two files, but it makes the code repetitive once we start working with multiple CSV files that share a format.

As a solution to this, the csv module offers dialect as an optional parameter.

Dialect helps in grouping together many specific formatting patterns like delimiter, skipinitialspace, quoting, escapechar into a single dialect name.

It can then be passed as a parameter to multiple writer or reader instances.

In [None]:
import csv
csv.register_dialect('myDialect',
                     delimiter='|',
                     skipinitialspace=True)

with open('sample_delim-7772.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, dialect='myDialect')
    for row in reader:
        print(row)

['Year', 'Month', 'Day', 'Country', 'Region', 'City']
['2017', '1', '2', 'Afghanistan', 'Region', 'Takhta Pul']
['2017', '1', '3', 'Sudan', 'Sub-Saharan Africa', 'Fantaga']
['2017', '1', '1', 'Democratic Republic of the Congo', 'Region', 'Sabako']
['2017', '1', '1', 'Democratic Republic of the Congo', 'Region', 'Bialee']
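The same dialect also works for writing. The sketch below registers a dialect under the hypothetical name `pipes`, writes two rows into an in-memory buffer with csv.writer(), and reads them back with csv.reader(). The io.StringIO buffer stands in for a real file and is an assumption made for this example.

```python
import csv
import io

csv.register_dialect('pipes',                    ## hypothetical dialect name
                     delimiter='|',
                     skipinitialspace=True)

buffer = io.StringIO()
writer = csv.writer(buffer, dialect='pipes')
writer.writerow(['Year', 'Country'])
writer.writerow(['2017', 'Sudan'])
written = buffer.getvalue()
print(repr(written))

buffer.seek(0)                                   ## go back to the start to read
rows = list(csv.reader(buffer, dialect='pipes'))
print(rows)

registered = 'pipes' in csv.list_dialects()      ## lists all registered names
csv.unregister_dialect('pipes')                  ## remove it when no longer needed
```

The writer ends each row with '\r\n' by default, which is the reason for opening real files with newline='' when writing CSV data.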


####**csv.DictReader()**

A csv.DictReader() object reads each row of a CSV file as a dictionary, using the values in the first row as the keys.

In [None]:
import csv
with open('sample_delim-7772.csv', 'r') as file:
    csv_file = csv.DictReader(file, dialect='myDialect')
    for row in csv_file:
        print(dict(row))

{'Year': '2017', 'Month': '1', 'Day': '2', 'Country': 'Afghanistan', 'Region': 'Region', 'City': 'Takhta Pul'}
{'Year': '2017', 'Month': '1', 'Day': '3', 'Country': 'Sudan', 'Region': 'Sub-Saharan Africa', 'City': 'Fantaga'}
{'Year': '2017', 'Month': '1', 'Day': '1', 'Country': 'Democratic Republic of the Congo', 'Region': 'Region', 'City': 'Sabako'}
{'Year': '2017', 'Month': '1', 'Day': '1', 'Country': 'Democratic Republic of the Congo', 'Region': 'Region', 'City': 'Bialee'}
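Because each row is a dictionary, a column can be picked out by name instead of by position. Here is a minimal sketch on made-up data, again using io.StringIO as a stand-in for a real file.

```python
import csv
import io

## made-up pipe-separated data with a header row
data = 'Year|Country|City\n2017|Sudan|Fantaga\n2017|Afghanistan|Takhta Pul\n'

reader = csv.DictReader(io.StringIO(data), delimiter='|')
fieldnames = reader.fieldnames                   ## column names from the first row
cities = [row['City'] for row in reader]         ## select one column by name

print(fieldnames)
print(cities)
```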
