# CSV files

CSV (comma-separated values) files are a common format for storing tabular data, and they are widely used in data analysis and machine learning. Each row of the table is stored on its own line, and the columns within a row are separated by commas. CSV files use the `.csv` extension. Double-clicking a CSV file usually opens it in a spreadsheet program such as Excel or Google Sheets, depending on where the file is stored and which programs are available.
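As a rough illustration of that layout, the sketch below turns a small table (the first rows of `country.csv`) into CSV text by joining cells with commas and rows with newlines. It assumes no cell contains a comma, a quote or a newline; real CSV writers quote such cells, so treat this as a picture of the format rather than a general-purpose writer.

```python
# A few rows of country.csv as a list of lists (one inner list per row)
table = [
    ["Name", "Continent", "Population"],
    ["Afghanistan", "Asia", "22720000"],
    ["Albania", "Europe", "3401200"],
]

# Commas separate the cells of a row; newlines separate the rows
text = "\n".join(",".join(row) for row in table)
print(text)
# Name,Continent,Population
# Afghanistan,Asia,22720000
# Albania,Europe,3401200
```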

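Let us read a CSV file using the built-in `open` function. The examples below use a file named `country.csv` with three columns: `Name`, `Continent` and `Population`.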

```python
# Open the file for reading and load all of its lines into a list
f = open("country.csv")
contents = f.readlines()

for line in contents:
    print(line)

f.close()
```

```
Name,Continent,Population

Afghanistan,Asia,22720000

Albania,Europe,3401200

Algeria,Africa,31471000

American Samoa,Oceania,68000

Andorra,Europe,78000

Angola,Africa,12878000

Anguilla,North America,8000

Antarctica,Antarctica,0

Antigua and Barbuda,North America,68000

Argentina,South America,37032000
```


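The `readlines()` method reads the entire file and returns a list of strings, one per line, so inside the loop `line` holds the text of a single row. Each string still ends with its newline character `'\n'`, and `print` adds another one, which is why a blank line appears between the rows in the output. What we would like to do is split each line into its individual cells. We can do this with the string method `split()`, passing the separator `','`: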

```python
f = open("country.csv")
contents = f.readlines()

for line in contents:
    # split(',') breaks the line into a list of cells at each comma
    print(line.split(','))

f.close()
```

```
['Name', 'Continent', 'Population\n']
['Afghanistan', 'Asia', '22720000\n']
['Albania', 'Europe', '3401200\n']
['Algeria', 'Africa', '31471000\n']
['American Samoa', 'Oceania', '68000\n']
['Andorra', 'Europe', '78000\n']
['Angola', 'Africa', '12878000\n']
['Anguilla', 'North America', '8000\n']
['Antarctica', 'Antarctica', '0\n']
['Antigua and Barbuda', 'North America', '68000\n']
['Argentina', 'South America', '37032000']
```


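We split on the comma because we know that the values in this CSV file are separated by commas. Notice, however, that the last cell of each row still carries the trailing newline (for example `'Population\n'`), except on the final line, because the file does not end with a newline.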
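If we keep splitting by hand, two fixes are worth sketching. Stripping the newline with `rstrip("\n")` before splitting cleans up the last cell. A plain `split(',')` also breaks as soon as a cell itself contains a comma; CSV handles that by wrapping the cell in double quotes, which the `csv` module understands. In the sketch below, `io.StringIO` stands in for the open file so the snippet runs without `country.csv`, and the quoted row `"Example, Country"` is hypothetical, not part of the data.

```python
import csv
import io

# In-memory stand-in for open("country.csv"); rows copied from the output above
sample = io.StringIO("Name,Continent,Population\nAfghanistan,Asia,22720000\nAlbania,Europe,3401200")

rows = []
for line in sample:
    # Remove the trailing newline first so it does not stick to the last cell
    rows.append(line.rstrip("\n").split(","))
print(rows)

# Hypothetical row whose first cell contains a comma, quoted as CSV requires
tricky = '"Example, Country",Nowhere,0'
print(tricky.split(","))           # naive split: 4 pieces, quotes left in
print(next(csv.reader([tricky])))  # csv.reader: ['Example, Country', 'Nowhere', '0']
```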

Instead of handling the commas (and the newlines) ourselves, we can use `csv.reader` from Python's built-in `csv` module to do this for us:

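```python
import csv

# newline='' lets the csv module handle line endings itself
f = open("country.csv", newline='')
reader = csv.reader(f)   # yields each row as a list of strings
for row in reader:
    print(row)
f.close()
```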

```
['Name', 'Continent', 'Population']
['Afghanistan', 'Asia', '22720000']
['Albania', 'Europe', '3401200']
['Algeria', 'Africa', '31471000']
['American Samoa', 'Oceania', '68000']
['Andorra', 'Europe', '78000']
['Angola', 'Africa', '12878000']
['Anguilla', 'North America', '8000']
['Antarctica', 'Antarctica', '0']
['Antigua and Barbuda', 'North America', '68000']
['Argentina', 'South America', '37032000']
```
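Because `csv.reader` returns every cell as a string, the populations have to be converted with `int()` before any arithmetic. A minimal sketch, assuming we want a total per continent: `next(reader)` consumes the header row, and `io.StringIO` again stands in for the real file with a few rows copied from the output above.

```python
import csv
import io
from collections import defaultdict

# In-memory stand-in for open("country.csv", newline=''); rows copied from the output above
f = io.StringIO(
    "Name,Continent,Population\n"
    "Albania,Europe,3401200\n"
    "Andorra,Europe,78000\n"
    "Angola,Africa,12878000\n"
    "Algeria,Africa,31471000\n"
)

reader = csv.reader(f)
header = next(reader)  # skip the header row: ['Name', 'Continent', 'Population']

totals = defaultdict(int)
for name, continent, population in reader:
    # Cells are strings, so convert the population before adding it up
    totals[continent] += int(population)

print(dict(totals))  # {'Europe': 3479200, 'Africa': 44349000}
```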


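Notice that the reader handled the splitting for us and also removed the trailing newlines. To add a row to the file, we can similarly use `csv.writer`, which takes care of the formatting (separators, quoting and line endings). Opening the file in `"a"` (append) mode adds the new row at the end instead of overwriting the file, and `newline=''` is what the `csv` documentation recommends for file objects passed to the `csv` module: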

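```python
import csv

to_add = ['Canada', 'North America', '40100000']

# "a" opens the file for appending, so the existing rows are kept
f = open("country.csv", "a", newline='')
writer = csv.writer(f)
writer.writerow(to_add)   # writes the list as one comma-separated line
f.close()
```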
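To see what the writer actually produces, the sketch below writes into an in-memory `io.StringIO` buffer instead of `country.csv`. With the default settings, `csv.writer` quotes a cell only when it needs to (here, the hypothetical `"Example, Country"`, which contains a comma), converts non-string values with `str()`, and ends each row with `'\r\n'`. That `'\r\n'` terminator is why files for the writer are opened with `newline=''`: otherwise text mode on Windows could translate the `'\n'` again and produce `'\r\r\n'`.

```python
import csv
import io

buffer = io.StringIO()  # in-memory stand-in for a file opened with newline=''
writer = csv.writer(buffer)

# Hypothetical row: the first cell contains a comma, so it gets quoted
writer.writerow(["Example, Country", "Nowhere", "0"])
# Integers are converted to strings automatically
writer.writerow(["Canada", "North America", 40100000])

print(repr(buffer.getvalue()))
# '"Example, Country",Nowhere,0\r\nCanada,North America,40100000\r\n'
```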

Let us now read the contents of the file again to see if the row that we added is there:

```python
import csv

f = open("country.csv", newline='')
reader = csv.reader(f)
for row in reader:
    print(row)
f.close()
```

```
['Name', 'Continent', 'Population']
['Afghanistan', 'Asia', '22720000']
['Albania', 'Europe', '3401200']
['Algeria', 'Africa', '31471000']
['American Samoa', 'Oceania', '68000']
['Andorra', 'Europe', '78000']
['Angola', 'Africa', '12878000']
['Anguilla', 'North America', '8000']
['Antarctica', 'Antarctica', '0']
['Antigua and Barbuda', 'North America', '68000']
['Argentina', 'South America', '37032000']
['Canada', 'North America', '40100000']
```
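One caveat with append mode: it writes exactly at the end of the file. If the last line has no trailing newline (the earlier `split` output shows `'37032000'` without `'\n'`), the new row is written directly onto that line, giving something like `Argentina,South America,37032000Canada,...`. The sketch below guards against this by checking the file's last byte before appending. The helper name `append_row` and the temporary file `demo.csv` are hypothetical, used only for illustration.

```python
import csv
import os
import tempfile


def append_row(path, row):
    """Append one CSV row, first adding a line break if the file lacks a trailing one."""
    needs_break = False
    if os.path.exists(path) and os.path.getsize(path) > 0:
        # Read only the last byte; binary mode avoids any newline translation
        with open(path, "rb") as fh:
            fh.seek(-1, os.SEEK_END)
            needs_break = fh.read(1) not in (b"\n", b"\r")
    with open(path, "a", newline="") as fh:
        if needs_break:
            fh.write("\r\n")  # same terminator csv.writer uses by default
        csv.writer(fh).writerow(row)


# Usage: a hypothetical demo file whose last line has no trailing newline
path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(path, "w", newline="") as fh:
    fh.write("Name,Continent,Population\r\nArgentina,South America,37032000")

append_row(path, ["Canada", "North America", "40100000"])

with open(path, newline="") as fh:
    rows = list(csv.reader(fh))
print(rows[-2:])
# [['Argentina', 'South America', '37032000'], ['Canada', 'North America', '40100000']]
```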
