Python has a built-in csv module that makes reading from CSV files easier.

The CSV module defines readers which allow you to read each row. There are two options:

reader: reads each row and returns the values as a list of strings.

DictReader: reads each row and returns the values as an ordered dictionary, where the first row of the file is used as the keys.

An ordered dictionary is just like a dictionary, except that it remembers the order in which its elements were added. For example, if it were important to you that the name element in your dictionary appeared before the age element, then an ordered dictionary would keep that ordering. Usually, with dictionaries, you don't care about the order in which the values are stored, but DictReader returns an ordered dictionary in case it matters for your task. (From Python 3.8 onwards, DictReader returns a regular dictionary instead; regular dictionaries have preserved insertion order since Python 3.7, so the rows print as {'name': 'John', ...} rather than OrderedDict([...]).) You access an ordered dictionary in the same way you access a dictionary.

Choose the reader that matches how you want to store your data (list or dictionary).
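As a rough illustration of that choice, the sketch below reads the same data with both readers. It uses an in-memory string (the hypothetical `sample` variable) in place of customer.csv so that it runs without any file on disk. With reader you pick a field by position; with DictReader you pick it by header name.

```python
import csv
import io

# Hypothetical in-memory stand-in for customer.csv, so the sketch runs anywhere.
sample = "name,age,postcode\nJohn,52,5002\nYe,18,3005\n"

# csv.reader: each row is a list, so fields are accessed by position.
# The header row comes back like any other row, so take it off with next().
list_rows = csv.reader(io.StringIO(sample))
header = next(list_rows)
ages_by_index = [row[1] for row in list_rows]

# csv.DictReader: the header row becomes the keys, so fields are accessed by name.
dict_rows = csv.DictReader(io.StringIO(sample))
ages_by_name = [row['age'] for row in dict_rows]

print(header)
print(ages_by_index)
print(ages_by_name)
```

Both readers return every value as a string, so a numeric field such as age still needs converting (for example with int()) before you do arithmetic on it.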

In [1]:
# example with reader - creates a list for each row
import csv
dataFile = open('customer.csv')

reader = csv.reader(dataFile)

for row in reader:
    print(row)
    
dataFile.close()

['\ufeffname', 'age', 'postcode']
['John', '52', '5002']
['Ye', '18', '3005']
['Siobhan', '34', '2356']

An even safer way of opening your data file is to use with-as.

In [2]:
# example using with-as
import csv
with open('customer.csv') as dataFile:
    reader = csv.reader(dataFile)
    for row in reader:
        print(row)

['\ufeffname', 'age', 'postcode']
['John', '52', '5002']
['Ye', '18', '3005']
['Siobhan', '34', '2356']


This opens your file in the same way, but Python automatically closes the file as soon as the with block ends, even if an error in reading or writing the file causes your program to stop.
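The closing behaviour can be checked with a small sketch. It writes a throwaway file (the name demo.csv and its contents are made up for illustration), deliberately triggers an error while reading it inside a with block, and then inspects the file object's closed attribute.

```python
import csv
import os
import tempfile

# Write a small throwaway CSV file so the sketch does not depend on customer.csv.
path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
with open(path, 'w') as out:
    out.write('name,age\nJohn,fifty-two\n')

try:
    with open(path) as dataFile:
        for row in csv.DictReader(dataFile):
            int(row['age'])  # raises ValueError: 'fifty-two' is not a number
except ValueError:
    pass  # the error is caught here only so that the sketch keeps running

# The with block closed the file even though the loop ended with an error.
print(dataFile.closed)
```

With a plain open() and close() pair, the same error would skip the close() call unless you wrapped the code in try/finally yourself.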

In [3]:
# example with DictReader - creates an ordered dictionary for each row
import csv

with open('customer.csv') as dataFile:
    dreader = csv.DictReader(dataFile)
    for row in dreader:
        print(row)

OrderedDict([('\ufeffname', 'John'), ('age', '52'), ('postcode', '5002')])
OrderedDict([('\ufeffname', 'Ye'), ('age', '18'), ('postcode', '3005')])
OrderedDict([('\ufeffname', 'Siobhan'), ('age', '34'), ('postcode', '2356')])


## What is \ufeff?

Note that **\ufeff** has appeared before the first header field in the example (you may not see this, depending on which application you used to create your file). This character is a byte order mark (BOM), a special indicator of how the file has been encoded. It matters because it becomes part of your data: with DictReader the first key is '\ufeffname', so row['name'] raises a KeyError. You can remove it by specifying the encoding when you open the file, but unfortunately there isn't an easy, reliable way of detecting how a file has been encoded. Whenever you are reading files, check that your data does not contain extra encoding information. In general, this will appear at the start of the file, so you'll pick it up if you look at the first data item in the first row. If you have some unusual values or errors in your results, you should consider errors in reading the data in as a possible cause.

A quick online search for '\ufeff' reveals that this character marks the start of a file saved as UTF-8 with a byte order mark, which Python calls the 'utf-8-sig' encoding. This format is commonly produced by Microsoft products. You can add the encoding parameter to the open call to read in this encoding and remove the \ufeff from the data.

In [4]:
# including encoding in the open()
# example with DictReader - creates an ordered dictionary for each row

import csv

with open('customer.csv', encoding='utf-8-sig') as dataFile:
    dreader = csv.DictReader(dataFile)
    for row in dreader:
        print(row)

OrderedDict([('name', 'John'), ('age', '52'), ('postcode', '5002')])
OrderedDict([('name', 'Ye'), ('age', '18'), ('postcode', '3005')])
OrderedDict([('name', 'Siobhan'), ('age', '34'), ('postcode', '2356')])
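If you are not sure whether a file starts with this marker, one option is to look at its first three bytes. The sketch below is a minimal check under stated assumptions: the helper names starts_with_utf8_bom and first_line, and the two throwaway files, are made up for illustration. It recognises only the UTF-8 byte order mark; it is not a general encoding detector.

```python
import codecs
import os
import tempfile

def starts_with_utf8_bom(path):
    """Return True if the file's first three bytes are the UTF-8 byte order mark."""
    with open(path, 'rb') as f:
        return f.read(3) == codecs.BOM_UTF8

def first_line(path):
    """Read the first line using utf-8-sig, which drops a leading BOM if present."""
    with open(path, encoding='utf-8-sig') as f:
        return f.readline()

# Create two throwaway files with the same text: one with a BOM, one without.
folder = tempfile.mkdtemp()
with_bom = os.path.join(folder, 'with_bom.csv')
without_bom = os.path.join(folder, 'without_bom.csv')
with open(with_bom, 'w', encoding='utf-8-sig') as f:
    f.write('name,age\nJohn,52\n')
with open(without_bom, 'w', encoding='utf-8') as f:
    f.write('name,age\nJohn,52\n')

print(starts_with_utf8_bom(with_bom))
print(starts_with_utf8_bom(without_bom))
print(first_line(with_bom) == first_line(without_bom))
```

Because 'utf-8-sig' simply skips the marker when it is there and otherwise reads the file as ordinary UTF-8, it is a reasonable choice when you expect UTF-8 files that may or may not carry a BOM.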
