# CSV Files

A file with the `.csv` extension is a comma-separated values (CSV) file. CSV files are plain text and store tabular data: each line is a row, and the values within a row are separated by commas.

CSV files are generally used to exchange data, often in large amounts, between different applications. Database programs, analytical software, and other applications that store large amounts of information (such as contacts and customer data) usually support the CSV format.

In this notebook, we will work with Python's built-in `csv` module. The most popular third-party library for working with CSV files is `pandas`.

In [1]:
import csv

# Reading CSV Files

When passing in the file path, make sure to include the file extension if it has one.

In [2]:
example = open('../input/pdfs-and-csvs/example.csv')
example

<_io.TextIOWrapper name='../input/pdfs-and-csvs/example.csv' mode='r' encoding='UTF-8'>

After opening the file, we pass the file object to `csv.reader`, which returns a reader object that yields each row as a list of strings.

In [3]:
# Create a reader object over the open file

csv_data = csv.reader(example)
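
As a minimal sketch of the same steps, the file can also be opened in a `with` block so that it is closed automatically. The file `sample.csv` and its three rows are hypothetical stand-ins for `example.csv`, written to a temporary directory so the cell runs anywhere.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for example.csv, written to a temporary directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.csv")
with open(path, "w", newline="", encoding="utf-8") as f:
    f.write("id,first_name,last_name\n1,Joseph,Zaniolini\n2,Freida,Drillingcourt\n")

# The with block closes the file automatically, even if an error occurs.
with open(path, newline="", encoding="utf-8") as f:
    reader = csv.reader(f)   # lazy iterator over the rows
    header = next(reader)    # first row: the column names
    rows = list(reader)      # remaining rows, each a list of strings

print(header)
print(rows)
```

Note that the rows must be consumed inside the `with` block, because the reader pulls lines from the file only when iterated.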

# Encoding

CSV files often contain characters that cannot be decoded with the platform's default encoding, such as accented letters or other non-ASCII characters. In that case, reading the file can raise a `UnicodeDecodeError`.
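
To illustrate what an encoding mismatch looks like, here is a small sketch. The file `latin1.csv` and its contents are made up for the demonstration and are not part of the notebook's dataset.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "latin1.csv")

# Write a city name containing an accented "e" (U+00E9) using Latin-1,
# where that character is stored as the single byte 0xE9.
with open(path, "w", newline="", encoding="latin-1") as f:
    f.write("id,city\n1,Orl\u00e9ans\n")

# Decoding those bytes as UTF-8 fails: 0xE9 starts a multi-byte sequence
# in UTF-8, but the bytes that follow are not valid continuation bytes.
try:
    with open(path, newline="", encoding="utf-8") as f:
        f.read()
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# Reading with the encoding the file was written in succeeds.
with open(path, newline="", encoding="latin-1") as f:
    text = f.read()

print(decoded_ok)
print(text)
```

The general point is that the `encoding` argument passed to `open` has to match how the file was actually saved.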

In [4]:
# May raise a UnicodeDecodeError if the default encoding cannot decode the file

data_lines = list(csv_data)

# Show the first five rows (the header plus four records)
data_lines[:5]

[['id', 'first_name', 'last_name', 'email', 'gender', 'ip_address', 'city'],
 ['1',
  'Joseph',
  'Zaniolini',
  'jzaniolini0@simplemachines.org',
  'Male',
  '163.168.68.132',
  'Pedro Leopoldo'],
 ['2',
  'Freida',
  'Drillingcourt',
  'fdrillingcourt1@umich.edu',
  'Female',
  '97.212.102.79',
  'Buri'],
 ['3',
  'Nanni',
  'Herity',
  'nherity2@statcounter.com',
  'Female',
  '145.151.178.98',
  'Claver'],
 ['4',
  'Orazio',
  'Frayling',
  'ofrayling3@economist.com',
  'Male',
  '25.199.143.143',
  'Kungur']]

Let's now try reading it with an explicit `utf-8` encoding, passed as an argument when opening the CSV file.

In [5]:
example = open('../input/pdfs-and-csvs/example.csv', encoding="utf-8")
csv_data = csv.reader(example)
data_lines = list(csv_data)

In [6]:
# Let's see if it worked

data_lines[:3]

[['id', 'first_name', 'last_name', 'email', 'gender', 'ip_address', 'city'],
 ['1',
  'Joseph',
  'Zaniolini',
  'jzaniolini0@simplemachines.org',
  'Male',
  '163.168.68.132',
  'Pedro Leopoldo'],
 ['2',
  'Freida',
  'Drillingcourt',
  'fdrillingcourt1@umich.edu',
  'Female',
  '97.212.102.79',
  'Buri']]

Note that the first item in the list is the header line, which describes what each column represents. Let's format our printing just a bit:

In [7]:
for line in data_lines[:5]:
    print(line)

['id', 'first_name', 'last_name', 'email', 'gender', 'ip_address', 'city']
['1', 'Joseph', 'Zaniolini', 'jzaniolini0@simplemachines.org', 'Male', '163.168.68.132', 'Pedro Leopoldo']
['2', 'Freida', 'Drillingcourt', 'fdrillingcourt1@umich.edu', 'Female', '97.212.102.79', 'Buri']
['3', 'Nanni', 'Herity', 'nherity2@statcounter.com', 'Female', '145.151.178.98', 'Claver']
['4', 'Orazio', 'Frayling', 'ofrayling3@economist.com', 'Male', '25.199.143.143', 'Kungur']


In [8]:
# Grabbing headers

data_lines[0]

['id', 'first_name', 'last_name', 'email', 'gender', 'ip_address', 'city']

In [9]:
# Grabbing an e-mail

data_lines[10][3]

'hgasquoine9@google.ru'

In [10]:
len(data_lines)

1001

In [11]:
# Let's grab the e-mails from rows 1 through 14 (the slice [1:15]).

all_emails = []
for line in data_lines[1:15]:
    
    # line[3], since e-mails are in the 4th column
    all_emails.append(line[3])
    
all_emails

['jzaniolini0@simplemachines.org',
 'fdrillingcourt1@umich.edu',
 'nherity2@statcounter.com',
 'ofrayling3@economist.com',
 'jmurrison4@cbslocal.com',
 'lgamet5@list-manage.com',
 'dhowatt6@amazon.com',
 'kherion7@amazon.com',
 'chedworth8@china.com.cn',
 'hgasquoine9@google.ru',
 'ftarra@shareasale.com',
 'abathb@umn.edu',
 'lchastangc@goo.gl',
 'cceried@yale.edu']
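
Indexing by position works, but it relies on remembering that e-mails are in column 3. As an alternative sketch, `csv.DictReader` uses the header row as keys, so columns can be accessed by name. The in-memory `sample` below is an assumed two-row excerpt of the data shown above, used so the cell is self-contained.

```python
import csv
import io

# Assumed two-row excerpt of the data, held in memory instead of a file.
sample = io.StringIO(
    "id,first_name,last_name,email\n"
    "1,Joseph,Zaniolini,jzaniolini0@simplemachines.org\n"
    "2,Freida,Drillingcourt,fdrillingcourt1@umich.edu\n"
)

# DictReader consumes the first row as field names and yields one dict per row.
reader = csv.DictReader(sample)
emails = [row["email"] for row in reader]

print(emails)
```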

We can concatenate two fields, such as the first and last name, to build a single list of full names:

In [12]:
full_names = []

for line in data_lines[1:15]:
    full_names.append(line[1]+' '+line[2])
    
full_names

['Joseph Zaniolini',
 'Freida Drillingcourt',
 'Nanni Herity',
 'Orazio Frayling',
 'Julianne Murrison',
 'Lucy Gamet',
 'Dyana Howatt',
 'Kassey Herion',
 'Chrissy Hedworth',
 'Hyatt Gasquoine',
 'Felicdad Tarr',
 'Andrew Bath',
 'Lucais Chastang',
 'Car Cerie']
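
The same result can be written more compactly with a list comprehension and an f-string. This is only a sketch: the short `data_lines` list here is a hypothetical three-column excerpt, not the full dataset loaded above.

```python
# Hypothetical three-column excerpt: a header row followed by two records.
data_lines = [
    ["id", "first_name", "last_name"],
    ["1", "Joseph", "Zaniolini"],
    ["2", "Freida", "Drillingcourt"],
]

# Skip the header with [1:], unpack each row, and join first and last name.
full_names = [f"{first} {last}" for _id, first, last in data_lines[1:]]

print(full_names)
```

Unpacking like this only works when every row has exactly three fields; for the seven-column file above, indexing with `line[1]` and `line[2]` is the simpler choice.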

# Writing to CSV Files

We can also write CSV files, either creating new ones or adding to existing ones.

For this exercise, we will create a new file.

In [13]:
# newline controls how line endings are translated (it only applies to text mode).
# It can be None, '', '\n', '\r', or '\r\n'. The csv module recommends newline=''.
file_to_output = open('to_save_file.csv', mode="w", newline="")

In [14]:
csv_writer = csv.writer(file_to_output, delimiter=',')

In [15]:
# Write a single row; the return value is the number of characters written (7 for 'a,b,c\r\n')

csv_writer.writerow(['a','b','c'])

7

In [16]:
# Write multiple rows
# Each row should have the same number of columns as the ['a','b','c'] row already written

csv_writer.writerows([['1','2','3'],['4','5','6']])

In [17]:
# Don't forget to close the file so the rows are flushed to disk

file_to_output.close()
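
The write steps above can also be sketched with a `with` block, which closes the file automatically, followed by a read-back to confirm what was saved. The temporary directory is an assumption made so the example does not overwrite any real file.

```python
import csv
import os
import tempfile

# Write into a temporary directory so no existing file is overwritten.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "to_save_file.csv")

# Mode "w" creates the file (or truncates it if it already exists).
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["a", "b", "c"])
    writer.writerows([["1", "2", "3"], ["4", "5", "6"]])

# Read the file back to check its contents.
with open(path, newline="", encoding="utf-8") as f:
    saved_rows = list(csv.reader(f))

print(saved_rows)
```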

To add rows to an existing file, open it in mode `a` (append).

In [18]:
f = open('to_save_file.csv','a',newline='')

csv_writer = csv.writer(f)

csv_writer.writerow(['new','new','new'])

13

In [19]:
f.close()
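
To check that append mode keeps the existing rows, a short sketch can write a file, append to it, and read everything back. As before, the temporary path is an assumption used to keep the example self-contained.

```python
import csv
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "to_save_file.csv")

# Create the file with one row.
with open(path, "w", newline="") as f:
    csv.writer(f).writerow(["a", "b", "c"])

# Mode "a" adds to the end of the file instead of truncating it.
with open(path, "a", newline="") as f:
    csv.writer(f).writerow(["new", "new", "new"])

# Both the original row and the appended row are present.
with open(path, newline="") as f:
    all_rows = list(csv.reader(f))

print(all_rows)
```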