# Reading/Writing data
- Now that we know the basics of handling data, let's learn how to import and export data in Python.
- **If you have any questions over the course of this lecture, please post them to the 'Day 2 Lecture Questions' assignment on the Canvas course page.**

## Reading files

In [1]:
file = open('data/hi.txt') # open the file at this path; this gives us a file object
hello = file.read() # then use the 'read' method and save the contents as a string
file.close() # now close the file, as we already have the data
print(hello)

This is how you load a file.

You are doing great!


In [2]:
# the with statement keeps the file open only while we are inside the block, so there is no need to close it

with open('data/hi.txt','r') as file: # the 'r' is optional (it is the default), but it is a common way to show you are reading a file
    hi = file.read()

print(hi)

This is how you load a file.

You are doing great!


In [3]:
# file.read()  # uncommenting this raises a ValueError, because the with block already closed the file
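
A minimal sketch of why a file cannot be read after its `with` block ends. The scratch file created with `tempfile` is an assumption made so the example runs anywhere; the lecture itself reads `data/hi.txt`.

```python
import os
import tempfile

# Hypothetical scratch file; the lecture uses 'data/hi.txt' instead.
path = os.path.join(tempfile.mkdtemp(), "hi.txt")
with open(path, "w") as f:
    f.write("This is how you load a file.\n")

with open(path) as file:
    text = file.read()

print(file.closed)  # the with block has already closed the file

try:
    file.read()  # reading from a closed file is an error
except ValueError as err:
    message = str(err)
    print("Reading failed:", message)
```

The contents saved in `text` are still available; only the file object itself is no longer usable.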

## Writing files

In [4]:
new_file = "Let's write a file!"

with open('new-file.txt', 'w') as f: # the 'w' in open lets the function know we are writing
    f.write(new_file)


In [5]:
with open('new-file.txt') as read_file:
    read = read_file.read()

print(read)

Let's write a file!
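
One thing worth knowing about `'w'` is that it replaces whatever the file already held. The sketch below contrasts it with `'a'` (append). The scratch path and the three example lines are assumptions for illustration only.

```python
import os
import tempfile

# Hypothetical scratch file so nothing in your working folder is touched.
path = os.path.join(tempfile.mkdtemp(), "new-file.txt")

with open(path, "w") as f:   # 'w' creates the file, or empties it if it exists
    f.write("first line\n")

with open(path, "w") as f:   # opening with 'w' again discards "first line"
    f.write("second line\n")

with open(path, "a") as f:   # 'a' adds to the end instead of overwriting
    f.write("third line\n")

with open(path) as f:
    contents = f.read()

print(contents)
```

Only the second and third lines survive, because the second `'w'` call started the file over.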


## Reading CSV

- There are multiple ways we can open CSV files.
    - Manually
    - And with packages
        - csv package
        - pandas package
- Our focus will stay on CSV files because almost any data file you get (.xlsx, .json, .xml) can easily be converted to a .csv using online tools.
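
If you would rather not rely on an online tool, a conversion can also be sketched in Python itself. The example below turns a small JSON list into CSV text using only the standard library. The records and column names are made up for illustration, and it writes to an in-memory buffer rather than a real file.

```python
import csv
import io
import json

# Hypothetical JSON records, invented for this illustration.
json_text = '[{"name": "Alabama", "region": 3}, {"name": "Alaska", "region": 4}]'
records = json.loads(json_text)  # a list of dictionaries

# In-memory buffer; for a real file use open('some-file.csv', 'w', newline='').
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "region"])
writer.writeheader()       # first row: the column names
writer.writerows(records)  # one row per dictionary

csv_text = buffer.getvalue()
print(csv_text)
```

This only works this simply when every record has the same flat set of keys; nested JSON needs more care.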

### Manual loading

- Extra steps are involved when you manually load data.
    - You have to use a `for` loop to load the data line by line.
    - You have to split each line yourself so Python knows the values are comma separated.
    - You have to clean each line yourself (for example, removing the trailing newline).

- We are looking at data from the U.S. Census Bureau.
    - 2019 Census estimates.
    - The estimated population of US states and territories.

In [6]:
with open('data/SCPRC-EST2019-18+POP-RES.csv') as f: # use the with statement with open so our file closes automatically
    line = f.readline() # read the first line of the file as a string
    pop_est = line.split(",") # split the string at every comma to get a list of values
    print(pop_est) # here is the first line

# notice it includes '\n', a special character that means "new line". We need another method to get rid of it

['SUMLEV', 'REGION', 'DIVISION', 'STATE', 'NAME', 'POPESTIMATE2019', 'POPEST18PLUS2019', 'PCNT_POPEST18PLUS\n']


In [7]:
with open('data/SCPRC-EST2019-18+POP-RES.csv') as f:
    line = f.readline().strip() # by adding .strip() we can get rid of the '\n'
    values = line.split(",")
    print(values)

['SUMLEV', 'REGION', 'DIVISION', 'STATE', 'NAME', 'POPESTIMATE2019', 'POPEST18PLUS2019', 'PCNT_POPEST18PLUS']


In [8]:
with open('data/SCPRC-EST2019-18+POP-RES.csv') as f:
    for line in f:    # create a for loop to go through every line in the document
        values = line.strip().split(",") # we can chain several methods in one expression
        print(values)


['SUMLEV', 'REGION', 'DIVISION', 'STATE', 'NAME', 'POPESTIMATE2019', 'POPEST18PLUS2019', 'PCNT_POPEST18PLUS']
['040', '3', '6', '01', 'Alabama', '4903185', '3814879', '77.8']
['040', '4', '9', '02', 'Alaska', '731545', '551562', '75.4']
['040', '4', '8', '04', 'Arizona', '7278717', '5638481', '77.5']
['040', '3', '7', '05', 'Arkansas', '3017804', '2317649', '76.8']
['040', '4', '9', '06', 'California', '39512223', '30617582', '77.5']
['040', '4', '8', '08', 'Colorado', '5758736', '4499217', '78.1']
['040', '1', '1', '09', 'Connecticut', '3565287', '2837847', '79.6']
['040', '3', '5', '10', 'Delaware', '973764', '770192', '79.1']
['040', '3', '5', '11', 'District of Columbia', '705749', '577581', '81.8']
['040', '3', '5', '12', 'Florida', '21477737', '17247808', '80.3']
['040', '3', '5', '13', 'Georgia', '10617423', '8113542', '76.4']
['040', '4', '9', '15', 'Hawaii', '1415872', '1116004', '78.8']
['040', '4', '8', '16', 'Idaho', '1787065', '1338864', '74.9']
['040', '2', '3', '17', 'Il
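
Every value in the lists above is still text, so numbers have to be converted by hand before you can do arithmetic with them. A minimal sketch, using three lines copied from the output above and `io.StringIO` as a stand-in for the opened file (an assumption so the example runs without the data folder):

```python
import io

# Three lines copied from the output above; io.StringIO stands in for the open file.
sample = io.StringIO(
    "SUMLEV,REGION,DIVISION,STATE,NAME,POPESTIMATE2019,POPEST18PLUS2019,PCNT_POPEST18PLUS\n"
    "040,3,6,01,Alabama,4903185,3814879,77.8\n"
    "040,4,9,02,Alaska,731545,551562,75.4\n"
)

header = sample.readline().strip().split(",")  # the first line holds the column names
rows = []
for line in sample:
    values = line.strip().split(",")
    row = dict(zip(header, values))  # pair each column name with its value
    row["POPESTIMATE2019"] = int(row["POPESTIMATE2019"])        # text to whole number
    row["PCNT_POPEST18PLUS"] = float(row["PCNT_POPEST18PLUS"])  # text to decimal
    rows.append(row)

total = sum(row["POPESTIMATE2019"] for row in rows)
print(total)
```

Columns that are codes rather than quantities, such as `STATE`, are left as text here so that `'01'` keeps its leading zero.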

### Using the csv package

In [9]:
import csv
with open('data/SCPRC-EST2019-18+POP-RES.csv') as file: # open the file
    for i in csv.reader(file): # csv.reader splits each line into a list of values for us
        print(i) # print to view

# you still have to use a for loop, which is not ideal

['SUMLEV', 'REGION', 'DIVISION', 'STATE', 'NAME', 'POPESTIMATE2019', 'POPEST18PLUS2019', 'PCNT_POPEST18PLUS']
['040', '3', '6', '01', 'Alabama', '4903185', '3814879', '77.8']
['040', '4', '9', '02', 'Alaska', '731545', '551562', '75.4']
['040', '4', '8', '04', 'Arizona', '7278717', '5638481', '77.5']
['040', '3', '7', '05', 'Arkansas', '3017804', '2317649', '76.8']
['040', '4', '9', '06', 'California', '39512223', '30617582', '77.5']
['040', '4', '8', '08', 'Colorado', '5758736', '4499217', '78.1']
['040', '1', '1', '09', 'Connecticut', '3565287', '2837847', '79.6']
['040', '3', '5', '10', 'Delaware', '973764', '770192', '79.1']
['040', '3', '5', '11', 'District of Columbia', '705749', '577581', '81.8']
['040', '3', '5', '12', 'Florida', '21477737', '17247808', '80.3']
['040', '3', '5', '13', 'Georgia', '10617423', '8113542', '76.4']
['040', '4', '9', '15', 'Hawaii', '1415872', '1116004', '78.8']
['040', '4', '8', '16', 'Idaho', '1787065', '1338864', '74.9']
['040', '2', '3', '17', 'Il
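
The output looks the same as the manual version, but `csv.reader` is safer when a value itself contains a comma. The line below is a made-up example (it is not from the census file) that shows the difference:

```python
import csv
import io

# Hypothetical line with a comma inside a quoted field (not from the census file).
line = '040,"Washington, D.C.",705749\n'

naive = line.strip().split(",")               # manual approach
parsed = next(csv.reader(io.StringIO(line)))  # csv package approach

print(naive)   # the quoted name is cut into two pieces
print(parsed)  # the quoted name stays together
```

Here `io.StringIO` simply lets us treat a string as if it were an open file.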

### pandas package

- As mentioned in the Python-Basics lecture, you must install this package using your Command Prompt.

- If you have not already, go to your Command Prompt and enter `pip install pandas`.

- We will explore pandas more soon.
    - As you will see, a pandas table looks a lot cleaner than the lists and dictionaries we have used so far.

In [10]:
import pandas as pd

In [11]:
data = pd.read_csv('data/SCPRC-EST2019-18+POP-RES.csv') # this is a lot easier
data.head() # pandas has a method that lets you look at the first five rows (pass a number, e.g. head(10), for more)



Unnamed: 0,SUMLEV,REGION,DIVISION,STATE,NAME,POPESTIMATE2019,POPEST18PLUS2019,PCNT_POPEST18PLUS
0,40,3,6,1,Alabama,4903185,3814879,77.8
1,40,4,9,2,Alaska,731545,551562,75.4
2,40,4,8,4,Arizona,7278717,5638481,77.5
3,40,3,7,5,Arkansas,3017804,2317649,76.8
4,40,4,9,6,California,39512223,30617582,77.5


In [12]:
data

Unnamed: 0,SUMLEV,REGION,DIVISION,STATE,NAME,POPESTIMATE2019,POPEST18PLUS2019,PCNT_POPEST18PLUS
0,40,3,6,1,Alabama,4903185,3814879,77.8
1,40,4,9,2,Alaska,731545,551562,75.4
2,40,4,8,4,Arizona,7278717,5638481,77.5
3,40,3,7,5,Arkansas,3017804,2317649,76.8
4,40,4,9,6,California,39512223,30617582,77.5
5,40,4,8,8,Colorado,5758736,4499217,78.1
6,40,1,1,9,Connecticut,3565287,2837847,79.6
7,40,3,5,10,Delaware,973764,770192,79.1
8,40,3,5,11,District of Columbia,705749,577581,81.8
9,40,3,5,12,Florida,21477737,17247808,80.3
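
Notice that pandas guessed the column types for us: `'040'` became `40` and `'01'` became `1`. If codes like these need to keep their leading zeros, one option is the `dtype` argument of `pd.read_csv`. A minimal sketch, using three lines copied from the earlier output and `io.StringIO` in place of the real file (an assumption so it runs without the data folder):

```python
import io

import pandas as pd

# Three lines copied from the earlier output; io.StringIO stands in for the file.
sample = io.StringIO(
    "SUMLEV,REGION,DIVISION,STATE,NAME,POPESTIMATE2019,POPEST18PLUS2019,PCNT_POPEST18PLUS\n"
    "040,3,6,01,Alabama,4903185,3814879,77.8\n"
    "040,4,9,02,Alaska,731545,551562,75.4\n"
)

# Read the two code columns as text; let pandas infer the rest.
codes = pd.read_csv(sample, dtype={"SUMLEV": str, "STATE": str})

print(codes["STATE"].tolist())          # codes keep their leading zeros
print(codes["POPESTIMATE2019"].sum())   # population is still numeric
```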


In [13]:
# you can read csv files from online sources
life = pd.read_csv("http://domlockett.github.io/website/Life_Expectancy.csv") 

life

Unnamed: 0,Time,Time Code,Country Name,Country Code,life_expect
0,1990,YR1990,Afghanistan,AFG,50.331
1,1990,YR1990,Albania,ALB,71.836
2,1990,YR1990,Algeria,DZA,66.938
3,1990,YR1990,American Samoa,ASM,
4,1990,YR1990,Andorra,AND,
...,...,...,...,...,...
3163,2019,YR2019,Sub-Saharan Africa,SSF,
3164,2019,YR2019,Sub-Saharan Africa (excluding high income),SSA,
3165,2019,YR2019,Sub-Saharan Africa (IDA & IBRD countries),TSS,
3166,2019,YR2019,Upper middle income,UMC,


In [14]:
# we can save a DataFrame to a csv file; life[1:10] selects the rows at positions 1 through 9
life[1:10].to_csv('new_file.csv')
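
By default `to_csv` also writes the row index as an unnamed first column, and reading such a file back gives a column called `Unnamed: 0`. A minimal sketch of the difference `index=False` makes, using a small stand-in frame built from values shown above (the name `life_sample` is made up for this example):

```python
import io

import pandas as pd

# Small stand-in frame using values from the output above.
life_sample = pd.DataFrame({
    "Country Name": ["Afghanistan", "Albania", "Algeria"],
    "life_expect": [50.331, 71.836, 66.938],
})

# With no file name, to_csv returns the csv text instead of writing a file.
with_index = life_sample[1:3].to_csv()
without_index = life_sample[1:3].to_csv(index=False)

print(with_index)
print(without_index)

# Reading the indexed version back shows the extra column.
reloaded = pd.read_csv(io.StringIO(with_index))
print(reloaded.columns.tolist())
```

Whether to keep the index depends on whether the row numbers mean anything for your data.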