# Chapter 7: How to Work with File I/O

# Objectives

![image.png](attachment:image.png)

# Two types of files

![image.png](attachment:image.png)

# The sequence of file operations

![image.png](attachment:image.png)

# The built-in open() function

![image.png](attachment:image.png)
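
The second argument to `open()` is the mode. This minimal sketch shows how the three text modes used in this chapter differ: `"w"` replaces any existing contents, `"a"` adds to the end, and `"r"` (the default) reads. The file name `modes_demo.txt` is hypothetical. Passing `encoding="utf-8"` is an assumption made here; the chapter does not require it.

```python
# Hypothetical file name used only for this demonstration
filename = "modes_demo.txt"

# "w" creates the file, or truncates it if it already exists
with open(filename, "w", encoding="utf-8") as f:
    f.write("first\n")
with open(filename, "w", encoding="utf-8") as f:
    f.write("second\n")           # "first" has been discarded

# "a" keeps the existing contents and writes at the end
with open(filename, "a", encoding="utf-8") as f:
    f.write("third\n")

# "r" (the default mode) opens the file for reading
with open(filename, "r", encoding="utf-8") as f:
    contents = f.read()

print(contents, end="")   # prints "second" and "third" on separate lines
```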

# How to open a file in write mode and close the file manually

In [None]:
outfile = open("test.txt", "w") 
outfile.write("Test") 
outfile.close()
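
If `write()` raised an exception in the cell above, `close()` would never run. This sketch shows one way to guarantee that the file is closed: a `try`/`finally` statement. The `with` statement in the next cell achieves the same result with less code.

```python
outfile = open("test.txt", "w")
try:
    outfile.write("Test")
finally:
    # Runs whether or not write() raised, so the file is always closed
    outfile.close()

print(outfile.closed)   # True
```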

# Code that opens a text file in write mode and automatically closes it

In [None]:
with open("test.txt", "w") as outfile: 
    outfile.write("New info")
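
To confirm that the `with` statement really closes the file, this sketch checks the file object's `closed` attribute inside the block and again after it.

```python
with open("test.txt", "w") as outfile:
    outfile.write("New info")
    print(outfile.closed)   # False: still inside the with block

print(outfile.closed)       # True: closed automatically when the block ends
```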

# Code that opens a text file in read mode and automatically closes it 

In [None]:
with open("test.txt", "r") as infile: 
    print(infile.readline())
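
`readline()` returns an empty string only when it reaches the end of the file. A blank line in the middle of the file comes back as `"\n"`, not `""`. That means an empty result can control a `while` loop. This is a minimal sketch that uses a hypothetical file named `readline_demo.txt`:

```python
# Write two lines to a hypothetical demo file
with open("readline_demo.txt", "w") as outfile:
    outfile.write("line 1\nline 2\n")

lines_read = []
with open("readline_demo.txt", "r") as infile:
    line = infile.readline()
    while line != "":          # "" means end of file
        lines_read.append(line)
        line = infile.readline()

print(lines_read)   # ['line 1\n', 'line 2\n']
```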

# How to write one line to a text file

In [None]:
with open("members.txt", "w") as file:
    file.write("John Cleese\n")

# How to append one line to a text file

In [None]:
with open("members.txt", "a") as file:
    file.write("Eric Idle\n")

# How to use a for statement to read each line of the file

In [None]:
with open("members.txt") as file:
    for line in file:
        print(line, end="")   # each line already ends with "\n"
    print()

# Three read methods of a file object

![image.png](attachment:image.png)
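
This sketch runs `read()`, `readline()`, and `readlines()` side by side on the same two-line file. The file name `read_demo.txt` is hypothetical. `repr()` is used so that the newline characters are visible.

```python
# Hypothetical sample file for this sketch
with open("read_demo.txt", "w") as file:
    file.write("John Cleese\nEric Idle\n")

with open("read_demo.txt") as file:
    whole = file.read()            # one string containing every line

with open("read_demo.txt") as file:
    first = file.readline()        # only the next line, newline included

with open("read_demo.txt") as file:
    all_lines = file.readlines()   # a list with one string per line

print(repr(whole))      # 'John Cleese\nEric Idle\n'
print(repr(first))      # 'John Cleese\n'
print(all_lines)        # ['John Cleese\n', 'Eric Idle\n']
```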

# How to read the entire file as a string

In [None]:
with open("members.txt") as file: 
    contents = file.read() 
    print(contents)

# How to read the entire file as a list

In [None]:
with open("members.txt") as file:
    members = file.readlines()   # a list with one string per line
    print(members[0], end="")
    print(members[1])

# How to read each line of the file

In [None]:
with open("members.txt") as file:
    member1 = file.readline()   # reads the first line
    print(member1, end="")
    member2 = file.readline()   # reads the next line
    print(member2)

# How to write the items in a list to a file

In [None]:
members = ["John Cleese", "Eric Idle"] 
with open("members.txt", "w") as file: 
    for m in members: 
        file.write(m + "\n")
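
File objects also have a `writelines()` method, which writes every string in an iterable. Unlike `print()`, it does not add line endings. This sketch therefore appends `"\n"` to each member before writing. It rewrites `members.txt` with the same two names.

```python
members = ["John Cleese", "Eric Idle"]
with open("members.txt", "w") as file:
    # writelines() adds no newlines, so append one to each item first
    file.writelines(m + "\n" for m in members)

with open("members.txt") as file:
    contents = file.read()

print(contents, end="")
```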

# How to read the lines in a file into a list

In [None]:
members = []
with open("members.txt") as file:
    for line in file:
        line = line.replace("\n", "")   # remove the trailing newline
        members.append(line)
print(members)
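
Here is an alternative sketch for reading the lines into a list. `read()` returns the whole file as one string. The string method `splitlines()` then splits that string at the line endings and drops the `"\n"` characters. The sketch writes `members.txt` first so that it can run on its own.

```python
members = ["John Cleese", "Eric Idle"]
with open("members.txt", "w") as file:
    for m in members:
        file.write(m + "\n")

# splitlines() splits at line breaks and removes the "\n" characters
with open("members.txt") as file:
    members_read = file.read().splitlines()

print(members_read)   # ['John Cleese', 'Eric Idle']
```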

# How to write a list of numbers to a file

In [None]:
years = [1975, 1979, 1983] 
with open("years.txt", "w") as years_file: 
    for year in years: 
        years_file.write(str(year) + "\n")

In [None]:
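# Common mistake: write() accepts only str, so writing an int raises TypeError
years = [1975, 1979, 1983]
try:
    with open("years_error.txt", "w") as years_file:   # scratch file keeps years.txt intact
        for year in years:
            years_file.write(year)   # fix: years_file.write(str(year) + "\n")
except TypeError as e:
    print("TypeError:", e)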
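
Another way to avoid the `TypeError` is to let `print()` handle the conversion, as this sketch does. The `file` argument sends the output to an open file instead of the screen. `print()` also converts the number to a string and adds a newline by default. The sketch rewrites `years.txt` with the same three years.

```python
years = [1975, 1979, 1983]
with open("years.txt", "w") as years_file:
    for year in years:
        # print() converts its argument to str and ends it with "\n"
        print(year, file=years_file)

with open("years.txt") as years_file:
    contents = years_file.read()

print(contents, end="")
```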

# How to read a list of numbers from a file

In [None]:
years = []
with open("years.txt") as file:
    for line in file:
        line = line.replace("\n", "")
        years.append(int(line))   # convert each string back to an int
print(years)
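
`int()` ignores leading and trailing whitespace, so the `replace()` step can be skipped. A list comprehension can then build the list in one line. This minimal sketch writes `years.txt` first so that it can run on its own:

```python
years = [1975, 1979, 1983]
with open("years.txt", "w") as years_file:
    for year in years:
        years_file.write(str(year) + "\n")

# int("1975\n") works because int() strips surrounding whitespace
with open("years.txt") as file:
    years_read = [int(line) for line in file]

print(years_read)   # [1975, 1979, 1983]
```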

# How to use CSV files

# A 2-dimensional list with 3 rows and 2 columns

In [None]:
movies = [["Monty Python and the Holy Grail", 1975],
          ["Cat on a Hot Tin Roof", 1958],
          ["On the Waterfront", 1954]]

# How to import the csv module

In [None]:
import csv

# How to write the list to a CSV file

In [None]:
with open("movies.csv", "w", newline="") as file: 
    writer = csv.writer(file) 
    writer.writerows(movies)

# How to read the CSV file

In [None]:
with open("movies.csv", newline="") as file:
    reader = csv.reader(file)
    for row in reader:
        # csv.reader returns every field as a str, including the year
        print(row[0] + " (" + row[1] + ")")
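
`csv.reader` does not remember that the year was an `int`, so every field comes back as a `str`. This sketch writes and reads `movies.csv` again, then converts the year column back to `int`.

```python
import csv

movies = [["Monty Python and the Holy Grail", 1975],
          ["Cat on a Hot Tin Roof", 1958],
          ["On the Waterfront", 1954]]

with open("movies.csv", "w", newline="") as file:
    csv.writer(file).writerows(movies)

with open("movies.csv", newline="") as file:
    rows = list(csv.reader(file))

print(rows[0])                 # ['Monty Python and the Holy Grail', '1975']

# Convert the year column back to int before doing arithmetic with it
movies_read = [[title, int(year)] for title, year in rows]
print(movies_read == movies)   # True
```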

# Some optional arguments that can be used to change the CSV format

![image.png](attachment:image.png)
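
This sketch uses two standard formatting arguments that both `csv.writer()` and `csv.reader()` accept. `delimiter` sets the field separator, and `quoting` controls when fields are quoted. The tab-separated file name `movies.tsv` is hypothetical. With `csv.QUOTE_NONNUMERIC`, the writer quotes every non-numeric field, and the reader converts every unquoted field to `float`.

```python
import csv

movies = [["Monty Python and the Holy Grail", 1975],
          ["Cat on a Hot Tin Roof", 1958],
          ["On the Waterfront", 1954]]

# Write tab-separated values, quoting only the non-numeric fields
with open("movies.tsv", "w", newline="") as file:
    writer = csv.writer(file, delimiter="\t", quoting=csv.QUOTE_NONNUMERIC)
    writer.writerows(movies)

with open("movies.tsv", newline="") as file:
    raw = file.read()

# Read it back with the same settings; unquoted fields become float
with open("movies.tsv", newline="") as file:
    rows = list(csv.reader(file, delimiter="\t", quoting=csv.QUOTE_NONNUMERIC))

print(repr(raw.splitlines()[0]))   # '"Monty Python and the Holy Grail"\t1975'
print(rows[0])                     # ['Monty Python and the Holy Grail', 1975.0]
```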

# How to work with a binary file

# Using the pickle module

Pickling is Python's way of serializing and de-serializing object structures. Serialization, also
called marshalling, is the process of converting an object in memory to a byte stream that can be
stored on disk or sent over a network. Later, that byte stream can be read back and de-serialized
into a Python object.

If you want to share data with programs written in other languages, pickle is not recommended.
Its protocol is specific to Python, so cross-language compatibility is not guaranteed. The
same holds for different versions of Python itself: unpickling a file that was pickled in a
different version of Python may not always work properly.

https://www.datacamp.com/community/tutorials/pickle-python-tutorial
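
The serialize/de-serialize round trip described above also works without a file. In this sketch, `pickle.dumps()` converts the list to a `bytes` object, and `pickle.loads()` rebuilds an equal but separate list from those bytes.

```python
import pickle

movies = [["Monty Python and the Holy Grail", 1975],
          ["Cat on a Hot Tin Roof", 1958],
          ["On the Waterfront", 1954]]

# Serialize: convert the list into a bytes object (a byte stream in memory)
data = pickle.dumps(movies)
print(type(data))          # <class 'bytes'>

# De-serialize: rebuild an equal Python object from those bytes
restored = pickle.loads(data)
print(restored == movies)  # True
print(restored is movies)  # False: it is a new object
```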

# A 2-dimensional list with 3 rows and 2 columns

In [None]:
movies = [["Monty Python and the Holy Grail", 1975], 
          ["Cat on a Hot Tin Roof", 1958], 
          ["On the Waterfront", 1954]]

# How to import the pickle module

In [None]:
import pickle

# How to write an object to a binary file

In [None]:
with open("movies.bin", "wb") as file: # use wb mode for write binary 
    pickle.dump(movies, file)

# How to read an object from a binary file

In [None]:
with open("movies.bin", "rb") as file: # use rb mode for read binary 
    movie_list = pickle.load(file) 
    print(movie_list)
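
A binary file can hold more than one pickled object. In this sketch, each `dump()` call appends one object to the file. `load()` is then called repeatedly until it raises `EOFError`, which signals the end of the file. The file name `objects.bin` and the `members` list are hypothetical additions for this example.

```python
import pickle

movies = [["Monty Python and the Holy Grail", 1975],
          ["Cat on a Hot Tin Roof", 1958],
          ["On the Waterfront", 1954]]
members = ["John Cleese", "Eric Idle"]   # hypothetical second object

# Each dump() call writes one more pickled object to the file
with open("objects.bin", "wb") as file:
    pickle.dump(movies, file)
    pickle.dump(members, file)

# load() reads one object per call; EOFError means none are left
loaded = []
with open("objects.bin", "rb") as file:
    while True:
        try:
            loaded.append(pickle.load(file))
        except EOFError:
            break

print(loaded)
```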

In [26]:
# Import the pandas library, which can read CSV data published on a website
#
import pandas as pd

In [27]:
# Define a string that holds the URL of the CSV file so it can be read into a DataFrame
#
poll_url = 'http://projects.fivethirtyeight.com/general-model/president_general_polls_2016.csv'
#

In [28]:
# Create a DataFrame that holds the data read from the CSV file at that URL
#
polls = pd.read_csv(poll_url)

In [29]:
# The data can take a few seconds to appear; expect the same for several later examples
polls

| | cycle | branch | type | matchup | forecastdate | state | startdate | enddate | pollster | grade | ... | adjpoll_clinton | adjpoll_trump | adjpoll_johnson | adjpoll_mcmullin | multiversions | url | poll_id | question_id | createddate | timestamp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2016 | President | polls-plus | Clinton vs. Trump vs. Johnson | 11/8/16 | U.S. | 11/3/2016 | 11/6/2016 | ABC News/Washington Post | A+ | ... | 45.20163 | 41.72430 | 4.626221 | | | https://www.washingtonpost.com/news/the-fix/wp... | 48630 | 76192 | 11/7/16 | 09:35:33 8 Nov 2016 |
| 1 | 2016 | President | polls-plus | Clinton vs. Trump vs. Johnson | 11/8/16 | U.S. | 11/1/2016 | 11/7/2016 | Google Consumer Surveys | B | ... | 43.34557 | 41.21439 | 5.175792 | | | https://datastudio.google.com/u/0/#/org//repor... | 48847 | 76443 | 11/7/16 | 09:35:33 8 Nov 2016 |
| 2 | 2016 | President | polls-plus | Clinton vs. Trump vs. Johnson | 11/8/16 | U.S. | 11/2/2016 | 11/6/2016 | Ipsos | A- | ... | 42.02638 | 38.81620 | 6.844734 | | | http://projects.fivethirtyeight.com/polls/2016... | 48922 | 76636 | 11/8/16 | 09:35:33 8 Nov 2016 |
| 3 | 2016 | President | polls-plus | Clinton vs. Trump vs. Johnson | 11/8/16 | U.S. | 11/4/2016 | 11/7/2016 | YouGov | B | ... | 45.65676 | 40.92004 | 6.069454 | | | https://d25d2506sfb94s.cloudfront.net/cumulus_... | 48687 | 76262 | 11/7/16 | 09:35:33 8 Nov 2016 |
| 4 | 2016 | President | polls-plus | Clinton vs. Trump vs. Johnson | 11/8/16 | U.S. | 11/3/2016 | 11/6/2016 | Gravis Marketing | B- | ... | 46.84089 | 42.33184 | 3.726098 | | | http://www.gravispolls.com/2016/11/final-natio... | 48848 | 76444 | 11/7/16 | 09:35:33 8 Nov 2016 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12619 | 2016 | President | polls-only | Clinton vs. Trump vs. Johnson | 11/8/16 | New Hampshire | 7/9/2016 | 7/18/2016 | University of New Hampshire | B+ | ... | 40.24983 | 43.04717 | 6.924110 | | | https://cola.unh.edu/sites/cola.unh.edu/files/... | 44650 | 68189 | 7/21/16 | 09:14:14 8 Nov 2016 |
| 12620 | 2016 | President | polls-only | Clinton vs. Trump vs. Johnson | 11/8/16 | Wisconsin | 10/21/2016 | 11/2/2016 | Ipsos | A- | ... | 46.54218 | 38.96884 | | | | http://www.reuters.com/statesofthenation/ | 48259 | 75560 | 11/3/16 | 09:14:14 8 Nov 2016 |
| 12621 | 2016 | President | polls-only | Clinton vs. Trump vs. Johnson | 11/8/16 | New York | 8/7/2016 | 8/10/2016 | Siena College | A | ... | 53.83622 | 32.47939 | 3.881193 | | | https://www.siena.edu/assets/files/news/SNY081... | 44852 | 68743 | 8/15/16 | 09:14:14 8 Nov 2016 |
| 12622 | 2016 | President | polls-only | Clinton vs. Trump vs. Johnson | 11/8/16 | Virginia | 9/30/2016 | 10/6/2016 | Ipsos | A- | ... | 49.57558 | 39.96954 | | | | http://www.reuters.com/statesofthenation/ | 46675 | 72969 | 10/10/16 | 09:14:14 8 Nov 2016 |
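
The poll data has thousands of rows, so a smaller example is easier to inspect. This sketch passes a short CSV string to `read_csv()` through `io.StringIO`. `read_csv()` also accepts other file-like objects, file paths, and URLs. The sketch then summarizes the result with `shape` and `head()`. The `title,year` header row is an assumption made for this example. The same `shape` and `head()` calls work on `polls`.

```python
import io
import pandas as pd

# A small in-memory CSV stands in for a file or URL
csv_text = """title,year
Monty Python and the Holy Grail,1975
Cat on a Hot Tin Roof,1958
On the Waterfront,1954
"""
movies_df = pd.read_csv(io.StringIO(csv_text))

print(movies_df.shape)           # (3, 2): rows, columns
print(movies_df.head(2))         # only the first two rows
print(movies_df["year"].min())   # 1954
```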


# Good places to look for external datasets

USDA Forest Service datasets: https://data.fs.usda.gov/geodata/edw/datasets.php

CDC data on Socrata: https://dev.socrata.com/foundry/data.cdc.gov/unsk-b7fc

FiveThirtyEight data: https://data.fivethirtyeight.com/

Google Dataset Search: https://toolbox.google.com/datasetsearch

Kaggle: https://www.kaggle.com/datasets

Registry of Open Data on AWS: https://registry.opendata.aws