# File Input/Output (File IO) in Python

This notebook is an overview of how to read data from CSV, JSON, and .txt files, and how to write data collected from a program to the same kinds of files. The demo files are provided in this repository under the name "practice_open", one for each file type covered in the notebook. If you clone this repository, keep the files in the same folder as this notebook when running it.

The demo JSON file was downloaded from Citrination Dataset #156839 (https://citrination.com/datasets/156839/show_search?searchMatchOption=fuzzyMatch)

Last updated on 7/4/2018 by Vanessa Nilsen (vmeschke@wisc.edu)

## Objects Created by Reading in Files

For each of the examples below, the demo files are opened with a function or method that creates an object of some kind. If you're unfamiliar with object-oriented programming (OOP), you can watch a quick overview on objects & classes here: https://www.youtube.com/watch?v=K8eOkzQ_o9w

The type of object returned depends on the specific method used to extract the data from the file. For example, when we read in data from our txt file below, the open() function will create a Python file object (specifically, a TextIOWrapper). However, when we read in the CSV using pandas further down, the object we get back will be a pandas DataFrame.

If the idea of objects is still a bit fuzzy, don't worry. The main goal of this notebook is not to explain OOP in its entirety, but just to help explain what the code below is doing. The most important piece of information to note is that specifying a file path is not the same thing as creating a file object. File paths are just strings indicating where a particular file is stored and don't have the same type of operations available to them as file objects. A more concrete example of the difference between file paths (strings) and file objects will be shown in the .txt read in example.

## Reading in Data

External files can be read into a program using a variety of different commands, depending on the type of file being opened and the type of operations to be performed on the data later in the program. This notebook will run through opening .txt files, CSV files, and JSON files. For each file, we'll open it in a way that creates an object we can use to get at its data. You technically don't need to import any modules to perform operations on file objects in Python, but we'll be using the pandas module with CSVs, which can be installed by running the command 'pip install pandas' if you don't already have it installed.

#### .txt Files

The first file type we'll practice reading in is a simple .txt file. The file below has a few lines of text in it that we can manipulate or use depending on the program you're running. Before we do anything with the data inside the file, though, we need to read it in. In the code block below, txt_file_path is a string specifying the path to our .txt file, and txt_file is a TextIOWrapper object. The key difference between these two variables is that we can read information from the TextIOWrapper object, but not from the string.

In [10]:
# String specifying the path to the txt file we want to read in
txt_file_path = "practice_open.txt"
# TextIOWrapper object that can be used to read in data from the file
txt_file = open(txt_file_path)

# Check the type of each of the variables above
print("txt_file_path is a %s"%type(txt_file_path))
print("txt_file is a %s\n"%type(txt_file))

# Print the first line of the file
print(txt_file.readline())

# Close the file once we're done reading from it
txt_file.close()

txt_file_path is a <class 'str'>
txt_file is a <class '_io.TextIOWrapper'>

Hello, world!



If we'd like to read more than one line of the file, we can also load the whole file at once with the readlines() method, shown below. Opening the file in a with block closes it automatically when the block ends. The content variable will hold every line in the .txt file as a list of strings, and we'll print the lines out one by one with a for loop.

In [11]:
# String specifying the path to the txt file we want to read in
txt_file_path = "practice_open.txt"
# TextIOWrapper object that can be used to read in data from the file
with open(txt_file_path) as new_txt_file:
    content = new_txt_file.readlines()

# Print out every line in the file
for line in content:
    print(line)

Hello, world!



This is our example of reading in a txt file.

We can read in a file line by line.

Or we can read the whole thing in one go.

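The blank lines between the printed lines above appear because each string returned by readlines() still ends with its newline character, and print() adds a second one. One way to avoid that is to loop over the file object directly and strip the trailing newline from each line. The following is a minimal sketch; the file demo_lines.txt and its contents are made up here so that the cell runs on its own, and they are not part of the repository.

```python
import os
import tempfile

# Hypothetical demo file, created here only so the example is self-contained
demo_path = os.path.join(tempfile.mkdtemp(), "demo_lines.txt")
with open(demo_path, "w") as f:
    f.write("Hello, world!\nSecond line\nThird line\n")

# A file object can be looped over directly, one line at a time
stripped_lines = []
with open(demo_path) as f:
    for line in f:
        # rstrip("\n") drops the newline at the end of each line
        stripped_lines.append(line.rstrip("\n"))

# Each line now prints without an extra blank line after it
for line in stripped_lines:
    print(line)
```

Looping over the file object also avoids holding the whole file in memory at once, which may matter for large files.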

#### CSV Files

Next, we'll work with reading in CSV files. CSV files can be read in using the same readlines() method we used for the txt file above, as shown in the cell below.

In [4]:
# Specify the path to the CSV with a string
csv_file_path = "practice_open.csv"

# Open the CSV file and read it in. The second argument to open() is the mode in which to open the file named by
# the first argument. Here, we use 'r' to indicate we'd like to read the file. Other modes for opening files include
# 'rb' (read binary), 'w' (write), and 'wb' (write binary)
with open(csv_file_path, 'r') as csv_file:
    csv_content = csv_file.readlines()

# Print each line of the csv
for line in csv_content:
    print(line)

Name,Phone Number,Birthday

Jack,555-9080,June 11

Dash,555-7761,February 28

Violet,555-8126,August 12

Robert,555-3377,November 19

Helen,555-3846,September 14



As you can see above, this read in the entire CSV file line by line, which was then printed to the console. However, that doesn't help us store the data very well. Say, for example, we'd like to store each column of the csv to its own variable. One way to do that is with the csv module, which we'll need to import as the first line in this block.

In [14]:
import csv

# Specify the path to the CSV with a string
csv_file_path = "practice_open.csv"

# Open the CSV file in read mode
with open(csv_file_path, 'r') as csv_file:
    # Create a reader for the csv
    reader = csv.reader(csv_file)
    # Read in the names column from the CSV. zip(*reader) transposes the rows into columns,
    # so element 0 of the resulting list is the first column (the names, including the header).
    names = list(zip(*reader))[0]
    # Print each name
    for n in names:
        print(n)

Name
Jack
Dash
Violet
Robert
Helen

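If several columns are needed at once, the csv module also provides csv.DictReader, which uses the header row as dictionary keys. The following is a minimal sketch, assuming a small file with the same three headers as practice_open.csv; the file demo_contacts.csv and the two rows written here are stand-ins so that the cell runs on its own.

```python
import csv
import os
import tempfile

# Stand-in CSV, created here only so the example is self-contained
demo_csv_path = os.path.join(tempfile.mkdtemp(), "demo_contacts.csv")
with open(demo_csv_path, "w", newline="") as f:
    f.write("Name,Phone Number,Birthday\n")
    f.write("Jack,555-9080,June 11\n")
    f.write("Dash,555-7761,February 28\n")

names = []
birthdays = []
# The csv module documentation recommends newline="" when opening CSV files
with open(demo_csv_path, "r", newline="") as f:
    # DictReader treats the first row as column names, so it is not returned as data
    for row in csv.DictReader(f):
        names.append(row["Name"])
        birthdays.append(row["Birthday"])

print(names)
print(birthdays)
```

Unlike the zip approach, the header row is consumed as keys here, so the lists contain only the data rows.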

The csv module works, but it takes quite a few lines to read in each column of the CSV. Another option is pandas, a third-party data analysis library that can read a whole CSV into a DataFrame in one call. Reading in the name column of the CSV using pandas is shown below.

In [19]:
import pandas as pd

csv_file_path = "practice_open.csv"

# Read in the whole csv as a pandas dataframe
csv_df = pd.read_csv(csv_file_path)

# Access the names column of the dataframe
names = csv_df['Name']

# Print each name
for n in names:
    print(n)

Jack
Dash
Violet
Robert
Helen


#### JSON Files

In [5]:
import json

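# Open the JSON file in read mode and parse its contents into Python objects
with open("practice_open.json", "r") as json_demo:
    # Keep the parsed data in a variable so it can be used after the file is closed
    json_data = json.load(json_demo)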

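json.load() returns ordinary Python objects: JSON objects become dictionaries, arrays become lists, true and false become True and False, and null becomes None. The following is a minimal sketch using a small made-up record; demo_record.json and its keys are hypothetical and are not taken from the Citrination file.

```python
import json
import os
import tempfile

# Hypothetical JSON file, created here only so the example is self-contained
demo_json_path = os.path.join(tempfile.mkdtemp(), "demo_record.json")
with open(demo_json_path, "w") as f:
    f.write('{"name": "sample", "values": [1, 2.5, null], "valid": true}')

# Parse the file into Python objects
with open(demo_json_path, "r") as f:
    record = json.load(f)

# The top-level JSON object is now a dict, and its fields are accessed by key
print(type(record))
print(record["values"])
print(record["valid"])
```

Once loaded, the data can be explored with normal dictionary and list operations, such as record.keys() or indexing into record["values"].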
## Writing Data

#### .txt Files
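A minimal sketch of writing text to a file, assuming we want to save a short list of strings; the file name demo_write.txt and the list contents are made up for illustration. Opening a file with mode 'w' creates it, or overwrites it if it already exists, while mode 'a' appends to the end of an existing file.

```python
import os
import tempfile

# Hypothetical output path in a temporary folder, used only for this example
out_txt_path = os.path.join(tempfile.mkdtemp(), "demo_write.txt")
lines_to_write = ["Hello, world!", "This line was written by a program."]

# "w" opens the file for writing, replacing any existing contents
with open(out_txt_path, "w") as out_file:
    for line in lines_to_write:
        # write() does not add a newline, so we add one ourselves
        out_file.write(line + "\n")

# "a" opens the file in append mode, keeping what is already there
with open(out_txt_path, "a") as out_file:
    out_file.write("This line was appended.\n")

# Read the file back to check what was written
with open(out_txt_path) as check_file:
    written_text = check_file.read()
print(written_text)
```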

#### CSV Files
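A minimal sketch of writing rows to a CSV with the csv module's writer, assuming the data is held as a list of lists; the file name demo_write.csv is made up, and the rows reuse values from the demo CSV shown earlier in the notebook.

```python
import csv
import os
import tempfile

# Hypothetical output path in a temporary folder, used only for this example
out_csv_path = os.path.join(tempfile.mkdtemp(), "demo_write.csv")
header = ["Name", "Phone Number", "Birthday"]
rows = [
    ["Jack", "555-9080", "June 11"],
    ["Dash", "555-7761", "February 28"],
]

# newline="" lets the csv module control line endings itself
with open(out_csv_path, "w", newline="") as out_file:
    writer = csv.writer(out_file)
    writer.writerow(header)   # write a single row
    writer.writerows(rows)    # write several rows at once

# Read the file back to check what was written
with open(out_csv_path, "r", newline="") as check_file:
    rows_read_back = list(csv.reader(check_file))
print(rows_read_back)
```

If the data is already in a pandas DataFrame, its to_csv method (for example csv_df.to_csv(path, index=False)) writes it out in a single call.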

#### JSON Files
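A minimal sketch of writing a Python dictionary to a JSON file with json.dump(); the file name demo_write.json and the dictionary contents are made up for illustration. The optional indent argument only affects how readable the file is, not the data stored in it.

```python
import json
import os
import tempfile

# Hypothetical output path in a temporary folder, used only for this example
out_json_path = os.path.join(tempfile.mkdtemp(), "demo_write.json")
contact = {"name": "Violet", "phone": "555-8126", "tags": ["family"], "active": True}

# json.dump() serializes the dictionary and writes it to the open file
with open(out_json_path, "w") as out_file:
    json.dump(contact, out_file, indent=2)

# Load the file back to confirm the data survived the round trip
with open(out_json_path, "r") as check_file:
    contact_read_back = json.load(check_file)
print(contact_read_back == contact)
```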