### Credits:

<img align="left" src="https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/CC_BY.png"><br />

This notebook is created by Zhuo Chen based on the notebooks created by [Nathan Kelber](http://nkelber.com) under [Creative Commons CC BY License](https://creativecommons.org/licenses/by/4.0/)<br />
For questions/comments/improvements, email zhuo.chen@ithaka.org or nathan.kelber@ithaka.org<br />

Reused and modified for internal use at Università Cattolica del Sacro Cuore di Milano, by Deborah Grbac, email deborah.grbac@unicatt.it and Valentina Schiariti, email valentina.schiariti-collaboratore@unicatt.it, released under CC BY License.

This repository is founded on **Constellate notebooks**. The original Jupyter notebooks repository was designed by the educators at **ITHAKA's Constellate project**. The project was sunset on July 1, 2025. This current repository uses and reuses Constellate notebooks as Open Educational Resources (OER), free for re-use under a Creative Commons CC BY License.
___


# Python Intermediate 2

**Description:** This notebook describes:
* Some of the file types you can use in Python (.txt, .csv, .json)
* How to read files in Python
* How to write and modify files in Python

This is part 2 of 5 in the series *Python Intermediate* that will prepare you to do text analysis using the Python programming language. 

**Note**: Running this notebook locally will give you full control to test, modify, and save your work. We strongly recommend downloading it before you begin.

____


We will download some sample files from ITHAKA Labs in order to perform this notebook's activities.

In [2]:
### Download Sample Files for this Lesson
import urllib.request
from pathlib import Path

# Check if a data folder exists. If not, create it.
data_folder = Path('./data/') # path to a data folder in the current directory
data_folder.mkdir(exist_ok=True)

download_urls = [
    'https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/sample.csv',
    'https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/sample.txt',
    'https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/adaptation.txt'
]

for url in download_urls:
    urllib.request.urlretrieve(url, './data/' + url.rsplit('/', 1)[-1])
    
print('Sample files ready.')

Sample files ready.


## Files in Python
Working with files is an essential part of Python programming. When we execute code in Python, we manipulate data through the use of variables. When the program is closed, however, any data stored in those variables is erased. To save the information stored in variables, we must learn **how to write it to a file**.

At the same time, we may have notebooks for applying specific analyses, but we need to have a way to bring data into the notebook for analysis. Otherwise, we would have to type all the data into the program ourselves! Both reading-in from files and writing-out data to files are important skills for data science and the digital humanities.

This section describes how to work with three kinds of common data files in Python:
* **Plain Text Files (.txt)**
* **Comma-Separated Value files (.csv)**
* **JavaScript Object Notation files (.json)**

Each of these file types is in wide use in data science, digital humanities, and general programming. 

## Three Common Data File Types

### Plain Text Files (.txt)
A plain text file is one of the simplest kinds of computer files. Plain text files can be opened with a text editor like Notepad (Windows) or TextEdit (macOS). The file can contain only **basic textual characters** such as letters, numbers, spaces, and line breaks. Plain text files do not contain styling such as heading sizes, italic, bold, or specialized fonts. (To include styling in a text file, writers may use other file formats such as rich text format (.rtf) or markdown (.md).)

Plain text files (.txt) can be easily viewed and modified by humans by changing the text within. This is an important distinction from binary files such as images (.jpg), archives (.gz), audio (.wav), or video (.mp4). If a binary file is opened with a text editor, the content will be largely unreadable.

### Comma-Separated Value Files (.csv)
A comma-separated value file is also a text file that can easily be modified with a text editor. A CSV file is generally used to **store data that fits in a series or table** (like a list or spreadsheet). A spreadsheet application (like Microsoft Excel or Google Sheets) will allow you to view and edit CSV data in the form of a table.

Each row of a CSV file represents a single row of a table. The values in a CSV are separated by commas (with no space between), but other delimiters can be chosen such as a tab or pipe (|). A tab-separated value file is called a TSV file (.tsv). Using tabs or pipes may be preferable if the data being stored contains commas (since this could make it confusing whether a comma is part of a single entry or a delimiter between entries).
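
As a minimal sketch of the delimiter issue, the block below parses two small strings with Python's built-in `csv` module. The sample rows are hypothetical and are held in memory with `io.StringIO` rather than read from a file. In the comma-separated version the name has to be wrapped in double quotes so its comma is not treated as a delimiter; in the pipe-separated version no quoting is needed.

```python
import csv
import io

# Hypothetical sample data: the first field of the data row contains a comma
comma_text = 'name,city\n"Booker, Rachel",Detroit\n'
pipe_text = 'name|city\nBooker, Rachel|Detroit\n'

# csv.reader splits on commas by default and honors double quotes
comma_rows = list(csv.reader(io.StringIO(comma_text)))

# The delimiter argument switches the separator to a pipe
pipe_rows = list(csv.reader(io.StringIO(pipe_text), delimiter='|'))

print(comma_rows)
print(pipe_rows)
```

Both calls should produce the same two rows, with `Booker, Rachel` kept together as a single entry.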

#### The text contents of a sample CSV file
```
Username,Login email,Identifier,First name,Last name
booker12,rachel@example.com,9012,Rachel,Booker
grey07,,2070,Laura,Grey
johnson81,,4081,Craig,Johnson
jenkins46,mary@example.com,9346,Mary,Jenkins
smith79,jamie@example.com,5079,Jamie,Smith
```
#### The same CSV file represented in Google Sheets:

![CSV table view in Google Sheets](https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/csv_in_sheets.png)

### JavaScript Object Notation (.json)
A JavaScript Object Notation file is also a text file that can be modified with a text editor. A JSON file **stores data in key/value pairs**, very similar to a Python dictionary. One of the key benefits of JSON is its compactness which makes it ideal for exchanging data between web browsers and servers.

While smaller JSON files can be opened in a text editor, larger files can be difficult to read. Viewing and editing JSON is easier in specialized editors, available online at sites like: 

* [JSON Formatter](http://jsonformatter.org)
* [JSON Editor Online](https://jsoneditoronline.org/)

A JSON file has a **nested structure**, where smaller concepts are grouped under larger ones. Like extensible markup language (.xml), a JSON file can be checked to determine that it is well-formed (follows the syntax rules of JSON) and/or valid (conforms to a structure defined in a separate specification, called a schema). 
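
One way to check whether a piece of text follows the JSON syntax rules is simply to try parsing it. The sketch below uses the standard `json` module; the helper name `parses_as_json` and the two sample strings are made up for illustration. Checking a file against a schema is a separate step that needs additional tooling and is not shown here.

```python
import json

def parses_as_json(text):
    """Return True if text can be parsed as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

good = '{"firstName": "Julia", "age": 57}'
bad = "{'firstName': 'Julia', 'age': 57,}"  # single quotes and a trailing comma

print(parses_as_json(good))
print(parses_as_json(bad))
```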

#### The text contents of a sample JSON file

```
{
    "firstName": "Julia",
    "lastName": "Smith",
    "gender": "woman",
    "age": 57,
    "address": {
        "streetAddress": "11434",
        "city": "Detroit",
        "state": "Mi",
        "postalCode": "48202"
    },
    "phoneNumbers": [
        { "type": "home", "number": "7383627627" }
    ]
}
```

We see how keys such as *address* and *phoneNumbers* hold nested values: *address* contains an object with several key/value pairs of its own, and *phoneNumbers* contains a list of objects.
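
To make the nesting concrete, here is a small sketch that parses a shortened copy of the sample above with `json.loads` and then reaches into the nested parts. The variable names (`sample_text`, `person`, and so on) are chosen only for this illustration.

```python
import json

# A shortened copy of the sample JSON text shown above
sample_text = '''
{
    "firstName": "Julia",
    "address": {"city": "Detroit", "state": "Mi"},
    "phoneNumbers": [{"type": "home", "number": "7383627627"}]
}
'''

person = json.loads(sample_text)

# A nested object is reached by chaining keys
city = person["address"]["city"]

# A list is reached by index, then by key
first_number = person["phoneNumbers"][0]["number"]

print(city, first_number)
```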

#### The same JSON file represented in JSON Editor Online
![An image of the JSON file showing the structure](https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/json_editor.png)


## Opening, Reading, and Writing Text Files (.txt) 

Before we can read or write to a text file, we must open the file. Normally, when we open a file, a new window appears where we can see the file contents. In Python, opening a file means creating a *file object* that references the particular file. When the file has been opened, we can read or write to the file using the file object. Finally, we must close the file. Here are the three steps:

1. Open the file using `with open()` in read `r`, write `w`, read+write `r+` or append `a` mode
2. Use the `.read()` or `.write()` method on the file object
3. Let the code block close which automatically closes the file
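
The three steps can be sketched as follows. To avoid touching any real data, the example writes to a throwaway file in a temporary folder; the file name `demo.txt` is hypothetical. The second half shows roughly what `with` saves us from doing by hand.

```python
import tempfile
from pathlib import Path

# A throwaway file in a temporary folder so that no real data is touched
demo_path = Path(tempfile.mkdtemp()) / 'demo.txt'

# Steps 1-3 with a `with` block: open, write, close automatically
with open(demo_path, 'w') as f:
    f.write('first line\n')

closed_after_with = f.closed  # True once the block has ended

# The same steps without `with`: close() must be called by hand
g = open(demo_path, 'r')
text = g.read()
g.close()

print(closed_after_with, repr(text))
```

The `with` form is generally preferred because the file is closed even if an error occurs inside the block.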

Let's practice on `sample.txt`, a sample text file.

In [1]:
# Opening a file in read mode
with open('./data/sample.txt', 'r') as f: #the path and then read mode
    print(f.read())

A text file can have many words in it
These words are written on the second line
Third line
Fourth line
Fifth line
Sixth line
Seventh line
Eighth line
Ninth line
Tenth line. This is the end of the text file!


We have created a file object called `f`. The first argument (`'./data/sample.txt'`) passed into the `open()` function is a string containing the file path. You can see that sample.txt is in the data directory in the file browser to the left. If your file were called reports.txt, you would replace that argument with `'./data/reports.txt'`. The second argument (`'r'`) determines that we are opening the file in "read" mode, where the file can be read but not modified. There are four main modes that can be specified when creating a file object:

|Argument|Mode Name|Description|
|---|---|---|
|'r'|read|Reads file but no writing allowed (protects file from modification)|
|'w'|write|Writes to file, overwriting any information in the file (saves over the current version of the file)|
|'r+'|read+write|Makes the file object available for both reading and writing.|
|'a'|append|Appends (or adds) information to the end of the file (new information is added below old information)|

### Reading Very Large Text Files (Line by Line)

If the file is very large, we may want to read it one line at a time. To read a single line at a time, we can use a for loop instead of the `.read()` method. This is useful when you are working with large files that approach the amount of memory in your computer. That amount depends on your computer, but is probably about 8 GB or larger. (In a plain text file, that would be roughly 5.5 million pages.)

In [4]:
# Opening a file in read mode
# open() takes two arguments: the file path and the mode.
# `f` is the resulting file object.
with open('./data/sample.txt', 'r') as f:
    for line in f:
        print(line, end='') 

A text file can have many words in it
These words are written on the second line
Third line
Fourth line
Fifth line
Sixth line
Seventh line
Eighth line
Ninth line
Tenth line. This is the end of the text file!

In the data folder is a poem called Adaptation by the 2022 American Poet Laureate, Ada Limón. The file is called `adaptation.txt`. Open the file and print it out line by line.

For a more difficult challenge, print the title and author before the poem along with a number for each line.

In [2]:
# Read and print the poem found in /data/adaptation.txt
with open('./data/adaptation.txt', 'r') as f:
    i=1
    for line in f:
        print(f"{i} {line}", end="")
        i=i+1


1 It was, for a time, a loud twittering flight
2 of psychedelic-colored canaries: a cloud
3 of startle and get-out in the ornamental
4 irons of the rib cage. Nights when the moon
5 was wide like the great eye of a universal
6 beast coming close for a kill, it was a cave
7 of bitten bones and snake skins, eggshell dust,
8 and charred scraps of a frozen-over flame.
9 All the things it has been: kitchen knife
10 and the ancient carp’s frown, cavern of rust
11 and worms in the airless tire swing,
12 cactus barb, cut-down tree, dead cat
13 in the plastic crate. Still, how the great middle
14 ticker marched on, and from all its four chambers
15 to all its forgiveness, unlocked the sternum’s
16 door, reversed and reshaped until it was a new
17 bright carnal species, more accustomed to grief,
18 and ecstatic at the sight of you.


In [35]:
# Print the title and author before the poem along with a number for each line
with open('./data/adaptation.txt', 'r') as f:
    i=1
    print("Adaptation")
    print("Ada Limón\n") # the \n adds a blank line after
    for line in f:
        print(f"{i} {line}", end="")
        i=i+1

Adaptation
Ada Limón

1 It was, for a time, a loud twittering flight
2 of psychedelic-colored canaries: a cloud
3 of startle and get-out in the ornamental
4 irons of the rib cage. Nights when the moon
5 was wide like the great eye of a universal
6 beast coming close for a kill, it was a cave
7 of bitten bones and snake skins, eggshell dust,
8 and charred scraps of a frozen-over flame.
9 All the things it has been: kitchen knife
10 and the ancient carp’s frown, cavern of rust
11 and worms in the airless tire swing,
12 cactus barb, cut-down tree, dead cat
13 in the plastic crate. Still, how the great middle
14 ticker marched on, and from all its four chambers
15 to all its forgiveness, unlocked the sternum’s
16 door, reversed and reshaped until it was a new
17 bright carnal species, more accustomed to grief,
18 and ecstatic at the sight of you.
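
A possible variation on the solution above, shown here as a sketch rather than the expected answer: `enumerate()` can supply the line counter, and passing `encoding='utf-8'` to `open()` helps when a file contains characters such as curly apostrophes. Without an explicit encoding, Python falls back to a platform default, which on some systems can turn those characters into garbled sequences. The two-line file below is a hypothetical stand-in written to a temporary folder.

```python
import tempfile
from pathlib import Path

# Hypothetical two-line file containing curly apostrophes, saved as UTF-8
poem_path = Path(tempfile.mkdtemp()) / 'lines.txt'
poem_path.write_text('the carp’s frown\nthe sternum’s door\n', encoding='utf-8')

numbered = []
with open(poem_path, 'r', encoding='utf-8') as f:
    # enumerate() supplies the counter; start=1 makes it begin at 1
    for number, line in enumerate(f, start=1):
        numbered.append(f"{number} {line.rstrip()}")

print('\n'.join(numbered))
```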


###  Chunking file reading with `.read()` 

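We can also read large files in chunks using `.read()`, passing an argument that specifies how much to read in each chunk. For a file opened in text mode, as here, the argument counts characters (it counts bytes only when the file is opened in binary mode).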

In [2]:
# Reading a set number of characters with .read()

with open('./data/sample.txt', 'r') as f:
    while True: 
        file_chunk = f.read(6) # Try changing the argument here
        if len(file_chunk) > 0:
            print(file_chunk)
        else: 
            break # without a break, a "while True" loop never ends

A text
 file 
can ha
ve man
y word
s in i
t
Thes
e word
s are 
writte
n on t
he sec
ond li
ne
Thi
rd lin
e
Four
th lin
e
Fift
h line

Sixth
 line

Sevent
h line

Eight
h line

Ninth
 line

Tenth 
line. 
This i
s the 
end of
 the t
ext fi
le!
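
The same chunked reading can be written more compactly. The sketch below uses the two-argument form of the built-in `iter()`, which keeps calling `f.read(6)` until it returns an empty string. So that it runs without any file on disk, `io.StringIO` stands in for an open text file and holds only the first line of the sample text.

```python
import io

# io.StringIO stands in for an open text file so the sketch runs anywhere
f = io.StringIO('A text file can have many words in it\n')

chunks = []
# iter(callable, sentinel) calls f.read(6) repeatedly until it returns ''
for file_chunk in iter(lambda: f.read(6), ''):
    chunks.append(file_chunk)

print(chunks)
```

Every chunk holds at most six characters, and joining the chunks gives back the original text.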


### Writing to and Creating Files with `.write()`
The `.write()` method can be used with a file opened in write mode ('w'), read+write mode ('r+'), or append mode ('a'). If you try to use the `.write()` method on a file opened in read mode ('r'), the write will fail and you will get a "not writable" error.

In [5]:
# Trying to use `.write()` on a file in read mode
# will create an error

with open('./data/sample.txt', 'r') as f:
    f.write('Add this text to the file') # raises an error because the file was opened in read mode ('r')

UnsupportedOperation: not writable

If you want to use the `.write()` method to write to a file, we demonstrate three common modes here: write ('w'), read+write ('r+') and append ('a') mode. 

**Choose write mode ('w') if:**
* You want to create a new text file and/or write data to it
* You want to overwrite all data in the file

Be careful with write mode, since it will write over any existing file!

**Choose read+write mode ('r+') if:**
* You need to both read and write data to a file
* The file already exists (opening a missing file in 'r+' mode raises an error)
* You want the file pointer at the beginning of the file

Again, be careful with read+write mode ('r+') because writing from the beginning of the file overwrites existing data.

**Choose append mode ('a') if:**
* You simply need to add more data to a file
* You want to protect existing data in the file from being overwritten

The table below summarizes these modes along with two additional ones ('w+' and 'a+'):

|Mode|Effect|
|---|---|
| w | Opens a file for writing only. Overwrites the file if the file exists. If the file does not exist, creates a new file for writing. |
| r+ | Opens a file for both reading and writing. The file pointer will be at the beginning of the file. |
| a | Opens a file for appending. The file pointer is at the end of the file if the file exists. That is, the file is in the append mode. If the file does not exist, it creates a new file for writing. |
| w+ | Opens a file for both writing and reading. Overwrites the existing file if the file exists. If the file does not exist, it creates a new file for reading and writing. |
| a+ | Opens a file for both appending and reading. The file pointer is at the end of the file if the file exists. The file opens in the append mode. If the file does not exist, it creates a new file for reading and writing. |
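
As an illustration of the two combined modes in the table, the sketch below writes and then reads back within the same `with` block. It assumes a throwaway file in a temporary folder (the name `modes.txt` is hypothetical). Note the call to `.seek(0)`: after writing, the file pointer sits at the end of the file, so it has to be moved back to the start before reading.

```python
import tempfile
from pathlib import Path

demo_path = Path(tempfile.mkdtemp()) / 'modes.txt'  # hypothetical throwaway file

# 'w+' creates (or empties) the file and allows reading what was written
with open(demo_path, 'w+') as f:
    f.write('one\n')
    f.seek(0)              # move the pointer back to the start before reading
    after_w_plus = f.read()

# 'a+' keeps existing content; writes go to the end of the file
with open(demo_path, 'a+') as f:
    f.write('two\n')
    f.seek(0)
    after_a_plus = f.read()

print(repr(after_w_plus), repr(after_a_plus))
```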

#### Create a new file using `.write()` method and write mode ('w')
To create a new file, open a new file object in write mode. You may then use the `.write()` method to add some initial data to the file.

In [3]:
# Create a new file and write a string to the file

with open('./data/sample_new.txt', 'w') as f: #this is creating a new file (because the file name doesn't exist yet) and opening it in write mode
    f.write('Put this string into the new file. ')

#### Modifying an existing file using the `.write()` method and read+write mode ('r+') 
You can read and write to the same file by opening it in read+write mode ('r+'). 

In [6]:
# Reading and writing to a file using read+write mode
with open('./data/sample_new.txt', 'r+') as f: #this is overwriting the file (starting from the beginning)
    f.write('Add')

In [7]:
# Writing to a file using write mode
with open('./data/sample_new.txt', "w") as f:
    f.write("Cut")

The difference between "r+" and "w" is that "w" mode erases the file and replaces everything with the string passed to the `.write()` method, while "r+" keeps the original content except for the part that is overwritten, starting from the beginning of the file.
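
To see the difference without opening the file in an editor, the following sketch repeats the two writes on a throwaway copy in a temporary folder (not the file in `./data/`) and reads the contents back after each step.

```python
import tempfile
from pathlib import Path

# Throwaway copy in a temporary folder, not the file in ./data/
demo_path = Path(tempfile.mkdtemp()) / 'sample_new.txt'
demo_path.write_text('Put this string into the new file. ')

# 'r+' writes over the first three characters only
with open(demo_path, 'r+') as f:
    f.write('Add')
after_r_plus = demo_path.read_text()

# 'w' empties the file first, then writes
with open(demo_path, 'w') as f:
    f.write('Cut')
after_w = demo_path.read_text()

print(repr(after_r_plus))
print(repr(after_w))
```

After the 'r+' write, only `Put` has been replaced by `Add` and the rest of the sentence is still there; after the 'w' write, the file contains just `Cut`.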

#### Appending to an existing file using `.write()` method and append mode ('a')

We might want to use the append mode when we want to be sure to preserve existing data in the file while adding something new: 

In [12]:
# Appending to a file using the append mode ('a')
with open('./data/sample_new.txt', 'a') as f:
    f.write("I want another string later too")
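
One detail worth keeping in mind: `.write()` does not add a line break on its own, so appended text continues on the same line unless the string ends with `\n`. A minimal sketch, again using a hypothetical throwaway file in a temporary folder:

```python
import tempfile
from pathlib import Path

demo_path = Path(tempfile.mkdtemp()) / 'log.txt'  # hypothetical throwaway file
demo_path.write_text('first entry\n')

# Append mode leaves 'first entry' untouched and adds to the end
with open(demo_path, 'a') as f:
    f.write('second entry\n')  # the trailing \n ends the line

lines = demo_path.read_text().splitlines()
print(lines)
```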


## Opening, Reading, and Writing CSV Files (.csv)
CSV file data can be easily opened, read, and written using the `pandas` library. (For large CSV files (>500 MB), you may wish to use the `csv` module to read in a single row at a time to reduce the memory footprint.) Pandas is flexible for working with tabular data, and the process for importing and exporting to CSV is simple.
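
For reference, reading one row at a time with the built-in `csv` module might look like the sketch below. To keep it self-contained, a few rows of the sample CSV are held in an `io.StringIO` object, which stands in for a file opened with `open('./data/sample.csv', newline='')`. The task of collecting usernames with no login email is made up for this example.

```python
import csv
import io

# A few rows of the sample CSV shown earlier, held in memory
csv_text = (
    'Username,Login email,Identifier,First name,Last name\n'
    'booker12,rachel@example.com,9012,Rachel,Booker\n'
    'grey07,,2070,Laura,Grey\n'
)

missing_email = []
# DictReader yields one row at a time, so the whole file is never in memory
for row in csv.DictReader(io.StringIO(csv_text)):
    # Each row is a dict keyed by the header; empty cells are empty strings
    if row['Login email'] == '':
        missing_email.append(row['Username'])

print(missing_email)
```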

In [None]:
!pip install pandas

In [16]:
# Import pandas 
import pandas as pd #import as pd gives it a shorter alias "pd".

# Create our dataframe
df = pd.read_csv('./data/sample.csv')

In [14]:
# Display the dataframe
print(df)

    Username         Login email  Identifier First name Last name
0   booker12  rachel@example.com        9012     Rachel    Booker
1     grey07                 NaN        2070      Laura      Grey
2  johnson81                 NaN        4081      Craig   Johnson
3  jenkins46    mary@example.com        9346       Mary   Jenkins
4    smith79   jamie@example.com        5079      Jamie     Smith


After you've made any necessary changes in Pandas, write the dataframe back to the CSV file. (Remember to always back up your data before writing over the file.)

In [15]:
# Write data to new file
# Keeping the Header but removing the index
df.to_csv('./data/new_sample.csv', header=True, index=False) # header=True means that we want to keep the same header, index=False means that we don't want the index number

## Opening, Reading, and Writing JSON Files (.json)

JSON files use a key/value structure very similar to a Python dictionary. We will start with a Python dictionary called `py_dict` and then write the data to a JSON file using the `json` library.

In [20]:
# Defining sample data in a Python dictionary
py_dict = {
    "firstName": "Julia",
    "lastName": "Smith",
    "gender": "woman",
    "age": 57,
    "address": {
        "streetAddress": "11434",
        "city": "Detroit",
        "state": "Mi",
        "postalCode": "48202"
    },
    "phoneNumber": "3133627627"
}

To write our dictionary to a JSON file, we will use the `with open` technique we learned that automatically closes file objects. We also need the `json` library to help dump our dictionary data into the file object. The `json.dump` function works a little differently than the write method we saw with text files. 

We need to specify two arguments: 

* The data to be dumped
* The file object where we are dumping

In [21]:
# Open/create sample.json in write mode
# as the file object `f`. The data in py_dict
# is dumped into `f` and then `f` is closed

import json
with open('./data/sample.json', 'w') as f:
    json.dump(py_dict, f) # "dumping" the dictionary writes it to the file as JSON
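
By default the JSON is written on a single line, which is compact but hard to read. Both `json.dump` and its string-returning sibling `json.dumps` accept an optional `indent` argument. The sketch below uses `json.dumps` on a shortened version of the dictionary so that the result can be inspected without opening a file.

```python
import json

# Shortened version of the sample dictionary
py_dict = {"firstName": "Julia", "address": {"city": "Detroit"}}

compact = json.dumps(py_dict)           # everything on one line
pretty = json.dumps(py_dict, indent=4)  # one key per line, nested keys indented

print(compact)
print(pretty)
```

Both strings describe the same data; only the whitespace differs.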

To read data in from a JSON file, we can use the `json.load` function on our file object. Here we load all the content and print it. Because the result is a Python dictionary, we can then print values based on particular keys.

In [22]:
# Open the .json file in read mode
# and print the loaded contents

with open('./data/sample.json', 'r') as f:
    print(json.load(f)) #this is a dictionary

{'firstName': 'Julia', 'lastName': 'Smith', 'gender': 'woman', 'age': 57, 'address': {'streetAddress': '11434', 'city': 'Detroit', 'state': 'Mi', 'postalCode': '48202'}, 'phoneNumber': '3133627627'}


In [23]:
# Load the current data from the file into a dictionary
# and add an entry to the dictionary

with open('./data/sample.json', 'r') as f:
    file_contents = json.load(f)
    file_contents['pet'] = 'dog' #this is the same as adding an element to a dictionary

# Write the contents of the dictionary over the existing data 
    
with open('./data/sample.json', 'w') as f:  
    json.dump(file_contents, f) #this dictionary will overwrite the previous values

The `json.load()` function creates a Python dictionary from the .json file object. This means that we can query a particular key/value pair after loading the data.

In [36]:
# Load the data from the json file into a dictionary
# Check to see if there is a value associated with pet

with open('./data/sample.json', 'r') as f:
    json_contents = json.load(f)
    print(json_contents.get('pet'))

dog
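
The `.get()` method is handy here because it does not raise an error when a key is missing. As a small sketch (the key `car` and the fallback text are made up for illustration), a missing key returns `None` unless a default value is supplied as the second argument:

```python
import json

# json.loads parses a string; it stands in for json.load on a file object
json_contents = json.loads('{"firstName": "Julia", "pet": "dog"}')

pet = json_contents.get('pet')                            # key exists
car = json_contents.get('car')                            # key missing, returns None
car_or_default = json_contents.get('car', 'none listed')  # key missing, returns the default

print(pet, car, car_or_default)
```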


___
## Lesson Complete