# Files, Paths, and OS

With Python, we can read, modify, and save files on our local system.

Every file has:
1. A name (e.g., `06-files.ipynb`) that ends with an extension (e.g., `.ipynb`) describing the content
2. A location or path (i.e., the directory, or drive and folders, in which your file resides)
3. (Possibly empty) content stored as 8-bit bytes
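
The name, extension, and location described above can all be pulled out of a path string with `os.path`. The sketch below uses a made-up relative path (`notebooks/python/06-files.ipynb` is an assumption, not this repository's actual layout), so it runs without touching any real file.

```python
import os

# Hypothetical path, used only for illustration
example_path = os.path.join('notebooks', 'python', '06-files.ipynb')

# Split the path into its location and its file name
directory, file_name = os.path.split(example_path)

# Split the file name into the part before the extension and the extension
stem, extension = os.path.splitext(file_name)

print(directory)   # the folders leading to the file
print(file_name)   # the full file name
print(stem)        # the name without the extension
print(extension)   # the extension, including the leading dot
```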

In [None]:
import os # built-in module for operating-system-dependent functionality, such as accessing files

## Getting to and modifying paths

Get the current directory

In [None]:
current_directory = os.getcwd() # Get the current working directory (for a notebook, this is usually the folder the notebook lives in)
print(current_directory)

Add folders or a file to a path

In [None]:
path_to_this_notebook = os.path.join(os.getcwd(), '06-files.ipynb') # add to a path with os.path
print(path_to_this_notebook)

In [None]:
path_to_data_folder = os.path.join(os.getcwd(), '..', '..', 'data') # each '..' says to go up one folder
print(path_to_data_folder) # still has the '..'
print(os.path.realpath(path_to_data_folder)) # view the actual path with os.path.realpath
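
As a small aside to the cell above, `os.path.normpath` also removes the `'..'` pieces, but it only tidies the string, while `os.path.realpath` additionally resolves symbolic links and returns an absolute path. The folder names below (`project`, `notebooks`, `python`, `data`) are made up for illustration.

```python
import os

# Hypothetical relative path containing two '..' steps
raw_path = os.path.join('project', 'notebooks', 'python', '..', '..', 'data')

# normpath collapses the '..' pieces without looking at the file system
tidy_path = os.path.normpath(raw_path)
print(tidy_path)

# normpath keeps a relative path relative; realpath makes it absolute
print(os.path.isabs(tidy_path))
print(os.path.isabs(os.path.realpath(raw_path)))
```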

## Read in a (text) file

Depending on the type of file, there are different ways to read in a file's content. For text or text-based files we will use `open`.

Go to the *data* folder in this repository and open the *lipsum.txt* text file. You should see some general information on lipsum text. Let us bring that text into Python.

In [None]:
path_to_data_folder = os.path.join(os.getcwd(), '..', '..', 'data')
path_to_txt_file = os.path.join(path_to_data_folder, 'lipsum.txt')

print(os.path.realpath(path_to_txt_file))

In [None]:
with open(path_to_txt_file, 'r') as f: # the input 'r' denotes read
	content = f.read()

In [None]:
print(content) # the text from lipsum.txt

In [None]:
with open(path_to_txt_file) as f: # the 'r' above is optional: if omitted, the file is opened in read mode by default
	content = f.read()

print(content)
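
`f.read()` returns the whole file as one string. When the lines are needed one at a time, a file object can also be looped over directly. The sketch below writes a small throwaway file first (its name and content are made up) so that it runs without relying on *lipsum.txt*.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a small sample file (hypothetical content)
    sample_path = os.path.join(tmp_dir, 'sample.txt')
    with open(sample_path, 'w') as f:
        f.write('first line\nsecond line\nthird line\n')

    # Looping over the file object yields one line at a time,
    # each still ending with its line break, which we strip off
    with open(sample_path) as f:
        lines = [line.rstrip('\n') for line in f]

print(lines)
```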

Python reads in text files as string objects, so we can work with them just like any other string. For example, let's make the following changes:
1. Add another sentence to the text
2. Replace all occurrences of *is* with *is not*

In [None]:
new_sentence = 'Here is another sentence we want to add to the text document.'
modified_content = content + '\n\n' + new_sentence # we can combine strings with the + operator. Remember each \n adds a line break to a string
print(modified_content)

In [None]:
modified_content = f'{content}\n\n{new_sentence}' # We can also combine strings by putting each into a single formatted string (this modification is equivalent to the above)
print(modified_content)

In [None]:
modified_content = modified_content.replace('is', 'is not') # strings have a replace method that will change all occurrences of the first input to the second input
print(modified_content)
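
One thing to watch for: `str.replace` matches the characters *is* anywhere, including inside longer words such as *This*. If only the standalone word should change, one option is the built-in `re` module with word boundaries (`\b`). The sentence below is made up for illustration.

```python
import re

text = 'This is a test. It is what it is.'  # hypothetical sample text

# Plain replace also changes the 'is' inside 'This'
naive = text.replace('is', 'is not')

# \b marks a word boundary, so only the standalone word 'is' is matched
whole_word = re.sub(r'\bis\b', 'is not', text)

print(naive)       # This not is not a test. It is not what it is not.
print(whole_word)  # This is not a test. It is not what it is not.
```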

## Saving a (text) file

Similar to reading in a file, there are different ways to save a file depending on the type of file. For text or text-based files we still use `open`.

Let's save the changes we made to the *lipsum.txt* text that we read in

In [None]:
with open(path_to_txt_file, 'w') as f: # the input 'w' denotes write (this overwrites any existing content in the file)
	f.write(modified_content)

Open the *lipsum.txt* text file again to see the new modified text

We can change the specific file path to save our modified text in another document instead

In [None]:
path_to_data_folder = os.path.join(os.getcwd(), '..', '..', 'data')
path_to_new_txt_file = os.path.join(path_to_data_folder, 'lipsum2.txt')

with open(path_to_new_txt_file, 'w') as f: # the input 'w' denotes write
	f.write(modified_content)

A new file *lipsum2.txt* should have been created. If you open it, you can view the new modified text

We can also append new text to a document

In [None]:
with open(path_to_new_txt_file, 'a') as f: # the input 'a' denotes append
	f.write(content)

*lipsum2.txt* should now have text from the `modified_content` string followed by the text from the `content` string
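
The difference between the `'w'` and `'a'` modes can be seen side by side in a small sketch. It works on a throwaway file in a temporary folder (the file name and text are made up), so nothing in the *data* folder is touched.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo.txt')  # hypothetical file

    with open(demo_path, 'w') as f:   # creates the file
        f.write('first draft\n')

    with open(demo_path, 'w') as f:   # 'w' again: the old content is erased
        f.write('second draft\n')

    with open(demo_path, 'a') as f:   # 'a': new text is added at the end
        f.write('appended line\n')

    with open(demo_path) as f:
        final_text = f.read()

print(final_text)
```

Only the second draft and the appended line remain, because the second `'w'` call replaced the first draft.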

## Reading in a CSV

Python does have a built-in module `csv` to read in data from a CSV file. However, I strongly recommend using the `pandas` package instead

Check out the [intro pandas tutorial notebook](../../pandas-tutorial/1-pandas-beginnings.ipynb) for an introduction to some pandas basics
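
For comparison, here is a minimal sketch of what the built-in `csv` module looks like. The two-row table is made up and held in memory with `io.StringIO`, so it does not reflect the real TSA file. Note that every value comes back as a string, which is one reason `pandas` tends to be more convenient for numeric data.

```python
import csv
import io

# Hypothetical CSV text standing in for a file on disk
sample_csv = io.StringIO('Date,Travelers\n1/1/2021,500000\n1/2/2021,1200000\n')

# DictReader uses the first row as column names and yields one dict per row
rows = list(csv.DictReader(sample_csv))

print(rows[0])
print(type(rows[0]['Travelers']))  # values are strings, not numbers
```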

In [None]:
import pandas as pd

In [None]:
path_to_data_folder = os.path.join(os.getcwd(), '..', '..', 'data')
path_to_csv = os.path.join(path_to_data_folder, 'tsa_passenger_throughput.csv')
print(os.path.realpath(path_to_csv))

In [None]:
tsa_data = pd.read_csv(path_to_csv) # read in the data

In [None]:
tsa_data # view a snapshot of the data

We now have the data, stored as a `pandas.DataFrame`, within Python where we can perform any analysis or modification necessary

Let's only keep the data where the 2021 traveler throughput is at least 1 million

In [None]:
mask = tsa_data['2021 Traveler Throughput']>=1000000 # creates a boolean mask where each row element in the '2021 Traveler Throughput' column is checked against the condition >= 1000000

In [None]:
tsa_data_2021 = tsa_data[mask] # keep only the rows where the mask is True

In [None]:
tsa_data_2021
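
To see what the mask does row by row, here is a sketch on a tiny made-up table. The three rows and their values are invented for illustration and only borrow the column name used above.

```python
import pandas as pd

# Hypothetical data, not taken from the real TSA file
toy_data = pd.DataFrame({
    'Date': ['1/1/2021', '1/2/2021', '1/3/2021'],
    '2021 Traveler Throughput': [800000, 1000000, 1300000],
})

# One True/False value per row
toy_mask = toy_data['2021 Traveler Throughput'] >= 1000000
print(toy_mask.tolist())  # [False, True, True]

# Indexing with the mask keeps only the rows marked True
toy_filtered = toy_data[toy_mask]
print(toy_filtered)
```

Because the condition is `>=`, the row with exactly 1,000,000 travelers is kept.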

## Saving a CSV

Let's save the filtered data from above

In [None]:
path_to_data_folder = os.path.join(os.getcwd(), '..', '..', 'data')
path_to_new_csv = os.path.join(path_to_data_folder, 'tsa_passenger_1mil_throughput.csv')
print(os.path.realpath(path_to_new_csv))

In [None]:
tsa_data_2021.to_csv(path_to_new_csv, index=False)

In the *data* folder you should now have a new CSV file *tsa_passenger_1mil_throughput.csv*
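
The `index=False` argument used above tells pandas not to write the row labels as an extra first column. A quick way to see the difference is to call `to_csv` without a path, in which case it returns the CSV text as a string. The small table below is made up for illustration.

```python
import pandas as pd

# Hypothetical filtered data whose row labels (1 and 2) are left over from filtering
toy_data = pd.DataFrame(
    {'Date': ['1/2/2021', '1/3/2021'], 'Travelers': [1000000, 1300000]},
    index=[1, 2],
)

with_index = toy_data.to_csv()                 # row labels become an unnamed first column
without_index = toy_data.to_csv(index=False)   # only the actual columns are written

print(with_index.splitlines()[0])      # ,Date,Travelers
print(without_index.splitlines()[0])   # Date,Travelers
```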

## `pickle`: Saving and reading in other Python objects

Python has a built-in module named `pickle` that allows for saving, and subsequently reading in, almost any object in Python, including functions and classes (these are pickled by reference to their name rather than by their code).

We will again use the `open` function from earlier in a very similar way

In [None]:
import pickle

In [None]:
example_dict = {
  "name": "Alice",
  "age": 20,
  "major": "Computer Science",
  "courses": ["Calculus", "Data Structures", "Algorithms"]
}

In [None]:
print(example_dict)

Save the dictionary

In [None]:
path_to_save = os.path.join(os.getcwd(), 'example_dictionary.pkl') # pickle files conventionally use the .pkl extension
print(path_to_save)

In [None]:
with open(path_to_save, 'wb') as f: # 'wb' is short for write binary. The b is necessary to write to a pickle file
	pickle.dump(example_dict, f)

Read in the dictionary

In [None]:
with open(path_to_save, 'rb') as f: # 'rb' is short for read binary. The b is necessary to read a pickle file
	loaded_dictionary = pickle.load(f)

In [None]:
print(loaded_dictionary)

A downside of pickle files is that they are stored as binary files and are not human readable. If you open the *example_dictionary.pkl* file you will see a jumbled mess of text and characters
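
The same round trip can be sketched in memory with `pickle.dumps` and `pickle.loads`, which work on a `bytes` object instead of a file. This makes it easy to look at the binary data and to confirm that the loaded object is an equal copy rather than the original object itself.

```python
import pickle

example_dict = {
    "name": "Alice",
    "age": 20,
    "major": "Computer Science",
    "courses": ["Calculus", "Data Structures", "Algorithms"],
}

raw_bytes = pickle.dumps(example_dict)   # the same data that pickle.dump would write to a file
restored = pickle.loads(raw_bytes)       # rebuild the object from those bytes

print(type(raw_bytes))             # a bytes object, not readable text
print(restored == example_dict)    # True: same content
print(restored is example_dict)    # False: a new, separate object
```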

# Exercises

1. Write a function named *read_text_file* that takes one input, the path to a text file, reads in the text from this file, and returns the content.

2. Use your function from Exercise 1 to read in the SQL text in the file *example_sql_select.sql*. Print out the SQL statement.

3. Change the `where` clause in the SQL string you read in to check where price is greater than $50.

4. Save the modified SQL text into a file named *modified_example_sql_select.sql*