# Reading from files

You have seen how to read a file with `pandas.read_csv` before, but Python also has a lower-level interface for reading from (and writing to) any file.

Let's start by writing to a file:

In [8]:
f = open('newfile.txt', 'w')   # Open 'newfile.txt' for writing
f.write('Concordia\n')           # '\n' adds a new line
f.write('Bootcamps!')
f.close()                      # Close the file 

Here:

- The built-in function `open()` returns a file object. The mode `'w'` opens it for writing, creating the file if it doesn't exist and erasing its contents if it does.
- Both `write()` and `close()` are methods of file objects.
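Here is a minimal sketch of how the mode argument changes what `open()` does. It uses the `with` form, which closes the file for us, and works in a temporary directory (an assumption made so no real file gets overwritten):

```python
import os
import tempfile

# Work in a throwaway directory so no real file gets overwritten
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'newfile.txt')

with open(path, 'w') as f:   # 'w' creates the file, or empties it if it already exists
    f.write('Concordia\n')

with open(path, 'a') as f:   # 'a' appends to the end instead of emptying the file
    f.write('Bootcamps!')

with open(path, 'r') as f:   # 'r' (the default mode) opens for reading
    contents = f.read()

print(repr(contents))  # 'Concordia\nBootcamps!'
```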

Where is the file that we’ve created?

Like the terminal shell, a running Python process has a current working directory (the directory that the shell command `pwd` prints).

In Jupyter, we can run shell commands such as `pwd` by prefixing them with `!`:

In [9]:
!pwd

/Users/mranger/Documents/Projects/Drive Syncs/Module 1 Programming/1.6 Python Intro pt.2


However, we can also get it from python directly:

In [5]:
import os

# equivalent to ls (not displayed: a cell only shows its last expression)
os.listdir()
# equivalent to pwd
os.getcwd()

'/Users/mranger/Documents/Projects/Drive Syncs/Module 1 Programming/1.6 Python Intro pt.2'

When we give `open()` a bare file name (a relative path), this is the directory Python reads from and writes to.

In Jupyter, it is normally the folder that contains the notebook file.

To write somewhere else, put the directory path in front of the file name:

In [10]:
# f = open('/Users/mranger/Documents/Projects/Drive Syncs/Module 1 Programming/1.6 Python Intro pt.2/newfile.txt', 'w')
# f.close() 
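Rather than typing a long path by hand, we can let `os.path` build it. This sketch shows that a bare file name is resolved against the working directory; `newfile.txt` is only an example name here and doesn't need to exist:

```python
import os

# A bare file name is a relative path, resolved against the working directory
relative = 'newfile.txt'
absolute = os.path.abspath(relative)
print(absolute)

# os.path.join builds the same path piece by piece, using the right separator for the OS
built = os.path.join(os.getcwd(), relative)
print(built == absolute)  # True
```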

We can also use Python to read the contents of `newfile.txt`:

In [11]:
f = open('newfile.txt', 'r')   # 'r' opens the file for reading
out = f.read()                 # read the whole file into one string
f.close()
out

'Concordia\nBootcamps!'

In [12]:
print(out)  # print() displays each '\n' as an actual line break

Concordia
Bootcamps!
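Besides `read()`, file objects have `readline()` and `readlines()`. A sketch comparing the three on the same two-line contents, written to a temporary directory so it runs anywhere:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'newfile.txt')
with open(path, 'w') as f:
    f.write('Concordia\nBootcamps!')

with open(path) as f:
    whole = f.read()          # the entire file as one string
with open(path) as f:
    first = f.readline()      # one line, including its trailing '\n'
    rest = f.readline()       # the next line: the file object remembers its position
with open(path) as f:
    lines = f.readlines()     # a list of all the lines

print(repr(whole))   # 'Concordia\nBootcamps!'
print(repr(first))   # 'Concordia\n'
print(lines)         # ['Concordia\n', 'Bootcamps!']
```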


In [13]:
# A "with" block automatically closes the file when the block exits
with open('cities.csv', 'w') as f:
    f.write(
"""city, population
new york, 8244910
los angeles, 3819702
chicago, 2707120
houston, 2145146
philadelphia, 1536471
phoenix, 1469471
san antonio, 1359758
san diego, 1326179
dallas, 1223229""")

Note the triple-quoted `"""string"""` syntax here, which lets you write a string that spans multiple lines.
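One catch with triple-quoted strings: everything between the quotes is kept, including any indentation at the start of each line, so indenting the lines to match the code would put those spaces into the file. If you'd rather indent the literal, the standard library's `textwrap.dedent` removes the whitespace common to every line. A small sketch:

```python
import textwrap

# Everything between the triple quotes is kept, including newlines and indentation
indented = """city, population
    new york, 8244910"""
print(repr(indented))  # 'city, population\n    new york, 8244910'

# The backslash after the opening quotes skips the first newline; dedent then
# strips the 4 spaces shared by every line
clean = textwrap.dedent("""\
    city, population
    new york, 8244910""")
print(repr(clean))  # 'city, population\nnew york, 8244910'
```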


We already know that we can read these files with pandas:

In [15]:
import pandas as pd

# The extension doesn't matter: read_csv can read any file
# whose contents are laid out like a CSV file
pd.read_csv('cities.csv')

|   | city | population |
|---|------|------------|
| 0 | new york | 8244910 |
| 1 | los angeles | 3819702 |
| 2 | chicago | 2707120 |
| 3 | houston | 2145146 |
| 4 | philadelphia | 1536471 |
| 5 | phoenix | 1469471 |
| 6 | san antonio | 1359758 |
| 7 | san diego | 1326179 |
| 8 | dallas | 1223229 |


# File types

Operating systems and programs use file extensions as a hint about how a file's contents are organized. Here are a few common ones:

- `.txt` files are plain text files. You can open them in any text editor (VS Code, Sublime Text, Notepad, Micro, Vim, etc.).

- Some files are text files but have extensions that hint at how they're organized. For instance, a `.py` file is a text file whose extension signals that it contains Python code.

- Files with extensions like `.exe` (on Windows) or `.bin` are **binary** files: their bytes aren't meant to be read as text, but interpreted directly by a program or, for an `.exe`, run by the operating system. (Try opening a `.exe` file in Sublime Text to see this.)
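From Python's side, a file opened in binary mode (`'wb'` / `'rb'`) gives back raw `bytes` instead of a decoded `str`. A small sketch, using a hypothetical `data.bin` in a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'data.bin')  # hypothetical file name

with open(path, 'wb') as f:    # 'b' = binary mode: we write bytes, not text
    f.write(bytes([0, 255, 10, 67]))

with open(path, 'rb') as f:
    raw = f.read()             # a bytes object, not a str

print(raw)         # b'\x00\xff\nC'
print(list(raw))   # [0, 255, 10, 67]
```

Only some of those byte values happen to correspond to printable characters (10 is a newline, 67 is `C`), which is why binary files look garbled in a text editor.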

### CSV files

A CSV file is a common kind of data file, organized into **records** (one per line) made up of **fields** (separated by commas). The `cities.csv` file we wrote above is one such file.

Many Python objects are “iterable”, in the sense that they can be looped over.

For example, `readlines()` returns a list of the file's lines, which we can loop over to split each line of `cities.csv` into its fields:


In [17]:
data_file = open('cities.csv', 'r')

cities = []

lines = data_file.readlines()

for line in lines:
    fields = line.split(',')
    cities.append(fields)
    
data_file.close()

cities

[['city', ' population\n'],
 ['new york', ' 8244910\n'],
 ['los angeles', ' 3819702\n'],
 ['chicago', ' 2707120\n'],
 ['houston', ' 2145146\n'],
 ['philadelphia', ' 1536471\n'],
 ['phoenix', ' 1469471\n'],
 ['san antonio', ' 1359758\n'],
 ['san diego', ' 1326179\n'],
 ['dallas', ' 1223229']]

Note that the file's **header** (the first line, naming the columns) is read too, each population keeps its leading space, and the `\n` characters at the ends of lines are kept.
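`readlines()` isn't strictly needed: a file object is itself iterable, yielding one line at a time, and `next()` pulls off a single line such as the header. A sketch on a small stand-in for `cities.csv`, written to a temporary directory:

```python
import os
import tempfile

# A small stand-in for cities.csv, so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), 'cities.csv')
with open(path, 'w') as f:
    f.write("city, population\nnew york, 8244910\nchicago, 2707120")

rows = []
with open(path, 'r') as f:
    header = next(f)               # a file object is an iterator: next() returns the first line
    for line in f:                 # ...and the loop continues from the second line
        city, pop = line.rstrip('\n').split(',')
        rows.append((city, int(pop)))   # int() ignores the leading space in ' 8244910'

print(repr(header))  # 'city, population\n'
print(rows)          # [('new york', 8244910), ('chicago', 2707120)]
```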

In [24]:
### Exercise:
### Clean the data so that we read it into a dict: city -> population (int)

data_file = open('cities.csv', 'r')

cities = []
lines = data_file.readlines()

for line in lines[1:]:  # skip the first line: it's the header
    fields = line.replace('\n', '').split(',')  # drop the trailing '\n', then split into fields
    fields[1] = int(fields[1])  # int() ignores the leading space in ' 8244910'
    cities.append(fields)
data_file.close()

# The header was already skipped above, so every entry in cities is a data row
dict(cities)

{'new york': 8244910,
 'los angeles': 3819702,
 'chicago': 2707120,
 'houston': 2145146,
 'philadelphia': 1536471,
 'phoenix': 1469471,
 'san antonio': 1359758,
 'san diego': 1326179,
 'dallas': 1223229}
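The standard library's `csv` module can do most of this cleaning for us. A minimal sketch, where a string stands in for the first few rows of `cities.csv` (with a real file, the `csv` documentation recommends opening it with `newline=''`):

```python
import csv
import io

# Stand-in for the first few rows of cities.csv
csv_text = "city, population\nnew york, 8244910\nlos angeles, 3819702\nchicago, 2707120"

# DictReader uses the header line as keys; skipinitialspace=True drops the
# space after each comma, so the key is 'population' rather than ' population'
reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
populations = {row['city']: int(row['population']) for row in reader}

print(populations)  # {'new york': 8244910, 'los angeles': 3819702, 'chicago': 2707120}
```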

# Exceptions

Most of the time, when a Python script fails, it does so by raising an **exception**.

In [25]:
5 / 0

ZeroDivisionError: division by zero

When the interpreter hits one of these exceptions, information about the cause of the error can be found in the traceback, which can be accessed from within Python.

In [26]:
def f2(x):
    return 5 / x

def f1(x):
    return f2(x)

f1(0)

ZeroDivisionError: division by zero

The traceback follows the **call stack**: read from top to bottom, it lists each function call, from the top-level code (`f1(0)`) down to the line that actually raised the error (`5 / x` inside `f2`).
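The text of a traceback is available from inside Python too, through the standard library's `traceback` module. A minimal sketch reusing the `f1`/`f2` functions from above (redefined so the block runs on its own):

```python
import traceback

def f2(x):
    return 5 / x

def f1(x):
    return f2(x)

try:
    f1(0)
except ZeroDivisionError:
    # format_exc() returns the traceback of the exception being handled, as a string
    tb_text = traceback.format_exc()

print(tb_text)
```

The printed text lists the calls in order (the top-level code, then `f1`, then `f2`) and ends with `ZeroDivisionError: division by zero`.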

We can **catch** exceptions and handle them, instead of letting the program fail, with a `try/except` statement:

In [33]:
try:
    f1(0)
except Exception as e:
    print("-----ERROR: ", end=" ")
    print(str(e))

-----ERROR:  division by zero


We can also catch specific types of errors. Catching every exception is generally considered bad practice, because it also silences errors you didn't anticipate: bugs that should stop the program get hidden instead.

In [34]:
try:
    z = f1(0)
except ZeroDivisionError as ze:
    z = 0

print(z)

0
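Catching a specific exception pairs well with `finally`, a clause that runs whether or not an error occurred; it is roughly what a `with` block does to close a file. A minimal sketch with a hypothetical `first_line` helper:

```python
import os
import tempfile

def first_line(path):
    """Hypothetical helper: return a file's first line, or None if the file doesn't exist."""
    try:
        f = open(path, 'r')
    except FileNotFoundError:   # catch only the error we expect here
        return None
    try:
        return f.readline()
    finally:                    # runs whether readline() succeeds or raises
        f.close()

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'hello.txt')
with open(path, 'w') as f:
    f.write('first\nsecond\n')

print(repr(first_line(path)))                            # 'first\n'
print(first_line(os.path.join(tmp_dir, 'missing.txt')))  # None
```

Any other error (say, a permissions problem) is not caught here, so it still stops the program with a traceback.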


We can also `raise` exceptions ourselves:

In [28]:
raise ValueError

ValueError: 

Exceptions are also useful for handling bad data. For example, this function returns 0 instead of crashing when `x` is zero:

In [35]:
def divide_safe(x):
    try:
        return 5 / x
    except ZeroDivisionError:
        return 0
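`raise` is the other half of handling bad data: a function can refuse input it can't make sense of, with a message explaining why, and leave it to the caller to decide what to do. A minimal sketch with a hypothetical `parse_record` helper for lines of `cities.csv`:

```python
def parse_record(line):
    """Hypothetical helper: turn 'city, population' into (city, population as int)."""
    fields = line.split(',')
    if len(fields) != 2:
        # Refuse malformed input loudly instead of guessing what it meant
        raise ValueError(f"expected 2 fields, got {len(fields)}: {line!r}")
    return fields[0].strip(), int(fields[1])

print(parse_record('chicago, 2707120'))  # ('chicago', 2707120)

try:
    parse_record('chicago 2707120')      # missing comma
except ValueError as e:
    message = str(e)

print(message)  # expected 2 fields, got 1: 'chicago 2707120'
```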

# Writing Python in a .py program

We can write a program directly in a `.py` file and run it using `python my_program.py` in the terminal.

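Inside such a file, an `if __name__ == '__main__':` block defines the program's **entry point**: its code runs when the file is executed with `python my_program.py`, but not when the file is imported as a module.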
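A minimal sketch of such a program. The file name `my_program.py`, the `main()` function and its default value of 5 are hypothetical choices for illustration, and `double_square` here uses plain `**` instead of NumPy so the file has no dependencies:

```python
# my_program.py (hypothetical file name)
import sys


def double_square(x):
    """Square x twice, i.e. compute x ** 4."""
    return (x ** 2) ** 2


def main(args):
    # Use the first command-line argument if one is given, else a default of 5
    x = int(args[0]) if args else 5
    print(double_square(x))


if __name__ == '__main__':
    # Runs for `python my_program.py 3`, but not for `import my_program`
    main(sys.argv[1:])
```

Running `python my_program.py 3` in the terminal should print 81, while `import my_program` only defines the two functions.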

Since a `.py` file is just text, we can treat it either as a text file or as a Python module.

In [6]:
# A "with" block automatically closes the file when the block exits
with open('test.py', 'w') as f:
    f.write(
"""
import numpy as np

def double_square(x):
    return np.square(np.square(x))
""")

We can treat it as text:

In [12]:
with open('test.py', 'r') as f:
    print(f.read())


import numpy as np

def double_square(x):
    return np.square(np.square(x))



But we can also `import` it as a Python module!

In [9]:
import test

test.double_square(5)

625
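One thing to watch for in a notebook: `import` runs a module's file only the first time; later imports reuse the already-loaded module, so edits to the file won't show up until you call `importlib.reload`. The sketch below writes a hypothetical `my_demo_module.py` to a temporary directory to demonstrate. (A distinct name also helps in general: `test` is the name of a package in Python's standard library, so a local `test.py` can clash with it.)

```python
import importlib
import os
import sys
import tempfile

# Put a hypothetical module in a temporary directory and make that directory importable
mod_dir = tempfile.mkdtemp()
mod_path = os.path.join(mod_dir, 'my_demo_module.py')
sys.path.insert(0, mod_dir)

with open(mod_path, 'w') as f:
    f.write("def greet():\n    return 'hello'\n")

importlib.invalidate_caches()     # make sure the import system notices the new file
import my_demo_module
print(my_demo_module.greet())     # hello

# Edit the module's source on disk
with open(mod_path, 'w') as f:
    f.write("def greet():\n    return 'hello again'\n")

import my_demo_module             # no effect: the already-loaded module is reused
print(my_demo_module.greet())     # hello

importlib.reload(my_demo_module)  # re-executes the edited file
print(my_demo_module.greet())     # hello again
```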