In [None]:
from IPython.core.display import HTML

def css_styling():
    styles = open("../Data/www/styles/custom.css", "r").read()
    return HTML(styles)
css_styling()

# Synopsis

In this unit we will learn an important skill: how to read and write files. In order to do that, we will cover:

1. How files are organized on a computer and how to view/find file locations with terminal commands from the notebook
2. How to open a file and read it
3. How to open a file and write text to it. 

# Reading and writing files

Typing strings and numbers into an IPython Notebook is a great way to learn the basics,
but sooner or later you will have to read data from a file, perform some analysis on that data, and ideally save the results.

But first we need to go over the basics of the filesystem so you know where and how things are located.

# Filesystems

A filesystem is presented to you in a folder structure like this:

<img src='../Data/www/images/osx_finder.png'></img>

Each folder sits inside another folder, and this keeps going all the way up until we reach the root of the hard drive.

One way to visualize this differently is to think of the file system as a tree. In Linux/OS X the tree generically looks like this:

<img src='../Data/www/images/linux_fs_tree.png' width='600px'></img>

On Windows it's very similar, except that the root is `C:` instead of `/`.

<img src='../Data/www/images/windows_fs_tree.png' width='400px'></img>

When we want to see what is inside a folder we can use the `ls` command (it stands for `list`). `ls` is **not** a Python command (it comes from the terminal) but Jupyter notebook allows us to use this command natively in any code cell **as long as there is no Python code in the cell**.

If we want to see all of the files in this directory, we can use the following command:

In [None]:
ls *

`ls` means `list` and the `*` is what we call a `wildcard`. The `*` wildcard matches everything.

We can combine it with some text to restrict what we display, though. If we only want to see Jupyter notebooks, we can do:

In [None]:
ls *ipynb

If we want to see where we currently are in the tree, we can use the `pwd` command (short for `p`rint `w`orking `d`irectory):

In [None]:
pwd
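
The same information is also available from Python itself, which helps once a cell contains Python code and the bare terminal commands no longer work. The sketch below uses the standard `os` and `glob` modules; the variable names are only illustrative.

```python
import glob
import os

# Rough equivalent of `pwd`: the folder Python is currently working in
current_directory = os.getcwd()

# Rough equivalent of `ls`: every name in the current folder
everything = os.listdir('.')

# Rough equivalent of `ls *ipynb`: only names that end in .ipynb
notebooks = glob.glob('*.ipynb')

print(current_directory)
print(len(everything), 'entries in this folder')
print(notebooks)
```

`glob.glob` understands the same `*` wildcard as `ls`, so any pattern you tried above should behave similarly here.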

## But now how do we move out of our current folder?

There are two ways to access a path: (i) absolute and (ii) relative.

**Absolute paths** start from the *root* of the tree that we showed. On OS X or Windows that means the path will start with `/` or `C:\`, respectively. We just string together the folder names with the path separator to get to our current path.

**Note:** I have this written for OS X. If you are using Windows, change the `/` to `\`.

We don't always have to start from the root, though. The `~` symbol stands for our user (home) directory, and we can start paths from there.

In [None]:
ls /

In [None]:
ls ~/

In [None]:
ls ~/Desktop

**Relative paths** start from where you **currently** are. 

The symbol for your **current** directory is `.`

The symbol for the **parent** directory (the folder above you) is `..`

In [None]:
ls .

In [None]:
ls ../

Here are alternate ways of looking at one path:

<img src='../Data/www/images/DirTree.png'></img>

**Absolute Path:**

'/home/jono'



**Relative Paths 1:** Getting to *jono* from...            

                    /home/jono/                   ./

                    /usr/lib/                     ~/jono/

                    /home/cory/                   ../jono/

                    /home/jono/work/              ../
                    
                    
**Relative Paths 2:** Sometimes there are multiple ways of getting to *jono* ...

                    /usr/lib/                     ~/jono
                    
                    /usr/lib/                     ../../home/jono
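
To check relative paths like these without clicking through folders, Python can compute them. This is a small sketch using the standard `posixpath` module (the Linux/OS X flavor of `os.path`), chosen here only so that the slashes look the same on every operating system; the folder names come from the tree above and do not need to exist on your machine.

```python
import posixpath

target = '/home/jono'
starting_points = ['/home/jono', '/home/cory', '/home/jono/work', '/usr/lib']

relative_paths = {}
for start in starting_points:
    # relpath answers: "standing in `start`, how do I get to `target`?"
    relative = posixpath.relpath(target, start=start)
    # Joining the two and normalizing should land back on the target
    resolved = posixpath.normpath(posixpath.join(start, relative))
    relative_paths[start] = relative
    print(start, '->', relative, '->', resolved)
```

Note that Python writes the results without a trailing slash, so `../` from the table shows up as `..`.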

Here is the structure of the `NICO101/` folder that we downloaded.

<img src='../Data/www/images/bootcamp_structure.svg'></img>

For this session we're going to use files that are in the `Data/Day2-Collections-and-Files` folder and right now we are using the `Day2_am2_File-IO.ipynb` notebook. 

Let's see what is in the `Roster/` folder inside `Data/Day2-Collections-and-Files/`:

In [None]:
ls ../Data/Day2-Collections-and-Files/Roster/*

# Now for reading files

Inside the `Data/Day2-Collections-and-Files/` folder we have another folder labelled `Roster/`. The `Roster/` folder is full of small `.txt` files (just raw text). Each file looks something like this:

    #This is a file that holds important personal information that should not be shared. You are being watched.

    Name:	Agatha A. Bailey
    Date of Birth:	1/10/75
    Email Address:	agatha.bailey@northwestern.edu
    Department:	Engineering
    Height:	6ft,0in
    Weight:	220lbs
    Favorite Color:	Lime
    Favorite Animal:	Turtle
    Zodiac Sign:	January
    
You have all of these files because you just got a new job as an administrator in a department at Northwestern University. Congratulations!

Since you're the new administrator you want to calculate some basic properties of the student body population.

When we work with **any** new data the first step is to **look** at it. Print parts of it. Make sure that you're familiar with all the data types before thinking about doing any real calculations with it.

Now, how are we going to process this information, especially since everything will come in as a string?

# Thinking algorithmically

Human brains are great at reducing the complexity of problems so that the answers seem obvious. 
If I tell you my birthday and ask you to tell me how old I am, most of you can give me an answer in
almost no time, but making your thought process explicit can be difficult. 

To do any programming or data analysis, perhaps the most important thing that you need to learn is how to break down a problem (one that might seem really simple to do in your head) into tiny steps, so that you can teach a computer how to do it.

Let's start with an exercise: how old am I?

In [None]:
### Here is my information
birth_month = 2
birth_day = 25
birth_year = 1984

current_month = 9
current_day = 9
current_year = 2015

In [None]:
## Place your code here to calculate my age

##How old am I?
# print(age)

So now we see how we have to break down these problems.
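
If you want something to compare your own answer against, here is one possible breakdown, not the only one. The function name `age_in_years` is made up for this sketch. The idea is to subtract the years first, then take one year off if this year's birthday has not happened yet.

```python
def age_in_years(birth_month, birth_day, birth_year,
                 current_month, current_day, current_year):
    # Step 1: how many calendar years apart are the two dates?
    age = current_year - birth_year
    # Step 2: if today falls before this year's birthday, the last
    # year has not been completed yet, so take it back off.
    # Tuples compare month first, then day.
    if (current_month, current_day) < (birth_month, birth_day):
        age = age - 1
    return age

print(age_in_years(2, 25, 1984, 9, 9, 2015))
```

Try changing the birth month to one later than the current month and check that the answer drops by one.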

Let's move onto actually reading a file.

In [None]:
myFile = open('../Data/Day2-Collections-and-Files/Roster/Agatha_Bailey_798.txt', 'r')
myFile

In [None]:
myFile = open('../Data/Day2-Collections-and-Files/Roster/Agatha_Bailey_798.txt', 'r')
Agatha = myFile.read()
Agatha

In [None]:
myFile = open('../Data/Day2-Collections-and-Files/Roster/Agatha_Bailey_798.txt', 'r')
Agatha = myFile.readlines()
print(Agatha)

In [None]:
print( type(Agatha) )
print( len(Agatha) )
print( type(Agatha[0]) )
print( len(Agatha[0]) )
print( Agatha[0] )

# That's it! 

# You read a file and it is now a data type we understand!

But let's see if you can tell me how old Agatha is. First we'll need to go from a line in that file to the variables that we used above to calculate someone's age.

In [None]:
for line in Agatha:
    if 'Name' in line:
        print(line)


# A refresher on manipulating strings

In [None]:
temporary_line = 'Adam,Hockenberry,02-25-1984\n'

In [None]:
print(temporary_line)

In [None]:
print(temporary_line.strip('\n'))

In [None]:
print(temporary_line.split(','))

In [None]:
print(temporary_line.strip('\n').split(','))

In [None]:
line_as_list = temporary_line.strip('\n').split(',')
print(line_as_list)
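
Every piece we get back from `split` is still a string, so numbers need one more step before we can do arithmetic with them. Here is a short sketch that continues the same example line; the variable names are just for illustration.

```python
temporary_line = 'Adam,Hockenberry,02-25-1984\n'

# Unpack the three comma-separated pieces into named variables
first_name, last_name, date_text = temporary_line.strip('\n').split(',')

# Split the date on '-' and convert each piece from a string to an integer
month, day, year = [int(part) for part in date_text.split('-')]

print(first_name, last_name)
print(month, day, year)
print(type(date_text), type(year))
```

`int('02')` gives the number `2`, so leading zeros are not a problem.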

# Back to Agatha...

In [1]:
for line in Agatha:
    if 'Date of Birth' in line:
        print(line)
        birthday_line = line


**Exercise:** apply the string manipulations that you just learned to get Agatha's birthday as variables that we can use

In [None]:
# birth_day = 
# birth_month = 
# birth_year = 

# We have all the parts but they're pretty scattered right now, let's put it all in one place:

In [None]:
current_month = 9
current_day = 9
current_year = 2015

myFile = open('../Data/Day2-Collections-and-Files/Roster/Agatha_Bailey_798.txt', 'r')
Agatha = myFile.readlines()
for line in Agatha:
    if 'Date of Birth' in line:
        birthday_line = line
###########################################################################
###Paste your code to get birth_month, birth_day, and birth_year here!


print(birth_month, birth_day, birth_year)

###########################################################################
#### Copy and paste the algorithm that you developed to calculate someones age here!

        
# print(age)

# Writing files

If you perform some calculation, there are a number of reasons why you should store these values somewhere. 

There are three common plain-text ways to store data: raw text, comma-separated values (CSV), and JSON.

Typically you would use raw text if you just have text (think of a story) or if you want something like the format our roster files came in.

A CSV file is still plain text, but it is formatted as if it were a spreadsheet. Typically we write a header at the top and every column is separated by a comma (that's where the name comes from). This way when we read it back in, we know to split each row whenever a comma appears. Make sure to not use commas in your values though!
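
If a value really does need a comma, Python's standard `csv` module can handle it by wrapping that value in quotes when writing and removing the quotes again when reading. This is a minimal sketch that writes to an in-memory buffer instead of a real file; the row contents are only an illustration.

```python
import csv
import io

# An in-memory stand-in for a file, so nothing is written to disk
buffer = io.StringIO(newline='')

writer = csv.writer(buffer)
writer.writerow(['Name', 'Department'])
writer.writerow(['Bailey, Agatha', 'Engineering'])  # note the comma in the name

print(buffer.getvalue())

# Reading it back splits on the commas between columns only
buffer.seek(0)
rows = list(csv.reader(buffer))
print(rows)
```

In this unit we stick to writing the rows by hand with `write`, which is why avoiding commas in values matters.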

The last way is JSON [(it's an acronym for something boring)](http://json.org/). JSON files let us store Python dictionaries as text! With Python's built-in `json` module, reading a file takes us straight from the raw text to a Python dictionary, and writing goes the other way.
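
As a quick illustration of that round trip, the sketch below turns a dictionary into JSON text and back using the standard `json` module. The dictionary contents are placeholders.

```python
import json

ta_ages = {'Adam H': 21, 'Peter W': 31}

# Dictionary -> text (this string is what would be written to a .json file)
as_text = json.dumps(ta_ages)
print(type(as_text), as_text)

# Text -> dictionary (this is what happens when the file is read back)
restored = json.loads(as_text)
print(type(restored), restored == ta_ages)
```

`json.dump` and `json.load` (without the `s`) do the same thing directly on an open file.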

In this example we're going to make a file that contains the names of the Teaching Assistants and their ages. Since this is data where we might have more columns in the future, what file type should we use?

In [None]:
### Here are two lists with the TAs' names and ages
names = ['Adam H', 'Peter W', 'Joao M', 'Hyojun L']
ages = [21, 31, 24, 19]
# for x in zip(names, ages):
#     print(x)

In [None]:
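delimiter = ','

#### One way: open the file, write each row, then close it yourself
# output_file = open('../Data/Day2-Collections-and-Files/TA_ages.csv', 'w')
# for name, age in zip(names, ages):
#     output_file.write(name + delimiter + str(age) + '\n')
# output_file.close()

#### Here is another way that does the exact same thing;
#### the `with` block closes the file for us, so no close() is needed
# with open('../Data/Day2-Collections-and-Files/TA_ages.csv', 'w') as output_file:
#     for name, age in zip(names, ages):
#         output_file.write(name + delimiter + str(age) + '\n')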
    

# What if we forgot someone?

Take a deep breath. All is not yet lost.

When opening files, we have used 'r' and 'w' for read and write, but there is one more that I haven't told you about: 'a' for append, which adds to the end of an existing file instead of overwriting it.

In [None]:
new_ages = [['Adam P', 38], ['Nick T', 28]]
delimiter = ','

output_file = open('../Data/Day2-Collections-and-Files/TA_ages.csv', 'a')
for name, age in new_ages:
    output_file.write(name + delimiter + str(age) + '\n')    
output_file.close()
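
To see the difference between 'w' and 'a' without touching the real data, here is a small sketch on a throwaway file in a temporary folder. The path `demo_path` is made up for this example.

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), 'demo_ages.csv')

# 'w' creates the file (or empties it if it already exists)
with open(demo_path, 'w') as demo_file:
    demo_file.write('Adam H,21\n')

# 'a' keeps what is there and adds to the end
with open(demo_path, 'a') as demo_file:
    demo_file.write('Adam P,38\n')

with open(demo_path, 'r') as demo_file:
    after_append = demo_file.readlines()
print(after_append)

# Opening with 'w' again throws away everything written so far
with open(demo_path, 'w') as demo_file:
    demo_file.write('Nick T,28\n')

with open(demo_path, 'r') as demo_file:
    after_rewrite = demo_file.readlines()
print(after_rewrite)
```

This is why it pays to double check the mode before opening a file that already holds results you care about.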

# Final exercise
Let's read that same file again, but instead of calculating Agatha's age, I'd like to know her height in centimeters (it's for a collaboration with Europeans). When you're finished, please write her name and her height into a new file that looks like this (it will be pretty boring for now, with only two lines):

    Name, Height (cm)
    Agathas_full_name, Agathas_height
    


***If you get stuck, remember to break the problem down into small steps:***
    
    1) Read the file and find the lines that we care about:
        a) name_line
        b) height_line
    2) Split those lines apart so that we have three variables:
        a) name
        b) height_component_feet
        c) height_component_inches
    3) Get her height in inches:
        a) total_height_inches
    4) Then convert it into centimeters
    5) Write everything into a new file
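
For steps 3 and 4, the arithmetic itself is short: there are 12 inches in a foot and exactly 2.54 centimeters in an inch. A minimal sketch of just the conversion follows; the function name `height_in_cm` is hypothetical, and getting the feet and inches out of the file is still up to you.

```python
INCHES_PER_FOOT = 12
CM_PER_INCH = 2.54

def height_in_cm(feet, inches):
    # Step 3: total height in inches
    total_height_inches = feet * INCHES_PER_FOOT + inches
    # Step 4: convert inches to centimeters
    return total_height_inches * CM_PER_INCH

# The two numbers must already be converted from strings (e.g. with int())
print(height_in_cm(6, 0))
print(round(height_in_cm(5, 7), 1))
```

Because 2.54 cannot be stored exactly as a floating point number, the printed value may carry a tiny rounding tail; `round` tidies that up before you write it to the file.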

In [None]:
####Load this file
input_file_name = '../Data/Day2-Collections-and-Files/Roster/Agatha_Bailey_798.txt'

########################
###Place your code here:


###Write data into this file
output_file_name = '../Data/Day2-Collections-and-Files/roster_heights.csv'