In [1]:
from IPython.core.display import HTML
def css_styling():
    styles = open("../Data/www/styles/custom.css", "r").read()
    return HTML(styles)
css_styling()

In [2]:
# Write your imports here, so you can easily find and run them

import glob
import datetime

# Synopsis

In this unit we will learn that:

1. **Modular** code is more readable, easier to maintain, and less prone to bugs creeping in.

2. **Re-factoring** of code, that is, the re-writing and re-organizing of code, is a critical part of developing modular code.

3. Functions are the underpinnings of modular code. They enable a programmer to avoid repeating lines of code across a project.

    1. Descriptive function names increase code readability.
    
    2. Appropriate documentation of a function makes it easier to avoid logical errors.
    

+++

# Functions

**Writing modular code is good!**

**Functions** are the workhorses of modular programming in Python! So, what's a function?

You were actually exposed to functions when you filled in your answers to the homework questions **inside** of a function structure. So whenever you see this syntax:

>    def function_name():
>
>        statements
>
>        return something
        
that block of code is a function. 

Functions help us avoid repeating the same set of statements every time we want to repeat a task. Functions increase code readability. Functions make code revision and updating easier (you do not have to redo revisions in all the places of your code where the task is needed). Functions make testing of your code easier and more reliable.

In order to execute the code in a function, you use the syntax `function_name()`. If you do not "call" your function in your code, then it is never executed. However, the Python interpreter will still check its code for syntax errors.


In [3]:
# Let's write a really simple function -- a function that "says hello".

def says_hello():
    '''
    Prints the word "Hello"
    
    input:
        - None
    output:
        - None
    '''
    
    print('Hello!')

You just wrote a simple function! Notice that after writing it nothing was printed. That is because you didn't *call* the function. You only defined it, so Python will know what on earth you're talking about should you choose to write `says_hello` anywhere.

You *call* a function just by writing its name along with the parentheses:

In [4]:
says_hello()

Hello!


There you go!

Now let's diagram the basic parts of the syntax to give you an idea of what we are dealing with.

<img src = '../images/function_annotation.png'></img>

Here I allude to function inputs. Functions take in *variables* inside the parentheses. Those variables and variable names are only defined within the function. This creates a **Namespace**. This compartmentalization of variable names makes it much easier to avoid errors caused by accidentally reusing the names of variables defined in other parts of the code that we may not be aware of.

You've already been using functions and inputs; you just didn't know it yet. Does the syntax `says_hello()` look similar to anything that you know?

How about:

In [5]:
print('Hello')

Hello


`print` is a **built-in** function. Given whatever input you pass it, the `print()` function prints that input to the screen. What happens if you do not provide an input?

In [6]:
print()




It printed only an empty line, as expected.

Let's create functions with more exciting transfers of data.  We will first write a `say_anything_twice()` function. It'll take as input a string and multiply it by 2 (have to say it twice!). It will then return that new string as an output. 

In [7]:
def say_anything_twice(anything):
    '''
    Multiplies a string by 2
    
    input:
        anything - str
    output:
        anything_twice - str
    '''
    
    anything_twice = anything * 2
    return anything_twice

In [8]:
print( say_anything_twice('hello') )

hellohello


**What's actually going on here?** 

The function took the string 'hello' as an input and assigned it **internally** to the variable `anything`. It then performed an operation on `anything` (in this case it multiplied it by 2) and assigned that value to a new variable called `anything_twice`. Finally, it **returned** `anything_twice`:
    
>    return anything_twice

This statement returns the result of the function's operations to the place in your code where the function was called.

The beauty of functions, though not of this particular function which pretty much executes a single statement, is that we can call it on any string input without having to re-write a lot of code.

In [9]:
print( say_anything_twice('hi') )
print( say_anything_twice('hi ') )
print(say_anything_twice('hello again!'))
print(say_anything_twice('goodbye! '))

hihi
hi hi 
hello again!hello again!
goodbye! goodbye! 
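One more detail about `return` is worth seeing side by side. The sketch below redefines the two functions from this section in compact form so that it stands alone; the variable names `doubled` and `nothing` are only illustrative. It shows that a returned value can be stored in a variable, whereas a function without a `return` statement hands back `None`.

```python
def say_anything_twice(anything):
    return anything * 2        # hands the new string back to the caller

def says_hello():
    print('Hello!')            # prints, but has no return statement

doubled = say_anything_twice('ab')   # the returned value can be stored
nothing = says_hello()               # prints Hello!, then gives back None

print(doubled)   # abab
print(nothing)   # None
```

This is why `say_anything_twice()` is wrapped in `print()` in the cells above: the function itself does not print anything, it only returns a value.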


**Recall that the variables defined inside the function do not exist outside!** If we try to access `anything_twice` we get a `NameError` exception.

In [10]:
anything_twice

NameError: name 'anything_twice' is not defined

This is an amazingly useful property of Namespaces. We do not need to keep track of what variables are defined within the functions we call in our code, because as soon as the execution of those functions is concluded, all variables defined internally to the function are deleted. Thus, we do not need to keep inventing new variable names.

### Sidebar: Namespaces

Generally speaking, a Namespace (sometimes also called a context) is a naming system for making names unique to avoid ambiguity. A (not very good) namespacing system used in daily life is the naming of people with a first name and a surname. A much better namespacing system is the directory structure of file systems. The same file name can be used in different directories, and the files can be uniquely accessed via their pathnames.

Many programming languages use Namespaces or contexts for identifiers. An identifier defined in a Namespace is associated with that Namespace. This way, the same identifier can be independently defined in multiple Namespaces (like the same file name in different directories).
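As a small illustration of the same identifier living in two Namespaces, consider the sketch below. The function `double_text()` and the variable names are hypothetical and chosen only for this example. The name `doubled` is defined once inside the function and once outside it, and the two do not interfere.

```python
def double_text(text):
    doubled = text * 2     # this 'doubled' lives only inside the function
    return doubled

doubled = 'defined outside the function'   # a separate, outer 'doubled'
result = double_text('hi')

print(result)    # hihi
print(doubled)   # defined outside the function
```

Calling the function did not change the outer `doubled`, because the assignment inside the function created a separate local variable.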

**Note that in Python we can define variables that are not limited to a given Namespace.  We can define `global` variables, and those variables will be accessible and known across all Namespaces created during the execution of your code.**




In [11]:
special_text = 'This is the best ever '

def say_something_special(anything):
    '''
    Prepends the global special_text to an upper-cased string

    input:
        anything - str
    output:
        output - str
    '''
    
    output = special_text + anything.upper()
    return output

In [12]:
say_something_special('hello')

'This is the best ever HELLO'

In the above code, we defined a global variable `special_text`, which was accessible inside the function `say_something_special()`. 

Notice how this is dangerous.  What if somewhere else in the code we change the value of `special_text`? Then things could look very different.

Because it may be difficult to keep track of the value of global variables, **it is not a good practice to make use of global variables in functions**.


In [13]:
special_text = "This is the WORSE EVER "
say_something_special('hello')

'This is the WORSE EVER HELLO'
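A common way to avoid this problem is to hand the text to the function as an explicit input instead of reading it from a global variable. The sketch below is one possible version; the name `say_something_special_explicit()` and the parameter `prefix` are hypothetical and not part of the original notebook.

```python
def say_something_special_explicit(anything, prefix='This is the best ever '):
    '''
    Prepends a prefix to an upper-cased string

    input:
        anything - str
        prefix - str, text placed in front (hypothetical parameter)
    output:
        output - str
    '''
    output = prefix + anything.upper()
    return output

print(say_something_special_explicit('hello'))
print(say_something_special_explicit('hello', prefix='Well, '))
```

Now everything the function depends on is visible in its call, and changing a variable elsewhere in the code cannot silently alter the result.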

## So how do we use functions?

A simple example is one thing, but it can be hard to figure out how we use functions in **real** code. 

The most powerful way in which functions can be used is as a device to sketch out what our program intends to do. To make this clear, let's go back to your administrative job and work with the `Roster` data.

There are 800 student data files in the `Data/Roster` folder that need to be parsed. What we want to accomplish is something like this:

    Find all files with student records
    For each file repeat the following actions:
            parse student record
            calculate age of student
            
Each instruction in the "code" above is actually a function that we can write to perform the task described. Our pseudo-code also makes it clear how we could add additional calculations to our program or to take actions based on the values we calculate.

Let's consider the first instruction in our pseudo-code: `Find all files with student records`. If we are writing modular code, we are not writing all the necessary commands in our main program. Instead, we will define and write a function that will find all student records. Since we want our code to be readable, we will call the function `find_student_records()`.

### Sketching out a function

There are three basic questions that help you sketch a function:

* What does this function do?
* What inputs does this function need to do that?
* What value(s) does the function return?

These questions need to be answered before you start writing any code. They are also the questions that someone planning to use your code later will want answered. For this reason, this is the information you **must** provide in the function's `docstring`.

#### What task is the function accomplishing?

The student files in our project are all in a single directory and all end with the `txt` extension.  Thus, we should write a  function that returns all of the filenames with the `txt` extension in a given directory.

#### What inputs does the function require?

The function needs to be given as input a directory and, if we want to be more general, an extension type.

#### How should the information returned be formatted?

Our function is collecting the file names of the student records. Our pseudo-code also tells us that we will be iterating through the elements in our collection. Putting these facts together, it seems pretty clear that a good option for the output of our function is a list of file names in string form.

Let's implement these decisions!


In [14]:
def find_student_records(directory):
    """
    Return all roster filenames in directory
    
    input:
        directory - str, Directory that contains the roster files
    output:
        filenames - list, List of roster filenames in directory
    """
    # Statements here!
    
    return 

#assert()

Good! If we look at our pseudo-code for guidance, we see that we need to write two other functions: `parse_student_record()` and `calculate_age()`. It is your turn to sketch these functions!

In [15]:
def parse_student_record( ):
    """
    
    """
    # Statements here!
    
    return

In [16]:
def calculate_age( ):
    """
    
    """
    # Statements here!
    
    return

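A very cool thing is that these sketched functions enable us to write our pseudo-code for real and to run it! Of course the code does nothing useful yet, but it enables us to test it as we go along. Right now the run stops with a `TypeError`, because the sketched `find_student_records()` still returns `None`, and a `for` loop cannot iterate over `None`.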

In [18]:
filenames = find_student_records('./')

for name in filenames:
    data = parse_student_record(name)
    age = calculate_age(data)
    


TypeError: 'NoneType' object is not iterable
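One way to keep the skeleton runnable while the details are still missing is to have each sketched function return an empty placeholder of its intended type instead of a bare `return`. The sketch below uses hypothetical `_stub` names so that it does not overwrite the functions in this notebook; the placeholder return values are assumptions about what the finished functions will produce.

```python
def find_student_records_stub(directory):
    return []        # placeholder: an empty list of filenames

def parse_student_record_stub(filename):
    return {}        # placeholder: an empty record

def calculate_age_stub(data):
    return None      # placeholder: no age can be computed yet

ages = []
for name in find_student_records_stub('./'):
    data = parse_student_record_stub(name)
    ages.append(calculate_age_stub(data))

print(len(ages), 'records processed')   # 0 records processed
```

The loop body never executes because the list is empty, but the program as a whole runs from start to finish, so each stub can be replaced by real code one at a time.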

### Adding details to the sketched out function

With separate functions, it can be easy to develop and test each one without having to process the entire data set every time. Let's start by finding all the records.

A powerful capability of the notebook is that it enables us to write and test a piece of code in a single cell until we are confident that it does what we want. We can then copy that code into the cell where we want to define the function.

As before, let's do first things first. Our first task is to find all student records. Like any good programmer, we do not want to re-invent the wheel. We know that `glob` is great for getting a list of filenames from a directory, so we will use it.

In [19]:
glob.glob('../Data/Day2-Collections-and-Files/Roster/*.txt')

['../Data/Day2-Collections-and-Files/Roster/Victoria_Ross_528.txt',
 '../Data/Day2-Collections-and-Files/Roster/Agatha_Young_172.txt',
 '../Data/Day2-Collections-and-Files/Roster/Lorelei_williams_221.txt',
 '../Data/Day2-Collections-and-Files/Roster/Yvonne_Butler_729.txt',
 '../Data/Day2-Collections-and-Files/Roster/Ezekiel_James_796.txt',
 '../Data/Day2-Collections-and-Files/Roster/Ernest_Wood_735.txt',
 '../Data/Day2-Collections-and-Files/Roster/Yvonne_Brooks_269.txt',
 '../Data/Day2-Collections-and-Files/Roster/Tabitha_Green_530.txt',
 '../Data/Day2-Collections-and-Files/Roster/May-Sue_Flores_464.txt',
 '../Data/Day2-Collections-and-Files/Roster/Victoria_Cox_33.txt',
 '../Data/Day2-Collections-and-Files/Roster/May-Sue_Bennett_186.txt',
 '../Data/Day2-Collections-and-Files/Roster/Ernest_Wright_278.txt',
 '../Data/Day2-Collections-and-Files/Roster/May-Sue_Clark_784.txt',
 '../Data/Day2-Collections-and-Files/Roster/Tabitha_Price_661.txt',
 '../Data/Day2-Collections-and-Files/Roster/Aga

The command above works fine. However, it is **not general**, and it does not store its output. We want to make inputs and outputs very clear. The **input** is the directory and the **output** must be stored in a list.

In [20]:
directory = '../Data/Day2-Collections-and-Files/Roster'

filenames = glob.glob(directory + '/*.txt')

filenames[:5]

['../Data/Day2-Collections-and-Files/Roster/Victoria_Ross_528.txt',
 '../Data/Day2-Collections-and-Files/Roster/Agatha_Young_172.txt',
 '../Data/Day2-Collections-and-Files/Roster/Lorelei_williams_221.txt',
 '../Data/Day2-Collections-and-Files/Roster/Yvonne_Butler_729.txt',
 '../Data/Day2-Collections-and-Files/Roster/Ezekiel_James_796.txt']

Now just wrap the code that we've written into a function:

In [21]:
def find_student_records(directory):
    """
    Return all roster filenames in a directory
    input:
        directory - str, Directory that contains the roster files
    output:
        filenames - list, List of roster filenames in directory
    """
    # Statements here!
    
    return filenames

Let's make sure that we get the answer we expect when we run the function using the directory string `../Data/Day2-Collections-and-Files/Roster/` as the input.

In [22]:
find_student_records('../Data/Day2-Collections-and-Files/Roster/')[:5]

['../Data/Day2-Collections-and-Files/Roster/Victoria_Ross_528.txt',
 '../Data/Day2-Collections-and-Files/Roster/Agatha_Young_172.txt',
 '../Data/Day2-Collections-and-Files/Roster/Lorelei_williams_221.txt',
 '../Data/Day2-Collections-and-Files/Roster/Yvonne_Butler_729.txt',
 '../Data/Day2-Collections-and-Files/Roster/Ezekiel_James_796.txt']
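Earlier we noted that the function could be made more general by also accepting an extension type. A minimal sketch of that idea follows. The name `find_files()`, the `extension` parameter, the use of `os.path.join()` to build the pattern, and the sorting of the result are all choices made for this illustration rather than part of the notebook's solution.

```python
import glob
import os

def find_files(directory, extension='txt'):
    """
    Return all filenames with a given extension in a directory

    input:
        directory - str, directory to search
        extension - str, file extension without the dot (hypothetical parameter)
    output:
        filenames - list, sorted list of matching filenames
    """
    # os.path.join copes with directories given with or without a trailing slash
    pattern = os.path.join(directory, '*.' + extension)
    filenames = sorted(glob.glob(pattern))
    return filenames
```

With this version, `find_files(directory)` would look for `txt` files by default, while `find_files(directory, 'csv')` would look for `csv` files in the same place.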

### Doing it for the other functions now!

In order to get the information we need for each student, we need to parse the content of each file.  In order to do this, it is important to recall what the content of the files looks like:


    #This is a file that holds important personal information that should not be shared. 
    #You are being watched.



    Name:	Buzz M. Baker
    Date of Birth:	4/20/87
    Email Address:	buzz.baker@northwestern.edu
    Department:	Engineering
    Height:	5ft,3in
    Weight:	194lbs
    Favorite Color:	Pink
    Favorite Animal:	Snake
    Zodiac Sign:	April


As before, we first sketch out the function `parse_student_record()`. We get the `path` for the file from the list `filenames`. We must open the file and read its lines.

In [23]:
path = '../Data/Day2-Collections-and-Files/Roster/Agatha_Bailey_798.txt'

# open the file and read the lines
# for each line in the file:
    # check if the line should be ignored
    # else:
        # split line
        # get field name
        # get field value
        # store field name and value in 'data holder'

        
# Check that it does what we want it to do!
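As a hedged illustration of how the outline in the cell above might be filled in, the sketch below works on a short list of sample lines instead of a real file, so that it runs anywhere. It assumes that lines to ignore are blank or start with `#`, and that the field name is separated from its value by the first colon on the line; the `sample_lines` list is made up from the example record shown earlier.

```python
sample_lines = [
    '#This is a file that holds important personal information that should not be shared.',
    '#You are being watched.',
    '',
    'Name:\tBuzz M. Baker',
    'Date of Birth:\t4/20/87',
    'Department:\tEngineering',
]

data = {}   # the 'data holder': field name -> field value
for line in sample_lines:
    line = line.strip()
    # ignore blank lines, comment lines, and lines without a separator
    if not line or line.startswith('#') or ':' not in line:
        continue
    # split only at the first colon, in case the value contains one
    field, value = line.split(':', 1)
    data[field.strip()] = value.strip()

# Check that it does what we want it to do!
print(data)
# {'Name': 'Buzz M. Baker', 'Date of Birth': '4/20/87', 'Department': 'Engineering'}
```

For a real record, the lines would come from opening the file at `path` and reading its lines; the loop itself would stay the same.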

Once our code is working and doing what we want it to do, we can incorporate it within the function `parse_student_record()`.

In [24]:
def parse_student_record(filename):
    '''
    Parses a student record file into a dictionary
    input:
        filename - str, path to the file
    output:
        data - dict, student attribute data
    '''
    # Statements here!
    
    return data


In [27]:
parse_student_record('../Data/Day2-Collections-and-Files/Roster/Agatha_Lee_11.txt')

NameError: name 'data' is not defined

In [28]:
parse_student_record('../Data/Day2-Collections-and-Files/Roster/Buzz_Baker_618.txt')

NameError: name 'data' is not defined

We are getting close. We have two of the three functions that we thought would be needed. We can already put the first two functions together to check things out.

In [29]:
filenames = find_student_records('../Data/Day2-Collections-and-Files/Roster/')
a_filename = filenames[0]
print( a_filename )
parse_student_record(a_filename)

../Data/Day2-Collections-and-Files/Roster/Victoria_Ross_528.txt


NameError: name 'data' is not defined

In order to calculate a student's age, we need to take her/his `Date of Birth` from the student's record. A difficulty is that the `Date of Birth` is stored as a `string`.  It would be much better to have a `Date of Birth` organized as a **tuple** with three elements (one's `Date of Birth` does not change, after all). 

We could write our function `calculate_age()` to take as an input a string, but that would be **bad practice**, since it would not be clear what the right format for the `string` would be. It is much better to do the conversion from `string` to `tuple` using a separate function.

In [30]:
def string2tuple_dob(dob_string):
    '''
    Takes a date string of the form "M/D/YY" and converts it to a tuple with month, day, and year as integers

    input:
        * dob_string - str, birthday string of form "M/D/YY"
    output:
        * dob - tuple of int, (month, day, year)
    '''
    #Statements
    
    return dob

In [31]:
string2tuple_dob('8/7/1985')

NameError: name 'dob' is not defined
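One possible way to write the conversion is sketched below under a hypothetical name, `string2tuple_dob_sketch()`, so that it does not replace your own version. It makes one loud assumption: a two-digit year such as `87` is taken to mean `1987`. The `century` parameter exists only to make that assumption visible and adjustable.

```python
def string2tuple_dob_sketch(dob_string, century=1900):
    '''
    Converts a date string "M/D/YY" or "M/D/YYYY" to a tuple of integers

    input:
        * dob_string - str, birthday string
        * century - int, added to two-digit years (ASSUMPTION: 1900)
    output:
        * dob - tuple of int, (month, day, year)
    '''
    # Split at the slashes and convert each piece to an integer
    month, day, year = (int(part) for part in dob_string.strip().split('/'))
    # ASSUMPTION: a two-digit year such as 87 means 1987
    if year < 100:
        year += century
    dob = (month, day, year)
    return dob

print(string2tuple_dob_sketch('4/20/87'))    # (4, 20, 1987)
print(string2tuple_dob_sketch('8/7/1985'))   # (8, 7, 1985)
```

Returning integers rather than strings means that later code can compare and subtract the values directly.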

The question now is where should we use that function.  Let's consider our pseudo-code again:

    Find all files with student records
    For each file repeat the following actions:
            parse student record
            calculate age of student

The logical place to do the conversion from `string` to `tuple` is inside the `parse student record` command. If we are going to parse the data, we should do it right the first time around. So, let's add the conversion function to the `parse_student_record()` function.


In [None]:
def parse_student_record(filename):
    '''
    Parses a student record file into a list of tuples
    
    input:
        filename - str, path to the file
    output:
        data - list of tuples, student attribute data
    '''
    #Copy our code from above
    
    if field == 'Date of Birth':
        # store the field name together with the converted date
        data.append( (field, string2tuple_dob( value )) ) 
    
    return data

We can now take the code you originally wrote to calculate someone's age from the `File IO` notebook and turn it into a function.

In [None]:
def calculate_age( ):
    """
    
    """
    
    # Statements
    
    return

In [None]:
#Input is Month, Day, Year
calculate_age(3, 7, 1986)
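If you no longer have your code from the `File IO` notebook at hand, the sketch below shows one way the calculation could go. It is written under a hypothetical name, `calculate_age_sketch()`, and adds an optional `today` parameter (not part of the original exercise) so that the result can be checked against a fixed date instead of changing from day to day.

```python
import datetime

def calculate_age_sketch(month, day, year, today=None):
    """
    Calculates an age in whole years from a date of birth

    input:
        month, day, year - int, date of birth
        today - datetime.date, reference date (hypothetical parameter,
                defaults to the current date)
    output:
        age - int, age in completed years
    """
    if today is None:
        today = datetime.date.today()
    age = today.year - year
    # Subtract one year if the birthday has not yet occurred this year
    if (today.month, today.day) < (month, day):
        age -= 1
    return age

# One day before and on the 34th birthday
print(calculate_age_sketch(3, 7, 1986, today=datetime.date(2020, 3, 6)))   # 33
print(calculate_age_sketch(3, 7, 1986, today=datetime.date(2020, 3, 7)))   # 34
```

Comparing the `(month, day)` tuples handles the case where the birthday falls later in the current year.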

In [None]:
buzz_baker = parse_student_record('../Data/Day2-Collections-and-Files/Roster/Buzz_Baker_618.txt')

for field, value in buzz_baker:
    if field == 'Date of Birth':
        #Check to make sure that the month is correct
        if value[0] == 4:
            print("Success! You are a rock star!")

+++

# Exercises

Find the number of students born in 1975:

In [None]:
#This is your list of people born in 1975
born_in_1975 = []

#This is your file location
data_dir = '../Data/Day2-Collections-and-Files/Roster'

#First you need to get all of the student files
data_files = find_student_records(data_dir)

#Now go through each file
for filename in data_files:
    #First we parse the student record into the dictionary
    data = parse_student_record(filename)
    
    #Now what field will let us check if a person is born in 1975?
    if ________ == 1975:
        #If you want to keep track of a person in 1975
        #You should keep track of it outside of this for loop
        #Remember that we lose anything inside the for loop once the loop finishes
        #So where could you possibly store someone's data that is born in 1975...
        #hmmm....................
        
#Now I want to know the number of people born in 1975
#What was that number
        

Excellent! But how many of those people born in 1975 were also born in March?

In [None]:
born_in_march = []
#I'll help you out here, you should iterate through the list of people born in 1975
for person in born_in_1975:
    #I'll let you figure out this part here
    
#Don't forget to tell me your answer!!!!
len(born_in_march)