In [3]:
from IPython.display import HTML
def css_styling():
    # Read the notebook's custom stylesheet and render it as HTML
    with open("../styles/custom.css", "r") as f:
        styles = f.read()
    return HTML(styles)
css_styling()

In [2]:
import glob
import datetime

# What's a function?

You've actually already been exposed to functions because we had you fill in your answers in the homework **inside** a function. So whenever you see this syntax:

    def function_name():
        function_code
        return function_answer
        
that block of code is a function. 

The basic definition of a function is simple: it's a block of code packaged as a single unit that performs a specific task. It runs only when you call it by writing `function_name()`. If we never call `function_name()` anywhere in our code, its body never runs (although Python still checks the definition for syntax errors when it reads it).
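To see that the body only runs when the function is called, here is a small sketch. The name `broken` is just for illustration. Defining the function succeeds even though its body would fail. The `ZeroDivisionError` only appears once you actually call it.

```python
def broken():
    # This division would raise ZeroDivisionError, but only when broken() runs
    return 1 / 0

# Defining the function did not run its body, so nothing has failed yet
print('broken() is defined')

try:
    broken()  # now the body executes and the error appears
except ZeroDivisionError as error:
    print('Calling broken() raised:', error)
```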

Let's start with the simplest function possible: you'll make one that just says `hello`.

In [5]:
def say_hello():
    '''
    Prints "Hello!"
    input:
        - None
    output:
        - None
    '''
    print('Hello!')

You just wrote a simple function! Notice that after writing it nothing was printed. That is because you didn't *call* the function. 

You *call* a function just by writing its name along with the parentheses.

In [4]:
say_hello()

Hello!


There you go!
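The parentheses are what make the call happen. In this small sketch, the variable name `greeting` is just for illustration. Writing the function's name without parentheses only refers to the function object and runs nothing. Adding `()` is what executes it.

```python
def say_hello():
    print('Hello!')

greeting = say_hello       # no parentheses: refers to the function, prints nothing
print(callable(greeting))  # True: greeting is the function object itself
greeting()                 # parentheses: the function runs and prints Hello!
```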

Now let's diagram the basic parts of the syntax to give you an idea of what we are dealing with.

<img src = '../images/function_annotation.png'></img>

The diagram also points to function inputs. A function takes its inputs as *variables* (called *parameters*) listed inside the parentheses. Those parameter names, like any variables created in the function body, exist only inside the function. This creates a little world inside the function.

You can see that here. Start by making a `say_anything_twice()` function. It will take any text and multiply it by 2 (you have to say it twice!), then return that new string as its output.

In [9]:
def say_anything_twice(anything):
    '''
    Multiplies a string by 2
    input:
        anything - str
    output:
        anything_twice - str
    '''
    anything_twice = anything * 2
    return anything_twice

say_anything_twice('hello')

'hellohello'

Now notice that we can't actually use the `anything_twice` variable outside of the function:

In [10]:
anything_twice

NameError: name 'anything_twice' is not defined

If you try to use a function's variable outside of that function, you will get a `NameError`. That is because the variable only exists *inside* the function: it is created when the function is called and discarded when the function returns.

However, functions *can* read variables that are created outside of any function (these are called `global` variables).

In [13]:
special_text = 'This is special text'

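def special_text_twice(anything):
    '''
    Returns the global `special_text` twice (the input is ignored)
    input:
        anything - str, unused
    output:
        anything_twice - str
    '''
    # Reads the global variable special_text instead of the argument
    anything_twice = special_text * 2
    return anything_twice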

special_text_twice('hello')

'This is special textThis is special text'

I tell you about this so you know, but **relying on `global` variables in functions is not a good practice.**

So try not to do it!
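The usual alternative is to pass the value in as an argument. That way, the function depends only on its inputs. Here is a minimal sketch. The function name `text_twice` is hypothetical.

```python
special_text = 'This is special text'

def text_twice(text):
    '''
    Multiplies a string by 2
    input:
        text - str
    output:
        str, the text repeated twice
    '''
    return text * 2

print(text_twice(special_text))  # the global is passed in explicitly
print(text_twice('hello'))       # and any other string works too
```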

In case you were curious, variables created *inside* a function are called `local` variables.
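One consequence worth knowing: assigning to a name inside a function creates a new local variable, even if a global with the same name exists. The global is left untouched. In this illustrative sketch, the names `counter` and `reset_counter` are hypothetical.

```python
counter = 10  # a global variable

def reset_counter():
    # Assigning to counter here creates a new *local* variable;
    # the global counter is not modified
    counter = 0
    return counter

local_result = reset_counter()
print(local_result)  # 0, the local value
print(counter)       # 10, the global is unchanged
```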

# So how do we use functions?

A simple example is one thing, but it can be hard to figure out how we use functions in **real** code. To make this clear, let's go back to your administrative job and work with the `Roster` data.

There are 800 student data files in the `Data/Roster` folder that need to be parsed. Previously we put all of this code into a `for` loop, which easily went through every single file in turn. However, if you want to do anything else when you parse a file, like calculating the student's height, you would need to make many modifications to the `for` loop.

Continually adding these small changes will make the amount of code in the `for` loop grow quite a bit, making it more and more unreadable. When something is unreadable it's hard to quickly understand how a specific value is calculated or what values are used. This means that when you need to make a change it takes you **a lot** more time to re-acquaint yourself with the code and implement even simple additions.

Also, when you add more code, you add more variables. These variables end up scattered all over the code, making its flow hard to track. What's even **worse** is that you become far more likely to modify a variable in a way that makes another line of code **fail** (everyone shakes their head no, but everyone does it!).

**Functions** are what help us solve this problem. 

# Writing a function

When we write a function, we extract just a small, *specific* part of our overall program and write it as its own little piece of code. With a clear name it becomes easy to identify what the function does, increasing readability. 

Furthermore, if we need to change how something is calculated (say that new roster files have the height in centimeters instead of inches), we'll know **right away** where the code we need to modify is located. We'll also know that we **don't need to change it** anywhere else. That's another point of functions: if we write a function to calculate the height, we should then use it everywhere the code needs to calculate height.

So right away we can reimagine our initial code to read the `Roster` files as:
    
    #Here we write a function to grab all of the roster filenames
    data_files = find_student_records(directory)
    #Now we iterate through each filename
    for filename in data_files:
        #Next we parse the file
        data = parse_student_record(filename)
        #Calculating the age is now a single function call inside the loop
        age = calculate_age(data)
            
Make sense now?

Before you start programming, it's best to sketch out the function.

## Sketching out a function

There are three basic questions that help you sketch a function:

* What does this function do?
* What inputs does this function need to do that?
* What value(s) does the function return?

After you answer these three questions, you will then know the bounds on what code you can write inside that function.

These questions may seem like a simple exercise, but this information is actually the **same** information that you should be writing in your docstring. That way you can easily identify at any time what a function **does**, what value it **needs to run**, and what value it **returns**.
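Python keeps the docstring with the function, so you can read those three answers back at any time. This sketch reuses `say_anything_twice` from earlier. It uses the standard `__doc__` attribute and the built-in `help()`.

```python
def say_anything_twice(anything):
    '''
    Multiplies a string by 2
    input:
        anything - str
    output:
        anything_twice - str
    '''
    anything_twice = anything * 2
    return anything_twice

# The docstring is stored on the function object...
print(say_anything_twice.__doc__)
# ...and help() prints it together with the function's signature
help(say_anything_twice)
```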

Let's start by doing this with our `find_student_records` function.

### What does the function do?

This function should return all of the roster filenames in a directory.

### What does the function need to perform this task?

It needs to know what directory to look inside for the filenames.

### What does the function return?

The function should return a list of all the filenames.

Now it's much clearer what you need!

In [None]:
def find_student_records(directory):
    """
    Return all roster filenames in a directory
    input:
        directory - str, Directory that contains the roster files
    output:
        filenames - list, List of roster filenames in directory
    """
    return filenames

Now you should sketch the function to `parse_student_record()`

In [None]:
def parse_student_record( ):
    """
    
    """
    return

And our last function to sketch is the `calculate_age()` function. 

In [14]:
def calculate_age( ):
    """
    
    """
    return

# Now let's build up to a function

With separate functions, it's easy to develop and test each one without having to process the entire data set every time. Let's start by finding all the records.

The easiest way to do this is to just write the code outside of a function first and then wrap a function around it. You will use `glob` to get all of the filenames, so I've already added it to the top of the notebook.

Just write the simple `glob` statement, as in the next cell:

In [None]:
glob.glob('../Data/Roster/*.txt')

Now separate the inputs and outputs. Your **input** should be the directory and your **output** should be filenames.

In [17]:
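import os

directory = '../Data/Roster'

# os.path.join inserts the path separator, so a trailing slash on directory is harmless
filenames = glob.glob(os.path.join(directory, '*.txt'))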

filenames[:5]

Now just wrap the code that we've written into a function

In [None]:
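import os

def find_student_records(directory):
    """
    Return all roster filenames in a directory
    input:
        directory - str, Directory that contains the roster files
    output:
        filenames - list, List of roster filenames in directory
    """
    # Same glob call as above, now built from the directory passed in
    filenames = glob.glob(os.path.join(directory, '*.txt'))
    return filenames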

When you execute the function with the directory string you should get back the same five filenames.

In [19]:
find_student_records('../Data/Roster/')[:5]
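One caveat about "the same five filenames": `glob.glob` returns matches in whatever order the file system lists them. That order is not guaranteed to be alphabetical. If you want a reproducible order, wrap the call in `sorted()`. The sketch below shows this on a temporary folder with made-up file names, not the real roster.

```python
import glob
import os
import tempfile

# Create a throwaway folder holding a few made-up files
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ['Zed_Ward_3.txt', 'Amy_Cole_1.txt', 'notes.csv']:
        with open(os.path.join(tmp_dir, name), 'w') as f:
            f.write('# placeholder\n')

    # glob returns only the matching .txt paths, in no guaranteed order
    matches = glob.glob(os.path.join(tmp_dir, '*.txt'))
    # sorted() gives a stable alphabetical order you can rely on
    names = [os.path.basename(path) for path in sorted(matches)]

print(names)  # ['Amy_Cole_1.txt', 'Zed_Ward_3.txt']
```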

Now that we can find all the records with the `find_student_records` function, the next piece of the puzzle we laid out in our pseudocode is a function to read an individual record. Recall that the records looked like this:

    #This is a file that holds important personal information that should not be shared. 
    #You are being watched.



    Name:	Buzz M. Baker
    Date of Birth:	4/20/87
    Email Address:	buzz.baker@northwestern.edu
    Department:	Engineering
    Height:	5ft,3in
    Weight:	194lbs
    Favorite Color:	Pink
    Favorite Animal:	Snake
    Zodiac Sign:	April


### Processing the data file

**Question**: What is the best data type to represent the student's data?

1. string
2. list
3. dictionary
4. set
5. tuple
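If you picked the dictionary, here is one way to see why it fits. Each line of the record is a *label: value* pair, and a `dict` lets you look values up by label instead of by position. The values below are copied from Buzz Baker's record shown above.

```python
# Each "Label: value" line of the record becomes one key-value pair
buzz = {
    'Name': 'Buzz M. Baker',
    'Date of Birth': '4/20/87',
    'Department': 'Engineering',
    'Height': '5ft,3in',
}

# Look up a field by its label instead of remembering its position
print(buzz['Department'])   # Engineering
# Adding or updating a field is a single assignment
buzz['Favorite Color'] = 'Pink'
print(len(buzz))            # 5
```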

### Roughing it out

In [None]:
path = '../Data/Roster/Agatha_Bailey_798.txt'

# create something to hold the data
# open the file
    # for each line in the file
        # ignore comment lines (those that start with "#")

        # Exercise parts
        # --------------
        # split the line
        # make sure the line has the correct number of parts
        # clean up the parts (strip whitespace)
        # store data in the 'data holder'
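The trickiest exercise part is handling a single line. Here is a minimal sketch, assuming each data line holds one label and one value separated by the first colon. The helper name `parse_line` is hypothetical. Splitting only at the first colon keeps a value intact even if it contains a colon itself.

```python
def parse_line(line):
    '''
    Splits one "Label:<tab>value" line into a (label, value) pair.
    Returns None for comments, blank lines, and lines without a colon.
    '''
    line = line.strip()
    # skip blank lines and comment lines
    if not line or line.startswith('#'):
        return None
    # split at the first colon only
    parts = line.split(':', 1)
    # make sure the line has the correct number of parts
    if len(parts) != 2:
        return None
    label, value = parts
    # clean up the parts (strip whitespace such as the tab)
    return label.strip(), value.strip()

print(parse_line('Department:\tEngineering\n'))  # ('Department', 'Engineering')
print(parse_line('#You are being watched.'))      # None
```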

### Turn it into a function

In [None]:
def parse_student_record(filename):
    '''
    Parses a student record file into a dictionary
    input:
        filename - str, path to the file
    output:
        data - dict, student attribute data
    '''

    return data
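If you get stuck, here is one possible sketch of the whole function. It follows the rough outline above and assumes the same "Label:<tab>value" layout. It is tested on a small made-up record written to a temporary file, not on the real roster.

```python
import os
import tempfile

def parse_student_record(filename):
    '''
    Parses a student record file into a dictionary
    input:
        filename - str, path to the file
    output:
        data - dict, student attribute data
    '''
    data = {}                      # the 'data holder'
    with open(filename) as f:      # the file is closed automatically
        for line in f:
            line = line.strip()
            # ignore blank lines and comment lines
            if not line or line.startswith('#'):
                continue
            parts = line.split(':', 1)   # split at the first colon only
            if len(parts) != 2:          # skip malformed lines
                continue
            key, value = parts
            data[key.strip()] = value.strip()
    return data

# Try it on a small made-up record written to a temporary file
sample = '#A comment line\n\nName:\tBuzz M. Baker\nHeight:\t5ft,3in\n'
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)
record = parse_student_record(tmp.name)
os.remove(tmp.name)
print(record)  # {'Name': 'Buzz M. Baker', 'Height': '5ft,3in'}
```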


In [None]:
parse_student_record('../Data/Roster/Agatha_Lee_11.txt')

In [None]:
parse_student_record('../Data/Roster/Buzz_Baker_618.txt')

Now we can parse all of the files!

In [None]:
filenames = find_student_records('../Data/Roster/')
a_filename = filenames[0]
a_filename

In [21]:
parse_student_record(a_filename)

Now take the code you originally wrote to calculate someone's age from the `File IO` notebook and turn it into a function.

In [None]:
def calculate_age( ):
    """
    
    """
    return
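The `File IO` version isn't reproduced here. Instead, this is a hedged sketch of one way to do the calculation with the standard `datetime` module. The optional `today` parameter is an addition not in the original exercise. It makes the function easy to test with a fixed date.

```python
import datetime

def calculate_age(dob, today=None):
    '''
    Calculates a person's age in whole years
    input:
        dob - dict, {"month": int, "day": int, "year": int}
        today - datetime.date, optional; defaults to the current date
    output:
        age - int, age in years
    '''
    if today is None:
        today = datetime.date.today()
    age = today.year - dob['year']
    # Subtract one year if this year's birthday has not happened yet
    if (today.month, today.day) < (dob['month'], dob['day']):
        age -= 1
    return age

dob = {'month': 3, 'day': 7, 'year': 1986}
print(calculate_age(dob, today=datetime.date(2020, 3, 6)))  # 33
print(calculate_age(dob, today=datetime.date(2020, 3, 7)))  # 34
```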

In [2]:
calculate_age({'month': 3, 'day':7, 'year': 1986})

Great! One thing to notice, though: our `calculate_age()` function takes in a birth month, day, and year and then calculates the age given the current year.

Our input data, however, is a `string` of the format `M/D/YY`. Turning a `string` into an integer birth month, day, and year doesn't sound like part of the job of a function named `calculate_age()`.

So let's make another function called `clean_dob()` that will transform this string into the birth month, day, and year.

In [None]:
def clean_dob(dob_string):
    '''
    Takes a date string of "M/D/YY" and converts it to the month, day, and year parts as integers
    Returns those integer dates in a dictionary.
    input:
        * dob_string - str, birthday string of form "M/D/YY"
    output:
        * dob - dictionary, {"month": int, "day": int, "year": int}
    '''
    
    return dob
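A two-digit year is ambiguous, so any `clean_dob()` has to choose a century. This sketch assumes that every two-digit year belongs to the 1900s (`87` becomes `1987`), which fits students born in 1975 but is an assumption about this roster. For comparison, `datetime.strptime` with `%y` follows the POSIX rule instead: it maps `00`–`68` to the 2000s and `69`–`99` to the 1900s. Four-digit years pass through unchanged.

```python
def clean_dob(dob_string):
    '''
    Takes a date string of "M/D/YY" and converts it to the month, day, and year parts as integers
    Returns those integer dates in a dictionary.
    input:
        * dob_string - str, birthday string of form "M/D/YY"
    output:
        * dob - dictionary, {"month": int, "day": int, "year": int}
    '''
    month, day, year = (int(part) for part in dob_string.strip().split('/'))
    # Assumption: a two-digit year belongs to the 1900s (87 -> 1987)
    if year < 100:
        year += 1900
    dob = {'month': month, 'day': day, 'year': year}
    return dob

print(clean_dob('4/20/87'))   # {'month': 4, 'day': 20, 'year': 1987}
print(clean_dob('8/7/1985'))  # four-digit years pass through unchanged
```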

In [None]:
clean_dob('8/7/85')

Now let's add this cleaning function back to our `parse_student_record()` function.

Copy the code you wrote for `parse_student_record()` above into the function below.

In [None]:
def parse_student_record(filename):
    '''
    Parses a student record file into a dictionary
    input:
        filename - str, path to the file
    output:
        data - dict, student attribute data
    '''
    #Copy our code from above
    
    data['Date of Birth'] = clean_dob(data['Date of Birth'])
    
    return data

In [None]:
buzz_baker = parse_student_record('../Data/Roster/Buzz_Baker_618.txt')

if buzz_baker['Date of Birth']['month'] == 4:
    print("Success! You are a rock star!")

Now, as a little exercise, let's find the number of people who were born in 1975.

In [None]:
#This is your list of people born in 1975
born_in_1975 = []
#This is your file location
data_dir = '../Data/Roster'
#First you need to get all of the student files
data_files = find_student_records(data_dir)
#Now go through each file
for filename in data_files:
    #First we parse the student record into the dictionary
    data = parse_student_record(filename)
    #Now what field will let us check if a person is born in 1975?
    if ________ == 1975:
        #If you want to keep track of a person born in 1975
        #you need a container that was created outside of this for loop
        #Remember that `data` is overwritten on every pass, so after the loop only the last record is left
        #So where could you possibly store the data of someone born in 1975...
        #hmmm....................
        
#Now I wanted to know the number of people born in 1975
#What was that number
        

Excellent! But how many of those people born in 1975 were also born in March?

In [None]:
born_in_march = []
#I'll help you out here, you should iterate through the list of people born in 1975
for person in born_in_1975:
    #I'll let you figure out this part here
    
#Don't forget to tell me your answer!!!!
len(born_in_march)
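If you get stuck on either exercise, here is the general "filter and count" pattern. It runs on a few hypothetical records shaped like the output of `parse_student_record()` after `clean_dob()` has been applied. The names and dates are made up. The key idea is that the results list is created *before* the loop, so it collects every match.

```python
# Hypothetical records in the shape parse_student_record() returns
records = [
    {'Name': 'Ann Example', 'Date of Birth': {'month': 3, 'day': 2, 'year': 1975}},
    {'Name': 'Bo Example', 'Date of Birth': {'month': 7, 'day': 9, 'year': 1975}},
    {'Name': 'Cy Example', 'Date of Birth': {'month': 3, 'day': 5, 'year': 1980}},
]

# The list is created before the loop so it keeps every match
born_in_1975 = []
for data in records:
    if data['Date of Birth']['year'] == 1975:
        born_in_1975.append(data)

# The same kind of filter written as a list comprehension
born_in_march = [person for person in born_in_1975
                 if person['Date of Birth']['month'] == 3]

print(len(born_in_1975))   # 2
print(len(born_in_march))  # 1
```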

Excellent work! There's always more to do (like maybe calculating the tallest person in our roster), but you're really a function master now!
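As a starting point for finding the tallest person, here is a sketch of a height function. It assumes every height looks like Buzz Baker's `5ft,3in`, although other files may not follow that format. The function name `parse_height` and the sample records are hypothetical. If the roster ever switched to centimeters, as imagined earlier, only this one function would need to change.

```python
def parse_height(height_string):
    '''
    Converts a height string like "5ft,3in" into total inches
    input:
        height_string - str, of the form "<feet>ft,<inches>in"
    output:
        inches - int, total height in inches
    '''
    feet_part, inch_part = height_string.split(',')
    feet = int(feet_part.strip().replace('ft', ''))
    inches = int(inch_part.strip().replace('in', ''))
    return feet * 12 + inches

# Hypothetical records holding only the fields needed here
records = [
    {'Name': 'Buzz M. Baker', 'Height': '5ft,3in'},
    {'Name': 'Ann Example', 'Height': '6ft,1in'},
]
# max() with a key function compares records by their height in inches
tallest = max(records, key=lambda record: parse_height(record['Height']))
print(parse_height('5ft,3in'))  # 63
print(tallest['Name'])          # Ann Example
```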