# Writing Custom Functions

**Learning Objectives**:

- Learn to write custom functions.
- Use `return` to provide function outputs.
- Learn to *modularize* code by creating functions.
* * * * *

We have already used functions like `len()`, `sum()`, and `pd.DataFrame()` in our code. These are essentially shortcuts that save us from writing many lines of code to accomplish common tasks.

We don't strictly need these functions to perform their computations. For example, we can calculate a sum without the `sum()` function by using a for loop that accumulates a running total:

In [None]:
numbers = [1, 3, 5, 7]

# Accumulate the numbers
total = 0
for number in numbers:
    # Add the current value to the sum of the values
    total = total + number
print('The sum is', total)
print('Using the sum function:', sum(numbers))

However, we don't want to rewrite this block of code every time we'd like to add up some numbers. Calling `sum()` is simpler and easier to read.

This is the motivation for *modularizing* our code. When we have an operation we expect to perform repeatedly, like summing, it's useful to have a function that is specifically built to do this.

The functions we've used so far were written for us: `len()` and `sum()` are **built-in functions** that come with Python, and `pd.DataFrame()` comes from the pandas package. However, we'll often need to create our own functions, and Python has a specific syntax for doing so.

Let's take a look at a quick example, creating the `my_sum` function that will perform the sum we just did:

In [None]:
def my_sum(list_of_numbers):
    total = 0
    for number in list_of_numbers:
        # Add the current value to the sum of the values
        total = total + number
    return total

In [None]:
print("The sum with sum() is:", sum(numbers))
print("The sum with my_sum() is:", my_sum(numbers))

Functions save us a lot of time, but they aren't black boxes. Rather, we can think of functions as basic building blocks that we expect to use over and over again.

Using existing functions from packages, or built-in functions, is generally preferred because it saves time (and effort). For example, we wouldn't write a custom `sum()` function when one already exists. However, when a function doesn't already exist that performs the desired operation, we can write our own custom function.

Specifically, functions do three things:

1. They name pieces of code the way variables name strings and numbers.
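2. They accept arguments, the inputs they operate on. Strictly speaking, *parameters* are the names listed in the function definition and *arguments* are the values passed in when the function is called, but the two terms are often used interchangeably.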
3. They return values that can be referred to in further operations.
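To see all three roles in one place, here is a small sketch built around a hypothetical `rectangle_area` function (not part of the lesson's code); the syntax itself is explained in the next section.

```python
def rectangle_area(width, height):
    # 1. The name `rectangle_area` labels this piece of code so we can reuse it.
    # 2. `width` and `height` are the arguments the function operates on.
    area = width * height
    # 3. `return` sends the result back to wherever the function was called.
    return area

# The returned value can be stored and used in further operations
floor_area = rectangle_area(3, 4)
total_area = floor_area + rectangle_area(2, 5)
print(total_area)  # 22
```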

The details are pretty simple, but this is one of those ideas where it's good to get lots of practice!

## Basic Function Syntax

The structure of a function is similar to what we've seen before: keyword, name, colon, and body.

*   Functions begin with the keyword `def`.
*   This keyword is followed by the function *name*.
    *   The name must obey the same rules as variable names.
*   The *arguments* or *parameters* are defined in parentheses as variable names.
    *   Use empty parentheses if the function doesn't take any inputs.
*   A colon indicates the end of the function *signature*.
*   The *body* is the indented block of code that follows the colon.
*   The body usually ends with a `return` statement giving the value(s) the function outputs. A function without a `return` statement returns `None`.
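As a sketch of these pieces, the hypothetical `say_hello` function below takes no inputs (empty parentheses) and returns a value, while the hypothetical `print_hello` has no `return` statement, so calling it gives back Python's special value `None`.

```python
def say_hello():                # `def`, name, empty parentheses (no inputs), colon
    greeting = "Hello!"         # indented body
    return greeting             # `return` sends the value back to the caller

def print_hello():
    print("Hello!")             # prints a message, but has no `return` statement

message = say_hello()
print(message)                  # Hello!

returned_value = print_hello()  # prints Hello! while running
print(returned_value)           # None
```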

**Note:** Arguments and variables created within a function exist only inside that function. Code outside the function cannot refer to them; to use a value outside the function, send it back with a `return` statement.
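A minimal sketch of this rule, using a hypothetical `add_one` function: the variable `incremented` exists only while the function runs, so after the call only the returned value is available.

```python
def add_one(value):
    incremented = value + 1  # local variable: exists only inside add_one
    return incremented

new_value = add_one(5)
print(new_value)  # 6

# Trying to use the local variable outside the function fails
try:
    print(incremented)
except NameError as error:
    print("NameError:", error)
```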

In [None]:
def feet_to_meters(feet):
    # 1 foot is exactly 0.3048 meters
    return feet * 0.3048

Notice that running the cell above produces no output. Defining a function does not run it; it is more like assigning a value to a variable. To execute the code it contains, the function must be *called* with appropriate arguments.

Let's run this function. We can save the output to a variable and print the result.

In [None]:
meters = feet_to_meters(100)
print(meters)

## Challenge 1: My First Function

Write a function that converts Celsius temperatures to Fahrenheit. The formula for this conversion is:

$$F = 1.8 \times C + 32$$

In [None]:
def ___:
    ...
    return(____)

## Function Arguments

Function **arguments** or **parameters** are listed in the parentheses of the function definition, separated by commas.

When the function is called, these arguments become variables inside it, holding the values that were passed in. The body operates on these variables and returns the result.

Let's look at an example function in which we're performing division:

In [None]:
def divide(x, y):
    return(x / y)

print(divide(4, 6))
print(divide(6, 4))

The order of the arguments matters: we got different results because each position has a different role (numerator and denominator).

You can also pass in **keyword arguments**, where each argument is given a name. In this case, the order of the arguments doesn't matter, since each has a name associated with it. For example:

In [None]:
print(divide(x=4, y=6))
print(divide(y=6, x=4))

Are the arguments named appropriately? What do `x` and `y` stand for? What names would be clearer?

Generally, it's good practice to give arguments descriptive names and, when passing keyword arguments, to keep them in the same order as in the function definition. This makes code easier to read.
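For example, a version of `divide` with descriptive argument names makes each argument's role obvious where the function is called. This sketch uses the hypothetical name `divide_values` so it doesn't replace the `divide` function above.

```python
def divide_values(numerator, denominator):
    """Divide numerator by denominator and return the result."""
    return numerator / denominator

# Positional call: arguments follow the order in the definition
print(divide_values(6, 4))  # 1.5

# Keyword call in the same order: the names document each role
print(divide_values(numerator=6, denominator=4))  # 1.5
```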

## Challenge 2: Calling Keyword Arguments

Call the following function using keyword arguments. Specifically, print the date corresponding to January 1, 2003.

In [None]:
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)

## Default Arguments

We can also specify **default arguments** in functions. When we provide a default argument, the function will use that value when the user does not pass in a value. Default arguments are specified in the function signature.

In [None]:
# y has default value equal to 10
def divide(x, y=10):
    return(x / y)

In [None]:
# Both values passed as keyword arguments
print(divide(x=4, y=10))
# y not passed, so it uses its default of 10
print(divide(x=4))
# Both values passed positionally
print(divide(4, 10))
# One positional value; y uses its default
print(divide(4))
# Positional first value, keyword second value
print(divide(4, y=10))

## Challenge 3: More Errors!

Why do the following lines raise errors?

**Hint**: Think about what happens inside the function, and how the arguments plug into the function.

In [None]:
divide(y=10, 4)

In [None]:
divide(4, y='10')

There are many possible combinations of positional, keyword, and default arguments, so keeping them organized helps avoid errors like these.

## Principles of Writing Your Own Functions

Function writing is one of the most important skills you can develop as a programmer.  

Here are some guidelines that can help minimize errors and make the process less painful:

1. **Plan**
    1. What is the overall goal of the function? Does a function that does the same thing already exist?
    2. What will the function output (what data type, how many items)?
    3. What arguments will you need? Which parts of the function's behavior do you need to control?
    4. What are the general steps of the function? These can be written as bullet points or "pseudocode".
2. **Write**
    1. Write the code without the function wrapper.
    2. Start small. Write small self-contained blocks of code and put the pieces together. You can also consider sub-functions.
    3. Test each part of the function as it is added. Track the input of the function and how it changes at each step. 
    4. Wrap the code in the function syntax.
3. **Test**
    1. Take the function and test *several* cases.
    2. Before running test cases, form an expectation of the result. 
    3. Test the function. Pay attention to both errors and strange results. Make adjustments to account for new cases.
    4. Integrate the function with the rest of the code. Are the input arguments the right type? Does the output flow into the rest of the code?

Let's go through an example of the function development process.

Let's say we have a list of filenames from an experiment. Each filename has two parts, a county and a year, separated by an underscore (e.g., `Alameda_2020.csv`). We want to parse these names into a DataFrame with two columns: one containing the county (lowercase) and one containing the year.

1. **Plan**
    1. Goal: parse each string in a list into two parts.
    2. Input: a list of strings (filenames).
    3. Output: a DataFrame with two columns, county (strings) and year (integers).
    4. The pseudocode might look like this: 
        ``` 
        function
            for file in filelist:
                split file into parts
                process each part
                append to list
            make a dataframe
            return
        ```

2. **Write**

Let's start with the body of the loop, applied to a single file. First, we'll test the splitting step on one filename.

In [None]:
files = ['Alameda_2020.csv',
         'Marin_2020.csv',
         'Contra Costa_2020.csv',
         'Alameda_2021.csv',
         'San Francisco_2021.csv']

test_file = files[1]

In [None]:
file_parts = test_file.split('.')[0].split('_')
print(file_parts)

Once we have the parts parsed out, we can do the next step and process each part appropriately. For the county, we want to make it lowercase, and for the year, we can convert it to an integer.

In [None]:
file_parts = test_file.split('.')[0].split('_')
county = file_parts[0]
year = file_parts[1]
# Lowercase county
county = county.lower()
# Convert year to int
year = int(year)
# Check output
print(county)
print(type(year))

The next step is to wrap this bit of code in a for-loop to process all files:

In [None]:
files = ['Alameda_2020.csv',
         'Marin_2020.csv',
         'Contra Costa_2020.csv',
         'Alameda_2021.csv',
         'San Francisco_2021.csv']
county_list = []
year_list = []

# Iterate over files
for file in files:
    file_parts = test_file.split('.')[0].split('_')
    county = file_parts[0]
    year = file_parts[1]
    # Lowercase county
    county = county.lower()
    # Convert year to int
    year = int(year)
    # Store outputs
    county_list.append(county)
    year_list.append(year)
# Check outputs
print(county_list)
print(year_list)

What happened? How do we fix it? When we run the code on the whole loop, do you notice anything about the other county names? What might we want to change?

Once the full code works, we can do the final steps: convert the output to a DataFrame and place everything into a function.

In [None]:
import pandas as pd

def parse_files(filelist):
    county_list = []
    year_list = []
    # Loop over the argument, not the global `files` list
    for file in filelist:
        file_parts = file.split('.')[0].split('_')
        county = file_parts[0]
        year = file_parts[1]
        # Lowercase county
        county = county.lower()
        # Convert year to int
        year = int(year)
        # Store outputs
        county_list.append(county)
        year_list.append(year)

    df = pd.DataFrame({'county': county_list,
                       'year': year_list})
    return df

In [None]:
files = ['Alameda_2020.csv',
         'Marin_2020.csv',
         'Contra Costa_2020.csv',
         'Alameda_2021.csv',
         'San Francisco_2021.csv']
output = parse_files(files)
output
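The plan also suggests splitting work into sub-functions and testing several cases against expectations formed beforehand. A minimal sketch of both ideas, assuming a hypothetical helper `parse_filename` that handles a single filename: `parse_files` then only loops over the list, and a few `assert` statements check results we predicted before running them. This sketch assumes every filename contains exactly one underscore.

```python
import pandas as pd

def parse_filename(filename):
    """Split a name like 'Alameda_2020.csv' into ('alameda', 2020)."""
    county, year = filename.split('.')[0].split('_')
    return county.lower(), int(year)

def parse_files(filelist):
    """Parse a list of filenames into a DataFrame with county and year columns."""
    rows = [parse_filename(file) for file in filelist]
    return pd.DataFrame(rows, columns=['county', 'year'])

# Test cases, each with the expected result written down first
assert parse_filename('Alameda_2020.csv') == ('alameda', 2020)
assert parse_filename('Contra Costa_2020.csv') == ('contra costa', 2020)

test_output = parse_files(['Marin_2020.csv', 'San Francisco_2021.csv'])
assert list(test_output['county']) == ['marin', 'san francisco']
assert list(test_output['year']) == [2020, 2021]
print(test_output)
```

Keeping the per-file logic in its own function means it can be tested on a single string, without building a whole list or DataFrame.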

## Challenge 4: Advanced Conversion Function

Let's take our conversion function from before and make it more flexible.

Let's say we want to convert from feet to other units as well. Change the original conversion function into a more general version, `convert_from_feet(x, unit='meters')`, by adding a keyword argument `unit` that defaults to meters. Inside the body of the function, use if statements to perform the appropriate conversion based on this argument, and return the value. Choose two units in addition to meters (such as inches, miles, or centimeters) and add them to the conversion function.

Follow the steps:

1. Plan your function. 
2. Write your function. 
3. Test the function.

**Bonus**: What if you wanted to convert several values at once? What if you want to convert to several other units at once?

In [None]:
# Original conversion function
def convert_feet_to_meters(feet):
    # 1 foot is exactly 0.3048 meters
    meters = feet * 0.3048
    return meters

In [None]:
# YOUR CODE HERE
