# Tutorial 5: Functions and Loops

Often in coding, we have repetitive tasks to do. We could do the tasks by hand each time, but what if we had to do the task for 100 samples? Or 1000 samples? It would be a lot of work to do the task by hand each time. There is also a lot more room for error if we repeat the task by hand 100 or 1000 times.  
Instead of doing a task by hand repeatedly, we can use some Python tools to streamline our tasks. We will learn about two of these tools in this module: functions and for loops.   

By the end of this tutorial, you will be able to:
* recognize the parts of a function and a loop
* create a function that will streamline a repetitive task
* process data efficiently in a loop
* identify which parts of code go in a loop and which parts stay outside the loop

## What is a function?

A Python function is a block of code that explains how to complete a task for a given set of inputs. Just like in math, we can think of a function in Python as code where you enter an input and it returns an output: f(x) = y, where x is the input, f( ) is the function, and y is the output.

In Python, we must name the function, list the possible inputs, describe how we would like the inputs to be modified, then list the outputs. To do this, we must *define* a function. For example, let's look at the function below. At first glance, what do you think this function does?

In [None]:
def square_function(input):
    output = input ** 2
    return output

This is a simple function that takes the square of the input and then gives that new value as the output. But let's break this function down into its components.

In [None]:
def square_function(input): # 1 - define the function
    output = input ** 2     # 2 - function body
    return output           # 3 - return statement

Line 1: This is where we define the function and what our input will be. 
- **def:** the keyword that means defining a function.
- **function name:** it is helpful to name the function something that will indicate its purpose.
- **parameter:** this is our input; it is what we use to pass values to the function. We can name it whatever we want and can have multiple parameters separated by commas.
- **colon:** this is part of the function syntax, don't forget it!

Line 2: This is the **function body**, where we enter one or more lines of code that perform some task. They must be indented. 
- in our example, this is where we tell the function to make the output the square of the input.

Line 3: This is where we tell the function to end, and what we want the output to be.
- **return statement:** returns a value from the function.
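
The same three parts appear when a function takes more than one input. As a minimal sketch (the function name rectangle_area and its parameters are made up for illustration and are not used elsewhere in this tutorial), here is a function with two parameters separated by a comma:

```python
def rectangle_area(length, width):  # define the function with two parameters
    area = length * width           # function body: multiply the two inputs
    return area                     # return statement: hand the result back

print(rectangle_area(3, 4))         # call the function with two inputs
```

When calling a function like this, the inputs are matched to the parameters in order, so 3 is used as the length and 4 as the width.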

Let's use our square_function. To use a function, you *call* it by typing its name with an input in the parentheses.

In [None]:
input = 10             #replace input with different numbers to test this out
square_function(input)

In [None]:
# What happens when you run this cell?
square_function("two")

**Knowledge Check:** The previous line of code gave an error. Why did that happen? How can you fix the problem?

### Why do we use functions?

By now you can probably see that functions make our lives easier! 
We don't have to type the same code over and over to perform a task. Instead, we can just call the function each time we want to perform that task.

**Knowledge Check:** Think of a situation in a dataset where you might want to use a function. Try outlining the function in a code cell below. It doesn't need to work, just define the function, put some notes in the function body, and include a return statement.

## Why do we use loops?

Often when we use functions, we have a lot of data that we want to perform the same task on. For example, say we have a list of 5 numbers and we want to find the square of each:

In [None]:
number_list = [1,2,7,9,12]

We could call the square_function on each number in the list individually. However, if we had a really large dataset, this would take a really long time.

In [None]:
print(square_function(1))
print(square_function(2))
print(square_function(7))
# and so on.....

Instead of writing out the function each time, we can use a loop. A *for loop* allows us to *iterate* through the list, performing the same task for each value in the list. 

In [None]:
for number in number_list:         # go through each item in list
    print(square_function(number)) # print the square of each item using our square function

For loops are really flexible tools because you can write anything you want in the loop, and you can iterate through any sequence you want. For example, instead of iterating through number_list directly, you could tell Python to iterate through a range of numbers.

In [None]:
for x in range(5):                 # go through each instance in the sequence
    number = number_list[x]        # we are calling on each item in the list using the index
    print(square_function(number)) # print the square of each item using our square function

Let's break down the above loop a little more. There are 5 main parts to a for loop in Python.
1. the "for" keyword. If you don't use for at the beginning of your code, then Python won't know to loop through the elements in the range.
2. a variable. In the loop above, the variable is x. This variable is what is iterated in the loop.
3. the "in" keyword. The in keyword tells Python to take the values of the variable (in this case, x) from the range provided.
4. the range. The range is a list or sequence of all the values you want the variable to take. In the loop above, the range is range(5), which produces the numbers 0, 1, 2, 3, 4. This means we want x to equal 0, 1, 2, 3, and 4.
5. The repeated code body. This is everything after the colon. It is all the actions performed on the variable x in each iteration.

For the above for loop, the first line "for x in range(5)" means "repeat the following code body for x=0, x=1, x=2, x=3, and x=4". Then the next two lines are the actions performed on each of those values of x. Specifically, we use x as the index of the number_list. We then use the selected value from number_list and square it. Then, we print out the final value. 
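
Note that range(5) only works here because number_list happens to have exactly 5 items. As a hedged sketch of two common alternatives, the code below uses range(len(number_list)) so the loop adapts to the length of the list, and enumerate, which hands back both the index and the item. The square function is redefined here (with the parameter renamed to value) only so that the example runs on its own.

```python
def square_function(value):
    return value ** 2

number_list = [1, 2, 7, 9, 12]

# option 1: let Python work out how many indices there are
squares_by_index = []
for x in range(len(number_list)):          # len(number_list) is 5 here
    squares_by_index.append(square_function(number_list[x]))

# option 2: enumerate gives the index and the item at the same time
for position, number in enumerate(number_list):
    print(position, number, square_function(number))
```

If you do not need the index at all, looping directly over the list (for number in number_list) is usually the simplest choice.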

The examples we've seen so far of for loops have been pretty straightforward. However, loops can be used in many complex and unique ways that actually end up simplifying our coding. For example, instead of printing out each item in the for loop above, perhaps we want to make a new list of squared values. This means we could have a new list of values and we wouldn't have to copy all the printed values down. To do this, we could tell the for loop to add each item to a new list using *append*. 

In [None]:
squared_list = []                                # create an empty list
for number in number_list:                       # go through each item in the sequence
    squared_list.append(square_function(number)) # square each item, append to the new list
    print(squared_list)                          # see our list of squared numbers

Uh oh! What is happening here? Why do we have multiple lists?

Currently, the print statement is inside the for loop. It is part of the repeated code body. Our code is telling the loop to print the squared list for every iteration. So we are seeing a new list printed every time we square the next number.

To fix this, we want the print statement *outside* of the for loop. To do this, we unindent it:

In [None]:
squared_list = []                                # create an empty list
for number in number_list:                       # go through each item in list
    squared_list.append(square_function(number)) # square each item, append to the new list

print(squared_list)                              # see our list of squared numbers

**Indentations are key for for loops!** If you don't want a step repeated for every iteration, you must make sure it is outside the for loop.

**Note:** We can also use for loops within functions or functions within loops. We will see an example of this later.
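
As a small preview, here is a minimal sketch of a for loop inside a function. The name square_all is made up for illustration. Notice the indentation: the append step is inside the loop, while the return statement is inside the function but outside the loop, so it only runs once, after every item has been processed.

```python
def square_all(numbers):
    squared = []                     # create an empty list
    for number in numbers:           # go through each item in the input list
        squared.append(number ** 2)  # inside the loop: repeated for every item
    return squared                   # outside the loop: runs once at the end

print(square_all([1, 2, 7, 9, 12]))
```

If the return statement were indented one level further, the function would end during the first iteration and return a list with only one value.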

## Building functions and loops

Let's try a simple example of building a function and using a loop.

Create a function to find the area of a circle given the radius. Then test the function on a range of possible radii from 1 to 10.

Here are some guiding questions:

In [None]:
def                         # What should you name this function? What will be the input?
                            # What will be the function body? 
    
    
    return                  # What will be the output?                         

In [None]:
# To practice, try creating an equation for the area of the circle and plugging in numbers

Now, test this function on a range of radii from 1-10.

In [None]:
# Remember, we can use a for loop to iterate over a range
for 

If you are having trouble, or want to check your work, the answer to this example is at the bottom of the tutorial. Remember, your function doesn't have to be formatted the same as the answer, as long as it gets the correct values.

## Putting it all together

Here, we'll go through an example situation where using functions and for loops is really helpful for saving time and preventing boredom!

In the atmospheric sciences, we often use data from really large datasets of monitoring data, satellite data, model data, etc. This data is often publicly available online and can be downloaded for free. However, it is not always stored in a way that makes it easy to download the exact time or place of interest to you. For example, air quality data from the EPA can be selected based on location (state and county), species of interest, and time period. You could click through the website and download each individual file you want, or you could create a function to do the work for you, and then use a for loop to iterate over all the different time, species, and location combinations you want.

In order to access the EPA's hourly data from The Air Quality System (AQS), we need to create a url using the correct parameters. **We want to create a function that can make building each url easier.**

First, sign up for the AQS here: https://aqs.epa.gov/aqsweb/documents/data_api.html. You will need a user key for each url, and can't access data until you have that code. 

Next, scroll down the page to the section that says "Sample Data". There are many different levels of data you can access. We want to access sample data by county. To do so, we will need to make a url that has the required variables (email, key, param, bdate, edate, state, county) and format given. See image below.

You can find the codes for the variables here: https://www.epa.gov/aqs/aqs-code-list

![Screen%20Shot%202021-10-30%20at%202.22.53%20PM.png](attachment:Screen%20Shot%202021-10-30%20at%202.22.53%20PM.png)

### Part 1: building the function

We know that each url will start with the same beginning because your email and key won't change. Let's call this the base. **Make sure to enter your own email and key into the base.**

In [None]:
base = "https://aqs.epa.gov/data/api/sampleData/byCounty?email=youremail@wisc.edu&key=yourkey"

Next, what are the variables that we still need for the url? 
- param
- bdate
- edate
- state
- county

These are the parameters (inputs) that we will want to pass new values into depending on what data we want. These will go in the function's parentheses.

As you can see in the url example in the image above, each variable has the format "&variable=" before the value. Therefore, we will need to add these fillers as strings for each variable. So, our url will be:

url = base + "&param=" + param + "&bdate=" + bdate + "&edate=" + edate + "&state=" + state + "&county=" + county

Putting it together:

In [None]:
def make_url(param, bdate, edate, state, county): # be sure to include all necessary inputs!
    base = "https://aqs.epa.gov/data/api/sampleData/byCounty?email=youremail@wisc.edu&key=yourkey" 
    # make sure to edit the base with your email and key
    var = "&param=" + param + "&bdate=" + bdate + "&edate=" + edate + "&state=" + state + "&county=" + county
    url = base + var
    return url
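
As an alternative to joining strings with +, the query part of a url can also be assembled from a dictionary. The sketch below is one possible variant, not part of the original tutorial code: it uses urlencode from Python's standard library, and the function name make_url_encoded is made up for illustration. The email and key are the same placeholders as above and must be replaced with your own.

```python
from urllib.parse import urlencode

def make_url_encoded(param, bdate, edate, state, county):
    endpoint = "https://aqs.epa.gov/data/api/sampleData/byCounty"
    query = {
        "email": "youremail@wisc.edu",  # placeholder: use your own email
        "key": "yourkey",               # placeholder: use your own key
        "param": param,
        "bdate": bdate,
        "edate": edate,
        "state": state,
        "county": county,
    }
    # urlencode joins the pairs as name=value separated by "&";
    # safe="@" keeps the @ in the email address unescaped
    return endpoint + "?" + urlencode(query, safe="@")

print(make_url_encoded("44201", "20180701", "20180731", "55", "079"))
```

With these inputs, the result is the same url that the string-joining version builds. The dictionary form can be easier to read and extend when there are many variables.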

### Part 2: calling the function

For this example, we want to access ozone data from Milwaukee, WI, for the month of July in 2018. 

Using the code list, we find that:
- code for param ozone = 44201
- code for state = 55
- code for county = 079

We know:
- bdate = 20180701
- edate = 20180731

In [None]:
# create the inputs for your function
param = "44201"
bdate = "20180701"
edate = "20180731"
state = "55"
county = "079"

url = make_url(param, bdate, edate, state, county) # plug values in
print(url) # see url

Copy and paste the url into your browser to see the data. What do you notice about the data? Do you recognize the format that it comes in?

The data comes in a **JSON format**, which you may have never seen before. JSON is a way of storing data that is often used in transferring data from a web server to a local computer. It is not as easy to visualize and work with as other file types, so **we're going to convert it to a Pandas DataFrame.**

Let's import the packages we will need:

In [None]:
import requests
import json
import pandas as pd

First, let's get all of the contents from the web page.

In [None]:
# use the requests.get function to retrieve contents from a given url
r = requests.get(url)

In [None]:
# use json.loads function to tell the code that the content we want is in json format
contents = json.loads(r.text)

In [None]:
# see the contents of the web page
contents

Look at the first few lines of the above data. Notice that when loading in the contents of the web page, we also loaded the "header", which contains information about our request time, url, number of rows, etc. **This information is not a part of our data, so we want to exclude it from our dataframe!**

To do so, let's look at where our data actually starts. After the header, we see 'Data' followed by a colon and square bracket. That's probably a good sign that the data starts there! To access this, we can "index by key", where "Data" is our key.
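
To see the idea on a small scale, here is a hedged sketch using a tiny, hand-written JSON string. The structure (a "Header" section followed by a "Data" section) mimics what is described above, but the values are made up for illustration and are not real AQS output.

```python
import json

# a made-up, miniature stand-in for the text of the web page
response_text = """
{
  "Header": [{"status": "Success", "rows": 2}],
  "Data": [
    {"sample_measurement": 0.031},
    {"sample_measurement": 0.034}
  ]
}
"""

contents = json.loads(response_text)   # JSON text becomes a Python dictionary
print(list(contents.keys()))           # the keys we can index by

data = contents["Data"]                # index by key: keep only the "Data" section
print(data[0]["sample_measurement"])   # first measurement in the data
```

After json.loads, the contents behave like any other Python dictionary, and the "Data" section is a list with one small dictionary per row.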

In [None]:
# index by key to only get the contents in "Data" section
data = contents["Data"]
data

Great! Now we don't have that pesky header and can put our data into a Pandas DataFrame. Pandas will do the work of formatting this nicely for us.

In [None]:
data_df = pd.DataFrame(data)   # put data into DataFrame
data_df

We have now successfully created urls to retrieve AQS data, downloaded the data, and converted it to a pandas DataFrame! From pandas, we can manipulate the data or turn it into a csv or excel file. 

If we had many urls for which we wanted to make DataFrames, it would be frustrating to repeat these steps over and over. We can put these steps into a function to make our lives easier:

In [None]:
def make_df(url):
    r = requests.get(url)
    contents = json.loads(r.text) # gives you a Python dictionary
    data = contents["Data"]
    data_df = pd.DataFrame(data)
    return data_df

In [None]:
df = make_df(url)
df

Now, we have a function that makes the url and a function that turns the url into a DataFrame. **Using functions, we took a multi-step process and turned it into a two-step process!**

Hopefully, you can now see how useful functions and loops are in Python. In fact, functions and loops are so helpful, they are considered part of good coding practice. If you are ever coding something and realize you are using copy and paste to do the same thing over and over, then you should make a function instead! Using a function also keeps your code shorter, so it is more manageable to follow. 

Functions and loops are very important to coding, so let's get some practice.  

# Exercises

### 1: A brick house

**a)** Imagine you are building a house made of bricks. Each brick weighs about 5 pounds. You want to know how much all the bricks needed to build your house weigh. Write a function to determine the total weight of your house based on how many bricks are used to build it.

**b)** Let's say each brick used to build your house is 8 inches by 2.5 inches by 4 inches (8 long x 2.5 tall x 4 wide). Write a function to calculate how many bricks are needed to fill a certain volume.

**c)** Let's say your house is a cube where each wall is 20 ft in length and 20 ft in height. Each wall is 4 inches wide (the width of one brick). Calculate the volume of the walls of your house and then use your functions to determine the total weight of all the bricks you needed to build your brick house.

**d)** Turns out, a 400 square foot house is too small! Let's find out how much houses of different sizes weigh. Write a function to determine the volume of a house's walls based on the length, height, and width of the walls. Then, using a loop, calculate the weight of square houses with wall lengths and heights from 20 ft to 100 ft, in steps of 10 ft (keep the width of the walls at 4 in).

### 2. Plotting in a loop

One really useful aspect of loops is the ability to plot multiple series of data on one figure, or in multiple subplots, with only a few lines of code. We haven't gotten into making subplots yet, so this question will focus on using a for loop to plot multiple series of data in one figure.

Here, we'll use the functions we made earlier to download some data and plot it. Let's look at ozone in Wisconsin.

In [None]:
# remember to import all the packages you'll need
import json                      # used inside make_df
import requests                  # used inside make_df
import matplotlib.pyplot as plt
import pandas as pd

In [None]:
# the function to make a url to download the data
# make sure to edit the base with your email and key!
def make_url(param, bdate, edate, state, county):
    base = "https://aqs.epa.gov/data/api/sampleData/byCounty?email=youremail@wisc.edu&key=yourkey" 
    var = "&param=" + param + "&bdate=" + bdate + "&edate=" + edate + "&state=" + state + "&county=" + county
    url = base + var
    return url

In [None]:
# the function to turn the url into a Dataframe
def make_df(url):
    r = requests.get(url)
    contents = json.loads(r.text)
    data = contents["Data"]
    data_df = pd.DataFrame(data)
    return data_df

**a)** Select a time range to look at. It can be anything. Then run the two functions for the time range. Finally, make a line plot of ozone over the time period of interest.

In [None]:
param = '44201'
bdate = ''# start time here
edate = ''# end time here
state = '55'
county = '025' # this is the county code for Dane County

In [None]:
url = make_url(param, bdate, edate, state, county)
df = make_df(url)

In [None]:
# look at the data and the column labels. Which column do you want to use for y-data?
print(df.columns)
df

In [None]:
# using the appropriate y data from your DataFrame, make a line plot displaying ozone versus time.
ydata = 

fig = plt.figure(figsize=(10,8)) # you can change the figure size if you want
# what plotting info goes here?
# try to make a professional-looking figure with axes labels and a legend
plt.plot()

**b)** Take a look at the plotting code you used above. If you were going to plot another line plot on the same figure, which lines of code would you repeat? Which lines of code would you not repeat? Copy and paste the lines of code into the appropriate code cells below.

In [None]:
# don't repeat

In [None]:
# repeat

The repeated lines of code are the components you would put into a for loop. The lines that you don't repeat would be outside the loop. 

**c)** For this problem, you will plot 3 lines on one figure using a for loop. To do this, you will look at 3 different years of data in the same county. For example, you might look at May 2011, May 2012, and May 2013. 

In [None]:
# first, select 3 different time periods to study
bdate1 = ''
edate1 = ''
bdate2 = ''
edate2 = ''
bdate3 = ''
edate3 = ''

In [None]:
# now arrange those start and end dates into two lists
bdates = []
edates = []

In [None]:
# copy the parameter information that isn't changing
param = '44201'
state = '55'
county = '025'

In [None]:
variable = 'sample_measurement' # this is the variable name for the y-data in the DataFrame

In [None]:
# make labels for each of the three time periods
labels = []

In [None]:
# fill in the for loop below

# is there a non-repeating item of plotting that goes here?

for x in range(3): # why is the range 3?
    bdate = bdates[x]
    edate = # what goes here?
    url = make_url(param, bdate, edate, state, county)
    df = make_df(url)
    ydata = # what goes here?
    plt.plot(ydata, label=#how do you ID the label from the labels list? )

# what are the non-repeating items that go here?

Hopefully this exercise has helped you to understand how to make a plot within a loop. You should now be able to identify which elements go in the loop and which elements stay outside the loop!

### 3: Making multi-year AQS plots

Let's say you want to analyze summer (June-August) SO2 levels in Dane county, WI over the 4 year period from 2017 through 2020. How would you go about this process?

**Note:** when we are working with larger data sets, some loops and functions may take a while to run. You can tell your code is running by looking for the hour-glass symbol on the tab at the top of the browser.

**a)** Start with the year 2017: make a url to access summertime SO2 data for Dane County, WI in the year 2017. We'll define summer as June 01 to August 31.

Remember, we can access the codes for our variables at: https://www.epa.gov/aqs/aqs-code-list

**b)** Now that you can create one url, let's make a loop to do all the urls. Create a for loop that would find the urls for all 4 years, 2017-2020. Then append these urls to a list.

**Hint:** you will have to find a way to change only the start date and end date string for each iteration so that it contains the correct year. Also, don't forget to create an empty list so that you can append to it!

**c)** Next, suppose you want to plot a bar graph of the average summer SO2 value for each year. Let's start by finding the average SO2 value for one of our links. Use the year 2017. Turn the url into a dataframe using our make_df function, then find the average SO2 concentration for 2017.

**Hint:** since our urls are in a list, we will need to index the list to use just one url. You can say "url_list[0]" to get the first url (2017 url).

**d)** Now, create a for loop that goes through each url, repeating these steps for each year. Append the average for each year to a new list.

**e)** Plot these averages as a bar chart. 

**Hint:** Think through what your axis will be to figure out what items/list you need in order to plot. Your x-axis should be the year, and your y-axis should be the sample measurement averages for each year. 

In [None]:
import matplotlib.pyplot as plt
import numpy as np

**Note:** There are a lot of ways to approach these problems. As long as you get the expected result (and test your result if possible), then your code works! However, keep in mind that **it is best practice in coding to try to make your code as simple, clear, and efficient as possible to achieve the desired result.** So, always ask yourself: even if I got the desired result, was my code efficient? What changes could I make to improve it?

# Possible answers:

### Building functions and loops

In [None]:
def area_of_circle(radius):   # function name that defines its purpose and input = radius
    pi = 3.14159              # defining pi, you could also just input the number in the area equation
    area = pi * radius**2     # equation for area
    return area               # return area

In [None]:
area_of_circle(2)             # testing the equation

In [None]:
for radius in range(10):
    print(area_of_circle(radius))

**Note**: you may have noticed that range(10) actually starts at 0 and ends at 9. In Python, counting starts at 0, and a range stops just before its end value. To test radii from 1 to 10, give the range a start value and an end value that is one past the last number you want: range(1, 11).
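
As a minimal sketch of a loop that covers radii 1 through 10, the version below gives range both a start and a stop value. It also uses math.pi from Python's standard library instead of typing out the digits of pi, which is a substitution for illustration rather than a requirement.

```python
import math

def area_of_circle(radius):
    return math.pi * radius ** 2     # area = pi * r^2

areas = []
for radius in range(1, 11):          # starts at 1, stops just before 11
    areas.append(area_of_circle(radius))

print(list(range(1, 11)))            # the radii that were tested
print(areas[0], areas[-1])           # areas for the smallest and largest radius
```

A third value in range sets the step size, so range(20, 110, 10) counts from 20 to 100 in steps of 10.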

### Exercises

#### A brick house

In [None]:
# 1a
def brick_weight(number):
    weight = number*5
    print('Weight = %.1f lbs' %weight)
    return weight

In [None]:
# 1b
def brick_volume(volume):
    vol_brick = 8*2.5*4
    num_bricks = volume/vol_brick
    print('Number of bricks: %0.f' %num_bricks)
    return num_bricks

In [None]:
# 1c
# first convert volume of walls from feet to inches
length = 20*12 # 12 inches in a foot
height = 20*12 # 12 inches in a foot
width = 4      # already in inches

# then calculate volume of one wall
wall = length*height*width
# total volume of all four walls
total = wall*4

# now plug this volume into the volume function
bricks = brick_volume(total)

# and then convert from bricks to weight
weight = brick_weight(bricks)


In [None]:
# 1d
def house_volume(length,height,width):
    length_in = length*12 # convert feet to inches
    height_in = height*12 # convert feet to inches
    wall = length_in*height_in*width # assume width in inches
    total = wall*4
    return total

In [None]:
for x in range(20,110,10):
    volume = house_volume(x,x,4)
    bricks = brick_volume(volume)
    weight = brick_weight(bricks)
    print('Weight of house of size %.0f ft x %.0f ft: %.0f lbs' % (x,x,weight))

#### Plotting in a loop

In [None]:
# remember to import all the packages you'll need
import json                      # used inside make_df
import requests                  # used inside make_df
import matplotlib.pyplot as plt
import pandas as pd

In [None]:
# 2a
param = '44201'
bdate = '20190501'# start time here
edate = '20190531'# end time here
state = '55'
county = '025' # this is the county code for Dane County

In [None]:
url = make_url(param, bdate, edate, state, county)
df = make_df(url)

In [None]:
# look at the data and the column labels. Which column do you want to use for y-data?
print(df.columns)
df

In [None]:
# using the appropriate y data from your DataFrame, make a line plot displaying ozone versus time.
ydata = df['sample_measurement']

fig = plt.figure(figsize=(10,8)) # you can change the figure size if you want
plt.plot(ydata,label='Ozone in 2019')
plt.legend(fontsize=12)
plt.xlabel('Sample',fontsize=12)
plt.ylabel('Ozone (ppm)',fontsize=12)
plt.show()

In [None]:
# 2b
# don't repeat
fig = plt.figure(figsize=(10,8))
plt.legend(fontsize=12)
plt.xlabel('Sample',fontsize=12)
plt.ylabel('Ozone (ppm)',fontsize=12)
plt.show()

In [None]:
# repeat
plt.plot(ydata,label='Ozone in 2019')

In [None]:
# 2c
bdate1 = '20110501'
edate1 = '20110531'
bdate2 = '20120501'
edate2 = '20120531'
bdate3 = '20130501'
edate3 = '20130531'

In [None]:
# now arrange those start and end dates into two lists
bdates = [bdate1,bdate2,bdate3]
edates = [edate1,edate2,edate3]

In [None]:
# copy the parameter information that isn't changing
param = '44201'
state = '55'
county = '025'

In [None]:
variable = 'sample_measurement' # this is the variable name for the y-data in the DataFrame

In [None]:
# make labels for each of the three time periods
labels = ['2011','2012','2013']

In [None]:
fig = plt.figure(figsize=(10,8))
for x in range(3): # why is the range 3?
    bdate = bdates[x]
    edate = edates[x]
    url = make_url(param, bdate, edate, state, county)
    df = make_df(url)
    ydata = df[variable]
    plt.plot(ydata, label=labels[x])
plt.legend(fontsize=12)
plt.xlabel('Sample',fontsize=12)
plt.ylabel('Ozone (ppm)',fontsize=12)
plt.show()

#### Making multi-year AQS plots

In [None]:
# Always create your functions first!
def make_url(param, bdate, edate, state, county): 
    base = "https://aqs.epa.gov/data/api/sampleData/byCounty?email=youremail@wisc.edu&key=yourkey" 
    var = "&param=" + param + "&bdate=" + bdate + "&edate=" + edate + "&state=" + state + "&county=" + county
    url = base + var
    return url

def make_df(url):
    r = requests.get(url)
    contents = json.loads(r.text)
    data = contents["Data"]
    data_df = pd.DataFrame(data)
    return data_df

In [None]:
# 3a
# create the inputs for your function
param = "42401"
bdate = "20170601"
edate = "20170831"
state = "55"
county = "025"

url = make_url(param, bdate, edate, state, county) # plug values in
print(url) # see url

In [None]:
# 3b
url_list = []                   # create empty list

for i in range(2017,2020+1):    # don't forget to add one to your range since we want to go through 2020
    param = "42401"             # this will stay the same for each url
    state = "55"                # this will stay the same for each url
    county = "025"              # this will stay the same for each url
    bdate = str(i) + "0601"     # add each year in our range as a string plus the rest of bdate string
    edate = str(i) +"0831"      # add each year in our range as a string plus the rest of edate string
    url = make_url(param, bdate, edate, state, county)  # call our make url function and define as the url
    url_list.append(url)                                # append to our list of urls

url_list

In [None]:
# 3c
url_2017 = url_list[0]
df_2017 = make_df(url_2017)
print(df_2017)
print(df_2017["sample_measurement"].mean())

In [None]:
# 3d
averages_list = []

for url in url_list:
    df = make_df(url)
    average = df["sample_measurement"].mean()
    averages_list.append(average)

averages_list

In [None]:
# 3e

year_list = [2017,2018,2019,2020] # If you had a longer time range, this would not be efficient. 
# You could instead iterate through the range of years and add each year to the list. Or use pd.date_range()
                                
x = year_list                            # make year_list the x values
y = averages_list                        # make averages_list the y values
plt.figure(figsize=(8,6))                # set the plot size before plotting
plt.bar(x,y)                             # make the plot!
plt.xlabel("Year", fontsize = 14)        # x-label
plt.ylabel("SO2 (ppb)",fontsize = 14)    # y-label
plt.title("Average summer SO2 in Dane County from 2017-2020", fontsize = 18) # title
plt.xticks(np.arange(2017,2021, step = 1),fontsize = 12) # this line labels the x-ticks correctly
plt.show()
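
For a longer time range, typing out every year by hand would not be practical. As a hedged sketch of the idea mentioned in the comment above, the list of years can be built with a loop over a range, or directly with list(range(...)). The variable names start_year and end_year are made up for illustration.

```python
start_year = 2017
end_year = 2020

# option 1: iterate through the range of years and append each one
year_list = []
for year in range(start_year, end_year + 1):  # add 1 so the last year is included
    year_list.append(year)

# option 2: convert the range to a list in one step
year_list_short = list(range(start_year, end_year + 1))

print(year_list)
print(year_list_short)
```

Both options give the same list, so changing start_year and end_year is enough to cover a different time period.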