<img align="left" src="https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/CC_BY.png"><br />

Adapted by Sarah Connell, Dipa Desai, Avery Blankenship, Sara Morrell, and Emre Tapan from a notebook created by [Nathan Kelber](http://nkelber.com) and Ted Lawless for [JSTOR Labs](https://labs.jstor.org/) under [Creative Commons CC BY License](https://creativecommons.org/licenses/by/4.0/). See [here](https://ithaka.github.io/tdm-notebooks/book/all-notebooks.html) for the original version. Some contents were adapted from teaching notebooks created by Laura Nelson, University of British Columbia, and from [Python for Everybody](https://www.py4e.com/). Warm thanks to Kate Kryder, Data Analysis & Visualization Specialist at Northeastern University, for helping to develop these notebooks.<br />
___

### Functions

We have used several Python [functions](https://constellate.org/docs/key-terms/#function) already, including `print()` and `type()`. As a reminder, functions are structured such that the function name is always followed by a set of parentheses `()`. Inside the parentheses are any [parameters](https://constellate.org/docs/key-terms/#parameter) that serve as placeholders for [arguments](https://constellate.org/docs/key-terms/#argument) that the function will operate on when it is run. We say that arguments are **passed** to the function when it is run.

Depending on the function (and your goals for using it), a function may accept no arguments, a single argument, or many arguments.
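To make this concrete, here is a quick sketch using built-in functions you may already know: `print()` can be called with no arguments, `len()` takes one, and `round()` accepts either one or two.

```python
# print() with NO arguments just prints an empty line.
print()

# len() takes ONE argument and returns its length.
one_arg = len("Python")

# round() can take ONE argument (round to the nearest whole number)...
rounded_whole = round(3.14159)
# ...or TWO arguments: the number, then how many decimal places to keep.
rounded_two_places = round(3.14159, 2)

print(one_arg, rounded_whole, rounded_two_places)  # 6 3 3.14
```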

When we introduced functions, we noted that there are three kinds:
* Functions built into Python
* Functions others have written that you can import
* Functions you write yourself

As a quick refresher, run the code cell below to use the built-in `type()` function. What is the argument of this function?

In [None]:
# Code cell 1
type(26.2)

### Libraries and Modules
So far, we've focused mostly on using built-in functions. Now, let's talk about importing others' functions and writing them ourselves.

While Python comes with many functions, there are thousands more that others have written. Adding them all to Python would create mass confusion, since many people could use the same name for functions that do different things. The solution is to store functions in [modules](https://constellate.org/docs/key-terms/#module) that can be **imported** when needed. A module is a Python file (with the extension ".py") that contains function definitions and other Python code. Modules can in turn be collected into larger groups called [packages](https://constellate.org/docs/key-terms/#package) and [libraries](https://constellate.org/docs/key-terms/#library). Depending on how many functions you need for the program you are writing, you may import a single module, a package of modules, or a whole library.

The general form of importing a module is:
`import module_name`

To access one of the functions in the module, you have to specify the name of the module and the name of the function, separated by a dot (also known as a period). This format is called **dot notation**.

Python has many useful modules, packages, and libraries that you can import. Sometimes, these modules come pre-installed, and sometimes, they must be installed before you can use them. In this class, we will be working with the pre-installed `math` module, which provides most of the familiar mathematical functions. Before we can use the module, we have to import it:

In [None]:
# Code cell 2
import math

Now that we have imported `math`, we can use it following the dot notation format. For example, the `sqrt` function from the `math` module will calculate the square root of its argument:

In [None]:
# Code cell 3
math.sqrt(16)

*Hint: `sqrt()` will be useful for writing the fifth function in the Dunkin 3 assignment.*

Another function in the `math` module is `ceil()`, which rounds a number up to the nearest integer. `ceil()` gets its name from "ceiling": it returns the smallest integer that is greater than or equal to the value you pass to it as an argument. In the code cell below, test out `ceil()` with a float of your choice. While this function is not used in the assignments, it can be useful in other contexts.

In [None]:
# Code cell 4
# Use the `ceil()` function from the `math` module here


The `math` module, like many modules in Python, comes with a wide range of very useful functions. You can find more about the functions in the `math` module [here](https://docs.python.org/3/library/math.html).
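As a small taste of what else is in the module, here is a sketch using a few other `math` names. The particular numbers are arbitrary examples; the linked documentation lists everything the module offers.

```python
import math

# math.floor() is the counterpart of math.ceil(): it rounds DOWN to the nearest integer.
rounded_down = math.floor(7.8)

# math.pi is a constant (a stored value, not a function), so it has no parentheses.
circle_area = math.pi * 3 ** 2   # area of a circle with radius 3

# math.factorial() multiplies all the whole numbers from 1 up to its argument.
five_factorial = math.factorial(5)

print(rounded_down)     # 7
print(circle_area)
print(five_factorial)   # 120
```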

## Writing Functions

Now we will look at how to define your own functions, how to repeat a task with iteration, and how to use iteration within functions. Defining your own functions allows you to perform calculations that are specific to your dataset, and lets you generalize code that you are likely to repeat or reuse. For more information on thinking through how to write functions, please reference the [function handout](https://github.com/NULabNortheastern/digitalassignmentshowcase/blob/15611ec96487f35a8dbfb6145505c038f1aa93c6/handouts/coding_quantitative/Handout_Python%20Functions.pdf) in the class materials.

*Hint: Writing functions will be necessary for all of the Dunkin assignments.*

In the previous examples, we **called** a function that was already written. To call our own functions, we need to define our function first with a **function definition statement** followed by a [code block](https://constellate.org/docs/key-terms/#code-block):

`def my_function():` <br />
&nbsp; &nbsp; &nbsp; &nbsp;`do this task`
    

After the function is defined, we can **call** on it whenever we need by simply executing the function like so:

`my_function()`

Below is an example function definition:


In [None]:
# Code cell 5
# Create a function that prints lyrics.
# Note that this just defines the function; we haven't run the function yet.
# Note also that this is the obligatory Monty Python reference for our Python tutorial.
def print_lyrics():
    print("I'm a lumberjack, and I'm okay.")
    print('I sleep all night and I work all day.')

`def` is a keyword that tells Python you are defining a function. When the Python interpreter sees the `def` keyword, it treats every indented line beneath it as part of the function definition. For this reason, it is very important to keep your indentation consistent as you switch between function definitions and regular code.

In the example in the cell above, the name of the function is given immediately after the `def` keyword: `print_lyrics`. The rules for function names are the same as for variable names: letters, numbers, and the underscore character (`_`) are legal, but the first character can't be a number. You can't use a keyword as the name of a function, and you should avoid having a variable and a function with the same name. As with variable names, you should try to give functions meaningful names.

The empty parentheses after the name indicate that this function has no parameters and thus when called, no arguments will be passed to it. In the next example, we will build functions with parameters which take arguments as their inputs.

The first line of the function definition, the line with the `def` keyword, is called the **header**; the rest is called the **body**. The header has to end with a colon, and every line of the body has to be indented by the same amount. Python accepts any consistent indentation, but the standard convention is four spaces. Code nested further inside the body (for example, a loop) is indented one more level, as though the margin had moved in. The body can contain any number of statements.
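Here is a small sketch of a second example function, separate from `print_lyrics()`, whose body has several statements, to show how the header, body, and indentation fit together. The name `describe_marathon` and the miles-to-kilometers factor of 1.609 (an approximation) are just for illustration.

```python
# Header: ends with a colon.
def describe_marathon():
    # Body: every line is indented four spaces.
    distance_miles = 26.2
    distance_km = distance_miles * 1.609  # 1.609 is an approximate miles-to-km factor
    print("A marathon is", distance_miles, "miles,")
    print("or about", round(distance_km, 1), "kilometers.")

# This line is NOT indented, so it is not part of the function body.
# It runs as soon as the cell runs; the body above runs only when the function is called.
print("The function is now defined.")
describe_marathon()
```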

The syntax for calling the new function is the same as for built-in functions:

In [None]:
# Code cell 6
# Run our new function.
print_lyrics()

## Parameters and Arguments
As we have already seen, some functions require arguments to be passed when the function is called.

When we write a function definition, we can define a **parameter** to work with the function. We use the word "parameter" to describe the variable in parentheses within a function definition:

`def my_function(input_variable):` <br />
&nbsp; &nbsp; &nbsp; &nbsp;`do this task`

In the pseudo-code above, `input_variable` is a parameter because it is being used within the context of a function *definition*. When we run our function, the actual variable or value we pass to the function is called an **argument**.

To summarize, a **parameter** is the variable used within the function definition. Parameters act as stand-ins for what will later become arguments when the function is actually used. We declare parameters in a function definition so that the function can work with data supplied from outside; without parameters, a function can only work with values written into it or already stored elsewhere in the program. Parameters name the inputs a function expects, and the function body describes what to do with them. Arguments, on the other hand, are the values or variables that are passed to the function when it is actually called. Where parameters tell functions what to do with data, arguments are the actual data itself.
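A minimal sketch that labels each piece; the function `greet` and its parameter `name` are hypothetical examples, not part of the lesson's code.

```python
# `name` is the PARAMETER: a placeholder inside the function definition.
def greet(name):
    print("Hello,", name)

# "Ada" is the ARGUMENT: the actual value passed in when we call the function.
greet("Ada")      # Hello, Ada

# The argument can also be a variable; its value is what gets passed in.
student = "Grace"
greet(student)    # Hello, Grace
```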

The cell below gives an example of a user-defined function that accepts an argument:

In [None]:
# Code cell 7
# In this function definition, the variable `p` is the parameter.
def phrase_length(p):  # This is the function header
    print("Phrase:",p)  # This is the first statement in the function body
    print("Length:",len(p))  # This is the second statement in the function body

In [None]:
# Code cell 8
# In this example, we pass the string "I am a phrase" to our new function as the argument.
phrase_length('I am a phrase')

In [None]:
# Code cell 9
# In this block, try running our new function on a few different strings.
# Uncomment the line below and fill in a string phrase in place of the variable `p`.

#phrase_length(p)

The `phrase_length` function sets the parameter `p` equal to the argument that is passed to the function when it is called. The function accepts any value or variable that is also accepted by the built-in function `len()`.
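For instance, `len()` also counts the items in a list, so `phrase_length()` works on a list too. The sketch below repeats the definition from code cell 7 so it runs on its own; the list of colors is just an example.

```python
# phrase_length() as defined in code cell 7, repeated so this cell runs on its own.
def phrase_length(p):
    print("Phrase:", p)
    print("Length:", len(p))

# A list works, because len() counts the items in a list.
phrase_length(["red", "green", "blue"])  # Length: 3

# An integer does not: len(42) raises a TypeError, so phrase_length(42) would too.
# phrase_length(42)
```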

For example, we could create a variable called `my_string` and then calculate its length with our new function.

In [None]:
# Code cell 10
my_string = "I am a string!"
phrase_length(my_string)

In the example above, we chose `p` for the name of the parameter and `phrase_length` for the name of the function, but we could have named the function and parameter whatever we wanted. For example, we could have decided to name the function `p_l`. Even poorly named functions will complete the tasks outlined in their definitions. As with naming variables, giving functions and parameters meaningful names is good programming practice and helps to ensure that functions remain useful.

In [None]:
# Code cell 11
def bicycle_awesomeness(bicycle):  # Same task as phrase_length(), but the name doesn't describe what it does. Here the parameter bicycle stands in for the parameter p.
    print("Phrase:", bicycle)  # Print the word 'Phrase', a colon, and then the value of the parameter bicycle.
    print("Length:", len(bicycle))  # Print the word 'Length', a colon, and then the length (number of characters) of the argument.

bicycle_awesomeness('I am a phrase')  # Call the function, passing the string 'I am a phrase' as the argument.

Remember to use good judgment and think about how your future self, as well as others, will need to interact with your code when you are naming things.

### Local and Global Scope

Functions make maintaining code easier by avoiding duplication. One of the most dangerous areas for duplication is variable names. As programming projects become larger, the possibility that a variable will be re-used goes up. This can cause weird errors in our programs that are hard to track down. Reusing variables can also accidentally lead us to lose data when we don't intend to. We can alleviate the problem of duplicate variable names through the concepts of [local scope](https://constellate.org/docs/key-terms/#local-scope) and [global scope](https://constellate.org/docs/key-terms/#global-scope).

We use the phrase "local scope" to describe what happens within a function. The local scope of a function may contain [local variables](https://constellate.org/docs/key-terms/#local-variable), but once that function has completed, the local variables and their contents are erased. In other words, local variables are variables that are only accessible within the function's code. The benefit of using local variables is that we can manipulate data within the function without it impacting data outside the function unless we want it to.

On the other hand, we can also create [global variables](https://constellate.org/docs/key-terms/#global-variable) that persist at the top-level of the program *and* within the local scope of a function. Global variables are accessible by all of the code.

**To reiterate:** global variables are those created outside of functions; they can be used both within functions and outside of them. Local variables exist only within the context of their functions.
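Here is a short sketch showing both kinds of variable at once. The names `course_name`, `announce_course`, and `greeting` are hypothetical examples.

```python
# `course_name` is created outside any function, so it is a GLOBAL variable.
course_name = "Programming Fundamentals"

def announce_course():
    # `greeting` is created inside the function, so it is a LOCAL variable.
    # The function can also read the global variable `course_name`.
    greeting = "Welcome to " + course_name
    print(greeting)

announce_course()    # Welcome to Programming Fundamentals
print(course_name)   # works: the global variable is available outside the function too
# print(greeting)    # would raise a NameError: `greeting` existed only while announce_course() ran
```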

In fact, you're already very familiar with global variables, because you've been using them in this lesson.

That is, as you've already seen, when we initialize a variable within a code cell, we can use that variable in any code block within the notebook. For example, we defined the value of `my_string` several cells up, but we can still print it below.

In [None]:
# Code cell 12
print(my_string)

However, the same is not true for the variables we use within our function definitions. For example, the cell below contains a short function definition that prints out how the user's day has been.

In [None]:
# Code cell 13
def my_day():
    day = "fun"
    print("My day has been " + day)

my_day()

But, what happens when we try to use `day` outside of the function?

In [None]:
# Code cell 14
print(day)

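Python raises a `NameError`, because `day` is a local variable that exists only inside `my_day()`. It is possible, however, to use the same variable name in both local and global scopes, which is why you need to be very careful when you are naming your variables!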

For example, we can create a global variable called `day` and assign it a value of "busy".

In [None]:
# Code cell 15
day = "busy"

Now that we've initialized this global variable, we can use it in our code. What do you think the outcome will be when you run the code block below?

In [None]:
# Code cell 16
print("My day has been", day)

Creating this global variable, however, doesn't change the local variable within our function.

What do you think the outcome will be when you run the code block below? Scroll up to the function definition for a reminder of how `day` is defined within the `my_day()` function.

In [None]:
# Code cell 17
my_day()

While this might seem a bit confusing, the important thing to keep in mind is that variables you create outside of functions are **global** and can be used anywhere in your notebook. Variables created inside of functions are **local** and cannot be used outside of those functions. Because a local variable exists only within its function, changing a global variable with the same name doesn't affect it. It is possible to make a local variable into a global one, but that is out of scope for this lesson.

### Function Return Values

Whether or not a function takes an argument, it will return a value. If we do not specify that return value at the end of our function, it is automatically set to `None`, a special value that simply means null or nothing. `None` is **not** the same thing as, say, the integer `0`.
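A quick sketch of this with a function of our own: `say_hello` is a hypothetical function that has no `return` statement, so calling it gives back `None`.

```python
# A hypothetical function with no return statement.
def say_hello():
    print("Hello!")

output = say_hello()   # runs the function (printing Hello!) and stores what it returns
print(output)          # None
print(output == 0)     # False: None is not the same thing as the integer 0
```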

We've already seen that some functions will return values that you can do things with, while others might perform some kind of an action but do not produce any result that you can use in other code. For example, the `print()` function just prints its argument and produces no data. Let's see what happens if we try to initialize a variable as the output of the `print()` function:


In [None]:
# Code cell 18
# Initialize a variable by assigning its value to the output from the `print()` function.
a_variable = print("Some string")

In [None]:
# Code cell 19
# See what happens when we try to print the new variable.
print(a_variable)

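To make the result of a function available outside of the function, we use a `return` statement. `return` is a statement, not a function: when Python reaches it, the function stops running and hands the value back to the code that called it. Any lines after `return` in the function body will not run, so `return` is usually the last line of a function definition.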
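To see that `return` ends the function, here is a minimal sketch; `double` is a hypothetical function used only for illustration.

```python
# A hypothetical function used only to show that `return` ends the function.
def double(number):
    doubled = number * 2
    return doubled                  # the function stops here and hands back `doubled`
    print("This line never runs")   # unreachable: Python has already left the function

print(double(21))  # 42
```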

For example, in the code cell below, we define a function called `add_two` that adds two numbers together and returns a result:

In [None]:
# Code cell 20
# This is a function for adding two numbers.
def add_two(a, b):
    added = a + b
    return added  # This function returns the sum of the two numbers.

*Hint: You will need to use `return` statements for the functions in all of the Dunkin assignments.*

In [None]:
# Code cell 21
# We are calling our function and assigning the result (the sum of the two arguments) to the variable result.
result = add_two(2,7)

In [None]:
# Code cell 22
# Print the result from our function.
print(result)

When we tried to print the output from the `print()` function, we got the value `None` since the `print()` function's definition does not have a `return` statement. In contrast, when we print the output from the `add_two()` function we get the sum of the two arguments because that is what the function **returns**.
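Because `add_two()` hands back a value, that value can be stored, reused, or even passed straight back into the function. A short sketch (the definition from code cell 20 is repeated so the cell runs on its own; the numbers are arbitrary):

```python
# add_two() as defined in code cell 20, repeated so this cell runs on its own.
def add_two(a, b):
    added = a + b
    return added

first_sum = add_two(2, 7)            # 9
bigger_sum = add_two(first_sum, 1)   # 10: a returned value used as an argument
print(add_two(add_two(1, 2), 3))     # 6: the inner call runs first, and its result becomes an argument
```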


Note that this function, as defined, expects exactly two arguments. What do you think will happen if we pass only one argument? What about three? Test this out in the code block below:

In [None]:
# Code cell 23
# Try using our add_two function with one or three arguments.


Now let's take a look at how to repeat a task with iterations, and then how to use iterations within functions.

# Iteration

Computers are often used to automate repetitive tasks. Repeating identical or similar tasks without making errors is something that computers do well and people do poorly. Because **iteration** is so common, Python provides several features to make it easier. We are going to focus on just one method of iteration: `for` loops.

## Introduction to `for` Loops
What is a `for` loop? A `for` loop is a set of instructions (things you want the computer to do) that the computer repeats once *for* each item in a sequence.

There are different types of **loops** in Python that can perform repeated tasks by "looping" back around to the top of the code until some condition has been met. In this class, we will be focusing on the `for` loop, which allows us to loop through a sequence of tasks for each of the characters in a string, or the items in a dictionary, tuple, or list. We will examine these data structures in the next lesson.

The `for` loop runs the indented lines in its body in order, starting with the first line immediately after the loop header. When it reaches the end of the body, it moves on to the next item in the sequence and runs the body again from the top. When there are no items left, the loop ends and Python continues with the first unindented line after the loop. We call each execution of the loop body an **iteration**.

When we write a `for` loop, we begin with a header line that starts with `for`, followed by the name of an iteration variable, the keyword `in`, the sequence to loop over, and a colon. The instructions we want to repeat are indented below the header, much like the body of a function definition. When we are done writing the instructions for the loop, we stop indenting. `for` loops look a bit like this:

`for each element in this set:` <br />
&nbsp; &nbsp; &nbsp; &nbsp;`take this action`

When you are working in Python, you need to pay very close attention to indentation, since this is an important part of Python's syntax. If the indentation in your code is incorrect (for example, if you are missing the indentation in the second line of the `for` statement as outlined above), your code will fail and you will likely get an error message.
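As a quick illustration of looping over the characters in a string, and of where the loop body ends, here is a minimal sketch (the string "puce" is just an example):

```python
word = "puce"
for letter in word:   # header: iteration variable `letter`, keyword `in`, sequence `word`, colon
    print(letter)     # indented body: runs once per character (p, then u, then c, then e)
print("No characters left, so the loop has ended.")  # not indented: runs once, after the loop
```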

`for` loops can be used with many different data types. Here is an example `for` loop that operates on the strings in a list.


In [None]:
# Code cell 24
friends = ['Tabitha', 'Gregg', 'Shannon']  # Friends is defined as a list of strings.
for friend in friends:  # Iterate through each element in the list of friends.
    print('Happy New Year,', friend)  # Print the string "Happy New Year", followed by a string from the list.
print('Done!')  # When the for loop runs through all the elements in the list, print "Done"

In Python terms:
*  `friends` and `friend` are variables
* `for` and `in` are reserved Python keywords
* The variable `friends` is a **list** of three strings and the `for` loop goes through the list and executes the body once for each of the three strings

`friends` is the list that you defined in the first line of code. `friend` is something new: it is the **iteration variable** for this `for` loop. The iteration variable steps successively through the three strings stored in the `friends` variable.

As with the variables that we have already seen, you control the name of the iteration variables that you use in `for` loops. We could have also named the iteration variable `i` and the code would work just as well, but `friend` is a much more descriptive and meaningful name.

## An Example `for` Loop
Often, it is useful to use `for` loops to apply or run some kind of function on a set of items. For example, the code below will iterate through each item in the list and print the item name, much like the `for` loop above. Don't worry too much about how the list is structured—we will discuss data structures in the next lesson—instead, pay attention to what happens when the `for` loop is executed.

The `for` loop iterates through the items in the list and runs the `print()` function on each of them. The **iteration variable** in this example is called `color`. As the loop iterates over the list, `color` is set to each successive item in the `favorite_colors` list, which is then printed using the `print()` function. After the loop has finished, all the items in `favorite_colors` will have been printed.

In [None]:
# Code cell 25
# This is a loop that iterates through each item in a list and prints the item.
# What do you predict the output will be?
# Try modifying the iteration variable name. What happens?
favorite_colors = ['orange', 'red', 'blue', 'green', 'yellow', 'puce', 'mauve']
for color in favorite_colors:
    print(color)

*Hint: You will need to use a `for` loop for exercise 4 in Dunkin assignment 1.*

## Performing Computation with `for` Loops

Now, let's look at a few more things that can be done with `for` loops.

`for` loops can also be used to perform iterative calculations on data. For example, the code block below will iterate through the items in the `favorite_colors` list using the iteration variable `color`, count each item, and print the total count of items in the list.

In [None]:
# Code cell 26
# A program that counts the items in a list.
favorite_colors = ['orange', 'red', 'blue', 'green', 'yellow', 'puce', 'mauve']  # Initializing the list
count = 0  # Initializing the count variable

for color in favorite_colors:  # This is the for loop header, it initializes the iteration variable color and tells the loop what to iterate through, in this case the list favorite_colors.
    count = count + 1  # For each color in favorite_colors, count is increased by one by setting the count variable equal to itself plus one.
    print(count)  # Print out the count at each iteration.
    print(color)  # Print out the color at each iteration.
print('Count: ', count)  # Printing out the final count.


We can also write the code block above with a shortcut called the **addition assignment** operator, written as `+=`. It adds the value on its right to the variable on its left and stores the result back in that variable, so `count += 1` does the same thing as `count = count + 1`.

Try this out yourself; first run the code below to see that the results are the same, then try adding a few items to the list to see the count increase.

In [None]:
# Code cell 27
# Run this to try using the addition assignment operator.
# Then, try adding a few colors to the list and run again.
favorite_colors = ['orange', 'red', 'blue', 'green', 'yellow', 'puce', 'mauve']
count = 0
for color in favorite_colors:
    count += 1
    print(count)
    print(color)
print('Count: ', count)

*Hint: You will use similar code for the third function in the Dunkin 2 assignment.*
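The `+=` operator is not limited to adding 1. A short sketch (the starting values are arbitrary):

```python
count = 5
count += 1     # same as: count = count + 1, so count is now 6
count += 10    # the value on the right can be any number, so count is now 16
print(count)   # 16

message = "Happy"
message += " New Year"   # += also joins strings
print(message)           # Happy New Year
```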

Below is a similar loop that computes the total of a set of numbers:

In [None]:
# Code cell 28
# This is a program to calculate the total of a set of numbers in a list.
favorite_numbers = [3, 41, 12, 9, 74, 15]  # Initialize the list favorite_numbers.
total = 0  # Set the variable total to 0.
for number in favorite_numbers:  # Iterate through each number in the list.
    print(number)  # Print out the number for each iteration.
    total = total + number # Add the number for each iteration to the total. Note that this could also use the addition assignment operator and be written as: total += number.
    print(total)  # Print out the total at the end of each iteration.
print('Total: ', total)  # Print out the total of all of the numbers in the list.

*Hint: You will use similar code for the third function in the Dunkin 2 assignment.*

In this loop we make use of the iteration variable. Instead of simply adding one to the `count` as in the previous loop, we add the actual number (3, 41, 12, etc.) to the running total during each loop iteration. If you think about the variable `total`, it contains the “running total of the values so far”.

* Before the loop starts, `total` is zero because we have not yet seen any values,
* during the loop, `total` is the running total, and
* at the end of the loop, `total` is the overall total of all the values in the list.

**Note:** In practice, there are functions built into Python (`len` and `sum`) that do the same thing as these two examples. But, the examples are useful as a way of understanding how `for` loops work.
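For comparison, here is how the built-in functions reproduce the two loops above in one line each:

```python
favorite_colors = ['orange', 'red', 'blue', 'green', 'yellow', 'puce', 'mauve']
favorite_numbers = [3, 41, 12, 9, 74, 15]

print('Count: ', len(favorite_colors))   # 7, the same result as the counting loop
print('Total: ', sum(favorite_numbers))  # 154, the same result as the summing loop
```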

## Using `for` Loops in Functions

`for` loops can also be used within functions to iterate a command or calculation over some data. This is particularly useful when you have a large volume of data, or if you have generalized functions to apply to your specific dataset.

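For example, the code block below uses a `for` loop inside a function to repeat a simple calculation. The function `multiply_favorite_numbers` multiplies each item in the `favorite_numbers` list by the factor passed to it as the `multiplier` argument, then prints and returns the new list.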

In [None]:
# Code cell 29
# This function iterates through each number in favorite_numbers, multiplies it by the multiplier, prints the new list, and returns it.
favorite_numbers = [3, 41, 12, 9, 74, 15]  # Initialize the favorite_numbers list.
def multiply_favorite_numbers(multiplier):  # Define a function called multiply_favorite_numbers with a multiplier parameter.
    multiplied_list = []  # Initialize an empty list to hold the multiplied numbers.
    for num in favorite_numbers:  # Iterate through the list of favorite numbers.
        multiplied_num = num * multiplier  # Multiply each number in the favorite number list by the multiplier and save it to the variable multiplied_num.
        multiplied_list = multiplied_list + [multiplied_num]  # Place the multiplied number in [] to tell Python it is a list item, then add it to the end of the list of multiplied numbers.
    print(multiplied_list)  # Print multiplied number list.
    return multiplied_list  # Return the multiplied list as the function output.

In [None]:
# Code cell 30
multiply_favorite_numbers(2)  # Call the function to multiply each number in favorite_numbers by 2 (here, 2 is the argument passed into the function).
# In a notebook, the list appears twice: once from the print() call inside the function, and once because Jupyter displays the value returned by the last line of a cell.

*Hint: This code may be helpful for writing the first function in the Dunkin 3 assignment.*

This example shows one way to use `for` loops inside of a function, with a relatively small list of data. However, iterations in functions can be coded to perform advanced calculations on large datasets. In the upcoming workshops, we will explore how to iterate calculations in large datasets, and how to use conditional statements in iterations to execute a command on a selected subset of data.
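One way to make such a function more general is to pass the list in as a parameter too, instead of relying on the global `favorite_numbers` list. The sketch below is a hypothetical variation (`multiply_numbers` is not part of the lesson's code); it uses the built-in list method `append()`, which adds one item to the end of a list and does the same job here as `multiplied_list + [multiplied_num]`.

```python
# A hypothetical, more general version of multiply_favorite_numbers():
# the list is now a parameter, so the function works on any list of numbers.
def multiply_numbers(numbers, multiplier):
    multiplied_list = []                          # start with an empty list
    for num in numbers:                           # iterate through whichever list was passed in
        multiplied_list.append(num * multiplier)  # add each multiplied number to the end
    return multiplied_list                        # return the new list instead of printing it

print(multiply_numbers([3, 41, 12], 2))  # [6, 82, 24]
print(multiply_numbers([22.5, 13, 8.5], 10))
```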

## Practice Exercises
As with the previous lessons, you should first try running the quick exercises in this notebook, and practice making changes and testing their results. Then, try out these exercises and see how your results compare with the sample solutions.

**Exercise One**

The code below initializes a list with class topics. Write a `for` loop that iterates through the list and prints out the string "We will be learning" followed by each item in the list.

In [None]:
# Code cell 31
# Initialize the class_topics list.
class_topics = ["Types of Inference", "Probability Theory", "Programming Fundamentals", "Bayesian Inference", "Probabilistic Graphical Models", "Information Theory", "Advanced Topics"]

# Now write a `for` loop that prints "We will be learning" + each topic.


**Exercise Two**

In [None]:
# Code cell 32
# Fill in your code to calculate the total inches of snow from the snowfall_mass list.
# First initialize the snowfall_mass list.
snowfall_mass = [22.5, 13, 12, 10.2, 18, 19.2, 8.5]

# Next replace the # to set the initial total to 0.
total = #
# Replace the ##s to set the loop to iterate through each snowfall.
for ## in snowfall_mass:
    total = ## + total

print("The total is:", total)

# Solutions
These are some sample solutions, but (as we've already noted) you might have taken a different approach.


In [None]:
# Code cell 33
# Exercise One
# Initialize the class_topics list.
class_topics = ["Types of Inference", "Probability Theory", "Programming Fundamentals", "Bayesian Inference", "Probabilistic Graphical Models", "Information Theory", "Advanced Topics"]

# Print the concatenation (combination) of the string "We will be learning " and each item in the list.
for topic in class_topics:
    print("We will be learning " + topic)

In [None]:
# Code cell 34
# Exercise One
# Another approach is not to concatenate (combine) the string and topics when you print.
# Instead, you can just print one, followed by the other.

# Print the string "We will be learning " followed by each item in the list.
for topic in class_topics:
    print("We will be learning", topic)

*Hint: This code may be helpful for exercise 4 in the Dunkin 1 assignment.*

In [None]:
# Code cell 35
# Exercise Two
# First initialize the snowfall_mass list.
snowfall_mass = [22.5, 13, 12, 10.2, 18, 19.2, 8.5]

total = 0
for snowfall in snowfall_mass:
    total = snowfall + total

print("The total is:", total) # Note that if we wanted to use concatenation (+) here, we would first need to convert the total, a number, to a string with `str()`.

In [None]:
# Code cell 36
# Exercise Two
# Another approach is to use the addition assignment operator.
total = 0
for snowfall in snowfall_mass:
    total += snowfall
print("The total is:", total)

*Hint: This code may be helpful for writing the third function on the Dunkin 2 assignment.*