# `for` Loops and Functions

## UBC MDS Extended Learning

### November 14

In [1]:
import numpy as np
import pandas as pd

## `for` loops

In [2]:
a_list = [1, 4, 5, 10]
another_list = [3, 6, 11]

In [3]:
for number in a_list:
    print(number)

1
4
5
10


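The loop variable `number` acts as a placeholder, or temporary variable: on each pass through the loop it takes the next value from `a_list`. The variable still exists after the loop ends, so if we call `number` again, we see that it holds the last value assigned to it.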

In [5]:
number

10

But it doesn't have to be the word `number`; it can be anything you want it to be 😊

In [6]:
for number_a in a_list:
    for number_b in another_list:
        print(number_a, number_b, number_a + number_b)

1 3 4
1 6 7
1 11 12
4 3 7
4 6 10
4 11 15
5 3 8
5 6 11
5 11 16
10 3 13
10 6 16
10 11 21


Just remember: make the name meaningful.

All variable names should be "self-explanatory" when writing real code.

## `for` loops in a `dictionary`

Let's remember the structure of a dictionary:

```python

my_pantry = {'pasta': 3, 'garlic': 4, 'sauce': 2,
             'basil': 2, 'salt': 3, 'olive oil': 3,
             'rice': 3, 'bread': 3}
```

What are the `keys` and what are the `values`?

In [7]:
my_pantry = {'pasta': 3, 'garlic': 4, 'sauce': 2,
             'basil': 2, 'salt': 3, 'olive oil': 3,
             'rice': 3, 'bread': 3}

In [8]:
my_pantry.keys()

dict_keys(['pasta', 'garlic', 'sauce', 'basil', 'salt', 'olive oil', 'rice', 'bread'])

In [9]:
my_pantry.values()

dict_values([3, 4, 2, 2, 3, 3, 3, 3])

In [10]:
my_pantry.items()

dict_items([('pasta', 3), ('garlic', 4), ('sauce', 2), ('basil', 2), ('salt', 3), ('olive oil', 3), ('rice', 3), ('bread', 3)])

How can we iterate over them?

In [11]:
for key in my_pantry:
    print(key)

pasta
garlic
sauce
basil
salt
olive oil
rice
bread


In [12]:
for key in my_pantry.keys():
    print(key)

pasta
garlic
sauce
basil
salt
olive oil
rice
bread


In [13]:
for value in my_pantry.values():
    print(value)

3
4
2
2
3
3
3
3


In [14]:
for key in my_pantry:
    print(my_pantry[key], key)

3 pasta
4 garlic
2 sauce
2 basil
3 salt
3 olive oil
3 rice
3 bread


In [15]:
key

'bread'

In [16]:
for key, value in my_pantry.items():
    print(key, value)

pasta 3
garlic 4
sauce 2
basil 2
salt 3
olive oil 3
rice 3
bread 3


Which is more convenient if you are comparing two dictionaries that have the same keys?

In [17]:
my_list = {'pasta': 2, 'garlic': 1, 'sauce': 0,
           'basil': 1, 'salt': 5, 'olive oil': 4,
           'rice': 1, 'bread': 5}

- We want to compare the values.
- However, we want to compare the values item by item, matching them BY KEY.
- We want to compare `pasta` against `pasta` and `olive oil` against `olive oil`.

In [18]:
my_pantry.items()

dict_items([('pasta', 3), ('garlic', 4), ('sauce', 2), ('basil', 2), ('salt', 3), ('olive oil', 3), ('rice', 3), ('bread', 3)])

In [19]:
my_list.items()

dict_items([('pasta', 2), ('garlic', 1), ('sauce', 0), ('basil', 1), ('salt', 5), ('olive oil', 4), ('rice', 1), ('bread', 5)])

In [20]:
for ingredient in my_pantry:
    print(my_pantry[ingredient] - my_list[ingredient])

1
3
2
1
-2
-1
2
-2


> What happens if I remove `'sauce'` from `my_list`, since its value is 0?

In [21]:
my_list = {'pasta': 2, 'garlic': 1,
           'basil': 1, 'salt': 5, 'olive oil': 4,
           'rice': 1, 'bread': 5}

In [22]:
my_pantry.items()

dict_items([('pasta', 3), ('garlic', 4), ('sauce', 2), ('basil', 2), ('salt', 3), ('olive oil', 3), ('rice', 3), ('bread', 3)])

In [23]:
my_list.items()

dict_items([('pasta', 2), ('garlic', 1), ('basil', 1), ('salt', 5), ('olive oil', 4), ('rice', 1), ('bread', 5)])

In [24]:
for ingredient in my_pantry:
    print(my_pantry[ingredient] - my_list[ingredient])

1
3


KeyError: 'sauce'

## `if` statements

`if` statements let us check a condition before running a block of code.
Here we can state what should happen when a key from `my_pantry` is not in our second dictionary, `my_list`.

In [25]:
my_pantry

{'pasta': 3,
 'garlic': 4,
 'sauce': 2,
 'basil': 2,
 'salt': 3,
 'olive oil': 3,
 'rice': 3,
 'bread': 3}

In [26]:
my_list

{'pasta': 2,
 'garlic': 1,
 'basil': 1,
 'salt': 5,
 'olive oil': 4,
 'rice': 1,
 'bread': 5}

In [27]:
for ingredient in my_pantry:
    if ingredient in my_list:
        print(my_pantry[ingredient] - my_list[ingredient])
    else:
        print("No ingredient")

1
3
No ingredient
1
-2
-1
2
-2


### Creating a new dictionary where we can save the values:

In [28]:
shopping_list = dict()

for ingredient in my_pantry:
    if ingredient in my_list:
        shopping_list[ingredient] = my_pantry[ingredient] - my_list[ingredient]
    else:
        print("No ingredient")

No ingredient


In [29]:
shopping_list

{'pasta': 1,
 'garlic': 3,
 'basil': 1,
 'salt': -2,
 'olive oil': -1,
 'rice': 2,
 'bread': -2}

### Exercise:

In your assignment, you will write a dictionary like the one we just built. The trick is that you don't want negative numbers: if there is already enough of an ingredient in your pantry, you shouldn't have to add it to the shopping list.

How would you modify the `if` statements to build a dictionary with only the positive values?

Think about it, and if you have questions, let's discuss them next session.

## Functions

A function is a relationship or mapping between one or more inputs and a set of outputs.

In mathematics, we typically represent a function like this:

> $z = f(x, y)$

* $z$ is the output.
* $x$ and $y$ are the inputs.
* $f()$ represents "what happens" in the function.

The familiar equation of a line, $y = mx + b$, is another example: there, $x$ is the input and $y$ is the output.

For example:
> $z = \log(x)$

When $x = 3$, then:  
> $z = \log(3)$
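
Here $\log$ is the natural logarithm (base $e$), which is what NumPy's `np.log` computes in the next cell. Worked out for $x = 3$:

$$
z = \log(3) = \ln(3) \approx 1.0986
$$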

In [30]:
# z = log(3)

# `word` and `word_2` are optional parameters with default values
def log_function(number, word='hello', word_2='world'):
    z = np.log(number)  # natural logarithm (base e)
    print(word, word_2)
    print(f"The log of {number} is {z}.")

In [31]:
log_function(3, word_2="people")

hello people
The log of 3 is 1.0986122886681098.


In [32]:
number

10
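
The cell above returns `10`: calling `log_function(3)` did not change the global `number` left over from the first `for` loop, because a function's parameters (and any variables assigned inside it) are local to that function. A minimal sketch of the same behaviour, using the hypothetical names `counter` and `set_counter`:

```python
counter = 10  # a global variable, like `number` after the for loop

def set_counter(counter):
    # This `counter` is a local parameter; rebinding it does not touch the global one
    counter = counter + 1
    return counter

local_result = set_counter(3)
print(local_result)  # 4
print(counter)       # 10, the global variable is unchanged
```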

In [33]:
log_function(3)

hello world
The log of 3 is 1.0986122886681098.


What is the problem here?

> Try calling `log_function(5)` and storing the result in a variable named `log_five`.

In [34]:
log_five = log_function(5)

hello world
The log of 5 is 1.6094379124341003.


In [35]:
log_five


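`log_five` shows nothing: `log_function` only *prints* its results and has no `return` statement, so the call returns `None`. This is when we realize we need a `return` statement. It is what allows us to "save" the output we want by passing it back to the caller.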

**NOTE**: **Do not** `return` a `print` statement. `print()` only displays text and itself returns `None`, so the function would return `None` as well.
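
A small sketch of the difference, with two hypothetical helpers: one returns the result of `print()`, the other returns the value itself.

```python
def double_and_print(value):
    # print() shows the result on screen, but print() itself returns None,
    # so this function returns None too
    return print(value * 2)

def double(value):
    # return hands the computed value back to the caller
    return value * 2

printed = double_and_print(5)  # displays 10 on screen
returned = double(5)

print(printed)   # None
print(returned)  # 10
```

Only `returned` holds a value we can keep using later in the notebook.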

In [36]:
def log_function(happy_number):
    
    return np.log(happy_number)

log_five = log_function(5)

In [37]:
log_five

1.6094379124341003

A return statement in a Python function serves two purposes:  
* It immediately terminates the function and passes execution control back to the caller.  
* It provides a mechanism by which the function can pass data back to the caller.  
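
Both purposes can be seen in one short sketch. The hypothetical helper `first_negative` scans a list and stops at the first negative value: `return` sends that value back and also ends the function, so the rest of the loop never runs. The sample list reuses the values from the `shopping_list` built earlier.

```python
def first_negative(values):
    """Return the first negative number in `values`, or None if there is none."""
    for value in values:
        if value < 0:
            # return exits the function right here; the remaining items are never checked
            return value
    # Only reached if the loop finished without finding a negative value
    return None

print(first_negative([1, 3, 1, -2, -1, 2, -2]))  # -2
print(first_negative([1, 2, 3]))                 # None
```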

In general:
* $f$ is a function that operates on inputs.
* Functions map the inputs to outputs.
* Programming functions are more general and versatile than mathematical functions: for example, they can take no inputs, have no `return` statement, or perform actions such as printing.


When coding, a function is a self-contained block of code that encapsulates a specific task or group of tasks.

**Example:**

In [38]:
a = ['foo', 'bar', 'baz', 'qux']
a

['foo', 'bar', 'baz', 'qux']

In [39]:
len(a)

4

A built-in function performs a specific task!

The code that accomplishes this task is defined **somewhere**, but you don’t need to know where. You don't even need to know how the code works.

You need to understand the function’s interface: 
> * What arguments (if any) it takes 
> * What values (if any) it returns

Then you call the function and pass the appropriate arguments.

In data science (DS), you will not just use built-in functions; you will write your own functions!

When you define your own Python function, it works just the same. From somewhere in your code, you’ll call your Python function and program execution will transfer to the body of code that makes up the function. 

BUT this time, since you wrote the code and you know where it lives, you can see it!

Why bother defining functions? There are several very good reasons. Let’s go over a few now:

### Abstraction and Reusability
Suppose you write some code that does a useful task, and it is a task you need to do several times.
You could copy-paste the code, but…
Later on, you’ll probably modify the code, or maybe you find a bug or need to update it…
If you copy-pasted, you’ll need to make the necessary changes in every location.


Instead, use a function! Abstracting functionality into a function definition is an example of the **DRY** (Don't Repeat Yourself) principle of software development. This is arguably the strongest motivation for using functions.
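
As an illustration of DRY, the pantry comparison loop from earlier can be written once as a function and reused for any pair of dictionaries. `compare_quantities` is a hypothetical name, and unlike the earlier cell, this sketch silently skips missing ingredients instead of printing "No ingredient".

```python
def compare_quantities(available, needed):
    """Return a dict of available - needed for every key present in both dicts."""
    differences = dict()
    for ingredient in available:
        if ingredient in needed:
            differences[ingredient] = available[ingredient] - needed[ingredient]
    return differences

my_pantry = {'pasta': 3, 'garlic': 4, 'sauce': 2,
             'basil': 2, 'salt': 3, 'olive oil': 3,
             'rice': 3, 'bread': 3}
my_list = {'pasta': 2, 'garlic': 1,
           'basil': 1, 'salt': 5, 'olive oil': 4,
           'rice': 1, 'bread': 5}

shopping_list = compare_quantities(my_pantry, my_list)
print(shopping_list)

# The same function works for any other pair of dictionaries, with no copy-pasting
print(compare_quantities({'rice': 1, 'salt': 2}, {'rice': 4}))
```

The first call reproduces the `shopping_list` dictionary built earlier; if the logic ever needs fixing, there is only one place to change it.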

### Modularity
Functions allow complex processes to be broken up into smaller steps.
Imagine, for example, that you have a program that reads in a file, processes the file contents, and then writes an output file. Your code could look like this:

```
# Main program
# Code to read file in
<statement>
<statement>
<statement>
<statement>
# Code to process file
<statement>
<statement>
<statement>
<statement>
# Code to write file out
<statement>
<statement>
<statement>
<statement>
```

Alternatively, you could structure the code more like the following:

```
# Main program
read_file()
process_file()
write_file()
```

PS. Here, the three functions must already be defined somewhere, either earlier in the same script or in a module you import. For example:
```
def read_file():
    # Code to read file in
    <statement>
    <statement>
    <statement>
    <statement>
```
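
A runnable sketch of the same modular structure. The function bodies are hypothetical (the "processing" step just strips whitespace and upper-cases each line), and unlike the outline, these versions pass data between steps through arguments and return values. A temporary directory keeps the example self-contained.

```python
import os
import tempfile

def read_file(path):
    """Read a text file and return its lines without trailing newlines."""
    with open(path) as file:
        return file.read().splitlines()

def process_file(lines):
    """Example processing step: strip whitespace and upper-case each line."""
    return [line.strip().upper() for line in lines]

def write_file(lines, path):
    """Write one element of `lines` per line to `path`."""
    with open(path, "w") as file:
        file.write("\n".join(lines) + "\n")

# Create a small input file so the sketch can run anywhere
folder = tempfile.mkdtemp()
input_path = os.path.join(folder, "input.txt")
output_path = os.path.join(folder, "output.txt")
with open(input_path, "w") as file:
    file.write("pasta\n  garlic \nbasil\n")

# Main program: three readable steps
lines = read_file(input_path)
processed = process_file(lines)
write_file(processed, output_path)

print(read_file(output_path))  # ['PASTA', 'GARLIC', 'BASIL']
```

Each step can now be tested or replaced on its own without touching the main program.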

### Function Calls and Definition 

A programming function is written as:

```
def <function_name>([<parameters>]):
    '''
    Docstrings
    '''
    <statement(s)>
    <return>
```

| Component | Meaning |
|----|----|
| `def` | Keyword that informs Python a function is being defined |
| `<function_name>` | A valid Python identifier that names the function |
| `<parameters>` | An optional, comma-separated list of parameters; the values passed to them when the function is called are its arguments |
| `:` | Punctuation that denotes the end of the function header |
| `'''Docstrings'''` | Documentation regarding the function |
| `<statement(s)>` | A block of valid Python statements |
| `return` | Optional statement that ends the function and sends its output back to the caller; without it, the function returns `None` |
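
A complete function that fills in every part of the template, using the pantry dictionary from earlier. `pantry_total` is a hypothetical name chosen for illustration.

```python
def pantry_total(pantry):
    '''
    Return the total number of items in a pantry dictionary.

    pantry: dict mapping ingredient names to quantities.
    '''
    total = 0
    for quantity in pantry.values():
        total = total + quantity
    return total

my_pantry = {'pasta': 3, 'garlic': 4, 'sauce': 2,
             'basil': 2, 'salt': 3, 'olive oil': 3,
             'rice': 3, 'bread': 3}

print(pantry_total(my_pantry))  # 23
```

Calling `help(pantry_total)` displays the docstring, which is one reason to always write one.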

## `for` loops in `groupby` objects

In [40]:
import pandas as pd

raw = {'employee_id': [1873, 4913, 4801, 4540, 3581,
                       4534, 1934, 4944, 1983, 1266],
       'name': ['Josh', 'Laura', 'Hayley',
                'Mike', 'Tiffany', 'Anurag',
                'Rocio', 'Eric', 'Monique',
                'Emma'],
       'neighbourhood': ['Sunset', 'West end', 'Kitsilano', 'Sunset',
                         'Arbutus-ridge', 'Arbutus-ridge', 'Kitsilano',
                         'West end', 'Kitsilano', 'Arbutus-ridge'],
       'type': ['full-time', 'part-time', 'part-time', 'full-time', 'part-time',
                'full-time', 'full-time', 'part-time', 'part-time', 'full-time'],
       'hourly_rate': [25.0, 27.0, 30.0, 25.5, 32.0,
                       26.5, 27.0, 28.0, 25.5, 23.0]}

data = pd.DataFrame.from_dict(raw)

In [41]:
data

|   | employee_id | name | neighbourhood | type | hourly_rate |
|---|---|---|---|---|---|
| 0 | 1873 | Josh | Sunset | full-time | 25.0 |
| 1 | 4913 | Laura | West end | part-time | 27.0 |
| 2 | 4801 | Hayley | Kitsilano | part-time | 30.0 |
| 3 | 4540 | Mike | Sunset | full-time | 25.5 |
| 4 | 3581 | Tiffany | Arbutus-ridge | part-time | 32.0 |
| 5 | 4534 | Anurag | Arbutus-ridge | full-time | 26.5 |
| 6 | 1934 | Rocio | Kitsilano | full-time | 27.0 |
| 7 | 4944 | Eric | West end | part-time | 28.0 |
| 8 | 1983 | Monique | Kitsilano | part-time | 25.5 |
| 9 | 1266 | Emma | Arbutus-ridge | full-time | 23.0 |


In [42]:
grouped_data = data.groupby('type')
grouped_data

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x10d1f23a0>

In [43]:
for group, rows in grouped_data:
    print(group, rows)

full-time    employee_id    name  neighbourhood       type  hourly_rate
0         1873    Josh         Sunset  full-time         25.0
3         4540    Mike         Sunset  full-time         25.5
5         4534  Anurag  Arbutus-ridge  full-time         26.5
6         1934   Rocio      Kitsilano  full-time         27.0
9         1266    Emma  Arbutus-ridge  full-time         23.0
part-time    employee_id     name  neighbourhood       type  hourly_rate
1         4913    Laura       West end  part-time         27.0
2         4801   Hayley      Kitsilano  part-time         30.0
4         3581  Tiffany  Arbutus-ridge  part-time         32.0
7         4944     Eric       West end  part-time         28.0
8         1983  Monique      Kitsilano  part-time         25.5
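
Iterating over a `DataFrameGroupBy` object yields one `(group_key, sub_dataframe)` pair per group, which is why the loop above can unpack `group, rows`. A sketch that uses this to compute the mean hourly rate per employment type by hand, then does the same with pandas' built-in `groupby(...).mean()`. Only the two relevant columns of the data are reproduced, and `mean_rates` is a hypothetical name.

```python
import pandas as pd

data = pd.DataFrame({
    'type': ['full-time', 'part-time', 'part-time', 'full-time', 'part-time',
             'full-time', 'full-time', 'part-time', 'part-time', 'full-time'],
    'hourly_rate': [25.0, 27.0, 30.0, 25.5, 32.0,
                    26.5, 27.0, 28.0, 25.5, 23.0],
})

mean_rates = dict()
for group, rows in data.groupby('type'):
    # `group` is the value of 'type'; `rows` is a DataFrame with only that group's rows
    mean_rates[group] = float(rows['hourly_rate'].mean())

print(mean_rates)  # {'full-time': 25.4, 'part-time': 28.5}

# The same numbers from a built-in aggregation, returned as a Series
print(data.groupby('type')['hourly_rate'].mean())
```

The explicit loop is useful when each group needs custom handling; for a standard summary like a mean, the built-in aggregation is shorter.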


In [44]:
grouped_data.get_group('full-time')

|   | employee_id | name | neighbourhood | type | hourly_rate |
|---|---|---|---|---|---|
| 0 | 1873 | Josh | Sunset | full-time | 25.0 |
| 3 | 4540 | Mike | Sunset | full-time | 25.5 |
| 5 | 4534 | Anurag | Arbutus-ridge | full-time | 26.5 |
| 6 | 1934 | Rocio | Kitsilano | full-time | 27.0 |
| 9 | 1266 | Emma | Arbutus-ridge | full-time | 23.0 |
