
# Iteration
> Rounding out our base Python knowledge with iteration

In this lesson, you'll learn different ways to express iteration in Python using lists and dictionaries. We've already seen some examples of iteration, where we cycle through a collection data structure (e.g. a list or dictionary) to apply statements to one or more of its elements.

There are two primary ways you'll see iteration in Python:
* `for` loops
* comprehensions (list, dictionary, and generator)

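As a quick preview, here is a minimal sketch of each kind of comprehension, built from a few of our dog weights (the variable names are just for illustration). A list or dictionary comprehension builds the whole collection right away, while a generator expression produces its values lazily, one at a time, as something like `sum` consumes them.

```python
sample_kgs = [25.0, 20.22, 17.83]

# list comprehension: builds a new list immediately
doubled = [kg * 2 for kg in sample_kgs]

# dictionary comprehension: builds a new dict from key: value pairs
labeled = {f"dog_{i}": kg for i, kg in enumerate(sample_kgs)}

# generator expression: yields values lazily, here consumed one at a time by sum
total = sum(kg * 2 for kg in sample_kgs)

print(doubled)
print(labeled)
print(total)
```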
## `for` loops
We've already seen some examples of `for` loops when we were learning about lists and dictionaries.

We said that a `for` loop cycles through a collection one element at a time: on each iteration, the loop variable is set to the next element, and the indented block runs with that value.

Our syntax was as follows:
```
for dummy_name in collection:
  ## indented code block steps to take
```

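Here's a minimal sketch of that syntax, using two breeds from our data. The loop variable takes on each *element* in turn rather than an index; when you also need the position, the built-in `enumerate` supplies it, and a dictionary's `.items()` method yields key/value pairs. The names `breeds`, `heights`, and `breed_name` are only for illustration.

```python
breeds = ['Afghan Hound', 'Toy Poodle']
heights = {'Afghan Hound': 68.0, 'Toy Poodle': 10.22}

# the loop variable is bound to each element of the list in turn
for breed_name in breeds:
    print(breed_name)

# enumerate yields (index, element) pairs when the position is also needed
for ind, breed_name in enumerate(breeds):
    print(ind, breed_name)

# .items() yields (key, value) pairs from a dictionary
for breed_name, height in heights.items():
    print(breed_name, height)
```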
Let's explore this. We'll build on an expanded version of our dummy dog dataset.

In [None]:
#@markdown We'll just hide the work that we're going to do to generate the dataset.
#@markdown It's a simple, inelegant way of generating just a bit more data.
#@markdown Make sure to execute this cell to have access to the dog_data dataset.

import numpy as np
import pandas as pd

#original data
weight_kgs = [25.0, 20.22, 17.83, 10.22, 8.05]
height_cm = [68.0, 57.99, 45.21, 36.2, 10.22]
neck_circ_cm = [45.2, 50.35, 55.2, 40.88, 5.06]
back_length_cm = [63.2, 50.25, 43.8, 50.1, 12.5]
chest_circ_cm = [78.2, 86.92, 53.9, 71.2, 25.5]
breed = ['Afghan Hound', 'Airedale Terrier', 'Staffordshire Terrier', 'Australian Shepherd', 'Toy Poodle']

dog_data_sm = {'weight_kgs': weight_kgs,
            'height_cm': height_cm,
            'neck_circ_cm': neck_circ_cm,
            'back_length_cm': back_length_cm,
            'chest_circ_cm' : chest_circ_cm,
            'breed': breed}

#extend to 10 elements: append a copy of each numeric column with random noise in [0, 2) added
dog_data = {key : value + (np.array(value) + 2*np.random.rand(len(value))).tolist()
            for key, value in dog_data_sm.items() if key!='breed'}

#fix breed to be twice as long
dog_data['breed'] = breed + breed

#for visualization purposes
pd.DataFrame(dog_data)

### Example 1: Converting to weight_lbs
Let's use a `for` loop to create a new list, `weight_lbs`, that converts each value in `weight_kgs` to pounds (using 1 kg ≈ 2.2 lbs).

In [None]:
# using a for loop:
weight_lbs = []
for weight in weight_kgs:
    weight_lbs.append(round(weight * 2.2, 3))

weight_lbs

### Example 2: A basic linear regression
Let's make a REALLY terrible predictor. We'll try to predict the weight based on the height, circumferences, and back lengths. What is this operation?

$$ \widehat{\text{dog weight}} = w_0 + (w_1 \cdot \text{height}) + (w_2 \cdot \text{neck circumference}) + (w_3 \cdot \text{back length}) + (w_4 \cdot \text{chest circumference}) $$

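To make the formula concrete, here's a sketch that evaluates it for the first dog in the original data (the Afghan Hound), using the same made-up weights that appear in the next cell. Pairing each weight with its feature via the built-in `zip` is one way to write the sum of products; the names `features` and `prediction` are only for illustration.

```python
# made-up model weights: w[0] is the intercept, w[1:] multiply the features
w = [1, 0.3, -0.4, 2.2, 0.5]

# first dog: height, neck circumference, back length, chest circumference
features = [68.0, 45.2, 63.2, 78.2]

# intercept plus the sum of weight * feature over each (weight, feature) pair
prediction = w[0] + sum(wi * xi for wi, xi in zip(w[1:], features))
print(prediction)  # roughly 181.46, versus an actual weight of 25.0 kg
```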
In [None]:
# let's just start by making a list of random weights
w = [1, 0.3, -0.4, 2.2, 0.5]

In [None]:
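# let's do the calculation for a single "row" or "dog" (the first dog, at index 0)
dogweight_pred = (w[0] + w[1] * dog_data['height_cm'][0] + w[2] * dog_data['neck_circ_cm'][0]
                  + w[3] * dog_data['back_length_cm'][0] + w[4] * dog_data['chest_circ_cm'][0])
dogweight_pred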

In [None]:
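# let's use a for loop to create this prediction for each of the dogs
dogweight_preds = [0] * len(dog_data['breed'])

# iterate through all elements based on indices
for ind in range(len(dogweight_preds)):
    dogweight_preds[ind] = (w[0] +
                            w[1] * dog_data['height_cm'][ind] +
                            w[2] * dog_data['neck_circ_cm'][ind] +
                            w[3] * dog_data['back_length_cm'][ind] +
                            w[4] * dog_data['chest_circ_cm'][ind])

#see results
dogweight_preds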

Yay, us! With great effort, we have created a terrible prediction model!!

## List Comprehensions
Another very compact way to represent for loops is through list comprehensions. They're great if you:
* Essentially have one function to apply to a list of elements
* Want to do binary conditional execution on elements of a list
* Want to perform filtering of elements (reduce the size of the list based on some condition)

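Here's a small sketch of each of those three uses, applied to the original five dog weights. The 15 kg cutoff and the `'large'`/`'small'` labels are arbitrary, chosen only for illustration. Notice where the `if` goes: an `if`/`else` expression that picks one of two values comes *before* the `for`, while a filtering `if` comes *after* it.

```python
sample_kgs = [25.0, 20.22, 17.83, 10.22, 8.05]

# 1. apply one operation to every element
sample_lbs = [round(kg * 2.2, 2) for kg in sample_kgs]

# 2. binary conditional: an if/else expression chooses one of two values per element
size_labels = ['large' if kg > 15 else 'small' for kg in sample_kgs]

# 3. filtering: a trailing if keeps only the elements that pass the condition
heavy_kgs = [kg for kg in sample_kgs if kg > 15]

print(sample_lbs)
print(size_labels)
print(heavy_kgs)
```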
The main difficulty with list and dictionary comprehensions is the syntax, since the entire `for` loop is compressed into a single expression. Once you're used to it, though, it offers wonderful functionality: it streamlines the creation of new lists from old ones. Let's look at a brief comparison.

<center>
<img src="https://github.com/vanderbilt-data-science/p4ai-essentials/blob/main/img/iteration_comparison.png?raw=true" width="800">
</center>

### Example 1: Converting to weight_lbs
That's right, we're doing the exact same example again, just now with a list comprehension. The new list should convert each value in `weight_kgs` to pounds (using 1 kg ≈ 2.2 lbs).

For reference, our original answer was:
```
# using a for loop:
weight_lbs = []
for weight in weight_kgs:
    weight_lbs.append(round(weight * 2.2, 3))

weight_lbs
```

In [None]:
# using a list comprehension
weight_lbs = [round(weight * 2.2, 3) for weight in weight_kgs]
weight_lbs

### Example 2: A basic linear regression
Let's make a REALLY terrible predictor. We'll try to predict the weight based on the height, circumferences, and back lengths. What is this operation?

$$ \widehat{\text{dog weight}} = w_0 + (w_1 \cdot \text{height}) + (w_2 \cdot \text{neck circumference}) + (w_3 \cdot \text{back length}) + (w_4 \cdot \text{chest circumference}) $$

For reference, the major part of the operation was this:
```
dogweight_preds[ind] = (w[0]+
                        w[1] * dog_data['height_cm'][ind] +
                        w[2] * dog_data['neck_circ_cm'][ind] +
                        w[3] * dog_data['back_length_cm'][ind] + 
                        w[4] * dog_data['chest_circ_cm'][ind])
```

In [None]:
# let's just start by making a list of random weights
w = [1, 0.3, -0.4, 2.2, 0.5]

In [None]:
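# we'll need to start by making this into a function
# (predict_dog_weight is just the name we chose; it predicts the weight of the dog at index ind)
def predict_dog_weight(ind):
    return (w[0] + w[1] * dog_data['height_cm'][ind] + w[2] * dog_data['neck_circ_cm'][ind]
            + w[3] * dog_data['back_length_cm'][ind] + w[4] * dog_data['chest_circ_cm'][ind])

In [None]:
# let's make our list comprehension!
dogweight_preds = [predict_dog_weight(ind) for ind in range(len(dog_data['breed']))]
dogweight_preds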

Yay, us! With great effort, we have created a terrible prediction model, again!!

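Writing a helper function isn't the only option. As a sketch of an alternative, the built-in `zip` can walk through several columns in parallel, so each iteration receives one dog's measurements directly instead of an index. The small `dog_columns` dictionary below is a hypothetical stand-in for `dog_data` holding just the first two dogs, so the block runs on its own; with the full `dog_data`, you'd pass its four columns to `zip` the same way.

```python
w = [1, 0.3, -0.4, 2.2, 0.5]

# stand-in for dog_data with only the first two dogs (values from the original data)
dog_columns = {'height_cm': [68.0, 57.99],
               'neck_circ_cm': [45.2, 50.35],
               'back_length_cm': [63.2, 50.25],
               'chest_circ_cm': [78.2, 86.92]}

# zip yields one (height, neck, back, chest) tuple per dog, unpacked in the comprehension
dogweight_preds_zip = [w[0] + w[1] * h + w[2] * n + w[3] * b + w[4] * c
                       for h, n, b, c in zip(dog_columns['height_cm'],
                                             dog_columns['neck_circ_cm'],
                                             dog_columns['back_length_cm'],
                                             dog_columns['chest_circ_cm'])]
print(dogweight_preds_zip)
```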
In [None]:
#@title Try it Yourself: Normalizing data
#@markdown Let's take about 10 minutes and practice with for loops vs list comprehensions. Your goal here is to normalize the data
#@markdown for the `weight_kgs` column.
#@markdown You can use np.std to calculate the standard deviation. Calculate the mean on your own, given that you can use
#@markdown the python `sum` function to calculate the sum of a list.
#@markdown The calculation for normalization that we will use is:
#@markdown $$ \frac{value - mean(column)}{np.std(column)} $$
#@markdown
#@markdown 1. Make the above calculation using for loops
#@markdown 2. Make the above calculation using list comprehensions.

#calculate mean and standard deviation once, up front
kgs_mean = sum(weight_kgs)/len(weight_kgs)
kgs_std = np.std(weight_kgs)

#1 for loops
kgs_norm_for = [0]*len(weight_kgs)
for ind, value in enumerate(weight_kgs):
    kgs_norm_for[ind] = (value - kgs_mean)/kgs_std

print(kgs_norm_for)

#2 list comprehension
kgs_norm_lc = [(value - kgs_mean)/kgs_std for value in weight_kgs]
print(kgs_norm_lc)

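If you want to see what `np.std` is computing, here's a sketch of the same normalization written with comprehensions only, without NumPy. `np.std` defaults to the *population* standard deviation (it divides by n, i.e. `ddof=0`), so the version below divides by `len(values)` too and cross-checks the result against Python's `statistics.pstdev`. The variable names are our own.

```python
import math
import statistics

# the first five dog weights from the original data (kg)
values = [25.0, 20.22, 17.83, 10.22, 8.05]

# mean: the sum of the list divided by its length
mean_kgs = sum(values) / len(values)

# population standard deviation: square root of the average squared deviation
# (divides by n, which matches np.std's default of ddof=0)
std_kgs = math.sqrt(sum((v - mean_kgs) ** 2 for v in values) / len(values))

# z-score normalization with a list comprehension
normalized = [(v - mean_kgs) / std_kgs for v in values]

# cross-check against the standard library's population standard deviation
print(math.isclose(std_kgs, statistics.pstdev(values)))  # True
print(normalized)
```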

In [None]:
#Try it yourself solution
#calculate mean


I don't know about you, but this was oddly difficult given that this is base functionality that we use all the time when modeling. I wonder what can help us?