<a target="_blank" href="https://colab.research.google.com/github/JLDC/Data-Science-Fundamentals/blob/master/notebooks/003_dicts-and-loops.ipynb">
    <img src="https://i.ibb.co/2P3SLwK/colab.png"  style="padding-bottom:5px;" />Open this notebook in Google Colab
</a>

___

# Dictionaries and Loops
___
In this notebook, we will introduce two seemingly unrelated concepts: **dictionaries** and **loops**.

Although the two concepts are independent of each other, combining them lets us considerably simplify and tidy up the code we used in the previous notebook to create plots. We will first introduce each concept at a general level and then combine them to improve the code for our iris plot.

## Dictionaries
___
Like lists, dictionaries are collections of multiple objects. It is best to think of a dictionary as a collection of **pairs**, where each pair has a **key** and a **value**. The value can be anything. The key can be almost anything (it must be *hashable*, which rules out mutable objects such as lists), but in practice it will usually be a number or a string. **The main utility of dictionaries is that we can easily look up the value for a specific key!**

Consider the following example.

In [None]:
# A dictionary is always within curly brackets { }
my_dict = {
    "key1": "value1", 
    "key2": "value2"
}

In [None]:
# See how easy it is to find the value for a specific key
my_dict["key1"] # Look up the value stored under the key "key1"

This is similar to how we can grab elements in a list using their index, e.g., `my_list[1]`. But there is a **major** difference: with a dictionary, we don't have to keep track of the position of each element, and the keys can be descriptive strings instead of just numbers.

In [None]:
# Consider a dictionary of item prices
item_prices = {
    "apricots": 8.95,
    "sweetcorn": 1.95,
    "beef": 10.75,
    "brownies": 4.05
}

In [None]:
# We can now easily obtain the price for a specific item
item_prices["brownies"]

This is much nicer than having a list `[8.95, 1.95, 10.75, 4.05]` and having to know that brownies come in the fourth position of our list!
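To make the comparison concrete, here is a small sketch that looks up the same price both ways. The names `item_names`, `price_list`, `position`, `price_from_list` and `price_from_dict` are made up for this illustration and do not appear elsewhere in the notebook.

```python
# Hypothetical list-based version: names and prices live in two separate lists
item_names = ["apricots", "sweetcorn", "beef", "brownies"]
price_list = [8.95, 1.95, 10.75, 4.05]

# With lists, we first have to find the position of the item...
position = item_names.index("brownies")  # Fourth element, i.e., index 3
# ...and only then can we grab the price at that position
price_from_list = price_list[position]

# With a dictionary, the key takes us straight to the value
item_prices = {
    "apricots": 8.95,
    "sweetcorn": 1.95,
    "beef": 10.75,
    "brownies": 4.05
}
price_from_dict = item_prices["brownies"]

# Both approaches give the same price, but the dictionary needs a single step
print(price_from_list, price_from_dict)
```

Note that the list-based version silently breaks if the two lists ever get out of sync, whereas the dictionary keeps each price attached to its item.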

While there are many different uses for dictionaries, we will mostly be using them for plots. Nonetheless, the concept is important, in particular because we will also be using other objects (e.g., dataframes) which take this idea of **key-value pairs** to another level!

In [None]:
# We can grab the keys of a dictionary using the .keys method
item_prices.keys()

In [None]:
# Similarly, we can grab the values using the .values method
item_prices.values()

## Loops
___
It's easier to understand how dictionaries can be used for plots once we understand what loops are. So let's delve into this concept!

Loops are useful for repeated evaluation of expressions. There are two main types of loops, `for`-loops and `while`-loops.

In [None]:
# The following code executes the indented block once for each value of i from 0 to 4
for i in range(5):
  print(i)

In [None]:
# The same can also be done using a while loop
i = 0
while i < 5:
  print(i)
  i += 1 # Increment i by 1

When using a `while`-loop, make sure you have an exit condition. Otherwise, your code will never stop and you will have to interrupt it. For instance, the block
```python
i = 0
while i >= 0:
  print(i)
```
will never reach an exit condition, as `i` is never changed and therefore always stays greater than or equal to 0. In general, try using a `for`-loop instead of a `while`-loop. Of course, there are some cases where using a `while`-loop is inevitable...
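One situation where a `while`-loop feels natural is when we do not know in advance how many iterations we will need. The following is a minimal sketch of such a case. The halving example and the names `x` and `steps` are invented for illustration and are not part of the course material.

```python
# Count how often we can halve 100 before the result drops below 1
x = 100
steps = 0
while x >= 1:   # Exit condition: the loop stops as soon as x < 1
    x = x / 2   # x shrinks at every iteration, so the loop is guaranteed to end
    steps += 1  # Keep track of the number of halvings

print(f"Halved {steps} times, the final value is {x}")
```

Unlike the never-ending block above, the variable in the condition is updated inside the loop, so the exit condition is eventually reached.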

#### ➡️ ✏️ Task 1

Modify the code below (where there is a ✏️) such that it prints the sum of all integers from 1 to 100.

*Hint*: in Python, we can increment a number `x` by 1 using `x += 1` 

In [None]:
mysum = 0 # A variable that keeps track of the sum

# ✏️ ... modify
n = 0 

# Run the loop
for i in range(1, n):
    if i % 5 == 0: # Every 5th step, print the result
        print(f"At iteration {i:>3}, the sum is {mysum:>5}")
    
    # ✏️ ... modify (⚠️ think about incrementation)
    mysum = 0

# Print the final result
print("-"*37)
print(f"The final sum is {mysum}")

# Warn the user in case the final sum is not correct
if mysum != sum(range(101)):
    print(f"\n⛔ There is an error somewhere. The sum should be {sum(range(101))}. ⛔")

`for`-loops can also be used to iterate over **the elements of a collection**. For instance, instead of iterating over the numbers in a specified range, you may want to iterate over the elements of a list or the characters in a string. A `for`-loop can do this as well!

In [None]:
# Specify a text
text = "Data Science Fundamentals"
# Iterate over the letters in the text and print them
for letter in text:
    print(letter)

In [None]:
# Of course, this also works for a list (strings are just like lists in a sense!)
my_list = [1, 2, "a", text, "b", mysum]
# Iterate and print the elements
for el in my_list:
    print(el)

Can you start to see how combining dictionaries and loops can be powerful? Take a look at the following example:

In [None]:
# Create a dictionary which maps each iris species to a color
color_dict = {"setosa": "blue", "virginica": "green", "versicolor": "purple"} 
# Loop over the different iris species and print their color
for species in ["setosa", "virginica", "versicolor"]:
    print(f"The species {species} will have {color_dict[species]} color")
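
The loop above uses a separate list of species names and then looks each one up in the dictionary. As a possible alternative, a dictionary can also be looped over directly. The sketch below relies on the built-in `.items()` method, which yields each key together with its value. The variable `species_names` is a made-up name for this illustration.

```python
color_dict = {"setosa": "blue", "virginica": "green", "versicolor": "purple"}

# .items() hands us each key together with its value
for species, color in color_dict.items():
    print(f"The species {species} will have {color} color")

# The keys alone can be turned into a list, in the order they were inserted
species_names = list(color_dict.keys())
print(species_names)
```

This way, the species names only need to be written down once, in the dictionary itself.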

## Plotting
___
We now know enough about loops and dictionaries to improve the plotting code we used in the last notebook, so let's go ahead and do just that.

In [None]:
import matplotlib.pyplot as plt # Plotting package
import pandas as pd # Dataframe package

# Define the path where the data is stored
DATA_PATH = "https://raw.githubusercontent.com/JLDC/Data-Science-Fundamentals/master/data"

In [None]:
iris = pd.read_csv(f"{DATA_PATH}/iris.csv") # Read iris data

Do you remember the code we used in the previous notebook to create our plot? It looked something like this.

```python
# Create three subsets of the data, one for each species
iris_setosa = iris.loc[iris["species"] == "setosa"]
iris_virginica = iris.loc[iris["species"] == "virginica"]
iris_versicolor = iris.loc[iris["species"] == "versicolor"]

# Create a figure and axis
fig, ax = plt.subplots()
# Add the scatter plot ONLY for the setosa flowers
ax.scatter(iris_setosa["sepal length (cm)"], iris_setosa["petal length (cm)"],
          color="blue", label="setosa")
# Add the scatter plot ONLY for the virginica flowers
ax.scatter(iris_virginica["sepal length (cm)"], iris_virginica["petal length (cm)"],
          color="green", label="virginica")
# Add the scatter plot ONLY for the versicolor flowers
ax.scatter(iris_versicolor["sepal length (cm)"], iris_versicolor["petal length (cm)"],
          color="purple", label="versicolor")
# We need to explicitly add the legend to the axis
ax.legend()
```

___
#### 🤔 Pause and ponder
Can you see how much repetition there is in the above code, and how we could use a loop to improve it?
___

In [None]:
# Create a dictionary which maps each species to the color used in our plot
color_dict = {"setosa": "blue", "virginica": "green", "versicolor": "purple"} 

# Initialize the figure and axis
fig, ax = plt.subplots(figsize=(8, 6))

# Add a new series for each individual species
for species in iris["species"].unique():
    # Subset the data
    subset = iris.loc[iris["species"] == species] 
    # Add a scatter plot for the current species
    ax.scatter(subset["sepal length (cm)"], subset["petal length (cm)"], 
            label=species, color=color_dict[species], alpha=0.6)
# Set the axes labels
ax.set_xlabel("Sepal length in cm")
ax.set_ylabel("Petal length in cm")
ax.legend() # Add the legend (display the labels)

#### ➡️ ✏️ Task 2
Look at the above code and compare it to what we did in the previous notebook.

What is happening in the line where we create the `subset`? What is happening in the `ax.scatter(...)` call? In particular, look at the `label` and `color` parameters. Discuss with your classmates.