# 4.4 Exercises

In [6]:
# Required imports for the exercises
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

## Exercise 1: NumPy arrays

NumPy offers many functions to generate n-dimensional arrays without explicitly writing out the lists. For example, you can use the `np.zeros()` function to create an array full of zeros by providing the desired shape as the first argument.
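As a minimal sketch of how the shape argument works, the following creates a two-dimensional array of zeros. The variable name `grid` and the shape `(3, 4)` are chosen only for illustration.

```python
import numpy as np

# The shape is passed as a tuple: 3 rows and 4 columns
grid = np.zeros((3, 4))

print(grid.shape)  # (3, 4)
print(grid)        # three rows, each holding four zeros
```

By default the entries are floating-point numbers, which is why they print as `0.` rather than `0`.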

1. Have a look at the [documentation](https://numpy.org/doc/2.0/reference/generated/numpy.zeros.html) and then create a three-dimensional array of zeros with shape `(2,2,2)`.
2. How would such a shape look in the real world? Print the array on the screen. Make sure you understand how the values displayed visually map onto the dimensions of the array.
3. Can you generate such an array with only the `np.array()` function? *Hint: A three-dimensional array can be seen as a list of lists of lists.*

Another useful function is `np.random.randint()`, which can create arrays filled with random integers.

4. Look up the documentation and then create a 5x5 NumPy array of random integers between 1 and 100 and print the sum of all the elements.
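One detail worth checking in the documentation is that the upper bound of `np.random.randint()` is exclusive. The sketch below simulates dice rolls to show this. The name `rolls` and the shape `(2, 3)` are illustrative assumptions, not part of the exercise.

```python
import numpy as np

# low=1 is inclusive, high=7 is exclusive, so the values lie in 1..6
rolls = np.random.randint(1, 7, size=(2, 3))

print(rolls)
print(rolls.min() >= 1, rolls.max() <= 6)
```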

In [None]:
# Exercise 1

## Exercise 2: More Arrays

Create a NumPy array of shape (100,) with values evenly spaced between 0 and 50. For this array, compute the:

- Mean
- Standard deviation
- Sum of all values that are greater than 25

*Hint: Useful functions are `np.linspace()`, `np.mean()`, `np.std()`, and `np.sum()`.*
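The functions from the hint can be tried on a much smaller array first. This is only a sketch with made-up values (five points between 0 and 1 and a threshold of 0.5). It also shows how a boolean mask selects the elements that satisfy a condition.

```python
import numpy as np

# Five evenly spaced values; both endpoints are included
values = np.linspace(0, 1, 5)   # 0.0, 0.25, 0.5, 0.75, 1.0

mean_value = np.mean(values)
std_value = np.std(values)

# The comparison yields a boolean array that is used as an index
large_sum = np.sum(values[values > 0.5])   # 0.75 + 1.0

print(mean_value, std_value, large_sum)
```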


In [None]:
# Exercise 2

## Exercise 3: Pandas DataFrames

1. Please create a Pandas DataFrame with columns "Name", "Age", and "City" for 5 people and print out the first two rows.
2. Load a CSV file into a DataFrame, filter out all rows where the "Age" column is greater than 30, and calculate the average age for the remaining rows.
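If you do not have a CSV file at hand, one possible approach is to read CSV text from memory. The sketch below uses `io.StringIO` as a stand-in for a file. The columns "Animal" and "Legs" and all values are invented for illustration.

```python
import io

import pandas as pd

# CSV content held in a string instead of a file on disk (hypothetical data)
csv_text = "Animal,Legs\ncat,4\nspider,8\nbird,2\n"
animals = pd.read_csv(io.StringIO(csv_text))

print(animals.head(2))   # the first two rows

# Keep only the rows with at most four legs, then average that column
few_legs = animals[animals["Legs"] <= 4]
mean_legs = few_legs["Legs"].mean()   # (4 + 2) / 2
print(mean_legs)
```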

In [None]:
# Exercise 3

## Exercise 4: More DataFrames

1. Load the Yeatman data from https://yeatmanlab.github.io/AFQBrowser-demo/data/subjects.csv into a `DataFrame`.
2. Print the head of the `DataFrame` to get an overview of what is in there.
3. Add a filter column for people younger than 30 and call it `'Age < 30'`.
4. Calculate the average age and IQ for people younger than 30 as well as for the older people and compare the results.

*Hints:*

- *The conditions are mutually exclusive (i.e., a person is either younger than 30 or not), meaning you only need a single filter column to cover both conditions.*
- *You can simply use the tilde operator (`~`) for indexing, which in pandas means "not". Indexing people 30 and older thus looks like this: `df[~df['Age < 30']]`.*
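The two hints can be sketched on a small toy table. The DataFrame `toy`, its columns "Height" and "Score", and the threshold of 170 are hypothetical and unrelated to the Yeatman data.

```python
import pandas as pd

toy = pd.DataFrame({
    "Height": [150, 182, 171, 165],
    "Score": [3, 8, 5, 6],
})

# A single boolean column covers both groups
toy["Height < 170"] = toy["Height"] < 170

short = toy[toy["Height < 170"]]    # rows where the condition is True
tall = toy[~toy["Height < 170"]]    # rows where the condition is False

short_mean = short["Score"].mean()  # (3 + 6) / 2
tall_mean = tall["Score"].mean()    # (8 + 5) / 2
print(short_mean, tall_mean)
```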

In [None]:
# Exercise 4

## Exercise 5: Plotting with matplotlib

The `plot` method has several other keyword arguments that control the appearance of
its results. For example, the `color` keyword argument controls the color of the lines.
One way to specify the color of each line is to pass a string naming one of the
colors listed in the [Matplotlib documentation](https://matplotlib.org/stable/gallery/color/named_colors.html). Use this keyword argument to make the three lines in the plot below easier to tell apart, choosing colors that you find pleasing.
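As a rough illustration of the `color` keyword argument, here is a sketch on toy data rather than the Harlow data. The color names "teal" and "darkorange" are just two entries from the list of named colors.

```python
from matplotlib import pyplot as plt

x = [1, 2, 3]

fig, ax = plt.subplots()

# ax.plot returns a list of lines; the trailing comma unpacks the single line
line_a, = ax.plot(x, [1, 2, 3], color="teal", label="A")
line_b, = ax.plot(x, [3, 2, 1], color="darkorange", label="B")

ax.legend()
# In a script, call plt.show() to display the figure.
```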

In [None]:
from matplotlib import pyplot as plt

trials = [1, 2, 3, 4, 5, 6]
first_block = [50, 51.7, 58.8, 68.8, 71.9, 77.9]
middle_block = [50, 78.8, 83, 84.2, 90.1, 92.7]
last_block = [50, 96.9, 97.8, 98.1, 98.8, 98.7]

fig, ax = plt.subplots()

ax.plot(trials, first_block, marker='o', linestyle='--', label="First block")
ax.plot(trials, middle_block, marker='v', linestyle='--', label="Middle block")
ax.plot(trials, last_block, marker='^', linestyle='--', label="Last block")

ax.legend()
ax.set(xlabel='Trials', ylabel='Percent correct', title='Harlow learning experiment')

plt.show()

## Exercise 6: More plotting with matplotlib

Plot two line graphs in the same plot:

- sin(x) and cos(x) for x values from 0 to 2π.
- Add a legend and set different line styles for the two functions.

In [None]:
# Exercise 6

## Voluntary exercise 1: Advanced NumPy indexing

1. Generate a random 10x10 matrix.
2. Find the index of the largest value by using `np.where()` and replace that value with 0.
3. Find the row-wise and column-wise sums of the updated array and print the result.
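The mechanics of these steps can be sketched on a fixed 2x2 array, where the result is easy to verify by hand. The array `scores` and its values are made up for illustration.

```python
import numpy as np

scores = np.array([[1, 9],
                   [4, 2]])

# np.where returns one index array per dimension for the matching entries
rows, cols = np.where(scores == scores.max())
scores[rows, cols] = 0          # the 9 at position (0, 1) becomes 0

row_sums = scores.sum(axis=1)   # sum across the columns of each row: [1, 6]
col_sums = scores.sum(axis=0)   # sum down the rows of each column: [5, 2]
print(row_sums, col_sums)
```

Note that if the maximum occurs more than once, this approach replaces every occurrence.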

In [None]:
# Voluntary exercise 1

## Voluntary exercise 2: Advanced Pandas operations and plotting

Given the DataFrame with columns "Product", "Sales", and "Date":

1. Create a new column called `Cumulative_Sale` showing the cumulative sales by product over time
2. Plot a line graph of cumulative sales for each product. 


*Hints:*
- *DataFrames have a `.cumsum()` method which you can use*
- *You can loop over the unique products like so: `for product in df['Product'].unique():`*
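A cumulative sum can also be computed per group by combining `.groupby()` with `.cumsum()`. This is one possible approach rather than the only one, and it is shown here on a hypothetical table `games` with columns "Team" and "Points".

```python
import pandas as pd

games = pd.DataFrame({
    "Team": ["X", "Y", "X", "Y"],
    "Points": [1, 10, 2, 20],
})

# The running total restarts for each team and keeps the original row order
games["Running_Points"] = games.groupby("Team")["Points"].cumsum()
print(games)
```

The rows are assumed to be in chronological order already. If they are not, sort by the date column first.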

In [None]:
df = pd.DataFrame({
    'Product': ['A', 'A', 'B', 'B', 'A', 'B'],
    'Sales': [100, 150, 200, 50, 300, 400],
    'Date': pd.date_range(start='2023-01-01', periods=6, freq='D')
})

## Voluntary exercise 3: Advanced plotting layouts

1. Create a 3x3 grid of subplots
2. For each subplot, plot a different function (e.g., sin(x), cos(x), tan(x), etc.)
3. Customize the titles, axes, and tick labels of each subplot.

*Hints:*
- *Try to do the plotting in a single loop. The loop could look like this: `for ax, func in zip(axes.flat, functions):`, with `axes` being the array of axes returned by `plt.subplots` and `functions` being a list containing the relevant functions.*
- *You can find a list of usable mathematical functions [here](https://numpy.org/doc/stable/reference/routines.math.html)*
- *The name of the mathematical function can be accessed through the `func.__name__` attribute.*
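The loop pattern from the hints can be sketched on a smaller 1x2 grid. The choice of `np.sqrt` and `np.exp` and the range of `x` are illustrative assumptions.

```python
import numpy as np
from matplotlib import pyplot as plt

x = np.linspace(0.1, 2, 50)
functions = [np.sqrt, np.exp]

fig, axes = plt.subplots(1, 2)

# axes.flat iterates over the subplots one by one, whatever the grid shape
for ax, func in zip(axes.flat, functions):
    ax.plot(x, func(x))
    ax.set_title(func.__name__)   # e.g. "sqrt"
    ax.set_xlabel("x")

titles = [ax.get_title() for ax in axes.flat]
print(titles)
# In a script, call plt.show() to display the figure.
```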

In [None]:
# Voluntary exercise 3