# Exercises

> Possible solutions can be found in the [solutions.ipynb](solutions.ipynb) notebook.

### <font color="red"> *Exercise 1:* Widgets for interactive data fitting </font>

Widgets are fun, but they can also be useful. Here's an example showing how you can fit noisy data interactively.

1. Execute the cell below. It fits a 5th-order polynomial to a Gaussian function with added random noise.
2. Use the `@interact` decorator together with the function `fit` so that you can visualize fits with polynomial orders `n` ranging from, say, 3 to 30.

In [None]:
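import numpy as np
import matplotlib.pyplot as plt

# Gaussian function with amplitude a, width parameter b and center c
def gauss(x,param):
    [a,b,c] = param
    return a*np.exp(-b*(x-c)**2)

# Gaussian array y on the interval -5 < x < 5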
nx = 100
x = np.linspace(-5.,5.,nx)
p = [2.0,0.5,1.5] # some parameters
y = gauss(x,p)

# add some noise
noise = np.random.normal(0,0.2,nx)
y += noise

# fit a polynomial of order n to the noisy data and plot the result
def fit(n):
    pfit = np.polyfit(x,y,n)
    yfit = np.polyval(pfit,x)
    plt.plot(x,y,"r",label="Data")
    plt.plot(x,yfit,"b",label="Fit")
    plt.legend()
    plt.ylim(-0.5,2.5)
    plt.show()
    
# call function fit
# these lines are unnecessary when you use the interact widget
n=5
fit(n)
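
Before moving the slider, it can help to see numerically how the misfit changes with the polynomial order. The sketch below is only an illustration: it rebuilds similar noisy data with a fixed seed (an assumption, so the numbers are reproducible), and `rms_residual` is a hypothetical helper, not part of the exercise code.

```python
import numpy as np

# Hypothetical stand-in for the noisy data above, with a fixed seed
rng = np.random.default_rng(0)
x = np.linspace(-5.0, 5.0, 100)
y = 2.0 * np.exp(-0.5 * (x - 1.5) ** 2) + rng.normal(0, 0.2, x.size)

def rms_residual(n):
    """Root-mean-square misfit of an order-n polynomial fit to (x, y)."""
    coeffs = np.polyfit(x, y, n)
    return float(np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2)))

# Compare a few polynomial orders
orders = [3, 5, 9]
residuals = {n: rms_residual(n) for n in orders}
for n, r in residuals.items():
    print(f"order {n:2d}: RMS residual = {r:.3f}")
```

A least-squares fit of higher order can never fit the same data worse, so the residual should not increase with `n`. A smaller residual does not by itself mean a better model, which is what the interactive plot lets you judge by eye.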

### <font color="red"> *Exercise 2a:* Cell profiling </font>

This exercise is about cell profiling, but you will also get practice in working with magics and cells.

1. Load the `random_walk.py` code (in the current directory) into a cell below with the appropriate magic command.
    - Note that you have to rerun the cell after the content is loaded.
2. Split up the functions over cells (either via the Edit menu or the keyboard shortcut `Ctrl-Shift-minus`).
3. Initializing `n` and calling `walk()` doesn't need to be in a main function, so you can remove the `__name__` stuff.
4. Plot the random walk trajectory.
5. Time the execution of `walk()` with a line magic.
6. Run the `%%prun` cell profiler.
7. Can you spot a little mistake which is slowing down the code?
8. In the next exercise you will install a line profiler which will more easily expose the performance mistake.
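
The timing and profiling magics build on Python's standard `timeit` and profiling modules, so roughly the same measurements can be taken in plain Python. The sketch below assumes a hypothetical workload `slow_sum` (it is not the random walk code) and only illustrates what kind of information each tool reports.

```python
import cProfile
import io
import pstats
import timeit

def slow_sum(n):
    """Hypothetical workload: sum of squares using a plain Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total

# Rough counterpart of %timeit: repeat the call and keep the best time
best = min(timeit.repeat(lambda: slow_sum(10_000), number=10, repeat=3))
print(f"best of 3: {best / 10:.6f} s per call")

# Rough counterpart of %prun: collect per-function call statistics
profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(10_000)
profiler.disable()

# Format the statistics, sorted by cumulative time, into a string
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The profiler report lists functions, not individual lines, which is why a mistake inside a single function can be hard to spot from this output alone.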

### <font color="red"> *Exercise 2b:* Installing a magic command for line profiling </font>



Magics can be installed using `pip` and loaded like plugins using the `%load_ext` magic. You will now install a line profiler to get a more detailed profile, and hopefully gain some insight into how to speed up the code from the previous exercise.

1. First install the line profiler using `!pip install line_profiler`.
2. Next load it using `%load_ext line_profiler`.
3. Have a look at the new magic command that has been enabled with `%lprun?`.
4. Load `random_walk.py` into a new cell and execute it.
5. In a new cell, run the line profiler on each function of the example code using something like:   
`%lprun -f <func1> -f <func2> -f <func3> main()`
6. Inspect the output. Can you more easily see the mistake now?

### <font color="red"> *Exercise 3:* Data analysis with pandas dataframes </font>

Data science and data analysis are key use cases of Jupyter. In this exercise you will familiarize yourself with dataframes and various inbuilt analysis methods in the high-level `pandas` data exploration library. A dataset containing information on Nobel prizes will be used.

1. Start by navigating in the File Browser to the `data/` subfolder, and double-click on the `nobels.csv` dataset. This will open JupyterLab's inbuilt data browser.
2. Have a look at the data, column names, etc.
3. In your own notebook, import the `pandas` module and load the dataset into a *dataframe*:  

```python
import pandas as pd
nobel = pd.read_csv("data/nobels.csv")
```

4. The "share" column of the dataframe contains the number of Nobel recipients that shared the prize. Have a look at the statistics of this column using  

```python
nobel["share"].describe()
```

5. The `describe()` method is smart about data types. Try this:  
```python
nobel["bornCountryCode"].describe()
```

    - What country has received the largest number of Nobel prizes, and how many?
    - How many countries are represented in the dataset?
6. Now analyze the age of prize recipients. You first need to convert the "born" column to datetime format: 

```python
nobel["born"] = pd.to_datetime(nobel["born"], 
                               errors ='coerce')
```

7. Next subtract the birth year from the year of receiving the prize, insert the result into a new column "age", and print the "surname" and "age" of the first 10 entries using the `head()` method:
```python
nobel["age"] = nobel["year"] - nobel["born"].dt.year
nobel[["surname","age"]].head(10)
```

8. Now plot results in two different ways:

```python
nobel["age"].plot.hist(bins=[20,30,40,50,60,70,80,
                             90,100],alpha=0.6);
nobel.boxplot(column="age", by="category")
```

9. Which Nobel laureates have been Swedish? See if you can use the `nobel.loc[CONDITION]` statement to extract the relevant rows from the `nobel` dataframe using the appropriate condition.

10. Finally, try the powerful `groupby()` method to analyze the number of Nobel prizes per country, and visualize it with the high-level `seaborn` plotting library. 
 - First add a column "number" to the `nobel` dataframe containing 1's (to enable the counting below).
 - Then extract any 4 countries (replace below) and create a subset of the dataframe:
```python
countries = [COUNTRY1, COUNTRY2, COUNTRY3, COUNTRY4]
nobel2 = nobel.loc[nobel['bornCountry'].isin(countries)]
```
 - Next use `groupby()` and `sum()`, and inspect the resulting dataframe:
```python
# numeric_only=True skips text and datetime columns, which cannot be summed
nobels_by_country = nobel2.groupby(['bornCountry',"category"], 
                                   sort=True).sum(numeric_only=True)
```
 - Next use the `pivot_table` method to reshape the dataframe to a spreadsheet-like structure, and display the result:
```python
table = nobel2.pivot_table(values="number", index="bornCountry", 
                           columns="category", aggfunc="sum")
```
 - Finally visualize using a heatmap:
```python
import seaborn as sns
sns.heatmap(table,linewidths=.5);
```
    - Have a look at the help page for `sns.heatmap` and see if you can find an input parameter which annotates each cell in the plot with the count number.
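
The selection and reshaping steps above can be tried out on a tiny dataframe first, where the result is easy to check by hand. The sketch below uses made-up toy data (not taken from `nobels.csv`) and a different condition than the one asked for in step 9.

```python
import pandas as pd

# Hypothetical toy data, not taken from nobels.csv
toy = pd.DataFrame({
    "surname": ["A", "B", "C", "D", "E"],
    "bornCountry": ["Sweden", "Sweden", "Norway", "Sweden", "Norway"],
    "category": ["physics", "physics", "peace", "chemistry", "peace"],
})
toy["number"] = 1  # helper column so that sums become counts

# A boolean mask inside .loc keeps only the rows where the condition is True
peace_rows = toy.loc[toy["category"] == "peace"]

# groupby + sum collapses rows that share the same (country, category) pair
counts = toy.groupby(["bornCountry", "category"])["number"].sum()

# pivot_table spreads the categories over columns; missing pairs become NaN
table = toy.pivot_table(values="number", index="bornCountry",
                        columns="category", aggfunc="sum")

print(peace_rows)
print(counts)
print(table)
```

Combinations that never occur in the data, such as a Norwegian physics entry here, show up as `NaN` in the pivoted table rather than as 0.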


### <font color="red"> *Exercise 4:* Defining your own custom magic command </font>


It is possible to create new magic commands using the `@register_cell_magic` decorator from the `IPython.core.magic` module. Here you will create a cell magic command that compiles C++ code and executes it.


> This example has been adapted from the [IPython Minibook](http://ipython-books.github.io/), by Cyrille Rossant, Packt Publishing, 2015.


1. First import `register_cell_magic`:

```python
from IPython.core.magic import register_cell_magic
```

2. Next execute the code cell at the end of this exercise to register the new cell magic command. You can then use the magic by starting a cell with `%%cpp`.

3. Write some C++ code into a cell and try executing it.

4. To be able to use the magic in another notebook, you need to add the following function at the end of the cell and then write the cell to a file on your `PYTHONPATH`. If the file is called `cpp_ext.py`, you can then load it with `%load_ext cpp_ext`.

```python
def load_ipython_extension(ipython):
    ipython.register_magic_function(cpp,'cell')
```


In [None]:
@register_cell_magic
def cpp(line, cell):
    """Compile and execute C++ code, and print its standard output."""

    # We first retrieve the current IPython interpreter instance.
    ip = get_ipython()
    # We define the source and executable filenames.
    source_filename = '_temp.cpp'
    program_filename = '_temp'
    # We write the code to the C++ file.
    with open(source_filename, 'w') as f:
        f.write(cell)
    # We compile the C++ code into an executable
    # (the compiler messages are captured but not displayed).
    compile_output = ip.getoutput("g++ {0:s} -o {1:s}".format(
        source_filename, program_filename))
    # We execute the executable and return the output.
    output = ip.getoutput('./{0:s}'.format(program_filename))
    print('\n'.join(output))
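
The cell magic above follows a write, run, capture pattern: save the cell text to a file, run an external program on it, and collect the output. The sketch below shows the same pattern with the standard `subprocess` module. It is only an illustration under two assumptions: `run_source` is a hypothetical helper, and the Python interpreter stands in for `g++` so that the example runs without a compiler.

```python
import os
import subprocess
import sys
import tempfile

def run_source(source, command):
    """Write `source` to a temporary file, run `command + [path]`, return stdout.

    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "snippet.py")
        with open(path, "w") as f:
            f.write(source)
        # check=True turns a non-zero exit status into an exception
        completed = subprocess.run(command + [path], capture_output=True,
                                   text=True, check=True)
    return completed.stdout

# The Python interpreter stands in for g++ (assumption for this sketch)
output = run_source('print("Hello from a child process")', [sys.executable])
print(output, end="")
```

Unlike the cell magic, this variant checks the exit status, so a failing command raises an error instead of silently producing no output.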


### <font color="red"> *Exercise 5:* Parallel Python with ipyparallel </font>

Traditionally, Python has been considered to support parallel programming poorly ([see "GIL"](https://en.wikipedia.org/wiki/Global_interpreter_lock)), with "proper" parallel programming left to "heavy-duty" languages like Fortran or C/C++ where OpenMP and MPI can be utilised.

However, IPython now supports many different styles of parallelism which can be useful to researchers. In particular, `ipyparallel` enables all types of parallel applications to be developed, executed, debugged, and monitored interactively. Possible use cases of `ipyparallel` include:
- Quickly parallelize algorithms that are embarrassingly parallel using a number of simple approaches.
- Run a set of tasks on a set of CPUs using dynamic load balancing.
- Develop, test and debug new parallel algorithms (that may use MPI) interactively.
- Analyze and visualize large datasets (that could be remote and/or distributed) interactively using IPython.

This exercise is just to get started. For a thorough treatment, see the [official documentation](https://ipyparallel.readthedocs.io/en/latest/) and [this detailed tutorial](https://github.com/DaanVanHauwermeiren/ipyparallel-tutorial).

1. First install `ipyparallel` using `conda` or `pip`. Open a terminal window inside JupyterLab and do the installation.
2. After installing `ipyparallel`, you need to start an "IPython cluster". Do this in the terminal with `ipcluster start`.
3. Then import `ipyparallel` in your notebook, initialize a `Client` instance, and create a *DirectView* object for direct execution on the engines:
```python
import ipyparallel as ipp
client = ipp.Client()
print("Number of ipyparallel engines:", len(client.ids))
dview = client[:]
```
4. You have now started the parallel engines. To run something simple on each one of them, try the `apply_sync()` method:
```python
dview.apply_sync(lambda : "Hello, World")
```
5. A serial evaluation of squares of integers can be seen in the code snippet below. 
```python
serial_result = list(map(lambda x:x**2, range(30)))
```
 - Convert this to a parallel calculation on the engines using the `map_sync()` method of the DirectView instance. Time both serial and parallel versions using `%%timeit -n 1`.

6. You will now parallelize the evaluation of $\pi$ using a Monte Carlo method. First import the required names, and push the `random` function to the engines:
```python
from random import random
from math import pi
dview['random'] = random
```
Then execute the following code in a cell. The function `mcpi` is a Monte Carlo method to calculate $\pi$. Time the execution of this function using `%timeit -n 1` and a sample size of 10 million (`int(1e7)`).
```python
def mcpi(nsamples):
    s = 0
    for i in range(nsamples):
        x = random()
        y = random()
        if x*x + y*y <= 1:
            s+=1
    return 4.*s/nsamples
```    
Now take the incomplete function below which takes a `DirectView` object and a number of samples, divides the number of samples between the engines, and calls `mcpi()` with a subset of the samples on each engine. Complete the function (by replacing the `____` fields), call it with $10^7$ samples, time it and compare with the serial call to `mcpi()`.
```python
def multi_mcpi(dview, nsamples):
    # get total number of target engines
    p = len(____.targets)
    if nsamples % p:
        # ensure even divisibility
        nsamples += p - (nsamples%p)
    
    subsamples = ____//p
    
    ar = dview.apply(mcpi, ____)
    return sum(ar)/____
```
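
The divisibility step in `multi_mcpi` rounds the sample count up to the next multiple of the number of engines, so that every engine gets the same share. The arithmetic can be checked on its own with the small sketch below, where `pad_to_multiple` is a hypothetical helper introduced only for illustration.

```python
def pad_to_multiple(nsamples, p):
    """Round nsamples up to the next multiple of p (hypothetical helper)."""
    if nsamples % p:
        # Add what is missing to reach the next multiple of p
        nsamples += p - (nsamples % p)
    return nsamples

# (sample count, number of engines) pairs to try
cases = [(10, 4), (12, 4), (10_000_000, 3)]
padded = [pad_to_multiple(n, p) for n, p in cases]
for (n, p), m in zip(cases, padded):
    print(f"{n} samples on {p} engines -> {m} samples")
```

At most `p - 1` extra samples are added, which is negligible for a sample size of $10^7$.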

Final note: While parallelizing Python code is often worth it, there are other ways to get higher performance out of Python code. In particular, fast numerical packages like [NumPy](http://www.numpy.org/) should be used, and significant speedup can be obtained with just-in-time compilation using [Numba](https://numba.pydata.org/) and/or C extensions built with [Cython](http://cython.org/).
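
As one illustration of the NumPy route, the Monte Carlo loop can be replaced by array operations. This is a minimal sketch, not a tuned implementation: `mcpi_numpy` is a hypothetical name, and the fixed seed is an assumption made only to get reproducible output.

```python
import numpy as np

def mcpi_numpy(nsamples, seed=None):
    """Monte Carlo estimate of pi using vectorized NumPy operations."""
    rng = np.random.default_rng(seed)
    # Draw all points at once instead of looping in Python
    x = rng.random(nsamples)
    y = rng.random(nsamples)
    # Count the points that fall inside the quarter circle of radius 1
    inside = np.count_nonzero(x * x + y * y <= 1.0)
    return 4.0 * inside / nsamples

estimate = mcpi_numpy(1_000_000, seed=42)
print(f"pi is approximately {estimate:.4f}")
```

The trade-off is memory: this version holds all samples in arrays at once, so for very large sample sizes the work would need to be split into chunks.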
