# Software Carpentry Exercises

**Legend**  
❓ = Question to cover in the breakout session  
❔ = Optional question / homework  
💡 = Hints  
🔎 = Solution

**Solutions**  
Solutions are provided for each question. To view a solution, uncomment the line `# %load solutions/<solution.py>` (`Ctrl + /`) and execute the cell: the first execution shows the solution code, the second runs it.

## Breakout Session 1

### ❓ <ins>Arithmetic with different types</ins>

Where reasonable, `float()` will convert a string to a floating point number, and `int()` will convert a floating point number to an integer:

In [None]:
print("string to float:", float("3.4"))
print("float to int:", int(3.4))
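
As a small illustrative sketch (not part of the original exercise), the snippet below shows two details that matter here: `int()` drops the fractional part rather than rounding, and a conversion that is not "reasonable" raises a `ValueError`. The helper name `try_float` and the sample values are made up for this example.

```python
def try_float(text):
    """Return float(text), or None if text does not look like a number."""
    try:
        return float(text)
    except ValueError:
        return None

# int() truncates toward zero; it does not round
print("int truncates toward zero:", int(3.9), int(-3.9))
print("numeric string:", try_float("2.5"))
print("non-numeric string:", try_float("oxygen"))
```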

**Question**  
Given this information, which of the following will return the floating point number `2.0`? Discuss your answer.  
_Note: there may be more than one right answer._

In [None]:
first = 1.0
second = "1"
third = "1.1"

1. `first + float(second)`
1. `float(second) + float(third)`
1. `first + int(third)`
1. `first + int(float(third))`
1. `int(first) + int(float(third))`
1. `2.0 * second`

In [None]:
# Your code here


<details>
<summary>🔎 Solution</summary>
Answer: 1 and 4
</details>

### ❓ <ins>Slicing strings</ins>

“Indexing” means referring to an element of an iterable by its position within the iterable. “Slicing” means getting a subset of elements from an iterable based on their indices. We can take slices of character strings as well:

In [None]:
element = "oxygen"
print('first three characters:', element[0:3])
print('last three characters:', element[3:6])
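
To make the distinction between indexing and slicing concrete, here is a brief sketch using a different, made-up string (`"helium"`). It illustrates that an index returns a single character, that a slice stops just before its end position, and that a slice running past the end of the string is quietly shortened, whereas an out-of-range index raises an error.

```python
word = "helium"  # example string chosen for illustration

single = word[1]        # indexing: one character
chunk = word[1:4]       # slicing: positions 1, 2 and 3 (4 is excluded)
too_long = word[3:100]  # slices are clipped to the length of the string

print(single, chunk, too_long)

try:
    word[100]           # indexing past the end is an error
except IndexError as err:
    print("IndexError:", err)
```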

**Question**  

What are the values of the following slices?
```python
element[4]
element[4:]
element[:] 
element[-1]
element[-3:]
```

In [None]:
# Your code here


#### 🔎 Solution

In [None]:
# %load solutions/slicing_strings.py

### ❓ <ins>Slicing lists with steps</ins>

We’ve seen how to use slicing to take single blocks of successive entries from a sequence. But what if we want to take a subset of entries that aren’t next to each other in the sequence? We can achieve this by providing a third argument to the range within the brackets, called the step size.

The full syntax for creating slices is `[begin:end:step]`, although you will most often see the short-hand notation used in the exercise above.

The example below shows how you can take every third entry in a list:

In [None]:
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
subset = primes[0:12:3]
print('subset', subset)
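
One way to reason about a stepped slice, sketched here with the same `primes` list: for non-negative values within the length of the list, `primes[begin:end:step]` selects the positions that `range(begin, end, step)` would generate. The names `positions` and `by_hand` are chosen for this example only.

```python
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]

positions = list(range(0, 12, 3))          # indices visited by the slice
by_hand = [primes[i] for i in positions]   # pick those entries one by one

print("positions:", positions)
print("by hand:", by_hand)
print("same as slice:", by_hand == primes[0:12:3])
```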

Given the following list of months:

In [None]:
months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']

**Questions**

1. What slice of `months` will produce the following output: `['jan', 'mar', 'may', 'jul', 'sep', 'nov']`?

1. Given the short-hand notation we used for the character string in the previous exercise (i.e. `element[:2] == element[0:2]`), can you find the short-hand notation for question 1? Which do you find easier to read?

1. Using the step size parameter, can you think of a way to reverse the list?

<details>
<summary>💡 Click here for hints</summary>

- Note that `months[-1]` will retrieve the last element (`dec`), but `months[0:-1]` will not include it. Slicing is "up to, but not including".
- The step parameter accepts negative values for reversing the slicing direction.

</details>

In [None]:
# Your code here
month_slice = months[::-1]
print(month_slice)

#### 🔎 Solution

In [None]:
# %load solutions/slicing_steps.py

## Breakout Session 2

### ❓ <ins>Change in inflammation</ins>

The patient data is longitudinal in the sense that each row represents a series of observations relating to one individual. This means that the change in inflammation over time is a meaningful concept.

The `numpy.diff()` function takes an array and returns the differences between two successive values. Let’s use it to examine the changes per day across the first week of patient 3 from our inflammation dataset:

In [None]:
# Load data
import numpy
data = numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')

In [None]:
patient3_week1 = data[3, :7]
print(patient3_week1)

Calling `numpy.diff(patient3_week1)` would do the following calculations:

```python
[ 0 - 0, 2 - 0, 0 - 2, 4 - 0, 2 - 4, 2 - 2 ]
```
and return the 6 difference values in a new array:

In [None]:
numpy.diff(patient3_week1)
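
For a one-dimensional array, the same result can be obtained by subtracting the array without its last element from the array without its first element. The sketch below hard-codes the seven values implied by the calculation above so that it runs without the data file; `week` and `manual` are names chosen for this example.

```python
import numpy

# Values implied by the worked calculation above (hard-coded for illustration)
week = numpy.array([0., 0., 2., 0., 4., 2., 2.])

manual = week[1:] - week[:-1]   # each value minus the one before it
print(manual)
print(numpy.array_equal(manual, numpy.diff(week)))
```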

**Questions**

1. When calling `numpy.diff()` with a multi-dimensional array, an `axis` argument may be passed to the function to specify which axis to process. When applying `numpy.diff()` to our 2D inflammation array `data`, which axis would we specify?

1. If the shape of an individual data file is `(60, 40)` (60 rows and 40 columns), what would the `shape` of the array be after you run the `diff()` function and why?

1. How would you find the largest change in inflammation for each patient?

1. Plot a histogram using `matplotlib.pyplot.hist()` of the change per day for patient 3 across all 40 days. 

<details>
<summary>💡 Click here for hints</summary>

1) With `axis=0`, `numpy.diff()` computes the difference between consecutive patients (rows) for each day. With `axis=1`, it computes the difference between consecutive days (columns) for each patient. Since we want to follow how each patient's inflammation changes over the course of the clinical trial, the difference between consecutive days is the more meaningful one.

2) Note that the array of differences is shorter by one element.
    
3) By applying the `numpy.max()` function to the result of `numpy.diff()`, along the same axis, you will get the largest difference between consecutive days for each patient.
</details>

In [None]:
# Your code here


#### 🔎 Solution

In [None]:
# %load solutions/inflammation.py

### ❔ <ins>Plotting differences</ins>

Plot the difference between the average inflammations reported in the first and second datasets (stored in `inflammation-01.csv` and `inflammation-02.csv`, respectively), i.e., the difference between the leftmost plots of the first two figures we have plotted so far.

Steps:

1. Import libraries
1. Import data
1. Calculate difference
1. Create and annotate figure

In [None]:
# Your code here


#### 🔎 Solution

In [None]:
# %load solutions/plotting_differences.py

## Breakout Session 3

### ❓ <ins>Summing a list</ins>

Write a loop that calculates the sum of the elements in a list by adding each element to a running total, then prints the final value. For example, `[124, 402, 36]` prints `562`.

<details>
<summary>💡 Click here for hints</summary>

**Steps:**  
1. Define the input variable `numbers` and output variable `summed`
1. Create a for-loop over the elements of `numbers` and add each `num` to `summed`, e.g.
    
    ```python
    for num in numbers:
        # Add num to summed
    ```
1. Print the result with `print()`
    

</details>

In [None]:
# Your code here


#### 🔎 Solution

In [None]:
# %load solutions/summing_list.py

### ❓ <ins>Rescaling an array</ins>

1. Write a function `rescale` that takes an array as input and returns a corresponding array of values scaled to lie in the range 0.0 to 1.0.

<details>
<summary>💡 Click here for hints</summary>

If `L` and `H` are the lowest and highest values in the original array, 
then the replacement for a value `v` should be `(v-L) / (H-L)`.

    
</details>

In [None]:
import numpy

# Your code here


#### 🔎 Solution

In [None]:
# %load solutions/rescale1.py

2. Run the commands `help(numpy.arange)` and `help(numpy.linspace)` to see how to use these functions to generate regularly-spaced values, then use those values to test your `rescale` function.
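
As a rough preview of what the help pages describe (the specific arguments here are only examples): `numpy.arange` takes a step size and excludes the stop value, while `numpy.linspace` takes the number of points and includes the stop value by default.

```python
import numpy

stepped = numpy.arange(0, 10, 2)    # start, stop (excluded), step
spaced = numpy.linspace(0, 10, 5)   # start, stop (included), number of points

print(stepped)
print(spaced)
```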

In [None]:
# Your code here


#### 🔎 Solution

In [None]:
# %load solutions/rescale2.py

**Optional**  
3. Rewrite the `rescale` function so that it scales data to lie between `0.0` and `1.0` by default, but allows the caller to specify lower and upper bounds if they want. Verify that the call below now produces the output shown:
    
```python
rescale(numpy.arange(13), low_val=-2, high_val=7)
```
```
Output
[-2.   -1.25 -0.5   0.25  1.    1.75  2.5   3.25  4.    4.75  5.5   6.25  7.  ]
```

Once you’ve successfully tested your function, add a docstring that explains what it does. Compare your implementation to your neighbor’s: do the two functions always behave the same way?

<details>
<summary>💡 Click here for hints</summary>

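You can renormalize the `output_array` (already scaled to the range 0.0 to 1.0) with `output_array * (H - L) + L`, where `L` and `H` are now the requested lower and upper bounds (`low_val` and `high_val`).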
    
</details>

In [None]:
# Your code here


#### 🔎 Solution

In [None]:
# %load solutions/rescale3.py

### ❔ <ins>Computing the value of a polynomial</ins>

The built-in function `enumerate` takes a sequence (e.g. a list) and generates a new sequence of the same length. Each element of the new sequence is a pair composed of the index (0, 1, 2,…) and the value from the original sequence:

```python
for idx, val in enumerate(a_list):
    # Do something using idx and val
```

The code above loops through `a_list`, assigning the index to `idx` and the value to `val`.
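
A concrete run may make the pairing easier to see. The list of element names below is made-up sample data, and `pairs` is a name chosen for this sketch.

```python
a_list = ["hydrogen", "helium", "lithium"]  # sample data for illustration

# Each pass through the loop receives one (index, value) pair
for idx, val in enumerate(a_list):
    print(idx, val)

# Collecting the pairs into a list shows them all at once
pairs = list(enumerate(a_list))
print(pairs)
```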

Suppose you have encoded a polynomial as a list of coefficients in the following way: the first element is the constant term, the second element is the coefficient of the linear term, the third is the coefficient of the quadratic term, etc.

$$y = p_0 + p_1 x^1 + p_2 x^2 + \dots + p_n x^n$$

In [None]:
x = 5
coefs = [2, 4, 3]
y = coefs[0] * x**0 + coefs[1] * x**1 + coefs[2] * x**2
print(y)

Write a loop using `enumerate(coefs)` which computes the value `y` of any polynomial, given `x` and `coefs`.

In [None]:
# Your code here


#### 🔎 Solution

In [None]:
# %load solutions/polynomial.py