# ECE 3 : Homework 1

# Instructions

To get started, you should go through the following steps.
- Rename this Jupyter notebook by adding your name, e.g. `ECE3_HW-1_<your-name>.ipynb`.
- Complete all the exercises by directly editing your notebook.
- Make sure that the coding portions run without errors.

Run the following cell to install the package `geomstats`, which you will need to load the datasets.

In [9]:
!pip3 install --user pytest-runner
!pip3 install scipy
!pip3 install joblib
!pip3 install git+https://github.com/geomstats/geomstats.git

Collecting git+https://github.com/geomstats/geomstats.git
  Cloning https://github.com/geomstats/geomstats.git to /tmp/pip-req-build-l4wa7y1j
  Running command git clone -q https://github.com/geomstats/geomstats.git /tmp/pip-req-build-l4wa7y1j
  Resolved https://github.com/geomstats/geomstats.git to commit 36431faf5b36ddcf5632edf453982361ee57d21d


# Problem 1 - How do cancer drugs impact our cells?

In this problem, we study osteosarcoma (bone cancer) cells and the impact of drug treatment on their morphological shapes, by analyzing cell shapes from cell images obtained from fluorescence microscopy.

![Plot](https://github.com/geomstats/challenge-iclr-2021/blob/main/Florent-Michel__Shape-Analysis-of-Cancer-Cells/cells_image.png?raw=true)

<center>Representative images of the cells using fluorescence microscopy (Image credit : Ashok Prasad). The cells nuclei (blue), the actin cytoskeleton (green) and the lipid membrane (red) of each cell are stained and colored.</center>

In this problem, we only focus on the boundaries of the cells.

### Part (a) - Path Length

A "path" in 2D space can be defined as a `list` of 2D points, where each 2D point is expressed by its x-y coordinates as a `tuple`. An example of a path is given below.

In [6]:
path_example = [(1, 1), (2, 3), (3, 8)]
path_example

[(1, 1), (2, 3), (3, 8)]

Write a function, `path_length()`, that takes a "path" as input and returns the total distance along it: the distance from the first point in the list to the second, plus the distance from the second to the third, and so on. In other words, the total distance is the sum of the lengths of all segments forming the path.

The list of points is assumed to be in the correct order. The image below illustrates the path length approximation: the segments shown in **red** are the lengths you must compute and sum to find the path length.


![Plot](https://raw.githubusercontent.com/monsij/ToDo_App/c692f1d894a1073f439945aa2a412ec34030e45e/pythonProbPlot.png)

The function definition has been started for you below. 
- Complete it with your answer. **(10 points)**
- Test it on the `path_example` above. **(5 points)**

In [13]:
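import numpy as np

def path_length(pa):
    """Return the total length of the polyline through the points in `pa`."""
    pa = np.asarray(pa, dtype=float)  # accept a list of tuples or an array
    palen = 0.0
    for i in range(len(pa) - 1):
        # Euclidean distance between consecutive points
        palen += np.linalg.norm(pa[i + 1] - pa[i])
    return palen

path_example = [(1, 1), (2, 3), (3, 8)]
path_length(path_example)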

7.335087491092574
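
An alternative sketch that uses only the standard library and operates directly on a plain list of `(x, y)` tuples, as in the problem statement. It assumes Python 3.8 or later for `math.dist`; the name `path_length_tuples` is hypothetical and chosen only to avoid clashing with the function above.

```python
import math

def path_length_tuples(path):
    """Sum the Euclidean lengths of consecutive segments of a path."""
    # zip(path, path[1:]) pairs each point with its successor
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))

path_example = [(1, 1), (2, 3), (3, 8)]
total = path_length_tuples(path_example)
print(total)  # sqrt(5) + sqrt(26), about 7.335
```

A path with fewer than two points has no segments, so this version returns 0 for it.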

### Part (b) - Computing the length of cancer cells

We load a dataset of bone cancer cells using the package `geomstats` and extract the first cell, called `cell0`.

In [8]:
import geomstats.datasets.utils as data_utils
import matplotlib.pyplot as plt
import numpy as np

cells, _, treatments = data_utils.load_cells()
cell0 = cells[0]
#plt.plot(cell0[:, 0], cell0[:, 1])
#plt.title("Cell 0")
cell0 = list(tuple(point) for point in cell0)

- Compute the length of the first cell of this dataset. **(5 points)**

In [10]:
len(cell0)

210
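
Note that `len(cell0)` returns the number of boundary points, not a distance. If the quantity of interest is the perimeter, the segment lengths would be summed instead. The sketch below illustrates the difference on a synthetic unit square rather than on the real cell data. Whether the curves in the dataset need an explicit closing segment is not stated here, so that choice is exposed as the hypothetical `closed` flag.

```python
import math

def perimeter(points, closed=True):
    """Length of the polyline through `points`, optionally with the closing segment."""
    length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    if closed and len(points) > 1:
        length += math.dist(points[-1], points[0])  # segment back to the start
    return length

square = [(0, 0), (1, 0), (1, 1), (0, 1)]  # synthetic stand-in for a cell boundary
print(len(square))                      # 4 (number of points)
print(perimeter(square))                # 4.0 (closed curve)
print(perimeter(square, closed=False))  # 3.0 (open path)
```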

### Part (c) - Does the cancer treatment affect the biophysics of the cell?

In this dataset of bone cancer cells, some cells have been treated with one of two drugs, Jasp (jasplakinolide) or Cytd (cytochalasin D), and the others (called control cells) have not been treated.

- `cell0` is a control cell.
- `cell500` is a cell treated with the drug `jasp`.

In [None]:
drug0 = treatments[0]
print(drug0)

cell500 = list(tuple(point) for point in cells[500])
drug500 = treatments[500]
print(drug500)

control
jasp


We want to know whether the drug affects the biophysics of the cell. One indication would be a change in the cytoskeleton, and therefore in the perimeter of the cell.
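
One way this reasoning could be turned into a test, sketched on made-up data: compute the perimeter of every cell, group the perimeters by treatment label, and compare the group means. The curves, labels, and helper names below are all hypothetical. A real analysis would use `cells` and `treatments`, and would likely apply a proper statistical test rather than a bare comparison of means.

```python
import math
from collections import defaultdict

def perimeter(points):
    """Perimeter of a closed polygon given as a list of (x, y) tuples."""
    # points[1:] + points[:1] pairs the last point with the first one
    return sum(math.dist(p, q) for p, q in zip(points, points[1:] + points[:1]))

def circle(radius, n=100):
    """Synthetic closed curve: n points on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

# Made-up data: the "treated" curves are larger than the "control" ones.
toy_cells = [circle(1.0), circle(1.1), circle(1.5), circle(1.6)]
toy_labels = ["control", "control", "treated", "treated"]

# Collect the perimeters of all curves sharing the same label.
by_label = defaultdict(list)
for curve, label in zip(toy_cells, toy_labels):
    by_label[label].append(perimeter(curve))

mean_perimeter = {label: sum(v) / len(v) for label, v in by_label.items()}
print(mean_perimeter)
```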

- Compute the length of `cell500`. **(5 points)**

In [12]:
len(cell500)

211

- Explain, in your own words, how you would design a simple program that tests whether a given treatment affects the biophysics of the cell. (Coding is not required.) **(10 points)**

I would like to 

# Problem 2 - Analyzing signals

### Part (a) - Average and RMS

We analyze an alternating signal, which you may interpret as a voltage signal (used to transmit bits in computing) or as an optical signal (used to transmit data across long distances in an optical fiber). A signal is represented as a `list` of numbers, where each number represents the amplitude of the signal. An example of an alternating signal is shown below. The packages `numpy` and `matplotlib` have been imported for your convenience.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

time = np.linspace(0, 1, 1001) # time-values of the signal
signal_example = np.cos(200 * time) / (5 * time + 1) * np.sin(10 * time) # corresponding amplitude values of the signal

- Write a function, `avg_rms()`, that takes a signal as input and calculates the average signal value and the RMS (root mean square) value. The function `avg_rms()` should `return` these values as a tuple, with the average first and the RMS second, as in the code below. You may **not** use `np.average` or `np.mean` for this exercise, but you should use them to check your result. **(10 points)**

In [None]:
def avg_rms(signal):
    '''
    Enter your solution here.
    '''
    avg = 0.  # replace this value with your code
    rms = 0.  # replace this value with your code
    return avg, rms
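
For a signal $x$ with $n$ samples, the average is $\frac{1}{n}\sum_{i=1}^{n} x_i$ and the RMS is $\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}$. A minimal sketch of these two formulas on a tiny, hand-checkable signal, kept under the hypothetical name `avg_rms_sketch` so that it does not replace the template above:

```python
import math

def avg_rms_sketch(signal):
    """Return (average, RMS) of a sequence of numbers without np.mean."""
    n = len(signal)
    avg = sum(signal) / n                              # mean of the samples
    rms = math.sqrt(sum(x * x for x in signal) / n)    # root of the mean square
    return avg, rms

avg, rms = avg_rms_sketch([3, -3, 3, -3])
print(avg, rms)  # 0.0 3.0
```

The example signal alternates between 3 and -3, so its average is 0 while its RMS is 3.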

### Part (b) - Plotting

Now we will practice plotting data with useful information. 
- Write a script that uses your previous `avg_rms()` function to find the RMS and average of the given signal (contained in the `time` and `signal_example` arrays above). Plot the given signal, the average value, and the average $\pm$ the RMS value. **(10 points)**

You must reference the function from Part **(a)** - do not recalculate the RMS and average in your script without the function.

Do not forget to provide explanatory labels to the axes of your plot.

In [None]:
'''
Enter your solution here.
'''

### Part (c) - Standard Deviation

Now we will find the standard deviation of the above signal (using the values contained in `signal_example`). The formula for the standard deviation of an $n$-vector $x = [x_1, x_2, \ldots, x_{n-1}, x_n]$ is given below:
$$
std(x) = \sqrt{\frac{\sum_{i=1}^{n}(x_i - avg(x))^2}{n}}
$$
where $avg(x)$ is the mean of the entries of the vector $x$ and $\Sigma$ denotes a sum.
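
As a hand-checkable illustration of this formula, the sketch below evaluates it on a small made-up vector whose mean is 5 and whose squared deviations sum to 32, which gives $\sqrt{32/8} = 2$. The name `std_dev_sketch` is hypothetical.

```python
import math

def std_dev_sketch(x):
    """Population standard deviation: square root of the mean squared deviation."""
    n = len(x)
    mean = sum(x) / n
    return math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)

values = [2, 4, 4, 4, 5, 5, 7, 9]
result = std_dev_sketch(values)
print(result)  # 2.0
```

The division is by $n$, as in the formula above, not by $n - 1$.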

- Write a function `std_dev` that computes the standard deviation of a signal.  You **cannot** use `np.std` for this exercise but you should use it to check your result in the next question. **(10 points)**
- Make a function call (below your function definition) that uses `std_dev()` to calculate and print the standard deviation of the signal amplitude array `signal_example` given above. **(5 points)**

In [None]:
def std_dev(signal):
    '''
    Enter your solution for the function here.
    '''
    
# Make the function call to std_dev() with the `signal_example` array. Print the result.

- Code another example of signal.
- Verify the formula $rms(x)^2 = avg(x)^2 + std(x)^2$ on your signal.
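
The identity follows from expanding the square in the definition of the standard deviation, which gives $std(x)^2 = avg(x^2) - avg(x)^2$, with $avg(x^2) = rms(x)^2$. A numerical check could look like the sketch below, which uses a made-up signal (a sine wave with a constant offset) and hypothetical helper names.

```python
import math

def avg(x):
    return sum(x) / len(x)

def rms(x):
    return math.sqrt(sum(xi * xi for xi in x) / len(x))

def std(x):
    m = avg(x)
    return math.sqrt(sum((xi - m) ** 2 for xi in x) / len(x))

# Made-up signal: a sine wave with a constant offset of 0.5
signal = [math.sin(0.1 * k) + 0.5 for k in range(200)]

lhs = rms(signal) ** 2
rhs = avg(signal) ** 2 + std(signal) ** 2
print(lhs, rhs)  # equal up to floating-point round-off
```

Because of round-off, the two sides are best compared with a tolerance (for example `math.isclose`) rather than with `==`.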

In [None]:
'''
Enter your solution here.
'''

# Problem 3 - Computing with vectors 
### Given vector $\textbf{a} = (a_1, \ldots, a_n)$, vector $\textbf{b} = (b_1, \ldots, b_n)$ and scalar $k$.

- Write $\textbf{a}$ as a linear combination of the one-hot $n$-vectors $e_1, \ldots, e_n$. (2 points)
- Compute $(\textbf{a}^T\textbf{a})\textbf{b}$ using entries of $\textbf{a}$ and $\textbf{b}$. (5 points)
- Show the inner product property $(k\textbf{a})^T\textbf{b} = k(\textbf{a}^T\textbf{b})$ using entries of $\textbf{a}$ and $\textbf{b}$. (4 points)
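
A numeric spot check of these three items on made-up values can help catch algebra slips. It is not a proof, since it only tests one choice of $\textbf{a}$, $\textbf{b}$ and $k$. The helper names below are hypothetical.

```python
def inner(u, v):
    """Inner product u^T v of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def scale(k, u):
    """Scalar-vector product k * u."""
    return [k * ui for ui in u]

def one_hot(i, n):
    """One-hot n-vector with a 1 in position i (0-based) and 0 elsewhere."""
    return [1 if j == i else 0 for j in range(n)]

a = [1, 2, 3]
b = [4, 5, 6]
k = 7

# Build a_1 e_1 + ... + a_n e_n and compare it with a.
combo = [0] * len(a)
for i, ai in enumerate(a):
    combo = [c + ai * e for c, e in zip(combo, one_hot(i, len(a)))]
print(combo == a)  # True

# (a^T a) b is the scalar a^T a times the vector b.
print(scale(inner(a, a), b))  # [56, 70, 84]

# (k a)^T b versus k (a^T b)
print(inner(scale(k, a), b) == k * inner(a, b))  # True
```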

# Problem 4 - Computing with functions

Let $f:\mathbb{R}^2 \rightarrow \mathbb{R}$ be a function defined by $f(x) = -x_2^3 + \frac{1}{2}x_1^2 + 2x_2^2 + x_1x_2  -4x_1 - 4x_2 $ for any $2$-vector $x = (x_1, x_2)$.

### Part (a) - Linear functions

- Is $f$ a linear function? (5 points)
    - If yes, show that it satisfies the superposition principle. 
    - If no, give an example of $\alpha, \beta, x, y$ for which it violates the superposition principle.
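
A candidate choice of $\alpha, \beta, x, y$ can be tested numerically by measuring the gap $f(\alpha x + \beta y) - (\alpha f(x) + \beta f(y))$, which is zero for every choice when $f$ is linear. The sketch below evaluates that gap for one arbitrary choice; the function name `superposition_gap` and the chosen values are only illustrative.

```python
def f(x):
    x1, x2 = x
    return -x2**3 + 0.5 * x1**2 + 2 * x2**2 + x1 * x2 - 4 * x1 - 4 * x2

def superposition_gap(alpha, beta, x, y):
    """f(alpha*x + beta*y) - (alpha*f(x) + beta*f(y)); zero for a linear f."""
    combined = (alpha * x[0] + beta * y[0], alpha * x[1] + beta * y[1])
    return f(combined) - (alpha * f(x) + beta * f(y))

gap = superposition_gap(2, 0, (1, 0), (0, 0))
print(gap)  # 1.0
```

A single nonzero gap is enough to rule out linearity, whereas a zero gap for one choice proves nothing.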

### Part (b) - Gradients

- Compute the gradient of $f$. Give its value at the $2$-vector $x = (1, 2)$. (3 points)

- Verify the value of your gradient at the $2$-vector $x = (1, 2)$ by using Python and the package `jax`. (3 points)
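
Independently of `jax`, a central finite difference gives a quick numerical cross-check of a hand-computed gradient. The sketch below does not replace the `jax` verification requested above; the helper name `numerical_gradient` and the step size `h` are assumptions.

```python
def f(x1, x2):
    return -x2**3 + 0.5 * x1**2 + 2 * x2**2 + x1 * x2 - 4 * x1 - 4 * x2

def numerical_gradient(func, x1, x2, h=1e-5):
    """Approximate (df/dx1, df/dx2) at (x1, x2) with central differences."""
    d1 = (func(x1 + h, x2) - func(x1 - h, x2)) / (2 * h)
    d2 = (func(x1, x2 + h) - func(x1, x2 - h)) / (2 * h)
    return d1, d2

g1, g2 = numerical_gradient(f, 1.0, 2.0)
print(round(g1, 4), round(g2, 4))  # approximately -1 and -7
```

The result is only accurate to a few digits, so it should be compared with the analytic gradient using a tolerance.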

In [None]:
import jax

'''
Enter your solution here.
'''

### Part (c) - Taylor expansions

- Compute the first order Taylor approximation $\hat{f}$ of $f$ at $\textbf{z} = \begin{bmatrix} 1\\2\end{bmatrix}$. (2 points)

- Compute the value of the first order Taylor approximation of $f$ at $\textbf{x} = \begin{bmatrix} 1.01\\2.02\end{bmatrix}$, i.e. compute $\hat{f}(\textbf{x})$. (4 points)

- Check your result by using Python. (2 points)
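
The first order Taylor approximation at $\textbf{z}$ has the form $\hat{f}(\textbf{x}) = f(\textbf{z}) + \nabla f(\textbf{z})^T(\textbf{x} - \textbf{z})$. A sketch of such a check is given below. The gradient value passed in is a hand-computed assumption that should agree with Part (b), and the helper name `taylor_first_order` is hypothetical.

```python
def f(x1, x2):
    return -x2**3 + 0.5 * x1**2 + 2 * x2**2 + x1 * x2 - 4 * x1 - 4 * x2

def taylor_first_order(z, grad_z, x):
    """Evaluate f(z) + grad f(z)^T (x - z)."""
    return f(*z) + sum(g * (xi - zi) for g, xi, zi in zip(grad_z, x, z))

z = (1.0, 2.0)
grad_z = (-1.0, -7.0)  # hand-computed gradient at z (assumption, see Part (b))
x = (1.01, 2.02)

f_hat = taylor_first_order(z, grad_z, x)
print(f_hat, f(*x))  # the approximation and the exact value should be close
```

Since $\textbf{x}$ is close to $\textbf{z}$, the approximation and the exact value of $f$ differ only by a small second order term.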

In [None]:
'''
Enter your solution here.
'''