# Math 1376: Programming for Data Science
---

## Assignment 03: Functions, logic, a little bit of loops, and how to test your functions and modules
---

**Expected time to completion: 6-9 hours**

In [None]:
import numpy as np

<span style='background:rgba(255,255,0, 0.25); color:black'> Run the code cell below and click the "play" button to see the recorded lecture associated with this notebook.</span> 

In [None]:
# 1. Running this cell will embed the short recorded lecture associated with this part of the notebook
# 2. Press on the "play" button to start the video.

from IPython.display import YouTubeVideo

YouTubeVideo('ICppremFi1k', width=800, height=450)

## Problem 1: Population growth
---

### Problem 1(a) 
---

*You may want to review the final activity in your 01-Jupyter-lecture notebook to help you with this problem.*

Review the population growth model and its solution here: https://en.wikipedia.org/wiki/Logistic_function#In_ecology:_modeling_population_growth

Use the Markdown cell below to summarize both

- the population growth model (given as a differential equation), and
<br>
- the solution to the model.

Do not forget to include bullet lists explaining what the inputs and outputs of these equations are.

### Problem 1(a): Answers
---

YOUR ANSWERS GO HERE.

### Problem 1(b)
---

- Finish coding the function in the code cell below. 

    - The function should have parameters `r`, `P_0`, `K`, and `t` (which you should have described in your answer to problem 1(a) as $r$, $P_0$, $K$, and $t$). 

    - The function output is simply the population values associated with the parameter values (or arrays of parameter values) given to it.

    - The `exp` function within `numpy` is useful here.

    - The most common student errors involve forgetting order of operations and incorrect placement of parentheses. 

In [None]:
# Code solution function here to answer Problem 1(b) 
def P(): # PUT PARAMETERS INSIDE THE PARENTHESES
    p_vals = # PUT POPULATION FUNCTION HERE
    return p_vals

### Problem 1(c)
---

- Evaluate the function you coded in part (b) above with $r=0.01$, $P_0=.1$, $K=2$, at $t = 1, 10, 100$, and $1000$ and print results. 

- Repeat those computations except now use $r=0.5$. 

- Discuss/interpret the results in terms of the sensitivity of the population to increasing $r$ values. 

In [None]:
# Code for Problem 1(c) goes here

YOUR DISCUSSION/INTERPRETATION FOR PROBLEM 1(C) GOES HERE.

### Problem 1(d)
---

- Run the code cells below that will use your function to compute and plot $P(t)$ for various values of $r, P_0$, and $K$. 

- Add some comments to the code. 

- Discuss/interpret the results in the Markdown cell that follows. 

    - A reference plot of what you *should* get if you coded the function correctly in part (b) is given in the Markdown cell. If your plot differs from this, then you need to critically examine what you coded in part (b). This should also help inform you whether or not your results and interpretations in part (c) are correct or need re-examining.

In [None]:
# We need to import pyplot from matplotlib
%matplotlib inline
import matplotlib.pyplot as plt 

In [None]:
plt.figure(figsize=(8,5))
t_f = 10
rs = [.3, .3, .5, .5]
P_0 = .1
Ks = [2, 4, 2, 4]
for (r,K) in zip(rs,Ks):
    t = np.linspace(0, t_f, 100)
    label_str = '$(r,K)$ = (' + str(r) + ',' + str(K) + ')'
    plt.plot(t, P(r=r, P_0=P_0, K=K, t=t), label=label_str)
plt.legend(fontsize=18)

### Reference plot and discussion/interpretation for Problem 1(d) goes below
---

![Reference plot to check work](Reference_Pop_Plot.png)

<span style='background:rgba(255,255,0, 0.25); color:black'>YOUR DISCUSSION/INTERPRETATION GOES HERE.</span>



## Some background on RBFs (a preamble to Problem 2)

[Radial Basis Functions (RBFs)](https://en.wikipedia.org/wiki/Radial_basis_function) are used in many applications and disciplines such as non-parametric density estimation in statistics, approximations of function responses on unstructured data sets, and defining some types of artificial neural networks.
In other words, they are [kind of a big deal](https://www.youtube.com/watch?v=H8OxKx6zKkQ).

### What is the idea/purpose/features of an RBF?

We summarize the <font size=5>big picture</font>  idea behind why RBFs are used as follows:

> We have some information about "what happens" at a point in space, denoted by $\mathbf{c}$, and we want to use this information to "infer what may happen" at a point $\mathbf{x}$. 

In the context of the big picture idea, we expect RBFs to exhibit the following features:

- If $\mathbf{x}$ is "close" to $\mathbf{c}$, then the knowledge we have about what happens at $\mathbf{c}$ should play a big role in informing what we think may happen at $\mathbf{x}$. 

- If $\mathbf{x}$ is "far away" from $\mathbf{c}$, then perhaps we should reduce how much we rely on the knowledge we have at $\mathbf{c}$ to inform us about what may happen at $\mathbf{x}$. 

### How do we go from concept to practice? 

A few things have to be made more concrete: what functions exhibit such desirable features, what do we mean by "close" and "far away", and how do we measure distance anyway?

To make this a bit more mathematically precise, we formalize our notation. 
- Let $\phi$ denote an RBF that maps a "distance" $r\in[0,\infty)$ into $\mathbb{R}$ such that $\phi(0)$ is the maximum and $\phi$ does not increase with distance, i.e., $\phi(r_1)\geq\phi(r_2)$ whenever $0\leq r_1<r_2$.

- Let $\mathbf{c}$ denote some point in a vector space at which we "know something." What is a vector space? Some simple examples are $\mathbb{R}^2$ and $\mathbb{R}^3$. 

- Let $\mathbf{x}$ denote some other point in the vector space where we would like to "infer something."

- Let $r=\vert\vert\mathbf{x}-\mathbf{c}\vert\vert$ denote the distance from $\mathbf{x}$ to $\mathbf{c}$ where $\vert\vert\cdot\vert\vert$ denotes a norm on the vector space. If you have never seen a norm before, do not worry, we will get some practice below. Primarily, in the context of this class, you can think of a norm as determining the length of a vector, which simply states how far the point defined by its coordinates is from the origin.

The image below of a Gaussian RBF illustrates these concepts. When $r=0$, the point $\mathbf{x}$ coincides with $\mathbf{c}$ and the RBF attains its maximum. The RBF then decreases as $r$ increases (i.e., as $\mathbf{x}$ moves away from $\mathbf{c}$ either to the left or to the right).

![Illustration of RBF](Gaussian-RBF-cartoon.png "The items that make up a figure")
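
The "close" versus "far away" behavior described above can be checked numerically. The sketch below is only an illustration: it uses the inverse quadratic RBF, $\phi(r) = 1/(1+(\epsilon r)^2)$, which is deliberately not one of the two RBFs you will code in Problem 2, and the points `c`, `x_near`, and `x_far` are made-up values chosen for this example.

```python
import numpy as np

def inverse_quadratic(r, eps=1.0):
    """Inverse quadratic RBF: phi(r) = 1 / (1 + (eps * r)**2)."""
    return 1.0 / (1.0 + (eps * r) ** 2)

c = np.array([0.0, 0.0])        # point where we "know something" (made up)
x_near = np.array([0.1, 0.0])   # a point close to c (made up)
x_far = np.array([3.0, 4.0])    # a point far away from c (made up)

r_near = np.linalg.norm(x_near - c)  # Euclidean distance from x_near to c
r_far = np.linalg.norm(x_far - c)    # Euclidean distance from x_far to c

print(inverse_quadratic(0.0))     # value at r = 0, the maximum of this RBF
print(inverse_quadratic(r_near))  # nearly the maximum, since x_near is close to c
print(inverse_quadratic(r_far))   # much smaller, since x_far is far from c
```

Increasing `eps` makes the function fall off faster, so the same `x_near` would count as "less close."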

### A note on important technical content: norms

Wait, how do we compute $r$? What does $\vert\vert \cdot \vert\vert$ being a norm mean? 

In 1-dimension (such as in the image of an RBF shown above), $r=\vert\vert \mathbf{x}-\mathbf{c}\vert\vert$ is just given by the absolute value of the difference between the points $\mathbf{x}$ and $\mathbf{c}$. In other words, $r = \vert \mathbf{x}-\mathbf{c}\vert$. 

We just have to be more thoughtful as we move into higher-dimensional spaces because we have more options available for how we would measure the size of vectors and thus the distance between two points.

In many cases, it is fine to simply think of this as the "straight line distance" between two points that you are probably familiar with in 2- and 3-dimensional spaces. 
But, it is important to note that there are other useful ways of measuring distances.
The interested reader is encouraged to examine the following resources for more information (click the links to see graphical representations of these distances as well):

- The [Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance) generalizes the concept of a straight line distance to $\mathbb{R}^k$ for any dimension $k$, including $k>3$. This is sometimes called the 2-norm distance. 

- The [taxicab distance](https://en.wikipedia.org/wiki/Taxicab_geometry) is useful when the distance measured in moving from one point to another is best described by moving along a "grid of roads". This is closely related to how we actually think of the "real distance" of two points on a map when the "straight line path" is impossible to traverse. This is sometimes called the 1-norm distance.

- The [Chebyshev distance](https://en.wikipedia.org/wiki/Chebyshev_distance) is useful when the distance measured in moving from one point to another is best described by the longest distance we have to travel in any one direction. This is sometimes called the $\infty$-norm distance. 

### Are there simple ways to compute norms in Python?

While norms are actually pretty simple functions to code up yourself, the subpackage `numpy.linalg` has a function [`norm`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.norm.html) that allows you to pass an `ord` parameter to define which type of norm you want to use. 
Basically, this `ord` parameter allows you to choose which particular Minkowski distance function you want to use.

- `ord=2` means you are using the 2-norm. 

- `ord=1` means you are using the 1-norm.

- `ord=np.inf` means you are using the $\infty$-norm (you pass the "infinity" variable stored in `numpy` as the `ord`)

We show some examples of this below.

In [None]:
from numpy.linalg import norm 

In [None]:
# Some vectors (as 1-D numpy arrays of various shapes)

a = np.array([5, 5]) # Shape is (2,) so interpret as a point/vector in 2-D space

b = np.array([3, 7, 2]) # Shape is (3,) so interpret as a point/vector in 3-D space

In [None]:
# First some 2-dimensional distances between a point x and a

x = np.array([2, 3]) 

print(norm(x-a)) # The 2-norm is the default, this gives the Euclidean distance
print()

print(norm(x-a, ord=2)) # Also gives the Euclidean distance
print()

print(norm(x-a, ord=1)) # This gives the 1-norm distance 
print()

print(norm(x-a, ord=np.inf)) # This gives the infinity-norm distance 

In [None]:
# Now some 3-dimensional distances between a point x and b

x = np.array([2, 3, 5])

print(norm(x-b, ord=2)) # Gives the Euclidean distance (recall, we do not need to specify ord here, but it is a good idea to do so)
print()

print(norm(x-b, ord=1)) # This gives the 1-norm distance 
print()

print(norm(x-b, ord=np.inf)) # This gives the infinity-norm distance 
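
To back up the remark that norms are simple to code yourself, here is a minimal sketch of hand-written 1-, 2-, and $\infty$-norms compared against `numpy.linalg.norm`. The helper names `one_norm`, `two_norm`, and `inf_norm` are made up for this illustration and are not part of `numpy`.

```python
import numpy as np
from numpy.linalg import norm

def one_norm(v):
    """Taxicab length: sum of the absolute values of the components."""
    return np.sum(np.abs(v))

def two_norm(v):
    """Euclidean length: square root of the sum of squared components."""
    return np.sqrt(np.sum(v ** 2))

def inf_norm(v):
    """Chebyshev length: largest absolute value among the components."""
    return np.max(np.abs(v))

# Difference vector between the 2-D points used in the example above
d = np.array([2, 3]) - np.array([5, 5])   # equals [-3, -2]

# Each pair should agree (up to integer versus float formatting)
print(one_norm(d), norm(d, ord=1))
print(two_norm(d), norm(d, ord=2))
print(inf_norm(d), norm(d, ord=np.inf))
```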

## Problem 2: Radial Basis Functions (RBFs)
---

### Problem 2(a)
---

Read the Wiki page on [Radial Basis Functions (RBFs)](https://en.wikipedia.org/wiki/Radial_basis_function) and summarize the following RBFs in the Markdown cell below: 

- Gaussian and Bump function. 

- Briefly explain the role of the shape parameter $\epsilon$ in determining what is considered close and far away (hint: think of "squeezing" or "stretching" out the Gaussian RBF pictured above). 

### Problem 2(a) Answers:
---

YOUR ANSWERS GO HERE

### Problem 2(b)

In the code cells below, you should
- Create three functions: `my_Gaussian`, `my_Bump`, and a "wrapper" function called `my_RBFs` that will choose which of these two RBFs to evaluate based on a string argument `which_RBF` that defaults to the Gaussian RBF. 
<br><br>
   - All three of these functions should also take in as arguments the following: `x` and `c` (both `numpy` arrays of the same size), the shape parameter `eps`,  and `ord` (which determines the norm used to compute $r=\vert \vert \mathbf{x}-\mathbf{c} \vert\vert$ where you use the [`norm`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.norm.html) function in `numpy.linalg` to compute this norm). 
 <br><br>
   - The functions should return the value of the RBF at the computed $r$ value.

*I have started the `my_Gaussian` and `my_RBFs` functions for you, but they need to be completed. You need to write the `my_Bump` function in its entirety.*

In [None]:
# Code for my_Gaussian goes here

def my_Gaussian(x, c, eps=1, ord=None):
    r = norm(x-c, ord=ord) # Compute the distance from x to c
    # Compute the Gaussian RBF with shape parameter eps below
    phi = # YOU NEED TO FILL IN THIS PART!!!!
    return phi

In [None]:
# Code for my_Bump goes here

In [None]:
# Code for my_RBFs goes here

def my_RBFs(x, c, eps=1, ord=None, which_RBF='Gaussian'):
    if which_RBF == 'Gaussian':
        # Fill this in
    elif which_RBF == 'Bump':
        # Fill this in
    else:
        print('Uh, what RBF do you want? Choose Gaussian or Bump please.')
        return
    return phi

### Problem 2(c)
---

Execute the code cells below to verify that both RBFs can be evaluated for the given points/vectors in $\mathbb{R}$, $\mathbb{R}^2$, and $\mathbb{R}^{10}$ with different choices of `ord` and `eps`.

In [None]:
# Create 1-D numpy arrays of shape (1,), (2,), and (10,) to test the code

a_R = np.array([1])
b_R = np.array([2])

a_R2 = np.array([1, 2])
b_R2 = np.array([3, 4])

a_R10 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
b_R10 = np.array([2, 3, 4, 5, 6, 7, 8, 9, 10, 11])

In [None]:
# Showing 1-D results for Gaussian RBF

phi = my_RBFs(a_R, b_R, eps=1, ord=2)
print(phi)
print()

phi = my_RBFs(a_R, b_R, eps=0.5, ord=1)
print(phi)
print()

phi = my_RBFs(a_R, b_R, eps=2, ord=np.inf)
print(phi)

In [None]:
# Showing 1-D results for Bump RBF

phi = my_RBFs(a_R, b_R, eps=1, ord=2, which_RBF='Bump')
print(phi)
print()

phi = my_RBFs(a_R, b_R, eps=0.5, ord=1, which_RBF='Bump')
print(phi)
print()

phi = my_RBFs(a_R, b_R, eps=2, ord=np.inf, which_RBF='Bump')
print(phi)

In [None]:
# Showing 2-D results for Gaussian RBF

phi = my_RBFs(a_R2, b_R2, eps=1, ord=2)
print(phi)
print()

phi = my_RBFs(a_R2, b_R2, eps=0.5, ord=1)
print(phi)
print()

phi = my_RBFs(a_R2, b_R2, eps=2, ord=np.inf)
print(phi)

In [None]:
# Showing 2-D results for Bump RBF

phi = my_RBFs(a_R2, b_R2, eps=1, ord=2, which_RBF='Bump')
print(phi)
print()

phi = my_RBFs(a_R2, b_R2, eps=0.5, ord=1, which_RBF='Bump')
print(phi)
print()

phi = my_RBFs(a_R2, b_R2, eps=2, ord=np.inf, which_RBF='Bump')
print(phi)

In [None]:
# Showing 10-D results for Gaussian RBF

phi = my_RBFs(a_R10, b_R10, eps=1, ord=2)
print(phi)
print()

phi = my_RBFs(a_R10, b_R10, eps=0.5, ord=1)
print(phi)
print()

phi = my_RBFs(a_R10, b_R10, eps=2, ord=np.inf)
print(phi)

In [None]:
# Showing 10-D results for Bump RBF

phi = my_RBFs(a_R10, b_R10, eps=1, ord=2, which_RBF='Bump')
print(phi)
print()

phi = my_RBFs(a_R10, b_R10, eps=0.5, ord=1, which_RBF='Bump')
print(phi)
print()

phi = my_RBFs(a_R10, b_R10, eps=2, ord=np.inf, which_RBF='Bump')
print(phi)

## Introducing [doctest](https://docs.python.org/3.8/library/doctest.html) (a preamble to problem 3)
---

The documentation for [`doctest`](https://docs.python.org/3.8/library/doctest.html) is probably most useful to review after seeing a simple example. There are other Python modules available for testing your code such as [`unittest`](https://docs.python.org/3/library/unittest.html?highlight=unittest#module-unittest), but we will primarily focus on `doctest` for its simplicity in this course.

### The value of testing
---

Here, we briefly discuss the value of constructing tests for your code. This also reinforces the importance of documenting code as discussed in a lecture notebook.

The value of testing becomes apparent once we realize that coding is often an iterative process in practice that typically "stitches" together lots of different "pieces" to make a more useful "whole." 
Over time, individual pieces will be updated to improve functionality or fix issues that arise in operation. 
Common goals for improved functionality are to increase the speed of computations, allow for more flexible processing of data types, improve handling of "edge cases" (i.e., inputs that may "break" the original code in some way or give unexpected outputs), or provide new and useful features as the user-base or use-cases for the code mature. 
It is therefore critically important to make sure that these individual pieces are actually functioning correctly both prior to and following any updates. 
This is where testing comes into play.

### The purpose of `doctest` vs `unittest`
---

- The basic purpose of `doctest` is to perform tests written into your documentation of a function that demonstrate *typical use cases.* 

    - The tests should be simple and useful to someone trying to understand what the function does.


- The basic purpose of `unittest` is to perform tests that are more "diagnostic" in nature to ensure that nothing "broke" in the code when either it or other parts of the code it may depend upon were updated. 


Consequently, I would say that unit tests are meant to be *thorough* whereas doc tests are meant to be *illuminating*. These are just my opinions though. You should formulate your own.
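
For contrast with the `doctest` examples in this notebook, here is a minimal sketch of what the more "diagnostic" `unittest` style can look like. The toy `add` function and the test class `TestAdd` are assumptions made for this illustration. Note how the tests probe edge cases (such as mixing types) that would clutter a docstring.

```python
import unittest

def add(a, b):
    """Toy function (for illustration only) that returns a + b."""
    return a + b

class TestAdd(unittest.TestCase):
    def test_integers(self):
        self.assertEqual(add(2, 2), 4)

    def test_floats(self):
        # Floating-point sums are compared approximately, not exactly
        self.assertAlmostEqual(add(0.1, 0.2), 0.3)

    def test_strings_concatenate(self):
        self.assertEqual(add('2', '2'), '22')

    def test_mixed_types_raise(self):
        # Adding a string to an integer should raise a TypeError
        with self.assertRaises(TypeError):
            add('2', 2)

# Load and run only the tests in TestAdd (this form also works inside a notebook)
suite = unittest.TestLoader().loadTestsFromTestCase(TestAdd)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```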

### A suggestion
---

You should probably spend 15-20 minutes reading more about testing and why it is useful. There are lots of blogs and articles written about unit testing. Find some on Google (or your favorite search engine). There is actually a nice discussion about the built-in Python modules `doctest` and `unittest` on [stackoverflow](https://stackoverflow.com/questions/361675/python-doctest-vs-unittest). 

### Without further ado...
---

We now dive into it. 

- Run the code cells below and interpret what is happening in the Markdown cell that follows. 

To truly understand what is happening, you may need to try editing the tests so that they *fail*, add new tests for yourself, etc. Play around with it.

In [None]:
import doctest

In [None]:
def add(a, b):
    '''
    This function adds a to b. Why do we need a function to do that?
    We don't. No reason. But, it is useful for illustrating how to use
    the doctest feature. 
    
    This is a test:
    >>> add(2, 2)
    4
    
    The above was a test. If we wanted to have more tests, then we could
    add some more. Notice the formatting requirements.
    
    We should also test that:
    >>> add(3,5)
    8
    
    >>> add(105.5, 0.5)
    106.0
    '''
    return a + b

In [None]:
doctest.testmod(verbose=True)

In [None]:
doctest.testmod()

In [None]:
def add_strings(a, b):
    '''
    What if I want to add a and b as if they were strings?
    
    >>> add_strings(2, 2)
    '22'
    
    >>> add_strings(3,5)
    '35'
    
    >>> add_strings(105.5, 0.5)
    '105.50.5'
    '''
    return add(str(a), str(b))

In [None]:
doctest.testmod(verbose=True) # This may be too much if we only want to test the add_strings function

In [None]:
doctest.run_docstring_examples(add_strings, globs=None, verbose=True)

<span style='background:rgba(255,255,0, 0.25); color:black'> THIS IS NOT A GRADED PART OF THE ASSIGNMENT BUT YOU SHOULD REALLY TRY TO EXPLAIN/INTERPRET THE ABOVE CODE CELLS HERE FOR YOUR OWN BENEFIT.</span>

<span style='background:rgba(255,0,255, 0.25); color:black'> AND JUST IN CASE YOU DID NOT EXPERIMENT WITH A FAILED TEST.</span>

If you did not experiment much with the above code cells, then consider the code cells below that show a failed test.

What is the proper takeaway when a test fails?

If a test fails, this does not necessarily mean that the test is bad. While it may be the case that a failed test is due to a poorly written test, a good test *should* fail when the code has something *wrong* with it. 

In general, assuming the test is "good," a failed test basically means that the test is doing what it should: it is detecting possible errors in the code that should be fixed before the code is deployed to other users.

**In fact, the best thing a test can do for you is *fail* even if the code does not produce any errors.** It is good to let that sink in for a minute.

It is possible for code to execute but give unexpected/undesirable outputs.
Catching these with tests can save you (and the users of your code) a lot of frustration.
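
As a small illustration of code that executes without any error yet gives the wrong answer, consider the sketch below. The function `mean_of_two` is made up for this example, and its bug (a missing pair of parentheses) is intentional. The test is run here through `doctest.DocTestParser` and `doctest.DocTestRunner` so that the number of failures can be inspected directly. That is one option among several, not the only way to run a doctest.

```python
import doctest

def mean_of_two(a, b):
    """
    Return the average of a and b.

    >>> mean_of_two(2, 4)
    3.0
    """
    return a + b / 2   # Intentional bug: this computes a + (b / 2)

# Build a test object from the docstring and run it
parser = doctest.DocTestParser()
test = parser.get_doctest(mean_of_two.__doc__,
                          {'mean_of_two': mean_of_two},  # names visible to the test
                          'mean_of_two', None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)   # prints a failure report for the failing example

print(results.failed, 'failed out of', results.attempted)
```

Python raises no exception when `mean_of_two(2, 4)` is called, so without the test this order-of-operations mistake could easily go unnoticed.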

So, run the code cells below, explain what is going on in the Markdown cell, and explain what needs to be fixed in the code. 

In [None]:
def weird_combo(a, b):
    '''
    We treat a and b as strings.
    First, we add a to the first character in b.
    Second, we add b to the last character in a.
    Finally, we return the sum of these two results.
    
    >>> weird_combo(31, 24)
    '312241'
    '''
    
    first = add(str(a), str(b)[0])
    second = add(str(a)[-1], str(b))
                 
    return add(first, second)

In [None]:
doctest.run_docstring_examples(weird_combo, globs=None, verbose=True)

<span style='background:rgba(255,255,0, 0.25); color:black'> YOU SHOULD TRY TO EXPLAIN THE RESULTS OF THE ABOVE CODE CELLS HERE. THERE ARE TWO POSSIBLE INTERPRETATIONS TO CONSIDER: (1) THE TEST IS CORRECT SO THE FUNCTION HAS AN ERROR, OR (2) THE TEST IS INCORRECT AND THE FUNCTION IS FINE. WHICH DO YOU THINK IT IS? READ THE DOCSTRING CAREFULLY TO DETERMINE WHICH OF THESE IS THE MORE PLAUSIBLE SCENARIO.</span>

## Problem 3: Population model and doctest
---

We revisit the population model from problem 1.

- Copy/paste the solution function from problem 1(b) below, but now add a docstring to describe the function, its parameters, and outputs, and add at least two tests in the docstring based on easily computed inputs.

- Use `doctest` to test your function.
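
One practical wrinkle when writing tests for numerical functions is that `doctest` compares the *printed* output as text. The sketch below shows a few ways to keep such tests robust. It uses a hypothetical function `exp_decay` (deliberately not the population model), and the particular choices of `float`, `round`, and `np.allclose` are suggestions rather than requirements.

```python
import doctest
import numpy as np

def exp_decay(t, rate=1.0):
    """
    Hypothetical example: returns exp(-rate * t).

    Inputs with an exact, easily computed answer make the simplest tests.
    Wrapping in float() avoids depending on how numpy prints its scalars.
    >>> float(exp_decay(0))
    1.0

    For inexact answers, round before comparing.
    >>> round(float(exp_decay(1)), 4)
    0.3679

    For arrays, np.allclose avoids depending on how arrays are printed.
    >>> np.allclose(exp_decay(np.array([0, 1])), [1.0, 0.36787944])
    True
    """
    return np.exp(-rate * t)

# Run only the examples in this docstring and report the counts
test = doctest.DocTestParser().get_doctest(
    exp_decay.__doc__, {'exp_decay': exp_decay, 'np': np}, 'exp_decay', None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results.failed, 'failed out of', results.attempted)
```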

In [None]:
# Code solution function here with docstring and tests

In [None]:
# Use doctest here to show your tests pass

## Problem 4: Testing the `differences` module from the lecture
---

- Run the code below. 
- Add docstrings with useful tests to *every* function in the module. 
- Show that all your tests pass. 

In [None]:
import differences as diff

In [None]:
doctest.testmod(diff)

## Problem 5: Testing your module from the lecture
---

- Create some docstrings with tests for each function in the module you created as an activity at the end of the part-c lecture notebook. 
- Show that your tests pass.