# <img style="float: left; padding-right: 10px; width: 45px" src="https://github.com/Harvard-IACS/2018-CS109A/blob/master/content/styles/iacs.png?raw=true"> CS109A  Introduction to Data Science 


## Lab 2 Post lab: Numpy and Post lab


**Harvard University**<br>
**Fall 2019**<br>
**Instructors:** Pavlos Protopapas, Kevin Rader, and Chris Tanner<br>

**Material prepared by**: David Sondak, Will Claybaugh, Pavlos Protopapas, and Eleni Kaxiras.

---

In [31]:
#RUN THIS CELL 
import requests
from IPython.core.display import HTML
styles = requests.get("https://raw.githubusercontent.com/Harvard-IACS/2018-CS109A/master/content/styles/cs109.css").text
HTML(styles)

## Learning Goals

By the end of this lab, you should be able to:
* Review `numpy` including 2-D arrays and understand array reshaping
* Use `matplotlib` to make plots


## Table of Contents

#### <font color='red'> HIGHLIGHTS FROM PRE-LAB </font>

* [1 - Review of numpy](#first-bullet)
* [2 - Intro to matplotlib plus more ](#second-bullet)


In [None]:
import numpy as np
import scipy as sp
import matplotlib as mpl
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import pandas as pd
import time
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)
#import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
# Displays the plots for us.
%matplotlib inline

In [3]:
# Use this as a variable to load solutions: %load PATHTOSOLUTIONS/exercise1.py. It will be substituted in the code
# so do not worry if it disappears after you run the cell.
PATHTOSOLUTIONS = 'solutions'

<a class="anchor" id="first-bullet"></a>
## 1 - Review of the `numpy` Python library

In lab1 we learned about the `numpy` library [(documentation)](http://www.numpy.org/) and its fast array structure, called the `numpy array`. 

In [4]:
# import numpy
import numpy as np

In [None]:
# make an array
my_array = np.array([1,4,9,16])
my_array

In [None]:
print(f'Size of my array: {my_array.size}, or length of my array: {len(my_array)}')
print (f'Shape of my array: {my_array.shape}')

#### Notice the way the shape appears in numpy arrays

- For a 1D array, .shape returns a tuple with 1 element (n,)
- For a 2D array, .shape returns a tuple with 2 elements (n,m)
- For a 3D array, .shape returns a tuple with 3 elements (n,m,p)

In [None]:
# How to reshape a 1D array to a 2D
my_array.reshape(-1,2)
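
As a small sketch of the shape rules above: the `-1` in `reshape` asks `numpy` to infer that dimension from the array's size. The variable names `a`, `b`, and `c` below are illustrative only and are not used elsewhere in the lab.

```python
import numpy as np

a = np.array([1, 4, 9, 16])   # 1D array, shape (4,)
b = a.reshape(-1, 2)          # -1 is inferred: 4 elements / 2 columns = 2 rows
c = np.zeros((2, 3, 4))       # 3D array, shape (2, 3, 4)

print(a.shape, b.shape, c.shape)
print(b)
```

Note that `reshape` only works when the total number of elements is preserved; `a.reshape(-1, 3)` would raise an error here because 4 is not divisible by 3.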

Numpy arrays support many of the same operations as lists, including slicing and iteration. Below we slice and iterate.

In [None]:
print("array[2:4]:", my_array[2:4]) # A slice of the array

# Iterate over the array
for ele in my_array:
    print("element:", ele)

Remember that `numpy` gains a lot of its efficiency from being **strongly typed**: all elements of an array have the same type, such as integer or floating point. If the elements you pass in have different types, `numpy` converts them all to a single common type that can represent every element (for example, integers mixed with floats become floats, and anything mixed with a string becomes a string).

In [None]:
mixed = np.array([1, 2.3, 'eleni', True])
print(type(1), type(2.3), type('eleni'), type(True))
mixed # all elements will become strings
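
To see the type conversion more directly, one option is to inspect the `dtype` attribute. This is a minimal sketch; the array names are made up for illustration, and the exact integer and string dtypes printed can vary by platform and `numpy` version.

```python
import numpy as np

ints = np.array([1, 2, 3])                  # all integers
ints_and_floats = np.array([1, 2.3])        # integers are promoted to floats
with_string = np.array([1, 2.3, 'eleni'])   # everything is converted to strings

print(ints.dtype)
print(ints_and_floats.dtype)
print(with_string.dtype)
```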

Next, we push ahead to two-dimensional arrays and begin to dive into some of the deeper aspects of `numpy`.

In [None]:
# create a 2d-array by handing a list of lists
my_array2d = np.array([ [1, 2, 3, 4], 
                        [5, 6, 7, 8], 
                        [9, 10, 11, 12] 
])

my_array2d

### Array Slicing (a reminder...)

Numpy arrays can be sliced, and can be iterated over with loops.  Below is a schematic illustrating slicing two-dimensional arrays.  

 <img src="../images/2dindex_v2.png" alt="Drawing" style="width: 500px;"/>
 
Notice that the list slicing syntax still works!  
`array[2:,3]` says "in the array, get rows 2 through the end, column 3"  
`array[3,:]` says "in the array, get row 3, all columns".
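
The two slices described above can be tried on a small array. This is only a sketch: `grid` is a hypothetical 4 x 5 array built for the example, since `my_array2d` has just three rows and therefore no row 3.

```python
import numpy as np

grid = np.arange(20).reshape(4, 5)  # rows 0..3, columns 0..4

rows_2_on_col_3 = grid[2:, 3]   # rows 2 through the end, column 3
row_3_all_cols = grid[3, :]     # row 3, all columns

print(grid)
print(rows_2_on_col_3)
print(row_3_all_cols)
```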

### Pandas Slicing (a reminder...)

`.iloc` selects by integer position (positions are always unique), while `.loc` selects by label (labels are not necessarily unique).

In [None]:
# import cast dataframe 
cast = pd.read_csv('../data/mtcars.csv', encoding='utf_8')
cast.head()

In [None]:
# get me rows 10 to 13 (python slicing style : exclusive of end) 
cast.iloc[10:13]

In [None]:
# get me columns 0 to 2 but all rows - use head()
cast.iloc[:, 0:2].head()

In [None]:
# get me rows 10 to 13 AND only columns 0 to 2
cast.iloc[10:13, 0:2]

In [None]:
# COMPARE: get me rows 10 to 13 (pandas slicing style : inclusive of end)
cast.loc[10:13]

In [None]:
# give me columns 'mpg' and 'cyl' by label but only for rows 5 to 10
# (mtcars has no 'year' or 'type' columns; any existing column labels work here)
cast.loc[5:10,['mpg','cyl']]
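
The difference between `.iloc` and `.loc` is easier to see when the row labels are not integers. The sketch below uses a tiny made-up dataframe (`df` and its `value` column are assumptions for illustration, not part of the lab data).

```python
import pandas as pd

df = pd.DataFrame({'value': [10, 20, 30, 40, 50]},
                  index=['a', 'b', 'c', 'd', 'e'])

by_position = df.iloc[1:3]    # positions 1 and 2; the end position is excluded
by_label = df.loc['b':'d']    # labels 'b' through 'd'; the end label is included

print(by_position)
print(by_label)
```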

### Python Trick of the Day

In [46]:
import re
names = ['mayday','springday','horseday','june']

In [None]:
# TODO : replace these lines of code with 1 line of code using a list comprehension

cleaned = []
for name in names:
    this = re.sub('[Dd]ay$', '', name)
    cleaned.append(this)
cleaned

In [48]:
# your code here


In [None]:
# solution
cleaned2 = [re.sub('[Dd]ay$', '', name) for name in names]
cleaned2

<a class="anchor" id="second-bullet"></a>
## 2 - Plotting with matplotlib and beyond
<br>
<img style="float: center" src="https://imgs.xkcd.com/comics/convincing.png"> 

`matplotlib` is a very powerful `python` library for making scientific plots. 

We will not focus too much on the internal aspects of `matplotlib` in today's lab. There are many excellent tutorials out there for `matplotlib`.  For example,
* [`matplotlib` homepage](https://matplotlib.org/)
* [`matplotlib` tutorial](https://github.com/matplotlib/AnatomyOfMatplotlib)

Conveying your findings convincingly is an absolutely crucial part of any analysis. Therefore, you must be able to write well and make compelling visuals.  Creating informative visuals is an involved process and we won't cover that in this lab.  However, part of creating informative data visualizations means generating *readable* figures.  If people can't read your figures or have a difficult time interpreting them, they won't understand the results of your work.  Here are some non-negotiable commandments for any plot:
* Label $x$ and $y$ axes
* Axes labels should be informative
* Axes labels should be large enough to read
* Make tick labels large enough
* Include a legend if necessary
* Include a title if necessary
* Use appropriate line widths
* Use different line styles for different lines on the plot
* Use different markers for different lines

There are other important elements, but that list should get you started on your way.

We will work with `matplotlib` and `seaborn` for plotting in this class.  `seaborn` is a little more specialized in that it was developed for statistical data visualization.  We will cover some `seaborn` later in class. In the meantime you can look at the [seaborn documentation](https://seaborn.pydata.org).

First, let's generate some data.

#### Let's plot some functions

We will use the following three functions to make some plots:

* Logistic function:
  \begin{align*}
    f\left(z\right) = \dfrac{1}{1 + be^{-az}}
  \end{align*}
  where $a$ and $b$ are parameters.
* Hyperbolic tangent:
  \begin{align*}
    g\left(z\right) = b\tanh\left(az\right) + c
  \end{align*}
  where $a$, $b$, and $c$ are parameters.
* Rectified Linear Unit:
  \begin{align*}
    h\left(z\right) = 
    \left\{
      \begin{array}{lr}
        z, \quad z > 0 \\
        \epsilon z, \quad z\leq 0
      \end{array}
    \right.
  \end{align*}
  where $\epsilon > 0$ is a small, positive parameter.

You are given the code for the first two functions.  Notice that $z$ is passed in as a `numpy` array and that the function values are returned as `numpy` arrays.  Parameters are passed in as floats.

You should write a function to compute the rectified linear unit.  The input should be a `numpy` array for $z$ and a positive float for $\epsilon$.

In [50]:
import numpy as np

def logistic(z: np.ndarray, a: float, b: float) -> np.ndarray:
    """ Compute logistic function
      Inputs:
         a: exponential parameter
         b: exponential prefactor
         z: numpy array; domain
      Outputs:
         f: numpy array of floats, logistic function
    """
    
    den = 1.0 + b * np.exp(-a * z)
    return 1.0 / den

def stretch_tanh(z: np.ndarray, a: float, b: float, c: float) -> np.ndarray:
    """ Compute stretched hyperbolic tangent
      Inputs:
         a: horizontal stretch parameter (a>1 implies a horizontal squish)
         b: vertical stretch parameter
         c: vertical shift parameter
         z: numpy array; domain
      Outputs:
         g: numpy array of floats, stretched tanh
    """
    return b * np.tanh(a * z) + c

def relu(z: np.ndarray, eps: float = 0.01) -> np.ndarray:
    """ Compute rectified linear unit
      Inputs:
         eps: small positive parameter
         z: numpy array; domain
      Outputs:
         h: numpy array; relu
    """
    return np.fmax(z, eps * z)
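
As a sanity check on the rectified linear unit, the piecewise definition can also be written with `np.where` and compared with the `np.fmax` version. This is a sketch under the assumption that $0 \leq \epsilon \leq 1$, which is when the two forms agree; `relu_where` and the test array `z` are hypothetical names introduced only for this comparison.

```python
import numpy as np

def relu_where(z: np.ndarray, eps: float = 0.01) -> np.ndarray:
    """Piecewise form: z where z > 0, eps * z otherwise."""
    return np.where(z > 0, z, eps * z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
eps = 0.01

piecewise = relu_where(z, eps)
via_fmax = np.fmax(z, eps * z)   # same expression as in relu above

print(piecewise)
print(np.allclose(piecewise, via_fmax))
```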

Now let's make some plots.  First, let's just warm up and plot the logistic function.

In [51]:
x = np.linspace(-5.0, 5.0, 100) # Equally spaced grid of 100 pts between -5 and 5

f = logistic(x, 1.0, 1.0) # Generate data

In [None]:
plt.plot(x, f)
plt.xlabel('x')
plt.ylabel('f')
plt.title('Logistic Function')
plt.grid(True)

#### Figures with subplots

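Let's start thinking about the plots as objects. We have the `figure` object, which acts as a container for one or more smaller plots called `axes`. When you ask for more than one, `plt.subplots` returns the `axes` in a `numpy` array, so you can use array notation to pick the one you want.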

In [None]:
fig, ax = plt.subplots(1,1) # Get figure and axes objects

ax.plot(x, f) # Make a plot

# Create some labels
ax.set_xlabel('x')
ax.set_ylabel('f')
ax.set_title('Logistic Function')

# Grid
ax.grid(True)

Wow, it's *exactly* the same plot!  Notice, however, the use of `ax.set_xlabel()` instead of `plt.xlabel()`.  The difference is tiny, but you should be aware of it.  I will use this plotting syntax from now on.
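
To make the "array notation" idea concrete, here is a small sketch that only inspects what `plt.subplots` returns for different grid sizes. The names `fig1`, `ax1`, and so on are illustrative, and the figures are closed at the end because nothing is drawn on them.

```python
import matplotlib.pyplot as plt

fig1, ax1 = plt.subplots(1, 1)   # a single Axes object
fig2, ax2 = plt.subplots(1, 3)   # 1D array holding 3 Axes
fig3, ax3 = plt.subplots(2, 3)   # 2D array holding 2 x 3 Axes

print(type(ax1).__name__)
print(ax2.shape)
print(ax3.shape)

plt.close('all')   # these figures were only created for inspection
```

With a 2D grid you can index either as `ax3[0][2]` or as `ax3[0, 2]`; both refer to the same subplot.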

What else do we need to do to make this figure better?  Here are some options:
* Make labels bigger!
* Make line fatter
* Make tick mark labels bigger
* Make the grid less pronounced
* Make figure bigger

Let's get to it.

In [None]:
fig, ax = plt.subplots(1,1, figsize=(10,6)) # Make figure bigger

# Make line plot
ax.plot(x, f, lw=4)

# Update ticklabel size
ax.tick_params(labelsize=24)

# Make labels
ax.set_xlabel(r'$x$', fontsize=24) # Use TeX for mathematical rendering
ax.set_ylabel(r'$f(x)$', fontsize=24) # Use TeX for mathematical rendering
ax.set_title('Logistic Function', fontsize=24)

ax.grid(True, lw=1.5, ls='--', alpha=0.75)

Notice:
* `lw` stands for `linewidth`.  We could also write `ax.plot(x, f, linewidth=4)`
* `ls` stands for `linestyle`.
* `alpha` sets the opacity (0 is fully transparent, 1 is fully opaque).

The only thing remaining to do is to change the $x$ limits.  Clearly these should go from $-5$ to $5$.
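
One way to set the limits is `ax.set_xlim`. The sketch below is self-contained, so it rebuilds `x` and the logistic data inline (with $a = b = 1$) rather than relying on earlier cells.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5.0, 5.0, 100)
f = 1.0 / (1.0 + np.exp(-x))   # logistic function with a = b = 1

fig, ax = plt.subplots(1, 1, figsize=(10, 6))
ax.plot(x, f, lw=4)

# Take the limits from the data so the x-axis spans exactly [-5, 5]
ax.set_xlim(x.min(), x.max())
```

Using `x.min()` and `x.max()` instead of hard-coded numbers keeps the limits correct if the grid changes later.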

In [55]:
#fig.savefig('logistic.png')

# Put this in a markdown cell and uncomment this to check what you saved.
# ![](../images/logistic.png)

#### Resources
If you want to see all the styles available, please take a look at the documentation.
* [Line styles](https://matplotlib.org/2.0.1/api/lines_api.html#matplotlib.lines.Line2D.set_linestyle)
* [Marker styles](https://matplotlib.org/2.0.1/api/markers_api.html#module-matplotlib.markers)
* [Everything you could ever want](https://matplotlib.org/2.0.1/api/lines_api.html#matplotlib.lines.Line2D.set_marker)

We haven't discussed it yet, but you can also put a legend on a figure.  You'll do that in the next exercise.  Here are some additional resources:
* [Legend](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.legend.html)
* [Grid](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.grid.html)

`ax.legend(loc='best', fontsize=24);`

<div class="exercise"><b>Exercise</b></div>

Do the following:
* Make a figure with the logistic function, hyperbolic tangent, and rectified linear unit.
* Use different line styles for each plot
* Put a legend on your figure

Here's an example of a figure:
![](../images/nice_plots.png)

In [None]:
# your code here

# First get the data
f = logistic(x, 2.0, 1.0)
g = stretch_tanh(x, 2.0, 0.5, 0.5)
h = relu(x)

fig, ax = plt.subplots(1,1, figsize=(10,6)) # Create figure object

# Make actual plots
# (Notice the label argument!)
ax.plot(x, f, lw=4, ls='-', label=r'$L(x;1)$')
ax.plot(x, g, lw=4, ls='--', label=r'$\tanh(2x)$')
ax.plot(x, h, lw=4, ls='-.', label=r'$relu(x; 0.01)$')

# Make the tick labels readable
ax.tick_params(labelsize=24)

# Set axes limits to make the scale nice
ax.set_xlim(x.min(), x.max())
ax.set_ylim(h.min(), 1.1)

# Make readable labels
ax.set_xlabel(r'$x$', fontsize=24)
ax.set_ylabel(r'$h(x)$', fontsize=24)
ax.set_title('Activation Functions', fontsize=24)

# Set up grid
ax.grid(True, lw=1.75, ls='--', alpha=0.75)

# Put legend on figure
ax.legend(loc='best', fontsize=24);

fig.savefig('../images/nice_plots.png')

<div class="exercise"><b>Exercise</b></div>

Putting all three curves on one plot looks nice and makes sense for comparison. Now let's give each function its own plot.

* Make a separate plot for each function and line them up on the same row.

In [57]:
# your code here


In [None]:
# %load solutions/three_subplots.py
# First get the data
f = logistic(x, 2.0, 1.0)
g = stretch_tanh(x, 2.0, 0.5, 0.5)
h = relu(x)

fig, ax = plt.subplots(1,3, figsize=(20,6)) # Create figure object

# Make actual plots
ax[0].plot(x, f, lw=4, ls='-', label=r'$L(x;1)$')
ax[1].plot(x, g, lw=4, ls='--', label=r'$\tanh(2x)$')
ax[2].plot(x, h, lw=4, ls='-.', label=r'$relu(x; 0.01)$')

# Apply the same cosmetics to every subplot
for axis in ax:
    # Make the tick labels readable
    axis.tick_params(labelsize=24)

    # Set axes limits to make the scale nice
    axis.set_xlim(x.min(), x.max())
    axis.set_ylim(h.min(), 1.1)

    # Make readable labels
    axis.set_xlabel(r'$x$', fontsize=24)
    axis.set_ylabel(r'$h(x)$', fontsize=24)
    axis.set_title('Activation Functions', fontsize=24)

    # Set up grid
    axis.grid(True, lw=1.75, ls='--', alpha=0.75)

    # Put legend on each subplot
    axis.legend(loc='best', fontsize=24)

#fig.savefig('../images/nice_sub_plots.png')


<div class="exercise"><b>Exercise</b></div>

* Make a 2 x 3 grid of separate plots; the panels you do not use will stay empty. Just plot the functions and do not worry about cosmetics. We just want you to see the functionality.

In [59]:
# your code here


In [None]:
# %load solutions/six_subplots.py

# First get the data
f = logistic(x, 2.0, 1.0)
g = stretch_tanh(x, 2.0, 0.5, 0.5)
h = relu(x)

fig, ax = plt.subplots(2,3, figsize=(20,6)) # Create figure object

# Make actual plots
ax[0][0].plot(x, f, lw=4, ls='-', label=r'$L(x;1)$')
ax[1][1].plot(x, g, lw=4, ls='--', label=r'$\tanh(2x)$')
ax[1][2].plot(x, h, lw=4, ls='-.', label=r'$relu(x; 0.01)$')
ax[0][2].plot(x, h, lw=4, ls='-.', label=r'$relu(x; 0.01)$')
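
When a grid has more panels than curves, it can be convenient to flatten the array of axes and loop over it. The sketch below is one possible approach, not the lab's solution: the `curves` dictionary is a hypothetical helper, the three functions are recomputed inline so the block runs on its own, and hiding the unused panels is an optional extra.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5.0, 5.0, 100)
curves = {
    'logistic': 1.0 / (1.0 + np.exp(-2.0 * x)),
    'tanh': 0.5 * np.tanh(2.0 * x) + 0.5,
    'relu': np.fmax(x, 0.01 * x),
}

fig, ax = plt.subplots(2, 3, figsize=(20, 6))
flat_axes = ax.ravel()   # 1D view of the 2 x 3 grid, row by row

# Plot one curve per panel, filling the first row
for axis, (name, y) in zip(flat_axes, curves.items()):
    axis.plot(x, y, lw=4)
    axis.set_title(name)

# Hide the panels that were not used
for axis in flat_axes[len(curves):]:
    axis.set_visible(False)
```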
