# Math 1376: Programming for Data Science
---

## Module 02: Python basics 

## Learning Objectives


- Understand how to visualize data in 2- and 3-dimensions using `matplotlib`


- Understand plotting options that allow you to resize figures, create arrays of figures, and change how data are visualized.


- Know how to use some built-in `numpy` functions for creating regularly spaced input data that are useful for plotting purposes.

## Notebook contents <a name='Contents'>

* [Part (d): Plotting and `matplotlib`](#Plotting)

    * [Activity 1: Some basic plotting practice](#activity-plotting-basics)
    
    * [Activity 2: Using curve-fitting and plotting tools](#activity-plotting-fitted-curve)
    
    * [Activity: Summary](#activity-summary)


## Part (d): Plotting and `matplotlib` <a name='Plotting'>
---
    
**Expected time to completion: 2 hours**
    
<mark> Run the code cell below and click the "play" button to see the recorded lecture associated with this notebook.</mark> 

In [None]:
from IPython.display import YouTubeVideo

YouTubeVideo('XQAl2LomcLo', width=800, height=450)

The mathematician Richard Hamming once said, 
> The purpose of computing is insight, not numbers. 

and the best way to develop insight is often to visualize data. 

Visualization could fill an entire suite of lectures (or even its own course), but here we explore a few features of Python’s `matplotlib` library. While there is no “official” plotting library for Python, `matplotlib` is the de facto standard. We will import its `pyplot` module shortly, right after importing `numpy`.

A good tutorial to bookmark and peruse is the official `pyplot` tutorial: https://matplotlib.org/stable/tutorials/introductory/pyplot.html

In fact, you are highly encouraged to follow that tutorial and turn its contents into your own notebook.

We will see other types of data visualization and tools throughout this course in the context of other types of problems (which is the best way to learn how to use these tools). 

### Understanding the components of a figure/plot
---

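Python's `matplotlib` library emulates many features of MATLAB plotting and organizes a plot the same way, as illustrated below: a *figure* contains one or more *axes* (the individual plots), and each axes can have its own title and axis labels.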

![Illustration of plotting with matplotlib goes here](https://github.com/CU-Denver-MathStats-OER/Programming-for-Data-Science/blob/main/Lectures-and-Assignments/02-Python-Basics/lectures/matplotlib_layout.png?raw=true "A figure contains axes (potentially multiple axes for multiple plots) and each axis (there can be up to 3 for a 3-d plot) can be labeled and the plot shown in the axes can be titled.")
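As a minimal sketch of the components in the illustration, the object-oriented interface can create one figure containing one axes with `plt.subplots` and then label that axes directly. The names `fig`, `ax`, and `t` and the plotted cosine are illustrative choices, not part of the lecture.

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))  # one Figure that contains one Axes

t = np.linspace(0, 2 * np.pi, 100)
ax.plot(t, np.cos(t), label='cos(t)')  # the curve is drawn inside the Axes

ax.set_title('One figure, one axes')  # title of this Axes
ax.set_xlabel('t')                    # label for the x-axis
ax.set_ylabel('cos(t)')               # label for the y-axis
ax.legend()                           # legend built from the label= argument
```

The `plt.title`, `plt.xlabel`, and similar functions used later in this notebook act on the "current" axes; the `ax.set_*` methods do the same thing on an explicitly named axes, which becomes convenient once a figure has several plots.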

In [None]:
import numpy as np

### A magic command for plotting in a notebook
---

The code cell below presents some options for how to plot inside of a notebook.

In [None]:
# The below is commented out and not necessary in Colab or in a Jupyter lab 
# environment; however, it is useful to see this in case you use other 
# environments to run notebooks so that you understand why the code is there.

# The next line enables the display of graphical output within Jupyter notebooks 
# in certain environments and is NOT needed outside of notebooks (e.g., when 
# creating plots via python scripts).
# %matplotlib inline 
# Can also use %matplotlib notebook for additional interactivity (may have some browser/OS dependencies)

# This next line IS needed even outside of Jupyter notebooks
import matplotlib.pyplot as plt 

In [None]:
plt.figure()  # This creates an empty figure
plt.show()  # There is nothing to show!

<mark> ***Key Points:*** </mark>
 
- The basic plot command, `plt.plot(xdata, ydata)`, plots `ydata` versus `xdata`. By default, consecutive data pairs are connected by straight solid line segments.

- `np.linspace(a,b,n)` generates $n$ evenly spaced points in the closed interval $[a,b]$, including both endpoints (so $n-2$ points are *interior* to the interval $[a,b]$). This is a commonly used command for generating regularly spaced data to plot.
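A small sanity check of the `linspace` key point, using an arbitrary illustrative interval $[0,1]$ with $n=5$: both endpoints appear, and the $n$ points split the interval into $n-1$ equal gaps of width $(b-a)/(n-1)$.

```python
import numpy as np

a, b, n = 0.0, 1.0, 5
pts = np.linspace(a, b, n)  # n evenly spaced points, both endpoints included
print(pts)  # [0.   0.25 0.5  0.75 1.  ]

# n points split [a, b] into n - 1 equal gaps of width (b - a)/(n - 1)
spacing = (b - a) / (n - 1)
print(np.allclose(np.diff(pts), spacing))  # True
```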

In [None]:
plt.figure()

x = np.linspace(-np.pi,np.pi,1000)  # Creates 1000 points from -np.pi to np.pi (x=0 is not among them, so 1/x below is safe)

y = x*np.sin(1.0/x)  # Evaluates the function x*sin(1/x) at these x points from above

# You can add a title to the plot before or after you actually plot the data.
# The r prefix makes this a raw string, so Python does not treat \s as an escape sequence.
plt.title(r'$f(x)=x\sin(x^{-1})$', fontsize=18)  # Try (1) changing fontsize; (2) changing the color of the font

plt.plot(x,y)  # This plots y vs x. Think of how points are usually written as (x,y) which is why we use plot(x,y)

---

## <mark>Activity 1: Some basic plotting practice</mark> <a id='activity-plotting-basics'></a>

You may find it useful to refer to the pyplot tutorial: https://matplotlib.org/stable/tutorials/introductory/pyplot.html

1. Copy and paste the code cell that plots $f(x)=x\sin(x^{-1})$ into a new code cell below.

2. Plot the function using a thicker red dashed-dotted curve. 

3. Add x- and y-axis labels.

4. Add/edit comments to at least three different lines of code to explain what they are doing.

End of Activity 1.

---


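Another handy way to generate a vector/array of numbers for either computations or plotting is `numpy.arange(start, stop, increment)`. It returns the values $\text{start}, \text{start}+\text{increment}, \text{start}+2\cdot\text{increment}, \ldots$ that lie in the half-open interval $[\text{start},\text{stop})$, so `stop` itself is excluded.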
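A short sketch contrasting `arange` with `linspace` on illustrative values: `arange` stops *before* `stop`, and `linspace(..., endpoint=False)` can produce the same points when you know how many you want. The NumPy documentation notes that with a non-integer step, floating-point rounding can make the length of the `arange` output hard to predict, so `linspace` is generally the safer choice when you need a specific number of points.

```python
import numpy as np

ints = np.arange(0, 10, 3)        # 0, 3, 6, 9; stop=10 is excluded
quarters = np.arange(0, 1, 0.25)  # 0.0, 0.25, 0.5, 0.75; stop=1 is excluded
print(ints)      # [0 3 6 9]
print(quarters)  # [0.   0.25 0.5  0.75]

# Same four points via linspace: endpoint=False drops the right endpoint
print(np.allclose(quarters, np.linspace(0, 1, 4, endpoint=False)))  # True
```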

In [None]:
plt.figure()

x_1 = np.arange(-np.pi, np.pi, 1E-2)

y_1 = x_1*np.sin(x_1)

plt.plot(x_1, y_1, linestyle='--', c='k')  # dashed lines, k means black color
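The keyword arguments in the cell above can also be written as a compact *format string*, where `'k--'` means black (`k`) and dashed (`--`). This sketch draws the same curve both ways side by side and checks that the resulting lines share a style; the axes names `ax_kw` and `ax_fmt` are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

x_1 = np.arange(-np.pi, np.pi, 1E-2)
y_1 = x_1 * np.sin(x_1)

fig, (ax_kw, ax_fmt) = plt.subplots(1, 2, figsize=(10, 4))

# Keyword arguments, as in the cell above
line_kw, = ax_kw.plot(x_1, y_1, linestyle='--', c='k')

# Equivalent shorthand format string: 'k' = black, '--' = dashed
line_fmt, = ax_fmt.plot(x_1, y_1, 'k--')

print(line_kw.get_linestyle(), line_fmt.get_linestyle())  # -- --
print(mcolors.to_rgba(line_kw.get_color()) == mcolors.to_rgba(line_fmt.get_color()))  # True
```

Keyword arguments are more explicit and easier to read; format strings are quicker to type for simple styles.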

Let's make a *scatter* plot of noisy data generated from a linear function.

In [None]:
xcor = np.random.rand(100)

ycor = 5*xcor + np.random.rand(100)

plt.scatter(xcor, ycor)

### Subplots and 3d plots using `mpl_toolkits`
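Subplots are one way to arrange multiple plots into one figure. The `Figure.add_subplot` method takes the arguments **`add_subplot(nrows, ncols, index)`**: it divides the figure into an `nrows`-by-`ncols` grid and returns the axes at position `index`, which starts at 1 in the upper-left corner and increases left to right, then top to bottom.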
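To make the numbering of `index` concrete, this sketch fills a $2\times 2$ grid in order; the plotted functions $\sin(kx)$ are arbitrary illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 200)

fig = plt.figure(figsize=(8, 6))
for k in range(1, 5):
    # 2x2 grid: index 1 = top-left, 2 = top-right, 3 = bottom-left, 4 = bottom-right
    ax = fig.add_subplot(2, 2, k)
    ax.plot(x, np.sin(k * x))
    ax.set_title(f'sin({k}x)')

fig.tight_layout()  # reduce overlap between titles and tick labels
```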

You may find https://numpy.org/doc/stable/reference/generated/numpy.meshgrid.html to be a useful reference when determining how you want to index an array in 2- or 3-D. 
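To see how the `indexing` argument of `np.meshgrid` affects array shapes, here is a small sketch on a 4-by-3 grid over the unit square. With `indexing='xy'` (the default) the outputs have shape `(len(y), len(x))`, which matches a $3 \times 4$ array of function values; `indexing='ij'` gives shape `(len(x), len(y))` instead. The variable names are illustrative.

```python
import numpy as np

x = np.linspace(0, 1, 4)  # 4 points in the x-direction
y = np.linspace(0, 1, 3)  # 3 points in the y-direction

X_xy, Y_xy = np.meshgrid(x, y, indexing='xy')  # Cartesian (default): shape (len(y), len(x))
X_ij, Y_ij = np.meshgrid(x, y, indexing='ij')  # matrix indexing: shape (len(x), len(y))

print(X_xy.shape, X_ij.shape)  # (3, 4) (4, 3)
print(np.allclose(X_xy[0], x))  # True: with 'xy', x varies along each row
```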

In [None]:
A = np.reshape(range(1,13),(3,4))

In [None]:
fig = plt.figure(num=1, figsize=(10, 6))

# Try commenting/uncommenting out parts of the code below and see what happens.

axes1 = fig.add_subplot(1, 3, 1)  # the first plot in a 1x3 array
# axes2 = fig.add_subplot(1, 3, 2)  # the second plot in a 1x3 array
axes3 = fig.add_subplot(1, 3, 3)  # the third plot in a 1x3 array

axes1.set_ylabel('average')
axes1.scatter(np.arange(A.shape[1]), np.mean(A, axis=0))
axes1.set_xticks(np.arange(A.shape[1]))
axes1.set_aspect(1)


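# Enclosing code in triple quotes turns it into a multi-line string literal that
# Python evaluates and then discards, which effectively "comments out" a big chunk
# of code. (When such a string is the first statement of a module, function, or
# class, it is called a "docstring".) While convenient at times, this is not good
# coding practice. We will discuss docstrings further in the next module.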
''' 
axes2.set_ylabel('max')
axes2.plot(np.max(A, axis=0))
axes2.set_xticks(np.arange(A.shape[1]))
axes2.set_aspect(2)
'''

axes3.set_ylabel('min')
axes3.plot(np.min(A, axis=0))
axes3.set_aspect(3)

fig.tight_layout()
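The cell above reduces `A` with `axis=0`, which collapses the rows and leaves one value per column (one value per tick on the x-axis). A quick check on the same $3\times 4$ array makes the direction of each reduction concrete:

```python
import numpy as np

A = np.reshape(range(1, 13), (3, 4))  # rows [1..4], [5..8], [9..12]

col_means = np.mean(A, axis=0)  # collapse the rows: one value per column
row_means = np.mean(A, axis=1)  # collapse the columns: one value per row

print(col_means)  # [5. 6. 7. 8.]
print(row_means)  # [ 2.5  6.5 10.5]
print(np.max(A, axis=0), np.min(A, axis=0))  # [ 9 10 11 12] [1 2 3 4]
```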

In [None]:
# We will pretend that A is a function over the unit square 
# in the xy-plane that we want to plot
x = np.linspace(0,1,4)  # we create a regular uniform grid in the x-direction
y = np.linspace(0,1,3)  # we create a regular uniform grid in the y-direction
x, y = np.meshgrid(x,y,indexing='xy')  # we then create a meshgrid in the xy-plane
#print(x)
#print(y)
#print(A)

from mpl_toolkits.mplot3d import axes3d  # Registers the '3d' projection (required for matplotlib < 3.2; harmless in newer versions)

fig = plt.figure(2, figsize=(10, 6))

ax1 = fig.add_subplot(1, 3, 1, projection='3d')
ax1.scatter(x, y, A)  # we then plot A over this grid as a scatter plot
ax1.set_xlabel('x')
ax1.set_ylabel('y')
ax1.set_zlabel('A')

ax2 = fig.add_subplot(1, 3, 2, projection='3d')
ax2.plot_wireframe(x, y, A)  # we then plot A over this grid as a wireframe
ax2.set_xlabel('x')
ax2.set_ylabel('y')
ax2.set_zlabel('A')

from matplotlib import cm  # Allow for more colormaps
ax3 = fig.add_subplot(1, 3, 3, projection='3d')
ax3.plot_surface(x, y, A, rstride=1, cstride=1, cmap=cm.coolwarm)  # we then plot A over this grid as a surface
ax3.set_xlabel('x')
ax3.set_ylabel('y')
ax3.set_zlabel('A')

plt.tight_layout()
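The same meshgrid-then-plot pattern works for any function of two variables. As an illustrative sketch (the saddle $z = x^2 - y^2$ and the 41-point grid are arbitrary choices, not from the notebook), a finer grid gives a smoother surface, and a colorbar shows how colors map to $z$ values. It assumes matplotlib 3.2 or newer, where `projection='3d'` works without an explicit `mpl_toolkits` import.

```python
import numpy as np
import matplotlib.pyplot as plt

xs = np.linspace(-2, 2, 41)  # 41 points gives a spacing of 0.1
ys = np.linspace(-2, 2, 41)
X, Y = np.meshgrid(xs, ys, indexing='xy')  # both have shape (41, 41)
Z = X**2 - Y**2  # evaluate the saddle at every grid point at once

fig = plt.figure(figsize=(7, 5))
ax = fig.add_subplot(1, 1, 1, projection='3d')
surf = ax.plot_surface(X, Y, Z, cmap='coolwarm')
fig.colorbar(surf, ax=ax, shrink=0.6)  # color scale for the Z values
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
```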

---

##  <mark>Activity 2: Using curve-fitting and plotting tools</mark> <a name='activity-plotting-fitted-curve'></a>

`numpy` has a `polyfit` function (https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html) to perform least-squares fits of polynomials to data.

Least-squares is a type of regression that is ***very common*** in the computational and data sciences and is heavily used in machine learning, artificial intelligence, statistics, etc.
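The sketch below uses hypothetical noise-free data from the quadratic $2x^2 - 3x + 1$ (not the activity's data) to show the calling pattern: `np.polyfit(x, y, deg)` returns the fitted coefficients ordered from the highest power down, and `np.poly1d` wraps them in a callable polynomial.

```python
import numpy as np

x = np.linspace(-1, 1, 11)
y = 2 * x**2 - 3 * x + 1  # exact quadratic, no noise

coeffs = np.polyfit(x, y, 2)  # degree-2 least-squares fit; highest power first
p = np.poly1d(coeffs)         # callable polynomial built from the coefficients

print(np.round(coeffs, 6))  # approximately [ 2. -3.  1.]
print(p(2.0))               # approximately 3.0, since 2*4 - 3*2 + 1 = 3
```

Because these data are exact, the fit recovers the coefficients up to rounding; with noisy data, `polyfit` instead returns the polynomial that minimizes the sum of squared residuals.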

In the code cells below, `num_data` denotes the number of data points used to fit a polynomial curve to the noisy data defined by (`xdata`,`ydata`), where the `xdata` values lie in the interval [-4,4].
Finish the code cells below so that

   * A scatter plot of (`xdata`,`ydata`) is generated;
   
   * A third-order polynomial is fitted to the noisy data (read the `polyfit` documentation and look over the examples to see how to use `poly1d` to generate a polynomial function `p` from the output of the `polyfit` function);
   
   * `linspace` within `numpy` is used to create a regular uniform grid of 100 points in [-4,4] called `xgrid`, and (`xgrid`,`p(xgrid)`) is plotted on the same plot as the scatter of the noisy data.

In [None]:
# Simulate data
# Question: What is the difference between np.random.rand and np.random.randn? 

num_data = 100

xdata = np.random.rand(num_data)*8-4  # Transform random numbers in [0,1] to random numbers in [-4,4]

ydata = -xdata**3 + 2*xdata**2 + xdata + 2 + np.random.randn(num_data)*10  # Generate noisy data

# Now try to fit a model to the data (your work goes below)
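The data-generation cell above uses `np.random.rand(num_data)*8-4`. More generally, if $U$ is uniform on $[0,1)$, then $a + (b-a)U$ is uniform on $[a,b)$: multiplying stretches the interval to width $b-a$, and adding $a$ shifts it. A minimal sketch with $a=-4$, $b=4$, assuming NumPy's seeded `default_rng` generator (not what the cell above uses) so the numbers are reproducible:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded generator: same numbers every run

a, b = -4.0, 4.0
u = rng.random(1000)       # uniform samples in [0, 1)
samples = a + (b - a) * u  # stretch to width b - a, then shift by a: now in [a, b)

print(samples.min() >= a, samples.max() < b)  # True True
```

With the same generator, `rng.uniform(a, b, size)` draws from this distribution directly.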

In [None]:
plt.figure()

plt.scatter( , )  # Complete this to plot the ydata vs the xdata

z = np.polyfit( , , )  # Complete this (refer to the polyfit documentation)

p = np.poly1d( )  # Complete this (refer to the polyfit documentation)

xgrid = np.linspace( , , )  # Complete this to make 100 points between -4 and 4

plt.plot( ,  ,c='r', linestyle='-.', linewidth=4, label='Best fit')  # Complete this to plot p(xgrid) vs xgrid

# Below we also plot the "true signal" that generated the noisy data
y_noise_free = -xgrid**3 + 2*xgrid**2 + xgrid + 2

plt.plot(xgrid, y_noise_free, c='k', linestyle='--', linewidth=2, label='Truth')

plt.legend(fontsize=12)

End of Activity 2.

---


## <mark>Activity: Summary</mark> <a name='activity-summary'/>

Summarize some of the key takeaways/points from this notebook in a list below and prepare a few code examples related to these takeaways/points in the code cells below. You need to have at least one example for each of your summary points and you need at least three summary points.

In this notebook, we have seen the following:

- [Your summary point 1 goes here]




- [Your summary point 2 goes here]




- [Your summary point 3 goes here]

<hr style="border:5px solid cyan"> </hr>


# So what is next in Module 03?

Much of scientific programming and data science involves applications of basic logic (e.g., using conditional statements to determine an action), repeating operations across arrays (e.g., using for-loops), and making user-defined functions to handle problem-specific issues. We will study these ideas in more depth in the next module.

### [Click here to return to Notebook Contents](#Contents)