# Introduction to Python, Jupyter, and Matplotlib
This is a Jupyter Python notebook, which is a collection of cells. Each cell is either a 'markdown' cell (formatted text, like this one) or a 'code' cell (Python, grey background). The two most important rules of Jupyter notebooks are:
1. ***SHIFT-ENTER*** will cause the current cell to execute. 
  - For Markdown cells, 'execute' means render the formatting. ([Here's a markdown cheatsheet](https://sqlbak.com/blog/wp-content/uploads/2020/04/Jupyter-Notebook-Markdown-Cheatsheet.pdf))
  - For Code cells, 'execute' means run the Python.
  - Some Code cells take a while to execute; watch for the `*` to change to a number.
1. Any cell can be edited (double-click into it) and re-executed (SHIFT-ENTER again).
--- 
The first code in any Python script or Jupyter notebook needs to import any libraries that will be used. The `as` clauses specify nicknames that are more convenient to type.

In [None]:
import numpy             as np    # all kinds of numerical and matrix capabilities in here
import matplotlib.pyplot as plt   # plotting; numpy and matplotlib are the only nonstandard libraries we need for this notebook

The cell below creates a variable `exes` and puts a list of numbers into it. The syntax (punctuation) is very important to get right. This is equivalent to 'Make a variable' named `exes` in Snap! and then `(set exes to (list ...))`.

In [None]:
exes = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]

One thing you can do in Jupyter notebooks that you can't do in regular Python scripts (programs) is to just mention something at the end of a code cell, which is a request for the notebook to display it.

In [None]:
exes
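
As a small sketch of that difference (reusing the same `exes` list), only a bare name on the *last* line of a cell is displayed. `print()` works anywhere in a cell, and in regular scripts too:

```python
exes = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]

exes          # not the last line of the cell, so nothing is displayed for it
print(exes)   # print() always writes its argument out, in notebooks and in scripts
exes          # last line of the cell: the notebook displays it (a script would not)
```

If you paste this into a regular `.py` script, only the `print()` line produces output.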

Python has a function called `len()`, which unsurprisingly returns the length of a list (just like in the Snap! Variables palette there's a `(length of list)` block -- *all programming languages are essentially the same like this*)

In [None]:
len(exes)

Double-click into this next cell and change the index to different values to see what happens. Can you break it?

In [None]:
exes[0]
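
For reference, here is a short sketch of what different indexes do, reusing the same `exes` list. Python counts from 0 (unlike Snap!, which counts from 1), negative indexes count back from the end, and an index past the end raises an `IndexError`. The error is caught here only so that the cell still runs to completion:

```python
exes = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]

first = exes[0]              # indexes start at 0, so this is the first item
last = exes[len(exes) - 1]   # the last valid index is len(exes) - 1
also_last = exes[-1]         # negative indexes count backwards from the end

try:
    exes[11]                 # one past the end: this is one way to "break it"
except IndexError as err:
    print("IndexError:", err)

print(first, last, also_last)
```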

In [None]:
wise = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

In [None]:
wise

Now that we have (matching-length) `exes` and `wise`, we can use matplotlib to graph them. Here's probably the simplest possible graph. (Note that a code cell can have multiple commands in it.)

In [None]:
plt.figure()             # always start a new graph with plt.figure() to clear away anything previous
plt.plot(exes, wise)     # plot() makes an X/Y plot; other functions include scatter(), bar(), ...
plt.show()               # when all setup is done, this causes the plot to render

That's not too useful, because the default behavior of `plot()` is a line graph. Let's take a quick look at a default scatterplot:

In [None]:
plt.figure()             # same as before
plt.scatter(exes, wise)  # use the function scatter() instead of plot()
plt.show()               # same as before

Let's go back to `plot()` though, to learn about options to control appearance.

In the command below, `exes` and `wise` are **positional** arguments: `plot()` knows what to do with them based on their order at the front of the argument list.

The rest are all **named**, optional arguments. They can be present or absent, in any order. But they are all of the form name=value (and text values get quotes).

All parameters, whether positional or named, are separated by commas.

In [None]:
plt.figure()
plt.plot(exes, wise,   # Long lines can be continued, but they should be indented
         color='red',       # Note for each of these named parameters, the input parameter name is NOT in quotes
         linestyle='solid', # Input values which are text MUST be in quotes, it can be '' or ""
         linewidth=1,       # Input values which are numbers do not get quotes, that would turn them into text
         marker='o',        # Also it is important to have commas after each parameter
         markersize=10,     
         markeredgecolor='g',
         markeredgewidth=3,
         markerfacecolor='k') # plot's argument list opened with a (, and at the end must close with a matching )   
plt.show()
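
The same rules apply to any Python function, not just `plot()`. Here is a minimal sketch using a made-up function, `describe_point` (hypothetical, not part of matplotlib), that takes two positional arguments and two optional named ones:

```python
def describe_point(x, y, color='black', size=1):
    """Hypothetical helper, only here to demonstrate how arguments are passed."""
    return f"({x}, {y}) color={color} size={size}"

a = describe_point(4, 7)                        # named arguments left out: defaults are used
b = describe_point(4, 7, size=3, color='red')   # named arguments can come in any order...
c = describe_point(4, 7, color='red', size=3)   # ...so this call means the same thing as b

print(a)
print(b == c)
```

Swapping the two positional values (`describe_point(7, 4)`) would change the result, because their meaning depends on their position.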

Many `plot()` input parameters have convenient nicknames, as do the values that can be passed into them. The one-line plot command below is the same as the multiline one above:

In [None]:
plt.figure()
plt.plot(exes, wise, c='r', ls='-', lw=1, marker='o', ms=10, mec='g', mew=3, mfc='k')
plt.show()
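
One way to convince yourself that the nicknames really are equivalent is to ask the plotted lines for their properties afterwards. This is only a sketch: `xs` and `ys` are small made-up lists, and it relies on `plot()` returning a list of line objects, one per series.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

xs = [1, 2, 3]   # made-up data, only for this check
ys = [2, 4, 3]

plt.figure()
long_line,  = plt.plot(xs, ys, color='red', linewidth=1, marker='o', markersize=10)
short_line, = plt.plot(xs, ys, c='r', lw=1, marker='o', ms=10)

# to_rgba() converts any color name into (red, green, blue, alpha) numbers
same_color = to_rgba(long_line.get_color()) == to_rgba(short_line.get_color())
same_width = long_line.get_linewidth() == short_line.get_linewidth()
same_size  = long_line.get_markersize() == short_line.get_markersize()
print(same_color, same_width, same_size)
plt.close()      # nothing to look at here, so close the figure instead of showing it
```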

Experiment with the plot above by trying different values (don't forget quotes where appropriate):
* colors: `r g b k y c m purple orange teal darkgreen` ...
* linestyle: `''`, `'-'`, `'--'`, `'-.'`, `':'`, or by name: `'None'`, `'solid'`, `'dashed'`, `'dashdot'`, `'dotted'`
* marker: `. , o + x * s d v ^ < >`
* sizes: 1, 3.5, ...

Keep this link handy for a reminder of what's available: https://matplotlib.org/2.1.1/api/_as_gen/matplotlib.pyplot.plot.html

### Multiple Series
Now let's add a second set of y values for the same x values:

In [None]:
whys = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

Additional data series can be added with additional `plot()` commands, and their appearance can be controlled separately:

In [None]:
plt.figure()
plt.plot(exes, wise, c='b', marker='o', ls='')
plt.plot(exes, whys, c='g', marker='^', ls='')
plt.show()

Data series don't have to be spelled out value by value; they can be computed from a function.

In [None]:
# In Python, range intervals are always Inclusive on the left side, and Exclusive on the right side,
# so this ends up [3.5, 4.0, 4.5, ... 14.5]
xrange = np.arange(3.5, 15.0, 0.5)

We choose this range because `exes` runs from 4 to 14, so 3.5 to 14.5 gives a line that extends a little bit past the data at both ends. Let's take a look at it:

In [None]:
xrange
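
As a quick check of the inclusive-left, exclusive-right rule, this sketch looks at the two ends of the array and compares it with Python's built-in `range()`, which follows the same rule but only works with whole numbers:

```python
import numpy as np

xrange = np.arange(3.5, 15.0, 0.5)
print(xrange[0], xrange[-1], len(xrange))   # 3.5 14.5 23  (15.0 itself is left out)

whole_numbers = list(range(4, 15))          # built-in range: starts at 4, stops before 15
print(whole_numbers)
```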

`numpy` allows us to arithmetically manipulate all the values in this array at once. Check these out:

In [None]:
xrange*2

In [None]:
xrange*2 - 5

In [None]:
-1.5*xrange*xrange + 0.5*xrange + 3
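
This all-at-once arithmetic is a feature of `numpy` arrays, not of plain Python lists like `exes`. A short sketch of the difference, using a small made-up list:

```python
import numpy as np

plain_list = [1, 2, 3]
as_array = np.array(plain_list)     # convert a list into a numpy array

print(plain_list * 2)   # [1, 2, 3, 1, 2, 3]  (a list is repeated, not doubled)
print(as_array * 2)     # [2 4 6]             (an array is multiplied element by element)

# Without numpy, doubling every value in a list takes an explicit loop:
doubled_list = [2 * value for value in plain_list]
print(doubled_list)     # [2, 4, 6]
```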

In [None]:
plt.figure()
plt.plot(exes, wise, c='b', marker='o', ls='')
plt.plot(exes, whys, c='g', marker='^', ls='')
plt.plot(xrange,   1.5*xrange - 1,                  c='b', ls=':')  # plot the line  y = 1.5x - 1
plt.plot(xrange, -.127*xrange*xrange + 2.78*xrange, c='g', ls=':')  # plot the curve y = -0.127x^2 + 2.78x
plt.show()

# Exercise
In the plot above, fiddle with the coefficients of the line and the curve to get them to fit the data better

---

### Labeling
Matplotlib has many options for controlling the appearance of graphs beyond the data series themselves.

In [None]:
fig = plt.figure()          # until now we have ignored the object returned by plt.figure()
ax  = fig.add_subplot(111)  # this time we need it to get an 'Axes' object. 111 means 1x1 grid, subplot number 1
plt.plot(exes, wise, c='b', marker='o', ls='')
plt.plot(exes, whys, c='g', marker='^', ls='')
ax.set_title('Cambridge students need more work')
ax.set_xlabel('Weekly hours of homework per class')
ax.set_ylabel('Educational effectiveness')
ax.set_xlim(0, 20)          # These are bad choices!
ax.set_ylim(0, 15)
ax.set_xticks( exes )
ax.set_yticks( [2, 3.14, 5, 6, 12] )
ax.legend(['actual', 'expected'], 
          loc='lower right') # Usually you don't need this option
plt.plot([4.5], [4], marker='x', c='r', ms=10)
plt.annotate('We are here', (4.5, 4), xytext=(7,3), color='r',
             arrowprops=dict(arrowstyle="->", color='r') )     # this is a bundle of properties for the arrow
plt.show()
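
An alternative to passing a list of names to `legend()` is to give each series its own `label=` when plotting and let `legend()` pick the names up. The sketch below uses made-up values (`hours`, `actual`, `expected`), not the data from this notebook:

```python
import matplotlib.pyplot as plt

hours    = [1, 2, 3, 4]            # made-up values, for illustration only
actual   = [2.0, 2.9, 4.1, 5.0]
expected = [2.5, 3.0, 3.5, 4.0]

fig = plt.figure()
ax  = fig.add_subplot(111)
ax.plot(hours, actual,   c='b', marker='o', ls='', label='actual')
ax.plot(hours, expected, c='g', marker='^', ls='', label='expected')
legend = ax.legend()               # no list needed; the names come from the label= options

legend_names = [text.get_text() for text in legend.get_texts()]
print(legend_names)
plt.show()
```

This keeps each name next to the data it describes, so the legend stays correct if the order of the `plot()` calls changes.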

### Subplots
A figure can be a grid of multiple subplots.

In [None]:
# Two more datasets for 'Anscombe's Quartet'
wyes = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73] # this 3rd one also uses x=exes

x4 = [8,    8,    8,    8,    8,    8,    8,    19,    8,    8,    8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

In [None]:
fig = plt.figure(figsize=(15,12))   # size in 'inches', width, height

axI = fig.add_subplot(221)          # 221 means a 2x2 grid, and this is subplot 1 (upper left)
axI.plot(exes, wise, c='b', marker='o', ls='')
axI.set_title('I: Linear scatter')

axII = fig.add_subplot(222)         #                        ... this is subplot 2 (upper right)
axII.plot(exes, whys, c='r', marker='s', ls='')
axII.set_title('II: Quadratic curve')

axIII = fig.add_subplot(223)        #                        ... this is subplot 3 (next row)
axIII.plot(exes, wyes, c='g', marker='^', ls='')
axIII.set_title('III: Line with outlier')

axIV = fig.add_subplot(224)        #                        ... this is subplot 4 (last one)
axIV.plot(x4, y4, c='orange', marker='^', ls='')
axIV.set_title('IV: Line with really bad outlier')

plt.show()
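
Matplotlib also offers `plt.subplots()`, which creates the figure and the whole grid in one call. This sketch (an alternative, not what the cell above uses) labels each cell of a 2x2 grid with the matching `add_subplot()` number, to show how the numbering runs left to right and then top to bottom:

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(15, 12))   # axes is a 2x2 array of Axes objects

axes[0, 0].set_title('upper left  (221)')   # indexed as [row, column], counting from 0
axes[0, 1].set_title('upper right (222)')
axes[1, 0].set_title('lower left  (223)')
axes[1, 1].set_title('lower right (224)')

print(axes.shape)
plt.show()
```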

# Exercise
The four subplots in the figure above are called [Anscombe's Quartet](https://en.wikipedia.org/wiki/Anscombe%27s_quartet). If you use linear regression to solve for the least-squares best-fit line through each of those four datasets, the results are essentially identical. Graphs III and IV in particular illustrate the influence of outliers in datasets (and thus the importance of checking for outliers, and removing those that can be justified, before analysis).
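
If you'd like to check that claim yourself, `numpy` can do the least-squares fit. The sketch below is one possible way, using `np.polyfit()` with degree 1 (a straight line). The data lists are repeated so the cell runs on its own, and `fits` is just a name chosen here to hold the results:

```python
import numpy as np

exes = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
wise = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
whys = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
wyes = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x4   = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4   = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

fits = {}
for name, xs, ys in [('I', exes, wise), ('II', exes, whys),
                     ('III', exes, wyes), ('IV', x4, y4)]:
    slope, intercept = np.polyfit(xs, ys, 1)   # degree 1: returns slope first, then intercept
    fits[name] = (slope, intercept)
    print(f"{name}: y = {slope:.2f}x + {intercept:.2f}")
```

The four fits agree to two decimal places even though the graphs look nothing alike, which is why plotting the data matters.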

**Add that best-fit line (y=0.5x+3) to each of the four subplots above. Make them look good** (color, linestyle, range). In particular, IV will need a different range of x values than I-III. Don't reuse the variable name `xrange`, or it will mess up I-III.

Once your plot is done, File/Download As/.ipynb, and submit that file.