# Using `numpy` and `matplotlib`

Notebook inspired by Mark Bakker's [page](http://mbakker7.github.io/exploratory_computing_with_python/) at TU Delft.

Plotting is not part of standard Python, but a nice package exists to create pretty graphics (and ugly ones, if you want). A package is a library of functions for a specific set of tasks. There are many Python packages and we will use several of them. The graphics package we use is called `matplotlib`. To be able to use the plotting functions in `matplotlib`, we have to import it. For now, we import the plotting part of `matplotlib` and call it `plt`. 

Before we import `matplotlib`, we tell the Jupyter Notebook to show any graphs inside this Notebook and not in a separate window using the *magic* command `%matplotlib inline`.

Similarly, we are going to import the package `numpy` and call it `np`, so that any function in the `numpy` package may be called as `np.function`. The package `numpy` is used to do linear and tensor algebra. 

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

Packages only have to be imported once in a Python session. After the above import statement, any plotting function may be called from any code cell as `plt.function`, as the first example in the next section shows.

### Basic plotting and a first array

In [None]:
plt.plot([1, 2, 4, 2])

Let's try to plot y vs x for x going from -4 to +4 for the polynomial

$y=x^2+x-6$

To do that, we need to evaluate y at a bunch of points. A sequence of values of the same type is called an array (for example an array of integers or floats). 

To create an array `x` consisting of, for example, 5 equally spaced points between `-4` and `4`, use the `linspace` command

In [None]:
x = np.linspace(start=-4, stop=4, num=5)
print(x)

# This easier syntax works too
x = np.linspace(-4, 4, 5)
print(x)

In the above cell, `x` is an array of 5 floats (`-4.` is a float, `-4` is an integer).
Let's plot y using 100 x values from -4 to +4.

In [None]:
a = 1
b = 1
c = -6
x = np.linspace(-4, 4, 100)
y = a * x ** 2 + b * x + c  # Compute y for all x values
plt.plot(x, y)

Note that *one hundred* `y` values are computed in the simple line `y = a * x ** 2 + b * x + c`. Python treats arrays in the same fashion as it treats regular variables when you perform mathematical operations. The math is simply applied to every value in the array (and it runs much faster than if you did every calculation separately).
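
To make the value-by-value behaviour concrete, the sketch below computes the same polynomial twice: once with an explicit loop over the values and once with a single array expression. The names `y_loop` and `y_array` are chosen here for illustration only, and `np.zeros` (introduced further down) is used merely to reserve space for the loop result.

```python
import numpy as np

a, b, c = 1, 1, -6
x = np.linspace(-4, 4, 100)

# Loop version: compute y one value at a time
y_loop = np.zeros(len(x))
for i in range(len(x)):
    y_loop[i] = a * x[i] ** 2 + b * x[i] + c

# Array version: the same formula applied to all values at once
y_array = a * x ** 2 + b * x + c

# Check that both versions give the same values
print(np.allclose(y_loop, y_array))
```

Both versions should agree; the array version is shorter and avoids the Python loop.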

You may wonder what the statement like `[<matplotlib.lines.Line2D at 0x30990b0>]` is (the numbers on your machine may look different). This is actually a handle to the line that is created with the last command in the code block (in this case `plt.plot(x, y)`). Remember: the result of the last line in a code cell is printed to the screen, unless it is stored in a variable. You can tell the Notebook not to print this to the screen by putting a semicolon after the last command in the code block (so type `plt.plot(x, y);`). We will learn later on that it may also be useful to store this handle in a variable.

The `plot` function can take many arguments. Looking at the help box of the `plot` function, by typing `plt.plot(` and then shift-tab, gives you a lot of help. Typing `plt.plot?` gives a new scrollable subwindow at the bottom of the notebook, showing the documentation on `plot`. Click the x in the upper right hand corner to close the subwindow again.

In short, `plot` can be used with one argument as `plot(y)`, which plots `y` values along the vertical axis and enumerates the horizontal axis starting at 0. `plot(x, y)` plots `y` vs `x`, and `plot(x, y, formatstring)` plots `y` vs `x` using colors and markers defined in `formatstring`, which can be a lot of things. It can be used to define the color, for example `'b'` for blue, `'r'` for red, and `'g'` for green. Or it can be used to define the linetype `'-'` for line, `'--'` for dashed, `':'` for dots. Or you can define markers, for example `'o'` for circles and `'s'` for squares. You can even combine them: `'r--'` gives a red dashed line, while `'go'` gives green circular markers. 
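
As a small sketch of the format strings described above, the following plots one curve with `'r--'` and one with `'go'`. The data and the variable names `dashed` and `circles` are made up for this illustration. Because `plt.plot` returns a list of line handles, the first item of that list is stored so the chosen style can be inspected.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2, 11)

# 'r--': red dashed line; plot returns a list of handles, keep the first one
dashed = plt.plot(x, x ** 2, 'r--')[0]
# 'go': green circular markers
circles = plt.plot(x, 2 * x, 'go')[0]

print(dashed.get_linestyle())  # line style set by the format string
print(circles.get_marker())  # marker set by the format string
```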

If that isn't enough, `plot` takes a large number of keyword arguments. A keyword argument is an optional argument that may be added to a function. The syntax is `function(keyword1=value1, keyword2=value2)`, etc. For example, to plot a line with width 6 (the default is 1), type

In [None]:
plt.plot([1, 2, 3], [2, 4, 3], linewidth=6);

Keyword arguments should come after regular arguments. `plot(linewidth=6, [1, 2, 3], [2, 4, 3])` gives an error.

Names may be added along the axes with the `xlabel` and `ylabel` functions, e.g., `plt.xlabel('this is the x-axis')`. Note that both functions take a string as argument. A title can be added to the figure with the `plt.title` command. Multiple curves can be added to the same figure by giving multiple plotting commands in the same code cell. They are automatically added to the same figure.
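
A minimal sketch of these commands together, using a sine and a cosine curve as stand-in data (the functions and label texts are chosen here purely for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
plt.plot(x, np.sin(x), 'b')  # first curve
plt.plot(x, np.cos(x), 'r--')  # second curve, added to the same figure
plt.xlabel('x (radians)')
plt.ylabel('function value')
plt.title('sine and cosine');
```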

### New figure and figure size

Whenever you give a plotting statement in a code cell, a figure with a default size is automatically created, and all subsequent plotting statements in the code cell are added to the same figure. If you want a different size of the figure, you can create a figure first with the desired figure size using the `plt.figure(figsize=(width, height))` syntax. Any subsequent plotting statement in the code cell is then added to the figure. You can even create a second figure (or third or fourth...).

In [None]:
plt.figure(figsize=(10, 3))
plt.plot([1, 2, 3], [2, 4, 3], linewidth=6)
plt.title('very wide figure')
plt.figure()  # new figure of default size
plt.plot([1, 2, 3], [1, 3, 1], 'r')
plt.title('second figure');

### <a name="ex2"></a> Exercise 1. First graph
Plot 

$$y=(x+2)(x-1)(x-2)$$ 

for `x` going from -3 to +3 using a dashed red line. On the same figure, plot a blue circle for every point where `y` equals zero. Set the size of the markers to 10 (you may need to read the help of `plt.plot` to find out how to do that). Label the axes as 'x-axis' and 'y-axis'. Add the title 'First nice Python figure of Your Name', where you enter your own name.

In [None]:
plt.plot?

### Style

Good coding style is important. It makes the code easier to read, so that it is much easier to find errors and bugs. For example, consider the code below, which recreates the graph we produced earlier (with a wider line), but now without any additional spaces:

In [None]:
a=1
b=1
c=-6
x=np.linspace(-4,4,100)
y=a*x**2+b*x+c#Compute y for all x values
plt.plot(x,y,linewidth=3)

The code in the previous code cell is difficult to read. Good style includes at least the following:
* spaces around every mathematical symbol (`=`, `+`, `-`, `*`, `/`), but not needed around `**`
* spaces between arguments of a function
* no spaces around an equal sign for a keyword argument (so `linewidth=3` is correct)
* one space after every comma
* one space after each `#`
* two spaces before a `#` when it follows a Python statement
* no space between the function name and the list of arguments. So `plt.plot(x, y)` is good style, and `plt.plot (x, y)` is not good style.

These rules are (a very small part of) the official Python style guide called PEP8. When these rules are applied, the code is *much* easier to read, as you can see below:

In [None]:
a = 1
b = 1
c = -6
x = np.linspace(-4, 4, 100)
y = a * x**2 + b * x + c  # Compute y for all x values
plt.plot(x, y, linewidth=3);

Use correct style in all other exercises and all Notebooks to come. 

### Exercise 2. First graph revisited
Go back to your Exercise 1 and apply correct style. (You can check your code on [PEP8 online](http://pep8online.com) for instance.)

### Loading data files

Numerical data can be loaded from a data file using the `loadtxt` function of `numpy`; i.e., the command is `np.loadtxt`. You need to make sure the file is in the same directory as your notebook, or provide the full path. The filename (or path plus filename) needs to be between quotes. 
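
The sketch below shows the basic call. So that it can run on its own, it first writes a small file to a temporary folder; the filename `example_temperature.dat` and the four numbers in it are made up for this illustration and are not one of the data files used in the exercises.

```python
import os
import tempfile

import numpy as np

# Write a small example file first so that this sketch is self-contained;
# the filename and the values are made up
folder = tempfile.mkdtemp()
filename = os.path.join(folder, 'example_temperature.dat')
np.savetxt(filename, [3.1, 3.3, 6.2, 9.2])

# Load the file: the path plus filename is passed as a string
data = np.loadtxt(filename)
print(data)
print('number of values:', len(data))
```

When the file sits in the same directory as the notebook, the filename alone (between quotes) is enough.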

### Exercise 3. Loading data and adding a legend
You are provided with data files containing the mean monthly temperature of Holland, New York City, and Beijing. The Dutch data is stored in `holland_temperature.dat`, and the other filenames are similar. Plot the temperature for each location against the number of the month (starting with 1 for January) all in a single graph. Add a legend by using the function `plt.legend(['line1','line2'])`, etc., but then with more descriptive names. Find out about the `legend` command using `plt.legend?`. Place the legend in an appropriate spot (the upper left-hand corner may be nice, or let Python figure out the best place). 

### Exercise 4. Subplots and fancy tick markers
Load the average monthly air temperature and seawater temperature for Holland. Create one plot with two graphs above each other using the `subplot` command (use `plt.subplot?` to find out how). On the top graph, plot the air and sea temperature. Label the ticks on the horizontal axis as 'jan', 'feb', 'mar', etc., rather than numbers. Use `plt.xticks?` to find out how. In the bottom graph, plot the difference between the air and seawater temperature. Add legends, axes labels, the whole shebang.

### Colors
If you don't specify a color for a plotting statement, `matplotlib` will use its default colors. The first three default colors are special shades of blue, orange and green. The names of the default colors are a capital `C` followed by the number, starting with number `0`. For example

In [None]:
plt.plot([0, 1], [0, 1], 'C0')
plt.plot([0, 1], [1, 2], 'C1')
plt.plot([0, 1], [2, 3], 'C2')
plt.legend(['default blue', 'default orange', 'default green']);

There are five different ways to specify your own colors in matplotlib plotting; you may read about them [here](http://matplotlib.org/examples/pylab_examples/color_demo.html). A useful way is to use the html color names.  The html codes may be found, for example, [here](http://en.wikipedia.org/wiki/Web_colors). 

In [None]:
color1 = 'fuchsia'
color2 = 'lime'
color3 = 'DodgerBlue'
plt.plot([0, 1], [0, 1], color1)
plt.plot([0, 1], [1, 2], color2)
plt.plot([0, 1], [2, 3], color3)
plt.legend([color1, color2, color3]);

### Gallery of graphs
The plotting package `matplotlib` allows you to make very fancy graphs. Check out the <A href="http://matplotlib.org/gallery.html"  target=_blank>matplotlib gallery</A> to get an overview of many of the options. The following exercises use several of the matplotlib options.

## `numpy` arrays

A nice overview of `numpy` functionality can be found [here](https://docs.scipy.org/doc/numpy/user/quickstart.html). 

### One-dimensional arrays
There are many ways to create arrays. For example, you can create an array from a Python list. 

In [None]:
np.array([1, 7, 2, 12])

Note that the `array` function takes one sequence of values between square brackets. 
Another function to create an array is `np.ones(shape)`, which creates an array of the specified `shape` filled with the value 1. 
There is an analogous function `np.zeros(shape)` to create an array filled with the value 0 (which can also be achieved with `0 * np.ones(shape)`). Next to the already mentioned `np.linspace` function there is the `np.arange(start, end, step)` function, which creates an array starting at `start`, taking steps equal to `step` and stopping *before* it reaches `end`. If you don't specify the `step`, it is set equal to 1. If you only specify one input value, it returns a sequence starting at 0 and incrementing by 1 until the specified value is reached (but again, it stops before it reaches that value).

In [None]:
print(np.arange(1, 7))  # Takes default steps of 1 and doesn't include 7
print(np.arange(5))  # Starts at 0 and ends at 4, giving 5 numbers
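
The array-creating functions mentioned above can be compared side by side. This is a small sketch with values chosen for illustration; note in particular that `np.arange` stops before the end value, while `np.linspace` includes it.

```python
import numpy as np

zeros = np.zeros(3)  # three values, all 0
ones = np.ones(3)  # three values, all 1
stepped = np.arange(0, 10, 2)  # steps of 2, stops before 10
spaced = np.linspace(0, 10, 6)  # 6 equally spaced points, includes 10

print(zeros)
print(ones)
print(stepped)
print(spaced)
```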

Recall that comments in Python are preceded by a `#`. 
Arrays have a dimension. So far we have only used one-dimensional arrays, so the number of dimensions is 1. 
For one-dimensional arrays, you can also compute the length with the `len` function (which is part of Python and not `numpy`), which returns the number of values in the array.

In [None]:
x = np.array([1, 7, 2, 12])
print('number of dimensions of x:', np.ndim(x))
print('length of x:', len(x))
print('shape of x:', x.shape)

The individual elements of an array can be accessed with their index. Indices start at 0. 
This may require a bit of getting used to. It means that the first value in the array has index 0. The index of an array is specified using square brackets.

In [None]:
x = np.arange(20, 30)
print('array x:', x)
print('value with index 0:', x[0])
print('value with index 5:', x[5])

A range of indices may be specified using the colon syntax:
`x[start:end_before]` or `x[start:end_before:step]`. If the `start` isn't specified, 0 will be used. If the step isn't specified, 1 will be used. 

In [None]:
x = np.arange(20, 30)
print(x)
print(x[0:5])
print(x[:5])  # same as previous one
print(x[3:7])
print(x[2:9:2])  # step is 2

You can also start at the end and count back. Generally, the index of the end is not known. You can find out how long the array is and access the last value by typing `x[len(x) - 1]` but it would be inconvenient to have to type `len(arrayname)` all the time. Luckily, there is a shortcut: `x[-1]` is the same as `x[len(x) - 1]` and represents the last value in the array. For example:

In [None]:
xvalues = np.arange(0, 100, 10)
print(xvalues)
print(xvalues[9])  # last value in array
print(xvalues[-1])  # much easier
print(xvalues[-1::-1])  # start at the end and go back with steps of -1

You can assign one value to a range of an array by specifying a range of indices, or you can assign an array to a range of another array, as long as the ranges have the same length. In the last example below, the first 5 values of `x` (specified as `x[0:5]`) are given the values `[40, 42, 44, 46, 48]`.

In [None]:
x = 20 * np.ones(10)
print(x)
x[0:5] = 40
print(x)
x[0:5] = np.arange(40, 50, 2)
print(x)

### Exercise 5. Arrays and indices
Create an array of zeros with length 20. Change the first 5 values to 10. Change the next 10 values to a sequence starting at 12 and increasing with steps of 2 to 30 (do this with one command). Set the final 5 values to 30. Plot the value of the array on the y-axis vs. the index of the array on the x-axis. Draw vertical dashed lines at x=4 and x=14 (i.e., the section between the dashed lines is where the line increases from 10 to 30). Set the minimum and maximum values of the y-axis to 8 and 32 using the `ylim` command.

### Two-dimensional arrays
Arrays may have arbitrary dimensions (as long as they fit in your computer's memory). We will make frequent use of two-dimensional arrays. They can be created with any of the aforementioned functions by specifying the number of rows and columns of the array. Note that the number of rows and columns must be a tuple (so they need to be between parentheses), as the functions expect only one input argument for the shape of the array, which may be either one number or a tuple of multiple numbers.

In [None]:
x = np.ones((3, 4))  # An array with 3 rows and 4 columns
print(x)

Arrays may also be defined by specifying all the values in the array. The `array` function gets passed one list consisting of separate lists for each row of the array. In the example below, the rows are entered on different lines. That may make it easier to enter the array, but it is not required. You can change the size of an array to any shape using the `reshape` function as long as the total number of entries doesn't change. 

In [None]:
x = np.array([[4, 2, 3, 2],
              [2, 4, 3, 1],
              [0, 4, 1, 3]])
print(x)
print(np.reshape(x, (2, 6)))  # 2 rows, 6 columns
print(np.reshape(x, (1, 12)))  # 1 row, 12 columns
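
The condition that the total number of entries may not change can be checked directly. The sketch below uses an array of 12 values chosen for illustration; the second `reshape` asks for 5 by 2 (10 entries), which NumPy rejects with a `ValueError`. The `try`/`except` construct is used here only so that the example keeps running after the error.

```python
import numpy as np

x = np.arange(12)  # 12 values: 0 up to and including 11
y = np.reshape(x, (3, 4))  # 3 rows, 4 columns: 3 * 4 = 12 entries
print(y)
print('shape of y:', y.shape)

# 5 * 2 = 10 is not 12, so this reshape fails
try:
    np.reshape(x, (5, 2))
except ValueError as error:
    print('reshape failed:', error)
```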

The index of a two-dimensional array is specified with two values, first the row index, then the column index.

In [None]:
x = np.zeros((3, 8))
x[0, 0] = 100
x[1, 4:] = 200  # Row with index 1, columns starting with 4 to the end
# Row with index 2, columns counting back from the end with steps of -1, stopping before index 4
x[2, -1:4:-1] = 400
print(x)
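
The colon syntax for ranges works for each of the two indices separately, and a colon on its own selects everything along that direction. A short sketch on a small array made up for this illustration:

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
row = x[1, :]  # row with index 1, all columns
column = x[:, -1]  # all rows, last column
block = x[1:, :2]  # rows from index 1 to the end, columns 0 and 1

print(row)
print(column)
print(block)
```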

### Arrays are not matrices
Now that we talk about the rows and columns of an array, the math-oriented reader may think that arrays are matrices, or that one-dimensional arrays are vectors. It is crucial to understand that *arrays are not vectors or matrices*. The multiplication and division of two arrays is done term by term:

In [None]:
a = np.arange(4, 20, 4)
b = np.array([2, 2, 4, 4])
print('array a:', a)
print('array b:', b)
print('a * b  :', a * b)  # term by term multiplication
print('a / b  :', a / b)  # term by term division
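
For contrast, and going slightly beyond the text above, NumPy does offer a true matrix product through the `@` operator. The sketch below compares it with the term-by-term `*` on two small arrays made up for this illustration.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

term_by_term = A * B  # each entry of A times the matching entry of B
matrix_product = A @ B  # rows of A combined with columns of B

print('A * B (term by term):')
print(term_by_term)
print('A @ B (matrix product):')
print(matrix_product)
```

The two results differ, which is exactly why `*` should not be read as matrix multiplication.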

### Exercise 6. Two-dimensional array indices
For the array `x` shown below, write code to print: 

* the first row of `x`
* the first column of `x`
* the third row of `x`
* the last two columns of `x`
* the 2 by 2 block of values in the upper right-hand corner of `x`
* the 2 by 2 block of values at the center of `x`

`x = np.array([[4, 2, 3, 2],
               [2, 4, 3, 1],
               [2, 4, 1, 3],
               [4, 1, 2, 3]])`

### Visualizing two-dimensional arrays
Two-dimensional arrays can be visualized with the `plt.matshow` function. In the example below, the array is very small (only 4 by 4), but it illustrates the general principle. A colorbar is added as a legend. The ticks in the colorbar are specified to be 2, 4, 6, and 8. Note that the first row of the array (with index 0) is plotted at the top, which corresponds to the location of the first row in the array.

In [None]:
x = np.array([[8, 4, 6, 2],
              [4, 8, 6, 2],
              [4, 8, 2, 6],
              [8, 2, 4, 6]])
plt.matshow(x)
plt.colorbar(ticks=[2, 4, 6, 8], shrink=0.8)
print(x)

The colors that are used are defined in the default color map (it is called `viridis`), which maps the highest value to yellow, the lowest value to purple and the numbers in between varying between blue and green. An explanation of the advantages of `viridis` can be seen [here](https://youtu.be/xAoljeRJ3lU). If you want other colors, you can choose one of the other color maps with the `cmap` keyword argument. To find out all the available color maps, go [here](http://matplotlib.org/users/colormaps.html). For example, setting the color map to `rainbow` gives

In [None]:
plt.matshow(x, cmap='rainbow')
plt.colorbar(ticks=np.arange(2, 9, 2), shrink=0.8);

### Exercise 7. Create and visualize an array
Create an array of size 10 by 10. Set the upper-left quadrant of the array to 4, the upper-right to 3, the lower-right to 2 and the lower-left to 1. First create an array of 10 by 10 using the `np.zeros` command, then fill each quadrant by specifying the correct index ranges. Visualize the array using `matshow`. It should give a red, yellow, light blue and dark blue box (clockwise starting from upper left) when you use the `jet` colormap.

### Using conditions on arrays
If you have a variable, you can check whether its value is smaller or larger than a certain other value. This is called a *conditional* statement.
For example:

In [None]:
a = 4
print('a < 2:', a < 2)
print('a > 2:', a > 2)

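The statement `a < 2` returns a variable of type boolean, which means it can either be `True` or `False`. Besides smaller than (`<`) or larger than (`>`), there are several other conditions you can use: `<=` (smaller than or equal to), `>=` (larger than or equal to), `==` (equal to), and `!=` (not equal to). A condition can also be applied to an array, in which case it is evaluated for every value in the array.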
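
The standard Python comparison operators can be tried on a single value first. This is a minimal sketch with a value chosen for illustration; the names `at_most_4`, `at_least_5`, `equal_4` and `not_equal_4` are made up here.

```python
a = 4
at_most_4 = a <= 4  # smaller than or equal to
at_least_5 = a >= 5  # larger than or equal to
equal_4 = a == 4  # equal to (note the double equal sign)
not_equal_4 = a != 4  # not equal to

print('a <= 4:', at_most_4)
print('a >= 5:', at_least_5)
print('a == 4:', equal_4)
print('a != 4:', not_equal_4)
```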

In [None]:
data = np.arange(5)
print(data)
print(data < 3)

The statement `data < 3` returns an array of type `boolean` that has the same length as the array `data` and for each item in the array it is either `True` or `False`. The cool thing is that this array of `True` and `False` values can be used to specify the indices of an array:

In [None]:
a = np.arange(5)
print(a)
print(a[[True, True, False, False, True]])

When the indices of an array are specified with a boolean array, only the values of the array where the boolean array is `True` are selected. This is a very powerful feature. For example, all values of an array that are less than 3 may be obtained by specifying a condition as the indices.

In [None]:
a = np.arange(5)
print('the total array:', a)
print('values less than 3:', a[a < 3])

If we want to replace all values that are less than 3 by, for example, the value 10, use the following short syntax:

In [None]:
a = np.arange(5)
print(a)
a[a < 3] = 10
print(a)
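
A boolean array can also be stored in a variable and reused, which is handy when the same condition is needed more than once. In the sketch below the array values and the name `small` are made up for illustration; summing the boolean array counts the `True` values, since `True` counts as 1.

```python
import numpy as np

a = np.array([5, 1, 8, 2, 9])
small = a < 3  # boolean array, True where the value is less than 3
count = np.sum(small)  # number of True values

print(small)
print('number of values less than 3:', count)
print('values less than 3:', a[small])
```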

### Exercise 8. Replace high and low values in an array
Create an array for variable $x$ consisting of 100 values from 0 to 20. Compute $y=\sin(x)$ and plot $y$ vs. $x$ with a blue line. Next, replace all values of $y$ that are larger than 0.5 by 0.5, and all values that are smaller than $-$0.75 by $-$0.75, and plot the modified $y$ values vs. $x$ using a red line on the same graph. 

### Exercise 9. Change marker color based on data value
Create an array for variable $x$ consisting of 100 points from 0 to 20 and compute $y=\sin(x)$. Plot a blue dot for every $y$ that is larger than zero, and a red dot otherwise.

### Select indices based on multiple conditions
Multiple conditions can be given as well. When two conditions both have to be true, use the `&` symbol. When at least one of the conditions needs to be true, use the `|` symbol (that is the vertical bar). For example, let's plot $y=\sin(x)$ and plot blue markers when $y>0.7$ or $y<-0.5$ (using one `plot` statement), and a red marker when $-0.5\le y\le 0.7$. Note that when there are multiple conditions, they need to be between parentheses.

In [None]:
x = np.linspace(0, 6 * np.pi, 50)
y = np.sin(x)
plt.plot(x[(y > 0.7) | (y < -0.5)], y[(y > 0.7) | (y < -0.5)], 'bo')
plt.plot(x[(y >= -0.5) & (y <= 0.7)], y[(y >= -0.5) & (y <= 0.7)], 'ro');
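
The effect of `&` and `|` is easiest to see on a small array of integers. In this sketch (array and variable names chosen for illustration) each combined condition is stored in a variable, so it does not have to be typed twice.

```python
import numpy as np

a = np.arange(10)
both = (a > 2) & (a < 7)  # True where both conditions hold
either = (a < 2) | (a > 7)  # True where at least one condition holds

print(a[both])
print(a[either])
```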

### Exercise 10. Fix the error 
The code below is meant to give the last 5 values of the array `x` the values [50, 52, 54, 56, 58] and print the result to the screen, but there are some errors in the code. Run the code to see the error message. Then fix the code and run it again.

In [None]:
x = np.ones(10)
x[5:] = np.arange(50, 62, 1)
print(x)