# Introduction to Python and the Jupyter Notebook Environment


_You are looking at a [Jupyter Notebook](http://jupyter-notebook.readthedocs.io/en/latest/notebook.html)_.  A Jupyter notebook (formerly known as an IPython notebook) is a lightweight environment for describing and running blocks of code in a [variety of languages](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels).  Over the last few years, it has become a popular tool in the scientific community for data analysis and exploration.  Throughout this course we'll be using notebooks to look at data from both observations and numerical models.




----------
## 1. Cell types
A key feature of Jupyter notebooks is that they support different cell types.  A "cell" in a Jupyter notebook is a field in which one can enter text.  To enter text in a cell, one simply needs to click inside it to make it active.  There are two types of cells:

1. Markdown cells (like this cell itself)
2. Code cells

### Markdown cells
Markdown cells are meant for displaying text.  Markdown is a way of indicating basic formatting aspects in plain text form.  [A full description of its syntax (with examples) can be found here.](https://daringfireball.net/projects/markdown/syntax) You can also double click on any Markdown cell in this notebook to see the text that produced it.  

To create a Markdown cell, it is easiest to select a cell and change its type via Cell -> Cell Type -> Markdown in the Jupyter notebook menu bar.  Markdown supports multi-level headings, bulleted and numbered lists, links, images, and font styles (e.g. **bold** or *italic*).  Jupyter notebooks also include MathJax support, so you can type LaTeX-style equations, e.g.
$$y = mx + b$$
In practice, it can often be helpful to use Markdown cells to provide background for the data analysis you are doing (e.g. to provide links to scientific papers, or describe the math behind a certain kind of analysis).

### Code cells
Code cells are what make the Jupyter notebook a powerful data analysis tool.  They enable one to write multi-line snippets of code, which can be executed one at a time, and in any order.  To run the active code cell (i.e. the one you are typing in), hold down the Shift key and press Enter.

After running a cell, the variables created in that cell are accessible to all other cells in the notebook.

In [1]:
print('Hello world!')

Hello world!


## 2. Python basics

Python is a dynamically typed language; therefore it is very straightforward to assign data to a variable (there is no need to specify the data type).
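A minimal sketch of such an assignment is given here. The variable names `a` and `b` match the ones mentioned in the text, but the values are made up for illustration.

```python
# Hypothetical example values; no type declarations are needed.
a = 5        # an integer
b = 'five'   # a string

# Python infers the type of each variable from the assigned value.
print(type(a), type(b))
```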

Note that after executing the cell above, all other cells have access to the variables `a` and `b`.

## Data structures
Some fundamental Python data structures that will be of use to us in this course are:

- Lists
- Dicts
- `numpy` arrays

### Lists
Lists store sequences of objects.  The objects need not be of the same type (but often they will be).  Two examples of lists are shown below.
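As an illustrative sketch, one list might hold values of a single type and another a mix of types. The name `list_a` is the one used in the exercise further down; the name `list_b` and all of the contents are assumptions.

```python
# Hypothetical contents, chosen only for illustration.
list_a = [1, 2, 3, 4]            # every element is an integer
list_b = ['a', 2, 3.0, True]     # elements of mixed types

print(list_a)
print(list_b)
```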

One can iterate over items in a list using a `for` loop:
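A short sketch of this kind of loop, assuming a hypothetical `list_a` of four integers:

```python
list_a = [1, 2, 3, 4]  # hypothetical contents

# The loop variable takes on each element of the list in turn.
for item in list_a:
    print(item)
```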

One can also access specific elements of a list using integer indexes.  Note that Python is "zero-indexed", meaning that the first element of the list is at index 0.
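For instance, with a hypothetical `list_a` (the contents are assumed), indexing might look like this:

```python
list_a = [1, 2, 3, 4]  # hypothetical contents

first = list_a[0]    # index 0 is the first element
third = list_a[2]    # index 2 is the third element
last = list_a[-1]    # negative indexes count from the end

print(first, third, last)
```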

An alternative way of iterating over the list using a `for` loop would be using an integer index and `range`.
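A sketch of the index-based version, again assuming a hypothetical `list_a`:

```python
list_a = [1, 2, 3, 4]  # hypothetical contents

# range(len(list_a)) yields the integers 0, 1, 2, 3,
# which are exactly the valid indexes of list_a.
for i in range(len(list_a)):
    print(i, list_a[i])
```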

---------------

<div class="alert alert-info"><h1>Exercise 1</h1></div>

+ Create a cell below 
+ Display the first 2 elements of `list_a`


----------

One can also assign values to elements of a list:
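A brief sketch of element assignment, with assumed list contents:

```python
list_a = [1, 2, 3, 4]  # hypothetical contents

# Overwrite the first element in place; the rest of the list is unchanged.
list_a[0] = 10
print(list_a)
```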

Other operations one can do with lists can be found in the [Python documentation](https://docs.python.org/3/tutorial/datastructures.html).

### Dicts
"Dicts" or "Dictionaries" map a key to a value.  One use case is to describe a collection of related objects with string-based keys.  For example:
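The sketch below shows one possible dictionary; the name `station`, its keys, and its values are all hypothetical and chosen only to illustrate string-based keys.

```python
# Hypothetical dictionary describing a measurement station.
station = {'name': 'Station A', 'latitude': 45.0, 'longitude': -120.0}

# Values are looked up by key rather than by position.
print(station['latitude'])

# New key/value pairs can be added by assignment.
station['elevation'] = 100.0
print(station)
```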

In this case, it is a little more transparent which value one will be accessing (i.e. there is no need for an opaque integer index).

Again, one can find more information on Python dictionaries in the [documentation](https://docs.python.org/3/tutorial/datastructures.html?highlight=dictionaries#dictionaries).

### `NumPy` and `NumPy` arrays
`NumPy` is a standard package for scientific computing with Python. It contains among other things:

- `NumPy` arrays that can be used to store N-dimensional tables of data.
- Useful mathematical functions, including linear algebra routines.

`NumPy` is not built into the default namespace of Python; therefore we need to import it using an import statement.
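The import is conventionally written with the alias `np`, which is the name the rest of this notebook assumes:

```python
# Import NumPy under its conventional short alias.
import numpy as np

print(np.__version__)
```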

To create a simple 2D numpy array, we can use a nested list:
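A minimal sketch, with made-up values, in which each inner list becomes one row of the array (the name `arr` is an assumption):

```python
import numpy as np

# Each inner list is one row; the values are hypothetical.
arr = np.array([[1, 2, 3],
                [4, 5, 6]])
print(arr)
```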

This array consists of two rows and three columns:
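The dimensions can be checked with the `shape` attribute. The sketch below assumes a hypothetical 2-by-3 array named `arr`:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])  # hypothetical values

# shape is a tuple of (number of rows, number of columns).
print(arr.shape)
```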

Analogously to the 1D list, we can index a 2D array using integer indexes:
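As a sketch, using a hypothetical array `arr`, the row index comes first and the column index second:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])  # hypothetical values

top_middle = arr[0, 1]     # first row, second column
bottom_right = arr[1, 2]   # second row, third column

print(top_middle, bottom_right)
```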

You can use slices to index rows and columns separately; for instance if we wanted to select just the first two columns of the array we could write:
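One way this could look, assuming a hypothetical 2-by-3 array `arr`:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])  # hypothetical values

# ':' selects every row; ':2' selects columns 0 and 1 (the stop index is excluded).
first_two_columns = arr[:, :2]
print(first_two_columns)
```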

Many `numpy` functions can be passed an "axis" argument; this specifies a particular axis of the array to execute the function along.  For example, using our 2D array, we can sum along each axis:
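A sketch with a hypothetical array `arr`: summing along axis 0 collapses the rows and leaves one total per column, while summing along axis 1 collapses the columns and leaves one total per row.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])  # hypothetical values

column_totals = np.sum(arr, axis=0)   # sum down the rows
row_totals = np.sum(arr, axis=1)      # sum across the columns

print(column_totals)
print(row_totals)
```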

We can also use the function `np.linspace`, which returns evenly spaced values between the specified start and end points.  We can then sum all the elements of the resulting array, `x`, using `np.sum`.
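A small sketch follows; the end points and the number of values are arbitrary choices for illustration.

```python
import numpy as np

# Five evenly spaced values from 0 to 1, end points included.
x = np.linspace(0., 1., 5)
print(x)

# Sum over every element of x.
total = np.sum(x)
print(total)
```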

Lastly, we can create an array filled with zeros by using the `np.zeros` command (say, if we wanted to create a blank array to assign values to later).
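For example, a blank 2-by-3 array could be created and then filled in as follows (the shape, the name `blank`, and the assigned value are assumptions):

```python
import numpy as np

# The shape is passed as a tuple: (rows, columns).
blank = np.zeros((2, 3))
print(blank)

# Values can be assigned later, element by element.
blank[0, 0] = 1.
print(blank)
```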

The `numpy` library implements a huge number of functions that one can use on arrays.  We'll get a taste for a few of them later on, but more detail can be found in the [`numpy` documentation](https://docs.scipy.org/doc/numpy-1.12.0/reference/).

-----------

<div class="alert alert-info"><h1>Exercise 2</h1></div>

+ Create an array called `x` of 200 equally spaced values between 0 and 12.
+ Using a `for` loop and the `sum` function, compute the cumulative sum of `x` (i.e. an array of the same shape as `x` in which the nth element is the sum of the first n elements of `x`).

On a smaller array as an example, if `x` were `[1, 2, 3, 4]`, we would want an array that looked like `[1, 3, 6, 10]`.

----------

Finally, `numpy` contains a function for loading in data from a text file.  This function is called [`np.loadtxt`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html).  One common way to store data is in CSV (comma separated value) format.  What this means is that in a text file, datapoints are stored in columns separated by commas (or more generally some "delimiter").

For example if we had a text file whose contents looked like:
```
a, b, c
1, 1, 2
3, 4, 5
```
and was called `example.csv`, we could load it into `numpy` with the following command:
```
arr = np.loadtxt('example.csv', delimiter=',', skiprows=1)
```
Now the array (`arr`) will look like the following (note that `np.loadtxt` returns floating-point values by default):
```
[[1., 1., 2.],
 [3., 4., 5.]]
```
We have used the `delimiter` argument to `np.loadtxt` to specify that the datapoints are separated using commas, and have used the `skiprows` argument to specify that we wish to ignore the first row (which contains the column labels, and not the datapoints themselves).


## 3. Plotting a time series
Here we will introduce some more applications-driven examples.  Plotting in Python is typically done using the [`matplotlib`](https://matplotlib.org) library.  Again `matplotlib` is not part of the Python standard library so we will have to import it.  In addition, if we want to view figures produced by `matplotlib` in the Jupyter notebook, we need to add another command.
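A sketch of the setup is shown here. The alias `plt` is the conventional one and is what the plotting cells below assume. The notebook-specific command is the IPython magic `%matplotlib inline`; it is written as a comment because it only works inside a notebook, not in a plain Python script.

```python
# Import the pyplot interface under its conventional alias.
import matplotlib.pyplot as plt

# In a Jupyter notebook, also run the following IPython magic
# (uncommented) so that figures are displayed inline:
# %matplotlib inline
```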

As a basic example, we'll plot a family of sine curves using `matplotlib` to illustrate the basics of making a plot.  We'll plot:
$$ y = \sin(t - a) $$
and vary the value of $a$.  First we'll create some values for $t$.  To do so, we'll use the `linspace` command, which linearly interpolates values between the specified beginning and end points.
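One possible choice of values is sketched here. The range matches the x-axis limits used in the plots below; the number of points (200) is an assumption.

```python
import numpy as np

# 200 evenly spaced values of t between -2*pi and 2*pi (end points included).
t = np.linspace(-2. * np.pi, 2. * np.pi, 200)
print(t[:5])
```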

To create an environment in which we can make a plot, we'll use the [`plt.subplots`](https://matplotlib.org/api/pyplot_api.html?highlight=subplots#matplotlib.pyplot.subplots) command.  This returns a [`Figure`](https://matplotlib.org/api/figure_api.html?highlight=figure#module-matplotlib.figure) object and either a single [`Axes`](https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes) object or a `NumPy` array of `Axes` objects, depending on the two input arguments.  The first input argument is the number of rows of `Axes` and the second is the number of columns (so 2, 2 would produce four `Axes` objects).
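The two cases can be compared with a quick sketch (the variable names are illustrative):

```python
import matplotlib.pyplot as plt

# Two rows and two columns: axes is a 2x2 array of Axes objects.
fig, axes = plt.subplots(2, 2)
print(axes.shape)

# One row and one column: a single Axes object is returned instead.
fig_single, ax_single = plt.subplots(1, 1)
print(type(ax_single))
```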

Below is a simple example of a single panel plot.  To plot a line we will use the [`plot`](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.plot.html?highlight=plot#matplotlib.axes.Axes.plot) command.
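A minimal sketch of a single-panel plot is shown here. The values of `t` are an assumption (200 points between -2π and 2π), and the labels follow the style of the other plots in this section.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed values for t; see the linspace discussion above.
t = np.linspace(-2. * np.pi, 2. * np.pi, 200)
y = np.sin(t)

# One row and one column gives a single Axes object.
fig, ax = plt.subplots(1, 1)
ax.plot(t, y)
ax.set_xlabel('time')
ax.set_ylabel('y')
ax.set_title('Title')
```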

This is an example of a two-panel plot.

In [None]:
fig, axes = plt.subplots(1, 2)
fig.set_size_inches(8, 3)
y = np.sin(t)

for ax in axes:
    ax.plot(t, y)
    ax.set_xlim([-2. * np.pi, 2. * np.pi])
    ax.set_ylim([-1.25, 1.25])
    ax.set_xlabel('time')
    ax.set_ylabel('y')
    ax.set_title('Title')

Adding a [legend](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.legend.html?highlight=ax%20legend#matplotlib.axes.Axes.legend) requires setting a label in your plot command.

In [None]:
fig, ax = plt.subplots(1, 1)
a_values = np.array([1, 2, 3, 4]) / np.pi

for a in a_values:
    y = np.sin(t - a)
    ax.plot(t, y, label='a = {:0.2f}'.format(a))

ax.set_xlim([-2. * np.pi, 2. * np.pi])
ax.set_ylim([-1.75, 1.75])
ax.set_xlabel('time')
ax.set_ylabel('y')
ax.set_title('Title')
ax.legend(loc='upper right', ncol=2, frameon=False)

<div class="alert alert-info"><h1>Exercise 3</h1></div>

- Use `np.loadtxt` to load the monthly time series of carbon dioxide as measured at the Mauna Loa Observatory in Hawaii from the file `'monthly_in_situ_co2_mauna_loa.csv'`.
- Plot the time series of CO2 (in ppm) as a function of date.