# Introductory Material

## 1. Working with Jupyter

- execute a cell with CTRL+ENTER or ALT+ENTER (to open a new cell underneath)
- CTRL+Z for (per-cell) 'undo'
- cells follow order of execution - *not* top-to-bottom
- add [Markdown](https://daringfireball.net/projects/markdown/syntax) cells for formatted text

In [None]:
# system commands
! ls

Jupyter 'magic' commands:
- `%alias` = create command/function alias
- `%debug` = open interactive debugger
- `%edit` = open editor & execute after close
- `%env` = set environment variables
- `%matplotlib inline` = embed plots/images in notebook
- `%store` = save data for use in another notebook
- `%time/%timeit` = time execution of function call
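
Magic commands only work inside IPython/Jupyter. As a rough sketch of what `%timeit` does, the standard library's `timeit` module can be used in any Python script. The function `sum_of_squares` below is a made-up workload for illustration, not part of this course.

```python
import timeit

def sum_of_squares(n):
    """Hypothetical workload to time; not part of the original notebook."""
    return sum(i * i for i in range(n))

# call the function 1000 times and measure the total elapsed time in seconds
elapsed = timeit.timeit(lambda: sum_of_squares(1000), number=1000)
print(f"1000 calls took {elapsed:.4f} seconds")
```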


Recommended reading:
- [28 Jupyter Notebook Tips & Tricks](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/)
- [`interact` docs](http://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html)

## 2. Finding Help & Reading Documentation

In Python, you can view the documentation for any object with the syntax `help(obj)`. Invoke a help search prompt with `help()`.

To get a list of all of an object's attributes and methods, use `dir(obj)`. Calling `dir()` with no arguments provides a listing of everything defined in the current scope.

In [None]:
a = 'a string'
b = 10
def c(a, b):
    print(a*b)

print(dir(a))
print(dir())

(__Note:__ In Jupyter, the environment includes a lot of additional, weird-looking variables e.g. `In` and `Out`.)
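
The output of `dir()` is an ordinary list of strings, so it can be filtered like any other list. The sketch below hides the names that start with an underscore, which is one possible way to make the listing easier to read, and then prints the docstring that `help()` draws on.

```python
text = 'a string'

# dir() returns a list of names; drop the ones starting with an underscore
public_names = [name for name in dir(text) if not name.startswith('_')]
print(public_names)

# help(str.upper) builds its output from the docstring stored in __doc__
print(str.upper.__doc__)
```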

In the Jupyter Notebook, another help option is available: type

`range?`

to get a pop-up window of documentation for the `range` function. You can scroll up and down this help page, and close it once you're done.

### Reading docstrings and help pages

Let's dissect the help page for `numpy`'s `arange` function.

```
arange([start,] stop[, step,], dtype=None)

Return evenly spaced values within a given interval.

Values are generated within the half-open interval ``[start, stop)``
(in other words, the interval including `start` but excluding `stop`).
For integer arguments the function is equivalent to the Python built-in
`range <http://docs.python.org/lib/built-in-funcs.html>`_ function,
but returns an ndarray rather than a list.

When using a non-integer step, such as 0.1, the results will often not
be consistent.  It is better to use ``linspace`` for these cases.

Parameters
----------
start : number, optional
    Start of interval.  The interval includes this value.  The default
    start value is 0.
stop : number
    End of interval.  The interval does not include this value, except
    in some cases where `step` is not an integer and floating point
    round-off affects the length of `out`.
step : number, optional
    Spacing between values.  For any output `out`, this is the distance
    between two adjacent values, ``out[i+1] - out[i]``.  The default
    step size is 1.  If `step` is specified, `start` must also be given.
dtype : dtype
    The type of the output array.  If `dtype` is not given, infer the data
    type from the other input arguments.

Returns
-------
arange : ndarray
    Array of evenly spaced values.

    For floating point arguments, the length of the result is
    ``ceil((stop - start)/step)``.  Because of floating point overflow,
    this rule may result in the last element of `out` being greater
    than `stop`.

See Also
--------
linspace : Evenly spaced numbers with careful handling of endpoints.
ogrid: Arrays of evenly spaced numbers in N-dimensions.
mgrid: Grid-shaped arrays of evenly spaced numbers in N-dimensions.

Examples
--------
[snip]
```

First, the usage statement:

```
arange([start,] stop[, step,], dtype=None)
```

This tells the user how to correctly invoke the function. Optional arguments are given inside `[]`s, and default values of arguments are displayed as `argument=default_value`. So here we can see that `numpy.arange` has: one required argument - `stop`; two optional arguments - `start` and `step`; and one argument - `dtype` - with a default value of `None`.
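
To make the usage statement concrete, here is a short sketch of the different ways `arange` can be called, assuming `numpy` is installed and imported as `np`.

```python
import numpy as np

print(np.arange(5))               # only stop: [0 1 2 3 4]
print(np.arange(2, 5))            # start and stop: [2 3 4]
print(np.arange(0, 10, 3))        # start, stop and step: [0 3 6 9]
print(np.arange(3, dtype=float))  # dtype given by keyword: [0. 1. 2.]
```

Note that a single positional argument is interpreted as `stop`, not `start`, which is exactly what the `[start,] stop` notation indicates.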

Beneath the usage statement, we are given more information about what the function does, followed by some details about the type and meaning of each of the arguments. These are given in the form

```
argument: object_type, mode
    description
```

Then, we are told what the function returns. This is important because it helps us know what we can do with the data that is returned by the function.

Lastly, we get some hints about related functions, and then an extensive collection of usage examples, which I have left out above.
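
The "See Also" section is worth following up. As a small sketch (again assuming `numpy` is imported as `np`), this compares `arange` with the related `linspace` function that the help page recommends for non-integer steps.

```python
import numpy as np

# arange takes a step size; the number of values follows from it
by_step = np.arange(0, 1, 0.25)
print(by_step)

# linspace takes the number of values; the spacing follows from it,
# and the endpoint is included by default
by_count = np.linspace(0, 1, 5)
print(by_count)
```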

This is a fine example of documentation for a function, and in fact all of the documentation for `numpy` is of extremely high quality. I'm sorry to say that you can't rely on documentation always being this detailed and helpful! But, in general, one of the strengths of Python as a language is that it typically has helpful documentation, as well as an active community of users and developers who are willing to provide advice and help, e.g. through Stack Overflow and other online resources.
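
You can document your own functions in the same layout. The sketch below uses a made-up function, `scale`, with a docstring that follows the sections seen in the `arange` help page. It is one possible way to lay out a docstring, not a requirement of the language.

```python
import inspect

def scale(values, factor=2):
    """Multiply every value in a list by a constant.

    Parameters
    ----------
    values : list of numbers
        The numbers to scale.
    factor : number, optional
        The multiplier. The default is 2.

    Returns
    -------
    scaled : list
        A new list with each value multiplied by `factor`.
    """
    return [value * factor for value in values]

# inspect.getdoc returns the docstring with its indentation cleaned up
print(inspect.getdoc(scale))
print(scale([1, 2, 3]))
```

Calling `help(scale)` would display the same text alongside the function signature.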

## 3. Reading Data From Files

In [None]:
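# open the file for reading ('r'), print each line, then close it by hand
handle = open('example_data.txt', 'r')
for line in handle.readlines():
    print(line.strip())  # strip() removes the trailing newline
handle.close()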

In [None]:
with open('example_data.txt', 'r') as handle:
    for line in handle.readlines():
        print(line.strip())
# no closing necessary when using 'with' - Python takes care of this for you
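
As a self-contained variation on the cells above, the sketch below first writes a small throwaway file (the file name and its contents are made up for illustration) and then reads it back. It loops over the file handle directly, which avoids loading every line into a list first.

```python
import os
import tempfile

# create a small throwaway file so the example is self-contained;
# the file name and contents are made up for illustration
path = os.path.join(tempfile.mkdtemp(), 'sketch_data.txt')
with open(path, 'w') as handle:
    handle.write('first line\nsecond line\nthird line\n')

# a file handle can be looped over directly, one line at a time,
# without first loading every line into a list with readlines()
lines = []
with open(path, 'r') as handle:
    for line in handle:
        lines.append(line.strip())

print(lines)
print(handle.closed)  # the handle is closed once the with block ends
```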

## 4. A Brief Introduction to `matplotlib`

(This material is largely taken from the [`matplotlib` documentation/examples](https://matplotlib.org/index.html) and [this excellent blog post](http://pbpython.com/effective-matplotlib.html).)

In [None]:
%matplotlib inline
# first, a quick review of import statement syntax
from matplotlib import pyplot as plt
# or import matplotlib.pyplot as plt
import numpy as np

The default `matplotlib` style is not to everyone's taste. It has been improved in the recent update to v2.0, but you may also choose another style easily:

In [None]:
plt.style.available

In [None]:
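# matplotlib 3.6 renamed this style to 'seaborn-v0_8-notebook'; use that name if it is the one listed above
plt.style.use('seaborn-notebook')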

If you want to easily re-use your own settings, you can write your own stylesheet, add it to the list above, and load it in every time. See [here](https://matplotlib.org/users/customizing.html) for detailed instructions.
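
A stylesheet file is essentially a list of settings. As a lightweight sketch of the same idea, the settings can also be kept in a dictionary and applied temporarily with `plt.style.context`. The dictionary `my_style` and its values are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt

# hypothetical personal settings; each key is a standard rcParams name
my_style = {'axes.grid': True, 'font.size': 12, 'lines.linewidth': 2}

# apply the settings only inside this block; they are restored afterwards
with plt.style.context(my_style):
    linewidth_inside = plt.rcParams['lines.linewidth']
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
    ax.set_title('plotted with custom settings')

print(linewidth_inside)
```

Passing the same dictionary to `plt.style.use` would apply the settings for the rest of the session instead.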

`matplotlib` gives you two fundamental objects to work with - the `Figure` and `Axes`. The `Figure` object is the top-level container for an entire figure. It can contain a number of `Axes` objects, each of which is a single plot. By approaching it this way, `matplotlib` makes it very easy to organise multiple plots into a single figure.

In [None]:
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(9, 4))

Now let's plot something...

In [None]:
# generate some random test data
all_data = [np.random.normal(0, std, 100) for std in range(6, 10)]

# plot violin plot on the first set of axes
axes[0].violinplot(all_data,
                   showmeans=False,
                   showmedians=True)
axes[0].set_title('violin plot')

# and a box plot on the second set
axes[1].boxplot(all_data)
axes[1].set_title('box plot')

# show the plot in its current form
fig

That's a good start. Let's add some horizontal gridlines to make the distributions easier to compare, and some labels to the x-axes.

In [None]:
# add horizontal grid lines, x-tick positions, and axis labels to both plots
for ax in axes:
    ax.yaxis.grid(True)
    ax.set_xticks([y+1 for y in range(len(all_data))])
    ax.set_xlabel('xlabel')
    ax.set_ylabel('ylabel')

# add x-tick labels
plt.setp(axes, xticks=[y+1 for y in range(len(all_data))],
         xticklabels=['x1', 'x2', 'x3', 'x4'])
fig

This is a really simple example, which barely scratches the surface of what you can do with `matplotlib`. The online documentation is extremely comprehensive, and I encourage you to check it out. In addition, if you are doing a lot of data handling as well as plotting, you should check out the [`pandas`](http://pandas.pydata.org) library, for data analysis in Python. As well as providing shortcuts to create common plot types from datasets in a few lines (raw text file -> boxplot in three lines!), `pandas` allows sophisticated data filtering, summarising, and processing techniques.
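
To give a flavour of the `pandas` shortcut mentioned above, here is a minimal sketch, assuming `pandas` is installed. The measurements are made up and are held in a string that stands in for a text file on disk.

```python
import io
import pandas as pd

# made-up measurements standing in for the contents of a text file
raw_text = io.StringIO(
    "control,treated,recovered\n"
    "4.1,6.3,5.0\n"
    "3.9,6.8,4.7\n"
    "4.4,7.1,5.2\n"
    "4.0,6.5,4.9\n"
)

df = pd.read_csv(raw_text)   # parse the text into a DataFrame
print(df.describe())         # summary statistics for each column
ax = df.boxplot()            # one box per column, drawn with matplotlib
```

With a real file, the `io.StringIO` object would be replaced by the path to that file.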

### Subplots

In the main tutorial of this course, you may want to display images next to one another. This is fairly straightforward with `matplotlib`, as you can see in the example above. Create a number of subplots in a figure with `plt.subplots`:

In [None]:
figure, array_of_subplots = plt.subplots(nrows=2, ncols=2) # create four subplots, in a 2x2 grid

print(array_of_subplots.shape) # array_of_subplots is a 2x2 array

# note: somewhat unhelpfully, if you create only one row (or one column) of subplots, the array will be
# one-dimensional, so you use a single index (array_of_subplots[0], array_of_subplots[1], etc.) to access each set of axes

# generate some random data
x1, x2, x3, x4 = [np.random.randint(10, size=10) for i in range(4)]
y1, y2, y3, y4 = [np.random.randint(10, size=10) for i in range(4)]

# plot onto the first subplot (top-left)
array_of_subplots[0][0].bar(list(range(10)), y1, width=0.9)

# plot onto the second subplot (top-right)
array_of_subplots[0][1].scatter(x2, y2)

# and the third and fourth (second row)
array_of_subplots[1][0].pie(x3)

for i in range(len(x4)):
    array_of_subplots[1][1].text(float(x4[i])/10, float(y4[i])/10, 'abcdefghij'[i])

# render the figure
plt.show()
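
For the specific case of showing images side by side, a sketch along the following lines may be useful. The random arrays are placeholders for real image data, and the titles are made up. Passing `squeeze=False` is one way to sidestep the one-dimensional array issue noted above, since the array of axes then always has two dimensions.

```python
import numpy as np
import matplotlib.pyplot as plt

# random arrays standing in for real images
rng = np.random.default_rng(0)
images = [rng.random((16, 16)) for _ in range(3)]
titles = ['image A', 'image B', 'image C']

# squeeze=False keeps the array of axes two-dimensional, here shape (1, 3)
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(9, 3), squeeze=False)

# .flat iterates over every set of axes regardless of the grid shape
for ax, image, title in zip(axes.flat, images, titles):
    ax.imshow(image, cmap='gray')
    ax.set_title(title)
    ax.axis('off')  # hide ticks and frame, which are rarely useful for images
```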