# Data wrangling with Python / NumPy




## What is Jupyter?

This is a Jupyter Notebook. The Jupyter Notebook is a web application that allows you to create documents that contain live code and explanatory text.

Assuming you haven't used Jupyter notebooks before, we'll provide the bare minimum explanation to get you started. 

A notebook is made up of cells. Try clicking on a couple of cells below. You will notice that in this notebook there are __two types of cells: code cells and Markdown cells__. This is a Markdown cell. It allows us to write explanatory text in the simple Markdown syntax. 

Code cells are much more interesting. These allow us to run code and have the output returned to us in the Notebook.

Try clicking on the first code cell below, the one that says 

```Python
import numpy
```

This highlights the code block, but does not actually run the code. To run the code, press <kbd>SHIFT</kbd>+<kbd>ENTER</kbd>, or click the play button (<button class='fa fa-play icon-play btn btn-xs btn-default'></button>) in the toolbar above.

You can run this analysis in this Notebook by repeatedly pressing <kbd>SHIFT</kbd>+<kbd>ENTER</kbd> on every cell, from top to bottom. 

* When you're ready to try writing your own code, start a new cell from the Insert menu in the toolbar.

## Importing a Python library

In [None]:
import numpy

## loading data

In [None]:
numpy.loadtxt('nasdaq.csv')
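
`numpy.loadtxt` reads a plain-text file of numbers into an array. As a minimal sketch, assuming the file holds one price per line (the real layout of `nasdaq.csv` is not shown here), the same call can be tried on an in-memory stand-in with made-up values:

```python
import io
import numpy

# Hypothetical stand-in for a file with one number per line (values are made up).
fake_file = io.StringIO("100.0\n101.5\n99.25\n")

prices = numpy.loadtxt(fake_file)
print(prices)        # the three numbers as a 1-D array
print(prices.shape)  # one dimension, three elements
```

If a file had several comma-separated columns instead, `loadtxt` would need its `delimiter=','` argument.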


## numpy arrays as variables

In [None]:
data = numpy.loadtxt('nasdaq.csv')


In [None]:
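# %pylab is discouraged in current IPython; import the plotting module explicitly instead
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(data)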


## maths with arrays

In [None]:
data*2.0

In [None]:
d2 = data*2.0
d2

In [None]:
print(data - data)
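
Arithmetic on an array is applied to every element at once, with no loop needed. A small sketch on a hand-made array (the values are arbitrary) makes the results easy to check by eye:

```python
import numpy

a = numpy.array([1.0, 2.0, 3.0])  # arbitrary example values

doubled = a * 2.0   # every element is multiplied by 2
zeros = a - a       # element-by-element subtraction of two arrays

print(doubled)
print(zeros)
```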

## indexing

In [None]:
print('first value in data:', data[0])

In [None]:
zeroonetwo = numpy.arange(3)
print(zeroonetwo)
print(zeroonetwo[2])

In [None]:

print('last value in data:', data[-1])


## slicing and dicing

In [None]:
print(data[0:5])
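
A slice `start:stop` includes the element at `start` but stops just before `stop`. The following sketch uses a small array whose values equal their own positions, so it is easy to see which elements each slice picks out:

```python
import numpy

v = numpy.arange(10)   # the integers 0 to 9

first_five = v[0:5]    # start is included, stop is excluded
same_thing = v[:5]     # a missing start means "from the beginning"
last_three = v[-3:]    # negative indices count from the end

print(first_five)
print(same_thing)
print(last_three)
```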

## combining tricks

![Alt](figs/vectorized_diff.png "difference")

In [None]:
u = numpy.arange(5)
print(u)
print(u[1:] - u[:-1])
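
Subtracting a shifted copy of an array from itself gives the difference between each element and the one before it. As a quick check on made-up values, the sketch below compares the slicing trick with NumPy's built-in `numpy.diff`, which computes the same thing:

```python
import numpy

u = numpy.array([1, 4, 9, 16, 25])  # made-up values

by_slicing = u[1:] - u[:-1]   # each element minus the one before it
by_diff = numpy.diff(u)       # NumPy's built-in first difference

print(by_slicing)
print(numpy.array_equal(by_slicing, by_diff))
```

Note that the result is one element shorter than the input, because the first element has nothing before it.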

## asking questions / boolean arrays

In [None]:
print(data.mean())

In [None]:
data > data.mean()

In [None]:
ba = data > data.mean()
print(ba)
print(data[ba])

In [None]:
len(data[ba])
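
Another way to count how many elements pass a test is to sum the boolean array directly, since `True` counts as 1 and `False` as 0. A minimal sketch with made-up numbers:

```python
import numpy

values = numpy.array([3.0, 8.0, 1.0, 9.0, 4.0])  # made-up numbers, mean is 5.0
above = values > values.mean()                    # one True/False per element

count_by_sum = above.sum()          # True counts as 1, False as 0
count_by_len = len(values[above])   # select the matching values, then count them

print(above)
print(count_by_sum, count_by_len)
```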

## Challenge

Remember that our data array contains the daily closing price for the Nasdaq. We are interested in the overall frequency with which today's closing price change follows yesterday's.

Much of the work we need is done by the following lines of code:

```Python
follow = data[:-1] < data[1:]
seq = follow[:-1] == follow[1:]
```

The first line of code returns a NumPy array of booleans. Each element is `True` if one day's price (`data[1:]`) is greater than the previous day's (`data[:-1]`), and `False` otherwise. 

The second line of code forms another array, containing `True` wherever an element of `follow` has the same value as the preceding element, and `False` otherwise. In other words, it is `True` if yesterday's price change went in the same direction as today's. Once again, these values are assigned to a new variable, called `seq`.

In [None]:
follow = data[:-1] < data[1:]
seq = follow[:-1] == follow[1:]

print(data[0:5])
print(follow[0:5])
print(seq[0:5])
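
To see what these two lines do, it can help to run them on a short, hand-made series where every step can be checked by eye. The prices below are invented for illustration and are not taken from `nasdaq.csv`:

```python
import numpy

# Invented prices: up, up, down, down, up
toy = numpy.array([10.0, 11.0, 12.0, 11.0, 10.0, 12.0])

toy_follow = toy[:-1] < toy[1:]                # True where the price rose
toy_seq = toy_follow[:-1] == toy_follow[1:]    # True where two changes in a row agree

print(toy_follow)
print(toy_seq)
print(len(toy), len(toy_follow), len(toy_seq))
```

Each step shortens the array by one element, which is worth keeping in mind when deciding what to compare `seq.sum()` against.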

So finally, your challenge is to combine the following two Python expressions to answer the question: __what is the overall frequency with which today's closing price _change_ follows yesterday's?__

```Python
seq.sum(), len(data)
```

* Talk with the helpers about what these Python expressions mean.
* Try combining the expressions with a Python mathematical operator, e.g.:
 
```Python
print(seq.sum() +  len(data))
```