# Data analysis

This notebook introduces data analysis with `Python` in a `Jupyter` notebook.

In [1]:
%pylab inline
import numpy
import seaborn

Populating the interactive namespace from numpy and matplotlib


In [2]:
numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')

array([[ 0.,  0.,  1., ...,  3.,  0.,  0.],
       [ 0.,  1.,  2., ...,  1.,  0.,  1.],
       [ 0.,  1.,  1., ...,  2.,  1.,  1.],
       ..., 
       [ 0.,  1.,  1., ...,  1.,  1.,  1.],
       [ 0.,  0.,  0., ...,  0.,  2.,  0.],
       [ 0.,  0.,  1., ...,  1.,  1.,  0.]])

In [3]:
data = numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')

In [4]:
type(data)

numpy.ndarray

In [5]:
print(data.dtype)

float64


In [6]:
print(data.shape)

(60, 40)


## Indexing arrays

Arrays are indexed by *row* and *column*, using *square bracket* notation:

```python
data[30, 20]  # get entry at row 30, column 20 of the array
```

Counting of array elements starts at 0 (zero), not at 1 (one).

In [7]:
print('first value in data:', data[0, 0])

first value in data: 0.0


In [8]:
print('middle value in data:', data[30, 20])

middle value in data: 13.0
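
As a minimal sketch of zero-based indexing, the following uses a small hypothetical array, `grid`, rather than the inflammation data, so that every entry can be checked by eye. The name `grid` and its values are assumptions made only for illustration.

```python
import numpy

# Hypothetical 3x4 array (not the inflammation data): values 0..11, row by row
grid = numpy.arange(12).reshape(3, 4)

first = grid[0, 0]    # row 0, column 0: the first entry
entry = grid[1, 2]    # row 1, column 2: second row, third column
last = grid[2, 3]     # the highest valid index on each axis is its length minus 1

print('first:', first)
print('row 1, column 2:', entry)
print('last:', last)
```

Asking for `grid[3, 0]` would raise an `IndexError`, because an array with three rows only has rows 0, 1 and 2.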


## Slicing arrays

We can select whole sections of arrays by *slicing* them: we give the start and end points of the *slice* in square brackets, separated by a `:` (colon).

```python
data[0:4, 0:10]
```

The slice `0:4` means "*start at index 0 and go up to, but not including, index 4*", so it selects four elements: indices 0, 1, 2 and 3.

In [9]:
print(data[0:4, 0:10])

[[ 0.  0.  1.  3.  1.  2.  4.  7.  8.  3.]
 [ 0.  1.  2.  1.  2.  1.  3.  2.  2.  6.]
 [ 0.  1.  1.  3.  3.  2.  6.  2.  5.  9.]
 [ 0.  0.  2.  0.  4.  2.  2.  1.  6.  7.]]


In [10]:
print(data[5:10, 0:10])

[[ 0.  0.  1.  2.  2.  4.  2.  1.  6.  4.]
 [ 0.  0.  2.  2.  4.  2.  2.  5.  5.  8.]
 [ 0.  0.  1.  2.  3.  1.  2.  3.  5.  3.]
 [ 0.  0.  0.  3.  1.  5.  6.  5.  5.  8.]
 [ 0.  1.  1.  2.  1.  3.  5.  3.  5.  8.]]


If we don't specify a *start* to the slice, `Python` assumes we mean the beginning of the axis (the first element, index `0`).

If we don't specify an *end* to the slice, `Python` assumes we mean the end of the axis, so the slice runs through the last element.

In [11]:
small = data[:3, 36:]
print('small is:')
print(small)

small is:
[[ 2.  3.  0.  0.]
 [ 1.  1.  0.  1.]
 [ 2.  2.  1.  1.]]
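
The slicing rules can be sketched on a small hypothetical array whose values make the selected region easy to verify. The array `grid` and the names `block`, `top` and `right` are assumptions for illustration, not part of the inflammation dataset.

```python
import numpy

# Hypothetical 4x6 array (not the inflammation data): values 0..23, row by row
grid = numpy.arange(24).reshape(4, 6)

block = grid[1:3, 2:5]   # rows 1 and 2, columns 2, 3 and 4
top = grid[:2, :]        # no start given: begins at row 0
right = grid[:, 4:]      # no end given: runs through the last column

print(block)
print(top.shape)
print(right.shape)
```

Along each axis, a slice with an explicit start and end selects `end - start` elements, which is why `block` has 2 rows and 3 columns.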


## Operations on arrays

Arithmetic operations such as `+`, `-`, `*` and `/` are performed *elementwise* on arrays: the operation is applied to each element separately.

We can multiply an array by a scalar.

In [12]:
doubledata = data * 2.0

In [13]:
print('original:')
print(data[:3, 36:])
print('doubledata:')
print(doubledata[:3, 36:])

original:
[[ 2.  3.  0.  0.]
 [ 1.  1.  0.  1.]
 [ 2.  2.  1.  1.]]
doubledata:
[[ 4.  6.  0.  0.]
 [ 2.  2.  0.  2.]
 [ 4.  4.  2.  2.]]
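
The same elementwise behaviour holds for the other arithmetic operators. This is a small sketch on a hypothetical array, `values`, chosen so the results are easy to check. The names and numbers are assumptions for illustration.

```python
import numpy

# Hypothetical 2x2 array of floats (not the inflammation data)
values = numpy.array([[1.0, 2.0], [3.0, 4.0]])

halved = values / 2.0     # every element divided by 2
shifted = values - 1.0    # 1 subtracted from every element

print(halved)
print(shifted)
print(values)             # the original array is left unchanged
```

Each operation returns a new array, which is why `data` still holds the original values after `doubledata` is created.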


Adding two arrays is also performed elementwise.

In [14]:
tripledata = doubledata + data

In [15]:
print('tripledata:')
print(tripledata[:3, 36:])

tripledata:
[[ 6.  9.  0.  0.]
 [ 3.  3.  0.  3.]
 [ 6.  6.  3.  3.]]
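
To make the pairing of elements explicit, here is a minimal sketch with two hypothetical 2x2 arrays, `a` and `b`. The names and values are assumptions for illustration. It also shows that `*` between two arrays multiplies matching entries and is not matrix multiplication.

```python
import numpy

# Two hypothetical arrays with the same shape (not the inflammation data)
a = numpy.array([[1.0, 2.0], [3.0, 4.0]])
b = numpy.array([[10.0, 20.0], [30.0, 40.0]])

total = a + b       # entry [i, j] is a[i, j] + b[i, j]
product = a * b     # entry [i, j] is a[i, j] * b[i, j]

print(total)
print(product)
```

The two arrays in this sketch have identical shapes, as do `doubledata` and `data` above, so every entry in one has a matching entry in the other.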
