# NumPy

NumPy is the core of any Python script that requires numerical computation.
It provides an N-dimensional array object that is missing from the core Python language.
It also contains many of the helper functions required for basic data analysis.

So let us begin by importing NumPy as follows.

In [1]:
import numpy as np

This imports the entire NumPy library under the namespace np. You should have covered this in the previous session.

In [2]:
np.__version__  # Then type "np." and press Tab to see the NumPy namespace

'1.9.2'

This is important to see. The namespace is cluttered: almost every NumPy function shows up in that tab completion. Name-spacing is done differently for each module, as we will see at the end of this session.
But importing NumPy this way means you always know where a function is being called from!

## Arrays

So arrays are the core feature of NumPy.

We can create one like so,

In [3]:
np.array([1])

array([1])

It is important to note that np.array expects all of the data as a single object, its first argument. So you wrap the whole input in either a list **[]** or a tuple **()**.
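Here is a quick check that uses nothing beyond NumPy. A list and a tuple holding the same numbers produce the same array, because in both cases np.array receives one sequence.

```python
import numpy as np

# Both calls pass one sequence as the single data argument
from_list = np.array([1, 2, 3])
from_tuple = np.array((1, 2, 3))

print(from_list)                              # [1 2 3]
print(np.array_equal(from_list, from_tuple))  # True
```

Writing np.array(1, 2, 3) instead does not work. The extra positional arguments are not treated as data; the second one, for example, is interpreted as the dtype.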

A one-number array is pretty useless, so we can create a larger array like so,

In [4]:
np.array([[1,2,3],[4,5,6]])

array([[1, 2, 3],
       [4, 5, 6]])

These are very basic arrays. The first is a 1D array holding a single element (shape `(1,)`), and the second is 2 by 3.
For 2D arrays, NumPy uses **n** rows by **m** columns, so the shape is `(n, m)` and the first index selects the row.
This is because NumPy stores arrays in row-major (C) order by default.

But we can have an array with any number of dimensions. Using the generic `object` dtype, we can even have one that mixes different types of Python objects.
Like so,


In [5]:
np.array([[1],[1,2],['stt'], np.array([1,2])], dtype=object)  # dtype=object is required for ragged input in NumPy >= 1.24

array([[1], [1, 2], ['stt'], array([1, 2])], dtype=object)

I hope you never have to do that! There are some basic properties of arrays that are important to cover.

Each array has a series of attributes that describe its fundamental properties.

ndarray.ndim:

This is the number of axes (dimensions) of the array.

ndarray.shape:

This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m).

ndarray.size: 

This is the total number of elements of the array.

ndarray.dtype:

This is the object describing the type of the elements in the array, for example, integer or float.
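The sketch below prints all four attributes for the 2 by 3 array from earlier. The integer dtype shown depends on the platform, so the exact name is only an example.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.ndim)   # 2: rows and columns
print(arr.shape)  # (2, 3)
print(arr.size)   # 6, i.e. 2 * 3
print(arr.dtype)  # an integer type, e.g. int64 (platform dependent)
```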

You can also create arrays filled with either zeroes or ones. This time you specify the shape, i.e. the dimensions of the array, as the input.

In [6]:
np.zeros(10), np.zeros((3,5,6)), np.ones((1,2,3,4,5))

(array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]),
 array([[[ 0.,  0.,  0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.,  0.,  0.]],
 
        [[ 0.,  0.,  0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.,  0.,  0.]],
 
        [[ 0.,  0.,  0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.,  0.,  0.]]]),
 array([[[[[ 1.,  1.,  1.,  1.,  1.],
           [ 1.,  1.,  1.,  1.,  1.],
           [ 1.,  1.,  1.,  1.,  1.],
           [ 1.,  1.,  1.,  1.,  1.]],
 
          [[ 1.,  1.,  1.,  1.,  1.],
           [ 1.,  1.,  1.,  1.,  1.],
           [ 1.,  1.,  1.,  1.,  1.],
           [ 1.,  1.,  1.,  1.,  1.]],
 
      

Now that we can create really strange arrays, it is time to get information back out of them.
To do this we index the array. Selecting a range of elements this way is referred to as slicing.

NumPy (like C) stores arrays in row-major order by default, so when we index a 2D array, the first index selects the row.

Some basic examples,

In [7]:
arr = np.array([[1,2,3],[4,5,6]])
arr.shape

(2, 3)

In [8]:
arr[0],arr[1],arr[0][0]

(array([1, 2, 3]), array([4, 5, 6]), 1)

We can reassign values inside these arrays, just like you can with lists.

In [9]:
arr[0] = 120
arr

array([[120, 120, 120],
       [  4,   5,   6]])

See what has happened? It has reassigned the entire row to be the same number.

In [10]:
arr[0][0] = -100
arr

array([[-100,  120,  120],
       [   4,    5,    6]])

Here, we have changed just the first number in the first row. To change a single value, you have to index that specific element.
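NumPy also accepts all indices at once, separated by commas. The sketch below compares the two styles on a fresh copy of the small array. arr[0][0] indexes twice, first the row and then the element, while arr[0, 0] does it in one step.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Two indexing operations versus one: both read row 0, column 0
print(arr[0][0], arr[0, 0])  # 1 1

arr[1, 2] = -100  # change only the element in row 1, column 2
print(arr)
# [[   1    2    3]
#  [   4    5 -100]]
```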

There are other forms of indexing.
This example will use a NumPy function called arange, which creates a range of numbers.
You specify a start, a stop and a step size. The stop value itself is excluded, and the step defaults to one.
Here we will create 0 to 99 in steps of one.

In [11]:
arr = np.arange(0,100)
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

But what if for some reason you want the even elements?

In [12]:
arr[::2]

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32,
       34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66,
       68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98])

This allows some creative slicing opportunities!

In [13]:
arr[::4], arr[4::3], arr[95::-5]

(array([ 0,  4,  8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60, 64,
        68, 72, 76, 80, 84, 88, 92, 96]),
 array([ 4,  7, 10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40, 43, 46, 49, 52,
        55, 58, 61, 64, 67, 70, 73, 76, 79, 82, 85, 88, 91, 94, 97]),
 array([95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20, 15,
        10,  5,  0]))
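The general pattern is arr[start:stop:step]. The start is included, the stop is excluded, and any part left out takes a default. Here is a small sketch on a shorter array.

```python
import numpy as np

arr = np.arange(0, 20)

print(arr[2:10])    # [2 3 4 5 6 7 8 9], stop=10 is excluded
print(arr[2:10:3])  # [2 5 8]
print(arr[-3:])     # [17 18 19], negative indices count from the end
print(arr[15::-5])  # [15 10  5  0], a negative step walks backwards
```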

But wait! This is still 1D slicing. This won't help if you're working on a 3D data set!

So let us skip 3D and go for 4D data.
Since you've not covered plotting or any fancy file input, I can't show you some of the lovely 4D observational data I have (also, the files are huge).

So we will create a fake 4D array with this axis layout: [x, y, wave, time].
To do this, we use another feature of NumPy, the random module, for all your random number needs.
Note the name. The random functions do not reside directly in the NumPy namespace, but in the `np.random` namespace.
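Since NumPy 1.17, the documentation recommends the Generator interface for new code. You create a Generator with np.random.default_rng. The sketch below makes the same kind of fake data with a much smaller shape and a fixed seed so the numbers are reproducible. The seed 42 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded generator: same numbers on every run

# Uniform random floats in [0, 1), laid out as [x, y, wave, time]
small_fake_data = rng.random((40, 40, 4, 10))

print(small_fake_data.shape)  # (40, 40, 4, 10)
print(small_fake_data.min() >= 0.0, small_fake_data.max() < 1.0)  # True True
```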

In [14]:
fake_solar_data = np.random.random((400,400,4,100)) # Small next to real data, but still ~512 MB of float64 values.
fake_solar_data.shape

(400, 400, 4, 100)

So we have our *data*. I want to see the image at the first wavelength and the first time step.

*There won't be a plot, since random data would just look like static.*

In [15]:
fake_solar_data[:,:,0,0].shape

(400, 400)

The colon here does the same job as the * symbol in IDL array indexing: it returns all the values along that axis.
If you want all the values along several axes, the ellipsis `...` stands in for as many colons as are needed.

In [16]:
fake_solar_data[...,0].shape

(400, 400, 4)
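Here is a quick way to confirm what the ellipsis expands to. This sketch uses a small random stand-in for fake_solar_data so it runs in a fraction of the memory.

```python
import numpy as np

data = np.random.random((8, 8, 4, 5))  # small stand-in for fake_solar_data

# ... expands to as many full slices (:) as needed to fill the missing axes
print(np.array_equal(data[..., 0], data[:, :, :, 0]))  # True
print(np.array_equal(data[0, ...], data[0, :, :, :]))  # True
print(data[..., 0].shape)  # (8, 8, 4)
```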

So say we want a line profile from this data set. For that we require a specific pixel co-ordinate, but across all wavelengths.

In [17]:
fake_solar_data[150,300,:,0].shape

(4,)

If you want a spectrum image, you want all of time!

In [18]:
fake_solar_data[150,300,:,:].shape

(4, 100)

Or, since any trailing axes you leave out are taken in full,

In [19]:
fake_solar_data[150,300].shape

(4, 100)
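The sketch below checks on a small stand-in array that leaving out trailing indices is the same as filling them with colons.

```python
import numpy as np

data = np.random.random((8, 8, 4, 5))  # small stand-in, axes [x, y, wave, time]

spectrum = data[3, 6]  # trailing axes left out...
print(np.array_equal(spectrum, data[3, 6, :, :]))  # True: ...are taken in full
print(spectrum.shape)  # (4, 5)
```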

I forgot to mention taking a small area of data!

In [20]:
fake_solar_data[100:300,100:300,2].shape

(200, 200, 100)

Here, we use the colon symbol to specify a start and end range. The end is excluded, so 100:300 gives 200 pixels.
So in the future you could create a movie of your data set at this wavelength.

That is basically all for slicing.
However, we aren't finished yet with arrays!

<div style='background:#B1E0A8; padding:10px 10px 10px 10px;'>
<h2>Challenges</h2>
<p>
<ol>
<li> Create a 2D array with size (150, 150) of nothing but zeroes. </li>
<li> Fill it with the number 12. </li>
</ol>
</p>
</div>

## Solution 1

In [21]:
zero_arr = np.zeros((150, 150))

## Solution 2

In [22]:
for i in range(zero_arr.shape[0]):
    for j in range(zero_arr.shape[1]):
        zero_arr[i,j] = 12
zero_arr

array([[ 12.,  12.,  12., ...,  12.,  12.,  12.],
       [ 12.,  12.,  12., ...,  12.,  12.,  12.],
       [ 12.,  12.,  12., ...,  12.,  12.,  12.],
       ..., 
       [ 12.,  12.,  12., ...,  12.,  12.,  12.],
       [ 12.,  12.,  12., ...,  12.,  12.,  12.],
       [ 12.,  12.,  12., ...,  12.,  12.,  12.]])
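The double loop works, but it runs in Python one element at a time. This sketch shows three vectorized alternatives that give the same result.

```python
import numpy as np

zero_arr = np.zeros((150, 150))
zero_arr[:] = 12                    # assign the scalar to every element (broadcasting)

filled = np.zeros((150, 150))
filled.fill(12)                     # ndarray.fill does the same in place

direct = np.full((150, 150), 12.0)  # create the filled array in one step

print(np.array_equal(zero_arr, filled), np.array_equal(zero_arr, direct))  # True True
```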

## Broadcasting

This word hides a lot of the heavy lifting that NumPy does when it comes to array operations. Python is so very slow. Oh so slow. However, the C that NumPy is built upon is so quick.

The goal of NumPy's array design was that, where possible, any numerical operation on an array would be done in compiled C code instead of in Python.

So to illustrate this, we do two different array multiplications.

In [23]:
a = np.arange(1000000)
b = 2 * np.ones(1000000)
a*b

array([  0.00000000e+00,   2.00000000e+00,   4.00000000e+00, ...,
         1.99999400e+06,   1.99999600e+06,   1.99999800e+06])

So, it simply does element by element multiplication. Nice and simples.

Here, we have an array times a constant.

In [24]:
2 * a

array([      0,       2,       4, ..., 1999994, 1999996, 1999998])

Once again, it simply multiplies each element by the constant. However, that is not quite what NumPy has actually done. It conceptually **"stretches"** the constant out so that it behaves like an array of the same shape, without actually building that array in memory.
This enables the operation to be done in C and not in Python.
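The sketch below makes the "stretching" explicit. It compares the broadcast result with the same multiplication done against a constant array that we build by hand.

```python
import numpy as np

a = np.arange(5)

scalar_result = 2 * a                      # the scalar is broadcast
explicit_result = np.full(a.shape, 2) * a  # the "stretched" constant written out by hand

print(scalar_result)                                   # [0 2 4 6 8]
print(np.array_equal(scalar_result, explicit_result))  # True
```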

In the timings below, the second operation is faster than the first. One small caveat: this may not hold when the array is small.

In [25]:
%timeit a*b

1000 loops, best of 3: 1.89 ms per loop


In [26]:
%timeit 2 * a

1000 loops, best of 3: 1.18 ms per loop


So, NumPy allows you to add, multiply, subtract and divide arrays of different shapes as long as the shapes are compatible. Compare the shapes from the last dimension backwards: each pair of dimensions must either be equal, or one of them must be 1. The smaller array is then broadcast over the larger array to produce the output.
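The rule can be written out as a short function. broadcast_shape below is a hypothetical helper written for illustration and is not part of NumPy. Its result is compared with np.broadcast, which NumPy does provide.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: return the broadcast shape, or raise ValueError."""
    # Pad the shorter shape with leading 1s so both have the same length
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for da, db in zip(a, b):
        # Each pair of dimensions must be equal, or one of them must be 1
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"incompatible shapes {shape_a} and {shape_b}")
    return tuple(result)

print(broadcast_shape((3,), (1,)))    # (3,)
print(broadcast_shape((4, 1), (3,)))  # (4, 3)
print(np.broadcast(np.empty((4, 1)), np.empty(3)).shape)  # (4, 3), NumPy agrees
```

broadcast_shape((3,), (2,)) raises a ValueError, just like adding arrays of those shapes does.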

For example, here we will add an array of shape (3,) and an array of shape (1,).

In [27]:
np.array([1,2,3]) + np.array([1])

array([2, 3, 4])

Broadcasting has effectively stretched `[1]` into `[1, 1, 1]`, allowing the summation to work.
It will break like so,

In [28]:
np.array([1,2,3]) + np.array([1,2])

ValueError: operands could not be broadcast together with shapes (3,) (2,) 

If you stare at the error message, it simply tells you that the shapes are incompatible. There is no way to broadcast 2 values over 3 elements.

Broadcasting also allows outer operations between two arrays, such as this outer sum.

In [29]:
a = np.array([0,10,20,30])
b = np.array([1,2,3])

In [30]:
a[:,np.newaxis],a[:,np.newaxis].shape

(array([[ 0],
        [10],
        [20],
        [30]]), (4, 1))

In [31]:
a[:,np.newaxis] + b

array([[ 1,  2,  3],
       [11, 12, 13],
       [21, 22, 23],
       [31, 32, 33]])

Here, a new NumPy object has been introduced: `np.newaxis`. Indexing with it inserts a new axis of length 1, changing the shape of the array. Both `a` and `b` start out 1D. However, `a[:,np.newaxis]` is a 4 by 1 column vector, which is then added to `b`, treated as a 1 by 3 row vector. The new array is 4 by 3. This is an outer sum, and you can do other outer operations, such as the outer product, in the same way.
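The sketch below uses the same `a` and `b` and checks the broadcasting trick against NumPy's built-in outer functions. np.outer gives the outer product, and np.add.outer gives the outer sum.

```python
import numpy as np

a = np.array([0, 10, 20, 30])
b = np.array([1, 2, 3])

outer_sum = a[:, np.newaxis] + b      # shape (4, 1) with shape (3,) -> (4, 3)
outer_product = a[:, np.newaxis] * b  # the same trick with * gives the outer product

print(np.array_equal(outer_product, np.outer(a, b)))  # True
print(np.array_equal(outer_sum, np.add.outer(a, b)))  # True
print(np.newaxis is None)  # True: np.newaxis is an alias for None
```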

This is a simple overview of broadcasting. If you want to see the exact rules used, the link is http://docs.scipy.org/doc/numpy/reference/ufuncs.html#broadcasting

Now on to some functions!

## Functions

So, arrays are all fancy but that isn't all that NumPy can do. 

In [32]:
np.sin(fake_solar_data[100:300])

array([[[[  6.10079216e-01,   8.01264522e-01,   6.59489767e-02, ...,
            4.62556188e-01,   5.12723932e-01,   2.62523150e-02],
         [  7.61135035e-01,   2.78851238e-01,   5.90659011e-01, ...,
            7.76407761e-03,   5.03350743e-01,   2.65222948e-01],
         [  5.12996430e-01,   4.83668595e-01,   7.53139950e-01, ...,
            6.76103570e-01,   7.05069556e-01,   4.00662852e-01],
         [  7.92750646e-01,   4.28291673e-01,   6.87400818e-01, ...,
            5.53845060e-01,   9.83649343e-02,   4.76400593e-01]],

        [[  2.48479707e-01,   6.54510154e-02,   3.05364973e-01, ...,
            3.36427896e-01,   6.59669872e-01,   6.68556364e-01],
         [  4.56517389e-01,   7.74257409e-01,   7.03616360e-01, ...,
            3.67373173e-01,   6.72875398e-01,   3.33190438e-01],
         [  1.80059580e-01,   2.43731748e-02,   6.13198265e-03, ...,
            6.08895900e-01,   7.89417417e-02,   1.80382316e-01],
         [  2.60666533e-02,   7.12285661e-01,   8.27628707e-

So, here we have used NumPy's sin function and passed it, as the argument, a 200 x 400 x 4 x 100 slice of the fake_solar_data array. The functions in Python's core math module only accept single numbers. NumPy functions, by contrast, accept entire lists or arrays and apply element by element.
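Here is a small side-by-side comparison of the two. math.sin works on one number, while np.sin applies to a whole list at once.

```python
import math
import numpy as np

values = [0.0, math.pi / 2, math.pi]

print(math.sin(values[1]))  # 1.0: math.sin takes one number at a time
print(np.sin(values))       # sin of every element of the list at once

# math.sin(values) would raise a TypeError, since it expects a single number
```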

NumPy has a large selection of functions, from the traditional trigonometric functions to min/max to interpolation. 

The online documentation is pretty extensive and is the best place to find the function you are after. It can be found here: http://docs.scipy.org/doc/numpy/reference/ and it showcases the core of NumPy.

On a side note, within IPython you can bring up the help page by typing the function name followed by a question mark. Outside IPython, you will normally have to use Python's built-in help() or search online for the documentation.

In [33]:
np.max?

Time for a segue function!
The dot function.

In [34]:
np.dot?

In [35]:
a = [[1, 0], [0, 1]]
b = [[4, 1], [2, 2]]
np.dot(a, b)

array([[4, 1],
       [2, 2]])

This should look familiar, since it is matrix multiplication. But what if we want to work with matrices only?
Luckily, NumPy has a matrix object.

In [36]:
a = np.matrix(a)
b = np.matrix(b)
a

matrix([[1, 0],
        [0, 1]])

So now everything we do will follow the rules of matrices.

Summation!

In [37]:
a + b

matrix([[5, 1],
        [2, 3]])

Multiplication!

In [38]:
a * b

matrix([[4, 1],
        [2, 2]])

Transpose!

In [39]:
(a * b).T

matrix([[4, 2],
        [1, 2]])
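The NumPy documentation no longer recommends the matrix class, even for linear algebra. Regular 2D arrays with the @ operator (Python 3.5+, NumPy 1.10+) cover the same ground. Here is a sketch of the operations above using plain arrays. The main trap is that * stays element-wise for arrays.

```python
import numpy as np

a = np.array([[1, 0], [0, 1]])
b = np.array([[4, 1], [2, 2]])

print(a + b)      # element-wise sum, same as for matrices
print(a @ b)      # matrix multiplication, like a * b for matrix objects
print((a @ b).T)  # transpose
print(a * b)      # careful: for arrays, * is element-wise
# [[4 0]
#  [0 2]]
```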

<div style='background:#B1E0A8; padding:10px 10px 10px 10px;'>
<h2>Challenges</h2>
<p>
<ol>
<li> Find the location of the maximum value in the fake_solar_data array for the first time step and wavelength. </li>
<li> Now index the array at that location to retrieve the maximum value itself. </li>
</ol>
</p>
</div>

## Solution 1

In [68]:
np.argmax(fake_solar_data[:,:,0,0])

22418

## Solution 2

In [69]:
fake_solar_data[:,:,0,0].flat[22418]

0.99999875066820409
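Solution 2 types in the flat index 22418 by hand, which only works for that particular random draw. The sketch below carries the index through instead and uses np.unravel_index to turn it into a (row, column) pair. The seeded stand-in image replaces fake_solar_data[:, :, 0, 0] and is an assumption. The last lines take the maximum over x and y for every wavelength and time step at once, giving an array of maximums.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((400, 400))  # stand-in for fake_solar_data[:, :, 0, 0]

flat_index = np.argmax(image)                         # position in the flattened array
row, col = np.unravel_index(flat_index, image.shape)  # convert to (row, column)

print(image[row, col] == image.max())         # True
print(image.flat[flat_index] == image.max())  # True

# Maximum over x and y, one value per wavelength and time step
cube = rng.random((40, 40, 4, 10))
print(cube.max(axis=(0, 1)).shape)  # (4, 10)
```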

# I/O

In [41]:
from IPython.display import Image

In [None]:
Image('2015-03-24.jpg')

Input/Output is an often overlooked topic. You will on occasion have to deal with text files or comma-separated value (CSV) files. NumPy offers convenient functions for opening and saving these kinds of files.

Let us deal with saving first!

In [None]:
arr

In [None]:
np.savetxt('test.txt',arr)

In [None]:
np.loadtxt('test.txt')
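By default, savetxt writes every value as a float in '%.18e' format, and loadtxt reads floats back. This sketch does a round trip that keeps the integers intact. It writes into a temporary directory so no test.txt is left behind.

```python
import os
import tempfile
import numpy as np

arr = np.arange(0, 100)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    np.savetxt(path, arr, fmt="%d")         # write integers instead of '%.18e'
    loaded = np.loadtxt(path, dtype=int)    # read them back as integers

print(np.array_equal(arr, loaded))  # True
```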

In [None]:
np.loadtxt('fakenames.csv',delimiter=',')

So what has happened here? loadtxt raises a ValueError, because this CSV file is complex. loadtxt is really meant for simple, purely numerical data files.

This CSV file has a header row, and some of the entries aren't even numbers. NumPy hates this.

In [None]:
np.genfromtxt('fakenames.csv', delimiter=',')[0]

So this is slightly better: genfromtxt replaces anything it cannot convert to a number with nan. But it really isn't what you want.
Let us try another function!

In [None]:
np.recfromcsv('fakenames.csv')[1]

That is better. The header row has been used for the field names rather than appearing as data, so we can see each entry and it seems reasonably clear what each column is.
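np.recfromcsv is not available from the main namespace in NumPy 2.0, and the migration guide points to np.genfromtxt instead. Here is a sketch of the equivalent call. It uses a small made-up CSV held in memory; the columns are hypothetical and are not the contents of fakenames.csv.

```python
import io
import numpy as np

# Hypothetical CSV text with a header row and a mix of text and numbers
csv_text = io.StringIO(
    "name,city,age\n"
    "Alice,Leeds,34\n"
    "Bob,York,27\n"
)

# names=True reads the header as field names; dtype=None infers a type per column
table = np.genfromtxt(csv_text, delimiter=",", names=True, dtype=None, encoding="utf-8")

print(table.dtype.names)  # ('name', 'city', 'age')
print(table["age"])       # [34 27]
print(table[1])           # the second data row as a single record
```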

This kind of data wrangling would be better done with a different library. But that is for another time.

# SciPy

It is difficult to talk about NumPy without mentioning SciPy, its older brother. 

SciPy has a lot of functions that are nearly identical to those in NumPy. For example, the FFT routines in NumPy also exist in SciPy. So which do you use?

# YOU USE SCIPY

The reason for this is that NumPy keeps some pieces of code for backwards compatibility (spits). As a result, it has routines that are not as well developed as their SciPy equivalents.

Before I go too far, here is an example: the SciPy io module is more advanced than NumPy's. It has the ability to read in IDL save files (but not write them, due to license issues).

In [None]:
from scipy.io import readsav 

There is an IDL savefile called "area_thresh.sav" in this directory!

<div style='background:#B1E0A8; padding:10px 10px 10px 10px;'>
<h2>Challenges</h2>
<p>
<ol>
<li> With this IDL savefile, extract the area and intensity data series it contains. </li>
<li> Then save out these arrays as a CSV file. </li>
</ol>
</p>
</div>

## Solution 1

In [None]:
data = readsav('area_thresh.sav')  # read the file once; the result is dict-like
area = data['area']
intensity = data['intensity']

## Solution 2

In [None]:
np.savetxt('area.csv',area, delimiter=',')
np.savetxt('intensity.csv',intensity, delimiter=',')
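The challenge asks for a CSV file, while the solution above writes one file per array. If the two series have the same length, a sketch like the following writes them as two columns of a single file with a header row. The arrays and the file name area_intensity.csv are made up for illustration. In the notebook you would use the area and intensity arrays read from area_thresh.sav.

```python
import os
import tempfile
import numpy as np

# Hypothetical stand-ins for the arrays read with readsav
area = np.array([10.0, 12.5, 15.0])
intensity = np.array([0.8, 0.9, 1.1])

table = np.column_stack((area, intensity))  # shape (3, 2): one row per sample

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "area_intensity.csv")
    # comments="" stops savetxt from prefixing the header line with '# '
    np.savetxt(path, table, delimiter=",", header="area,intensity", comments="")
    back = np.loadtxt(path, delimiter=",", skiprows=1)

print(np.allclose(back, table))  # True
```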