# NumPy

NumPy is the core of any Python script that requires any kind of numerical computation.
It provides an N-dimensional array object that is missing from the core Python language.
It also contains many helper functions that are required for basic data analysis.

In this session we will cover:

* Importing NumPy
* Creating NumPy arrays
* Indexing and slicing arrays
* Broadcasting
* NumPy functions
* Basic text IO

So let us begin by importing NumPy as follows.
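A minimal sketch of that import, assuming the conventional `np` alias:

```python
import numpy as np

# Every NumPy function is now reached through the np namespace.
print(np.__version__)
```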

This imports the entire NumPy library under the namespace of np. You should've covered this in the previous session.

This is important to see. It is cluttered. Almost every NumPy function can be seen from that tab completion. Name-spacing is done differently for each module as we will see at the end of this session.
But importing NumPy this way means you always know where a function is being called from!

## Arrays

So arrays are the core feature of NumPy.

We can create one like so,
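For instance, a one-element array could be sketched like this (the name `single` is only illustrative):

```python
import numpy as np

# np.array takes one sequence-like argument, here a list.
single = np.array([1])
print(single)
```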

It is important to note that for this function to work, you must have the argument be a single object. So here, you will wrap the whole input in either a list  **[]** or a tuple **()**.

A one number array is pretty useless here, so we can create a larger array like so,
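As a sketch, a nested list gives a 2D array (the name `larger` and the values are made up for illustration):

```python
import numpy as np

# Two inner lists of three numbers each: 2 rows, 3 columns.
larger = np.array([[1, 2, 3], [4, 5, 6]])
print(larger)
```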

These are very basic arrays: the first holds a single element and the second is 2 by 3.
NumPy arrays tend to be **n** rows by **m** columns for 2D arrays.
This is because NumPy stores arrays in C (row-major) order by default.

I hope you never have to do that! There are some basic properties of arrays that are important to cover.

Each array has a series of attributes that describe its fundamental properties.

`ndim` is the dimensionality of the array, i.e. the number of axes.

`shape` is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n, m).

`size` is the total number of elements of the array.

`dtype` is the object describing the type of the elements in the array, for example, integer or float.
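A short sketch of the four attributes described above, which are `ndim`, `shape`, `size` and `dtype`, on an illustrative 2 by 3 array (the name `example` is hypothetical):

```python
import numpy as np

example = np.array([[1, 2, 3], [4, 5, 6]])

print(example.ndim)   # number of dimensions
print(example.shape)  # size along each dimension
print(example.size)   # total number of elements
print(example.dtype)  # element type; the exact integer type depends on the platform
```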

Furthermore, you can create arrays filled with either zeroes or ones if you require. This time you specify the shape, i.e. the dimensions of the array, as input.
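A minimal sketch using `np.zeros` and `np.ones`, with shapes chosen only for illustration:

```python
import numpy as np

# The shape is passed as a single tuple.
zeros = np.zeros((2, 3))
ones = np.ones((2, 2, 2))

print(zeros)
print(ones.shape)
```

Both functions return floating point arrays unless a `dtype` is given.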

So now that we can create really strange arrays, it is time to get that information back out.
To do this, we index the array; taking a range of elements this way is referred to as slicing.

NumPy (like C) is row-major by default, so when we index a 2D array, the row comes first.

Some basic examples,
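A few indexing patterns, sketched on a made-up 2 by 3 array called `grid`:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])

first_row = grid[0]         # the whole first row
element = grid[1, 2]        # row 1, column 2
first_column = grid[:, 0]   # every row, column 0

print(first_row, element, first_column)
```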

We can reassign variables inside these arrays, like you can do with lists. 

See what has happened? It has reassigned the entire row to be the same number.

Here, we have now just changed the first number in the first row. You have to index the specific value you require at times.
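Both kinds of reassignment can be sketched as follows, assuming an illustrative array named `grid`:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])

# Assigning to a row index overwrites every element of that row.
grid[0] = 9
print(grid)

# Giving both a row and a column changes a single element.
grid[0, 0] = 1
print(grid)
```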

There are other forms of indexing.
This example will use a NumPy function called arange that is used to create a range of numbers.
You specify a start, stop and step size. 
Here we will create 0 to 99 in steps of one.
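A minimal sketch of that call (the name `numbers` is only illustrative):

```python
import numpy as np

# start=0, stop=100, step=1; the stop value itself is excluded.
numbers = np.arange(0, 100, 1)
print(numbers[:5], numbers[-1])
```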

But what if for some reason you want the even elements?
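One way to do this is with a step in the slice, sketched here on an array called `numbers` (a hypothetical name):

```python
import numpy as np

numbers = np.arange(0, 100, 1)

# start:stop:step with start and stop left empty, so every second element.
evens = numbers[::2]
print(evens[:5])
```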

This allows some creative slicing opportunities!

But wait! This is still 1D slicing. This won't help if you're working on a 3D data set! 

So let us skip 3D and go for 4D data.
Since you've not covered plotting or any fancy file input, I can't show you some of the lovely 4D observational data I have (also, the files are huge).

So we will create a fake 4D array with this axis layout, [x, y, wave, time].
To do this, we use another feature of NumPy, the random module, for all your random number needs.
The name is important to note: the random functions in NumPy do not reside directly under the NumPy namespace, but under the numpy.random namespace.
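A sketch of such a fake data set. The axis sizes here are assumptions, kept small so that it runs quickly, and the name `fake_solar_data` matches the one used later in this session:

```python
import numpy as np

# Axis layout: [x, y, wave, time]
fake_solar_data = np.random.random((64, 64, 10, 20))
print(fake_solar_data.shape)
```

`np.random.random` draws uniformly from the half-open interval [0, 1).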

So we have our *data*, I want to see the image at the first time step.
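A sketch of that slice, assuming we also pick the first wavelength so that a single 2D image comes back:

```python
import numpy as np

fake_solar_data = np.random.random((64, 64, 10, 20))  # [x, y, wave, time]

# All of x, all of y, first wavelength, first time step.
image = fake_solar_data[:, :, 0, 0]
print(image.shape)
```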

*There won't be a plot since it will be just static.*

The colon symbol does the same job as the * symbol does for arrays in IDL: it returns all the values along that axis.
If you want the values from several indices, there is another syntax.

So say we want a line profile from this data set, we require a specific pixel co-ordinate but across all wavelengths.
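As a sketch, with a pixel co-ordinate of (32, 32) and the first time step both chosen arbitrarily:

```python
import numpy as np

fake_solar_data = np.random.random((64, 64, 10, 20))  # [x, y, wave, time]

# One pixel, every wavelength, one time step.
line_profile = fake_solar_data[32, 32, :, 0]
print(line_profile.shape)
```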

If you want a spectrum image, you want all of time!

Or,
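Both spellings can be sketched like this, again with an arbitrary pixel at (32, 32); trailing axes that are left out are taken in full:

```python
import numpy as np

fake_solar_data = np.random.random((64, 64, 10, 20))  # [x, y, wave, time]

# Explicit colons for the wavelength and time axes.
spectrum_image = fake_solar_data[32, 32, :, :]

# Leaving the trailing axes out selects all of them as well.
same_thing = fake_solar_data[32, 32]

print(spectrum_image.shape)
```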

I forgot to mention taking a small area of data!

Here, we can use the colon symbol to specify a start and end range. 
So you could in the future, create a movie of your data set for this wavelength.
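A sketch of a range slice, where the pixel ranges and the choice of the first wavelength are illustrative assumptions:

```python
import numpy as np

fake_solar_data = np.random.random((64, 64, 10, 20))  # [x, y, wave, time]

# Pixels 10 to 19 in x and y, first wavelength, every time step.
small_area = fake_solar_data[10:20, 10:20, 0, :]
print(small_area.shape)
```

As with `arange`, the end of the range is excluded.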

That is basically all for slicing.
However, we aren't finished yet with arrays!

<div style='background:#B1E0A8; padding:10px 10px 10px 10px;'>
<h2>Challenges</h2>
<p>
<ol>
<li> Create a 2D array with shape (150, 150) of nothing but zeroes. </li>
<li> Fill it with the number 12. </li>
</ol>
</p>
</div>

## Solution 1
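One possible answer, sketched with `np.zeros` (the name `blank` is only illustrative):

```python
import numpy as np

blank = np.zeros((150, 150))
print(blank.shape)
```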

## Solution 2
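One possible answer assigns to a slice covering the whole array; `blank.fill(12)` would be an equivalent alternative:

```python
import numpy as np

blank = np.zeros((150, 150))

# A bare colon selects every element, so all of them become 12.
blank[:] = 12
print(blank)
```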

## Broadcasting

This word hides a lot of the heavy lifting that NumPy does when it comes to array operations. Python is so very slow. Oh so slow. However, the C it is built upon is so quick.

The goal of NumPy array development was that where possible, any numerical operation on an array would be done using the C API instead of Python.

So to illustrate this, we do two different array multiplications.
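The first of these, an array times an array, might be sketched as follows (values made up for illustration):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 2.0])

product = a * b
print(product)
```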

So, it simply does element by element multiplication. Nice and simples.

Here, we have an array times a constant.
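A sketch of the same multiplication with a scalar in place of the second array (values made up for illustration):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])

scaled = a * 2.0
print(scaled)
```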

Once again, it simply multiplies each element by the constant. However, that is not quite how NumPy treats it. It conceptually **"stretches"** the constant out, so that it appears to be an array with the same shape, without actually making copies of it.
This enables the operation to be done in C and not in Python.

The interesting fact is that the second operation is faster than the first operation. One small caveat: it can be slower if the array is small.

So, NumPy allows you to add, multiply, subtract and divide arrays of different shapes as long as those shapes are compatible: comparing the dimensions from the last one backwards, each pair must either be equal or one of them must be 1. The smaller array is then broadcast over the larger array to produce the output.

For example, here we will add a 1 by 3 array and a 1 by 1 array.
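A sketch of that sum, with illustrative values:

```python
import numpy as np

row = np.array([[1.0, 2.0, 3.0]])   # shape (1, 3)
one = np.array([[1.0]])             # shape (1, 1)

total = row + one
print(total)
```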

Broadcasting has simply created a 1 by 3 array of 1s, allowing the summation to work.
It will break like so,
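A sketch of a failing case, where the error is caught only so that the message can be printed:

```python
import numpy as np

three = np.array([1.0, 2.0, 3.0])
two = np.array([1.0, 2.0])

try:
    three + two
except ValueError as error:
    message = str(error)
    print(message)
```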

If you stare at the error message, it simply tells you that the shapes are wrong. It is impossible to broadcast 2 values over 3 elements.

Broadcasting does allow outer operations, such as the outer product, to be calculated from two arrays.

Here, a new NumPy feature has been introduced. NumPy arrays allow a new axis to be added in order to change the shape of any array. Each array was and still is 1D. However, the first array was turned into a column vector and then added to a row vector. The new array is 4 by 3. This is an outer operation (here an outer sum rather than a product), and you can do other outer operations the same way.

This is a simple overview of broadcasting. If you want to see the exact rules used, the link is http://docs.scipy.org/doc/numpy/reference/ufuncs.html#broadcasting
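A sketch using `np.newaxis`, with values made up for illustration. The 4-element array becomes a column, which then broadcasts against the 3-element row:

```python
import numpy as np

a = np.array([0.0, 10.0, 20.0, 30.0])
b = np.array([1.0, 2.0, 3.0])

# a[:, np.newaxis] has shape (4, 1); b has shape (3,).
outer_sum = a[:, np.newaxis] + b
outer_product = a[:, np.newaxis] * b

print(outer_sum.shape)
print(outer_product)
```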

Now on to some functions!

## Functions

So, arrays are all fancy but that isn't all that NumPy can do. 
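For example, a sketch of applying `np.sin` to a whole array. A small stand-in for `fake_solar_data` is rebuilt here so that the block runs on its own:

```python
import numpy as np

fake_solar_data = np.random.random((8, 8, 4, 5))  # [x, y, wave, time]

# The function is applied to every element; the shape is unchanged.
sine_of_data = np.sin(fake_solar_data)
print(sine_of_data.shape)
```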

So, here we have used the sin function within NumPy and passed the entire `fake_solar_data` array as its argument. Unlike the core math module in Python, you can pass entire lists or arrays into NumPy functions.

NumPy has a large selection of functions, from the traditional trigonometric functions to min/max to interpolation. 

The online documentation is pretty extensive and is the best place to find the function you are after. Found here: http://docs.scipy.org/doc/numpy/reference/ It showcases the core of NumPy.

On a side note, within IPython, to raise the help page, type the function name followed by a question mark. Otherwise you will normally have to Google the function to bring up its documentation.

Time for a segue function!
The dot function.
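A minimal sketch of `np.dot` on a made-up matrix and vector:

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4]])
vector = np.array([1, 1])

# Matrix times vector: each row is summed against the vector.
matrix_vector = np.dot(matrix, vector)

# Two 1D arrays give the ordinary scalar (inner) product.
inner = np.dot(vector, vector)

print(matrix_vector, inner)
```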

<div style='background:#B1E0A8; padding:10px 10px 10px 10px;'>
<h2>Challenges</h2>
<p>
<ol>
<li> Find the maximum value in the fake_solar_data array for the first timestep and wavelength. </li>
<li> Now index the array to find that maximum value. </li>
</ol>
</p>
</div>

## Solution 1
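One possible answer, sketched on a small stand-in for `fake_solar_data` with the assumed axis layout [x, y, wave, time]:

```python
import numpy as np

fake_solar_data = np.random.random((8, 8, 4, 5))

# First wavelength and first time step, then the largest value in that image.
image = fake_solar_data[:, :, 0, 0]
maximum = image.max()
print(maximum)
```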

## Solution 2
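One possible answer uses `np.argmax`, which returns a position in the flattened array, together with `np.unravel_index` to turn it back into a row and column:

```python
import numpy as np

fake_solar_data = np.random.random((8, 8, 4, 5))
image = fake_solar_data[:, :, 0, 0]

flat_index = np.argmax(image)
row, column = np.unravel_index(flat_index, image.shape)

peak = image[row, column]
print(row, column, peak)
```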

# I/O

Input/Output is an overlooked topic. You will on occasion have to deal with text files or comma-separated files (CSV). NumPy offers a convenient way of opening and saving out these kinds of files.

Let us deal with saving first!
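A sketch of saving with `np.savetxt`. The array, the file name and the use of a temporary directory are all illustrative assumptions, and the file is read straight back with `np.loadtxt` to check it:

```python
import os
import tempfile

import numpy as np

table = np.arange(6, dtype=float).reshape(2, 3)

# Write into a scratch directory so nothing is left in the working folder.
out_path = os.path.join(tempfile.mkdtemp(), "example.csv")
np.savetxt(out_path, table, delimiter=",")

restored = np.loadtxt(out_path, delimiter=",")
print(restored)
```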

So what has happened here? This CSV file is complex. Typically if you plan to use NumPy loadtxt, you have simple data files. 

This CSV file has a header row, and some of the entries aren't even numbers. NumPy hates this.

So this is slightly better. But it really isn't what you want.
Let us try another function!
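One candidate is `np.genfromtxt`, sketched here on a small made-up CSV held in memory rather than on the file from this session. The column names and values are hypothetical:

```python
import io

import numpy as np

csv_text = "name,area,intensity\nA,1.5,10\nB,2.5,20\n"

# skip_header drops the header row; dtype=None lets each column get its own type.
records = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    skip_header=1,
    dtype=None,
    encoding="utf-8",
)

print(records)
print(records["f1"])  # columns get default names f0, f1, f2
```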

That is better. While we now lack the header, we can see each entry and it seems reasonably clear what each column is.

This kind of data shifting would be better done using a different library. But that is for another time.

# SciPy

It is difficult to talk about NumPy without mentioning SciPy. 

SciPy has a lot of functions that are nearly identical to ones in NumPy, for example the FFT routines in NumPy also exist in SciPy. So which do you use?

## YOU USE SCIPY

The reason for this is that NumPy contains pieces of code for backwards compatibility (spits). So as a result it has routines that are not as well developed as SciPy.

Before I go too far, to show an example of this, the SciPy IO module is more advanced than NumPy's. It has the ability to read in IDL save files (but not write them due to license issues).

There is an IDL savefile called "area_thresh.sav" in this directory!
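A sketch of reading it with `scipy.io.readsav`. The helper name `load_idl_save` is hypothetical, and the variable names stored inside the file are not assumed here, so the keys are simply listed:

```python
from pathlib import Path


def load_idl_save(path):
    """Read an IDL save file and return its variables keyed by name."""
    # Imported here so that the sketch can be defined without SciPy loaded.
    from scipy.io import readsav

    return readsav(str(path))


save_path = Path("area_thresh.sav")
if save_path.exists():
    idl_data = load_idl_save(save_path)
    print(sorted(idl_data.keys()))
```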

<div style='background:#B1E0A8; padding:10px 10px 10px 10px;'>
<h2>Challenges</h2>
<ol>
<li> With this IDL savefile, extract the area and intensity data series it contains. </li>
<li> Then save out these arrays as a CSV file. </li>
</ol>
</div>

## Solution 1

## Solution 2