# Numpy and Matplotlib essentials

As suggested in one of the previous challenges, the library *numpy* provides an object called *array*, which is very similar to the list. The main difference lies in the operations you can perform on it, which are oriented towards computation.

In this lesson we are going to learn a little more about how to use this fundamental library for numerical analysis in Python.

Numpy extends the Python language by providing new types (array, matrix, masked_array...), functions and methods to perform efficient numerical calculations using Python.

The most basic numpy type is called *array*: a multi-dimensional object which contains numerical data.
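The difference between a list and an array can be illustrated with a minimal sketch. The values and variable names below are made up for the example; only `np.array` and the `shape` and `ndim` attributes come from numpy.

```python
import numpy as np

values_list = [1, 2, 3]
values_array = np.array(values_list)   # build an array from a list

# On a list, * repeats the content
doubled_list = values_list * 2         # [1, 2, 3, 1, 2, 3]

# On an array, * is applied element by element
doubled_array = values_array * 2       # array([2, 4, 6])

# An array can have several dimensions
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.shape)   # (2, 3): 2 rows, 3 columns
print(grid.ndim)    # 2
```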

[Numpy](http://www.numpy.org/) is **the** numerical library for Python. This library is too big to be taught in a day: that would be extremely boring and not useful at all.

This library is at the base of all the other libraries used in science and data science: 

- [scipy](https://www.scipy.org/) Fundamental library for scientific computing (integration, optimisation...)
- [pandas](http://pandas.pydata.org/) data structure and data analysis tools
- [matplotlib](https://matplotlib.org/) Python 2D plotting

And more specialised ones like:

- [astropy](http://www.astropy.org/) for astronomy
- [h5py](https://www.h5py.org/) to interact with the HDF5 format
- [scikit-learn](http://scikit-learn.org/) for Machine Learning
- [TensorFlow](https://www.tensorflow.org) for Deep Learning

We are going to learn some of the basic commands not seen in the first course, using some of these libraries. Keep in mind that we are only covering the basics, so that you understand how to start using these libraries, hoping that this will pique your interest enough.

To start, we are going to import the two libraries *numpy* and *matplotlib* that will be used in this episode.

**Note**

An HDUList is not a type of object that you are familiar with.

We can access the first HDU (the one of interest), which contains the image, like an element of a list, using its index:

In [None]:
HDU_copy1 = im1[0]     # An HDUList behaves like a list; we take its first element.

It is also possible to access it by using its name (or key), as in a dictionary:

In [None]:
HDU_copy2 = im1['PRIMARY']  # An HDU can also be accessed by name, like a dictionary entry

We can verify that both copies are the same.

In [None]:
HDU_copy1 == HDU_copy2

FITS files, like most containers for scientific image data, are composed of a header and a data part. The header contains metadata relevant to the observations and to the data itself. You can print it on screen by using the attribute *header*.

In [None]:
im1[0].header

To access the data itself (here an image of a nebula), the approach is similar but the attribute is called *data*:

In [None]:
imdata = im1[0].data
print(type(imdata))
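Beyond its type, an array carries a few attributes that are worth checking before any analysis. The sketch below uses a small stand-in array (`fake_image`, a made-up name with made-up dimensions) rather than the real image; the same attributes exist on any numpy array.

```python
import numpy as np

# Stand-in for the image: a small 2D array of 32-bit floats (hypothetical values)
fake_image = np.zeros((4, 6), dtype=np.float32)

print(type(fake_image))   # <class 'numpy.ndarray'>
print(fake_image.shape)   # (4, 6): 4 rows, 6 columns
print(fake_image.dtype)   # float32
print(fake_image.size)    # 24 pixels in total
```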

We can use matplotlib to show what the image looks like:

In [None]:
plt.imshow(imdata, origin='lower', cmap='gray')
plt.colorbar()

It is difficult to see anything on that image. This happens very often in astronomical images: a few very bright objects saturate the CCD, so with a linear colour scale only a handful of pixels stand out and the rest of the image looks uniformly dark.

We are going to improve the visible output by doing some simple analysis on the image, which will help to sidestep the contrast problem.

In [None]:
print('mean value im1:', imdata.mean())
print('median value im1:', np.median(imdata))  # Note: median is not provided by a method!
print('max value im1:', imdata.max())
print('min value im1:', imdata.min())

The previous results give us some useful information. We can see that the range between the minimum and maximum values is really big, and also that the maximum value is very far from the mean or median value of a pixel. That probably means that a very small number of pixels have very high values.

We can check it by plotting a histogram, which gives the number of pixels for each photon count.
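The effect of a few extreme pixels on these statistics can be reproduced on a small made-up array. In this sketch, `toy` and its values are purely illustrative and are not taken from the real image.

```python
import numpy as np

# Hypothetical image: a flat background of 10 with two very bright pixels
toy = np.full((10, 10), 10.0)
toy[2, 3] = 5000.0
toy[7, 7] = 9000.0

print(toy.mean())       # pulled far above the background by two pixels
print(np.median(toy))   # stays at the background level
print((toy > 100).sum(), 'bright pixels out of', toy.size)
```

The median is barely affected by the outliers, which is why comparing it with the mean and the maximum is a quick way to detect them.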

In [None]:
hist = plt.hist(imdata.ravel(), bins=100)
plt.show()

The histogram confirms our suspicion, but we can improve the visualisation by using a logarithmic scale on the y axis.

In [None]:
hist = plt.hist(imdata.ravel(), bins=100)
plt.yscale('log')

We can also manually set the upper limit of the y axis (the number of pixels) using the function *ylim*:

In [None]:
hist = plt.hist(imdata.ravel(), bins=100)
plt.ylim(0,1e5)

It is also possible to limit the range in the number of photons in a pixel (x axis) using the *range* argument in the *hist* function:

In [None]:
hist = plt.hist(imdata.ravel(), bins=100, range=(1,30))
plt.ylim(0,1e5)
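If you only need the numbers and not the figure, `np.histogram` accepts the same *bins* and *range* arguments and returns the counts and the bin edges. This is a sketch on synthetic data: the array `toy` and its Poisson background are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical image: a faint background plus one very bright pixel
toy = rng.poisson(lam=8, size=(50, 50)).astype(float)
toy[25, 25] = 4000.0

# Same binning as plt.hist(..., bins=100, range=(1, 30)), without the plot
counts, edges = np.histogram(toy.ravel(), bins=100, range=(1, 30))

print(counts.shape)         # (100,): one count per bin
print(edges.shape)          # (101,): bins + 1 edges
print(edges[0], edges[-1])  # 1.0 30.0
print(counts.sum() < toy.size)  # True: values outside the range are not counted
```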

Using the previous histogram, we can limit the range of photon counts displayed when plotting the image and improve the contrast using the *vmax* argument:

In [None]:
plt.imshow(imdata, origin='lower', cmap='gray', vmax=25)
plt.colorbar()
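Instead of reading the limit off the histogram, one possible alternative is to derive it from a percentile of the data. This is only a sketch on synthetic data: the 99th percentile is an arbitrary choice and `toy` is a made-up array.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical image: a faint background plus one very bright pixel
toy = rng.poisson(lam=8, size=(50, 50)).astype(float)
toy[25, 25] = 4000.0

# Value below which 99% of the pixels fall
vmax = np.percentile(toy, 99)

# Capping the data at vmax is roughly what the colour scale does with vmax:
# every pixel above the limit is drawn with the brightest colour
capped = np.minimum(toy, vmax)

print(vmax < toy.max())      # True: the bright pixel is above the limit
print(capped.max() == vmax)  # True
```

The value obtained this way could then be passed as `vmax` to `plt.imshow`.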

The next plot does not provide much information useful for our analysis, but it is a classic plot that you can obtain with some visualisation tools. It sums the image along each axis and divides by the number of pixels summed, giving the mean profile along the columns and along the rows. With this plot you can distinguish the two bright objects (stars): the first one is in the lower left part of the image and the second one, the brightest, is more central.

In [None]:
plt.plot(imdata.sum(axis=0)/imdata.shape[0], label='axis0')
plt.plot(imdata[:,::-1].sum(axis=1)/imdata.shape[1], label='axis1')
plt.legend()
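What the *axis* argument does is easier to see on a tiny array. In this sketch, `toy` is a made-up 4 x 6 image with a single bright pixel; the profile names are illustrative.

```python
import numpy as np

toy = np.zeros((4, 6))
toy[1, 4] = 12.0     # one bright pixel at row 1, column 4

# axis=0 collapses the rows: one value per column
column_profile = toy.sum(axis=0) / toy.shape[0]
# axis=1 collapses the columns: one value per row
row_profile = toy.sum(axis=1) / toy.shape[1]

print(column_profile.shape)     # (6,)
print(row_profile.shape)        # (4,)
print(column_profile.argmax())  # 4: column of the bright pixel
print(row_profile.argmax())     # 1: row of the bright pixel

# Dividing the sum by the number of pixels is the mean along that axis
print(np.allclose(column_profile, toy.mean(axis=0)))  # True
```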

Another option to improve the contrast visually is to plot the logarithm of the image, which compresses the range of values. The problem is that the image can contain negative values, for which the logarithm is undefined. The expression *imdata - imdata.min() + 1* ensures that every value passed to the logarithm is at least 1, hence strictly greater than 0.

In [None]:
logim = np.log(imdata - imdata.min() + 1)
plt.imshow(logim, origin='lower', cmap='gray')
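The shift applied before the logarithm can be checked on a few made-up values, including a negative one. The array `toy` is an assumption for the example.

```python
import numpy as np

toy = np.array([[-3.0, 0.0],
                [5.0, 2000.0]])

shifted = toy - toy.min() + 1   # the smallest value becomes exactly 1
log_toy = np.log(shifted)       # natural logarithm, defined everywhere now

print(shifted.min())   # 1.0
print(log_toy.min())   # 0.0, because log(1) = 0
print(log_toy.max())   # about 7.6, instead of 2000 on the linear scale
```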

## Masked array

Numpy provides an extremely useful tool called the **masked array**. It associates with a numpy array a second array of boolean values (**True** or **False**) which tells numpy whether to ignore (**True**) or use (**False**) each element.

We are going to see how it can be used in our case study:

In [None]:
immasked = np.ma.masked_greater(imdata, 25)

#mask = imdata > 25
#masked1 = np.ma.array(imdata, mask=mask)

print(immasked)
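The mask convention can be verified on a short made-up array: in numpy, **True** in the mask means that the element is masked, that is, ignored. The names `toy` and `toy_masked` are illustrative.

```python
import numpy as np

toy = np.array([3.0, 8.0, 40.0, 12.0, 900.0])
toy_masked = np.ma.masked_greater(toy, 25)

print(toy_masked.mask)     # True marks the two values above 25
print(toy_masked.count())  # 3 values are still in use
print(toy_masked.mean())   # mean of 3, 8 and 12 only
print(toy_masked.max())    # 12.0: the masked values are ignored
```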

Matplotlib is aware of this numpy object and will plot the image using **only** the pixels that are not masked, that is, those with the value **False** in the associated mask.

Now we are going to add some random noise to that curve.
To do so, we can use the function *normal* from the *random* module of numpy:

In [None]:
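g = 100*g   # Multiply the function by 100 so that the signal is clearly visible above the noise.

# Creation of the noisy curve: one draw per point from a normal
# distribution centred on g with a standard deviation of 1
noisy = np.random.normal(g)
plt.plot(x, noisy)   # noisy already contains g, so it is plotted directly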
plt.show()

We can calculate the signal-to-noise ratio of the previous data set by dividing the noisy function by its standard deviation (used here as an estimate of the noise).

In [None]:
rms = np.std(noisy) # standard deviation (RMS deviation from the mean)
SN = noisy / rms
plt.plot(SN)
plt.show()

In [None]:
mask = SN < 1
print(mask[:5])
print(mask.shape)
noisy_ma = np.ma.array(noisy, mask=mask)
plt.plot(noisy_ma)
plt.show()
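The whole chain (signal, noise, signal-to-noise ratio, mask) can be sketched without plotting. Everything here is an assumption made for the example: the Gaussian shape of `g`, the range of `x`, and the choice to estimate the noise level from the noise alone, which is only possible because the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(42)   # seeded so the example is reproducible

x = np.linspace(-5, 5, 200)
g = 100 * np.exp(-x**2 / 2)       # hypothetical Gaussian signal

noise = rng.normal(loc=0.0, scale=1.0, size=x.shape)
noisy = g + noise                 # signal plus noise

rms = np.std(noise)               # noise level
SN = noisy / rms                  # signal-to-noise ratio at each point

weak = SN < 1                     # True where the signal is below the noise
noisy_ma = np.ma.array(noisy, mask=weak)

print(noisy_ma.count(), 'points kept out of', noisy.size)
```

Only the wings of the curve, where the signal drops below the noise level, end up masked.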

# Working with images

We are going to learn some commands to deal with images. Since most scientific domains use their own file formats, we obviously cannot learn all of them; I will use an astronomical image stored in the format mainly used in this domain, the FITS file.

In the *data* directory you should find a file called *502nmos.fits*. 

We can verify that the file is indeed there:

In [None]:
ls *.fits

To be able to manipulate data in this file, we need to import a library which will be able to open the file and put the data in a numpy array. 

This is a good occasion to install a new library. To do so, open a terminal. On Microsoft Windows, you can start the **Anaconda Prompt** software:

![MS Windows terminal](images/anaconda-prompt.png "Anaconda Prompt terminal")

On a Unix system, you can start a terminal or have one started through the Jupyter notebook:

![Starting a Terminal with Jupyter](images/jupyter_terminal.png "Starting a terminal with Jupyter Notebook")

![Terminal using Jupyter](images/jupyter_terminal2.png "Terminal with Jupyter Notebook")

We are going to use the command **pip**, which allows you to install any package available on the Python software repository: [PyPI](https://pypi.python.org/pypi). Here we will install the library [pyfits](http://www.astropy.org/), which provides the tools to open FITS image files.

**Note:**

This is only an example to teach you how to install a library that is not yet available and how to open a specific image format. I am not expecting you to use this specific library in the future. Each domain has its own file format (most of the time):

- [NetCDF](https://www.unidata.ucar.edu/netcdf/)
- [HDF5](https://www.hdfgroup.org/HDF5/)
- MS Excel
- SQL
- ...

All of these formats can be opened using Python, but you will have to install an additional library to do it.

In the previously opened terminal:

```bash
pip install -U pyfits --user
```


The *pip* command checks whether the library *pyfits* is available on the *PyPI* repository. The *-U* option upgrades *pyfits* to the latest version if it is already present on your system, and the *--user* option installs the library in your user account rather than system-wide (you do not have to be an administrator to install a new Python library).
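Once the installation has finished, one way to check from Python that a library can be found is sketched below. The helper `is_installed` is a hypothetical name written for this example; it relies on `importlib.util.find_spec` from the standard library and is meant for top-level package names.

```python
import importlib.util

def is_installed(package_name):
    """Return True if Python can locate the package, without importing it."""
    return importlib.util.find_spec(package_name) is not None

print(is_installed('json'))     # True: json ships with Python
print(is_installed('pyfits'))   # depends on what is installed on your system
```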

<div style='background:#B1E0A8; padding:10px 10px 10px 10px;'>
<H2> Challenges </H2>

 <ol>
 <li>
 What is the main format for your data?
 </li>
 <li>
 Find a Python library which will allow you to open this format and convert it to a numpy array.
 </li>
 </ol>
 </div>

Some available libraries:

- FITS, VOTable...: [astropy](http://docs.astropy.org/en/stable/index.html#files-i-o-and-communication)
- CSV, HDF5, MS Excel, SQL...: [pandas](http://pandas.pydata.org/pandas-docs/stable/io.html)
- NetCDF: [netcdf4](http://unidata.github.io/netcdf4-python/)
- Matlab mat: [scipy](https://docs.scipy.org/doc/scipy/reference/tutorial/io.html)
- TIFF: [Pillow](https://pillow.readthedocs.io/en/latest/), [pylibtiff](https://github.com/pearu/pylibtiff), [matplotlib](https://matplotlib.org/)...

We now move to the data directory: