# NumPy arrays

Nikolay Koldunov

koldunovn@gmail.com

This is part of [**Python for Geosciences**](https://github.com/koldunovn/python_for_geosciences) notes.

================

<img  height="100" src="files/numpy.png" >

-    a powerful N-dimensional array object
-    sophisticated (broadcasting) functions
-    tools for integrating C/C++ and Fortran code
-    useful linear algebra, Fourier transform, and random number capabilities


In [None]:
#allow graphics inline
%matplotlib inline 
import matplotlib.pylab as plt #import plotting library
import numpy as np #import numpy library
np.set_printoptions(precision=3) # this is just to make the output look better

## Load data

I am going to use some real data as an example of array manipulations: the AO (Arctic Oscillation) index, downloaded with wget through a system call (this works only where wget is available, for example on Linux):

In [None]:
!wget www.cpc.ncep.noaa.gov/products/precip/CWlink/daily_ao_index/monthly.ao.index.b50.current.ascii

This is what the data in the file look like (we again use a system call, this time for the *head* command):

In [None]:
!head monthly.ao.index.b50.current.ascii

Load the data into a variable:

In [None]:
ao = np.loadtxt('monthly.ao.index.b50.current.ascii')
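
If you can't download the file, the same loading step can be tried on a small in-memory sample. This is only a sketch: the three rows below are made-up values in a year / month / value layout, not real AO data, and `sample_text` and `sample` are names I introduce just for illustration.

```python
import io
import numpy as np

# Made-up sample in a "year month value" layout (NOT real AO data)
sample_text = """\
1950 1 -0.5
1950 2 0.25
1950 3 1.0
"""

# np.loadtxt accepts a file-like object as well as a file name
sample = np.loadtxt(io.StringIO(sample_text))

print(sample.shape)  # (3, 3): three rows, three columns
print(sample.dtype)  # float64: every column is read as float by default
```

Note that the years and months come back as floats too, because one array holds a single data type.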

In [None]:
ao

In [None]:
ao.shape

The shape is reported as (rows, columns). By default NumPy stores arrays in *row-major* (C) order, while Matlab and Fortran use *column-major* order for arrays.
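
The difference between the two memory orders can be inspected directly. A minimal sketch on a small made-up array (`c_arr` and `f_arr` are illustrative names, not part of the AO example):

```python
import numpy as np

# Default layout: row-major (C order)
c_arr = np.arange(6, dtype=np.float64).reshape(2, 3)

# Same values, but stored column-major (Fortran order)
f_arr = np.asfortranarray(c_arr)

print(c_arr.flags['C_CONTIGUOUS'])   # True
print(f_arr.flags['F_CONTIGUOUS'])   # True
print(np.array_equal(c_arr, f_arr))  # True: indexing gives the same values

# Strides are the number of bytes to step along each axis
print(c_arr.strides)  # (24, 8): the next row is 3 floats away
print(f_arr.strides)  # (8, 16): the next row is 1 float away
```

Indexing works the same for both layouts; the order mostly matters for speed and when passing arrays to Fortran or C code.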

In [None]:
type(ao)

NumPy arrays are statically typed (all elements share one data type), which allows faster operations:

In [None]:
ao.dtype

You can't assign a value that cannot be converted to the array's data type. The following raises a `ValueError`:

In [None]:
ao[0,0] = 'Year'
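
What happens on assignment depends on whether the value can be converted to the array's dtype. A small sketch with made-up arrays (`values` and `ints` are illustrative names):

```python
import numpy as np

values = np.array([1.5, 2.5, 3.5])   # dtype is float64
values[0] = 7                         # an int is converted to 7.0

try:
    values[1] = 'Year'                # a string that is not a number
except ValueError as err:
    print('ValueError:', err)         # the array is left unchanged

ints = np.array([1, 2, 3])            # integer dtype
ints[0] = 9.99                        # silently truncated to 9, no error

# To really change the type, create a new array with astype
as_float = ints.astype(float)
print(values, ints, as_float.dtype)
```

The silent truncation in the integer case is worth remembering: no error is raised, but the fractional part is lost.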

Slicing works similarly to Matlab:

In [None]:
ao[0:5,:]
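
One difference from Matlab is that a basic slice in NumPy is a view on the original data, not a copy. A minimal sketch on a made-up array (`data`, `first_rows` and `independent` are illustrative names):

```python
import numpy as np

data = np.arange(12, dtype=float).reshape(4, 3)

first_rows = data[0:2, :]     # basic slicing returns a view
first_rows[0, 0] = 99         # this also changes `data`
print(data[0, 0])             # 99.0

independent = data[0:2, :].copy()   # explicit copy
independent[0, 0] = -1              # `data` is not affected
print(data[0, 0])                   # 99.0

print(np.shares_memory(first_rows, data))    # True
print(np.shares_memory(independent, data))   # False
```

Use `.copy()` whenever you want to modify a slice without touching the original array.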

One can look at the data. This is done with the matplotlib.pylab module that we imported at the beginning as `plt`. We will plot only the first 780 points:

In [None]:
plt.plot(ao[:780,2])

## Index slicing

In general, it is similar to Matlab.

First 12 elements of the **second** column (months). Remember that indexing starts with 0:

In [None]:
ao[0:12,1]

First row:

In [None]:
ao[0,:]

We can create a mask that selects all rows where the value in the second column (month) equals 10 (October):

In [None]:
mask = (ao[:,1]==10)

Here we apply this mask and show only the first 5 rows of the array:

In [None]:
ao[mask][:5,:]

You don't have to create a separate variable for the mask; you can apply the condition directly. Here, instead of the first five rows, I show the last five rows:

In [None]:
ao[ao[:,1]==10][-5:,:]

You can combine conditions. In this case we select October-December data (only the first 10 rows are shown):

In [None]:
ao[(ao[:,1]>=10)&(ao[:,1]<=12)][0:10,:]
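
Combining conditions can be sketched on a small table. The values are made up and the names (`table`, `oct_dec`, `winter`) are only for illustration. Each comparison needs its own parentheses, because `&` and `|` bind more tightly than `>=` or `==`:

```python
import numpy as np

# Made-up table in a "year, month, value" layout
table = np.array([[2000,  9,  0.1],
                  [2000, 10,  0.2],
                  [2000, 11, -0.3],
                  [2000, 12,  0.4],
                  [2001,  1, -0.5]])
month = table[:, 1]

# AND: both conditions must hold (October to December)
oct_dec = (month >= 10) & (month <= 12)
print(oct_dec)                 # [False  True  True  True False]

# The same selection written with np.isin
same = np.isin(month, [10, 11, 12])
print(np.array_equal(oct_dec, same))   # True

# OR: either condition may hold (December, January, February)
winter = (month == 12) | (month <= 2)
print(table[winter][:, 2])     # values for December and January
```

Python's `and` / `or` keywords do not work element by element on arrays, so `&` and `|` are used instead.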

You can assign values to a subset of elements (*this expression fixes the problem with a very small value at 2015-04*):

In [None]:
ao[ao<-10]=0
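
Replacing a bad value with 0 still lets it enter the statistics as a real zero. One possible alternative, shown here only as a sketch on a made-up series, is to mark it as NaN and use the NaN-aware functions. The -999.0 fill value and the names `series`, `zeroed` and `flagged` are my assumptions for the example:

```python
import numpy as np

# Made-up series with one obviously bad value
series = np.array([0.5, -1.2, -999.0, 0.8])

# Approach used above: overwrite with 0
zeroed = series.copy()
zeroed[zeroed < -10] = 0

# Alternative: mark as missing with NaN
flagged = series.copy()
flagged[flagged < -10] = np.nan

print(zeroed.mean())        # the zero counts as a data point
print(flagged.mean())       # nan: ordinary mean is poisoned by NaN
print(np.nanmean(flagged))  # mean of the three valid values only
```

Which approach is better depends on what you do with the data afterwards.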

## Basic operations

Create an example array from the first 12 values of the second column and perform some basic operations:

In [None]:
months = ao[0:12,1]
months

In [None]:
months+10

In [None]:
months*20

In [None]:
months*months
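
All of these operations work element by element. Unlike in Matlab, `*` is not matrix multiplication (NumPy uses `@` for that). A short sketch with made-up arrays, where `m`, `pair` and `offsets` are illustrative names:

```python
import numpy as np

m = np.arange(1, 13, dtype=float)    # 1.0 ... 12.0, like the months column

print(m + 10)                        # 10 is added to every element
print(np.array_equal(m * m, m**2))   # True: * multiplies element-wise

# Broadcasting: a 1-D array is applied to every row of a 2-D array
pair = np.array([[1., 2., 3.],
                 [4., 5., 6.]])
offsets = np.array([10., 20., 30.])
shifted = pair + offsets
print(shifted)

# Matrix product needs the @ operator
product = pair @ offsets             # 1*10 + 2*20 + 3*30 and 4*10 + 5*20 + 6*30
print(product)
```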

## Basic statistics

Create *ao_values*, which will contain only the data values:

In [None]:
ao_values = ao[:,2]

Simple statistics:

In [None]:
ao_values.min()

In [None]:
ao_values.max()

In [None]:
ao_values.mean()

In [None]:
ao_values.std()

In [None]:
ao_values.sum()

You can also use *np.sum* function:

In [None]:
np.sum(ao_values)

One can also perform operations on subsets:

In [None]:
np.mean(ao[ao[:,1]==1,2]) # January monthly mean

The result is the same if we call the method on the selected data:

In [None]:
ao[ao[:,1]==1,2].mean()
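
The same idea extends to all months at once. Below is a sketch on a made-up two-year, two-month table. The names `table`, `monthly` and `by_year` are hypothetical, and the reshape assumes complete years stored in chronological order:

```python
import numpy as np

# Made-up table in a "year, month, value" layout: 2 years x 2 months
table = np.array([[2000, 1, 1.0],
                  [2000, 2, 2.0],
                  [2001, 1, 3.0],
                  [2001, 2, 4.0]])

# Mean for one month, as above
jan = table[table[:, 1] == 1, 2].mean()

# Mean for every month with a mask per month
monthly = np.array([table[table[:, 1] == m, 2].mean() for m in (1, 2)])

# Alternative: reshape to (years, months) and average along an axis
by_year = table[:, 2].reshape(2, 2)
print(by_year.mean(axis=0))   # mean over years, one number per month
print(by_year.mean(axis=1))   # mean over months, one number per year
```

The `axis` argument is also accepted by `min`, `max`, `std` and `sum`.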

## Saving data

You can save your data as a text file:

In [None]:
np.savetxt('ao_only_values.csv',ao[:, 2], fmt='%.4f')

Head of resulting file:

In [None]:
!head ao_only_values.csv
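
A text file written this way can be read back with `np.loadtxt`, but only with the precision given in `fmt`. A minimal round-trip sketch with made-up values, written to a temporary directory (the file name `values_example.csv` is hypothetical):

```python
import os
import tempfile
import numpy as np

values = np.array([0.123456, -1.5, 2.0])   # made-up values

# Write into a temporary directory so nothing is overwritten
path = os.path.join(tempfile.mkdtemp(), 'values_example.csv')
np.savetxt(path, values, fmt='%.4f')

restored = np.loadtxt(path)
print(restored)

# Only four decimals were written, so the first value is rounded
print(restored[0] == values[0])   # False
```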

You can also save it as binary:

In [None]:
with open('ao_only_values.bin', 'wb') as f: # 'wb': binary mode; the file is closed automatically
    ao[:,2].tofile(f)
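
A file written with `tofile` contains only the raw numbers, so the data type and shape have to be known when reading it back. Here is a sketch of the round trip on a made-up array, together with `np.save` / `np.load`, which store this information in the file. The file names are hypothetical and everything is written to a temporary directory:

```python
import os
import tempfile
import numpy as np

values = np.array([[1.0, 2.0],
                   [3.0, 4.0]])           # made-up data
tmp_dir = tempfile.mkdtemp()

# Raw binary: no dtype or shape is stored in the file
raw_path = os.path.join(tmp_dir, 'values_example.bin')
values.tofile(raw_path)                    # tofile also accepts a file name
raw = np.fromfile(raw_path, dtype=np.float64)
print(raw.shape)                           # (4,): the data come back flat
restored = raw.reshape(values.shape)       # the shape must be supplied by hand

# .npy format: dtype and shape are stored with the data
npy_path = os.path.join(tmp_dir, 'values_example.npy')
np.save(npy_path, values)
loaded = np.load(npy_path)
print(loaded.shape, loaded.dtype)          # (2, 2) float64
```

For anything you plan to read again with NumPy, the `.npy` format is usually the more convenient choice.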