# NumPy (numerical arrays for numeric computation)

NumPy is the fundamental Python module for scientific computing.
Its most used objects are multidimensional arrays.
These arrays can have any number of dimensions and are stored efficiently in the computer's RAM, which makes the data easy to handle and to pass to other libraries. Furthermore, most of NumPy is implemented in C, which makes it efficient and fast.


## Multidimensional arrays

This is how NumPy is usually imported and used to create a NumPy array

In [1]:
import numpy as np

In [2]:
data = [1, 10 , 2, 3, 8.0] # data is a list
a = np.array(data) # a is now a numpy array

In [3]:
type(a)

numpy.ndarray

In [4]:
a

array([  1.,  10.,   2.,   3.,   8.])

This gives the shape of the array

In [5]:
a.shape

(5,)

the number of dimensions

In [6]:
a.ndim

1

the number of elements

In [7]:
a.size

5

the number of bytes

In [8]:
a.nbytes

40

The attribute `dtype` describes the element data type

In [9]:
a.dtype

dtype('float64')
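
The attributes above are related to each other: the total number of bytes is the number of elements times the size of one element. The following is a minimal sketch of that relation. It uses the `itemsize` attribute, which is not shown above, to read the size in bytes of a single element.

```python
import numpy as np

a = np.array([1, 10, 2, 3, 8.0])  # same data as above; the single 8.0 makes every element a float

# itemsize is the number of bytes used by one element (8 for float64)
print(a.itemsize)

# nbytes is the total: number of elements times bytes per element
print(a.size * a.itemsize == a.nbytes)
```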

## Creating new arrays

Arrays can be created with nested lists

In [10]:
data = [[0.0, 2.0, 4.0, 6.0], [1.0, 3.0, 5.0, 7.0]]
b = np.array(data)

In [11]:
b

array([[ 0.,  2.,  4.,  6.],
       [ 1.,  3.,  5.,  7.]])

In [12]:
b.shape, b.ndim, b.size, b.nbytes

((2, 4), 2, 8, 64)

The function `arange` is similar to Python's built-in `range`, but it returns an array

In [13]:
c = np.arange(10) 
c

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

The function `linspace` creates a given number of equally spaced points between two end values (both included)

In [14]:
e = np.linspace(0.0, 10, 21) # 21 points
e

array([  0. ,   0.5,   1. ,   1.5,   2. ,   2.5,   3. ,   3.5,   4. ,
         4.5,   5. ,   5.5,   6. ,   6.5,   7. ,   7.5,   8. ,   8.5,
         9. ,   9.5,  10. ])
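
To make the difference between the two functions concrete, here is a small sketch that builds almost the same grid in both ways. The variable names `by_step` and `by_count` are only illustrative.

```python
import numpy as np

# arange takes a step and excludes the stop value
by_step = np.arange(0.0, 10.0, 0.5)

# linspace takes a number of points and includes the stop value by default
by_count = np.linspace(0.0, 10.0, 21)

print(by_step.size, by_step[-1])    # 20 points, the last one is 9.5
print(by_count.size, by_count[-1])  # 21 points, the last one is 10.0
```

When the number of points matters more than the exact step, `linspace` is usually the more convenient choice.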

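Similar to MATLAB, there are also functions like `empty`, `zeros` and `ones`. Note that `empty` only reserves the memory without initializing it, so the values it shows are arbitrary.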

In [15]:
np.empty((4,4))

array([[ -3.10503618e+231,   1.29074138e-231,   4.44659081e-323,
          0.00000000e+000],
       [  0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
          0.00000000e+000],
       [  0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
          0.00000000e+000],
       [  0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
          1.12123085e-308]])

In [16]:
np.zeros((3,3))

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

In [17]:
np.ones((3,3))

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])
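
Two related helpers may be handy here, although they are not part of the examples above: `zeros_like` builds an array of zeros with the shape and `dtype` of an existing array, and `full` fills a new array with a chosen value. A short sketch, with illustrative variable names:

```python
import numpy as np

template = np.ones((2, 3))

# zeros with the same shape and dtype as template
z = np.zeros_like(template)

# every element set to the given value
f = np.full((2, 3), 7.5)

print(z)
print(f)
```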

## dtype

`dtype` (for data type) is the attribute that holds the data type shared by all elements of the array.
This data type is usually inferred from the data, but it can be set explicitly when the array is created.

For instance, this array is implicitly given an integer `dtype` (the exact integer size can depend on the platform)

In [18]:
a = np.array([0, 1, 2, 3])

In [19]:
a, a.dtype

(array([0, 1, 2, 3]), dtype('int64'))

But you could force the creation of a complex array

In [20]:
b = np.zeros((2,2), dtype=np.complex64)
b

array([[ 0.+0.j,  0.+0.j],
       [ 0.+0.j,  0.+0.j]], dtype=complex64)

or a float array

In [21]:
c = np.arange(0, 10, 2, dtype=np.float64) # the old alias np.float was removed in NumPy 1.24
c

array([ 0.,  2.,  4.,  6.,  8.])
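
The `dtype` can also be changed after creation. The sketch below uses the `astype` method, which is not shown above, and illustrates two points worth keeping in mind: mixing integers and floats promotes everything to float, and converting floats to integers drops the fractional part.

```python
import numpy as np

ints = np.array([0, 1, 2, 3])
floats = ints.astype(np.float64)   # returns a new array; ints itself is unchanged

mixed = np.array([1, 2.5])         # int and float together are promoted to float

# converting to an integer type discards the fractional part
truncated = np.array([1.7, -1.7]).astype(np.int64)

print(floats.dtype, mixed.dtype)
print(truncated)
```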

## Operations over arrays

Mathematical operations can be performed over the whole array without running a `for` loop.

For instance

In [22]:
a = np.linspace(0.0, 10.0, 5)
print(a)

b = np.ones(5)
print(b)

[  0.    2.5   5.    7.5  10. ]
[ 1.  1.  1.  1.  1.]


In [23]:
a * 2 # every element in the array is multiplied by 2

array([  0.,   5.,  10.,  15.,  20.])

In [24]:
a + b   # addition works element by element; the same holds for the other arithmetic operators

array([  1. ,   3.5,   6. ,   8.5,  11. ])
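
To see what "without running a `for` loop" means in practice, the following sketch computes the same result twice, once with an explicit loop and once as a single array expression. The names `looped` and `vectorized` are only illustrative.

```python
import numpy as np

a = np.linspace(0.0, 10.0, 5)
b = np.ones(5)

# explicit loop, one element at a time
looped = np.empty(5)
for i in range(a.size):
    looped[i] = a[i] * 2 + b[i]

# the same computation written as one array expression
vectorized = a * 2 + b

print(np.array_equal(looped, vectorized))
```

Both versions give the same values, but the array expression is shorter and the loop runs inside NumPy's compiled code rather than in Python.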

## Slicing

Slicing also works on arrays, with the difference that it can act on several dimensions at once

In [25]:
a = np.random.rand(5, 5)  # this creates a two-dimensional array of random numbers

In [26]:
print(a)

[[ 0.46673384  0.93816446  0.15089128  0.90435826  0.89789893]
 [ 0.1315048   0.75567328  0.70395713  0.43130335  0.48241155]
 [ 0.69224862  0.12928688  0.62040299  0.51077978  0.58189529]
 [ 0.31958193  0.1606222   0.7951777   0.05784479  0.88745521]
 [ 0.89240574  0.96131128  0.18703096  0.06485492  0.19988045]]


Each dimension has its own index

In [27]:
print(a[0,0], a[0,1])

0.466733843323 0.938164462229


To extract the values of a whole column, the following syntax can be used

In [29]:
a[:,0]

array([ 0.46673384,  0.1315048 ,  0.69224862,  0.31958193,  0.89240574])

The last row could be extracted as follows

In [30]:
a[-1,:]

array([ 0.89240574,  0.96131128,  0.18703096,  0.06485492,  0.19988045])

Slicing also works with ranges

In [31]:
a[0:2,0:3]

array([[ 0.46673384,  0.93816446,  0.15089128],
       [ 0.1315048 ,  0.75567328,  0.70395713]])

Assignment also works with slicing

In [32]:
a[0:2,0:3] = -4.0

In [33]:
a

array([[-4.        , -4.        , -4.        ,  0.90435826,  0.89789893],
       [-4.        , -4.        , -4.        ,  0.43130335,  0.48241155],
       [ 0.69224862,  0.12928688,  0.62040299,  0.51077978,  0.58189529],
       [ 0.31958193,  0.1606222 ,  0.7951777 ,  0.05784479,  0.88745521],
       [ 0.89240574,  0.96131128,  0.18703096,  0.06485492,  0.19988045]])
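
One consequence of the assignment above is worth spelling out: a slice obtained with this basic syntax is a view on the original data, not a copy, so writing through the slice changes the original array. The sketch below illustrates this on a small integer array. It uses `reshape`, `copy` and `np.shares_memory`, which are not introduced in the text above.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # integers 0..11 arranged in 3 rows and 4 columns

view = a[0:2, 0:2]    # basic slicing returns a view on the same memory
view[:] = -1          # this also changes the corresponding block of a

safe = a[2, :].copy() # an explicit copy owns its data
safe[:] = 100         # a is not affected by this assignment

print(a)
print(np.shares_memory(view, a), np.shares_memory(safe, a))
```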

### Exercise 1.1

Create a two-dimensional array of random numbers with shape (4,8).
First, set the first column to `-1` and then set the second row to `2`.

### Boolean indexing

Arrays can be indexed using boolean arrays.

For instance, consider these two arrays holding the age and gender of a set of people

In [37]:
age = np.array([23, 56, 67, 89, 23, 56, 27, 12, 2, 72])
gender = np.array(['m', 'o', 'f', 'f', 'm', 'f', 'm', 'o', 'm', 'o'])

Say that we want to select only the people whose gender is marked as `'o'` (other).

The following statement gives me a new boolean array. Each element tells me whether the condition is True or False 

In [40]:
ii = (gender == 'o')
print(ii)

[False  True False False False False False  True False  True]


Now if I want to have the ages of the people with gender `o` all I have to do is:

In [41]:
age[ii]

array([56, 12, 72])

This logic can be extended to different conditions

In [44]:
ii = (age > 10) & (age < 50) # & is the element-wise logical AND; the parentheses are required
print(age[ii])
print(gender[ii])

[23 23 27 12]
['m' 'm' 'm' 'o']


The following is also a valid syntax

In [46]:
age[age>30]

array([56, 67, 89, 56, 72])
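
The same idea carries over to the other logical operators. The sketch below, which reuses the `age` and `gender` arrays from above, combines conditions with `|` (element-wise OR), negates a mask with `~`, and uses a mask on the left-hand side of an assignment. The names `either`, `not_m` and `capped` are only illustrative.

```python
import numpy as np

age = np.array([23, 56, 67, 89, 23, 56, 27, 12, 2, 72])
gender = np.array(['m', 'o', 'f', 'f', 'm', 'f', 'm', 'o', 'm', 'o'])

either = (age < 10) | (age > 70)   # | is the element-wise OR
not_m = ~(gender == 'm')           # ~ negates a boolean array

print(age[either])
print(gender[not_m])

# a mask can also select the elements to assign to
capped = age.copy()
capped[capped > 60] = 60
print(capped)
```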

### Exercise 1.2


## Universal functions

Universal functions (or `ufuncs`) are functions that take arrays or numbers as input and return arrays or numbers. They have the following characteristics:

* They are vectorized implementations in C, which are much faster than `for` loops in Python.
* They allow writing much more compact code.
* Here is a list of [all universal functions in numpy](http://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs).

For instance, one could generate an array of values

In [51]:
t = np.linspace(0.0, np.pi, 10)
print(t)

[ 0.          0.34906585  0.6981317   1.04719755  1.3962634   1.74532925
  2.0943951   2.44346095  2.7925268   3.14159265]


and then compute the values of the `sin` function

In [52]:
print(np.sin(t))

[  0.00000000e+00   3.42020143e-01   6.42787610e-01   8.66025404e-01
   9.84807753e-01   9.84807753e-01   8.66025404e-01   6.42787610e-01
   3.42020143e-01   1.22464680e-16]
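
As a rough illustration of what a universal function saves, the sketch below computes the sine of every element twice: once with a Python loop over `math.sin` from the standard library, and once with a single call to `np.sin`. It also shows a ufunc that takes two arrays, `np.maximum`, which is not mentioned above.

```python
import math
import numpy as np

t = np.linspace(0.0, np.pi, 10)

# element-by-element version using the standard library
looped = np.array([math.sin(x) for x in t])

# the ufunc handles the whole array in one call
vectorized = np.sin(t)

print(np.allclose(looped, vectorized))

# ufuncs can also take two arrays, here the element-wise maximum
print(np.maximum(np.sin(t), np.cos(t)))
```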
