# NUMPY. Numerical Computing with Python

Python is an excellent tool for general-purpose programming, with a highly readable syntax, rich and powerful data types, and a totally Zen philosophy.

---

However, it was not designed specifically for mathematical and scientific computing.
In particular, Python lists are very flexible containers, but they are poorly suited to efficiently representing common mathematical constructs such as vectors and matrices.

Fortunately, the **numpy** package (module) provides high-performance vector, matrix and higher-dimensional data structures for Python. It is implemented in C and Fortran, so when calculations are vectorized (formulated with vectors and matrices), performance is very good. It is used in almost all numerical computation with Python.



Why not simply use Python lists for computations instead of creating a new array type?

There are several reasons:

* Python lists are very general. They can contain any kind of object, and they are dynamically typed. They do not support mathematical operations such as matrix multiplication or dot products. Implementing such operations for Python lists would not be very efficient because of the dynamic typing.
* Numpy arrays are statically typed and homogeneous. The type of the elements is determined when the array is created.
* Numpy arrays are memory efficient.
* Because of the static typing, mathematical functions such as multiplication and addition of numpy arrays can be implemented efficiently in a compiled language (C and Fortran are used).
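
As a rough illustration of the points above, the following sketch computes the same dot product with a pure-Python loop and with `np.dot`. The array size, the helper name `dot_list` and the use of `timeit` are choices made for this example only; timings depend on the machine, so none are quoted here.

```python
import timeit
import numpy as np

n = 100_000
xs = list(range(n))                 # plain Python list of ints
xa = np.arange(n, dtype=np.int64)   # homogeneous int64 array

def dot_list(u, v):
    # explicit Python-level loop over dynamically typed objects
    return sum(a * b for a, b in zip(u, v))

t_list = timeit.timeit(lambda: dot_list(xs, xs), number=5)
t_numpy = timeit.timeit(lambda: np.dot(xa, xa), number=5)

print("same result:", dot_list(xs, xs) == int(np.dot(xa, xa)))
print("list: {:.4f}s  numpy: {:.4f}s".format(t_list, t_numpy))
```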


# 1. Basics of Numpy

To use **numpy**, we first need to import the module:

In [1]:
import numpy as np

## Creating numpy arrays
There are a number of ways to initialize new numpy arrays, for example from

1. Python lists or tuples
2. Using array-generating functions, such as `arange`, `linspace`, etc.
3. Reading data from files

### 1. From a list
For example, to create new vector and matrix arrays from Python lists we can use the `numpy.array` function.

In [2]:
# a vector: the argument to the array function is a Python list
v = np.array([1,2,3,4])
v

array([1, 2, 3, 4])

In [3]:
# a matrix: the argument to the array function is a nested Python list
M = np.array([[1, 2], [3, 4]])
M

array([[1, 2],
       [3, 4]])

If we want, we can explicitly define the type of the array data when we create it, using the `dtype` keyword argument: 

In [4]:
M = np.array([[1, 2], [3, 4]], dtype=int)
M

array([[1, 2],
       [3, 4]])

Common types that can be used with `dtype` are: int, float, complex, bool, object, etc.

We can also explicitly define the bit size of the data types, for example: int64, int16, float128, complex128. Note that extended-precision types such as float128 are not available on every platform.
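
A minimal sketch of what the bit size means in practice: the same values stored as `int16` and `int64` take 2 and 8 bytes per element. The variable names are illustrative only.

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int16)
big = np.array([1, 2, 3], dtype=np.int64)
print(small.dtype, small.itemsize)   # itemsize is the number of bytes per element
print(big.dtype, big.itemsize)

# astype returns a new array converted to the requested type
as_float = small.astype(np.float32)
print(as_float.dtype, as_float)
```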

### 2. Using array-generating functions
For larger arrays it is impractical to initialize the data manually using explicit Python lists. Instead we can use one of the many functions in `numpy` that generate arrays of different forms. Some of the more common are:

**Zeros and Ones**

In [5]:
np.zeros(5, dtype=float)

array([0., 0., 0., 0., 0.])

In [6]:
np.ones(5,dtype=float)

array([1., 1., 1., 1., 1.])

In [7]:
np.zeros((2,3),dtype=np.int64)

array([[0, 0, 0],
       [0, 0, 0]])

**arange**

In [9]:
x = np.arange(0, 20, 1) # arguments: start, stop, step
x

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

**linspace and logspace**

In [10]:
print ("A linear grid of 5 elements between 0 and 1:")
print (np.linspace(0, 1, 5))


A linear grid of 5 elements between 0 and 1:
[0.   0.25 0.5  0.75 1.  ]


In [11]:
print ("A logarithmic grid of 10 elements between 10**0 and 10**3:")
print (np.logspace(0, 3, 10))

A logarithmic grid of 10 elements between 10**0 and 10**3:
[   1.            2.15443469    4.64158883   10.           21.5443469
   46.41588834  100.          215.443469    464.15888336 1000.        ]
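
The relation between the two functions can be checked directly: `logspace(0, 3, 10)` should be the same as raising 10 to the values of `linspace(0, 3, 10)`. The sketch below also contrasts `linspace`, which includes the end point by default, with `arange`, which does not.

```python
import numpy as np

exponents = np.linspace(0, 3, 10)   # evenly spaced exponents
grid = np.logspace(0, 3, 10)        # base 10 by default
print(np.allclose(grid, 10 ** exponents))

# linspace includes the end point by default; arange excludes it
lin = np.linspace(0, 1, 5)
ar = np.arange(0, 1, 0.25)
print(lin)
print(ar)
```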


**Creating random arrays**

In [12]:
# uniform random numbers in [0, 1)
np.random.rand(5,5)

array([[1.49538785e-01, 2.48257779e-03, 7.95542421e-01, 6.00194359e-01,
        9.56517268e-01],
       [1.07225764e-04, 5.04863029e-01, 2.77767465e-01, 7.83322052e-02,
        3.03566065e-01],
       [3.87577154e-01, 4.53217450e-01, 7.87776669e-01, 9.33270969e-01,
        5.02237278e-01],
       [8.71437783e-01, 2.95223916e-01, 5.83776032e-01, 1.25338729e-01,
        7.71510939e-01],
       [8.62039266e-01, 2.94228554e-01, 5.18622836e-01, 6.32431264e-02,
        8.25150254e-01]])

In [12]:
# 5 samples from a normal distribution with a mean of 10 and a standard deviation of 3:
np.random.normal(10, 3, 5)

array([11.62110373, 11.30734275, 10.07311913, 11.62262477,  7.66663036])
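
Random arrays differ on every run, which makes results hard to reproduce. One possible approach, assuming NumPy 1.17 or later, is the seeded generator returned by `np.random.default_rng`; the seed value 42 below is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)   # the seed value 42 is arbitrary
u = rng.random((5, 5))            # uniform samples in [0, 1)
g = rng.normal(10, 3, 5)          # mean 10, standard deviation 3

# re-creating the generator with the same seed reproduces the samples
rng2 = np.random.default_rng(42)
u_again = rng2.random((5, 5))
print(np.array_equal(u, u_again))
```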

**diag**

In [13]:
# a diagonal matrix
np.diag([1,1,1,1])

array([[1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]])
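
Two related calls are sketched here: `np.eye` builds an identity matrix directly, and the `k` argument of `np.diag` places the values on an off-diagonal. When `np.diag` receives a 2-D array it extracts a diagonal instead of building a matrix.

```python
import numpy as np

identity = np.eye(3, dtype=int)    # identity matrix
print(identity)

upper = np.diag([1, 2, 3], k=1)    # values on the first superdiagonal of a 4x4 matrix
print(upper)

extracted = np.diag(upper, k=1)    # on a 2-D input, diag extracts a diagonal
print(extracted)
```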

### 3. From files


**Comma-separated values (CSV)**

A very common format for data files is comma-separated values (CSV), or a related format such as TSV (tab-separated values).
The example below uses open data from https://github.com/datasets:

[vix-daily.csv](https://raw.githubusercontent.com/datasets/finance-vix/master/data/vix-daily.csv)


In [14]:
np.genfromtxt?

In [11]:
# Open data from https://github.com/datasets
data = np.genfromtxt('https://raw.githubusercontent.com/datasets/finance-vix/master/data/vix-daily.csv'\
                     ,skip_header=1,delimiter=',')
data

array([[  nan, 17.96, 18.68, 17.54, 18.22],
       [  nan, 18.45, 18.49, 17.44, 17.49],
       [  nan, 17.66, 17.67, 16.19, 16.73],
       ...,
       [  nan, 11.51, 12.59, 11.39, 12.25],
       [  nan, 12.06, 12.83, 11.55, 12.23],
       [  nan, 12.2 , 12.45, 11.1 , 11.28]])

## NaN

By definition, NaN (Not a Number) is a floating-point value that is not equal to any other number, including itself:


In [16]:
np.nan != np.nan

True

Thus, the equality operator cannot be used to detect NaN:

In [17]:
data== np.nan

array([[False, False, False, False, False],
       [False, False, False, False, False],
       [False, False, False, False, False],
       ...,
       [False, False, False, False, False],
       [False, False, False, False, False],
       [False, False, False, False, False]])

Instead, the `isnan` function is used:

In [18]:
np.isnan(data)

array([[ True, False, False, False, False],
       [ True, False, False, False, False],
       [ True, False, False, False, False],
       ...,
       [ True, False, False, False, False],
       [ True, False, False, False, False],
       [ True, False, False, False, False]])
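
Once NaN entries can be detected, they can be counted, filtered out, or ignored in statistics. The sketch below uses a small made-up array (not the VIX data) together with `np.nanmean`, which skips NaN entries.

```python
import numpy as np

x = np.array([[np.nan, 1.0, 2.0],
              [np.nan, 3.0, 4.0]])   # small made-up array, not the VIX data

n_missing = np.isnan(x).sum()   # number of NaN entries
plain = np.mean(x)              # a single NaN makes the plain mean NaN
robust = np.nanmean(x)          # mean of the non-NaN entries
valid = x[~np.isnan(x)]         # keep only the valid values

print(n_missing, plain, robust)
print(valid)
```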

We can skip one or more columns when importing:

In [12]:
# Open data from https://github.com/datasets
data = np.genfromtxt('https://raw.githubusercontent.com/datasets/finance-vix/master/data/vix-daily.csv'\
                     ,skip_header=1,delimiter=',',usecols=[1,2,3,4])
data

array([[17.96, 18.68, 17.54, 18.22],
       [18.45, 18.49, 17.44, 17.49],
       [17.66, 17.67, 16.19, 16.73],
       ...,
       [11.51, 12.59, 11.39, 12.25],
       [12.06, 12.83, 11.55, 12.23],
       [12.2 , 12.45, 11.1 , 11.28]])

Using `numpy.savetxt` we can store a Numpy array to a file in CSV format:

In [20]:
M = np.random.rand(3,3)
np.savetxt("random-matrix.csv", M, delimiter=',')
print (M)
%cat random-matrix.csv

[[0.35313319 0.90071611 0.14922998]
 [0.83083794 0.31411154 0.54893899]
 [0.39137723 0.58563865 0.79094426]]
3.531331927601987219e-01,9.007161086681390039e-01,1.492299759353657995e-01
8.308379448470587514e-01,3.141115383857504550e-01,5.489389911650369713e-01
3.913772326594895379e-01,5.856386495695070638e-01,7.909442589560397030e-01


In [21]:
np.savetxt("random-matrix.csv", M, fmt='%.5f', delimiter= ',') # fmt specifies the format
%cat random-matrix.csv

0.35313,0.90072,0.14923
0.83084,0.31411,0.54894
0.39138,0.58564,0.79094


To read data from such a file into Numpy arrays we can use the `numpy.genfromtxt` function.


In [22]:
data = np.genfromtxt('random-matrix.csv',delimiter=',')
data

array([[0.35313, 0.90072, 0.14923],
       [0.83084, 0.31411, 0.54894],
       [0.39138, 0.58564, 0.79094]])

**Numpy's native file format**

In [23]:
np.save("random-matrix.npy", M)
#!file random-matrix.npy

In [24]:
np.load("random-matrix.npy")


array([[0.35313319, 0.90071611, 0.14922998],
       [0.83083794, 0.31411154, 0.54893899],
       [0.39137723, 0.58563865, 0.79094426]])
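
The difference between the two formats can be seen in a round trip: the `.npy` file restores the array exactly, while the CSV file written with `fmt='%.5f'` only keeps five decimals. This is a sketch that writes to a temporary directory; the file names are hypothetical.

```python
import os
import tempfile
import numpy as np

M = np.random.rand(3, 3)
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "m.csv")   # hypothetical file names
    npy_path = os.path.join(tmp, "m.npy")
    np.savetxt(csv_path, M, fmt="%.5f", delimiter=",")
    np.save(npy_path, M)
    from_csv = np.genfromtxt(csv_path, delimiter=",")
    from_npy = np.load(npy_path)

print(np.array_equal(M, from_npy))           # binary format restores every value exactly
print(np.allclose(M, from_csv, atol=1e-5))   # text format is only accurate to 5 decimals
```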

## Manipulating arrays


In [25]:
lst = [10, 20, 30, 40] #python list
arr = np.array([10, 20, 30, 40],dtype='int64') #numpy array
M = np.array([[10, 20, 30, 40],[50, 60, 70, 80]]) #numpy matrix

### Element indexing 


In [26]:
#get the first element of list
lst[0]

10

In [27]:
#get the first element of array
arr[0]

10

In [28]:
# M is a matrix, or a 2 dimensional array, taking two indices 
print (M)
#M[row][col] or M[row,col]
print (M[0][0]) # element from first row first column 
print (M[0,0]) # element from first row first column 
print (M[1,1]) # element from second row second column
print (M[1,2]) 

[[10 20 30 40]
 [50 60 70 80]]
10
10
60
70


If we omit an index of a multidimensional array, it returns the whole row
(or, in general, an (N-1)-dimensional array):

In [29]:
M[1] # second row

array([50, 60, 70, 80])

The same thing can be achieved by using `:` instead of an index:

In [30]:
M[1,:] # second row, all columns 

array([50, 60, 70, 80])

In [31]:
M[:,3] # all rows, fourth column 

array([40, 80])

We can assign new values to elements in an array using indexing:

In [32]:
M[0,0] = 1
M

array([[ 1, 20, 30, 40],
       [50, 60, 70, 80]])

In [33]:
# also works for rows and columns
M[1,:] = 0
M

array([[ 1, 20, 30, 40],
       [ 0,  0,  0,  0]])

In [34]:
M[:,2] = -1
M

array([[ 1, 20, -1, 40],
       [ 0,  0, -1,  0]])

Arrays are homogeneous, i.e. all elements of an array must be of the same type:


In [35]:
#Lists are heterogeneous
lst[1] = 'a string inside a list'
lst

[10, 'a string inside a list', 30, 40]

In [36]:
#Arrays are homogeneous
print( arr.dtype)
arr[1] = 'a string inside an array'


int64


ValueError: invalid literal for int() with base 10: 'a string inside an array'

Once an array has been created, its dtype is fixed and it can only store elements of that type. In this example, where the dtype is integer, a floating-point number stored in the array is automatically converted to an integer by truncating its fractional part:

In [37]:
arr[1] = 1.99
arr

array([10,  1, 30, 40])
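
Because the conversion truncates rather than rounds, it can be worth rounding explicitly or converting the array to a float type first. A short sketch of both options:

```python
import numpy as np

arr = np.array([10, 20, 30, 40], dtype='int64')
arr[1] = 1.99                 # the fractional part is dropped
arr[2] = -1.99                # truncation is toward zero
print(arr)

arr[3] = np.rint(1.99)        # round explicitly before storing
print(arr)

f = arr.astype(float)         # a float copy can hold fractional values
f[1] = 1.99
print(f)
```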

### Index slicing 

Index slicing is the technical name for the syntax `M[lower:upper:step]` to extract part of an array:

In [38]:
A = np.array([1,2,3,4,5])
# slice from the second element up to (but not including) the fourth; step is one
A[1:3:1]

array([2, 3])

Array slices are *mutable*: if they are assigned a new value the original array from which the slice was extracted is modified:

In [39]:
A[1:3:1] = [-2,-3]
A

array([ 1, -2, -3,  4,  5])
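
The reason is that a slice is a view on the original data rather than a copy. The sketch below checks this with `np.shares_memory` and shows that an explicit `.copy()` gives an independent array.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5])
view = A[1:3]            # a view: shares memory with A
view[:] = 0              # writes through to A
print(A)
print(np.shares_memory(A, view))

independent = A[1:3].copy()   # an explicit copy owns its data
independent[:] = 99           # A is not affected
print(A)
```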

We can omit any of the three parameters in `M[lower:upper:step]`. By default, `lower` is the beginning of the array, `upper` is the end, and `step` is one:

In [40]:
A[::2] # step is 2; lower and upper default to the beginning and end of the array

array([ 1, -3,  5])

In [41]:
A[:3] # first three elements

array([ 1, -2, -3])

In [42]:
A[3:] # elements from index 3

array([4, 5])

Negative indices count from the end of the array:

In [43]:
A[-1] # the last element in the array

5

In [44]:
A[-3:] # the last three elements

array([-3,  4,  5])

In [45]:
print(A)
A[::-1] #Step backwards, it returns an array with elements in reverse order

[ 1 -2 -3  4  5]


array([ 5,  4, -3, -2,  1])

Index slicing works exactly the same way for multidimensional arrays, with the slice for each dimension separated by a comma:

In [46]:
M

array([[ 1, 20, -1, 40],
       [ 0,  0, -1,  0]])

In [47]:
#a block from the original array
#all rows, two central columns
M[:, 1:3]


array([[20, -1],
       [ 0, -1]])

In [48]:
# all rows, every second column (columns 0 and 2)
M[:, ::2]

array([[ 1, -1],
       [ 0, -1]])

You can practice your **index slicing** skills by solving the exercises at the end of this
notebook.

In [120]:
# LGM: Could you consider including a very simple exercise: create an array, modify values, use slicing, etc.? Give them 5 - 10 minutes to do it and then do the demo

### Comparison operators and value testing 

Boolean comparisons can be used to compare members elementwise on arrays of equal size.

In [49]:
a = np.array([1, 3, 0], float) 
b = np.array([0, 3, 2], float) 
print (a > b )
print (a == b )
print (a <= b )

[ True False False]
[False  True False]
[False  True  True]


In [50]:
a = np.array([1, 3, 0], float) 
a > 2

array([False,  True, False])

The <code>any</code> and <code>all</code> functions can be used to determine whether any or all elements of a Boolean array are true:

In [51]:
c = np.array([ True, False, False], bool) 
print (any(c), all(c))
any([False,False])

True False


False
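
The built-in `any` and `all` used above work on one-dimensional arrays. For multidimensional arrays, NumPy's own `np.any` and `np.all` (also available as array methods) are the usual choice, since they accept an `axis` argument. A small sketch:

```python
import numpy as np

c = np.array([[True, False, False],
              [True, True, True]])
print(np.any(c), np.all(c))     # over the whole array

per_column = c.any(axis=0)      # is any value true in each column?
per_row = c.all(axis=1)         # are all values true in each row?
print(per_column)
print(per_row)
```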

The ``where`` function forms a new array from two arrays of equal size, using a Boolean filter to choose between the elements of the two. Its basic syntax is: <br>
<code>where(boolarray, truearray, falsearray)</code>

In [52]:
a = np.array([1, 3, 0], float) 
np.where(a != 0, 1/a, 0)  # 1/a is evaluated for every element first, so NumPy emits a divide-by-zero RuntimeWarning


array([1.        , 0.33333333, 0.        ])
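
In `np.where(a != 0, 1/a, 0)` the expression `1/a` is computed for every element, including the zero, before `where` selects the results. One way to skip the division where it is not wanted, sketched here, is `np.divide` with its `where` and `out` arguments; positions where the condition is false keep the value already stored in `out`.

```python
import numpy as np

a = np.array([1, 3, 0], float)

# divide only where a is non-zero; other positions keep the zeros from `out`
inv = np.divide(1.0, a, out=np.zeros_like(a), where=(a != 0))
print(inv)
```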

### Indexing with other arrays  (*Fancy indexing*)

Arrays allow for a more sophisticated kind of indexing: you can index an array with another array, and in particular with an array of boolean values.  This is particularly useful to **filter**
information from an array that matches a certain condition.

In [53]:
arr = np.array([10,8,30,40])
print (arr)
mask = arr < 9 # construct a boolean array
               # whose i-th element is True if the i-th element of arr is less than 9
mask

[10  8 30 40]


array([False,  True, False, False])

In [54]:
print ('Values below 9:', arr[mask])

Values below 9: [8]


The index mask can be converted to position indices using the `where` function:

In [55]:
print (mask)
indices = np.where(mask)
indices

[False  True False False]


(array([1]),)

In [56]:
print ('Resetting all values below 9 to 10...')
print (arr < 9)
arr[arr < 9] = 10
print (arr)
arr < 9

Resetting all values below 9 to 10...
[False  True False False]
[10 10 30 40]


array([False, False, False, False])
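
Several conditions can be combined into one mask with the element-wise operators `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses because these operators bind more tightly than `<` and `>`. A sketch on the original values of `arr`:

```python
import numpy as np

arr = np.array([10, 8, 30, 40])
between = (arr > 9) & (arr < 35)     # element-wise AND
outside = (arr < 9) | (arr > 35)     # element-wise OR

print(arr[between])
print(arr[outside])
print(arr[~between])                 # element-wise NOT
```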

It is also possible to select using **integer arrays** that represent indexes.

In [57]:
print (arr)
row_indices = [1, 2 ,3]
arr[row_indices]

[10 10 30 40]


array([10, 30, 40])

In [58]:
a = np.array([2, 4, 6, 8], float) 
indices = np.array([0, 0, 1, 3, 2, 1], int)  # the 0th, 0th, 1st, 3rd, 2nd, and 1st elements of a
a[indices] 

array([2., 2., 4., 8., 6., 4.])

For multidimensional arrays, we have to set up one one-dimensional integer array for each axis.

In [59]:
a = np.array([[1, 4], [9, 16]], float) 
print (a)
b = np.array([0, 0, 1, 1, 1], int) 
c = np.array([0, 1, 1, 1, 0], int) 
a[b,c] 

[[ 1.  4.]
 [ 9. 16.]]


array([ 1.,  4., 16., 16.,  9.])
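
Note that the index arrays are paired element by element: the result contains `a[b[i], c[i]]` for each `i`. To select a whole block of rows and columns instead, one option is the helper `np.ix_`, sketched below on a made-up 4x4 array.

```python
import numpy as np

a = np.arange(16).reshape(4, 4)
rows = [0, 2]
cols = [1, 3]

paired = a[rows, cols]            # elements (0, 1) and (2, 3)
block = a[np.ix_(rows, cols)]     # rows 0 and 2 crossed with columns 1 and 3
print(paired)
print(block)
```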

In [121]:
# LGM: Include a small exercise. Could it be a multiple-choice question?

## Array Attributes and Methods
The information about the type of an array is contained in its *dtype* attribute:

In [60]:
# arr is an object of the type ndarray that the numpy module provides.
type(arr)

numpy.ndarray

In [61]:
arr.dtype

dtype('int64')

The difference between the `arr` and `M` arrays is only their shapes. We can get information about the shape of an array by using the `ndarray.shape` property.

In [62]:
arr = np.array([10,10,30,40])
print (arr)
arr.shape

[10 10 30 40]


(4,)

In [63]:
print( M)
M.shape

[[ 1 20 -1 40]
 [ 0  0 -1  0]]


(2, 4)

**Don't confuse a matrix with only one row with a vector!** Their shapes are not equal:

In [64]:
a1 = np.array([[10,10,30,40]])
print (arr)
print (arr.shape)
print (a1)
print (a1.shape)

[10 10 30 40]
(4,)
[[10 10 30 40]]
(1, 4)


The number of elements in the array is available through the `ndarray.size` property:

In [65]:
M.size

8

Equivalently, we could use the functions `numpy.shape` and `numpy.size`:

In [66]:
np.shape(M)

(2, 4)

In [67]:
np.size(M)

8

### More attributes

In [68]:
arr.itemsize # bytes per element, int64 -> (8bytes)

8

In [69]:
arr.nbytes # number of bytes 8*4

32

In [70]:
print ("Num dim arr:", arr.ndim, "Num dim M:", M.ndim) # number of dimensions

Num dim arr: 1 Num dim M: 2


### Useful Methods 

NumPy offers a large library of common mathematical functions that can be applied elementwise to arrays. Among these are the functions <code>abs, sign, sqrt, log, log10, exp, sin, cos, tan, arcsin, arccos, arctan, sinh, cosh, tanh, arcsinh, arccosh</code> and <code>arctanh</code>.


In [71]:
a = np.array([1, 4, 9], float) 
np.sqrt(a)

array([1., 2., 3.])

In [72]:
a[0] = 8
print (a)
print ('Minimum and maximum             :', a.min(), a.max())
print ('Sum and product of all elements :', a.sum(), a.prod())
print ('Mean and standard deviation     :', a.mean(), a.std())

[8. 4. 9.]
Minimum and maximum             : 4.0 9.0
Sum and product of all elements : 21.0 288.0
Mean and standard deviation     : 7.0 2.160246899469287


If we want to know which index is the maximum or minimum, it can be done using `argmax` and `argmin`

In [73]:
print (a)
np.argmax(a)

[8. 4. 9.]


2

For these methods, the above operations are all computed over all the elements of the array.  But for a multidimensional array, it's possible to do the computation along a single dimension by passing the `axis` parameter; for example:

In [74]:
print ('For the following array:\n', M)
print ('The sum of all elements is    :', M.sum())
print ('The sum of elements along the columns is :', M.sum(axis=0))
print ('The sum of elements along the rows is    :', M.sum(axis=1))


For the following array:
 [[ 1 20 -1 40]
 [ 0  0 -1  0]]
The sum of all elements is    : 59
The sum of elements along the columns is : [ 1 20 -2 40]
The sum of elements along the rows is    : [60 -1]
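
The `axis` parameter is accepted by most of the reduction methods, not only `sum`. The sketch below applies it to `argmax` and `mean`, and uses `keepdims=True` to keep the reduced axis with length one, so the result stays two-dimensional.

```python
import numpy as np

M = np.array([[1, 20, -1, 40],
              [0, 0, -1, 0]])

row_sums = M.sum(axis=1, keepdims=True)   # shape (2, 1) instead of (2,)
print(row_sums.shape)

col_of_max = M.argmax(axis=1)             # column index of the maximum in each row
col_means = M.mean(axis=0)                # mean of each column
print(col_of_max)
print(col_means)
```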


To find the unique values in an array, we can use the `unique` function:

In [75]:
print (arr)
np.unique(arr)

[10 10 30 40]


array([10, 30, 40])

### Reshaping, resizing and stacking arrays

The shape of a Numpy array can be modified without copying the underlying data, which makes it a fast operation even for large arrays.

In [76]:
print (M)
n, m = M.shape
n,m

[[ 1 20 -1 40]
 [ 0  0 -1  0]]


(2, 4)

In [77]:
B = M.reshape(n*m) #matrix to array
print (B.shape)
B

(8,)


array([ 1, 20, -1, 40,  0,  0, -1,  0])
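
Two details are worth a quick sketch: one dimension can be given as `-1` so that NumPy infers it from the total size, and for a contiguous array like this one the reshaped array shares its data with the original, so writing to one changes the other.

```python
import numpy as np

M = np.array([[1, 20, -1, 40],
              [0, 0, -1, 0]])
B = M.reshape(-1)        # -1 lets NumPy infer the length (here 8)
C = M.reshape(4, -1)     # 4 rows, column count inferred (here 2)
print(B.shape, C.shape)
print(np.shares_memory(M, B))   # the reshaped array is a view on M's data

B[0] = 100               # so writing to B also changes M
print(M[0, 0])
```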

Using the functions `repeat`, `tile`, `vstack`, `hstack`, and `concatenate` we can create larger vectors and matrices from smaller ones:

In [78]:
a = np.array([[1, 2], [3, 4]])
a

array([[1, 2],
       [3, 4]])

In [79]:
# repeat each element 3 times
np.repeat(a, 3)

array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4])

In [80]:
# tile the matrix 3 times 
np.tile(a, 3)

array([[1, 2, 1, 2, 1, 2],
       [3, 4, 3, 4, 3, 4]])

In [81]:
np.concatenate((a, np.array([[5, 6]])), axis=0)

array([[1, 2],
       [3, 4],
       [5, 6]])

A matrix can be transposed using the array property `T`:

In [82]:
np.concatenate((a, np.array([[5, 6]]).T), axis=1)

array([[1, 2, 5],
       [3, 4, 6]])

**hstack** and **vstack**: shortcuts for concatenating horizontally and vertically

In [83]:
np.vstack((a,np.array([[5, 6]])))

array([[1, 2],
       [3, 4],
       [5, 6]])

In [84]:
np.hstack((a,np.array([[5, 6]]).T))

array([[1, 2, 5],
       [3, 4, 6]])

## Copy and "deep copy"

To achieve high performance, assignments in Python usually do not copy the underlying objects. This is important, for example, when objects are passed between functions, to avoid an excessive amount of memory copying when it is not necessary (technical term: pass by reference).

In [85]:
A = np.array([[1, 2], [3, 4]])
A

array([[1, 2],
       [3, 4]])

In [86]:
# now B is referring to the same array data as A 
B = A 

In [87]:
# changing B affects A
B[0,0] = 10
B

array([[10,  2],
       [ 3,  4]])

In [88]:
A

array([[10,  2],
       [ 3,  4]])

If we want to avoid this behavior and get a new, completely independent object B copied from A, we need to make a so-called "deep copy" using the function `copy`:

In [89]:
B = np.copy(A)

In [90]:
# now, if we modify B, A is not affected
B[0,0] = -5
B

array([[-5,  2],
       [ 3,  4]])

In [91]:
A

array([[10,  2],
       [ 3,  4]])
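
The three situations (plain assignment, a slice, and an explicit copy) can be told apart with the `is` operator and `np.shares_memory`. The names `alias`, `view` and `copy` below are illustrative.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
alias = A            # same object, no new array
view = A[:]          # new array object, same data buffer
copy = A.copy()      # new array object, new data buffer

print(alias is A, view is A, copy is A)
print(np.shares_memory(A, view), np.shares_memory(A, copy))
```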

## Operating with arrays
Arrays support all regular arithmetic operators, and the numpy library also contains a complete collection of basic mathematical functions that operate on arrays.  It is important to remember that in general, all operations with arrays are applied *element-wise*, i.e., are applied to all the elements of the array at the same time. 

In [92]:
v1 = np.arange(0, 4)
v1

array([0, 1, 2, 3])

In [93]:
v1 * 2

array([0, 2, 4, 6])

In [94]:
v1 + 2

array([2, 3, 4, 5])

In [95]:
M*2

array([[ 2, 40, -2, 80],
       [ 0,  0, -2,  0]])

When we add, subtract, multiply and divide arrays with each other, the default behaviour is **element-wise** operations:

In [96]:
print (M)
M*M

[[ 1 20 -1 40]
 [ 0  0 -1  0]]


array([[   1,  400,    1, 1600],
       [   0,    0,    1,    0]])

In [97]:
v1*v1

array([0, 1, 4, 9])

If we multiply arrays with compatible shapes, we get an element-wise multiplication of each row:

In [98]:
M.shape, v1.shape

((2, 4), (4,))

In [99]:
print (M)
print (v1)
M * v1

[[ 1 20 -1 40]
 [ 0  0 -1  0]]
[0 1 2 3]


array([[  0,  20,  -2, 120],
       [  0,   0,  -2,   0]])

What about matrix multiplication?  We can use the `dot` function, which applies a matrix-matrix, matrix-vector, or inner vector multiplication to its two arguments:

In [100]:
print (M)
print (v1)
np.dot(M,v1)

[[ 1 20 -1 40]
 [ 0  0 -1  0]]
[0 1 2 3]


array([138,  -2])

In [101]:
np.dot(v1,v1)

14
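
Python 3.5 and later also provide the `@` operator, which for the 1-D and 2-D arrays used here gives the same results as `np.dot`. A sketch with the same `M` and `v1`:

```python
import numpy as np

M = np.array([[1, 20, -1, 40],
              [0, 0, -1, 0]])
v1 = np.arange(0, 4)

mv = M @ v1          # matrix-vector product, same as np.dot(M, v1)
mm = M @ M.T         # (2, 4) times (4, 2) gives a (2, 2) matrix
vv = v1 @ v1         # inner product of two vectors
print(mv)
print(mm)
print(vv)
```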

### Broadcasting 

In principle, arrays must match in their dimensionality for an operation to be valid. Broadcasting means that numpy will nevertheless *broadcast* (stretch) dimensions when possible. The previous examples of operations between a scalar and a vector are cases of broadcasting:

In [102]:
print (v1)
v1 + 5   # broadcasting => [0 1 2 3] + [5 5 5 5]

[0 1 2 3]


array([5, 6, 7, 8])

We can also broadcast a 1D array to a 2D array, in this case adding a vector to all rows of a matrix:

In [103]:
np.ones((4, 4)) + v1 # broadcasting = np.ones((4, 4)) + np.tile(v1, (4, 1))

array([[1., 2., 3., 4.],
       [1., 2., 3., 4.],
       [1., 2., 3., 4.],
       [1., 2., 3., 4.]])

We can also broadcast in two directions at a time:

In [104]:
print (v1.reshape((4, 1)))
print (np.arange(4))
v1.reshape((4, 1)) + np.arange(4)

[[0]
 [1]
 [2]
 [3]]
[0 1 2 3]


array([[0, 1, 2, 3],
       [1, 2, 3, 4],
       [2, 3, 4, 5],
       [3, 4, 5, 6]])

**Rules of Broadcasting**

Broadcasting follows these rules:

1. If the two arrays differ in their number of dimensions, the shape of the array with fewer dimensions is padded with ones on its leading (left) side.

2. If the shapes of the two arrays do not match in some dimension, the array with size equal to 1 in that dimension is stretched to match the other shape.

3. If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
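
The three rules can be written out as a small function on shape tuples. `broadcast_shape` is a hypothetical helper written for this illustration, not part of NumPy; its result is compared with the shape NumPy itself reports through `np.broadcast`.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the three rules to two shape tuples."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: pad the shorter shape with ones on the left
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)        # sizes agree, or b is stretched (rule 2)
        elif da == 1:
            result.append(db)        # a is stretched (rule 2)
        else:
            # Rule 3: sizes disagree and neither is 1
            raise ValueError("shapes {} and {} do not broadcast".format(shape_a, shape_b))
    return tuple(result)

print(broadcast_shape((4, 1), (4,)))
print(broadcast_shape((2, 4), (4,)))
# NumPy's own answer for comparison
print(np.broadcast(np.empty((4, 1)), np.empty((4,))).shape)
```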

Note that all of this happens without ever actually creating the stretched arrays in memory! When numpy broadcasts to create new dimensions or to `stretch` existing ones, it doesn't replicate the data, which makes broadcasting enormously powerful in practice.
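
One way to see that no data is replicated is `np.broadcast_to`, which returns a read-only stretched view. In this sketch the stride along the stretched axis is 0, meaning every row points at the same bytes of the original vector. The `int64` dtype is fixed here only so that the strides are the same on every platform.

```python
import numpy as np

v = np.arange(4, dtype=np.int64)
stretched = np.broadcast_to(v, (4, 4))   # read-only view shaped (4, 4)
print(stretched.shape)
print(stretched.strides)                 # stride 0 along axis 0: every row reuses v's bytes
print(np.shares_memory(v, stretched))
```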


In the first example: 

    v1 + 5

the operation is carried out as if the 5 were a 1-d array with 5 in all of its entries, but no actual array is ever created.

In the example

    v1.reshape((4, 1)) + np.arange(4)
    
- the second array is 'promoted' to a 2-dimensional array of shape (1, 4)
- the second array is 'stretched' to shape (4, 4)
- the first array is 'stretched' to shape (4, 4)

Then the operation proceeds as if on two 4 $\times$ 4 arrays.

In [105]:
#Broadcasting unrolled
print (np.tile(np.arange(4),(4,1)))
print (np.tile(v1.reshape((4,1)),4))
np.tile(np.arange(4),(4,1)) +  np.tile(v1.reshape((4,1)),4)

[[0 1 2 3]
 [0 1 2 3]
 [0 1 2 3]
 [0 1 2 3]]
[[0 0 0 0]
 [1 1 1 1]
 [2 2 2 2]
 [3 3 3 3]]


array([[0, 1, 2, 3],
       [1, 2, 3, 4],
       [2, 3, 4, 5],
       [3, 4, 5, 6]])

### Visualizing Broadcasting

<img src="http://www.astroml.org/_images/fig_broadcast_visual_1.png">

([image source](http://www.astroml.org/book_figures/appendix/fig_broadcast_visual.html))

Sometimes, however, we need to state explicitly how we want to broadcast. The ``newaxis`` constant adds a new dimension to an array for this purpose:

In [55]:
a = np.zeros((2,2), float) 
b = np.array([-1., 3.], float) 
print (a,"\n", b,"\n")
print (a + b,"\n") 

print(b[np.newaxis,:],"\n")
print (a + b[np.newaxis,:],"\n") 

print(b[:,np.newaxis],"\n")
print (a + b[:,np.newaxis]) 

b[:,np.newaxis]

[[0. 0.]
 [0. 0.]] 
 [-1.  3.] 

[[-1.  3.]
 [-1.  3.]] 

[[-1.  3.]] 

[[-1.  3.]
 [-1.  3.]] 

[[-1.]
 [ 3.]] 

[[-1. -1.]
 [ 3.  3.]]


array([[-1.],
       [ 3.]])
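
A common use of this pattern is building a table of all pairwise products of two vectors. The sketch below does so with `newaxis` and checks the result against `np.outer`; the vectors are made up for the example.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([10, 20])

# shapes (3, 1) and (1, 2) broadcast to (3, 2)
table = x[:, np.newaxis] * y[np.newaxis, :]
print(table)
print(np.array_equal(table, np.outer(x, y)))
```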

## Exercises

1. Create a 4x2 random integer array and print its attributes. Print the following attributes:

    * The shape of an array.
    * Array dimensions.
    * The size of each element of the array in bytes.

In [6]:
import numpy as np
#Your Code here

a = np.random.rand(4,2)
print(a)
print(a.dtype)
print("Shape: {}, N.Dim: {}, ItemSize: {}".format(a.shape, a.ndim, a.itemsize))


[[0.97506704 0.12663932]
 [0.58452672 0.25993392]
 [0.60203548 0.46049815]
 [0.23478922 0.3088124 ]]
float64
Shape: (4, 2), N.Dim: 2, ItemSize: 8
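
The statement asks for integers, while `np.random.rand` in the solution above returns floats (hence `float64`). A possible integer variant, assuming any integer range is acceptable, can be sketched with `np.random.randint`; the bounds 0 and 100 are arbitrary and the upper bound is exclusive.

```python
import numpy as np

a = np.random.randint(0, 100, size=(4, 2))   # integers in [0, 100); bounds chosen arbitrarily
print(a)
print(a.dtype)
print("Shape: {}, N.Dim: {}, ItemSize: {}".format(a.shape, a.ndim, a.itemsize))
```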


    
2. Create a 5x2 integer array from the range 100 to 200 such that the difference between consecutive elements is 10:

OUTPUT:

```
array([[100, 110],
       [120, 130],
       [140, 150],
       [160, 170],
       [180, 190]])
```

In [8]:
#Your Code here
sampleArray = np.arange(100, 200, 10)
print(sampleArray)
sampleArray = sampleArray.reshape(5,2)
sampleArray

[100 110 120 130 140 150 160 170 180 190]


array([[100, 110],
       [120, 130],
       [140, 150],
       [160, 170],
       [180, 190]])

3. The following NumPy array is provided. Return an array containing the third column of all rows:


In [9]:
sampleArray = np.array([[11 ,22, 33], [44, 55, 66], [77, 88, 99]])
#Your Code here
sampleArray[:,2]

array([33, 66, 99])

4. Create a result array by adding the following two NumPy arrays in this way (hint: use `newaxis` on b):


$[1,2,3] + [1], [4,5,6]+ [2], [1,2,3]+ [3], [4,5,6]+[4] ...$

Next, calculate the square of each element.

OUTPUT

array([[[  4,   9,  16],
        [ 36,  49,  64]],

       [[ 16,  25,  36],
        [ 64,  81, 100]],

       [[ 36,  49,  64],
        [100, 121, 144]]])


In [30]:
a = np.array([[1, 2, 3], [4 ,5, 6]])
b = np.array([[1, 2], [3 ,4],[5,6]])
#Your Code here

print(a)
print(b[:,:,np.newaxis])
result = a+b[:,:,np.newaxis]
result*result

[[1 2 3]
 [4 5 6]]
[[[1]
  [2]]

 [[3]
  [4]]

 [[5]
  [6]]]


array([[[  4,   9,  16],
        [ 36,  49,  64]],

       [[ 16,  25,  36],
        [ 64,  81, 100]],

       [[ 36,  49,  64],
        [100, 121, 144]]])

5. In the following table we have expression values for 4 genes at 4 time points.

In [31]:
from IPython.lib import display as dp
%cat 'genes.csv' 
dp.FileLink('genes.csv')



Gene name,4h,12h,24h,48h
A2M,0.12,0.08,0.06,0.02
FOS,0.01,0.07,0.11,0.09
BRCA2,0.03,0.04,0.04,0.02
CPOX,0.05,0.09,0.11,0.14

   - Create a single array for the data (4x4)
   - Find the mean value per gene
   - Find the mean value per time point
   - Which gene has the maximum mean expression value? (return the whole gene row)

In [33]:
# Read the file, skipping the header and the first column
gen_a = np.genfromtxt('genes.csv',delimiter=',',skip_header=1, usecols=[1,2,3,4])
print(gen_a)
print("Mean Value per gene: {}".format(np.mean(gen_a,axis=1)))
print("Mean Value per time point: {}".format(np.mean(gen_a,axis=0)))
print("Gene with maximum mean expression value: {}".format(gen_a[np.argmax(np.mean(gen_a,axis=1)),:]))


#Your Code here

[[0.12 0.08 0.06 0.02]
 [0.01 0.07 0.11 0.09]
 [0.03 0.04 0.04 0.02]
 [0.05 0.09 0.11 0.14]]
Mean Value per gene: [0.07   0.07   0.0325 0.0975]
Mean Value per time point: [0.0525 0.07   0.08   0.0675]
Gene with maximum mean expression value: [0.05 0.09 0.11 0.14]
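
The solution above returns the expression values of the winning row but not the gene's name, because the name column was skipped. One possible way to recover it is sketched below: the names are read separately with plain string handling and indexed with the same `argmax` result. The table is embedded as a string so the sketch does not depend on `genes.csv` being present.

```python
from io import StringIO
import numpy as np

# the table from the exercise, embedded so the sketch does not need genes.csv
text = """Gene name,4h,12h,24h,48h
A2M,0.12,0.08,0.06,0.02
FOS,0.01,0.07,0.11,0.09
BRCA2,0.03,0.04,0.04,0.02
CPOX,0.05,0.09,0.11,0.14
"""

# first column of every data line (the header line is skipped)
names = [line.split(',')[0] for line in text.strip().splitlines()[1:]]
values = np.genfromtxt(StringIO(text), delimiter=',', skip_header=1, usecols=[1, 2, 3, 4])

best = np.argmax(values.mean(axis=1))    # row index of the largest per-gene mean
print(names[best], values[best])
```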


# Further reading

* http://numpy.scipy.org
* http://scipy.org/Tentative_NumPy_Tutorial
* http://scipy.org/NumPy_for_Matlab_Users - A Numpy guide for MATLAB users.

<div class="alert alert-info">`ipythonblocks` is a teaching tool that allows students to experiment with Python flow control concepts and immediately see the effects of their code represented in a colorful, attractive way. BlockGrid objects can be **indexed and sliced like 2D NumPy arrays** making them good practice for learning how to access arrays. </div>

In [107]:
import os
import numpy as np
os.chdir('./modules/')
from ipythonblocks import BlockGrid
from ipythonblocks import colors
os.chdir('..')
grid = BlockGrid(8, 8, fill=(123, 234, 123))
grid.show()

In [108]:
a = np.zeros([8,8], dtype='int64')
a

array([[0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0]])

In [109]:
grid[0, 0] #access to [0,0] element

In [110]:
grid[0:2,:] = colors['Teal']
grid.show()

In [111]:
a[0:2,:] = 1
a

array([[1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0]])

In [112]:
grid[2,1:] = colors['Blue']
grid.show()

In [113]:
a[2,1:] = 2
a

array([[1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1],
       [0, 2, 2, 2, 2, 2, 2, 2],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0]])

In [114]:
grid[:2,2:3] = colors['Peru']
grid.show()

In [115]:
a[:2,2:3] = 3
a

array([[1, 1, 3, 1, 1, 1, 1, 1],
       [1, 1, 3, 1, 1, 1, 1, 1],
       [0, 2, 2, 2, 2, 2, 2, 2],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0]])

In [116]:
grid[:,::2] = colors['Peru']
grid.show()

In [117]:
a[:,::2] = 4
a

array([[4, 1, 4, 1, 4, 1, 4, 1],
       [4, 1, 4, 1, 4, 1, 4, 1],
       [4, 2, 4, 2, 4, 2, 4, 2],
       [4, 0, 4, 0, 4, 0, 4, 0],
       [4, 0, 4, 0, 4, 0, 4, 0],
       [4, 0, 4, 0, 4, 0, 4, 0],
       [4, 0, 4, 0, 4, 0, 4, 0],
       [4, 0, 4, 0, 4, 0, 4, 0]])

In [118]:
grid[::2,::3] = colors['Red']
grid.show()

In [119]:
a[::2,::3] = 5
a

array([[5, 1, 4, 5, 4, 1, 5, 1],
       [4, 1, 4, 1, 4, 1, 4, 1],
       [5, 2, 4, 5, 4, 2, 5, 2],
       [4, 0, 4, 0, 4, 0, 4, 0],
       [5, 0, 4, 5, 4, 0, 5, 0],
       [4, 0, 4, 0, 4, 0, 4, 0],
       [5, 0, 4, 5, 4, 0, 5, 0],
       [4, 0, 4, 0, 4, 0, 4, 0]])