# NumPy
**This note is based on Python for Data Analysis by Wes McKinney and uses Python 2.7**

Numerical Python (**NumPy**) is one of the fundamental packages for data analysis. Its key feature is the N-dimensional array object, ndarray. All elements of an ndarray must be of the same type.

## The NumPy ndarray
The easiest way to create an array is to pass a list to the **array** function:

In [1]:
import numpy as np

myData1 = [0, 1, 2, 3, 4, 5, 6]
myArray1 = np.array(myData1)
myArray1

array([0, 1, 2, 3, 4, 5, 6])

A list of equal-length lists produces a multidimensional array:

In [2]:
myData2 = [myData1, [6, 5, 4, 3, 2, 1, 0], [-3, -2, -1, 0, 1, 2, 3], [3, 2, 1, 0, -1, -2, -3]]
myArray2 = np.array(myData2)
myArray2

# In addition, np.zeros(), np.ones(), np.empty(), np.eye() and np.arange() create new arrays without an input list.
# np.empty() leaves its values uninitialized; np.eye(N) creates a square NxN identity matrix.

array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 6,  5,  4,  3,  2,  1,  0],
       [-3, -2, -1,  0,  1,  2,  3],
       [ 3,  2,  1,  0, -1, -2, -3]])
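The creation functions named in the comment above can be sketched as follows. The variable names and shapes are illustrative choices, not taken from the original note.

```python
import numpy as np

zeros = np.zeros((2, 3))       # 2x3 array filled with 0.0
ones = np.ones(4)              # length-4 array filled with 1.0
identity = np.eye(3)           # 3x3 identity matrix
steps = np.arange(0, 10, 2)    # like range(): array([0, 2, 4, 6, 8])
scratch = np.empty((2, 2))     # uninitialized memory; contents are arbitrary
```

Because np.empty() does not initialize its values, it is only safe when every element will be overwritten before it is read.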

Every array has: 
* a shape, a tuple indicating the size of each dimension
* a dtype, an object describing the data type of the array

In [3]:
myArray2.shape

(4L, 7L)

In [4]:
myArray2.dtype

dtype('int32')
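Since every element must share one type, np.array picks a dtype that can hold all the inputs unless one is given explicitly. The following is a small sketch of that behavior; the names are illustrative, and the exact integer width (int32 above) depends on the platform.

```python
import numpy as np

ints = np.array([1, 2, 3])                         # all integers -> an integer dtype
mixed = np.array([1, 2.5, 3])                      # int and float -> upcast to float64
explicit = np.array([1, 2, 3], dtype=np.float64)   # dtype requested explicitly

int_kind = ints.dtype.kind      # 'i' for any signed integer width
mixed_dtype = mixed.dtype       # float64
explicit_shape = explicit.shape # (3,)
```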

Thanks to vectorization, arithmetic that would otherwise need an explicit loop over scalar elements can be applied to whole blocks of data at once. Any arithmetic operation between equal-sized arrays is applied elementwise.

In [5]:
(myArray2 * 2) - myArray2

array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 6,  5,  4,  3,  2,  1,  0],
       [-3, -2, -1,  0,  1,  2,  3],
       [ 3,  2,  1,  0, -1, -2, -3]])
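A few more elementwise operations, as a minimal sketch on a small illustrative array (the array `a` is not part of the original note):

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

product = a * a        # elementwise product, not a matrix product
difference = a - a     # every element becomes 0.0
reciprocal = 1 / a     # the scalar is combined with every element
```

Here `product` is [[1, 4, 9], [16, 25, 36]], which shows that `*` multiplies matching positions rather than performing matrix multiplication.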

An array of one dtype can be **cast** into another using the astype method, which always returns a new array.

In [6]:
fMyArray2 = myArray2.astype(np.float64)  # np.float was an alias of Python's float and was removed in NumPy 1.24
fMyArray2

# When casting from a float to an int dtype, the fractional part is truncated. For example, 3.7 is cast to 3.

array([[ 0.,  1.,  2.,  3.,  4.,  5.,  6.],
       [ 6.,  5.,  4.,  3.,  2.,  1.,  0.],
       [-3., -2., -1.,  0.,  1.,  2.,  3.],
       [ 3.,  2.,  1.,  0., -1., -2., -3.]])
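The truncation mentioned in the comment above can be checked with a short sketch. The sample values are made up for illustration; the string-to-float cast is an extra case not covered in the note.

```python
import numpy as np

floats = np.array([3.7, -1.2, 0.5, 9.9])
ints = floats.astype(np.int32)        # truncates toward zero: [3, -1, 0, 9]

numeric_strings = np.array(["1.25", "-9.6", "42"])
parsed = numeric_strings.astype(np.float64)   # strings parsed as floats

# astype returns a new array, so `floats` still holds 3.7, -1.2, 0.5, 9.9
```

Note that truncation is not rounding: 9.9 becomes 9, and -1.2 becomes -1 rather than -2.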

An important distinction between NumPy arrays and Python lists is that array slices are **views on (not copies of)** the original array, so modifying a slice also modifies the source array.

In [7]:
arraySlice = myArray1[0:3]
arraySlice

array([0, 1, 2])

In [8]:
arraySlice[1] = -321
arraySlice

array([   0, -321,    2])

In [9]:
myArray1

#To make a copy of a slice, do myArray1[0:3].copy()

array([   0, -321,    2,    3,    4,    5,    6])
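The difference between a view and a copy can be sketched side by side. The names `original`, `view` and `independent` are illustrative; the `base` attribute is used here only as one way to check whether an array borrows its data from another.

```python
import numpy as np

original = np.arange(7)               # array([0, 1, 2, 3, 4, 5, 6])

view = original[0:3]                  # shares memory with original
independent = original[0:3].copy()    # owns its own data

view[0] = 100                         # visible through original
independent[1] = -1                   # not visible through original

# original is now array([100, 1, 2, 3, 4, 5, 6])
view_is_borrowed = view.base is original      # True
copy_is_owned = independent.base is None      # True
```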

In higher-dimensional arrays, an individual element is accessed with one index per axis:

In [10]:
myArray2[0,2] #or myArray2[0][2]

2

Slices can be taken along each axis in the same way:

In [11]:
myArray2[:, 1:]

array([[ 1,  2,  3,  4,  5,  6],
       [ 5,  4,  3,  2,  1,  0],
       [-2, -1,  0,  1,  2,  3],
       [ 2,  1,  0, -1, -2, -3]])
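Integer indexes and slices can be mixed freely across axes. A minimal sketch on an illustrative 3x4 array (`grid` is not from the original note):

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
# grid is [[ 0,  1,  2,  3],
#          [ 4,  5,  6,  7],
#          [ 8,  9, 10, 11]]

top_left = grid[:2, :2]      # first two rows and columns: [[0, 1], [4, 5]]
second_row = grid[1, :]      # an integer index drops that axis: [4, 5, 6, 7]
last_column = grid[:, -1]    # negative indexes count from the end: [3, 7, 11]
```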

Rows can also be selected with a boolean array whose length matches the axis being indexed:

In [12]:
names = np.array(["Bob", "Joe", "Mary", "Jim"])
names == "Bob"

array([ True, False, False, False], dtype=bool)

In [13]:
myArray2[names == "Bob", 1:]

array([[1, 2, 3, 4, 5, 6]])
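Boolean masks can be combined and negated before indexing. The sketch below assumes a small illustrative `data` array with one row per name; note that the operators are `|`, `&` and `~`, since the Python keywords `or`, `and` and `not` do not work elementwise on arrays.

```python
import numpy as np

names = np.array(["Bob", "Joe", "Mary", "Jim"])
data = np.arange(8).reshape(4, 2)     # rows: [0, 1], [2, 3], [4, 5], [6, 7]

mask = (names == "Bob") | (names == "Mary")   # parentheses are required
selected = data[mask]      # rows 0 and 2: [[0, 1], [4, 5]]
others = data[~mask]       # rows 1 and 3: [[2, 3], [6, 7]]
```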

Values that satisfy a condition can be overwritten in place with a boolean mask:

In [14]:
myArray2[myArray2 < 2] = 0
myArray2

array([[0, 0, 2, 3, 4, 5, 6],
       [6, 5, 4, 3, 2, 0, 0],
       [0, 0, 0, 0, 0, 2, 3],
       [3, 2, 0, 0, 0, 0, 0]])

or by using conditional logic. To replace all values > 2 with 6 and all other values with -6, use **np.where(cond, x, y)**, which returns a new array and leaves the original unchanged:

In [15]:
np.where(myArray2 > 2, 6, -6)

# np.where(myArray2 > 0, 6, myArray2) replaces only the positive values with 6 and keeps the rest unchanged

array([[-6, -6, -6,  6,  6,  6,  6],
       [ 6,  6,  6,  6, -6, -6, -6],
       [-6, -6, -6, -6, -6, -6,  6],
       [ 6, -6, -6, -6, -6, -6, -6]])
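The variant mentioned in the comment, where the third argument is the array itself, can be sketched on a small illustrative array (`values` is not from the original note):

```python
import numpy as np

values = np.array([[1, -2, 3],
                   [-4, 5, -6]])

# Positives become 6; everything else keeps its original value.
clipped = np.where(values > 0, 6, values)   # [[6, -2, 6], [-4, 6, -6]]

# Both branches as scalars: 1 where positive, -1 elsewhere.
signs = np.where(values > 0, 1, -1)         # [[1, -1, 1], [-1, 1, -1]]
```

Neither call modifies `values`; each returns a new array.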

Like Python lists, NumPy arrays can be sorted in place with the sort method.

In [16]:
myArray1.sort()
myArray1

array([-321,    0,    2,    3,    4,    5,    6])

For multidimensional arrays, each 1D section of values can be sorted in place along an axis by passing the axis number to sort.

In [17]:
# axis 0: sort the values within each column
myArray2.sort(0)
myArray2

array([[0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [3, 2, 2, 3, 2, 2, 3],
       [6, 5, 4, 3, 4, 5, 6]])

In [18]:
# axis 1: sort the values within each row
myArray2.sort(1)
myArray2

array([[0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [2, 2, 2, 2, 3, 3, 3],
       [3, 4, 4, 5, 5, 6, 6]])
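The sort method above changes the array itself. When the original order must be kept, the top-level np.sort function returns a sorted copy instead. A minimal sketch with an illustrative `table` array:

```python
import numpy as np

table = np.array([[9, 1, 8],
                  [3, 7, 2]])

sorted_rows = np.sort(table, axis=1)   # copy, each row sorted: [[1, 8, 9], [2, 3, 7]]
sorted_cols = np.sort(table, axis=0)   # copy, each column sorted: [[3, 1, 2], [9, 7, 8]]
untouched = table.tolist()             # still [[9, 1, 8], [3, 7, 2]]

table.sort(axis=1)                     # in place: table now equals sorted_rows
```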

### Extra stuff

**np.unique(x)** computes the sorted, unique elements in x.

In [19]:
np.unique(myArray2)

array([0, 2, 3, 4, 5, 6])

**np.in1d(x, y)** computes a boolean array indicating whether each element of x is contained in y.

In [20]:
np.in1d(myArray1, myArray2)

array([False,  True,  True,  True,  True,  True,  True], dtype=bool)
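As a further sketch of these set-style helpers, np.unique can also report how often each value occurs, and np.isin performs the same membership test as np.in1d. The sample data is made up; `return_counts` and np.isin are assumed to be available, which holds for NumPy 1.13 and later, and newer NumPy releases recommend np.isin over np.in1d.

```python
import numpy as np

votes = np.array(["Bob", "Joe", "Bob", "Mary", "Joe", "Bob"])

# Sorted unique values plus the number of occurrences of each.
unique_names, counts = np.unique(votes, return_counts=True)
# unique_names -> ['Bob', 'Joe', 'Mary'], counts -> [3, 2, 1]

# Membership test, one boolean per element of votes.
member = np.isin(votes, ["Bob", "Mary"])
# member -> [True, False, True, True, False, True]
```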

The **numpy.random** module supplements Python's built-in random module with functions for
efficiently generating whole arrays of sample values from many kinds of probability distributions.

In [21]:
myRandom = np.random.normal(size=(4, 4))
myRandom

array([[ 0.76601719,  0.06328602, -0.61371297,  1.0574164 ],
       [-0.93071015, -1.09099509, -0.27395115,  0.13082839],
       [ 0.56649714,  0.19013713, -2.35561508,  1.40942617],
       [ 0.20024395, -0.80631123,  0.75708554, -0.49743899]])
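The values above change on every run. When repeatable results are needed, one option is a seeded generator. The sketch below uses np.random.RandomState with an arbitrary seed; both the seed and the variable names are illustrative.

```python
import numpy as np

rng = np.random.RandomState(12345)        # the seed value is arbitrary
first = rng.normal(size=(4, 4))           # 4x4 samples from the standard normal

rng_again = np.random.RandomState(12345)  # same seed, fresh generator
second = rng_again.normal(size=(4, 4))

same = np.array_equal(first, second)      # True: same seed, same samples
uniform = rng.uniform(0, 1, size=3)       # three samples from [0, 1)
```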