# python-for-data-analysis-part-8-numpy

http://hamelg.blogspot.com/2015/11/python-for-data-analysis-part-8-numpy.html

Python's built-in data types lack features that data analysis and machine learning require, such as adding rows or columns, element-wise math operations and matrix multiplication.

NumPy (short for "Numerical Python") is the library that provides these features.

NumPy's core data type is the ndarray, an n-dimensional array.

Differences between a Python list and an ndarray:
1. ndarrays are homogeneous: they can contain items of only one data type.
2. ndarrays can be multidimensional, which makes representing matrices easy.

NumPy's arrays are great for performing calculations on numerical data, but most data sets you encounter in real life aren't homogeneous. Many data sets include a mixture of data types, such as numbers, text and dates, so they can't be stored in a single NumPy array. For such data, pandas provides DataFrames, a powerful data container that mirrors the structure of data tables you'd find in databases and spreadsheet programs like Microsoft Excel.
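A quick way to see the homogeneity rule in action is to build arrays from mixed Python values: NumPy converts every element to one common type. This is a minimal sketch; the exact string width NumPy picks (for example `<U32`) may vary, so only the dtype kind is shown.

```python
import numpy as np

# Mixing ints and floats: NumPy upcasts every element to one common type
mixed_numbers = np.array([1, 2, 3.5])
print(mixed_numbers.dtype)       # float64

# Mixing numbers and text: every element is converted to a Unicode string
mixed_types = np.array([1, 2.5, "three"])
print(mixed_types.dtype.kind)    # U
print(mixed_types)               # ['1' '2.5' 'three']
```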

In [2]:
import numpy as np

#Create an ndarray by passing a list
my_lst = [1,2,3,4]

my_array = np.array(my_lst)
print(type(my_array))
print(my_array)

<class 'numpy.ndarray'>
[1 2 3 4]


To create an array with more than one dimension, pass a nested list to <font size="5">__np.array()__</font>

In [6]:
second_lst = [5,6,7,8]
two_d_array = np.array([my_lst, second_lst])
print(two_d_array)
print(type(two_d_array))

[[1 2 3 4]
 [5 6 7 8]]
<class 'numpy.ndarray'>


In [7]:
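# Rows of different lengths ("ragged" lists) cannot form a 2-d array.
# NumPy >= 1.24 raises a ValueError unless dtype=object is passed explicitly;
# with dtype=object you get a 1-d array whose items are Python lists.
second_lst = [5,6,7]
two_d_array2 = np.array([my_lst, second_lst], dtype=object)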
print(two_d_array2)
print(type(two_d_array2))

[list([1, 2, 3, 4]) list([5, 6, 7])]
<class 'numpy.ndarray'>
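Whether NumPy accepts ragged lists without `dtype=object` depends on the version: older releases built an object array (later ones with a deprecation warning), while NumPy 1.24 and later raise a `ValueError`. The sketch below handles both cases, so it runs on either.

```python
import numpy as np

my_lst = [1, 2, 3, 4]
second_lst = [5, 6, 7]

try:
    # No dtype given: the behaviour depends on the installed NumPy version
    np.array([my_lst, second_lst])
    print("This NumPy version built an object array (possibly with a warning)")
except ValueError as err:
    print("ValueError:", err)

# Asking for dtype=object explicitly works on every version
ragged = np.array([my_lst, second_lst], dtype=object)
print(ragged.shape, ragged.dtype)   # (2,) object
```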


Characteristics of an ndarray:
1. number of dimensions
2. size of each dimension
3. type of data it holds

<font size="5">__ndarray.shape__</font> gives the size of each dimension as a tuple; the number of entries in that tuple is the number of dimensions

In [13]:
print("shape of two_d_array:", two_d_array.shape)
print("shape of two_d_array2:", two_d_array2.shape)

shape of two_d_array: (2, 4)
shape of two_d_array2: (2,)


<font size="5">__ndarray.size__</font> gives the total number of items in the array

In [14]:
print("size of two_d_array:", two_d_array.size)
print("size of two_d_array2:", two_d_array2.size)

size of two_d_array: 8
size of two_d_array2: 2


<font size="5">__ndarray.dtype__</font> gives the type of the data in an ndarray. The default integer type depends on the platform, so you may see int64 instead of int32.

In [21]:
two_d_array.dtype

dtype('int32')
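The three characteristics are related: `ndarray.ndim` (an attribute not used above) equals the length of `shape`, and `size` is the product of the entries in `shape`. The sketch below also uses `astype` to convert to another dtype; since the integer width is platform dependent, only the dtype kind is printed for the integer array.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(arr.ndim)                        # 2, same as len(arr.shape)
print(arr.shape)                       # (2, 4)
print(arr.size == np.prod(arr.shape))  # True: 2 * 4 == 8
print(arr.dtype.kind)                  # i (signed integer)

# Convert to another dtype; astype returns a new array
as_float = arr.astype(np.float64)
print(as_float.dtype)                  # float64
```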

### Special array creation functions:
<font size="5">__np.identity__</font> to create a square 2d array with 1's across the diagonal
<font size="5">__np.eye__</font> to create a 2d array with 1's across a specified diagonal
<font size="5">__np.ones__</font> to create an array filled with ones
<font size="5">__np.zeros__</font> to create an array filled with zeros

In [23]:
np.identity(n = 5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [24]:
np.eye(N = 3,  # Number of rows
       M = 5,  # Number of columns
       k = 1)  # Index of the diagonal (main diagonal (0) is default)

array([[0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.]])

In [25]:
np.ones(shape= [2,4])

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [26]:
np.zeros(shape= [4,6])

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])
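A few relationships between these functions are worth knowing: `np.identity(n)` gives the same result as `np.eye(n)` with the default `k=0`, a negative `k` selects a diagonal below the main one, and all four functions return `float64` arrays unless you pass a `dtype`. A short sketch:

```python
import numpy as np

# np.identity(n) matches np.eye(n) with the main diagonal (k=0)
print(np.array_equal(np.identity(4), np.eye(4)))   # True

# k=-1 puts the ones on the first diagonal below the main one
lower = np.eye(3, k=-1)
print(lower)
# [[0. 0. 0.]
#  [1. 0. 0.]
#  [0. 1. 0.]]

# Request integers instead of the default float64
int_zeros = np.zeros(shape=(2, 3), dtype=int)
print(int_zeros.dtype.kind)   # i
```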

### Array Indexing and slicing

In [31]:
one_d_array = np.array([1,2,3,4,5,6])
one_d_array[3]

4

In [32]:
one_d_array[3:] # Get a slice from index 3 to the end

array([4, 5, 6])

In [33]:
one_d_array[::-1] # Reverse the array

array([6, 5, 4, 3, 2, 1])
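A few more 1-d slicing patterns, as a short sketch: negative indices count from the end, the stop index of a slice is excluded, and the step can be any integer.

```python
import numpy as np

one_d_array = np.array([1, 2, 3, 4, 5, 6])

print(one_d_array[-1])     # 6: last element
print(one_d_array[1:4])    # [2 3 4]: index 4 is excluded
print(one_d_array[::2])    # [1 3 5]: every second element
print(one_d_array[-2:])    # [5 6]: the last two elements
```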

In [36]:
# Another way to create an ndarray: build each row from an expression
two_d_array = np.array([one_d_array, one_d_array + 6, one_d_array + 12]) # rows are separated by commas
two_d_array

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12],
       [13, 14, 15, 16, 17, 18]])

In [37]:
# Get the item at row 2, column 5 (indices start at 0)
two_d_array[2,5]

18

In [38]:
# Reverse both dimensions (180 degree rotation)
two_d_array[::-1, ::-1]

array([[18, 17, 16, 15, 14, 13],
       [12, 11, 10,  9,  8,  7],
       [ 6,  5,  4,  3,  2,  1]])

In [39]:
two_d_array[::-1]  # Reverse the order of the rows only (flip vertically)

array([[13, 14, 15, 16, 17, 18],
       [ 7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6]])

In [41]:
two_d_array[:, ::-1]  # Keep all rows, reverse the columns (flip horizontally)

array([[ 6,  5,  4,  3,  2,  1],
       [12, 11, 10,  9,  8,  7],
       [18, 17, 16, 15, 14, 13]])

In [45]:
two_d_array  # The slicing above did not modify the original array

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12],
       [13, 14, 15, 16, 17, 18]])
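The array is unchanged here only because the slices were read, not written. Basic slices are views that share memory with the original, so assigning into a slice modifies the original array; call `.copy()` when you need independent data. A minimal sketch that rebuilds `two_d_array` the same way as above:

```python
import numpy as np

one_d_array = np.array([1, 2, 3, 4, 5, 6])
two_d_array = np.array([one_d_array, one_d_array + 6, one_d_array + 12])

# A basic slice is a view: writing to it writes to two_d_array
first_row = two_d_array[0, :]
first_row[0] = 100
print(two_d_array[0, 0])                         # 100
print(np.shares_memory(first_row, two_d_array))  # True

# .copy() gives independent data
row_copy = two_d_array[1, :].copy()
row_copy[0] = -1
print(two_d_array[1, 0])                         # 7
```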

## Reshaping arrays
<font size="5">__np.reshape__</font> Create an array with the same data but a different shape
<font size="5">__np.ravel__</font> Unravel a multidimensional array into a 1-dimensional array (returns a view when possible)
<font size="5">__ndarray.flatten__</font> Flatten the ndarray into a 1-d array (always returns a copy)
<font size="5">__ndarray.T__</font> Transpose of a matrix
<font size="5">__np.flipud__</font> Flip an array vertically (reverse the rows)
<font size="5">__np.fliplr__</font> Flip an array horizontally (reverse the columns)
<font size="5">__np.rot90__</font> Rotate an array 90 degrees counter-clockwise
<font size="5">__np.roll__</font> Shift elements in an array along a given axis, wrapping around at the ends
<font size="5">__np.concatenate__</font> Join arrays along an axis

In [46]:
np.reshape(two_d_array,  # array to reshape
           (6, 3))       # dimensions of the new array (passed positionally; the newshape= keyword is deprecated since NumPy 2.1)

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15],
       [16, 17, 18]])
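Two details about reshaping, shown in a short sketch: one dimension may be given as `-1`, and NumPy infers it from the total size; and the new shape must hold exactly the same number of elements, otherwise a `ValueError` is raised.

```python
import numpy as np

one_d_array = np.array([1, 2, 3, 4, 5, 6])
two_d_array = np.array([one_d_array, one_d_array + 6, one_d_array + 12])

# -1 lets NumPy work out the missing dimension: 18 elements / 2 rows = 9 columns
print(two_d_array.reshape(2, -1).shape)   # (2, 9)

# 4 * 5 = 20 != 18, so this shape is rejected
try:
    two_d_array.reshape(4, 5)
except ValueError as err:
    print("ValueError:", err)
```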

In [51]:
np.ravel(a=two_d_array,
         order='C') # C style unravelling - by rows

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18])

In [52]:
np.ravel(a=two_d_array, 
         order='F') # Fortran style unravelling - by columns

array([ 1,  7, 13,  2,  8, 14,  3,  9, 15,  4, 10, 16,  5, 11, 17,  6, 12,
       18])

In [53]:
two_d_array.flatten()  # flatten the ndarray into 1-d array

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18])
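`np.ravel` and `ndarray.flatten` return the same values, but they differ in memory: `ravel` returns a view of the original data when it can (as with this contiguous array), while `flatten` always makes a copy. A small sketch on a hypothetical 2x3 array:

```python
import numpy as np

small = np.array([[1, 2, 3], [4, 5, 6]])

raveled = np.ravel(small)       # view: small is contiguous in memory
flattened = small.flatten()     # always an independent copy

raveled[0] = 99
print(small[0, 0])              # 99: writing to the ravel result changed small

flattened[1] = -1
print(small[0, 1])              # 2: the flattened copy does not affect small
```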

In [54]:
two_d_array.T  # transpose of an array

array([[ 1,  7, 13],
       [ 2,  8, 14],
       [ 3,  9, 15],
       [ 4, 10, 16],
       [ 5, 11, 17],
       [ 6, 12, 18]])

In [55]:
np.flipud(two_d_array) # flip vertically

array([[13, 14, 15, 16, 17, 18],
       [ 7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6]])

In [56]:
np.fliplr(two_d_array) # flip horizontally

array([[ 6,  5,  4,  3,  2,  1],
       [12, 11, 10,  9,  8,  7],
       [18, 17, 16, 15, 14, 13]])

In [58]:
np.rot90(two_d_array, 
         k=1)   # number of 90 degree rotations

array([[ 6, 12, 18],
       [ 5, 11, 17],
       [ 4, 10, 16],
       [ 3,  9, 15],
       [ 2,  8, 14],
       [ 1,  7, 13]])
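The flips and the rotation are shorthand for slicing and transposing, which ties them back to the indexing section. The sketch below checks these equivalences with `np.array_equal`; `np.transpose` is the function form of `.T`.

```python
import numpy as np

one_d_array = np.array([1, 2, 3, 4, 5, 6])
two_d_array = np.array([one_d_array, one_d_array + 6, one_d_array + 12])

# Flipping vertically/horizontally is the same as reversing rows/columns
print(np.array_equal(np.flipud(two_d_array), two_d_array[::-1]))     # True
print(np.array_equal(np.fliplr(two_d_array), two_d_array[:, ::-1]))  # True

# .T and np.transpose give the same result
print(np.array_equal(two_d_array.T, np.transpose(two_d_array)))      # True

# One counter-clockwise rotation = transpose, then flip vertically
print(np.array_equal(np.rot90(two_d_array), np.flipud(two_d_array.T)))  # True

# Four rotations bring the array back to where it started
print(np.array_equal(np.rot90(two_d_array, k=4), two_d_array))       # True
```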

In [62]:
np.roll(a=two_d_array,
       shift=2,   # shift elements 2 positions
       axis=1)    # In each row

array([[ 5,  6,  1,  2,  3,  4],
       [11, 12,  7,  8,  9, 10],
       [17, 18, 13, 14, 15, 16]])

In [61]:
# Omit the axis argument to shift over the flattened array; the result keeps the original shape
np.roll(a=two_d_array,
        shift=2)   # shift elements 2 positions

array([[17, 18,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15, 16]])
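`np.roll` also works along `axis=0`, where it moves whole rows, and with a negative `shift`, which moves elements the other way. A short sketch:

```python
import numpy as np

one_d_array = np.array([1, 2, 3, 4, 5, 6])
two_d_array = np.array([one_d_array, one_d_array + 6, one_d_array + 12])

# Shift rows down by one; the last row wraps around to the top
print(np.roll(two_d_array, shift=1, axis=0))
# [[13 14 15 16 17 18]
#  [ 1  2  3  4  5  6]
#  [ 7  8  9 10 11 12]]

# Negative shift: move each row's elements 2 positions to the left
print(np.roll(two_d_array, shift=-2, axis=1)[0])   # [3 4 5 6 1 2]
```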

In [63]:
# join arrays along an axis
array_to_join = np.array([[10,20,30], [40,50,60], [70,80,90]])
np.concatenate((two_d_array, array_to_join), axis=1)

array([[ 1,  2,  3,  4,  5,  6, 10, 20, 30],
       [ 7,  8,  9, 10, 11, 12, 40, 50, 60],
       [13, 14, 15, 16, 17, 18, 70, 80, 90]])
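With `axis=0`, `np.concatenate` stacks arrays vertically instead, which requires the number of columns to match. `np.vstack` and `np.hstack` are convenience wrappers for the 2-d case. A sketch with a hypothetical extra row `extra_row`:

```python
import numpy as np

one_d_array = np.array([1, 2, 3, 4, 5, 6])
two_d_array = np.array([one_d_array, one_d_array + 6, one_d_array + 12])
extra_row = np.array([[19, 20, 21, 22, 23, 24]])   # shape (1, 6): a 2-d row

# axis=0 adds rows; both arrays have 6 columns
stacked = np.concatenate((two_d_array, extra_row), axis=0)
print(stacked.shape)                                          # (4, 6)

# vstack/hstack match concatenate with axis=0/axis=1 for 2-d arrays
print(np.array_equal(stacked, np.vstack((two_d_array, extra_row))))  # True
array_to_join = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
print(np.hstack((two_d_array, array_to_join)).shape)          # (3, 9)
```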

## Array Math Operations

In [64]:
two_d_array + 100  # add 100 to each item

array([[101, 102, 103, 104, 105, 106],
       [107, 108, 109, 110, 111, 112],
       [113, 114, 115, 116, 117, 118]])

In [66]:
two_d_array - 100   # subtract 100 from each item

array([[-99, -98, -97, -96, -95, -94],
       [-93, -92, -91, -90, -89, -88],
       [-87, -86, -85, -84, -83, -82]])

In [67]:
two_d_array*2   # Multiply each element by 2

array([[ 2,  4,  6,  8, 10, 12],
       [14, 16, 18, 20, 22, 24],
       [26, 28, 30, 32, 34, 36]])

In [68]:
two_d_array/2   # Divide each element by 2

array([[0.5, 1. , 1.5, 2. , 2.5, 3. ],
       [3.5, 4. , 4.5, 5. , 5.5, 6. ],
       [6.5, 7. , 7.5, 8. , 8.5, 9. ]])

In [69]:
two_d_array ** 2     # square each array element

array([[  1,   4,   9,  16,  25,  36],
       [ 49,  64,  81, 100, 121, 144],
       [169, 196, 225, 256, 289, 324]], dtype=int32)

In [70]:
two_d_array % 2    # Modulus of each array element

array([[1, 0, 1, 0, 1, 0],
       [1, 0, 1, 0, 1, 0],
       [1, 0, 1, 0, 1, 0]], dtype=int32)
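Note the dtypes in these results: `/` is true division and always returns floats, even for integer inputs, while `//` is floor division and keeps an integer dtype. A small sketch:

```python
import numpy as np

one_d_array = np.array([1, 2, 3, 4, 5, 6])
two_d_array = np.array([one_d_array, one_d_array + 6, one_d_array + 12])

print((two_d_array / 2).dtype)        # float64
print((two_d_array // 2).dtype.kind)  # i (integer; width depends on the platform)
print((two_d_array // 2)[0])          # [0 1 1 2 2 3]
```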

All of the operations above combine an ndarray with a single scalar value. The same operators also work element-wise between two arrays of the same shape.

In [73]:
small_array1 = np.array([[1,2],[3,4]])
small_array1 + small_array1

array([[2, 4],
       [6, 8]])

In [74]:
small_array1 - small_array1

array([[0, 0],
       [0, 0]])

In [75]:
small_array1 * small_array1

array([[ 1,  4],
       [ 9, 16]])

In [76]:
small_array1 / small_array1

array([[1., 1.],
       [1., 1.]])

In [77]:
small_array1 ** small_array1

array([[  1,   4],
       [ 27, 256]], dtype=int32)
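The shapes do not always have to be identical. Under NumPy's broadcasting rules, shapes are compared from the trailing dimension, and each pair of dimensions must be equal or one of them must be 1; compatible arrays are then stretched to a common shape (the scalar operations above are the simplest case). A sketch using `two_d_array` from earlier, with hypothetical offset arrays:

```python
import numpy as np

one_d_array = np.array([1, 2, 3, 4, 5, 6])
two_d_array = np.array([one_d_array, one_d_array + 6, one_d_array + 12])  # shape (3, 6)

# Shape (6,) is broadcast across every row
row_offsets = np.array([0, 10, 20, 30, 40, 50])
print((two_d_array + row_offsets)[1])     # [ 7 18 29 40 51 62]

# Shape (3, 1) is broadcast across every column
col_offsets = np.array([[100], [200], [300]])
print((two_d_array + col_offsets)[:, 0])  # [101 207 313]

# Shape (3,) does not match the trailing dimension 6, so this fails
try:
    two_d_array + np.array([1, 2, 3])
except ValueError as err:
    print("ValueError:", err)
```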

## Named math functions:
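<font size="5">__np.mean__</font> arithmetic mean
<font size="5">__np.std__</font> standard deviation
<font size="5">__np.sum__</font> sum of the elements
<font size="5">__np.log__</font> natural logarithm of each element
<font size="5">__np.sqrt__</font> square root of each element
<font size="5">__np.dot__</font> dot product of two vectors; for 2-d arrays, matrix multiplication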

In [78]:
np.mean(two_d_array)

9.5

In [81]:
# provide axis argument, to get mean across that dimension
np.mean(two_d_array, axis=1)  # mean along axis=1, i.e., mean of each row

array([ 3.5,  9.5, 15.5])

In [85]:
np.std(two_d_array) # std of the whole array

5.188127472091127

In [86]:
np.std(two_d_array, axis=1) # std along each row

array([1.70782513, 1.70782513, 1.70782513])

In [87]:
np.std(two_d_array, axis=0)   # std along each column

array([4.89897949, 4.89897949, 4.89897949, 4.89897949, 4.89897949,
       4.89897949])
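These values use the population formula: by default `np.std` divides the sum of squared deviations by N (`ddof=0`). For the sample standard deviation, which divides by N - 1, pass `ddof=1`. A sketch that checks both against the formula, using the first row's values:

```python
import numpy as np

row = np.array([1, 2, 3, 4, 5, 6])
n = row.size
squared_dev = (row - row.mean()) ** 2

# Population std (default ddof=0): divide by N, about 1.7078 as in the output above
print(np.isclose(np.std(row), np.sqrt(squared_dev.sum() / n)))               # True

# Sample std (ddof=1): divide by N - 1, about 1.8708
print(np.isclose(np.std(row, ddof=1), np.sqrt(squared_dev.sum() / (n - 1))))  # True
```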

In [88]:
np.sum(two_d_array, axis=0)  # sum of each column

array([21, 24, 27, 30, 33, 36])

In [89]:
np.sum(two_d_array)  # sum of all elements

171

In [90]:
np.sum(two_d_array, axis=1)  # sum of each row

array([21, 57, 93])
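A way to remember `axis`: it names the dimension that gets collapsed. `axis=0` collapses the rows and leaves one result per column; `axis=1` leaves one result per row. Passing `keepdims=True` keeps the collapsed axis with length 1, so the result broadcasts back against the original array, for example to center each row on its mean. A sketch:

```python
import numpy as np

one_d_array = np.array([1, 2, 3, 4, 5, 6])
two_d_array = np.array([one_d_array, one_d_array + 6, one_d_array + 12])  # shape (3, 6)

print(np.sum(two_d_array, axis=0).shape)   # (6,): one sum per column
print(np.sum(two_d_array, axis=1).shape)   # (3,): one sum per row

# keepdims=True keeps shape (3, 1), so subtracting broadcasts row by row
row_means = np.mean(two_d_array, axis=1, keepdims=True)
centered = two_d_array - row_means
print(centered[0])                         # [-2.5 -1.5 -0.5  0.5  1.5  2.5]
```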

In [92]:
np.log(two_d_array)   # Take the log of each element in an array with np.log()

array([[0.        , 0.69314718, 1.09861229, 1.38629436, 1.60943791,
        1.79175947],
       [1.94591015, 2.07944154, 2.19722458, 2.30258509, 2.39789527,
        2.48490665],
       [2.56494936, 2.63905733, 2.7080502 , 2.77258872, 2.83321334,
        2.89037176]])

In [93]:
np.sqrt(two_d_array) # Take the square root of each element with np.sqrt()

array([[1.        , 1.41421356, 1.73205081, 2.        , 2.23606798,
        2.44948974],
       [2.64575131, 2.82842712, 3.        , 3.16227766, 3.31662479,
        3.46410162],
       [3.60555128, 3.74165739, 3.87298335, 4.        , 4.12310563,
        4.24264069]])
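`np.log` is the natural logarithm (base e); `np.log10` and `np.log2` cover the other common bases. The example above contains only positive values. For zero or negative inputs, `np.log` returns `-inf` and `nan` and emits a `RuntimeWarning`, and `np.sqrt` returns `nan` for negative inputs. This sketch silences those warnings with `np.errstate`:

```python
import numpy as np

# Natural log, base 10 and base 2 (approximately 1, 2 and 3)
print(np.log(np.e), np.log10(100), np.log2(8))

values = np.array([4.0, 0.0, -1.0])
with np.errstate(divide="ignore", invalid="ignore"):
    logs = np.log(values)    # [log(4), -inf, nan]
    roots = np.sqrt(values)  # [2., 0., nan]
print(logs)
print(roots)
```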

In [95]:
# Take the vector dot product of row 0 and row 1
np.dot(two_d_array[0], two_d_array[1])

217

In [96]:
# Do a matrix multiplication
np.dot(small_array1, small_array1)

array([[ 7, 10],
       [15, 22]])
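For 2-d arrays, `np.dot` performs matrix multiplication, which differs from the element-wise `*` shown earlier. Since Python 3.5, the `@` operator (backed by `np.matmul`) expresses the same matrix product and is a common, more readable alternative. A sketch:

```python
import numpy as np

small_array1 = np.array([[1, 2], [3, 4]])

matrix_product = small_array1 @ small_array1
print(matrix_product)
# [[ 7 10]
#  [15 22]]

print(np.array_equal(matrix_product, np.dot(small_array1, small_array1)))     # True
print(np.array_equal(matrix_product, np.matmul(small_array1, small_array1)))  # True

# Element-wise multiplication gives a different result
print(small_array1 * small_array1)
# [[ 1  4]
#  [ 9 16]]
```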