# Basics of Numpy and Pandas


---

This notebook covers the basics of two of the most important Python libraries for data analytics and statistical modeling: `Numpy` and `Pandas`.

### Numpy

---

* Numpy array - from list, special functions
* Array operations
* 2-D arrays
* Indexing and slicing
* Conditional subsetting
* Array-array operations

### Pandas

---

* Pandas series
* DataFrame - creation, read from files
* Quick checking DataFrame
* Descriptive stats on DataFrame
* Indexing, slicing, conditional subsetting
* Operations on specific rows/columns

## Numpy array from a Python list
Numpy arrays behave like **true numerical vectors**, not ordinary lists: arithmetic on them is applied element by element. That's why they are used for mathematical operations and machine learning algorithms, and as the basis of the Pandas DataFrame for data analytics.

In [36]:
import numpy as np
lst1=[1,2,3]
array1 = np.array(lst1)
print(lst1)

[1, 2, 3]


In [37]:
type(lst1)

list

In [38]:
type(array1)

numpy.ndarray

In [39]:
lst2=[10,11,12]
array2 = np.array(lst2)
print(array2)

[10 11 12]


In [40]:
print(f"Adding two lists {lst1} and {lst2} together: {lst1+lst2}")

Adding two lists [1, 2, 3] and [10, 11, 12] together: [1, 2, 3, 10, 11, 12]


In [41]:
print(f"Adding two numpy arrays {array1} and {array2} together: {array1+array2}")

Adding two numpy arrays [1 2 3] and [10 11 12] together: [11 13 15]
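
The same list-versus-array difference shows up with a scalar. As a minimal sketch (the variable names `values`, `arr`, `list_times_two`, and `array_times_two` are illustrative and not part of the notebook), multiplying a list by 2 repeats it, while multiplying an array by 2 doubles each element:

```python
import numpy as np

values = [1, 2, 3]           # illustrative sample data
arr = np.array(values)

list_times_two = values * 2  # list repetition: the list is concatenated with itself
array_times_two = arr * 2    # elementwise arithmetic: every entry is doubled

print("List * 2: ", list_times_two)    # [1, 2, 3, 1, 2, 3]
print("Array * 2:", array_times_two)   # [2 4 6]
```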


## Mathematical operations with/on Numpy arrays

In [42]:
print("array2 multiplied by array1: ",array1*array2)

array2 multiplied by array1:  [10 22 36]


In [43]:
print("array2 divided by array1: ",array2/array1)

array2 divided by array1:  [10.   5.5  4. ]


In [44]:
print("array2 raised to the power of array1: ",array2**array1)

array2 raised to the power of array1:  [  10  121 1728]


In [45]:
# sine function
print("Sine: ",np.sin(array1))

Sine:  [0.84147098 0.90929743 0.14112001]


In [46]:
# logarithm

print("Natural logarithm: ",np.log(array1))

Natural logarithm:  [0.         0.69314718 1.09861229]


In [47]:
print("Base-10 logarithm: ",np.log10(array1))

Base-10 logarithm:  [0.         0.30103    0.47712125]


In [48]:
print("Base-2 logarithm: ",np.log2(array1))

Base-2 logarithm:  [0.        1.        1.5849625]


In [49]:
# Exponential
print("Exponential: ",np.exp(array1))

Exponential:  [ 2.71828183  7.3890561  20.08553692]


## How to generate arrays easily?
* `np.zeros`
* `np.ones`
* `np.arange`
* `np.linspace`

In [50]:
print("A series of zeroes:",np.zeros(7))

A series of zeroes: [0. 0. 0. 0. 0. 0. 0.]


In [51]:
print("A series of ones:",np.ones(9))

A series of ones: [1. 1. 1. 1. 1. 1. 1. 1. 1.]


In [52]:
print("A series of numbers:",np.arange(5,16))

A series of numbers: [ 5  6  7  8  9 10 11 12 13 14 15]


In [53]:
print("Numbers spaced apart by 2:",np.arange(0,11,2))

Numbers spaced apart by 2: [ 0  2  4  6  8 10]


In [54]:
print("Numbers spaced apart by float:",np.arange(0,11,2.5))

Numbers spaced apart by float: [ 0.   2.5  5.   7.5 10. ]


In [55]:
print("Every 5th number from 30 in reverse order: ",np.arange(30,-1,-5))

Every 5th number from 30 in reverse order:  [30 25 20 15 10  5  0]


In [56]:
print("11 linearly spaced numbers between 1 and 5: ",np.linspace(1,5,11))

11 linearly spaced numbers between 1 and 5:  [1.  1.4 1.8 2.2 2.6 3.  3.4 3.8 4.2 4.6 5. ]
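
One detail worth keeping in mind from the cells above: `np.arange` takes a step and excludes the stop value, while `np.linspace` takes a count and includes the stop value by default. A small sketch of the difference, with illustrative variable names:

```python
import numpy as np

# arange: start, stop (excluded), step
stepped = np.arange(0, 1, 0.25)

# linspace: start, stop (included by default), number of points
spaced = np.linspace(0, 1, 5)

# endpoint=False makes linspace exclude the stop value, like arange
spaced_open = np.linspace(0, 1, 4, endpoint=False)

print("arange:               ", stepped)      # [0.   0.25 0.5  0.75]
print("linspace:             ", spaced)       # [0.   0.25 0.5  0.75 1.  ]
print("linspace, no endpoint:", spaced_open)  # [0.   0.25 0.5  0.75]
```

With a float step, `np.arange` can occasionally produce one element more or fewer than expected because of rounding, so `np.linspace` is usually the safer choice when the number of points matters.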


## Multi-dimensional arrays

In [59]:
my_mat = [[1,2,3],[4,5,6],[7,8,9]]

print("\n", my_mat)

mat = np.array(my_mat)

print("\n",mat)

print("\n Type/Class of this object:",type(mat))

print("\n Here is the matrix\n----------\n",mat,"\n----------")


 [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

 [[1 2 3]
 [4 5 6]
 [7 8 9]]

 Type/Class of this object: <class 'numpy.ndarray'>

 Here is the matrix
----------
 [[1 2 3]
 [4 5 6]
 [7 8 9]] 
----------


In [12]:
# A list of tuples works too; mixing floats and ints makes every element a float
my_tuple = [(1.5,2,3), (4,5,6)]

mat_tuple = np.array(my_tuple)

print(mat_tuple)

[[1.5 2.  3. ]
 [4.  5.  6. ]]


## Dimension, shape, size, and data type of the 2D array

In [60]:
print("\n Dimension of this matrix: ",mat.ndim,sep='') 


 Dimension of this matrix: 2


In [61]:
print("\n Size of this matrix: ", mat.size,sep='') 


 Size of this matrix: 9


In [62]:
print("\n Shape of this matrix: ", mat.shape,sep='')


 Shape of this matrix: (3, 3)


In [63]:
print("\n Data type of this matrix: ", mat.dtype,sep='')


 Data type of this matrix: int32
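
The `int32` shown above is not guaranteed: the default integer type depends on the platform and the Numpy version, and on many systems the same cell prints `int64`. As a hedged sketch (variable names are illustrative), the data type can be set explicitly with the `dtype` argument or converted afterwards with `astype`:

```python
import numpy as np

rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

default_ints = np.array(rows)                    # integer dtype chosen by Numpy (platform-dependent)
explicit_ints = np.array(rows, dtype=np.int64)   # dtype fixed explicitly
as_floats = explicit_ints.astype(np.float64)     # converted copy with a new dtype

print("Default dtype:  ", default_ints.dtype)
print("Explicit dtype: ", explicit_ints.dtype)   # int64
print("Converted dtype:", as_floats.dtype)       # float64
```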


## Zeros, Ones, Random, and Identity Matrices and Vectors

In [64]:
print("Vector of zeros: ",np.zeros(5))

Vector of zeros:  [0. 0. 0. 0. 0.]


In [68]:
print("\n Matrix of zeros: \n",np.zeros((3,4)))


 Matrix of zeros: 
 [[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]


In [69]:
print("\n Vector of ones: ",np.ones(4))


 Vector of ones:  [1. 1. 1. 1.]


In [70]:
print("\n Matrix of ones: ",np.ones((4,2)))


 Matrix of ones:  [[1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]]


In [71]:
print("\n Matrix of 5’s: ",5*np.ones((3,3)))


 Matrix of 5’s:  [[5. 5. 5.]
 [5. 5. 5.]
 [5. 5. 5.]]


In [72]:
print("\n Identity matrix of dimension 2:",np.eye(2))


 Identity matrix of dimension 2: [[1. 0.]
 [0. 1.]]


In [76]:
print("\n Identity matrix of dimension 4:\n",np.eye(4))


 Identity matrix of dimension 4:
 [[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]


In [77]:
print("\n Random matrix of shape (4,3):\n",np.random.randint(1,10,size=(4,3)))


 Random matrix of shape (4,3):
 [[9 5 1]
 [4 9 7]
 [4 8 5]
 [7 8 7]]
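
The random matrix above changes on every run. If reproducible numbers are needed, one option is a seeded generator from `np.random.default_rng`. This is a sketch of an alternative to the `np.random.randint` call used in this notebook, and the seed value 42 is an arbitrary choice:

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng = np.random.default_rng(seed=42)
first = rng.integers(1, 10, size=(4, 3))    # integers from 1 to 9, upper bound excluded

rng_again = np.random.default_rng(seed=42)
second = rng_again.integers(1, 10, size=(4, 3))

print("Same matrix both times:", np.array_equal(first, second))
```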


## Reshaping, Ravel, Min, Max, Sorting

In [78]:
a = np.random.randint(1,100,30)
print(a)

[46 58 55 92 72 15 45 23 43 94 74 68  8 96 53 76 98 30 85 12 32  3 59 59
 12 54 75 59 82 77]


In [79]:
b = a.reshape(2,3,5)
print("\n",b) 


 [[[46 58 55 92 72]
  [15 45 23 43 94]
  [74 68  8 96 53]]

 [[76 98 30 85 12]
  [32  3 59 59 12]
  [54 75 59 82 77]]]


In [87]:
c = a.reshape(1,3,10)
print("\n",c)


 [[[46 58 55 92 72 15 45 23 43 94]
  [74 68  8 96 53 76 98 30 85 12]
  [32  3 59 59 12 54 75 59 82 77]]]


In [88]:
print ("\n Shape of a:\n", a.shape) 

print ("\n Shape of b:\n", b.shape) 

print ("\n Shape of c:\n ", c.shape)


 Shape of a:
 (30,)

 Shape of b:
 (2, 3, 5)

 Shape of c:
  (1, 3, 10)


In [30]:
print("\na looks like:\n",a)

print("\nb looks like:\n",b)

print("\nc looks like:\n",c)


a looks like:
 [46 58 55 92 72 15 45 23 43 94 74 68  8 96 53 76 98 30 85 12 32  3 59 59
 12 54 75 59 82 77]

b looks like:
 [[[46 58 55 92 72]
  [15 45 23 43 94]
  [74 68  8 96 53]]

 [[76 98 30 85 12]
  [32  3 59 59 12]
  [54 75 59 82 77]]]

c looks like:
 [[[46 58 55 92 72 15 45 23 43 94]
  [74 68  8 96 53 76 98 30 85 12]
  [32  3 59 59 12 54 75 59 82 77]]]


In [90]:
b_flat = b.ravel()

print(b_flat)

[46 58 55 92 72 15 45 23 43 94 74 68  8 96 53 76 98 30 85 12 32  3 59 59
 12 54 75 59 82 77]
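
The min, max, and sorting operations named in this section's heading can be sketched as follows. The array reuses the first six values of `a` printed above, and the variable names are illustrative:

```python
import numpy as np

a = np.array([46, 58, 55, 92, 72, 15])   # first six values of the array shown above

print("Min:", a.min(), "at index", a.argmin())   # 15 at index 5
print("Max:", a.max(), "at index", a.argmax())   # 92 at index 3

sorted_a = np.sort(a)     # returns a sorted copy; a itself is unchanged
print("Sorted:", sorted_a)                        # [15 46 55 58 72 92]

grid = a.reshape(2, 3)    # [[46 58 55], [92 72 15]]
col_max = grid.max(axis=0)   # max down each column
row_min = grid.min(axis=1)   # min across each row
print("Column-wise max:", col_max)                # [92 72 55]
print("Row-wise min:   ", row_min)                # [46 15]
```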


## Indexing and slicing

In [91]:
arr = np.arange(0,11)

print("Array:",arr)

Array: [ 0  1  2  3  4  5  6  7  8  9 10]


In [92]:
print("\n Element at 7th index is:", arr[7])


 Element at 7th index is: 7


In [93]:
print("\n Elements from 3rd to 5th index are:", arr[3:6])


 Elements from 3rd to 5th index are: [3 4 5]


In [94]:
print("\n Elements before index 4 are:", arr[:4])


 Elements before index 4 are: [0 1 2 3]


In [96]:
print("\n Elements from last backwards are:", arr[-1::-1])


 Elements from last backwards are: [10  9  8  7  6  5  4  3  2  1  0]


In [97]:
print("\n Every 2nd element from last backwards (3 elements):", arr[-1:-6:-2])


 Every 2nd element from last backwards (3 elements): [10  8  6]


In [98]:
arr2 = np.arange(0,21,2)

In [99]:
print("\n New array:",arr2)


 New array: [ 0  2  4  6  8 10 12 14 16 18 20]


In [100]:
print("\n Elements at 2nd, 4th, and 9th index are:", arr2[[2,4,9]]) # Pass a list as an index to subset


 Elements at 2nd, 4th, and 9th index are: [ 4  8 18]


In [103]:
mat = np.random.randint(10,20,15).reshape(3,5)

print("\n Matrix of random 2-digit numbers\n",mat)


 Matrix of random 2-digit numbers
 [[13 17 16 15 16]
 [11 11 11 18 12]
 [10 16 10 14 11]]


In [104]:
print("\nDouble bracket indexing\n")
print("\n Element in row index 1 and column index 2:", mat[1][2])


Double bracket indexing


 Element in row index 1 and column index 2: 11


In [105]:
print("\nSingle bracket with comma indexing\n")
print("\n Element in row index 1 and column index 2:", mat[1,2])
print("\n Row or column extract\n")


Single bracket with comma indexing


 Element in row index 1 and column index 2: 11

 Row or column extract



In [107]:
print("\n Matrix of random 2-digit numbers\n",mat)
print("\n Entire row at index 2:", mat[2])
print("\n Entire column at index 3:", mat[:,3])


 Matrix of random 2-digit numbers
 [[13 17 16 15 16]
 [11 11 11 18 12]
 [10 16 10 14 11]]

 Entire row at index 2: [10 16 10 14 11]

 Entire column at index 3: [15 18 14]


In [108]:
print("\n Matrix of random 2-digit numbers\n",mat)
print("\nSubsetting sub-matrices\n")
print("\n Matrix with row indices 1 and 2 and column indices 3 and 4\n", mat[1:3,3:5])
print("\n Matrix with row indices 0 and 1 and column indices 1 and 3\n", mat[0:2,[1,3]])


 Matrix of random 2-digit numbers
 [[13 17 16 15 16]
 [11 11 11 18 12]
 [10 16 10 14 11]]

Subsetting sub-matrices


 Matrix with row indices 1 and 2 and column indices 3 and 4
 [[18 12]
 [14 11]]

 Matrix with row indices 0 and 1 and column indices 1 and 3
 [[17 15]
 [11 18]]
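
The two sub-matrix styles above differ in one important way: a plain slice returns a view that shares memory with the original array, while indexing with a list returns a copy. A minimal sketch, using a small fixed matrix instead of the random one so the values are predictable (variable names are illustrative):

```python
import numpy as np

mat = np.arange(15).reshape(3, 5)

sub_view = mat[1:3, 3:5]      # basic slicing: a view into mat
sub_copy = mat[0:2, [1, 3]]   # list index: an independent copy

sub_view[0, 0] = -1           # also changes mat[1, 3]
sub_copy[0, 0] = -2           # leaves mat untouched

print("mat[1, 3] after editing the view:", mat[1, 3])   # -1
print("mat[0, 1] after editing the copy:", mat[0, 1])   # 1
```

Calling `.copy()` on a slice is one way to get an independent array when the original must stay unchanged.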


## Conditional subsetting

In [None]:
mat = np.random.randint(10,100,15).reshape(3,5)

print("\n Matrix of random 2-digit numbers\n",mat)

print ("\n Elements greater than 50\n", mat[mat>50])

In [None]:
mat>50

In [None]:
mat*(mat>50)
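
Conditions can also be combined. The sketch below uses a small fixed matrix rather than the random one above so the result is predictable. It combines two conditions with `&` (each condition needs its own parentheses) and uses `np.where` as an alternative to `mat*(mat>50)`:

```python
import numpy as np

mat = np.array([[12, 55, 78],
                [91, 34, 50]])

# Boolean mask: True where both conditions hold
mask = (mat > 30) & (mat < 80)
selected = mat[mask]                    # 1-D array of the matching elements
print("Between 30 and 80:", selected)   # [55 78 34 50]

# Keep elements greater than 50, replace the rest with 0
replaced = np.where(mat > 50, mat, 0)
print("Others zeroed:\n", replaced)
```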

## Array operations (array-array, array-scalar, universal functions)

In [None]:
mat1 = np.random.randint(1,10,9).reshape(3,3)

mat2 = np.random.randint(1,10,9).reshape(3,3)

print("\n1st Matrix of random single-digit numbers\n",mat1)
print("\n2nd Matrix of random single-digit numbers\n",mat2)

print("\nAddition\n", mat1+mat2)

print("\nMultiplication\n", mat1*mat2)

print("\nDivision\n", mat1/mat2)

print("\nLinear combination: 3*A - 2*B\n", 3*mat1-2*mat2)

print("\nAddition of a scalar (100)\n", 100+mat1)

print("\nExponentiation, matrix cubed here\n", mat1**3)

print("\nExponentiation, sq-root using pow function\n",pow(mat1,0.5))
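
Note that `*` above multiplies element by element; it is not the matrix product from linear algebra. A short sketch of the difference, using small fixed matrices (`A` and `B` are illustrative names) and the `@` operator for the matrix product:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

elementwise = A * B       # multiplies matching entries
matrix_product = A @ B    # rows of A times columns of B

print("Elementwise product\n", elementwise)      # [[ 5 12] [21 32]]
print("Matrix product\n", matrix_product)        # [[19 22] [43 50]]

# np.sqrt is a universal function that gives the same result as pow(A, 0.5)
roots = np.sqrt(A)
print("Square roots\n", roots)
```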