# Packages in Python for Machine Learning

### Packages:

    1) NumPy => Numeric arrays; high performance, which keeps the learning process quick

    2) Pandas => Tabular data handling with the Pandas DataFrame

    3) SciPy => Complex scientific and statistical calculations

    4) Scikit-learn => Machine learning algorithm implementations

    5) Matplotlib => Data visualization

    6) Seaborn => Statistical data visualization, built on top of Matplotlib

## Packages and Modules: 

### Modules: 

- A module in Python is simply a Python file with a .py extension.

- The name of the module is the name of the file, without the extension.

- A Python module can define a set of functions, classes or variables.

- Many modules come prebuilt, either in the standard library or in installed packages. You can also write your own.

### Example:

Module color (color.py) defines three functions:

- red()

- blue()

- green()

### Importing and using the module:

import color 

color.red() 

color.green() 

OR  

from color import red, blue 

from color import * 
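
The `color` module above is only an illustration, so it cannot be imported as-is. As a rough sketch, the same import styles can be tried with the standard library's `math` module instead (the variable names `area`, `root` and `same_pi` are made up for this example):

```python
# Style 1: import the module and reach its names through the module prefix.
import math
area = math.pi * 2 ** 2

# Style 2: import selected names directly into the current namespace.
from math import sqrt, floor
root = sqrt(16)

# Style 3: import the module under a shorter alias.
import math as m
same_pi = m.pi

print(area, root, floor(2.7), same_pi == math.pi)
```

The wildcard form `from math import *` also works, but it hides where a name came from, so explicit imports are usually the safer choice.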

### Packages: 

- Packages are namespaces that can contain multiple modules and other packages. On disk, they are simply directories.

- We create a directory “drawing” and include these modules in it: color, line, rectangle, square and circle.

- To use the line module from the drawing package:

import drawing.line 

from drawing import circle 

import matplotlib.pyplot as plt 

from matplotlib import pyplot as plt2 
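
To make the directory idea concrete, here is a minimal sketch that builds a throwaway `drawing` package in a temporary folder and then imports from it. The module contents (a `draw()` function in each file) are assumptions made for this example, not part of any real library.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a hypothetical "drawing" package inside a temporary directory.
root = Path(tempfile.mkdtemp())
pkg = root / "drawing"
pkg.mkdir()
(pkg / "__init__.py").write_text("")  # marks the directory as a regular package
(pkg / "line.py").write_text("def draw():\n    return 'drawing a line'\n")
(pkg / "circle.py").write_text("def draw():\n    return 'drawing a circle'\n")

# Make the temporary directory importable for this session.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import drawing.line          # access as drawing.line.<name>
from drawing import circle   # access as circle.<name>

line_msg = drawing.line.draw()
circle_msg = circle.draw()
print(line_msg)
print(circle_msg)
```

In a real project the package directory would simply sit next to your script or be installed, so the `sys.path` step would not be needed.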

### To install a new package:

conda install <package_name> 

Or  

pip install <package_name> 

### Importing NumPy and creating an array

In [2]:
import numpy as np

In [3]:
arr = np.array([11,22,33,44,55,66,77,88,99])
arr

array([11, 22, 33, 44, 55, 66, 77, 88, 99])

In [4]:
print("Values in array: ", arr)
print("Datatype: ", type(arr))
print("Depth / Dimension: ", arr.ndim)
print("Size of each dimension: ", arr.shape)
print("Value at index 5: ", arr[5])

Values in array:  [11 22 33 44 55 66 77 88 99]
Datatype:  <class 'numpy.ndarray'>
Depth / Dimension:  1
Size of each dimension:  (9,)
Value at index 5:  66
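
Besides `ndim` and `shape`, an array carries a few more attributes that are handy when inspecting data. A small sketch, rebuilding the same array so the snippet runs on its own (the exact integer dtype printed depends on the platform):

```python
import numpy as np

arr = np.array([11, 22, 33, 44, 55, 66, 77, 88, 99])

print("Element type:", arr.dtype)          # platform-dependent integer type
print("Number of elements:", arr.size)
print("Bytes per element:", arr.itemsize)
print("Total bytes of data:", arr.nbytes)  # size * itemsize
print("Last element:", arr[-1])            # negative indices count from the end
```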


### Data retrieval from a NumPy array

### Slicing or traversing the array:

Syntax: letters[start: stop: step] (the element at stop is not included)

Positive direction: letters[start: stop: 1] 

Negative direction: letters[start: stop: -1] 
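
One way to read the syntax: for bounds that lie inside the sequence, a slice visits the same indices as `range(start, stop, step)`. The sketch below checks this on a plain list; `slice_by_hand` is a helper invented for this example.

```python
letters = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']

def slice_by_hand(seq, start, stop, step):
    """Mimic seq[start:stop:step] for in-range, non-negative bounds."""
    return [seq[i] for i in range(start, stop, step)]

forward = letters[2:7:1]     # indices 2, 3, 4, 5, 6
backward = letters[6:2:-1]   # indices 6, 5, 4, 3

print(forward)
print(backward)
print(forward == slice_by_hand(letters, 2, 7, 1))
print(backward == slice_by_hand(letters, 6, 2, -1))
```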

In [10]:
letters  = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
letters

['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']

In [11]:
letter_np = np.array(letters)
letter_np

array(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'], dtype='<U1')

In [12]:
letter_np[2:7]

array(['C', 'D', 'E', 'F', 'G'], dtype='<U1')

In [13]:
letter_np[-8:-3]

array(['C', 'D', 'E', 'F', 'G'], dtype='<U1')

In [15]:
letter_np[6:2:-1]

array(['G', 'F', 'E', 'D'], dtype='<U1')

In [16]:
letter_np[-3:-9:-1]

array(['H', 'G', 'F', 'E', 'D', 'C'], dtype='<U1')

In [17]:
letter_np[3:-2:1]

array(['D', 'E', 'F', 'G', 'H'], dtype='<U1')

In [18]:
# Get values from index 2 to end of array
letter_np[2:] # Start: 2 End: end of list step: 1

array(['C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'], dtype='<U1')

In [19]:
# Get values from index 0 up to, but not including, index 7
letter_np[:7] # Start: 0 End: 7 step: 1 (End is not included)

array(['A', 'B', 'C', 'D', 'E', 'F', 'G'], dtype='<U1')

In [20]:
# Get values from index 2 up to, but not including, index 7
letter_np[2:7] # Start: 2 End: 7 step: 1 

array(['C', 'D', 'E', 'F', 'G'], dtype='<U1')

In [21]:
# Get all values
letter_np[:]

array(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'], dtype='<U1')

In [22]:
# Get all values
letter_np[::]

array(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'], dtype='<U1')

In [23]:
# Get all values in reverse order
letter_np[::-1]

array(['J', 'I', 'H', 'G', 'F', 'E', 'D', 'C', 'B', 'A'], dtype='<U1')

In [24]:
letter_np[::3] # Start: 0 End: end of list step: 3 (every third element)

array(['A', 'D', 'G', 'J'], dtype='<U1')

In [34]:
# Everything except the last element
letter_np[0:-1]  # Start: 0 End: -1 (last element excluded) step: 1

array(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'], dtype='<U1')

In [35]:
letter_np[:-1] 

array(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'], dtype='<U1')

In [33]:
# Last 5 elements in the array, in reverse order (stops before index 4)
letter_np[:4:-1]

array(['J', 'I', 'H', 'G', 'F'], dtype='<U1')

In [31]:
letter_np[-5:]

array(['F', 'G', 'H', 'I', 'J'], dtype='<U1')

In [36]:
# Reverse order
letter_np[::-1] 

array(['J', 'I', 'H', 'G', 'F', 'E', 'D', 'C', 'B', 'A'], dtype='<U1')

In [37]:
# Error: IndexError: index 20 is out of bounds for axis 0 with size 10
letter_np[20]

IndexError: index 20 is out of bounds for axis 0 with size 10

In [39]:
letter_np[0:20] # slicing clamps out-of-range bounds instead of raising IndexError

array(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'], dtype='<U1')

In [40]:
letter_np[20:]

array([], dtype='<U1')

In [42]:
letter_np[0:-6:1]

array(['A', 'B', 'C', 'D'], dtype='<U1')

In [44]:
letter_np[0:20:1]

array(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'], dtype='<U1')

In [45]:
letter_np[0:20:-1]

array([], dtype='<U1')
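
The out-of-range results above follow from how Python clamps slice bounds to the length of the sequence. As a sketch, the built-in `slice.indices(n)` method reports the `(start, stop, step)` actually used for a sequence of length `n`. It is shown here on a plain list; NumPy's basic slicing gave the same results in the cells above.

```python
letters = list("ABCDEFGHIJ")
n = len(letters)

# Bounds past the end are clamped to the length of the sequence.
print(slice(0, 20).indices(n))       # (0, 10, 1)
print(slice(20, None).indices(n))    # (10, 10, 1): start == stop, so empty
print(slice(0, 20, -1).indices(n))   # (0, 9, -1): cannot count down from 0 to 9, so empty

print(letters[0:20])
print(letters[20:])
print(letters[0:20:-1])
```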

In [47]:
x = 1
y = 2
letter_np[x:x+y]

array(['B', 'C'], dtype='<U1')

In [48]:
letter_np[0:len(letter_np)-3]

array(['A', 'B', 'C', 'D', 'E', 'F', 'G'], dtype='<U1')

### Arrays can hold only one datatype

#### Integer array

In [49]:
mArr = np.array([11,2,2,34,5])
mArr

array([11,  2,  2, 34,  5])

#### All integers are promoted to float

In [56]:
mArr = np.array([11,2,2,34,5,45,64.45,34.68])
mArr

array([11.  ,  2.  ,  2.  , 34.  ,  5.  , 45.  , 64.45, 34.68])

#### Boolean --> Integer

In [57]:
mArr = np.array([11,2,2,34,5, True, False, True])
mArr

array([11,  2,  2, 34,  5,  1,  0,  1])

#### Boolean --> Integer  --> Float

In [59]:
mArr = np.array([11,2,2,34,5, True, False, True, 2.3])
mArr

array([11. ,  2. ,  2. , 34. ,  5. ,  1. ,  0. ,  1. ,  2.3])

#### Boolean --> Integer  --> Float --> String

In [61]:
mArr = np.array([11,2,2,34,5, True, False, True, 2.3,"vishali"])
mArr

array(['11', '2', '2', '34', '5', 'True', 'False', 'True', '2.3',
       'vishali'], dtype='<U32')
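
The promotion chain can also be checked programmatically rather than by reading the printed arrays. A minimal sketch using `dtype.kind`, a one-letter code that does not depend on the platform's integer width (the variable names are chosen for this example):

```python
import numpy as np

ints = np.array([11, 2, 34])
with_bool = np.array([11, 2, True, False])
with_float = np.array([11, 2, True, 2.3])
with_str = np.array([11, 2.3, True, "vishali"])

# dtype.kind: 'i' = signed integer, 'f' = float, 'U' = unicode string
for name, a in [("ints", ints), ("with_bool", with_bool),
                ("with_float", with_float), ("with_str", with_str)]:
    print(name, a.dtype, a.dtype.kind)

# The dtype can also be requested explicitly instead of being inferred.
as_float = np.array([11, 2, 34], dtype=float)
print(as_float)
```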

### List:

- Data retrieval is slower: a list stores references to separate Python objects that may be scattered across memory, so each element has to be looked up individually.

- Element-wise work needs an explicit Python loop, which adds loop overhead.

### Numpy array:

- Operations are vectorized: a single expression is applied to the whole array at once.

- Faster than a list for numerical work.

- A list cannot do this without an explicit loop.

- NumPy's core is implemented in compiled C, so its internal loops run much faster than loops executed by the regular Python interpreter.

- There is no Python-level loop overhead.

- Vector operations are supported directly.

NumPy arrays take significantly less memory than Python lists, because the elements are stored in one contiguous block with a single, fixed data type. The data type can also be specified explicitly, which allows further optimisation of the code.

In short, NumPy data structures perform better in two ways. Size: they take up less space. Performance: element-wise operations are faster than the equivalent loop over a list.
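
The size and speed claims can be measured on your own machine. The following is a rough sketch, not a rigorous benchmark: the element count `n` and the repeat count are arbitrary choices, and the timings will vary between systems.

```python
import sys
import timeit
import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# Memory: the list holds pointers plus one Python int object per element.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
array_bytes = np_arr.nbytes  # raw data buffer only: 8 bytes per int64 element
print("list bytes :", list_bytes)
print("array bytes:", array_bytes)

# Speed: multiply every element by 10, once with a Python loop, once vectorized.
loop_time = timeit.timeit(lambda: [x * 10 for x in py_list], number=20)
vec_time = timeit.timeit(lambda: np_arr * 10, number=20)
print("list comprehension:", loop_time)
print("vectorized NumPy  :", vec_time)
```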

In [62]:
mList = [11,22,33,44,55,66,77,88,99]
for i in range(len(mList)):
    print(mList[i]*10)

110
220
330
440
550
660
770
880
990


In [67]:
mArr = [11,22,33,44,55,66,77,88,99]

In [68]:
mArr = np.array(mArr)
mArr

array([11, 22, 33, 44, 55, 66, 77, 88, 99])

In [69]:
mArr + 100

array([111, 122, 133, 144, 155, 166, 177, 188, 199])

In [70]:
mArr * 100

array([1100, 2200, 3300, 4400, 5500, 6600, 7700, 8800, 9900])

In [71]:
mArr / 100

array([0.11, 0.22, 0.33, 0.44, 0.55, 0.66, 0.77, 0.88, 0.99])

In [77]:
mArr = [11,22,33,44,55,66,77,88,99]
mArr = np.array(mArr)
mArr > 50

array([False, False, False, False,  True,  True,  True,  True,  True])

In [79]:
mArr[mArr > 50]

array([55, 66, 77, 88, 99])

In [80]:
mArr[mArr > 50]*100

array([5500, 6600, 7700, 8800, 9900])

In [76]:
# Indices of the values in the array that satisfy the condition
np.where(mArr > 50)

(array([4, 5, 6, 7, 8], dtype=int64),)
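
Boolean masks can be combined, and `np.where` also has a three-argument form that chooses between two sources of values. A short sketch on the same data (the names `in_range`, `idx` and `capped` are made up for this example):

```python
import numpy as np

mArr = np.array([11, 22, 33, 44, 55, 66, 77, 88, 99])

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
in_range = (mArr > 30) & (mArr < 80)
print(mArr[in_range])

# np.where(condition) returns a tuple with one index array per dimension.
idx = np.where(in_range)[0]
print(idx)

# np.where(condition, a, b) takes from a where the condition holds, otherwise from b.
capped = np.where(mArr > 50, 50, mArr)
print(capped)
```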

In [81]:
mArr[-3:]*100

array([7700, 8800, 9900])

In [82]:
mArr

array([11, 22, 33, 44, 55, 66, 77, 88, 99])

In [84]:
mArr[-3:] = mArr[-3:]*100
mArr

array([  11,   22,   33,   44,   55,   66, 7700, 8800, 9900])

In [86]:
mArr = mArr[-3:]
mArr

array([7700, 8800, 9900])
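
One detail worth knowing when assigning through slices: a basic slice of a NumPy array is a view onto the same data, not a copy. A minimal sketch with a fresh array (`original`, `view` and `independent` are names chosen for this example):

```python
import numpy as np

original = np.array([11, 22, 33, 44, 55])

view = original[-3:]   # basic slicing returns a view, not a copy
view[0] = -1           # writes through to `original`
print(original)

independent = original[-3:].copy()   # an explicit copy owns its own data
independent[1] = -2
print(original)        # unchanged by the edit to the copy

print(np.shares_memory(original, view), np.shares_memory(original, independent))
```

If you want to keep the original values intact while experimenting, take a `.copy()` first.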

In [94]:
mArr = [11,22,33,44,55,66,77,88,99.0]
mArr = np.array(mArr)
mArr[mArr > 50] = mArr[mArr > 50]/10.0
mArr

array([11. , 22. , 33. , 44. ,  5.5,  6.6,  7.7,  8.8,  9.9])

In [96]:
np.linspace(1,10,num=10)
# Returns 10 evenly spaced values from 1 to 10 (both ends included)

array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

In [97]:
np.linspace(2,5,num=10)
# Returns 10 evenly spaced values from 2 to 5 (both ends included)

array([2.        , 2.33333333, 2.66666667, 3.        , 3.33333333,
       3.66666667, 4.        , 4.33333333, 4.66666667, 5.        ])
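
The spacing that `linspace` uses is `(stop - start) / (num - 1)` when the endpoint is included. A small sketch that asks NumPy for the step via `retstep=True` and compares it with that formula:

```python
import numpy as np

values, step = np.linspace(2, 5, num=10, retstep=True)
print(step)                                    # spacing between neighbours
print(np.isclose(step, (5 - 2) / (10 - 1)))

# With endpoint=False the stop value is left out,
# so the spacing becomes (stop - start) / num.
open_values = np.linspace(2, 5, num=10, endpoint=False)
print(open_values[-1])
```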

In [101]:
arr1 = np.round(np.linspace(5, 20, num=10),2)
arr2 = np.round(np.linspace(10, 25, num=10),2)

In [102]:
arr1

array([ 5.  ,  6.67,  8.33, 10.  , 11.67, 13.33, 15.  , 16.67, 18.33,
       20.  ])

In [103]:
arr2

array([10.  , 11.67, 13.33, 15.  , 16.67, 18.33, 20.  , 21.67, 23.33,
       25.  ])

In [111]:
arr1.shape

(10,)

In [112]:
arr2.shape

(10,)

In [109]:
arr1.shape == arr2.shape

True

In [110]:
# Element-wise addition (index to index)
arr1 + arr2

array([15.  , 18.34, 21.66, 25.  , 28.34, 31.66, 35.  , 38.34, 41.66,
       45.  ])

In [113]:
# Element-wise power: each arr1 value raised to the matching arr2 value
arr1**arr2

array([9.76562500e+06, 4.14522856e+09, 1.87153464e+12, 1.00000000e+15,
       6.13862594e+17, 4.15084617e+20, 3.32525673e+23, 3.01539665e+26,
       2.94835862e+29, 3.35544320e+32])

In [114]:
# Element-wise multiplication
arr1*arr2

array([ 50.    ,  77.8389, 111.0389, 150.    , 194.5389, 244.3389,
       300.    , 361.2389, 427.6389, 500.    ])
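
The shape check matters because index-to-index operations need compatible shapes. A sketch of what happens when two 1D arrays differ in length, compared with a scalar, which is applied to every element (`a`, `b` and `shifted` are names made up for this example):

```python
import numpy as np

a = np.linspace(5, 20, num=10)
b = np.linspace(10, 25, num=9)   # one element short on purpose

try:
    a + b
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# A scalar, by contrast, is broadcast to every element.
shifted = a + 100
print(shifted[:3])
```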

### 2D arrays / Matrix

In [116]:
arr = np.arange(10,20,0.1)
arr

array([10. , 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 11. ,
       11.1, 11.2, 11.3, 11.4, 11.5, 11.6, 11.7, 11.8, 11.9, 12. , 12.1,
       12.2, 12.3, 12.4, 12.5, 12.6, 12.7, 12.8, 12.9, 13. , 13.1, 13.2,
       13.3, 13.4, 13.5, 13.6, 13.7, 13.8, 13.9, 14. , 14.1, 14.2, 14.3,
       14.4, 14.5, 14.6, 14.7, 14.8, 14.9, 15. , 15.1, 15.2, 15.3, 15.4,
       15.5, 15.6, 15.7, 15.8, 15.9, 16. , 16.1, 16.2, 16.3, 16.4, 16.5,
       16.6, 16.7, 16.8, 16.9, 17. , 17.1, 17.2, 17.3, 17.4, 17.5, 17.6,
       17.7, 17.8, 17.9, 18. , 18.1, 18.2, 18.3, 18.4, 18.5, 18.6, 18.7,
       18.8, 18.9, 19. , 19.1, 19.2, 19.3, 19.4, 19.5, 19.6, 19.7, 19.8,
       19.9])

In [117]:
arr = np.arange(20)
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

The new shape (row, col) must satisfy row * col = 20.

Why 20? Because the array has 20 elements, and reshaping cannot add or drop any.

In [118]:
# Reshape it to 2 dimensions (5 rows, 4 columns)
m1 = np.reshape(arr, (5,4))
m1

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

In [122]:
m1.shape

(5, 4)
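
Two more things about reshaping are easy to check: NumPy can infer one dimension if it is given as `-1`, and a shape whose product does not match the element count is rejected. A minimal sketch:

```python
import numpy as np

arr = np.arange(20)

# -1 asks NumPy to infer that dimension from the element count: 20 / 4 = 5.
inferred = arr.reshape(4, -1)
print(inferred.shape)

# A shape whose product is not 20 raises a ValueError.
try:
    arr.reshape(3, 7)
    reshape_error = None
except ValueError as exc:
    reshape_error = str(exc)
print(reshape_error)
```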

In [119]:
m2 = np.array([[0,4,8,12],
               [16,1,5,9],
               [13,17,2,6],
               [10,14,18,3],
               [7,11,15,19]])
m2

array([[ 0,  4,  8, 12],
       [16,  1,  5,  9],
       [13, 17,  2,  6],
       [10, 14, 18,  3],
       [ 7, 11, 15, 19]])

In [120]:
m2.shape

(5, 4)

In [121]:
m1*m2

array([[  0,   4,  16,  36],
       [ 64,   5,  30,  63],
       [104, 153,  20,  66],
       [120, 182, 252,  45],
       [112, 187, 270, 361]])
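
Note that `*` on two 2D arrays multiplies element by element. It is not the matrix product from linear algebra. As a sketch of the difference, the `@` operator computes the matrix product, which here requires transposing `m2` so that the inner dimensions match:

```python
import numpy as np

m1 = np.arange(20).reshape(5, 4)
m2 = np.array([[0, 4, 8, 12],
               [16, 1, 5, 9],
               [13, 17, 2, 6],
               [10, 14, 18, 3],
               [7, 11, 15, 19]])

elementwise = m1 * m2        # shape (5, 4): value-by-value products
matrix_product = m1 @ m2.T   # (5, 4) @ (4, 5) -> shape (5, 5)

print(elementwise.shape, matrix_product.shape)
print(matrix_product[0, 0])  # row 0 of m1 dotted with row 0 of m2
```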

In [123]:
m1[3] # Row at index 3 (the fourth row, since indexing starts at 0)

array([12, 13, 14, 15])

In [124]:
m1[3,2] # Specific value from the matrix: row index 3, column index 2

14

In [126]:
m1[2:4] # Specific rows from the matrix

array([[ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [127]:
m1[1:4,0:2] # Rows 1 to 3, columns 0 to 1

array([[ 4,  5],
       [ 8,  9],
       [12, 13]])
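
The same `[rows, columns]` pattern covers a few other common selections. A short sketch that rebuilds `m1` so it runs on its own (the result names are chosen for this example):

```python
import numpy as np

m1 = np.arange(20).reshape(5, 4)

second_column = m1[:, 1]     # every row, column index 1
last_row = m1[-1]            # negative indices count from the end
every_other_row = m1[::2]    # rows 0, 2 and 4
big_values = m1[m1 > 15]     # a boolean mask returns a flat 1D array

print(second_column)
print(last_row)
print(every_other_row.shape)
print(big_values)
```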