# YME: Data Science with Machine Learning
Introduction

---
## The NumPy Library
<div>
<img src="images/numpy_logo.png" width="500px" align = "left"/>
</div>

#### __NumPy is the fundamental package for scientific computing with Python. It provides a high-performance multidimensional array object, and tools for working with these arrays. It features:__

- flexibility with N-dimensional arrays
- broadcasting functions
- linear algebra and other operations on data
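As a rough illustration of the broadcasting feature listed above, the sketch below combines arrays of different shapes. The names `matrix`, `row`, `shifted`, and `doubled` are made up for this example.

```python
import numpy as np

# A 2-D array of shape (2, 3) and a 1-D array of shape (3,)
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
row = np.array([10, 20, 30])

# Broadcasting applies `row` to every row of `matrix`
# without an explicit loop or a manual copy.
shifted = matrix + row
print(shifted)

# A scalar is broadcast to every element of the array.
doubled = matrix * 2
print(doubled)
```

Broadcasting works here because the trailing dimension of `matrix` (3) matches the length of `row`.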

<div>
<img src="images/numpy_array_t.png" width="500px" align = "left"/>
</div>

#### __An array is an N-dimensional structure that holds regularly shaped data. Each dimension in NumPy is called an "axis", and axes are numbered starting from 0.__
#### For instance, in a 2-dimensional array, axis=0 runs down the rows and axis=1 runs across the columns.
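A minimal sketch of what the axis numbering means in practice, using a small illustrative array (`grid` and the result names are assumptions for this example). Reducing along an axis collapses that axis:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])  # shape (2, 3): 2 rows, 3 columns

# axis=0 collapses the rows, leaving one sum per column: 5, 7, 9
col_sums = grid.sum(axis=0)

# axis=1 collapses the columns, leaving one sum per row: 6, 15
row_sums = grid.sum(axis=1)

print(col_sums)
print(row_sums)
```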


In [2]:
import numpy as np  # Python library for scientific numerical computation

---
## Array Creation in NumPy
#### __NumPy array creation functions__
| Function   | Description                        |
| ---------- | ---------------------------------- |
| np.zeros   | n-D array of zeros                 |
| np.ones    | n-D array of ones                  |
| np.full    | n-D array of a constant value      |
| np.eye     | Identity matrix of a specific size |
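The functions in the table can be tried out as follows. This is a small sketch; the shapes and the fill value 7 are arbitrary choices for illustration.

```python
import numpy as np

zeros = np.zeros((2, 3))     # 2x3 array filled with 0.0
ones = np.ones((2, 3))       # 2x3 array filled with 1.0
sevens = np.full((2, 3), 7)  # 2x3 array filled with the constant 7
identity = np.eye(3)         # 3x3 identity matrix

for name, arr in [("zeros", zeros), ("ones", ones),
                  ("full", sevens), ("eye", identity)]:
    print(name, '\n', arr, '\n')
```

Note that `np.zeros`, `np.ones`, and `np.full` take the shape as a single tuple, while `np.eye` takes the number of rows as an integer.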

#### __Create a 1-D array by explicitly entering values__

In [32]:
input_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
v1 = np.array(input_list, dtype=float)
# v1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], dtype=float)
print(v1)

[ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11.]


#### __Create a 1-D array with linearly spaced values__

In [4]:
v2 = np.linspace(start=0, stop=1, num=11, dtype=float)
# v2 = np.linspace(0, 1, 11) # This syntax has the same outcome
print(v2)

[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]


#### __Create a 1-D array with random values drawn from the standard normal distribution__

In [10]:
v3 = np.random.randn(11)
print(v3)

[ 0.9059037   0.96559971 -0.82803761  1.16395473  0.6499133   0.76129816
  0.08431962  0.61118338 -0.44014893  1.8163801  -1.06633657]


In [35]:
help(np.random.randn)

Help on built-in function randn:

randn(...) method of numpy.random.mtrand.RandomState instance
    randn(d0, d1, ..., dn)
    
    Return a sample (or samples) from the "standard normal" distribution.
    
    .. note::
        This is a convenience function for users porting code from Matlab,
        and wraps `numpy.random.standard_normal`. That function takes a
        tuple to specify the size of the output, which is consistent with
        other NumPy functions like `numpy.zeros` and `numpy.ones`.
    
    If positive int_like arguments are provided, `randn` generates an array
    of shape ``(d0, d1, ..., dn)``, filled
    with random floats sampled from a univariate "normal" (Gaussian)
    distribution of mean 0 and variance 1. A single float randomly sampled
    from the distribution is returned if no argument is provided.
    
    Parameters
    ----------
    d0, d1, ..., dn : int, optional
        The dimensions of the returned array, must be non-negative.
        If no argu
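The note in the help text above says that `randn` wraps `numpy.random.standard_normal`, which takes the shape as a tuple. A small sketch of that relationship, assuming the legacy global random state and an arbitrary seed of 0:

```python
import numpy as np

# randn takes the dimensions as separate positional arguments
np.random.seed(0)  # arbitrary seed, chosen only to make the run repeatable
a = np.random.randn(2, 3)

# standard_normal takes the shape as a single tuple
np.random.seed(0)  # reset to the same seed so both calls draw the same numbers
b = np.random.standard_normal((2, 3))

print(a.shape, b.shape)
print(np.array_equal(a, b))
```

With the same seed, both calls should produce the same samples; only the way the shape is passed differs.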

---
## Array Manipulation in NumPy
#### __NumPy array arithmetic methods__
```python 
np.add(X1, X2) # Pass the two arrays to operate on; the result is computed element by element
```

| Function    |          Description              |
| --------- | --------------------------------- |
| `np.add` |   Add two arrays |
| `np.subtract` |   Subtract two arrays   |
| `np.multiply` |   Multiply two arrays |
| `np.dot` |   Dot product of two arrays   |
| `np.cross` |   Cross product of two arrays |
| `np.divide` |   Divide two arrays   |
| `np.sin` |   Apply sine to the array      |
| `np.cos`    |   Apply cosine to the array   |
| `np.exp` | Apply exponential to the array |
| `np.log` | Apply logarithm to the array |
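Several of the functions in the table have operator equivalents. The sketch below checks this on two small fixed arrays, chosen here only for illustration.

```python
import numpy as np

X1 = np.array([[4, 2, 4],
               [4, 2, 1],
               [3, 1, 3]])
X2 = np.array([[1, 1, 3],
               [3, 4, 1],
               [1, 2, 2]])

# Element-wise functions and their operator forms give the same result
same_add = np.array_equal(np.add(X1, X2), X1 + X2)
same_sub = np.array_equal(np.subtract(X1, X2), X1 - X2)
same_mul = np.array_equal(np.multiply(X1, X2), X1 * X2)
same_div = np.allclose(np.divide(X1, X2), X1 / X2)

# For 2-D arrays, np.dot is matrix multiplication, also written with @
matrix_product = X1 @ X2
same_dot = np.array_equal(np.dot(X1, X2), matrix_product)

print(same_add, same_sub, same_mul, same_div, same_dot)
print(matrix_product)
```

Keep in mind that `*` and `np.multiply` are element-wise, whereas `@` and `np.dot` perform matrix multiplication on 2-D arrays.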

#### __Create 2-D arrays with random integers__

In [30]:
X1 = np.random.randint(low=1, high=5, size=(3, 3)) # Create a 3x3 array with random integers from 1 to 4 (high is exclusive)
X2 = np.random.randint(low=1, high=5, size=(3, 3)) # Create a 3x3 array with random integers from 1 to 4 (high is exclusive)
print('Array 1:\n', X1, '\n')
print('Array 2:\n', X2, '\n')

Array 1:
 [[4 2 4]
 [4 2 1]
 [3 1 3]] 

Array 2:
 [[1 1 3]
 [3 4 1]
 [1 2 2]] 



#### __Array Mathematical Operations__

In [31]:
print('Addition:\n', np.add(X1, X2), '\n')
print('Subtraction:\n', np.subtract(X1, X2), '\n')
print('Multiplication:\n', np.multiply(X1, X2), '\n')
print('Dot Product:\n', np.dot(X1, X2), '\n')
print('Cross Product:\n', np.cross(X1, X2), '\n')
print('Division:\n', np.divide(X1, X2), '\n')
print('Reciprocal:\n', np.reciprocal(X1), '\n') # Integer input: results are truncated to integers

Addition:
 [[5 3 7]
 [7 6 2]
 [4 3 5]] 

Subtraction:
 [[ 3  1  1]
 [ 1 -2  0]
 [ 2 -1  1]] 

Multiplication:
 [[ 4  2 12]
 [12  8  1]
 [ 3  2  6]] 

Dot Product:
 [[14 20 22]
 [11 14 16]
 [ 9 13 16]] 

Cross Product:
 [[ 2 -8  2]
 [-2 -1 10]
 [-4 -3  5]] 

Division:
 [[4.         2.         1.33333333]
 [1.33333333 0.5        1.        ]
 [3.         0.5        1.5       ]] 

Reciprocal:
 [[0 0 0]
 [0 0 1]
 [0 1 0]] 
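The reciprocal output above contains only 0s and 1s because the input array has an integer dtype, so each 1/x is truncated to an integer. A minimal sketch of one way to get the fractional values, converting to float first (the array `X` reuses the values of Array 1 shown above):

```python
import numpy as np

X = np.array([[4, 2, 4],
              [4, 2, 1],
              [3, 1, 3]])

int_recip = np.reciprocal(X)                  # integer dtype: results truncated
float_recip = np.reciprocal(X.astype(float))  # float dtype: true 1/x values

print('Integer reciprocal:\n', int_recip, '\n')
print('Float reciprocal:\n', float_recip, '\n')
```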



In [40]:
my_list = [[1,1,1], [2,2,2], [3,3,3]]
my_array = np.array(my_list)
print(my_array)


[[1 1 1]
 [2 2 2]
 [3 3 3]]


---
## Array Inspection in NumPy

#### __NumPy array inspection attributes and methods__
```python 
my_list = [[1,1,1], [2,2,2], [3,3,3]]
a = np.array(my_list)
a.shape # Access the attribute directly on the array of interest
```

| Expression  |          Description              |
| --------- | --------------------------------- |
| `a.shape` |   Array dimensions |
| `len(a)` |   Length of the first axis (number of rows in a 2-D array) |
| `a.ndim` |   Number of array dimensions |
| `a.size` |   Number of array elements |
| `a.dtype` |   Data type of array elements |
| `a.dtype.name` |   Name of data type |
| `a.astype(int)` |   Convert an array to a different type |



#### __Determine size of array__

In [48]:
my_list = [[1,1,1], [2,2,2], [3,3,3]]
a = np.array(my_list)
print(a, '\n')

print('Array shape:', a.shape) # Dimensions of array
print('Array length:', len(a)) # Length of 3 (size of the first axis, i.e. the number of rows)
print('Array no. of dimensions:', a.ndim) # 2-dimensional array
print('Array no. of elements:', a.size) # Size of 9 elements
print('Array element datatype:', a.dtype) # Data type of array elements
print('Array element datatype:', a.dtype.name) # Name of the data type, as a string
print('Array datatype conversion:\n', a.astype(float)) # Return a copy with the elements converted to float

[[1 1 1]
 [2 2 2]
 [3 3 3]] 

Array shape: (3, 3)
Array length: 3
Array no. of dimensions: 2
Array no. of elements: 9
Array element datatype: int64
Array element datatype: int64
Array datatype conversion:
 [[1. 1. 1.]
 [2. 2. 2.]
 [3. 3. 3.]]
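Two details of the inspection calls above are easy to miss: `astype` returns a new array rather than changing the original, and `len(a)` equals the first entry of `a.shape`. A short sketch that checks both (the name `b` is introduced here for illustration):

```python
import numpy as np

a = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
b = a.astype(float)  # new array; `a` keeps its integer dtype

print('Original dtype:', a.dtype.name)
print('Converted dtype:', b.dtype.name)

# len(a) is the size of the first axis only
print('len(a) == a.shape[0]:', len(a) == a.shape[0])

# a.size is the product of all entries in a.shape
print('a.size == rows * columns:', a.size == a.shape[0] * a.shape[1])
```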


---
## The Pandas Library
<div>
<img src="images/pandas_logo.png" width="500px" align = "left"/>
</div>

Pandas is an open-source library that provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language. It is built on top of NumPy and works directly with NumPy arrays.

Instead of numbered axes, Pandas uses "index" (the row labels) and "columns" (the column labels) to describe the two dimensions of a table.

<div>
<img src="images/pandas_basic.png" width="500px" align = "left"/>
</div>
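A minimal sketch of how index and columns relate to the NumPy axes, assuming Pandas is installed. The labels `r0`..`r2` and `a`..`c` are made up for this example.

```python
import numpy as np
import pandas as pd

values = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])

# Wrap the NumPy array in a DataFrame with labelled rows and columns
df = pd.DataFrame(values, index=["r0", "r1", "r2"], columns=["a", "b", "c"])
print(df, '\n')

print('Index:', df.index.tolist())
print('Columns:', df.columns.tolist())

# axis=0 still means "along the index", axis=1 "along the columns"
col_sums = df.sum(axis=0)  # one sum per column
row_sums = df.sum(axis=1)  # one sum per row

# Convert back to a plain NumPy array
back = df.to_numpy()
```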

---
## The Matplotlib Library

<div>
<img src="images/matplotlib_logo.svg" width="500px" align = "left"/>
</div>

Matplotlib is a multi-platform Python library for data visualization, mainly 2D plots, built on NumPy arrays.

One of the greatest benefits of visualization is that it condenses large amounts of data into easily digestible graphics. Matplotlib supports many plot types, such as line, bar, scatter, and histogram plots.

Pyplot is a Matplotlib module that provides a MATLAB-like interface, with the advantage of being free and open source.
Some useful pyplot functions:

| Function    | Description                                                    |
| ----------- | -------------------------------------------------------------- |
| plt.axhline | Horizontal line across the axis                                |
| plt.axvline | Vertical line across the axis                                  |
| plt.boxplot | Box and whisker plot                                           |
| plt.plot    | Simple line plot                                               |
| plt.hist    | Histogram plot                                                 |
| plt.scatter | Scatter plot of y vs. x with varying marker size and/or color  |
| plt.xlim    | Get or set the x limits                                        |
| plt.xticks  | Get or set the current tick locations and labels of the x-axis |
| plt.ylabel  | Set the label for the y-axis                                   |
| plt.ylim    | Get or set the y limits                                        |
| plt.yticks  | Get or set the current tick locations and labels of the y-axis |
| plt.title   | Set the title of the plot                                      |
---
## The Scikit-Learn Library

<div>
<img src="images/scikit-learn_logo.png" width="500px" align = "left"/>
</div>

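Scikit-Learn is an open-source Python library for machine learning, built on NumPy and SciPy. It provides tools for tasks such as classification, regression, clustering, and model evaluation.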