# Neoscholar Machine Learning Tutorials
### Session 01. Introduction to Numpy, Pandas, and Matplotlib

### Contents
1. Numpy
2. Pandas
3. Matplotlib
4. EDA (Exploratory Data Analysis)

### Aim
At the end of this session, you will be able to:
- Understand the basics of numpy.
- Understand the basics of pandas.
- Understand the basics of matplotlib.
- Perform an Exploratory Data Analysis (EDA).

## 1. Numpy
Python has become a popular programming language for data science because it is easy to learn and is supported by many scientific computing libraries. Numpy is one of the most important of these libraries: it handles numerical computation and lets you work with multi-dimensional arrays more easily and efficiently than with plain Python lists.
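
As a small illustration of the point above, the sketch below scales a few numbers first with a plain Python list and then with a numpy array. The names `values`, `scaled_list`, `values_arr`, and `scaled_arr` are made up for this example.

```python
import numpy as np

# Plain Python: visit every element explicitly (here via a list comprehension).
values = [1.0, 2.0, 3.0, 4.0]
scaled_list = [v * 2.5 for v in values]

# Numpy: one expression is applied to the whole array at once.
values_arr = np.array(values)
scaled_arr = values_arr * 2.5

print(scaled_list)  # [2.5, 5.0, 7.5, 10.0]
print(scaled_arr)
```

Both versions produce the same numbers; the numpy version avoids the explicit loop, which is where most of its convenience and speed on large arrays comes from.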

### 1.1 Basics of Numpy

In [None]:
# Run this cell if you haven't installed numpy yet
!pip install numpy

In [None]:
import numpy as np
print(np.__version__)

In [None]:
a = np.array([1, 2, 3, 4]) # create a rank 1 array
print("Type of a: ", type(a))
print("Shape of a: ", a.shape)
print("The first element of a: ", a[0])
print("The last element of a: ", a[-1])
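
The same attributes and indexing carry over to arrays with more than one dimension. A minimal sketch, using a made-up 2-D array `m`:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])  # a 2-D array with 2 rows and 3 columns

print(m.ndim)      # number of dimensions: 2
print(m.shape)     # (2, 3)
print(m[0, 1])     # row 0, column 1 -> 2
print(m[-1, -1])   # last row, last column -> 6
```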

### How do you initialise numpy arrays and matrices?

In [None]:
"""
TODO: Replace each 'None' with the appropriate answer
e.g. b = np.None((2, 2)) --> b = np.ones((2, 2))
"""

# create a matrix full of ones
b = np.None((2, 2))
print("Matrix b")
print(b)

# create a matrix full of zeros
c = np.None((2, 3))
print("\nMatrix c")
print(c)

# create an identity matrix
d = np.None(3)
print("\nMatrix d")
print(d)

# create a matrix filled with random numbers between 0 and 1
e = np.None((2, 2))
print("\nMatrix e")
print(e)

# create an array holding the integers 0-9 in ascending order
# expected output: [0 1 2 3 4 5 6 7 8 9]
f = np.None(10)
print("\nMatrix f")
print(f)

# create a matrix placeholder without initializing its entries (the printed values are arbitrary)
g = np.None((5, 3))
print("\nMatrix g")
print(g)


### 1.2 Matrix Calculation

- `np.transpose(a)` : Transpose of an array
- `np.dot(a, b)` : Dot product of two arrays (matrix multiplication for 2-D arrays)
- `np.linalg.inv(a)` : Inverse of a matrix (only defined for square, n x n, non-singular matrices)
- `np.diagonal(a)` : Diagonal elements of an array
- `a.reshape(x, y)` : Reshape an array to x rows and y columns
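
Before working on the exercises, it may help to see these calls on a tiny matrix. The following is only a sketch on a made-up 2 x 2 array `a`; the variable names are not part of the exercises.

```python
import numpy as np

a = np.array([[1.0, 2.0],
              [3.0, 4.0]])

a_t = np.transpose(a)       # rows become columns
product = np.dot(a, a_t)    # (2x2) dot (2x2) gives a 2x2 matrix product
a_inv = np.linalg.inv(a)    # works because a is square and non-singular
diag = np.diagonal(a)       # elements a[0, 0] and a[1, 1]
flat = a.reshape(1, 4)      # same 4 elements, arranged as 1 row x 4 columns

print(product)
# A matrix times its inverse gives the identity matrix (up to rounding).
print(np.allclose(np.dot(a, a_inv), np.eye(2)))  # True
```

Note that `reshape` takes the new dimensions as positional arguments (or as a single tuple), and the total number of elements must stay the same.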

In [None]:
x = np.array([
    [3, 11, 1],
    [7, 5, 2],
    [6, 8, 9],
    [0, 10, 4]
])

In [None]:
# To Do: Transpose an array

# Expected outcome:
# [[ 3  7  6  0]
#  [11  5  8 10]
#  [ 1  2  9  4]]
transposed = None
transposed

In [None]:
# To Do: Dot product of two arrays: the original x and transposed
# (4x3) dot (3x4) should give you (4x4)

# Expected outcome:
# [[131  78 115 114]
#  [ 78  78 100  58]
#  [115 100 181 116]
#  [114  58 116 116]]
y = None
y

In [None]:
# TODO: Multiply 'broadcaster' and 'transposed' elementwise
# You will see what 'broadcasting' means once you check your result.

# Expected outcome for the variable 'elementwise_broadcasting':
# [[ 0  0  0  0]
#  [11  5  8 10]
#  [ 2  4 18  8]]

broadcaster = np.array([
    [0],
    [1],
    [2]
])
print("broadcaster: \n{}\n".format(broadcaster))
print("transposed: \n{}\n".format(transposed))

elementwise_broadcasting = None
print("broadcasted: \n{}".format(elementwise_broadcasting))
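
Broadcasting is not limited to multiplication or to column vectors. As a hedged sketch on made-up arrays `grid` and `offsets`, numpy also stretches a 1-D array across the rows of a 2-D array when their trailing dimensions match:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])       # shape (2, 3)
offsets = np.array([10, 20, 30])   # shape (3,)

# offsets is reused for every row of grid, as if it had shape (2, 3).
shifted = grid + offsets
print(shifted)
# [[11 22 33]
#  [14 25 36]]
```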

In [None]:
# To Do: Extract the diagonal elements of an array x
# Expected outcome: [3 5 9]
diagonal = None
print(diagonal)

In [None]:
# To Do: Reshape an array x to one that has 6 rows and 2 columns
# Expected outcome: 
# [[ 3 11]
#  [ 1  7]
#  [ 5  2]
#  [ 6  8]
#  [ 9  0]
#  [10  4]]
reshaped = x.None(6, 2)
print(reshaped)

### 1.3 Statistics in Numpy

Most of these functions are self-explanatory.

- `np.sum()` : Sum of all elements in an array
- `np.max()` : Maximum value of an array
- `np.min()` : Minimum value of an array
- `np.mean()` : Mean of the elements in an array
- `np.median()` : Median of the elements in an array
- `np.var()` : Variance
- `np.std()` : Standard deviation
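
All of these functions also accept an `axis` argument for 2-D arrays, and `np.var()` and `np.std()` compute the population versions by default (they divide by n, not n - 1). A minimal sketch on a made-up array `scores`:

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(np.sum(scores))           # 21
print(np.mean(scores))          # 3.5
print(np.sum(scores, axis=0))   # column sums: [5 7 9]
print(np.max(scores, axis=1))   # row maxima: [3 6]

variance = np.var(scores)       # mean of squared deviations from the mean
std_dev = np.std(scores)        # square root of the variance
print(variance, std_dev)
```

With `axis=0` the function collapses the rows and returns one value per column; with `axis=1` it returns one value per row.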

In [None]:
x = np.array(
    [34, 56, 6, 3, 9, 89, 120, 12, 201],
    dtype = np.int32
)

In [None]:
# To Do: Summation of elements 
# Expected outcome: 530
summation = None
print(summation)

In [None]:
# To Do: Minimum element in the array
# Expected outcome: 3
minimum = None
print(minimum)

In [None]:
# To Do: Maximum element in the array
# Expected outcome: 201
maximum = None
print(maximum)

In [None]:
# To Do: Average value of elements in the array
# Expected outcome: 58.89
mean = None
print(mean)

In [None]:
# To Do: Median element in the array
# Expected outcome: 34.0
median = None
print(median)

In [None]:
# To Do: Variance of x
# Expected outcome: 4008.098765432099
variance = None
print(variance)

In [None]:
# To Do: Standard deviation of the array
# Expected outcome: 63.3095471902311
std = None
print(std)

### 1.4 Exercise

In [None]:
x = np.array([
    [1, 52, 22, 2, 31, 65, 7, 8, 24, 10],
    [12, 2322, 33, 1, 2, 3, 99, 24, 1, 42],
    [623, 24, 3, 56, 5, 2, 7, 85, 22, 110],
    [63, 4, 3, 4, 5, 64, 7, 82, 3, 20],
    [48, 8, 3, 24, 57, 63, 7, 8, 9, 1032],
    [33, 64, 0, 24, 5, 6, 72, 832, 3, 10],
    [12, 242, 2, 11, 52, 63, 32, 8, 96, 2],
    [13, 223, 52, 4, 35, 62, 7, 8, 9, 10],
    [19, 2, 3, 149, 15, 6, 172, 2, 2, 11],
    [34, 23, 32, 24, 54, 63, 1, 5, 92, 7]
])

In [None]:
x.shape
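
The exercises below rely on slicing, so here is a short sketch of the syntax on a made-up 3 x 3 array `table`. It deliberately picks different rows and columns from the ones the exercises ask for.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

middle_row = table[1]      # a single index selects a whole row
middle_col = table[:, 1]   # ':' keeps every row, 1 picks the column
top_left = table[:2, :2]   # first two rows and first two columns

print(middle_row)  # [4 5 6]
print(middle_col)  # [2 5 8]
print(top_left)
```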

In [None]:
# To Do: Extract the first column of x
# expected outcome: [1 12 623 63 48 33 12 13 19 34]
firstcol_x = None
print(firstcol_x)

In [None]:
# To Do: extract the last row of x
# expected outcome: [34 23 32 24 54 63 1 5 92 7]
lastrow_x = None
print(lastrow_x)

In [None]:
# To Do: calculate the mean of elements in the last row
# expected outcome: 33.5
mean_lastrow = None
print(mean_lastrow)

In [None]:
# To Do: extract the diagonal components of x
# expected outcome: [1 2322 3 4 57 6 32 8 2 7]
diag_x = None
print(diag_x)

In [None]:
# To Do: calculate the variance of the diagonal components of x
# expected outcome: 479979.9600000001
var_diag = None
print(var_diag)

### 1.5 One More Numpy Problem (Optional)

In [None]:
def solution():
    prime = [2, 3, 5, 7, 11]

    matrix = [
        # TODO: Build your own matrix from the list above, named 'prime'
        # TODO: Try doing something clever. Don't just type out lots of numbers.
        # Hint: There is no single right answer, but why not try a list comprehension?
                [None],
                [None],
                [None],
                [None],
                [None]
              ]
    # TODO: convert it to a numpy array
    matrix = None
    # TODO: What is the diagonal of the above matrix?
    matrix_dia = None

    # TODO: What are the sum and mean of the diagonal components?
    dia_sum = None
    dia_mean = None

    return matrix, dia_sum, dia_mean
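
The hint above mentions list comprehensions. As a minimal sketch of the idea, the example below builds a small multiplication table from a made-up list `base` (not the `prime` list from the exercise) and converts it to a numpy array. It is one possible pattern, not the expected answer.

```python
import numpy as np

base = [1, 2, 3]  # hypothetical input, not the 'prime' list from the exercise

# The outer comprehension builds the rows; the inner one fills each row.
table = [[r * c for c in base] for r in base]
print(table)  # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]

table_arr = np.array(table)
print(table_arr.shape)  # (3, 3)
```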

In [None]:
# Function for printing your answers
# Do not change this function.
def print_answer(**kwargs):
    for key in kwargs.keys():
        print(key, ":", kwargs[key])

In [None]:
matrix, dia_sum, dia_mean = solution()

In [None]:
print_answer(matrix=matrix, dia_sum=dia_sum, dia_mean=dia_mean)

### What to do next?
Helpful websites for further study of numpy:
- [A Visual Intro to NumPy and Data Representation](https://jalammar.github.io/visual-numpy/?fbclid=IwAR2MT-imY4dKpUcfHWfjdPOROUBadObVO7Wftf1detHWZCxSwNeA5paVI08)
- [Stanford CS231n Python Numpy Tutorial](http://cs231n.github.io/python-numpy-tutorial/)
- [DataCamp Python Numpy Array Tutorial](https://www.datacamp.com/community/tutorials/python-numpy-tutorial)
- [Machine Learning Plus 101 Numpy Exercises for Data Analysis (Python)](https://www.machinelearningplus.com/python/101-numpy-exercises-python/)