# UCL AI Society Machine Learning Tutorials
### Session 01. Introduction to Numpy, Pandas and Matplotlib

### Contents
1. Numpy
2. Pandas
3. Matplotlib
4. EDA (Exploratory Data Analysis)

### Aim
At the end of this session, you will be able to:
- Understand the basics of numpy.
- Understand the basics of pandas.
- Understand the basics of matplotlib.
- Perform an Exploratory Data Analysis (EDA).


## 1. Numpy
Python is widely used in data science because it is easy to learn and is supported by many scientific computing libraries. Numpy is one of the most important of these libraries: it provides mathematical functions and makes computation on multi-dimensional arrays easier and more efficient.

### 1.1 Basics of Numpy

In [1]:
# Run this cell if you haven't installed numpy yet
!pip install numpy


In [1]:
import numpy as np
print(np.__version__)

1.19.2


In [2]:
a = np.array([1, 2, 3, 4]) # create a rank 1 array
print("Type of a: ", type(a))
print("Shape of a: ", a.shape)
print("The first element of a: ", a[0])
print("The last element of a: ", a[-1])

Type of a:  <class 'numpy.ndarray'>
Shape of a:  (4,)
The first element of a:  1
The last element of a:  4


### How do you initialise numpy arrays and matrices?

In [3]:
"""
TODO: Replace 'None's with appropriate answers
e.g. b = np.None((2, 2)) --> b = np.ones((2, 2))
"""

# create an array full of ones
b = np.ones((2, 2))
print("Matrix b")
print(b)

# create an array full of zeros
c = np.zeros((2, 3))
print("\nMatrix c")
print(c)

# create an identity matrix
d = np.eye(3)
print("\nMatrix d")
print(d)

# create an array filled with random numbers drawn uniformly from [0, 1)
e = np.random.random((2, 2))
print("\nMatrix e")
print(e)

# create an array which has 0-9 as its elements in sorted order
# expected output: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
f = np.arange(10)
print("\nMatrix f")
print(f)

# create a matrix placeholder without initializing its entries (the values are arbitrary, not guaranteed to be zero)
g = np.empty((5, 3))
print("\nMatrix g")
print(g)


Matrix b
[[1. 1.]
 [1. 1.]]

Matrix c
[[0. 0. 0.]
 [0. 0. 0.]]

Matrix d
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

Matrix e
[[0.52596048 0.49150044]
 [0.90222195 0.17024504]]

Matrix f
[0 1 2 3 4 5 6 7 8 9]

Matrix g
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
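
The zeros printed for `g` are incidental: `np.empty` only allocates memory, so its initial contents can be anything. A minimal sketch of the usual pattern, allocating first and then overwriting every entry (filling each row with its row index is an assumption made purely for illustration):

```python
import numpy as np

# Allocate a 5x3 placeholder; its contents are arbitrary at this point.
g = np.empty((5, 3))

# Overwrite every entry before reading from the array.
# Here each row is filled with its own row index (illustrative choice).
for i in range(g.shape[0]):
    g[i, :] = i

print(g)
```

If you need guaranteed zeros, `np.zeros` is the safer choice.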


### 1.2 Matrix Calculation

- `np.transpose()` : Transpose of an array
- `np.dot(a, b)` : Dot product of two arrays (matrix multiplication for 2-D arrays)
- `np.linalg.inv()` : Inverse of a matrix (only defined for square, i.e. n x n, non-singular matrices)
- `np.diagonal()` : Diagonal elements of an array
- `a.reshape(x, y)` : Reshape an array to x rows and y columns
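
`np.linalg.inv()` is listed above but not exercised in the cells that follow. A minimal sketch, assuming a small invertible matrix `m` that is made up for illustration and is not part of the exercises:

```python
import numpy as np

# Hypothetical 2x2 matrix; its determinant is 4*6 - 7*2 = 10, so it is invertible.
m = np.array([[4.0, 7.0],
              [2.0, 6.0]])

m_inv = np.linalg.inv(m)
print(m_inv)

# A matrix times its inverse should be the identity matrix.
# np.allclose is used instead of == to tolerate floating-point rounding.
print(np.allclose(np.dot(m, m_inv), np.eye(2)))
```

Calling `np.linalg.inv` on a non-square matrix, such as the 4x3 array `x` below, raises `numpy.linalg.LinAlgError`.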

In [4]:
x = np.array([
    [3, 11, 1],
    [7, 5, 2],
    [6, 8, 9],
    [0, 10, 4]
])

In [5]:
# To Do: Transpose an array
# Expected outcome:
# [[ 3  7  6  0]
#  [11  5  8 10]
#  [ 1  2  9  4]]
transposed = np.transpose(x)
print(transposed)

[[ 3  7  6  0]
 [11  5  8 10]
 [ 1  2  9  4]]


In [6]:
# To Do: Dot product of two arrays: original x and transposed x
# (4x3) dot (3x4) should give you (4x4)
# Expected outcome:
# [[131  78 115 114]
#  [ 78  78 100  58]
#  [115 100 181 116]
#  [114  58 116 116]]
y = np.dot(x, transposed)
print(y)

[[131  78 115 114]
 [ 78  78 100  58]
 [115 100 181 116]
 [114  58 116 116]]
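
For 2-D arrays, `np.dot` performs matrix multiplication, which can also be written with the `@` operator. A short sketch that rebuilds the same `x` so it runs on its own:

```python
import numpy as np

x = np.array([
    [3, 11, 1],
    [7, 5, 2],
    [6, 8, 9],
    [0, 10, 4]
])

via_dot = np.dot(x, x.T)  # x.T is shorthand for np.transpose(x)
via_at = x @ x.T          # matrix multiplication operator

# Both forms give the same (4x4) result for 2-D arrays.
print(np.array_equal(via_dot, via_at))
```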


In [7]:
# TODO: Do elementwise multiplication with 'broadcaster' and 'transposed'
# You will see what 'broadcast' means once you check your result.

# Expected outcome for the variable 'elementwise_broadcasting':
# [[ 0  0  0  0]
#  [11  5  8 10]
#  [ 2  4 18  8]]

broadcaster = np.array([
    [0],
    [1],
    [2]
])
print("broadcaster: \n{}\n".format(broadcaster))
print("transposed: \n{}\n".format(transposed))

elementwise_broadcasting = broadcaster * transposed
print("broadcasted: \n{}".format(elementwise_broadcasting))

broadcaster: 
[[0]
 [1]
 [2]]

transposed: 
[[ 3  7  6  0]
 [11  5  8 10]
 [ 1  2  9  4]]

broadcasted: 
[[ 0  0  0  0]
 [11  5  8 10]
 [ 2  4 18  8]]
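
What happened in the cell above: `broadcaster` has shape (3, 1) and `transposed` has shape (3, 4), so NumPy conceptually repeats the single column of `broadcaster` four times before multiplying. The sketch below makes that stretching explicit with `np.broadcast_to`; it is only an illustration of the idea, since NumPy does not actually copy the data.

```python
import numpy as np

broadcaster = np.array([[0], [1], [2]])      # shape (3, 1)
transposed = np.array([[3, 7, 6, 0],
                       [11, 5, 8, 10],
                       [1, 2, 9, 4]])        # shape (3, 4)

# Show the (3, 4) array that broadcaster behaves like during the multiplication.
stretched = np.broadcast_to(broadcaster, transposed.shape)
print(stretched)

# Multiplying with the stretched array gives the same result as broadcasting.
same = np.array_equal(stretched * transposed, broadcaster * transposed)
print(same)
```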


In [8]:
# To Do: Extract the diagonal elements of an array x
# Expected outcome: [3 5 9]
diagonal = np.diagonal(x)
print(diagonal)

[3 5 9]


In [9]:
# To Do: Reshape an array x to one that has 6 rows and 2 columns
# Expected outcome: 
# [[ 3 11]
#  [ 1  7]
#  [ 5  2]
#  [ 6  8]
#  [ 9  0]
#  [10  4]]
reshaped = x.reshape(6, 2)
print(reshaped)

[[ 3 11]
 [ 1  7]
 [ 5  2]
 [ 6  8]
 [ 9  0]
 [10  4]]
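
Reshaping only works when the total number of elements stays the same (here 4 x 3 = 6 x 2 = 12). A small sketch, using a stand-in array `values` that is not part of the exercise, showing that one dimension can be given as `-1` so NumPy infers it, and that an incompatible shape raises an error:

```python
import numpy as np

values = np.arange(12)  # stand-in array with 12 elements

# -1 asks NumPy to infer the number of rows from the total size.
auto = values.reshape(-1, 2)
print(auto.shape)

# 12 elements cannot fill a 5x2 array, so this raises ValueError.
try:
    values.reshape(5, 2)
    failed = False
except ValueError as err:
    failed = True
    print("reshape failed:", err)
```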


### 1.3 Statistics in Numpy

Most of these functions are self-explanatory.

- `np.sum()` : Sum of all elements in an array
- `np.max()` : Maximum element in an array
- `np.min()` : Minimum element in an array
- `np.mean()` : Mean of the elements in an array
- `np.median()` : Median of the elements in an array
- `np.var()` : Variance
- `np.std()` : Standard deviation
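
One detail worth knowing about `np.var()` and `np.std()`: by default they divide by N (population statistics). Passing `ddof=1` divides by N - 1 instead, which gives the sample versions. A minimal sketch with a small made-up array, chosen so the numbers come out round:

```python
import numpy as np

# Illustrative data: mean is 5, squared deviations sum to 32.
sample = np.array([2, 4, 4, 4, 5, 5, 7, 9])

pop_var = np.var(sample)             # 32 / 8 = 4.0
pop_std = np.std(sample)             # sqrt(4.0) = 2.0
sample_var = np.var(sample, ddof=1)  # 32 / 7

print(pop_var, pop_std, sample_var)
```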

In [10]:
x = np.array(
    [34, 56, 6, 3, 9, 89, 120, 12, 201],
    dtype = np.int32
)

In [11]:
# To Do: Summation of elements 
# Expected outcome: 530
summation = np.sum(x)
print(summation)

530


In [12]:
# To Do: Minimum element in the array
# Expected outcome: 3
minimum = np.min(x)
print(minimum)

3


In [13]:
# To Do: Maximum element in the array
# Expected outcome: 201
maximum = np.max(x)
print(maximum)

201


In [14]:
# To Do: Average value of elements in the array
# Expected outcome: 58.89
mean = np.mean(x)
print(mean)

58.888888888888886


In [15]:
# To Do: Median element in the array
# Expected outcome: 34.0
median = np.median(x)
print(median)

34.0


In [16]:
# To Do: Variance of x
# Expected outcome: 4008.098765432099
variation = np.var(x)
print(variation)

4008.098765432099


In [17]:
# To Do: Standard deviation of the array
# Expected outcome: 63.3095471902311
std = np.std(x)
print(std)

63.3095471902311


### 1.4 Exercise

In [18]:
x = np.array([
    [1, 52, 22, 2, 31, 65, 7, 8, 24, 10],
    [12, 2322, 33, 1, 2, 3, 99, 24, 1, 42],
    [623, 24, 3, 56, 5, 2, 7, 85, 22, 110],
    [63, 4, 3, 4, 5, 64, 7, 82, 3, 20],
    [48, 8, 3, 24, 57, 63, 7, 8, 9, 1032],
    [33, 64, 0, 24, 5, 6, 72, 832, 3, 10],
    [12, 242, 2, 11, 52, 63, 32, 8, 96, 2],
    [13, 223, 52, 4, 35, 62, 7, 8, 9, 10],
    [19, 2, 3, 149, 15, 6, 172, 2, 2, 11],
    [34, 23, 32, 24, 54, 63, 1, 5, 92, 7]
])

In [19]:
x.shape

(10, 10)

In [20]:
# To Do: Extract the first column of x
# expected outcome: [1 12 623 63 48 33 12 13 19 34]
firstcol_x = x[:, 0]
print(firstcol_x)

[  1  12 623  63  48  33  12  13  19  34]


In [21]:
# To Do: extract the last row of x
# expected outcome: [34 23 32 24 54 63 1 5 92 7]
lastrow_x = x[-1, :]
print(lastrow_x)

[34 23 32 24 54 63  1  5 92  7]


In [22]:
# To Do: calculate the mean of elements in the last row
# expected outcome: 33.5
mean_lastrow = np.mean(lastrow_x)
print(mean_lastrow)

33.5


In [23]:
# To Do: extract the diagonal components of x
# expected outcome: [1 2322 3 4 57 6 32 8 2 7]
diag_x = np.diagonal(x)
print(diag_x)

[   1 2322    3    4   57    6   32    8    2    7]


In [24]:
# To Do: calculate the variance of the diagonal components of x
# expected outcome: 479979.9600000001
var_diag = np.var(diag_x)
print(var_diag)

479979.9600000001
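
The exercise above slices out one row or column and then reduces it. Reductions such as `np.mean` also accept an `axis` argument, which computes the statistic for every column or every row in one call. A minimal sketch on a small made-up matrix `m` (not the 10x10 array from the exercise):

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

col_means = np.mean(m, axis=0)  # one mean per column: [2.5 3.5 4.5]
row_means = np.mean(m, axis=1)  # one mean per row: [2. 5.]

print(col_means)
print(row_means)
```

With this, `np.mean(m, axis=1)[-1]` is the same value as taking the mean of the last row `m[-1, :]`.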


### 1.5 One more Numpy Problem. (Optional)

In [25]:
def solution():
    prime = [2, 3, 5, 7, 11]

    matrix = [
        # TODO: Try making your own matrix by using the list 'prime'
        # TODO: Try doing something awesome. Don't just write tons of numbers
                [x * prime[0] for x in range(1, 6)],
                [x * prime[1] for x in range(1, 6)],
                [x * prime[2] for x in range(1, 6)],
                [x * prime[3] for x in range(1, 6)],
                [x * prime[4] for x in range(1, 6)]
              ]
    # TODO: convert it to a numpy array
    matrix = np.array(matrix)
    # TODO: What is the diagonal of the above matrix?
    matrix_dia = np.diagonal(matrix)

    # TODO: What are the sum and mean of the diagonal components?
    dia_sum = np.sum(matrix_dia)
    dia_mean = np.mean(matrix_dia)

    return matrix, dia_sum, dia_mean

In [26]:
# Function for printing your answers
def print_answer(**kwargs):
    # Print each keyword argument as "name : value"
    for key, value in kwargs.items():
        print(key, ":", value)

In [27]:
matrix, dia_sum, dia_mean = solution()

In [28]:
print_answer(matrix=matrix, dia_sum=dia_sum, dia_mean=dia_mean)

matrix : [[ 2  4  6  8 10]
 [ 3  6  9 12 15]
 [ 5 10 15 20 25]
 [ 7 14 21 28 35]
 [11 22 33 44 55]]
dia_sum : 106
dia_mean : 21.2


### What to do next?
Helpful websites for your further study on numpy:
- [A Visual Intro to NumPy and Data Representation](https://jalammar.github.io/visual-numpy/?fbclid=IwAR2MT-imY4dKpUcfHWfjdPOROUBadObVO7Wftf1detHWZCxSwNeA5paVI08)
- [Stanford CS231n Python Numpy Tutorial](http://cs231n.github.io/python-numpy-tutorial/)
- [DataCamp Python Numpy Array Tutorial](https://www.datacamp.com/community/tutorials/python-numpy-tutorial)
- [Machine Learning Plus 101 Numpy Exercises for Data Analysis (Python)](https://www.machinelearningplus.com/python/101-numpy-exercises-python/)