
# Machine learning libraries using Python
- COMP 1801 IT lab: 08 Oct 2021 Part 1


## Import libraries: Do not forget!

In [None]:
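# NumPy: multi-dimensional arrays (vectors, matrices) and linear algebra
import numpy as np
# pandas: tabular data (DataFrame)
import pandas as pd
# Matplotlib and seaborn: plotting
import matplotlib.pyplot as plt
import seaborn as sns; sns.set_theme()  # for plot styling
# scikit-learn: models, datasets and evaluation metrics (sklearn.metrics is used below)
import sklearn.linear_model, sklearn.datasets, sklearn.metrics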


## Vectors: notation and `np.ndarray`.
- $\mathbb{N}$: the set of natural numbers (positive integers).
- $\mathbb{R}$: the set of real numbers.
- $\mathbb{R}^{m}$: the set of $m$-dimensional real vectors.
- Bold lower case (e.g. $\Vec{v}$): a column vector
$$
\Vec{v} 
= 
\begin{bmatrix}
v_{0} \\
v_{1} \\
\vdots \\
v_{m-1} \\
\end{bmatrix}
\in \mathbb{R}^{m}.
$$
  - We always represent a row vector as the transpose of a column vector, e.g. $\Vec{v}^{\top}$.
  - A row or column vector (e.g. $\Vec{v}$ or $\Vec{v}^{\top}$) is usually represented by a 1D `np.ndarray` (e.g. `v`) in NumPy.
    - Note: a 1D `np.ndarray` does not distinguish between row and column vectors.
      - When the distinction matters (e.g. in matrix multiplication), we use an $m \times 1$ 2D `np.ndarray` (an $m$-dimensional column vector) or a $1 \times n$ 2D `np.ndarray` (an $n$-dimensional row vector).
  - `v.shape`: the shape of `v`. If `v` represents a vector of size $m$ ($\Vec{v} \in \mathbb{R}^{m}$), then `v.shape == (m, )`.
  - `v[i]`: the `i`-th element of `v` (indexing starts from 0, so $v_{i}$ corresponds to `v[i]`).
$$
    \begin{aligned}
    \texttt{v}
    &\texttt{==}
    \texttt{[v[0], v[1], ..., v[m-1]]} \\
    &=
    \begin{bmatrix}
    v_{0} \\
    v_{1} \\
    \vdots \\
    v_{m-1} \\
    \end{bmatrix}
    \in \mathbb{R}^{m}.
    \end{aligned}
$$
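A minimal sketch of the row/column distinction described above: transposing a 1D array has no effect, so an explicit column or row vector needs a 2D array, which `reshape` can produce. The names `v_col` and `v_row` are only illustrative.

```python
import numpy as np

v = np.arange(4)            # 1D array: neither a row nor a column vector
print(v.shape)              # (4,)
print(v.T.shape)            # (4,): transposing a 1D array changes nothing

v_col = v.reshape(-1, 1)    # m x 1 2D array: explicit column vector
v_row = v.reshape(1, -1)    # 1 x m 2D array: explicit row vector
print(v_col.shape, v_row.shape)  # (4, 1) (1, 4)
print(v_col.T.shape)             # (1, 4): now the transpose turns a column into a row
```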

### Example of accessing elements of a 1D `np.ndarray`.

In [None]:
v = np.arange(8) ** 2
print('v =', v)
print('v[0] =', v[0])
print('v[1] =', v[1])
print('v[4] =', v[4])
print('v[7] =', v[7])

## Matrices: notation and `np.ndarray`.
- $\mathbb{R}^{m, n}$: the set of real matrices of size $m \times n$.
- Bold upper case (e.g. $\Mat{A}$): a matrix
$$
\Mat{A} 
= \begin{bmatrix}
a_{0,0} & a_{0,1} & \cdots & a_{0,n-1} \\
a_{1,0} & a_{1,1} & \cdots & a_{1,n-1} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m-1,0} & a_{m-1,1} & \cdots & a_{m-1,n-1} \\
\end{bmatrix}
\in \mathbb{R}^{m, n}.
$$
  - A matrix (e.g. $\Mat{A}$) is represented by a 2D `np.ndarray` (e.g. `A`) in NumPy.
  - `A.shape`: the shape of `A`. If `A` represents an $m \times n$ matrix ($\Mat{A} \in \mathbb{R}^{m, n}$), then `A.shape == (m, n)`.
  - $a_{i,j}$ (`A[i, j]`): the element in the $i$-th row and $j$-th column of $\Mat{A}$ (`A`).
  - `A[i, :]`: the 1D `np.ndarray` that contains the $i$-th row of $\Mat{A}$.\
`A[i, :]` $=$ `[A[i, 0], A[i, 1], ..., A[i, n-1]]`
$=
    \begin{bmatrix}
    a_{i,0} & a_{i,1} & \cdots & a_{i,n-1} \\
    \end{bmatrix}
    \in \mathbb{R}^{n}.
$
  - `A[:, j]`: the 1D `np.ndarray` that contains the $j$-th column of $\Mat{A}$.\
`A[:, j]` $=$ `[A[0, j], A[1, j], ..., A[m-1, j]]`
$=
    \begin{bmatrix}
    a_{0,j} \\ a_{1,j} \\ \vdots \\ a_{m-1,j} \\
    \end{bmatrix}
    \in \mathbb{R}^{m}.
$
  - A 2D `np.ndarray` is a stack of 1D arrays, each corresponding to a **row** vector.\
`A` $=$ `[A[0, :], A[1, :], ..., A[m-1, :]]`\
 $=$\
`[[A[0, 0], A[0, 1], ..., A[0, n-1]],`\
&ensp;`[A[1, 0], A[1, 1], ..., A[1, n-1]],`\
&ensp;`...,`\
&ensp;`[A[m-1, 0], A[m-1, 1], ..., A[m-1, n-1]]]`
- $\Mat{A}^{\top}$ (`A.T`): the transpose of $\Mat{A}$. If $\Mat{A} \in \mathbb{R}^{m, n}$, then $\Mat{A}^{\top} \in \mathbb{R}^{n, m}$ (`A.shape == (m, n)` iff `A.T.shape == (n, m)`).
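The indexing and transpose rules above can be checked on a small hand-written matrix (the values are arbitrary):

```python
import numpy as np

# A 2D array is built from a list of rows, matching A = [A[0, :], ..., A[m-1, :]]
A = np.array([[1, 2, 3],
              [4, 5, 6]])
m, n = A.shape
print(m, n)                   # 2 3

# Row i has length n, column j has length m; both are 1D arrays
print(A[1, :])                # [4 5 6]
print(A[:, 2])                # [3 6]

# The transpose swaps the shape (m, n) -> (n, m), and element (i, j) moves to (j, i)
print(A.T.shape)              # (3, 2)
print(A.T[2, 1] == A[1, 2])   # True
```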


### Example of accessing elements of a 2D `np.ndarray`.

In [None]:
A = np.arange(20).reshape(4, 5)
print('A =\n', A, '\n')
print('A[2, :] =', A[2, :])
print('A[:, 3] =', A[:, 3])
print('A[1, 4] =', A[1, 4])
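
To see why the row/column distinction matters in matrix multiplication, the sketch below multiplies the same $4 \times 5$ matrix by a 1D array and by an explicit $5 \times 1$ column vector. The vectors of ones are arbitrary illustrative inputs.

```python
import numpy as np

A = np.arange(20).reshape(4, 5)   # same 4 x 5 matrix as in the example above

v_1d = np.ones(5)                 # 1D array, shape (5,)
v_col = np.ones((5, 1))           # explicit 5 x 1 column vector

print((A @ v_1d).shape)           # (4,): the result is again 1D
print((A @ v_col).shape)          # (4, 1): the result is a 4 x 1 column vector

# Both products contain the row sums of A; only the shape differs
print(np.array_equal(A @ v_1d, (A @ v_col).ravel()))  # True
```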


## Supervised learning (prediction, e.g. regression and classification)
### Overview
- What to do: output ${\hat{\Mat{Y}}_\mathrm{new}}$ as the predicted target values for new feature values ${\Mat{X}_\mathrm{new}}$.
  - Note: if the target values are real numbers, the problem is called **regression**; if they are class labels, it is called **classification**.
- What we have: the feature and target values ${\Mat{X}_\mathrm{train}}, {\Mat{Y}_\mathrm{train}}$ of the training data.
  - Example 1: house price prediction (regression), where each value in $\Mat{X}$ is the income of a customer and the value in the same row of $\Mat{Y}$ is that customer's house price.
  - Example 2: breast cancer prediction (classification), where each value in $\Mat{X}$ is the area of a tumour and the value in the same row of $\Mat{Y}$ is its label (benign/malignant).


### Fit (`obj.fit`): ${\Mat{X}_\mathrm{train}}, {\Mat{Y}_\mathrm{train}} \mapsto (\Vec{\theta})$.
- Inputs: 
  - ${\Mat{X}_\mathrm{train}}$ (`X_train`): the feature matrix, the matrix (2D `np.ndarray`) that contains the feature vectors of the training data. ${\Mat{X}_\mathrm{train}} \in \mathbb{R}^{m_\mathrm{train}, n}$ (`X_train.shape == (m_train, n)`).
    - $m_\mathrm{train}$ (`m_train`): the number of training data points. $m_\mathrm{train} \in \mathbb{N}$ (`type(m_train) == int`).
    - $n$ (`n`): the dimension of a feature vector. $n \in \mathbb{N}$ (`type(n) == int`).
    - The $i$-th row $\Vec{x}_\mathrm{train}^{(i) \top}$ (`X_train[i, :]`) of the feature matrix ${\Mat{X}_\mathrm{train}}$: the feature vector (1D `np.ndarray`) of the $i$-th training data point ($i = 0, 1, \dots, m_\mathrm{train}-1$). $\Vec{x}_\mathrm{train}^{(i) \top} \in \mathbb{R}^{n}$ (`X_train[i, :].shape == (n, )`).
$$
        \Mat{X}_\mathrm{train}
        = \begin{bmatrix}
        \Vec{x}_\mathrm{train}^{(0) \top} \\
        \Vec{x}_\mathrm{train}^{(1) \top} \\
        \vdots \\
        \Vec{x}_\mathrm{train}^{(m_\mathrm{train} - 1) \top} \\
        \end{bmatrix}
        \in \mathbb{R}^{m_\mathrm{train}, n}.
$$
  - ${\Mat{Y}_\mathrm{train}}$ (`y_train` for one-dimensional targets ($p = 1$) or `Y_train` for multi-dimensional targets): the target matrix, the matrix (1D or 2D `np.ndarray`) that contains the target vectors of the training data. ${\Mat{Y}_\mathrm{train}} \in \mathbb{R}^{m_\mathrm{train}, p}$ (`y_train.shape == (m_train, )` or `Y_train.shape == (m_train, p)`).
    - $p$ (`p`): the dimension of a target vector. $p \in \mathbb{N}$ (`type(p) == int`).
    - The $i$-th row $\Vec{y}_\mathrm{train}^{(i) \top}$ (`y_train[i]` or `Y_train[i, :]`) of the target matrix ${\Mat{Y}_\mathrm{train}}$: the target vector (a NumPy scalar or a 1D `np.ndarray`) of the $i$-th training data point ($i = 0, 1, \dots, m_\mathrm{train}-1$). $\Vec{y}_\mathrm{train}^{(i) \top} \in \mathbb{R}^{p}$ (`y_train[i].shape == ()` or `Y_train[i, :].shape == (p, )`).
$$
        \Mat{Y}_\mathrm{train}
        = \begin{bmatrix}
        \Vec{y}_\mathrm{train}^{(0) \top} \\
        \Vec{y}_\mathrm{train}^{(1) \top} \\
        \vdots \\
        \Vec{y}_\mathrm{train}^{(m_\mathrm{train} - 1) \top} \\
        \end{bmatrix}
        \in \mathbb{R}^{m_\mathrm{train}, p}.
$$
- Implicit outputs (attributes to be stored):
  - ${\Vec{\theta}}$ (e.g. `obj.coef_`, with the intercept stored separately in `obj.intercept_` for `LinearRegression`): the parameter vector that specifies a hypothesis (a function $f_{\Vec{\theta}}: \mathbb{R}^{n} \to \mathbb{R}^{p}$). ${\Vec{\theta}}$ is chosen to minimize the loss function $L(\Mat{X}_\mathrm{train}, \Mat{Y}_\mathrm{train}; \Vec{\theta})$.
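
Assuming a linear hypothesis $f_{\Vec{\theta}}(x) = \theta_0 + \theta_1 x$ ($n = p = 1$) and the squared error loss (the quantity `LinearRegression` minimizes), the fit step can be sketched with NumPy's least squares solver and compared with scikit-learn. The synthetic data is hypothetical, and the column of ones in `X_design` is just one illustrative way to put the intercept inside $\Vec{\theta}$.

```python
import numpy as np
import sklearn.linear_model

# Hypothetical training data: m_train = 50 points, n = 1 feature
rng = np.random.default_rng(0)
m_train, n = 50, 1
X_train = rng.uniform(0, 10, size=(m_train, n))
y_train = 2.0 + 0.5 * X_train[:, 0] + rng.normal(0, 0.1, size=m_train)

# Least squares: prepend a column of ones so that theta[0] is the intercept
X_design = np.hstack([np.ones((m_train, 1)), X_train])   # shape (m_train, n + 1)
theta, *_ = np.linalg.lstsq(X_design, y_train, rcond=None)

# scikit-learn stores the same parameters in intercept_ and coef_
obj = sklearn.linear_model.LinearRegression().fit(X_train, y_train)
print(theta)                     # [theta_0, theta_1], close to [2.0, 0.5]
print(obj.intercept_, obj.coef_)
print(np.allclose(theta, np.r_[obj.intercept_, obj.coef_]))  # True
```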



### Predict (`obj.predict`): ${\Mat{X}_\mathrm{new}}(, \Vec{\theta}) \mapsto  {\hat{\Mat{Y}}_\mathrm{new}}$.
- Inputs: 
  - ${\Mat{X}_\mathrm{new}}$ (`X_new`): the feature matrix, the matrix (2D `np.ndarray`) that contains the feature vectors of the new data. ${\Mat{X}_\mathrm{new}} \in \mathbb{R}^{m_\mathrm{new}, n}$ (`X_new.shape == (m_new, n)`).
    - $m_\mathrm{new}$ (`m_new`): the number of new data points. $m_\mathrm{new} \in \mathbb{N}$ (`type(m_new) == int`).
    - The $i$-th row $\Vec{x}_\mathrm{new}^{(i) \top}$ (`X_new[i, :]`) of the feature matrix ${\Mat{X}_\mathrm{new}}$: the feature vector (1D `np.ndarray`) of the $i$-th new data point ($i = 0, 1, \dots, m_\mathrm{new}-1$). $\Vec{x}_\mathrm{new}^{(i) \top} \in \mathbb{R}^{n}$ (`X_new[i, :].shape == (n, )`).
$$
        \Mat{X}_\mathrm{new}
        = \begin{bmatrix}
        \Vec{x}_\mathrm{new}^{(0) \top} \\
        \Vec{x}_\mathrm{new}^{(1) \top} \\
        \vdots \\
        \Vec{x}_\mathrm{new}^{(m_\mathrm{new} - 1) \top} \\
        \end{bmatrix}
        \in \mathbb{R}^{m_\mathrm{new}, n}.
$$
- Implicit inputs (attributes to be used):
  - ${\Vec{\theta}}$ (e.g. `obj.coef_`): the parameter vector that specifies a hypothesis (a function $f_{\Vec{\theta}}: \mathbb{R}^{n} \to \mathbb{R}^{p}$).

- Outputs:
  - ${\hat{\Mat{Y}}_\mathrm{new}}$ (`y_pred` for one-dimensional targets ($p = 1$) or `Y_pred` for multi-dimensional targets): the predicted target matrix, the matrix (1D or 2D `np.ndarray`) that contains the predicted target vectors for the feature vectors of the new data. ${\hat{\Mat{Y}}_\mathrm{new}} \in \mathbb{R}^{m_\mathrm{new}, p}$ (`y_pred.shape == (m_new, )` or `Y_pred.shape == (m_new, p)`).
    - $p$ (`p`): the dimension of a target vector. $p \in \mathbb{N}$ (`type(p) == int`).
    - The $i$-th row $\hat{\Vec{y}}_\mathrm{new}^{(i) \top}$ (`y_pred[i]` or `Y_pred[i, :]`) of ${\hat{\Mat{Y}}_\mathrm{new}}$: 
    the predicted target vector (a NumPy scalar or a 1D `np.ndarray`) of the $i$-th new data point ($i = 0, 1, \dots, m_\mathrm{new}-1$), given by $\hat{\Vec{y}}_\mathrm{new}^{(i)} = f_{\Vec{\theta}} (\Vec{x}_\mathrm{new}^{(i)})$.
$$
      \hat{\Mat{Y}}_\mathrm{new}
      = \begin{bmatrix}
      \hat{\Vec{y}}_\mathrm{new}^{(0) \top} \\
      \hat{\Vec{y}}_\mathrm{new}^{(1) \top} \\
      \vdots \\
      \hat{\Vec{y}}_\mathrm{new}^{(m_\mathrm{new} - 1) \top} \\
      \end{bmatrix}
      = \begin{bmatrix}
      f_{\Vec{\theta}} (\Vec{x}_\mathrm{new}^{(0)})^\top \\
      f_{\Vec{\theta}} (\Vec{x}_\mathrm{new}^{(1)})^\top \\
      \vdots \\
      f_{\Vec{\theta}} (\Vec{x}_\mathrm{new}^{(m_\mathrm{new} - 1)})^\top \\
      \end{bmatrix}
      \in \mathbb{R}^{m_\mathrm{new}, p}.
$$
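
The shape conventions above can be checked with `LinearRegression`, which accepts either a 1D target ($p = 1$) or a 2D target with $p$ columns. The random data and the sizes `m_train = 30`, `m_new = 5`, `n = 3` are arbitrary choices for illustration.

```python
import numpy as np
import sklearn.linear_model

rng = np.random.default_rng(0)
m_train, m_new, n = 30, 5, 3
X_train = rng.normal(size=(m_train, n))
X_new = rng.normal(size=(m_new, n))

# One-dimensional target (p = 1): y_train and y_pred are 1D arrays
y_train = X_train @ np.array([1.0, -2.0, 0.5])
y_pred = sklearn.linear_model.LinearRegression().fit(X_train, y_train).predict(X_new)
print(y_train.shape, y_pred.shape)   # (30,) (5,)

# Multi-dimensional target (p = 2): Y_train and Y_pred are 2D arrays
Y_train = np.column_stack([y_train, 3.0 * y_train])
Y_pred = sklearn.linear_model.LinearRegression().fit(X_train, Y_train).predict(X_new)
print(Y_train.shape, Y_pred.shape)   # (30, 2) (5, 2)
```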


## Supervised learning example
### Load and display the dataset


In [None]:
# Load the house price dataset
house = sklearn.datasets.fetch_california_housing()
raw_df = pd.DataFrame(data= np.c_[house['data'], house['target']],
                     columns= house['feature_names'] + ['target'])
# Shuffle dataset
rng = np.random.default_rng(0)
df = raw_df.iloc[rng.permutation(len(raw_df))].reset_index(drop=True)

# show the data
display(df)

# Use only one feature
col = 'MedInc'

# show the data
Xy_df = df[[col, 'target']]
display(Xy_df)


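Note: `MedInc` stands for median income. The median is used because each row of the dataset corresponds to a district, i.e. a group of people, rather than to a single person. The target is the district's median house value, in hundreds of thousands of US dollars.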

### Convert the data to NumPy ndarrays.

In [None]:
# prepare NumPy ndarrays
X = np.array(df[[col]])
y = np.array(df['target'])

n_train_points = 50
n_new_points = 10

# Split the data into training/new data
X_train = X[:n_train_points]
X_new = X[n_train_points:n_train_points+n_new_points]

# Split the targets into training/new data
y_train = y[:n_train_points]
y_true = y[n_train_points:n_train_points+n_new_points]


Note: data splitting will be explained in the week 4 lecture. It is necessary for a fair evaluation: the model is evaluated on new data points that were not used for training.
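
The cell above splits by slicing the already shuffled data. For comparison, here is a sketch using scikit-learn's `sklearn.model_selection.train_test_split`, which shuffles and splits in one call; it does not reproduce the exact split above. The toy arrays and the `_toy` variable names are hypothetical, chosen so they do not overwrite the notebook's `X_train`, `y_true`, etc.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data standing in for X (m x 1) and y (m,)
X_toy = np.arange(200, dtype=float).reshape(-1, 1)
y_toy = 3.0 * X_toy[:, 0]

# Shuffle, then keep 50 training points and 10 new points (the rest are unused)
X_toy_train, X_toy_new, y_toy_train, y_toy_true = train_test_split(
    X_toy, y_toy, train_size=50, test_size=10, random_state=0)
print(X_toy_train.shape, X_toy_new.shape)   # (50, 1) (10, 1)
print(y_toy_train.shape, y_toy_true.shape)  # (50,) (10,)

# No data point appears in both sets
print(np.intersect1d(X_toy_train[:, 0], X_toy_new[:, 0]).size)  # 0
```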

### Fit and predict

In [None]:
# Create linear regression object
obj = sklearn.linear_model.LinearRegression()

# Train the model using the training sets
obj.fit(X_train, y_train)

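# Make predictions for the new data
y_pred = obj.predict(X_new)

# The fitted parameters: coef_ holds one slope per feature; the intercept is stored separately
theta = obj.coef_
intercept = obj.intercept_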
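
For `LinearRegression`, `predict` applies the affine map `X_new @ obj.coef_ + obj.intercept_`, so `obj.coef_` holds only the slope and the intercept lives in `obj.intercept_`. The sketch below checks this on hypothetical toy data, using `_toy` names so that the notebook's `obj`, `X_new` and `y_pred` are not overwritten before plotting.

```python
import numpy as np
import sklearn.linear_model

rng = np.random.default_rng(1)
X_toy_train = rng.uniform(0, 8, size=(50, 1))   # hypothetical feature values
y_toy_train = 1.0 + 0.5 * X_toy_train[:, 0] + rng.normal(0, 0.5, size=50)
X_toy_new = rng.uniform(0, 8, size=(10, 1))

toy_obj = sklearn.linear_model.LinearRegression().fit(X_toy_train, y_toy_train)

# predict computes x^T coef_ + intercept_ for each row x of X_toy_new
y_toy_manual = X_toy_new @ toy_obj.coef_ + toy_obj.intercept_
print(np.allclose(toy_obj.predict(X_toy_new), y_toy_manual))  # True
print(toy_obj.coef_.shape)                                     # (1,): one slope per feature
```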


### Plot outputs

In [None]:
# Plot outputs
plt.scatter(X_new, y_true,  color='black', label='y_true')
plt.scatter(X_new, y_pred, color='blue', label='y_pred')
plt.plot(np.r_[0:10:0.1], obj.predict(np.r_[0:10:0.1][:, np.newaxis]), color='blue', label='hypothesis')

plt.xlim([0,6])
plt.ylim([0,6])

plt.xlabel(col)
plt.ylabel('house price')

plt.legend()

plt.show()


### Loss function and score function
- Loss: the lower, the better.
- Score: the higher, the better. 

In [None]:
import sklearn.metrics

# The mean squared error loss
print('Mean squared error loss: {:.4f}'.format(sklearn.metrics.mean_squared_error(y_true, y_pred)))
# The R2 score: 1 means a perfect prediction
print('R2 score: {:.4f}'.format(sklearn.metrics.r2_score(y_true, y_pred)))


Note: the mean squared error depends on the scale of the target values (e.g. their units), while the R2 score is normalized by the variance of the target values and is therefore unaffected by rescaling.
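
The note above can be made concrete by computing both metrics by hand on a small hypothetical example and then rescaling the targets. The formulas are the standard definitions, $\mathrm{MSE} = \frac{1}{m}\sum_i (y_i - \hat{y}_i)^2$ and $R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$; the `_toy` names avoid overwriting the notebook's `y_true` and `y_pred`.

```python
import numpy as np
import sklearn.metrics

# Hypothetical true and predicted targets
y_toy_true = np.array([1.0, 2.0, 3.0, 4.0])
y_toy_pred = np.array([1.5, 1.5, 3.5, 3.5])

# MSE: mean of the squared residuals
mse = np.mean((y_toy_true - y_toy_pred) ** 2)
# R2: 1 - (residual sum of squares) / (total sum of squares around the mean)
r2 = 1 - np.sum((y_toy_true - y_toy_pred) ** 2) / np.sum((y_toy_true - y_toy_true.mean()) ** 2)
print(mse, r2)   # 0.25 0.8

# Multiplying all targets by 10 (e.g. a change of units) scales MSE by 100 but leaves R2 unchanged
print(sklearn.metrics.mean_squared_error(10 * y_toy_true, 10 * y_toy_pred))  # 25.0
print(sklearn.metrics.r2_score(10 * y_toy_true, 10 * y_toy_pred))            # 0.8
```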

