# LDA
In this lab (TP) you will build a Linear Discriminant Analysis (LDA) classifier, which can also be used for dimensionality reduction.

You will fill in a few missing functions in the Python scripts to implement the exercises below, so first read and understand the given scripts. To run your code, run the `main_lda.ipynb` notebook.

## Exercises

- Fill in the missing functions in the `LDA`, `LDARayleigh` and `LDAGD` classes in `lda.py`, `lda_rayleigh.py` and `lda_gd.py` to implement the LDA algorithm.

- Use `main_lda.ipynb` to run the functions that you implement in the `LDA`, `LDARayleigh` and `LDAGD` classes. You cannot modify the given functions.

- Write a function named `compute_accuracy(y_true, y_pred)` in the `utils.py` script. The function takes the true and the predicted class labels as arguments and returns the accuracy, i.e. the fraction of correctly classified samples. Use only NumPy.



- Once your implementation is ready, you will work with the following datasets:

**Datasets:**
- Iris dataset (https://scikit-learn.org/stable/datasets/toy_dataset.html#iris-plants-dataset)
- Breast cancer dataset (https://scikit-learn.org/stable/datasets/toy_dataset.html#breast-cancer-wisconsin-diagnostic-dataset)
	



## General instructions

The code should be well written, with detailed comments explaining what you do at each step. Use the NumPy library to avoid `for` loops and `if` statements. Your code should be generic and should use the given functions.
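
As an illustration of what "avoiding loops with NumPy" can look like, here is a minimal sketch that computes one mean vector per class using a one-hot membership matrix instead of a Python loop over the classes. The helper name `class_means` is hypothetical and is not part of the provided scripts.

```python
import numpy as np

def class_means(X, y):
    """Return the sorted class labels and one mean vector per class, without Python loops."""
    y = np.asarray(y).ravel()
    # `inverse` maps every sample to the index of its class in `classes`.
    classes, inverse = np.unique(y, return_inverse=True)
    # One-hot membership matrix of shape (n_samples, n_classes).
    one_hot = np.eye(classes.size)[inverse]
    # Number of samples in each class, shape (n_classes,).
    counts = one_hot.sum(axis=0)
    # Sum the samples of each class, then divide by the class size.
    means = (one_hot.T @ X) / counts[:, None]
    return classes, means

# Tiny toy example (made-up data, for illustration only).
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
y = np.array([0, 0, 1, 1])
classes, means = class_means(X, y)
print(classes)
print(means)
```

The same one-hot trick can be reused wherever a per-class quantity is needed, which keeps the code generic in the number of classes.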

In [None]:
from __future__ import print_function
import numpy as np

from utils import train_test_split, compute_accuracy
# make figures appear inline
%matplotlib inline


# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

np.random.seed(42)

## Load Breast Cancer dataset

In [None]:
from sklearn.datasets import load_breast_cancer

dataset = load_breast_cancer()
X = dataset.data
y = dataset.target
X_train, y_train, X_test, y_test = train_test_split(X, y, 0.3, normalize=True)

# print out the size of the training and test data.
print('Training data shape: ', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

## Import the LDA classifiers and start filling the missing parts.

### 1. Test LDA with Rayleigh quotient implementation
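
For reference, the quantity usually meant by "Rayleigh quotient" in the LDA setting is the Fisher criterion below. The symbols $S_W$ and $S_B$ are assumed to correspond to the within-class and between-class scatter matrices `SW` and `SB` returned by `calculate_scatter_matrices`; the exact normalization used in the provided scripts may differ.

$$
J(w) = \frac{w^\top S_B\, w}{w^\top S_W\, w},
\qquad
S_B\, w = \lambda\, S_W\, w
$$

The directions that maximize $J(w)$ satisfy the generalized eigenvalue problem on the right, and the corresponding eigenvalue $\lambda$ equals the value of $J$ at that direction. When $S_W$ is invertible, this is the ordinary eigenproblem of $S_W^{-1} S_B$.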

In [None]:
# import classifier
from lda_rayleigh import LDARayleigh

In [None]:
lda_clf = LDARayleigh(n_components=1)


In [None]:
SW, SB = lda_clf.calculate_scatter_matrices(X_train, y_train)
assert SW.shape == SB.shape == (X_train.shape[1], X_train.shape[1])

In [None]:
lda_clf._calculate_discriminants(X_train, y_train)
assert lda_clf.linear_discriminants is not None
assert lda_clf.linear_discriminants.shape == (X_train.shape[1], lda_clf.n_components)

In [None]:
lda_clf.train(X_train, y_train)

In [None]:
y_pred = lda_clf.predict(X_test)
assert y_pred.shape == y_test.shape

In [None]:
lda_clf.plot_1d(X_test, y_test)

### 2. Test Gradient Descent implementation
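
To give an idea of how a gradient-based variant can work, the sketch below maximizes the Fisher criterion J(w) = (wᵀ SB w) / (wᵀ SW w) by plain gradient ascent for a single direction. This is only an assumption about the approach: the function names, the learning rate, the number of iterations and the toy matrices are all hypothetical, and the provided `LDAGD` class may be organized differently.

```python
import numpy as np

def rayleigh_quotient(w, SW, SB):
    """Fisher criterion J(w) = (w^T SB w) / (w^T SW w)."""
    return (w @ SB @ w) / (w @ SW @ w)

def gradient_ascent_direction(SW, SB, w0, lr=0.1, n_iters=1000):
    """Maximize J(w) by gradient ascent, renormalizing w after every step."""
    w = w0 / np.linalg.norm(w0)
    for _ in range(n_iters):  # the optimization steps are inherently sequential
        a = w @ SB @ w  # numerator of J
        b = w @ SW @ w  # denominator of J
        # Quotient rule: grad J = 2 * (b * SB w - a * SW w) / b^2
        grad = 2.0 * (b * (SB @ w) - a * (SW @ w)) / b**2
        w = w + lr * grad
        # J does not depend on the scale of w, so keep w at unit length.
        w = w / np.linalg.norm(w)
    return w

# Toy scatter matrices (made up for illustration, not computed from data).
SW = np.array([[2.0, 0.0], [0.0, 1.0]])
SB = np.array([[1.0, 1.0], [1.0, 1.0]])
w = gradient_ascent_direction(SW, SB, w0=np.array([1.0, 0.0]))
print(w, rayleigh_quotient(w, SW, SB))
```

For this toy pair the maximum of J can be checked by hand: `SB` is the outer product of d = (1, 1) with itself, so the best direction is proportional to SW⁻¹ d = (0.5, 1) and the maximum value is dᵀ SW⁻¹ d = 1.5. Comparing the result of the iteration with such a closed-form value is a simple sanity check.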

In [None]:
from lda_gd import LDAGD

In [None]:
lda_clf = LDAGD(n_components=1)

In [None]:
SW, SB = lda_clf.calculate_scatter_matrices(X_train, y_train)
assert SW.shape == SB.shape == (X_train.shape[1], X_train.shape[1])

In [None]:
lda_clf._calculate_discriminants(X_train, y_train)
assert lda_clf.linear_discriminants is not None
assert lda_clf.linear_discriminants.shape == (X_train.shape[1], lda_clf.n_components)

In [None]:
lda_clf.train(X_train, y_train)

In [None]:
y_pred = lda_clf.predict(X_test)
assert y_pred.shape == y_test.shape

In [None]:
lda_clf.plot_1d(X_test, y_test)

### Once your implementation is ready, run both classifiers and compute the classification accuracy on the Breast Cancer dataset.

#### (You have to fill in the `compute_accuracy()` function in the `utils.py` script)
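
Accuracy is the fraction of samples whose predicted label equals the true label. One possible NumPy-only sketch is shown below; it assumes both inputs are label arrays of the same length, and the body is an illustration rather than the reference solution.

```python
import numpy as np

def compute_accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # The element-wise comparison gives booleans; their mean is the accuracy.
    return float(np.mean(y_true == y_pred))

# Toy check: three of the four predictions are correct.
acc = compute_accuracy([0, 1, 1, 0], [0, 1, 0, 0])
print(acc)  # 0.75
```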


-------------------------------------------------------------------------------------------------------------------

### Run the LDARayleigh and LDAGD classifiers on the Breast Cancer dataset and compute the classification accuracy on both the training and test sets.

In [None]:
from utils import compute_accuracy

## Iris dataset

In [None]:
# Clean up variables to prevent loading data multiple times (which may cause memory issues)
try:
    del X_train, y_train
    del X_test, y_test
    print('Clear previously loaded data.')
except NameError:
    # The variables were not defined yet, so there is nothing to clear.
    pass

In [None]:
# Load the Breast Cancer dataset
from sklearn.datasets import load_breast_cancer

dataset = load_breast_cancer()
X = dataset.data
y = dataset.target
X_train, y_train, X_test, y_test = train_test_split(X, y, 0.3, normalize=True)

In [None]:
# import classifier 
from lda_rayleigh import LDARayleigh
lda_clf = LDARayleigh(n_components=1)
lda_clf.train(X_train, y_train)
y_pred_train = lda_clf.predict(X_train)
y_pred_test = lda_clf.predict(X_test)
train_accuracy = compute_accuracy(y_train, y_pred_train)
test_accuracy = compute_accuracy(y_test, y_pred_test)

print(f"LDA_Rayleigh train accuracy: {train_accuracy}")
print(f"LDA_Rayleigh test accuracy: {test_accuracy}")

In [None]:
# import classifier 
from lda_gd import LDAGD
lda_clf = LDAGD(n_components=1)
lda_clf.train(X_train, y_train)
y_pred_train = lda_clf.predict(X_train)
y_pred_test = lda_clf.predict(X_test)
train_accuracy = compute_accuracy(y_train, y_pred_train)
test_accuracy = compute_accuracy(y_test, y_pred_test)

print(f"LDA_GD train accuracy: {train_accuracy}")
print(f"LDA_GD test accuracy: {test_accuracy}")

### LDA for dimensionality reduction.

Like PCA, LDA can also be used as a dimensionality reduction technique.
In fact, LDA projects the data points onto a lower-dimensional space that best separates the examples by their assigned class.
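
The projection idea can be sketched in a few lines of NumPy: build the within-class and between-class scatter matrices, take the leading eigenvectors of pinv(SW) @ SB, and multiply the data by them. The function name `lda_projection`, the use of the pseudo-inverse and the synthetic data are assumptions made for illustration; the provided classes may compute the discriminants differently.

```python
import numpy as np

def lda_projection(X, y, n_components):
    """Return a (n_features, n_components) matrix of discriminant directions."""
    classes, inverse = np.unique(np.asarray(y).ravel(), return_inverse=True)
    one_hot = np.eye(classes.size)[inverse]        # (n_samples, n_classes)
    counts = one_hot.sum(axis=0)                   # samples per class
    means = (one_hot.T @ X) / counts[:, None]      # per-class mean vectors
    overall_mean = X.mean(axis=0)

    # Within-class scatter: every sample minus the mean of its own class.
    centered = X - means[inverse]
    SW = centered.T @ centered
    # Between-class scatter: class means minus overall mean, weighted by class size.
    mean_diff = means - overall_mean
    SB = (mean_diff * counts[:, None]).T @ mean_diff

    # Eigenvectors of pinv(SW) @ SB, sorted by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(SW) @ SB)
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs[:, order].real

# Synthetic 3-class data with 4 features (made up for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(30, 4)) for c in (0.0, 3.0, 6.0)])
y = np.repeat([0, 1, 2], 30)

W = lda_projection(X, y, n_components=2)
Z = X @ W  # projected data
print(W.shape, Z.shape)  # (4, 2) (90, 2)
```

Since `SB` has rank at most (number of classes - 1), at most that many directions carry class-separating information, which is why two components are a natural choice for a three-class dataset such as Iris.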

In [None]:
from utils import load_IRIS
X_train, y_train, X_test, y_test = load_IRIS(test=True)

# print out the size of the training and test data.
print('Training data shape: ', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)              

In [None]:
lda_clf = LDARayleigh(n_components=2)
lda_clf.train(X_train, y_train)
lda_clf.plot_2d(X_train, y_train, "Train")
lda_clf.plot_2d(X_test, y_test, "Test")

In [None]:
lda_clf = LDAGD(n_components=2)
lda_clf.train(X_train, y_train)
lda_clf.plot_2d(X_train, y_train, "Train")
lda_clf.plot_2d(X_test, y_test, "Test")

-------------------------------------------------------------------------------------------------------------------