<h1 style="text-align:center">Linear Discriminant Analysis (LDA)<br>[<a href="https://sebastianraschka.com/Articles/2014_python_lda.html">source</a>]</h1>

<p style="font-size:15px">LDA can be described as follows:<ul style="font-size:15px">
        <li>It is used as a dimensionality reduction technique
        <li>It is used in the pre-processing step for pattern classification and machine learning algorithms
        <li>Its goal is to project a dataset onto a lower-dimensional space<ul>
                <li>More precisely, LDA projects a feature space (a dataset of <i>d</i>-dimensional samples) onto a smaller <i>k</i>-dimensional subspace (where <i>k</i> <= min(<i>d</i>, <i>c</i> - 1) for <i>c</i> classes) while maintaining the class-discriminatory information
            </ul>
        <li><strong>LDA differs from PCA: PCA finds the component axes that maximize the variance of the data, whereas LDA is interested in the axes that maximize the separation between multiple classes</strong>
        <li>Both PCA and LDA are linear transformation techniques used for dimensionality reduction
        <li>LDA is supervised because it uses the class labels (the dependent variable), whereas PCA is unsupervised and ignores them
    </ul>
</p><br>
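
To make the PCA/LDA contrast above concrete, here is a minimal NumPy sketch on synthetic two-class data (the data and variable names are made up for illustration). The points spread widely along the x-axis, while the two classes differ only in their y offset. PCA ignores the labels and picks the high-variance x direction; the two-class Fisher LDA direction, `w` proportional to `S_W^{-1} (m1 - m0)`, uses the labels and points along y.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # samples per class (synthetic, for illustration only)

# Both classes spread widely along x; they differ only in their y offset
class0 = np.column_stack([rng.normal(0.0, 5.0, n), rng.normal(0.0, 0.3, n)])
class1 = np.column_stack([rng.normal(0.0, 5.0, n), rng.normal(1.5, 0.3, n)])
X = np.vstack([class0, class1])

# PCA: leading eigenvector of the covariance matrix (labels are not used)
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
pca_dir = eigvecs[:, np.argmax(eigvals)]

# Two-class LDA: w is proportional to S_W^{-1} (m1 - m0) (labels are used)
m0, m1 = class0.mean(axis=0), class1.mean(axis=0)
S_W = (class0 - m0).T @ (class0 - m0) + (class1 - m1).T @ (class1 - m1)
lda_dir = np.linalg.solve(S_W, m1 - m0)
lda_dir /= np.linalg.norm(lda_dir)

print("PCA direction:", np.round(pca_dir, 3))  # close to the x-axis
print("LDA direction:", np.round(lda_dir, 3))  # close to the y-axis
```

Projecting `X` onto `pca_dir` would put the two classes on top of each other, while projecting onto `lda_dir` keeps them apart.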

<p style="font-size:15px">LDA can be carried out in 5 main steps:<ol>
        <li>Compute the <i>d</i>-dimensional mean vectors for the different classes in the dataset
        <li>Compute the scatter matrices: the between-class scatter matrix <i>S<sub>B</sub></i> and the within-class scatter matrix <i>S<sub>W</sub></i>
        <li>Compute the eigenvectors and corresponding eigenvalues of <i>S<sub>W</sub></i><sup>-1</sup><i>S<sub>B</sub></i>
        <li>Sort the eigenvectors by decreasing eigenvalue and choose the <i>k</i> eigenvectors with the largest eigenvalues to form a <i>d</i> × <i>k</i> matrix <strong><i>W</i></strong> (where every column is an eigenvector)
        <li>Use this <i>d</i> × <i>k</i> eigenvector matrix to transform the samples onto the new subspace, which can be summarized by the matrix multiplication <strong><i>Y = XW</i></strong> (where <strong><i>X</i></strong> is the <i>n</i> × <i>d</i> matrix of the <i>n</i> samples, and <strong><i>Y</i></strong> is the <i>n</i> × <i>k</i> matrix of transformed samples in the new subspace)
    </ol>
</p><br>
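
The five steps above can be sketched directly in NumPy. This is a minimal from-scratch version on synthetic data; the function name `lda_fit` and the toy data are hypothetical and not part of the original notebook. scikit-learn's `LinearDiscriminantAnalysis`, used later, computes the discriminant directions with its own (SVD-based by default) solver.

```python
import numpy as np

def lda_fit(X, y, k):
    """Return a d x k projection matrix W and all eigenvalues, sorted decreasingly."""
    classes = np.unique(y)
    d = X.shape[1]
    overall_mean = X.mean(axis=0)

    S_W = np.zeros((d, d))  # within-class scatter
    S_B = np.zeros((d, d))  # between-class scatter
    for c in classes:
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)              # Step 1: class mean vector
        centered = X_c - mean_c
        S_W += centered.T @ centered           # Step 2: within-class scatter
        diff = (mean_c - overall_mean).reshape(d, 1)
        S_B += len(X_c) * (diff @ diff.T)      # Step 2: between-class scatter

    # Step 3: eigen-decomposition of S_W^{-1} S_B (solve avoids an explicit inverse)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    eigvals, eigvecs = eigvals.real, eigvecs.real  # drop round-off imaginary parts

    # Step 4: sort by decreasing eigenvalue, keep the top-k eigenvectors as columns of W
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:k]], eigvals[order]

# Toy data: 3 classes of 50 samples in d = 4 dimensions
rng = np.random.default_rng(0)
centers = [[0, 0, 0, 0], [3, 0, 0, 0], [0, 3, 0, 0]]
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(50, 4)) for c in centers])
y = np.repeat([0, 1, 2], 50)

W, eigvals = lda_fit(X, y, k=2)
Y = X @ W                       # Step 5: Y = X W
print(W.shape, Y.shape)         # (4, 2) (150, 2)
print(np.round(eigvals, 3))     # only c - 1 = 2 eigenvalues are clearly non-zero
```

Because `S_B` is a sum of `c` rank-one terms whose centered means sum to zero, it has rank at most `c - 1`, so with 3 classes only the first two eigenvalues carry discriminatory information.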


---
<h2>1. Importing the Dataset</h2>

In [1]:
import pandas as pd
import numpy as np

df = pd.read_csv('../../../../data/clean/Wine.csv')
display(df.head())
x = df.iloc[:, :-1].values  # features: every column except the last
y = df.iloc[:, -1].values   # target: Customer_Segment (last column)

|   | Alcohol | Malic_Acid | Ash | Ash_Alcanity | Magnesium | Total_Phenols | Flavanoids | Nonflavanoid_Phenols | Proanthocyanins | Color_Intensity | Hue | OD280 | Proline | Customer_Segment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 | 1 |
| 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050 | 1 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 | 1 |
| 3 | 14.37 | 1.95 | 2.5 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480 | 1 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 | 1 |
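
If `Wine.csv` is not available at the relative path above, the same measurements appear to ship with scikit-learn: the first row shown matches the first sample of the UCI wine data returned by `sklearn.datasets.load_wine`. A minimal sketch, assuming the two sources contain the same samples; scikit-learn labels the classes 0, 1, 2, so the shift to 1, 2, 3 below is an assumed mapping onto `Customer_Segment`, and `x_alt`/`y_alt` are illustrative names.

```python
import numpy as np
from sklearn.datasets import load_wine

# load_wine returns the 13 features and the class labels 0, 1, 2
x_alt, y_alt = load_wine(return_X_y=True)
y_alt = y_alt + 1  # shift to 1, 2, 3 to mirror Customer_Segment (assumed mapping)

print(x_alt.shape)       # (178, 13)
print(np.unique(y_alt))  # [1 2 3]
print(x_alt[0])          # compare with row 0 of the table above
```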


---
<h2>2. Splitting the Dataset</h2>

In [2]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
print("train dataset size : {} observations\ntest dataset size : {} observations".format(x_train.shape[0], x_test.shape[0]))

train dataset size : 142 observations
test dataset size : 36 observations
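
The split above is purely random. With only 178 samples and unequal class sizes, one option is the `stratify` argument of `train_test_split`, which keeps the class proportions of `y` the same in the training and test sets. A sketch using `load_wine` as a stand-in for `Wine.csv` (labels 0, 1, 2 instead of 1, 2, 3):

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

x, y = load_wine(return_X_y=True)

# stratify=y keeps the class proportions of y in both splits
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0, stratify=y)

for name, labels in (("full", y), ("train", y_train), ("test", y_test)):
    counts = np.bincount(labels, minlength=3)
    print(name, counts, np.round(counts / counts.sum(), 3))
```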


---
<h2>3. Feature Scaling</h2>

In [3]:
from sklearn.preprocessing import StandardScaler

stand_x = StandardScaler().fit(x_train)
x_ss = stand_x.transform(x_train)
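
`StandardScaler` standardizes each feature as `z = (x - mu) / sigma`, where `mu` is the training mean and `sigma` the population standard deviation (`ddof=0`) of the training set. A NumPy sketch of the same computation on made-up numbers: the statistics come from the training split only and are reused unchanged for the test split, which is why the test data is later passed through `stand_x.transform` rather than a refitted scaler.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data with two features on very different scales
x_train_demo = rng.normal(loc=[13.0, 100.0], scale=[0.8, 14.0], size=(142, 2))
x_test_demo = rng.normal(loc=[13.0, 100.0], scale=[0.8, 14.0], size=(36, 2))

# Fit: per-feature mean and population standard deviation of the training data
mu = x_train_demo.mean(axis=0)
sigma = x_train_demo.std(axis=0)  # ddof=0, as in StandardScaler

# Transform: the same mu and sigma are applied to both splits
z_train = (x_train_demo - mu) / sigma
z_test = (x_test_demo - mu) / sigma

print(np.round(z_train.mean(axis=0), 6), np.round(z_train.std(axis=0), 6))
```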

---
<h2>4. Applying LDA</h2>

In [4]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

lda = LDA(n_components=2).fit(x_ss, y_train)
x_lda = lda.transform(x_ss)
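
With 3 customer segments, LDA yields at most c - 1 = 2 discriminant axes, so `n_components=2` is the largest valid choice here. A sketch on `load_wine` (a stand-in for `Wine.csv`) that checks the projected shape and `explained_variance_ratio_`, the fraction of variance explained by each selected discriminant; since both possible discriminants are kept, the ratios should sum to 1.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.preprocessing import StandardScaler

x, y = load_wine(return_X_y=True)
x_ss = StandardScaler().fit_transform(x)

lda = LDA(n_components=2).fit(x_ss, y)
x_lda = lda.transform(x_ss)

print(x_lda.shape)                                 # (178, 2)
print(np.round(lda.explained_variance_ratio_, 3))  # share of each discriminant
```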

---
<h2>5. Training a Logistic Regression Model on the Training Dataset (to Evaluate the LDA Features)</h2>

In [5]:
from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression(random_state=0)
logreg.fit(x_lda, y_train)

LogisticRegression(random_state=0)
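
Logistic regression is used here only as a downstream check on the 2-D LDA features. `LinearDiscriminantAnalysis` is itself a classifier, so a quick baseline is to call its `score` method (mean accuracy) directly on the scaled features. A sketch using `load_wine` as a stand-in for the CSV, with the same split settings; the names `scaler`, `lda_clf`, and `acc` are illustrative, and results on `Wine.csv` may differ.

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

x, y = load_wine(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

scaler = StandardScaler().fit(x_train)
lda_clf = LDA().fit(scaler.transform(x_train), y_train)

# score() returns the mean accuracy on the given data
acc = lda_clf.score(scaler.transform(x_test), y_test)
print("LDA test accuracy: {:.3f}".format(acc))
```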

---
<h2>6. Predicting the Test Dataset Results</h2>

In [6]:
y_pred = logreg.predict(lda.transform(stand_x.transform(x_test)))

pd.DataFrame(data=np.stack((y_test, y_pred), axis=1),
             index=None, columns=['y actual', 'y prediction'],
             copy=False).head(10)

|   | y actual | y prediction |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 3 | 3 |
| 2 | 2 | 2 |
| 3 | 1 | 1 |
| 4 | 2 | 2 |
| 5 | 2 | 2 |
| 6 | 1 | 1 |
| 7 | 3 | 3 |
| 8 | 2 | 2 |
| 9 | 2 | 2 |
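
Cell 6 chains `stand_x.transform`, `lda.transform`, and `logreg.predict` by hand. scikit-learn's `make_pipeline` can bundle the three steps so that fitting and prediction always apply them in the same order. A sketch on `load_wine` as a stand-in for the CSV; `model` is an illustrative name.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

x, y = load_wine(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

# scale -> project onto 2 discriminants -> classify, all fitted on the training split
model = make_pipeline(StandardScaler(), LDA(n_components=2),
                      LogisticRegression(random_state=0))
model.fit(x_train, y_train)

y_pred = model.predict(x_test)  # test data passes through the same fitted steps
print(np.column_stack((y_test, y_pred))[:10])
```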


---
<h2>7. Making the Confusion Matrix</h2>

In [7]:
from sklearn.metrics import confusion_matrix

# Rows are the actual segments, columns the predicted ones
labels = np.unique(y_test)
cm = confusion_matrix(y_test, y_pred, labels=labels)
print(cm)

# The diagonal holds the number of correct predictions per segment
print("\nConfusion matrix result shows that:")
for segment, correct in zip(labels, np.diag(cm)):
    print("\t- {} correct predictions of customer segment {}".format(correct, segment))

[[14  0  0]
 [ 0 16  0]
 [ 0  0  6]]

Confusion matrix result shows that:
	- 14 correct predictions of customer segment 1
	- 16 correct predictions of customer segment 2
	- 6 correct predictions of customer segment 3
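
In scikit-learn's convention, row i of the confusion matrix counts the samples whose actual class is i and column j the samples predicted as class j, so the diagonal holds the correct predictions. Overall accuracy and per-segment recall and precision follow directly; a NumPy sketch using the matrix printed above:

```python
import numpy as np

# Confusion matrix printed above (rows: actual segment, columns: predicted segment)
cm = np.array([[14, 0, 0],
               [0, 16, 0],
               [0, 0, 6]])

accuracy = np.trace(cm) / cm.sum()        # correct predictions / all predictions
recall = np.diag(cm) / cm.sum(axis=1)     # per actual segment
precision = np.diag(cm) / cm.sum(axis=0)  # per predicted segment

print(accuracy)   # 1.0
print(recall)     # [1. 1. 1.]
print(precision)  # [1. 1. 1.]
```

All 36 test samples sit on the diagonal, so logistic regression on the two LDA components classified every test sample correctly.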
