# Linear Discriminant Analysis & Quadratic Discriminant Analysis

Güney Işık Tombak \\ Bogazici AI


**Important Note:**

*Parts A and B are optional but recommended*, **C and D must be done.**

In [None]:
# Standard Imports

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import pandas as pd

# use seaborn plotting defaults
import seaborn as sns; sns.set()

plt.rcParams['figure.figsize'] = [8, 6]

random_state = 42 # Check Hitchhiker's Guide to the Galaxy 
# Do Not Change It 

## A) Prologue: Linear Dichotomies (Optional)

For a set of $n$ points *in general position* in a $d$-dimensional space $(\{\mathbf{x}_i\}_{i=1}^n \subset \mathbb{R}^d)$, there are

$$C(n,d) = 2 \sum_{i=0}^{d-1} {n-1 \choose i} = 2 \sum_{i=0}^{d-1} \frac{(n-1)!}{i!(n-1-i)!}$$

possible linear dichotomies. This is known as [**Cover's Theorem**](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4038449&casa_token=mff1wM7fQSMAAAAA:KbkambCYJAP_wifALgrvYsgGA0KQh90hev-7ettzoyauMcxMm4UmZS7GMLWmpS0Qlkv4pzZVjg&tag=1).

Note that the separating hyperplane (a $(d-1)$-dimensional subspace) must pass through the origin for this count of linear dichotomies to apply.

[General position](https://en.wikipedia.org/wiki/General_position): No subset of $k$ samples lies in a common $(k-2)$-dimensional flat (for example, no three points lie on one line).
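
As a quick sanity check of the formula (the values $n = 4$, $d = 3$ are chosen here purely for illustration), four points in general position in $\mathbb{R}^3$ admit

$$C(4,3) = 2\left[\binom{3}{0} + \binom{3}{1} + \binom{3}{2}\right] = 2(1 + 3 + 3) = 14$$

linear dichotomies, out of the $2^4 = 16$ possible labelings.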


#### 1) Implementation of Cover's Theorem

Implement Cover's theorem as a function of a given $(n,d)$. Then, using the recursive relation below, write a recursive version of the algorithm:

$$C(n+1,d) = C(n,d) + C(n, d-1).$$

For large $n$ and $d$ values, compare the running times of the two functions (use `%timeit`).

Hint: Realize $C(1,m) = \begin{cases} 2 & m \geq 1 \\ 0 & m < 1 \end{cases}$
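
As a small check of the recursion (the values are illustrative only), take $n = 4$ and $d = 3$:

$$C(5,3) = C(4,3) + C(4,2) = 14 + 8 = 22,$$

which agrees with the closed form $2\left[\binom{4}{0} + \binom{4}{1} + \binom{4}{2}\right] = 2(1 + 4 + 6) = 22$.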

In [None]:
def covers_theorem(n, d):
    pass
    # Your code here


def recursive_covers_theorem(n, d):
    pass
    # Your code here


#n, d = SOME_BIG_NUMBER_1, SOME_BIG_NUMBER_2

# %timeit covers_theorem(n, d)
# %timeit recursive_covers_theorem(n, d)

#### 2) 2D Example

Check the 2D datasets below. Calculate the number of possible linear dichotomies and show one example of them on each dataset as a line. Note that you only have to determine one variable ($a$), since the line equation is $y = ax$. Because $a$ must be able to reach $\pm \infty$ (a vertical line), we can use

$$y = x \cdot \tan(\theta)$$

where $\theta$ is given in radians (i.e. $\pi$ rad $= 180^{\circ}$). You can use [`numpy.deg2rad`](https://numpy.org/doc/stable/reference/generated/numpy.deg2rad.html) to convert from degrees.

Are the results consistent? If not, please explain why.
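
The parametrization $y = x \cdot \tan(\theta)$ can be explored numerically. The sketch below uses a few hypothetical angles (not derived from the datasets) to show that the slope blows up near $90^{\circ}$, while the unit direction $(\cos\theta, \sin\theta)$ of the same line stays bounded. The names `angles_deg`, `slopes`, and `directions` are introduced here for illustration only.

```python
import numpy as np

# Illustrative angles in degrees (hypothetical values, not taken from the datasets).
angles_deg = np.array([0.0, 45.0, 90.0, 135.0])
theta = np.deg2rad(angles_deg)

# The slope a = tan(theta) becomes extremely large near 90 degrees ...
slopes = np.tan(theta)

# ... whereas the unit direction (cos(theta), sin(theta)) of the line stays bounded.
directions = np.column_stack([np.cos(theta), np.sin(theta)])

for deg, a, d in zip(angles_deg, slopes, directions):
    print(f"{deg:6.1f} deg  slope={a: .3e}  direction={np.round(d, 3)}")
```

Working with the direction (or the normal) vector instead of the slope is one way to avoid special-casing vertical lines.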

In [None]:
angle_trunc = lambda a: (a+np.ceil(np.abs(a)/(2*np.pi))*2*np.pi+np.pi)%(np.pi*2) - np.pi

def scat(X, theta=None, title=None):

    coor_max = np.max(np.abs(X))

    plt.scatter(X[:,0], X[:,1])
    plt.plot([-coor_max, coor_max], [0,0], '-k', linewidth=2)
    plt.plot([0,0], [-coor_max, coor_max], '-k', linewidth=2)

    if theta is not None:
        theta = angle_trunc(theta)
        x = np.array([-np.cos(theta), np.cos(theta)])
        y = np.array([-np.sin(theta), np.sin(theta)])
        c1 = np.abs(coor_max / np.sin(theta))
        c2 = np.abs(coor_max / np.cos(theta))
        l = min(c1, c2)
        plt.plot(l*x, l*y, '--r', linewidth=2)

    if title is not None:
        plt.title(title)


# Dataset 1
X1 = [[1, 1], [-1, -2], [1, -1]]
X1 = np.asarray(X1)

plt.figure(1)
scat(X1, title='First Dataset')
plt.show()

#Dataset 2
X2 = [[-0.5, -1], [-1.5, 1], [1, 2]]
X2 = np.asarray(X2)

plt.figure(2)
scat(X2, title='Second Dataset')
plt.show()

# Example of how to use scat:

angle = np.pi/5
plt.figure(3)
scat(X2, theta=angle, title='Example')
plt.show()

In [None]:
# Your answer here

Your answer here

#### 3) Automatic Detection of the Number of Possible 2D Dichotomies

Find a way to automatically detect the number of possible dichotomies using a brute-force technique (try all *???*). Note that for each point $\mathbf{x}$, you can determine its class assignment using the expression below:

$$\operatorname{sign}(\mathbf{w}^\top \mathbf{x})$$

Explain your reasoning briefly.
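
To make the sign rule concrete, here is a minimal sketch for a single, fixed $\mathbf{w}$. The points and the angle are hypothetical, and taking $\mathbf{w}$ as the normal of the line with direction $(\cos\theta, \sin\theta)$ is an assumption made for this example; it is not a full solution to the counting task.

```python
import numpy as np

# Hypothetical points and angle, chosen only for illustration.
points = np.array([[2.0, 1.0], [-1.0, 3.0], [-2.0, -1.0]])
theta = np.deg2rad(30.0)

# Normal vector of the line through the origin with direction (cos(theta), sin(theta)).
w = np.array([-np.sin(theta), np.cos(theta)])

# Each point's class is the sign of its projection onto w.
labels = np.sign(points @ w).astype(int)
print(labels)
```

Replacing $\mathbf{w}$ with $-\mathbf{w}$ flips every label, which is why dichotomies come in pairs and Cover's formula carries a factor of 2.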

In [None]:
def find_n_dichotomies(X):
    pass
    # Your answer here

print(find_n_dichotomies(X1))
print(find_n_dichotomies(X2))

Your answer here 

## B) Separation with Lines (Optional)

In [None]:
# Toy Dataset Creation

from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples=100, centers=2,
                  random_state=random_state, cluster_std=2)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='jet')
plt.show()

#### 1) Possible Lines

Create at least three different lines of the form below that separate the clusters shown above:

$x_2 = mx_1 + b$

Format: `[(m_1, b_1), (m_2, b_2), ..., (m_N, b_N)]`

Tip: You can use Desmos or GeoGebra to determine the line more easily

In [None]:
mb = []  # Your answer here, e.g. [(m_1, b_1), (m_2, b_2), (m_3, b_3)]

xfit = np.linspace(-10, 10) 
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='jet')
plt.plot([0.6], [2.1], 'x', color='red', markeredgewidth=2, markersize=10)

for m, b in mb:
    plt.plot(xfit, m * xfit + b, '-k')

plt.show()

#### 2) Combining Clusters

Using the mean and variance of all the samples (both clusters combined),
create a Gaussian random dataset of 1000 points.

Use the provided `random_state` and plot the result.

Note: Both the mean and the variance are 2D (one value per feature).
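
A minimal sketch of per-feature Gaussian sampling, on a small hypothetical array rather than the blobs above. It assumes the two features are drawn independently (a diagonal covariance), which is a simplification chosen for this example; `samples`, `mean_2d`, `std_2d`, and `draws` are illustrative names.

```python
import numpy as np

# Hypothetical 2D samples standing in for the dataset (not the blobs above).
samples = np.array([[0.0, 10.0], [2.0, 14.0], [4.0, 18.0]])

# axis=0 gives one statistic per feature, i.e. a 2D mean and a 2D spread.
mean_2d = samples.mean(axis=0)  # one mean per column: 2 and 14
std_2d = samples.std(axis=0)

# Seeded generator so the draw is reproducible; 42 mirrors random_state above.
rng = np.random.default_rng(42)

# Assumption: features are sampled independently of each other.
draws = rng.normal(loc=mean_2d, scale=std_2d, size=(1000, 2))
print(draws.shape, draws.mean(axis=0))
```

Without `axis=0`, `np.mean` and `np.std` collapse the whole array into a single scalar, which loses the per-feature structure.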

In [None]:
def gauss_randomizer(X_in, random_state=random_state):
    
    N = 1000
    mean_X, std_X = np.mean(X_in, axis=0), np.std(X_in, axis=0)  # per-feature (2D) statistics

    # Your answer here

    return X_out

Xg = gauss_randomizer(X) 

plt.scatter(Xg[:, 0], Xg[:, 1], s=50)  # no cmap: there is no `c` array to color-map
plt.show()

#### 3) Weights

Derive the vector $\mathbf{w}$ in terms of $m$ and $b$, noting that the line equation can also be written as

$\mathbf{w} \cdot \mathbf{x} = c$

Divide the samples into two clusters using each of the lines you defined in `mb`.

Show the number of points in each cluster for each line using the equation:

$\hat{y} = \begin{cases} -1 & \mathbf{w} \cdot \mathbf{x} - c < 0 \\ +1 & \mathbf{w} \cdot \mathbf{x} - c \geq 0 \end{cases}$

The alternative formulation of the equation is:

$\hat{y} (\mathbf{w} \cdot \mathbf{x} - c) \geq 0$

Comment on the results you obtained. Which line seems best, and why? Can you name two parameters of the initial distributions that you would consider when fitting the line?
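
The classification rule can be sketched on a hypothetical line and three hypothetical points. The choice $\mathbf{w} = (-m, 1)$, $c = b$ used below is one valid rearrangement of $x_2 = mx_1 + b$ (any positive rescaling of $\mathbf{w}$ and $c$ works equally well); all values are made up for illustration.

```python
import numpy as np

# Hypothetical line x2 = m*x1 + b and points, for illustration only.
m, b = 2.0, 1.0
points = np.array([[0.0, 3.0], [1.0, 0.0], [2.0, 5.0]])

# Rearranging x2 = m*x1 + b gives -m*x1 + x2 = b, so one valid choice is:
w = np.array([-m, 1.0])
c = b

# y_hat = +1 when w.x - c >= 0 (on or above the line), otherwise -1.
y_hat = np.where(points @ w - c >= 0, 1, -1)

# Number of points assigned to each side of the line.
n_neg = int(np.sum(y_hat == -1))
n_pos = int(np.sum(y_hat == 1))
print(y_hat, n_neg, n_pos)
```

The third point lies exactly on the line and is assigned $+1$ because the rule uses $\geq 0$.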

In [None]:
xfit = np.linspace(-10, 10)

# Your answer here

for m, b in mb:
    plt.plot(xfit, m * xfit + b, '-k')

    # and here

Your answer here

## C) Linear Discriminant Analysis

In [None]:
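# Toy Dataset Creation

from sklearn.datasets import make_blobs  # imported here too, so part C runs even if part B is skipped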

def toy_dataset_create(n_samples=100, show_cls=True):

    X_list, y_list = list(), list()
    std_list = [3,2,4]
    rand_sts = [2,3,10]

    for cls_std, rand_state in zip(std_list, rand_sts):
        X, y = make_blobs(n_samples=n_samples, centers=2,
                        random_state=rand_state, cluster_std=cls_std)
        
        X_list.append(X)
        y_list.append(y)

        plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='jet')
        if show_cls:
            plt.show()

    return X_list, y_list

In [None]:
X_list, y_list = toy_dataset_create()

#### 1) Mean Divider

To divide the data into two clusters, we can take the mean of each distribution and imagine a vector going from one mean to the other. We can then separate the clusters with a line that passes through the midpoint of the means and is orthogonal to this vector.

Prove that this vector is indeed $\mathbf{w}$. Then, for each dataset above, plot $\mathbf{w}$ and the dividing line $l$ through the midpoint $\mathbf{\mu}$.

Note that $\mathbf{w} \cdot l = 0$ due to orthogonality, where $l$ here stands for the direction vector of the line.
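
The geometry can be sketched with two hypothetical class means (not computed from the toy datasets). The names `mu_0`, `mu_1`, `mid`, and `line_dir` are illustrative, and rotating $\mathbf{w}$ by $90^{\circ}$ is just one way to obtain a direction orthogonal to it.

```python
import numpy as np

# Hypothetical class means (not computed from the toy datasets).
mu_0 = np.array([0.0, 0.0])
mu_1 = np.array([4.0, 2.0])

w = mu_1 - mu_0                      # vector from one mean to the other
mid = (mu_0 + mu_1) / 2              # midpoint the dividing line passes through
line_dir = np.array([-w[1], w[0]])   # w rotated by 90 degrees: direction of the line

# A point x is assigned to a side by the sign of w.(x - mid).
x = np.array([3.0, 3.0])
side = np.sign(w @ (x - mid))
print(w @ line_dir, side)
```

Points on the dividing line can then be generated as `mid + t * line_dir` for a range of scalar `t` values.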

Your answer here

In [None]:
for X, y in zip(X_list, y_list):

    pass
    # Your answer here

#### 2) Effect of Number of Samples

Repeat the same for different numbers of samples. Do you observe any change? Please discuss.

In [None]:
n_samples_list = [10, 100, 1000]

for n_samples in n_samples_list:

    X_list, y_list = toy_dataset_create(n_samples=n_samples, 
                                        show_cls=False)

    for X, y in zip(X_list, y_list):

        pass
        # Your answer here


Your answer here

#### 3) LDA with scikit-learn

Use scikit-learn's Linear Discriminant Analysis (LDA) to repeat part 2.

To visualize the decision boundary $\mathbf{w} \cdot \mathbf{x} + c = 0$, use $\mathbf{w}=$ `clf.coef_` and $c=$ `clf.intercept_` (note that scikit-learn adds the intercept, whereas part B wrote the line as $\mathbf{w} \cdot \mathbf{x} = c$).

What do you observe? Is it similar to the previous results? What does LDA consider other than the means of the clusters? What might be a problem for LDA?

*Hint: [LDA and QDA by Scikit Learn](https://scikit-learn.org/stable/modules/lda_qda.html)*
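
A minimal sketch of how a boundary line can be read off a fitted binary LDA model. The data are hypothetical (two well-separated Gaussian blobs generated here, not the toy datasets), and the names `X_demo`, `y_demo`, and `clf_demo` are illustrative. It relies on scikit-learn's convention that the decision function is $\mathbf{w} \cdot \mathbf{x} + c$.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical, well-separated 2D data (assumed here, not the toy datasets).
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal([-2, -2], 1.0, size=(50, 2)),
                    rng.normal([2, 2], 1.0, size=(50, 2))])
y_demo = np.repeat([0, 1], 50)

clf_demo = LinearDiscriminantAnalysis().fit(X_demo, y_demo)
w = clf_demo.coef_[0]        # coef_ has shape (1, 2) for a binary problem
c = clf_demo.intercept_[0]   # intercept_ has shape (1,)

# The decision function is w.x + c, so the boundary satisfies w.x + c = 0.
scores = X_demo @ w + c

# Solve w[0]*x1 + w[1]*x2 + c = 0 for x2 (valid only when w[1] != 0).
x1 = np.linspace(-5, 5, 3)
x2 = -(w[0] * x1 + c) / w[1]
print(clf_demo.score(X_demo, y_demo), x2)
```

For a nearly vertical boundary, `w[1]` approaches zero and it is safer to solve for `x1` instead.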

In [None]:
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

clf = LDA()
n_samples_list = [10, 100, 1000]

for n_samples in n_samples_list:

    X_list, y_list = toy_dataset_create(n_samples=n_samples, 
                                        show_cls=False)

    for X, y in zip(X_list, y_list):

        pass
        # Your answer here

Your answer here

#### 4) Difficulties for LDA

Construct two distributions that are geometrically meaningful (a human can easily separate them by eye) but on which LDA fails. Explain why LDA fails and propose a solution (no implementation needed).

*Hint: You can use `numpy.random.multivariate_normal`*

In [None]:
# Your answer here

Your answer here

## D) Quadratic Discriminant Analysis

Repeat parts 3 and 4 of section C using Quadratic Discriminant Analysis (QDA). The main difference is that, instead of the linear boundary $\mathbf{w}^\top \mathbf{x} + w_0 = 0$, QDA looks for a more complex set of weights such that:

$$\mathbf{x}^\top W_2 \mathbf{x} + \mathbf{w}_1^\top \mathbf{x} + w_0 = 0$$

Comment on the differences you observed.
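
A minimal usage sketch comparing the two estimators on hypothetical data whose classes have different covariances (the data, the seed, and the names `X_a`, `X_b`, `X_demo`, `y_demo`, `lda`, and `qda` are assumptions for this example). It only shows the API and the per-class covariances that QDA estimates; it makes no claim about which model scores higher in general.

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

# Hypothetical data: two classes with different covariance matrices.
rng = np.random.default_rng(0)
X_a = rng.multivariate_normal([0, 0], [[1.0, 0.0], [0.0, 1.0]], size=200)
X_b = rng.multivariate_normal([4, 4], [[4.0, 1.5], [1.5, 1.0]], size=200)
X_demo = np.vstack([X_a, X_b])
y_demo = np.repeat([0, 1], 200)

lda = LinearDiscriminantAnalysis().fit(X_demo, y_demo)
# store_covariance=True keeps the per-class covariance estimates for inspection.
qda = QuadraticDiscriminantAnalysis(store_covariance=True).fit(X_demo, y_demo)

# LDA pools a single covariance for all classes; QDA estimates one per class.
print("LDA training accuracy:", lda.score(X_demo, y_demo))
print("QDA training accuracy:", qda.score(X_demo, y_demo))
print("QDA per-class covariances:", [cov.round(2) for cov in qda.covariance_])
```

Since the QDA boundary is quadratic, it is usually drawn by evaluating `qda.predict` (or `qda.decision_function`) on a grid and contouring the result, rather than by solving for a line.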

In [None]:
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA

clf = QDA()
n_samples_list = [10, 100, 1000]

for n_samples in n_samples_list:

    X_list, y_list = toy_dataset_create(n_samples=n_samples, 
                                        show_cls=False)

    for X, y in zip(X_list, y_list):

        pass
        # Your answer here

Your answer here