### Instructions:
* Write your code in this Jupyter notebook only.
* Download this notebook and import it into JupyterLab.
* Write the partial code for steps 0 to 8, which are marked with the prefix ##.
* Fill in the blanks where the comments instruct you to.
* Leave the other code and the notebook structure as they are.
* Follow all the instructions given in the comments of each cell.



**Answer the questions given at the end of this notebook within your report.**

**Upload this jupyter notebook after completion with your partial code and the report in one file in PDF format.**

**Also upload the resulting image showing all the selected points and the LDA decision boundaries between them.**

**Your file name should be yourname_lab4.pdf. Upload it before the due time.**

In [1]:
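import numpy as np ## import numpy
import cv2 ## import opencv
import matplotlib ## import matplotlib
matplotlib.use('TkAgg')  # interactive backend, chosen before pyplot is imported; plt.ginput does not work with the inline backend
import matplotlib.pyplot as plt ## import matplotlib pyplot
import matplotlib.widgets  # provides the Cursor widget used in select_points
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis ## from sklearn import LDA analysis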

##---------------------------------------------------
## Step 0: If a module is not found at run time, install the missing dependency (e.g. pip install opencv-python scikit-learn matplotlib).
##---------------------------------------------------

In [2]:
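Number_of_points = 20  ## Number of points to select from each strip. Recommended >= 20

img = cv2.imread("Indian_Flag.jpg") ## Read the given image (OpenCV loads it in BGR channel order)
assert img is not None, "Could not read Indian_Flag.jpg: cv2.imread returns None for a missing or unreadable file"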

def select_points(img, title):
    fig, ax = plt.subplots()
    #------------------------------------------
    ## Step 1: Convert img from BGR to RGB using cv2 and display it with ax.imshow (matplotlib, so that plt.ginput can read the clicks)
    img = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
    ax.imshow(img)
    ## step 2: Put title of the image
    ax.set_title(title)
    ##-----------------------------------------
    
    # Title the window and draw a red crosshair cursor that follows the mouse
    fig.canvas.manager.set_window_title('Select Points')
    cursor = matplotlib.widgets.Cursor(ax, useblit=True, color='red', linewidth=1)
    plt.show(block=False)  # Show the image without blocking

    k = 0
    points = [] ## Create here an empty list to store points 

    while k < Number_of_points:
        xy = plt.ginput(1, timeout=0)  # Block until one click is made (timeout=0 means wait indefinitely)
        if len(xy) > 0:
            col, row = map(int, xy[0])  # Convert the click coordinates to integer pixel indices
            ##-----------------------------------------------
            ## Step 3: Collect the RGB value at the clicked position (row, col) and print it.
            rgb_val = img[row,col]
            print(rgb_val)
            ##-----------------------------------------------

            k += 1
            points.append([row, col, rgb_val])  # Store the position and its RGB value in the points list
            
            # Display colored dot on the image
            plt.scatter(col, row, c='black', marker='o', s=10)

            # Redraw the image to include the dot
            plt.draw()

    plt.close()  # Close the window after all points are collected
    return points ## Fill this blank
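
When no display is available (for example when re-running the notebook headless), the interactive step can be approximated by sampling pixels directly. The sketch below is a hypothetical helper, `sample_strip_points`, not part of the assignment. It assumes the flag fills the whole image and that the three strips are equal horizontal thirds; it stays away from strip borders and skips a central column band where the Ashoka Chakra sits (assumed to be no wider than about 1.2 strip heights). It returns points in the same `[row, col, rgb]` format as `select_points` and expects an RGB image, so in the notebook you would pass `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)`, since `img` is still BGR.

```python
import numpy as np

def sample_strip_points(img_rgb, strip, n, seed=None, margin=0.15):
    """Hypothetical, non-interactive stand-in for select_points.

    Assumes the flag fills the image and the strips are equal horizontal
    thirds: strip 0 = saffron (top), 1 = white (middle), 2 = green (bottom).
    Returns n entries of the form [row, col, rgb], like select_points.
    """
    rng = np.random.default_rng(seed)
    h, w = img_rgb.shape[:2]
    band = h / 3
    # Keep away from the strip borders, where neighboring colors blend.
    r0 = int(strip * band + margin * band)
    r1 = int((strip + 1) * band - margin * band)
    # Skip a central column band (assumed to contain the Ashoka Chakra);
    # only the white strip needs this, but it does no harm elsewhere.
    left, right = int(w / 2 - 0.6 * band), int(w / 2 + 0.6 * band)
    allowed_cols = np.r_[0:left, right:w]
    rows = rng.integers(r0, r1, size=n)
    cols = rng.choice(allowed_cols, size=n)
    return [[int(r), int(c), img_rgb[r, c]] for r, c in zip(rows, cols)]


# Usage on a small synthetic flag (approximate colors, 3:2 aspect ratio)
flag = np.zeros((90, 135, 3), dtype=np.uint8)
flag[:30] = (255, 153, 51)        # saffron
flag[30:60] = (255, 255, 255)     # white
flag[30:60, 57:78] = (0, 0, 128)  # navy square standing in for the chakra
flag[60:] = (19, 136, 8)          # green
pts_white_demo = sample_strip_points(flag, strip=1, n=20, seed=0)
print(pts_white_demo[0])
```

A fixed `seed` makes the sampled points reproducible, which the interactive version cannot offer.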

In [3]:
##-----------------------------------------------------------------
## Step 4: Fill in the blank to select points from the saffron strip
pts_saffron = select_points(img, "Select points from saffron strip")
## Step 5: Fill in the blank to select points from the white strip
pts_white = select_points(img, "Select points from white strip")
## Step 6: Fill in the blank to select points from the green strip
pts_green = select_points(img, "Select points from green strip")
##-----------------------------------------------------------------

[243  85  22]
[229  82  31]
[254 100  30]
[244  90  20]
[241  87  17]
[208 205 186]
[255 104  28]
[247  86  18]
[245  86  18]
[245  81  10]
[248  82   8]
[235  68   0]
[247  72   7]
[243  76   6]
[241  75  15]
[193  87  45]
[223  66  21]
[246  89  34]
[248  87  17]
[244  80   9]
[217 220 239]
[221 221 223]
[227 220 227]
[255 220 193]
[219 222 195]
[209 208 222]
[214 215 236]
[ 89  87 101]
[221 217 216]
[227 229 241]
[217 214 225]
[225 225 235]
[233 224 241]
[217 215 218]
[224 218 230]
[216 215 231]
[228 227 245]
[216 217 235]
[192 188 205]
[210 204 216]
[29 93 69]
[29 97 72]
[33 96 67]
[31 91 66]
[27 91 65]
[23 88 64]
[24 90 63]
[25 91 64]
[32 97 77]
[32 96 72]
[29 95 68]
[26 99 70]
[21 87 59]
[33 98 74]
[31 97 70]
[24 82 58]
[18 82 56]
[23 87 61]
[27 98 68]
[ 31 103  79]
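
Some of the printed values look like stray clicks: `[208 205 186]` in the saffron list is close to white, and `[ 89  87 101]` in the white list is much darker than its neighbors. Because LDA estimates the class means and a pooled covariance from these points, such outliers can shift the boundaries. A minimal check, written as a hypothetical helper `flag_outliers`, marks points whose RGB distance from the class median exceeds `k` times the median distance; the factor `k = 3` is an arbitrary choice, not something the assignment prescribes.

```python
import numpy as np

def flag_outliers(rgb_values, k=3.0):
    """Return a boolean mask of points far from the class median (hypothetical helper).

    Distances are Euclidean in RGB; the cut-off is k times the median distance,
    so it adapts to how tightly the clicks cluster.
    """
    rgb = np.asarray(rgb_values, dtype=float)
    center = np.median(rgb, axis=0)
    dist = np.linalg.norm(rgb - center, axis=1)
    scale = max(np.median(dist), 1e-9)  # avoid a zero cut-off if all points coincide
    return dist > k * scale


# The first eight saffron values printed above
saffron_rgb = [[243, 85, 22], [229, 82, 31], [254, 100, 30], [244, 90, 20],
               [241, 87, 17], [208, 205, 186], [255, 104, 28], [247, 86, 18]]
print(np.flatnonzero(flag_outliers(saffron_rgb)).tolist())  # → [5]
```

In the notebook, `np.array([rgb for _, _, rgb in pts_saffron])` gives the values to check; re-clicking a flagged point is usually safer than silently dropping it.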


In [4]:
# Convert RGB values to Lab color space
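def rgb_to_lab(rgb):
    # Wrap the pixel as a 1x1 image, since cvtColor works on images. For 8-bit input OpenCV
    # stores L* scaled to 0..255 and a*, b* shifted by +128, so neutral colors sit near a* = b* = 128.
    return cv2.cvtColor(np.uint8([[rgb]]), cv2.COLOR_RGB2Lab)[0][0]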

saffron_lab = np.array([rgb_to_lab(rgb) for _, _, rgb in pts_saffron])
white_lab = np.array([rgb_to_lab(rgb) for _, _, rgb in pts_white])
green_lab = np.array([rgb_to_lab(rgb) for _, _, rgb in pts_green])

## Step 7: Extract the a* and b* (color) components from Lab color space; L* (lightness) is not used
a_features = np.hstack([saffron_lab[:, 1], white_lab[:, 1], green_lab[:, 1]])
b_features = np.hstack([saffron_lab[:, 2], white_lab[:, 2], green_lab[:, 2]])
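
For reference, `cv2.COLOR_RGB2Lab` follows the standard sRGB to CIE L\*a\*b\* formulas with a D65 white point. The sketch below re-implements them in NumPy as an illustration, not as a drop-in replacement for OpenCV (which also rounds and clips 8-bit results). It shows why Step 7 keeps only a\* and b\*: L\* carries lightness, while a\* (green to red) and b\* (blue to yellow) carry the chromaticity that distinguishes the strips, so dropping L\* makes the features less sensitive to shading. The helper names `srgb_to_lab` and `to_opencv_8bit` are hypothetical, and the flag colors used are approximate web values. `to_opencv_8bit` applies the 8-bit encoding given in OpenCV's documentation (L\*·255/100, a\* + 128, b\* + 128), which is why white lands near (128, 128) in the a\*/b\* plot.

```python
import numpy as np

# sRGB (D65) to XYZ matrix and D65 reference white
M_RGB_TO_XYZ = np.array([[0.4124, 0.3576, 0.1805],
                         [0.2126, 0.7152, 0.0722],
                         [0.0193, 0.1192, 0.9505]])
WHITE_D65 = np.array([0.95047, 1.0, 1.08883])

def srgb_to_lab(rgb):
    """Convert 8-bit sRGB values of shape (..., 3) to CIE L*a*b* (L* in 0..100)."""
    c = np.asarray(rgb, dtype=float) / 255.0
    # Undo the sRGB gamma curve to get linear light
    lin = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ, normalized by the reference white
    t = (lin @ M_RGB_TO_XYZ.T) / WHITE_D65
    delta = 6 / 29
    f = np.where(t > delta ** 3, np.cbrt(t), t / (3 * delta ** 2) + 4 / 29)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

def to_opencv_8bit(lab):
    """Approximate OpenCV's 8-bit Lab encoding (no rounding or clipping)."""
    lab = np.asarray(lab, dtype=float)
    return np.stack([lab[..., 0] * 255 / 100, lab[..., 1] + 128, lab[..., 2] + 128], axis=-1)


colors = np.array([[255, 153, 51], [255, 255, 255], [19, 136, 8]])  # saffron, white, green (approximate)
lab = srgb_to_lab(colors)
print(lab.round(1))
print(to_opencv_8bit(lab).round())
```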

In [5]:
# Map class labels to numeric values
class_mapping = {'Saffron': 0, 'White': 1, 'Green': 2}
y = np.array([class_mapping[label] for label in ['Saffron'] * Number_of_points + ['White'] * Number_of_points + ['Green'] * Number_of_points])

plt.figure()
plt.scatter(a_features[:Number_of_points], b_features[:Number_of_points], c='b', marker='o', s=50, label='Saffron')
plt.scatter(a_features[Number_of_points:2*Number_of_points], b_features[Number_of_points:2*Number_of_points], c='g', marker='^', s=50, label='White')
plt.scatter(a_features[2*Number_of_points:], b_features[2*Number_of_points:], c='r', marker='*', s=50, label='Green')
plt.legend(['Saffron', 'White', 'Green'], loc='best')
plt.xlabel('a* Component (Lab Color Space)')  ## Provide x label
plt.ylabel('b* Component (Lab Color Space)') ## Provide y label
plt.title('a* vs b* Components in Lab Color Space for Indian Flag Colors') ## Provide title
plt.grid()
plt.show()

##------------------------------------------------------------
# Step 8: Perform LDA analysis using LinearDiscriminantAnalysis() and lda.fit()
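X = np.vstack([saffron_lab[:, 1:3], white_lab[:, 1:3], green_lab[:, 1:3]])  # (a*, b*) features, in the same class order as y
lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
X_lda = lda.transform(X)  # projection onto the C-1 = 2 discriminant axes (not used by the plots below)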
##-----------------------------------------------------------
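
Once `lda` is fitted, the same model can label every pixel of the flag, which is a quick way to see how well the boundaries generalize beyond the clicked points. The sketch below uses a hypothetical helper `segment_by_color`; in the notebook you would call it as `segment_by_color(cv2.cvtColor(img, cv2.COLOR_BGR2Lab), lda)`, since `img` is still BGR. To stay self-contained it fits a stand-in model on synthetic (a\*, b\*) points around made-up class centers in OpenCV's 8-bit scale, and it uses new variable names so it does not overwrite `X`, `y` or `lda`.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def segment_by_color(lab_image, model):
    """Label every pixel of an 8-bit OpenCV Lab image from its (a*, b*) pair (hypothetical helper)."""
    h, w = lab_image.shape[:2]
    ab = lab_image[..., 1:3].reshape(-1, 2).astype(float)  # one (a*, b*) row per pixel
    return model.predict(ab).reshape(h, w)


# Stand-in model on synthetic (a*, b*) points around made-up class centers
rng = np.random.default_rng(0)
centers = np.array([[159.0, 194.0], [128.0, 128.0], [76.0, 179.0]])  # saffron, white, green
X_demo = np.vstack([rng.normal(c, 3.0, size=(20, 2)) for c in centers])
y_demo = np.repeat([0, 1, 2], 20)
lda_demo = LinearDiscriminantAnalysis().fit(X_demo, y_demo)

# Tiny fake Lab image: one row per strip, four pixels wide
lab_image = np.zeros((3, 4, 3), dtype=np.uint8)
lab_image[0] = [180, 159, 194]
lab_image[1] = [255, 128, 128]
lab_image[2] = [130, 76, 179]
print(segment_by_color(lab_image, lda_demo))
```

Displaying the resulting label map with `plt.imshow` shows at a glance where the model confuses strips, for example around the chakra or along the strip borders.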



In [6]:
# Plot LDA boundaries
plt.figure()
plt.scatter(a_features[:Number_of_points], b_features[:Number_of_points], c='b', marker='o', s=50, label='Saffron')
plt.scatter(a_features[Number_of_points:2*Number_of_points], b_features[Number_of_points:2*Number_of_points], c='g', marker='^', s=50, label='White')
plt.scatter(a_features[2*Number_of_points:], b_features[2*Number_of_points:], c='r', marker='*', s=50, label='Green')

plt.xlabel('a* Component (Lab Color Space)')  ## Provide x label (the plot is in the original a*/b* feature space)
plt.ylabel('b* Component (Lab Color Space)') ## Provide y label
plt.title('LDA boundaries (linear model) for Colors of the Indian Flag')

# Plot the decision boundaries
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()

xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 100), np.linspace(ylim[0], ylim[1], 100))
Z = lda.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.contour(xx, yy, Z, levels=[0.5, 1.5], colors='k', linewidths=2, linestyles='solid')  # Z holds labels 0/1/2, so these levels trace the class borders
plt.legend(loc='best')
plt.grid()
plt.show()
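
The grid-and-contour approach above approximates the boundaries on a 100 x 100 grid. Because LDA is linear, the boundary between two classes can also be written down exactly: for a multiclass model, scikit-learn's `decision_function` returns `X @ coef_.T + intercept_` and `predict` picks the largest score, so classes i and j tie where (coef_[i] - coef_[j]) · x + (intercept_[i] - intercept_[j]) = 0. The hypothetical helpers `pairwise_boundary` and `boundary_b_at` below compute that line. They assume three or more classes (a binary model has a single row in `coef_`), and only the part of each line where i and j are the two top-scoring classes is an actual boundary. The training data are synthetic.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def pairwise_boundary(model, i, j):
    """Return (w, c) with the i-vs-j boundary given by w @ x + c = 0 (hypothetical helper).

    Valid for models fitted on 3+ classes, where coef_ has one row per class.
    """
    w = model.coef_[i] - model.coef_[j]
    c = model.intercept_[i] - model.intercept_[j]
    return w, c

def boundary_b_at(model, i, j, a):
    """Solve the i-vs-j boundary for b* at a given a* (assumes w[1] != 0)."""
    w, c = pairwise_boundary(model, i, j)
    return -(w[0] * a + c) / w[1]


# Stand-in model on synthetic (a*, b*) data around made-up class centers
rng = np.random.default_rng(0)
centers = np.array([[159.0, 194.0], [128.0, 128.0], [76.0, 179.0]])
X_demo = np.vstack([rng.normal(c, 3.0, size=(20, 2)) for c in centers])
y_demo = np.repeat([0, 1, 2], 20)
lda_demo = LinearDiscriminantAnalysis().fit(X_demo, y_demo)

for i, j in [(0, 1), (1, 2), (0, 2)]:
    w, c = pairwise_boundary(lda_demo, i, j)
    print(f"class {i} vs {j}: {w[0]:+.3f}*a {w[1]:+.3f}*b {c:+.3f} = 0")
```

With the notebook's fitted `lda`, each line could then be drawn with matplotlib's `plt.axline` (available since matplotlib 3.3), using a point `(a0, boundary_b_at(lda, i, j, a0))` and slope `-w[0] / w[1]`.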

## Report:

## Answer the following questions within your report:


### 1.	What are the key assumptions underlying LDA, and how do these assumptions influence the model's performance?

### 2.	What are the hyperparameters in LDA, and how do they affect the outcome of the model?

### 3.	What methods can be used to assess an LDA model's effectiveness in terms of separation of topics and the coherence of generated topics?

### 4.	What are some common challenges or limitations associated with LDA, and how can they be addressed or mitigated?

### 5. What practical applications does this assignment have in real-world situations, and what benefits does it offer in those specific scenarios?

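Q1: Some of the key assumptions underlying LDA are:

1. Linearly separable classes - LDA's decision boundaries are straight lines (hyperplanes in higher dimensions), so it works well only when the classes (here, saffron, white and green) can be separated, at least approximately, by such boundaries. If they cannot, LDA may perform poorly.
2. Normally distributed features (multivariate Gaussian distribution) - LDA models each class as a Gaussian in feature space. If the data are strongly skewed, multimodal or contain outliers (such as a mis-clicked point), the estimated class means and covariance are distorted and points may be misclassified.
3. Equal covariance across classes (homogeneity of variance) - LDA assumes that all classes share the same covariance matrix and pools a single estimate of it. This shared covariance is what makes the boundaries linear; if the classes have very different spreads, the optimal boundary is curved and LDA's linear boundary is suboptimal.
4. No strong multicollinearity - LDA does not require the features to be independent, since it models their full covariance. However, highly correlated features make the pooled covariance matrix nearly singular, so its inverse (and therefore the boundary) becomes unstable; shrinkage or removing redundant features helps.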
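
The effect of breaking the equal-covariance assumption can be seen on a toy problem. In the sketch below (synthetic data, not the flag colors), both classes have the same mean but very different spreads, so no straight line separates them: LDA, which pools one covariance matrix, stays near chance, while QDA, which fits one covariance per class, draws a circular boundary and does much better. Training accuracy is used only to keep the example short.

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

rng = np.random.default_rng(0)
n = 500
# Same mean (the origin), very different spreads: the equal-covariance assumption is broken
X_demo = np.vstack([rng.normal(0.0, 0.3, size=(n, 2)),   # class 0: tight cluster
                    rng.normal(0.0, 3.0, size=(n, 2))])  # class 1: wide cloud around it
y_demo = np.repeat([0, 1], n)

lda_acc = LinearDiscriminantAnalysis().fit(X_demo, y_demo).score(X_demo, y_demo)
qda_acc = QuadraticDiscriminantAnalysis().fit(X_demo, y_demo).score(X_demo, y_demo)
print(f"LDA training accuracy: {lda_acc:.2f}")  # near chance: a line cannot enclose the tight cluster
print(f"QDA training accuracy: {qda_acc:.2f}")  # high: per-class covariances give a circular boundary
```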

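Q2: Some of the hyperparameters of LDA (as exposed by scikit-learn's LinearDiscriminantAnalysis) are:

1. Number of components (n_components): the number of discriminant axes kept by transform(), at most min(C-1, number of features), where C is the number of classes. It affects only the dimensionality reduction, not the predictions.
2. Solver (solver): 'svd' (the default) does not compute the covariance matrix explicitly and is recommended when there are many features; 'lsqr' is an efficient least-squares solver for classification only (it does not support transform()); 'eigen' solves an eigenvalue problem and supports both. None of them relaxes the equal-covariance assumption; that requires QDA.
3. Shrinkage (shrinkage): regularizes the covariance estimate (None, 'auto' for the Ledoit-Wolf estimate, or a float between 0 and 1). It works only with the 'lsqr' and 'eigen' solvers and helps prevent overfitting when there are few samples relative to the number of features.
4. Class priors (priors): the prior probability of each class. By default they are estimated from the class proportions in the training data; raising a class's prior enlarges the region assigned to it.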
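
The sketch below shows how these hyperparameters are set in scikit-learn's `LinearDiscriminantAnalysis`, on small synthetic data (three random class centers in 10 dimensions). It is meant to illustrate the API and the shapes involved, not a performance gain: shrinkage matters most when the number of samples is small relative to the number of features.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_per_class, n_features = 15, 10
centers = rng.normal(0.0, 1.0, size=(3, n_features))
X_demo = np.vstack([c + rng.normal(0.0, 1.0, size=(n_per_class, n_features)) for c in centers])
y_demo = np.repeat([0, 1, 2], n_per_class)

# n_components only limits transform(); at most min(C - 1, n_features) = 2 here
svd_lda = LinearDiscriminantAnalysis(solver="svd", n_components=1).fit(X_demo, y_demo)
print(svd_lda.transform(X_demo).shape)  # → (45, 1)

# shrinkage requires solver="lsqr" or "eigen"; "auto" uses the Ledoit-Wolf estimate
eig_lda = LinearDiscriminantAnalysis(solver="eigen", shrinkage="auto").fit(X_demo, y_demo)
print(eig_lda.transform(X_demo).shape)  # → (45, 2)

# "lsqr" also accepts a fixed shrinkage in [0, 1], but only classifies (no transform)
lsqr_lda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage=0.3).fit(X_demo, y_demo)

# priors override the class frequencies estimated from the data
prior_lda = LinearDiscriminantAnalysis(priors=[0.6, 0.2, 0.2]).fit(X_demo, y_demo)
print(prior_lda.priors_)
```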

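Q3: The question's wording (separation of topics, coherence of generated topics) comes from Latent Dirichlet Allocation, the topic model that shares the acronym. For Linear Discriminant Analysis, as used here, the corresponding question is how well the classes are separated, which we can evaluate with these methods:

1. Accuracy, precision, recall and F1-score - computed from the confusion matrix, they measure classification performance and show which classes are confused with each other.
2. Scatter plots and decision boundaries - plotting the points together with LDA's decision regions (as in this notebook) shows visually how well the classes are separated.
3. Explained variance ratio - in scikit-learn, explained_variance_ratio_ gives the share of the discriminating (between-class) variance captured by each discriminant axis (available with the 'svd' and 'eigen' solvers).
4. Cross-validation - k-fold cross-validation repeatedly trains on part of the data and tests on the held-out part, which estimates how well LDA generalizes to points it has not seen.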
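
These metrics can be computed with scikit-learn as sketched below. The data are synthetic (a\*, b\*) points around made-up class centers, standing in for the clicked flag colors; with the notebook's data you would pass `X` and `y` instead of `X_demo` and `y_demo`.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score

# Synthetic (a*, b*) points around made-up class centers
rng = np.random.default_rng(0)
centers = np.array([[159.0, 194.0], [128.0, 128.0], [76.0, 179.0]])
X_demo = np.vstack([rng.normal(c, 8.0, size=(20, 2)) for c in centers])
y_demo = np.repeat([0, 1, 2], 20)

lda_demo = LinearDiscriminantAnalysis()
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Accuracy on each held-out fold
scores = cross_val_score(lda_demo, X_demo, y_demo, cv=cv)
print("fold accuracies:", scores.round(3), "mean:", round(scores.mean(), 3))

# Confusion matrix and per-class precision/recall/F1 from out-of-fold predictions
y_pred = cross_val_predict(lda_demo, X_demo, y_demo, cv=cv)
print(confusion_matrix(y_demo, y_pred))
print(classification_report(y_demo, y_pred, target_names=["Saffron", "White", "Green"]))

# Share of discriminating variance per axis (model fitted on all data)
lda_demo.fit(X_demo, y_demo)
print("explained_variance_ratio_:", lda_demo.explained_variance_ratio_.round(3))
```

Stratified folds keep the three classes in equal proportion in every split, which matters with only about 20 points per class.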

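Q4: Some of the limitations of LDA, and ways to mitigate them, are:

1. The classes may not be linearly separable. To mitigate this, we can use kernel LDA or a non-linear classifier such as an SVM with a non-linear kernel.
2. LDA produces at most C-1 projections, i.e. it reduces the data to at most C-1 dimensions (only 2 for the three flag colors). If the classification error after LDA is high because these few dimensions are not enough, another dimensionality-reduction or feature-extraction method is needed to provide additional features.
3. Assumptions of normality and equal covariance - if the classes have different covariances, QDA (Quadratic Discriminant Analysis) can be used instead, since it estimates a separate covariance matrix per class (it still assumes Gaussian classes). If the data are far from normally distributed, a non-parametric classifier such as k-nearest neighbors may work better.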
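
scikit-learn has no kernel LDA, but the idea behind the first mitigation can be sketched with an explicit feature map: expanding (x, y) with degree-2 polynomial terms lets a linear boundary in the expanded space become a curved one in the original space. This is a stand-in for kernel LDA, not an implementation of it. The data come from scikit-learn's synthetic `make_circles` generator (two concentric rings), which no straight line can separate.

```python
from sklearn.datasets import make_circles
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Two concentric rings: no straight line in (x, y) separates them
X_demo, y_demo = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

plain = LinearDiscriminantAnalysis().fit(X_demo, y_demo)
# Add x^2, xy, y^2: a linear boundary in this space is a conic (here roughly a circle) in (x, y)
expanded = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                         LinearDiscriminantAnalysis()).fit(X_demo, y_demo)

print(f"LDA on (x, y):            {plain.score(X_demo, y_demo):.2f}")
print(f"LDA on degree-2 features: {expanded.score(X_demo, y_demo):.2f}")
```

`include_bias=False` drops the constant column, which would otherwise have zero within-class variance and trigger a collinearity warning.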

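Q5: The LDA technique used in this assignment has many applications in classification problems, including:

1. Image recognition (face and object detection) - for example, Fisherfaces for face recognition.
2. Text and topic classification - used in natural language processing (NLP) to classify news articles or detect spam emails.
3. Disease classification from patient data - grading a disease as mild, moderate or severe from various health parameters.
4. Radar and signal processing - separating real signals from noise in radar systems.
5. Biometric authentication - fingerprint and iris recognition.
6. Color-based classification and segmentation - as in this assignment, classifying pixels by their color, e.g. to segment regions of an image or sort products by color.

In these settings LDA is attractive because it trains quickly, has a closed-form solution with few hyperparameters, gives interpretable linear boundaries, and doubles as a supervised dimensionality-reduction step.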