### Instructions:
* Write your code in this Jupyter notebook only.
* Download this notebook and import it into JupyterLab.
* Write the partial code for Step 0 to Step 8, marked with the prefix ##.
* Fill in the blanks where instructed in the comments.
* Leave the rest of the code and its structure as it is.
* Follow all the instructions given as comments in the cells.



**Answer the questions given at the end of this notebook within your report.**

**Upload this jupyter notebook after completion with your partial code and the report in one file in PDF format.**

**Also upload the resulting image showing all the selected points and boundary line between them after LDA analysis.**

**Your file name should be yourname_lab4.pdf. Upload it before the due time.**

In [1]:
import numpy as np ## import numpy
import cv2 ## import opencv
import matplotlib ## import matplotlib
import matplotlib.pyplot as plt ## import matplotlib pyplot
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis   ## from sklearn import LDA analysis
matplotlib.use('TkAgg')

##---------------------------------------------------
## Step 0: Install any other dependencies that are reported as missing (module not found) at run time.
##---------------------------------------------------

In [2]:
Number_of_points = 25  ## Number of points you want to select from each strip. Recommended >= 20

img = cv2.imread('Indian_Flag.jpg') ## Read the given image

def select_points(img, title):
    fig, ax = plt.subplots()
    #------------------------------------------
    ## step 1: Convert the img from BGR to RGB using cv2 and display it using ax.imshow (matplotlib expects RGB)
    RGB_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    ax.imshow(RGB_img)
    ## step 2: Put title of the image
    ax.set_title(title)
    ##-----------------------------------------
    
    # Set the cursor style to a plus sign
    fig.canvas.manager.set_window_title('Select Points')
    cursor = matplotlib.widgets.Cursor(ax, useblit=True, color='red', linewidth=1)
    plt.show(block=False)  # Show the image without blocking

    k = 0
    points = [] ## Create here an empty list to store points 

    while k < Number_of_points:
        xy = plt.ginput(1, timeout=0)  # Wait for one click; timeout=0 means never time out
        if len(xy) > 0:
            col, row = map(int, xy[0])  # Convert to integer
            ##-----------------------------------------------
            ## Step 3: Collect RGB values at the clicked positions (col, row) and print it. 
            RGB_val = RGB_img[row][col]
            print(RGB_val)
            ##-----------------------------------------------

            k += 1
            points.append([row, col, RGB_val])  # Store the RGB value (img itself is in BGR order) in the list points.
            
            # Display colored dot on the image
            plt.scatter(col, row, c='black', marker='o', s=10)

            # Redraw the image to include the dot
            plt.draw()

    plt.close()  # Close the window after all points are collected
    return points ## Fill this blank

In [5]:
##-----------------------------------------------------------------
## Step4: fill the blanks for Selected points from saffron strip
pts_saffron = select_points(img, 'Saffron')
## Step5: fill the blanks for Selected points from white strip
pts_white = select_points(img, 'White')
## Step6: fill the blanks for Selected points from green strip
pts_green = select_points(img, 'Green')
##-----------------------------------------------------------------

[248  88  26]
[250  96  26]
[252  91  23]
[252  91  23]
[250  89  21]
[246  79   8]
[246  82  11]
[248  81  10]
[251  84  14]
[242  78  17]
[243  85  12]
[246  89  18]
[254  93  23]
[246  88  23]
[249  92  21]
[248  91  20]
[243  70   1]
[249  82  14]
[245  78   7]
[238  70   0]
[239  81  20]
[252  91  13]
[248  82   8]
[245  86  20]
[249  88  18]
[211 211 219]
[218 209 226]
[215 224 233]
[213 222 219]
[224 221 232]
[221 220 226]
[220 218 231]
[224 221 230]
[226 225 243]
[219 217 220]
[211 208 227]
[222 219 238]
[219 220 240]
[233 232 246]
[226 227 245]
[226 227 245]
[224 212 234]
[224 212 234]
[222 223 228]
[227 224 231]
[227 227 235]
[226 218 242]
[214 212 226]
[221 218 237]
[222 220 234]
[ 41 103  78]
[34 98 72]
[31 99 74]
[28 98 70]
[ 30 100  72]
[ 30 103  74]
[ 38 106  81]
[ 33 103  77]
[27 91 64]
[27 98 68]
[33 98 74]
[ 29 102  75]
[26 96 68]
[29 99 73]
[33 96 67]
[30 95 65]
[ 29 100  70]
[22 92 66]
[31 96 72]
[30 96 69]
[28 95 64]
[29 93 66]
[32 98 71]
[ 34 103  75]
[27 93 66]


In [7]:
# Convert RGB values to Lab color space
def rgb_to_lab(rgb):
    return cv2.cvtColor(np.uint8([[rgb]]), cv2.COLOR_RGB2Lab)[0][0]

saffron_lab = np.array([rgb_to_lab(rgb) for _, _, rgb in pts_saffron])
white_lab = np.array([rgb_to_lab(rgb) for _, _, rgb in pts_white])
green_lab = np.array([rgb_to_lab(rgb) for _, _, rgb in pts_green])

## Step7: Extract a* and b* components from Lab color space
a_features = np.hstack((saffron_lab[:, 1], white_lab[:, 1], green_lab[:, 1]))
b_features = np.hstack((saffron_lab[:, 2], white_lab[:, 2], green_lab[:, 2]))

In [None]:
# Map class labels to numeric values
class_mapping = {'Saffron': 0, 'White': 1, 'Green': 2}
y = np.array([class_mapping[label] for label in ['Saffron'] * Number_of_points + ['White'] * Number_of_points + ['Green'] * Number_of_points])

plt.figure()
plt.scatter(a_features[:Number_of_points], b_features[:Number_of_points], c='b', marker='o', s=50, label='Saffron')
plt.scatter(a_features[Number_of_points:2*Number_of_points], b_features[Number_of_points:2*Number_of_points], c='g', marker='^', s=50, label='White')
plt.scatter(a_features[2*Number_of_points:], b_features[2*Number_of_points:], c='r', marker='*', s=50, label='Green')
plt.legend(['Saffron', 'White', 'Green'], loc='best')
plt.xlabel('Red-Green (a*)')  ## Provide x label
plt.ylabel('Blue-Yellow (b*)') ## Provide y label
plt.title('Colour Distribution') ## Provide title
plt.grid()
plt.savefig('Plotted_Points.png', dpi=300, bbox_inches='tight')  # Save before show(); after a blocking show() the figure is gone and the file would be blank
plt.show()

##------------------------------------------------------------
# Step 8: Perform LDA analysis using LinearDiscriminantAnalysis() and lda.fit()
lda = LinearDiscriminantAnalysis()
X = np.column_stack((a_features, b_features))
lda.fit(X, y)  

##-----------------------------------------------------------

In [None]:
# Plot LDA boundaries
plt.figure()
plt.scatter(a_features[:Number_of_points], b_features[:Number_of_points], c='b', marker='o', s=50, label='Saffron')
plt.scatter(a_features[Number_of_points:2*Number_of_points], b_features[Number_of_points:2*Number_of_points], c='g', marker='^', s=50, label='White')
plt.scatter(a_features[2*Number_of_points:], b_features[2*Number_of_points:], c='r', marker='*', s=50, label='Green')

plt.xlabel('Red-Green (a*)')  ## Provide x label
plt.ylabel('Blue-Yellow (b*)') ## Provide y label
plt.title('LDA boundaries (linear model) for Colors of the Indian Flag')

# Plot the decision boundaries
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()

xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 100), np.linspace(ylim[0], ylim[1], 100))
Z = lda.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.contour(xx, yy, Z, colors='k', linewidths=2, linestyles='solid')
plt.legend(loc='best')
plt.grid()
plt.savefig('Plotted_LDA_Boundaries.png', dpi=300, bbox_inches='tight')  # Save before show() so the saved image is not blank
plt.show()

## Report:

## Answer the following questions within your report:


### 1.	What are the key assumptions underlying LDA, and how do these assumptions influence the model's performance?
Ans1: The assumptions of LDA are **Normality, Homoscedasticity, Linearity and Independence**. LDA assumes that the data within each class follow a normal (Gaussian) distribution (Normality), that the covariance matrices of the different classes are equal (Homoscedasticity), and that a linear decision boundary is sufficient to separate the classes (Linearity). Furthermore, LDA assumes that the features are not highly correlated, i.e. there is no multicollinearity (Independence of features). Violating Normality or Linearity **reduces the model's ability to accurately separate** skewed, multimodal or non-linearly separable classes. Multicollinearity makes the within-class scatter matrix nearly singular, so its inverse is difficult to compute reliably. Violating homoscedasticity means the optimal decision boundaries are no longer linear, because each class then has its own Mahalanobis distance, so the linear boundary that LDA fits becomes suboptimal.
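
To make the link between the shared-covariance assumption and a linear boundary concrete, here is a minimal NumPy sketch. It is not the scikit-learn implementation used in this notebook; the class means, covariance and sample counts are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D features for two classes drawn with the SAME covariance
# (the homoscedasticity assumption). All numbers are illustrative only.
shared_cov = np.array([[4.0, 1.0], [1.0, 3.0]])
class_means = np.array([[160.0, 170.0], [110.0, 140.0]])
X = np.vstack([rng.multivariate_normal(m, shared_cov, size=200) for m in class_means])
y = np.repeat([0, 1], 200)

# Estimate the class means, the pooled within-class covariance and the priors.
means = np.array([X[y == k].mean(axis=0) for k in (0, 1)])
centered = X - means[y]
pooled_cov = centered.T @ centered / (len(X) - 2)
priors = np.array([np.mean(y == k) for k in (0, 1)])

# Discriminant score for class k:
#   delta_k(x) = x^T S^-1 mu_k - 0.5 * mu_k^T S^-1 mu_k + log(pi_k)
# It is linear in x because the quadratic term x^T S^-1 x is shared by all
# classes and cancels when scores are compared.
inv_cov = np.linalg.inv(pooled_cov)
weights = means @ inv_cov  # row k holds mu_k^T S^-1
biases = -0.5 * np.sum(weights * means, axis=1) + np.log(priors)

scores = X @ weights.T + biases
predictions = scores.argmax(axis=1)
accuracy = np.mean(predictions == y)
print(f"training accuracy: {accuracy:.3f}")
```

If each class had its own covariance matrix, the quadratic term would no longer cancel and the boundary between classes would become curved.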

### 2.	What are the hyperparameters in LDA, and how do they affect the outcome of the model?
Some of the key parameters in LDA are the **solver, shrinkage and number of components**. LDA implementations such as scikit-learn offer different solvers: svd, lsqr and eigen. They affect the outcome mainly in terms of computational speed and, in some cases, numerical stability. Shrinkage (available only with lsqr and eigen) regularises the covariance estimate by pulling it towards a scaled identity matrix. This stabilises the covariance estimate, which can prevent overfitting and improve generalization overall. The number of components determines the number of new dimensions that LDA creates and directly influences model performance. If it is too small, important information may be lost; if it is too large, it might lead to overfitting and the benefit of dimensionality reduction is lost. Other parameters include priors (the prior probabilities of each class) and tolerance (in scikit-learn, the threshold below which a singular value is treated as insignificant by the svd solver).
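
The effect of shrinkage can be sketched with plain NumPy. This is a minimal illustration, not the library's code: it assumes the common convex-combination form of shrinkage, where the covariance is blended with a scaled identity matrix. The helper name `shrink_covariance` and the example matrix are hypothetical.

```python
import numpy as np

def shrink_covariance(cov, alpha):
    """Blend cov with a scaled identity: (1 - alpha) * cov + alpha * mu * I.

    mu is the average variance (trace / n_features), so the total variance
    (the trace) is unchanged by the blend.
    """
    n_features = cov.shape[0]
    mu = np.trace(cov) / n_features
    return (1.0 - alpha) * cov + alpha * mu * np.eye(n_features)

# Hypothetical covariance of two almost perfectly correlated features,
# which makes the matrix nearly singular.
cov = np.array([[1.0, 0.999], [0.999, 1.0]])
shrunk = shrink_covariance(cov, alpha=0.1)

# A lower condition number means the matrix is safer to invert.
cond_before = np.linalg.cond(cov)
cond_after = np.linalg.cond(shrunk)
print(cond_before > cond_after)
```

With `alpha=0` the estimate is left untouched, and with `alpha=1` it becomes a multiple of the identity, so the parameter trades fidelity to the data against numerical stability.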

### 3.	What methods can be used to assess an LDA model's effectiveness in terms of separation of topics and the coherence of generated topics?
There are various methods that can be used to assess an LDA model's effectiveness in terms of separation. **Scatter plots** of the LDA components can be visualised to show how effectively LDA separates the different classes. The **explained variance ratio**, which indicates how much discriminative information is retained by each component, and the **Mahalanobis distance**, which measures the degree of class separation, are other useful measures. Classification performance in general can be measured through **accuracy, precision, recall, ROC curves and AUC**, etc. As for the coherence of generated topics, which applies to the topic-modelling LDA (Latent Dirichlet Allocation) rather than the discriminant analysis used in this notebook, metrics that assess the semantic similarity and co-occurrence of words, such as *C_v, UMass or C_uci*, can be used.
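
Two of the measures above, the Mahalanobis distance between class means and the accuracy and recall read from a confusion matrix, can be sketched in NumPy as follows. The helper names, labels and predictions are hypothetical and are not taken from the notebook's data.

```python
import numpy as np

def mahalanobis_between_means(X, y, a, b):
    """Distance between the means of classes a and b under the pooled covariance."""
    classes = [int(k) for k in np.unique(y)]
    means = {k: X[y == k].mean(axis=0) for k in classes}
    centered = np.vstack([X[y == k] - means[k] for k in classes])
    pooled_cov = centered.T @ centered / (len(X) - len(classes))
    diff = means[a] - means[b]
    return float(np.sqrt(diff @ np.linalg.solve(pooled_cov, diff)))

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(matrix, (y_true, y_pred), 1)
    return matrix

# Hypothetical true labels and predictions for three classes.
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 1, 2, 2, 0])
cm = confusion_matrix(y_true, y_pred, n_classes=3)
accuracy = np.trace(cm) / cm.sum()
recall_per_class = np.diag(cm) / cm.sum(axis=1)

# Hypothetical 2-D features for two well-separated classes.
X_demo = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [11.0, 11.0], [10.0, 11.0], [11.0, 10.0]])
y_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1])
separation = mahalanobis_between_means(X_demo, y_demo, 0, 1)
print(cm, accuracy, recall_per_class, separation)
```

A larger distance between class means, relative to the within-class spread, indicates classes that a linear boundary can separate more easily.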

### 4.	What are some common challenges or limitations associated with LDA, and how can they be addressed or mitigated?
LDA has several limitations. Since it relies on mean and covariance estimates, it is very **sensitive to outliers**. It also handles **imbalanced data** poorly, which can bias classification towards the dominant classes. Further, LDA can reduce the feature space to **at most C-1 dimensions**, which can lose some of the information in high-dimensional data. Finally, it **cannot separate non-linear data**, making it ineffective when dealing with complex patterns in data.

How can they be addressed:
We may deal with outliers beforehand by applying methods like z-score filtering or the IQR method, or by using regularised LDA. For high-dimensional data, we can apply PCA before LDA to retain variance, and we may also apply kernels to extend LDA to higher-dimensional feature spaces. In this way, kernels can also help non-linear classes become separable. We can address dominant classes by adjusting the priors as well.
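
As one possible way to apply the z-score filtering mentioned above, the sketch below removes samples that lie far from their own class mean before a model is fitted. The function name `filter_outliers_by_zscore`, the threshold of 3 and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def filter_outliers_by_zscore(X, y, threshold=3.0):
    """Drop samples whose within-class z-score exceeds threshold on any feature."""
    keep = np.ones(len(X), dtype=bool)
    for k in np.unique(y):
        idx = np.flatnonzero(y == k)
        mean = X[idx].mean(axis=0)
        std = X[idx].std(axis=0)
        std[std == 0] = 1.0  # avoid division by zero for constant features
        z = np.abs((X[idx] - mean) / std)
        keep[idx] = (z <= threshold).all(axis=1)
    return X[keep], y[keep]

# Hypothetical single-class data: 30 ordinary samples plus one obvious outlier.
rng = np.random.default_rng(0)
X_raw = np.vstack([rng.normal(0.0, 1.0, size=(30, 2)), [[50.0, 50.0]]])
y_raw = np.zeros(31, dtype=int)

X_clean, y_clean = filter_outliers_by_zscore(X_raw, y_raw)
print(len(X_raw), "->", len(X_clean))
```

Computing the z-scores per class rather than over the whole data set matters here, because the classes themselves are far apart and a global mean would flag ordinary samples of a small class as outliers.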

### 5. What practical applications does this assignment have in real-world situations, and what benefits does it offer in those specific scenarios?
This assignment on color classification has a lot of real-world uses. It helps industries check for defects in fabrics, determine if fruit is ripe, and improve image segmentation for medical scans and self-driving cars. In dermatology, it's used to track skin conditions over time, and in environmental monitoring, it helps analyze satellite images to detect pollution and land changes. The main benefits are **better accuracy, automation, and efficiency: cutting down human error, making processes faster, and improving decision-making in different fields**.