<strong>Problem formulation:</strong>
<p>**The aim of the project**: Consider a simple image in which the color of some points has been deleted (these points are now white). We aim to restore the original image using the points that are still colored. </p>
<p>**Technical aim of the project**: Compare the performance of various classifiers in solving the given problem for both linearly separable and non-linearly separable images. Conduct model selection, involving the choice of model parameters, using different methods, and thoroughly discuss the results. Perform a sensitivity study to highlight the impact of the training/testing ratio and the input dataset. </p>
<p> **Detailed description of the requirements**: 
    <ol>
    <li>Imagine a simple image in two or more colors of your choice. Then use the Paint application to make an image that contains only some of the points from the original image. Use Paint's pencil tool, not the brush or other tools, to get the image. The colored pixels in the rendered image will represent the data set. Based on these, you need to predict the colors of all the pixels in the image.</li>
    <li>You have to use at least three classifiers: one from category a), one from category b), and one from category c) given below. Optionally, you can use more classifiers.
        <p> Classifiers to be used:    
            a) **Bayes classifiers** and **Decision tree classifiers**   
            b) **ANN** and **SVM**,  
            c) **Ensemble methods**,  d) **Other classifiers** (optional). </p> 
               </li>
   <li>You should be able to explain all details related to the chosen classifiers.</li>
     <li>You should be able to explain all details related to the implementation of the tasks.</li>
      <li>You should provide interpretation and make comparisons of the results.</li>
        </ol>
</p>
<strong>Example:</strong>
<img src="data.png" width="256" height="256"/>
<center><strong>Figure 1</strong></center>
 <p>You can consider that the points from Figure 1 come from an initial (known) image like that given in Figure 2  </p>
    <img src="InitialImage.png" />
    <center><strong>Figure 2</strong> </center>
<p>For more examples of images and a better understanding of the problem formulation, please see
    <a href="http://playground.tensorflow.org">http://playground.tensorflow.org</a>
    and Figure 3 (at the end of this notebook).</p>
<p> The tasks to be performed are presented below. They can be followed successively as shown, or you can design your own algorithm that encapsulates all or some of the steps (for example, an algorithm that takes a list of images and a list of classifiers, so that each task can be performed directly on all classifiers and all images, or an OOP-based design that solves the problem generically for any classifier and any image). It is important that all of the tasks listed below are completed. You are welcome to undertake additional tasks of your choice to enhance the comparisons of the results.</p>

<strong>Tasks to be completed:</strong>

<strong>1.</strong> Generate 2 images using Paint, similar to Figure 1, containing sets of points of two or more colors, with a minimum of 200 points (as explained in **Detailed description of the requirements**), such that the set of points is linearly separable in the first image and non-linearly separable in the second. Save the images as "data1.png" and "data2.png".

<strong>2.</strong> Create two datasets based on the 2 images from step 1, using the code given below. Analyze and comment on this code.

In [1]:
from PIL import Image
import numpy as np

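def rgb_to_int(r, g, b):
    # Pack the three 8-bit channels into one integer (0xRRGGBB) used as the class label
    return (r << 16) + (g << 8) + b

def read_data(filename):
    x = []  # pixel coordinates [column, row] of every colored point
    y = []  # packed color of each point
    back_color = rgb_to_int(255, 255, 255)  # white marks a deleted pixel

    # Force three channels so RGBA or palette PNGs still unpack into (r, g, b)
    image = Image.open(filename).convert("RGB")
    width, height = image.size
    pixels = image.load()

    for i in range(width):
        for j in range(height):
            r, g, b = pixels[i, j]
            color = rgb_to_int(r, g, b)

            # Keep only the pixels that still carry a color
            if color != back_color:
                x.append([i, j])
                y.append(color)
    return x, y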

In [2]:
# Create the datasets here
data1 = read_data('data1.png')
data2 = read_data('data2.png')
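
To see what `rgb_to_int` stores in `y`, the following minimal sketch packs a color into a single integer and unpacks it again. The helper name `int_to_rgb` is introduced here only for illustration; it mirrors the `getRGBfromI` function used in step 8.

```python
def rgb_to_int(r, g, b):
    # Pack three 8-bit channels into one 24-bit integer: 0xRRGGBB
    return (r << 16) + (g << 8) + b

def int_to_rgb(value):
    # Hypothetical inverse helper: shift each channel down and mask it to 8 bits
    return (value >> 16) & 255, (value >> 8) & 255, value & 255

red = rgb_to_int(255, 0, 0)
white = rgb_to_int(255, 255, 255)

print(hex(red), hex(white))   # packed values shown in hexadecimal
print(int_to_rgb(red))        # unpacking recovers the original channels
```

Because each distinct color maps to a distinct integer, the packed value can serve directly as a class label for the classifiers.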

<strong>3.</strong> Split the first dataset into a training set and a test set (using 70% for training and 30% for testing).

In [3]:
import numpy as np
from sklearn.model_selection import train_test_split

# Assuming data1 contains two lists: x (coordinates) and y (color values)
x_data1, y_data1 = data1

# Convert lists to numpy arrays for easier manipulation
x_data1 = np.array(x_data1)
y_data1 = np.array(y_data1)

# Split the data into 70% training and 30% testing
x_train, x_test, y_train, y_test = train_test_split(x_data1, y_data1, test_size=0.3, random_state=42)

# x_train and y_train are the training set
# x_test and y_test are the test set
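
If the colors in an image are unevenly represented, a purely random split can shift the class proportions between the two sets. The sketch below, on toy data that is an assumption and not taken from `data1.png`, shows the optional `stratify` argument of `train_test_split`, which keeps the proportions approximately equal.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (an assumption for illustration): 90 points of one color, 10 of another
x = np.arange(200).reshape(100, 2)
y = np.array([0] * 90 + [1] * 10)

# stratify=y asks for the same class proportions in both subsets
x_tr, x_te, y_tr, y_te = train_test_split(
    x, y, test_size=0.3, random_state=42, stratify=y
)

print("share of class 1 overall:", y.mean())
print("share of class 1 in train:", y_tr.mean())
print("share of class 1 in test:", y_te.mean())
```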

<strong>4.</strong> Choose a classifier in category a) and train it on the training set generated in step 3.

In [4]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

# Initialize the Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)

# Train the classifier on the training data
clf.fit(x_train, y_train)

<strong>5.</strong> Use the classifier trained in step 4 to make predictions on the test set generated in step 3.

In [5]:
# Predict on the test set
y_pred = clf.predict(x_test)

<strong>6.</strong> Compute the accuracy of the classifier on the test set generated in step 3 and then on the training set. Discuss the results.

In [6]:
# Compute accuracy on the test set
test_accuracy = accuracy_score(y_test, y_pred)

# Compute accuracy on the training set
y_train_pred = clf.predict(x_train)
train_accuracy = accuracy_score(y_train, y_train_pred)

# Display the results
print(f"Test Accuracy: {test_accuracy:.2f}")
print(f"Train Accuracy: {train_accuracy:.2f}")

Test Accuracy: 1.00
Train Accuracy: 1.00


<strong>7.</strong> Compute precision and recall of the classifier on the test set generated in step 3 and save to file or display the results. Define (theoretically) precision and recall. Discuss the results.

In [7]:
from sklearn.metrics import precision_score, recall_score

# Calculate precision
precision = precision_score(y_test, y_pred, average='weighted')

# Calculate recall
recall = recall_score(y_test, y_pred, average='weighted')

# Display results
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")

Precision: 1.00
Recall: 1.00
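
For the theoretical definition: for a given class, precision is TP / (TP + FP) and recall is TP / (TP + FN), where TP, FP and FN are the true positives, false positives and false negatives for that class. The sketch below computes both by hand on made-up labels and averages them with weights equal to each class's share of the true labels, which is how the `average='weighted'` option is understood here. The function names are hypothetical.

```python
def per_class_counts(y_true, y_pred, label):
    # Count true positives, false positives and false negatives for one class
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    return tp, fp, fn

def weighted_precision_recall(y_true, y_pred):
    n = len(y_true)
    precision = recall = 0.0
    for label in sorted(set(y_true)):
        tp, fp, fn = per_class_counts(y_true, y_pred, label)
        weight = (tp + fn) / n                      # share of true samples in this class
        precision += weight * (tp / (tp + fp) if tp + fp else 0.0)
        recall += weight * (tp / (tp + fn) if tp + fn else 0.0)
    return precision, recall

# Made-up labels: one sample of class 0 is misclassified as class 1
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]

precision, recall = weighted_precision_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"precision={precision:.4f} recall={recall:.4f} accuracy={accuracy:.4f}")
```

With this weighting, recall coincides with accuracy, so in a multi-class setting the two numbers reported in steps 6 and 7 are expected to match.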


<strong>8.</strong> Predict the color for all the pixels of the first image and save the predicted colors to a new image using the code below (Partial code given. Must be completed). Be able to explain the code below. 

In [8]:
image = Image.open('data1.png')
width, height = image.size

def generate_pixel_coordinates():
    points = []
    for i in range (width):
        for j in range(height):
            points.append([i,j])
            
    return points
        
def getRGBfromI(RGBint):  # convert int color code to rgb color code
    blue = RGBint & 255
    green = (RGBint >> 8) & 255
    red = (RGBint >> 16) & 255
    return red, green, blue

def save_data(pixels, colors, output_filename):
    im = Image.new("RGB", (width, height))
    pix = im.load()
    for i in range(len(pixels)):
        pix[pixels[i][0], pixels[i][1]] = getRGBfromI(colors[i])

    im.save(output_filename, "PNG")

In [9]:
# Load the original image
image = Image.open('data1.png')
width, height = image.size

# Generate pixel coordinates for the entire image
all_pixels = generate_pixel_coordinates()

# Predict the color for each pixel using the trained classifier
predicted_colors = clf.predict(all_pixels)

# Save the predicted colors to a new image
save_data(all_pixels, predicted_colors, 'dt_predicted_image1.png')

print("The predicted image has been saved as 'dt_predicted_image1.png'.")

The predicted image has been saved as 'dt_predicted_image1.png'.


<div style="text-align: center;">
    <img src="dt_predicted_image1.png" alt="This image is not available" style="height: 300px; display: block; margin: 0 auto;">
    <i>dt_predicted_image1.png</i>
</div>
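
`generate_pixel_coordinates` reads `width` and `height` from global variables and builds the list with nested loops. As an alternative sketch, the same coordinates can be produced with NumPy and explicit arguments; `pixel_grid` is a hypothetical name, not part of the given code.

```python
import numpy as np

def pixel_grid(width, height):
    # All [i, j] pairs with i varying slowest, the same order as the nested loops
    return np.indices((width, height)).reshape(2, -1).T

def pixel_grid_loops(width, height):
    # Reference version equivalent to the loops in generate_pixel_coordinates
    return [[i, j] for i in range(width) for j in range(height)]

grid = pixel_grid(3, 2)
print(grid.tolist())
print(grid.tolist() == pixel_grid_loops(3, 2))
```

Passing the image size as arguments avoids depending on whichever image was opened last in the notebook.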

<strong>9.</strong> Use k-fold cross-validation, with different values of k, to evaluate the model (e.g. k=3,5,10). Compute the cross-validation accuracy for each fold and the mean accuracy. Report results for all runs and compare them with the accuracy obtained in step 6.

In [10]:
from sklearn.model_selection import cross_val_score

# List of different values of k for k-fold cross-validation
k_values = [3, 5, 10]

# Dictionary to store the cross-validation results
cv_results = {}

# Perform k-fold cross-validation for each k value
for k in k_values:
    # Run k-fold cross-validation (for a classifier, an integer cv uses stratified folds without shuffling)
    scores = cross_val_score(clf, x_data1, y_data1, cv=k)
    
    # Store the scores and calculate the mean accuracy
    cv_results[k] = {
        "scores": scores,
        "mean_accuracy": scores.mean()
    }

    # Print the results for each k
    print(f"Results for k={k}:")
    print(f"Cross-validation accuracies: {scores}")
    print(f"Mean accuracy: {scores.mean():.2f}")
    print()

Results for k=3:
Cross-validation accuracies: [0.60992218 0.82667965 0.52872444]
Mean accuracy: 0.66

Results for k=5:
Cross-validation accuracies: [0.54781199 0.85575365 0.81980519 0.86038961 0.5275974 ]
Mean accuracy: 0.72

Results for k=10:
Cross-validation accuracies: [0.69255663 0.89644013 1.         1.         0.82792208 0.7987013
 1.         1.         0.7987013  0.79220779]
Mean accuracy: 0.88
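
One likely reason these scores fall below the accuracy from step 6 is the order of the samples: `read_data` appends points column by column, so the dataset is sorted by the first coordinate, and unshuffled folds then hold out whole vertical bands of the image. The sketch below illustrates the effect on synthetic data (an assumption, not `data1.png`) by comparing the default folds with a shuffled `StratifiedKFold`.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a linearly separable image, listed column by column
x = np.array([[i, j] for i in range(100) for j in range(0, 100, 20)])
y = (x[:, 0] >= 50).astype(int)

tree = DecisionTreeClassifier(random_state=42)

# Default behaviour: contiguous, unshuffled folds
plain = cross_val_score(tree, x, y, cv=5)

# Shuffled folds: every fold samples the whole image
shuffled_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
shuffled = cross_val_score(tree, x, y, cv=shuffled_cv)

print("unshuffled:", plain, "mean:", plain.mean())
print("shuffled:  ", shuffled, "mean:", shuffled.mean())
```

Whether shuffling is appropriate depends on the question being asked: shuffled folds resemble the random split of step 3, while unshuffled folds test how well the model extrapolates to regions with no training points.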



<strong>10.</strong> Repeat steps 3-9 for the second and third classifier.
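
Rather than copying the cells for each classifier, the repetition can be expressed as a loop over a dictionary of models, as suggested in the problem formulation. This is only a sketch: the `evaluate` helper, the synthetic data, and the choice of `GaussianNB` for category a) are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

def evaluate(classifiers, x, y, test_size=0.3, seed=42):
    # Fit every classifier on the same split and collect train/test accuracy
    x_tr, x_te, y_tr, y_te = train_test_split(
        x, y, test_size=test_size, random_state=seed
    )
    results = {}
    for name, model in classifiers.items():
        model.fit(x_tr, y_tr)
        results[name] = {
            "train": accuracy_score(y_tr, model.predict(x_tr)),
            "test": accuracy_score(y_te, model.predict(x_te)),
        }
    return results

# Synthetic two-color "image": points above or below the anti-diagonal
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(300, 2))
y = (x[:, 0] + x[:, 1] > 255).astype(int)

classifiers = {
    "bayes": GaussianNB(),
    "svm": SVC(),
    "forest": RandomForestClassifier(n_estimators=100, random_state=42),
}
results = evaluate(classifiers, x, y)
for name, scores in results.items():
    print(f"{name}: train={scores['train']:.2f} test={scores['test']:.2f}")
```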

# SVM

Choose a classifier in category b) and train it on the training set generated in step 3.

In [11]:
from sklearn.svm import SVC

# Initialize the SVM classifier
svm_classifier = SVC()

# Train the classifier on the training data
svm_classifier.fit(x_train, y_train)

print("SVM Classifier has been successfully trained.")

SVM Classifier has been successfully trained.


Use the classifier trained in the previous step to make predictions on the test set generated in step 3.

In [12]:
# Predict the labels for the test data
y_pred_svm = svm_classifier.predict(x_test)

Compute the accuracy of the classifier on the test set generated in step 3 and then on the training set. Discuss the results.

In [13]:
from sklearn.metrics import accuracy_score

# Compute accuracy on the test set
test_accuracy_svm = accuracy_score(y_test, y_pred_svm)
print(f"Test Accuracy: {test_accuracy_svm:.2f}")

# Predict on the training set
y_train_pred_svm = svm_classifier.predict(x_train)

# Compute accuracy on the training set
train_accuracy_svm = accuracy_score(y_train, y_train_pred_svm)
print(f"Train Accuracy: {train_accuracy_svm:.2f}")

Test Accuracy: 1.00
Train Accuracy: 1.00


Compute precision and recall of the classifier on the test set generated in step 3 and save to file or display the results. Define (theoretically) precision and recall. Discuss the results.

In [14]:
from sklearn.metrics import precision_score, recall_score

# Compute precision
precision_svm = precision_score(y_test, y_pred_svm, average='weighted')

# Compute recall
recall_svm = recall_score(y_test, y_pred_svm, average='weighted')

# Display the results
print(f"Precision: {precision_svm:.2f}")
print(f"Recall: {recall_svm:.2f}")

Precision: 1.00
Recall: 1.00


Predict the color for all the pixels of the first image and save the predicted colors to a new image. Be able to explain the code below.

In [15]:
# Load the original image
image = Image.open('data1.png')
width, height = image.size

# Generate pixel coordinates for the entire image
all_pixels = generate_pixel_coordinates()

# Predict the color for each pixel using the trained classifier
predicted_colors = svm_classifier.predict(all_pixels)

# Save the predicted colors to a new image
save_data(all_pixels, predicted_colors, 'svm_predicted_image1.png')

print("The predicted image has been saved as 'svm_predicted_image1.png'.")

The predicted image has been saved as 'svm_predicted_image1.png'.


<div style="text-align: center;">
    <img src="svm_predicted_image1.png" alt="This image is not available" style="height: 300px; display: block; margin: 0 auto;">
    <i>svm_predicted_image1.png</i>
</div>

Use k-fold cross-validation, with different values of k, to evaluate the model (e.g. k=3,5,10). Compute the cross-validation accuracy for each fold and the mean accuracy. Report results for all runs and compare them with the accuracy obtained for SVM.

In [16]:
from sklearn.model_selection import cross_val_score

# List of different values of k for k-fold cross-validation
k_values = [3, 5, 10]

# Dictionary to store the cross-validation results
cv_results = {}

# Perform k-fold cross-validation for each k value
for k in k_values:
    # Run k-fold cross-validation (for a classifier, an integer cv uses stratified folds without shuffling)
    scores = cross_val_score(svm_classifier, x_data1, y_data1, cv=k)
    
    # Store the scores and calculate the mean accuracy
    cv_results[k] = {
        "scores": scores,
        "mean_accuracy": scores.mean()
    }

    # Print the results for each k
    print(f"Results for k={k}:")
    print(f"Cross-validation accuracies: {scores}")
    print(f"Mean accuracy: {scores.mean():.2f}")
    print()

Results for k=3:
Cross-validation accuracies: [0.77723735 1.         0.69522882]
Mean accuracy: 0.82

Results for k=5:
Cross-validation accuracies: [0.73743922 1.         1.         1.         0.98051948]
Mean accuracy: 0.94

Results for k=10:
Cross-validation accuracies: [0.83495146 1.         1.         1.         1.         1.
 1.         1.         1.         1.        ]
Mean accuracy: 0.98



# Random forest

Choose a classifier in category c) and train it on the training set generated in step 3.

In [17]:
from sklearn.ensemble import RandomForestClassifier

# Initialize the Random Forest classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the classifier on the training data
rf_classifier.fit(x_train, y_train)

print("Random Forest Classifier has been successfully trained.")

Random Forest Classifier has been successfully trained.


Use the classifier trained in the previous step to make predictions on the test set generated in step 3.

In [18]:
# Predict the labels for the test data
y_pred_rf = rf_classifier.predict(x_test)

Compute the accuracy of the classifier on the test set generated in step 3 and then on the training set. Discuss the results.

In [19]:
from sklearn.metrics import accuracy_score

# Compute accuracy on the test set
test_accuracy_rf = accuracy_score(y_test, y_pred_rf)
print(f"Test Accuracy: {test_accuracy_rf:.2f}")

# Predict on the training set
y_train_pred_rf = rf_classifier.predict(x_train)

# Compute accuracy on the training set
train_accuracy_rf = accuracy_score(y_train, y_train_pred_rf)
print(f"Train Accuracy: {train_accuracy_rf:.2f}")

Test Accuracy: 1.00
Train Accuracy: 1.00


Compute precision and recall of the classifier on the test set generated in step 3 and save to file or display the results. Define (theoretically) precision and recall. Discuss the results.

In [20]:
from sklearn.metrics import precision_score, recall_score

# Compute precision
precision_rf = precision_score(y_test, y_pred_rf, average='weighted')

# Compute recall
recall_rf = recall_score(y_test, y_pred_rf, average='weighted')

# Display the results
print(f"Precision: {precision_rf:.2f}")
print(f"Recall: {recall_rf:.2f}")

Precision: 1.00
Recall: 1.00


Predict the color for all the pixels of the first image and save the predicted colors to a new image. Be able to explain the code below.

In [21]:
# Load the original image
image = Image.open('data1.png')
width, height = image.size

# Generate pixel coordinates for the entire image
all_pixels = generate_pixel_coordinates()

# Predict the color for each pixel using the trained classifier
predicted_colors = rf_classifier.predict(all_pixels)

# Save the predicted colors to a new image
save_data(all_pixels, predicted_colors, 'rf_predicted_image1.png')

print("The predicted image has been saved as 'rf_predicted_image1.png'.")

The predicted image has been saved as 'rf_predicted_image1.png'.


<div style="text-align: center;">
    <img src="rf_predicted_image1.png" alt="This image is not available" style="height: 300px; display: block; margin: 0 auto;">
    <i>rf_predicted_image1.png</i>
</div>

Use k-fold cross-validation, with different values of k, to evaluate the model (e.g. k=3,5,10). Compute the cross-validation accuracy for each fold and the mean accuracy. Report results for all runs and compare them with the accuracy obtained for Random Forest.

In [22]:
from sklearn.model_selection import cross_val_score

# List of different values of k for k-fold cross-validation
k_values = [3, 5, 10]

# Dictionary to store the cross-validation results
cv_results = {}

# Perform k-fold cross-validation for each k value
for k in k_values:
    # Run k-fold cross-validation (for a classifier, an integer cv uses stratified folds without shuffling)
    scores = cross_val_score(rf_classifier, x_data1, y_data1, cv=k)
    
    # Store the scores and calculate the mean accuracy
    cv_results[k] = {
        "scores": scores,
        "mean_accuracy": scores.mean()
    }

    # Print the results for each k
    print(f"Results for k={k}:")
    print(f"Cross-validation accuracies: {scores}")
    print(f"Mean accuracy: {scores.mean():.2f}")
    print()

Results for k=3:
Cross-validation accuracies: [0.60992218 0.82667965 0.52872444]
Mean accuracy: 0.66

Results for k=5:
Cross-validation accuracies: [0.54781199 0.85575365 0.83928571 0.86038961 0.5275974 ]
Mean accuracy: 0.73

Results for k=10:
Cross-validation accuracies: [0.62783172 0.96116505 1.         1.         0.82792208 0.97727273
 1.         1.         0.83441558 0.80519481]
Mean accuracy: 0.90



<strong>11.</strong> Study the scikit-learn documentation for the second classifier chosen, select two representative hyperparameters, and repeat steps 4-8 at least twice for different values of these hyperparameters. Report results for all runs and compare them.

# C=0.01, kernel='linear'

Choose a classifier in category b) and train it on the training set generated in step 3.

In [23]:
from sklearn.svm import SVC

# Initialize the SVM classifier
svm_classifier = SVC(C=0.01, kernel='linear')

# Train the classifier on the training data
svm_classifier.fit(x_train, y_train)

print("SVM Classifier with C=0.01 and kernel='linear' has been successfully trained.")

SVM Classifier with C=0.01 and kernel='linear' has been successfully trained.


Use the classifier trained in the previous step to make predictions on the test set generated in step 3.

In [24]:
# Predict the labels for the test data
y_pred_svm = svm_classifier.predict(x_test)

Compute the accuracy of the classifier on the test set generated in step 3 and then on the training set. Discuss the results.

In [25]:
from sklearn.metrics import accuracy_score

# Compute accuracy on the test set
test_accuracy_svm = accuracy_score(y_test, y_pred_svm)
print(f"Test Accuracy: {test_accuracy_svm:.2f}")

# Predict on the training set
y_train_pred_svm = svm_classifier.predict(x_train)

# Compute accuracy on the training set
train_accuracy_svm = accuracy_score(y_train, y_train_pred_svm)
print(f"Train Accuracy: {train_accuracy_svm:.2f}")

Test Accuracy: 1.00
Train Accuracy: 1.00


Compute precision and recall of the classifier on the test set generated in step 3 and save to file or display the results. Define (theoretically) precision and recall. Discuss the results.

In [26]:
from sklearn.metrics import precision_score, recall_score

# Compute precision
precision_svm = precision_score(y_test, y_pred_svm, average='weighted')

# Compute recall
recall_svm = recall_score(y_test, y_pred_svm, average='weighted')

# Display the results
print(f"Precision: {precision_svm:.2f}")
print(f"Recall: {recall_svm:.2f}")

Precision: 1.00
Recall: 1.00


Predict the color for all the pixels of the first image and save the predicted colors to a new image. Be able to explain the code below.

In [27]:
# Load the original image
image = Image.open('data1.png')
width, height = image.size

# Generate pixel coordinates for the entire image
all_pixels = generate_pixel_coordinates()

# Predict the color for each pixel using the trained classifier
predicted_colors = svm_classifier.predict(all_pixels)

# Save the predicted colors to a new image
save_data(all_pixels, predicted_colors, 'svm_hyperparameters_config1_predicted_image1.png')

print("The predicted image has been saved as 'svm_hyperparameters_config1_predicted_image1.png'.")

The predicted image has been saved as 'svm_hyperparameters_config1_predicted_image1.png'.


<div style="text-align: center;">
    <img src="svm_hyperparameters_config1_predicted_image1.png" alt="This image is not available" style="height: 300px; display: block; margin: 0 auto;">
    <i>svm_hyperparameters_config1_predicted_image1.png</i>
</div>

# C=100.0, kernel='rbf'

Choose a classifier in category b) and train it on the training set generated in step 3.

In [28]:
from sklearn.svm import SVC

# Initialize the SVM classifier
svm_classifier = SVC(C=100.0, kernel='rbf')

# Train the classifier on the training data
svm_classifier.fit(x_train, y_train)

print("SVM Classifier with C=100.0 and kernel='rbf' has been successfully trained.")

SVM Classifier with C=100.0 and kernel='rbf' has been successfully trained.


Use the classifier trained in the previous step to make predictions on the test set generated in step 3.

In [29]:
# Predict the labels for the test data
y_pred_svm = svm_classifier.predict(x_test)

Compute the accuracy of the classifier on the test set generated in step 3 and then on the training set. Discuss the results.

In [30]:
from sklearn.metrics import accuracy_score

# Compute accuracy on the test set
test_accuracy_svm = accuracy_score(y_test, y_pred_svm)
print(f"Test Accuracy: {test_accuracy_svm:.2f}")

# Predict on the training set
y_train_pred_svm = svm_classifier.predict(x_train)

# Compute accuracy on the training set
train_accuracy_svm = accuracy_score(y_train, y_train_pred_svm)
print(f"Train Accuracy: {train_accuracy_svm:.2f}")

Test Accuracy: 1.00
Train Accuracy: 1.00


Compute precision and recall of the classifier on the test set generated in step 3 and save to file or display the results. Define (theoretically) precision and recall. Discuss the results.

In [31]:
from sklearn.metrics import precision_score, recall_score

# Compute precision
precision_svm = precision_score(y_test, y_pred_svm, average='weighted')

# Compute recall
recall_svm = recall_score(y_test, y_pred_svm, average='weighted')

# Display the results
print(f"Precision: {precision_svm:.2f}")
print(f"Recall: {recall_svm:.2f}")

Precision: 1.00
Recall: 1.00


Predict the color for all the pixels of the first image and save the predicted colors to a new image. Be able to explain the code below.

In [32]:
# Load the original image
image = Image.open('data1.png')
width, height = image.size

# Generate pixel coordinates for the entire image
all_pixels = generate_pixel_coordinates()

# Predict the color for each pixel using the trained classifier
predicted_colors = svm_classifier.predict(all_pixels)

# Save the predicted colors to a new image
save_data(all_pixels, predicted_colors, 'svm_hyperparameters_config2_predicted_image1.png')

print("The predicted image has been saved as 'svm_hyperparameters_config2_predicted_image1.png'.")

The predicted image has been saved as 'svm_hyperparameters_config2_predicted_image1.png'.


<div style="text-align: center;">
    <img src="svm_hyperparameters_config2_predicted_image1.png" alt="This image is not available" style="height: 300px; display: block; margin: 0 auto;">
    <i>svm_hyperparameters_config2_predicted_image1.png</i>
</div>

## Comparison 

<div style="display: flex; gap: 10px;">
  <div style="text-align: center;">
    <img src="svm_predicted_image1.png" alt="This image is not available" style="height: 300px;">
    <br>
    <i>Default SVM</i>
  </div>
  <div style="text-align: center; margin-left: 100px;">
    <img src="svm_hyperparameters_config1_predicted_image1.png" alt="This image is not available" style="height: 300px;">
    <br>
    <i>SVM with C=0.01 and kernel='linear'</i>
  </div>
  <div style="text-align: center; margin-left: 100px;">
    <img src="svm_hyperparameters_config2_predicted_image1.png" alt="This image is not available" style="height: 300px;">
    <br>
    <i>SVM with C=100.0 and kernel='rbf'</i>
  </div>
</div>

<strong>12.</strong> Use grid search cross-validation to optimize the hyperparameters of the classifiers (for model selection). Report the optimal parameters given by the search. Predict the color for all the pixels of the first image using the model with optimal parameters.

# Decision Tree Classifier - Optimizing Hyperparameters

In [33]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

# Define the parameter grid
param_grid = {
    'max_depth': [None, 10, 20, 30, 40, 50],
    'min_samples_split': [2, 10, 20],
    'min_samples_leaf': [1, 5, 10]
}

# Initialize the Decision Tree classifier
dt = DecisionTreeClassifier(random_state=42)

# Initialize the GridSearchCV object
grid_search_dt = GridSearchCV(estimator=dt, param_grid=param_grid, cv=5, verbose=0)

# Fit the grid search to the data
grid_search_dt.fit(x_train, y_train)

# Best parameter set found
print("Best parameters found for the decision tree classifier: ", grid_search_dt.best_params_)

# Best estimator
best_dt_classifier = grid_search_dt.best_estimator_

# Load the original image
image = Image.open('data1.png')
width, height = image.size

# Generate pixel coordinates for the entire image (helper defined in step 8)
all_pixels = generate_pixel_coordinates()

# Predict the color for each pixel using the best classifier
predicted_colors_dt = best_dt_classifier.predict(all_pixels)

# Save the predicted colors to a new image (helper defined in step 8)
save_data(all_pixels, predicted_colors_dt, 'optimized_dt_predicted_image1.png')

print("The predicted image has been saved as 'optimized_dt_predicted_image1.png'.")

Best parameters found for the decision tree classifier:  {'max_depth': None, 'min_samples_leaf': 1, 'min_samples_split': 2}
The predicted image has been saved as 'optimized_dt_predicted_image1.png'.


<div style="text-align: center;">
    <img src="optimized_dt_predicted_image1.png" alt="This image is not available" style="height: 300px; display: block; margin: 0 auto;">
    <i>optimized_dt_predicted_image1.png</i>
</div>

# SVM Classifier - Optimizing Hyperparameters

In [34]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Define the parameter grid
param_grid = {
    'C': [0.1, 1, 10, 100],
    'kernel': ['linear', 'rbf'],
    'gamma': ['scale', 'auto']
}

# Initialize the SVM classifier
svm = SVC()

# Initialize the GridSearchCV object
grid_search = GridSearchCV(estimator=svm, param_grid=param_grid, cv=5, verbose=0)

# Fit the grid search to the data
grid_search.fit(x_train, y_train)

# Best parameter set found:
print("Best parameters found for the SVM classifier: ", grid_search.best_params_)

# Best estimator:
best_classifier = grid_search.best_estimator_

# Load the original image
image = Image.open('data1.png')
width, height = image.size

# Generate pixel coordinates for the entire image (helper defined in step 8)
all_pixels = generate_pixel_coordinates()

# Predict the color for each pixel using the best classifier
predicted_colors = best_classifier.predict(all_pixels)

# Save the predicted colors to a new image (helper defined in step 8)
save_data(all_pixels, predicted_colors, 'optimized_svm_predicted_image1.png')

print("The predicted image has been saved as 'optimized_svm_predicted_image1.png'.")

Best parameters found for the SVM classifier:  {'C': 0.1, 'gamma': 'scale', 'kernel': 'linear'}
The predicted image has been saved as 'optimized_svm_predicted_image1.png'.


<div style="text-align: center;">
    <img src="optimized_svm_predicted_image1.png" alt="This image is not available" style="height: 300px; display: block; margin: 0 auto;">
    <i>optimized_svm_predicted_image1.png</i>
</div>

# Random Forest Classifier - Optimizing Hyperparameters

In [35]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Define the parameter grid
param_grid_rf_too_much = {
    'n_estimators': [100, 200, 300],  # Number of trees
    'max_depth': [None, 10, 20, 30],  # Maximum depth of each tree
    'min_samples_split': [2, 5, 10],  # Minimum number of samples required to split an internal node
    'min_samples_leaf': [1, 2, 4]     # Minimum number of samples required to be at a leaf node
}

param_grid_rf = {
    'n_estimators': [100, 200],  # Reduced from [100, 200, 300]
    'max_depth': [None, 20],  # Simplified choices
    'min_samples_split': [2, 10],  # Only extremes
    'min_samples_leaf': [1, 4]  # Only extremes
}

# Initialize the RandomForest classifier
rf = RandomForestClassifier(random_state=42)

# Initialize the GridSearchCV object
grid_search_rf = GridSearchCV(estimator=rf, param_grid=param_grid_rf, cv=5, verbose=0)

# Fit the grid search to the data
grid_search_rf.fit(x_train, y_train)

# Best parameter set found
print("Best parameters found for the random forest classifier: ", grid_search_rf.best_params_)

# Best estimator
best_rf_classifier = grid_search_rf.best_estimator_

# Load the original image
image = Image.open('data1.png')
width, height = image.size

# Generate pixel coordinates for the entire image (helper defined in step 8)
all_pixels = generate_pixel_coordinates()

# Predict the color for each pixel using the best classifier
predicted_colors_rf = best_rf_classifier.predict(all_pixels)

# Save the predicted colors to a new image (helper defined in step 8)
save_data(all_pixels, predicted_colors_rf, 'optimized_rf_predicted_image1.png')

print("The predicted image has been saved as 'optimized_rf_predicted_image1.png'.")

Best parameters found for the random forest classifier:  {'max_depth': None, 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 100}
The predicted image has been saved as 'optimized_rf_predicted_image1.png'.


<div style="text-align: center;">
    <img src="optimized_rf_predicted_image1.png" alt="This image is not available" style="height: 300px; display: block; margin: 0 auto;">
    <i>optimized_rf_predicted_image1.png</i>
</div>
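
The grid searches above report only the best parameters. To judge how much the tuning helped, it can be useful to also look at the mean cross-validated score of the best configuration and at the accuracy on the held-out test set. The sketch below does this on synthetic data (an assumption, not `data1.png`) with a smaller, illustrative grid.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic non-linear "image": points inside or outside a circle
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(400, 2))
y = ((x[:, 0] - 128) ** 2 + (x[:, 1] - 128) ** 2 < 80 ** 2).astype(int)

x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.3, random_state=42)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": [None, 3, 10], "min_samples_leaf": [1, 5]},
    cv=5,
)
search.fit(x_tr, y_tr)

print("best parameters:", search.best_params_)
print("mean CV accuracy of best parameters:", search.best_score_)
# With the default refit=True, score() uses the best estimator refitted on x_tr
print("held-out test accuracy:", search.score(x_te, y_te))
```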

<strong>13.</strong> Repeat all the steps from above for the second image (using the same classifiers as for the first image). Compare the results obtained with the same classifier for the linear and non-linear cases, respectively.

<strong>13.3.</strong> Split the second dataset into a training set and a test set (using 70% for training and 30% for testing).

In [36]:
import numpy as np
from sklearn.model_selection import train_test_split

# data2 contains two lists: x (coordinates) and y (color values)
x_data2, y_data2 = data2

# Convert lists to numpy arrays for easier manipulation
x_data2 = np.array(x_data2)
y_data2 = np.array(y_data2)

# Split the data into 70% training and 30% testing
x_train, x_test, y_train, y_test = train_test_split(x_data2, y_data2, test_size=0.3, random_state=42)

# x_train and y_train are the training set
# x_test and y_test are the test set

<strong>13.4.</strong> Choose a classifier in category a) and train it on the training set generated in step 13.3.

In [37]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

# Initialize the Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)

# Train the classifier on the training data
clf.fit(x_train, y_train)

<strong>13.5.</strong> Use the classifier trained in step 13.4 to make predictions on the test set generated in step 13.3.

In [38]:
# Predict on the test set
y_pred = clf.predict(x_test)

<strong>13.6.</strong> Compute the accuracy of the classifier on the test set generated in step 13.3 and then on the training set. Discuss the results.

In [39]:
# Compute accuracy on the test set
test_accuracy = accuracy_score(y_test, y_pred)

# Compute accuracy on the training set
y_train_pred = clf.predict(x_train)
train_accuracy = accuracy_score(y_train, y_train_pred)

# Display the results
print(f"Test Accuracy: {test_accuracy:.2f}")
print(f"Train Accuracy: {train_accuracy:.2f}")

Test Accuracy: 1.00
Train Accuracy: 1.00


<strong>13.7.</strong> Compute precision and recall of the classifier on the test set generated in step 13.3 and save to file or display the results. Define (theoretically) precision and recall. Discuss the results.

In [40]:
from sklearn.metrics import precision_score, recall_score

# Calculate precision
precision = precision_score(y_test, y_pred, average='weighted')

# Calculate recall
recall = recall_score(y_test, y_pred, average='weighted')

# Display results
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")

Precision: 1.00
Recall: 1.00
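
Step 13.7 also asks for the theoretical definitions. For a class c, precision is TP / (TP + FP), the share of pixels predicted as c that really are c, and recall is TP / (TP + FN), the share of true c pixels that were found. With `average='weighted'`, scikit-learn averages the per-class values weighted by the number of true samples of each class. The sketch below recomputes both by hand on made-up toy labels (not the project data); the helper names `per_class_precision_recall` and `weighted_average` are hypothetical.

```python
from collections import Counter

def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} from TP / FP / FN counts."""
    result = {}
    for c in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        result[c] = (precision, recall)
    return result

def weighted_average(per_class, y_true):
    """Average per-class scores, weighting each class by its true count."""
    support = Counter(y_true)
    n = len(y_true)
    precision = sum(per_class[c][0] * support[c] for c in per_class) / n
    recall = sum(per_class[c][1] * support[c] for c in per_class) / n
    return precision, recall

# Toy labels: one red pixel is wrongly predicted as blue
y_true_toy = ["red", "red", "red", "blue", "blue", "green"]
y_pred_toy = ["red", "red", "blue", "blue", "blue", "green"]

per_class_toy = per_class_precision_recall(y_true_toy, y_pred_toy)
precision_toy, recall_toy = weighted_average(per_class_toy, y_true_toy)
print(per_class_toy)
print(f"Weighted precision: {precision_toy:.3f}, weighted recall: {recall_toy:.3f}")
```

Weighted recall reduces to (sum of TP over all classes) / (number of samples), which is the accuracy. This is why the recall reported with `average='weighted'` matches the test accuracy.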


<strong>13.8.</strong> Predict the color for all the pixels of the second image and save the predicted colors to a new image using the code below (partial code given; it must be completed). Be able to explain the code below.

In [41]:
# Load the original image
image = Image.open('data2.png')
width, height = image.size

# Generate pixel coordinates for the entire image
all_pixels = generate_pixel_coordinates()

# Predict the color for each pixel using the trained classifier
predicted_colors = clf.predict(all_pixels)

# Save the predicted colors to a new image
save_data(all_pixels, predicted_colors, 'dt_predicted_image2.png')

print("The predicted image has been saved as 'dt_predicted_image2.png'.")

The predicted image has been saved as 'dt_predicted_image2.png'.


<div style="text-align: center;">
    <img src="dt_predicted_image2.png" alt="This image is not available" style="height: 300px; display: block; margin: 0 auto;">
    <i>dt_predicted_image2.png</i>
</div>
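
The cells in this section call `generate_pixel_coordinates()` and `save_data(...)`, which are defined elsewhere in the notebook. To help explain what such helpers do, here is a minimal sketch of one possible implementation. It is an assumption, not the notebook's actual code: this variant takes `width` and `height` explicitly, assumes the labels are integer class ids, and uses a hypothetical `palette` dictionary that maps each class id to an RGB tuple.

```python
import numpy as np

def generate_pixel_coordinates(width, height):
    """Return a (width*height, 2) array of (x, y) coordinates, row by row."""
    xs, ys = np.meshgrid(np.arange(width), np.arange(height))
    return np.column_stack([xs.ravel(), ys.ravel()])

def labels_to_image_array(pixels, labels, width, height, palette):
    """Build a (height, width, 3) uint8 RGB array from per-pixel class labels."""
    canvas = np.zeros((height, width, 3), dtype=np.uint8)
    colors = np.array([palette[int(label)] for label in labels], dtype=np.uint8)
    # Image arrays are indexed [row, column], i.e. [y, x]
    canvas[pixels[:, 1], pixels[:, 0]] = colors
    return canvas

# Tiny demo: a 4x3 image whose left half is red and right half is blue
demo_palette = {0: (255, 0, 0), 1: (0, 0, 255)}  # hypothetical class-to-color map
demo_pixels = generate_pixel_coordinates(4, 3)
demo_labels = (demo_pixels[:, 0] >= 2).astype(int)  # stands in for clf.predict(...)
demo_image = labels_to_image_array(demo_pixels, demo_labels, 4, 3, demo_palette)
print(demo_image.shape)
```

With Pillow installed, the resulting array can be written to disk with `Image.fromarray(demo_image).save('demo.png')`.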

<strong>13.9.</strong> Use k-fold cross-validation, with different values of k, to evaluate the model (e.g. k=3, 5, 10). Compute the cross-validation accuracy of each fold and the mean accuracy. Report the results for all runs and compare them with the accuracy obtained in step 13.6.

In [42]:
from sklearn.model_selection import cross_val_score

# List of different values of k for k-fold cross-validation
k_values = [3, 5, 10]

# Dictionary to store the cross-validation results
cv_results = {}

# Perform k-fold cross-validation for each k value
for k in k_values:
    # Run k-fold cross-validation and collect the accuracy of each fold
    scores = cross_val_score(clf, x_data2, y_data2, cv=k)
    
    # Store the scores and calculate the mean accuracy
    cv_results[k] = {
        "scores": scores,
        "mean_accuracy": scores.mean()
    }

    # Print the results for each k
    print(f"Results for k={k}:")
    print(f"Cross-validation accuracies: {scores}")
    print(f"Mean accuracy: {scores.mean():.2f}")
    print()

Results for k=3:
Cross-validation accuracies: [0.61595547 0.73630455 0.58681523]
Mean accuracy: 0.65

Results for k=5:
Cross-validation accuracies: [0.65069552 0.78516229 0.77708978 0.78482972 0.70743034]
Mean accuracy: 0.74

Results for k=10:
Cross-validation accuracies: [0.66666667 0.44444444 0.95665635 0.3869969  0.59752322 1.
 0.82972136 1.         0.65944272 0.54179567]
Mean accuracy: 0.71
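
One possible reason the cross-validated accuracies (0.65 to 0.74) fall so far below the 1.00 measured on the hold-out split: `train_test_split` shuffles the samples by default, whereas an integer `cv` makes `cross_val_score` use stratified folds taken in the stored order, without shuffling. If the pixels of `data2` are stored in scan order (an assumption, since the loading code is not shown in this part), each fold is a spatial band of the image and the model has to extrapolate into a region it never saw. The sketch below shows how to pass a shuffled splitter instead, on a synthetic stand-in dataset rather than `data2`.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the pixel data: a 40x40 grid listed row by row,
# labelled 1 inside a disc and 0 outside (NOT the project's data2).
xs, ys = np.meshgrid(np.arange(40), np.arange(40))
X_demo = np.column_stack([xs.ravel(), ys.ravel()])
y_demo = ((xs.ravel() - 20) ** 2 + (ys.ravel() - 20) ** 2 < 100).astype(int)

tree = DecisionTreeClassifier(random_state=42)

# Integer cv: stratified folds in stored order, no shuffling
ordered_scores = cross_val_score(tree, X_demo, y_demo, cv=5)

# Explicit splitter: stratified folds drawn after shuffling
shuffled_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
shuffled_scores = cross_val_score(tree, X_demo, y_demo, cv=shuffled_cv)

print(f"Ordered folds : {ordered_scores.round(2)} mean={ordered_scores.mean():.2f}")
print(f"Shuffled folds: {shuffled_scores.round(2)} mean={shuffled_scores.mean():.2f}")
```

Running both variants on the real data would show whether fold ordering, rather than the classifier itself, accounts for the gap.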



<strong>13.10.</strong> Repeat steps 13.3-13.9 for the second and third classifier.

# SVM

Choose a classifier in category b) and train it on the training set generated in step 13.3.

In [43]:
from sklearn.svm import SVC

# Initialize the SVM classifier
svm_classifier = SVC()

# Train the classifier on the training data
svm_classifier.fit(x_train, y_train)

print("SVM Classifier has been successfully trained.")

SVM Classifier has been successfully trained.


Use the classifier trained in the previous step to make predictions on the test set generated in step 13.3.

In [44]:
# Predict the labels for the test data
y_pred_svm = svm_classifier.predict(x_test)

Compute the accuracy of the classifier on the test set generated in step 13.3 and then on the training set. Discuss the results.

In [45]:
from sklearn.metrics import accuracy_score

# Compute accuracy on the test set
test_accuracy_svm = accuracy_score(y_test, y_pred_svm)
print(f"Test Accuracy: {test_accuracy_svm:.2f}")

# Predict on the training set
y_train_pred_svm = svm_classifier.predict(x_train)

# Compute accuracy on the training set
train_accuracy_svm = accuracy_score(y_train, y_train_pred_svm)
print(f"Train Accuracy: {train_accuracy_svm:.2f}")

Test Accuracy: 0.77
Train Accuracy: 0.78


Compute precision and recall of the classifier on the test set generated in step 13.3 and save to file or display the results. Define (theoretically) precision and recall. Discuss the results.

In [46]:
from sklearn.metrics import precision_score, recall_score

# Compute precision
precision_svm = precision_score(y_test, y_pred_svm, average='weighted')

# Compute recall
recall_svm = recall_score(y_test, y_pred_svm, average='weighted')

# Display the results
print(f"Precision: {precision_svm:.2f}")
print(f"Recall: {recall_svm:.2f}")

Precision: 0.78
Recall: 0.77
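
The weighted averages hide which colors the SVM confuses. A confusion matrix and a per-class report make that visible. The sketch below uses a synthetic two-class dataset as a stand-in (not `data2`); on the real data the same two calls would take `y_test` and `y_pred_svm`.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in: random pixel coordinates, class 1 inside a disc
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 64, size=(600, 2))
y_demo = ((X_demo[:, 0] - 32) ** 2 + (X_demo[:, 1] - 32) ** 2 < 20 ** 2).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)
demo_svm = SVC().fit(X_tr, y_tr)
y_hat = demo_svm.predict(X_te)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_te, y_hat)
print(cm)

# Precision, recall, F1 and support for each class separately
print(classification_report(y_te, y_hat, zero_division=0))
```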


Predict the color for all the pixels of the second image and save the predicted colors to a new image. Be able to explain the code below.

In [47]:
# Load the original image
image = Image.open('data2.png')
width, height = image.size

# Generate pixel coordinates for the entire image
all_pixels = generate_pixel_coordinates()

# Predict the color for each pixel using the trained classifier
predicted_colors = svm_classifier.predict(all_pixels)

# Save the predicted colors to a new image
save_data(all_pixels, predicted_colors, 'svm_predicted_image2.png')

print("The predicted image has been saved as 'svm_predicted_image2.png'.")

The predicted image has been saved as 'svm_predicted_image2.png'.


<div style="text-align: center;">
    <img src="svm_predicted_image2.png" alt="This image is not available" style="height: 300px; display: block; margin: 0 auto;">
    <i>svm_predicted_image2.png</i>
</div>

Use k-fold cross-validation, with different values of k, to evaluate the model (e.g. k=3, 5, 10). Compute the cross-validation accuracy of each fold and the mean accuracy. Report the results for all runs and compare them with the accuracy obtained for SVM.

In [48]:
from sklearn.model_selection import cross_val_score

# List of different values of k for k-fold cross-validation
k_values = [3, 5, 10]

# Dictionary to store the cross-validation results
cv_results = {}

# Perform k-fold cross-validation for each k value
for k in k_values:
    # Run k-fold cross-validation and collect the accuracy of each fold
    scores = cross_val_score(svm_classifier, x_data2, y_data2, cv=k)
    
    # Store the scores and calculate the mean accuracy
    cv_results[k] = {
        "scores": scores,
        "mean_accuracy": scores.mean()
    }

    # Print the results for each k
    print(f"Results for k={k}:")
    print(f"Cross-validation accuracies: {scores}")
    print(f"Mean accuracy: {scores.mean():.2f}")
    print()

Results for k=3:
Cross-validation accuracies: [0.67625232 0.63324048 0.5821727 ]
Mean accuracy: 0.63

Results for k=5:
Cross-validation accuracies: [0.5935085  0.62751159 0.7120743  0.57739938 0.64396285]
Mean accuracy: 0.63

Results for k=10:
Cross-validation accuracies: [0.53703704 0.72530864 0.73684211 0.55108359 0.55417957 0.73993808
 0.55417957 0.79256966 0.78328173 0.53560372]
Mean accuracy: 0.65



# Random forest

Choose a classifier in category c) and train it on the training set generated in step 13.3.

In [49]:
from sklearn.ensemble import RandomForestClassifier

# Initialize the Random Forest classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the classifier on the training data
rf_classifier.fit(x_train, y_train)

print("Random Forest Classifier has been successfully trained.")

Random Forest Classifier has been successfully trained.


Use the classifier trained in the previous step to make predictions on the test set generated in step 13.3.

In [50]:
# Predict the labels for the test data
y_pred_rf = rf_classifier.predict(x_test)

Compute the accuracy of the classifier on the test set generated in step 13.3 and then on the training set. Discuss the results.

In [51]:
from sklearn.metrics import accuracy_score

# Compute accuracy on the test set
test_accuracy_rf = accuracy_score(y_test, y_pred_rf)
print(f"Test Accuracy: {test_accuracy_rf:.2f}")

# Predict on the training set
y_train_pred_rf = rf_classifier.predict(x_train)

# Compute accuracy on the training set
train_accuracy_rf = accuracy_score(y_train, y_train_pred_rf)
print(f"Train Accuracy: {train_accuracy_rf:.2f}")

Test Accuracy: 1.00
Train Accuracy: 1.00


Compute precision and recall of the classifier on the test set generated in step 13.3 and save to file or display the results. Define (theoretically) precision and recall. Discuss the results.

In [52]:
from sklearn.metrics import precision_score, recall_score

# Compute precision
precision_rf = precision_score(y_test, y_pred_rf, average='weighted')

# Compute recall
recall_rf = recall_score(y_test, y_pred_rf, average='weighted')

# Display the results
print(f"Precision: {precision_rf:.2f}")
print(f"Recall: {recall_rf:.2f}")

Precision: 1.00
Recall: 1.00


Predict the color for all the pixels of the second image and save the predicted colors to a new image. Be able to explain the code below.

In [53]:
# Load the original image
image = Image.open('data2.png')
width, height = image.size

# Generate pixel coordinates for the entire image
all_pixels = generate_pixel_coordinates()

# Predict the color for each pixel using the trained classifier
predicted_colors = rf_classifier.predict(all_pixels)

# Save the predicted colors to a new image
save_data(all_pixels, predicted_colors, 'rf_predicted_image2.png')

print("The predicted image has been saved as 'rf_predicted_image2.png'.")

The predicted image has been saved as 'rf_predicted_image2.png'.


<div style="text-align: center;">
    <img src="rf_predicted_image2.png" alt="This image is not available" style="height: 300px; display: block; margin: 0 auto;">
    <i>rf_predicted_image2.png</i>
</div>

Use k-fold cross-validation, with different values of k, to evaluate the model (e.g. k=3, 5, 10). Compute the cross-validation accuracy of each fold and the mean accuracy. Report the results for all runs and compare them with the accuracy obtained for Random Forest.

In [54]:
from sklearn.model_selection import cross_val_score

# List of different values of k for k-fold cross-validation
k_values = [3, 5, 10]

# Dictionary to store the cross-validation results
cv_results = {}

# Perform k-fold cross-validation for each k value
for k in k_values:
    # Run k-fold cross-validation and collect the accuracy of each fold
    scores = cross_val_score(rf_classifier, x_data2, y_data2, cv=k)
    
    # Store the scores and calculate the mean accuracy
    cv_results[k] = {
        "scores": scores,
        "mean_accuracy": scores.mean()
    }

    # Print the results for each k
    print(f"Results for k={k}:")
    print(f"Cross-validation accuracies: {scores}")
    print(f"Mean accuracy: {scores.mean():.2f}")
    print()

Results for k=3:
Cross-validation accuracies: [0.54359926 0.71123491 0.58681523]
Mean accuracy: 0.61

Results for k=5:
Cross-validation accuracies: [0.68315301 0.72642968 0.73065015 0.8126935  0.64396285]
Mean accuracy: 0.72

Results for k=10:
Cross-validation accuracies: [0.58024691 0.54320988 0.74922601 0.74303406 0.57894737 0.91021672
 0.73993808 0.88544892 0.75232198 0.54179567]
Mean accuracy: 0.70



<strong>13.11.</strong> Study the scikit-learn documentation for the second classifier chosen, select two representative hyperparameters, and repeat steps 13.4-13.8 at least twice with different values of these hyperparameters. Report the results for all runs and compare them.

# C=0.01, kernel='linear'

Choose a classifier in category b) and train it on the training set generated in step 13.3.

In [55]:
from sklearn.svm import SVC

# Initialize the SVM classifier
svm_classifier = SVC(C=0.01, kernel='linear')

# Train the classifier on the training data
svm_classifier.fit(x_train, y_train)

print("SVM Classifier with C=0.01 and kernel='linear' has been successfully trained.")

SVM Classifier with C=0.01 and kernel='linear' has been successfully trained.


Use the classifier trained in the previous step to make predictions on the test set generated in step 13.3.

In [56]:
# Predict the labels for the test data
y_pred_svm = svm_classifier.predict(x_test)

Compute the accuracy of the classifier on the test set generated in step 13.3 and then on the training set. Discuss the results.

In [57]:
from sklearn.metrics import accuracy_score

# Compute accuracy on the test set
test_accuracy_svm = accuracy_score(y_test, y_pred_svm)
print(f"Test Accuracy: {test_accuracy_svm:.2f}")

# Predict on the training set
y_train_pred_svm = svm_classifier.predict(x_train)

# Compute accuracy on the training set
train_accuracy_svm = accuracy_score(y_train, y_train_pred_svm)
print(f"Train Accuracy: {train_accuracy_svm:.2f}")

Test Accuracy: 0.59
Train Accuracy: 0.59


Compute precision and recall of the classifier on the test set generated in step 13.3 and save to file or display the results. Define (theoretically) precision and recall. Discuss the results.

In [58]:
from sklearn.metrics import precision_score, recall_score

# Compute precision
precision_svm = precision_score(y_test, y_pred_svm, average='weighted')

# Compute recall
recall_svm = recall_score(y_test, y_pred_svm, average='weighted')

# Display the results
print(f"Precision: {precision_svm:.2f}")
print(f"Recall: {recall_svm:.2f}")

Precision: 0.61
Recall: 0.59


Predict the color for all the pixels of the second image and save the predicted colors to a new image. Be able to explain the code below.

In [59]:
# Load the original image
image = Image.open('data2.png')
width, height = image.size

# Generate pixel coordinates for the entire image
all_pixels = generate_pixel_coordinates()

# Predict the color for each pixel using the trained classifier
predicted_colors = svm_classifier.predict(all_pixels)

# Save the predicted colors to a new image
save_data(all_pixels, predicted_colors, 'svm_hyperparameters_config1_predicted_image2.png')

print("The predicted image has been saved as 'svm_hyperparameters_config1_predicted_image2.png'.")

The predicted image has been saved as 'svm_hyperparameters_config1_predicted_image2.png'.


<div style="text-align: center;">
    <img src="svm_hyperparameters_config1_predicted_image2.png" alt="This image is not available" style="height: 300px; display: block; margin: 0 auto;">
    <i>svm_hyperparameters_config1_predicted_image2.png</i>
</div>

# C=100.0, kernel='rbf'

Choose a classifier in category b) and train it on the training set generated in step 13.3.

In [60]:
from sklearn.svm import SVC

# Initialize the SVM classifier
svm_classifier = SVC(C=100.0, kernel='rbf')

# Train the classifier on the training data
svm_classifier.fit(x_train, y_train)

print("SVM Classifier with C=100.0 and kernel='rbf' has been successfully trained.")

SVM Classifier with C=100.0 and kernel='rbf' has been successfully trained.


Use the classifier trained in the previous step to make predictions on the test set generated in step 13.3.

In [61]:
# Predict the labels for the test data
y_pred_svm = svm_classifier.predict(x_test)

Compute the accuracy of the classifier on the test set generated in step 13.3 and then on the training set. Discuss the results.

In [62]:
from sklearn.metrics import accuracy_score

# Compute accuracy on the test set
test_accuracy_svm = accuracy_score(y_test, y_pred_svm)
print(f"Test Accuracy: {test_accuracy_svm:.2f}")

# Predict on the training set
y_train_pred_svm = svm_classifier.predict(x_train)

# Compute accuracy on the training set
train_accuracy_svm = accuracy_score(y_train, y_train_pred_svm)
print(f"Train Accuracy: {train_accuracy_svm:.2f}")

Test Accuracy: 0.83
Train Accuracy: 0.83


Compute precision and recall of the classifier on the test set generated in step 13.3 and save to file or display the results. Define (theoretically) precision and recall. Discuss the results.

In [63]:
from sklearn.metrics import precision_score, recall_score

# Compute precision
precision_svm = precision_score(y_test, y_pred_svm, average='weighted')

# Compute recall
recall_svm = recall_score(y_test, y_pred_svm, average='weighted')

# Display the results
print(f"Precision: {precision_svm:.2f}")
print(f"Recall: {recall_svm:.2f}")

Precision: 0.84
Recall: 0.83


Predict the color for all the pixels of the second image and save the predicted colors to a new image. Be able to explain the code below.

In [64]:
# Load the original image
image = Image.open('data2.png')
width, height = image.size

# Generate pixel coordinates for the entire image
all_pixels = generate_pixel_coordinates()

# Predict the color for each pixel using the trained classifier
predicted_colors = svm_classifier.predict(all_pixels)

# Save the predicted colors to a new image
save_data(all_pixels, predicted_colors, 'svm_hyperparameters_config2_predicted_image2.png')

print("The predicted image has been saved as 'svm_hyperparameters_config2_predicted_image2.png'.")

The predicted image has been saved as 'svm_hyperparameters_config2_predicted_image2.png'.


<div style="text-align: center;">
    <img src="svm_hyperparameters_config2_predicted_image2.png" alt="This image is not available" style="height: 300px; display: block; margin: 0 auto;">
    <i>svm_hyperparameters_config2_predicted_image2.png</i>
</div>

## Comparison 

<div style="display: flex; gap: 10px;">
  <div style="text-align: center;">
    <img src="svm_predicted_image2.png" alt="This image is not available" style="height: 300px;">
    <br>
    <i>Default SVM</i>
  </div>
  <div style="text-align: center; margin-left: 100px;">
    <img src="svm_hyperparameters_config1_predicted_image2.png" alt="This image is not available" style="height: 300px;">
    <br>
    <i>SVM with C=0.01 and kernel='linear'</i>
  </div>
  <div style="text-align: center; margin-left: 100px;">
    <img src="svm_hyperparameters_config2_predicted_image2.png" alt="This image is not available" style="height: 300px;">
    <br>
    <i>SVM with C=100.0 and kernel='rbf'</i>
  </div>
</div>
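
In the runs above, test accuracy went from 0.59 (`C=0.01`, `kernel='linear'`) to 0.77 (defaults) to 0.83 (`C=100.0`, `kernel='rbf'`). To report such runs side by side without repeating the cells, the configurations can be looped over and collected into one table. The sketch below does this on a synthetic stand-in dataset (not `data2`), so its numbers are not comparable to the ones above; the `configs` list and `rows` table are illustrative names.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in: random pixel coordinates, class 1 inside a disc
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 64, size=(400, 2))
y_demo = ((X_demo[:, 0] - 32) ** 2 + (X_demo[:, 1] - 32) ** 2 < 20 ** 2).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

configs = [
    {},                               # scikit-learn defaults (C=1.0, kernel='rbf')
    {"C": 0.01, "kernel": "linear"},  # strongly regularized linear boundary
    {"C": 100.0, "kernel": "rbf"},    # weakly regularized non-linear boundary
]

rows = []
for params in configs:
    model = SVC(**params).fit(X_tr, y_tr)
    y_hat = model.predict(X_te)
    rows.append((
        params if params else "defaults",
        accuracy_score(y_te, y_hat),
        precision_score(y_te, y_hat, average="weighted", zero_division=0),
        recall_score(y_te, y_hat, average="weighted", zero_division=0),
    ))

for params, acc, prec, rec in rows:
    print(f"{str(params):40s} acc={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```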

<strong>13.12.</strong> Use grid search cross-validation to optimize the hyperparameters of the classifiers (model selection). Report the optimal parameters given by the search. Predict the color for all the pixels of the second image using the model with the optimal parameters.

# Decision Tree Classifier - Optimizing Hyperparameters

In [65]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

# Define the parameter grid
param_grid = {
    'max_depth': [None, 10, 20, 30, 40, 50],
    'min_samples_split': [2, 10, 20],
    'min_samples_leaf': [1, 5, 10]
}

# Initialize the Decision Tree classifier
dt = DecisionTreeClassifier(random_state=42)

# Initialize the GridSearchCV object
grid_search_dt = GridSearchCV(estimator=dt, param_grid=param_grid, cv=5, verbose=0)

# Fit the grid search to the data
grid_search_dt.fit(x_train, y_train)

# Best parameter set found
print("Best parameters found for the decision tree classifier: ", grid_search_dt.best_params_)

# Best estimator
best_dt_classifier = grid_search_dt.best_estimator_

# Load the original image
image = Image.open('data2.png')
width, height = image.size

# Assuming functions to generate pixel coordinates and save data
all_pixels = generate_pixel_coordinates()  # Assume this function exists

# Predict the color for each pixel using the best classifier
predicted_colors_dt = best_dt_classifier.predict(all_pixels)

# Save or display the image from predicted colors
save_data(all_pixels, predicted_colors_dt, 'optimized_dt_predicted_image2.png')  # Assume this function exists

print("The predicted image has been saved as 'optimized_dt_predicted_image2.png'.")

Best parameters found for the decision tree classifier:  {'max_depth': None, 'min_samples_leaf': 1, 'min_samples_split': 2}
The predicted image has been saved as 'optimized_dt_predicted_image2.png'.


<div style="text-align: center;">
    <img src="optimized_dt_predicted_image2.png" alt="This image is not available" style="height: 300px; display: block; margin: 0 auto;">
    <i>optimized_dt_predicted_image2.png</i>
</div>

# SVM Classifier - Optimizing Hyperparameters

In [66]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Define the parameter grid
param_grid = {
    'C': [1, 10],  # Reduced from [0.1, 1, 10, 100]
    'kernel': ['rbf'],  # Focus on the 'rbf' kernel for non-linear separation
    'gamma': ['scale', 'auto']  # Retain these options for comprehensive testing
}

# Initialize the SVM classifier
svm = SVC()

# Initialize the GridSearchCV object
grid_search = GridSearchCV(estimator=svm, param_grid=param_grid, cv=5, verbose=0)

# Fit the grid search to the data
grid_search.fit(x_train, y_train)

# Best parameter set found:
print("Best parameters found for the SVM classifier: ", grid_search.best_params_)

# Best estimator:
best_classifier = grid_search.best_estimator_

# Load the original image
image = Image.open('data2.png')
width, height = image.size

# Function to generate all pixel coordinates for the image
all_pixels = generate_pixel_coordinates()  # This function was assumed to be defined earlier

# Predict the color for each pixel using the best classifier
predicted_colors = best_classifier.predict(all_pixels)

# Function to save data was assumed defined earlier
save_data(all_pixels, predicted_colors, 'optimized_svm_predicted_image2.png')

print("The predicted image has been saved as 'optimized_svm_predicted_image2.png'.")

Best parameters found for the SVM classifier:  {'C': 1, 'gamma': 'auto', 'kernel': 'rbf'}
The predicted image has been saved as 'optimized_svm_predicted_image2.png'.


<div style="text-align: center;">
    <img src="optimized_svm_predicted_image2.png" alt="This image is not available" style="height: 300px; display: block; margin: 0 auto;">
    <i>optimized_svm_predicted_image2.png</i>
</div>

# Random Forest Classifier - Optimizing Hyperparameters

In [67]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Define the parameter grid
param_grid_rf_too_much = {
    'n_estimators': [100, 200, 300],  # Number of trees
    'max_depth': [None, 10, 20, 30],  # Maximum depth of each tree
    'min_samples_split': [2, 5, 10],  # Minimum number of samples required to split an internal node
    'min_samples_leaf': [1, 2, 4]     # Minimum number of samples required to be at a leaf node
}

param_grid_rf = {
    'n_estimators': [100, 200],  # Reduced from [100, 200, 300]
    'max_depth': [None, 20],  # Simplified choices
    'min_samples_split': [2, 10],  # Only extremes
    'min_samples_leaf': [1, 4]  # Only extremes
}

# Initialize the RandomForest classifier
rf = RandomForestClassifier(random_state=42)

# Initialize the GridSearchCV object
grid_search_rf = GridSearchCV(estimator=rf, param_grid=param_grid_rf, cv=5, verbose=0)

# Fit the grid search to the data
grid_search_rf.fit(x_train, y_train)

# Best parameter set found
print("Best parameters found for the random forest classifier: ", grid_search_rf.best_params_)

# Best estimator
best_rf_classifier = grid_search_rf.best_estimator_

# Load the original image
image = Image.open('data2.png')
width, height = image.size

# Assuming functions to generate pixel coordinates and save data exist
all_pixels = generate_pixel_coordinates()  # Assume this function exists

# Predict the color for each pixel using the best classifier
predicted_colors_rf = best_rf_classifier.predict(all_pixels)

# Save or display the image from predicted colors
save_data(all_pixels, predicted_colors_rf, 'optimized_rf_predicted_image2.png')  # Assume this function exists

print("The predicted image has been saved as 'optimized_rf_predicted_image2.png'.")

Best parameters found for the random forest classifier:  {'max_depth': None, 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 100}
The predicted image has been saved as 'optimized_rf_predicted_image2.png'.


<div style="text-align: center;">
    <img src="optimized_rf_predicted_image2.png" alt="This image is not available" style="height: 300px; display: block; margin: 0 auto;">
    <i>optimized_rf_predicted_image2.png</i>
</div>
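
`best_params_` reports only the winning combination. For the decision tree and the random forest the search returned the scikit-learn defaults, so it is worth checking how close the other candidates came. The fitted `GridSearchCV` object stores every candidate's scores in `cv_results_`. The sketch below ranks them on a synthetic stand-in dataset (`make_moons`, not `data2`) with a deliberately small grid.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic, non-linearly separable stand-in data
X_demo, y_demo = make_moons(n_samples=300, noise=0.25, random_state=42)

demo_grid = {"max_depth": [None, 3, 10], "min_samples_leaf": [1, 5]}
search = GridSearchCV(DecisionTreeClassifier(random_state=42), demo_grid, cv=5)
search.fit(X_demo, y_demo)

# cv_results_ holds one entry per parameter combination
results = search.cv_results_
order = results["rank_test_score"].argsort()  # best candidate first

for i in order:
    mean = results["mean_test_score"][i]
    std = results["std_test_score"][i]
    print(f"{mean:.3f} +/- {std:.3f}  {results['params'][i]}")
```

If several candidates sit within one standard deviation of the best score, the choice among them is not strongly supported by the data, and the simpler model is a reasonable pick.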

<strong>14.</strong> Compare the results from multiple facets:
<ul>
    <li>Compare the results obtained for linear and non-linear images with each classifier.</li>
    <li>Compare the results obtained with different classifiers for the same image.</li>
    <li>Try to explain the differences in the performances of various models (classifiers).</li>
</ul>

<strong>15.</strong> Conduct a sensitivity study on the selection of data (files data1 and data2) from the original images (refer to the first item in the **Detailed Description of the Requirements**), using one of the classifiers employed previously. Explanation: We assume that the pixels in your training sets are derived from a real image that you theoretically know. Adjust the selected pixels, considering the shape of the real image.

Generate two additional files, data12 and data22, and employ one of the three classifiers to predict the color for all pixels in the first and second images. Compare the results obtained (using data1 versus data12 and data2 versus data22). Provide an explanation for the observed results.



 <strong>16.</strong> Conduct a study on the influence of the size of training and testing sets for one of the classifiers of your choice. 
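
One way to organize this study is to repeat the split from step 13.3 with several `test_size` values and record the training and test accuracy for each. The sketch below is a minimal version under stated assumptions: it uses a synthetic stand-in dataset rather than `data1` or `data2`, a decision tree as the chosen classifier, and an arbitrary list of ratios.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: random pixel coordinates, class 1 inside a disc
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 64, size=(800, 2))
y_demo = ((X_demo[:, 0] - 32) ** 2 + (X_demo[:, 1] - 32) ** 2 < 20 ** 2).astype(int)

size_results = []
for test_size in [0.1, 0.3, 0.5, 0.7, 0.9]:
    # stratify keeps the class proportions equal in both parts
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_demo, y_demo, test_size=test_size, random_state=42, stratify=y_demo
    )
    model = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)
    size_results.append(
        (test_size, len(X_tr), model.score(X_tr, y_tr), model.score(X_te, y_te))
    )

for test_size, n_train, train_acc, test_acc in size_results:
    print(f"test_size={test_size:.1f} n_train={n_train:4d} "
          f"train_acc={train_acc:.2f} test_acc={test_acc:.2f}")
```

Repeating each ratio with several `random_state` values and averaging would make the trend less dependent on one particular split.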

<strong>17.</strong> For the evaluation, you must complete all the steps outlined above, report and compare the results, provide a brief presentation of your chosen classifiers directly in Jupyter Notebook, and be prepared to answer questions related to the presentation. 

<p>For a better understanding of how the results should be represented, refer to Figure 3.</p>
<img src="Example.PNG" width="1024" height="1024"/>
<center><strong>Figure 3</strong>