## Debugging Code and Troubleshooting

#### Objective:
In this assignment, you will practice debugging a Python script for a simple Iris clustering project. Your task is to identify and fix the errors in the provided code so that it runs without issues and produces the expected results.

#### Instructions:

1. Carefully examine the provided Python script, which contains several types of errors, such as syntax, name, type, index, attribute, value, key, import, and/or assertion errors. The script has been intentionally modified to include these errors so that you can practice your debugging skills.

2. Debug the script by identifying and fixing one error at a time. Remember that the script will halt at the first encountered error, so you must resolve each issue sequentially.

3. As you debug the code, take note of the error message displayed and the corresponding line number. Use this information to understand the nature of the error and guide your debugging process. Additionally, you must add a comment on each line that contained an error, identifying the error and explaining your fix.

4. After fixing an error, re-run the script to ensure the issue has been resolved and to uncover any remaining errors. Repeat this process until the script runs without any issues.

5. Once you have successfully debugged the script, analyze the output to ensure the results are as expected. The script should display information about the Iris dataset, visualize the data points, apply K-means clustering, and visualize the clustering results.

6. Document your debugging process by adding comments to the script, explaining the issues you encountered and the solutions you applied.

7. Submit the corrected and annotated Python script as a Jupyter Notebook, summarizing your debugging process and the lessons you learned from this exercise.
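
As a quick reference for the error types named in the instructions, the sketch below deliberately triggers several of them and reports the exception class Python raises. The helper `error_name`, the function `failing_assertion`, and the toy expressions are illustrative assumptions; none of them come from the assignment script.

```python
def error_name(action):
    """Run a zero-argument callable and return the name of the exception it raises."""
    try:
        action()
    except Exception as exc:
        return type(exc).__name__
    return None  # no error was raised


def failing_assertion():
    assert 1 + 1 == 3, "arithmetic check failed"


# Each entry deliberately triggers one runtime error type.
examples = {
    "undefined name": lambda: undefined_variable,
    "mixing str and int": lambda: "3" + 3,
    "index out of range": lambda: [1, 2, 3][5],
    "missing attribute": lambda: (150, 4).shape,
    "unparseable value": lambda: int("three"),
    "missing dictionary key": lambda: {"a": 1}["b"],
    "failed assertion": failing_assertion,
}

results = {label: error_name(action) for label, action in examples.items()}
for label, name in results.items():
    print(f"{label}: {name}")
```

The last line of a traceback shows the same class name, followed by a message, so matching it against a table like this is often the fastest way to decide what kind of fix is needed.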

Remember, debugging is an essential skill for any AI engineer. This assignment will help you develop your problem-solving abilities and strengthen your understanding of common errors in Python code.

Good luck, and happy debugging!


### Iris Clustering Overview

If you are unfamiliar with the Iris clustering problem, here is an overview of the project.

The Iris clustering project is a simple machine-learning exercise generally used to introduce students to unsupervised learning techniques, specifically clustering algorithms. The project utilizes the famous Iris dataset, which consists of 150 samples of iris flowers, each with four features: sepal length, sepal width, petal length, and petal width. In addition, the dataset includes three classes of iris species: Iris Setosa, Iris Versicolor, and Iris Virginica.

K-means is an unsupervised learning method that aims to partition the dataset into K clusters, where each data point belongs to the cluster with the nearest mean. The algorithm iteratively refines the centroids of the clusters until convergence is reached or a maximum number of iterations is performed. The Iris dataset contains three species of flower, so we set K = 3, i.e., we split the dataset into three clusters.
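
The assign-then-update loop described above can be sketched in plain Python. This is a minimal illustration, not scikit-learn's implementation: the function `kmeans`, the toy points, and the choice of starting centroids are all assumptions made for the example.

```python
def kmeans(points, centroids, max_iter=100):
    """Plain-Python sketch of the K-means loop: assign points, then update centroids."""
    labels = []
    for _ in range(max_iter):
        # Assignment step: label each point with the index of its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(len(centroids)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
            )
            for point in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its old centroid
        if new_centroids == centroids:  # centroids stopped moving: converged
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

Here K is set implicitly by the number of starting centroids. In the Iris project, `KMeans(n_clusters=3)` plays the same role and also handles centroid initialization.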

### **SOLUTION:**

In [None]:
# Import the necessary libraries
import numpy as np  # ModuleNotFoundError: 'nompy' was a misspelling of 'numpy'
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets # ImportError
from sklearn.cluster import KMeans

*I debugged the code by changing the name of the library from 'nompy' to 'numpy'.*
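
A misspelled module name fails at import time. The sketch below reproduces that failure in isolation, assuming no package named `nompy` is installed in the environment. The variables `missing` and `is_import_error` are illustrative.

```python
import importlib

missing = None
is_import_error = False

try:
    importlib.import_module("nompy")  # misspelled on purpose
except ModuleNotFoundError as exc:
    missing = exc.name  # name of the module Python could not find
    is_import_error = isinstance(exc, ImportError)

print("Module not found:", missing)
print("Counts as an ImportError:", is_import_error)
```

`ModuleNotFoundError` is a subclass of `ImportError`, so an `except ImportError` clause catches both.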

In [None]:
# Load the Iris dataset
iris_dataset = datasets.load_iris()
iris_data = iris_dataset.data
iris_target = iris_dataset.target

# Print dataset information for verification
print("Iris dataset shape:", iris_data.shape)
print("Iris target shape:", iris_target.shape)

*There are two errors. First, the call to load_iris_dataset() fails because no function with that name exists in the datasets module of scikit-learn. To resolve this, we use the correct function, load_iris().*

*Second, shape is an attribute rather than a method, so the parentheses must be removed: iris_data.shape() and iris_target.shape() become iris_data.shape and iris_target.shape, respectively.*
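
To see why the parentheses matter, note that a NumPy array's `shape` is a plain tuple, and calling a tuple raises a `TypeError`. In the sketch below, a hand-written tuple stands in for `iris_data.shape`, and the variable names are illustrative.

```python
# Stand-in for iris_data.shape: the Iris data has 150 rows and 4 feature columns.
shape = (150, 4)

message = None
try:
    shape()  # same mistake as writing iris_data.shape()
except TypeError as exc:
    message = str(exc)

print("Calling the attribute fails:", message)

# Reading the attribute without parentheses works and can be unpacked directly.
rows, cols = shape
print("rows:", rows, "cols:", cols)
```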

In [None]:
# Visualize the dataset
plt.scatter(iris_data[:, 0], iris_data[:, 1], c=iris_target, cmap='viridis')
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.show()

*No Error*

In [None]:
# Apply the K-means clustering algorithm
k_means = KMeans(n_clusters=3, random_state=0)
k_means_predictions = k_means.fit_predict(iris_data)

# Print clustering predictions for verification
print("K-means clustering predictions:", k_means_predictions)

*There are three errors in this cell. First, the class name in the instantiation is wrong: it should be KMeans with a capital 'K', not kMeans.*

*Second, the n_clusters parameter must be passed as an integer, not a string. Therefore, we remove the quotes around '3' and pass 3 as an integer.*

*Lastly, the random_state keyword argument uses the equality operator '==' instead of the assignment operator '='. Therefore, we change 'random_state==0' to 'random_state=0'.*
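
The `==` mistake can be checked without running scikit-learn, because Python rejects the call while parsing it. The sketch below assumes the keyword order shown in the corrected cell, and the helper `compile_error` is illustrative. With `==`, the second argument is read as a positional expression rather than a keyword argument, which is not allowed after `n_clusters=3`.

```python
def compile_error(source):
    """Return the SyntaxError message for a snippet, or None if it parses."""
    try:
        compile(source, "<example>", "eval")  # parse only; nothing is executed
    except SyntaxError as exc:
        return exc.msg
    return None


broken = "KMeans(n_clusters=3, random_state==0)"
fixed = "KMeans(n_clusters=3, random_state=0)"

print("broken:", compile_error(broken))
print("fixed: ", compile_error(fixed))
```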

In [None]:
# Visualize the clustering results
plt.scatter(iris_data[k_means_predictions == 0, 0], iris_data[k_means_predictions == 0, 1], s=100, c='red', label='Cluster 1')
plt.scatter(iris_data[k_means_predictions == 1, 0], iris_data[k_means_predictions == 1, 1], s=100, c='blue', label='Cluster 2')
plt.scatter(iris_data[k_means_predictions == 2, 0], iris_data[k_means_predictions == 2, 1], s=100, c='green', label='Cluster 3')
plt.scatter(k_means.cluster_centers_[:, 0], k_means.cluster_centers_[:, 1], s=300, c='yellow', label='Centroids')
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.legend()
plt.show()

*Firstly, there is an error with the function names. The correct function names for setting the x-axis label and y-axis label are plt.xlabel() and plt.ylabel().*

*Secondly, there is an issue with the indexing of cluster predictions. KMeans labels its clusters starting from 0, not 1. Therefore, the masks for the scatter plots should be k_means_predictions == 0 for Cluster 1, k_means_predictions == 1 for Cluster 2, and k_means_predictions == 2 for Cluster 3.*
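
The effect of the off-by-one labels can be illustrated with plain lists standing in for the NumPy arrays. The expression `iris_data[k_means_predictions == 0, 0]` selects rows with a boolean mask, and the list comprehension below is a rough analogue. The toy points, labels, and the helper `members` are assumptions made for the example.

```python
# Hypothetical toy data: six 2-D points with 0-based labels, in the style of fit_predict output.
points = [(5.1, 3.5), (4.9, 3.0), (6.4, 3.2), (6.9, 3.1), (5.8, 2.7), (6.3, 3.3)]
labels = [0, 0, 1, 1, 2, 2]


def members(cluster_id):
    """Points whose label equals cluster_id (list-based analogue of a boolean mask)."""
    return [pt for pt, lab in zip(points, labels) if lab == cluster_id]


# Correct: labels 0, 1, 2 map to the legend entries Cluster 1, 2, 3.
for cluster_id in range(3):
    print(f"Cluster {cluster_id + 1}:", members(cluster_id))

# With labels 1, 2, 3 the points labelled 0 are never selected,
# and the selection for label 3 is empty.
print("label 3:", members(3))
```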