# Clustering Assignment 5

### Q1. What is a contingency matrix, and how is it used to evaluate the performance of a classification model?

A contingency matrix (called a confusion matrix when it is used for classification) is a table that cross-tabulates actual classes against predicted classes. Each cell counts the instances with one (actual, predicted) combination, so the table shows not only how often the model is right but also which kinds of errors it makes.

### Structure of a Contingency Matrix:

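A standard 2x2 contingency matrix looks like this (rows are actual classes, columns are predicted classes):

| | **Predicted Negative** | **Predicted Positive** |
|---|---|---|
| **Actual Negative** | True Negative (TN) | False Positive (FP) |
| **Actual Positive** | False Negative (FN) | True Positive (TP) |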

### Evaluation of Model Performance:

- **True Positive (TP)**: Instances where the model correctly predicts a positive class.
- **True Negative (TN)**: Instances where the model correctly predicts a negative class.
- **False Positive (FP)**: Instances where the model predicts positive but the actual class is negative (Type I error).
- **False Negative (FN)**: Instances where the model predicts negative but the actual class is positive (Type II error).

### Model Evaluation Metrics Derived from Contingency Matrix:

1. **Accuracy**: Overall correctness of the model ((TP + TN) / (TP + TN + FP + FN)).
2. **Precision**: Proportion of true positive predictions among all positive predictions (TP / (TP + FP)).
3. **Recall (Sensitivity)**: Proportion of actual positives correctly predicted (TP / (TP + FN)).
4. **Specificity**: Proportion of actual negatives correctly predicted (TN / (TN + FP)).
5. **F1 Score**: The harmonic mean of precision and recall (2 × Precision × Recall / (Precision + Recall)), which balances the two.
6. **False Positive Rate (FPR)**: The ratio of false positives to actual negatives (FP / (FP + TN)).

Contingency matrices are fundamental to understanding a classification model because they separate correct predictions from each kind of error, and each of the metrics above is computed from their four cells.
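
As a minimal sketch, the metrics listed above can be computed directly from the four cell counts. The function name `metrics_from_counts`, the zero-denominator convention, and the example counts are assumptions made for illustration, not part of any library.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Compute common binary-classification metrics from the four cell counts."""

    def safe_div(num, den):
        # Convention chosen for this sketch: report 0.0 when the denominator is zero
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
        "fpr": safe_div(fp, fp + tn),
    }


# Hypothetical counts chosen only for illustration
scores = metrics_from_counts(tp=3, tn=1, fp=1, fn=1)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Note that specificity and the false positive rate always sum to 1, since both are computed over the actual negatives.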

### Q2. How is a pair confusion matrix different from a regular confusion matrix, and why might it be useful in certain situations?

```python
from sklearn.metrics import confusion_matrix

true_labels = [1, 0, 1, 1, 0, 1]
predicted_labels = [1, 1, 1, 0, 0, 1]

# Build the confusion matrix (rows: actual class, columns: predicted class)
confusion_matrix(true_labels, predicted_labels)
```

```
array([[1, 1],
       [1, 3]])
```

```python
from sklearn.metrics.cluster import pair_confusion_matrix

true_labels = [1, 0, 1, 1, 0, 1]
predicted_labels = [1, 1, 1, 0, 0, 1]

# Build the pair confusion matrix from the same two labelings
pair_confusion_matrix(true_labels, predicted_labels)
```

```
array([[8, 8],
       [8, 6]])
```

## Let's understand both matrices:

## Regular Confusion Matrix

The array represents a confusion matrix with a 2x2 shape. In a binary classification setting, a confusion matrix consists of four elements:

- **True Positives (TP)**: Predicted as positive and actually positive.
- **False Positives (FP)**: Predicted as positive but actually negative.
- **True Negatives (TN)**: Predicted as negative and actually negative.
- **False Negatives (FN)**: Predicted as negative but actually positive.

In this case:

- The top-left element (1) represents the count of True Negatives (TN).
- The top-right element (1) represents the count of False Positives (FP).
- The bottom-left element (1) represents the count of False Negatives (FN).
- The bottom-right element (3) represents the count of True Positives (TP).

This matrix suggests the following classification performance:

- The model correctly predicted the negative class (0, "not the target class") 1 time.
- The model predicted the positive class (1, "the target class") 1 time when the actual class was negative.
- The model predicted the negative class 1 time when the actual class was positive.
- The model correctly predicted the positive class 3 times.

## Pair Confusion Matrix

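Unlike the regular confusion matrix, the pair confusion matrix does not count individual instances. It counts pairs of samples and records, for each pair, whether the two samples share a label in the true labeling and whether they share a label in the predicted labeling. `pair_confusion_matrix` counts ordered pairs, so every unordered pair is counted twice; with 6 samples there are 6 × 5 = 30 ordered pairs, which is the sum of the four cells.

- **Top-Left (Element at [0, 0]):** This value (8) counts the pairs that are in different groups under both the true and the predicted labels. The two labelings agree on these pairs.

- **Top-Right (Element at [0, 1]):** This value (8) counts the pairs that are in different groups under the true labels but in the same group under the predicted labels.

- **Bottom-Left (Element at [1, 0]):** This value (8) counts the pairs that are in the same group under the true labels but in different groups under the predicted labels.

- **Bottom-Right (Element at [1, 1]):** This value (6) counts the pairs that are in the same group under both the true and the predicted labels. The two labelings agree on these pairs as well.

In this example, the pair confusion matrix indicates that:

- The two labelings agree on 8 + 6 = 14 of the 30 ordered pairs.
- They disagree on the remaining 8 + 8 = 16 pairs.
- The fraction of agreeing pairs, 14 / 30 ≈ 0.47, is the Rand index.

Because the matrix depends only on which samples are grouped together, not on what the groups are called, it stays the same if the predicted labels are renamed. This makes it useful for evaluating clustering results, where cluster IDs are arbitrary and a regular confusion matrix would change with every relabeling.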
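
A brute-force sketch can make the pair counts concrete. It assumes, as the scikit-learn documentation describes, that the matrix counts ordered pairs of distinct samples, with rows indexed by whether the pair shares a true label and columns by whether it shares a predicted label. The helper name `pair_counts` is hypothetical and is not a scikit-learn function.

```python
from itertools import permutations


def pair_counts(labels_true, labels_pred):
    """Count ordered sample pairs.

    Row index: 1 if the pair shares a true label, else 0.
    Column index: 1 if the pair shares a predicted label, else 0.
    """
    counts = [[0, 0], [0, 0]]
    for i, j in permutations(range(len(labels_true)), 2):
        same_true = int(labels_true[i] == labels_true[j])
        same_pred = int(labels_pred[i] == labels_pred[j])
        counts[same_true][same_pred] += 1
    return counts


true_labels = [1, 0, 1, 1, 0, 1]
predicted_labels = [1, 1, 1, 0, 0, 1]

matrix = pair_counts(true_labels, predicted_labels)
print(matrix)  # [[8, 8], [8, 6]]

# Renaming the predicted labels (0 <-> 1) leaves the pair counts unchanged
swapped = [1 - label for label in predicted_labels]
print(pair_counts(true_labels, swapped) == matrix)  # True

# Rand index: fraction of pairs on which the two labelings agree
rand_index = (matrix[0][0] + matrix[1][1]) / sum(map(sum, matrix))
print(round(rand_index, 3))  # 0.467
```

The quadratic loop is only meant to show what is being counted; for real data the library function is the better choice.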


### Q3. What is an extrinsic measure in the context of natural language processing, and how is it typically used to evaluate the performance of language models?

Extrinsic measures evaluate a model by how well it performs in a real-world application or downstream task, rather than by assessing the model in isolation. They are task-specific and reflect the model's behavior in an end-use scenario, such as sentiment analysis, machine translation, named entity recognition, question answering, or text summarization.

These measures rely on task-specific evaluation criteria and metrics. For instance:

1. **Accuracy:** Measures how many instances the model correctly predicts in classification tasks.
2. **Precision and Recall:** Commonly used in information retrieval tasks, precision is the fraction of retrieved instances that are relevant, while recall measures the fraction of relevant instances that are retrieved.
3. **F1 Score:** Harmonic mean of precision and recall, often used when both precision and recall are important for a task.
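
To illustrate the information retrieval definitions above, here is a small sketch that treats the retrieved and relevant documents as sets. The function name `retrieval_scores` and the document IDs are hypothetical.

```python
def retrieval_scores(retrieved, relevant):
    """Return (precision, recall, f1) for a set of retrieved vs. relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical search result: 4 documents returned, 3 documents truly relevant
retrieved_docs = {"d1", "d2", "d3", "d4"}
relevant_docs = {"d2", "d4", "d5"}

precision, recall, f1 = retrieval_scores(retrieved_docs, relevant_docs)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were retrieved.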


### Q4. What is an intrinsic measure in the context of machine learning, and how does it differ from an extrinsic measure?

In the world of machine learning:

### Intrinsic Measures:
Intrinsic measures check how good a model is by evaluating it directly, in isolation from any downstream application. It's like checking if a car's engine runs smoothly without actually driving the car.

Common intrinsic measures in various machine learning tasks include metrics like:

* Silhouette Score in Clustering: Measures how well-separated clusters are and how similar the samples are within the same cluster compared to others.
* Davies-Bouldin Index in Clustering: Measures the average "similarity" between each cluster and its most similar one, where lower values indicate better clustering.
* Mean Squared Error (MSE) in Regression: Measures the average squared differences between predicted and actual values.

### Extrinsic Measures:
Extrinsic measures test how well a model solves real tasks, like predicting whether an email is spam. It's like actually driving the car to see how well it handles on the road.

So, intrinsic measures focus on the model's own performance, while extrinsic measures see how well the model performs in real tasks or applications.

### Q5. What is the purpose of a confusion matrix in machine learning, and how can it be used to identify strengths and weaknesses of a model?

The confusion matrix in machine learning is used to:

- **Purpose:** Summarize the performance of a classification model by presenting the count of true positives, true negatives, false positives, and false negatives.

- **Identifying Strengths and Weaknesses:** It helps in understanding a model's performance by highlighting its strengths and weaknesses. For instance:
    - **Strengths:** High counts in the true positive and true negative cells indicate the model's ability to correctly classify instances.
    - **Weaknesses:** High false positive and false negative counts reveal areas where the model is making mistakes, such as misclassifying instances.

In short, the confusion matrix provides a clear snapshot of a model's performance, allowing for an easy identification of its strengths and weaknesses in classification tasks.
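
The same idea extends to more than two classes. The sketch below reads per-class precision and recall off a multi-class confusion matrix and finds the most frequent confusion. The class names, the counts, and the helper names `per_class_report` and `most_confused` are all hypothetical; rows are assumed to be actual classes and columns predicted classes.

```python
classes = ["cat", "dog", "bird"]

# Hypothetical counts: rows are actual classes, columns are predicted classes
cm = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 3, 7],
]


def per_class_report(cm, classes):
    """Per-class precision (column-wise) and recall (row-wise)."""
    report = {}
    for k, name in enumerate(classes):
        correct = cm[k][k]
        actual_total = sum(cm[k])                    # all samples of this class
        predicted_total = sum(row[k] for row in cm)  # all predictions of this class
        report[name] = {
            "precision": correct / predicted_total if predicted_total else 0.0,
            "recall": correct / actual_total if actual_total else 0.0,
        }
    return report


def most_confused(cm, classes):
    """Return (actual, predicted, count) for the largest off-diagonal cell."""
    cells = [
        (cm[i][j], i, j)
        for i in range(len(cm))
        for j in range(len(cm))
        if i != j
    ]
    count, i, j = max(cells, key=lambda cell: cell[0])
    return classes[i], classes[j], count


report = per_class_report(cm, classes)
weakest = min(report, key=lambda name: report[name]["recall"])
print("Lowest recall:", weakest, report[weakest])
print("Most frequent confusion:", most_confused(cm, classes))
```

A low recall for one class together with a large off-diagonal cell points to the specific pair of classes the model struggles to tell apart.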

### Q6. What are some common intrinsic measures used to evaluate the performance of unsupervised learning algorithms, and how can they be interpreted?


1. **Silhouette Score:** The silhouette score measures how similar a sample is to its own cluster (cohesion) compared to the nearest other cluster (separation). For each sample it is computed as (b - a) / max(a, b), where a is the mean distance to the other members of its own cluster and b is the mean distance to the members of the nearest other cluster; the overall score is the average over all samples. It ranges from -1 to 1. A score close to +1 indicates that the sample is well matched to its own cluster and lies far from neighboring clusters. A score near 0 means the sample lies on or near the boundary between two clusters, which suggests overlap. A negative score suggests that the sample may have been assigned to the wrong cluster.

2. **Davies-Bouldin Index:** The Davies-Bouldin index measures the average similarity between each cluster and its most similar one, where similarity is the ratio of within-cluster scatter to the distance between cluster centroids. Lower values indicate better clustering; the minimum is 0, and values closer to 0 indicate compact, well-separated clusters.
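
Both measures can be sketched in plain Python to show what they compute. This is an illustrative implementation under stated assumptions (Euclidean distance, at least two clusters, a score of 0 for singleton clusters, distinct centroids), not the library code; scikit-learn provides `silhouette_score` and `davies_bouldin_score` for real use. The points and labelings are made up.

```python
from math import dist
from statistics import mean


def group_by_cluster(points, labels):
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def silhouette(points, labels):
    """Mean of s = (b - a) / max(a, b) over all samples."""
    clusters = group_by_cluster(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(point, q) for q in members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) else 0.0)
    return mean(scores)


def davies_bouldin(points, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance."""
    clusters = group_by_cluster(points, labels)
    centroids = {
        k: tuple(mean(axis) for axis in zip(*members))
        for k, members in clusters.items()
    }
    scatter = {
        k: mean(dist(p, centroids[k]) for p in members)
        for k, members in clusters.items()
    }
    worst_ratios = [
        max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return mean(worst_ratios)


# Hypothetical data: two tight, well-separated groups on a line
points = [(1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
good_labels = [0, 0, 1, 1]  # matches the natural grouping
bad_labels = [0, 1, 0, 1]   # mixes the two groups

for name, labels in [("good", good_labels), ("bad", bad_labels)]:
    print(name, round(silhouette(points, labels), 3), round(davies_bouldin(points, labels), 3))
```

The good labeling should give a silhouette close to 1 and a Davies-Bouldin index close to 0, while the mixed labeling gives a negative silhouette and a much larger index.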


### Q7. What are some limitations of using accuracy as a sole evaluation metric for classification tasks, and how can these limitations be addressed?

| **Limitations of Accuracy** | **Addressing these Limitations** |
|----------------------------|----------------------------------|
| Imbalanced Datasets         | - Precision and Recall <br> - F1 Score <br> - Confusion Matrix Analysis |
| Ignoring Class Distribution  | - Precision and Recall <br> - F1 Score <br> - ROC Curve and AUC |
| Equal Cost Fallacy           | - Cost-Sensitive Evaluation |


## The End