# Q1. What is a contingency matrix, and how is it used to evaluate the performance of a classification model?
# Answer:
# A **contingency matrix** is a table that cross-tabulates two sets of labels. When the two sets are the true and predicted labels of a classifier, it is usually called a confusion matrix, and it summarizes the model's performance.
# Each cell counts the samples with a given (true label, predicted label) combination. For binary classification the matrix consists of four key values:
# - True Positives (TP): The number of correct positive predictions.
# - True Negatives (TN): The number of correct negative predictions.
# - False Positives (FP): The number of incorrect positive predictions.
# - False Negatives (FN): The number of incorrect negative predictions.
# It is used to compute various metrics such as accuracy, precision, recall, and F1-score, which help in evaluating model performance.

# Example of a confusion matrix:
from sklearn.metrics import confusion_matrix

# True labels and predicted labels
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]

# Calculate confusion matrix
conf_matrix = confusion_matrix(y_true, y_pred)
print(conf_matrix)

# Output (rows are true labels, columns are predicted labels):
# [[4 1]   # 4 True Negatives, 1 False Positive
#  [1 4]]  # 1 False Negative, 4 True Positives
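
# The metrics mentioned in the answer can be derived directly from the four cells.
# The following is a minimal sketch that reuses the same labels; the variable names
# (tn, fp, fn, tp, accuracy, ...) are chosen here for illustration.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]

# For binary labels [0, 1], ravel() yields the cells in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted positives that are correct
recall = tp / (tp + fn)                     # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: accuracy=0.80 precision=0.80 recall=0.80 f1=0.80
```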

# Q2. How is a pair confusion matrix different from a regular confusion matrix, and why might it be useful in certain situations?
# Answer:
# A **pair confusion matrix** compares two clusterings (for example, ground-truth classes and predicted clusters) by looking at pairs of samples rather than individual samples.
# It is a 2x2 matrix that counts, over all sample pairs, whether the two samples are grouped together or apart in each clustering. Because it never has to match cluster IDs to class names, it is useful when cluster labels are arbitrary, and it underlies pair-counting scores such as the Rand index.
# In contrast, a regular confusion matrix counts individual samples by (true class, predicted class) and assumes the predictions use the same label set as the ground truth.
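
# As a small illustration, scikit-learn provides pair_confusion_matrix, which compares
# two clusterings of the same samples pair by pair. The toy labels below are made up
# for this sketch; note that scikit-learn counts ordered pairs, so every unordered
# pair contributes twice.

```python
from sklearn.metrics.cluster import pair_confusion_matrix

labels_true = [0, 0, 1, 1]  # hypothetical ground-truth grouping
labels_pred = [0, 0, 1, 2]  # hypothetical clustering that splits the second group

pcm = pair_confusion_matrix(labels_true, labels_pred)
print(pcm)
# [[8 0]   # apart in both groupings | apart in truth, together in prediction
#  [2 2]]  # together in truth, apart in prediction | together in both
```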

# Q3. What is an extrinsic measure in the context of natural language processing, and how is it typically used to evaluate the performance of language models?
# Answer:
# An **extrinsic measure** in natural language processing (NLP) is used to evaluate the performance of a language model based on its impact or effectiveness in solving a specific task.
# It is typically task-oriented, such as measuring the model's performance on downstream tasks like text classification, machine translation, or information retrieval.
# Examples of extrinsic measures include accuracy, BLEU score, and F1 score for NLP tasks.

# Q4. What is an intrinsic measure in the context of machine learning, and how does it differ from an extrinsic measure?
# Answer:
# An **intrinsic measure** evaluates the quality of a model in isolation, without reference to a downstream task. It is typically used to assess how well the model fits data (ideally held-out data) or the inherent quality of its output.
# Examples include perplexity and log-likelihood in language modeling or clustering metrics like silhouette score for unsupervised learning.
# **Extrinsic measures**, in contrast, evaluate the model based on its performance on real-world tasks or applications.
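
# The contrast can be sketched with toy numbers. Everything below is hypothetical:
# token_probs stands in for the probabilities a language model assigned to the tokens
# of a held-out sentence, and gold/predicted stand in for a downstream sentiment task.

```python
import math

# Intrinsic: perplexity is the exponential of the average negative log-likelihood per token
token_probs = [0.25, 0.5, 0.125, 0.5]
avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(avg_nll)

# Extrinsic: accuracy of a system built on the model, measured on the downstream task
gold = ["pos", "neg", "pos", "neg", "pos"]
predicted = ["pos", "neg", "neg", "neg", "pos"]
task_accuracy = sum(g == p for g, p in zip(gold, predicted)) / len(gold)

print(f"perplexity={perplexity:.3f} (lower is better)")
print(f"task accuracy={task_accuracy:.2f} (higher is better)")
```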

# Q5. What is the purpose of a confusion matrix in machine learning, and how can it be used to identify strengths and weaknesses of a model?
# Answer:
# The **confusion matrix** helps evaluate the classification model by displaying the counts of true and false positives/negatives. It provides detailed insights into:
# - The **accuracy** of the model.
# - The **precision** and **recall** for each class.
# - The **error patterns**: which true classes are misclassified, and which classes they are mistaken for.
# Off-diagonal cells show which specific classes the model struggles with, and the row totals (the number of true samples per class) reveal class imbalance.
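
# A multi-class sketch of this kind of analysis, using made-up labels ("cat", "dog",
# "bird") that are not part of the original example. Rows are true classes and
# columns are predicted classes, in the order given by the labels argument.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["cat", "dog", "bird"]
y_true = ["cat", "cat", "cat", "dog", "dog", "dog", "bird", "bird"]
y_pred = ["cat", "cat", "dog", "dog", "dog", "dog", "cat", "bird"]

cm = confusion_matrix(y_true, y_pred, labels=labels)

support = cm.sum(axis=1)                              # true samples per class (row totals)
per_class_recall = cm.diagonal() / support            # correct / actual, per class
per_class_precision = cm.diagonal() / cm.sum(axis=0)  # correct / predicted, per class

for name, n, r, p in zip(labels, support, per_class_recall, per_class_precision):
    print(f"{name}: support={n} recall={r:.2f} precision={p:.2f}")
```

# Here "bird" has the lowest recall and the smallest support, which points to both a
# weak class and a mild imbalance.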

# Q6. What are some common intrinsic measures used to evaluate the performance of unsupervised learning algorithms, and how can they be interpreted?
# Answer:
# Common **intrinsic measures** for evaluating unsupervised learning algorithms (such as clustering) include:
# - **Silhouette score**: Measures how similar an object is to its own cluster compared to other clusters. Ranges from -1 (likely assigned to the wrong cluster) through 0 (overlapping clusters) to 1 (dense, well-separated clusters).
# - **Davies-Bouldin Index**: Measures the compactness and separation of clusters. Lower values indicate better clustering.
# - **Calinski-Harabasz Index**: Measures the ratio of the sum of between-cluster dispersion to within-cluster dispersion. Higher values indicate better clustering.
# These metrics help assess the quality of clustering without the need for ground truth labels.
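
# All three scores are available in scikit-learn. The sketch below assumes synthetic
# data from make_blobs and a KMeans clustering with 3 clusters; both are illustrative
# choices, and the exact values depend on the data and the scikit-learn version.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Synthetic 2-D data; the generating labels are discarded because intrinsic
# measures do not need ground truth
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

sil = silhouette_score(X, cluster_labels)         # closer to 1 is better
dbi = davies_bouldin_score(X, cluster_labels)     # lower is better
chi = calinski_harabasz_score(X, cluster_labels)  # higher is better

print(f"silhouette={sil:.3f} davies_bouldin={dbi:.3f} calinski_harabasz={chi:.1f}")
```

# A common use is to repeat this for several values of n_clusters and compare the scores.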

# Q7. What are some limitations of using accuracy as a sole evaluation metric for classification tasks, and how can these limitations be addressed?
# Answer:
# **Limitations of Accuracy**:
# - **Class imbalance**: Accuracy can be misleading when the dataset is imbalanced (e.g., with 95% negatives and 5% positives, always predicting "negative" already scores 95%).
# - **Hides per-class performance**: A model can have high overall accuracy and still perform poorly on certain classes, because the majority class dominates the average.
# - **Ignores error types**: Accuracy does not distinguish false positives from false negatives, even when their costs differ.

# **Solutions**:
# - Use additional metrics such as **precision**, **recall**, **F1-score**, or **AUC-ROC** to get a more comprehensive evaluation.
# - In cases of class imbalance, consider using **balanced accuracy** or **confusion matrix** analysis to better understand model performance.
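
# The 95%/5% case can be reproduced in a few lines. This sketch assumes a degenerate
# "model" that always predicts the majority class; the labels are synthetic.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

y_true = [0] * 95 + [1] * 5  # 95% negatives, 5% positives
y_pred = [0] * 100           # always predict the majority class

acc = accuracy_score(y_true, y_pred)           # looks strong
bal = balanced_accuracy_score(y_true, y_pred)  # mean of per-class recall
rec = recall_score(y_true, y_pred)             # recall on the positive class

print(f"accuracy={acc:.2f} balanced_accuracy={bal:.2f} positive_recall={rec:.2f}")
# prints: accuracy=0.95 balanced_accuracy=0.50 positive_recall=0.00
```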
