In [1]:
#Week.15 
#Assignment.8 
#Question.1 : What is the purpose of grid search cv in machine learning, and how does it work?
#Answer.1 : # Purpose of Grid Search CV in Machine Learning:

# 1. **Optimal Hyperparameter Tuning:**
#    - Machine learning models often have hyperparameters that need to be set before training.
#    - The choice of hyperparameter values can significantly impact the model's performance.

# 2. **Grid Search CV:**
#    - Grid Search CV is a technique used for hyperparameter tuning.
#    - It systematically searches through a predefined set of hyperparameter combinations to find the optimal values.

# 3. **How Grid Search CV Works:**

#    a. **Define Hyperparameter Grid:**
#       - Specify a grid of hyperparameter values to be explored.
#       - For example, in scikit-learn, this can be done using a dictionary with hyperparameter names as keys and 
#lists of possible values.

#         ```python
#         param_grid = {'param_name': [value1, value2, ...]}
#         ```

#    b. **Cross-Validation:**
#       - Divide the dataset into multiple folds (subsets).
#       - For each combination of hyperparameters, train the model on several combinations of training and validation 
#sets (folds).
#       - Evaluate the model's performance on each validation set.

#    c. **Evaluate Performance:**
#       - Use a performance metric (e.g., accuracy, precision, recall) to assess the model's performance for each 
#hyperparameter combination.

#    d. **Select Optimal Hyperparameters:**
#       - Identify the hyperparameter combination with the best mean score across the cross-validation folds.

#    e. **Train Final Model:**
#       - Train the final model using the optimal hyperparameters on the entire dataset.

# 4. **Implementation in scikit-learn:**
#    - scikit-learn provides the `GridSearchCV` class to perform grid search with cross-validation.

#      ```python
#      from sklearn.model_selection import GridSearchCV

#      # Specify the hyperparameter grid
#      param_grid = {'param_name': [value1, value2, ...]}

#      # Initialize the model and GridSearchCV
#      model = YourModel()
#      grid_search = GridSearchCV(model, param_grid, cv=5, scoring='accuracy')

#      # Fit the GridSearchCV object to the data
#      grid_search.fit(X, y)

#      # Access the best hyperparameters
#      best_params = grid_search.best_params_
#      ```

# 5. **Benefits of Grid Search CV:**
#    - Systematically explores hyperparameter combinations.
#    - Reduces the risk of selecting suboptimal hyperparameters.
#    - Provides a more robust estimate of model performance through cross-validation.

# Note: Grid Search CV can be computationally expensive, especially for large hyperparameter grids. It's important to
#balance the search space with computational resources.
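
# Sketch: steps a to e above can be written out by hand. This is only an illustration of what the search does,
# not scikit-learn's internal implementation. The DecisionTreeClassifier and the grid values are assumptions
# chosen to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Step a: define the grid (values chosen only for illustration)
param_grid = {'max_depth': [2, 3, 5], 'min_samples_split': [2, 10]}

best_score, best_params = -1.0, None
for params in ParameterGrid(param_grid):  # every combination: 3 * 2 = 6
    model = DecisionTreeClassifier(random_state=42, **params)
    # Steps b and c: 5-fold cross-validation, scored by accuracy
    fold_scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
    mean_score = fold_scores.mean()
    # Step d: keep the combination with the highest mean validation score
    if mean_score > best_score:
        best_score, best_params = mean_score, params

# Step e: refit on all of the data with the winning combination
final_model = DecisionTreeClassifier(random_state=42, **best_params).fit(X, y)
print(best_params, round(best_score, 4))
```

# The cost grows with (number of combinations) x (number of folds), which is why large grids become expensive.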

#Sample_code : 

# Import necessary libraries
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

# Load the Iris dataset as an example
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the hyperparameter grid
param_grid = {
    'n_estimators': [50, 100, 150],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Initialize the Random Forest Classifier
rf_model = RandomForestClassifier(random_state=42)

# Create a GridSearchCV object
grid_search = GridSearchCV(estimator=rf_model, param_grid=param_grid, cv=5, scoring='accuracy')

# Fit the GridSearchCV object to the data
grid_search.fit(X_train, y_train)

# Access the best hyperparameters
best_params = grid_search.best_params_

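# Train the final model using the best hyperparameters
# (with the default refit=True, grid_search.best_estimator_ already holds an equivalent refit model)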
final_model = RandomForestClassifier(**best_params, random_state=42)
final_model.fit(X_train, y_train)

# Evaluate the final model on the test set
test_accuracy = final_model.score(X_test, y_test)

# Print results
print(f"Best Hyperparameters: {best_params}")
print(f"Test Accuracy with Best Hyperparameters: {test_accuracy:.4f}")


Best Hyperparameters: {'max_depth': None, 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 150}
Test Accuracy with Best Hyperparameters: 1.0000


In [2]:
#Question.2 : Describe the difference between grid search CV and randomized search CV, and when might you choose
#one over the other?
#Answer.2 : # Difference between Grid Search CV and Randomized Search CV:

# 1. **Grid Search CV:**
#    - **Approach:** Exhaustively searches through a predefined grid of hyperparameter combinations.
#    - **Search Space:** A finite list of candidate values for each hyperparameter; every combination is evaluated.
#    - **Computational Cost:** Can be computationally expensive, especially for large search spaces.
#    - **Use Case:** Suitable when you have a relatively small number of hyperparameters and want to explore all 
#possible combinations.

# 2. **Randomized Search CV:**
#    - **Approach:** Samples a random subset of the hyperparameter space for a fixed number of iterations.
#    - **Search Space:** Randomly selects hyperparameter values from a distribution or predefined list.
#    - **Computational Cost:** Often less computationally expensive compared to Grid Search, as it explores a subset
#of the search space.
#    - **Use Case:** Suitable when the hyperparameter search space is large, and a random sampling of hyperparameters 
#is likely to yield good results.

# 3. **Considerations for Choosing One Over the Other:**

#    a. **Search Space Size:**
#       - **Grid Search:** Suitable when the search space is relatively small and can be exhaustively explored.
#       - **Randomized Search:** Preferred for larger search spaces, where it may be impractical to explore all
#combinations.

#    b. **Computational Resources:**
#       - **Grid Search:** Can be computationally expensive, especially for a large number of hyperparameters and values.
#       - **Randomized Search:** Typically requires fewer resources, making it more efficient for large search spaces.

#    c. **Coverage of the Search Space:**
#       - **Grid Search:** Evaluates every combination in the grid, but only at the values listed.
#       - **Randomized Search:** Evaluates a random sample of combinations. It does not steer towards promising areas,
#but for the same budget it can try more distinct values of each hyperparameter.

# 4. **Implementation in scikit-learn:**
#    - Both Grid Search CV and Randomized Search CV can be implemented using the `GridSearchCV` and `RandomizedSearchCV`
#classes in scikit-learn, respectively.

#      ```python
#      from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
#      from sklearn.ensemble import RandomForestClassifier

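#      # Hyperparameter grid for Grid Search
#      param_grid = {'param_name': [value1, value2, ...]}

#      # Hyperparameter distributions (or lists of values) for Randomized Search
#      param_dist = {'param_name': [value1, value2, ...]}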

#      # Initialize the model
#      model = RandomForestClassifier(random_state=42)

#      # Grid Search
#      grid_search = GridSearchCV(model, param_grid, cv=5, scoring='accuracy')

#      # Randomized Search
#      random_search = RandomizedSearchCV(model, param_distributions=param_dist, n_iter=10, cv=5, scoring='accuracy', 
#random_state=42)
#      ```

# Note: The choice between Grid Search CV and Randomized Search CV depends on the specific characteristics of the
#hyperparameter search space and the available computational resources.
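
# Sketch: a runnable randomized search on the Iris data. The distributions, n_iter=8 and cv=3 are assumptions
# picked to keep the run short; scipy.stats.randint supplies the integer distributions.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Distributions to sample from (ranges are illustrative assumptions)
param_dist = {
    'n_estimators': randint(10, 60),      # integers in [10, 60)
    'max_depth': [None, 5, 10],           # lists are sampled uniformly
    'min_samples_split': randint(2, 11),  # integers in [2, 11)
}

random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=8,          # only 8 sampled combinations are evaluated
    cv=3,
    scoring='accuracy',
    random_state=42,
)
random_search.fit(X, y)

print(random_search.best_params_)
print(len(random_search.cv_results_['params']))  # number of combinations tried
```

# The budget is fixed by n_iter regardless of how large the search space is, which is the main practical
# difference from an exhaustive grid.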


In [3]:
#Question.3 : What is data leakage, and why is it a problem in machine learning? Provide an example.
#Answer.3 : # Data Leakage in Machine Learning:

# 1. **Definition:**
#    - Data leakage refers to the unintentional inclusion of information in the training data that would not be 
#available at the time of making predictions on new, unseen data.
#    - It can lead to overly optimistic model performance during training but result in poor generalization to 
#real-world scenarios.

# 2. **Why Data Leakage is a Problem:**
#    - **Misleading Features:** Leakage gives the model features that look highly predictive during training but are
#unavailable, or carry no real signal, at prediction time.
#    - **Overfitting:** The model may learn patterns specific to the training data, which do not generalize to new data.
#    - **Invalid Evaluation:** Performance metrics may be inflated during training, providing a false sense of model 
#effectiveness.

# 3. **Examples of Data Leakage:**

#    a. **Using Future Information:**
#       - **Issue:** Including information from the future that would not be available when making predictions.
#       - **Example:**
#         ```python
#         # Incorrect: Using information that is not available at prediction time
#         X_train['future_information'] = X_train['target'].shift(-1)
#         ```

#    b. **Target-Related Leakage:**
#       - **Issue:** Including information related to the target variable that would not be known at prediction time.
#       - **Example:**
#         ```python
#         # Incorrect: Using target-related information
#         X_train['mean_target'] = X_train.groupby('category')['target'].transform('mean')
#         ```

#    c. **Data Preprocessing Leakage:**
#       - **Issue:** Applying transformations to the entire dataset before splitting into training and testing sets.
#       - **Example:**
#         ```python
#         # Incorrect: Scaling the entire dataset before splitting
#         from sklearn.preprocessing import StandardScaler
#         scaler = StandardScaler()
#         X_scaled = scaler.fit_transform(X)
#         ```

# 4. **Preventing Data Leakage:**
#    - **Proper Splitting:** Ensure that data is split into training and testing sets before any preprocessing or
#feature engineering.
#    - **Temporal Data Handling:** Be cautious with time-dependent data to prevent future information leakage.
#    - **Feature Engineering:** Avoid using information derived from the target variable during training.

# Note: Vigilance is crucial to identify and prevent data leakage, as it can significantly impact the validity and 
#reliability of machine learning models.
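
# Sketch: a small experiment showing how leakage inflates a score. The data are pure noise (an assumption made
# so that the true accuracy is known to be about 50%), and feature selection stands in for any preprocessing
# step that looks at the labels.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5000))  # pure noise features
y = rng.integers(0, 2, size=100)  # labels unrelated to X

# Leaky: features are chosen using ALL labels, including those of the validation folds
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky_score = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5).mean()

# Honest: the selection step is refit inside each training fold
pipe = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000))
honest_score = cross_val_score(pipe, X, y, cv=5).mean()

print(f"Leaky CV accuracy:  {leaky_score:.2f}")
print(f"Honest CV accuracy: {honest_score:.2f}")
```

# The leaky estimate typically lands well above chance even though no real signal exists, while the honest
# estimate stays near 0.5.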


In [4]:
#Question.4 : How can you prevent data leakage when building a machine learning model?
#Answer.4 : # Preventing Data Leakage in Machine Learning:

# 1. **Proper Data Splitting:**
#    - **Strategy:** Ensure that data is split into training and testing sets before any preprocessing or 
#feature engineering.
#    - **Example:**
#      ```python
#      from sklearn.model_selection import train_test_split

#      # Incorrect: Splitting after preprocessing
#      X_train, X_test, y_train, y_test = train_test_split(X_preprocessed, y, test_size=0.2, random_state=42)

#      # Correct: Splitting before preprocessing
#      X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
#      ```

# 2. **Temporal Data Handling:**
#    - **Strategy:** Be cautious with time-dependent data to prevent future information leakage.
#    - **Example:**
#      ```python
#      # Incorrect: Using future information
#      X_train['future_information'] = X_train['target'].shift(-1)

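#      # Correct: use only past values, e.g. a lag feature ('feature' is a placeholder column name)
#      X_train['lagged_feature'] = X_train['feature'].shift(1)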
#      ```

# 3. **Feature Engineering Awareness:**
#    - **Strategy:** Avoid using information derived from the target variable during training.
#    - **Example:**
#      ```python
#      # Incorrect: Using target-related information
#      X_train['mean_target'] = X_train.groupby('category')['target'].transform('mean')

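#      # Correct: compute the category means on the training split only, then map them onto the test rows
#      category_means = X_train.groupby('category')['target'].mean()
#      X_test['mean_target'] = X_test['category'].map(category_means)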
#      ```

# 4. **Cross-Validation Techniques:**
#    - **Strategy:** Use proper cross-validation techniques, especially for time series or dependent data,
#to ensure each fold represents a fair split.
#    - **Example:**
#      ```python
#      from sklearn.model_selection import TimeSeriesSplit

#      # Correct usage of TimeSeriesSplit
#      tscv = TimeSeriesSplit(n_splits=5)
#      for train_index, test_index in tscv.split(X):
#          X_train, X_test = X.iloc[train_index], X.iloc[test_index]
#          y_train, y_test = y.iloc[train_index], y.iloc[test_index]
#      ```

# 5. **Awareness during Data Preprocessing:**
#    - **Strategy:** Be mindful of data preprocessing steps that might inadvertently introduce leakage.
#    - **Example:**
#      ```python
#      # Incorrect: Scaling the entire dataset before splitting
#      from sklearn.preprocessing import StandardScaler
#      scaler = StandardScaler()
#      X_scaled = scaler.fit_transform(X)

#      # Correct: Scaling after splitting
#      scaler = StandardScaler()
#      X_train_scaled = scaler.fit_transform(X_train)
#      X_test_scaled = scaler.transform(X_test)
#      ```

# Note: Vigilance and a clear understanding of the data are essential to prevent data leakage. Proper handling
#of temporal data and feature engineering can significantly contribute to building reliable machine learning models.
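
# Sketch: wrapping preprocessing and model in a scikit-learn Pipeline is one common way to apply the rules above,
# because the scaler is then fit only on the training part of every split. The LogisticRegression model and the
# Iris data are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Split first, before any preprocessing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

pipe = Pipeline([
    ('scaler', StandardScaler()),                # fit on training data only
    ('clf', LogisticRegression(max_iter=1000)),
])

# Inside each CV split the scaler sees only that split's training folds
cv_scores = cross_val_score(pipe, X_train, y_train, cv=5)

# Final fit on the training set; the test set is only transformed, never fit
pipe.fit(X_train, y_train)
test_score = pipe.score(X_test, y_test)

print(cv_scores.round(3), round(test_score, 3))
```

# The same Pipeline object can be passed to GridSearchCV, so hyperparameter tuning stays leak-free as well.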


In [6]:
#Question.5 : What is a confusion matrix, and what does it tell you about the performance of a classification model?
#Answer.5 : # Confusion Matrix in Classification:

# 1. **Definition:**
#    - A confusion matrix is a table that summarizes the performance of a classification model by displaying 
#the counts of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) predictions.

# 2. **Components of a Confusion Matrix:**

#    a. **True Positive (TP):**
#       - Instances correctly predicted as the positive class.
#       - Example: Model correctly identifies actual cases of a disease.

#    b. **True Negative (TN):**
#       - Instances correctly predicted as the negative class.
#       - Example: Model correctly identifies instances as not having a disease.

#    c. **False Positive (FP):**
#       - Instances incorrectly predicted as the positive class (Type I error).
#       - Example: Model incorrectly identifies instances as having a disease when they do not.

#    d. **False Negative (FN):**
#       - Instances incorrectly predicted as the negative class (Type II error).
#       - Example: Model incorrectly identifies actual cases of a disease as not having the disease.

# 3. **Organization of the Confusion Matrix:**

#    ```
#                    | Predicted Negative | Predicted Positive |
#    Actual Negative |        TN           |        FP           |
#    Actual Positive |        FN           |        TP           |
#    ```

# 4. **Metrics Derived from a Confusion Matrix:**

#    a. **Accuracy:**
#       - Proportion of correctly classified instances.
#       - `(TP + TN) / (TP + TN + FP + FN)`

#    b. **Precision (Positive Predictive Value):**
#       - Proportion of instances predicted as positive that are actually positive.
#       - `TP / (TP + FP)`

#    c. **Recall (Sensitivity or True Positive Rate):**
#       - Proportion of actual positive instances correctly predicted.
#       - `TP / (TP + FN)`

#    d. **Specificity (True Negative Rate):**
#       - Proportion of actual negative instances correctly predicted.
#       - `TN / (TN + FP)`

#    e. **F1 Score:**
#       - Harmonic mean of precision and recall.
#       - `2 * (Precision * Recall) / (Precision + Recall)`

# 5. **Interpretation:**
#    - The confusion matrix provides a detailed view of how well a classification model performs on different classes.
#    - It helps identify the types and frequency of errors made by the model.

# 6. **Implementation in scikit-learn:**
#    - Scikit-learn provides functions to calculate and visualize confusion matrices.

#      ```python
#      from sklearn.metrics import confusion_matrix

#      # Calculate confusion matrix
#      cm = confusion_matrix(y_true, y_pred)

#      # Print or visualize the confusion matrix
#      print(cm)
#      ```

# Note: Understanding the confusion matrix is crucial for evaluating and fine-tuning classification models.
#It provides insights into model strengths and weaknesses across different classes.
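
# Sketch: the metrics in section 4 computed from a small confusion matrix. The ten labels below are made up
# for illustration.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground truth and predictions (1 = positive, 0 = negative)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

cm = confusion_matrix(y_true, y_pred)  # rows = actual, columns = predicted
tn, fp, fn, tp = cm.ravel()            # order for binary labels 0/1: TN, FP, FN, TP

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
f1 = 2 * precision * recall / (precision + recall)

print(cm)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} "
      f"specificity={specificity:.3f} f1={f1:.3f}")
```

# Here TP=3, TN=4, FP=2 and FN=1, so accuracy is 0.7, precision 0.6 and recall 0.75.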


In [7]:
#Question.6 : Explain the difference between precision and recall in the context of a confusion matrix.
#Answer.6 : # Precision and Recall in the Context of a Confusion Matrix:

# 1. **Precision:**
#    - **Definition:** Precision, also known as Positive Predictive Value, measures the proportion of instances
#predicted as positive that are actually positive.
#    - **Formula:** `Precision = TP / (TP + FP)`
#    - **Interpretation:** High precision indicates that when the model predicts a positive class, it is likely correct. 
#It is focused on minimizing false positives.

# 2. **Recall:**
#    - **Definition:** Recall, also known as Sensitivity or True Positive Rate, measures the proportion of actual
#positive instances that are correctly predicted.
#    - **Formula:** `Recall = TP / (TP + FN)`
#    - **Interpretation:** High recall indicates that the model captures a large portion of the actual positive 
#instances. It is focused on minimizing false negatives.

# 3. **Trade-off between Precision and Recall:**
#    - **Balancing Act:** Precision and recall are often in tension with each other; improving one may degrade the other.
#    - **Example Scenario:**
#      ```python
#      # High Precision, Low Recall
#      # - Model predicts positive rarely, but when it does, it's usually correct.
#      precision, recall = 0.9, 0.3

#      # High Recall, Low Precision
#      # - Model predicts positive frequently, but many predictions are incorrect.
#      precision, recall = 0.3, 0.9
#      ```

# 4. **Use Cases:**
#    - **When to Prioritize Precision:**
#      - In scenarios where false positives are costly or undesirable.
#      - Example: Fraud detection in financial transactions; falsely flagging a non-fraudulent transaction as
#fraud can inconvenience customers.

#    - **When to Prioritize Recall:**
#      - In scenarios where false negatives are costly or dangerous.
#      - Example: Medical diagnosis; failing to detect a serious condition can have severe consequences.

# 5. **Harmonic Mean: F1 Score:**
#    - **F1 Score:** Combines precision and recall into a single metric using the harmonic mean.
#    - **Formula:** `F1 Score = 2 * (Precision * Recall) / (Precision + Recall)`
#    - **Interpretation:** F1 score is useful when there is a need for a balance between precision and recall.

# 6. **Implementation in scikit-learn:**
#    - Scikit-learn provides functions to calculate precision, recall, and F1 score.

#      ```python
#      from sklearn.metrics import precision_score, recall_score, f1_score

#      # Calculate precision, recall, and F1 score
#      precision = precision_score(y_true, y_pred)
#      recall = recall_score(y_true, y_pred)
#      f1 = f1_score(y_true, y_pred)
#      ```

# Note: Precision and recall provide insights into different aspects of a classification model's performance, and 
#the choice between them depends on the specific goals and constraints of the problem at hand.


In [8]:
#Question.7 : How can you interpret a confusion matrix to determine which types of errors your model is making?
#Answer.7 : # Interpreting a Confusion Matrix to Identify Types of Errors:

# 1. **True Positives (TP):**
#    - Instances correctly predicted as the positive class.
#    - Example: In a medical diagnosis, these are patients correctly identified as having a disease.

# 2. **True Negatives (TN):**
#    - Instances correctly predicted as the negative class.
#    - Example: In spam detection, these are non-spam emails correctly identified as such.

# 3. **False Positives (FP):**
#    - Instances incorrectly predicted as the positive class (Type I error).
#    - Example: In spam detection, these are non-spam emails incorrectly identified as spam.

# 4. **False Negatives (FN):**
#    - Instances incorrectly predicted as the negative class (Type II error).
#    - Example: In medical diagnosis, these are patients with a disease incorrectly identified as not having the 
#disease.

# 5. **Metrics Derived from a Confusion Matrix:**
#    a. **Precision (Positive Predictive Value):**
#       - Proportion of instances predicted as positive that are actually positive.
#       - `Precision = TP / (TP + FP)`
#       - Interpretation: High precision means the model is good at avoiding false positives.

#    b. **Recall (Sensitivity or True Positive Rate):**
#       - Proportion of actual positive instances correctly predicted.
#       - `Recall = TP / (TP + FN)`
#       - Interpretation: High recall means the model is good at capturing actual positive instances.

#    c. **Specificity (True Negative Rate):**
#       - Proportion of actual negative instances correctly predicted.
#       - `Specificity = TN / (TN + FP)`
#       - Interpretation: High specificity means the model rarely labels actual negatives as positive.

#    d. **False Positive Rate (FPR):**
#       - Proportion of actual negative instances incorrectly predicted as positive.
#       - `FPR = FP / (TN + FP)`
#       - Interpretation: A low FPR means few actual negatives are flagged as positive; FPR = 1 - Specificity.

# 6. **Visualization of Confusion Matrix:**
#    - Heatmaps and color-coded representations can provide a visual understanding of error patterns.
#    - Example:
#      ```python
#      import seaborn as sns
#      import matplotlib.pyplot as plt

#      # Create a heatmap of the confusion matrix
#      # (cm is the array returned by sklearn.metrics.confusion_matrix(y_true, y_pred))
#      sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=['Predicted Negative', 
#'Predicted Positive'], yticklabels=['Actual Negative', 'Actual Positive'])
#      plt.xlabel('Predicted Label')
#      plt.ylabel('Actual Label')
#      plt.title('Confusion Matrix')
#      plt.show()
#      ```

# 7. **Analyzing Error Patterns:**
#    - Look at the cells of the confusion matrix to understand where the model is making errors.
#    - Identify patterns such as whether the model tends to have more false positives or false negatives.

# Note: Interpreting a confusion matrix is crucial for understanding a model's strengths and weaknesses, 
#guiding further improvements, and aligning with the specific goals and constraints of the problem.


In [9]:
#Question.8 : What are some common metrics that can be derived from a confusion matrix, and how are they
#calculated?
#Answer.8 : # Common Metrics Derived from a Confusion Matrix:

# 1. Accuracy:
#    - Formula: (TP + TN) / (TP + TN + FP + FN)
#    - Interpretation: Proportion of correctly classified instances out of the total.

# 2. Precision (Positive Predictive Value):
#    - Formula: Precision = TP / (TP + FP)
#    - Interpretation: Proportion of instances predicted as positive that are actually positive.

# 3. Recall (Sensitivity or True Positive Rate):
#    - Formula: Recall = TP / (TP + FN)
#    - Interpretation: Proportion of actual positive instances correctly predicted.

# 4. Specificity (True Negative Rate):
#    - Formula: Specificity = TN / (TN + FP)
#    - Interpretation: Proportion of actual negative instances correctly predicted.

# 5. False Positive Rate (FPR):
#    - Formula: FPR = FP / (TN + FP)
#    - Interpretation: Proportion of actual negative instances incorrectly predicted as positive.

# 6. F1 Score:
#    - Formula: F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
#    - Interpretation: Harmonic mean of precision and recall.

# 7. Matthews Correlation Coefficient (MCC):
#    - Formula: MCC = (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
#    - Interpretation: Measures the correlation between observed and predicted classifications.

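# 8. Area Under the Receiver Operating Characteristic Curve (AUC-ROC):
#    - Calculation: Not derived from a single confusion matrix. It is the area under the curve of TPR against FPR as
#the decision threshold varies, so it needs predicted scores or probabilities.
#    - Interpretation: Measures the model's ability to distinguish between positive and negative classes across 
#different thresholds.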

# Note: The choice of metrics depends on the specific goals and constraints of the problem at hand. Different metrics 
#provide different perspectives on a model's performance.


In [10]:
#Question.9 : What is the relationship between the accuracy of a model and the values in its confusion matrix?
#Answer.9 : # Relationship Between Accuracy and Confusion Matrix:

# 1. **Accuracy:**
#    - Formula: `(TP + TN) / (TP + TN + FP + FN)`
#    - Interpretation: Proportion of correctly classified instances out of the total.

# 2. **Confusion Matrix Components:**
#    - True Positives (TP): Instances correctly predicted as the positive class.
#    - True Negatives (TN): Instances correctly predicted as the negative class.
#    - False Positives (FP): Instances incorrectly predicted as the positive class.
#    - False Negatives (FN): Instances incorrectly predicted as the negative class.

# 3. **Relationship:**
#    - Accuracy is influenced by the correct predictions (TP and TN) as well as incorrect predictions (FP and FN).
#    - Accuracy increases when the model makes more correct predictions and decreases when it makes more incorrect
#predictions.

# 4. **Code Example:**
#    - Let's use scikit-learn to calculate accuracy and display the confusion matrix.

#      ```python
#      from sklearn.metrics import accuracy_score, confusion_matrix
#      from sklearn.model_selection import train_test_split
#      from sklearn.linear_model import LogisticRegression

#      # Example data and model (replace with your data and model)
#      X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
#      model = LogisticRegression()
#      model.fit(X_train, y_train)
#      y_pred = model.predict(X_test)

#      # Calculate accuracy
#      accuracy = accuracy_score(y_test, y_pred)
#      print(f"Accuracy: {accuracy:.4f}")

#      # Display confusion matrix
#      cm = confusion_matrix(y_test, y_pred)
#      print("Confusion Matrix:")
#      print(cm)
#      ```

# 5. **Interpretation:**
#    - Analyze the confusion matrix along with accuracy to understand where the model is making correct or incorrect 
#predictions.
#    - Accuracy alone may not provide a complete picture, especially in imbalanced datasets.

# Note: While accuracy is a commonly used metric, it may not be suitable for all scenarios, especially when dealing 
#with imbalanced datasets. It is essential to consider other metrics and the context of the problem.


In [None]:
#Question.10 : How can you use a confusion matrix to identify potential biases or limitations in your machine learning
#model?
#Answer.10 : # Using Confusion Matrix to Identify Potential Biases or Limitations:

# 1. **Confusion Matrix Components:**
#    - True Positives (TP): Instances correctly predicted as the positive class.
#    - True Negatives (TN): Instances correctly predicted as the negative class.
#    - False Positives (FP): Instances incorrectly predicted as the positive class.
#    - False Negatives (FN): Instances incorrectly predicted as the negative class.

# 2. **Analysis for Bias or Limitations:**
#    - **Class Imbalance:** Check if the dataset has imbalances, leading to one class being favored over another.
#    - **False Positives or False Negatives Disproportion:** Evaluate if the model shows a bias towards false 
#positives or false negatives.

# 3. **Code Example:**
#    - Let's use scikit-learn to calculate a confusion matrix and analyze potential biases.

#      ```python
#      from sklearn.metrics import confusion_matrix
#      from sklearn.model_selection import train_test_split
#      from sklearn.linear_model import LogisticRegression

#      # Example data and model (replace with your data and model)
#      X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
#      model = LogisticRegression()
#      model.fit(X_train, y_train)
#      y_pred = model.predict(X_test)

#      # Display confusion matrix
#      cm = confusion_matrix(y_test, y_pred)
#      print("Confusion Matrix:")
#      print(cm)

#      # Analyze potential biases (rows of cm are actual classes, columns are predicted classes)
#      total_positive_instances = cm[1, 0] + cm[1, 1]  # Actual positives: false negatives + true positives
#      total_negative_instances = cm[0, 0] + cm[0, 1]  # Actual negatives: true negatives + false positives

#      recall_positive = cm[1, 1] / total_positive_instances       # True positive rate
#      specificity_negative = cm[0, 0] / total_negative_instances  # True negative rate

#      print(f"Recall on Positive Class: {recall_positive:.4f}")
#      print(f"Specificity on Negative Class: {specificity_negative:.4f}")
#      ```

# 4. **Interpretation:**
#    - Recall well below specificity means the model misses many positives (it leans towards predicting negative);
#the reverse means it overpredicts the positive class.
#    - Evaluate false positive and false negative rates to understand if the model has limitations in specific scenarios.

# 5. **Further Investigation:**
#    - Consider demographic or domain-specific breakdowns of the confusion matrix to identify biases across
#different subgroups.

# Note: Identifying biases or limitations is crucial for model fairness and generalizability. Interpret the 
#confusion matrix in the context of the problem and data at hand.
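
# Sketch: a per-subgroup breakdown of the confusion matrix, as suggested in point 5. The group labels 'A' and
# 'B' and all values are hypothetical; in practice the group column would come from the dataset.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels, predictions and subgroup membership
y_true = np.array([1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1])
group = np.array(['A'] * 6 + ['B'] * 6)

fnr_by_group = {}
for g in np.unique(group):
    mask = group == g
    # labels=[0, 1] keeps the matrix 2x2 even if a group lacks one class
    tn, fp, fn, tp = confusion_matrix(y_true[mask], y_pred[mask], labels=[0, 1]).ravel()
    fnr_by_group[str(g)] = fn / (fn + tp)  # false negative rate within the group
    print(f"group {g}: TN={tn} FP={fp} FN={fn} TP={tp} FNR={fnr_by_group[str(g)]:.2f}")
```

# A large gap in error rates between groups (here a false negative rate of 0.0 for A against about 0.67 for B)
# is a signal to investigate, not proof of unfairness on its own; small groups give noisy rates.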
