# Gadgeon Interview Preparation

![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)

# 📖 TABLE OF CONTENTS

- [1. Machine Learning Metrics](#1-machine-learning-metrics)
  - [1. Precision and Recall](#1-precision-and-recall)
  - [2. ROC and AUC](#2-roc-and-auc)
  - [3. Micro and Macro Averaging to Find Different Metrics](#3-micro-and-macro-averaging-to-find-different-metrics)
  - [4. F1 Score](#4-f1-score)

```python
# Wireless Power Transfer Circuit Schematic

from IPython import display
display.Image("data/images/Gadgeon-Interview-Preparation/Wireless-Power-Transfer-Circuit-Schematic.png")
```

![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)

# 1. Machine Learning Metrics

## 1. Precision and Recall

### Definitions

Precision and Recall are two important metrics used to evaluate the performance of classification models, particularly in scenarios where the classes are imbalanced (e.g., predicting whether a patient has a disease based on ECG signals).

- **Precision:** This metric indicates the accuracy of the positive predictions made by the model. It answers the question: "Of all the instances that were predicted as positive, how many were actually positive?"

    $\text{Precision} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} \; + \; \text{False Positives (FP)}}$

- **Recall (also known as Sensitivity or True Positive Rate):** This metric measures the ability of the model to find all relevant cases (actual positives). It answers the question: "Of all the actual positive instances, how many did we correctly predict as positive?"

    $\text{Recall} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} \; + \; \text{False Negatives (FN)}}$

### Example

Consider a medical diagnosis scenario where we want to predict whether patients have a particular heart condition based on ECG data. Let's say we have the following results from our model:

- **True Positives (TP):** The model correctly predicts that 30 patients have the condition.
    
- **False Positives (FP):** The model incorrectly predicts that 10 patients have the condition when they do not.

- **False Negatives (FN):** The model fails to identify 5 patients who actually have the condition.

Using these values, we can calculate Precision and Recall:

- $\text{Precision} = \frac {30}{30+10}=\frac {30}{40} = 0.75$

    This means that when the model predicts a patient has the condition, it is correct 75% of the time.

- $\text{Recall} = \frac {30}{30+5}=\frac {30}{35} \approx 0.857$

    This means that out of all patients who actually have the condition, the model correctly identifies approximately 85.7%.
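The calculation above can be reproduced in a few lines of Python. This is a minimal sketch: the helper name `precision_recall` is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention (scikit-learn exposes a similar choice through the `zero_division` parameter of `precision_score` and `recall_score`).

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts.

    Returns 0.0 when a denominator is zero (an assumed convention).
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Counts from the ECG example: TP = 30, FP = 10, FN = 5
precision, recall = precision_recall(tp=30, fp=10, fn=5)
print(f"Precision = {precision:.3f}")  # Precision = 0.750
print(f"Recall    = {recall:.3f}")     # Recall    = 0.857
```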

### Importance of Precision and Recall

- Precision is crucial in scenarios where false positives can lead to unnecessary treatments or anxiety for patients. For example, if a model falsely identifies healthy patients as having a heart condition, it may lead to unnecessary medical interventions.

- Recall is important when missing a positive case could have serious consequences. In our example, failing to identify a patient with a heart condition could lead to severe health risks.

### Understanding the Trade-off between Precision and Recall

In the context of COVID-19 classification, Precision and Recall are critical metrics that help evaluate the effectiveness of diagnostic models.

- **Precision** measures the accuracy of positive predictions. In COVID-19 detection, it answers the question: "Of all patients predicted to have COVID-19, how many actually have it?" A high precision indicates that when the model predicts a patient has COVID-19, it is likely correct.

- **Recall**, on the other hand, measures the ability of the model to identify all actual positive cases. It answers: "Of all patients who actually have COVID-19, how many did we correctly identify?" High recall means that most patients with COVID-19 are correctly diagnosed.

### The Trade-off Explained

The trade-off between precision and recall occurs because increasing one often leads to a decrease in the other. This is particularly relevant in high-stakes situations like COVID-19 diagnosis:

1. **High Recall, Lower Precision:** If a model is tuned to maximize recall, it will classify more patients as positive for COVID-19 to ensure that most actual cases are detected. However, this can lead to a higher number of false positives (healthy patients incorrectly diagnosed as having COVID-19). For instance, if a model identifies 95% of true COVID-19 cases (high recall) but also flags many healthy patients as positive (lower precision), it may cause unnecessary alarm and divert medical resources to people who are not infected.

2. **High Precision, Lower Recall:** Conversely, if the model is adjusted to maximize precision, it will be more conservative in its positive predictions. This means fewer healthy patients will be misclassified as having COVID-19 (higher precision), but some actual cases may be missed (lower recall). In this scenario, a patient with COVID-19 might be incorrectly classified as negative, potentially leading to further spread of the virus.

Given the contagious nature of COVID-19:

- **High Recall is Critical:** It is often more important to ensure that all infected individuals are identified to prevent further transmission. Missing an actual case (false negative) can lead to severe public health consequences.

- **Acceptable Precision Levels:** While high precision is desirable to avoid unnecessary panic and treatment for healthy individuals, in urgent public health scenarios like a pandemic, slightly lower precision may be acceptable if it means capturing more true cases.
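To see the trade-off concretely, the sketch below sweeps the decision threshold over a small, made-up set of model scores (the data and the helper `precision_recall_at` are hypothetical). Raising the threshold makes the model more conservative: recall falls while, in this toy data, precision rises. Real models need not show a perfectly monotonic precision curve.

```python
def precision_recall_at(threshold, y_true, y_score):
    """Precision and recall when 'score >= threshold' is predicted positive."""
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < threshold)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Hypothetical predicted probabilities of COVID-19 and true labels (1 = infected)
y_score = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
y_true = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

for t in (0.3, 0.5, 0.7):
    p, r = precision_recall_at(t, y_true, y_score)
    print(f"threshold={t}: precision={p:.3f}, recall={r:.3f}")
# threshold=0.3: precision=0.571, recall=0.800
# threshold=0.5: precision=0.600, recall=0.600
# threshold=0.7: precision=0.667, recall=0.400
```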

### Facial Recognition System: High Precision Low Recall Application Scenario

**Scenario**

In a facial recognition system designed to authenticate users for secure access (e.g., smartphones, secure buildings), the primary goal is to ensure that only authorized individuals can gain entry. Here, it is crucial to minimize the chances of unauthorized access, even if it means that some authorized users may be incorrectly denied access.

**High Precision**

- **Definition:** High precision in this context means that when the system identifies a person as authorized (positive prediction), it is very likely correct. For instance, if 90% of the individuals the system accepts are indeed authorized users, the system has a precision of 0.90.
    
- **Implication:** This high precision reduces the risk of unauthorized individuals gaining access. If the model predicts that a person is authorized, there is a 90% chance that they actually are. This reliability is critical in security applications where false positives (incorrectly granting access) can lead to serious security breaches.

**Low Recall**

- **Definition:** Low recall means that while the system is very accurate in its positive predictions, it fails to recognize a significant number of actual authorized users. For example, if out of 100 authorized users the system correctly recognizes only 30 (true positives) and fails to recognize the other 70 (false negatives), the recall is only 30 / 100 = 0.30.
    
- **Implication:** This low recall indicates that many legitimate users are being denied access because their faces are not recognized by the system. While this might be acceptable in high-security scenarios where preventing unauthorized access is paramount, it can frustrate users who experience repeated denials.

**Trade-off Justification**

In this application:

- **The cost of false positives (granting access to unauthorized individuals) is much higher than the cost of false negatives (denying access to authorized users). Therefore, designers prioritize precision over recall.**

- Users may prefer a system that is very accurate when it does grant access, even if it occasionally denies legitimate users. In such cases, having a reliable verification process with high precision ensures that security remains intact.


### Tumor Detection in Medical Imaging: Low Precision High Recall Application Scenario

**Scenario**

In the medical field, especially in oncology, early detection of tumors can significantly improve treatment outcomes. A machine learning model is developed to analyze medical images (such as MRI or CT scans) to identify potential tumors.

**High Recall**

- **Definition:** High recall in this context means that the model is very effective at identifying actual tumor cases. For instance, if out of 100 patients with tumors, the model correctly identifies 90 of them as having tumors, this indicates high recall.
    
- **Implication:** This high recall is crucial because missing a tumor (false negative) could lead to delayed treatment and worsen the patient's prognosis. In this scenario, it is vital to catch as many true cases as possible.

**Low Precision**

- **Definition:** Low precision means that while the model identifies most actual tumors, it also incorrectly labels many healthy cases as positive (false positives). For example, if the model flags 300 patients as having tumors and only the 90 true tumor cases above are among them, the precision is only 90 / 300 = 0.30, even though recall is 0.90.
    
- **Implication:** This results in a situation where many healthy patients are subjected to unnecessary anxiety and additional testing due to false positives. While it is critical to catch all possible tumor cases, the downside is that a significant number of healthy individuals are misclassified.

**Trade-off Justification**

In this application:

- **The cost of false negatives (failing to identify an actual tumor) is much higher than that of false positives (incorrectly identifying a healthy patient as having a tumor). Therefore, the model is designed with a focus on maximizing recall.**
    
- Medical professionals often prefer a system that ensures they do not miss any potential cancer cases, even if it means dealing with a higher number of false alarms. This approach allows for further investigation and testing for those flagged as positive.

## 2. ROC and AUC

### Definitions

**Receiver Operating Characteristic (ROC) Curve:** The ROC curve is a graphical representation used to evaluate the performance of a binary classification model at various threshold settings. It plots the **True Positive Rate (TPR)** against the **False Positive Rate (FPR)**.

- **True Positive Rate (TPR)**, also known as sensitivity or recall, is calculated as:

    $\text{TPR} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} \; + \; \text{False Negatives (FN)}}$

- **False Positive Rate (FPR)** is calculated as:

    $\text{FPR} = \frac{\text{False Positives (FP)}}{\text{False Positives (FP)} \; + \; \text{True Negatives (TN)}}$

**Area Under the Curve (AUC):** The AUC quantifies the overall performance of the model by measuring the area under the ROC curve. The value of AUC ranges from 0 to 1:

- An AUC of 1 indicates a perfect model: it ranks every positive instance above every negative one, so some threshold separates the two classes without error.

- An AUC of 0.5 suggests that the model performs no better than random chance.

- An AUC less than 0.5 indicates that the model is performing worse than random guessing.

### How ROC and AUC Work

1. **Generating the ROC Curve:**

    - To create an ROC curve, you calculate TPR and FPR for different threshold values ranging from 0 to 1.

    - As you adjust the threshold, you can observe how TPR and FPR change, allowing you to plot these values on a graph.

2. **Interpreting the ROC Curve:**

    - The curve starts at (0, 0), where the threshold is so high that nothing is predicted positive, and ends at (1, 1), where the threshold is so low that everything is predicted positive.

    - A curve that bows towards the top left corner indicates a better-performing model, while a curve close to the diagonal line (from (0, 0) to (1, 1)) indicates performance close to random guessing.
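The procedure above can be sketched in plain Python. This is an illustrative implementation, not a library API: `roc_points` uses each distinct score as a threshold (predicting positive when score >= threshold), and `auc_trapezoid` integrates the resulting curve with the trapezoidal rule. The scores and labels are made up.

```python
def roc_points(y_true, y_score):
    """Sweep the threshold from high to low and return (FPR, TPR) points.

    An instance is predicted positive when its score is >= the threshold.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points  # the lowest threshold predicts everything positive: (1.0, 1.0)


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        area += (x2 - x1) * (y1 + y2) / 2
    return area


# Hypothetical predicted probabilities of disease and true labels (1 = has disease)
y_score = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
y_true = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

points = roc_points(y_true, y_score)
print(points[:4])  # [(0.0, 0.0), (0.0, 0.2), (0.0, 0.4), (0.2, 0.4)]
print(f"AUC = {auc_trapezoid(points):.2f}")  # AUC = 0.72
```

For real projects, `sklearn.metrics.roc_curve` and `sklearn.metrics.roc_auc_score` provide tested implementations of the same idea.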

### Example

Consider a binary classification model designed to detect whether patients have a specific disease based on test results. Here's how you might visualize its performance using an ROC curve:

- **Model Predictions:** After running your model, you get predicted probabilities for each patient indicating their likelihood of having the disease.

- **Thresholds:** You evaluate thresholds from 0.0 to 1.0 in increments (e.g., 0.1).

For example:

- At a threshold of **0.3**, suppose your model predicts:

    - TP = 80
    - FP = 10
    - FN = 20
    - TN = 90

Calculating TPR and FPR:

- TPR = $\frac {80}{80+20}$ = 0.8
- FPR = $\frac {10}{10+90}$ = 0.1

You would plot this point on your ROC curve at coordinates (0.1, 0.8). As you continue adjusting thresholds and calculating TPR and FPR, you generate more points until you can connect them to form your ROC curve.

### Importance of AUC

The AUC provides a single scalar value that summarizes the model's ability to discriminate between classes across all thresholds:

- A higher AUC value indicates better performance in distinguishing between positive and negative classes.
    
- For instance, if your model has an AUC of 0.85, it means there is an 85% chance that it will rank a randomly chosen positive instance higher than a randomly chosen negative instance.
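This ranking interpretation can be checked directly by comparing every positive-negative pair. A minimal sketch, counting ties as half a win (a standard convention); the helper name and the data are hypothetical:

```python
def auc_pairwise(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # pair ranked correctly
            elif p == n:
                wins += 0.5  # tie: counted as half
    return wins / (len(pos) * len(neg))


# Hypothetical scores and labels (1 = positive)
y_score = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
y_true = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
print(f"AUC = {auc_pairwise(y_true, y_score):.2f}")  # AUC = 0.72 (18 of 25 pairs)
```

The double loop costs O(P·N) and is only meant to make the interpretation tangible.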

### Differences in AUC-ROC for Multi-Class Classification

1. **Binary vs. Multi-Class:**

    - In binary classification, the ROC curve is straightforward, plotting True Positive Rate (TPR) against False Positive Rate (FPR) for one positive class versus one negative class.

    - In multi-class classification, where there are three or more classes, the ROC curve needs to be adapted because each class can be considered as a positive class while treating all other classes as negative.

2. **One-vs-Rest (OvR) Approach:**

    - The most common method for creating ROC curves in multi-class settings is the **One-vs-Rest (OvR)** approach. For each class:

        - Treat that class as the positive class and all other classes as negative.
        - Generate a separate ROC curve for each class.

    - For example, if you have three classes (A, B, C), you would create:

        - One ROC curve for class A vs. classes B and C.
        - One ROC curve for class B vs. classes A and C.
        - One ROC curve for class C vs. classes A and B.

3. **One-vs-One (OvO) Approach:**

    - Another method is the **One-vs-One (OvO)** approach, which involves creating a ROC curve for every pair of classes.

    - For three classes (A, B, C), you would create:

        - One ROC curve for class A vs. class B.
        - One ROC curve for class A vs. class C.
        - One ROC curve for class B vs. class C.

    - This method becomes computationally intensive as the number of classes grows: $K$ classes yield $K(K-1)/2$ pairs (3 pairs for 3 classes, but 45 pairs for 10 classes).

4. **Micro and Macro Averaging:**

    - After generating multiple ROC curves, you can summarize the performance using **micro** and **macro averaging:**

        - **Micro Averaging:** Computes global TPR and FPR by aggregating contributions from all classes before calculating the AUC. This approach treats all instances equally and is useful when you want to emphasize the performance across all samples. It is usually defined for the OvR setting, where every (instance, class) pair becomes one binary decision.

        - **Macro Averaging:** Calculates the AUC for each class (or class pair) separately and then takes the average of these values. This method treats all classes equally regardless of their size, which can be beneficial when dealing with imbalanced datasets.

### Example of AUC-ROC for Multi-Class Classification

Imagine we have a dataset containing images of cats, dogs, and rabbits. Our goal is to train a classifier that can accurately identify these animals based on their features.

- **Step 1: Training the Model**

    We train a multi-class classifier (e.g., a neural network or logistic regression) on this dataset. After training, the model outputs predicted probabilities for each class for every instance in the test set.

- **Step 2: Applying ROC and AUC**

    In the One-vs-Rest (OvR) approach, we treat each class as a positive class while combining all other classes as negative. This means we will create three separate ROC curves:

    1. **ROC Curve for Cats:**
        - Treat "Cats" as the positive class.
        - Combine "Dogs" and "Rabbits" as the negative class.
        - Calculate True Positive Rate (TPR) and False Positive Rate (FPR) at various thresholds.
    2. **ROC Curve for Dogs:**
        - Treat "Dogs" as the positive class.
        - Combine "Cats" and "Rabbits" as the negative class.
        - Calculate TPR and FPR similarly.
    3. **ROC Curve for Rabbits:**
        - Treat "Rabbits" as the positive class.
        - Combine "Cats" and "Dogs" as the negative class.
        - Calculate TPR and FPR accordingly.

    - **Example Calculation for OvR**

        Assume we calculate TPR and FPR for each class at one particular threshold. Here's an example of what the results might look like:

        - **Cats:**
            - TPR = 0.85, FPR = 0.10
        - **Dogs:**
            - TPR = 0.80, FPR = 0.15
        - **Rabbits:**
            - TPR = 0.90, FPR = 0.05

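        Each pair is a single point on that class's ROC curve. Repeating the calculation across many thresholds traces out three full ROC curves, which can be plotted on the same graph.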
    
    - **Area Under the Curve (AUC)**
    
        After plotting the ROC curves, we calculate the AUC for each curve:

        - AUC for Cats: **0.88**
        - AUC for Dogs: **0.82**
        - AUC for Rabbits: **0.91**

- **Step 3: Averaging AUC Scores**

    To summarize model performance across all classes, we can use two averaging methods:

    1. **Macro Averaging:**

        - This method calculates the average of AUCs across all classes without considering class imbalance.

        $\text{Macro AUC} = \frac {\text{AUC}_\text{Cats} + \text{AUC}_\text{Dogs} + \text{AUC}_\text{Rabbits}}{3} = \frac {0.88+0.82+0.91}{3} = \frac {2.61}{3} = 0.87$

    2. **Micro Averaging:**

        - Micro averaging aggregates contributions from all classes before calculating metrics, treating each instance equally.

        - For a micro-averaged AUC in the OvR setting, you pool every (instance, class) pair into a single binary problem, so at each threshold the true positives, false positives, etc. are summed across classes before TPR and FPR are calculated. The micro AUC is the area under this single pooled ROC curve.
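The OvR procedure and both averaging methods can be sketched in plain Python. The predicted probabilities below are invented for illustration (they are not the values behind the AUCs quoted above), and `auc_pairwise` is a hypothetical helper that computes each binary AUC as the fraction of correctly ranked positive-negative pairs, with ties counted as half. Micro averaging is implemented by flattening every (instance, class) pair into one pooled binary problem.

```python
def auc_pairwise(y_true, y_score):
    """Binary AUC: fraction of positive-negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


classes = ["cat", "dog", "rabbit"]
y_true = ["cat", "cat", "dog", "dog", "rabbit", "rabbit"]
# Hypothetical predicted probabilities, one column per class (cat, dog, rabbit)
y_prob = [
    [0.7, 0.2, 0.1],
    [0.4, 0.4, 0.2],
    [0.3, 0.5, 0.2],
    [0.2, 0.3, 0.5],
    [0.1, 0.2, 0.7],
    [0.3, 0.1, 0.6],
]

# One-vs-Rest: one binary AUC per class
per_class = {}
for k, name in enumerate(classes):
    binary_true = [1 if y == name else 0 for y in y_true]
    class_scores = [row[k] for row in y_prob]
    per_class[name] = auc_pairwise(binary_true, class_scores)

# Macro: unweighted mean of the per-class AUCs
macro_auc = sum(per_class.values()) / len(classes)

# Micro: pool every (instance, class) pair into a single binary problem
flat_true = [1 if y == name else 0 for y in y_true for name in classes]
flat_score = [p for row in y_prob for p in row]
micro_auc = auc_pairwise(flat_true, flat_score)

print(per_class)                       # {'cat': 1.0, 'dog': 0.875, 'rabbit': 1.0}
print(f"Macro AUC = {macro_auc:.3f}")  # Macro AUC = 0.958
print(f"Micro AUC = {micro_auc:.3f}")  # Micro AUC = 0.931
```

In scikit-learn, `roc_auc_score` with `multi_class="ovr"` and `average="macro"` targets the same macro quantity; it is worth checking its documentation for your installed version before relying on the micro option.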

### Common Pitfalls When Interpreting the AUC-ROC Curve

Interpreting the AUC-ROC curve can be insightful, but there are several common pitfalls that can lead to misleading conclusions. Here are some key pitfalls to be aware of:

1. **Imbalanced Datasets:**

    - **Issue:** The ROC curve can provide an overly optimistic assessment of model performance when dealing with imbalanced datasets. In such cases, the False Positive Rate (FPR) may appear very low due to a large number of True Negatives (TN), making the model seem more effective than it actually is.
    
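    - **Example:** In a dataset with 95% negative and 5% positive cases (say 9,500 negatives and 500 positives), an operating point with TPR = 0.80 and FPR = 0.05 looks strong on the ROC curve. Yet it produces 400 true positives and 475 false positives, so precision is only 400 / 875 ≈ 0.46: more than half of the positive predictions are wrong. (A model that simply predicts every instance as negative is a different case: it reaches 95% *accuracy*, but its constant score gives an AUC of only 0.5.)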

2. **Ignoring Cost of Errors:**

    - **Issue:** AUC-ROC does not take into account the different costs associated with false positives and false negatives. In many applications, especially in healthcare, the consequences of misclassifications can vary significantly.
    
    - **Example:** In a medical diagnosis scenario, failing to identify a disease (false negative) might have severe consequences compared to falsely diagnosing it (false positive). Relying solely on AUC could lead to poor decision-making if the costs of errors are not considered.

3. **Threshold Independence:**

    - **Issue:** AUC measures performance across all possible thresholds, which may include thresholds that are not practically relevant for specific applications. This can obscure meaningful insights about model performance at clinically relevant thresholds.
    
    - **Example:** A model might have a high AUC but perform poorly at the threshold that is most relevant for clinical decisions. This means that while the overall performance looks good, it may not translate to effective real-world use.

4. **Misleading Comparisons:**

    - **Issue:** When comparing models based on AUC values, caution is needed, especially if ROC curves intersect. Simply comparing AUC values may not provide a complete picture of model performance.
    
    - **Example:** If two models have similar AUC values but one model performs significantly better at clinically important thresholds while the other does not, relying solely on AUC could lead to choosing the less effective model.

5. **Interpretation of AUC Values:**

    - **Issue:** Misinterpretation of what different AUC values imply can lead to incorrect conclusions about model effectiveness. An AUC close to 0.5 indicates random guessing, while an AUC above 0.7 is often considered acceptable.
    
    - **Example:** Clinicians might assume that an AUC of 0.8 indicates excellent performance without considering how it translates into actual clinical outcomes or whether it consistently performs well across relevant thresholds.

6. **Overfitting and Model Complexity:**

    - **Issue:** High AUC scores can sometimes be achieved by overly complex models that do not generalize well to unseen data. This can lead to overfitting where the model performs well on training data but poorly on test data.
    
    - **Example:** A complex model might achieve an AUC of 0.95 on training data but drop significantly when evaluated on validation data due to its inability to generalize.
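Pitfall 1 can be made concrete with a little arithmetic. The sketch below fixes one ROC operating point (TPR = 0.80, FPR = 0.05, chosen for illustration) and computes the precision it implies under a balanced split and under a 95%/5% split; the ROC point is identical, but precision is not. The helper name is hypothetical.

```python
def precision_at_operating_point(n_pos, n_neg, tpr, fpr):
    """Precision implied by a single ROC point (FPR, TPR) for given class counts."""
    tp = tpr * n_pos  # positives correctly flagged
    fp = fpr * n_neg  # negatives wrongly flagged
    return tp / (tp + fp)


# The same ROC point under two hypothetical class balances
balanced = precision_at_operating_point(n_pos=5000, n_neg=5000, tpr=0.80, fpr=0.05)
imbalanced = precision_at_operating_point(n_pos=500, n_neg=9500, tpr=0.80, fpr=0.05)
print(f"50% positives: precision = {balanced:.3f}")   # 50% positives: precision = 0.941
print(f" 5% positives: precision = {imbalanced:.3f}")  #  5% positives: precision = 0.457
```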

Here are some alternative metrics that can complement the AUC-ROC curve for a more comprehensive evaluation of model performance:

1. **Precision and Recall:**

    - **Precision:** Measures the accuracy of positive predictions. It answers the question: "Of all instances predicted as positive, how many were actually positive?"
    
    - **Recall:** Measures the ability to identify all actual positive instances. It answers: "Of all actual positives, how many did we correctly identify?"
    
    - **Use Case:** These metrics are particularly important in medical diagnosis, where failing to identify a disease (false negative) can have serious consequences.

2. **Area Under the Precision-Recall Curve (AUC-PR):**

    - **Definition:** This metric summarizes the trade-off between precision and recall across different thresholds.
    
    - **Use Case:** Particularly useful for imbalanced datasets where the positive class is rare. It focuses on the performance of the classifier with respect to the positive (minority) class.
    
    - **Strengths:** Unlike AUC-ROC, which can be overly optimistic in imbalanced settings, AUC-PR provides a clearer picture of how well the model performs on the minority class.

3. **Logarithmic Loss (Log Loss):**

    - **Definition:** Log loss measures the performance of a classification model where predictions are probabilities between 0 and 1. It penalizes incorrect classifications with a heavier weight for confident wrong predictions.
    
    - **Use Case:** Useful when you want to evaluate how well your predicted probabilities align with actual outcomes.
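Both AUC-PR and log loss are easy to sketch from their definitions. `average_precision` below summarizes the precision-recall curve as the mean precision at the rank of each positive, one common estimator of AUC-PR (it matches scikit-learn's `average_precision_score` when no scores are tied); `log_loss` is the mean binary cross-entropy, with probabilities clipped by an assumed `eps` so that `log(0)` never occurs. Both function names are illustrative, and the data are made up.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def average_precision(y_true, y_score):
    """Mean of the precision at the rank of each positive (assumes no tied scores)."""
    ranked = sorted(zip(y_score, y_true), reverse=True)  # highest score first
    tp = 0
    precisions = []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            precisions.append(tp / rank)  # precision at this cut-off
    return sum(precisions) / len(precisions)


# Confident and correct vs. confident and wrong
print(f"{log_loss([1, 0, 1], [0.9, 0.1, 0.8]):.3f}")  # 0.145
print(f"{log_loss([1], [0.01]):.3f}")                 # 4.605

# Hypothetical scores and labels (1 = positive)
y_score = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
y_true = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
print(f"AP = {average_precision(y_true, y_score):.3f}")  # AP = 0.794
```

The single confident mistake (predicting 0.01 for a true positive) costs about 30 times more than the average loss of the three reasonable predictions, which is the "heavier weight for confident wrong predictions" described above.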

**Conclusion**

While the AUC-ROC curve is a valuable tool for assessing model performance, it is essential to be aware of these common pitfalls when interpreting its results. Understanding these limitations helps ensure that decisions based on ROC analysis are informed and appropriate for the specific context in which a model is being applied. To mitigate these issues, consider using additional metrics (like precision and recall) and conducting thorough evaluations at clinically relevant thresholds alongside ROC analysis.

### How does the Choice of Threshold affect the AUC-ROC Curve?

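The choice of threshold does not change the ROC curve or its AUC, which are computed over all possible thresholds. What it does change is the *operating point*: the single (FPR, TPR) pair on the curve that the deployed model actually achieves, and therefore how model performance should be interpreted in practice. Here's a detailed explanation of how this relationship works: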

#### Understanding Thresholds in ROC Analysis

1. **Threshold Definition:**

    - A threshold is a cut-off value that converts a predicted probability (or score) into a binary outcome (positive or negative). For instance, if a model predicts a probability of 0.7 for the positive class and the threshold is set at 0.5, the instance is classified as positive.

2. **Impact on True Positive Rate (TPR) and False Positive Rate (FPR):**

    - As you adjust the threshold:
        
        - **Lowering the Threshold:** Increases TPR (sensitivity) because more instances are classified as positive. However, this also increases FPR, leading to more false positives.
        
        - **Raising the Threshold:** Decreases TPR because fewer instances are classified as positive, but also decreases FPR, resulting in fewer false positives.

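    TPR and FPR therefore move in the same direction: lowering the threshold raises both, and raising it lowers both. The trade-off is that every gain in sensitivity (TPR) is paid for with more false positives (higher FPR, i.e. lower specificity), and vice versa.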

#### How Threshold Choice Affects AUC-ROC

1. **AUC-ROC Overview:**

    - The AUC measures the area under the ROC curve, which plots TPR against FPR across various thresholds. A higher AUC indicates better model performance in distinguishing between positive and negative classes.

2. **Threshold Independence of AUC:**

    - One of the key properties of AUC is that it is threshold-invariant; it summarizes model performance across all possible thresholds rather than being tied to a specific one. This means that while individual thresholds affect TPR and FPR, the overall AUC remains a holistic measure of performance.

3. **Choosing Optimal Thresholds:**

    - Although AUC provides a general assessment, selecting an optimal threshold for practical applications depends on the specific costs associated with false positives and false negatives:
        
        - If false negatives are costly (e.g., missing a cancer diagnosis), you might choose a lower threshold to maximize TPR, even if it leads to more false positives.
        
        - Conversely, if false positives are costly (e.g., unnecessary treatments), you might opt for a higher threshold to minimize FPR, accepting a lower TPR.
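One simple way to turn these costs into a concrete threshold is to minimize the total misclassification cost over candidate thresholds. The sketch below assumes hypothetical per-error costs (`cost_fn`, `cost_fp`), uses the observed scores as candidates, and runs on made-up data; `best_threshold` is an illustrative name, not a library function.

```python
def best_threshold(y_true, y_score, cost_fn, cost_fp):
    """Pick the candidate threshold with the lowest total misclassification cost.

    Candidates are the observed scores; an instance is positive when score >= threshold.
    """
    best_t, best_cost = None, float("inf")
    for t in sorted(set(y_score), reverse=True):
        fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < t)
        fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= t)
        cost = cost_fn * fn + cost_fp * fp
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


# Hypothetical scores and labels (1 = disease present)
y_score = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
y_true = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

# A missed case is 5x worse than a false alarm -> a low threshold wins
print(best_threshold(y_true, y_score, cost_fn=5, cost_fp=1))  # (0.15, 4)
# A false alarm is 5x worse than a missed case -> a high threshold wins
print(best_threshold(y_true, y_score, cost_fn=1, cost_fp=5))  # (0.85, 3)
```

In practice the candidate thresholds would be evaluated on a validation set rather than on the training data.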

#### Example Scenario

Consider a medical diagnostic test for detecting a disease:

- **Threshold Set at 0.3:**
    - TPR = 0.85 (high sensitivity)
    - FPR = 0.15
    - This setting captures most actual cases of the disease but may lead to many healthy individuals being incorrectly diagnosed.

- **Threshold Set at 0.7:**
    - TPR = 0.60 (lower sensitivity)
    - FPR = 0.05
    - This setting reduces false positives significantly but misses some true cases of the disease.

#### Summary

- The choice of threshold directly determines TPR and FPR, that is, which point on the ROC curve the model operates at; the curve itself is traced out by sweeping all possible thresholds.

- While AUC provides an overall measure of model performance across all thresholds, selecting an appropriate threshold for specific applications requires careful consideration of the implications of false positives and negatives.

- Ultimately, understanding that threshold adjustments move the operating point along the ROC curve, changing TPR and FPR while leaving the AUC unchanged, helps practitioners make informed decisions about model deployment in real-world scenarios.

By considering these factors, you can better interpret ROC curves and select thresholds that align with your specific goals and constraints in classification tasks.

## 3. Micro and Macro Averaging to Find Different Metrics

Micro and macro averaging are techniques used to evaluate the performance of classification models, particularly in multi-class classification problems. They provide different perspectives on model performance by handling class contributions differently. Here’s a detailed explanation of both methods, along with examples to illustrate their differences.

### Definitions

- **Micro Averaging:**

    - Micro averaging aggregates the contributions of all classes to compute the overall performance metrics. It treats each instance equally, regardless of its class label.
    
    - In micro averaging, you sum up the true positives (TP), false positives (FP), and false negatives (FN) across all classes before calculating precision, recall, or F1 score.

- **Macro Averaging:**

    - Macro averaging calculates the performance metrics for each class independently and then takes the average. It gives equal weight to each class, regardless of how many instances belong to each class.
    
    - In macro averaging, you compute precision, recall, or F1 score for each class separately and then average these values.

### When to Use Each Method

- **Micro Averaging:** Use this when you want to give equal weight to each instance. This is particularly useful in imbalanced datasets where you want the overall performance to reflect the model's ability to classify instances correctly.

- **Macro Averaging:** Use this when you want to treat all classes equally, regardless of their size. This is useful when you want to assess how well your model performs across all classes without being biased by the majority class.

### Example Scenario

Consider a multi-class classification problem with three classes: **Cats**, **Dogs**, and **Rabbits**. Let's say we have the following confusion matrix based on model predictions:

| Actual \ Predicted | Cats | Dogs | Rabbits |
| :----------------- | :--- | :--- | :------ |
| Cats | 30 | 5 | 2 |
| Dogs | 3 | 25 | 1 |
| Rabbits | 4 | 2 | 27 |

From this confusion matrix, we can derive the following:

- **True Positives (TP):**
    
    - Cats: 30
    - Dogs: 25
    - Rabbits: 27

- **False Positives (FP):**

    - FP for Cats $\implies$ Predicted as Cats but Actually Dogs or Rabbits = 3 + 4 = 7
    - FP for Dogs $\implies$ Predicted as Dogs but Actually Cats or Rabbits = 5 + 2 = 7
    - FP for Rabbits $\implies$ Predicted as Rabbits but Actually Cats or Dogs = 2 + 1 = 3

- **False Negatives (FN):**

    - FN for Cats $\implies$ Instances that are actually Cats but were predicted as Dogs or Rabbits = 5 + 2 = 7
    - FN for Dogs $\implies$ Instances that are actually Dogs but were predicted as Cats or Rabbits = 3 + 1 = 4
    - FN for Rabbits $\implies$ Instances that are actually Rabbits but were predicted as Cats or Dogs = 4 + 2 = 6

### Micro Averaging Calculation

To calculate micro averages, we sum up all TP, FP, and FN:

- Total TP = 30 + 25 + 27 = 82
- Total FP = 7 + 7 + 3 = 17
- Total FN = 7 + 4 + 6 = 17

Now we can calculate micro precision and recall:

- **Micro Precision:**
    
    $\text{Micro Precision} = \frac {\text{Total TP}}{\text{Total TP} + \text{Total FP}} = \frac {82}{82 + 17} = \frac {82}{99} = 0.828$

- **Micro Recall:**
    
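    $\text{Micro Recall} = \frac {\text{Total TP}}{\text{Total TP} + \text{Total FN}} = \frac {82}{82 + 17} = \frac {82}{99} = 0.828$

    Micro precision and micro recall are identical here, and this is not a coincidence: in single-label multi-class classification, every misclassified instance is a false positive for its predicted class and a false negative for its actual class, so Total FP = Total FN. Both therefore equal the overall accuracy (82 correct out of 99 instances).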

### Macro Averaging Calculation

For macro averages, we calculate precision and recall for each class independently:

1. **Precision for Each Class:**
    
    - **Cats:**
    
    $\text{P}_\text{Cats} = \frac {\text{TP}_\text{Cats}}{\text{TP}_\text{Cats} + \text{FP}_\text{Cats}} = \frac {30}{30 + 7} = \frac {30}{37} = 0.811$

    - **Dogs:**
    
    $\text{P}_\text{Dogs} = \frac {\text{TP}_\text{Dogs}}{\text{TP}_\text{Dogs} + \text{FP}_\text{Dogs}} = \frac {25}{25 + 7} = \frac {25}{32} = 0.781$

    - **Rabbits:**
    
    $\text{P}_\text{Rabbits} = \frac {\text{TP}_\text{Rabbits}}{\text{TP}_\text{Rabbits} + \text{FP}_\text{Rabbits}} = \frac {27}{27 + 3} = \frac {27}{30} = 0.9$

2. **Recall for Each Class:**
    
    - **Cats:**
    
    $\text{R}_\text{Cats} = \frac {\text{TP}_\text{Cats}}{\text{TP}_\text{Cats} + \text{FN}_\text{Cats}} = \frac {30}{30 + 7} = \frac {30}{37} = 0.811$

    - **Dogs:**
    
    $\text{R}_\text{Dogs} = \frac {\text{TP}_\text{Dogs}}{\text{TP}_\text{Dogs} + \text{FN}_\text{Dogs}} = \frac {25}{25 + 4} = \frac {25}{29} = 0.862$

    - **Rabbits:**
    
    $\text{R}_\text{Rabbits} = \frac {\text{TP}_\text{Rabbits}}{\text{TP}_\text{Rabbits} + \text{FN}_\text{Rabbits}} = \frac {27}{27 + 6} = \frac {27}{33} = 0.818$

3. **Calculating Macro Averages:**

    - **Macro Precision:**

    $\text{Macro Precision} = \frac {\text{P}_\text{Cats} + \text{P}_\text{Dogs} + \text{P}_\text{Rabbits}}{3} = \frac {0.811 + 0.781 + 0.9}{3} = \frac {2.492}{3} = 0.8307$

    - **Macro Recall:**

    $\text{Macro Recall} = \frac {\text{R}_\text{Cats} + \text{R}_\text{Dogs} + \text{R}_\text{Rabbits}}{3} = \frac {0.811 + 0.862 + 0.818}{3} = \frac {2.491}{3} = 0.8303$
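Here is a plain-Python sketch of the macro computation, using the same per-class (TP, FP, FN) counts. It computes each class's precision and recall independently and then takes an unweighted mean. The helper names are illustrative. This version keeps the per-class values unrounded, so it reports macro recall as 0.8304. The hand calculation above gives 0.8303 only because it rounds each per-class recall to three decimals before averaging.

```python
# Per-class counts from the worked example above: (TP, FP, FN)
counts = {
    "Cats": (30, 7, 7),
    "Dogs": (25, 7, 4),
    "Rabbits": (27, 3, 6),
}

def per_class_precision_recall(counts):
    """Return {class: (precision, recall)} computed independently for each class."""
    return {
        label: (tp / (tp + fp), tp / (tp + fn))
        for label, (tp, fp, fn) in counts.items()
    }

def macro_precision_recall(counts):
    """Unweighted mean of the per-class precision and recall values."""
    scores = per_class_precision_recall(counts)
    n = len(scores)
    macro_p = sum(p for p, _ in scores.values()) / n
    macro_r = sum(r for _, r in scores.values()) / n
    return macro_p, macro_r

for label, (p, r) in per_class_precision_recall(counts).items():
    print(f"{label:8s} precision={p:.3f} recall={r:.3f}")

macro_p, macro_r = macro_precision_recall(counts)
print(f"Macro precision: {macro_p:.4f}")  # Macro precision: 0.8307
print(f"Macro recall:    {macro_r:.4f}")  # Macro recall:    0.8304
```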

### Summary of Results

| Metric | Micro Average | Macro Average |
| :----- | :------------ | :------------ |
| Precision | 0.828 | 0.8307 |
| Recall | 0.828 | 0.8303 |

### Conclusion

Micro and macro averaging provide valuable insights into model performance in multi-class classification tasks:

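- **Micro Averaging** pools the counts from every prediction. Each instance therefore carries equal weight, and the larger classes dominate the result. In single-label multi-class classification, every misclassification counts as one FP (for the predicted class) and one FN (for the true class). As a result, Total FP = Total FN, and micro precision, micro recall and accuracy are all equal. This is why both micro values above are 0.828.

- **Macro Averaging** treats all classes equally regardless of their size, so performance on a small class counts as much as performance on a large one and the score is not biased toward the larger classes.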

Choosing between micro and macro averaging depends on the specific context of your application and whether you want to prioritize overall accuracy or equal treatment of all classes in your evaluation metrics.

## 4. F1 Score

### Definition

The **F1 Score** is a metric used to evaluate the performance of a classification model, particularly in situations where the classes are imbalanced. It is the harmonic mean of **Precision** and **Recall**, providing a single score that balances both metrics. The F1 Score is especially useful when you want to find an optimal balance between precision and recall.

- Formula $\implies$ $\text{F1 Score} = 2 \times \frac {\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
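The harmonic mean matters here because it is pulled toward the smaller of the two values, whereas an arithmetic mean is not. The sketch below uses hypothetical precision and recall values to show the difference. Returning 0 when both inputs are 0 is a common convention and is assumed here; the formula itself is undefined in that case.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; returns 0 when both are 0 (a common convention)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical lopsided model: perfect precision but very low recall
p, r = 1.0, 0.1
print(f"Arithmetic mean: {(p + r) / 2:.3f}")     # Arithmetic mean: 0.550
print(f"F1 (harmonic):   {f1_score(p, r):.3f}")  # F1 (harmonic):   0.182
```

The arithmetic mean would rate this model as middling. F1 stays close to the weak recall, which is the behaviour that makes it useful as a single balancing score.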

### Importance of F1 Score

- **Balancing Act:** The F1 Score provides a way to combine precision and recall into a single metric, making it easier to understand model performance, especially when dealing with imbalanced datasets.
    
- **Focus on Positive Class:** In many applications, such as fraud detection or medical diagnosis, correctly identifying the positive class matters more than overall accuracy. The F1 Score is built only from TP, FP and FN and ignores true negatives, so it reflects how well the model handles the positive class.

### Example Scenario

Let's consider a binary classification problem where we want to predict whether an email is spam or not. After running our model, we obtain the following confusion matrix:

| Actual \ Predicted | Spam (Positive) | Not Spam (Negative) |
| :----------------- | :-------------- | :------------------ |
| Spam | 40 | 10 |
| Not Spam | 5 | 45 |

From this confusion matrix, we can derive:

- **True Positives (TP):** 40 (correctly predicted spam emails)
    
- **False Positives (FP):** 5 (not spam emails incorrectly predicted as spam)
    
- **False Negatives (FN):** 10 (spam emails incorrectly predicted as not spam)

### Step-by-Step Calculation

1. **Calculate Precision:**

    $\text{Precision} = \frac {\text{TP}}{\text{TP} + \text{FP}} = \frac {40}{40 + 5} = \frac {40}{45} = 0.889$

2. **Calculate Recall:**

    $\text{Recall} = \frac {\text{TP}}{\text{TP} + \text{FN}} = \frac {40}{40 + 10} = \frac {40}{50} = 0.8$

3. **Calculate F1 Score:**

    $\text{F1 Score} = 2 \times \frac {\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = 2 \times \frac {0.889 \times 0.8}{0.889 + 0.8} \approx 0.842$
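The calculation can be reproduced directly from the spam confusion matrix. The sketch below also checks the algebraically equivalent count form $\text{F1} = \frac{2\,\text{TP}}{2\,\text{TP} + \text{FP} + \text{FN}}$, which you get by substituting the precision and recall definitions into the harmonic mean. Plain Python is used so the example has no dependencies.

```python
# Confusion matrix from the spam example (rows = actual, columns = predicted)
#              Pred Spam  Pred Not Spam
confusion = [[40, 10],   # Actual Spam
             [5, 45]]    # Actual Not Spam

tp = confusion[0][0]  # spam correctly predicted as spam
fn = confusion[0][1]  # spam incorrectly predicted as not spam
fp = confusion[1][0]  # not spam incorrectly predicted as spam
tn = confusion[1][1]  # not spam correctly predicted as not spam

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
f1_counts = 2 * tp / (2 * tp + fp + fn)  # same value, computed from raw counts
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"Precision: {precision:.3f}")  # Precision: 0.889
print(f"Recall:    {recall:.3f}")     # Recall:    0.800
print(f"F1:        {f1:.3f}")         # F1:        0.842
print(f"Accuracy:  {accuracy:.3f}")   # Accuracy:  0.850
```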

### Conclusion

In this example, the F1 Score of approximately **0.842** indicates a good balance between precision (0.889) and recall (0.8) for our spam detection model. F1 weights false positives and false negatives equally, so it is most informative when both kinds of error matter, as in medical diagnosis or fraud detection. When missed positives are far more costly than false alarms, recall on its own is the more direct measure. Used alongside other metrics such as accuracy and AUC-ROC, the F1 Score gives a compact, single-number view of how well the model handles the positive class, which helps when judging whether it suits a specific application.

## 5. Other Evaluation Metrics for Multi-Class Classification

In addition to commonly used metrics like precision, recall, and F1 score, there are several other evaluation metrics that are important for assessing the performance of multi-class classification models. These metrics provide insights into different aspects of model performance, especially when dealing with imbalanced datasets or specific application requirements.

Here are some key evaluation metrics for multi-class classification:

### 1. Accuracy

- **Definition:** Accuracy measures the proportion of correctly predicted instances out of the total instances. For a multi-class problem, it is the sum of the diagonal of the confusion matrix divided by the total number of instances:

    - $\text{Accuracy} = \frac {\text{Total Correct Predictions}}{\text{Total Predictions}}$, which in the binary case reduces to $\frac {\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}}$

- **Example:** If a model correctly classifies 80 out of 100 instances in a three-class problem (Cats, Dogs, Rabbits), the accuracy would be:

    - $\text{Accuracy} = \frac {80}{100} = 0.80$, or 80%

- **Limitations:** Accuracy is straightforward, but it can be misleading when classes are imbalanced. For example, if 95 out of 100 instances belong to one class, a model that predicts that class for every instance achieves 95% accuracy without ever identifying a minority-class instance.
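The imbalance pitfall is easy to reproduce. The sketch below builds a hypothetical 95/5 dataset and a model that always predicts the majority class. It then compares accuracy with macro-averaged recall, which exposes the ignored minority class. Both helpers are illustrative, written in plain Python.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def macro_recall(y_true, y_pred):
    """Recall computed per class, then averaged with equal weight per class."""
    recalls = []
    for label in sorted(set(y_true)):
        # For every instance whose true label is `label`, was it predicted correctly?
        hits = [p == label for t, p in zip(y_true, y_pred) if t == label]
        recalls.append(sum(hits) / len(hits))
    return sum(recalls) / len(recalls)

# Hypothetical imbalanced dataset: 95 "Cat" instances and 5 "Dog" instances
y_true = ["Cat"] * 95 + ["Dog"] * 5
# A degenerate model that always predicts the majority class
y_pred = ["Cat"] * 100

print(f"Accuracy:     {accuracy(y_true, y_pred):.2f}")      # Accuracy:     0.95
print(f"Macro recall: {macro_recall(y_true, y_pred):.2f}")  # Macro recall: 0.50
```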

### 2. Log Loss (Cross-Entropy Loss)

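- **Definition:** Log loss evaluates a classification model that outputs probabilities between 0 and 1 rather than hard labels. It quantifies how far the predicted probabilities are from the actual class labels. For binary classification the formula is:

    - $\text{Log Loss} = - \frac {1}{N} \sum_{i = 1}^{N} \left( y_i \log p_i + (1 - y_i) \log (1 - p_i) \right)$ where
        - $N$ is the number of instances.
        - $y_i$ is the actual label (0 or 1).
        - $p_i$ is the predicted probability of the positive class.
    - For $C$ classes this generalizes to $\text{Log Loss} = - \frac {1}{N} \sum_{i = 1}^{N} \sum_{c = 1}^{C} y_{i,c} \log p_{i,c}$, where $y_{i,c}$ is 1 if instance $i$ belongs to class $c$ and 0 otherwise, and $p_{i,c}$ is the predicted probability of class $c$ for instance $i$.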

- **Example:** Consider a three-class model (Cats, Dogs, Rabbits). For each instance, only the probability assigned to the true class contributes. Predicting 0.9 for the correct class costs $-\log 0.9 \approx 0.105$ (natural log), whereas predicting 0.01 for it costs $-\log 0.01 \approx 4.605$.

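- **Limitations:** Log loss penalizes confident wrong predictions very heavily, because the penalty $-\log p$ grows without bound as $p \to 0$. A few badly miscalibrated predictions can therefore dominate the average. It also measures the quality of the probabilities rather than of the final class decisions, so it is best read alongside metrics such as accuracy or F1.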
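A minimal plain-Python sketch of multi-class log loss follows. With one-hot labels, the inner sum over classes collapses to $-\log$ of the probability assigned to the true class, and the code relies on that. The probability vectors are hypothetical. Clipping to `[eps, 1 - eps]` is an assumption here, a common safeguard against $\log 0$ rather than part of the formula.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Multi-class cross-entropy averaged over instances.

    y_true: list of integer class indices.
    y_prob: list of probability vectors, one per instance.
    With one-hot labels only the true-class probability contributes;
    it is clipped to [eps, 1 - eps] so log(0) never occurs.
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        p = min(max(probs[label], eps), 1 - eps)
        total += -math.log(p)
    return total / len(y_true)

# Hypothetical predictions; classes 0 = Cats, 1 = Dogs, 2 = Rabbits
y_true = [0, 1, 2]
confident_and_right = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.05, 0.05, 0.9]]
one_confident_miss = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.98, 0.01, 0.01]]

print(f"{log_loss(y_true, confident_and_right):.3f}")  # 0.145
print(f"{log_loss(y_true, one_confident_miss):.3f}")   # 1.645
```

A single confident mistake (0.01 on the true Rabbit class) raises the average loss more than tenfold, which illustrates the limitation above.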

### 3. Macro-Averaged Metrics

- **Macro-Averaged Precision/Recall/F1 Score:** These metrics calculate precision, recall, or F1 score for each class independently and then average these scores without considering class imbalance. This approach treats all classes equally.

    $\text{Macro Precision} = \frac {\sum_{i = 1}^C \text{Precision}_i}{C}$ where $C$ is the number of classes.


### 4. Weighted-Averaged Metrics

- **Weighted-Averaged Precision/Recall/F1 Score:** These metrics compute precision, recall, or F1 score for each class and then average them while weighting by the number of true instances for each class (support). This approach accounts for class imbalance.

    $\text{Weighted Precision} = \frac {\sum_{i = 1}^C \text{Precision}_i \times \text{Support}_i}{\sum_{i = 1}^C \text{Support}_i}$ where $\text{Support}_i$ is the number of actual instances of class $i$.
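To show weighted averaging on concrete numbers, the sketch below reuses the Cats/Dogs/Rabbits counts from the micro/macro example. Support is taken as TP + FN, the number of actual instances of each class: 37, 29 and 33, totalling 99. The helper name is illustrative.

```python
# (TP, FP, FN) per class from the Cats/Dogs/Rabbits example
counts = {
    "Cats": (30, 7, 7),
    "Dogs": (25, 7, 4),
    "Rabbits": (27, 3, 6),
}

def weighted_precision_recall(counts):
    """Average per-class precision/recall, weighting each class by its support (TP + FN)."""
    supports = {label: tp + fn for label, (tp, _, fn) in counts.items()}
    total_support = sum(supports.values())
    w_p = sum(tp / (tp + fp) * supports[label]
              for label, (tp, fp, _) in counts.items()) / total_support
    w_r = sum(tp / (tp + fn) * supports[label]
              for label, (tp, _, fn) in counts.items()) / total_support
    return w_p, w_r

w_p, w_r = weighted_precision_recall(counts)
print(f"Weighted precision: {w_p:.4f}")  # Weighted precision: 0.8319
print(f"Weighted recall:    {w_r:.4f}")  # Weighted recall:    0.8283
```

Weighted recall always equals micro recall, because weighting each $\text{TP}_i / \text{Support}_i$ by $\text{Support}_i$ simply recovers $\sum \text{TP}_i / \sum \text{Support}_i$. Weighted precision has no such identity, so here it differs from both the micro and macro values.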


![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)

# 2. Deep Learning Model Architectures

![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)

# 3. Deep Learning on Time Series Data

![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)

# 4. Practical Applications and Datasets

![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)

# 5. Additional Topics

![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)


In [None]:
# Deep Learning as subset of ML

from IPython import display
display.Image("data/images/ML.jpg")

![rainbow](https://github.com/ancilcleetus/My-Learning-Journey/assets/25684256/839c3524-2a1d-4779-85a0-83c562e1e5e5)