# Quick Assessment: Find Your Starting Level

## 🎯 Purpose

This quick assessment helps you determine which level of exercises to start with. It should take about **5-10 minutes** to complete.

## 📋 Instructions

1. **Answer honestly** - this is for your benefit!
2. **Don't look up answers** - use your current knowledge
3. **It's okay to not know** - that's why you're here to learn!
4. **Count your correct answers** at the end

## 🏁 Scoring Guide

- **0-3 correct**: Start with **Level 1** (Beginner)
- **4-6 correct**: Start with **Level 2** (Intermediate) 
- **7-9 correct**: Start with **Level 3** (Advanced)

---

## Question 1: Cross-Validation Basics

**Scenario**: You have a dataset with 1000 samples and want to use 5-fold cross-validation.

**Question**: How many samples will be in each training set?

A) 200 samples  
B) 800 samples  
C) 1000 samples  
D) 500 samples  

**Your Answer**: ___

<details>
<summary>Click for Answer & Explanation</summary>

**Correct Answer: B) 800 samples**

**Explanation**: In 5-fold CV, data is split into 5 equal parts (200 samples each). For each fold, 4 parts are used for training (4 × 200 = 800) and 1 part for testing (200).
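
You can confirm the fold sizes with scikit-learn's `KFold`. This is a minimal sketch that assumes scikit-learn and NumPy are installed; `X` is only a placeholder array with 1000 rows.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(1000).reshape(-1, 1)  # placeholder dataset with 1000 samples

kf = KFold(n_splits=5)
fold_sizes = []
for train_idx, test_idx in kf.split(X):
    # Each split yields the row indices used for training and for testing
    fold_sizes.append((len(train_idx), len(test_idx)))
    print(f"train: {len(train_idx)}, test: {len(test_idx)}")
```

Every one of the 5 splits trains on 800 rows and tests on the remaining 200.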

</details>

## Question 2: Confusion Matrix Interpretation

**Given this confusion matrix for a binary classifier**:

```
           Predicted
         0    1
Actual 0 85   15
       1 10   90
```

**Question**: What is the precision for class 1?

A) 90/100 = 0.90  
B) 90/105 = 0.86  
C) 85/95 = 0.89  
D) 90/200 = 0.45  

**Your Answer**: ___

<details>
<summary>Click for Answer & Explanation</summary>

**Correct Answer: B) 90/105 = 0.86**

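**Explanation**: Precision = True Positives / (True Positives + False Positives). For class 1, TP = 90 (actual 1, predicted 1) and FP = 15 (actual 0, predicted 1), so precision = 90 / (90 + 15) = 90/105 ≈ 0.86. (Option A, 90/100, is the recall for class 1.)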
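
Here is a small sketch that computes the same number two ways: directly from the matrix, and with scikit-learn's `precision_score` on label arrays rebuilt to reproduce the matrix. It assumes scikit-learn and NumPy; the rebuilt `y_true`/`y_pred` arrays are illustrative.

```python
import numpy as np
from sklearn.metrics import precision_score

# Rows = actual class, columns = predicted class (same layout as the matrix above)
cm = np.array([[85, 15],
               [10, 90]])

tp = cm[1, 1]  # actual 1, predicted 1
fp = cm[0, 1]  # actual 0, predicted 1
precision_manual = tp / (tp + fp)

# Rebuild label arrays whose confusion matrix is exactly `cm`
y_true = np.repeat([0, 0, 1, 1], [85, 15, 10, 90])
y_pred = np.repeat([0, 1, 0, 1], [85, 15, 10, 90])
precision_sklearn = precision_score(y_true, y_pred, pos_label=1)

print(f"manual: {precision_manual:.3f}, sklearn: {precision_sklearn:.3f}")
```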

</details>

## Question 3: Hyperparameter Tuning

**Question**: Which statement about Grid Search vs Random Search is TRUE?

A) Grid Search is always faster than Random Search  
B) Random Search explores all possible combinations  
C) Random Search can be more efficient when many parameters don't affect performance  
D) Grid Search is better for continuous parameters  

**Your Answer**: ___

<details>
<summary>Click for Answer & Explanation</summary>

**Correct Answer: C) Random Search can be more efficient when many parameters don't affect performance**

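**Explanation**: With a fixed budget of trials, a grid keeps re-testing the same few values of each parameter. When only one or two parameters really matter, most grid points are spent varying the irrelevant ones. Random Search draws a fresh value of every parameter on each trial, so it tries many more distinct values of the important parameters for the same cost.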
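
A small simulation makes the budget argument concrete. This sketch assumes a hypothetical setup: a budget of 9 trials over two parameters in [0, 1], where only the first parameter affects performance. It counts how many distinct values of that important parameter each strategy actually tests.

```python
import itertools

import numpy as np

budget = 9
rng = np.random.default_rng(0)

# Grid Search: a 3 x 3 grid uses the whole budget of 9 trials
grid_values = np.linspace(0.0, 1.0, 3)
grid_trials = list(itertools.product(grid_values, grid_values))

# Random Search: 9 independent draws for both parameters
random_trials = rng.uniform(0.0, 1.0, size=(budget, 2))

# In this hypothetical setup only parameter 0 matters, so count its distinct values
grid_distinct = len({trial[0] for trial in grid_trials})
random_distinct = len(np.unique(random_trials[:, 0]))

print(f"Grid:   {len(grid_trials)} trials, {grid_distinct} distinct values of the important parameter")
print(f"Random: {len(random_trials)} trials, {random_distinct} distinct values of the important parameter")
```

Both strategies spend 9 trials, but the grid explores only 3 settings of the parameter that matters, while random search explores 9.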

</details>

## Question 4: Metric Selection

**Scenario**: You're building a model to detect rare diseases (1% of population has the disease).

**Question**: Which metric would be MOST appropriate as your primary evaluation metric?

A) Accuracy  
B) Precision  
C) Recall  
D) F1-Score  

**Your Answer**: ___

<details>
<summary>Click for Answer & Explanation</summary>

**Correct Answer: C) Recall**

**Explanation**: For rare disease detection, missing a positive case (false negative) is much worse than a false alarm. High recall ensures we catch most disease cases, even if we have some false positives.
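
To see why accuracy is a poor primary metric here, consider a hypothetical group of 1000 people with a 1% disease rate and a useless model that always predicts "no disease". This is an illustrative sketch with synthetic labels, assuming scikit-learn.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# 1000 people, 10 of whom (1%) have the disease
y_true = np.array([1] * 10 + [0] * 990)

# A useless model that predicts "no disease" for everyone
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)
print(f"Accuracy: {acc:.2f}, Recall: {rec:.2f}")
```

Accuracy looks excellent (0.99), while recall (0.00) shows that the model catches none of the actual cases.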

</details>

## Question 5: Data Leakage

**Question**: Which of these would cause data leakage in cross-validation?

A) Using different random seeds for each fold  
B) Scaling features before splitting the data  
C) Using stratified sampling  
D) Setting random_state parameter  

**Your Answer**: ___

<details>
<summary>Click for Answer & Explanation</summary>

**Correct Answer: B) Scaling features before splitting the data**

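**Explanation**: If you fit the scaler on the full dataset before splitting, its mean and standard deviation are computed partly from the rows that later serve as test data, so information from the test folds leaks into training. Fit the scaler on the training portion of each CV fold only, for example by placing it inside a scikit-learn `Pipeline`.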
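
A minimal sketch of the leak-free pattern, assuming scikit-learn and a synthetic dataset from `make_classification`. The leaky version is shown only as comments for contrast; the `Pipeline` makes `cross_val_score` refit the scaler on the training part of every fold.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Leaky: the scaler sees every row, including rows later used for testing
# X_scaled = StandardScaler().fit_transform(X)
# cross_val_score(LogisticRegression(), X_scaled, y, cv=5)

# Leak-free: the pipeline fits the scaler on each fold's training rows only
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(pipeline, X, y, cv=5)
print(f"Fold accuracies: {scores.round(3)}, mean: {scores.mean():.3f}")
```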

</details>

## Question 6: ROC Curve Understanding

**Question**: What does the Area Under the ROC Curve (AUC) represent?

A) The probability that the model ranks a random positive example higher than a random negative example  
B) The accuracy of the model at the optimal threshold  
C) The precision of the model across all thresholds  
D) The recall of the model at 50% threshold  

**Your Answer**: ___

<details>
<summary>Click for Answer & Explanation</summary>

**Correct Answer: A) The probability that the model ranks a random positive example higher than a random negative example**

**Explanation**: AUC-ROC measures the model's ability to distinguish between classes. It represents the probability that a randomly chosen positive instance will be ranked higher than a randomly chosen negative instance.
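
The ranking interpretation can be checked directly: compare every positive score with every negative score and count how often the positive one is higher, with ties counting as half. This sketch uses a small made-up set of labels and scores and compares the result with scikit-learn's `roc_auc_score`.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up labels and model scores for 8 examples
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9])

pos = y_score[y_true == 1]
neg = y_score[y_true == 0]

# All positive-vs-negative pairs: a win counts 1, a tie counts 0.5
diff = pos[:, None] - neg[None, :]
pairwise_auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

sk_auc = roc_auc_score(y_true, y_score)
print(f"pairwise: {pairwise_auc:.3f}, roc_auc_score: {sk_auc:.3f}")
```

Here 14 of the 16 positive/negative pairs are ranked correctly, so both computations give 0.875.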

</details>

## Question 7: Nested Cross-Validation

**Question**: Why would you use nested cross-validation?

A) To speed up hyperparameter tuning  
B) To get an unbiased estimate of model performance when doing hyperparameter tuning  
C) To reduce overfitting to the training set  
D) To handle imbalanced datasets better  

**Your Answer**: ___

<details>
<summary>Click for Answer & Explanation</summary>

**Correct Answer: B) To get an unbiased estimate of model performance when doing hyperparameter tuning**

**Explanation**: Nested CV separates hyperparameter optimization from performance estimation, preventing optimistic bias that occurs when the same data is used for both tuning and evaluation.
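
In scikit-learn, nested CV can be written by passing a `GridSearchCV` object (the inner loop) to `cross_val_score` (the outer loop). This sketch assumes the Iris dataset, an `SVC` classifier, and a small grid over `C`; all of these are arbitrary illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Inner loop: GridSearchCV tunes C using only the training part of each outer fold
tuned_model = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=inner_cv)

# Outer loop: each outer test fold is never seen during tuning
nested_scores = cross_val_score(tuned_model, X, y, cv=outer_cv)
print(f"Nested CV accuracy: {nested_scores.mean():.3f} +/- {nested_scores.std():.3f}")
```

The mean of `nested_scores` estimates how well the whole "tune, then fit" procedure generalizes, rather than how well one lucky hyperparameter setting fits the data it was chosen on.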

</details>

## Question 8: Learning Curves

**Question**: If training score is high but validation score is low and flat, this indicates:

A) Underfitting - need more complex model  
B) Overfitting - need more data or regularization  
C) Good fit - model is optimal  
D) Data quality issues  

**Your Answer**: ___

<details>
<summary>Click for Answer & Explanation</summary>

**Correct Answer: B) Overfitting - need more data or regularization**

**Explanation**: High training score with low, flat validation score indicates the model memorizes training data but doesn't generalize. This is classic overfitting.
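
You can reproduce this pattern with scikit-learn's `learning_curve`. This sketch assumes a synthetic dataset with 10% label noise (`flip_y=0.1`) and an unconstrained decision tree, a hypothetical setup chosen because such a tree memorizes its training data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

# flip_y adds label noise, which an unconstrained tree will memorize
X, y = make_classification(n_samples=600, n_features=20, flip_y=0.1, random_state=0)

train_sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(random_state=0),  # no depth limit: fits training data perfectly
    X,
    y,
    train_sizes=np.linspace(0.2, 1.0, 5),
    cv=5,
)

# Average over the 5 CV folds for each training-set size
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)
for n, tr, va in zip(train_sizes, train_mean, val_mean):
    print(f"n={int(n):4d}  train={tr:.2f}  validation={va:.2f}")
```

The training score stays at 1.00 for every size while the validation score remains well below it; limiting `max_depth` (regularization) is one way to narrow that gap.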

</details>

## Question 9: Production Considerations

**Question**: In production, which is MOST important for monitoring model performance?

A) Training accuracy  
B) Cross-validation scores  
C) Distribution drift in input features  
D) Model complexity  

**Your Answer**: ___

<details>
<summary>Click for Answer & Explanation</summary>

**Correct Answer: C) Distribution drift in input features**

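**Explanation**: After deployment, the distribution of incoming data can shift over time. Because ground-truth labels often arrive late (or not at all) in production, monitoring feature drift gives an early warning that model performance may be degrading. Training accuracy and cross-validation scores are fixed at training time and cannot reveal this.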
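
One common check for drift in a single numeric feature is a two-sample Kolmogorov-Smirnov test comparing training-time values with recent production values. This is a sketch of one possible drift check, not the only one; it assumes SciPy, synthetic data with a deliberately shifted mean, and an arbitrary alert threshold of 0.01.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Hypothetical feature values: training data vs. what arrives in production
train_feature = rng.normal(loc=0.0, scale=1.0, size=1000)
prod_feature = rng.normal(loc=0.5, scale=1.0, size=1000)  # the mean has drifted by 0.5

result = ks_2samp(train_feature, prod_feature)
drift_detected = result.pvalue < 0.01  # alert threshold is an arbitrary choice
print(f"KS statistic: {result.statistic:.3f}, p-value: {result.pvalue:.2e}, drift: {drift_detected}")
```

In practice you would run a check like this per feature on a schedule, and treat an alert as a prompt to investigate or retrain rather than as proof that accuracy has dropped.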

</details>

---

## 📊 Calculate Your Score

Count how many questions you answered correctly:

```python
# Enter your answers here (use the letter: A, B, C, or D)
your_answers = {
    'Q1': '',  # Your answer for Question 1
    'Q2': '',  # Your answer for Question 2
    'Q3': '',  # Your answer for Question 3
    'Q4': '',  # Your answer for Question 4
    'Q5': '',  # Your answer for Question 5
    'Q6': '',  # Your answer for Question 6
    'Q7': '',  # Your answer for Question 7
    'Q8': '',  # Your answer for Question 8
    'Q9': '',  # Your answer for Question 9
}

# Correct answers
correct_answers = {
    'Q1': 'B',
    'Q2': 'B',
    'Q3': 'C',
    'Q4': 'C',
    'Q5': 'B',
    'Q6': 'A',
    'Q7': 'B',
    'Q8': 'B',
    'Q9': 'C',
}

# Calculate score (ignore stray spaces and letter case in your answers)
score = 0
for q, correct in correct_answers.items():
    answer = your_answers.get(q, '').strip().upper()
    if answer == correct:
        score += 1
        print(f"✅ {q}: Correct!")
    else:
        print(f"❌ {q}: Your answer: {answer or '(blank)'}, Correct: {correct}")

total = len(correct_answers)
print(f"\n🎯 Your Score: {score}/{total} ({score / total * 100:.1f}%)")

# Recommendation (thresholds match the Scoring Guide)
if score <= 3:
    level = "Level 1 (Beginner)"
    recommendation = "Start with the fundamentals! Level 1 will build your foundation."
elif score <= 6:
    level = "Level 2 (Intermediate)"
    recommendation = "You have good basics! Level 2 will help you integrate concepts."
else:
    level = "Level 3 (Advanced)"
    recommendation = "Strong foundation! Jump to Level 3 for real-world challenges."

print(f"\n🚀 Recommended Starting Level: {level}")
print(f"💡 {recommendation}")
```

## 🎯 Your Learning Path

Based on your score, here's your recommended path:

### If you scored 0-3 (Level 1 - Beginner)
**Perfect!** You're exactly where you should be. Start with:
1. [Basic Cross-Validation](./level1_basic_cross_validation.ipynb)
2. [Understanding Metrics](./level1_understanding_metrics.ipynb)
3. [Simple Hyperparameter Tuning](./level1_simple_hyperparameter_tuning.ipynb)

**Focus**: Take your time with each concept. Understanding is more important than speed.

### If you scored 4-6 (Level 2 - Intermediate)
**Great foundation!** You can start with intermediate exercises:
1. [Complete Model Evaluation Pipeline](./level2_complete_evaluation_pipeline.ipynb)
2. [Advanced Hyperparameter Optimization](./level2_advanced_hyperparameter_optimization.ipynb)
3. [Model Selection and Validation](./level2_model_selection_validation.ipynb)

**Tip**: If any Level 2 exercise feels too challenging, don't hesitate to review Level 1 materials.

### If you scored 7-9 (Level 3 - Advanced)
**Excellent knowledge!** Jump into real-world scenarios:
1. [Healthcare Diagnostic Model](./level3_healthcare_diagnostic.ipynb)
2. [Financial Risk Assessment](./level3_financial_risk_assessment.ipynb)
3. [Production Model Monitoring](./level3_production_monitoring.ipynb)

**Challenge**: Try to complete exercises without looking at hints first.

## 📚 Additional Resources

Regardless of your level, these resources will help:

- **[Main Tutorial](../tutorial.ipynb)**: Comprehensive walkthrough
- **[Cross-Validation Guide](../cross-validation.md)**: Deep dive into CV techniques
- **[Hyperparameter Tuning Guide](../hyperparameter-tuning.md)**: Advanced optimization strategies
- **[Metrics Guide](../metrics.md)**: Complete metrics reference

## 🎉 Ready to Start?

Click on your recommended exercise above to begin your learning journey!

Remember: **Learning is a journey, not a race.** Take your time, ask questions, and enjoy the process! 🚀