# Diagnostic Testing Accuracy Practice

A healthcare provider is evaluating a new blood test designed to detect a particular disease. Your task is to analyze the performance of this test using simulated data.

In [7]:
import pandas as pd
import numpy as np
from simulate import DiagnosticSimulator

np.random.seed(30)

## Your Task

1. Understand what this simulated dataset consists of
2. Build a confusion matrix
3. Compute test performance metrics
4. Reflection

In [8]:
sim = DiagnosticSimulator()
data = sim.simulate()

### Understand what this simulated dataset consists of

### Build a confusion matrix

**Note:** You are building this from scratch. Do not use external libraries to create the confusion matrix.
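As a starting point, here is a minimal sketch of counting the four confusion-matrix cells in plain Python. The `actual` and `predicted` lists are hypothetical toy data (1 = disease present / test positive, 0 = disease absent / test negative). The structure of the simulated `data` is not shown here, so you will need to adapt the inputs to whatever it actually contains.

```python
# Hypothetical toy labels, not the simulator's output.
# 1 = disease present / test positive, 0 = disease absent / test negative.
actual    = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]


def confusion_counts(actual, predicted):
    """Count TP, FP, FN, TN by comparing each true label with the test result."""
    tp = fp = fn = tn = 0
    for truth, result in zip(actual, predicted):
        if truth == 1 and result == 1:
            tp += 1  # diseased, test positive
        elif truth == 0 and result == 1:
            fp += 1  # healthy, test positive
        elif truth == 1 and result == 0:
            fn += 1  # diseased, test negative
        else:
            tn += 1  # healthy, test negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


counts = confusion_counts(actual, predicted)
print(counts)
```

The four counts should always sum to the number of patients, which is a quick sanity check worth running on your own result.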

### Compute test performance metrics

- Sensitivity
- Specificity
- Positive Predictive Value
- Negative Predictive Value
- Likelihood Ratios (LR+ and LR-)
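The metrics listed above follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions. The function name `diagnostic_metrics` and the example counts are made up for illustration and do not come from the simulator. It does not guard against division by zero, which can occur if a row or column of the matrix is empty or if specificity is exactly 0 or 1.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute standard diagnostic test metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # P(test positive | diseased)
    specificity = tn / (tn + fp)  # P(test negative | healthy)
    ppv = tp / (tp + fp)          # P(diseased | test positive)
    npv = tn / (tn + fn)          # P(healthy | test negative)
    lr_pos = sensitivity / (1 - specificity)  # LR+
    lr_neg = (1 - sensitivity) / specificity  # LR-
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }


# Hypothetical counts chosen for easy arithmetic, not taken from the simulation.
metrics = diagnostic_metrics(tp=90, fp=30, fn=10, tn=170)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```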

### Reflection

1. Based on sensitivity and specificity, how accurate is the test?

2. If a patient tests positive, how confident can we be that they truly have the disease?

3. How would the predictive values change if the disease were rare (e.g., <5% prevalence)?

4. Which metric would be most important for:
    - A screening test for early detection
    - A confirmatory test to finalize diagnosis

### Optional Exercise

Try rerunning the simulation with a different prevalence (e.g., 5%, 50%, 70%) and observe how the PPV and NPV change, even if sensitivity and specificity remain the same.

In [3]:
# reset the seed to the same value used earlier
np.random.seed(30)

# pass additional arguments to simulator
sim = DiagnosticSimulator(n_patients=1000, prevalence=0.384)
data = sim.simulate()
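
To see why PPV and NPV shift with prevalence while sensitivity and specificity stay fixed, it can help to compute the predictive values analytically with Bayes' theorem before rerunning the simulation. The sketch below is independent of `DiagnosticSimulator`. The function name `predictive_values` and the sensitivity and specificity values (0.90 and 0.85) are assumptions picked for illustration, not properties of the simulated test.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    # Expected fraction of the population in each confusion-matrix cell.
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Assumed test characteristics, held constant across prevalence levels.
SENSITIVITY = 0.90
SPECIFICITY = 0.85

results = {}
for prevalence in (0.05, 0.50, 0.70):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    results[prevalence] = (ppv, npv)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With these assumed values, PPV rises and NPV falls as prevalence increases, which is the pattern to look for when comparing your simulated runs.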