In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, RocCurveDisplay, ConfusionMatrixDisplay

## Data Loading

In [2]:
data = pd.read_csv("Autism_Prediction/train.csv")

In [4]:
data.head()

Unnamed: 0,ID,A1_Score,A2_Score,A3_Score,A4_Score,A5_Score,A6_Score,A7_Score,A8_Score,A9_Score,...,gender,ethnicity,jaundice,austim,contry_of_res,used_app_before,result,age_desc,relation,Class/ASD
0,1,1,0,1,0,1,0,1,0,1,...,f,?,no,no,Austria,no,6.351166,18 and more,Self,0
1,2,0,0,0,0,0,0,0,0,0,...,m,?,no,no,India,no,2.255185,18 and more,Self,0
2,3,1,1,1,1,1,1,1,1,1,...,m,White-European,no,yes,United States,no,14.851484,18 and more,Self,1
3,4,0,0,0,0,0,0,0,0,0,...,f,?,no,no,United States,no,2.276617,18 and more,Self,0
4,5,0,0,0,0,0,0,0,0,0,...,m,?,no,no,South Africa,no,-4.777286,18 and more,Self,0


In [6]:
scores = ['A1_Score', 'A2_Score', 'A3_Score', 'A4_Score', 'A5_Score', 'A6_Score',
          'A7_Score', 'A8_Score', 'A9_Score', 'A10_Score']

In [9]:
x_train = data[scores]
y_train = data["Class/ASD"].values

In [12]:
y_train.shape
x_train.shape

(800, 10)

## Model Fitting

In [40]:
### Your code here

## Model Validation

## Understanding the ROC Curve

The **Receiver Operating Characteristic (ROC)** curve is a graphical representation used to evaluate the performance of a binary classification model. It plots the **True Positive Rate (TPR)** against the **False Positive Rate (FPR)** as the decision threshold varies. Below is an illustration of the ROC curve:


<img src="https://upload.wikimedia.org/wikipedia/commons/1/13/Roc_curve.svg" alt="ROC Curve" width="400"/>


- **True Positive Rate (TPR)**: Also known as sensitivity or recall, it measures the proportion of actual positives correctly identified.

$$
  TPR = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}
$$

- **False Positive Rate (FPR)**: It measures the proportion of actual negatives that were incorrectly classified as positive.
$$
  FPR = \frac{\text{False Positives}}{\text{False Positives} + \text{True Negatives}}
$$
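
To make the two definitions concrete, here is a minimal sketch that computes TPR and FPR at a few thresholds. The helper `tpr_fpr` and the labels and scores are made up for illustration; they do not come from the autism dataset.

```python
import numpy as np

def tpr_fpr(y_true, y_score, threshold):
    """Return (TPR, FPR) when scores >= threshold are predicted positive.

    Hypothetical helper; assumes y_true contains both classes.
    """
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return tp / (tp + fn), fp / (fp + tn)

# Made-up labels and scores, for illustration only
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]

for t in (0.3, 0.5, 0.75):
    tpr, fpr = tpr_fpr(y_true, y_score, t)
    print(f"threshold={t}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```

Each threshold yields one (FPR, TPR) point; sweeping the threshold over all score values traces out the ROC curve.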


### Interpretation
The **Area Under the Curve (AUC)** summarizes the ROC curve as a single number between 0 and 1.
- A model with an AUC close to 1.0 is highly capable of distinguishing between classes.
- A model with an AUC near 0.5 performs no better than random chance.

The ROC curve is useful for visualizing the trade-off between sensitivity (TPR) and specificity ($1 - FPR$) across different thresholds.
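
As a rough illustration of these two regimes, the sketch below compares the AUC (area under the ROC curve) of an informative score with that of a purely random score. The data is synthetic and the variable names are chosen for this example only; `roc_auc_score` and `roc_curve` are the scikit-learn functions used.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)

# Synthetic binary labels (not the autism dataset)
y_true = rng.integers(0, 2, size=200)

# An informative score: the label plus Gaussian noise
informative = y_true + rng.normal(scale=0.5, size=200)

# An uninformative score: independent of the label
random_scores = rng.random(200)

auc_informative = roc_auc_score(y_true, informative)
auc_random = roc_auc_score(y_true, random_scores)
print(f"informative score AUC: {auc_informative:.3f}")
print(f"random score AUC:      {auc_random:.3f}")

# roc_curve returns the (FPR, TPR) points that make up the curve
fpr, tpr, thresholds = roc_curve(y_true, informative)
```

The informative score should land well above 0.5, while the random score should stay close to it; the exact values depend on the seed and sample size.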


In [85]:
### Your code here

In [None]:
display = RocCurveDisplay.from_predictions(
    ### Your code here
    color="#0000AA",
    plot_chance_level=True,
)

_ = display.ax_.set(
    xlabel="False Positive Rate",
    ylabel="True Positive Rate",
)

## Understanding the Confusion Matrix

The **Confusion Matrix** is a table that summarizes the performance of a classification model by comparing the predicted labels with the actual labels. It provides a detailed breakdown of correct and incorrect classifications for each class.

### Structure of a Confusion Matrix:
|                | **Predicted: Positive** | **Predicted: Negative** |
|----------------|--------------------------|--------------------------|
| **Actual: Positive** | True Positive (TP)        | False Negative (FN)        |
| **Actual: Negative** | False Positive (FP)       | True Negative (TN)         |

### Terms to remember:
1. **True Positive (TP)**: Cases where the model correctly predicts the positive class.
2. **False Negative (FN)**: Cases where the model fails to predict the positive class (misses a positive case).
3. **False Positive (FP)**: Cases where the model incorrectly predicts the positive class (false alarm).
4. **True Negative (TN)**: Cases where the model correctly predicts the negative class.
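
One practical caveat when reading these four cells off scikit-learn output: `confusion_matrix` orders rows and columns by sorted label value, so with 0/1 labels the default layout is `[[TN, FP], [FN, TP]]`, not the positive-first layout of the table above. The sketch below uses made-up labels to show both orderings.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels and predictions, for illustration only
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]

# Default ordering: rows are actual 0 then 1, columns are predicted 0 then 1
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")

# Passing labels=[1, 0] puts the positive class first,
# matching the layout of the table above: [[TP, FN], [FP, TN]]
cm_positive_first = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm_positive_first)
```

The same `labels` argument is accepted by `ConfusionMatrixDisplay.from_predictions` if the plot should follow the positive-first convention.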

### Some Metrics Derived from the Confusion Matrix:
- **Accuracy**: The overall correctness of the model.
$$
  \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
$$

- **Precision**: The proportion of positive predictions that are actually correct.
$$
  \text{Precision} = \frac{TP}{TP + FP}
$$

- **Recall (Sensitivity)**: The proportion of actual positives that are correctly identified.
$$
  \text{Recall} = \frac{TP}{TP + FN}
$$

- **F1-Score**: The harmonic mean of precision and recall.
$$
  \text{F1-Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
$$
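
The four formulas can be checked by hand on a small example. The following sketch computes them directly from the counts; the function name `classification_metrics` and the counts are hypothetical, and the code assumes every denominator is nonzero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    Hypothetical helper; assumes all denominators are nonzero.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Made-up counts, for illustration only
metrics = classification_metrics(tp=3, fp=2, fn=1, tn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In practice, scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` compute the same quantities directly from label arrays.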


In [3]:
### Your code here

## Using the "Clinical Score"

In [4]:
### Your code here

### Going Deep With Neural Networks