## Quadratic Discriminant Analysis (QDA)

Quadratic Discriminant Analysis (QDA) is a classification technique that extends Linear Discriminant Analysis (LDA) by allowing **quadratic decision boundaries**. Unlike LDA, which assumes a **shared covariance matrix** across classes, QDA estimates a **separate covariance matrix** for each class. This makes QDA more flexible for capturing complex, non-linear relationships in data.

### How It Works:
- **Class-Specific Covariance Matrices**: Unlike LDA, which assumes identical covariance across all classes, QDA allows each class to have its own covariance structure.
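- **Non-Linear Decision Boundaries**: Because the class covariance matrices differ, the quadratic terms in the log-posterior do not cancel between classes, so the decision surface is quadratic rather than linear.
- **Bayes' Theorem**: Like LDA, QDA models each class as a multivariate Gaussian and applies Bayes' theorem to compute the posterior probability that a data point belongs to each class; the class with the highest posterior is predicted.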
- **No Dimensionality Reduction**: Unlike LDA, QDA does not project data onto a lower-dimensional space.
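
The steps above can be sketched in plain NumPy. This is a minimal illustration, not the scikit-learn implementation: the helper names (`fit_qda`, `discriminant_scores`, `predict_qda`) and the synthetic two-class data are assumptions made for the example. Each class gets its own prior, mean, and covariance, and a point is assigned to the class with the largest score `log(prior) - 0.5 * log|cov| - 0.5 * (x - mean)^T cov^{-1} (x - mean)`.

```python
import numpy as np

def fit_qda(X, y):
    """Estimate a prior, mean, and covariance matrix for each class."""
    params = {}
    for label in np.unique(y):
        X_k = X[y == label]
        params[label] = {
            "prior": len(X_k) / len(X),
            "mean": X_k.mean(axis=0),
            "cov": np.cov(X_k, rowvar=False),  # separate covariance per class
        }
    return params

def discriminant_scores(X, params):
    """Return an (n_samples, n_classes) array of quadratic discriminant scores."""
    scores = []
    for p in params.values():
        diff = X - p["mean"]
        _, logdet = np.linalg.slogdet(p["cov"])
        # Row-wise Mahalanobis term: (x - mean)^T cov^{-1} (x - mean)
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(p["cov"]), diff)
        scores.append(np.log(p["prior"]) - 0.5 * logdet - 0.5 * maha)
    return np.column_stack(scores)

def predict_qda(X, params):
    """Pick the class with the highest discriminant score."""
    labels = np.array(list(params.keys()))
    return labels[np.argmax(discriminant_scores(X, params), axis=1)]

# Synthetic data (hypothetical): two Gaussian classes with different spreads.
rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(200, 2))
X1 = rng.normal(3.0, 0.5, size=(200, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

params = fit_qda(X, y)
preds = predict_qda(X, params)
train_accuracy = (preds == y).mean()
```

Because the two classes have different covariances, the `logdet` and Mahalanobis terms differ per class, which is what bends the boundary into a quadratic curve.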

### Advantages:
✅ **Captures Non-Linear Patterns**: More flexible than LDA as it allows quadratic decision boundaries.  
✅ **Better for Complex Data**: Works well when class distributions have different covariance structures.  
✅ **No Assumption of Identical Covariance**: Each class has its own covariance matrix, which can give better-fitting decision boundaries when the class covariances truly differ.  
✅ **Effective for Well-Separated Data**: Performs well when the classes are distinct but differ in spread or orientation, so that no single linear boundary separates them.  

### Disadvantages:
❌ **Requires More Data**: Since QDA estimates a separate covariance matrix for each class, it needs **more training data** to avoid overfitting.  
❌ **Sensitive to Outliers**: Outliers can significantly distort the covariance estimates, affecting classification performance.  
❌ **Computationally Expensive**: The number of covariance parameters grows linearly with the number of classes and quadratically with the number of features, which increases estimation cost in high-dimensional datasets.  
❌ **Not Always Better than LDA**: If the true class boundaries are linear, LDA may outperform QDA due to its simpler assumptions.  
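
To make the "requires more data" point concrete, the sketch below counts the free covariance parameters each model has to estimate. The function name and the example sizes (20 features, 3 classes) are illustrative assumptions, not values taken from this project. A symmetric `d x d` covariance matrix has `d * (d + 1) / 2` free entries; LDA estimates one such matrix, QDA one per class.

```python
def covariance_param_count(n_features: int, n_classes: int) -> dict:
    """Count free covariance parameters for LDA (shared) and QDA (per class)."""
    per_matrix = n_features * (n_features + 1) // 2  # symmetric matrix entries
    return {"lda": per_matrix, "qda": n_classes * per_matrix}

# Hypothetical sizes chosen only for illustration.
counts = covariance_param_count(n_features=20, n_classes=3)
print(counts)  # {'lda': 210, 'qda': 630}
```

With the same number of training rows, QDA here has three times as many covariance entries to estimate, which is why it tends to overfit sooner on small datasets.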



In [24]:
# Import all the necessary dependencies
import pandas as pd
import numpy as np
from pathlib import Path
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import cross_val_score, KFold

In [25]:
%run ../Data/Data_Formatting.ipynb

In [26]:
%run ../Data/Ultimate_Hyperparameters.ipynb

In [27]:
%run ../Data/Parameters.ipynb

In [28]:
%pip install scikit-learn

Note: you may need to restart the kernel to use updated packages.


In [29]:
# Load the training dataset
train_path = Path("../Data/premierleague_team_data.csv")
matches = pd.read_csv(train_path)

# Load the testing dataset
test_path = Path("../Data/premierleague_test_team_data.csv")
test_matches = pd.read_csv(test_path)

In [30]:
# Load the training dataset that includes rank
train_path = Path("../Data/premierleague_rank_team_data.csv")
new_matches = pd.read_csv(train_path)

# Load the testing dataset that includes rank
test_path = Path("../Data/premierleague_rank_test_team_data.csv")
new_test_matches = pd.read_csv(test_path)

In [31]:
process_data(matches, test_matches)

In [32]:
process_data(new_matches, new_test_matches)

### QDA using Baseline Predictors (see /Data/Data_Formatting.ipynb)

In [33]:
def make_yearly_predictions_qda_base(Train, Test):
    static_predictors = parameters_base(Train, Test)

    # Train a QDA model
    qda_clf = QuadraticDiscriminantAnalysis()
    qda_clf.fit(Train[static_predictors], Train["Target"])

    results = []
    for year in range(Test['Date'].dt.year.min(), Test['Date'].dt.year.max() + 1):
        test_year = Test[Test['Date'].dt.year == year]
        if not test_year.empty:
            # Predict on test data
            preds = qda_clf.predict(test_year[static_predictors])

            # Calculate precision and accuracy
            precision = precision_score(test_year["Target"], preds, average="weighted", zero_division=1)
            accuracy = accuracy_score(test_year["Target"], preds)

            # Append results to list
            results.append({
                "Model": "Quadratic Discriminant Analysis",
                "Year": year,
                "Precision": precision,
                "Accuracy": accuracy
            })

    # Convert results to DataFrame
    results_df = pd.DataFrame(results)
    
    return results_df


### QDA using Baseline Predictors + Rolling Predictors (see /Data/Data_Formatting.ipynb)

In [34]:
def make_yearly_predictions_qda_roll(Train, Test):

    all_predictors = parameters_roll(Train,Test)
    Train = roll(Train)
    Test  = roll(Test)

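    # Train a QDA model on the same predictors that are used for prediction below.
    # Note: not every scikit-learn release accepts `solver` here; drop it if the constructor raises a TypeError.
    qda_clf = QuadraticDiscriminantAnalysis(solver="lsqr")
    qda_clf.fit(Train[all_predictors], Train["Target"])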

    results = []
    for year in range(Test['Date'].dt.year.min(), Test['Date'].dt.year.max() + 1):
        test_year = Test[Test['Date'].dt.year == year]
        if not test_year.empty:
            # Predict on test data
            preds = qda_clf.predict(test_year[all_predictors])

            # Calculate precision and accuracy
            precision = precision_score(test_year["Target"], preds, average="weighted", zero_division=1)
            accuracy = accuracy_score(test_year["Target"], preds)

            # Append results to list
            results.append({
                "Model": "Quadratic Discriminant Analysis",
                "Year": year,
                "Precision": precision,
                "Accuracy": accuracy
            })

    # Convert results to DataFrame
    results_df = pd.DataFrame(results)
    
    return results_df


### QDA using Full Feature Set (see /Data/Data_Formatting.ipynb)

In [35]:
def make_yearly_predictions_qda_full(Train, Test):

    all_predictors = parameters_full(Train,Test)
    Train = roll(Train)
    Test  = roll(Test)

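    # Train a QDA model on the same predictors that are used for prediction below.
    # Note: not every scikit-learn release accepts `solver` here; drop it if the constructor raises a TypeError.
    qda_clf = QuadraticDiscriminantAnalysis(solver="lsqr")
    qda_clf.fit(Train[all_predictors], Train["Target"])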

    results = []
    for year in range(Test['Date'].dt.year.min(), Test['Date'].dt.year.max() + 1):
        test_year = Test[Test['Date'].dt.year == year]
        if not test_year.empty:
            # Predict on test data
            preds = qda_clf.predict(test_year[all_predictors])

            # Calculate precision and accuracy
            precision = precision_score(test_year["Target"], preds, average="weighted", zero_division=1)
            accuracy = accuracy_score(test_year["Target"], preds)

            # Append results to list
            results.append({
                "Model": "Quadratic Discriminant Analysis",
                "Year": year,
                "Precision": precision,
                "Accuracy": accuracy
            })

    # Convert results to DataFrame
    results_df = pd.DataFrame(results)
    
    return results_df
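
The per-year evaluation loop shared by the three functions above can be tried without the project data. The sketch below is only an illustration under stated assumptions: `make_toy_matches`, the columns `feat_a` and `feat_b`, and the synthetic dates are all hypothetical stand-ins for the real match tables and for the predictor lists returned by the `parameters_*` helpers. It also uses `reg_param`, scikit-learn's built-in regularization for the per-class covariance estimates, as one possible way to stabilize them.

```python
import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import accuracy_score, precision_score

def make_toy_matches(n_rows, start, seed):
    """Build a hypothetical match table; every column here is made up."""
    rng = np.random.default_rng(seed)
    target = rng.integers(0, 2, size=n_rows)
    return pd.DataFrame({
        "Date": pd.date_range(start, periods=n_rows, freq="D"),
        "feat_a": rng.normal(loc=target * 2.0, scale=1.0),   # class-dependent mean
        "feat_b": rng.normal(loc=0.0, scale=1.0 + target),   # class-dependent spread
        "Target": target,
    })

toy_train = make_toy_matches(600, "2020-01-01", seed=0)
toy_test = make_toy_matches(500, "2022-01-01", seed=1)
predictors = ["feat_a", "feat_b"]  # hypothetical predictor list

# Fit once on the training table, using the same columns later for prediction.
clf = QuadraticDiscriminantAnalysis(reg_param=0.1)
clf.fit(toy_train[predictors], toy_train["Target"])

# Score each calendar year of the test table separately.
rows = []
for year, test_year in toy_test.groupby(toy_test["Date"].dt.year):
    preds = clf.predict(test_year[predictors])
    rows.append({
        "Model": "Quadratic Discriminant Analysis",
        "Year": year,
        "Precision": precision_score(
            test_year["Target"], preds, average="weighted", zero_division=1
        ),
        "Accuracy": accuracy_score(test_year["Target"], preds),
    })

toy_results = pd.DataFrame(rows)
print(toy_results)
```

Grouping by `Date.dt.year` visits only the years that actually occur in the test table, so no empty-year check is needed in this variant.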