## Linear Discriminant Analysis (LDA)

Linear Discriminant Analysis (LDA) is a dimensionality reduction and classification technique used in machine learning. It aims to find the linear combination of features that best separates two or more classes. Unlike logistic regression, which models the class probabilities directly, LDA models the distribution of the features within each class and maximizes class separability by projecting data onto a lower-dimensional space.
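
The separability criterion can be written explicitly. For the two-class case, a common formulation (Fisher's criterion) is sketched below. The symbols $w$ (projection direction), $\mu_k$ (class means), $C_k$ (the set of samples in class $k$), $S_B$ and $S_W$ (between-class and within-class scatter) are notation introduced here for illustration, not taken from the notebook.

```latex
J(w) = \frac{w^\top S_B \, w}{w^\top S_W \, w}, \qquad
S_B = (\mu_1 - \mu_0)(\mu_1 - \mu_0)^\top, \qquad
S_W = \sum_{k \in \{0, 1\}} \sum_{i \in C_k} (x_i - \mu_k)(x_i - \mu_k)^\top

w^\ast \propto S_W^{-1} (\mu_1 - \mu_0)
```

The direction $w^\ast$ maximizes $J(w)$: it pushes the projected class means apart while keeping the projected spread within each class small.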

### How It Works:
- **Assumption of Normality**: Assumes that, within each class, the features follow a multivariate Gaussian distribution with a covariance matrix shared across classes.
- **Class Separation**: Finds a linear decision boundary by maximizing the ratio of between-class variance to within-class variance.
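- **Bayes' Theorem**: Combines the class-conditional densities with the class priors through Bayes' theorem to estimate the probability that a data point belongs to each class, then assigns it to the class with the highest probability.
- **Dimensionality Reduction**: Projects the data onto at most C − 1 discriminant directions (for C classes), reducing the number of features while retaining the most discriminative information.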
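
The steps above can be tried on a small synthetic problem. The following is a minimal sketch, assuming scikit-learn's `LinearDiscriminantAnalysis` with its default `svd` solver. The two Gaussian classes, their means, and the shared covariance are made-up illustration data, not the notebook's dataset.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Illustration data (assumption): two Gaussian classes with a shared covariance
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.3], [0.3, 1.0]])
X = np.vstack([
    rng.multivariate_normal([0.0, 0.0], cov, size=200),
    rng.multivariate_normal([2.0, 2.0], cov, size=200),
])
y = np.array([0] * 200 + [1] * 200)

lda = LinearDiscriminantAnalysis()  # default solver="svd"
lda.fit(X, y)

posteriors = lda.predict_proba(X)  # class probabilities from Bayes' theorem
labels = lda.predict(X)            # class with the highest posterior
projected = lda.transform(X)       # projection onto C - 1 = 1 discriminant axis
train_accuracy = (labels == y).mean()
```

With two classes the projection has a single column, which matches the C − 1 limit on discriminant directions.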

### Advantages:
✅ **Handles Multi-Class Problems**: Unlike binary logistic regression, which needs a multinomial or one-vs-rest extension, LDA handles multiple classes directly.  
✅ **Effective for Linearly Separable Data**: Works well when the class distributions have distinct means.  
✅ **Reduces Overfitting**: By projecting data onto lower dimensions, it can help prevent overfitting in high-dimensional datasets.  
✅ **Computationally Efficient**: Faster to train and evaluate compared to more complex models like Support Vector Machines (SVMs).  

### Disadvantages:
❌ **Assumption of Normality**: Performance may degrade if the feature distribution is highly non-Gaussian.  
❌ **Sensitive to Outliers**: Outliers can affect the mean and covariance estimates, leading to poor classification.  
❌ **Limited to Linear Boundaries**: Similar to logistic regression, it struggles with non-linear relationships unless extended with kernel methods.  
❌ **Sensitive to Class Imbalance**: Class priors are typically estimated from the training frequencies, so with strongly unequal classes the predictions may be biased toward the majority class.  
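
The effect of class imbalance can be seen through the class priors. The sketch below assumes scikit-learn's `LinearDiscriminantAnalysis`, which estimates priors from the training frequencies unless the `priors` argument is given. The 900/100 split and the Gaussian features are made-up illustration data.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Illustration data (assumption): 900 majority samples, 100 minority samples
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(900, 2)),
    rng.normal(1.5, 1.0, size=(100, 2)),
])
y = np.array([0] * 900 + [1] * 100)

# Default: priors are taken from the class frequencies (0.9 and 0.1 here)
default_lda = LinearDiscriminantAnalysis().fit(X, y)

# Override: treat both classes as equally likely a priori
equal_lda = LinearDiscriminantAnalysis(priors=[0.5, 0.5]).fit(X, y)

minority_default = int((default_lda.predict(X) == 1).sum())
minority_equal = int((equal_lda.predict(X) == 1).sum())
```

Equal priors move the decision boundary toward the majority class, so more points are assigned to the minority class. Whether that trade-off is desirable depends on the cost of each kind of error.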


### LDA using Baseline Predictors (refer to /Data/Data_Formatting.ipynb)

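import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, precision_score


def make_yearly_predictions_lda_base(Train, Test):
    # Select the shrinkage value and the baseline predictor columns
    # (both helpers are expected to be defined elsewhere in the notebook)
    best_shrinkage = find_best_shrinkage_base(Train)
    static_predictors = parameters_base(Train, Test)

    # Train an LDA model; shrinkage requires the "lsqr" or "eigen" solver
    lda_clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage=best_shrinkage)
    lda_clf.fit(Train[static_predictors], Train["Target"])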

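    results = []
    # Score each calendar year separately; Test['Date'] must already be a
    # datetime column for the .dt accessor to work
    for year in range(Test['Date'].dt.year.min(), Test['Date'].dt.year.max() + 1):
        test_year = Test[Test['Date'].dt.year == year]
        if not test_year.empty:
            # Predict on this year's test rows
            preds = lda_clf.predict(test_year[static_predictors])

            # Weighted precision and accuracy; zero_division=1 scores a class
            # that is never predicted as 1.0 (the default scores it 0 with a warning)
            precision = precision_score(test_year["Target"], preds, average="weighted", zero_division=1)
            accuracy = accuracy_score(test_year["Target"], preds)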

            # Append results to list
            results.append({
                "Model": "Linear Discriminant Analysis",
                "Year": year,
                "Precision": precision,
                "Accuracy": accuracy
            })

    # Convert results to DataFrame
    results_df = pd.DataFrame(results)
    
    return results_df
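
The helper `find_best_shrinkage_base` is not shown in this section. As a rough illustration of how a shrinkage value might be chosen, the sketch below runs a grid search with time-ordered cross-validation. The function name `find_best_shrinkage_sketch`, the candidate grid, the number of splits, and the synthetic data are all assumptions made for this example, and it should not be read as the notebook's actual helper.

```python
import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

CANDIDATE_SHRINKAGE = [0.0, 0.1, 0.25, 0.5, 0.75, 1.0]  # assumed grid


def find_best_shrinkage_sketch(train, predictors, target="Target"):
    """Hypothetical stand-in: pick the shrinkage with the best CV accuracy."""
    search = GridSearchCV(
        LinearDiscriminantAnalysis(solver="lsqr"),
        param_grid={"shrinkage": CANDIDATE_SHRINKAGE},
        cv=TimeSeriesSplit(n_splits=3),  # assumes rows are in chronological order
        scoring="accuracy",
    )
    search.fit(train[predictors], train[target])
    return search.best_params_["shrinkage"]


# Synthetic illustration data (assumption): 300 daily rows, two features
rng = np.random.default_rng(0)
train = pd.DataFrame({
    "Date": pd.date_range("2020-01-01", periods=300, freq="D"),
    "x1": rng.normal(size=300),
    "x2": rng.normal(size=300),
})
train["Target"] = (train["x1"] + 0.5 * rng.normal(size=300) > 0).astype(int)

best = find_best_shrinkage_sketch(train, ["x1", "x2"])
```

`TimeSeriesSplit` keeps every validation fold later than its training fold, which avoids leaking future rows into the fit when the data is a time series.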


### LDA using Baseline Predictors + Rolling Predictors (refer to /Data/Data_Formatting.ipynb)

def make_yearly_predictions_lda_roll(Train, Test):
    # Select the shrinkage value and the baseline + rolling predictor columns
    best_shrinkage = find_best_shrinkage_roll(Train)
    all_predictors = parameters_roll(Train, Test)
    # Apply roll() to both sets (presumably adds the rolling predictor columns)
    Train = roll(Train)
    Test = roll(Test)

    # Train an LDA model; shrinkage requires the "lsqr" or "eigen" solver
    lda_clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage=best_shrinkage)
    lda_clf.fit(Train[all_predictors], Train["Target"])

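    results = []
    # Score each calendar year separately; Test['Date'] must already be a
    # datetime column for the .dt accessor to work
    for year in range(Test['Date'].dt.year.min(), Test['Date'].dt.year.max() + 1):
        test_year = Test[Test['Date'].dt.year == year]
        if not test_year.empty:
            # Predict on this year's test rows
            preds = lda_clf.predict(test_year[all_predictors])

            # Weighted precision and accuracy; zero_division=1 scores a class
            # that is never predicted as 1.0 (the default scores it 0 with a warning)
            precision = precision_score(test_year["Target"], preds, average="weighted", zero_division=1)
            accuracy = accuracy_score(test_year["Target"], preds)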

            # Append results to list
            results.append({
                "Model": "Linear Discriminant Analysis",
                "Year": year,
                "Precision": precision,
                "Accuracy": accuracy
            })

    # Convert results to DataFrame
    results_df = pd.DataFrame(results)
    
    return results_df


### LDA using Full Feature Set (refer to /Data/Data_Formatting.ipynb)

def make_yearly_predictions_lda_full(Train, Test):
    # Select the shrinkage value and the full set of predictor columns
    best_shrinkage = find_best_shrinkage_full(Train)
    all_predictors = parameters_full(Train, Test)
    # Apply roll() to both sets (presumably adds the rolling predictor columns)
    Train = roll(Train)
    Test = roll(Test)

    # Train an LDA model; shrinkage requires the "lsqr" or "eigen" solver
    lda_clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage=best_shrinkage)
    lda_clf.fit(Train[all_predictors], Train["Target"])

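    results = []
    # Score each calendar year separately; Test['Date'] must already be a
    # datetime column for the .dt accessor to work
    for year in range(Test['Date'].dt.year.min(), Test['Date'].dt.year.max() + 1):
        test_year = Test[Test['Date'].dt.year == year]
        if not test_year.empty:
            # Predict on this year's test rows
            preds = lda_clf.predict(test_year[all_predictors])

            # Weighted precision and accuracy; zero_division=1 scores a class
            # that is never predicted as 1.0 (the default scores it 0 with a warning)
            precision = precision_score(test_year["Target"], preds, average="weighted", zero_division=1)
            accuracy = accuracy_score(test_year["Target"], preds)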

            # Append results to list
            results.append({
                "Model": "Linear Discriminant Analysis",
                "Year": year,
                "Precision": precision,
                "Accuracy": accuracy
            })

    # Convert results to DataFrame
    results_df = pd.DataFrame(results)
    
    return results_df
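
The three functions above share the same per-year scoring loop. One possible way to factor it out is sketched below. The helper name `evaluate_by_year`, the column names `x1` and `x2`, the fixed shrinkage of 0.1, and the synthetic data are assumptions for illustration only and are not part of the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, precision_score


def evaluate_by_year(model, test, predictors, target="Target"):
    """Hypothetical helper: score a fitted model on each calendar year of `test`."""
    rows = []
    # groupby visits only years that have rows, in ascending order
    for year, test_year in test.groupby(test["Date"].dt.year):
        preds = model.predict(test_year[predictors])
        rows.append({
            "Model": "Linear Discriminant Analysis",
            "Year": int(year),
            "Precision": precision_score(test_year[target], preds,
                                         average="weighted", zero_division=1),
            "Accuracy": accuracy_score(test_year[target], preds),
        })
    return pd.DataFrame(rows)


# Synthetic illustration data (assumption): four years of daily rows
rng = np.random.default_rng(0)
dates = pd.date_range("2018-01-01", "2021-12-31", freq="D")
frame = pd.DataFrame({
    "Date": dates,
    "x1": rng.normal(size=len(dates)),
    "x2": rng.normal(size=len(dates)),
})
frame["Target"] = (frame["x1"] + 0.5 * rng.normal(size=len(dates)) > 0).astype(int)

train = frame[frame["Date"] < "2020-01-01"]
test = frame[frame["Date"] >= "2020-01-01"]

model = LinearDiscriminantAnalysis(solver="lsqr", shrinkage=0.1)
model.fit(train[["x1", "x2"]], train["Target"])
yearly = evaluate_by_year(model, test, ["x1", "x2"])
```

Grouping by year removes the need for the explicit range and the empty-year check, since years without rows never appear as groups.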