In [3]:
import sys
from pathlib import Path

# Get the absolute path of the current notebook
notebook_path = Path().resolve()

# Get the project root directory (which is the parent of the 'notebooks' directory)
project_root = notebook_path.parent

# Add BOTH the project root and the src directory to the Python path
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))
if str(project_root / 'src') not in sys.path:
    sys.path.append(str(project_root / 'src'))

# Now, we can import our modules
from src.data_handling import DataHandler
from src.ridge_model import RidgeModel
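The cell above assumes the notebook sits exactly one level below the project root (in `notebooks/`). If the notebook is moved, one somewhat more robust option is to walk upward from the current directory until a folder containing `src/` appears. The helper names `find_project_root` and `add_to_sys_path`, and the use of `src/` as the marker, are assumptions for illustration. They are not part of this project's code.

```python
import sys
from pathlib import Path


def find_project_root(start: Path, marker: str = "src") -> Path:
    """Return the first directory at or above `start` that contains `marker/`.

    Hypothetical helper: assumes the project root is the folder holding `src/`.
    """
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"no '{marker}' directory found at or above {start}")


def add_to_sys_path(*paths: Path) -> None:
    """Append each path to sys.path once, as the notebook cell does."""
    for p in paths:
        if str(p) not in sys.path:
            sys.path.append(str(p))


# Usage in the notebook (not executed here):
# root = find_project_root(Path())
# add_to_sys_path(root, root / "src")
```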

### **Model 1: Standard Ridge Regression**
Now, we will execute the full pipeline for a standard Ridge regression model without using PCA.

#### Initialize DataHandler
This cell creates an instance of the `DataHandler` class, which loads the dataset, defines the feature and target sets, and pre-fits the weighted scalers and PCA objects on the training data. The test year is set to 2020, and the training years are 2008, 2012, and 2016.

In [4]:
dh = DataHandler(test_year=2020)

DataHandler initialized - Using 52 features - Test year: 2020
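This section does not show how `DataHandler` implements its weighted scalers. Here is a minimal sketch of weighted standardization, assuming each row carries a non-negative sample weight. It takes the weighted mean and standard deviation of each feature from the training rows only, then reuses those statistics for the test year. The class name `WeightedStandardScaler` is hypothetical.

```python
import numpy as np


class WeightedStandardScaler:
    """Standardize features with sample-weighted mean and std (sketch)."""

    def fit(self, X, w):
        X = np.asarray(X, dtype=float)
        w = np.asarray(w, dtype=float)
        # Weighted per-column mean and variance over the training rows.
        self.mean_ = np.average(X, axis=0, weights=w)
        var = np.average((X - self.mean_) ** 2, axis=0, weights=w)
        # Constant columns get scale 1 so transform never divides by zero.
        self.scale_ = np.where(var > 0, np.sqrt(var), 1.0)
        return self

    def transform(self, X):
        # The same training statistics are applied to any later data (e.g. the test year).
        return (np.asarray(X, dtype=float) - self.mean_) / self.scale_
```

Fitting on the training rows and transforming both training and test rows with the same object avoids leaking test-year information into the scaling.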


#### Initialize RidgeModel Handler
Here, we create an instance of the `RidgeModel` class. We give it a unique `model_name` (`'ridge_base'`), which is used to name all of its output files (study, model, and predictions).

In [5]:
ridge_base = RidgeModel(dh, model_name='ridge_base')

RidgeModel initialized with model name: ridge_base
Optuna study will be stored in  : /Users/arvindsuresh/Documents/Github/Election-prediction-May-2025/2020-results-20251023/optuna/ridge_base_study.pkl
Trained model will be stored in : /Users/arvindsuresh/Documents/Github/Election-prediction-May-2025/2020-results-20251023/models/ridge_base_model.pkl
Final preds will be stored in   : /Users/arvindsuresh/Documents/Github/Election-prediction-May-2025/2020-results-20251023/preds/ridge_base_preds.csv
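The printed paths all follow one pattern: a results directory with `optuna/`, `models/`, and `preds/` subfolders, plus file names built from `model_name`. This is why a distinct name keeps each model's artifacts separate. The hypothetical helper below reproduces that pattern. Its function name and dictionary keys are illustrative and are not the project's API.

```python
from pathlib import Path


def artifact_paths(results_dir, model_name):
    """Build the study, model, and prediction paths for one model (hypothetical).

    Mirrors the layout printed above: <results_dir>/{optuna,models,preds}/...
    """
    results_dir = Path(results_dir)
    return {
        "study": results_dir / "optuna" / f"{model_name}_study.pkl",
        "model": results_dir / "models" / f"{model_name}_model.pkl",
        "preds": results_dir / "preds" / f"{model_name}_preds.csv",
    }


# Example: paths for a second model never collide with the base model's.
paths = artifact_paths("2020-results-20251023", "ridge_pca")
```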


#### Run Optuna Hyperparameter Study
This cell runs the Optuna hyperparameter search. It will perform 3-fold cross-validation on the training years (2008, 2012, 2016) to find the optimal `alpha` value that minimizes the weighted mean squared error. The resulting study object is saved to the results directory.

In [6]:
ridge_base.run_optuna_study(n_trials=50, use_pca=False)

[I 2025-10-23 08:45:18,943] A new study created in memory with name: ridge_base_study
[I 2025-10-23 08:45:18,981] Trial 0 finished with value: 0.001559380533786808 and parameters: {'alpha': 1.0991360257357304e-06}. Best is trial 0 with value: 0.001559380533786808.
[I 2025-10-23 08:45:19,021] Trial 1 finished with value: 0.001561051008376702 and parameters: {'alpha': 0.0014574187393831734}. Best is trial 0 with value: 0.001559380533786808.
[I 2025-10-23 08:45:19,050] Trial 2 finished with value: 0.0015597709846077955 and parameters: {'alpha': 0.0025913925855668425}. Best is trial 0 with value: 0.001559380533786808.
[I 2025-10-23 08:45:19,084] Trial 3 finished with value: 0.001559708869201289 and parameters: {'alpha': 4.262245744760241e-05}. Best is trial 0 with value: 0.001559380533786808.
[I 2025-10-23 08:45:19,108] Trial 4 finished with value: 0.0015220845785253046 and parameters: {'alpha': 0.045681114172742676}. Best is trial 4 with value: 0.0015220845785253046.
...

Best alpha: 0.25451690813373756
Optuna study saved to: /Users/arvindsuresh/Documents/Github/Election-prediction-May-2025/2020-results-20251023/optuna/ridge_base_study.pkl
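This section does not show the Optuna objective itself. The sketch below assumes each of the three CV folds holds out one training year, since there are three folds and three years. Under that assumption, it scores a candidate `alpha` by fitting a sample-weighted ridge on two years and computing a weighted MSE on the third. To stay dependency-free, it replaces Optuna's sampler with closed-form ridge and a fixed log-spaced grid. The weighted-MSE definition (row weights, averaged over output columns), the function names, and the synthetic data are all assumptions.

```python
import numpy as np


def weighted_ridge_fit(X, Y, w, alpha):
    """Closed-form ridge with sample weights; the intercept is not penalized."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float).reshape(len(X), -1)
    w = np.asarray(w, dtype=float)
    # Center with weighted means so the intercept can be recovered afterwards.
    x_mean = np.average(X, axis=0, weights=w)
    y_mean = np.average(Y, axis=0, weights=w)
    sw = np.sqrt(w)[:, None]
    Xw, Yw = (X - x_mean) * sw, (Y - y_mean) * sw
    A = Xw.T @ Xw + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(A, Xw.T @ Yw)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def weighted_mse(Y, P, w):
    """Row-weighted MSE, averaged over output columns (assumed definition)."""
    return float(np.average(((Y - P) ** 2).mean(axis=1), weights=w))


def leave_one_year_out_score(X, Y, w, years, alpha):
    """Mean validation error, holding out one training year per fold."""
    Y = np.asarray(Y, dtype=float).reshape(len(X), -1)
    scores = []
    for year in np.unique(years):
        val = years == year
        coef, intercept = weighted_ridge_fit(X[~val], Y[~val], w[~val], alpha)
        scores.append(weighted_mse(Y[val], X[val] @ coef + intercept, w[val]))
    return float(np.mean(scores))


# Synthetic stand-in for the three training years (not the project's data).
rng = np.random.default_rng(0)
years = np.repeat([2008, 2012, 2016], 200)
X = rng.normal(size=(len(years), 5))
Y = X @ rng.normal(size=(5, 2)) + 0.1 * rng.normal(size=(len(years), 2))
w = rng.uniform(0.5, 2.0, size=len(years))

# Grid search in place of Optuna: keep the alpha with the lowest CV error.
alphas = np.logspace(-6, 1, 15)
scores = [leave_one_year_out_score(X, Y, w, years, a) for a in alphas]
best_alpha = alphas[int(np.argmin(scores))]
```

Optuna would explore the same objective adaptively instead of over a fixed grid. The fold structure and the metric are the parts that carry over.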


#### Train Final Model
Using the best `alpha` found by Optuna, this cell trains the final Ridge model on the entire training dataset (2008, 2012, and 2016 combined). The trained model object is then saved to the results directory.

In [7]:
ridge_base.train_final_model()

Updated best_params: {'alpha': 0.25451690813373756}
Model saved to: /Users/arvindsuresh/Documents/Github/Election-prediction-May-2025/2020-results-20251023/models/ridge_base_model.pkl
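Beyond the `.pkl` path, this section does not show how `train_final_model` stores the model. Here is a minimal sketch, assuming the fitted model can be saved with `pickle`. It fits once on all training-year rows combined, writes the result to a file, and checks that the reloaded copy predicts identically. To keep the pickle independent of any custom class, the model is represented as a plain dictionary of arrays. That representation, the closed-form weighted fit, the synthetic data, and the illustrative `alpha=0.25` are assumptions.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def fit_weighted_ridge(X, Y, w, alpha):
    """Closed-form sample-weighted ridge; the intercept is not penalized."""
    x_mean = np.average(X, axis=0, weights=w)
    y_mean = np.average(Y, axis=0, weights=w)
    sw = np.sqrt(w)[:, None]
    Xw, Yw = (X - x_mean) * sw, (Y - y_mean) * sw
    coef = np.linalg.solve(Xw.T @ Xw + alpha * np.eye(X.shape[1]), Xw.T @ Yw)
    return {"alpha": alpha, "coef": coef, "intercept": y_mean - x_mean @ coef}


def predict(model, X):
    return X @ model["coef"] + model["intercept"]


# Synthetic stand-in data; each row is tagged with an election year.
rng = np.random.default_rng(0)
years = np.repeat([2008, 2012, 2016, 2020], 100)
X = rng.normal(size=(len(years), 4))
Y = X @ rng.normal(size=(4, 2)) + 0.05 * rng.normal(size=(len(years), 2))
w = rng.uniform(0.5, 2.0, size=len(years))

# Final fit: all training years combined, with 2020 held out for testing.
train = np.isin(years, [2008, 2012, 2016])
model = fit_weighted_ridge(X[train], Y[train], w[train], alpha=0.25)

# Save with pickle, reload, and confirm the reloaded model predicts identically.
with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "ridge_base_model.pkl"
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        reloaded = pickle.load(f)

same_preds = np.array_equal(predict(model, X[~train]), predict(reloaded, X[~train]))
```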


#### Generate Final Predictions
Finally, we load the trained model and use it to make predictions on the held-out 2020 test set. The predictions are saved to a CSV file in the results directory.

In [8]:
preds_base = ridge_base.make_final_predictions()
print("\nBase Model Predictions (first 5 rows):")
print(preds_base[:5])

Predictions saved to: /Users/arvindsuresh/Documents/Github/Election-prediction-May-2025/2020-results-20251023/preds/ridge_base_preds.csv

Base Model Predictions (first 5 rows):
[[0.15881518 0.25941199 0.0203533  0.56141953]
 [0.14980317 0.29484541 0.03149691 0.52385451]
 [0.14949427 0.19768357 0.01641405 0.63640811]
 [0.12391762 0.22150447 0.01862553 0.63595238]
 [0.10586231 0.27138053 0.01973731 0.60301985]]
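The printed predictions have four columns. Each row sums to 1 within display rounding, which suggests the targets are shares of a whole, although this section does not say what the columns represent. The sketch below runs that check on the five printed rows and then shows a CSV round trip with Python's `csv` module. The header names are placeholders, and this is not necessarily how `make_final_predictions` writes its file.

```python
import csv
import io

import numpy as np

# First five rows printed above for the base model.
preds = np.array([
    [0.15881518, 0.25941199, 0.0203533, 0.56141953],
    [0.14980317, 0.29484541, 0.03149691, 0.52385451],
    [0.14949427, 0.19768357, 0.01641405, 0.63640811],
    [0.12391762, 0.22150447, 0.01862553, 0.63595238],
    [0.10586231, 0.27138053, 0.01973731, 0.60301985],
])

# Row sums: each should be close to 1 if the columns are shares.
row_sums = preds.sum(axis=1)

# Write to an in-memory CSV and read it back; column names are placeholders.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["col0", "col1", "col2", "col3"])
writer.writerows(preds.tolist())
buf.seek(0)
rows = list(csv.reader(buf))
reloaded = np.array(rows[1:], dtype=float)
```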


### **Model 2: Ridge Regression with PCA**
Next, we will repeat the pipeline for a Ridge model that uses principal components as features.

#### Initialize PCA-based RidgeModel Handler
We initialize a new `RidgeModel` instance with a different name (`'ridge_pca'`) to keep its artifacts separate from the base model.

In [9]:
ridge_pca = RidgeModel(dh, model_name='ridge_pca')

RidgeModel initialized with model name: ridge_pca
Optuna study will be stored in  : /Users/arvindsuresh/Documents/Github/Election-prediction-May-2025/2020-results-20251023/optuna/ridge_pca_study.pkl
Trained model will be stored in : /Users/arvindsuresh/Documents/Github/Election-prediction-May-2025/2020-results-20251023/models/ridge_pca_model.pkl
Final preds will be stored in   : /Users/arvindsuresh/Documents/Github/Election-prediction-May-2025/2020-results-20251023/preds/ridge_pca_preds.csv


#### Run Optuna Hyperparameter Study (with PCA)
This cell runs the Optuna study again. This time, setting `use_pca=True` makes the objective function tune `n_components`, the number of principal components used as features, alongside `alpha`.

In [10]:
ridge_pca.run_optuna_study(n_trials=50, use_pca=True)

[I 2025-10-23 08:45:32,507] A new study created in memory with name: ridge_pca_study
[I 2025-10-23 08:45:32,549] Trial 0 finished with value: 0.001545127517410182 and parameters: {'alpha': 0.0004438594813940582, 'n_components': 40}. Best is trial 0 with value: 0.001545127517410182.
[I 2025-10-23 08:45:32,598] Trial 1 finished with value: 0.0015614157558854578 and parameters: {'alpha': 0.0008111757330655691, 'n_components': 50}. Best is trial 0 with value: 0.001545127517410182.
[I 2025-10-23 08:45:32,628] Trial 2 finished with value: 0.0017127594147635329 and parameters: {'alpha': 0.02775326881875697, 'n_components': 10}. Best is trial 0 with value: 0.001545127517410182.
[I 2025-10-23 08:45:32,664] Trial 3 finished with value: 0.0015594292054799685 and parameters: {'alpha': 6.893447543607435e-06, 'n_components': 50}. Best is trial 0 with value: 0.001545127517410182.
[I 2025-10-23 08:45:32,695] Trial 4 finished with value: 0.0015594229130273158 and parameters: {'alpha': 6.138075417554358

Best alpha: 0.2616955391008058
Best n_components: 50
Optuna study saved to: /Users/arvindsuresh/Documents/Github/Election-prediction-May-2025/2020-results-20251023/optuna/ridge_pca_study.pkl
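`DataHandler` pre-fits weighted PCA objects on the training data, and Optuna then chooses how many components to keep. Here is a sketch of one way to do weighted PCA, assuming non-negative sample weights. It eigendecomposes the sample-weighted covariance and keeps the top `n_components` directions. The same fitted projection would then be applied to the test year. The class name `WeightedPCA` is hypothetical, and the project may compute its PCA differently.

```python
import numpy as np


class WeightedPCA:
    """PCA on the sample-weighted covariance matrix (sketch)."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X, w):
        X = np.asarray(X, dtype=float)
        w = np.asarray(w, dtype=float)
        self.mean_ = np.average(X, axis=0, weights=w)
        Xc = X - self.mean_
        # Weighted covariance: sum_i (w_i / sum w) * x_i x_i^T over centered rows.
        cov = (Xc * (w / w.sum())[:, None]).T @ Xc
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
        order = np.argsort(eigvals)[::-1][: self.n_components]
        self.explained_variance_ = eigvals[order]
        self.components_ = eigvecs[:, order].T  # shape (n_components, n_features)
        return self

    def transform(self, X):
        # Project onto the retained directions using the training mean.
        return (np.asarray(X, dtype=float) - self.mean_) @ self.components_.T
```

Because `n_components` only truncates the sorted eigenvectors, every candidate value Optuna tries corresponds to a nested subset of the same directions.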


#### Train Final PCA Model
Using the best `alpha` and `n_components` found by Optuna, this cell trains the final model on the PCA-transformed training data. The model is saved to the results directory.

In [11]:
ridge_pca.train_final_model()

Updated best_params: {'alpha': 0.2616955391008058, 'n_components': 50}
Model saved to: /Users/arvindsuresh/Documents/Github/Election-prediction-May-2025/2020-results-20251023/models/ridge_pca_model.pkl


#### Generate Final Predictions (with PCA)
Finally, we generate and save predictions from the PCA-based model on the 2020 test data, which is transformed with the same scaler and PCA fitted on the training data.

In [12]:
preds_pca = ridge_pca.make_final_predictions()
print("\nPCA Model Predictions (first 5 rows):")
print(preds_pca[:5])

Predictions saved to: /Users/arvindsuresh/Documents/Github/Election-prediction-May-2025/2020-results-20251023/preds/ridge_pca_preds.csv

PCA Model Predictions (first 5 rows):
[[0.15906742 0.25930651 0.02035531 0.56127076]
 [0.1499611  0.29471339 0.03146421 0.5238613 ]
 [0.14960783 0.1976685  0.01637518 0.63634849]
 [0.12391852 0.22153466 0.01860323 0.63594359]
 [0.10600766 0.27135157 0.01971709 0.60292368]]
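The two models' printed predictions are very close. A quick numeric comparison of the five printed rows (only these rows, not the full 2020 test set) can be done with NumPy:

```python
import numpy as np

# First five rows printed above for each model.
preds_base = np.array([
    [0.15881518, 0.25941199, 0.0203533, 0.56141953],
    [0.14980317, 0.29484541, 0.03149691, 0.52385451],
    [0.14949427, 0.19768357, 0.01641405, 0.63640811],
    [0.12391762, 0.22150447, 0.01862553, 0.63595238],
    [0.10586231, 0.27138053, 0.01973731, 0.60301985],
])
preds_pca = np.array([
    [0.15906742, 0.25930651, 0.02035531, 0.56127076],
    [0.1499611, 0.29471339, 0.03146421, 0.5238613],
    [0.14960783, 0.1976685, 0.01637518, 0.63634849],
    [0.12391852, 0.22153466, 0.01860323, 0.63594359],
    [0.10600766, 0.27135157, 0.01971709, 0.60292368],
])

# Element-wise absolute differences between the two models.
abs_diff = np.abs(preds_base - preds_pca)
max_abs_diff = abs_diff.max()
mean_abs_diff = abs_diff.mean()
print(f"max |diff| = {max_abs_diff:.8f}, mean |diff| = {mean_abs_diff:.8f}")
```

On these rows the largest absolute difference is about 0.00025. That closeness is consistent with the PCA model keeping 50 of the 52 input features and selecting a similar `alpha` (0.2617 versus 0.2545).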
