### 1. Setup and Environment
This section mounts Google Drive and changes the working directory to the project folder.
CSE 6363 Project-Sentiment Analysis With BERT-based-uncased

In [1]:
from google.colab import drive
import os

print("Mounting Google Drive...")
drive.mount('/content/drive')
print("Drive mounted successfully.")

project_path = "/content/drive/MyDrive/CSE 6363 Project-Sentiment Analysis With BERT-based-uncased/Sentiment Analysis"

print(f"Changing directory to: {project_path}")
os.chdir(project_path)


Mounting Google Drive...


ValueError: mount failed
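
The traceback above shows the mount failing before `os.chdir` runs, so any later cell would execute from the default working directory instead of the project folder. A minimal sketch of a guard that fails early with a readable message is shown here; `enter_project_dir` is a hypothetical helper name, not part of the project code.

```python
import os


def enter_project_dir(project_path: str) -> str:
    """Change into project_path, or raise a clear error if it does not exist."""
    if not os.path.isdir(project_path):
        raise FileNotFoundError(
            f"Project directory not found: {project_path!r}. "
            "Check that the Drive mount succeeded and the folder name is correct."
        )
    os.chdir(project_path)
    # Return the resolved working directory so the caller can log it.
    return os.getcwd()
```

In the notebook this would be called as `enter_project_dir(project_path)` right after `drive.mount(...)`, in place of the bare `os.chdir(project_path)`.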

### 2. Generate Model Probabilities (Phase 2-Step 1)

**Goal:** Before we can test Hypothesis 2 (Probability Calibration), we must first get the raw, uncalibrated probabilities from our best-performing binary classification model.

**What This Cell Does:**
The command below executes our `run_probability_generation.py` script. This script is the starting point for both Phase 2 and Phase 3 of our project. It will:

1.  **Load our "Champion" Model:** Passing `--config configs/bert_full_finetune.yaml` and `--seed 123` selects our best-performing model from Phase 1 (`bert_full_finetune_seed123.pt`).
2.  **Run Inference:** It runs this model on the complete **validation set** (`validation_clean.csv`) and **test set** (`test_clean.csv`).
3.  **Extract Probabilities:** For every review, it calculates the softmax probability and saves *only* the probability for the "positive" class (class 1).
4.  **Save Output:** It saves these probabilities and their corresponding true labels into two `.npz` files in the `outputs/probabilities/` directory.

The validation set outputs (`*_validation_outputs.npz`) will be used to *fit* our calibration model (Isotonic Regression). The test set outputs (`*_test_outputs.npz`) will be used to *evaluate* the "before vs. after" performance of our calibration.
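
As a rough illustration of steps 3 and 4, the sketch below turns two-class logits into positive-class probabilities and stores them next to the true labels in an `.npz` archive. It uses NumPy only and made-up logits; the function names and the archive keys `probs` and `labels` are assumptions for this example, and the real `run_probability_generation.py` may organize its outputs differently.

```python
import numpy as np


def positive_class_probs(logits: np.ndarray) -> np.ndarray:
    """Softmax over the class axis, returning only the class-1 column."""
    # Subtract the row-wise max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    return probs[:, 1]


def save_outputs(path, probs: np.ndarray, labels: np.ndarray) -> None:
    """Write probabilities and labels to one .npz archive (key names are assumed)."""
    np.savez(path, probs=probs, labels=labels)


# Toy logits for three reviews, shape (n_samples, 2); values are illustrative only.
logits = np.array([[2.0, 0.0], [0.0, 0.0], [-1.0, 3.0]])
labels = np.array([0, 1, 1])
probs = positive_class_probs(logits)
```

Saving a single column is sufficient for a binary task, because the negative-class probability is just one minus the positive-class probability.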

In [None]:
# We use our champion model: config=bert_full_finetune and seed=123
!python run_probability_generation.py --config configs/bert_full_finetune.yaml --seed 123

2025-11-11 03:25:48.614794: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1762831548.635133   17731 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1762831548.641293   17731 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1762831548.657392   17731 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.

### 3. Run Probability Calibration (Phase 2-Step 2)

**Goal:** Test Hypothesis 2. Now that we have our uncalibrated probabilities, we will use the `src/postprocessing/calibrate.py` script to:

1.  **Calculate Baseline Metrics:** Measure the "before" ECE and Brier Score of our uncalibrated test set probabilities.
2.  **Fit Calibrator:** Load the **validation set** probabilities (`*_validation_outputs.npz`) and use them to fit an `IsotonicRegression` model. This model learns the "correction function," a monotonic mapping from raw probability to calibrated probability.
3.  **Apply Calibrator:** Use the *fitted* model to "correct" our **test set** probabilities.
4.  **Calculate Calibrated Metrics:** Measure the "after" ECE and Brier Score on the new, calibrated probabilities.
5.  **Report Results:** Show the percentage improvement and state whether Hypothesis 2 is supported.
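
The five steps above can be sketched as follows. This is an illustration on synthetic, deliberately overconfident scores, not the contents of `calibrate.py`. The ECE variant (10 equal-width bins over the positive-class probability), the helper names, and the simulated data are all assumptions; only scikit-learn's `IsotonicRegression` is named in the project.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression


def brier_score(probs, labels):
    """Mean squared difference between predicted probability and 0/1 label."""
    return float(np.mean((np.asarray(probs) - np.asarray(labels)) ** 2))


def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE: weighted gap between mean probability and positive rate per bin."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Digitizing against the interior edges yields bin indices 0..n_bins-1.
    bin_ids = np.digitize(probs, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(probs[mask].mean() - labels[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return float(ece)


def simulate(n, rng):
    """Synthetic overconfident scores (illustrative data, not project outputs)."""
    true_p = rng.uniform(0.05, 0.95, size=n)
    labels = (rng.uniform(size=n) < true_p).astype(int)
    logit = np.log(true_p / (1.0 - true_p))
    raw = 1.0 / (1.0 + np.exp(-3.0 * logit))  # push scores toward 0 and 1
    return raw, labels


rng = np.random.default_rng(0)
val_probs, val_labels = simulate(2000, rng)
test_probs, test_labels = simulate(2000, rng)

# Fit on validation data only, then apply the fitted mapping to the test set.
calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
calibrator.fit(val_probs, val_labels)
test_calibrated = calibrator.predict(test_probs)

for name, p in [("before", test_probs), ("after", test_calibrated)]:
    print(name,
          "ECE:", round(expected_calibration_error(p, test_labels), 4),
          "Brier:", round(brier_score(p, test_labels), 4))
```

Fitting on the validation split and scoring on the test split keeps the "after" metrics from being inflated by the calibrator having seen the evaluation labels.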


In [None]:
!python src/postprocessing/calibrate.py --run_name "bert_full_finetune_seed123.pt"

python3: can't open file '/content/src/postprocessing/calibrate.py': [Errno 2] No such file or directory


### 4. Run Ordinal Rating Mapping (Phase 3)

**Goal:** Test Hypothesis 3. This is the final step of our project.

**What This Cell Does:**
The command below executes our new `src/postprocessing/ordinal.py` script. This script will:

1.  **Load All Data:** Load the validation and test set data from our `.npz` files (which now include probabilities, binary labels, and 1-10 star ratings).
2.  **Fit Calibrator:** Re-fit the `IsotonicRegression` model on the validation data to get our "correction function."
3.  **Generate Two Probability Sets:**
    * **Uncalibrated:** The raw probabilities from the test set.
    * **Calibrated:** The "corrected" probabilities after applying our calibrator.
4.  **Map to Star Ratings:** Apply our 1-5 star mapping logic to *both* sets of probabilities.
5.  **Calculate Final Metrics:** Calculate the Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and correlation coefficients (Pearson & Spearman) for both mappings.
6.  **Report Results:** Print a final comparison table showing whether the calibrated probabilities produce more accurate star rating predictions, as we hypothesized.
7.  **Generate Visualization:** Create and save a box plot comparing the probability distributions for each true star rating.
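
To make steps 4 and 5 concrete, here is a hedged sketch of one possible mapping and the four metrics. The actual mapping logic lives in `ordinal.py` and is not shown in this notebook, so both rules below are assumptions: equal-width probability bins for the predicted stars, and pairing adjacent ratings to bring the 1-10 labels onto the 1-5 scale. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr


def prob_to_stars(probs, n_stars=5):
    """Assumed rule: equal-width bins, [0, 0.2) -> 1 star, ..., [0.8, 1.0] -> 5 stars."""
    stars = np.floor(np.asarray(probs, dtype=float) * n_stars).astype(int) + 1
    return np.clip(stars, 1, n_stars)  # p == 1.0 would otherwise map to 6


def ten_to_five(ratings):
    """Assumed rule: collapse 1-10 ratings pairwise, (1, 2) -> 1, ..., (9, 10) -> 5."""
    return (np.asarray(ratings, dtype=int) + 1) // 2


def ordinal_metrics(pred, true):
    """MAE, RMSE, and both correlation coefficients for predicted vs. true stars."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    err = pred - true
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "Pearson r": float(pearsonr(pred, true)[0]),
        "Spearman rho": float(spearmanr(pred, true)[0]),
    }


# Toy inputs; values are illustrative only.
probs = np.array([0.05, 0.3, 0.5, 0.75, 0.99, 1.0])
ratings_1_to_10 = np.array([1, 2, 3, 4, 7, 10])

pred_stars = prob_to_stars(probs)
true_stars = ten_to_five(ratings_1_to_10)
metrics = ordinal_metrics(pred_stars, true_stars)
```

Running the same `ordinal_metrics` call once on stars derived from raw probabilities and once on stars derived from calibrated probabilities gives the two rows that the comparison table reports.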

In [None]:
!pip install matplotlib seaborn

!python src/postprocessing/ordinal.py --run_name "bert_full_finetune_seed123.pt"

Loading data for: bert_full_finetune_seed123.pt

Loaded 2500 validation samples.
Loaded 25000 test samples (with 1-10 star ratings).

--- Fitting Calibrator (Phase 2 logic) ---
IsotonicRegression model fitted on validation data.

--- Calculating Ordinal Metrics (Phase 3) ---

--- Final Results (Hypothesis 3) ---
Comparison of Ordinal Mapping Performance (1-5 Scale):
|              |     MAE |    RMSE |   Pearson r |   Spearman ρ |
|:-------------|--------:|--------:|------------:|-------------:|
| Uncalibrated |  0.5191 |  0.8558 |      0.8893 |       0.8198 |
| Calibrated   |  0.5076 |  0.8235 |      0.8917 |       0.8221 |
| % Change     | -2.2068 | -3.7748 |      0.2710 |       0.2741 |

Hypothesis 3 Confirmed: Calibrated probabilities produced more accurate ordinal ratings.

Generating plot and saving to /content/drive/MyDrive/CSE 6363 Project-Sentiment Analysis With BERT-based-uncased/Sentiment Analysis/src/postprocessing/../../outputs/plots/bert_full_finetune_seed123.pt_ordinal_