# Results Summary - Code Comment Classification

This notebook summarizes the complete pipeline results and provides conclusions.

## Pipeline Overview:

1. **Data Cleaning** (`data_cleaning.ipynb`)
   - BERT similarity analysis identified the most confusable categories
   - The most similar categories were merged automatically, based on cosine similarity
   - This reduced the number of target classes to 4, which makes the categories more distinct and lowers the chance of misclassification
   - The cleaned data was split into train and test sets

2. **Feature Encoding** (`encoding.ipynb`)
   - BERT embeddings: 384 features
   - Class one-hot encoding: 306 features
   - Metadata features: 5 features
   - **Total: 695 features**

3. **Baseline Model** (`model_training.ipynb`)
   - GridSearchCV with Logistic Regression
   - Best CV F1-Macro: 0.832
   - Test Accuracy: 83.2%

4. **Multi-Model Comparison** (`multi_model_training.ipynb`)
   - Tested 4 algorithms
   - Logistic Regression: 0.832 (best)
   - Linear SVC: 0.821
   - SGD Classifier: 0.813
   - Random Forest: 0.794 (worst)
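
The merge decision in step 1 can be illustrated with a small sketch. It assumes that each category is represented by the mean (centroid) of its comment embeddings and that the pair with the highest cosine similarity is the merge candidate; the notebook may compute similarity differently. The function name `most_similar_pair`, the toy 2-D vectors, and the category names are hypothetical.

```python
import numpy as np


def most_similar_pair(embeddings, labels):
    """Return (similarity, label_a, label_b) for the two closest categories."""
    labels = np.asarray(labels)
    names = sorted(set(labels.tolist()))
    # One centroid per category: the mean of that category's embeddings.
    centroids = np.stack([embeddings[labels == n].mean(axis=0) for n in names])
    # Normalize rows so that a dot product equals cosine similarity.
    unit = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # ignore each category's similarity to itself
    i, j = np.unravel_index(np.argmax(sim), sim.shape)
    return float(sim[i, j]), names[i], names[j]


# Toy 2-D "embeddings" with made-up category names (real ones have 384 dims).
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.3], [0.0, 1.0]])
y = ["cat_a", "cat_a", "cat_b", "cat_c"]
score, a, b = most_similar_pair(X, y)
print(a, b, round(score, 3))
```

Repeating this step and relabeling the returned pair as one category is one way to arrive at a smaller label set such as the 4 classes used here.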

## Conclusions and Key Findings

### Final Model Performance

The pipeline implements an **automatic category merging** approach based on BERT similarity analysis, then classifies comments using BERT embeddings combined with metadata features.

**Best Model:** Logistic Regression with RandomOverSampler
- **Test Accuracy: 82.2%**
- **F1-Macro Score: 0.83**
- **Cross-Validation F1-Macro: 0.832**

### Per-Category Performance

**Strong Categories:**
- **DevelopmentNotes (Class 0):** F1=0.91 ✓ - Clear vocabulary, rich in context clues (parameter-related keywords)
- **Expand (Class 1):** F1=0.82 ✓ - Largest class after merge, well-defined

Overall, the average F1, recall, and precision scores are the same, which suggests that the model predicts each category with balanced precision and recall rather than favoring one over the other.
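
For reference, F1-Macro is the unweighted mean of the per-class F1 scores, so a small class counts as much as a large one. The sketch below shows the arithmetic; the class names and the true-positive, false-positive, and false-negative counts are hypothetical and are not taken from this pipeline's results.

```python
def f1_from_counts(tp, fp, fn):
    """F1 for one class from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-class counts: (tp, fp, fn).
counts = {
    "class_0": (90, 10, 8),
    "class_1": (160, 35, 35),
    "class_2": (70, 20, 18),
    "class_3": (60, 15, 19),
}

per_class_f1 = {name: f1_from_counts(*c) for name, c in counts.items()}
# Macro average: every class contributes equally, regardless of its size.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

for name, score in per_class_f1.items():
    print(f"{name}: F1={score:.3f}")
print(f"F1-Macro: {macro_f1:.3f}")
```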

### Model Comparison

Tested 4 different algorithms (all with RandomOverSampler):

| Model | Mean F1-Macro | Std Dev | Winner |
|-------|-------------|---------|--------|
| Logistic Regression | 0.832374 | 0.003923 | ✓ Best |
| Linear SVC | 0.821078 | 0.007529 | |
| SGD Classifier | 0.813973 | 0.019767 | |
| Random Forest | 0.794452 | 0.005036 | ❌ Worst |

Linear models (Logistic Regression, Linear SVC) outperform Random Forest by roughly 0.03 to 0.04 mean F1-Macro when using dense BERT embeddings.

### Conclusion

This pipeline successfully demonstrates:
- **Automatic category merging** based on semantic similarity analysis
- **BERT embeddings + metadata** achieve 83.2% accuracy (F1-Macro: 0.83)
- **Logistic Regression** outperforms complex models on this task
- **Proper data handling** prevents leakage and ensures fair evaluation
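
One aspect of leakage-safe data handling is that oversampling should be applied to the training split only, so duplicated rows never end up in the test set. The sketch below is a plain NumPy stand-in that illustrates the idea behind random oversampling; it is not the `RandomOverSampler` implementation from imbalanced-learn, and the function name `random_oversample` and the toy data are hypothetical.

```python
import numpy as np


def random_oversample(X, y, seed=0):
    """Duplicate random minority-class rows until every class matches the largest."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = [np.arange(len(y))]  # start with every original row
    for cls, count in zip(classes, counts):
        if count < target:
            idx = np.flatnonzero(y == cls)
            # Sample with replacement to fill the gap to the majority size.
            keep.append(rng.choice(idx, size=target - count, replace=True))
    order = np.concatenate(keep)
    return X[order], y[order]


# Toy training split only; the test split is left untouched.
X_train = np.arange(12, dtype=float).reshape(6, 2)
y_train = np.array([0, 0, 0, 0, 1, 2])
X_bal, y_bal = random_oversample(X_train, y_train)
print(np.bincount(y_bal))
```

During cross-validation the same rule applies per fold: resample the training folds and score on the untouched validation fold.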

## Generated Files

### Data Files:
- `code-comment-classification-cleaned.csv` - Cleaned dataset (2,812 rows)
- `code-comment-classification-train-unbalanced.csv` - Training set (2,249 rows)
- `code-comment-classification-test.csv` - Test set (563 rows)
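
The row counts above are consistent with an 80/20 split. The check below assumes a 20% test share with the test size rounded up, which is how scikit-learn's `train_test_split` treats a fractional `test_size`; the actual split parameters are not stated in this summary.

```python
import math

n_rows = 2812     # rows in the cleaned dataset
test_share = 0.2  # assumed test fraction, not confirmed by the notebooks

n_test = math.ceil(test_share * n_rows)  # fractional test size is rounded up
n_train = n_rows - n_test

print(n_train, n_test)
```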

### Encoded Features:
- `train_features_4cat_bert_meta.npz` - Training features (2,249 × 695)
- `test_features_4cat_bert_meta.npz` - Test features (563 × 695)
- `train_target_4cat_meta.csv` - Training labels
- `test_target_4cat_meta.csv` - Test labels

### Encoders:
- `class_encoder_4cat_meta.pkl` - OneHotEncoder for class names
- `bert_model_4cat_meta.pkl` - SentenceTransformer model
- `label_encoder_4cat_meta.pkl` - LabelEncoder for categories

### Trained Models:
- `best_model_final.pkl` - Best performing model (Logistic Regression)

## How to Use

### Run the complete pipeline
```bash
jupyter notebook complete-pipeline.ipynb
```

### Run each step of the pipeline separately:
```bash
# 1. Clean data and merge categories
jupyter notebook data_cleaning.ipynb

# 2. Encode features
jupyter notebook encoding.ipynb

# 3. Train baseline model
jupyter notebook model_training.ipynb

# 4. Compare multiple models
jupyter notebook multi_model_training.ipynb

# 5. View results summary
jupyter notebook results_summary.ipynb
```

### Load and use the trained model:
```python
import joblib

# Load model
model = joblib.load("outputs/best_model_final.pkl")

# Load encoders
class_encoder = joblib.load("outputs/class_encoder_4cat_meta.pkl")
bert_model = joblib.load("outputs/bert_model_4cat_meta.pkl")
label_encoder = joblib.load("outputs/label_encoder_4cat_meta.pkl")

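# Make predictions on new data.
# X_new must be encoded exactly like the training features: shape (n_samples, 695).
predictions = model.predict(X_new)

# Map predicted class ids back to category names
# (assumes the model was trained on label-encoded targets).
category_names = label_encoder.inverse_transform(predictions)
```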
```
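
Building `X_new` means repeating the three encoding steps and concatenating the results. The sketch below is one possible way to do that. It assumes the column order is BERT embeddings, then class one-hot columns, then metadata, and that the 5 metadata features are already computed as numbers; neither assumption is confirmed by this summary, so check `encoding.ipynb` for the actual order and metadata definitions. The helper name `build_features` is hypothetical.

```python
import numpy as np

EXPECTED_WIDTH = 384 + 306 + 5  # BERT + class one-hot + metadata = 695


def build_features(comments, class_names, metadata, bert_model, class_encoder):
    """Assemble a dense (n_samples, 695) matrix in the assumed column order."""
    # SentenceTransformer.encode returns one embedding row per input text.
    text_part = np.asarray(bert_model.encode(list(comments)))
    # OneHotEncoder.transform expects a 2-D input with one column per feature.
    names_2d = np.asarray(class_names, dtype=object).reshape(-1, 1)
    onehot = class_encoder.transform(names_2d)
    if hasattr(onehot, "toarray"):  # scikit-learn may return a sparse matrix
        onehot = onehot.toarray()
    meta_part = np.asarray(metadata, dtype=float)
    X = np.hstack([text_part, onehot, meta_part])
    if X.shape[1] != EXPECTED_WIDTH:
        raise ValueError(f"expected {EXPECTED_WIDTH} columns, got {X.shape[1]}")
    return X
```

With the encoders loaded as shown earlier, `X_new = build_features(comments, class_names, metadata, bert_model, class_encoder)` produces the input for `model.predict`. Class names that the encoder did not see during fitting may raise an error, depending on how the encoder was configured.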