# 🚀 MuRIL Sentiment Training - Google Colab
**Author:** Bhavika Baddur  
**Model:** MuRIL (Multilingual Representations for Indian Languages)  
**Target:** 80-85% accuracy on code-mixed Hindi-English reviews

---

## 📋 What This Notebook Does:
1. Installs the required packages
2. Checks that a GPU is available
3. Uploads your balanced dataset and the training script
4. Trains the MuRIL model
5. Displays the evaluation results
6. Downloads the trained model and reports

**⏰ Expected Time:** 15-25 minutes on GPU

## ✅ STEP 1: Install Dependencies
Run this cell first!

In [None]:
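# Quote each version pin: an unquoted ">=" is treated by the shell as an output redirect.
# -q keeps pip quiet; %%capture is dropped because it would also hide the message below.
!pip install -q "transformers>=4.35.0"
!pip install -q "torch>=2.0.0"
!pip install -q "accelerate>=0.24.0"
!pip install -q scikit-learn
!pip install -q imbalanced-learn

print("✅ All packages installed!")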
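
To confirm that the minimum versions requested above are actually installed, the versions can be read back from Python. This is a minimal sketch, not part of the original notebook: `version_tuple`, `meets_minimum`, and `report_versions` are hypothetical helper names, and the comparison only looks at the leading numeric parts of each version string.

```python
from importlib import metadata


def version_tuple(version):
    """Turn '4.35.0' or '2.1.0+cu121' into a tuple of leading integers."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '4.35' and '4.35.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def report_versions(minimums):
    """Print each package's installed version next to the required minimum."""
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            print(f"{name}: not installed (need >= {minimum})")
            continue
        status = "OK" if meets_minimum(installed, minimum) else "TOO OLD"
        print(f"{name}: {installed} (need >= {minimum}) {status}")


report_versions({"transformers": "4.35.0", "torch": "2.0.0", "accelerate": "0.24.0"})
```

If a package is reported as too old, restarting the runtime after installation usually makes the new version visible.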

## 🔍 STEP 2: Check GPU Availability

In [None]:
import torch

if torch.cuda.is_available():
    print("✅ GPU Available!")
    print(f"   GPU: {torch.cuda.get_device_name(0)}")
    print(f"   Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print("⚠️ No GPU detected!")
    print("💡 Go to Runtime > Change runtime type > Select GPU")
    print("   Then click 'Disconnect and delete runtime' and restart")

## 📤 STEP 3: Upload Files

Upload these 2 files from your computer:
1. `balanced_dataset.csv` (from `processed_data/`)
2. `06_train_muril.py` (from `scripts/`)

In [None]:
from google.colab import files
import os
import shutil

import pandas as pd

print("📤 UPLOAD FILE 1/2: balanced_dataset.csv")
print("From: C:\\Users\\HP\\OneDrive\\Desktop\\ecommerce-sentiment-project\\processed_data\\")
print("="*70)

uploaded = files.upload()

# Move to correct location
os.makedirs('processed_data', exist_ok=True)
if 'balanced_dataset.csv' in uploaded:
    shutil.move('balanced_dataset.csv', 'processed_data/balanced_dataset.csv')
    print("\n✅ Dataset uploaded!")

    # Verify
    df = pd.read_csv('processed_data/balanced_dataset.csv')
    print(f"   Total reviews: {len(df):,}")
    print("   Sentiment distribution:")
    print(df['sentiment'].value_counts())
else:
    print("❌ File not found! Please upload balanced_dataset.csv")
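
The file name suggests the classes are already balanced, and a single ratio makes that easy to verify from the distribution printed above. The sketch below is an illustration only: `imbalance_ratio` is a hypothetical helper, and the class names and counts in the example are made up.

```python
def imbalance_ratio(class_counts):
    """Return the largest class size divided by the smallest class size."""
    if not class_counts:
        raise ValueError("no classes found")
    smallest = min(class_counts.values())
    if smallest == 0:
        raise ValueError("at least one class has no examples")
    return max(class_counts.values()) / smallest


# Made-up counts for illustration. In the notebook, pass
# df['sentiment'].value_counts().to_dict() instead.
example_counts = {"positive": 1000, "negative": 1000, "neutral": 950}
ratio = imbalance_ratio(example_counts)
print(f"Imbalance ratio: {ratio:.2f}")
```

A ratio close to 1.0 means the classes are roughly the same size; a much larger value suggests the wrong file was uploaded or the balancing step was skipped.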

In [None]:
print("📤 UPLOAD FILE 2/2: 06_train_muril.py")
print("From: C:\\Users\\HP\\OneDrive\\Desktop\\ecommerce-sentiment-project\\scripts\\")
print("="*70)

uploaded = files.upload()

if '06_train_muril.py' in uploaded:
    print("\n✅ Training script uploaded!")
else:
    print("❌ File not found! Please upload 06_train_muril.py")

## 🚀 STEP 4: START TRAINING!

**⏰ This will take 15-25 minutes on GPU**

☕ Perfect time for a coffee break!

**Watch for:**
- Training progress bars
- Evaluation loss decreasing
- Final accuracy at the end

In [None]:
import datetime

print("🚀 STARTING MuRIL TRAINING")
print("="*70)
print(f"⏰ Start Time: {datetime.datetime.now().strftime('%I:%M %p')}")
print("="*70)
print("\n☕ Grab a coffee! This takes 15-25 minutes...\n")
print("="*70)

!python 06_train_muril.py

print("\n" + "="*70)
print("🎉 TRAINING COMPLETE!")
print(f"⏰ End Time: {datetime.datetime.now().strftime('%I:%M %p')}")
print("="*70)
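
The cell above prints its completion banner even if the script crashes. One possible alternative, sketched here under the assumption that the script reports failure through a non-zero exit code, is to launch it from Python and inspect that code. `run_script` is a hypothetical helper name, not something the training script provides.

```python
import os
import subprocess
import sys


def run_script(script_path):
    """Run a Python script, echo its output line by line, and return its exit code."""
    process = subprocess.Popen(
        [sys.executable, "-u", script_path],  # -u: unbuffered child output
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge errors into the same stream
        text=True,
    )
    for line in process.stdout:
        print(line, end="")
    return process.wait()


script = "06_train_muril.py"
if os.path.exists(script):
    exit_code = run_script(script)
    if exit_code == 0:
        print("Training finished successfully.")
    else:
        print(f"Training failed with exit code {exit_code}; check the output above.")
else:
    print(f"{script} not found; upload it first.")
```

Because output is echoed one line at a time, progress bars that redraw in place may only appear once each bar completes.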

## 📊 STEP 5: View Results

In [None]:
import pandas as pd
from IPython.display import Image, display

print("📊 FINAL RESULTS")
print("="*70)

# Show results CSV
results = pd.read_csv('reports/muril_results.csv')
print("\n🎯 Model Performance:")
display(results)

accuracy = results['Accuracy'].iloc[0]
print(f"\n🎯 Final Test Accuracy: {accuracy*100:.2f}%")

# Show confusion matrix
print("\n📈 Confusion Matrix:")
display(Image('reports/images/muril_confusion_matrix.png'))

# Show classification report
print("\n📋 Classification Report:")
with open('reports/muril_classification_report.txt', 'r') as f:
    print(f.read())
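
If the cell above stops with a `FileNotFoundError`, training probably did not finish writing its reports. A small check like the following sketch lists which of the three report files read above are missing; `missing_files` is a hypothetical helper name.

```python
import os

# The three report files that the results cell reads.
EXPECTED_OUTPUTS = [
    "reports/muril_results.csv",
    "reports/images/muril_confusion_matrix.png",
    "reports/muril_classification_report.txt",
]


def missing_files(paths):
    """Return the paths that do not exist as regular files."""
    return [path for path in paths if not os.path.isfile(path)]


missing = missing_files(EXPECTED_OUTPUTS)
if missing:
    print("Missing outputs:")
    for path in missing:
        print(f"  {path}")
else:
    print("All expected outputs are present.")
```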

## 💾 STEP 6: Download Trained Model

Download all results to your computer!

In [None]:
from google.colab import files
import os

print("📦 Preparing files for download...")
print("="*70)

# Zip the model
print("\n📦 Zipping model (this takes 1-2 minutes)...")
!zip -r -q muril_model.zip models/muril_sentiment/
print("✅ Model zipped!")

# Download model
print("\n📥 Downloading model...")
files.download('muril_model.zip')
print("✅ Model download started!")

# Download reports
print("\n📥 Downloading confusion matrix...")
files.download('reports/images/muril_confusion_matrix.png')

print("\n📥 Downloading classification report...")
files.download('reports/muril_classification_report.txt')

print("\n📥 Downloading results CSV...")
files.download('reports/muril_results.csv')

print("\n" + "="*70)
print("✅ ALL FILES DOWNLOADED!")
print("="*70)
print("\nCheck your Downloads folder for:")
print("  • muril_model.zip (~500 MB)")
print("  • muril_confusion_matrix.png")
print("  • muril_classification_report.txt")
print("  • muril_results.csv")
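
The zipping step can also be done from Python instead of the shell, which makes it easy to detect a missing model folder before starting a download. This is a sketch using the standard library's `shutil.make_archive`; `zip_directory` is a hypothetical helper name. Like the `zip -r` command above, it stores entries under `models/muril_sentiment/` inside the archive.

```python
import os
import shutil


def zip_directory(source_dir, archive_stem, root_dir="."):
    """Zip root_dir/source_dir into archive_stem + '.zip'; return the archive path.

    Entries keep source_dir as their path prefix inside the archive.
    """
    if not os.path.isdir(os.path.join(root_dir, source_dir)):
        raise FileNotFoundError(f"{source_dir} is not a directory under {root_dir}")
    return shutil.make_archive(archive_stem, "zip", root_dir=root_dir, base_dir=source_dir)


model_dir = "models/muril_sentiment"
if os.path.isdir(model_dir):
    archive = zip_directory(model_dir, "muril_model")
    print(f"Created {archive} ({os.path.getsize(archive) / 1e6:.1f} MB)")
else:
    print(f"{model_dir} not found; run training first.")
```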

## 🎉 CONGRATULATIONS!

You've successfully trained MuRIL!

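### 📁 Save Files on Your Computer:

1. **Unzip** `muril_model.zip` into the project folder (`ecommerce-sentiment-project`). The archive already contains the `models/muril_sentiment/` path, so the files end up in:
   ```
   C:\Users\HP\OneDrive\Desktop\ecommerce-sentiment-project\models\muril_sentiment\
   ```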

2. **Move** reports to:
   ```
   C:\Users\HP\OneDrive\Desktop\ecommerce-sentiment-project\reports\
   C:\Users\HP\OneDrive\Desktop\ecommerce-sentiment-project\reports\images\
   ```

### 🚀 Next Steps:
1. Review your accuracy and classification report
2. Test the model on sample reviews
3. Integrate the model into the dashboard
4. Compare the results with the Random Forest baseline
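
Testing the model on sample reviews could look like the sketch below. It assumes the training script saved both the model and its tokenizer to `models/muril_sentiment` in the Hugging Face `save_pretrained` format, which this notebook does not confirm. `load_classifier` and `format_predictions` are hypothetical helpers, and the sample reviews are made up.

```python
import os


def load_classifier(model_dir="models/muril_sentiment"):
    """Load a text-classification pipeline from a saved model directory."""
    # Imported here so format_predictions works without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_dir, tokenizer=model_dir)


def format_predictions(texts, predictions):
    """Pair each review with its predicted label and confidence score."""
    lines = []
    for text, pred in zip(texts, predictions):
        lines.append(f"{pred['label']} ({pred['score']:.2f}): {text}")
    return lines


# Made-up code-mixed reviews, for illustration only.
sample_reviews = [
    "Product bahut accha hai, totally worth it!",
    "Delivery late thi aur packaging bhi kharab.",
]

if os.path.isdir("models/muril_sentiment"):
    classifier = load_classifier()
    for line in format_predictions(sample_reviews, classifier(sample_reviews)):
        print(line)
else:
    print("Model directory not found; train or unzip the model first.")
```

Labels may show up as generic names such as `LABEL_0` if the training script did not store a label mapping in the model config.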

---

**Great work, Bhavika! You're building something amazing!** ⭐