# 🚀 Kaggle Submission - Make Predictions on Test Set

## 📚 Overview

Time to make predictions and submit to Kaggle! You'll:
- Load your trained model
- Preprocess test data
- Generate predictions
- Create submission file
- Submit to Kaggle competition

## 🎯 Learning Objectives

1. **Load trained model** from saved state
2. **Apply preprocessing** to new, unseen test data
3. **Generate predictions** in batch mode
4. **Create a submission file** in Kaggle's format
5. **Submit and get feedback** from leaderboard

---

## TODO 1: Load Trained Model and Test Data

- Load your best model from `../models/disaster_classifier.pth`
- Load test data from `../data/raw/test.csv`
- Load your vocabulary (`vocab_dict`)
- Set the model to eval mode with `model.eval()`
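
As a rough sketch of the loading step, the helper below restores weights into a model you have already constructed. It assumes the `.pth` file holds a `state_dict` (saved with `torch.save(model.state_dict(), path)`). If you saved the whole model or a checkpoint dictionary instead, adapt it accordingly. The function name `load_for_inference` is made up for illustration.

```python
import torch
from torch import nn


def load_for_inference(model: nn.Module, checkpoint_path: str, device: str = "cpu") -> nn.Module:
    """Load saved weights into an already-built model and switch it to eval mode."""
    # Assumption: the file contains a state_dict, not a pickled model object.
    state_dict = torch.load(checkpoint_path, map_location=device)
    model.load_state_dict(state_dict)
    model.to(device)
    model.eval()  # disables dropout and similar training-only layers
    return model
```

You would call it with an instance of your own model class, built with the same constructor arguments as in training, and the path `../models/disaster_classifier.pth`. The test data can then be read with `pd.read_csv('../data/raw/test.csv')`.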

---

## TODO 2: Preprocess Test Data

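Apply the SAME preprocessing you applied to the training data:
- Reuse the functions from `01_preprocessing.ipynb`
- Convert text to sequences using your training vocabulary
- Map words missing from the vocabulary to the `<UNK>` token
- Pad or truncate to the same `max_length` as in training (50)
- Create a DataLoader with `shuffle=False` so predictions stay in the same order as the test `id` column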

**Critical**: Preprocessing must match training exactly!
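
The encoding steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: it assumes the text is already cleaned and tokenized by your functions from `01_preprocessing.ipynb`, and that your vocabulary contains a padding entry named `<PAD>` (a guess, so use whatever token you trained with). The function name `encode_tokens` is hypothetical.

```python
from typing import Dict, List

UNK_TOKEN = "<UNK>"
PAD_TOKEN = "<PAD>"  # assumed name; match the padding token used in training


def encode_tokens(tokens: List[str], vocab: Dict[str, int], max_length: int = 50) -> List[int]:
    """Map tokens to ids, then truncate or pad to exactly max_length."""
    unk_id = vocab[UNK_TOKEN]
    pad_id = vocab[PAD_TOKEN]
    # Truncate first, then replace out-of-vocabulary words with the <UNK> id.
    ids = [vocab.get(token, unk_id) for token in tokens[:max_length]]
    # Right-pad short sequences so every row has the same length.
    ids += [pad_id] * (max_length - len(ids))
    return ids
```

Because every encoded row has the same length, the rows can be stacked into a single tensor and wrapped in a DataLoader. If you padded on the left during training, pad on the left here too.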

---

## TODO 3: Generate Predictions

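```python
import torch

model.eval()  # disable dropout and other training-only behaviour
predictions = []

with torch.no_grad():  # gradients are not needed for inference
    for texts in test_loader:  # assumes each batch is a tensor of token ids
        outputs = model(texts)  # raw logits, one per tweet
        # Sigmoid converts logits to probabilities; threshold at 0.5 for labels
        preds = (torch.sigmoid(outputs) > 0.5).int()
        # Flatten so each prediction is a plain int, not a length-1 array
        predictions.extend(preds.flatten().cpu().tolist())
```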

---

## TODO 4: Create Submission File

Kaggle expects this format:
```
id,target
0,1
2,1
3,1
9,0
...
```

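```python
import os
import pandas as pd

# Pair each test id with its prediction, in the original row order
submission = pd.DataFrame({
    'id': test_df['id'],
    'target': predictions
})
os.makedirs('../submissions', exist_ok=True)  # to_csv does not create missing folders
submission.to_csv('../submissions/submission.csv', index=False)
```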

---

## TODO 5: Validate Submission Format

Check before submitting:
- Correct number of rows (3,263 for this competition)
- Columns are `id` and `target`
- No missing values
- `target` contains only 0 or 1
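
One way to automate the checks above is a small helper that returns a list of problems, empty when the file looks valid. This is a sketch: the function name `find_submission_problems` is made up for illustration, and the default row count is the 3,263 quoted above.

```python
import pandas as pd


def find_submission_problems(submission: pd.DataFrame, expected_rows: int = 3263) -> list:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if list(submission.columns) != ["id", "target"]:
        problems.append(f"columns are {list(submission.columns)}, expected ['id', 'target']")
    if len(submission) != expected_rows:
        problems.append(f"found {len(submission)} rows, expected {expected_rows}")
    if submission.isna().any().any():
        problems.append("missing values found")
    if "target" in submission.columns and not submission["target"].isin([0, 1]).all():
        problems.append("target contains values other than 0 and 1")
    return problems
```

Running it on `pd.read_csv('../submissions/submission.csv')` rather than on the in-memory DataFrame also catches mistakes introduced while writing the file, such as an extra index column.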

---

## TODO 6: Submit to Kaggle 🎉

1. Go to the competition page
2. Click "Submit Predictions"
3. Upload your `submission.csv`
4. Wait for score!

**Competition metric**: F1-Score
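
F1 is the harmonic mean of precision and recall for the positive class. The test labels are hidden, but you can estimate your score on a held-out validation set before submitting. The sketch below computes binary F1 in plain Python. The function name is hypothetical, and returning 0.0 when there are no true positives is a convention chosen here.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (label 1), given 0/1 labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

If scikit-learn is installed, `sklearn.metrics.f1_score` computes the same quantity.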

---

## 💡 Next Steps

After your first submission:
- Analyze which tweets were misclassified
- Try different architectures (LSTM, GRU)
- Experiment with hyperparameters
- Move to Phase 2: HuggingFace Transformers!
