# Assignment: Advanced Text Classification and Model Interpretability

**Background:** This assignment advances your understanding of **text classification** beyond basic sentiment analysis. You will implement multiple approaches to text classification, compare traditional machine learning with modern transformer models, and critically analyze model behavior, interpretability, and real-world deployment considerations. The focus is on **understanding the complete pipeline** from data preprocessing to model interpretation rather than just achieving high accuracy.

## Instructions and Point Breakdown

### 1. **Multi-Task Text Classification Setup (2 points)**

- Use the provided **Rotten Tomatoes dataset** as your base, but extend it by creating additional classification tasks:
  - **Binary sentiment**: Positive/Negative (baseline task)
  - **Fine-grained sentiment**: 5-class ordinal classification (1-5 stars equivalent)
  - **Aspect-based classification**: Extract and classify mentions of specific movie aspects (acting, plot, cinematography, etc.)

- **Implementation Requirements:**
  - Load and explore the dataset structure
  - Create labels for the fine-grained and aspect-based tasks (for example, via heuristic labeling, manual annotation, or an external labeled source)
  - Justify your approach to creating the additional classification tasks

- **Questions:**
  - Why might fine-grained sentiment be more challenging than binary classification?
  - What are the trade-offs between single-task and multi-task learning for these related problems?
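One way to bootstrap the additional label sets is heuristic labeling. The sketch below assumes a hand-written keyword lexicon for aspect tags (`ASPECT_KEYWORDS` is hypothetical and deliberately tiny). It also assumes you already have a continuous sentiment score in [0, 1] for each review, which it maps to five ordinal classes with equal-width bins. Both the lexicon and the bin edges are placeholders that you should replace and justify.

```python
import re

# Hypothetical seed lexicon: extend it and validate it against hand-labeled reviews.
ASPECT_KEYWORDS = {
    "acting": {"acting", "actor", "actors", "actress", "performance", "cast"},
    "plot": {"plot", "story", "script", "storyline", "ending"},
    "cinematography": {"cinematography", "camera", "visuals", "shot", "lighting"},
}


def tag_aspects(text: str) -> set[str]:
    """Return every aspect whose keywords appear as whole words in `text`."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return {aspect for aspect, words in ASPECT_KEYWORDS.items() if tokens & words}


def score_to_five_class(score: float) -> int:
    """Map a sentiment score in [0, 1] to an ordinal label 1-5 using equal-width bins."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    # Bins are [0, 0.2), [0.2, 0.4), ..., [0.8, 1.0]; a score of exactly 1.0 maps to 5.
    return min(int(score * 5), 4) + 1


print(tag_aspects("The acting was superb, but the plot dragged."))  # a set with 'acting' and 'plot'
print(score_to_five_class(0.73))  # 4
```

Labels produced this way are noisy. Comparing them against a small hand-labeled sample gives you an estimate of label quality to report alongside your results.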

### 2. **Comparative Algorithm Implementation (3 points)**

- Implement **three different approaches** to text classification:
  
  **Traditional ML Pipeline:**
  - TF-IDF or Count Vectorization + Logistic Regression/SVM
  - Include appropriate preprocessing (tokenization, stopword removal, etc.), and check whether your stopword list removes negators such as "not"
  
  **Embedding-Based Approach:**
  - Sentence transformers or pre-trained word embeddings
  - Simple neural network classifier on top of embeddings
  
  **Transformer Fine-tuning:**
  - Fine-tune a pre-trained model (BERT, RoBERTa, or similar)
  - Use separate train/validation/test splits: tune on the validation set and report final results on the held-out test set

- **Questions:**
  - What are the computational and memory trade-offs between these approaches?
  - How does performance scale with dataset size for each method?
  - Which approach generalizes best to out-of-distribution data?
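A minimal sketch of the traditional pipeline with scikit-learn, assuming TF-IDF features and logistic regression. It removes negators from the stopword list and adds bigrams so that phrases such as "not good" survive preprocessing. The toy texts, the `NEGATORS` set, and the hyperparameters are illustrative assumptions, not tuned values.

```python
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# scikit-learn's built-in English stopword list contains negators such as "not";
# removing them would erase exactly the cues that negation tests probe.
NEGATORS = {"no", "not", "nor", "never"}
STOP_WORDS = sorted(ENGLISH_STOP_WORDS - NEGATORS)


def build_tfidf_baseline() -> Pipeline:
    """TF-IDF unigrams + bigrams feeding an L2-regularized logistic regression."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(lowercase=True, stop_words=STOP_WORDS,
                                  ngram_range=(1, 2), sublinear_tf=True)),
        ("clf", LogisticRegression(C=1.0, max_iter=1000)),
    ])


# Toy data for illustration only; substitute the Rotten Tomatoes train split.
train_texts = [
    "a moving, beautifully acted film",
    "sharp script and great performances",
    "not good at all, a tedious mess",
    "dull plot and wooden acting",
]
train_labels = [1, 1, 0, 0]

model = build_tfidf_baseline().fit(train_texts, train_labels)
vocab = model.named_steps["tfidf"].vocabulary_
print("not good" in vocab)  # True: the negated bigram survives preprocessing
print(model.predict_proba(["great performances, not a tedious film"]))
```

Because every step shares the `fit`/`predict` interface, you can swap in `CountVectorizer` or `sklearn.svm.LinearSVC` for comparison (note that `LinearSVC` exposes `decision_function` rather than `predict_proba`).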

### 3. **Model Interpretability and Error Analysis (2 points)**

- **Interpretability Investigation:**
  - For the traditional ML model: Analyze top features/words for each class
  - For the transformer model: Extract attention weights and analyze what the model focuses on
  - Compare interpretability between approaches

- **Systematic Error Analysis:**
  - Identify the types of examples on which each model fails
  - Analyze length bias, domain bias, and linguistic complexity effects
  - Create confusion matrices and analyze misclassification patterns

- **Critical Questions:**
  - Do the models learn semantically meaningful patterns or exploit spurious correlations?
  - How do attention patterns relate to human understanding of sentiment indicators?
  - What are the implications of model opacity for real-world deployment?

### 4. **Robustness and Adversarial Testing (2 points)**

- **Robustness Evaluation:**
  - Test model performance on adversarial examples (negation handling, sarcasm, etc.)
  - Evaluate performance on texts with neutral or mixed sentiment
  - Test cross-domain generalization (if possible, evaluate on different review domains)

- **Bias and Fairness Analysis:**
  - Investigate potential biases in model predictions
  - Test performance across different text lengths and writing styles
  - Analyze failure modes and edge cases

- **Critical Questions:**
  - How do different preprocessing choices affect model robustness?
  - What are the ethical implications of deploying sentiment analysis models?
  - How would you detect and mitigate bias in a production system?
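Behavioral robustness tests can be framed as pairs of (original, perturbed) inputs: invariance pairs should keep their label, while directional pairs such as negations should flip it. The sketch below is model-agnostic and assumes only a `predict` callable that maps a list of strings to integer labels (a fitted scikit-learn pipeline's `predict` fits this interface). `naive_predict` is a hypothetical keyword classifier used only to show how a model that ignores negation fails.

```python
from typing import Callable, Sequence

Predict = Callable[[Sequence[str]], Sequence[int]]


def failure_rate(predict: Predict, pairs: Sequence[tuple[str, str]], expect_same: bool) -> float:
    """Fraction of (original, perturbed) pairs whose predictions violate the expectation.

    expect_same=True  -> invariance test: the label should not change.
    expect_same=False -> directional test: the label should flip (e.g. after negation).
    """
    if not pairs:
        raise ValueError("need at least one pair")
    original = predict([a for a, _ in pairs])
    perturbed = predict([b for _, b in pairs])
    failures = sum((p == q) != expect_same for p, q in zip(original, perturbed))
    return failures / len(pairs)


# Hypothetical stand-in that ignores negation; replace with each of your trained models.
POSITIVE_WORDS = {"good", "great", "brilliant"}


def naive_predict(texts: Sequence[str]) -> list[int]:
    return [int(any(w in POSITIVE_WORDS for w in t.lower().split())) for t in texts]


negation_pairs = [("the film is good", "the film is not good"),
                  ("a great cast", "hardly a great cast")]
invariance_pairs = [("the film is good", "the movie is good"),
                    ("a brilliant ending", "a brilliant ending .")]

print(failure_rate(naive_predict, negation_pairs, expect_same=False))   # 1.0
print(failure_rate(naive_predict, invariance_pairs, expect_same=True))  # 0.0
```

Running the same pair sets through all three of your models gives directly comparable failure rates per perturbation type.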

### 5. **Real-World Deployment Considerations (1 point)**

- **Technical Reflection:**
  - Compare inference speed, memory usage, and scalability of your approaches
  - Discuss strategies for handling class imbalance and concept drift
  - Propose methods for continuous model monitoring and updating

- **Critical Analysis Questions:**
  - How would you design a feedback loop to improve model performance over time?
  - What are the key considerations for deploying text classification in production?
  - How do you balance model complexity with interpretability requirements?
  - What metrics beyond accuracy are important for evaluating production ML systems?
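For monitoring, one widely used heuristic is the population stability index (PSI) between a reference distribution of predicted probabilities (for example, from the validation set) and a window of live predictions. The sketch assumes scores in [0, 1] and ten equal-width bins. PSI flags shifts in the input or prediction distribution, but it cannot detect concept drift on its own; that requires fresh labels.

```python
import numpy as np


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a reference sample and a live sample of scores in [0, 1]."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    e_counts, _ = np.histogram(expected, bins=edges)
    a_counts, _ = np.histogram(actual, bins=edges)
    # Clip empty bins to eps so the log ratio stays finite.
    e_frac = np.clip(e_counts / e_counts.sum(), eps, None)
    a_frac = np.clip(a_counts / a_counts.sum(), eps, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))


rng = np.random.default_rng(0)
reference = rng.beta(2, 5, size=5000)       # e.g. P(positive) on the validation set
same_window = rng.beta(2, 5, size=5000)     # live traffic from the same distribution
shifted_window = rng.beta(5, 2, size=5000)  # live traffic after a distribution shift

print(round(population_stability_index(reference, same_window), 3))
print(round(population_stability_index(reference, shifted_window), 3))
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift; calibrate any alert threshold on your own data.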

## Submission Requirements

- **Jupyter Notebook** containing:
  - Complete implementation of all three approaches
  - Thorough experimental comparison with proper statistical testing
  - Visualizations of model interpretability (attention maps, feature importance, etc.)
  - Comprehensive error analysis with specific examples
  - Written responses to all critical thinking questions (2-3 paragraphs each)

- **Technical Implementation:** Use libraries including, but not limited to, `transformers`, `sentence-transformers`, `sklearn`, `matplotlib`, `seaborn`, and `pandas`

- **Experimental Rigor:** Include proper cross-validation, statistical significance testing, and ablation studies where appropriate
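To test whether two classifiers differ significantly on the same test set, McNemar's test compares the examples on which exactly one of them is correct. A minimal standard-library sketch of the exact (binomial) variant follows; the function name is hypothetical, and alternatives such as paired bootstrap resampling are equally valid choices.

```python
from math import comb
from typing import Sequence


def mcnemar_exact(y_true: Sequence[int], pred_a: Sequence[int],
                  pred_b: Sequence[int]) -> tuple[int, int, float]:
    """Exact McNemar test; returns (b, c, two-sided p-value).

    b = examples model A gets right and model B gets wrong,
    c = examples model A gets wrong and model B gets right.
    Under H0 (equal error rates), b ~ Binomial(b + c, 0.5).
    """
    if not len(y_true) == len(pred_a) == len(pred_b):
        raise ValueError("all sequences must have the same length")
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree in correctness
    tail = sum(comb(n, i) for i in range(min(b, c) + 1))
    return b, c, min(1.0, 2 * tail / 2 ** n)


y_true = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
pred_a = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
pred_b = [0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
print(mcnemar_exact(y_true, pred_a, pred_b))  # (6, 1, 0.125)
```

Because the test uses only the disagreement counts, it requires per-example predictions from both models on an identical test split.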

**Grading Rubric:**

| Section                                 | Points |
|:----------------------------------------|:------:|
| Multi-task classification setup         | 2      |
| Comparative algorithm implementation    | 3      |
| Model interpretability & error analysis | 2      |
| Robustness & adversarial testing        | 2      |
| Real-world deployment considerations    | 1      |
| **Total**                               | **10** |

**Evaluation Criteria:**
- **Technical Implementation (35%):** Quality and correctness of implementations, proper experimental design
- **Critical Analysis (40%):** Depth of understanding demonstrated in written responses and experimental insights
- **Interpretability & Analysis (25%):** Quality of error analysis, attention visualization, and bias investigation

**Learning Objectives:**
By completing this assignment, students will understand the full lifecycle of text classification systems, from data preprocessing to production deployment, with emphasis on model interpretability, robustness, and ethical considerations in real-world applications.