## Document Analysis: Computational Methods - Summer Term 2025
### Lectures: Jun.-Prof. Dr. Andreas Spitz
### Tutorials: Julian Schelb

# Exercise 9

### You will learn about:

In this exercise, you will implement three custom classifiers for sentiment analysis in movie reviews and evaluate them. As a dataset, use the polarity dataset (v2): 
http://www.cs.cornell.edu/people/pabo/movie-review-data/review_polarity.tar.gz. 
The data contains 1000 negative and 1000 positive movie reviews.

As a first step, split the data into a training set (900 negative and 900 positive reviews) and a test set (the remaining 200 reviews). 
Add labels to the data (0 for negative reviews, 1 for positive reviews). 
Make sure to shuffle both data sets. You will be using the same training and test data for all tasks.
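
The split described above can be sketched as follows. This is a minimal illustration, not a reference solution: it assumes the reviews have already been read into two lists of strings, and the names `make_split`, `neg_reviews`, `pos_reviews` and the fixed seed are hypothetical choices made for this example.

```python
import random

def make_split(neg_reviews, pos_reviews, n_train_per_class=900, seed=42):
    """Return shuffled (train, test) lists of (text, label) pairs."""
    rng = random.Random(seed)  # fixed seed (assumption) so the split is reproducible
    neg = [(text, 0) for text in neg_reviews]  # label 0 = negative
    pos = [(text, 1) for text in pos_reviews]  # label 1 = positive
    # Shuffle within each class so the held-out reviews are a random subset
    rng.shuffle(neg)
    rng.shuffle(pos)
    train = neg[:n_train_per_class] + pos[:n_train_per_class]
    test = neg[n_train_per_class:] + pos[n_train_per_class:]
    # Shuffle again so the classes are interleaved in both sets
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test
```

Splitting per class before merging keeps both sets balanced (900/900 and 100/100), which a single shuffle over all 2000 reviews would not guarantee.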


In [None]:
# your code goes here


## Task 1: Rule-based Sentiment Classifier

Inspect positive and negative reviews in your training set and create a set of 10 rules for sentiment classification. 
You may also look at word counts and word distributions for positive and negative reviews in your training data (or their differences) to come up with rules. 
All rules should be encoded as regular expressions that either match or fail to match a text string. 
Combine your rules in a **ClassifySentimentRB('string')** function that outputs 0 if the sentiment of the input string is negative and 1 if the sentiment is positive.
Run your function on the test set, then compute the accuracy of your classifier and the confusion matrix.
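
As a rough sketch of how regular-expression rules can be combined by voting and then evaluated: the three rules below are hypothetical placeholders rather than recommended rules, the tie-breaking toward 0 is an arbitrary choice, and `evaluate` is a helper name introduced only for this example. It assumes the test set is a non-empty list of `(text, label)` pairs.

```python
import re

# Hypothetical example rules: (compiled pattern, vote).
# A vote of +1 counts toward positive, -1 toward negative.
RULES = [
    (re.compile(r"\b(excellent|brilliant|outstanding)\b", re.I), +1),
    (re.compile(r"\b(terrible|awful|boring)\b", re.I), -1),
    (re.compile(r"\bwaste of (time|money)\b", re.I), -1),
]

def ClassifySentimentRB(text):
    """Return 1 if the matching rules vote positive overall, else 0."""
    score = sum(vote for pattern, vote in RULES if pattern.search(text))
    return 1 if score > 0 else 0  # ties and no matches fall back to 0 (assumption)

def evaluate(classifier, test_data):
    """Return (accuracy, confusion matrix) with matrix[true_label][predicted_label]."""
    matrix = [[0, 0], [0, 0]]
    for text, label in test_data:
        matrix[label][classifier(text)] += 1
    accuracy = (matrix[0][0] + matrix[1][1]) / len(test_data)
    return accuracy, matrix
```

Because `evaluate` only needs a function from string to label, the same helper could be reused for the classifiers in the later tasks.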

In [None]:
# your code goes here


## Task 2: Naive Bayes Sentiment Classifier

Implement a Naive Bayes classifier for sentiment classification and train it on word frequency features derived from the training data. 
You may use an existing Naive Bayes implementation (for example: https://scikit-learn.org/stable/modules/naive_bayes.html), but make sure to optimize it by engineering good features! 
Consider which preprocessing steps might be helpful and which might harm the performance, e.g. stopword removal or stemming. 
Wrap your code in a function **ClassifySentimentNB('string')** that outputs 0 if the sentiment of the input string is negative and 1 if the sentiment is positive.
Use the test set to evaluate your Naive Bayes sentiment classifier. 
Compute the accuracy of your classifier and the confusion matrix.
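
To illustrate what a multinomial Naive Bayes classifier computes on word frequency features, here is a small from-scratch sketch using only the standard library. It is not the scikit-learn implementation; the function names, the simple regex tokenizer, and the smoothing value `alpha=1.0` are assumptions made for this example, and it expects both classes to occur in the training data.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Very simple tokenizer (assumption): lowercase runs of letters and apostrophes
    return re.findall(r"[a-z']+", text.lower())

def train_nb(train_data, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods from (text, label) pairs."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for text, label in train_data:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    n_docs = sum(doc_counts.values())
    model = {"vocab": vocab, "log_prior": {}, "log_lik": {}}
    for label in (0, 1):
        model["log_prior"][label] = math.log(doc_counts[label] / n_docs)
        # Additive (Laplace) smoothing avoids zero probabilities for unseen words
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        model["log_lik"][label] = {
            w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
        }
    return model

def predict_nb(model, text):
    """Return the label (0 or 1) with the higher log posterior score."""
    scores = {}
    for label in (0, 1):
        score = model["log_prior"][label]
        for w in tokenize(text):
            if w in model["vocab"]:  # words unseen in training are ignored
                score += model["log_lik"][label][w]
        scores[label] = score
    return 1 if scores[1] > scores[0] else 0
```

With a trained `model`, the required wrapper could be as short as `def ClassifySentimentNB(text): return predict_nb(model, text)`. Preprocessing choices such as stopword removal or stemming would go into `tokenize`.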



In [None]:
# your code goes here

## Task 3: Neural Sentiment Classification

Implement a neural sentiment classifier by fine-tuning a DistilBERT model (https://huggingface.co/docs/transformers/model_doc/distilbert). 
Use the training set to fine-tune a model whose final layer is a classification layer. 
Make sure to fine-tune your classifier sufficiently (that is, run it for multiple epochs and check for convergence). 
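
One simple way to check for convergence is to track the validation loss per epoch and stop once it no longer improves. The helper below is only a sketch of that idea; the name `has_converged` and the defaults for `patience` and `min_delta` are hypothetical and not part of the `transformers` library.

```python
def has_converged(val_losses, patience=2, min_delta=1e-3):
    """Return True if the last `patience` epochs did not improve on the
    best earlier validation loss by at least `min_delta`."""
    if len(val_losses) <= patience:
        return False  # not enough epochs to judge yet
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best > best_before - min_delta
```

For example, `has_converged([0.9, 0.6, 0.5, 0.5, 0.51])` reports convergence, while a loss curve that is still dropping clearly does not.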

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# this time, we will work with distilbert -- a distilled version of BERT (it is smaller, faster, and cheaper)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
# ...and we use an architecture that is specifically developed for the classification task; 
# here, we specify that our training data will have 2 labels, i.e., 0 for the negative sentiment and 1 for the positive sentiment
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2) 

# an input review can be at most 512 tokens long
max_length = 512

Note that some reviews may exceed the input token limit of your model. 
You may truncate reviews (i.e., remove the tokens at the end of a review that exceed the limit) 
or experiment with alternate approaches (e.g., truncating from the middle to keep beginning and end, or splitting reviews into three sections and using majority voting, etc.). 
If you need a validation set, split it off from your training data; do not use the test data for validation.
Evaluate your neural classifier on the test set. Compute the accuracy of your classifier and the confusion matrix.
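
The option of truncating from the middle, mentioned above, could look roughly like this. The sketch works on a plain list of token ids without special tokens; the function name, the `head_fraction` default, and reserving two positions for special tokens such as `[CLS]` and `[SEP]` are assumptions for illustration.

```python
def truncate_head_tail(token_ids, max_length=512, head_fraction=0.25, n_special=2):
    """Keep the beginning and end of a token sequence and drop the middle."""
    budget = max_length - n_special  # leave room for special tokens (assumption)
    if len(token_ids) <= budget:
        return list(token_ids)
    n_head = int(budget * head_fraction)
    n_tail = budget - n_head
    head = list(token_ids[:n_head])
    # Slice from an explicit start index so that n_tail == 0 yields an empty tail
    tail = list(token_ids[len(token_ids) - n_tail:])
    return head + tail
```

The token ids could come from `tokenizer.encode(review, add_special_tokens=False)`; the special tokens then need to be added back before the sequence is passed to the model.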

In [None]:
# prepare the training and test data here (and the validation data, if needed)
# tokenize the dataset, and deal with reviews that are longer than 512 tokens

In [None]:
# specify the parameters for learning, 
# train the model, 
# and evaluate it

# REMARK: you CAN use and extend this code, or write your own code from scratch! (whatever is easier for you)
# you can get some hints here: https://www.thepythoncode.com/article/finetuning-bert-using-huggingface-transformers-python
from transformers import Trainer, TrainingArguments

# these are just default settings, adapt them as needed
training_args = TrainingArguments(
    # specify your training arguments here
)

trainer = Trainer(
    # specify your trainer here
)

# train the model
trainer.train()

## Task 4: Error Analysis
For each of the above classifiers, randomly select 10 misclassified reviews (false positives or false negatives) and manually inspect them. 
Can you determine trends in the errors that the models make? 
Discuss your findings and the relative performances of the three sentiment classifiers that you implemented in Tasks 1-3.
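
Selecting the misclassified reviews for inspection can be sketched as below. It assumes the test set is a list of `(text, label)` pairs and that `predictions` holds one predicted label per review in the same order; the name `sample_errors` and the fixed seed are hypothetical.

```python
import random

def sample_errors(test_data, predictions, k=10, seed=0):
    """Return up to k randomly chosen (text, true_label, predicted_label) errors."""
    errors = [
        (text, label, pred)
        for (text, label), pred in zip(test_data, predictions)
        if label != pred
    ]
    rng = random.Random(seed)  # fixed seed (assumption) for a reproducible sample
    return rng.sample(errors, min(k, len(errors)))
```

In each returned triple, a true label of 0 with a prediction of 1 is a false positive, and the reverse is a false negative.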


<font color='ff000000'>\# TEXT SUBMISSION ANSWER HERE (Double click to edit) </font>


#### Submitting your results:

To submit your results, please:

- save this file, i.e., `ex??_assignment.ipynb`.
- if you reference any external files (e.g., images), please create a zip or rar archive and put the notebook files and all referenced files in there.
- login to ILIAS and submit the `*.ipynb` or archive for the corresponding assignment.

**Remarks:**
    
- Do not copy any code from the Internet. If you want to use publicly available code, add a reference to the respective code snippet.
- Check that your code runs without errors, even after you have restarted the kernel.
- Submit your written solutions and the coding exercises only within the provided spaces.
- Write your name and your partner's name in the top section.