# DATASCI 266 - Summer 2025 - Project - Neha Dhage and Karl-Johan Westhoff

**Description:** This notebook builds on the material from the
[lesson 4 notebook](https://github.com/datasci-w266/2025-summer-main/blob/master/materials/lesson_notebooks/lesson_4_BERT.ipynb).


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datasci-w266/2025-summer-main/blob/master/assignment/a2/Text_classification_BERT.ipynb)

The overall assignment structure is as follows:


0. Setup
  
  0.1 Libraries and Helper Functions

  0.2 Data Acquisition

  0.3. Data Preparation


1. Classification with BERT

  1.1. BERT Basics






## 0. Setup

### 0.1. Libraries and Helper Functions

This notebook requires the Hugging Face `transformers`, `datasets`, and `evaluate` libraries, along with a few other prerequisites, all of which must be installed first.

In [1]:
!pip install -q transformers
!pip install -q torchinfo
!pip install -q evaluate
!pip install --upgrade transformers
!pip install -q datasets fsspec huggingface_hub
!pip install -q --upgrade datasets fsspec huggingface_hub

Collecting transformers
  Downloading transformers-4.53.1-py3-none-any.whl.metadata (40 kB)
Downloading transformers-4.53.1-py3-none-any.whl (10.8 MB)
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.53.0
    Uninstalling transformers-4.53.0:
      Successfully uninstalled transformers-4.53.0
Successfully installed transformers-4.53.1

Now we are ready to do the imports.

In [2]:


import numpy as np

import transformers
import evaluate

from datasets import load_dataset
from torchinfo import summary

from transformers import AutoTokenizer, AutoModel, AutoModelForSequenceClassification
from transformers import TrainingArguments, Trainer
from transformers import BertTokenizer, BertModel, BertForSequenceClassification

import pandas as pd
import matplotlib.pyplot as plt


### 0.2. Data Acquisition


We will use the MentalManip dataset (https://huggingface.co/datasets/audreyeleven/MentalManip) and inspect it. We will then split it into training and test sets.

In [3]:
# Load dataset
dataset = load_dataset("audreyeleven/MentalManip", name="mentalmanip_maj")


# Print the dataset structure and features to understand available keys
print(dataset)
print(dataset['train'].features)


# Access the 'train' split and convert it to a pandas DataFrame
df = pd.DataFrame(dataset['train'])


# Inspect
print(f"Total records: {len(df)}")
print(f"Average text length (in words): {df['dialogue'].apply(lambda x: len(x.split())).mean()}")
print(f"Maximum text length (in words): {df['dialogue'].apply(lambda x: len(x.split())).max()}")

Some datasets params were ignored: ['license']. Make sure to use only valid params for the dataset builder and to have a up-to-date version of the `datasets` library.


DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'manipulative', 'technique', 'vulnerability'],
        num_rows: 4000
    })
})
{'id': Value('int64'), 'dialogue': Value('string'), 'manipulative': Value('int64'), 'technique': Value('string'), 'vulnerability': Value('string')}
Total records: 4000
Average text length (in words): 94.275
Maximum text length (in words): 1245


It is always highly recommended to look at the data. What do the records look like?

In [4]:
for i in range(9):
  print(dataset['train']['dialogue'][i])
  print(dataset['train']['manipulative'][i])
  print(dataset['train']['technique'][i])
  print(dataset['train']['vulnerability'][i])
  print()

Person1: Jesus! Listen to this one: "Do you remember me? Airport shuttle, June 7th. You: striking redhead with yellow dress, pearl necklace, brown shoes. I was the bookish fellow in the green cardigan who helped you find your contact lens. Am I crazy, or did we have a moment?"
Person2: God, that's so pathetic. I bet she didn't even notice him.
Person1: I know. And he's like psychotically obsessing over every little detail.
Person2: We should call him and pretend to be the redhead.
Person1: Oh, we totally have to.
1
None
None

Person1: I don't know.
Person2: Don't go.
Person1: I have to... he's a G-34
Person2: You don't have to... I don't trust him.  You ought to file a report.
Person1: No, don't make trouble.  It's nothing.
1
Rationalization
None

Person1: You're mine, you know.  I made you.
Person2: I know.
Person1: If you went away, what would become of me?
Person2: I'm grown up now.  I have to leave some time.
Person1: Of course you do, and I want you to... but there's no need to hu

In [5]:

# Filter where manipulative == 0 and technique is not None
invalid_cases = df[(df['manipulative'] == 0) & (df['technique'].notna())]

# Display mismatched records
print(f"Number of mismatches: {len(invalid_cases)}")
if not invalid_cases.empty:
    print(invalid_cases[['dialogue', 'manipulative', 'technique']].head())
else:
    print("✅ All records with manipulative == 0 have technique == None")

# Split comma-separated techniques and flatten the list
techniques_series = df['technique'].dropna().apply(lambda x: [t.strip() for t in x.split(',')])
all_techniques = [tech for sublist in techniques_series for tech in sublist]

# Get unique and sorted techniques
unique_techniques = sorted(set(all_techniques))

# Create a numbered list
numbered_techniques = {i + 1: tech for i, tech in enumerate(unique_techniques)}
numbered_techniques

Number of mismatches: 0
✅ All records with manipulative == 0 have technique == None


{1: 'Accusation',
 2: 'Brandishing Anger',
 3: 'Denial',
 4: 'Evasion',
 5: 'Feigning Innocence',
 6: 'Intimidation',
 7: 'Persuasion or Seduction',
 8: 'Playing Servant Role',
 9: 'Playing Victim Role',
 10: 'Rationalization',
 11: 'Shaming or Belittlement'}
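Because a single dialogue can carry several comma-separated techniques, it may also help to see how often each one occurs. The following is a minimal sketch on a few made-up technique strings; `sample_techniques` and `count_techniques` are hypothetical names used only for illustration, and the values are not taken from the dataset.

```python
from collections import Counter

# Hypothetical values shaped like the 'technique' column: None for rows
# without a technique, otherwise one or more comma-separated names.
sample_techniques = [
    "Rationalization",
    None,
    "Denial, Shaming or Belittlement",
    "Denial",
]

def count_techniques(values):
    """Count each technique across rows, skipping missing values."""
    counts = Counter()
    for value in values:
        if value is None:
            continue
        counts.update(t.strip() for t in value.split(","))
    return counts

technique_counts = count_techniques(sample_techniques)
print(technique_counts.most_common())
```

With the real data, passing `df['technique'].dropna()` to the same function should apply the same logic to every record.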

In [6]:
# Split the dataset (e.g., 80% train, 20% test)
split_dataset = dataset["train"].train_test_split(test_size=0.2, seed=42)

# Access the splits
train_data = split_dataset["train"]
test_data = split_dataset["test"]

# Optional: Verify the sizes
print(f"Train size: {len(train_data)}")
print(f"Test size: {len(test_data)}")

Train size: 3200
Test size: 800


For convenience, we will define a sequence length and truncate all records at that length. For records that are shorter than our defined sequence length, we will add padding tokens to ensure that our input shapes are consistent across all records.

In [7]:
MAX_SEQUENCE_LENGTH = 512
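To illustrate what truncating and padding to a fixed length means, here is a small dependency-free sketch. It works on plain lists of token ids with a tiny length of 8 rather than 512, and `pad_or_truncate` is a hypothetical helper, not part of any library. It also simplifies things: a real tokenizer keeps special tokens such as [CLS] and [SEP] in place when it truncates, which this sketch ignores.

```python
MAX_LEN = 8   # tiny length for illustration; the notebook uses 512
PAD_ID = 0    # assumed padding id for this sketch

def pad_or_truncate(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Return (input_ids, attention_mask), both exactly max_len long."""
    kept = token_ids[:max_len]                        # truncate long inputs
    n_pad = max_len - len(kept)                       # how much padding is needed
    input_ids = kept + [pad_id] * n_pad               # pad short inputs
    attention_mask = [1] * len(kept) + [0] * n_pad    # 1 = real token, 0 = padding
    return input_ids, attention_mask

short_ids, short_mask = pad_or_truncate([101, 7592, 102])
long_ids, long_mask = pad_or_truncate(list(range(1, 13)))
print(short_ids, short_mask)
print(long_ids, long_mask)
```

The attention mask is what lets the model ignore the padded positions, which is why the tokenizer call later in the notebook requests it.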

### 0.3. Data Preparation

We need to tokenize the text into vocabulary IDs to pass into a BERT model. To do so, we must use the specific tokenizer that goes with the model we're using. In this notebook, we will try several different BERT-style models. Let's first write a function that takes a batch of text from our dataset and a tokenizer, and encodes the text using that tokenizer. We can then apply the function to our dataset for each tokenizer and model.

In [8]:

def preprocess_maj(batch, tokenizer):
    return tokenizer(
        batch['dialogue'],  # ← This is where we tokenize the 'dialogue' column
        padding='max_length',
        truncation=True,
        max_length=MAX_SEQUENCE_LENGTH,
        return_attention_mask=True,
        return_token_type_ids=True
    )


## 1. BERT-based Classification Models

Now we turn to classification with BERT. We will perform classification with models that are based on pre-trained BERT models.

### 1.1. Basics

Let us first explore some basics of BERT. We'll start by loading the first pretrained BERT model and tokenizer that we'll use ('bert-base-cased').

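To explore just the pre-trained portion of a model, we could use the AutoModel class (equivalent to BertModel, but works for any architecture including BERT), which gives us the pre-trained layers up to the last hidden layer but no output layer. Since our goal is classification, the code below loads AutoModelForSequenceClassification instead, which adds a newly initialized classification layer on top of the pre-trained layers.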

In [9]:
# bert_classification_model = BertForSequenceClassification.from_pretrained(checkpoint)

In [10]:
model_checkpoint_name = "bert-base-cased"
bert_tokenizer = AutoTokenizer.from_pretrained(model_checkpoint_name)
bert_classification_model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint_name)


# Apply to your dataset (assuming train_data and test_data are loaded)
train_encoded = train_data.map(preprocess_maj, batched=True, fn_kwargs={'tokenizer': bert_tokenizer})
test_encoded = test_data.map(preprocess_maj, batched=True, fn_kwargs={'tokenizer': bert_tokenizer})

# Rename the 'manipulative' column to 'labels'
train_encoded = train_encoded.rename_column("manipulative", "labels")
test_encoded = test_encoded.rename_column("manipulative", "labels")

# Remove the 'id', 'dialogue', 'technique', and 'vulnerability' columns as they are not needed for training
#train_encoded = train_encoded.remove_columns(["id", "dialogue", "technique", "vulnerability"])
#test_encoded = test_encoded.remove_columns(["id", "dialogue", "technique", "vulnerability"])

# Remove all irrelevant columns before training
columns_to_remove = ['id', 'dialogue', 'technique', 'vulnerability']
train_encoded = train_encoded.remove_columns([col for col in columns_to_remove if col in train_encoded.column_names])
test_encoded = test_encoded.remove_columns([col for col in columns_to_remove if col in test_encoded.column_names])


# Set format for PyTorch
train_encoded.set_format("torch", columns=["input_ids", "attention_mask", "token_type_ids", "labels"])
test_encoded.set_format("torch", columns=["input_ids", "attention_mask", "token_type_ids", "labels"])


# Explicitly select only the columns needed for training
train_encoded = train_encoded.select_columns(["input_ids", "attention_mask", "token_type_ids", "labels"])
test_encoded = test_encoded.select_columns(["input_ids", "attention_mask", "token_type_ids", "labels"])

print(train_encoded.column_names)



Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


['input_ids', 'attention_mask', 'token_type_ids', 'labels']


### 1.2. Model Training

To train a Huggingface model, we'll use a Trainer class, and a TrainingArguments class that goes with it.

Let's start with the TrainingArguments. This is just a simple config where we specify things like the batch size and number of epochs.

We also choose a filepath where we want to save model checkpoints after training. For now, we'll just define a local directory name, which will save the trained model in the Colab notebook's temporary storage.

For your assignments and project, you'll probably want to mount your Google Drive and specify a filepath to a directory there, so that the saved model checkpoints persist after the notebook is shut down.

In [11]:
training_args = TrainingArguments(
    output_dir="./bert_output",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,  # Increase for better learning
    eval_strategy="epoch",
    save_strategy="epoch",
    logging_strategy="steps",   # enable step-based logging
    logging_steps=10,           # log every 10 steps
    logging_dir="./logs",
    report_to='none',
    remove_unused_columns=False
)

In addition to model loss, we'll also want to keep track of a simple but more interpretable metric like validation accuracy, so that we can see how well the model is generalizing.

The trainer takes a "compute_metrics" argument, which needs to be a function that takes a set of predictions and labels and returns a metric. We can use the accuracy metric from the Huggingface evaluate package, and wrap it in the necessary function like this:



In [12]:
metric = evaluate.load('accuracy')

def compute_metrics(p):
    predictions, labels = p
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)
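To make the metric concrete, here is a dependency-free sketch of the calculation that `compute_metrics` performs: take the highest-scoring class in each row of logits, then measure the fraction of rows where that class equals the label. The logits and labels are made up for illustration, and `argmax` and `accuracy_from_logits` are hypothetical helper names.

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=row.__getitem__)

def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical logits for four examples with two classes each.
toy_logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]]
toy_labels = [0, 1, 0, 0]

toy_accuracy = accuracy_from_logits(toy_logits, toy_labels)
print(toy_accuracy)  # 0.75: three of the four predictions match
```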


Now we make our Trainer, passing it the model to use, the training arguments, the training and validation data, and our compute_metrics function.

In [13]:
trainer = Trainer(
    model=bert_classification_model,
    args=training_args,
    train_dataset=train_encoded,
    eval_dataset=test_encoded,
    compute_metrics=compute_metrics
)

... and train it!

In [14]:
trainer.train()

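| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1 | 0.5744 | 0.564571 | 0.7125 |
| 2 | 0.5248 | 0.567236 | 0.71125 |
| 3 | 0.2476 | 0.734966 | 0.675 |
| 4 | 0.095 | 1.385089 | 0.70125 |
| 5 | 0.1089 | 1.558613 | 0.71625 |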


TrainOutput(global_step=1000, training_loss=0.3132683160994202, metrics={'train_runtime': 1750.7975, 'train_samples_per_second': 9.139, 'train_steps_per_second': 0.571, 'total_flos': 4209776885760000.0, 'train_loss': 0.3132683160994202, 'epoch': 5.0})
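As a quick sanity check, the `global_step=1000` in the output above can be reproduced from the training set size, batch size, and number of epochs. This sketch assumes a single device and no gradient accumulation, neither of which the notebook states explicitly.

```python
import math

train_size = 3200   # training examples after the 80/20 split
batch_size = 16     # per_device_train_batch_size
num_epochs = 5      # num_train_epochs

# Assumption: one device and no gradient accumulation,
# so each optimizer step consumes exactly one batch.
steps_per_epoch = math.ceil(train_size / batch_size)
total_steps = steps_per_epoch * num_epochs
print(steps_per_epoch, total_steps)  # 200 1000
```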

## 2. Testing Different Pre-Trained BERT Models

In the live session we discussed classification with the `bert-base-cased` model, using the Huggingface class BertForSequenceClassification, which comes with a new output layer for our task that we need to train on our dataset.

We're going to try different pre-trained models now. Like in the lesson 4 notebook, we'll want to fine-tune each model on our IMDB reviews dataset and compare them with a metric like the validation accuracy. We'll use the model class AutoModelForSequenceClassification, which is equivalent to BertForSequenceClassification, but works for other similar architectures too.

Let's write the code we'll need as a function that takes the model and tokenizer as arguments, along with the raw train and dev data. The function will need to tokenize the inputs using the provided tokenizer, so that we can repeat the same code for different pre-trained models. Then the function should create the training args and trainer class, and call trainer.train().

The other hyperparameters you'll need are provided in the function definition, including batch_size and num_epochs. You should use the default values provided for those. Use the function provided below for compute_metrics.

For now, keep all layers of the pre-trained models you load unfrozen.
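The fine-tuning functions in this section call `preprocess_imdb`, but this notebook never defines it. The following is a minimal sketch of what such a helper might look like, assuming it mirrors `preprocess_maj` but reads the IMDB `text` column. The `max_length` default of 512 and the `recording_tokenizer` stand-in are assumptions made so the sketch runs without downloading a model.

```python
def preprocess_imdb(batch, tokenizer, max_length=512):
    """Tokenize the 'text' column of a batch, padded/truncated to max_length."""
    return tokenizer(
        batch["text"],
        padding="max_length",
        truncation=True,
        max_length=max_length,
        return_attention_mask=True,
        return_token_type_ids=True,
    )

def recording_tokenizer(texts, **kwargs):
    """Stand-in tokenizer that just echoes what it was called with."""
    return {"texts": texts, "kwargs": kwargs}

# In the notebook, a real Hugging Face tokenizer would be passed instead.
demo = preprocess_imdb(
    {"text": ["a great movie", "bad"]}, recording_tokenizer, max_length=128
)
print(demo["kwargs"])
```

Because the extra argument has a default, the helper would still work with `.map(preprocess_imdb, batched=True, fn_kwargs={'tokenizer': tokenizer})` as written in the functions below.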

In [16]:
bert_classification_model = BertForSequenceClassification.from_pretrained(model_checkpoint_name)

In [None]:
for name, param in bert_classification_model.named_parameters():
    print(name)

In [None]:
metric = evaluate.load('accuracy')

def compute_metrics(p):
    predictions, labels = p
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)

In [None]:
def fine_tune_classification_model(classification_model,
                                   tokenizer,
                                   train_data,
                                   dev_data,
                                   batch_size = 16,
                                   num_epochs = 2):
    """
    Preprocess the data using the given tokenizer (we've given you the code for that part).
    Create the training arguments and trainer for the given model and data (write your code for that).
    Then train it.
    """

    preprocessed_train_data = train_data.map(preprocess_imdb, batched=True, fn_kwargs={'tokenizer': tokenizer})
    preprocessed_dev_data = dev_data.map(preprocess_imdb, batched=True, fn_kwargs={'tokenizer': tokenizer})

    ### YOUR CODE HERE

    # Evaluate and save once per epoch, then reload the checkpoint with the best accuracy
    training_args = TrainingArguments(
        output_dir="./results",
        eval_strategy="epoch",
        save_strategy="epoch",
        per_device_train_batch_size=batch_size,
        per_device_eval_batch_size=batch_size,
        num_train_epochs=num_epochs,
        logging_dir="./logs",
        logging_steps=10,
        load_best_model_at_end=True,
        metric_for_best_model="accuracy",
        report_to="none"  # disables W&B/Hub reporting for now
    )

    trainer = Trainer(
        model=classification_model,
        args=training_args,
        train_dataset=preprocessed_train_data,
        eval_dataset=preprocessed_dev_data,
        compute_metrics=compute_metrics
    )


    ### END YOUR CODE

    trainer.train()

Let's try bert-base-cased first, the same model that was used in the lesson 4 notebook.

In [None]:
"""
Show the output from training BERT-base-cased on the IMDB movie reviews dataset.
"""

# Load IMDB here, since the fine-tuning call below needs the train and dev splits
imdb_dataset = load_dataset("imdb")
imdb_train_dataset = imdb_dataset["train"]
imdb_dev_dataset = imdb_dataset["test"]

model_checkpoint_name = "bert-base-cased"
bert_tokenizer = AutoTokenizer.from_pretrained(model_checkpoint_name)
bert_classification_model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint_name)

fine_tune_classification_model(bert_classification_model, bert_tokenizer, imdb_train_dataset, imdb_dev_dataset)

## 3. Unfreezing Different Pre-Trained Layers

In the lesson 4 notebook, we tested freezing most or all of the pre-trained BERT model layers. We used the .named_parameters() method, looking at the specific names of each set of model parameters.

As in the lesson notebook, we will always want to make sure we keep the classification layer parameters unfrozen, since those need to be trained for our specific task. We will also keep the pooler layer unfrozen, since it's next closest to the classification layer and was only pre-trained in standard BERT models with the next sentence prediction task.

For the remaining layers, what happens if we unfreeze lower transformer blocks and keep higher transformer blocks frozen (the opposite of what we did in the lesson notebook)? What if we instead try unfreezing specific types of layers within each transformer block, e.g. all of the self attention layers, or all of the dense layers?

Let's modify our fine-tuning function, to add an argument for the layers that we want to train. We'll make that argument a list of strings, and we'll set the default to just unfreeze the classification layer. You'll need to write the code to compare those strings to the names of the model parameters (after loading the specified model) and freeze all parameters that don't match (as in the lesson 4 notebook).
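Because the comparison is a plain substring match, small details in the strings matter. The sketch below uses a short, illustrative list of names in the style printed by `named_parameters()` for a BERT classification model (not the full model) to show why a trailing dot is useful: `"layer.1"` also matches `layer.11`, while `"layer.1."` does not. `select_trainable` is a hypothetical helper.

```python
# A few parameter names in the style of a BERT classification model;
# this short list is illustrative, not the full set of parameters.
example_names = [
    "bert.embeddings.word_embeddings.weight",
    "bert.encoder.layer.0.attention.self.query.weight",
    "bert.encoder.layer.1.attention.self.query.weight",
    "bert.encoder.layer.11.output.dense.weight",
    "bert.pooler.dense.weight",
    "classifier.weight",
]

def select_trainable(names, layers_to_train):
    """Names containing at least one of the given substrings stay trainable."""
    return [n for n in names if any(s in n for s in layers_to_train)]

with_dot = select_trainable(example_names, ["layer.1.", "classifier."])
without_dot = select_trainable(example_names, ["layer.1", "classifier."])
print(with_dot)      # layer 1 and the classifier only
print(without_dot)   # also picks up layer 11
```

The same check can be run on the real parameter names before training, to confirm that a given list of strings selects exactly the layers intended.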

In [None]:
# Refresh your memory on what the parameter names look like

from transformers import AutoModelForSequenceClassification

#model_checkpoint_name = "bert-base-cased"
#model_checkpoint_name = "cardiffnlp/twitter-roberta-base-sentiment-latest"


bert_classification_model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint_name)


for name, param in bert_classification_model.named_parameters():
    print(name)

In [None]:
MAX_SEQUENCE_LENGTH = 100

def fine_tune_classif_model_freeze_layers(classification_model,
                                          tokenizer,
                                          train_data,
                                          dev_data,
                                          layers_to_train = ["classifier."],
                                          max_sequence_length=MAX_SEQUENCE_LENGTH,
                                          batch_size = 16,
                                          num_epochs = 2):
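    """
    Freeze every parameter in the given model whose name does not contain any of the
    strings in the "layers_to_train" list, so that only the matching layers are trained.
    Then specify the training arguments and trainer for the given model and data.
    Then train it.
    Note: max_sequence_length is accepted but not used inside this function.
    """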

    preprocessed_train_data = train_data.map(preprocess_imdb, batched=True, fn_kwargs={'tokenizer': tokenizer})
    preprocessed_dev_data = dev_data.map(preprocess_imdb, batched=True, fn_kwargs={'tokenizer': tokenizer})

    ### YOUR CODE HERE


    # Freeze all layers except those in layers_to_train
    for name, param in classification_model.named_parameters():
        if not any(trainable in name for trainable in layers_to_train):
            param.requires_grad = False  # freeze this parameter

    # Define training arguments
    training_args = TrainingArguments(
        output_dir="./results_freeze",
        eval_strategy="epoch",
        per_device_train_batch_size=batch_size,
        per_device_eval_batch_size=batch_size,
        num_train_epochs=num_epochs,
        save_strategy="no",
        logging_dir="./logs",
        logging_strategy="epoch"
    )

    # Define trainer
    trainer = Trainer(
        model=classification_model,
        args=training_args,
        train_dataset=preprocessed_train_data,
        eval_dataset=preprocessed_dev_data,
        compute_metrics=compute_metrics,
        tokenizer=tokenizer
    )

    ### END YOUR CODE

    # Train the model
    trainer.train()

We'll go back to using bert-base-cased for this part. First, try freezing the parameters in transformer layers 1-11 (including all parameters with "layer.#" in the name). That means you're leaving unfrozen the initial embedding layers, the first transformer layer (numbered 0), and the classification layer.

Unfreezing the bottom transformer layer(s) rather than the top one(s) is uncommon, but it's always good to try to understand why. Since we're learning, we'll try doing it this way and see what happens. We've given you the code for this exercise, so that the way to specify layers_to_train is clear.

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer

from datasets import load_dataset

# Load the IMDB dataset
imdb_dataset = load_dataset("imdb")

# Split into train and validation sets
imdb_train_dataset = imdb_dataset["train"]
imdb_dev_dataset = imdb_dataset["test"]


"""
Show the output from training a BERT-base-cased classification model, when unfreezing
only the parameters in the embedding layers, first transformer layer (layer 0), and classifier layer.
"""

model_checkpoint_name = "bert-base-cased"


bert_tokenizer = AutoTokenizer.from_pretrained(model_checkpoint_name)
bert_classification_model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint_name)

layers_to_train = ["embeddings.", "layer.0.", "classifier."]

fine_tune_classif_model_freeze_layers(
    bert_classification_model,
    bert_tokenizer,
    imdb_train_dataset,
    imdb_dev_dataset,
    layers_to_train
)

Now try two more versions, this time choosing which layers to train yourself. Instead of focusing on the number of the transformer block (layer.#), focus on the type of layer within each block (the stuff that comes after layer.# in the name).

Keep the pooler and classification layers unfrozen in all model versions. Your options to also train include the initial embedding layers and the different components within the transformer blocks (e.g. self attention matrices, dense layers, layer norms).

Try to find one combination that does better than the version you just ran above (higher validation accuracy after 2 epochs), without much more overfitting (training_loss / eval_loss > 0.7). Also try to find one version that overfits a lot more after 2 epochs (training_loss / eval_loss < 0.5).
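The ratio check can be sketched in a few lines. As an illustration only, the example below applies the two thresholds from the paragraph above to the logged losses from the earlier MentalManip training run in section 1, epoch by epoch, even though the exercise itself applies them after 2 epochs on IMDB. `loss_ratio` and `describe` are hypothetical helper names.

```python
def loss_ratio(training_loss, eval_loss):
    """training_loss / eval_loss; lower values suggest more overfitting."""
    return training_loss / eval_loss

def describe(ratio, mild=0.7, heavy=0.5):
    """Label a ratio using the two thresholds quoted in the text."""
    if ratio > mild:
        return "little overfitting"
    if ratio < heavy:
        return "heavy overfitting"
    return "in between"

# (epoch, logged training loss, validation loss) from the run in section 1.
history = [
    (1, 0.5744, 0.564571),
    (2, 0.5248, 0.567236),
    (3, 0.2476, 0.734966),
    (4, 0.0950, 1.385089),
    (5, 0.1089, 1.558613),
]

labels_by_epoch = {}
for epoch, train_loss, val_loss in history:
    ratio = loss_ratio(train_loss, val_loss)
    labels_by_epoch[epoch] = describe(ratio)
    print(f"epoch {epoch}: ratio={ratio:.2f} -> {labels_by_epoch[epoch]}")
```

On those numbers the ratio stays above 0.7 for the first two epochs and drops below 0.5 from epoch 3 onward, which matches the rising validation loss in that run.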

In [None]:
"""
Show the output from training a particular model on the IMDB movie reviews dataset.
Choose layers to train that lead the model to perform better than the one in question 3.a, without overfitting much more.
"""

model_checkpoint_name = "bert-base-cased"

bert_tokenizer = AutoTokenizer.from_pretrained(model_checkpoint_name)
bert_classification_model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint_name)

### YOUR CODE HERE

# layers_to_train = [...]    #ANY STRINGS THAT MATCH SOME LAYERS ARE OK
# Train the classifier and pooler, plus the attention, dense, and layer-norm
# parameters inside each transformer block.
# Matching is case-sensitive: BERT parameter names use "LayerNorm", not "layernorm".
layers_to_train = [
    "classifier",
    "pooler",
    "attention.self.query",
    "attention.self.key",
    "attention.self.value",
    "intermediate.dense",
    "output.dense",
    "LayerNorm",
]

### END YOUR CODE


fine_tune_classif_model_freeze_layers(
    bert_classification_model,
    bert_tokenizer,
    imdb_train_dataset,
    imdb_dev_dataset,
    layers_to_train
)

In [None]:
"""
Show the output from training a particular model on the IMDB movie reviews dataset.
Choose layers to train that lead the model to overfit.
"""

model_checkpoint_name = "bert-base-cased"

bert_tokenizer = AutoTokenizer.from_pretrained(model_checkpoint_name)
bert_classification_model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint_name)

### YOUR CODE HERE

# layers_to_train = [...]

# The empty string is a substring of every parameter name, so this leaves all layers unfrozen
layers_to_train = [""]

### END YOUR CODE


fine_tune_classif_model_freeze_layers(
    bert_classification_model,
    bert_tokenizer,
    imdb_train_dataset,
    imdb_dev_dataset,
    layers_to_train
)

In [None]:
bert_classification_model = BertForSequenceClassification.from_pretrained(model_checkpoint_name)