In [2]:
#pip install datasets




In [None]:
#pip install pandas openpyxl

## Importing Required libraries:

In [3]:
import pandas as pd
from transformers import RobertaTokenizer, RobertaForSequenceClassification, Trainer, TrainingArguments
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_recall_fscore_support,
    precision_score,
    recall_score,
)
#from sklearn.model_selection import train_test_split
from datasets import Dataset
import torch

Our dataset is already preprocessed and split into training, development, and test sets, following the split provided with the original dataset. After loading these sets, we need to convert them into the input format that the RoBERTa model expects, which is described on the corresponding Hugging Face page.

The RoBERTa model documentation is available at:

https://huggingface.co/docs/transformers/en/model_doc/roberta

RoBERTa and similar pre-trained models are trained on general-purpose text and are not specialized for a specific task such as ours, which is classifying text as sexist or not sexist. The pre-trained model can be used as it is, but its results and accuracy are unlikely to be good, so we fine-tune it on our data and labels. We then compare the results of the original and the fine-tuned models.

In more detail, fine-tuning continues training the pre-trained model on labeled data. Either only the last few layers or all layers are updated; this notebook does not freeze any layers, so all of them are updated. The goal is to adapt the pre-trained weights to the task while retaining the knowledge learned during pre-training.

Fine-tuning is needed for three reasons:

1. It allows the model to learn task-specific patterns and adapt to our domain.
2. The pre-trained model knows nothing about our labels; fine-tuning aligns the model's output with them.
3. Fine-tuning can improve the performance of the model on the task.
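
To make the difference between training only the last layers and training all layers concrete, here is a minimal sketch of freezing parameters in PyTorch. The tiny `encoder` and `head` modules are hypothetical stand-ins, not the RoBERTa model; in the model loaded later in this notebook, the analogous parts would be `model.roberta` (encoder) and `model.classifier` (head). The notebook itself does not freeze anything.

```python
from torch import nn

# Toy stand-ins (hypothetical): a small "encoder" and a 2-class "head".
encoder = nn.Sequential(nn.Linear(8, 8), nn.Tanh())
head = nn.Linear(8, 2)
toy_model = nn.Sequential(encoder, head)

# Freeze the encoder so that only the head receives gradient updates.
for param in encoder.parameters():
    param.requires_grad = False

trainable = [name for name, p in toy_model.named_parameters() if p.requires_grad]
frozen = [name for name, p in toy_model.named_parameters() if not p.requires_grad]
print("trainable:", trainable)
print("frozen:", frozen)
```

Freezing the encoder makes training cheaper but usually limits how far the model can adapt; updating all layers, as this notebook does, is the more common choice when enough labeled data is available.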

The steps of fine-tuning are as follows:

1. Pre-process the data (already done), then tokenize it for the RoBERTa model.
2. Adapt the model to binary classification by adding a classification head on top of RoBERTa that outputs binary predictions.
3. Train the model on the training data (DF_train), using the development data (DF_dev) to monitor performance during training and detect overfitting.
4. Evaluate the fine-tuned model on the development set (DF_dev) with accuracy, precision, recall, and F1 score.
5. Use the fine-tuned model to predict labels for the test set (DF_test).

In [4]:
# Loading the dataset
DF_train=pd.read_csv('../data/preprocessed/DF_train.csv')
DF_dev=pd.read_csv('../data/preprocessed/DF_dev.csv')
DF_test=pd.read_csv('../data/preprocessed/DF_test.csv')
Actual_labels=pd.read_csv('../data/preprocessed/Actual_labels.csv')

In [5]:
# Converting the DataFrames into Hugging Face Dataset objects:
train_dataset = Dataset.from_pandas(DF_train[['text', 'label_sexist']])
dev_dataset = Dataset.from_pandas(DF_dev[['text', 'label_sexist']])
test_dataset = Dataset.from_pandas(DF_test[['text']])  # Test doesn't need labels for now


After loading the datasets and converting them for the model, we tokenize the text using the tokenizer that ships with the `transformers` library:

### Loading the tokenizer and tokenizing the data:

In [6]:
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

def tokenize_data(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=128)

train_dataset = train_dataset.map(tokenize_data, batched=True)
dev_dataset = dev_dataset.map(tokenize_data, batched=True)
test_dataset = test_dataset.map(tokenize_data, batched=True)




Map:   0%|          | 0/14000 [00:00<?, ? examples/s]

Map:   0%|          | 0/2000 [00:00<?, ? examples/s]

Map:   0%|          | 0/4000 [00:00<?, ? examples/s]
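
To illustrate what `truncation=True` together with `padding="max_length"` does to each text, here is a simplified pure-Python sketch. It is not the tokenizer's actual implementation: the content token ids are made up, and only the special token ids (0 for `<s>`, 2 for `</s>`, 1 for `<pad>`) correspond to `roberta-base`. A small `max_length` of 8 is used instead of 128 to keep the output readable.

```python
BOS_ID, EOS_ID, PAD_ID = 0, 2, 1  # roberta-base ids for <s>, </s>, <pad>

def pad_or_truncate(content_ids, max_length):
    """Wrap token ids in <s> ... </s>, then truncate or pad to max_length."""
    kept = content_ids[: max_length - 2]  # leave room for <s> and </s>
    input_ids = [BOS_ID] + kept + [EOS_ID]
    attention_mask = [1] * len(input_ids)  # 1 = real token
    n_pad = max_length - len(input_ids)
    # Padding positions get the pad id and an attention mask of 0.
    return input_ids + [PAD_ID] * n_pad, attention_mask + [0] * n_pad

# Hypothetical token ids for a short and a long text.
short_ids, short_mask = pad_or_truncate([100, 101, 102], max_length=8)
long_ids, long_mask = pad_or_truncate(list(range(100, 120)), max_length=8)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

Every example ends up with the same length, which is why the datasets can later be batched as fixed-size tensors; the attention mask tells the model which positions are padding.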

### Formatting data for training:

In [7]:
train_dataset = train_dataset.rename_column("label_sexist", "labels")
dev_dataset = dev_dataset.rename_column("label_sexist", "labels")

train_dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])
dev_dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])
test_dataset.set_format(type="torch", columns=["input_ids", "attention_mask"])


### Loading the pre-trained RoBERTa model:

In [8]:
model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=2)


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


### Setting up training parameters and arguments:

The training parameters and arguments below follow the Hugging Face documentation mentioned above. They are typical fine-tuning settings and serve as a reasonable starting point; they have not been tuned specifically for this dataset.

In [9]:
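training_args = TrainingArguments(
    output_dir="./results",              # checkpoints are written here
    evaluation_strategy="epoch",         # evaluate on the dev set after every epoch
                                         # (recent transformers releases call this eval_strategy)
    save_strategy="epoch",               # must match the evaluation strategy for load_best_model_at_end
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,                    # log the training loss every 10 steps
    load_best_model_at_end=True,         # reload the best checkpoint after training
    metric_for_best_model="accuracy",    # "best" means highest dev accuracy
    save_total_limit=1
)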
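
As a rough illustration of what `load_best_model_at_end=True` with `metric_for_best_model="accuracy"` implies, the sketch below applies the selection rule to the per-epoch dev metrics that the training run in this notebook reports. It is a simplified stand-in for the `Trainer` logic, not its implementation, and it assumes that a higher accuracy counts as better.

```python
# Per-epoch dev metrics, copied from this notebook's training log.
epoch_metrics = [
    {"epoch": 1, "eval_loss": 0.314841, "eval_accuracy": 0.8585},
    {"epoch": 2, "eval_loss": 0.342734, "eval_accuracy": 0.8725},
    {"epoch": 3, "eval_loss": 0.400996, "eval_accuracy": 0.8785},
]

# Selecting by accuracy (as configured) versus by validation loss.
best_by_accuracy = max(epoch_metrics, key=lambda m: m["eval_accuracy"])
best_by_loss = min(epoch_metrics, key=lambda m: m["eval_loss"])
print("best epoch by accuracy:", best_by_accuracy["epoch"])
print("best epoch by loss:", best_by_loss["epoch"])
```

The two criteria disagree here: accuracy keeps improving up to epoch 3, while the validation loss is lowest after epoch 1. With the configuration above, the epoch 3 checkpoint is the one kept.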




### Defining Metrics for fine-tuning the model:

In [10]:
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = torch.argmax(torch.tensor(logits), dim=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, predictions, average="binary")
    acc = accuracy_score(labels, predictions)
    return {"accuracy": acc, "f1": f1, "precision": precision, "recall": recall}


### Fine-Tuning and training the model:

In [11]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=dev_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

trainer.train()


| Epoch | Training Loss | Validation Loss | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| 1 | 0.3068 | 0.314841 | 0.8585 | 0.685206 | 0.745763 | 0.633745 |
| 2 | 0.2621 | 0.342734 | 0.8725 | 0.728435 | 0.754967 | 0.703704 |
| 3 | 0.1835 | 0.400996 | 0.8785 | 0.736728 | 0.778032 | 0.699588 |


TrainOutput(global_step=2625, training_loss=0.2861032879466102, metrics={'train_runtime': 38222.8101, 'train_samples_per_second': 1.099, 'train_steps_per_second': 0.069, 'total_flos': 2762666081280000.0, 'train_loss': 0.2861032879466102, 'epoch': 3.0})
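
The numbers in `TrainOutput` can be sanity-checked with a little arithmetic. The sketch below assumes a single device and no gradient accumulation (neither is set in the training arguments); the dataset size comes from the tokenization progress bar above.

```python
import math

n_train = 14_000             # training examples
batch_size = 16              # per_device_train_batch_size
epochs = 3                   # num_train_epochs
train_runtime = 38222.8101   # seconds, from TrainOutput

steps_per_epoch = math.ceil(n_train / batch_size)
total_steps = steps_per_epoch * epochs
samples_per_second = n_train * epochs / train_runtime
steps_per_second = total_steps / train_runtime

print(steps_per_epoch, total_steps)
print(round(samples_per_second, 3), round(steps_per_second, 3))
print(round(train_runtime / 3600, 1), "hours")
```

The results agree with the reported `global_step=2625`, `train_samples_per_second` of 1.099, and `train_steps_per_second` of 0.069. The run took roughly 10.6 hours.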

### Saving the fine-tuned model for future usage:

In [12]:
output_dir = "../fine_tuned_roberta"
trainer.save_model(output_dir)
tokenizer.save_pretrained(output_dir)
print(f"Model and tokenizer saved to {output_dir}")

Model and tokenizer saved to ../fine_tuned_roberta


### Evaluating the fine-tuned model:

In [13]:
results = trainer.evaluate()
print(results)

{'eval_loss': 0.40099582076072693, 'eval_accuracy': 0.8785, 'eval_f1': 0.7367280606717227, 'eval_precision': 0.7780320366132724, 'eval_recall': 0.6995884773662552, 'eval_runtime': 528.0448, 'eval_samples_per_second': 3.788, 'eval_steps_per_second': 0.237, 'epoch': 3.0}
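
To show how these metrics relate to each other, the sketch below recomputes them from confusion-matrix counts. The counts are not printed by the notebook; they are inferred from the reported dev metrics and the dev set size of 2,000, so treat them as a worked example rather than an official output.

```python
# Counts inferred from the reported dev metrics (an inference, not notebook output):
# tp = sexist predicted sexist, fp = not sexist predicted sexist,
# fn = sexist predicted not sexist, tn = not sexist predicted not sexist.
tp, fp, fn, tn = 340, 97, 146, 1417

precision = tp / (tp + fp)   # how many predicted "sexist" are correct
recall = tp / (tp + fn)      # how many actual "sexist" are found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(f"accuracy={accuracy:.4f} precision={precision:.6f} "
      f"recall={recall:.6f} f1={f1:.6f}")
```

Under these counts, 486 of the 2,000 dev examples are labeled sexist, which explains why accuracy (about 0.88) is noticeably higher than F1 (about 0.74): the majority class is easier to get right.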


### Testing the model on our test dataset:

In [14]:
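predictions = trainer.predict(test_dataset)
# Note: from here on, test_dataset is rebound to a pandas DataFrame (a copy of DF_test),
# no longer the tokenized Hugging Face Dataset.
test_dataset = DF_test.copy()
# predictions.predictions holds one row of logits per text; argmax picks the predicted class.
test_dataset["predictions"] = torch.argmax(torch.tensor(predictions.predictions), dim=-1).numpy()
print(test_dataset.head())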

                  rewire_id  \
0    sexism2022_english-845   
1   sexism2022_english-6629   
2  sexism2022_english-17573   
3  sexism2022_english-10268   
4  sexism2022_english-10735   

                                                text  predictions  
0  fuck the niggers and the jews both have a hist...            0  
1  well then good because someone has to knock he...            1  
2  usa texas islam muslims islamization sharialaw...            0  
3  yes normal women want to be dominated social s...            1  
4  she didnt have to be a bitch about it she lite...            1  
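
The predicted class is the index of the larger logit. As a small illustration, the sketch below applies argmax to a few hypothetical logit rows and also converts them to probabilities with softmax, which the notebook does not do but which can help when inspecting uncertain predictions.

```python
import math

def softmax(logits):
    """Convert one row of logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three texts: [score for class 0, score for class 1].
example_logits = [[2.0, -1.0], [-0.5, 1.5], [0.2, 0.1]]

predicted_labels = []
for row in example_logits:
    probs = softmax(row)
    label = max(range(len(row)), key=row.__getitem__)  # index of the largest logit
    predicted_labels.append(label)
    print(label, [round(p, 3) for p in probs])
```

The third row shows why probabilities can be useful: the argmax is class 0, but the two classes are almost equally likely.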


In [15]:
test_dataset

Unnamed: 0,rewire_id,text,predictions
0,sexism2022_english-845,fuck the niggers and the jews both have a hist...,0
1,sexism2022_english-6629,well then good because someone has to knock he...,1
2,sexism2022_english-17573,usa texas islam muslims islamization sharialaw...,0
3,sexism2022_english-10268,yes normal women want to be dominated social s...,1
4,sexism2022_english-10735,she didnt have to be a bitch about it she lite...,1
...,...,...,...
3995,sexism2022_english-2356,define blatant if youre with a girl then be wi...,0
3996,sexism2022_english-17641,take a look at mgtow even chads know women are...,1
3997,sexism2022_english-6358,when youre known as the guy who argues that wo...,0
3998,sexism2022_english-8770,you shouldve asked if you could be her side piece,0


### Combining predictions and actual Labels for final Evaluation

In [21]:
# Adding actual labels to the test dataset
# (assumes Actual_labels has the same row order as DF_test, since rows are aligned by index)
evaluation_data = DF_test.copy()
evaluation_data["predictions"] = torch.argmax(torch.tensor(predictions.predictions), dim=-1).numpy()
evaluation_data["actual_labels"] = Actual_labels["label_sexist"]

# Mapping numeric labels to text labels
label_map = {0: "not sexist", 1: "sexist"}
evaluation_data["predictions_text"] = evaluation_data["predictions"].map(label_map)
evaluation_data["actual_labels_text"] = evaluation_data["actual_labels"].map(label_map)

# Calculating evaluation metrics
accuracy = accuracy_score(evaluation_data["actual_labels"], evaluation_data["predictions"])
precision = precision_score(evaluation_data["actual_labels"], evaluation_data["predictions"], average="binary")
recall = recall_score(evaluation_data["actual_labels"], evaluation_data["predictions"], average="binary")
f1 = f1_score(evaluation_data["actual_labels"], evaluation_data["predictions"], average="binary")

print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")

# Saving the final result to an Excel file:
output_path = "../results/predictions_with_labels_dl.xlsx"
columns_to_save = ["rewire_id", "text", "predictions_text", "actual_labels_text"]
evaluation_data[columns_to_save].to_excel(output_path, index=False)
print(f"Predictions saved to {output_path}")


Accuracy: 0.8738
Precision: 0.7476
Recall: 0.7237
F1 Score: 0.7355
Predictions saved to ../results/predictions_with_labels_dl.xlsx


In [22]:
evaluation_data

Unnamed: 0,rewire_id,text,predictions,actual_labels,predictions_text,actual_labels_text
0,sexism2022_english-845,fuck the niggers and the jews both have a hist...,0,0,not sexist,not sexist
1,sexism2022_english-6629,well then good because someone has to knock he...,1,1,sexist,sexist
2,sexism2022_english-17573,usa texas islam muslims islamization sharialaw...,0,0,not sexist,not sexist
3,sexism2022_english-10268,yes normal women want to be dominated social s...,1,1,sexist,sexist
4,sexism2022_english-10735,she didnt have to be a bitch about it she lite...,1,1,sexist,sexist
...,...,...,...,...,...,...
3995,sexism2022_english-2356,define blatant if youre with a girl then be wi...,0,0,not sexist,not sexist
3996,sexism2022_english-17641,take a look at mgtow even chads know women are...,1,1,sexist,sexist
3997,sexism2022_english-6358,when youre known as the guy who argues that wo...,0,0,not sexist,not sexist
3998,sexism2022_english-8770,you shouldve asked if you could be her side piece,0,0,not sexist,not sexist


### Analyzing the Results:

In [27]:
# Filtering rows where predictions and actual_labels are not equal
mismatched_rows = evaluation_data[evaluation_data['predictions'] != evaluation_data['actual_labels']]

mismatched_rows


Unnamed: 0,rewire_id,text,predictions,actual_labels,predictions_text,actual_labels_text
11,sexism2022_english-9087,i would pay to see her ass cracker nsfw,0,1,not sexist,sexist
23,sexism2022_english-295,most men know instinctually the game but their...,1,0,sexist,not sexist
29,sexism2022_english-13224,with interior decorating skills like yours gir...,0,1,not sexist,sexist
41,sexism2022_english-8690,isnt it convenient that she doesnt have the vi...,0,1,not sexist,sexist
48,sexism2022_english-14596,its not the same goys groping women is my come...,0,1,not sexist,sexist
...,...,...,...,...,...,...
3964,sexism2022_english-12476,awalt the guy had social status so she attache...,0,1,not sexist,sexist
3968,sexism2022_english-9520,so leading women on and using them for materia...,1,0,sexist,not sexist
3971,sexism2022_english-5262,a girl who comes up with terms like substantia...,1,0,sexist,not sexist
3976,sexism2022_english-9172,it has always seemed to me that feminists goal...,1,0,sexist,not sexist
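
The mismatched rows fall into two kinds of errors: false positives (predicted sexist, actually not sexist) and false negatives (predicted not sexist, actually sexist). A minimal sketch of counting them is shown below; the label lists are hypothetical toy data, not the notebook's predictions.

```python
from collections import Counter

# Hypothetical toy labels (1 = sexist, 0 = not sexist), not the notebook's data.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [0, 1, 1, 0, 0, 0, 1, 1]

def error_type(y_true, y_pred):
    """Classify a single prediction as correct, false positive, or false negative."""
    if y_true == y_pred:
        return "correct"
    return "false positive" if y_pred == 1 else "false negative"

counts = Counter(error_type(t, p) for t, p in zip(actual, predicted))
print(counts)
```

On the notebook's DataFrame, `pd.crosstab(evaluation_data["actual_labels_text"], evaluation_data["predictions_text"])` should give the equivalent breakdown as a table.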


In [28]:
output_path_mis = "../results/mismatch_dl.xlsx"
columns = ["rewire_id", "text", "predictions_text", "actual_labels_text"]
mismatched_rows[columns].to_excel(output_path_mis, index=False)