# ABSA Model Training

This section describes how to train an additional layer on top of a pre-trained, BERT-based Aspect-Based Sentiment Analysis (ABSA) model.

## Training the ABSA Model

1. **Loading the Data:**
   - Iterate over the datasets containing normalized poem data.
   - Convert each dataset into the format expected by the trainer (`ABSADataset`).

2. **Model Initialization:**
   - Initialize the ABSAModel instance with a pre-trained BERT tokenizer.
   - Load the pre-trained BERT-based ABSA model.

3. **Training the Additional Layer:**
   - Train only the additional layer of the ABSA model on each dataset in turn.
   - From the second dataset onward, reload the previously saved model before training continues.

4. **Save the Trained Model:**
   - Save the trained model after each iteration under the same file name, so that the next iteration can reload it.
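
The idea of training only an additional layer can be illustrated with a minimal PyTorch sketch. This is not the implementation of `train_new_layer` from the `absa` module; `TinyClassifier`, `freeze_all_but`, the layer sizes, and the three output classes are hypothetical stand-ins chosen for illustration. The sketch freezes every parameter except those of the new layer and passes only the trainable parameters to the optimizer.

```python
import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in: a 'pre-trained' encoder plus a new classification layer."""

    def __init__(self, hidden=8, num_labels=3):
        super().__init__()
        self.encoder = nn.Linear(4, hidden)              # stands in for the BERT encoder
        self.new_layer = nn.Linear(hidden, num_labels)   # the additional layer to train

    def forward(self, x):
        return self.new_layer(torch.tanh(self.encoder(x)))


def freeze_all_but(model, trainable_prefix):
    """Disable gradients for every parameter whose name does not start with the prefix."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(trainable_prefix)
    return [p for p in model.parameters() if p.requires_grad]


torch.manual_seed(0)
model = TinyClassifier()
trainable = freeze_all_but(model, "new_layer")
optimizer = torch.optim.Adam(trainable, lr=1e-4)

# Keep copies of the weights to compare after one optimization step
encoder_before = model.encoder.weight.detach().clone()
head_before = model.new_layer.weight.detach().clone()

# One training step on random data (assumed shapes, for illustration only)
x = torch.randn(16, 4)
y = torch.randint(0, 3, (16,))
loss = nn.functional.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

After the step, the encoder weights are unchanged while the new layer's weights have moved, which is the behavior expected when only the added layer is trained.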

In [None]:
import os, sys
sys.path.insert(1, '../dataset')
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings("ignore")
# import matplotlib.pyplot as plt
# import seaborn as sns
import torch
from transformers import BertTokenizer
from collections import Counter
sys.path.append(r'C:\SVSHARE\final\BERT-Aspect-Based-Sentiment-Analysis\src')
from absa import *
# from tabulate import tabulate
import csv


### Load the pre-trained tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
### Initialize your ABSAModel instance
absa_model = ABSAModel(tokenizer)
### To train additional layer:
## please specify the name of the model and the path to the dataset.
model_name = 'shakespeare_absa_model_test_first.pth'
path_to_training_files = 'C:\\SVSHARE\\final\\BERT-Aspect-Based-Sentiment-Analysis\\dataset\\normalized\\first_25\\'
files = os.listdir(path_to_training_files)
print(files)
## Iterate over the datasets, training the additional layer on each one in turn
for index, file in enumerate(files):
    if index != 0:
        ## Reload the model saved at the end of the previous iteration
        absa_model.load_model(model_name)
    input_file = os.path.join(path_to_training_files, file)
    print(input_file)
    shakespeare_df = pd.read_csv(input_file)
    ## Convert the dataset to a suitable format for the trainer
    shakespeare_dataset = ABSADataset(shakespeare_df, tokenizer)
    ## Train only the new layer
    absa_model.train_new_layer(shakespeare_dataset, epochs=5, device='cpu', lr=1e-4)
    ## Save the trained model under the same name that is loaded at the beginning of the next iteration
    absa_model.save_model(model_name)

## Validating the Trained Model

1. **Load the Trained Model:**
   - Load the trained ABSA model for validation.

2. **Prepare Test Data:**
   - Load the test dataset containing poems to be evaluated.

3. **Test the Model:**
   - Test the trained ABSA model on the test dataset.
   - Evaluate the model's performance using metrics such as accuracy and loss.
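
As a rough illustration of the two metrics mentioned above, the following sketch computes accuracy and mean cross-entropy loss from per-example class probabilities. It is not the code behind `test_model`; the `evaluate` helper and the sample probabilities are made up for the example.

```python
import math


def evaluate(probabilities, labels):
    """Return (accuracy, mean cross-entropy loss) for predicted class probabilities."""
    correct = 0
    total_loss = 0.0
    for probs, label in zip(probabilities, labels):
        # The predicted class is the index with the highest probability
        predicted = max(range(len(probs)), key=probs.__getitem__)
        correct += int(predicted == label)
        # Cross-entropy for one example is the negative log-probability of the true class
        total_loss += -math.log(probs[label])
    n = len(labels)
    return correct / n, total_loss / n


# Hypothetical model outputs for three examples and their true classes
probs = [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
labels = [2, 0, 2]
accuracy, loss = evaluate(probs, labels)
print(f"accuracy={accuracy:.3f}, loss={loss:.3f}")
```

Accuracy only counts whether the top class is right, while the loss also penalizes low confidence in the correct class, so the two can move in different directions.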

In [None]:
import os, sys
sys.path.insert(1, '../dataset')
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings("ignore")
# import matplotlib.pyplot as plt
# import seaborn as sns
import torch
from transformers import BertTokenizer
from collections import Counter
sys.path.append(r'C:\SVSHARE\final\BERT-Aspect-Based-Sentiment-Analysis\src')
from absa import *
# from tabulate import tabulate
import csv

### Load the pre-trained tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
### Initialize your ABSAModel instance
absa_model = ABSAModel(tokenizer)
## to run a test on the trained model
# specify the name of the model you would like to test
absa_model.load_model('shakespeare_absa_model_first_25.pth')
## give the path to the test file
test_file = 'C:\\SVSHARE\\final\\BERT-Aspect-Based-Sentiment-Analysis\\dataset\\normalized\\combined_to_test.csv'
test_df = pd.read_csv(test_file)
## convert your dataset to a suitable format for the test
test_dataset = ABSADataset(test_df, tokenizer)
## Test the trained model
absa_model.test_model(test_dataset, 'cpu')

## Making Predictions

1. **Load the Trained Model:**
   - Load the trained ABSA model for making predictions.

2. **Prepare Prediction Data:**
   - Iterate over the dataset of poems to be predicted.

3. **Execute Prediction Function:**
   - Execute the prediction function on each poem dataset.
   - Determine if each poem is written by Shakespeare or not based on the model's predictions.

4. **Save Predictions:**
   - Write the predictions to a CSV file containing filenames and corresponding predictions.
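
The decision rule and the CSV output can be sketched independently of the model. The example below assumes, as the notebook's `pred == 2` check does, that class 2 stands for Shakespeare; `majority_author`, `write_predictions`, and the file names are hypothetical, and the output is written to an in-memory buffer rather than to disk.

```python
import csv
import io

SHAKESPEARE_CLASS = 2  # assumption taken from the notebook's `pred == 2` check


def majority_author(predicted_classes):
    """Label a poem by majority vote over its per-row predictions; a tie counts as Non-Shakespeare."""
    shakespeare = sum(1 for p in predicted_classes if p == SHAKESPEARE_CLASS)
    others = len(predicted_classes) - shakespeare
    return "Shakespeare" if shakespeare > others else "Non-Shakespeare"


def write_predictions(results, stream):
    """Write a {filename: label} mapping as CSV with a header row."""
    writer = csv.DictWriter(stream, fieldnames=["Filename", "Prediction"])
    writer.writeheader()
    for filename, label in results.items():
        writer.writerow({"Filename": filename, "Prediction": label})


# Hypothetical per-row class predictions for two poem files
results = {
    "poem_a.csv": majority_author([2, 2, 0, 2]),
    "poem_b.csv": majority_author([0, 1, 2, 2]),
}
buffer = io.StringIO()
write_predictions(results, buffer)
print(buffer.getvalue())
```

Because the comparison is strict, a poem with an equal number of Shakespeare and non-Shakespeare rows is labelled Non-Shakespeare.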

In [None]:
import os, sys
sys.path.insert(1, '../dataset')
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings("ignore")
# import matplotlib.pyplot as plt
# import seaborn as sns
import torch
from transformers import BertTokenizer
from collections import Counter
sys.path.append(r'C:\SVSHARE\final\BERT-Aspect-Based-Sentiment-Analysis\src')
from absa import *
# from tabulate import tabulate
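import csv
## Imported explicitly for predict_text_writer instead of relying on `from absa import *`
from torch.utils.data import DataLoader
from tqdm import tqdm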

### Load the pre-trained tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
### Initialize your ABSAModel instance
absa_model = ABSAModel(tokenizer)
## Function to make predictions
def predict_text_writer(dataset):
    loader = DataLoader(dataset, batch_size=32, shuffle=False, collate_fn=absa_model.padding)
    absa_model.model.eval()
    predictions = []
    with torch.no_grad():
        for batch in tqdm(loader):
            ids_tensors, segments_tensors, masks_tensors, _ = batch
            ids_tensors = ids_tensors.to('cpu')
            segments_tensors = segments_tensors.to('cpu')
            masks_tensors = masks_tensors.to('cpu')

            outputs = absa_model.model(ids_tensors=ids_tensors, segments_tensors=segments_tensors, masks_tensors=masks_tensors)
            _, predicted = torch.max(outputs, 1)
            predictions.extend(predicted.tolist())
    # Convert predicted class indices to labels (class 2 is treated as Shakespeare)
    labels = ['Shakespeare' if pred == 2 else 'Non-Shakespeare' for pred in predictions]
    # Count occurrences of Shakespeare and Non-Shakespeare predictions
    shakespeare_count = labels.count('Shakespeare')
    non_shakespeare_count = labels.count('Non-Shakespeare')
    print(f"Number of Shakespeare texts: {shakespeare_count}")
    print(f"Number of non-Shakespeare texts: {non_shakespeare_count}")
    # Return the majority prediction; a tie counts as Non-Shakespeare
    if shakespeare_count > non_shakespeare_count:
        return 'Shakespeare'
    return 'Non-Shakespeare'

### Make predictions with the trained model
## Specify the name of the model to use for prediction
absa_model.load_model('shakespeare_absa_model_first_25.pth')
## Give the path to the poems dataset and to the output CSV file
folder_path = 'C:\\SVSHARE\\final\\Shakespeare_pred'
output_location = 'C:\\SVSHARE\\final\\Shakespeare_prediction.csv'
files = os.listdir(folder_path)
print(files)
to_table = {}
for file in files:
    csv_file_path = os.path.join(folder_path, file)
    print(csv_file_path)
    df = pd.read_csv(csv_file_path)
    ## Convert the dataset to a suitable format for the prediction
    shakespeare_dataset = ABSADataset(df, tokenizer)
    ## Execute the prediction function on the converted dataset
    shakespeare_predictions = predict_text_writer(shakespeare_dataset)
    print(shakespeare_predictions)
    ## Record the verdict for this file here, rather than inside the function via global variables
    if shakespeare_predictions == 'Shakespeare':
        to_table[file] = "Poem is written by Shakespeare"
    else:
        to_table[file] = "Poem is not written by Shakespeare"
# Create a list of dictionaries from the to_table dictionary
rows = [{'Filename': filename, 'Prediction': prediction} for filename, prediction in to_table.items()]

# Write the data to the CSV file
with open(output_location, 'w', newline='') as csvfile:
    fieldnames = ['Filename', 'Prediction']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)