# Roles Classifier Alternative: Transformers Classifier (XLNet)

Hugging Face lists several pretrained transformer models, including XLNet variants, at the following link:

[https://huggingface.co/models](https://huggingface.co/models)

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from simpletransformers.model import TransformerModel
from sklearn.model_selection import train_test_split


## Testing Several Files

In [None]:
file_size = [150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 645]
epochs = {150: 10, 200: 7, 250: 6, 300: 5, 350: 4, 400: 3, 450: 3, 500: 3, 550: 2, 600: 2, 645: 2}
accuracy = []

for i in file_size:
    file_name = f'output/balanced_{i}.csv'
    roles = pd.read_csv(f'../{file_name}')
    mapping = {'Student': 0, 'Co-Facilitator': 1, 'Facilitator': 2}
    roles['Role'] = roles['Role'].apply(lambda x: mapping[x])

    X = roles['Text']
    y = roles['Role']
    X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2)

    train_df = pd.DataFrame({'text': X_train, 'label': y_train})
    valid_df = pd.DataFrame({'text': X_valid, 'label': y_valid})
    model = TransformerModel('xlnet',
                             'xlnet-base-cased',
                             num_labels=3,
                             args={
                                 'learning_rate': 1e-5,
                                 'num_train_epochs': epochs[i],  # epochs chosen per file size
                                 'reprocess_input_data': True,
                                 'overwrite_output_dir': True},
                             use_cuda=False)

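    # Train the model, then evaluate it on the held-out validation split
    model.train_model(train_df)
    result, model_outputs, wrong_predictions = model.eval_model(valid_df)
    # Accuracy is measured over the validation rows, not over the full file
    n_valid = len(valid_df)
    accuracy_partial = (n_valid - len(wrong_predictions)) / n_valid
    print(f'Accuracy for file_size {i}: {accuracy_partial:.3f}')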
    accuracy.append(accuracy_partial)
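
As a cross-check on the accuracy computed in the loop, the score can also be derived directly from `model_outputs`, assuming it holds one row of per-class scores for each validation example. The sketch below is illustrative only: the helper name `accuracy_from_outputs` and the sample scores are hypothetical and do not come from the experiment.

```python
import numpy as np


def accuracy_from_outputs(model_outputs, true_labels):
    """Fraction of rows whose highest-scoring class matches the true label."""
    outputs = np.asarray(model_outputs)
    labels = np.asarray(true_labels)
    predicted = outputs.argmax(axis=1)  # index of the largest score per row
    return float((predicted == labels).mean())


# Hypothetical scores for four validation rows and three classes
# (0 = Student, 1 = Co-Facilitator, 2 = Facilitator).
example_outputs = [
    [2.1, 0.3, -1.0],
    [0.2, 1.5, 0.1],
    [-0.5, 0.4, 1.9],
    [1.2, 0.9, 0.8],
]
example_labels = [0, 1, 2, 1]

print(f'Accuracy: {accuracy_from_outputs(example_outputs, example_labels):.3f}')
```

Inside the loop, the equivalent call would presumably be `accuracy_from_outputs(model_outputs, valid_df['label'])`.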

## Graphical Performance Analysis

The following plot shows how the model behaves when it is trained with different amounts of data.

In [None]:
%matplotlib inline

plt.plot(file_size, accuracy)
plt.title('# of Rows vs. Accuracy')
plt.suptitle('Transformers Classifier (XLNet)')
plt.xlabel('# of Rows')
plt.ylabel('Accuracy')
plt.show()

In [None]:
print(f'Mean Accuracy: {np.mean(accuracy)}')

## Conclusions

- The model performs well overall. Accuracy appears to improve with larger datasets, although it is good in
general across the file sizes tested.
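
One way to put a number on the apparent improvement with dataset size is to fit a straight line to accuracy against row count and inspect the slope. This is a minimal sketch, not part of the original analysis: the helper `accuracy_trend` is hypothetical, and the values below are placeholders rather than results from this notebook.

```python
import numpy as np


def accuracy_trend(sizes, accuracies):
    """Return the least-squares slope: accuracy change per additional row."""
    slope, _intercept = np.polyfit(sizes, accuracies, deg=1)
    return float(slope)


# Placeholder values for illustration only; not results from this notebook.
sizes = [150, 300, 450, 600]
accuracies = [0.70, 0.75, 0.80, 0.85]

slope = accuracy_trend(sizes, accuracies)
print(f'Accuracy change per 100 rows: {slope * 100:.4f}')
```

In the notebook, the call would be `accuracy_trend(file_size, accuracy)`. A positive slope would support the trend, though with a single random split per file size the estimate is likely to be noisy.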

