
# Appeals classification task

The task is to train a model that finds, among all feedback from bank customers, the appeals that describe one specific type of fraud. We are interested in situations where a potential impostor calls a customer and introduces himself as a member of the bank's customer service. The impostor then tells the customer the actual balance of their card account to win the victim's trust. The texts are in Russian.

The difficulty of the task is that many messages mention problems with the card account balance or describe other types of fraud, so regular expressions and similar rule-based approaches cannot reliably pick out the cases we need.

We use the Transformers and PyTorch libraries to fine-tune a BERT model, since BERT performs well on a wide range of NLP tasks and is relatively easy to fine-tune.

In [None]:
import pandas as pd

In [None]:
import torch

In [None]:
train_data = pd.read_csv('train.csv', sep="\t")
valid_data = pd.read_csv('valid.csv', sep="\t")
test_data  = pd.read_csv('test.csv', sep="\t")

For this task we created **two helper classes**: one for the classifier and one for **data processing**.

The **CustomDataset** class contains methods that process the input texts and prepare them for PyTorch's DataLoader class. More specifically, it tokenizes the input texts with a previously defined tokenizer, applying padding, and converts the targets into tensors. It is written with the help of this tutorial.
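The `bert_dataset` module itself is not shown, so what follows is only a hypothetical, pure-Python sketch of what "tokenizing with padding" produces: token ids are truncated or right-padded to a fixed length and paired with an attention mask that marks real tokens. `pad_and_truncate`, `ToyTextDataset` and the whitespace `toy_encode` vocabulary are illustrative names, not the notebook's code. With a Hugging Face tokenizer the same effect is usually requested through `padding='max_length'`, `truncation=True` and `max_length`, and the real class would return `torch.tensor` objects rather than lists.

```python
from typing import Callable, Dict, List, Sequence

PAD_ID = 0  # hypothetical padding id; real BERT vocabularies define their own


def pad_and_truncate(ids: Sequence[int], max_len: int, pad_id: int = PAD_ID) -> Dict[str, List[int]]:
    """Cut ids to max_len, then right-pad with pad_id; the mask is 1 for real tokens, 0 for padding."""
    ids = list(ids)[:max_len]
    n_pad = max_len - len(ids)
    return {
        "input_ids": ids + [pad_id] * n_pad,
        "attention_mask": [1] * len(ids) + [0] * n_pad,
    }


class ToyTextDataset:
    """Map-style dataset: a DataLoader only needs __len__ and __getitem__."""

    def __init__(self, texts: List[str], targets: List[int],
                 encode: Callable[[str], List[int]], max_len: int = 8):
        self.texts = texts
        self.targets = targets
        self.encode = encode  # stands in for a real tokenizer
        self.max_len = max_len

    def __len__(self) -> int:
        return len(self.texts)

    def __getitem__(self, idx: int) -> Dict[str, object]:
        item: Dict[str, object] = dict(pad_and_truncate(self.encode(self.texts[idx]), self.max_len))
        item["target"] = self.targets[idx]  # the real class would wrap this in torch.tensor(...)
        return item


# Hypothetical whitespace "tokenizer": assigns ids 1, 2, 3, ... in order of first appearance
# (no [CLS]/[SEP] special tokens, unlike a real BERT tokenizer).
vocab: Dict[str, int] = {}


def toy_encode(text: str) -> List[int]:
    return [vocab.setdefault(w, len(vocab) + 1) for w in text.lower().split()]


ds = ToyTextDataset(["hello world", "one two three four five"], [1, 0], toy_encode, max_len=4)
print(ds[0])  # {'input_ids': [1, 2, 0, 0], 'attention_mask': [1, 1, 0, 0], 'target': 1}
```

Short texts are padded with zeros that the mask tells the model to ignore, while long texts are simply cut at `max_len`.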

**BertClassifier** is our main class; it trains and evaluates the model. It takes as input the path to the model, the path to the tokenizer, the number of classes to predict, the number of epochs, and the path where the best model is saved.

The **preparation** method initializes the dataloaders using our CustomDataset class, as well as the optimizer parameters and the loss function.

The **fit** method defines our training loop and performs the optimization steps.

The **eval** method evaluates the model. It returns the loss and accuracy on the validation dataset.

The **train** method runs the fit method as many times as needed and saves the best model.

The **predict** method takes a text and returns the prediction of the trained model saved by the *train* method.
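The interplay of fit, eval and train described above can be sketched as a loop that keeps only the best checkpoint. This is a hypothetical illustration, not the code in `bert_classifier`: the callables `fit_epoch`, `evaluate` and `save` stand in for the real methods, and `evaluate` is assumed to return `(loss, accuracy)` on the validation set, with higher accuracy meaning a better model.

```python
from typing import Callable, List, Tuple


def train_keep_best(
    fit_epoch: Callable[[], object],
    evaluate: Callable[[], Tuple[float, float]],
    save: Callable[[int], None],
    epochs: int,
) -> Tuple[float, List[float]]:
    """Call fit_epoch once per epoch and save whenever validation accuracy improves."""
    best_acc = float("-inf")
    history: List[float] = []
    for epoch in range(epochs):
        fit_epoch()                       # one optimization pass over the training data
        _val_loss, val_acc = evaluate()   # assumed to return (loss, accuracy) on validation data
        history.append(val_acc)
        if val_acc > best_acc:            # new best model: overwrite the saved checkpoint
            best_acc = val_acc
            save(epoch)                   # in real code, e.g. torch.save(model, model_save_path)
    return best_acc, history


# Usage with stand-in callables and made-up accuracies (not real results):
accs = iter([0.7, 0.9, 0.8])
saved_epochs: List[int] = []
best, history = train_keep_best(lambda: None, lambda: (0.0, next(accs)), saved_epochs.append, epochs=3)
print(best, saved_epochs)  # 0.9 [0, 1]
```

Selecting on validation accuracy is one possible choice; tracking the lowest validation loss works the same way with the comparison reversed.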

In [None]:
from bert_dataset import CustomDataset
from bert_classifier import BertClassifier

### Initialize BERT classifier

Here we initialize an instance of our BertClassifier class. The model is RuBERT, a popular BERT model for the Russian language, which is available on Hugging Face.

In [None]:
classifier = BertClassifier(
        model_path='rubert_cased_L-12_H-768_A-12_v1',
        tokenizer_path='rubert_cased_L-12_H-768_A-12_v1/vocab.txt',
        n_classes=2,
        epochs=2,
        model_save_path='bertmodel_.pt'
)

In [None]:
classifier.preparation(
        X_train=list(train_data['text']),
        y_train=list(train_data['value']),
        X_valid=list(valid_data['text']),
        y_valid=list(valid_data['value'])
    )

In [None]:
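# Select a device; note that this `device` is not passed to the classifier in this notebook
if torch.cuda.is_available():
    # first visible GPU; index 1 would select a second GPU, which is invalid on a single-GPU machine
    device = torch.device("cuda")
    print("GPU available")
else:
    device = torch.device("cpu")
    print("GPU unavailable")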

Train our model

In [None]:
classifier.train()

In [None]:
texts = list(test_data['text'])
labels = list(test_data['value'])

predictions = [classifier.predict(t) for t in texts]

In [None]:
from sklearn.metrics import precision_recall_fscore_support

precision, recall, f1score = precision_recall_fscore_support(labels, predictions,average='macro')[:3]

print(f'precision: {precision}, recall: {recall}, f1score: {f1score}')
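With `average='macro'`, scikit-learn computes precision, recall and F1 for each class separately and then takes their unweighted mean, so the (presumably less frequent) fraud class counts as much as the other class. The minimal pure-Python sketch below, `macro_prf` (a hypothetical helper, not part of the notebook), reproduces that computation and treats undefined ratios as 0, which is what scikit-learn's default `zero_division` behaviour yields (along with a warning).

```python
from typing import List, Sequence, Tuple


def macro_prf(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall and F1: per-class scores, then their unweighted mean."""
    classes = sorted(set(y_true) | set(y_pred))  # every label seen in either list
    pairs = list(zip(y_true, y_pred))
    precisions: List[float] = []
    recalls: List[float] = []
    f1s: List[float] = []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0   # undefined -> 0
        rec = tp / (tp + fn) if tp + fn else 0.0    # undefined -> 0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```

For `labels = [0, 1, 1, 0]` and `predictions = [0, 1, 0, 0]`, class 0 gets precision 2/3, recall 1 and F1 0.8, class 1 gets precision 1, recall 0.5 and F1 2/3, so `macro_prf` returns roughly `(0.833, 0.75, 0.733)`.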