# Electricity fraud detection based on hybrid attention model

The notebook is based on the original paper "Electricity Theft Detection with self-attention" by Paulo Finardi, Israel Campiotti, Gustavo Plensack, Rafael Derradi de Souza, Rodrigo Nogueira, Gustavo Pinheiro, Roberto Lotufo (https://arxiv.org/abs/2002.06219).
The source code is derived from their repository https://github.com/neuralmind-ai/electricity-theft-detection-with-self-attention.
The purpose of the notebook is to refactor the original code and to investigate the paper further.

The notebook is organized as follows. The first step is initialization, which loads the high-level modules that encapsulate the low-level code. The second step is data investigation, followed by model training. The final step is the calculation of performance metrics.

### 0. Initialization

The notebook consists of data investigation, basic training and performance calculation steps, all built on the high-level modules loaded in this section. In a Colab environment the source code has to be cloned first (uncomment the lines in the cell below if required).

In [5]:
#! git clone https://github.com/ant-nik/electricity-theft-detection-with-self-attention.git
#% cd electricity-theft-detection-with-self-attention

Below you can find a high-level description of the imported modules:
- FraudData - simplifies access to various views of the raw data (raw/normalized, thief/regular, autocorrelations, etc.);
- download_data - a routine that downloads the raw data (*nix/Colab environments only);
- HybridAttentionModel - defines the hybrid attention model based on the torch library;
- perform_kfold_cv - the k-fold cross-validation training routine;
- RAdam - an implementation of the RAdam (Rectified Adam) training optimizer.

In [1]:
from source.dataset import FraudData, download_data
from source.hybrid_attention import HybridAttentionModel
from source.train import perform_kfold_cv
from source.optimizer import RAdam
from torch import nn
from matplotlib import pyplot

The final initialization step is downloading the raw data. It can be done manually or with the download_data routine; the routine may fail on Windows if Unix utilities are not installed, so manual downloading is more reliable there. FraudData then loads the data from data.csv.

In [2]:
# There may be issues in Windows environments, so manual downloading may be required.
# download_data()  # *nix/Colab environments only
data = FraudData('data.csv')  # load the raw data from the working directory

### 1. Dataset investigation

The dataset has many missing values. The authors of the original paper suggested replacing them with zeros and adding an extra channel with a bitmask that signals to the classifier which values are missing. This is a good practice for training classifiers, but for time-series analysis such a transformation may distort the results, because the inserted zeros are indistinguishable from real readings when statistics such as autocorrelation are computed.
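A minimal sketch of the zero-fill-plus-mask preprocessing described above, assuming each consumer's readings are a 1-D NumPy array with NaN marking missing values. The helper name zero_fill_with_mask and the convention "1 = missing" are illustrative assumptions, not the original repository's code.

```python
import numpy as np


def zero_fill_with_mask(series):
    """Return a 2-channel array: zero-filled values and a missing-value mask.

    Hypothetical helper illustrating the preprocessing idea; it is not the
    implementation used in the original repository.
    """
    series = np.asarray(series, dtype=float)
    missing = np.isnan(series)
    # Channel 0: the readings, with every NaN replaced by zero.
    values = np.where(missing, 0.0, series)
    # Channel 1: bitmask, 1.0 where the reading was missing, 0.0 otherwise (assumed convention).
    mask = missing.astype(float)
    return np.stack([values, mask])


consumption = np.array([1.2, np.nan, 0.8, np.nan, 1.5])
encoded = zero_fill_with_mask(consumption)
print(encoded.shape)
```

The mask channel is what lets a classifier tell a real zero reading from a filled gap; a plain autocorrelation estimate computed on channel 0 alone has no such information.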

Below, the raw data is processed differently in order to calculate its autocorrelation. The data is split into chunks without missing values, and only chunks that are long enough (roughly two times longer than the maximum autocorrelation lag) are used to calculate the autocorrelation. Autocorrelation results are obtained for four groups of data: raw data of thieves and of regular clients, and normalized data of thieves and of regular clients. Normalization is performed to stabilize the parameters of the raw data. The plots below show the mean, maximum and minimum autocorrelation for every lag.
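The chunking procedure can be sketched as follows, assuming a 1-D NumPy series with NaN for missing readings. The helper names (gap_free_chunks, autocorrelation, chunked_autocorrelation) and the use of the standard biased sample autocorrelation estimator are assumptions; FraudData may compute it differently.

```python
import numpy as np


def gap_free_chunks(series):
    """Split a 1-D series into maximal runs that contain no NaN values."""
    series = np.asarray(series, dtype=float)
    valid = ~np.isnan(series)
    chunks, start = [], None
    for i, ok in enumerate(valid):
        if ok and start is None:
            start = i  # a new gap-free run begins
        elif not ok and start is not None:
            chunks.append(series[start:i])  # the run ends at a missing value
            start = None
    if start is not None:
        chunks.append(series[start:])  # the series ends inside a run
    return chunks


def autocorrelation(x, max_lag):
    """Biased sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    if denom == 0:
        return np.full(max_lag + 1, np.nan)  # constant chunk: undefined
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])


def chunked_autocorrelation(series, max_lag, min_factor=2):
    """Autocorrelation of every gap-free chunk at least min_factor * max_lag long."""
    long_chunks = [c for c in gap_free_chunks(series)
                   if len(c) >= min_factor * max_lag]
    return [autocorrelation(c, max_lag) for c in long_chunks]


series = np.array([1, 2, 3, 4, 5, 6, np.nan, 1, 2, np.nan,
                   3, 1, 4, 1, 5, 9, 2, 6], dtype=float)
acs = chunked_autocorrelation(series, max_lag=3)
mean_ac = np.mean(acs, axis=0)  # per-lag mean across the retained chunks
print(len(acs), mean_ac.shape)
```

Here the two-reading chunk between the gaps is shorter than 2 * max_lag and is discarded, so only the six- and eight-reading chunks contribute.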

In [None]:
fig, ((ax1, ax2), (ax3, ax4)) = pyplot.subplots(2, 2, figsize=(12, 6))
vis = [
    {'title': "Thieves' autocorrelation",
     'ax': ax1, 'data': data.thief_ac},
    {'title': "Regular clients' autocorrelation",
     'ax': ax2, 'data': data.regular_ac},
    {'title': "Thieves' autocorrelation (normalized)",
     'ax': ax3, 'data': data.norm_thief_ac},
    {'title': "Regular clients' autocorrelation (normalized)",
     'ax': ax4, 'data': data.norm_regular_ac}
]
for view in vis:
    ax = view['ax']  # draw each view on its own subplot
    ax.set_title(view['title'])
    # Aggregate across columns for every lag: mean, maximum and minimum
    view['data'].mean(axis=1).plot(ax=ax, style='b-', label='mean')
    view['data'].max(axis=1).plot(ax=ax, style='b--', label='max')
    view['data'].min(axis=1).plot(ax=ax, style='b:', label='min')
    ax.legend()
    ax.set_xlabel('lag')
    ax.set_ylabel('autocorrelation')
fig.tight_layout()

### 2. Attention model learning

The authors of the original paper proposed a hybrid classifier: a neural network whose basic blocks combine convolutional and attention layers. Training is performed with k-fold cross-validation using the RAdam optimizer (a rectified variant of Adam) and the cross-entropy loss function.
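The k-fold split can be sketched as below. This is a hypothetical stratified variant that keeps the thief/regular ratio roughly equal across folds; whether perform_kfold_cv stratifies, and how it shuffles, is not shown in this notebook. The toy labels (1 = thief, 0 = regular client) are made up for illustration.

```python
import numpy as np


def stratified_kfold_indices(labels, k, seed=0):
    """Yield (train_idx, val_idx) pairs with roughly equal class ratios per fold.

    Hypothetical helper; perform_kfold_cv may split the data differently.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)  # shuffle samples of this class in place
        # Deal the shuffled samples of this class round-robin over the folds.
        fold_of[idx] = np.arange(len(idx)) % k
    for fold in range(k):
        val_idx = np.flatnonzero(fold_of == fold)
        train_idx = np.flatnonzero(fold_of != fold)
        yield train_idx, val_idx


labels = np.array([0] * 16 + [1] * 4)  # toy data: 1 = thief, 0 = regular client
folds = list(stratified_kfold_indices(labels, k=2))
for fold, (train_idx, val_idx) in enumerate(folds, start=1):
    print(f"fold {fold}: {len(train_idx)} train, {len(val_idx)} val, "
          f"{labels[val_idx].sum()} thieves in val")
```

With these toy labels each of the two folds receives 10 validation samples, 2 of which are thieves. Dealing each class round-robin over the folds keeps class proportions balanced without external libraries; scikit-learn's StratifiedKFold implements the same idea as a library routine.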

In [4]:
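k_folds = 5       # number of cross-validation folds
lr = 0.001        # RAdam learning rate
device = 'cpu'    # set to 'cuda' to train on a GPU
# One independent model/optimizer pair per fold
models = [HybridAttentionModel().to(device) for _ in range(k_folds)]
optims = [RAdam(model.parameters(), lr) for model in models]
criterion = nn.CrossEntropyLoss()
# Returns one tuple per fold: (best F1, epoch of best F1, ...)
# Note: the log below was recorded with k_folds=2 and n_epochs=1.
f1_per_fold = perform_kfold_cv(data.normalized, models, optims, criterion,
                               k_folds, device=device, n_epochs=10)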

--- K Fold [1/2] ---
ep: [1/1] -- T: 0.347 -- V: 0.301 -- F1: 0.0228
AUC: 0.721 -- MAP@100: 0.571 -- MAP@200: 0.526
Fold 1 got F1 = 0.0228 at epoch 1
Printing report at best checkpoint for F1
AUC: 0.721 -- MAP@100: 0.571 -- MAP@200: 0.526
--- K Fold [2/2] ---
ep: [1/1] -- T: 0.373 -- V: 0.35 -- F1: 0.0175
AUC: 0.735 -- MAP@100: 0.613 -- MAP@200: 0.576
Fold 2 got F1 = 0.0175 at epoch 1
Printing report at best checkpoint for F1
AUC: 0.735 -- MAP@100: 0.613 -- MAP@200: 0.576


### 3. Attention model performance

The results of training are a set of saved model files that can be loaded to classify new inputs. The training routine also returns per-fold metric values, which are used below to find the fold with the best F1 score.

In [7]:
# f1_per_fold holds one tuple per fold: (best F1, epoch of best F1, ...)
best_idx = max(range(len(f1_per_fold)), key=lambda i: f1_per_fold[i][0])
best_fold = best_idx + 1  # folds are numbered from 1 in the training log
best_f1, best_epoch, _, _ = f1_per_fold[best_idx]
print(f'The best fold was {best_fold} with F1 of {best_f1} at epoch {best_epoch}')

The best fold was 1 with F1 of 0.022838499184339316 at epoch 1
