# Use case Algoan - QA Data Science

We want to verify the quality of an algorithm built for detecting the regularity of a transaction.

## Definitions

### Transaction
A transaction is an incoming or outgoing flow of money, defined by:
- its amount
- its date (date)
- its description (description)
- its nature (type)
- its reason (category)
- its group of similar transactions (group_id)

### Regularity from an expert point of view

A transaction is considered regular if it repeats at regular time intervals; the amount is not taken into account by the regularity algorithm. The objective is thus to identify weekly transactions (e.g. tobacco bought every Saturday), fortnightly transactions (e.g. a press subscription paid twice a month), monthly transactions (e.g. salary received at the beginning of the month), etc.

The time interval detected is referred to as the variable _frequency_. The frequency can take different values based on the mean interval:
"WEEK", "W-F", "FORTNIGHT", "F-M", "MONTH", "M-2M", "2MONTHS"

Some flexibility in the interval between two transactions must of course be taken into account when estimating regularity. For example, if a salary falls on the first day of one month, then on the first day of the following month, and on the second day of the third month, the regularity is still considered monthly. The tolerated deviation depends on the frequency: the higher the frequency (e.g. a weekly transaction), the less deviation is tolerated. Thus, a variation of 2 days is not significant for a monthly transaction, whereas it is for a transaction that appears to be weekly.
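To make the tolerance idea concrete, here is a minimal sketch that treats a series of dates as stable when every gap stays within a fraction of the mean gap. The function names and the 15% `relative_tolerance` are assumptions chosen for illustration; they are not taken from the algorithm under test.

```python
from datetime import date
from statistics import mean


def interval_days(dates):
    """Gaps in days between consecutive dates (dates are sorted first)."""
    ordered = sorted(dates)
    return [(b - a).days for a, b in zip(ordered, ordered[1:])]


def is_stable(dates, relative_tolerance=0.15):
    """Hypothetical rule: every gap lies within relative_tolerance of the mean gap."""
    gaps = interval_days(dates)
    if len(gaps) < 2:
        # Not enough occurrences to judge stability.
        return False
    avg = mean(gaps)
    return all(abs(gap - avg) <= relative_tolerance * avg for gap in gaps)


# Monthly salary whose third occurrence falls one day late.
salary = [date(2021, 1, 1), date(2021, 2, 1), date(2021, 3, 2), date(2021, 4, 1)]
# Roughly weekly purchase whose third occurrence is 2 days late.
tobacco = [date(2021, 1, 2), date(2021, 1, 9), date(2021, 1, 18), date(2021, 1, 23)]

print(is_stable(salary))   # gaps of 31, 29 and 30 days
print(is_stable(tobacco))  # gaps of 7, 9 and 5 days
```

With a relative tolerance, a 2-day shift is absorbed by a 30-day mean interval but not by a 7-day one, which matches the expert rule described above.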


## Algorithm
To analyse the regularity of transactions, the transactions are first grouped into clusters with similar descriptions and amounts. The group each transaction belongs to is indicated by the variable _group_id_.
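As an illustration of how a mean interval per group could be derived once _group_id_ is known, the sketch below uses a pandas `groupby`. It is not the implementation behind `detect_regular_transactions`. The toy data and the helper name `mean_interval_per_group` are hypothetical, and the sketch keeps dates as pandas datetimes rather than `datetime.date` objects.

```python
import pandas as pd

# Hypothetical toy data reusing the column names from the Definitions section.
toy = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-01-01", "2021-02-01", "2021-03-02",  # group 1: roughly monthly
        "2021-01-02", "2021-01-09", "2021-01-16",  # group 2: weekly
    ]),
    "group_id": [1, 1, 1, 2, 2, 2],
})


def mean_interval_per_group(df):
    """Mean gap in days between consecutive transactions of each group."""
    ordered = df.sort_values(["group_id", "date"])
    # diff() within each group; the first row of every group yields NaN.
    gaps = ordered.groupby("group_id")["date"].diff().dt.days
    # mean() skips the NaN values.
    return gaps.groupby(ordered["group_id"]).mean()


result = mean_interval_per_group(toy)
print(result)
```

A mean interval like this could then be compared with the _frequency_ value returned by the algorithm for the same group.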


## Problem
Follow the steps below to apply the algorithm to the attached data (transactions.csv). Then, for the alpha version of the regularity algorithm, identify at least:
- a critical bug,
- a false negative (a regular transaction that was not identified as such), together with its possible causes,
- a false positive (a non-regular transaction labelled as regular), together with its possible causes.

You can play around with the data by deleting some rows, changing some variables, etc. in order to investigate the limits of the algorithm.
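One way to run such experiments is to build perturbed copies of the data and compare the algorithm's output before and after. The helpers below are a sketch under assumptions: their names are hypothetical, and they rely only on the `date` and `group_id` columns listed in the Definitions section.

```python
from datetime import date, timedelta

import pandas as pd


def drop_occurrence(df, group_id, position):
    """Copy of df without the position-th (0-based, by date) transaction of a group."""
    in_group = df[df["group_id"] == group_id].sort_values("date")
    return df.drop(index=in_group.index[position])


def shift_occurrence(df, group_id, position, days):
    """Copy of df where one transaction of a group is moved by `days` days."""
    in_group = df[df["group_id"] == group_id].sort_values("date")
    target = in_group.index[position]
    shifted = df.copy()
    shifted.at[target, "date"] = shifted.at[target, "date"] + timedelta(days=days)
    return shifted


# Hypothetical weekly group used to try the helpers.
toy = pd.DataFrame({
    "date": [date(2021, 1, 2), date(2021, 1, 9), date(2021, 1, 16), date(2021, 1, 23)],
    "group_id": [7, 7, 7, 7],
})

without_second = drop_occurrence(toy, group_id=7, position=1)
moved = shift_occurrence(toy, group_id=7, position=2, days=3)
print(without_second)
print(moved)
```

Each perturbed copy can be passed to `detect_regular_transactions` in place of `transactions` to see when the detected frequency changes or disappears.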

In [None]:
# load libraries and functions
import pandas as pd
from utils import detect_regular_transactions

In [None]:
# load csv
transactions = pd.read_csv("transactions.csv")
# convert dates from str to datetime.date
transactions["date"] = pd.to_datetime(transactions["date"]).dt.date

In [None]:
# display sample of data
transactions.head()

In [None]:
?detect_regular_transactions

In [None]:
# compute regularity
output = detect_regular_transactions(transactions)
# display output
output