GitHub - E-Kovtun/fraud_detection: Code for the "Sequence embeddings help to identify fraudulent cases in healthcare insurance" paper

You will need TensorFlow 1.12.0 to run the experiments.

Data preparation

Put arzta_daten_anonym1.csv, arzta_daten_anonym2.csv, arzta_daten_anonym3.csv and arzta_daten_anonym4.csv in the data_path folder.
Run python data_prep.py --data_dir data_path
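Before running the preparation script, it can help to confirm that all four input files are in place. The following is a minimal sketch, assuming the files sit directly inside data_path; the helper name missing_inputs is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names taken from the data preparation step above.
EXPECTED_FILES = [f"arzta_daten_anonym{i}.csv" for i in range(1, 5)]


def missing_inputs(data_dir):
    """Return the expected CSV names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs("data_path")
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All input files found; run data_prep.py next.")
```

An empty list means the folder is ready for data_prep.py.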

Pre-train your embeddings

Be sure to run the data preparation script first.
Run python pretrain_embeddings.py --data_path data_path/full.csv (full.csv is the file created by the data_prep.py script)

Training

Choose a model: swem_aver, swem_max, swem_max_features, gru or gru_feats (see model/model_fn.py for detailed information).
Create an experiment folder exp_path.
Copy experiments/config.yaml to exp_path and specify the model parameters inside the YAML file.
Run python train.py --model_dir exp_path --data_dir data_path --architecture swem_max --use_pretrained

This command initializes the embeddings from word2vec_filename (specified in exp_path/config.yaml) and trains the model (swem_max). After training, exp_path/config.yaml is updated with ROC AUC and other metrics.
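The swem_aver and swem_max names suggest the pooling used in simple word-embedding models (SWEM): a sequence of embedding vectors is collapsed into one fixed-size vector by averaging or by taking the element-wise maximum. The sketch below illustrates that idea in plain Python. It is an assumption about what these architectures do, not the code in model/model_fn.py, and the function names are hypothetical.

```python
def swem_average(embeddings):
    """Element-wise mean over a sequence of equal-length vectors."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def swem_max(embeddings):
    """Element-wise maximum over a sequence of equal-length vectors."""
    return [max(column) for column in zip(*embeddings)]


# A toy sequence of three 2-dimensional embeddings.
sequence = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]
print(swem_average(sequence))  # [2.0, 2.0]
print(swem_max(sequence))      # [3.0, 4.0]
```

Both functions return a vector whose length equals the embedding dimension, regardless of how long the input sequence is.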

Other scripts

xgb.py trains an XGBClassifier
search_hyperparams.py iterates over hyperparameters (hard-coded inside the script) and runs train.py multiple times
calculate_metrics.py calculates metrics
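The idea behind search_hyperparams.py, running train.py once per hyperparameter combination, can be sketched as follows. This is an illustration only: the parameter names (learning_rate, embedding_dim), the per-run folder naming, and the helper build_commands are assumptions, and the real script may pass parameters differently. Only train.py flags that appear in the Training section are used.

```python
import itertools

# Hypothetical grid; the real values are hard-coded in search_hyperparams.py.
GRID = {
    "learning_rate": [1e-3, 1e-4],
    "embedding_dim": [64, 128],
}


def build_commands(grid, data_dir="data_path", architecture="swem_max"):
    """Return (params, command) pairs, one per combination in the grid."""
    names = sorted(grid)
    runs = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        # One experiment folder per combination (naming is an assumption).
        model_dir = "exp_" + "_".join(f"{k}_{v}" for k, v in params.items())
        command = [
            "python", "train.py",
            "--model_dir", model_dir,
            "--data_dir", data_dir,
            "--architecture", architecture,
        ]
        runs.append((params, command))
    return runs


for params, command in build_commands(GRID):
    # Each folder would need its own config.yaml holding these params
    # before the command is launched, e.g. with subprocess.run(command).
    print(params, " ".join(command))
```

Building the commands as lists first keeps the grid logic separate from process launching, so the combinations can be inspected before any training starts.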

