# **Elliptic++ Transactions Dataset**


---
---


Released by: Youssef Elmougy, Ling Liu



School of Computer Science, Georgia Institute of Technology

Contact: yelmougy3@gatech.edu


---

GitHub Repository: [https://www.github.com/git-disl/EllipticPlusPlus](https://www.github.com/git-disl/EllipticPlusPlus)


If you use our dataset in your work, please cite our paper:





>> Youssef Elmougy and Ling Liu. 2023. Demystifying Fraudulent Transactions and Illicit Nodes in the Bitcoin Network for Financial Forensics.

---



## [SETUP] Import libraries and CSV files

Download the dataset from: [https://www.github.com/git-disl/EllipticPlusPlus](https://www.github.com/git-disl/EllipticPlusPlus)

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import networkx as nx
import plotly.graph_objs as go 
import plotly.offline as py 
import math

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

In [2]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import f1_score, accuracy_score, confusion_matrix
from sklearn.cluster import KMeans
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import VotingClassifier
from sklearn.base import clone

import xgboost as xgb


In [5]:
import gdown
!gdown --fuzzy "https://drive.google.com/file/d/19q09IFhfkOOBOXvn_dKhWjILJtjCcsjc/view?usp=drive_link"
!gdown --fuzzy "https://drive.google.com/file/d/1DiBxn8TXdbJqoSw58pYUeaqO3oOKhuQO/view?usp=drive_link"
!gdown --fuzzy "https://drive.google.com/file/d/1Q2yG_CIDvfdGP-fKVPSw979EYgQukjz5/view?usp=drive_link"

Downloading...
From (original): https://drive.google.com/uc?id=19q09IFhfkOOBOXvn_dKhWjILJtjCcsjc
From (redirected): https://drive.google.com/uc?id=19q09IFhfkOOBOXvn_dKhWjILJtjCcsjc&confirm=t&uuid=ee0be629-84f1-45a1-9ef3-f6c99c939852
To: /home/ck_bonbon/fintech/final_project/EllipticPlusPlus/Transactions Dataset/txs_features.csv
100%|████████████████████████████████████████| 695M/695M [00:18<00:00, 37.0MB/s]
Downloading...
From: https://drive.google.com/uc?id=1DiBxn8TXdbJqoSw58pYUeaqO3oOKhuQO
To: /home/ck_bonbon/fintech/final_project/EllipticPlusPlus/Transactions Dataset/txs_classes.csv
100%|██████████████████████████████████████| 2.36M/2.36M [00:00<00:00, 68.8MB/s]
Downloading...
From: https://drive.google.com/uc?id=1Q2yG_CIDvfdGP-fKVPSw979EYgQukjz5
To: /home/ck_bonbon/fintech/final_project/EllipticPlusPlus/Transactions Dataset/txs_edgelist.csv
100%|██████████████████████████████████████| 4.47M/4.47M [00:00<00:00, 69.6MB/s]


## Transactions Dataset Overview


---

This section loads the three CSV files (txs_features, txs_classes, txs_edgelist) and gives a quick overview of the dataset's structure and features.

Load the saved transaction dataset CSV files:

In [17]:
print("\nTransaction features: \n")
df_txs_features = pd.read_csv("txs_features.csv")
df_txs_features

print("\nTransaction classes: \n")
df_txs_classes = pd.read_csv("txs_classes.csv")
df_txs_classes

print("\nTransaction-Transaction edgelist: \n")
df_txs_edgelist = pd.read_csv("txs_edgelist.csv")
df_txs_edgelist


Transaction features: 



Unnamed: 0,txId,Time step,Local_feature_1,Local_feature_2,Local_feature_3,Local_feature_4,Local_feature_5,Local_feature_6,Local_feature_7,Local_feature_8,...,in_BTC_min,in_BTC_max,in_BTC_mean,in_BTC_median,in_BTC_total,out_BTC_min,out_BTC_max,out_BTC_mean,out_BTC_median,out_BTC_total
0,3321,1,-0.169615,-0.184668,-1.201369,-0.121970,-0.043875,-0.113002,-0.061584,-0.160199,...,0.534072,0.534072,0.534072,0.534072,0.534072,1.668990e-01,0.367074,0.266986,0.266986,0.533972
1,11108,1,-0.137586,-0.184668,-1.201369,-0.121970,-0.043875,-0.113002,-0.061584,-0.127429,...,5.611878,5.611878,5.611878,5.611878,5.611878,5.861940e-01,5.025584,2.805889,2.805889,5.611778
2,51816,1,-0.170103,-0.184668,-1.201369,-0.121970,-0.043875,-0.113002,-0.061584,-0.160699,...,0.456608,0.456608,0.456608,0.456608,0.456608,2.279902e-01,0.228518,0.228254,0.228254,0.456508
3,68869,1,-0.114267,-0.184668,-1.201369,0.028105,-0.043875,-0.113002,0.547008,-0.161652,...,0.308900,8.000000,3.102967,1.000000,9.308900,1.229000e+00,8.079800,4.654400,4.654400,9.308800
4,89273,1,5.202107,-0.210553,-1.756361,-0.121970,260.090707,-0.113002,-0.061584,5.335864,...,852.164680,852.164680,852.164680,852.164680,852.164680,1.300000e-07,41.264036,0.065016,0.000441,852.164680
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
203764,158304003,49,-0.165622,-0.139563,1.018602,-0.121970,-0.043875,-0.113002,-0.061584,-0.156113,...,,,,,,,,,,
203765,158303998,49,-0.167040,-0.139563,1.018602,-0.121970,-0.043875,-0.113002,-0.061584,-0.157564,...,,,,,,,,,,
203766,158303966,49,-0.167040,-0.139563,1.018602,-0.121970,-0.043875,-0.113002,-0.061584,-0.157564,...,,,,,,,,,,
203767,161526077,49,-0.172212,-0.139573,1.018602,-0.121970,-0.043875,-0.113002,-0.061584,-0.162856,...,,,,,,,,,,



Transaction classes: 



Unnamed: 0,txId,class
0,3321,3
1,11108,3
2,51816,3
3,68869,2
4,89273,2
...,...,...
203764,158304003,3
203765,158303998,3
203766,158303966,3
203767,161526077,3



Transaction-Transaction edgelist: 



Unnamed: 0,txId1,txId2
0,230425980,5530458
1,232022460,232438397
2,230460314,230459870
3,230333930,230595899
4,232013274,232029206
...,...,...
234350,158365409,157930723
234351,188708874,188708879
234352,157659064,157659046
234353,87414554,106877725


Data structure for an example transaction (txId = 272145560):

In [20]:
print("\ntxs_features.csv for txId = 272145560\n")
df_txs_features[df_txs_features['txId']==272145560]

print("\ntxs_classes.csv for txId = 272145560\n")
df_txs_classes[df_txs_classes['txId']==272145560]

print("\ntxs_edgelist.csv for txId = 272145560\n")
df_txs_edgelist[(df_txs_edgelist['txId1']==272145560) | (df_txs_edgelist['txId2']==272145560)]


txs_features.csv for txId = 272145560



Unnamed: 0,txId,Time step,Local_feature_1,Local_feature_2,Local_feature_3,Local_feature_4,Local_feature_5,Local_feature_6,Local_feature_7,Local_feature_8,...,in_BTC_min,in_BTC_max,in_BTC_mean,in_BTC_median,in_BTC_total,out_BTC_min,out_BTC_max,out_BTC_mean,out_BTC_median,out_BTC_total
105573,272145560,24,-0.155493,-0.107012,-1.201369,-0.12197,-0.043875,-0.113002,-0.061584,-0.145749,...,2.7732,2.7732,2.7732,2.7732,2.7732,0.001917,2.770883,1.3864,1.3864,2.7728



txs_classes.csv for txId = 272145560



Unnamed: 0,txId,class
105573,272145560,1



txs_edgelist.csv for txId = 272145560



Unnamed: 0,txId1,txId2
123072,272145560,296926618
123272,272145560,272145556
125873,299475624,272145560
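The edgelist rows for this transaction can be read as directed edges `txId1 -> txId2`. Assuming, as in the original Elliptic dataset, that an edge denotes a flow of bitcoins from the first transaction to the second, a directed graph separates a transaction's upstream and downstream neighbors. This sketch uses networkx (imported above as `nx`) and only the three edges printed above.

```python
import networkx as nx
import pandas as pd

# The three edges printed above for txId = 272145560
edges = pd.DataFrame({
    "txId1": [272145560, 272145560, 299475624],
    "txId2": [296926618, 272145556, 272145560],
})

# Directed graph, assuming each row is an edge txId1 -> txId2
G = nx.from_pandas_edgelist(edges, source="txId1", target="txId2", create_using=nx.DiGraph)

tx = 272145560
print("successors (rows where txId1 == tx):", sorted(G.successors(tx)))
print("predecessors (rows where txId2 == tx):", sorted(G.predecessors(tx)))
print("in-degree:", G.in_degree(tx), "out-degree:", G.out_degree(tx))
```

On the full edgelist, the same call with `df_txs_edgelist` builds the whole transaction graph.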



Transaction features: 94 local features, 72 aggregate features, and 17 augmented features.


In [21]:
list(df_txs_features.columns)

['txId',
 'Time step',
 'Local_feature_1',
 'Local_feature_2',
 'Local_feature_3',
 'Local_feature_4',
 'Local_feature_5',
 'Local_feature_6',
 'Local_feature_7',
 'Local_feature_8',
 'Local_feature_9',
 'Local_feature_10',
 'Local_feature_11',
 'Local_feature_12',
 'Local_feature_13',
 'Local_feature_14',
 'Local_feature_15',
 'Local_feature_16',
 'Local_feature_17',
 'Local_feature_18',
 'Local_feature_19',
 'Local_feature_20',
 'Local_feature_21',
 'Local_feature_22',
 'Local_feature_23',
 'Local_feature_24',
 'Local_feature_25',
 'Local_feature_26',
 'Local_feature_27',
 'Local_feature_28',
 'Local_feature_29',
 'Local_feature_30',
 'Local_feature_31',
 'Local_feature_32',
 'Local_feature_33',
 'Local_feature_34',
 'Local_feature_35',
 'Local_feature_36',
 'Local_feature_37',
 'Local_feature_38',
 'Local_feature_39',
 'Local_feature_40',
 'Local_feature_41',
 'Local_feature_42',
 'Local_feature_43',
 'Local_feature_44',
 'Local_feature_45',
 'Local_feature_46',
 'Local_feature_47',
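The feature groups can be recovered from the column names. A minimal sketch, assuming the local columns carry the `Local_feature_` prefix shown above, that the 17 augmented features are the final 17 columns (the scaling step below relies on the same layout), and that every remaining non-identifier column is an aggregate feature. `Time step` is kept with `txId` as an identifier column here; whether it is counted among the 94 local features is not stated in this notebook. The helper `group_feature_columns` is hypothetical.

```python
def group_feature_columns(columns, n_augmented=17):
    """Hypothetical helper: split txs_features column names into feature groups."""
    columns = list(columns)
    id_cols = [c for c in columns if c in ("txId", "Time step")]
    augmented = columns[-n_augmented:]                      # assumed: last n_augmented columns
    local = [c for c in columns if c.startswith("Local_feature_")]
    aggregate = [c for c in columns                         # assumed: everything else
                 if c not in id_cols and c not in augmented and c not in local]
    return {"id": id_cols, "local": local, "aggregate": aggregate, "augmented": augmented}

# Toy column list (illustrative names, 2 augmented columns instead of 17)
toy_cols = ["txId", "Time step", "Local_feature_1", "Local_feature_2",
            "agg_a", "agg_b", "aug_1", "aug_2"]
groups = group_feature_columns(toy_cols, n_augmented=2)
print({name: len(cols) for name, cols in groups.items()})
```

On the real data, call `group_feature_columns(df_txs_features.columns)` before the `class` column is merged in (otherwise `class` lands in the augmented group) and compare the group sizes with the counts stated above.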

In [22]:
# 1. Keep only the columns needed from txs_classes
subset = df_txs_classes[['txId', 'class']]

# 2. Merge the class column into txs_features
df_txs_features = df_txs_features.merge(
    subset,
    on='txId',     # join on txId
    how='left'     # keep every row of txs_features; rows without a matching class get NaN
)
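`DataFrame.merge` can also check the join itself: `validate="one_to_one"` raises an error if `txId` is duplicated on either side, and `indicator=True` adds a `_merge` column showing which rows found a label. A sketch on toy frames (illustrative values, not the real files):

```python
import pandas as pd

# Toy stand-ins (illustrative values only)
features = pd.DataFrame({"txId": [10, 11, 12], "Local_feature_1": [0.1, 0.2, 0.3]})
classes = pd.DataFrame({"txId": [10, 11], "class": [1, 2]})

merged = features.merge(
    classes[["txId", "class"]],
    on="txId",
    how="left",
    validate="one_to_one",   # raises pandas.errors.MergeError if txId repeats on either side
    indicator=True,          # adds '_merge': 'both' or 'left_only' for each row
)
unmatched = int((merged["_merge"] == "left_only").sum())
print("rows without a class label:", unmatched)
merged = merged.drop(columns="_merge")
```

When any row is unmatched, `class` becomes a float column holding NaN, which the `dropna()` step below removes.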


## Machine Learning Model Classification


---

This section preprocesses the data, builds the training and testing sets, and runs Logistic Regression, Random Forest, Multilayer Perceptron, and XGBoost models, as well as voting ensembles of them, on the dataset.


Drop transactions without augmented feature values (the roughly 0.5% that could not be de-anonymized):

In [23]:
df_txs_features = df_txs_features.dropna()
df_txs_features

Unnamed: 0,txId,Time step,Local_feature_1,Local_feature_2,Local_feature_3,Local_feature_4,Local_feature_5,Local_feature_6,Local_feature_7,Local_feature_8,...,in_BTC_max,in_BTC_mean,in_BTC_median,in_BTC_total,out_BTC_min,out_BTC_max,out_BTC_mean,out_BTC_median,out_BTC_total,class
0,3321,1,-0.169615,-0.184668,-1.201369,-0.121970,-0.043875,-0.113002,-0.061584,-0.160199,...,0.534072,0.534072,0.534072,0.534072,1.668990e-01,0.367074,0.266986,0.266986,0.533972,3
1,11108,1,-0.137586,-0.184668,-1.201369,-0.121970,-0.043875,-0.113002,-0.061584,-0.127429,...,5.611878,5.611878,5.611878,5.611878,5.861940e-01,5.025584,2.805889,2.805889,5.611778,3
2,51816,1,-0.170103,-0.184668,-1.201369,-0.121970,-0.043875,-0.113002,-0.061584,-0.160699,...,0.456608,0.456608,0.456608,0.456608,2.279902e-01,0.228518,0.228254,0.228254,0.456508,3
3,68869,1,-0.114267,-0.184668,-1.201369,0.028105,-0.043875,-0.113002,0.547008,-0.161652,...,8.000000,3.102967,1.000000,9.308900,1.229000e+00,8.079800,4.654400,4.654400,9.308800,2
4,89273,1,5.202107,-0.210553,-1.756361,-0.121970,260.090707,-0.113002,-0.061584,5.335864,...,852.164680,852.164680,852.164680,852.164680,1.300000e-07,41.264036,0.065016,0.000441,852.164680,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
202799,194747812,49,0.558398,-0.198956,-0.091383,-0.121970,-0.043875,-0.113002,-0.061584,0.584665,...,115.952889,115.952889,115.952889,115.952889,1.653300e+00,114.299544,57.976422,57.976422,115.952844,3
202800,194747925,49,0.547658,-0.198956,-0.091383,-0.121970,-0.043875,-0.113002,-0.061584,0.573676,...,114.250098,114.250098,114.250098,114.250098,2.035300e-02,114.229700,57.125027,57.125027,114.250053,3
202801,194748063,49,0.543600,-0.198853,-0.091383,-0.121970,-0.043875,-0.113002,-0.061584,0.569524,...,113.606771,113.606771,113.606771,113.606771,9.257490e-01,112.680977,56.803363,56.803363,113.606726,3
202802,194748070,49,0.537760,-0.198853,-0.091383,-0.121970,-0.043875,-0.113002,-0.061584,0.563549,...,112.680977,112.680977,112.680977,112.680977,3.026970e-01,112.378235,56.340466,56.340466,112.680932,3
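Called with no arguments, `dropna()` drops a row if any column is missing, which covers both the missing augmented features and any transaction that received no class in the merge. A sketch (toy frame, illustrative column choice) that reports how many rows are removed and shows how `subset=` restricts the check to named columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "txId": [1, 2, 3, 4],
    "in_BTC_total": [0.5, np.nan, 2.0, 1.0],
    "class": [1, 2, np.nan, 3],
})

before = len(df)
cleaned_any = df.dropna()                           # drops txId 2 and txId 3
cleaned_aug = df.dropna(subset=["in_BTC_total"])    # drops only txId 2
print(f"dropna(): removed {before - len(cleaned_any)} of {before} rows "
      f"({(before - len(cleaned_any)) / before:.1%})")
print(f"dropna(subset=...): removed {before - len(cleaned_aug)} rows")
```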


Scale each of the 17 augmented features to the range [0, 1] with MinMaxScaler:

In [24]:
# The augmented features are the 17 columns just before the merged 'class' column.
# Note: each scaler is fit on all time steps, including the test time steps 35-49.
for column in df_txs_features.columns[-18:-1]:
    scaler = MinMaxScaler()
    df_txs_features[column] = scaler.fit_transform(df_txs_features[[column]]).ravel()
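MinMaxScaler maps each value to (x - min) / (max - min), using the min and max seen in `fit`. Because the loop above fits on every time step, the test period's extremes shape the training features. A variant, sketched here on a toy column and not what this notebook does, fits on the training rows only; test values outside the training range then fall outside [0, 1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy column split into training rows (earlier time steps) and test rows (later time steps)
train = np.array([[0.0], [2.0], [4.0]])
test = np.array([[1.0], [8.0]])

scaler = MinMaxScaler().fit(train)          # min = 0, max = 4 from the training rows only
scaled = scaler.transform(test)
manual = (test - train.min()) / (train.max() - train.min())
print(scaled.ravel())                       # 8.0 lies above the training max, so it maps above 1
```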

In [25]:
# Remove 'unknown' transactions (class 3), keeping the labeled ones (1 = illicit, 2 = licit)
df_txs_features_selected = df_txs_features.loc[df_txs_features['class'] != 3]
df_txs_features_selected

Unnamed: 0,txId,Time step,Local_feature_1,Local_feature_2,Local_feature_3,Local_feature_4,Local_feature_5,Local_feature_6,Local_feature_7,Local_feature_8,...,in_BTC_max,in_BTC_mean,in_BTC_median,in_BTC_total,out_BTC_min,out_BTC_max,out_BTC_mean,out_BTC_median,out_BTC_total,class
3,68869,1,-0.114267,-0.184668,-1.201369,0.028105,-0.043875,-0.113002,0.547008,-0.161652,...,7.022548e-04,2.723834e-04,8.778200e-05,8.171503e-04,6.113009e-04,7.142783e-04,0.001552,1.552291e-03,8.171446e-04,2
4,89273,1,5.202107,-0.210553,-1.756361,-0.121970,260.090707,-0.113002,-0.061584,5.335864,...,7.480472e-02,7.480472e-02,7.480472e-02,7.480472e-02,6.466160e-11,3.647866e-03,0.000022,1.451405e-07,7.480473e-02,2
11,293323,1,-0.172726,-0.184668,-1.201369,-0.121970,-0.043875,-0.113002,-0.061584,-0.163383,...,3.577994e-06,3.578006e-06,3.579195e-06,3.575573e-06,4.715323e-07,3.511341e-06,0.000007,6.780735e-06,3.569892e-06,2
22,1494462,1,-0.172921,-0.158783,-1.201369,-0.121970,-0.043875,-0.113002,-0.061584,-0.163581,...,8.766174e-07,8.766298e-07,8.778192e-07,8.741973e-07,1.442451e-06,6.094506e-07,0.000002,1.632382e-06,8.597370e-07,2
25,1582950,1,-0.169967,-0.184668,-1.201369,-0.121970,-0.043875,-0.113002,-0.061584,-0.160559,...,4.198289e-05,4.198290e-05,4.198409e-05,4.198047e-05,2.302948e-05,3.817869e-05,0.000080,7.973672e-05,4.197479e-05,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
202762,194334585,49,-0.039416,-0.118083,1.018602,-0.121970,-0.043875,-0.113002,-0.061584,-0.026985,...,1.858864e-03,1.858864e-03,1.858865e-03,1.858862e-03,1.992449e-05,1.868443e-03,0.003531,3.531138e-03,1.858833e-03,2
202763,194334621,49,-0.050308,-0.112834,1.018602,-0.121970,-0.043875,-0.113002,-0.061584,-0.038129,...,1.707287e-03,1.707287e-03,1.707288e-03,1.707285e-03,4.973969e-06,1.718449e-03,0.003243,3.243191e-03,1.707255e-03,2
202764,194335206,49,-0.154605,-0.116753,1.018602,-0.121970,-0.043875,-0.113002,-0.061584,-0.144840,...,2.557972e-04,2.557972e-04,2.557984e-04,2.557948e-04,1.933202e-04,2.232165e-04,0.000486,4.858661e-04,2.557661e-04,2
202765,194335216,49,0.708000,-0.118083,1.018602,-0.121970,-0.043875,-0.113002,-0.061584,0.737731,...,1.226060e-02,1.226060e-02,1.226060e-02,1.226060e-02,5.065905e-05,1.233830e-02,0.023291,2.329083e-02,1.226057e-02,2


Split the data into training and testing sets by time step:

**Training set**: Time steps 1 to 34

**Testing set**: Time steps 35 to 49

In [26]:
# Goal: binary classification, 0 = licit, 1 = illicit
# Temporal split: train on time steps 1-34, test on time steps 35-49
# (unknown class-3 transactions were already removed above)
X_training_timesteps = df_txs_features_selected.loc[df_txs_features_selected['Time step'] < 35]
X_testing_timesteps = df_txs_features_selected.loc[df_txs_features_selected['Time step'] >= 35]

# Drop the identifier, label, and time step columns from the feature matrices
X_train = X_training_timesteps.drop(columns=['txId', 'class', 'Time step'])
X_test = X_testing_timesteps.drop(columns=['txId', 'class', 'Time step'])

# Relabel: licit (class 2) -> 0, illicit (class 1) -> 1
y_train = X_training_timesteps['class'].apply(lambda x: 0 if x == 2 else 1)
y_test = X_testing_timesteps['class'].apply(lambda x: 0 if x == 2 else 1)
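Illicit transactions are a minority, and a temporal split can shift their share between the training and testing periods, which in turn affects precision and recall. A quick check compares the illicit rate per split; the sketch below runs on a toy stand-in for `df_txs_features_selected` with made-up values.

```python
import pandas as pd

# Toy stand-in for df_txs_features_selected (illustrative values only)
df = pd.DataFrame({
    "Time step": [1, 2, 30, 34, 35, 40, 49, 49],
    "class":     [2, 1, 1, 2, 2, 1, 2, 2],
})

split = df["Time step"].map(lambda t: "train" if t < 35 else "test")
label = (df["class"] != 2).astype(int)   # same mapping as above: licit (2) -> 0, illicit (1) -> 1

# Share of illicit transactions in each split
illicit_rate = label.groupby(split).mean()
print(illicit_rate)
```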

Run classifiers (LR, RF, MLP, XGB):

In [27]:
# LOGISTIC REGRESSION (LR)
cLR = LogisticRegression(max_iter=1000).fit(X_train.values,y_train.values)
y_preds_LR = cLR.predict(X_test.values)
prec,rec,f1,num = precision_recall_fscore_support(y_test.values, y_preds_LR)

print("Logistic Regression")
print("Precision: %.3f \nRecall: %.3f \nF1 Score: %.3f"%(prec[1],rec[1],f1[1]))
micro_f1 = f1_score(y_test, y_preds_LR, average='micro')
print("Micro-Average F1 Score: %.3f"%(micro_f1))

Logistic Regression
Precision: 0.327 
Recall: 0.707 
F1 Score: 0.448
Micro-Average F1 Score: 0.884
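Two details explain the gap between the numbers above. `precision_recall_fscore_support` returns one entry per class in sorted label order, so `prec[1]`, `rec[1]`, and `f1[1]` are the illicit-class scores. For single-label classification, the micro-averaged F1 equals accuracy, which is dominated by the licit majority; that is how LR reaches 0.884 micro-F1 with an illicit F1 of 0.448. A toy illustration:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_recall_fscore_support

# Toy labels: 0 = licit, 1 = illicit
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]

# One entry per class label in sorted order: index 0 = licit, index 1 = illicit
prec, rec, f1, support = precision_recall_fscore_support(y_true, y_pred)
print("illicit precision/recall/F1:", prec[1], rec[1], f1[1])   # each 0.5 here
print("support per class:", support)                           # 4 licit, 2 illicit

# Micro-averaged F1 and accuracy coincide for single-label problems
micro_f1 = f1_score(y_true, y_pred, average="micro")
acc = accuracy_score(y_true, y_pred)
print(micro_f1, acc)
```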


In [28]:
# RANDOM FOREST (RF)
cRF = RandomForestClassifier(n_estimators=50).fit(X_train.values,y_train.values)
y_preds_RF = cRF.predict(X_test.values)
prec,rec,f1,num = precision_recall_fscore_support(y_test.values, y_preds_RF)

print("Random Forest")
print("Precision: %.3f \nRecall: %.3f \nF1 Score: %.3f"%(prec[1],rec[1],f1[1]))
micro_f1 = f1_score(y_test, y_preds_RF, average='micro')
print("Micro-Average F1 Score: %.3f"%(micro_f1))

Random Forest
Precision: 0.961 
Recall: 0.719 
F1 Score: 0.823
Micro-Average F1 Score: 0.979
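Every model cell repeats the same metric code. A small helper could replace those lines; the name `report_metrics` is hypothetical, and it prints the same four values in the same format as the cells here.

```python
from sklearn.metrics import f1_score, precision_recall_fscore_support

def report_metrics(name, y_true, y_pred, positive=1):
    """Hypothetical helper: print and return the metrics each model cell computes."""
    prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred)
    metrics = {
        "precision": prec[positive],   # index 1 = illicit when labels are 0/1
        "recall": rec[positive],
        "f1": f1[positive],
        "micro_f1": f1_score(y_true, y_pred, average="micro"),
    }
    print(name)
    print("Precision: %.3f \nRecall: %.3f \nF1 Score: %.3f"
          % (metrics["precision"], metrics["recall"], metrics["f1"]))
    print("Micro-Average F1 Score: %.3f" % metrics["micro_f1"])
    return metrics

m = report_metrics("Toy example", [0, 0, 1, 1], [0, 1, 1, 1])
```

On the real predictions this would be called as, for example, `report_metrics("Random Forest", y_test.values, y_preds_RF)`.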


In [29]:
# MULTILAYER PERCEPTRON (MLP)
cMLP = MLPClassifier(solver='adam', learning_rate_init=0.001, max_iter=200).fit(X_train.values,y_train.values)
y_preds_MLP = cMLP.predict(X_test.values)
prec,rec,f1,num = precision_recall_fscore_support(y_test.values, y_preds_MLP)

print("Multilayer Perceptron (MLP)")
print("Precision: %.3f \nRecall: %.3f \nF1 Score: %.3f"%(prec[1],rec[1],f1[1]))
micro_f1 = f1_score(y_test, y_preds_MLP, average='micro')
print("Micro-Average F1 Score: %.3f"%(micro_f1))

Multilayer Perceptron (MLP)
Precision: 0.600 
Recall: 0.622 
F1 Score: 0.611
Micro-Average F1 Score: 0.947


In [30]:
# XGBOOST (XGB)
cXGB = xgb.XGBClassifier(objective="multi:softmax", num_class=2, random_state=42)
cXGB.fit(X_train.values, y_train.values)
y_preds_XGB = cXGB.predict(X_test.values)
prec,rec,f1,num = precision_recall_fscore_support(y_test.values, y_preds_XGB)

print("XGBOOST")
print("Precision: %.3f \nRecall: %.3f \nF1 Score: %.3f"%(prec[1],rec[1],f1[1]))
micro_f1 = f1_score(y_test, y_preds_XGB, average='micro')
print("Micro-Average F1 Score: %.3f"%(micro_f1))
# print(confusion_matrix(y_test, y_preds_XGB))

XGBOOST
Precision: 0.906 
Recall: 0.733 
F1 Score: 0.811
Micro-Average F1 Score: 0.977
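The commented-out line in the XGBoost cell points at a confusion matrix. With labels `[0, 1]`, scikit-learn's `confusion_matrix` puts true labels on rows and predicted labels on columns, so `ravel()` yields TN, FP, FN, TP, from which the illicit precision and recall follow. A toy illustration (on the real data: `confusion_matrix(y_test, y_preds_XGB)`):

```python
from sklearn.metrics import confusion_matrix

# Toy labels: 0 = licit, 1 = illicit
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 1, 0, 1, 0, 1]

# Rows: true label, columns: predicted label, both ordered [0, 1]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"illicit precision = {tp / (tp + fp):.3f}, recall = {tp / (tp + fn):.3f}")
```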


Run ensemble classifiers (RF+MLP, RF+XGB, MLP+XGB, RF+MLP+XGB):

In [32]:
# List of (name, estimator) pairs; VotingClassifier fits clones of these models
estimatorsXGBRF=[('RF', cRF), ('XGB', cXGB)]
# Hard-voting classifier: each model casts one label vote per transaction
ensembleXGBRF = VotingClassifier(estimatorsXGBRF, voting='hard')
ensembleXGBRF.fit(X_train.values, y_train.values)
y_preds_XGBRF = ensembleXGBRF.predict(X_test.values)
prec,rec,f1,num = precision_recall_fscore_support(y_test.values, y_preds_XGBRF)

print("Ensemble: XGBoost (XGB) + Random Forest (RF)")
print("Precision: %.3f \nRecall: %.3f \nF1 Score: %.3f"%(prec[1],rec[1],f1[1]))
micro_f1 = f1_score(y_test, y_preds_XGBRF, average='micro')
print("Micro-Average F1 Score: %.3f"%(micro_f1))

Ensemble: XGBoost (XGB) + Random Forest (RF)
Precision: 0.980 
Recall: 0.717 
F1 Score: 0.828
Micro-Average F1 Score: 0.980
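With two models and `voting='hard'`, every disagreement is a one-to-one tie. scikit-learn's VotingClassifier takes the argmax of the vote counts over label-encoded predictions, so a tie goes to the smallest label, here 0 (licit). A transaction is therefore flagged illicit only when both models agree, which is consistent with the two-model ensembles showing higher precision and lower recall than their members. The sketch below emulates that rule with NumPy for 0/1 labels; it is an illustration, not scikit-learn's code.

```python
import numpy as np

def hard_vote(predictions):
    """Majority vote over 0/1 predictions shaped (n_models, n_samples).

    Emulates argmax over vote counts, so a tie resolves to the smaller label (0 = licit).
    """
    predictions = np.asarray(predictions)
    return np.array([np.argmax(np.bincount(col, minlength=2)) for col in predictions.T])

rf_like = [1, 1, 0, 0]
xgb_like = [1, 0, 1, 0]
print(hard_vote([rf_like, xgb_like]))   # disagreements resolve to 0 (licit)
```

With three models (as in the all-model ensemble) a binary vote cannot tie, so the majority decides.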


In [33]:
# List of (name, estimator) pairs; VotingClassifier fits clones of these models
estimatorsMLPXGB=[('MLP', cMLP), ('XGB', cXGB)]
# Hard-voting classifier: each model casts one label vote per transaction
ensembleMLPXGB = VotingClassifier(estimatorsMLPXGB, voting='hard')
ensembleMLPXGB.fit(X_train.values, y_train.values)
y_preds_MLPXGB = ensembleMLPXGB.predict(X_test.values)
prec,rec,f1,num = precision_recall_fscore_support(y_test.values, y_preds_MLPXGB)

print("Ensemble: Multilayer Perceptron (MLP) + XGBoost")
print("Precision: %.3f \nRecall: %.3f \nF1 Score: %.3f"%(prec[1],rec[1],f1[1]))
micro_f1 = f1_score(y_test, y_preds_MLPXGB, average='micro')
print("Micro-Average F1 Score: %.3f"%(micro_f1))

Ensemble: Multilayer Perceptron (MLP) + XGBoost
Precision: 0.973 
Recall: 0.623 
F1 Score: 0.760
Micro-Average F1 Score: 0.974


In [34]:
# List of (name, estimator) pairs; VotingClassifier fits clones of these models
estimatorsRFMLP=[('MLP', cMLP), ('RF', cRF)]
# Hard-voting classifier: each model casts one label vote per transaction
ensembleRFMLP = VotingClassifier(estimatorsRFMLP, voting='hard')
ensembleRFMLP.fit(X_train.values, y_train.values)
y_preds_RFMLP = ensembleRFMLP.predict(X_test.values)
prec,rec,f1,num = precision_recall_fscore_support(y_test.values, y_preds_RFMLP)

print("Ensemble: Random Forest (RF) + Multilayer Perceptron (MLP)")
print("Precision: %.3f \nRecall: %.3f \nF1 Score: %.3f"%(prec[1],rec[1],f1[1]))
micro_f1 = f1_score(y_test, y_preds_RFMLP, average='micro')
print("Micro-Average F1 Score: %.3f"%(micro_f1))

Ensemble: Random Forest (RF) + Multilayer Perceptron (MLP)
Precision: 0.989 
Recall: 0.651 
F1 Score: 0.785
Micro-Average F1 Score: 0.976


In [35]:
# List of (name, estimator) pairs; VotingClassifier fits clones of these models
estimatorsXGBRFMLP=[('XGB', cXGB), ('MLP', cMLP), ('RF', cRF)]
# Hard-voting classifier: each model casts one label vote per transaction
ensembleXGBRFMLP = VotingClassifier(estimatorsXGBRFMLP, voting='hard')
ensembleXGBRFMLP.fit(X_train.values, y_train.values)
y_preds_XGBRFMLP = ensembleXGBRFMLP.predict(X_test.values)
prec,rec,f1,num = precision_recall_fscore_support(y_test.values, y_preds_XGBRFMLP)

print("Ensemble (all): XGBoost + Random Forest (RF) + Multilayer Perceptron (MLP)")
print("Precision: %.3f \nRecall: %.3f \nF1 Score: %.3f"%(prec[1],rec[1],f1[1]))
micro_f1 = f1_score(y_test, y_preds_XGBRFMLP, average='micro')
print("Micro-Average F1 Score: %.3f"%(micro_f1))

Ensemble (all): XGBoost + Random Forest (RF) + Multilayer Perceptron (MLP)
Precision: 0.968 
Recall: 0.723 
F1 Score: 0.828
Micro-Average F1 Score: 0.980
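An alternative to the hard majority vote is `voting='soft'`, which averages `predict_proba` across models and predicts the class with the highest mean probability. The sketch below is not a result on Elliptic++: it uses synthetic imbalanced data from `make_classification` and swaps in LogisticRegression for XGBoost only to avoid the xgboost dependency. Note that the XGBoost model above uses `objective="multi:softmax"`, which outputs labels rather than probabilities; for soft voting it would likely need a probability-producing objective such as `"binary:logistic"`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Synthetic imbalanced binary data standing in for the transaction features
X, y = make_classification(n_samples=400, n_features=10, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr = X[:300], X[300:], y[:300]

# Soft voting averages class probabilities instead of counting label votes
soft = VotingClassifier(
    estimators=[
        ("RF", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("MLP", MLPClassifier(max_iter=500, random_state=0)),
        ("LR", LogisticRegression(max_iter=1000)),   # stand-in for XGBoost in this sketch
    ],
    voting="soft",
)
soft.fit(X_tr, y_tr)
proba = soft.predict_proba(X_te)
preds = soft.predict(X_te)
print("agreement with argmax of averaged probabilities:", np.mean(preds == proba.argmax(axis=1)))
```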


(The LSTM model is not included in this notebook.)


