# **Supervised Learning**

## Metaverse Financial Transactions

<img src="../images/ia.jpg" alt="Artificial Intelligence">

## **Introduction**

### **About Dataset**

The Metaverse Financial Transactions Dataset is a collection of blockchain-based financial activity within the Open Metaverse. Its 78,600 transactions carry attributes such as timestamps, sending and receiving addresses, transaction types, and risk scores. The dataset supports research in fraud detection, risk assessment, and user behavior analysis in virtual environments. It is curated to reflect the complexity of blockchain activity, aiding the development of secure digital asset management.

### **About the Problem**

The objective of this project is to develop predictive models for anomaly detection, fraud analysis, and risk assessment in financial transactions within the Metaverse. This is a multiclass classification problem: the aim is to assign each transaction to a risk level based on attributes such as transaction type, user behavior, and transaction amount. The target variable is the risk level, stored in the `anomaly` column, which can be high risk, moderate risk, or low risk.

### **About the Solution**

The solution to this problem involves employing supervised learning techniques on the provided Metaverse Financial Transactions Dataset. A portion of the dataset will be designated as the training set, utilized to train the model, while another portion will serve as the test set for model evaluation.

The chosen supervised learning algorithm will undergo training on the training set, learning patterns and relationships between various attributes of Metaverse financial transactions. Subsequently, the trained model will be evaluated using the test set to assess its performance and generalization ability.
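The split, train, and evaluate workflow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's final pipeline: the data is synthetic (generated with `make_classification` as a stand-in for the encoded transaction features), and the 80/20 split and the random forest are illustrative choices only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the transaction features (assumption: the real
# notebook would use the preprocessed columns of `df` instead).
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=5,
    n_classes=3, random_state=42,
)

# Hold out 20% of the rows for testing; stratify=y keeps the class
# proportions the same in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit on the training set only, then score on the unseen test set.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {test_accuracy:.3f}")
```

Fixing `random_state` makes the split and the model reproducible between runs.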

The primary evaluation metric for this solution will be accuracy: the percentage of transactions that the model classifies correctly. High accuracy indicates that the model assigns transactions to their respective risk levels reliably.
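As a small worked example of the metric, the sketch below computes accuracy on ten hand-made labels and also prints per-class recall. The labels are hypothetical and chosen only for illustration. They show that if one risk level dominates, accuracy can look good while the rarer levels are missed, so it may be worth reading it alongside per-class figures.

```python
from sklearn.metrics import accuracy_score, recall_score

labels = ["low_risk", "moderate_risk", "high_risk"]

# Hypothetical ground truth: 8 low-risk, 1 moderate-risk, 1 high-risk.
y_true = ["low_risk"] * 8 + ["moderate_risk", "high_risk"]
# A trivial "model" that always predicts the majority class.
y_pred = ["low_risk"] * 10

# Accuracy = correctly classified / total = 8 / 10.
accuracy = accuracy_score(y_true, y_pred)

# Recall per class, in the order given by `labels`.
per_class_recall = recall_score(
    y_true, y_pred, labels=labels, average=None, zero_division=0
)

print(f"Accuracy: {accuracy:.2f}")
for label, recall in zip(labels, per_class_recall):
    print(f"Recall for {label}: {recall:.2f}")
```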

---

This project was made by:

| Name | UP |
|-|-|
| Ana Carolina Coutinho | up202108685 |
| José Costa | upxxxx |
| Afonso Poças | upxxxx |


### Library Installation

Before proceeding, make sure you have the necessary libraries installed for this project. 
Open your terminal and navigate to the project's root directory. Then, run the following command:

```bash
pip install -r requirements.txt
```

After installing the required libraries, import them into your project. 
Also, let's suppress any warnings to keep the notebook clean and organized.


### Import the libraries and load the CSV file into a DataFrame

In [None]:
import warnings # Needed to ignore warnings
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
import math
from sklearn.preprocessing import LabelEncoder, StandardScaler, RobustScaler, MinMaxScaler, MaxAbsScaler
from sklearn.model_selection import train_test_split, cross_val_score, validation_curve, KFold
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier, AdaBoostClassifier, ExtraTreesClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier, MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import KernelPCA
from imblearn.under_sampling import AllKNN
from xgboost import XGBClassifier, plot_tree
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from catboost import CatBoostClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from imblearn.over_sampling import SMOTE
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_curve, auc, precision_recall_curve, average_precision_score, ConfusionMatrixDisplay
from pycaret.classification import *

warnings.filterwarnings('ignore')


In [26]:
df = pd.read_csv('../data/metaverse_transactions_dataset.csv')

df.head()


| | timestamp | hour_of_day | sending_address | receiving_address | amount | transaction_type | location_region | ip_prefix | login_frequency | session_duration | purchase_pattern | age_group | risk_score | anomaly |
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
| 0 | 2022-04-11 12:47:27 | 12 | 0x9d32d0bf2c00f41ce7ca01b66e174cc4dcb0c1da | 0x39f82e1c09bc6d7baccc1e79e5621ff812f50572 | 796.949206 | transfer | Europe | 192.0 | 3 | 48 | focused | established | 18.75 | low_risk |
| 1 | 2022-06-14 19:12:46 | 19 | 0xd6e251c23cbf52dbd472f079147873e655d8096f | 0x51e8fbe24f124e0e30a614e14401b9bbfed5384c | 0.01 | purchase | South America | 172.0 | 5 | 61 | focused | established | 25.0 | low_risk |
| 2 | 2022-01-18 16:26:59 | 16 | 0x2e0925b922fed01f6a85d213ae2718f54b8ca305 | 0x52c7911879f783d590af45bda0c0ef2b8536706f | 778.19739 | purchase | Asia | 192.168 | 3 | 74 | focused | established | 31.25 | low_risk |
| 3 | 2022-06-15 09:20:04 | 9 | 0x93efefc25fcaf31d7695f28018d7a11ece55457f | 0x8ac3b7bd531b3a833032f07d4e47c7af6ea7bace | 300.838358 | transfer | South America | 172.0 | 8 | 111 | high_value | veteran | 36.75 | low_risk |
| 4 | 2022-02-18 14:35:30 | 14 | 0xad3b8de45d63f5cce28aef9a82cf30c397c6ceb9 | 0x6fdc047c2391615b3facd79b4588c7e9106e49f2 | 775.569344 | sale | Africa | 172.16 | 6 | 100 | high_value | veteran | 62.5 | moderate_risk |
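The `anomaly` column in the preview above holds the risk label as text, so it will probably need to be converted to integers before most classifiers can use it. The sketch below shows one way to do this with `LabelEncoder`, which the notebook already imports. The sample values are taken from the preview plus an assumed `high_risk` label. Note that `LabelEncoder` assigns codes in alphabetical order, not in order of severity.

```python
from sklearn.preprocessing import LabelEncoder

# A few label values as they appear in the `anomaly` column; `high_risk`
# is an assumed spelling for the high-risk class.
sample_labels = ["low_risk", "low_risk", "moderate_risk", "high_risk"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(sample_labels)

# classes_ lists the labels in the order of their integer codes.
for code, name in enumerate(encoder.classes_):
    print(code, name)

# inverse_transform maps integer predictions back to readable labels.
decoded = encoder.inverse_transform(encoded)
```

If an ordering by severity matters for later analysis, an explicit mapping dictionary would be an alternative to the alphabetical codes.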


# Data preprocessing

### Pre-analysis

Inspect the data before manipulating it, to understand how each attribute is distributed and to spot outliers that may need to be filtered out.

In [27]:
df.describe()


| | hour_of_day | amount | ip_prefix | login_frequency | session_duration | risk_score |
|-|-|-|-|-|-|-|
| count | 78600.0 | 78600.0 | 78600.0 | 78600.0 | 78600.0 | 78600.0 |
| mean | 11.532634 | 502.574903 | 147.64443 | 4.178702 | 69.684606 | 44.956722 |
| std | 6.935897 | 245.898146 | 69.388143 | 2.366038 | 40.524476 | 21.775365 |
| min | 0.0 | 0.01 | 10.0 | 1.0 | 20.0 | 15.0 |
| 25% | 6.0 | 331.319966 | 172.0 | 2.0 | 35.0 | 26.25 |
| 50% | 12.0 | 500.0295 | 172.16 | 4.0 | 60.0 | 40.0 |
| 75% | 18.0 | 669.528311 | 192.0 | 6.0 | 100.0 | 52.5 |
| max | 23.0 | 1557.150905 | 192.168 | 8.0 | 159.0 | 100.0 |
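One common way to flag the outliers mentioned earlier is the interquartile range (IQR) rule, sketched below on a small hand-made `amount` column. The sample values and the 1.5 multiplier are assumptions for illustration, and this is not necessarily the filtering used later in the project.

```python
import pandas as pd

# Hand-made sample of transaction amounts (illustrative values only).
sample = pd.DataFrame(
    {"amount": [0.01, 300.8, 500.0, 669.5, 775.6, 796.9, 1557.2]}
)

# Quartiles and interquartile range of the column.
q1 = sample["amount"].quantile(0.25)
q3 = sample["amount"].quantile(0.75)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = sample[
    (sample["amount"] < lower_fence) | (sample["amount"] > upper_fence)
]

print(f"Fences: [{lower_fence:.2f}, {upper_fence:.2f}]")
print(outliers)
```

Applied to the `amount` quartiles reported by `df.describe()` (about 331.32 and 669.53), the same rule gives an upper fence of roughly 1176.84, which is below the reported maximum of 1557.15, so at least one transaction would be flagged. Whether flagged rows should be removed is a separate decision, since unusually large amounts may be exactly the risky transactions of interest.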


In [28]:
# Check missing values
df.isna().any()

timestamp            False
hour_of_day          False
sending_address      False
receiving_address    False
amount               False
transaction_type     False
location_region      False
ip_prefix            False
login_frequency      False
session_duration     False
purchase_pattern     False
age_group            False
risk_score           False
anomaly              False
dtype: bool
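`df.isna().any()` only reports whether a column contains at least one missing value. When a column does have gaps, the count per column is usually more useful. The sketch below contrasts the two on a small hypothetical frame with deliberately missing entries; the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with two deliberately missing entries (hypothetical data).
toy = pd.DataFrame(
    {
        "amount": [796.95, np.nan, 778.20],
        "transaction_type": ["transfer", "purchase", None],
        "risk_score": [18.75, 25.0, 31.25],
    }
)

# True for every column that has at least one missing value.
missing_flags = toy.isna().any()

# Number of missing values in each column.
missing_counts = toy.isna().sum()

print(missing_flags)
print(missing_counts)
```

For the real dataset every flag above is `False`, so no imputation or row removal appears to be needed for missing values.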