<a href="https://colab.research.google.com/github/Sameersah/CMPE-255-3/blob/main/Tabular_Kaggle_1_IEEE_Fraud_Detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Install Kaggle

In [5]:
!pip install kaggle



## Mount Drive

In [6]:
from google.colab import drive
drive.mount('mount')

Drive already mounted at mount; to attempt to forcibly remount, call drive.mount("mount", force_remount=True).


## Copy Kaggle.json to working directory

In [7]:
!mkdir -p ~/.kaggle
!cp /content/mount/MyDrive/kaggle.json ~/.kaggle/kaggle.json

## Download compitition files

In [4]:
!rm -rf /content/ieee-fraud-detection
!mkdir -p /content/ieee-fraud-detection
!kaggle competitions download -c ieee-fraud-detection -p /content/ieee-fraud-detection

!unzip /content/ieee-fraud-detection/ieee-fraud-detection.zip -d /content/ieee-fraud-detection/

Downloading ieee-fraud-detection.zip to /content/ieee-fraud-detection
 97% 115M/118M [00:06<00:00, 24.7MB/s]
100% 118M/118M [00:06<00:00, 19.3MB/s]
Archive:  /content/ieee-fraud-detection/ieee-fraud-detection.zip
  inflating: /content/ieee-fraud-detection/sample_submission.csv  
  inflating: /content/ieee-fraud-detection/test_identity.csv  
  inflating: /content/ieee-fraud-detection/test_transaction.csv  
  inflating: /content/ieee-fraud-detection/train_identity.csv  
  inflating: /content/ieee-fraud-detection/train_transaction.csv  


## Install Autogluon

In [3]:
!pip install autogluon



## Use pandas to merge/join CSV files

In [4]:
import pandas as pd
import numpy as np
from autogluon.tabular import TabularPredictor

directory = '/content/ieee-fraud-detection/'
label = 'isFraud'
eval_metric = 'roc_auc'
save_path = directory + 'auto-gluon-model'

train_identity = pd.read_csv(directory + 'train_identity.csv', nrows=1000)
train_transaction = pd.read_csv(directory + 'train_transaction.csv', nrows=1000)

In [5]:
train_data = pd.merge(train_transaction, train_identity, on='TransactionID', how='left')

## Train Model using autoGluon

In [7]:
predictor = TabularPredictor(label=label, eval_metric=eval_metric, path=save_path, verbosity=2).fit(
    train_data.sample(n=1000), presets='medium_quality_faster_train',time_limit=60,num_gpus=1)



Preset alias specified: 'medium_quality_faster_train' maps to 'medium_quality'.
Verbosity: 2 (Standard Logging)
AutoGluon Version:  1.1.1
Python Version:     3.10.12
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP PREEMPT_DYNAMIC Thu Jun 27 21:05:47 UTC 2024
CPU Count:          2
Memory Avail:       6.64 GB / 12.67 GB (52.4%)
Disk Space Avail:   65.25 GB / 112.64 GB (57.9%)
Presets specified: ['medium_quality_faster_train']
Beginning AutoGluon training ... Time limit = 60s
AutoGluon will save models to "/content/ieee-fraud-detection/auto-gluon-model"
Train Data Rows:    1000
Train Data Columns: 433
Label Column:       isFraud
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [0, 1]
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 

## Get Summary of Model Training

In [8]:
results = predictor.fit_summary()

*** Summary of fit() ***
Estimated performance of each model:
                  model  score_val eval_metric  pred_time_val   fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0   WeightedEnsemble_L2   0.939086     roc_auc       0.232404   9.158270                0.000746           0.131192            2       True         13
1        ExtraTreesEntr   0.931472     roc_auc       0.072306   0.889566                0.072306           0.889566            1       True          9
2        NeuralNetTorch   0.930626     roc_auc       0.159351   8.137512                0.159351           8.137512            1       True         11
3        ExtraTreesGini   0.927242     roc_auc       0.071505   1.067641                0.071505           1.067641            1       True          8
4            LightGBMXT   0.917090     roc_auc       0.011820   3.300994                0.011820           3.300994            1       True          3
5              CatBoost   0.9103

## Predict

### Merge Test Data

In [9]:
import pandas as pd


directory = '/content/ieee-fraud-detection/'
test_identity = pd.read_csv(directory+'test_identity.csv', nrows=1000)
test_transaction = pd.read_csv(directory+'test_transaction.csv', nrows=1000)
test_data = pd.merge(test_transaction, test_identity, on='TransactionID', how='left')  # same join applied to training files



### Predict

In [12]:
y_predproba = predictor.predict_proba(train_data)
y_predproba.head(5)  # some example predicted fraud-probabilities


Unnamed: 0,0,1
0,1.0,1.515025e-09
1,1.0,1.49425e-09
2,1.0,1.580698e-09
3,0.999999,5.853282e-07
4,1.0,1.41933e-10


## Prepare Submissions

In [14]:
submission = pd.read_csv(directory+'sample_submission.csv')
submission['isFraud'] = y_predproba.iloc[:, 1]
submission.head()
submission.to_csv(directory+'my_submission.csv', index=False)

## Submit

In [21]:
file = '/content/ieee-fraud-detection/sample_submission.csv'
!kaggle competitions submit -c ieee-fraud-detection -f /content/ieee-fraud-detection/sample_submission.csv -m "my first submission"

100% 5.80M/5.80M [00:02<00:00, 2.42MB/s]
Successfully submitted to IEEE-CIS Fraud Detection

# DONE!! SUBMITTED YOUR FIRST KAGGLE SUBMISSION