<a href="https://colab.research.google.com/github/ashivashankars/archana_Autogluon_repo/blob/main/AutoGluon_for_Kaggle(ieee_fraud_detection)_competitions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##IEEE-CIS Fraud Detection
Can you detect fraud from customer transactions?

Description
Imagine standing at the check-out counter at the grocery store with a long line behind you and the cashier not-so-quietly announces that your card has been declined. In this moment, you probably aren’t thinking about the data science that determined your fate.

Embarrassed, and certain you have the funds to cover everything needed for an epic nacho party for 50 of your closest friends, you try your card again. Same result. As you step aside and allow the cashier to tend to the next customer, you receive a text message from your bank. “Press 1 if you really tried to spend $500 on cheddar cheese.”

While perhaps cumbersome (and often embarrassing) in the moment, this fraud prevention system is actually saving consumers millions of dollars per year. Researchers from the IEEE Computational Intelligence Society (IEEE-CIS) want to improve this figure, while also improving the customer experience. With higher accuracy fraud detection, you can get on with your chips without the hassle.

IEEE-CIS works across a variety of AI and machine learning areas, including deep neural networks, fuzzy systems, evolutionary computation, and swarm intelligence. Today they’re partnering with the world’s leading payment service company, Vesta Corporation, seeking the best solutions for the fraud prevention industry, and now you are invited to join the challenge.

In this competition, you’ll benchmark machine learning models on a challenging large-scale dataset. The data comes from Vesta's real-world e-commerce transactions and contains a wide range of features from device type to product features. You also have the opportunity to create new features to improve your results.

If successful, you’ll improve the efficacy of fraudulent transaction alerts for millions of people around the world, helping hundreds of thousands of businesses reduce their fraud loss and increase their revenue. And of course, you will save party people just like you the hassle of false positives.

Acknowledgements:



Vesta Corporation provided the dataset for this competition. Vesta Corporation is the forerunner in guaranteed e-commerce payment solutions. Founded in 1995, Vesta pioneered the process of fully guaranteed card-not-present (CNP) payment transactions for the telecommunications industry. Since then, Vesta has firmly expanded data science and machine learning capabilities across the globe and solidified its position as the leader in guaranteed ecommerce payments. Today, Vesta guarantees more than $18B in transactions annually.


##This tutorial shows how to use AutoGluon to become a serious Kaggle competitor without writing much code. We first outline the general steps for using AutoGluon in Kaggle contests, assuming the competition involves tabular data stored in one or more CSV files.

##1. Install the Kaggle API client by running the Bash command: pip install kaggle

In [8]:
!pip -q install -U kaggle

##2. Navigate to https://www.kaggle.com/account and create an account (if necessary). Then click “Create New API Token” and move the downloaded file to this location on your machine: ~/.kaggle/kaggle.json. For troubleshooting, see the Kaggle API instructions.

In [1]:
from google.colab import files
files.upload()  # choose kaggle.json you downloaded from Kaggle > Account > Create New API Token

!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/kaggle.json
!chmod 600 ~/.kaggle/kaggle.json


Saving kaggle.json to kaggle.json
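The chmod 600 step above restricts the token to its owner, so other users on the machine cannot read it. As a minimal sketch of how you might check that from Python, the hypothetical helper `is_private` below inspects the permission bits. It runs against a throwaway file rather than the real credentials and assumes a POSIX system such as the Colab runtime.

```python
import os
import stat
import tempfile


def is_private(path):
    """Return True if group and other users have no permission bits on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0


# Demonstrate on a temporary stand-in for ~/.kaggle/kaggle.json.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "kaggle.json")
    with open(demo, "w") as fh:
        fh.write("{}")

    os.chmod(demo, 0o644)          # world-readable: not private
    open_before = is_private(demo)

    os.chmod(demo, 0o600)          # what `chmod 600` sets
    private_after = is_private(demo)

print(open_before, private_after)
```

To check the real file, you could call `is_private(os.path.expanduser('~/.kaggle/kaggle.json'))`.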


##3. To download data programmatically, execute this Bash command in your terminal:

kaggle competitions download -c [COMPETITION]

Here, [COMPETITION] should be replaced by the name of the competition you wish to enter. Alternatively, you can download the data manually: navigate to the website of the Kaggle competition you wish to enter, click “Download All”, and accept the competition’s terms.

In [2]:
import json, os
p = os.path.expanduser('~/.kaggle/kaggle.json')
with open(p) as f:  # close the file handle once the credentials are read
    creds = json.load(f)
print("Using Kaggle account:", creds["username"])


Using Kaggle account: archanshivashankar


In [3]:
!kaggle competitions files -c ieee-fraud-detection


name                         size  creationDate                
---------------------  ----------  --------------------------  
sample_submission.csv     6080314  2019-07-15 00:19:01.536000  
test_identity.csv        25797161  2019-07-15 00:19:01.536000  
test_transaction.csv    613194934  2019-07-15 00:19:01.536000  
train_identity.csv       26529680  2019-07-15 00:19:01.536000  
train_transaction.csv   683351067  2019-07-15 00:19:01.536000  


In [4]:
!kaggle competitions download -c ieee-fraud-detection -p /content -w
# (The competition slug is all lowercase: ieee-fraud-detection)


Downloading ieee-fraud-detection.zip to .
  0% 0.00/118M [00:00<?, ?B/s]
100% 118M/118M [00:00<00:00, 1.75GB/s]


##4. If the competition’s training data consists of multiple CSV files, use pandas to merge/join them into a single data table where rows = training examples, columns = features.

In [5]:
import zipfile
import os

zip_file_path = '/content/ieee-fraud-detection.zip'
destination_path = '/content/'

if os.path.exists(zip_file_path):
    with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
        zip_ref.extractall(destination_path)
    print(f"Extracted {zip_file_path} to {destination_path}")
else:
    print(f"Error: {zip_file_path} not found. Please ensure the competition data is downloaded.")

Extracted /content/ieee-fraud-detection.zip to /content/


4(a): We first load the competition’s training data into Python:

In [11]:
import pandas as pd
import numpy as np
from pathlib import Path
from autogluon.tabular import TabularPredictor

directory = Path("/content")  # directory where you have downloaded the data CSV files from the competition
label = 'isFraud'  # name of target variable to predict in this competition
eval_metric = 'roc_auc'  # Optional: specify that competition evaluation metric is AUC
save_path = directory/'AutoGluonModels/'  # where to store trained models

train_identity = pd.read_csv(directory/'train_identity.csv')
train_transaction = pd.read_csv(directory/'train_transaction.csv')
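The cell above sets eval_metric to 'roc_auc'. As a reminder of what that metric measures, here is a minimal pure-Python sketch: AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting as one half. The function name `roc_auc` and the toy labels and scores are illustrative only. This is not how AutoGluon computes the metric internally, and the quadratic loop is only suitable for tiny inputs.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: one fraud case (0.35) is scored below one legitimate case (0.4).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 3 of the 4 positive/negative pairs are ordered correctly -> 0.75
```

Because AUC depends only on the ranking of the scores, the submission can contain raw probabilities without choosing a decision threshold.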

4(b): Since the training data for this competition consists of multiple CSV files, we first join them into a single large table (with rows = examples, columns = features) before applying AutoGluon:

In [12]:
train_data = pd.merge(train_transaction, train_identity, on='TransactionID', how='left')
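A left join keeps every transaction row and fills the identity columns with NaN where no identity record exists. The sketch below illustrates this on made-up toy frames, not the competition data. It assumes each TransactionID appears at most once per table, which is what `validate="one_to_one"` checks, and uses `indicator=True` to report how many rows found a match.

```python
import pandas as pd

# Toy stand-ins for train_transaction and train_identity (values are invented).
transactions = pd.DataFrame(
    {"TransactionID": [1, 2, 3], "TransactionAmt": [50.0, 20.0, 75.5]}
)
identity = pd.DataFrame(
    {"TransactionID": [1, 3], "DeviceType": ["mobile", "desktop"]}
)

merged = pd.merge(
    transactions,
    identity,
    on="TransactionID",
    how="left",
    validate="one_to_one",  # raises MergeError if a key is duplicated in either table
    indicator=True,         # adds a `_merge` column: 'both' or 'left_only'
)

# A left join must not add or drop transactions.
assert len(merged) == len(transactions)

matched_fraction = (merged["_merge"] == "both").mean()
print(merged)
print(f"{matched_fraction:.2f} of transactions have identity data")
```

Applying the same two options to the real merge would be a cheap sanity check, at the cost of an extra `_merge` column that should be dropped before training.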

4(c): We specify the presets argument to maximize AutoGluon’s predictive accuracy, which usually requires running fit() with a longer time limit (here, time_limit=3600 seconds):

In [13]:
predictor = TabularPredictor(label=label, eval_metric=eval_metric, path=save_path, verbosity=3).fit(
    train_data, presets='best_quality', time_limit=3600
)

results = predictor.fit_summary()

Verbosity: 3 (Detailed Logging)
AutoGluon Version:  1.4.0
Python Version:     3.12.12
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Thu Oct  2 10:42:05 UTC 2025
CPU Count:          12
GPU Count:          1
Memory Avail:       76.01 GB / 83.47 GB (91.1%)
Disk Space Avail:   188.70 GB / 235.68 GB (80.1%)
Presets specified: ['best_quality']
User Specified kwargs:
{'auto_stack': True, 'num_bag_sets': 1}
Full kwargs:
{'_experimental_dynamic_hyperparameters': False,
 '_feature_generator_kwargs': None,
 '_save_bag_folds': None,
 'ag_args': None,
 'ag_args_ensemble': None,
 'ag_args_fit': None,
 'auto_stack': True,
 'calibrate': 'auto',
 'delay_bag_sets': False,
 'ds_args': {'clean_up_fits': True,
             'detection_time_frac': 0.25,
             'enable_callbacks': False,
             'enable_ray_logging': True,
             'holdout_data': None,
             'holdout_frac': 0.1111111111111111,
             'memory_safe_fits': True,
             'n_folds

(_ray_fit pid=11081) [50]	valid_set's binary_logloss: 0.0963321
(_ray_fit pid=11079) [100]	valid_set's binary_logloss: 0.088068 [repeated 4x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
(_ray_fit pid=11079) [150]	valid_set's binary_logloss: 0.0833867 [repeated 4x across cluster]
(_ray_fit pid=11079) [200]	valid_set's binary_logloss: 0.0802051 [repeated 4x across cluster]
(_ray_fit pid=11079) [250]	valid_set's binary_logloss: 0.0777755 [repeated 4x across cluster]
(_ray_fit pid=11079) [300]	valid_set's binary_logloss: 0.0757618 [repeated 4x across cluster]
(_ray_fit pid=11079) [350]	valid_set's binary_logloss: 0.0741231 [repeated 4x across cluster]
(_ray_fit pid=11079) [400]	valid_s

(_ray_fit pid=11081) 	Ran out of time, early stopping on iteration 1650. Best iteration is:
(_ray_fit pid=11081) 	[1649]	valid_set's binary_logloss: 0.0577122
(_ray_fit pid=11079) 	Fitting 10000 rounds... Hyperparameters: {'learning_rate': 0.05, 'extra_trees': True} [repeated 3x across cluster]
(_ray_fit pid=11082) Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/LightGBMXT_BAG_L1/S1F4/model.pkl
(_ray_fit pid=11079) 	Ran out of time, early stopping on iteration 1655. Best iteration is: [repeated 3x across cluster]
(_ray_fit pid=11079) 	[1655]	valid_set's binary_logloss: 0.0577996 [repeated 3x across cluster]
(_ray_fit pid=12372) 	Fitting 10000 rounds... Hyperparameters: {'learning_rate': 0.05, 'extra_trees': True}
(_ray_fit pid=11080) Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/LightGBMXT_BAG_L1/S1F1/model.pkl [repeated 3x across cluster]


(_ray_fit pid=12375) [50]	valid_set's binary_logloss: 0.0953804 [repeated 2x across cluster]
(_ray_fit pid=12373) [100]	valid_set's binary_logloss: 0.085936 [repeated 4x across cluster]
(_ray_fit pid=12373) [150]	valid_set's binary_logloss: 0.0804302 [repeated 4x across cluster]
(_ray_fit pid=12372) [200]	valid_set's binary_logloss: 0.0790555 [repeated 4x across cluster]
(_ray_fit pid=12372) [250]	valid_set's binary_logloss: 0.0765972 [repeated 4x across cluster]
(_ray_fit pid=12372) [300]	valid_set's binary_logloss: 0.0746635 [repeated 4x across cluster]
(_ray_fit pid=12372) [350]	valid_set's binary_logloss: 0.072986 [repeated 4x across cluster]
(_ray_fit pid=12373) [400]	valid_set's binary_logloss: 0.0695193 [repeated 4x across cluster]
(_ray_fit pid=12372) [450]	valid_set's binary_logloss: 0.0704401 [repeated 4x across cluster]

(_ray_fit pid=12372) 	Ran out of time, early stopping on iteration 1660. Best iteration is:
(_ray_fit pid=12372) 	[1660]	valid_set's binary_logloss: 0.0567617
(_ray_fit pid=12374) 	Fitting 10000 rounds... Hyperparameters: {'learning_rate': 0.05, 'extra_trees': True} [repeated 3x across cluster]
(_ray_fit pid=12372) Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/LightGBMXT_BAG_L1/S1F5/model.pkl
(_ray_fit pid=12374) 	Ran out of time, early stopping on iteration 1643. Best iteration is: [repeated 3x across cluster]
(_ray_fit pid=12374) 	[1643]	valid_set's binary_logloss: 0.0550345 [repeated 3x across cluster]
(_dystack pid=10056) 	0.9588	 = Validation score   (roc_auc)
(_dystack pid=10056) 	474.74s	 = Training   runtime
(_dystack pid=10056) 	68.03s	 = Validation runtime
(_dystack pid=10056) 	964.6	 = Inference  throughput (rows/s | 65616 batch size)
(_dys

(_ray_fit pid=13783) [50]	valid_set's binary_logloss: 0.0912408 [repeated 2x across cluster]


(_ray_fit pid=13785) 	Ran out of time, early stopping on iteration 78. Best iteration is:
(_ray_fit pid=13785) 	[78]	valid_set's binary_logloss: 0.0814447
(_ray_fit pid=13784) 	Fitting 10000 rounds... Hyperparameters: {'learning_rate': 0.05} [repeated 3x across cluster]
(_ray_fit pid=13785) Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/LightGBM_BAG_L1/S1F4/model.pkl
(_ray_fit pid=13790) 	Ran out of time, early stopping on iteration 75. Best iteration is: [repeated 3x across cluster]
(_ray_fit pid=13790) 	[75]	valid_set's binary_logloss: 0.0835502 [repeated 3x across cluster]
(_ray_fit pid=14173) 	Fitting 10000 rounds... Hyperparameters: {'learning_rate': 0.05}
(_ray_fit pid=13790) Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/LightGBM_BAG_L1/S1F1/model.pkl [repeated 3x across cluster]
(_ray_fit pid=14174) 	Fitting 10000 rounds... Hyperparame

(_ray_fit pid=14174) [50]	valid_set's binary_logloss: 0.0906613 [repeated 4x across cluster]


(_ray_fit pid=14173) 	Ran out of time, early stopping on iteration 79. Best iteration is:
(_ray_fit pid=14173) 	[79]	valid_set's binary_logloss: 0.0827676
(_ray_fit pid=14176) 	Fitting 10000 rounds... Hyperparameters: {'learning_rate': 0.05} [repeated 2x across cluster]
(_ray_fit pid=14174) 	Ran out of time, early stopping on iteration 80. Best iteration is:
(_ray_fit pid=14174) 	[80]	valid_set's binary_logloss: 0.0845878
(_ray_fit pid=14173) Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/LightGBM_BAG_L1/S1F6/model.pkl
(_dystack pid=10056) 	0.9122	 = Validation score   (roc_auc)
(_dystack pid=10056) 	71.22s	 = Training   runtime
(_dystack pid=10056) 	4.26s	 = Validation runtime
(_dystack pid=10056) 	15419.9	 = Inference  throughput (rows/s | 65616 batch size)
(_dystack pid=10056) Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/trainer.pkl
(_dystack

(_ray_fit pid=14727) [50]	valid_set's binary_logloss: 0.0680904 [repeated 4x across cluster]
(_ray_fit pid=14727) [100]	valid_set's binary_logloss: 0.0629021 [repeated 4x across cluster]
(_ray_fit pid=14727) [150]	valid_set's binary_logloss: 0.0589571 [repeated 4x across cluster]
(_ray_fit pid=14726) [200]	valid_set's binary_logloss: 0.0553215 [repeated 4x across cluster]
(_ray_fit pid=14726) [250]	valid_set's binary_logloss: 0.0541586 [repeated 4x across cluster]
(_ray_fit pid=14728) [300]	valid_set's binary_logloss: 0.0510983 [repeated 4x across cluster]
(_ray_fit pid=14728) [350]	valid_set's binary_logloss: 0.0508322 [repeated 4x across cluster]
(_ray_fit pid=14728) [400]	valid_set's binary_logloss: 0.0506422 [repeated 4x across cluster]
(_ray_fit pid=14728) [450]	valid_set's binary_logloss: 0.0504115 [repeated 4x across cluste

(_ray_fit pid=14726) 	Ran out of time, early stopping on iteration 725. Best iteration is:
(_ray_fit pid=14726) 	[725]	valid_set's binary_logloss: 0.0512399
(_ray_fit pid=14727) 	Fitting 10000 rounds... Hyperparameters: {'learning_rate': 0.05, 'extra_trees': True} [repeated 3x across cluster]
(_ray_fit pid=14726) Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/LightGBMXT_BAG_L2/S1F1/model.pkl
(_ray_fit pid=14727) 	Ran out of time, early stopping on iteration 714. Best iteration is: [repeated 3x across cluster]
(_ray_fit pid=14727) 	[714]	valid_set's binary_logloss: 0.0514595 [repeated 3x across cluster]
(_ray_fit pid=15475) 	Fitting 10000 rounds... Hyperparameters: {'learning_rate': 0.05, 'extra_trees': True}
(_ray_fit pid=14727) Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/LightGBMXT_BAG_L2/S1F4/model.pkl [repeated 3x across cluster]
(_ray_fit p

(_ray_fit pid=15474) [50]	valid_set's binary_logloss: 0.0671704 [repeated 4x across cluster]
(_ray_fit pid=15473) [100]	valid_set's binary_logloss: 0.0597075 [repeated 4x across cluster]
(_ray_fit pid=15472) [150]	valid_set's binary_logloss: 0.0564712 [repeated 4x across cluster]
(_ray_fit pid=15472) [200]	valid_set's binary_logloss: 0.0543127 [repeated 4x across cluster]
(_ray_fit pid=15472) [250]	valid_set's binary_logloss: 0.0535037 [repeated 4x across cluster]
(_ray_fit pid=15472) [300]	valid_set's binary_logloss: 0.0526323 [repeated 4x across cluster]
(_ray_fit pid=15472) [350]	valid_set's binary_logloss: 0.0522169 [repeated 4x across cluster]
(_ray_fit pid=15474) [400]	valid_set's binary_logloss: 0.0527908 [repeated 4x across cluster]
(_ray_fit pid=15474) [450]	valid_set's binary_logloss: 0.0521935 [repeated 4x across cluste

(_ray_fit pid=15475) 	Ran out of time, early stopping on iteration 694. Best iteration is:
(_ray_fit pid=15475) 	[694]	valid_set's binary_logloss: 0.0509034
(_ray_fit pid=15473) 	Fitting 10000 rounds... Hyperparameters: {'learning_rate': 0.05, 'extra_trees': True} [repeated 2x across cluster]
(_ray_fit pid=15472) 	Ran out of time, early stopping on iteration 705. Best iteration is:
(_ray_fit pid=15472) 	[702]	valid_set's binary_logloss: 0.0508138
(_ray_fit pid=15472) Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/LightGBMXT_BAG_L2/S1F7/model.pkl
(_dystack pid=10056) 	0.9632	 = Validation score   (roc_auc)
(_dystack pid=10056) 	233.16s	 = Training   runtime
(_dystack pid=10056) 	33.53s	 = Validation runtime
(_dystack pid=10056) 	620.1	 = Inference  throughput (rows/s | 65616 batch size)
(_dystack pid=10056) Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models

*** Summary of fit() ***
Estimated performance of each model:
                 model  score_val eval_metric  pred_time_val     fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0  WeightedEnsemble_L3   0.973575     roc_auc     302.010587  2239.423150                0.112242          11.517788            3       True          6
1      LightGBM_BAG_L2   0.973178     roc_auc     263.561942  1793.902466                4.119410         142.279027            2       True          5
2    LightGBMXT_BAG_L2   0.972793     roc_auc     297.778936  2085.626334               38.336403         434.002896            2       True          4
3    LightGBMXT_BAG_L1   0.970375     roc_auc     225.577895  1423.141122              225.577895        1423.141122            1       True          1
4  WeightedEnsemble_L2   0.970375     roc_auc     225.690872  1428.894601                0.112977           5.753479            2       True          3
5      LightGBM_BAG_L1   0

In [22]:
import pandas as pd
import numpy as np
from pathlib import Path

directory = Path("/content")
ID_COL = "TransactionID"
LABEL  = "isFraud"  # your training label

# --- load & merge test ---
test_identity    = pd.read_csv(directory/"test_identity.csv")
test_transaction = pd.read_csv(directory/"test_transaction.csv")
test_data = test_transaction.merge(test_identity, on=ID_COL, how="left")

# drop accidental index columns
test_data = test_data.loc[:, ~test_data.columns.astype(str).str.startswith("Unnamed")]

# don't pass target
if LABEL in test_data.columns:
    test_data = test_data.drop(columns=[LABEL])

# --- align to exactly the features used in training ---
feat = predictor.features()            # the columns AutoGluon expects
test_data = test_data.reindex(columns=feat)   # no fill_value; new cols will be NaN

# --- make dtypes NumPy-friendly; replace pd.NA with np.nan ---
test_data = test_data.replace({pd.NA: np.nan})

for col in test_data.columns:
    dt = test_data[col].dtype
    # pandas nullable integers (Int64/Int32/Int16) -> float (allows np.nan)
    if str(dt) in {"Int64", "Int32", "Int16", "UInt64", "UInt32", "UInt16"}:
        test_data[col] = test_data[col].astype("float32")
    # pandas nullable boolean -> float (0.0/1.0 + nan)
    elif str(dt) == "boolean":
        test_data[col] = test_data[col].astype("float32")
    # pandas string dtype -> plain object (np.nan-compatible)
    elif pd.api.types.is_string_dtype(dt):
        test_data[col] = test_data[col].astype("object")

# --- predict ---
y_predproba = predictor.predict_proba(test_data)

Loading: /content/AutoGluonModels/models/LightGBMXT_BAG_L1/model.pkl
Loading: /content/AutoGluonModels/models/LightGBM_BAG_L1/model.pkl
Loading: /content/AutoGluonModels/models/LightGBMXT_BAG_L2/model.pkl
Loading: /content/AutoGluonModels/models/LightGBM_BAG_L2/model.pkl
Loading: /content/AutoGluonModels/models/WeightedEnsemble_L3/model.pkl
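The prediction cell above casts pandas nullable dtypes to float32 so that missing values become plain np.nan. The sketch below shows that conversion on a made-up toy frame. The column names `n_items` and `is_mobile` are hypothetical and do not come from the competition data.

```python
import pandas as pd

# Toy frame using pandas nullable dtypes, each with one missing value (pd.NA).
toy = pd.DataFrame(
    {
        "n_items": pd.array([1, None, 3], dtype="Int64"),
        "is_mobile": pd.array([True, None, False], dtype="boolean"),
    }
)

converted = toy.copy()
for col in converted.columns:
    # Casting to a NumPy float dtype turns pd.NA into np.nan.
    converted[col] = converted[col].astype("float32")

print(converted.dtypes.astype(str).to_dict())
print(converted.isna().sum().to_dict())
```

After the cast both columns are ordinary float32 columns, booleans become 0.0/1.0, and the missing entries are still reported by `isna()`.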


In [23]:
y_predproba.head(5)

          0         1
0  0.999558  0.000442
1  0.999841  0.000159
2   0.99983   0.00017
3   0.99953   0.00047
4  0.999817  0.000183


For binary classification tasks, you can see which class AutoGluon’s predicted probabilities correspond to via:

In [24]:
predictor.positive_class

1

For multiclass classification tasks, you can see which classes AutoGluon’s predicted probabilities correspond to via:

In [25]:
predictor.class_labels  # classes in this list correspond to columns of predict_proba() output

[0, 1]
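Together, positive_class and class_labels identify which column of the predict_proba() output holds the fraud probability. A minimal sketch of selecting that column by hand is shown below, using a toy DataFrame in place of the real output. The first row copies the values printed earlier and the other rows are invented. The variable `positive_class` is a stand-in for `predictor.positive_class`.

```python
import pandas as pd

# Toy stand-in for predict_proba() output: one column per class label.
proba = pd.DataFrame(
    {
        0: [0.999558, 0.40, 0.10],  # probability of class 0 (not fraud)
        1: [0.000442, 0.60, 0.90],  # probability of class 1 (fraud)
    }
)

positive_class = 1  # stand-in for predictor.positive_class

# Select the column for the positive class; this is the score to submit.
fraud_score = proba[positive_class]

# Each row of class probabilities should sum to 1.
row_sums = proba.sum(axis=1)
print(fraud_score.tolist())
```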

Now, let’s get predicted probabilities for the entire test data, returning only the positive-class probability by specifying as_multiclass=False:

In [26]:
y_predproba = predictor.predict_proba(test_data, as_multiclass=False)

Loading: /content/AutoGluonModels/models/LightGBMXT_BAG_L1/model.pkl
Loading: /content/AutoGluonModels/models/LightGBM_BAG_L1/model.pkl
Loading: /content/AutoGluonModels/models/LightGBMXT_BAG_L2/model.pkl
Loading: /content/AutoGluonModels/models/LightGBM_BAG_L2/model.pkl
Loading: /content/AutoGluonModels/models/WeightedEnsemble_L3/model.pkl


Now that we have made a prediction for each row in the test dataset, we can submit these predictions to Kaggle. Most Kaggle competitions provide a sample submission file, in which you can simply overwrite the sample predictions with your own as we do below:

In [27]:
submission = pd.read_csv(directory/'sample_submission.csv')
submission['isFraud'] = y_predproba
submission.to_csv(directory/'archie_submission.csv', index=False)
submission.head()  # last expression in the cell, so the notebook displays a preview
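Overwriting the sample column directly relies on the predictions being in the same row order as sample_submission.csv. If you would rather not depend on that, one option is to join on TransactionID instead. The sketch below uses invented toy frames with the predictions deliberately shuffled. The frame names `sample` and `preds` are illustrative only.

```python
import pandas as pd

# Toy stand-in for sample_submission.csv (placeholder scores).
sample = pd.DataFrame({"TransactionID": [10, 11, 12], "isFraud": [0.5, 0.5, 0.5]})

# Toy predictions keyed by TransactionID, in a different row order.
preds = pd.DataFrame({"TransactionID": [12, 10, 11], "isFraud": [0.9, 0.1, 0.2]})

# Keep the sample's ID order and attach each prediction by key.
submission = sample[["TransactionID"]].merge(
    preds, on="TransactionID", how="left", validate="one_to_one"
)

# Every row in the submission must have received a prediction.
assert submission["isFraud"].notna().all()
print(submission)
```

In the notebook, `preds` could be built from the test TransactionID column and the positive-class probabilities before the test frame is reindexed to the training features.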

##To submit your predictions to Kaggle, you can run the following command in your terminal:

In [29]:
!kaggle competitions submit -c ieee-fraud-detection -f archie_submission.csv -m "my first submission"

100% 10.0M/10.0M [00:00<00:00, 19.7MB/s]
Successfully submitted to IEEE-CIS Fraud Detection

In [34]:
%%bash
set -euo pipefail

# 0) Install helper (optional)
pip -q install nbstripout

# 1) Clone your repo
cd /content
rm -rf repo
git clone https://github.com/ashivashankars/archana_AutoGluon_repo.git repo
cd repo

echo "== nbstripout status =="
nbstripout --status || true

echo -e "\n== .gitattributes mentions =="
grep -n "nbstripout" .gitattributes .git/info/attributes 2>/dev/null || echo "No nbstripout in attributes"

echo -e "\n== Git config entries =="
git config --list --show-origin | grep -i nbstripout || echo "No nbstripout in git config"

echo -e "\n== Hooks mentioning strip/nbconvert =="
grep -nE "nbstripout|nbconvert|strip.*output" .git/hooks/* 2>/dev/null || echo "No matching hooks"

echo -e "\n== Pre-commit config (if any) =="
test -f .pre-commit-config.yaml && grep -nE "nbstripout|nbconvert|jupyter|strip" .pre-commit-config.yaml || echo "No pre-commit config"

echo -e "\n== GitHub Actions that might strip outputs =="
test -d .github/workflows && grep -nRE "nbstripout|nbconvert|jupyter|strip.*output" .github/workflows 2>/dev/null || echo "No workflow steps found"


== nbstripout status ==
nbstripout is not installed in repository '/content/repo'

== .gitattributes mentions ==
No nbstripout in attributes

== Git config entries ==
No nbstripout in git config

== Hooks mentioning strip/nbconvert ==
No matching hooks

== Pre-commit config (if any) ==
No pre-commit config

== GitHub Actions that might strip outputs ==
No workflow steps found


Cloning into 'repo'...
