<h3>Data loading and exploration</h3>

In [1]:
import pandas as pd

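# Load the dataset; the 'Class' column holds the binary target
df = pd.read_csv("creditcard.csv")

print(df.head())
df.info()  # info() prints its own summary and returns None, so no print() is needed
print(df['Class'].value_counts())  # class counts show how imbalanced the target is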


   Time        V1        V2        V3        V4        V5        V6        V7  \
0   0.0 -1.359807 -0.072781  2.536347  1.378155 -0.338321  0.462388  0.239599   
1   0.0  1.191857  0.266151  0.166480  0.448154  0.060018 -0.082361 -0.078803   
2   1.0 -1.358354 -1.340163  1.773209  0.379780 -0.503198  1.800499  0.791461   
3   1.0 -0.966272 -0.185226  1.792993 -0.863291 -0.010309  1.247203  0.237609   
4   2.0 -1.158233  0.877737  1.548718  0.403034 -0.407193  0.095921  0.592941   

         V8        V9  ...       V21       V22       V23       V24       V25  \
0  0.098698  0.363787  ... -0.018307  0.277838 -0.110474  0.066928  0.128539   
1  0.085102 -0.255425  ... -0.225775 -0.638672  0.101288 -0.339846  0.167170   
2  0.247676 -1.514654  ...  0.247998  0.771679  0.909412 -0.689281 -0.327642   
3  0.377436 -1.387024  ... -0.108300  0.005274 -0.190321 -1.175575  0.647376   
4 -0.270533  0.817739  ... -0.009431  0.798278 -0.137458  0.141267 -0.206010   

        V26       V27       V28 

<h3>Data preprocessing</h3>

In [2]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

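# Separate the features from the target column
X = df.drop("Class", axis=1)
y = df["Class"]

# Standardize 'Amount' and 'Time' only.
# Note: the scaler is fit on the full dataset before the split,
# so test-set statistics influence the scaling.
scaler = StandardScaler()
X[['Amount', 'Time']] = scaler.fit_transform(X[['Amount', 'Time']])

# Hold out 20% for testing; stratify=y keeps the class ratio equal in both splits
X_train, X_test, y_train, y_test = train_test_split(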
    X, y,
    test_size=0.2,
    random_state=42,
    stratify=y
)
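
The cell above fits the scaler on every row before splitting, so the test rows contribute to the mean and standard deviation. A minimal sketch of a leak-free variant follows, assuming it is acceptable to split first and fit the scaler on the training rows only. The synthetic frame `demo`, its values, and the `Xd_*`/`yd_*` names are hypothetical stand-ins for `creditcard.csv`, not real data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (hypothetical values, illustration only)
rng = np.random.default_rng(42)
n = 1000
demo = pd.DataFrame({
    "Time": rng.uniform(0, 1000, n),
    "V1": rng.normal(size=n),
    "Amount": rng.exponential(50, n),
    "Class": (rng.random(n) < 0.05).astype(int),
})

X_demo = demo.drop("Class", axis=1)
y_demo = demo["Class"]

# Split first so that the test rows never influence the scaler
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
Xd_train = Xd_train.copy()
Xd_test = Xd_test.copy()

cols = ["Amount", "Time"]
demo_scaler = StandardScaler()
Xd_train[cols] = demo_scaler.fit_transform(Xd_train[cols])  # fit on train only
Xd_test[cols] = demo_scaler.transform(Xd_test[cols])        # reuse train statistics
```

With this ordering the training columns have zero mean and unit variance, while the test columns are only approximately standardized, which is the expected behavior.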


<h3>Model evaluation function</h3>

In [3]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def evaluar_modelo(nombre, modelo, X_test, y_test):
    """Print accuracy, precision, recall, and F1 of a fitted model on the test set."""
    y_pred = modelo.predict(X_test)
    print(f"\nModelo: {nombre}")
    print("Accuracy :", accuracy_score(y_test, y_pred))
    print("Precision:", precision_score(y_test, y_pred))
    print("Recall   :", recall_score(y_test, y_pred))
    print("F1-score :", f1_score(y_test, y_pred))
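
Accuracy alone says little on data this imbalanced. As an illustration, the sketch below scores a trivial classifier that always predicts class 0, using the training-set class counts reported in the LightGBM log further down (394 positives, 227451 negatives). The helper `binary_metrics` is a hypothetical pure-Python stand-in for the scikit-learn metrics used in `evaluar_modelo`; returning 0 when precision or recall is undefined is a choice made here.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    total = tp + fp + fn + tn
    # Undefined ratios (no predicted or no actual positives) are reported as 0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Class counts taken from the LightGBM training log
n_pos, n_neg = 394, 227451
y_true = [1] * n_pos + [0] * n_neg
y_pred = [0] * len(y_true)  # always predict the majority class

baseline = binary_metrics(y_true, y_pred)
for name, value in baseline.items():
    print(f"{name:<9}: {value:.4f}")
```

The baseline exceeds 99.8% accuracy while finding no positive at all, which is why precision, recall, and F1 are the figures to compare across models.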


<h3>Model training</h3>

<h4>Decision Tree</h4>

In [4]:
from sklearn.tree import DecisionTreeClassifier

# class_weight='balanced' weights each class inversely to its frequency
dt = DecisionTreeClassifier(class_weight='balanced', random_state=42)
dt.fit(X_train, y_train)

evaluar_modelo("Decision Tree", dt, X_test, y_test)


Modelo: Decision Tree
Accuracy : 0.9989291106351603
Precision: 0.6761904761904762
Recall   : 0.7244897959183674
F1-score : 0.6995073891625616
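
`class_weight='balanced'` weights each class by `n_samples / (n_classes * class_count)`, the heuristic documented by scikit-learn. A small sketch of that arithmetic follows, assuming the training-set counts shown in the LightGBM log below (394 positives, 227451 negatives). `balanced_weights` is a hypothetical helper, not a library function.

```python
def balanced_weights(counts):
    """Return n_samples / (n_classes * count) for each class label."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}

# Training-set class counts taken from the LightGBM log
train_counts = {0: 227451, 1: 394}
weights = balanced_weights(train_counts)
ratio = weights[1] / weights[0]

print(weights)
print(f"positive/negative weight ratio: {ratio:.1f}")
```

The ratio of the two weights equals negatives / positives, which is the same quantity passed as `scale_pos_weight` in the XGBoost cell.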


<h4>Support Vector Machine (SVM)</h4>

In [5]:
from sklearn.svm import SVC

svm = SVC(kernel='rbf', class_weight='balanced')
svm.fit(X_train, y_train)

evaluar_modelo("SVM", svm, X_test, y_test)



Modelo: SVM
Accuracy : 0.9965239984551104
Precision: 0.3046875
Recall   : 0.7959183673469388
F1-score : 0.4406779661016949
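
The SVM reaches high recall but low precision at its default decision threshold of 0. One way to explore that trade-off is to score a validation split with `decision_function` and sweep thresholds with `precision_recall_curve`. The sketch below does this on a small synthetic dataset from `make_classification`. This is an assumption for illustration: it is not the credit card data, the `*_demo`, `Xs_*`, and `ys_*` names are hypothetical, and the numbers it prints say nothing about the results above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic imbalanced data (about 5% positives), illustration only
Xs, ys = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=42
)
Xs_train, Xs_val, ys_train, ys_val = train_test_split(
    Xs, ys, test_size=0.25, random_state=42, stratify=ys
)

svm_demo = SVC(kernel="rbf", class_weight="balanced")
svm_demo.fit(Xs_train, ys_train)

# Signed distance to the decision boundary; predict() thresholds this at 0
scores = svm_demo.decision_function(Xs_val)
precision, recall, thresholds = precision_recall_curve(ys_val, scores)

# precision and recall have one more entry than thresholds; drop the last point
denom = np.maximum(precision[:-1] + recall[:-1], 1e-12)
f1 = 2 * precision[:-1] * recall[:-1] / denom
best = int(np.argmax(f1))

print(f"best threshold: {thresholds[best]:.3f}")
print(f"precision={precision[best]:.3f} recall={recall[best]:.3f} f1={f1[best]:.3f}")
```

The threshold is picked on a validation split here so that the test set stays untouched for the final evaluation.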


<h4>Random Forest</h4>

In [6]:
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=100,
    class_weight='balanced',
    random_state=42
)
rf.fit(X_train, y_train)

evaluar_modelo("Random Forest", rf, X_test, y_test)



Modelo: Random Forest
Accuracy : 0.9995084442259752
Precision: 0.9605263157894737
Recall   : 0.7448979591836735
F1-score : 0.8390804597701149


<h4>LightGBM</h4>

In [7]:
from lightgbm import LGBMClassifier

lgbm = LGBMClassifier(
    n_estimators=100,
    class_weight='balanced',
    random_state=42
)
lgbm.fit(X_train, y_train)

evaluar_modelo("LightGBM", lgbm, X_test, y_test)


[LightGBM] [Info] Number of positive: 394, number of negative: 227451
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.021034 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 7650
[LightGBM] [Info] Number of data points in the train set: 227845, number of used features: 30
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
[LightGBM] [Info] Start training from score 0.000000

Modelo: LightGBM
Accuracy : 0.999403110845827
Precision: 0.826530612244898
Recall   : 0.826530612244898
F1-score : 0.826530612244898


<h4>XGBoost</h4>

In [8]:
from xgboost import XGBClassifier

xgb = XGBClassifier(
    n_estimators=100,
    # Weight positives by the negative-to-positive ratio of the training set
    scale_pos_weight=len(y_train[y_train==0]) / len(y_train[y_train==1]),
    eval_metric='logloss',
    random_state=42
)
xgb.fit(X_train, y_train)

evaluar_modelo("XGBoost", xgb, X_test, y_test)



Modelo: XGBoost
Accuracy : 0.9995259997893332
Precision: 0.8817204301075269
Recall   : 0.8367346938775511
F1-score : 0.8586387434554974
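
To compare the five models side by side, the printed test-set metrics can be collected and ranked. A minimal sketch follows, with the values copied from the outputs above and rounded to four decimals. The `results` dictionary is a hypothetical structure, not something the notebook builds.

```python
# Test-set metrics copied from the printed outputs (rounded to 4 decimals)
results = {
    "Decision Tree": {"precision": 0.6762, "recall": 0.7245, "f1": 0.6995},
    "SVM":           {"precision": 0.3047, "recall": 0.7959, "f1": 0.4407},
    "Random Forest": {"precision": 0.9605, "recall": 0.7449, "f1": 0.8391},
    "LightGBM":      {"precision": 0.8265, "recall": 0.8265, "f1": 0.8265},
    "XGBoost":       {"precision": 0.8817, "recall": 0.8367, "f1": 0.8586},
}

# Rank the models by F1, highest first
ranked = sorted(results.items(), key=lambda item: item[1]["f1"], reverse=True)

print(f"{'Model':<14} {'Precision':>9} {'Recall':>7} {'F1':>7}")
for name, m in ranked:
    print(f"{name:<14} {m['precision']:>9.4f} {m['recall']:>7.4f} {m['f1']:>7.4f}")

best_f1 = ranked[0][0]
best_precision = max(results, key=lambda k: results[k]["precision"])
best_recall = max(results, key=lambda k: results[k]["recall"])
```

XGBoost has the highest F1 and recall, while Random Forest has the highest precision at the cost of lower recall.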
