# Intro to creating a model with scikit-learn or XGBoost

All Rights Reserved © <a href="http://www.louisdorard.com">Louis Dorard</a>

-----

In this notebook we show how to create a model with scikit-learn or with XGBoost, on Kaggle's _Give Me Some Credit_ challenge.

## Load prepared datasets

In [None]:
from pandas import read_csv
train_prepared = read_csv('train_prepared_V1.csv', index_col=0)
val_prepared = read_csv('val_prepared_V1.csv', index_col=0)

Most ML libraries in Python expect the data as separate variables (arrays) for inputs and outputs, e.g. `X_train` and `y_train` for the training data.

Let's start with outputs:

In [None]:
target_column = 'SeriousDlqin2yrs'
y_train = train_prepared[target_column].values
print(y_train)

Inputs:

In [None]:
X_train = train_prepared.drop(target_column, axis=1).values
print(X_train)

Likewise for the validation data:

In [None]:
X_val = val_prepared.drop(target_column, axis=1).values
y_val = val_prepared[target_column].values
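To make the input/output split concrete, here is a minimal sketch on a tiny made-up DataFrame. The toy values and the `split_inputs_outputs` helper are hypothetical and only serve as illustration; they are not part of the prepared datasets. The point is that the inputs end up as a 2-D array (rows × features) and the outputs as a 1-D array with one entry per row.

```python
import pandas as pd

def split_inputs_outputs(df, target_column):
    """Hypothetical helper: return (X, y) NumPy arrays from a DataFrame."""
    X = df.drop(target_column, axis=1).values  # every column except the target
    y = df[target_column].values               # the target column only
    return X, y

# Toy data, invented for illustration only
toy = pd.DataFrame({
    'SeriousDlqin2yrs': [0, 1, 0],
    'age': [45, 30, 62],
    'MonthlyIncome': [5000.0, 2500.0, 7200.0],
})

X_toy, y_toy = split_inputs_outputs(toy, 'SeriousDlqin2yrs')
print(X_toy.shape)  # 3 rows, 2 input columns
print(y_toy.shape)  # 3 outputs, one per row
```

The same helper could be applied to `train_prepared` and `val_prepared`, so that both sets are guaranteed to be split the same way.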

## Create model from train set

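Initialize the model by specifying which learning technique to use. Run only one of the two cells below: both assign to `model`, so if you run both, the XGBoost model replaces the Random Forest one.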

In [None]:
from sklearn.ensemble import RandomForestClassifier
# RandomForestClassifier implements the Random Forest learning technique for classification
model = RandomForestClassifier()

In [None]:
from xgboost import XGBClassifier
model = XGBClassifier()

This "model" is empty, since it hasn't seen any data yet. Train the model:

In [None]:
model.fit(X_train, y_train)
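Random Forest training involves randomness, so two runs on the same data can give slightly different models. A minimal sketch of how to make training repeatable, assuming synthetic data from `make_classification` as a stand-in for the real training set (the `demo_` names and the parameter values are illustrative choices, not settings from this notebook):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for (X_train, y_train); not the Give Me Some Credit data
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# Fixing random_state makes the trained forest reproducible across runs
demo_model = RandomForestClassifier(n_estimators=50, random_state=0)
demo_model.fit(X_demo, y_demo)

# After fitting, the model exposes what it learned from the data
print(demo_model.classes_)          # class labels seen during training
print(len(demo_model.estimators_))  # number of trees in the forest
```

`XGBClassifier` follows the same `fit` convention, so the training cell above works unchanged with either model.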

## Apply model to val set

Get the predicted class probabilities for each row of the validation set:

In [None]:
y_val_proba = model.predict_proba(X_val)
print(y_val_proba)
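`predict_proba` returns one row per input and one column per class, with columns ordered as in `model.classes_`. The sketch below shows how the probability of the positive class could be extracted and scored with the area under the ROC curve, a common metric for binary classification. It runs on synthetic data rather than the prepared datasets, and the `demo_` and `proba_` names are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the train and validation sets
X_demo, y_demo = make_classification(n_samples=300, n_features=5, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

demo_model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

proba = demo_model.predict_proba(X_va)  # shape: (n_rows, n_classes)

# Look up the column for class 1 instead of hard-coding its position
positive_index = list(demo_model.classes_).index(1)
proba_positive = proba[:, positive_index]

# AUC compares the predicted scores against the true labels
auc = roc_auc_score(y_va, proba_positive)
print(proba.shape)
print(auc)
```

With the notebook's own variables, the equivalent call would be `roc_auc_score(y_val, y_val_proba[:, 1])`, assuming class `1` is the second entry of `model.classes_`.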