# Modeling and Fine-Tuning

> Owner: Daniel Soukup - Created: 2025.11.01

In this notebook, we load the processed learn and test datasets, fit a baseline XGBoost classifier, and save its predictions.

## Data loading

In [0]:
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd
import numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
processed_learn = dataiku.Dataset("processed_learn")
processed_learn_df = processed_learn.get_dataframe()

processed_test = dataiku.Dataset("processed_test")
processed_test_df = processed_test.get_dataframe()

processed_learn_df.shape, processed_test_df.shape

In [0]:
processed_learn_df.head()

In [0]:
TARGET = 'income'

In [0]:
X_train, y_train = processed_learn_df.drop(columns=TARGET), processed_learn_df[TARGET]
X_test, y_test = processed_test_df.drop(columns=TARGET), processed_test_df[TARGET]
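Since the same fitted model is later applied to both splits, it can be worth confirming that the two feature frames expose the same columns in the same order. The following is a minimal sketch of such a check; `compare_columns` is a hypothetical helper (not part of Dataiku or pandas), and the column names in the example are made up for illustration.

```python
def compare_columns(train_columns, test_columns):
    """Return (missing_in_test, extra_in_test, same_order) for two column lists."""
    train_columns = list(train_columns)
    test_columns = list(test_columns)
    train_set, test_set = set(train_columns), set(test_columns)

    # Columns the model was trained on but that the test frame lacks.
    missing_in_test = [c for c in train_columns if c not in test_set]
    # Columns present only in the test frame.
    extra_in_test = [c for c in test_columns if c not in train_set]
    # True only if both lists match exactly, including order.
    same_order = train_columns == test_columns
    return missing_in_test, extra_in_test, same_order


# Illustrative column names only (assumed, not taken from the dataset).
example = compare_columns(
    ["age", "education", "sex"],
    ["age", "sex", "capital_gain"],
)
print(example)
```

In this notebook, the helper could be called as `compare_columns(X_train.columns, X_test.columns)`; two empty lists and `True` would indicate that the splits line up.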

## Fit Baseline

In [0]:
from xgboost import XGBClassifier

In [0]:
model = XGBClassifier(n_estimators=50, max_depth=2, objective='binary:logistic')

model.fit(X_train, y_train)

## Predict

We save both the predicted class and the predicted probability:

In [0]:
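# predict_proba returns one column per class, ordered as in model.classes_.
# Column 1 is the probability of the positive class (column 0 is the negative class).
predictions_learn_df = pd.DataFrame(
    {
        TARGET: y_train,
        'pred': model.predict(X_train),
        'pred_proba': model.predict_proba(X_train)[:, 1]
    }
)

predictions_test_df = pd.DataFrame(
    {
        TARGET: y_test,
        'pred': model.predict(X_test),
        'pred_proba': model.predict_proba(X_test)[:, 1]
    }
)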
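`predict_proba` returns one column per class, ordered as in `model.classes_`, so for a 0/1 target the second column holds the probability of the positive class. The pure-Python sketch below uses made-up numbers to show how the two columns relate to each other and to a thresholded class label. The sample rows and `THRESHOLD` are assumptions for illustration, not values from this notebook.

```python
# Hypothetical predict_proba output: each row is [P(class 0), P(class 1)].
proba_rows = [
    [0.9, 0.1],
    [0.35, 0.65],
    [0.5, 0.5],
]

THRESHOLD = 0.5  # assumed decision threshold, for illustration only

# Each row is a probability distribution over the two classes.
for row in proba_rows:
    assert abs(sum(row) - 1.0) < 1e-9

# The positive-class probability is the second entry of each row.
positive_proba = [row[1] for row in proba_rows]

# A class label follows by thresholding the positive-class probability.
predicted_class = [int(p > THRESHOLD) for p in positive_proba]

print(positive_proba)   # [0.1, 0.65, 0.5]
print(predicted_class)  # [0, 1, 0]
```

Keeping the probability alongside the class label means a different threshold can be applied later without refitting the model.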

## Save predictions

In [0]:
# Write recipe outputs
predictions_learn = dataiku.Dataset("predictions_learn")
predictions_learn.write_with_schema(predictions_learn_df)

predictions_test = dataiku.Dataset("predictions_test")
predictions_test.write_with_schema(predictions_test_df)