## <span style="color:#ff5f27">👨🏻‍🏫 Train Ranking Model </span>

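In this notebook, we train two ranking models with gradient boosted trees (CatBoost): one that uses weather features and one that does not. Both models are then saved to the Hopsworks Model Registry.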

In [3]:
import pandas as pd
from catboost import CatBoostClassifier, Pool
from sklearn.metrics import classification_report, precision_recall_fscore_support
import joblib

In [1]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store()

2025-06-24 12:00:50,715 INFO: Initializing external client
2025-06-24 12:00:50,721 INFO: Base URL: https://c.app.hopsworks.ai:443
2025-06-24 12:00:52,311 INFO: Python Engine initialized.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/1220788


### Get feature groups

In [2]:

users_fg = fs.get_feature_group(
    name="users",
    version=1
)

events_fg = fs.get_feature_group(
    name="events",
    version=1
)

interactions_fg = fs.get_feature_group(
    name="interactions",
    version=1
)

weather_rank_fg = fs.get_feature_group(
    name="weather_ranking",
    version=1
)

no_weather_rank_fg = fs.get_feature_group(
    name="no_weather_ranking",
    version=1
)

## <span style="color:#ff5f27">⚙️ Feature View Creation </span>

In [38]:
# Select features
selected_features_customers = users_fg.select_all()

fs.get_or_create_feature_view( 
    name='users',
    query=selected_features_customers,
    version=1,
)

<hsfs.feature_view.FeatureView at 0x7fd3a81830d0>

In [39]:
# Select features
selected_features_articles = events_fg.select_all()

fs.get_or_create_feature_view(
    name='events',
    query=selected_features_articles,
    version=1,
)

<hsfs.feature_view.FeatureView at 0x7fd3a876ab30>

In [3]:
# Select features
selected_features_interactions = interactions_fg.select_all()

fs.get_or_create_feature_view(
    name='interactions',
    query=selected_features_interactions,
    version=1,
)

Feature view created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/1220788/fs/1208418/fv/interactions/version/1


<hsfs.feature_view.FeatureView at 0x738b8b42bbe0>

In [5]:
# Selected features for the weather-based and no-weather models
NO_WEATHER_SELECTED_FEATURES = [
    'interaction_distance_to_event', 'event_type', 'event_city', 'title',
    'attendance_rate', 'event_indoor_capability', 'user_city', 'age',
    'user_interests', 'interaction_label',
]

WEATHER_SELECTED_FEATURES = [
    'interaction_distance_to_event', 'event_type', 'event_city', 'title',
    'weather_condition', 'temperature', 'precipitation', 'attendance_rate',
    'event_indoor_capability', 'user_city', 'indoor_outdoor_preference',
    'age', 'user_interests', 'user_weather_condition', 'user_temperature',
    'user_precipitation', 'interaction_label',
]


In [40]:
# Select weather features
features_weather_ranking = weather_rank_fg.select(WEATHER_SELECTED_FEATURES)
# Select no weather features
features_no_weather_ranking = no_weather_rank_fg.select(NO_WEATHER_SELECTED_FEATURES)

In [41]:
# Create feature view for weather ranking
feature_view_ranking_weather = fs.get_or_create_feature_view(
    name='weather_ranking_2',
    query=features_weather_ranking,
    labels=['interaction_label'],
    version=1,
)
# Create feature view for no weather ranking
feature_view_ranking_no_weather = fs.get_or_create_feature_view(
    name='no_weather_ranking_2',
    query=features_no_weather_ranking,
    labels=["interaction_label"],
    version=1,
)

Feature view created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/1220788/fs/1208418/fv/weather_ranking_2/version/1
Feature view created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/1220788/fs/1208418/fv/no_weather_ranking_2/version/1


---


## <span style="color:#ff5f27">🗄️ Training Data Loading </span>

In [6]:
# Retrieve the weather ranking feature view
feature_view_ranking_weather = fs.get_feature_view(name='weather_ranking_2', version=1)


In [7]:
# Retrieve the no-weather ranking feature view
feature_view_ranking_no_weather = fs.get_feature_view(name='no_weather_ranking_2', version=1)


In [8]:
# Get training and validation data (10% held out) from the weather ranking feature view
weather_X_train, weather_X_val, weather_y_train, weather_y_val = (
    feature_view_ranking_weather.train_test_split(
        test_size=0.1,
        description='Weather ranking training dataset',
    )
)


Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (22.92s) 



In [9]:

# Get training and validation data (10% held out) from the no-weather ranking feature view
no_weather_X_train, no_weather_X_val, no_weather_y_train, no_weather_y_val = (
    feature_view_ranking_no_weather.train_test_split(
        test_size=0.1,
        description='No-weather ranking training dataset',
    )
)


Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (15.61s) 



In [18]:
weather_X_train.columns


Index(['interaction_distance_to_event', 'event_type', 'event_city',
       'weather_condition', 'temperature', 'precipitation', 'attendance_rate',
       'event_indoor_capability', 'user_city', 'indoor_outdoor_preference',
       'age', 'user_interests', 'user_weather_condition', 'user_temperature',
       'user_precipitation'],
      dtype='object')

In [10]:
from catboost import CatBoostClassifier, Pool
from sklearn.metrics import classification_report, precision_recall_fscore_support
import numpy as np


def train_catboost(
    train_df, val_df, train_y, val_y
):
    # Identify categorical features
    cat_features = train_df.select_dtypes(include=["object", "bool"]).columns.tolist()

    # Create CatBoost Pools
    train_pool = Pool(train_df, train_y, cat_features=cat_features)
    val_pool = Pool(val_df, val_y, cat_features=cat_features)

    # Weight for the positive class: ratio of negative to positive training labels
    pos_weight = len(train_y[train_y == 0]) / len(train_y[train_y == 1])

    # Train the model
    model = CatBoostClassifier(
        learning_rate=0.01,
        iterations=100,
        depth=5,
        early_stopping_rounds=5,
        use_best_model=True,
        scale_pos_weight=pos_weight,  
        verbose=False
    )

    model.fit(train_pool, eval_set=val_pool)

    # Evaluation
    preds = model.predict(val_pool)
    precision, recall, fscore, _ = precision_recall_fscore_support(val_y, preds, average="binary")
    print("\nClassification Report:")
    print(classification_report(val_y, preds))

    metrics = {
        "precision": precision,
        "recall": recall,
        "fscore": fscore,
    }
    
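    # Probability of the positive class for each validation row.
    # Note: despite the label, this prints the unique predicted probabilities
    # and how often each occurs, not the distribution of predicted classes.
    preds_proba = model.predict_proba(val_pool)[:, 1]
    print("Predicted Class Distribution:", np.unique(preds_proba, return_counts=True))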

    return model, metrics, val_pool
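
The `early_stopping_rounds=5` and `use_best_model=True` settings mean that training halts once the validation metric has not improved for five iterations, and the model is cut back to its best iteration. The sketch below illustrates that idea in plain Python. It is a simplified illustration, not CatBoost's internal implementation; the function name and the loss values are hypothetical.

```python
def best_iteration_with_early_stopping(val_losses, patience=5):
    """Return (best_index, stopped_at) for per-iteration validation losses.

    Training stops at the first iteration that is `patience` steps past the
    best one seen so far; otherwise it runs to the end of the sequence.
    """
    best_index, best_loss = 0, float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_index, best_loss = i, loss
        elif i - best_index >= patience:
            return best_index, i
    return best_index, len(val_losses) - 1


# Hypothetical validation losses: improvement stops after iteration 3.
losses = [0.69, 0.66, 0.64, 0.63, 0.635, 0.64, 0.65, 0.66, 0.67, 0.68]
best, stopped = best_iteration_with_early_stopping(losses, patience=5)
print(f"best iteration: {best}, stopped at: {stopped}")
```

With only 100 iterations and a learning rate of 0.01, a patience of 5 can stop training very early, which is worth keeping in mind when reading the results.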

In [9]:
weather_y_val.value_counts()

interaction_label
1                    35448
0                    14292
Name: count, dtype: int64
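
The counts above are for the validation labels. Assuming the training split has a similar class balance, the `pos_weight` expression in `train_catboost` can be checked by hand. A minimal sketch (the helper name `negative_to_positive_ratio` is hypothetical):

```python
def negative_to_positive_ratio(labels):
    """Return count(label == 0) / count(label == 1)."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive labels; the ratio is undefined")
    return negatives / positives


# Validation label counts from the output above, used only as an illustration.
labels = [1] * 35448 + [0] * 14292
ratio = negative_to_positive_ratio(labels)
print(f"scale_pos_weight ~ {ratio:.3f}")
```

Because positives outnumber negatives here, the ratio is below 1, so the positive class is down-weighted rather than up-weighted.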

In [11]:
# Train the weather ranking model
weather_model, weather_metrics, weather_val_pool = train_catboost(
    train_df=weather_X_train,
    val_df=weather_X_val,
    train_y=weather_y_train,
    val_y=weather_y_val
)

# Save the model with joblib
joblib.dump(weather_model, '/home/nkama/masters_thesis_project/thesis/models/weather_ranking_model.pkl')
print("\nWeather model saved successfully!")




Classification Report:



              precision    recall  f1-score   support

           0       0.00      0.00      0.00     14292
           1       0.71      1.00      0.83     35448

    accuracy                           0.71     49740
   macro avg       0.36      0.50      0.42     49740
weighted avg       0.51      0.71      0.59     49740

Predicted Class Distribution: (array([0.5934457 , 0.59348359, 0.59351597, ..., 0.68195727, 0.68196254,
       0.68212717]), array([1, 1, 1, ..., 1, 1, 1]))
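
The report is consistent with a model that predicts class 1 for every validation row. `np.unique` returns sorted values, so the printed probabilities range from about 0.593 to 0.682, all above the default 0.5 decision threshold. The sketch below reproduces the class-1 row and the accuracy from the support counts alone, under that all-positive assumption (the helper `binary_metrics` is hypothetical):

```python
def binary_metrics(tp, fp, fn):
    """Precision, recall and F1 for the positive class from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Support values from the report above.
n_pos, n_neg = 35448, 14292

# If every row is predicted as class 1, all positives are true positives
# and all negatives are false positives.
precision, recall, f1 = binary_metrics(tp=n_pos, fp=n_neg, fn=0)
accuracy = n_pos / (n_pos + n_neg)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")
# prints: precision=0.71 recall=1.00 f1=0.83 accuracy=0.71
```

In other words, the 0.71 accuracy equals the share of positive labels in the validation set, so the model is not yet separating the two classes.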


In [15]:
feat_to_score = {
    feature: score 
    for feature, score 
    in zip(
        weather_X_train.columns, 
        weather_model.feature_importances_,
    )
}

feat_to_score = dict(
    sorted(
        feat_to_score.items(),
        key=lambda item: item[1],
        reverse=True,
    )
)
feat_to_score

{'interaction_distance_to_event': 95.93204998677713,
 'event_indoor_capability': 0.4346953089621326,
 'title': 0.4240075671567045,
 'user_temperature': 0.40476148658368594,
 'precipitation': 0.36677061158728846,
 'weather_condition': 0.3435573834861786,
 'age': 0.33834087598832935,
 'user_weather_condition': 0.30151470123997165,
 'attendance_rate': 0.30036077578444514,
 'event_city': 0.2893172652883507,
 'temperature': 0.25110583733846803,
 'user_interests': 0.22352187552828293,
 'user_city': 0.14621329280679,
 'indoor_outdoor_preference': 0.14097193571307504,
 'event_type': 0.0540516872020414,
 'user_precipitation': 0.0487594085571333}
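
The importances above sum to roughly 100, so each value can be read as a percentage share. The sketch below, using the values from the output rounded to four decimals, computes the running share of the top features (the helper name is hypothetical):

```python
def top_k_with_cumulative_share(importances, k=3):
    """Return the k largest importances with their running share of the total."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0.0
    for name, score in ranked[:k]:
        running += score
        rows.append((name, score, running / total))
    return rows


# Values copied from the output above, rounded to four decimals.
scores = {
    "interaction_distance_to_event": 95.9320,
    "event_indoor_capability": 0.4347,
    "title": 0.4240,
    "user_temperature": 0.4048,
    "precipitation": 0.3668,
    "weather_condition": 0.3436,
    "age": 0.3383,
    "user_weather_condition": 0.3015,
    "attendance_rate": 0.3004,
    "event_city": 0.2893,
    "temperature": 0.2511,
    "user_interests": 0.2235,
    "user_city": 0.1462,
    "indoor_outdoor_preference": 0.1410,
    "event_type": 0.0541,
    "user_precipitation": 0.0488,
}

rows = top_k_with_cumulative_share(scores, k=3)
for name, score, cumulative in rows:
    print(f"{name:32s} {score:8.4f}  cumulative share: {cumulative:.1%}")
```

A single feature, `interaction_distance_to_event`, accounts for about 96% of the total, and all weather-related features together contribute only a small fraction.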

In [13]:

# Train the no-weather ranking model
no_weather_model, no_weather_metrics, no_weather_val_pool = train_catboost(
    train_df=no_weather_X_train,
    val_df=no_weather_X_val,
    train_y=no_weather_y_train,
    val_y=no_weather_y_val
)

# Save the model with joblib
joblib.dump(no_weather_model, '/home/nkama/masters_thesis_project/thesis/models/no_weather_ranking_model.pkl')
print("\nNo-weather model saved successfully!")


Classification Report:



              precision    recall  f1-score   support

           0       0.00      0.00      0.00     14312
           1       0.71      1.00      0.83     35428

    accuracy                           0.71     49740
   macro avg       0.36      0.50      0.42     49740
weighted avg       0.51      0.71      0.59     49740

Predicted Class Distribution: (array([0.59322731, 0.59340502, 0.59341716, ..., 0.68802799, 0.68803421,
       0.68805272]), array([2, 1, 1, ..., 1, 1, 1]))


In [16]:

feat_to_score = {
    feature: score 
    for feature, score 
    in zip(
        no_weather_X_train.columns, 
        no_weather_model.feature_importances_,
    )
}

feat_to_score = dict(
    sorted(
        feat_to_score.items(),
        key=lambda item: item[1],
        reverse=True,
    )
)
feat_to_score

{'interaction_distance_to_event': 98.56858103820176,
 'title': 0.32138396014526593,
 'event_city': 0.28768716105271774,
 'event_type': 0.2739099062313311,
 'event_indoor_capability': 0.1959338405594333,
 'attendance_rate': 0.12532672542750817,
 'age': 0.12158177157819017,
 'user_city': 0.10559559680378885,
 'user_interests': 0.0}

## Save models to Hopsworks


In [28]:
# Connect to Hopsworks Model Registry
mr = project.get_model_registry()

In [29]:

from hsml.schema import Schema
from hsml.model_schema import ModelSchema

# Create model schema for weather ranking model
input_example = weather_X_train.sample().to_dict("records")
input_schema = Schema(weather_X_train)
output_schema = Schema(weather_y_train)
model_schema = ModelSchema(input_schema, output_schema)

weather_ranking_model = mr.python.create_model(
    name="weather_ranking_model", 
    metrics=weather_metrics,
    model_schema=model_schema,
    input_example=input_example,
    description="Ranking model that scores item candidates",
)
weather_ranking_model.save("/home/nkama/masters_thesis_project/thesis/models/weather_ranking_model.pkl")

Model created, explore it at https://c.app.hopsworks.ai:443/p/1220788/models/weather_ranking_model/1


Model(name: 'weather_ranking_model', version: 1)

In [30]:
# Create model schema for no weather ranking model  
input_example = no_weather_X_train.sample().to_dict("records")
input_schema = Schema(no_weather_X_train)
output_schema = Schema(no_weather_y_train)
model_schema = ModelSchema(input_schema, output_schema)

no_weather_ranking_model = mr.python.create_model(
    name="no_weather_ranking_model", 
    metrics=no_weather_metrics,
    model_schema=model_schema,
    input_example=input_example,
    description="Ranking model that scores item candidates",
)
no_weather_ranking_model.save("/home/nkama/masters_thesis_project/thesis/models/no_weather_ranking_model.pkl")

Model created, explore it at https://c.app.hopsworks.ai:443/p/1220788/models/no_weather_ranking_model/1


Model(name: 'no_weather_ranking_model', version: 1)
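
Both registered models are described as scoring item candidates. A minimal sketch of how such scores could be turned into a ranking is shown below. The event IDs and scores are hypothetical stand-ins for the positive-class probabilities that `predict_proba(...)[:, 1]` would return, and `rank_candidates` is an illustrative helper rather than part of this project.

```python
def rank_candidates(candidate_ids, scores, top_k=None):
    """Sort candidates by score, highest first, and optionally keep the top k."""
    if len(candidate_ids) != len(scores):
        raise ValueError("candidate_ids and scores must have the same length")
    ranked = sorted(zip(candidate_ids, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Hypothetical candidates and scores for one user.
event_ids = ["event_a", "event_b", "event_c", "event_d"]
scores = [0.61, 0.68, 0.59, 0.66]

top = rank_candidates(event_ids, scores, top_k=2)
print(top)
```

Ranking depends only on the relative order of the scores, so it can still be used even when every score falls on the same side of the 0.5 classification threshold.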