# <span style="color:#ff5f27"> 👨🏻‍🏫 Custom Transformation Functions Registration</span>

In this notebook you will see how to **write custom transformation functions for a feature view**, how to **register and retrieve an XGBoost model** using the Hopsworks Model Registry, and how to **use that model for batch and serving feature vector prediction**.

## <span style="color:#ff5f27">🗄️ Table of Contents</span>
- [📝 Imports](#1)
- [💽 Loading Data](#2)
- [🔮 Connecting to Hopsworks Feature Store](#3)
- [🪄 Creating Feature Groups](#4)
- [👩🏻‍🔬 Custom Transformation Functions](#12)
- [✍🏻 Registering Custom Transformation Functions in Hopsworks](#5)
- [🖍 Feature View Creation](#6)
- [🧬 Modeling](#7)
- [💾 Saving the Model in Model Registry](#8)
- [📮 Retrieving the Model from Model Registry](#9)
- [👨🏻‍⚖️ Batch Prediction](#10)
- [👨🏻‍⚖️ Serving Feature Vector Prediction](#11)

<a name='1'></a>
## <span style='color:#ff5f27'> 📝 Imports </span>

In [None]:
import pandas as pd
import numpy as np
import os
import joblib

import xgboost as xgb
from sklearn.metrics import accuracy_score

<a name='2'></a>
## <span style="color:#ff5f27;"> 💽 Loading Data </span>

In [None]:
# Load the data
df_original = pd.read_csv("https://repo.hops.works/dev/davit/air_quality/backfill_pm2_5_eu.csv")
# Generate a random binary target column (for demonstration only; it carries no real signal)
df_original['target'] = np.random.choice([0, 1], size=len(df_original))
df_original.head(3)

<a name='3'></a>
## <span style="color:#ff5f27;"> 🔮 Connecting to Hopsworks Feature Store </span>

In [None]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store() 

<a name='4'></a>
## <span style="color:#ff5f27;">🪄 Creating Feature Groups</span>

In [None]:
feature_group = fs.get_or_create_feature_group(
    name='feature_group_online',
    description='Online Feature Group',
    version=1,
    primary_key=['city_name', 'date'],
    online_enabled=True,
)    
feature_group.insert(df_original)

<a name='12'></a>
## <span style="color:#ff5f27;">👩🏻‍🔬 Custom Transformation Functions</span>

In the `transformations.py` file you can find the custom `encode_city_name` and `scale_pm2_5` transformation functions.

Let's import them and see how they work.

In [None]:
from transformations import encode_city_name, scale_pm2_5

In [None]:
city_name = 'Madrid'
encoded_city_name = encode_city_name(city_name)
print("⛳️ Encoded City Name:", encoded_city_name)  # Output: Encoded City Name: 0

In [None]:
pm2_5_value = 13.0
scaled_pm2_5 = scale_pm2_5(pm2_5_value)
print("⛳️ Scaled PM2.5 Value:", scaled_pm2_5)  # Output: Scaled PM2.5 Value: 0.0
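The contents of `transformations.py` are not shown in this notebook. As a rough idea of what such functions could look like, here is a minimal sketch: a lookup-based encoder and a min-max scaler. The `_sketch` function names, the city-to-code mapping, and the PM2.5 bounds are all assumptions made for illustration, not the values used in the real file.

```python
# Hypothetical stand-ins for the functions in transformations.py.
# The mapping and the bounds below are illustrative assumptions.
CITY_CODES = {"Amsterdam": 0, "Madrid": 1, "Paris": 2}
PM2_5_MIN, PM2_5_MAX = 0.0, 100.0


def encode_city_name_sketch(city_name: str) -> int:
    """Map a city name to an integer code; unknown cities map to -1."""
    return CITY_CODES.get(city_name, -1)


def scale_pm2_5_sketch(pm2_5: float) -> float:
    """Min-max scale a PM2.5 value into [0, 1] under the assumed bounds."""
    return (pm2_5 - PM2_5_MIN) / (PM2_5_MAX - PM2_5_MIN)


print(encode_city_name_sketch("Madrid"))
print(scale_pm2_5_sketch(25.0))
```

Both functions take a single value and return a single value of a fixed type, which matches the `output_type=int` and `output_type=float` arguments used when registering the real functions.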

<a name='5'></a>
## <span style="color:#ff5f27;"> ✍🏻 Registering Custom Transformation Functions in Hopsworks</span>

The next step is to **register custom transformation functions** in Hopsworks Feature Store. 

In [None]:
# Check existing transformation functions
fns = [fn.name for fn in fs.get_transformation_functions()]
fns

In [None]:
# Register encode_city_name in Hopsworks
if "encode_city_name" not in fns:
    encoder = fs.create_transformation_function(
        encode_city_name, 
        output_type=int,
        version=1,
    )
    encoder.save()
    
# Register scale_pm2_5 in Hopsworks
if "scale_pm2_5" not in fns:
    scaler = fs.create_transformation_function(
        scale_pm2_5, 
        output_type=float,
        version=1,
    )
    scaler.save()

In [None]:
# Check if your transformation functions are present in the feature store
fns = [fn.name for fn in fs.get_transformation_functions()]
fns

<a name='6'></a>
## <span style="color:#ff5f27;"> 🖍 Feature View Creation</span>

In [None]:
# Retrieve encode_city_name transformation function
encoder = fs.get_transformation_function(
    name="encode_city_name",
    version=1
)

# Retrieve scale_pm2_5 transformation function
scaler = fs.get_transformation_function(
    name="scale_pm2_5",
    version=1
)

In [None]:
# Build a Query object
query = feature_group.select_except(['date'])

# Get or create a feature view
feature_view = fs.get_or_create_feature_view(
    name='serving_fv',
    version=1,
    query=query,
    # Apply your custom transformation functions to necessary columns
    transformation_functions={
        "city_name": encoder,
        "pm2_5": scaler,
    },
    labels=['target'],
)
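Conceptually, the `transformation_functions` mapping pairs a column name with the function applied to that column's values, while unmapped columns pass through unchanged. The sketch below illustrates that idea on a plain Python dictionary. It is not how Hopsworks implements it internally, and `apply_transformations` and the toy lambdas are hypothetical names introduced only for this example.

```python
from typing import Any, Callable, Dict


def apply_transformations(
    row: Dict[str, Any],
    transformations: Dict[str, Callable[[Any], Any]],
) -> Dict[str, Any]:
    """Return a copy of `row` with each mapped column passed through its function."""
    return {
        column: transformations[column](value) if column in transformations else value
        for column, value in row.items()
    }


# Toy stand-ins for the registered functions (illustrative assumptions)
toy_transformations = {
    "city_name": lambda name: {"Amsterdam": 0, "Madrid": 1}.get(name, -1),
    "pm2_5": lambda value: value / 100.0,
}

toy_row = {"city_name": "Madrid", "pm2_5": 50.0, "target": 1}
print(apply_transformations(toy_row, toy_transformations))
```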

## <span style="color:#ff5f27;"> 🏋️ Training Dataset Creation</span>


In [None]:
# Create a train-test split dataset
td_version, job = feature_view.create_train_test_split(
    test_size=0.1,
    description='Description of the dataset',
    data_format='csv'
)

### <span style="color:#ff5f27;">🪝 Training Dataset Retrieval</span>

In [None]:
# Retrieve the train-test split
X_train, X_test, y_train, y_test = feature_view.get_train_test_split(
    training_dataset_version=td_version
)

In [None]:
X_train.head(3)

In [None]:
y_train.head(3)

<a name='7'></a>
## <span style="color:#ff5f27;">🧬 Modeling</span>
The next step is to fit an `XGBClassifier` on the training data and evaluate it on the test split.

In [None]:
# Initialize XGBClassifier
xgb_classifier = xgb.XGBClassifier()

# Fit the classifier
xgb_classifier.fit(X_train, y_train)

# Evaluate the model
y_pred = xgb_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("👮🏻‍♂️ Accuracy:", accuracy)
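Since the `target` column was generated at random when loading the data, the accuracy above is expected to stay close to chance level. A majority-class baseline gives a simple reference point for judging any accuracy figure. The helper below is a minimal sketch of such a baseline; `majority_baseline_accuracy` and the toy labels are hypothetical and not part of the original notebook.

```python
from collections import Counter
from typing import Sequence


def majority_baseline_accuracy(y_train: Sequence[int], y_test: Sequence[int]) -> float:
    """Accuracy obtained by always predicting the most common training label."""
    majority_label, _ = Counter(y_train).most_common(1)[0]
    hits = sum(1 for label in y_test if label == majority_label)
    return hits / len(y_test)


# Toy labels, made up for illustration
toy_y_train = [0, 1, 1, 0, 1]
toy_y_test = [1, 0, 1, 1]
print(majority_baseline_accuracy(toy_y_train, toy_y_test))  # 0.75
```

With the notebook's own data, and assuming `y_train` and `y_test` are DataFrames with a `target` column, the call could look like `majority_baseline_accuracy(y_train['target'].tolist(), y_test['target'].tolist())`.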

## <span style="color:#ff5f27;">🗄 Model Registry</span>

In [None]:
mr = project.get_model_registry()

### <span style="color:#ff5f27;">⚙️ Model Schema</span>


In [None]:
from hsml.schema import Schema
from hsml.model_schema import ModelSchema

input_schema = Schema(X_train.values)
output_schema = Schema(y_train)
model_schema = ModelSchema(input_schema=input_schema, output_schema=output_schema)

model_schema.to_dict()

<a name='8'></a>
### <span style="color:#ff5f27;">💾 Saving the Model</span>

In [None]:
model_dir = "xgb_model"

# Create the model directory if it does not exist yet
os.makedirs(model_dir, exist_ok=True)

# Save the model
joblib.dump(xgb_classifier, os.path.join(model_dir, 'xgb_classifier.pkl'))

In [None]:
# Create a model in the model registry
model = mr.python.create_model(
    name="xgb_model",
    metrics={"Accuracy": accuracy}, 
    description="XGB model",
    input_example=X_train.sample(),
    model_schema=model_schema
)

model.save(model_dir)

<a name='9'></a>
## <span style="color:#ff5f27;"> 📮 Retrieving the Model from Model Registry </span>

In [None]:
# Retrieve your model from the model registry
retrieved_model = mr.get_model(
    name="xgb_model",
    version=1
)
saved_model_dir = retrieved_model.download()

In [None]:
# Retrieve the XGB model
retrieved_xgboost_model = joblib.load(saved_model_dir + "/xgb_classifier.pkl")
retrieved_xgboost_model

<a name='10'></a>
## <span style="color:#ff5f27;"> 👨🏻‍⚖️ Batch Prediction </span>

In [None]:
# Initialise feature view to retrieve batch data
feature_view.init_batch_scoring(training_dataset_version=td_version)

# Retrieve batch data
batch_data = feature_view.get_batch_data()
batch_data.head(3)

In [None]:
# Predict batch data using retrieved model
predictions_batch = retrieved_xgboost_model.predict(batch_data)
predictions_batch[:10]

<a name='11'></a>
## <span style="color:#ff5f27;"> 👨🏻‍⚖️ Serving Feature Vector Prediction</span>

Feature vectors are retrieved from the feature store as Python lists. To make them suitable for model prediction, let's write a `to_df` function that transforms a feature vector, or a list of feature vectors, into a pandas DataFrame.

In [None]:
def to_df(feature_vector):
    """
    Convert a feature vector or a list of feature vectors into a pandas DataFrame.

    Parameters:
        feature_vector (a list, or list of lists): 
            A feature vector or a list of feature vectors. A feature vector is 
            represented as a list containing two elements: the first 
            element corresponds to the city name (categorical feature), and the 
            second element corresponds to the PM2.5 value (numerical feature).

    Returns:
        pandas.DataFrame: A DataFrame representing the feature vector(s). 
        The DataFrame will have two columns: 'city_name' for the city names 
        and 'pm2_5' for the corresponding PM2.5 values.

    Example:
        >>> feature_vector = ['New York', 15.3]
        >>> to_df(feature_vector)
          city_name  pm2_5
        0  New York   15.3

        >>> multiple_vectors = [['New York', 15.3], ['Los Angeles', 10.7]]
        >>> to_df(multiple_vectors)
             city_name  pm2_5
        0     New York   15.3
        1  Los Angeles   10.7
    """
    if isinstance(feature_vector[0], list): 
        city_names = [vector[0] for vector in feature_vector]
        pm2_5_values = [vector[1] for vector in feature_vector]
        data = pd.DataFrame(
            {
                'city_name': city_names,
                'pm2_5': pm2_5_values,
            }
        )
        return data

    data = pd.DataFrame(
        {
            'city_name': [feature_vector[0]],
            'pm2_5': [feature_vector[1]],
        }
    )
    return data
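The `to_df` function above hard-codes two columns and handles the single-vector and multi-vector cases separately. One possible alternative is to first normalise the input into a list of records and then hand that list to `pd.DataFrame`. The sketch below shows only the normalisation step in plain Python; `to_records` and `COLUMNS` are hypothetical names, and the column order is an assumption that must match the order of features in the feature view.

```python
from typing import Any, Dict, List, Sequence

# Assumed feature order; it must match the feature view's query
COLUMNS = ("city_name", "pm2_5")


def to_records(
    feature_vector: Sequence[Any],
    columns: Sequence[str] = COLUMNS,
) -> List[Dict[str, Any]]:
    """Turn one feature vector, or a list of them, into a list of column-keyed dicts."""
    # Wrap a single vector so both cases share the same code path
    if isinstance(feature_vector[0], (list, tuple)):
        rows = feature_vector
    else:
        rows = [feature_vector]

    records = []
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"Expected {len(columns)} values, got {len(row)}")
        records.append(dict(zip(columns, row)))
    return records


print(to_records([0, 0.5]))
print(to_records([[0, 0.5], [1, 0.25]]))
```

Passing the result to `pd.DataFrame(to_records(feature_vector))` yields one row per vector with the columns named in `COLUMNS`. The length check surfaces a mismatch between the vector and the expected columns early instead of silently dropping values.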

In [None]:
# Initialise feature view to retrieve feature vector
feature_view.init_serving(training_dataset_version=td_version)

# Retrieve a feature vector
feature_vector = feature_view.get_feature_vector(
    entry = {
        "city_name": 'Amsterdam',
        "date": '2013-01-01',
    }
)
feature_vector

In [None]:
# Transform feature vector to pandas dataframe
feature_vector_df = to_df(feature_vector)
feature_vector_df

In [None]:
# Predict feature vector dataframe using retrieved model
prediction_feature_vector = retrieved_xgboost_model.predict(feature_vector_df)
prediction_feature_vector

In [None]:
# Retrieve feature vectors from feature store
feature_vectors = feature_view.get_feature_vectors(
    entry = [
        {"city_name": 'Amsterdam', "date": '2013-01-01'},
        {"city_name": 'Amsterdam', "date": '2014-01-01'},
    ]
)
feature_vectors

In [None]:
# Convert feature vectors to pandas dataframe
feature_vectors_df = to_df(feature_vectors)
feature_vectors_df

In [None]:
# Predict dataframe of feature vectors using retrieved model
prediction_feature_vectors = retrieved_xgboost_model.predict(feature_vectors_df)
prediction_feature_vectors

---