# Product Recommendation with Feathr

This notebook illustrates the use of the Feathr Feature Store to create a model that predicts users' ratings for different products on an e-commerce website.

### Model Problem Statement
The e-commerce website has collected past user ratings for various products. The website has also collected data about users and products, such as user age and product category. Now we want to predict users' ratings for new products so that we can recommend each new product to the users who are likely to rate it highly.

### Feature Creation Illustration
In this example, our observation data has a compound entity key: each record is uniquely identified by `user_id` and `product_id`. With that, we can think about three types of features:
1. **User features** that are different for different users but are the same for different products. For example, user age is different for different users but it's product-agnostic.
2. **Product features** that are different for different products but are the same for all the users.
3. **User-to-product** features that are different for different users AND different products. For example, a feature to represent if the user has bought this product before or not.

In this example, we will focus on the first two types of features. After we train a model based on those features, we predict the product ratings that users will give for the products.
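
To make the compound key concrete, here is a minimal pandas sketch (not Feathr code) of how user features and product features attach to observation rows. The column values are taken from the sample data shown later in this notebook; the variable names are illustrative only.

```python
import pandas as pd

# Observation rows are identified by the compound key (user_id, product_id).
observation = pd.DataFrame(
    {
        "user_id": ["u1", "u1", "u2"],
        "product_id": ["p1", "p2", "p1"],
        "product_rating": [5, 4, 3],
    }
)

# User features: one row per user, independent of the product.
user_features = pd.DataFrame({"user_id": ["u1", "u2"], "age": [34, 28]})

# Product features: one row per product, independent of the user.
product_features = pd.DataFrame({"product_id": ["p1", "p2"], "price": [299.99, 19.99]})

# Each feature group is joined on its own part of the compound key.
training_rows = observation.merge(user_features, on="user_id", how="left").merge(
    product_features, on="product_id", how="left"
)
print(training_rows)
```

Both rows for `u1` get the same age regardless of product, and both rows for `p1` get the same price regardless of user.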

The feature creation flow is as below:
![Feature Flow](https://github.com/feathr-ai/feathr/blob/main/docs/images/product_recommendation_advanced.jpg?raw=true)

## Set SparkSession and Feathr client

#### Imports

In [1]:
import glob
import os
from pathlib import Path

import feathr
import pandas as pd
from feathr import (
    BOOLEAN,
    FLOAT,
    INPUT_CONTEXT,
    INT32,
    BackfillTime,
    DerivedFeature,
    FeathrClient,
    Feature,
    FeatureAnchor,
    FeatureQuery,
    HdfsSource,
    MaterializationSettings,
    ObservationSettings,
    RedisSink,
    TypedKey,
    ValueType,
    WindowAggTransformation,
)
from feathr.datasets.utils import maybe_download
from feathr.utils.config import generate_config
from feathr.utils.job_utils import get_result_df
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql.functions import col, rand
from pyspark.sql.types import DoubleType, IntegerType

# PATH_TO_APP_DATA = "hdfs://namenode:9000/data"
PATH_TO_APP_DATA = "s3a://data-bucket"
PROJECT_NAME = "product_recommendation"

print(f"Feathr version: {feathr.__version__}")

Feathr version: 1.0.0


#### SparkSession

In [None]:
spark = (
    SparkSession.builder.appName("write-synthetic-parquet-to-hdfs")  # type: ignore[attr-defined]
    .master("local[*]")
    .config("spark.hadoop.fs.defaultFS", "hdfs://namenode:9000")
    .config("spark.hadoop.fs.s3a.endpoint", "http://localhost:9007")
    .config("spark.hadoop.fs.s3a.access.key", "minioadmin")
    .config("spark.hadoop.fs.s3a.secret.key", "minioadmin")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4,com.amazonaws:aws-java-sdk-bundle:1.12.406")
    .getOrCreate()
)


26/01/26 12:28:08 WARN Utils: Your hostname, PF5L73QZ resolves to a loopback address: 127.0.1.1; using 192.168.123.113 instead (on interface wlp9s0f0)
26/01/26 12:28:09 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
26/01/26 12:30:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


#### Feathr client

In [3]:
os.environ['SPARK_LOCAL_IP'] = "127.0.0.1"
os.environ['REDIS_PASSWORD'] = ""

jar_name = glob.glob("./*.jar")[0]
print(f"Found jar file at {jar_name}")

feathr_workspace_folder = Path("./feathr_config.yaml")

client = FeathrClient(str(feathr_workspace_folder))

2026-01-26 17:30:17.170 | INFO     | feathr.utils._env_config_reader:get:60 - Config secrets__azure_key_vault__name is not found in the environment variable, configuration file, or the remote key value store. Returning the default value: None.
2026-01-26 17:30:17.174 | INFO     | feathr.utils._env_config_reader:get:60 - Config offline_store__s3__s3_enabled is not found in the environment variable, configuration file, or the remote key value store. Returning the default value: None.
2026-01-26 17:30:17.176 | INFO     | feathr.utils._env_config_reader:get:60 - Config offline_store__adls__adls_enabled is not found in the environment variable, configuration file, or the remote key value store. Returning the default value: None.
2026-01-26 17:30:17.177 | INFO     | feathr.utils._env_config_reader:get:60 - Config offline_store__wasb__wasb_enabled is not found in the environment variable, configuration file, or the remote key value store. Returning the default value: None.
2026-01-26 17:30:17

Found jar file at ./feathr_2.12-1.0.0.jar


## 1. SparkSession and Feathr Client

### Upload quick start data to hdfs

In [4]:
quick_start_data_list = !ls feathr_data/
quick_start_hdfs_data_dict = {}

for i in quick_start_data_list:
    df_name = i.split(".")[0]
    hdfs_path = f"{PATH_TO_APP_DATA}/{df_name}"

    quick_start_hdfs_data_dict[df_name] = hdfs_path

    df = spark.createDataFrame(pd.read_csv(f"feathr_data/{i}"))
    df.repartition(1).write.mode("overwrite").parquet(hdfs_path)
    
    last_path = hdfs_path.split("/")[-1]

    if "observation" in last_path:
        print(f"====== {last_path} ======")
        spark.read.parquet(f"{hdfs_path}").show(5)

26/01/26 12:30:19 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
                                                                                



26/01/26 12:30:27 WARN GarbageCollectionMetrics: To enable non-built-in garbage collector(s) List(G1 Concurrent GC), users should configure it(them) to spark.eventLog.gcMetrics.youngGenerationGarbageCollectors or spark.eventLog.gcMetrics.oldGenerationGarbageCollectors


+-------+----------+---------------+--------------+
|user_id|product_id|event_timestamp|product_rating|
+-------+----------+---------------+--------------+
|     u1|        p1|     2023-01-01|             5|
|     u1|        p2|     2023-01-02|             4|
|     u2|        p1|     2023-01-03|             3|
|     u2|        p3|     2023-01-04|             5|
|     u3|        p2|     2023-01-05|             2|
+-------+----------+---------------+--------------+



### Prepare Datasets

In [5]:
user_observation_source_path = quick_start_hdfs_data_dict["user_observation"]
user_profile_source_path = quick_start_hdfs_data_dict["user_profile"]
user_purchase_history_source_path = quick_start_hdfs_data_dict["user_purchase_history"]
product_detail_source_path = quick_start_hdfs_data_dict["product_detail"]

### Initialize Feathr Client

In [6]:
client = FeathrClient(str(feathr_workspace_folder))

2026-01-26 17:30:30.568 | INFO     | feathr.utils._env_config_reader:get:60 - Config secrets__azure_key_vault__name is not found in the environment variable, configuration file, or the remote key value store. Returning the default value: None.
2026-01-26 17:30:30.571 | INFO     | feathr.utils._env_config_reader:get:60 - Config offline_store__s3__s3_enabled is not found in the environment variable, configuration file, or the remote key value store. Returning the default value: None.
2026-01-26 17:30:30.571 | INFO     | feathr.utils._env_config_reader:get:60 - Config offline_store__adls__adls_enabled is not found in the environment variable, configuration file, or the remote key value store. Returning the default value: None.
2026-01-26 17:30:30.572 | INFO     | feathr.utils._env_config_reader:get:60 - Config offline_store__wasb__wasb_enabled is not found in the environment variable, configuration file, or the remote key value store. Returning the default value: None.
2026-01-26 17:30:30

## 4. Define Sharable Features using Feathr API

### Understand raw datasets
We have four datasets to work with:
* Observation dataset (a.k.a. labeled dataset)
* User profile
* User purchase history
* Product details

In [7]:
user_observation_file_path = "feathr_data/user_observation.csv"
user_profile_file_path = "feathr_data/user_profile.csv"
user_purchase_history_file_path = "feathr_data/user_purchase_history.csv"
product_detail_file_path = "feathr_data/product_detail.csv"

In [8]:
# Observation dataset
# The observation dataset usually comes with an event_timestamp to denote when the observation happened.
# The label here is product_rating. Our model objective is to predict a user's rating for this product.
pd.read_csv(user_observation_file_path)

Unnamed: 0,user_id,product_id,event_timestamp,product_rating
0,u1,p1,2023-01-01,5
1,u1,p2,2023-01-02,4
2,u2,p1,2023-01-03,3
3,u2,p3,2023-01-04,5
4,u3,p2,2023-01-05,2


In [9]:
# User profile dataset
# Used to generate user features
pd.read_csv(user_profile_file_path).head()

Unnamed: 0,user_id,age,gender,tax_rate,gift_card_balance,number_of_credit_cards
0,u1,34,M,10,100.0,2
1,u2,28,F,8,25.0,1
2,u3,45,M,12,0.0,0


In [10]:
# User purchase history dataset.
# Used to generate user features. This is activity type data, so we need to use aggregation to generate features.
pd.read_csv(user_purchase_history_file_path).head()

Unnamed: 0,user_id,product_id,purchase_date,purchase_amount
0,u1,p1,2022-12-15,120.5
1,u1,p2,2022-12-20,80.0
2,u1,p3,2022-12-25,60.0
3,u2,p1,2022-12-18,40.0
4,u2,p3,2022-12-28,90.0


In [11]:
# Product detail dataset.
# Used to generate product features.
pd.read_csv(product_detail_file_path).head()

Unnamed: 0,product_id,category,brand,price,quantity,release_year
0,p1,electronics,brandA,299.99,10,2021
1,p2,books,brandB,19.99,50,2019
2,p3,clothing,brandC,49.99,30,2022


### What's a feature in Feathr
A feature is an individual measurable property or characteristic of a phenomenon which is sometimes time-sensitive.

In Feathr, a feature is defined by the following characteristics:
* The typed key (a.k.a. entity id): identifies the subject of the feature, e.g. a user id of 123, a product id of SKU234456.
* The feature name: the unique identifier of the feature, e.g. user_age, total_spending_in_30_days.
* The feature value: the actual value of that aspect at a particular time, e.g. the feature value of the person's age is 30 in the year 2022.
* The timestamp: indicates when the event happened, e.g. the user purchased a certain product at a certain time. This is usually used for point-in-time joins.

Note that this definition is written from the perspective of a feature consumer (a person who wants to use a feature). It only describes what a feature looks like. Later sections show how a feature consumer can access features in a simple way.

To define how to produce the feature, we need to specify:
* Feature source: the source data that this feature is based on.
* Transformation: the transformation used to turn the source data into the feature. The transformation is optional when you just want to take a column directly from the source data.

(For more details on feature definition, please refer to the [Feathr Feature Definition Guide](https://feathr-ai.github.io/feathr/concepts/feature-definition.html).)

Note: some features, such as those defined on top of request data, may have no entity key or timestamp.
Such a feature is merely a function/transformation executed against request data at runtime.
An example is the day of the week of the request, calculated by converting the request's UNIX timestamp.
(We won't cover this in the tutorial.)
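
As a rough illustration of such a request-time transformation, the plain-Python sketch below derives the day of the week from a UNIX timestamp. The function name `day_of_week` is hypothetical and the use of UTC is an assumption; this is not a Feathr API.

```python
from datetime import datetime, timezone

WEEKDAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday",
]


def day_of_week(unix_ts: int) -> str:
    """Return the weekday name (in UTC) for a UNIX timestamp given in seconds."""
    moment = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    return WEEKDAY_NAMES[moment.weekday()]


# 1672531200 is 2023-01-01 00:00:00 UTC.
print(day_of_week(1672531200))  # Sunday
```

The function needs no entity key and no stored data: the value depends only on the request itself.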

### Define Sources Section with UDFs

A feature is called an anchored feature when it is extracted directly from the source data rather than computed on top of other features. A feature computed from other features is called a derived feature.

A [feature source](https://feathr.readthedocs.io/en/latest/#feathr.Source) is needed for anchored features; it describes the raw data from which the feature values are computed.

See [the Python API documentation](https://feathr.readthedocs.io/en/latest/#feathr.HdfsSource) for details on each input field.

In [12]:
def feathr_udf_preprocessing(df: DataFrame) -> DataFrame:
    from pyspark.sql.functions import col

    return df.withColumn("tax_rate_decimal", col("tax_rate") / 100)

batch_source = HdfsSource(
    name="userProfileData",
    path=user_profile_source_path,
    preprocessing=feathr_udf_preprocessing,
)

In [13]:
# Let's define some features for users so our recommendation can be customized for users.
user_id = TypedKey(
    key_column="user_id",
    key_column_type=ValueType.STRING,  # user ids in the sample data are strings such as "u1"
    description="user id",
    full_name="product_recommendation.user_id",
)

feature_user_age = Feature(
    name="feature_user_age",
    key=user_id,
    feature_type=INT32,
    transform="age",
)
feature_user_tax_rate = Feature(
    name="feature_user_tax_rate",
    key=user_id,
    feature_type=FLOAT,
    transform="tax_rate_decimal",
)
feature_user_gift_card_balance = Feature(
    name="feature_user_gift_card_balance",
    key=user_id,
    feature_type=FLOAT,
    transform="gift_card_balance",
)
feature_user_has_valid_credit_card = Feature(
    name="feature_user_has_valid_credit_card",
    key=user_id,
    feature_type=BOOLEAN,
    transform="number_of_credit_cards > 0",
)

features = [
    feature_user_age,
    feature_user_tax_rate,
    feature_user_gift_card_balance,
    feature_user_has_valid_credit_card,
]

user_feature_anchor = FeatureAnchor(
    name="anchored_features", source=batch_source, features=features
)

In [14]:
# Let's define some features for the products so our recommendation can be customized for products.
product_batch_source = HdfsSource(
    name="productProfileData",
    path=product_detail_source_path,
)

product_id = TypedKey(
    key_column="product_id",
    key_column_type=ValueType.STRING,  # product ids in the sample data are strings such as "p1"
    description="product id",
    full_name="product_recommendation.product_id",
)

feature_product_quantity = Feature(
    name="feature_product_quantity",
    key=product_id,
    feature_type=FLOAT,
    transform="quantity",
)
feature_product_price = Feature(
    name="feature_product_price", key=product_id, feature_type=FLOAT, transform="price"
)

product_features = [feature_product_quantity, feature_product_price]

product_feature_anchor = FeatureAnchor(
    name="product_anchored_features",
    source=product_batch_source,
    features=product_features,
)

### Define window aggregation features

[Window aggregation](https://en.wikipedia.org/wiki/Window_function_%28SQL%29) helps us create more powerful features by compressing large amounts of information. For example, we can compute the *average purchase amount over the last 90 days* from the purchase history to capture a user's recent consumption trend.

To create window aggregation features, we define `WindowAggTransformation` with the following arguments:
1. `agg_expr`: the field/column you want to aggregate. It can be an ANSI SQL expression, e.g. `cast_float(purchase_amount)` to cast `str` type values to `float`.
2. `agg_func`: the aggregation function, e.g. `AVG`. See the table below for the full list of supported functions.
3. `window`: the aggregation window size, e.g. `90d` to aggregate over the last 90 days.

| Aggregation Type | Input Type | Description |
| --- | --- | --- |
| `SUM`, `COUNT`, `MAX`, `MIN`, `AVG` | Numeric | Applies the numerical operation on the numeric inputs. |
| `MAX_POOLING`, `MIN_POOLING`, `AVG_POOLING` | Numeric Vector | Applies the max/min/avg operation on a per-entry basis for a given collection of numbers. |
| `LATEST` | Any | Returns the latest not-null values from within the defined time window. |
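
To show what a 90-day `AVG` window computes, here is a small pandas sketch over the purchase history sample shown earlier. It is not how Feathr executes the aggregation; the helper name `avg_purchase_in_window` and the exact window boundaries (start exclusive, end inclusive) are assumptions made for illustration.

```python
import pandas as pd

purchases = pd.DataFrame(
    {
        "user_id": ["u1", "u1", "u1", "u2", "u2"],
        "purchase_date": pd.to_datetime(
            ["2022-12-15", "2022-12-20", "2022-12-25", "2022-12-18", "2022-12-28"]
        ),
        "purchase_amount": [120.5, 80.0, 60.0, 40.0, 90.0],
    }
)


def avg_purchase_in_window(user_id: str, as_of: str, window_days: int = 90) -> float:
    """Average purchase_amount for one user over the window_days ending at as_of."""
    end = pd.Timestamp(as_of)
    start = end - pd.Timedelta(days=window_days)
    in_window = purchases[
        (purchases["user_id"] == user_id)
        & (purchases["purchase_date"] > start)
        & (purchases["purchase_date"] <= end)
    ]
    return float(in_window["purchase_amount"].mean())


# Evaluate the window as of each user's observation timestamp.
u1_avg = avg_purchase_in_window("u1", "2023-01-01")
u2_avg = avg_purchase_in_window("u2", "2023-01-03")
print(u1_avg, u2_avg)
```

For `u1` this averages 120.5, 80.0 and 60.0, and for `u2` it averages 40.0 and 90.0, which lines up with the `feature_user_avg_purchase_for_90days` values in the joined result shown later in the notebook.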

After you have defined features and sources, bring them together to build an anchor:

> Note that if the features come directly from the observation data, the `source` argument should be `INPUT_CONTEXT` to indicate that the source of the anchor is the observation data.

In [15]:
purchase_history_data = HdfsSource(
    name="purchase_history_data",
    path=user_purchase_history_source_path,
    event_timestamp_column="purchase_date",
    timestamp_format="yyyy-MM-dd",
)

agg_features = [
    Feature(
        name="feature_user_avg_purchase_for_90days",
        key=user_id,
        feature_type=FLOAT,
        transform=WindowAggTransformation(
            agg_expr="cast_float(purchase_amount)", agg_func="AVG", window="90d"
        ),
    )
]

user_agg_feature_anchor = FeatureAnchor(
    name="aggregationFeatures", source=purchase_history_data, features=agg_features
)

### Derived Features Section
Derived features are the features that are computed from other features. They could be computed from anchored features or other derived features.

In [16]:
derived_features = [
    DerivedFeature(
        name="feature_user_purchasing_power",
        key=user_id,
        feature_type=FLOAT,
        input_features=[feature_user_gift_card_balance, feature_user_has_valid_credit_card],
        transform="feature_user_gift_card_balance + if(boolean(feature_user_has_valid_credit_card), 100, 0)",
    )
]
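
The transform expression above adds a fixed bonus of 100 to the gift card balance when the user has a valid credit card. A pandas sketch of the same arithmetic, using the user profile sample values and assuming nothing about how Feathr evaluates the expression internally:

```python
import pandas as pd

user_features = pd.DataFrame(
    {
        "user_id": ["u1", "u2", "u3"],
        "feature_user_gift_card_balance": [100.0, 25.0, 0.0],
        "feature_user_has_valid_credit_card": [True, True, False],
    }
)

# True becomes 1.0 and False becomes 0.0, so the bonus is 100 or 0.
credit_card_bonus = user_features["feature_user_has_valid_credit_card"].astype(float) * 100.0

user_features["feature_user_purchasing_power"] = (
    user_features["feature_user_gift_card_balance"] + credit_card_bonus
)
print(user_features[["user_id", "feature_user_purchasing_power"]])
```

The derived value depends only on other features of the same user, which is why it needs no source of its own.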

### Build features

Lastly, we need to build those features so that they can be consumed later. Note that we have to build both the anchored features and the derived features, which are not anchored to a source.

In [17]:
client.build_features(
    anchor_list=[user_agg_feature_anchor, user_feature_anchor, product_feature_anchor],
    derived_feature_list=derived_features,
    verbose=True,
)

"aggregationFeatures is the achor of ['feature_user_avg_purchase_for_90days']"
("anchored_features is the achor of ['feature_user_age', "
 "'feature_user_tax_rate', 'feature_user_gift_card_balance', "
 "'feature_user_has_valid_credit_card']")
("product_anchored_features is the achor of ['feature_product_quantity', "
 "'feature_product_price']")


## 5. Create Training Data using Point-in-Time Correct Feature join

To create a training dataset using Feathr, we need to provide **feature join settings** that specify which features to join and how they should be joined to the observation data.

Also note that a `FeatureQuery` accepts only features that share the same join key, so we define two query objects, one for the `user_id` key and the other for `product_id`, and pass them together to compute offline features.

To learn more on this topic, please refer to [Point-in-time Correctness document](https://feathr-ai.github.io/feathr/concepts/point-in-time-join.html).
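
The idea behind a point-in-time join can be sketched with `pandas.merge_asof`: for each observation, take the most recent feature value recorded at or before the observation timestamp, never a later one. The `balance_history` table below is hypothetical data invented for this illustration, and this is not the mechanism Feathr itself uses.

```python
import pandas as pd

observations = pd.DataFrame(
    {
        "user_id": ["u1", "u1"],
        "event_timestamp": pd.to_datetime(["2023-01-01", "2023-01-10"]),
    }
)

# Hypothetical snapshots of a time-varying feature (not part of the sample data).
balance_history = pd.DataFrame(
    {
        "user_id": ["u1", "u1", "u1"],
        "feature_timestamp": pd.to_datetime(["2022-12-20", "2023-01-05", "2023-01-15"]),
        "gift_card_balance": [100.0, 80.0, 50.0],
    }
)

# direction="backward" picks the latest feature row at or before each observation,
# so feature values from the future never leak into the training data.
joined = pd.merge_asof(
    observations.sort_values("event_timestamp"),
    balance_history.sort_values("feature_timestamp"),
    left_on="event_timestamp",
    right_on="feature_timestamp",
    by="user_id",
    direction="backward",
)
print(joined[["user_id", "event_timestamp", "gift_card_balance"]])
```

The observation on 2023-01-01 receives the balance recorded on 2022-12-20, and the one on 2023-01-10 receives the balance from 2023-01-05; the 2023-01-15 snapshot is never used.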

In [18]:
user_feature_query = FeatureQuery(
    feature_list=[feat.name for feat in features + agg_features + derived_features],
    key=user_id,
)

product_feature_query = FeatureQuery(
    feature_list=[feat.name for feat in product_features],
    key=product_id,
)

settings = ObservationSettings(
    observation_path=user_observation_source_path,
    event_timestamp_column="event_timestamp",
    timestamp_format="yyyy-MM-dd",
)
client.get_offline_features(
    observation_settings=settings,
    feature_query=[user_feature_query, product_feature_query],
    output_path=user_profile_source_path.rpartition("/")[0] + f"/product_recommendation_features.parquet",
    verbose=True,
)

Feathr is unable to read the Observation data from s3a://data-bucket/user_observation due to permission issue or invalid path. Please either grant the permission or supply the observation column names in the filed: observation_column_names.
2026-01-26 17:30:30.879 | INFO     | feathr.spark_provider._localspark_submission:_get_debug_file_name:283 - Spark log path is debug/product_recommendation_feathr_feature_join_job20260126173030
2026-01-26 17:30:30.880 | INFO     | feathr.spark_provider._localspark_submission:_init_args:258 - Spark job: product_recommendation_feathr_feature_join_job is running on local spark with master: local[*].
2026-01-26 17:30:30.884 | INFO     | feathr.spark_provider._localspark_submission:submit_feathr_job:136 - Detail job stdout and stderr are in debug/product_recommendation_feathr_feature_join_job20260126173030/log.
2026-01-26 17:30:30.885 | INFO     | feathr.spark_provider._localspark_submission:submit_feathr_job:146 - Local Spark job submit with pid: 168457

Features in feature_query: ['feature_product_quantity', 'feature_product_price']


<Popen: returncode: None args: ['spark-submit', '--master', 'local[*]', '--n...>

Register features that are not yet materialized

Note: this step redirects to Azure authentication.

In [19]:
try:
    client.register_features()
except Exception as e:
    print(e)
print(client.list_registered_features(project_name=PROJECT_NAME))

https://repository.mulesoft.org/nexus/content/repositories/public/ added as a remote repository with the name: repo-1
https://linkedin.jfrog.io/artifactory/open-source/ added as a remote repository with the name: repo-2
Ivy Default Cache set to: /home/nazarov.aleksey64/.ivy2/cache
The jars for the packages stored in: /home/nazarov.aleksey64/.ivy2/jars
org.apache.spark#spark-avro_2.12 added as a dependency
com.microsoft.sqlserver#mssql-jdbc added as a dependency
com.microsoft.azure#spark-mssql-connector_2.12 added as a dependency
org.apache.logging.log4j#log4j-core added as a dependency
com.typesafe#config added as a dependency
com.fasterxml.jackson.core#jackson-databind added as a dependency
org.apache.hadoop#hadoop-mapreduce-client-core added as a dependency
org.apache.hadoop#hadoop-common added as a dependency
org.apache.hadoop#hadoop-azure added as a dependency
org.apache.avro#avro added as a dependency
org.apache.xbean#xbean-asm6-shaded added as a dependency
org.apache.spark#spark-

KeyboardInterrupt: 

Wait for the feature join job to finish:

In [None]:
client.wait_job_to_finish(timeout_sec=100)

Let's use the helper function `get_result_df` to download the result and view it:

In [None]:
res_df = get_result_df(client)
res_df.head()

2026-01-26 07:13:31.423 | INFO     | feathr.spark_provider._localspark_submission:wait_for_completion:156 - 1 local spark job(s) in this Launcher, only the latest will be monitored.
2026-01-26 07:13:31.426 | INFO     | feathr.spark_provider._localspark_submission:wait_for_completion:157 - Please check auto generated spark command in debug/product_recommendation_feathr_feature_join_job20260126071211/command.sh and detail logs in debug/product_recommendation_feathr_feature_join_job20260126071211/log.
2026-01-26 07:13:31.428 | INFO     | feathr.spark_provider._localspark_submission:wait_for_completion:219 - Spark job with pid 237 finished in: 0 seconds.


Unnamed: 0,user_id,product_id,event_timestamp,product_rating,feature_user_avg_purchase_for_90days,feature_product_price,feature_product_quantity,feature_user_gift_card_balance,feature_user_has_valid_credit_card,feature_user_tax_rate,feature_user_age,feature_user_purchasing_power
0,u3,p2,2023-01-05,2,30.0,19.99,50.0,0.0,False,0.12,45,0.0
1,u2,p1,2023-01-03,3,65.0,299.98999,10.0,25.0,True,0.08,28,125.0
2,u2,p3,2023-01-04,5,65.0,49.990002,30.0,25.0,True,0.08,28,125.0
3,u1,p1,2023-01-01,5,86.833336,299.98999,10.0,100.0,True,0.1,34,200.0
4,u1,p2,2023-01-02,4,86.833336,19.99,50.0,100.0,True,0.1,34,200.0


In [None]:
from interpret import show
from interpret.glassbox import ExplainableBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Fill None values with 0
final_df = (
    res_df
    .drop(["event_timestamp"], axis=1, errors="ignore")
    .fillna(0)
)

# Split data into train and test
X_train, X_test, y_train, y_test = train_test_split(
    final_df.drop(["product_rating"], axis=1),
    final_df["product_rating"].astype("float64"),
    test_size=0.2,
    random_state=42,
)

ebm = ExplainableBoostingRegressor()
ebm.fit(X_train, y_train)

# show(ebm_global) # Will run on 127.0.0.1/localhost at port 7080
# Note, currently InterpretML's visualization dashboard doesn't work w/ VSCODE notebook viewer
# https://github.com/interpretml/interpret/issues/317
ebm_global = ebm.explain_global()
show(ebm_global)

In [None]:
# Predict and evaluate
from math import sqrt

y_pred = ebm.predict(X_test)
# RMSE is the square root of the mean squared error.
rmse = sqrt(mean_squared_error(y_test.values.flatten(), y_pred))

print(f"Root mean squared error: {rmse}")

## 6. Feature Materialization

While Feathr can compute the feature value from the feature definition on-the-fly at request time, it can also pre-compute
and materialize the feature value to offline and/or online storage.
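
Conceptually, materializing to an online store means precomputing one record of feature values per entity key so that serving becomes a simple key lookup. The sketch below mimics that with a plain dictionary. It says nothing about how Feathr or Redis lay out the data, and `online_store` and `get_features` are hypothetical names.

```python
import pandas as pd

# Precomputed (offline) feature values, taken from the user profile sample.
offline_features = pd.DataFrame(
    {
        "user_id": ["u1", "u2", "u3"],
        "feature_user_age": [34, 28, 45],
        "feature_user_gift_card_balance": [100.0, 25.0, 0.0],
    }
)

# "Materialize": store one record per entity key in a key-value structure.
online_store = {
    row["user_id"]: {name: value for name, value in row.items() if name != "user_id"}
    for row in offline_features.to_dict(orient="records")
}


def get_features(key: str, feature_names: list) -> list:
    """Look up precomputed feature values for one key; unknown keys yield None values."""
    record = online_store.get(key)
    if record is None:
        return [None] * len(feature_names)
    return [record[name] for name in feature_names]


print(get_features("u2", ["feature_user_age", "feature_user_gift_card_balance"]))
```

A lookup for a key that was never materialized returns no values, so the keys used at serving time must match the key column values in the source data.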

We can push the generated features to the online store like below:

In [None]:
# Materialize user features
# Note: only features that share the same entity key can be materialized into one table.
redisSink = RedisSink(table_name="user_features")
settings = MaterializationSettings(
    name="user_feature_setting",
    sinks=[redisSink],
    feature_names=["feature_user_age", "feature_user_gift_card_balance"],
)

client.materialize_features(settings=settings, allow_materialize_non_agg_feature=True)
client.wait_job_to_finish(timeout_sec=100)

We can then get the features from the online store (Redis):

In [None]:
# The key must match a user_id value from the source data, e.g. "u2".
client.get_online_features(
    "user_features", "u2", ["feature_user_age", "feature_user_gift_card_balance"]
)

In [None]:
# The keys must match user_id values from the source data, e.g. "u1" and "u2".
client.multi_get_online_features(
    "user_features", ["u1", "u2"], ["feature_user_age", "feature_user_gift_card_balance"]
)

## 7. Feature Registration
Lastly, we can also register the features and share them across teams:

In [None]:
try:
    client.register_features()
except Exception as e:
    print(e)
print(client.list_registered_features(project_name=PROJECT_NAME))

26/01/26 12:06:36 INFO Executor: Adding file:/tmp/spark-40e77f32-bd84-4863-987f-51c7b697a2fd/userFiles-68f52d7e-040a-4e86-9403-ab539aef0e85/javax.inject_javax.inject-1.jar to class loader default
26/01/26 12:06:36 INFO Executor: Fetching spark://localhost:45069/jars/org.apache.hadoop_hadoop-yarn-common-2.7.7.jar with timestamp 1769418391825
26/01/26 12:06:36 INFO Utils: Fetching spark://localhost:45069/jars/org.apache.hadoop_hadoop-yarn-common-2.7.7.jar to /tmp/spark-40e77f32-bd84-4863-987f-51c7b697a2fd/userFiles-68f52d7e-040a-4e86-9403-ab539aef0e85/fetchFileTemp10734503595575196352.tmp
26/01/26 12:06:36 INFO Utils: /tmp/spark-40e77f32-bd84-4863-987f-51c7b697a2fd/userFiles-68f52d7e-040a-4e86-9403-ab539aef0e85/fetchFileTemp10734503595575196352.tmp has been previously copied to /tmp/spark-40e77f32-bd84-4863-987f-51c7b697a2fd/userFiles-68f52d7e-040a-4e86-9403-ab539aef0e85/org.apache.hadoop_hadoop-yarn-common-2.7.7.jar
26/01/26 12:06:36 INFO Executor: Adding file:/tmp/spark-40e77f32-bd84-4

ClientAuthenticationError: DefaultAzureCredential failed to retrieve a token from the included credentials.
Attempted credentials:
	EnvironmentCredential: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
Visit https://aka.ms/azsdk/python/identity/environmentcredential/troubleshoot to troubleshoot this issue.
	ManagedIdentityCredential: ManagedIdentityCredential authentication unavailable, no response from the IMDS endpoint.
	SharedTokenCacheCredential: SharedTokenCacheCredential authentication unavailable. No accounts were found in the cache.
	AzureCliCredential: Azure CLI not found on path
	AzurePowerShellCredential: PowerShell is not installed
	AzureDeveloperCliCredential: Azure Developer CLI could not be found. Please visit https://aka.ms/azure-dev for installation instructions and then,once installed, authenticate to your Azure account using 'azd auth login'.
	InteractiveBrowserCredential: Authentication failed: User did not complete the flow in time
To mitigate this issue, please refer to the troubleshooting guidelines here at https://aka.ms/azsdk/python/identity/defaultazurecredential/troubleshoot.

You should be able to see the Feathr UI by visiting the website below:

http://localhost:8081/

In [None]:
from IPython.display import IFrame
IFrame("http://localhost:8081/projects", 900,500)

## Cleanup

In [None]:
# # Clean up the output files. CAUTION: this may be dangerous if you reused the project name.
# import shutil
# shutil.rmtree(WORKING_DIR, ignore_errors=False)