<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Enterprise Feature Store - FeatureProcess
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:18px;font-family:Arial;'><b>Sales Analytics Feature Store Demo</b></p>
<p style = 'font-size:16px;font-family:Arial;'>This notebook demonstrates how to build and manage a feature store for sales analytics using teradataml. It covers feature engineering, transformation, and governance for sales transaction data from end to end. The workflow includes:</p>
<ul>
  <li>Loading and transforming raw sales data into engineered features such as transaction counts, maximum and total sales amounts, and region-based aggregations.</li>
  <li>Creating and managing a centralized feature store <code>analytics</code> within the <code>sales_transactions</code> data domain.</li>
  <li>Ingesting features like count, max, and total sales per region, as well as new and updated features for iterative analytics.</li>
  <li>Demonstrating feature lineage, versioning, and time-travel for reproducible machine learning workflows.</li>
  <li>Enabling collaboration and reusability of high-quality, governed features for downstream ML models and analytics.</li>
</ul>
<p style = 'font-size:16px;font-family:Arial;'>The notebook focuses on features such as aggregated sales metrics, region-based statistics, and temporal feature updates, providing a robust foundation for enterprise-scale sales analytics and machine learning.</p>

<p style = 'font-size:18px;font-family:Arial;'><b>Disclaimer</b></p>

<p style = 'font-size:12px;font-family:Arial;'>
The sample code (“Sample Code”) provided is not covered by any Teradata agreements. Please be aware that Teradata has no control over the model responses to such sample code and such response may vary. The use of the model by Teradata is strictly for demonstration purposes and does not constitute any form of certification or endorsement. The sample code is provided “AS IS” and any express or implied warranties, including the implied warranties of merchantability and fitness for a particular purpose, are disclaimed. In no event shall Teradata be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) sustained by you or a third party, however caused and on any theory of liability, whether in contract, strict liability, or tort arising in any way out of the use of this sample code, even if advised of the possibility of such damage.</p>

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial;'><b>1. Connect to Vantage, import Python packages, and explore the dataset</b></p>

In [None]:
!pip install teradataml==20.0.0.7 --quiet

<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial;'><b>Note: </b><i>Please execute the above pip install to get the pinned version of the required library. Be sure to restart the kernel after executing it so that the installed library is loaded into memory. The simplest way to restart the kernel is by typing zero zero: <b> 0 0</b></i></p>
</div>
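
<p style = 'font-size:16px;font-family:Arial;'>After restarting the kernel, it can help to confirm which version is actually loaded. The following is a minimal sketch using only the Python standard library; the helper name <code>installed_version</code> and the constant <code>REQUIRED</code> are illustrative and not part of teradataml.</p>

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Version pinned in the pip install cell above.
REQUIRED = "20.0.0.7"

found = installed_version("teradataml")
if found is None:
    print("teradataml is not installed in this kernel")
elif found != REQUIRED:
    print(f"teradataml {found} is installed; this notebook pins {REQUIRED}")
else:
    print(f"teradataml {found} matches the pinned version")
```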

In [None]:
import os 
from teradataml import *
from collections import OrderedDict
from teradatasqlalchemy import INTEGER, FLOAT, VARCHAR
import warnings
warnings.filterwarnings('ignore')

display.max_rows = 5

<hr style="height:2px;border:none;">
<b style = 'font-size:18px;font-family:Arial;'> 1.1 Connect to Vantage</b>
<p style = 'font-size:16px;font-family:Arial;'>We will be prompted to provide the password. We will enter the password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../../UseCases/startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)

In [None]:
%%capture
execute_sql('''SET query_band='DEMO=EFS-FeatureProcess.ipynb;' UPDATE FOR SESSION; ''')

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial;'><b>2. Set Up a Feature Store Repository</b></p>
<p style = 'font-size:18px;font-family:Arial;'><b>2.1 Create the FeatureStore</b></p>

In [None]:
fs = FeatureStore(repo="analytics", data_domain="sales_transactions")

<p style = 'font-size:18px;font-family:Arial;'><b>2.2 Set Up the FeatureStore</b></p>

In [None]:
fs.setup()

<p style = 'font-size:18px;font-family:Arial;'><b>2.3 Checking Availability</b></p>

In [None]:
fs = FeatureStore(repo="analytics", data_domain="sales_transactions")

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial;'><b>3. Get Data for the Demo</b></p>
<p style = 'font-size:18px;font-family:Arial;'><b>3.1 Load the sales data for week 1 (the current week)</b></p>

In [None]:
sales_dt = OrderedDict(CustomerID=INTEGER, Quantity=INTEGER, TotalAmount=INTEGER, Region=VARCHAR(20))
df1 = read_csv(table_name="week1_sales",
               filepath=r"data/sales.csv",
               types=sales_dt)
df1.head(3)

<p style = 'font-size:18px;font-family:Arial;'><b>3.2 Perform Data Transformation</b></p>
<p style = 'font-size:16px;font-family:Arial;'><b>Transformation Details:</b>    
In this step, we group the sales data by the 'Region' column and compute three key aggregated features for each region:
<ul>
  <li><code>total_TotalAmount</code>: The sum of all sales amounts in the region.</li>
  <li><code>count_TotalAmount</code>: The total number of sales transactions in the region.</li>
  <li><code>max_TotalAmount</code>: The maximum sales amount for a single transaction in the region.</li>
</ul>
<p style = 'font-size:16px;font-family:Arial;'>These features are essential for region-level analytics and are used as managed features in the feature store.</p>
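
<p style = 'font-size:16px;font-family:Arial;'>To make the three aggregates concrete, here is a minimal pure-Python sketch of the same group-by logic on a handful of made-up rows. The sample values and the helper <code>aggregate_by_region</code> are hypothetical; the notebook itself computes these features in-database with teradataml in the next cell.</p>

```python
from collections import defaultdict

# Hypothetical sample rows: (Region, TotalAmount). Not taken from data/sales.csv.
rows = [
    ("North", 100),
    ("North", 250),
    ("South", 80),
    ("South", 40),
    ("East", 300),
]


def aggregate_by_region(rows):
    """Group amounts by region and compute sum, count, and max per region."""
    grouped = defaultdict(list)
    for region, amount in rows:
        grouped[region].append(amount)
    return {
        region: {
            "total_TotalAmount": sum(amounts),
            "count_TotalAmount": len(amounts),
            "max_TotalAmount": max(amounts),
        }
        for region, amounts in grouped.items()
    }


features = aggregate_by_region(rows)
print(features["North"])
```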

In [None]:
df1 = df1.groupby('Region').assign(total_TotalAmount=df1.TotalAmount.sum(),
                                   count_TotalAmount=df1.TotalAmount.count(),
                                   max_TotalAmount=df1.TotalAmount.max())
df1

<p style = 'font-size:18px;font-family:Arial;'><b>3.3 Load the sales data for week 2</b></p>

In [None]:
sales_new_dt = OrderedDict(CustomerID=INTEGER, Quantity=INTEGER, TotalAmount=INTEGER, Region=VARCHAR(20))
df2 = read_csv(table_name="week2_sales", 
               filepath=r"data/sales_week2.csv", 
               types=sales_new_dt)
df2.head(3)

<p style = 'font-size:18px;font-family:Arial;'><b>3.4 Perform Data Transformation</b></p>
<p style = 'font-size:16px;font-family:Arial;'><b>Transformation Details:</b>    
In this step, we perform the following transformations on the sales data for week 2:
<br>
<b> 1. Group by 'Region' :</b> The data is grouped by the 'Region' column to enable region-level aggregations.
<br>
<b> 2. Aggregate Features :</b> 
<ul>
  <li><code>total_TotalAmount</code>: Calculates the sum of <code>TotalAmount</code> for each region, representing the total sales amount per region.</li>
  <li><code>count_TotalAmount</code>: Counts the number of sales transactions for each region.</li>
  <li><code>max_TotalAmount</code>: Finds the maximum <code>TotalAmount</code> for a single transaction in each region.</li>
</ul>
<b> 3. Tentative Feature Engineering :</b> 
<ul>
  <li><code>tentative_incr_in_max_TotalAmount</code>: Adds 100 to <code>max_TotalAmount</code> for each region, simulating a scenario where the maximum sales amount is projected to increase.</li>
  <li><code>tentative_incr_in_total_TotalAmount</code>: Adds 150 to <code>total_TotalAmount</code> for each region, simulating a projected increase in total sales.</li>
    </ul>
<p style = 'font-size:16px;font-family:Arial;'>These engineered features are useful for advanced analytics, scenario planning, and as managed features in the feature store.</p>
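
<p style = 'font-size:16px;font-family:Arial;'>The two tentative features are fixed offsets applied to the per-region aggregates. A minimal pure-Python sketch, assuming a hypothetical aggregate row for one region (the values and the helper <code>add_tentative_features</code> are illustrative only):</p>

```python
def add_tentative_features(region_features, max_increment=100, total_increment=150):
    """Return a copy of each region's aggregates with the two tentative features added."""
    result = {}
    for region, feats in region_features.items():
        enriched = dict(feats)  # copy so the input is left unchanged
        enriched["tentative_incr_in_max_TotalAmount"] = feats["max_TotalAmount"] + max_increment
        enriched["tentative_incr_in_total_TotalAmount"] = feats["total_TotalAmount"] + total_increment
        result[region] = enriched
    return result


# Hypothetical week 2 aggregates for a single region.
week2 = {
    "North": {"total_TotalAmount": 900, "count_TotalAmount": 4, "max_TotalAmount": 400},
}

week2_with_tentative = add_tentative_features(week2)
print(week2_with_tentative["North"])
```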

In [None]:
df2 = df2.groupby('Region').assign(total_TotalAmount=df2.TotalAmount.sum(),
                                   count_TotalAmount=df2.TotalAmount.count(),
                                   max_TotalAmount=df2.TotalAmount.max())
df2 = df2.assign(tentative_incr_in_max_TotalAmount=df2.max_TotalAmount+100, 
                tentative_incr_in_total_TotalAmount=df2.total_TotalAmount+150)
df2

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial;'><b>4. Store the data transformations</b></p>
<p style = 'font-size:16px;font-family:Arial;'>Here we store the transformations as views. Because a view stores the query rather than its result, the transformation steps stay the same even when the underlying data changes.</p>
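
<p style = 'font-size:16px;font-family:Arial;'>The idea that a stored transformation keeps working as the data changes can be illustrated with any SQL engine. The sketch below uses an in-memory SQLite database from the Python standard library as a stand-in for Vantage; the table <code>sales</code>, the view <code>sales_view</code>, and the sample rows are hypothetical.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (Region TEXT, TotalAmount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 100), ("North", 250), ("South", 80)],
)

# The view stores the aggregation query, not its result.
conn.execute(
    """
    CREATE VIEW sales_view AS
    SELECT Region,
           SUM(TotalAmount)   AS total_TotalAmount,
           COUNT(TotalAmount) AS count_TotalAmount,
           MAX(TotalAmount)   AS max_TotalAmount
    FROM sales
    GROUP BY Region
    """
)


def region_row(region):
    """Fetch (total, count, max) for one region from the view."""
    return conn.execute(
        "SELECT total_TotalAmount, count_TotalAmount, max_TotalAmount "
        "FROM sales_view WHERE Region = ?",
        (region,),
    ).fetchone()


before = region_row("North")
conn.execute("INSERT INTO sales VALUES ('North', 500)")  # underlying data changes
after = region_row("North")  # same view, recomputed on the new data
print(before, after)
```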

In [None]:
sales_df1 = df1.create_view("week1_sales_view")
sales_df1

In [None]:
sales_df2 = df2.create_view("week2_sales_view")
sales_df2

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial;'><b>5. Ingest the features for current week</b></p>
<p style = 'font-size:16px;font-family:Arial;'>
<ul>
  <li>Store the values of count_TotalAmount, max_TotalAmount, total_TotalAmount for every region.</li>
  <li>Run the FeatureProcess</li>
</ul>

<p style = 'font-size:18px;font-family:Arial;'><b>5.1 Create the FeatureProcess and run it</b></p>

In [None]:
fp1 = FeatureProcess(
    repo="analytics",
    data_domain="sales_transactions",
    object=sales_df1,
    entity="Region",
    features=["count_TotalAmount", "max_TotalAmount", "total_TotalAmount"]
)
fp1.run()

<p style = 'font-size:18px;font-family:Arial;'><b>5.2 See the mind_map for Feature Store</b></p>
<p style = 'font-size:16px;font-family:Arial;'>We ingested three features (<code>count_TotalAmount</code>, <code>max_TotalAmount</code>, and <code>total_TotalAmount</code>) from a single feature process. This shows how related features can be managed and tracked together within the feature store while keeping their lineage to the originating process.</p>
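
<p style = 'font-size:16px;font-family:Arial;'>Conceptually, lineage means that each feature can be traced back to the process that produced it. The toy sketch below shows that idea with an in-memory registry. It is purely illustrative: <code>ProcessRecord</code> and <code>LineageRegistry</code> are hypothetical names and do not reflect how teradataml stores lineage internally.</p>

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ProcessRecord:
    """One hypothetical feature process: its source, entity, and features."""
    source: str
    entity: str
    features: list
    process_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class LineageRegistry:
    """Toy index from feature name to the processes that produced it."""

    def __init__(self):
        self._by_feature = {}

    def register(self, record):
        for name in record.features:
            self._by_feature.setdefault(name, []).append(record)
        return record.process_id

    def processes_for(self, feature_name):
        return [r.process_id for r in self._by_feature.get(feature_name, [])]


registry = LineageRegistry()
pid = registry.register(
    ProcessRecord(
        source="week1_sales_view",
        entity="Region",
        features=["count_TotalAmount", "max_TotalAmount", "total_TotalAmount"],
    )
)
print(registry.processes_for("max_TotalAmount") == [pid])
```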

In [None]:
fs.mind_map()

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial;'><b>6. Exploration</b></p>
<p style = 'font-size:18px;font-family:Arial;'><b>6.1 Explore properties</b></p>
<p style = 'font-size:16px;font-family:Arial;'><b>process_id:</b> The <code>process_id</code> property provides a unique identifier for this feature process run.</p>

In [None]:
fp1.process_id

<p style = 'font-size:16px;font-family:Arial;'><b>entity:</b> The <code>entity</code> property indicates the primary key or business entity for which the features were ingested. </p>

In [None]:
fp1.entity

<p style = 'font-size:16px;font-family:Arial;'><b>features:</b> The <code>features</code> property lists all the features that were ingested by this feature process. </p>

In [None]:
fp1.features

<p style = 'font-size:16px;font-family:Arial;'><b>status:</b> The <code>status</code> property displays the current execution state of the feature process (e.g., running, completed, failed). </p>

In [None]:
fp1.status

<p style = 'font-size:18px;font-family:Arial;'><b>6.2 Cross-verify the week 1 ingested data</b></p>
<p style = 'font-size:16px;font-family:Arial;'><b>Note :</b> To build real datasets on the ingested features, users should refer to <code>DatasetCatalog.build_dataset()</code>. The same applies to time series datasets as well.</p>

In [None]:
df_fp1 = fs.get_data(process_id=fp1.process_id)
df_fp1

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial;'><b>7. Ingest the features for week2</b></p>
<p style = 'font-size:18px;font-family:Arial;'><b>7.1 Create the FeatureProcess and run it for the North region only</b></p>
<p style = 'font-size:16px;font-family:Arial;'>
<ul>
  <li>Store the values of count_TotalAmount, max_TotalAmount, total_TotalAmount, tentative_incr_in_max_TotalAmount, and tentative_incr_in_total_TotalAmount.</li>
  <li>Run it only for rows where Region is 'North'.</li>
</ul>

In [None]:
fp2 = FeatureProcess(
    repo="analytics",
    data_domain="sales_transactions",
    object=sales_df2,
    entity="Region",
    features=["count_TotalAmount", "max_TotalAmount", "total_TotalAmount", 
              "tentative_incr_in_max_TotalAmount", "tentative_incr_in_total_TotalAmount"]
)
fp2.run(filters=["Region='North'"])

<p style = 'font-size:18px;font-family:Arial;'><b>7.2 See the mind_map for Feature Store</b></p>

In [None]:
fs.mind_map()

<p style = 'font-size:18px;font-family:Arial;'><b>7.3 Cross-verify the week 2 data ingested for the North region only</b></p>
<p style = 'font-size:16px;font-family:Arial;'><b>Note :</b> To build real datasets on the ingested features, users should refer to <code>DatasetCatalog.build_dataset()</code>. The same applies to time series datasets as well.</p>

In [None]:
df_fp2 = fs.get_data(process_id=fp2.process_id)
df_fp2

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial;'><b>8. Ingest the features at a specific time</b></p>
<p style = 'font-size:18px;font-family:Arial;'><b>8.1 Get the feature version table</b></p>
<p style = 'font-size:16px;font-family:Arial;'>Create the FeatureCatalog object</p>

In [None]:
fc = FeatureCatalog(repo="analytics",
                    data_domain="sales_transactions")
fc

<p style = 'font-size:16px;font-family:Arial;'>Check the feature versions</p>

In [None]:
f_ver = fc.list_feature_versions()
f_ver

<p style = 'font-size:16px;font-family:Arial;'>Get the data from feature table</p>

In [None]:
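# The table name below comes from the original demo run (see the feature versions output above).
# It will likely differ in your environment; replace it with the name shown there.
feat_df = DataFrame(in_schema('analytics', 'FS_T_74f39696_8ca1_2744_5460_b3d357904b4c'))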
feat_df

<p style = 'font-size:18px;font-family:Arial;'><b>8.2 Ingest the features at a specific time</b></p>
<p style = 'font-size:16px;font-family:Arial;'>
<ul>
    <li>Re-run the second feature process, referenced by its <code>process_id</code>, with <code>as_of</code> set to a specific timestamp.</li>
</ul></p>
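
<p style = 'font-size:16px;font-family:Arial;'>The timestamp passed to <code>as_of</code> in the next cell has the shape <code>YYYY-MM-DD HH:MM:SS.ffffff+00:00</code>. The following is a minimal sketch for producing a string of that shape from a timezone-aware <code>datetime</code>. The helper <code>to_as_of_string</code> is hypothetical, and whether <code>as_of</code> accepts other formats is not covered here.</p>

```python
from datetime import datetime, timezone


def to_as_of_string(moment):
    """Format a timezone-aware datetime as 'YYYY-MM-DD HH:MM:SS.ffffff+00:00' in UTC."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    return moment.astimezone(timezone.utc).isoformat(sep=" ", timespec="microseconds")


example = to_as_of_string(datetime(2025, 9, 5, 7, 21, 59, tzinfo=timezone.utc))
print(example)
```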

In [None]:
fp3 = FeatureProcess(
    repo="analytics",
    data_domain="sales_transactions",
    object=fp2.process_id
)
fp3.run(as_of='2025-09-05 07:21:59.000000+00:00')

<p style = 'font-size:18px;font-family:Arial;'><b>8.3 See the mind_map for Feature Store</b></p>

In [None]:
fs.mind_map()

<p style = 'font-size:18px;font-family:Arial;'><b>8.4 Cross-verify the data ingested at the specific time</b></p>
<p style = 'font-size:16px;font-family:Arial;'><b>Note :</b> To build real datasets on the ingested features, users should refer to <code>DatasetCatalog.build_dataset()</code>. The same applies to time series datasets as well.</p>

In [None]:
df_fp3 = fs.get_data(process_id=fp3.process_id)
df_fp3

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial;'><b>9. Explore the FeatureCatalog</b></p>
<p style = 'font-size:18px;font-family:Arial;'><b>9.1 Explore FeatureCatalog Properties</b></p>
<p style = 'font-size:16px;font-family:Arial;'><b>features:</b> The <code>features</code> property of the feature catalog lists all features currently available in the catalog.</p>

In [None]:
fc.features

<p style = 'font-size:16px;font-family:Arial;'><b>entities:</b> The <code>entities</code> property of the feature catalog lists all entities currently available in the catalog.</p>

In [None]:
fc.entities

<p style = 'font-size:16px;font-family:Arial;'><b>data_domain:</b> The <code>data_domain</code> property shows the business domain associated with the feature catalog.</p>

In [None]:
fc.data_domain

<p style = 'font-size:18px;font-family:Arial;'><b>9.2 Explore FeatureCatalog Methods</b></p>
<p style = 'font-size:16px;font-family:Arial;'><b>List Features</b></p>

In [None]:
fc.list_features()

<p style = 'font-size:16px;font-family:Arial;'><b>Archive Features</b></p>

In [None]:
fc.archive_features(features='total_TotalAmount')

In [None]:
fc.list_features()

<p style = 'font-size:16px;font-family:Arial;'><b>Delete Features</b></p>

In [None]:
fc.delete_features(features='total_TotalAmount')

In [None]:
fc.list_features()

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial;'><b>10. Explore the FeatureStore</b></p>
<p style = 'font-size:16px;font-family:Arial;'><b>List feature_catalogs</b></p>

In [None]:
fs.list_feature_catalogs()

<p style = 'font-size:16px;font-family:Arial;'><b>List Entities</b></p>

In [None]:
fs.list_entities()

<p style = 'font-size:16px;font-family:Arial;'><b>List feature_processes</b></p>

In [None]:
fs.list_feature_processes()

<p style = 'font-size:16px;font-family:Arial;'><b>List feature_runs</b></p>

In [None]:
fs.list_feature_runs()

<p style = 'font-size:16px;font-family:Arial;'><b>List Features</b></p>

In [None]:
fs.list_features()

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial;'><b>11. Cleanup</b></p>
<p style = 'font-size:18px;font-family:Arial;'> <b>Work Tables and Views </b></p>

In [None]:
db_drop_view('week1_sales_view')

In [None]:
db_drop_view('week2_sales_view')

In [None]:
db_drop_table('week1_sales')

In [None]:
db_drop_table('week2_sales')

In [None]:
remove_context()

<p style = 'font-size:18px;font-family:Arial;'><b>11.1 Delete the Feature Store</b></p>

In [None]:
%run -i ../../UseCases/startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)

<p style = 'font-size:16px;font-family:Arial;'><b>Note :</b> This will drop the database if all objects are removed.</p>

In [None]:
fs = FeatureStore(repo="analytics", data_domain="sales_transactions")

In [None]:
fs.delete()

In [None]:
remove_context()

<footer style="padding-bottom:35px; background:#f9f9f9; border-bottom:3px solid #00233C">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2025. All Rights Reserved
        </div>
    </div>
</footer>