# Enterprise Feature Store - DataDomain

## Disclaimer
The sample code (“Sample Code”) provided is not covered by any Teradata agreements. Please be aware that Teradata has no control over the model responses to such sample code and such response may vary. The use of the model by Teradata is strictly for demonstration purposes and does not constitute any form of certification or endorsement. The sample code is provided “AS IS” and any express or implied warranties, including the implied warranties of merchantability and fitness for a particular purpose, are disclaimed. In no event shall Teradata be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) sustained by you or a third party, however caused and on any theory of liability, whether in contract, strict liability, or tort arising in any way out of the use of this sample code, even if advised of the possibility of such damage.

## Context 
**Multi-Domain Feature Store Demo**

This notebook demonstrates how to build and manage a feature store across multiple business domains (such as sales and marketing) using TeradataML. Key steps include:
- Loading, transforming, and aggregating sales and marketing data to engineer features relevant to each domain.
- Creating a centralized feature store repository to manage features, entities, and processes for different data domains.
- Ingesting features into separate data domains for robust governance, traceability, and reusability.
- Building datasets and exploring the feature landscape for scalable, collaborative machine learning and analytics.

The workflow provides a practical example of operationalizing feature engineering and feature management in a modern enterprise environment with multiple subject areas.

## 1. Import the required libraries

In [3]:
import os
from teradataml import create_context, DataFrame, DataSource, Entity, Feature, FeatureGroup, FeatureStore, \
FeatureType, FeatureStatus, load_example_data, remove_context, FeatureProcess, FeatureCatalog, in_schema, \
DatasetCatalog, db_drop_table, db_drop_view, DataDomain, read_csv
from getpass import getpass
from collections import OrderedDict
from teradatasqlalchemy import VARCHAR, INTEGER

## 2. Connect to Vantage with Admin user 

Connecting to Vantage with an Admin user is required for initial setup tasks such as creating the feature store, configuring storage, and granting permissions to other users. These operations typically require elevated privileges.

In [6]:
context=create_context(config_file='admin_config_file.env')

## 3. Setup a Feature Store Repository

### 3.1. Create the FeatureStore

In [10]:
fs = FeatureStore(repo="enterprise_marketing_sales")

Repo enterprise_marketing_sales does not exist. Run FeatureStore.setup() to create the repo and setup FeatureStore.


### 3.2. Setup the FeatureStore

In [13]:
fs.setup()

True

### 3.3. Grant access to the user

**Note:** 
Granting read/write access to a user is necessary so they can create, modify, and manage features and metadata within the feature store. This ensures the specified user has the required permissions to work with the feature store objects. If needed, you can later revoke these rights using `fs.revoke.read_write(username)`.

In [18]:
username = getpass(prompt = 'username: ')
fs.grant.read_write(username)

username:  ········


True

## 4. Connect to Vantage with non-admin user

### 4.1. Remove context with Admin user 

In [20]:
remove_context()

True

### 4.2. Create context with non-admin user

In [22]:
context=create_context(config_file='non_admin_config_file.env')

### 4.3. Create Feature Store object with non-admin user to ingest feature values. 

In [25]:
fs = FeatureStore(repo="enterprise_marketing_sales")

FeatureStore is ready to use.


## 5. Get data for the demo

### 5.1. Load the sales_data

In [28]:
sales2_dt = OrderedDict(CustomerID=VARCHAR(10), Sales_Q1=INTEGER, Sales_Q2=INTEGER, Region=VARCHAR(20), Loayalty_Score=INTEGER, Channel=VARCHAR(20))
sales = read_csv(filepath=r"../data/sales_data.csv", 
                 table_name="sales2_data", 
                 types=sales2_dt)
sales.head(3)



CustomerID,Sales_Q1,Sales_Q2,Region,Loayalty_Score,Channel
S003,115,212,East,74,Retail
S002,153,181,North,89,Online
S001,138,186,West,97,Retail


### 5.2. Perform Data Transformation

**Transformation Details:**
In this step, we aggregate the sales data by the 'Region' column. For each region, we calculate:
- The mean of the 'Loayalty_Score' to understand average customer loyalty per region.
- The count of 'Channel' to determine the number of sales records (rows with a non-null 'Channel') in each region.
These aggregated features provide insights into regional sales performance and customer engagement.

In [31]:
sales_agg = sales.groupby("Region").agg({"Loayalty_Score": "mean", "Channel": "count"})
sales_agg



Region,mean_Loayalty_Score,count_Channel
West,90.42857142857144,7
East,89.0,8
South,88.57142857142857,7
North,82.125,8
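
To make the aggregation concrete, here is a minimal plain-Python sketch of what a group-by with a mean and a count computes. It does not use teradataml. The first three rows are copied from the `sales.head(3)` output above, the fourth row (`S999`) is hypothetical and added only so that one region has more than one record, and the helper `aggregate_by_region` is an illustrative name, not a library function.

```python
from collections import defaultdict

# First three rows come from the sales.head(3) output shown earlier.
# The last row is HYPOTHETICAL, added so that "East" has two records.
rows = [
    {"CustomerID": "S003", "Region": "East", "Loayalty_Score": 74, "Channel": "Retail"},
    {"CustomerID": "S002", "Region": "North", "Loayalty_Score": 89, "Channel": "Online"},
    {"CustomerID": "S001", "Region": "West", "Loayalty_Score": 97, "Channel": "Retail"},
    {"CustomerID": "S999", "Region": "East", "Loayalty_Score": 80, "Channel": "Online"},
]


def aggregate_by_region(records):
    """Group records by Region, then compute the mean score and the channel count."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["Region"]].append(rec)

    result = {}
    for region, members in groups.items():
        scores = [m["Loayalty_Score"] for m in members]
        result[region] = {
            # Arithmetic mean of the loyalty scores in this region.
            "mean_Loayalty_Score": sum(scores) / len(scores),
            # Number of records with a non-null Channel (assumed SQL-style COUNT).
            "count_Channel": sum(1 for m in members if m["Channel"] is not None),
        }
    return result


region_features = aggregate_by_region(rows)
for region, feats in sorted(region_features.items()):
    print(region, feats)
```

Each region ends up with exactly one row of features, which is why `Region` can later serve as the entity for ingestion.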


### 5.3. Load the marketing_data

In [35]:
marketing2_dt = OrderedDict(AccountID=VARCHAR(10), Campaign_1=INTEGER, Campaign_2=INTEGER, Region=VARCHAR(20),
                            Loayalty_Score=INTEGER, Engagement_Channel=VARCHAR(20))
marketing = read_csv(filepath=r"../data/marketing_data.csv", 
                     table_name="marketing2_data", 
                     types=marketing2_dt)
marketing.head(3)



AccountID,Campaign_1,Campaign_2,Region,Loayalty_Score,Engagement_Channel
M103,44,32,East,74,Event
M102,13,55,South,89,Social
M101,36,44,North,97,Email


### 5.4. Perform Data Transformation

**Transformation Details:**
In this step, we aggregate the marketing data by the 'Region' column. For each region, we calculate:
- The mean of the 'Loayalty_Score' to understand average customer loyalty per region.
- The count of 'Engagement_Channel' to determine the number of marketing engagement records in each region.
These aggregated features provide insights into regional marketing performance and customer engagement.

In [38]:
marketing_agg = marketing.groupby("Region").agg({"Loayalty_Score": "mean", "Engagement_Channel": "count"})
marketing_agg



Region,mean_Loayalty_Score,count_Engagement_Channel
West,89.42857142857143,7
East,89.0,8
South,88.42857142857143,7
North,83.125,8


## 6. Store the data transformations

We save each aggregation as a view so that the transformation logic itself is stored. Even if the underlying data changes, the transformation steps remain the same and are re-applied whenever the view is read.

In [41]:
sales_df = sales_agg.create_view('sales_data_view')
marketing_df = marketing_agg.create_view('marketing_data_view')
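
The benefit of storing a transformation as a view can be illustrated outside Vantage. The sketch below uses Python's built-in `sqlite3` module as a stand-in database (an assumption made purely for illustration; it says nothing about how teradataml creates views internally). The table name `sales`, the helper `read_view`, and the extra inserted row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (Region TEXT, Loayalty_Score INTEGER, Channel TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", 74, "Retail"), ("North", 89, "Online"), ("West", 97, "Retail")],
)

# The view stores the transformation (the query), not a copy of the results.
conn.execute(
    """
    CREATE VIEW sales_data_view AS
    SELECT Region,
           AVG(Loayalty_Score) AS mean_Loayalty_Score,
           COUNT(Channel)      AS count_Channel
    FROM sales
    GROUP BY Region
    """
)


def read_view(connection):
    """Return {Region: (mean_Loayalty_Score, count_Channel)} from the view."""
    cursor = connection.execute(
        "SELECT Region, mean_Loayalty_Score, count_Channel FROM sales_data_view"
    )
    return {region: (mean_score, count) for region, mean_score, count in cursor}


before = read_view(conn)

# HYPOTHETICAL new record arriving in the underlying table.
conn.execute("INSERT INTO sales VALUES ('East', 80, 'Online')")

after = read_view(conn)
print("East before:", before["East"])
print("East after: ", after["East"])
```

The view definition never changes, yet reading it after the insert reflects the new record. That is the property the notebook relies on when it ingests features from `sales_data_view` and `marketing_data_view`.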

## 7. Create FeatureStore with sales and marketing domain

In [43]:
fs_sales = FeatureStore("enterprise_marketing_sales", data_domain='sales')
fs_marketing = FeatureStore("enterprise_marketing_sales", data_domain='marketing')

FeatureStore is ready to use.
FeatureStore is ready to use.
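
Both domains live in the same repo and will hold a feature with the same name, `mean_Loayalty_Score`. The toy registry below sketches why that does not cause a collision: the data domain acts as part of the key. This is a conceptual illustration only, not the teradataml storage model. The `put` and `get` helpers are hypothetical, and the values are the West-region means from the aggregation outputs in sections 5.2 and 5.4.

```python
from typing import Dict, Tuple

# Key: (data_domain, feature_name, entity_value). Hypothetical layout.
FeatureKey = Tuple[str, str, str]
registry: Dict[FeatureKey, float] = {}


def put(data_domain: str, feature: str, entity_value: str, value: float) -> None:
    """Store a feature value under its data domain."""
    registry[(data_domain, feature, entity_value)] = value


def get(data_domain: str, feature: str, entity_value: str) -> float:
    """Look up a feature value within one data domain."""
    return registry[(data_domain, feature, entity_value)]


# Same feature name and same entity value, but different domains.
put("sales", "mean_Loayalty_Score", "West", 90.42857142857144)
put("marketing", "mean_Loayalty_Score", "West", 89.42857142857143)

print(get("sales", "mean_Loayalty_Score", "West"))
print(get("marketing", "mean_Loayalty_Score", "West"))
```

Because the domain is part of the lookup, each team can name and govern its features independently while still sharing one repository.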


## 8. Perform operation in sales domain

### 8.1. Ingest features from sales data

**Note**: Feature ingestion can also be performed using `FeatureStore.get_feature_process()`.

In [44]:
fp_sales = FeatureProcess(repo='enterprise_marketing_sales',
                          data_domain='sales',
                          entity='Region',
                          object=sales_df,
                          features=['mean_Loayalty_Score', 'count_Channel'],
                          description='Ingesting Features in sales DD')
fp_sales.run()

Process '7e950f67-8a27-11f0-93d4-b0dcef8381ea' started.
Process '7e950f67-8a27-11f0-93d4-b0dcef8381ea' completed.


True

### 8.2. Build dataset in sales domain

In [46]:
dc_sales = fs_sales.get_dataset_catalog()

dc_sales.build_dataset(entity='Region',
                       selected_features={'mean_Loayalty_Score': fp_sales.process_id,
                                          'count_Channel': fp_sales.process_id},
                       view_name="sales_dc_view",
                       description="Building dataset for sales")




Region,mean_Loayalty_Score,count_Channel
North,82.125,8
West,90.42857142857144,7
South,88.57142857142857,7
East,89.0,8
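
The `selected_features` argument maps each feature name to the process that produced it. As a rough mental model (an assumption for illustration, not the actual `build_dataset` implementation), the dataset can be thought of as a join of the selected feature columns on the entity. In the sketch below, the `ingested` dictionary layout and the `assemble_dataset` helper are hypothetical. The process id and values are copied from the outputs above.

```python
SALES_PROCESS_ID = "7e950f67-8a27-11f0-93d4-b0dcef8381ea"

# Hypothetical layout: (process_id, feature_name) -> {entity value: feature value}
ingested = {
    (SALES_PROCESS_ID, "mean_Loayalty_Score"): {
        "North": 82.125,
        "West": 90.42857142857144,
        "South": 88.57142857142857,
        "East": 89.0,
    },
    (SALES_PROCESS_ID, "count_Channel"): {"North": 8, "West": 7, "South": 7, "East": 8},
}


def assemble_dataset(selected_features, store, entity_name="Region"):
    """Join the selected feature columns on the entity value."""
    # Collect every entity value that appears in any selected feature.
    entity_values = set()
    for feature, process_id in selected_features.items():
        entity_values.update(store[(process_id, feature)])

    rows = []
    for entity in sorted(entity_values):
        row = {entity_name: entity}
        for feature, process_id in selected_features.items():
            # A missing value becomes None, similar to an outer join.
            row[feature] = store[(process_id, feature)].get(entity)
        rows.append(row)
    return rows


dataset = assemble_dataset(
    {"mean_Loayalty_Score": SALES_PROCESS_ID, "count_Channel": SALES_PROCESS_ID},
    ingested,
)
for row in dataset:
    print(row)
```

Tying each feature to a process id is what lets the feature store trace every dataset column back to the process that produced it.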


### 8.3. See the mind_map for FeatureStore in sales domain

We ingested two features, `mean_Loayalty_Score` and `count_Channel`, from a single feature process and built a dataset from them. The mind map shows how related features and datasets are managed and tracked together within the feature store, maintaining their lineage to the originating process.

In [49]:
fs_sales.mind_map()

## 9. Perform operation in marketing domain

### 9.1. Ingest features from marketing data

In [53]:
fp_mar = fs_marketing.get_feature_process(entity='Region',
                                          features=['mean_Loayalty_Score', 'count_Engagement_Channel'],
                                          object=marketing_df,
                                          description='Ingesting Features in marketing DD')
fp_mar.run()

Process 'a30efba1-8a27-11f0-9119-b0dcef8381ea' started.
Process 'a30efba1-8a27-11f0-9119-b0dcef8381ea' completed.


True

### 9.2. Build dataset in marketing domain

In [55]:
dc_mar = fs_marketing.get_dataset_catalog()

dc_mar.build_dataset(entity='Region',
                     selected_features={'mean_Loayalty_Score': fp_mar.process_id,
                                        'count_Engagement_Channel': fp_mar.process_id},
                     view_name='marketing_dc_view',
                     description='Building dataset for marketing')



Region,mean_Loayalty_Score,count_Engagement_Channel
South,88.42857142857143,7
North,83.125,8
East,89.0,8
West,89.42857142857143,7


### 9.3. See the mind_map for FeatureStore in marketing domain 

We ingested two features, `mean_Loayalty_Score` and `count_Engagement_Channel`, from a single feature process and built a dataset from them. The mind map shows how related features and datasets are managed and tracked together within the feature store, maintaining their lineage to the originating process.

In [59]:
fs_marketing.mind_map()

## 10. Explore DataDomain

### 10.1. Explore the `sales` data domain

#### 10.1.1. Create DataDomain object for sales 

In [63]:
sales_domain = DataDomain(repo='enterprise_marketing_sales',
                          data_domain='sales')
sales_domain

DataDomain(repo=enterprise_marketing_sales, data_domain=sales)

#### 10.1.2. Explore properties

##### 10.1.2.1. features

The `features` property of the `DataDomain` object lists all features currently available in the data domain.

In [67]:
sales_domain.features

[Feature(name=mean_Loayalty_Score), Feature(name=count_Channel)]

##### 10.1.2.2. entities

The `entities` property of the `DataDomain` object lists all entities currently available in the data domain.

In [70]:
sales_domain.entities

[Entity(name=Region)]

##### 10.1.2.3. processes

The `processes` property of the `DataDomain` object lists all feature processes currently available in the data domain.

In [73]:
sales_domain.processes

[FeatureProcess(repo=enterprise_marketing_sales, data_domain=sales, process_id=7e950f67-8a27-11f0-93d4-b0dcef8381ea)]

##### 10.1.2.4. datasets

The `datasets` property of the `DataDomain` object lists all datasets currently available in the data domain.

In [76]:
sales_domain.datasets

[Dataset(repo=enterprise_marketing_sales, id=f991cf24-0a32-43ad-a740-e9b8c858e306, data_domain=sales)]

### 10.2. Explore the `marketing` data domain

#### 10.2.1. Create DataDomain object for marketing

In [79]:
marketing_domain = DataDomain(repo='enterprise_marketing_sales',
                              data_domain='marketing')
marketing_domain

DataDomain(repo=enterprise_marketing_sales, data_domain=marketing)

#### 10.2.2. Explore properties

##### 10.2.2.1. features

The `features` property of the `DataDomain` object lists all features currently available in the data domain.

In [84]:
marketing_domain.features

[Feature(name=mean_Loayalty_Score), Feature(name=count_Engagement_Channel)]

##### 10.2.2.2. entities

The `entities` property of the `DataDomain` object lists all entities currently available in the data domain.

In [87]:
marketing_domain.entities

[Entity(name=Region)]

##### 10.2.2.3. processes

The `processes` property of the `DataDomain` object lists all feature processes currently available in the data domain.

In [90]:
marketing_domain.processes

[FeatureProcess(repo=enterprise_marketing_sales, data_domain=marketing, process_id=a30efba1-8a27-11f0-9119-b0dcef8381ea)]

##### 10.2.2.4. datasets

The `datasets` property of the `DataDomain` object lists all datasets currently available in the data domain.

In [93]:
marketing_domain.datasets

[Dataset(repo=enterprise_marketing_sales, id=2419f263-2de2-403f-b332-94013e1ff2cc, data_domain=marketing)]
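
Once both domains have been explored, a quick comparison of their feature names shows which features overlap and which are domain-specific. The sketch below works on plain lists of names copied from the `features` outputs above. How to extract a name from a `Feature` object is not shown in this notebook, so that step is left out rather than guessed.

```python
# Feature names copied from the sales_domain.features and
# marketing_domain.features outputs shown earlier.
sales_features = ["mean_Loayalty_Score", "count_Channel"]
marketing_features = ["mean_Loayalty_Score", "count_Engagement_Channel"]

# Names defined in both domains (each domain keeps its own values).
shared = sorted(set(sales_features) & set(marketing_features))
# Names that exist in only one domain.
only_sales = sorted(set(sales_features) - set(marketing_features))
only_marketing = sorted(set(marketing_features) - set(sales_features))

print("Shared names:  ", shared)
print("Sales only:    ", only_sales)
print("Marketing only:", only_marketing)
```

A shared name such as `mean_Loayalty_Score` does not imply shared values: the West-region mean differs between the two domains in the dataset outputs above.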

## 11. Cleanup 

### 11.1. Drop Views

In [96]:
db_drop_view('sales_data_view')

True

In [97]:
db_drop_view('marketing_data_view')

True

### 11.2. Drop Tables

In [99]:
db_drop_table('sales2_data')

True

In [100]:
db_drop_table('marketing2_data')

True

### 11.3. Remove Context

In [103]:
remove_context()

True

### 11.4. Delete the FeatureStore

In [105]:
context=create_context(config_file='admin_config_file.env')

**Note:** This will drop the database if all objects are removed.

In [107]:
fs = FeatureStore(repo="enterprise_marketing_sales")
fs.delete()

FeatureStore is ready to use.


The function removes Feature Store and drops the corresponding repo also. Are you sure you want to proceed? (Y/N):  y


True

In [130]:
remove_context()

True