<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Enterprise Feature Store Functions - Getting Started
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

### Disclaimer
The sample code (“Sample Code”) provided is not covered by any Teradata agreements. Please be aware that Teradata has no control over the model responses to such sample code and such response may vary. The use of the model by Teradata is strictly for demonstration purposes and does not constitute any form of certification or endorsement. The sample code is provided “AS IS” and any express or implied warranties, including the implied warranties of merchantability and fitness for a particular purpose, are disclaimed. In no event shall Teradata be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) sustained by you or a third party, however caused and on any theory of liability, whether in contract, strict liability, or tort arising in any way out of the use of this sample code, even if advised of the possibility of such damage.


<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>Introduction</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
Teradata Enterprise Feature Store (EFS) Functions are designed to handle feature management within the Vantage environment. While inspired by the syntax of Feast, Teradata EFS Functions are tailored specifically to Teradata Vantage, with an emphasis on efficient and robust data management and feature handling.
</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
 Teradata EFS Functions use Teradata DataFrames for feature management, in contrast to the pandas DataFrames used by Feast. With Teradata DataFrames, Features can be created and used from the Enterprise Feature Store (EFS) without extracting the data from Vantage. The EFS Functions are built to help Data Science teams manage features effectively. This notebook walks you through the capabilities of the EFS Functions and shows how they integrate with your data models and processes.

</p>





<div style = 'font-size:16px;font-family:Arial;color:#00233C'>
<p style = 'font-size:20px;font-family:Arial;color:#00233C'>
<b>Key Concepts of the Enterprise Feature Store (EFS) SDK
</b>
</p>
<p>
The Enterprise Feature Store (EFS) SDK is object-oriented throughout and focuses on intuitive interaction with feature stores. Central to this design are several core objects: Feature, Entity, DataSource, and FeatureGroup. Together, these objects support the management and use of features within your data ecosystem, with Teradata Vantage storing the metadata. Here's a closer look at each of these objects and their roles:
</p>
</div>

![EFS Concepts Diagram](./EFS_key_concepts.png)


<div style = 'font-size:16px;font-family:Arial;color:#00233C'>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'>
<b>Feature
</b>

A Feature represents a single, distinct piece of data that can be used in machine learning models. Features are the fundamental building blocks of the EFS, designed to encapsulate specific data types, validation rules, and metadata essential for downstream analysis and modeling.
</p>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'>
<b>Entity
</b>
</p>
<p>
An Entity serves as the anchor for one or more Features, grouping them by a common identifier. This shared identifier ensures that features within an entity relate to the same logical unit, such as a customer or transaction. The Entity concept ensures data consistency and simplifies the management of feature relationships.
</p>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'>
<b>Data Source
</b>
</p>

The DataSource object provides a flexible mapping between the results of a SQL query or DataFrame and Features. It describes how raw data from Teradata Vantage can be transformed into structured features ready for machine learning. This abstraction allows for the separation of data retrieval logic from feature management, promoting modularity and reuse.

<p style = 'font-size:18px;font-family:Arial;color:#00233C'>
<b>Feature Group
</b>
</p>

A FeatureGroup represents a collection of Features that are related by a common Entity and originate from the same DataSource. By grouping features this way, the EFS SDK encourages logical organization of features and simplifies batch operations like updates, retrievals, and analysis.


<p style = 'font-size:18px;font-family:Arial;color:#00233C'>
<b>Repository
</b>
</p>
A Repository is a logical workspace in which users manage their Feature Groups. Features can be promoted from one repository to another, which makes it possible to keep personal repositories (for example, a Lab), team repositories for collaboration, and a central production repository.
</div>
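
The relationships among these objects can be sketched in plain Python. The classes below are hypothetical stand-ins written only for illustration (the `Sketch` suffix marks them as such). They are not the teradataml classes, and they model names and links only, not storage or validation.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical stand-ins for illustration only; not the teradataml classes.

@dataclass
class FeatureSketch:
    name: str          # feature name, e.g. "PatientAge"
    column_name: str   # column that supplies the values

@dataclass
class EntitySketch:
    name: str
    columns: List[str]  # identifier column(s) shared by the features

@dataclass
class DataSourceSketch:
    name: str
    source: str         # SQL text that produces the rows

@dataclass
class FeatureGroupSketch:
    name: str
    entity: EntitySketch
    data_source: DataSourceSketch
    features: List[FeatureSketch] = field(default_factory=list)

    def select_list(self) -> List[str]:
        # Entity columns first, then one column per feature.
        return self.entity.columns + [f.column_name for f in self.features]

group = FeatureGroupSketch(
    name="PatientProfileObjs",
    entity=EntitySketch("PatientEntity", ["patient_id"]),
    data_source=DataSourceSketch("PatientProfileSource", "SELECT * FROM patient_profile"),
    features=[FeatureSketch("PatientAge", "age"), FeatureSketch("PatientBMI", "bmi")],
)
print(group.select_list())
```

A group ties one entity and one data source to a list of features, which is why the columns it needs can be derived from those three parts.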


<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>What You Will Do in This Notebook</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
This notebook is designed to guide you through a series of practical exercises that demonstrate the use of Teradata's Enterprise Feature Store capabilities. By the end of this tutorial, you will have a comprehensive understanding of how to manage and utilize feature stores for machine learning workflows. Here's what you'll learn:
</p>

<div style = 'font-size:16px;font-family:Arial;color:#00233C'>
<ol>
<li>
<b>Set up a Feature Store Repository and grant users access to it</b>
<ul>
<li>Learn how to set up a new feature store repository, using Feature Groups, which serves as the foundational environment for storing and managing your data features.</li>
<li>The owner of the FeatureStore can grant or revoke read-only, write-only, or read-and-write authorization for other users.</li>
</ul>
</li>

<li>
<b>Create and Register objects with FeatureStore</b>
<ul>
    <li>Discover <span style="color: #FF4500">how to create a Feature</span> from Teradata DataFrame and register the feature with the Teradata Enterprise Feature Store.</li>
    <li>Discover <span style="color: #FF4500">how to create an Entity</span> from Teradata DataFrame and register the Entity with the Teradata Enterprise Feature Store.</li>
    <li>Discover <span style="color: #FF4500">different ways to create DataSource</span> and register DataSource with the Teradata Enterprise Feature Store.</li>
    <li>Discover <span style="color: #FF4500">different ways to create FeatureGroup</span> and register FeatureGroup with the Teradata Enterprise Feature Store.</li>
</ul>
</li>

<li>
<b>Searching inside Teradata Enterprise Feature Store</b>
<ul>
<li>Explore methods to search in Features, Entities, DataSources and FeatureGroups. </li>
</ul>
</li>


<li>
<b>Modifying FeatureStore Objects</b>
<ul>
<li>Explore methods to modify existing features and other objects within your Enterprise Feature Store to adapt to changes in your data or analysis requirements. </li>
</ul>
</li>

<li>
<b>Combining multiple FeatureGroups.</b>
<ul>
<li>Explore a way to combine multiple FeatureGroups to a single FeatureGroup and store the combined FeatureGroup within your Enterprise Feature Store. </li>
</ul>
</li>


<li>
<b>Archive and Delete objects in FeatureStore.</b>
<ul>
<li>Explore methods to archive and delete different objects from FeatureStore. </li>
</ul>
</li>


<li>
<b>Creating Datasets and historic Datasets for ML models</b>
<ul>
<li>Teradata EFS approach to get Datasets or historic Datasets to feed your ML Model.</li>
</ul>
</li>


<li>
<b>Use Enterprise Feature Store with teradataml analytic functions.</b>
<ul>
<li>Apply Teradata EFS Functions to create datasets suitable for training a machine learning model, ensuring the data is clean, well-structured, and aligned with your model's requirements.</li>
</ul>
</li>


<li>
<b>Repository Governance</b>
<ul>
<li>Promote features from one repository to another repository.</li>
</ul>
</li>
</ol>
</div>

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>Connect to Vantage</b>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>You will be prompted to provide the password. Enter your password, press the Enter key, then use down arrow to go to next cell. Begin running steps with Shift + Enter keys.</p>

In [2]:
from getpass import getpass
# Feature is imported here because later cells call Feature(...) directly.
from teradataml import create_context, DataFrame, DataSource, Entity, Feature, FeatureGroup, FeatureStore, FeatureType, FeatureStatus, load_example_data, remove_context

# Connect to Vantage using create_context.
username = getpass(prompt = 'username: ')
password = getpass(prompt = 'password: ')
hostname = getpass(prompt = 'hostname: ')
context=create_context(host=hostname, username=username, password=password)

username:  ········
password:  ········
hostname:  ········



<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Getting Data for This Demo</b></p>
<p style = 'font-size:14px;font-family:Arial;color:#00233C'>
In this tutorial, we use the <span style="background-color: #eee; font-style: italic; "> load_example_data() </span> function provided by teradataml to load the demo data into Vantage.
</p>




<p style = 'font-size:14px;font-family:Arial;color:#00233C'>
This creates two tables in Vantage: <b>patient_profile</b> and <b>medical_readings</b>.</p>


In [3]:
load_example_data('dataframe', 'patient_profile')
load_example_data('dataframe', 'medical_readings')



In [4]:
patient_profile_df = DataFrame('patient_profile')
patient_profile_df

patient_id,record_timestamp,pregnancies,age,bmi,skin_thickness
17,2024-04-10 11:10:59.000000,7,31,29.6,0.0
34,2024-04-10 11:10:59.000000,10,45,27.6,31.0
13,2024-04-10 11:10:59.000000,1,59,30.1,23.0
53,2024-04-10 11:10:59.000000,8,58,33.7,34.0
11,2024-04-10 11:10:59.000000,10,34,38.0,0.0
51,2024-04-10 11:10:59.000000,1,26,24.2,15.0
32,2024-04-10 11:10:59.000000,3,22,24.8,11.0
15,2024-04-10 11:10:59.000000,7,32,30.0,0.0
99,2024-04-10 11:10:59.000000,1,31,49.7,51.0
0,2024-04-10 11:10:59.000000,6,50,33.6,35.0


In [5]:
medical_readings_df = DataFrame('medical_readings')
medical_readings_df

patient_id,record_timestamp,glucose,blood_pressure,insulin,diabetes_pedigree_function,outcome
19,2024-04-10 11:10:59.000000,115,70,96,0.529,1
59,2024-04-10 11:10:59.000000,105,64,142,0.173,0
38,2024-04-10 11:10:59.000000,90,68,0,0.503,1
78,2024-04-10 11:10:59.000000,131,0,0,0.27,1
36,2024-04-10 11:10:59.000000,138,76,0,0.42,0
97,2024-04-10 11:10:59.000000,71,48,76,0.323,0
57,2024-04-10 11:10:59.000000,100,88,110,0.962,0
80,2024-04-10 11:10:59.000000,113,44,0,0.14,0
40,2024-04-10 11:10:59.000000,180,64,70,0.271,0
61,2024-04-10 11:10:59.000000,133,72,0,0.27,1


<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>1. Set up a Feature Store Repository and grant different users access to it. </b>


<p style = 'font-size:14px;font-family:Arial;color:#00233C'>Let's first set up the FeatureStore with the repo name <span style="background-color: #eee; font-style: italic; "> LabRepoOne </span><i>. See the prerequisites for setting up FeatureStore in the teradataml user guide.</i>
</p>

In [7]:
# Before creating Repo, let's check existing FeatureStores.
FeatureStore.list_repos()

repos
FSStaging
ProdLabRepoOne
ProdRepoOne
AdmissionsStaging


In [8]:
# FeatureStore is not set up for repo LabRepoOne yet. Let's set it up.
fs = FeatureStore('LabRepoOne')
fs.setup(perm_size='10e8')
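
The `perm_size` value is a string in scientific notation. Assuming it is read as a byte count (an assumption made here; check the teradataml user guide for your version), the arithmetic works out as in this small sketch:

```python
# Assumption: perm_size is a byte count written in scientific notation.
perm_size = "10e8"

size_bytes = int(float(perm_size))   # 10 x 10^8 bytes
size_gb = size_bytes / 10**9         # decimal gigabytes

print(size_bytes, size_gb)
```

Under that assumption, `'10e8'` asks for about one decimal gigabyte of permanent space.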

In [9]:
# Let's verify by listing the repos.
FeatureStore.list_repos()

repos
ProdRepoOne
ProdLabRepoOne
FSStaging
LabRepoOne
AdmissionsStaging


<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Let's look at the ways to authorize access to the FeatureStore for user <span style="background-color: #eee; font-style: italic; "> user1 </span>. </p>

##### Grant Access

In [10]:
user='user1'

In [12]:
# Grant read-only access to user1. user1 can see all objects in the FeatureStore but cannot modify them.
fs.grant.read(user)

True

In [13]:
# Grant write-only access to user1. user1 can modify all objects in the FeatureStore but cannot see them.
fs.grant.write(user)

True

In [14]:
# Grant read and write to user1. user1 will get full access on all objects of FeatureStore.
fs.grant.read_write(user)

True

##### Revoke Access.

In [15]:
# Revoke read access from user1 on FeatureStore LabRepoOne.
fs.revoke.read(user)

True

In [16]:
# Revoke write access from user1 on FeatureStore LabRepoOne.
fs.revoke.write(user)

True

In [17]:
# Revoke read and write access from user1 on FeatureStore LabRepoOne.
fs.revoke.read_write(user)

True
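
One way to reason about the three access modes is as two independent flags per user, `read` and `write`, with read-and-write meaning both at once. The sketch below is a mental model only: the `grant` and `revoke` helpers and the `access` dictionary are hypothetical and say nothing about how Vantage stores privileges.

```python
from typing import Dict, Set

# Mental model only: access is tracked as a set of flags per user.
# This is an assumption for illustration, not how Vantage stores privileges.
access: Dict[str, Set[str]] = {}

def grant(user: str, *modes: str) -> None:
    # Add the given modes to whatever the user already holds.
    access.setdefault(user, set()).update(modes)

def revoke(user: str, *modes: str) -> None:
    # Remove the given modes; unknown users are left untouched.
    access.get(user, set()).difference_update(modes)

grant("user1", "read")              # read-only
grant("user1", "write")             # now read and write
after_grants = set(access["user1"])

revoke("user1", "read", "write")    # revoke both together
print(after_grants, access["user1"])
```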

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>2. Create and Register objects with FeatureStore </b>


### Create and Register Feature

#### Creating a Feature

In [18]:
# Creating Feature for Column 'age' from Teradata DataFrame 'patient_profile_df'.
f1 = Feature(name='PatientAge', 
             column=patient_profile_df.age, 
             feature_type=FeatureType.CONTINUOUS, 
             description=None, 
             tags=["PatientProfile", "PatientDetails"])

In [19]:
# Look at underlying properties.
f1.name, f1.column_name, f1.data_type, f1.description, f1.tags, f1.status


('PatientAge',
 'age',
 BIGINT(),
 None,
 ['PatientProfile', 'PatientDetails'],
 <FeatureStatus.ACTIVE: 1>)

#### Register Feature with FeatureStore `fs`

In [20]:
# Before registering the Feature, let's look at available Features.
fs.list_features()

name,column_name,description,tags,data_type,feature_type,status,creation_time,modified_time,group_name


In [21]:
# FeatureStore.apply() registers an object (Feature, Entity, DataSource or FeatureGroup) with the repo.
fs.apply(f1)

True

In [22]:
# Let's look at available Features again.
fs.list_features()

name,column_name,description,tags,data_type,feature_type,status,creation_time,modified_time,group_name
PatientAge,age,,"PatientProfile, PatientDetails",BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:41:31.294087,,


### Create and Register Entity

#### Creating an Entity

In [23]:
# Create entity for DataFrame 'patient_profile_df'
entity=Entity(name='PatientEntity', columns=patient_profile_df.patient_id)

In [24]:
# Look at Entity properties.
entity.name, entity.columns, entity.description

('PatientEntity', ['patient_id'], None)

#### Register Entity with FeatureStore `fs`

In [25]:
# Before even registering Entity, let's look at existing Entities.
fs.list_entities()

name,description,creation_time,modified_time,entity_column


In [26]:
# Register the Entity.
fs.apply(entity)

True

In [27]:
# Look at existing Entities after registering the Entity.
fs.list_entities()

name,description,creation_time,modified_time,entity_column
PatientEntity,,2024-10-18 09:42:05.465919,,patient_id


### Create and Register DataSource
- ##### A DataSource can be created either from a SQL query or from a Teradata DataFrame.
- ##### DataSource has an argument `timestamp_col_name`, which accepts the name of the column in the DataSource that indicates when the corresponding record was created. This is very helpful for building historic datasets.
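
To see why a record timestamp helps with historic datasets, consider selecting, for each entity, the latest record at or before a chosen point in time. The sketch below does this over an in-memory list with made-up rows. The helper name `as_of` is hypothetical, and this is not how EFS itself builds historic datasets.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Row = Tuple[int, datetime, float]  # (patient_id, record_timestamp, bmi)

# Made-up rows for illustration.
rows: List[Row] = [
    (17, datetime(2024, 4, 10, 11, 10, 59), 29.6),
    (17, datetime(2024, 6, 1, 9, 0, 0), 30.2),
    (34, datetime(2024, 4, 10, 11, 10, 59), 27.6),
]

def as_of(records: List[Row], point_in_time: datetime) -> Dict[int, Row]:
    """Return the latest record per patient_id at or before point_in_time."""
    latest: Dict[int, Row] = {}
    for rec in records:
        pid, ts, _ = rec
        if ts <= point_in_time and (pid not in latest or ts > latest[pid][1]):
            latest[pid] = rec
    return latest

snapshot = as_of(rows, datetime(2024, 5, 1))
print({pid: rec[2] for pid, rec in snapshot.items()})
```

Without a timestamp column there would be no way to tell which of the two rows for patient 17 was valid on a given date.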

#### Creating a DataSource from Teradata DataFrame

In [28]:
# Let's create DataSource from DataFrame `patient_profile_df`.
ds = DataSource(name='PatientProfileSource', source=patient_profile_df, timestamp_col_name='record_timestamp')

In [29]:
# Let's look at properties of DataSource.
ds.name, ds.source, ds.description, ds.timestamp_col_name

('PatientProfileSource',
 'select * from "patient_profile"',
 None,
 'record_timestamp')

#### Creating a DataSource from SQL Query

In [30]:
# Let's create DataSource from a SQL query on table `patient_profile`.
ds = DataSource(name='PatientProfileSource', source="SELECT * FROM PATIENT_PROFILE")

In [31]:
# Let's look at properties of DataSource.
ds.name, ds.source, ds.description

('PatientProfileSource', 'SELECT * FROM PATIENT_PROFILE', None)

#### Register DataSource with FeatureStore `fs`

In [32]:
# Before registering let's look at existing DataSources.
fs.list_data_sources()

name,description,timestamp_col_name,source,creation_time,modified_time


In [33]:
# Register DataSource with repo.
fs.apply(ds)

True

In [34]:
# Let's look at available DataSources after registration.
fs.list_data_sources()

name,description,timestamp_col_name,source,creation_time,modified_time
PatientProfileSource,,,SELECT * FROM PATIENT_PROFILE,2024-10-18 09:42:55.680803,


### Create and Register FeatureGroup
- ##### FeatureGroup can be created using Teradata DataFrame. 
- ##### FeatureGroup can be created using SQL Query.
- ##### FeatureGroup can be created using objects of Feature, Entity, DataSource. 

#### Creating a FeatureGroup from Teradata DataFrame

In [35]:
fg = FeatureGroup.from_DataFrame(
    name='PatientProfileDF', 
    entity_columns='patient_id', 
    df=patient_profile_df, 
    timestamp_col_name='record_timestamp'
)

In [36]:
# Let's look at Properties.
fg.features, fg.entity, fg.data_source, fg.description

([Feature(name=pregnancies),
  Feature(name=age),
  Feature(name=bmi),
  Feature(name=skin_thickness)],
 Entity(name=PatientProfileDF),
 DataSource(name=PatientProfileDF),
 None)

#### Creating a FeatureGroup from SQL Query

In [37]:
fg = FeatureGroup.from_query(
    name='PatientProfileQuery', 
    entity_columns='patient_id', 
    query="select * from patient_profile", 
    timestamp_col_name='record_timestamp'
)

In [38]:
# Let's look at Properties.
fg.features, fg.entity, fg.data_source, fg.description

([Feature(name=pregnancies),
  Feature(name=age),
  Feature(name=bmi),
  Feature(name=skin_thickness)],
 Entity(name=PatientProfileQuery),
 DataSource(name=PatientProfileQuery),
 None)

#### Creating a FeatureGroup using objects of Feature, Entity and DataSource

In [39]:
fg = FeatureGroup(name='PatientProfileObjs', features=[f1], entity=entity, data_source=ds)

In [40]:
# Let's look at Properties.
fg.features, fg.entity, fg.data_source, fg.description

([Feature(name=PatientAge)],
 Entity(name=PatientEntity),
 DataSource(name=PatientProfileSource),
 None)

#### Register FeatureGroup with FeatureStore `fs`

In [41]:
# Let's look at underlying FeatureGroups first.
fs.list_feature_groups()

name,description,data_source_name,entity_name,creation_time,modified_time


In [42]:
# Let's look at Available Features also. Notice: Feature is not associated with any group.
fs.list_features()

name,column_name,description,tags,data_type,feature_type,status,creation_time,modified_time,group_name
PatientAge,age,,"PatientProfile, PatientDetails",BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:41:31.294087,,


In [43]:
# Register FeatureGroup with FeatureStore.
fs.apply(fg)

True

In [44]:
# Let's look at FeatureGroups after registration.
fs.list_feature_groups()

name,description,data_source_name,entity_name,creation_time,modified_time
PatientProfileObjs,,PatientProfileSource,PatientEntity,2024-10-18 09:44:00.994420,


In [45]:
# Let's look at Available Features again. Notice: Feature is now associated with group.
fs.list_features()

name,column_name,description,tags,data_type,feature_type,status,creation_time,modified_time,group_name
PatientAge,age,,"PatientProfile, PatientDetails",BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:41:31.294087,2024-10-18 09:44:00.820498,PatientProfileObjs


<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>3. Searching inside Teradata Enterprise Feature Store </b>

- ##### How to search for Features: `FeatureStore.list_features()` returns Teradata DataFrame. All the filter options available on Teradata DataFrame can be used for searching. Look at example below.
- ##### How to search for Entities: `FeatureStore.list_entities()` returns Teradata DataFrame. All the filter options available on Teradata DataFrame can be used for searching.
- ##### How to search for DataSources: `FeatureStore.list_data_sources()` returns Teradata DataFrame. All the filter options available on Teradata DataFrame can be used for searching.
- ##### How to search for FeatureGroups: `FeatureStore.list_feature_groups()` returns Teradata DataFrame. All the filter options available on Teradata DataFrame can be used for searching. 

In [46]:
# Let's first create some more Features and register with repo. Then we can use same for searching.
f1=Feature(name='PatientBMI', column=patient_profile_df.bmi)
fs.apply(f1)

True

In [47]:
# First list the Features.
fs.list_features()

name,column_name,description,tags,data_type,feature_type,status,creation_time,modified_time,group_name
PatientBMI,bmi,,,FLOAT,CONTINUOUS,ACTIVE,2024-10-18 09:44:15.268962,,
PatientAge,age,,"PatientProfile, PatientDetails",BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:41:31.294087,2024-10-18 09:44:00.820498,PatientProfileObjs


In [49]:
# Filter the Features registered on day 18 of the month.
# Note: All the filter options available on Teradata DataFrame can be used. See the user guide for the available filter options.
features_df = fs.list_features()
features_df[features_df.creation_time.day_of_month()==18]

name,column_name,description,tags,data_type,feature_type,status,creation_time,modified_time,group_name
PatientBMI,bmi,,,FLOAT,CONTINUOUS,ACTIVE,2024-10-18 09:44:15.268962,,
PatientAge,age,,"PatientProfile, PatientDetails",BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:41:31.294087,2024-10-18 09:44:00.820498,PatientProfileObjs


<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>4. Modifying FeatureStore Objects </b>

#### Teradata EFS exposes the following APIs to get objects from FeatureStore
- ##### `FeatureStore.get_feature()` to get the `Feature` object from FeatureStore.
- ##### `FeatureStore.get_entity()` to get the `Entity` object from FeatureStore.
- ##### `FeatureStore.get_data_source()` to get the `DataSource` object from FeatureStore.
- ##### `FeatureStore.get_feature_group()` to get the `FeatureGroup` object from FeatureStore.
##### Use these APIs to get the corresponding object, modify the property you want to change, then register the object with the repository again using `FeatureStore.apply()`.
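
The get, modify, apply cycle behaves like an upsert keyed on the object name: in the outputs in this section, `creation_time` stays fixed while `modified_time` changes. A minimal in-memory sketch of that behavior follows. `TinyStore` is a hypothetical class written for this illustration and is not part of teradataml.

```python
from datetime import datetime
from typing import Dict, Optional

class TinyStore:
    """Hypothetical in-memory registry keyed by object name."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def apply(self, name: str, description: Optional[str] = None) -> bool:
        now = datetime.now()
        row = self._rows.get(name)
        if row is None:
            # First registration: set creation_time, leave modified_time empty.
            self._rows[name] = {"description": description,
                                "creation_time": now, "modified_time": None}
        else:
            # Re-registration: update in place and stamp modified_time.
            row["description"] = description
            row["modified_time"] = now
        return True

    def get(self, name: str) -> dict:
        # Return a copy, so local edits need apply() to be persisted.
        return dict(self._rows[name])

store = TinyStore()
store.apply("PatientAge")
created = store.get("PatientAge")["creation_time"]

feature = store.get("PatientAge")
feature["description"] = "Patient's age for patient profile."
store.apply("PatientAge", feature["description"])

print(store.get("PatientAge")["description"])
```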

#### Update description for Feature `PatientAge`

In [50]:
feature=fs.get_feature('PatientAge')
feature

Feature(name=PatientAge)

In [51]:
# Update Description and tags.
feature.description="Patient's age for patient profile."
feature.tags = ['PatientProfile']
fs.apply(feature)

True

In [52]:
# Let's look at features again. Look for description column. 
fs.list_features()

name,column_name,description,tags,data_type,feature_type,status,creation_time,modified_time,group_name
PatientBMI,bmi,,,FLOAT,CONTINUOUS,ACTIVE,2024-10-18 09:44:15.268962,,
PatientAge,age,Patient's age for patient profile.,PatientProfile,BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:41:31.294087,2024-10-18 09:45:00.961683,PatientProfileObjs


#### Update description for Entity `PatientEntity`

In [53]:
# Before updating description, let's look at Entities.
fs.list_entities()

name,description,creation_time,modified_time,entity_column
PatientEntity,,2024-10-18 09:42:05.465919,2024-10-18 09:44:00.863871,patient_id


In [54]:
# Get Entity from FeatureStore.
entity = fs.get_entity('PatientEntity')
entity

Entity(name=PatientEntity)

In [55]:
# Update Entity description.
entity.description = "Entity for Patient Profile."
fs.apply(entity)

True

In [56]:
# After updating description, let's look at Entities.
fs.list_entities()

name,description,creation_time,modified_time,entity_column
PatientEntity,Entity for Patient Profile.,2024-10-18 09:42:05.465919,2024-10-18 09:45:24.025992,patient_id


#### Update `timestamp_col_name` for DataSource `PatientProfileSource`

In [57]:
# Before updating time stamp column, let's look at DataSource.
fs.list_data_sources()

name,description,timestamp_col_name,source,creation_time,modified_time
PatientProfileSource,,,SELECT * FROM PATIENT_PROFILE,2024-10-18 09:42:55.680803,2024-10-18 09:44:00.972704


In [58]:
# First get the DataSource.
data_source = fs.get_data_source('PatientProfileSource')
data_source

DataSource(name=PatientProfileSource)

In [59]:
# Update time stamp column.
data_source.timestamp_col_name = 'record_timestamp'
fs.apply(data_source)

True

In [60]:
# After updating time stamp column, let's look at DataSource.
fs.list_data_sources()

name,description,timestamp_col_name,source,creation_time,modified_time
PatientProfileSource,,record_timestamp,SELECT * FROM PATIENT_PROFILE,2024-10-18 09:42:55.680803,2024-10-18 09:45:54.985598


#### Update `FeatureGroup` 
- ##### Note: Updating FeatureGroup will update the underlying Feature(s), DataSource, Entity.

In [61]:
# Before updating description, let's look at FeatureGroup.
fs.list_feature_groups()

name,description,data_source_name,entity_name,creation_time,modified_time
PatientProfileObjs,,PatientProfileSource,PatientEntity,2024-10-18 09:44:00.994420,


In [62]:
# Before updating description, let's look at DataSources.
fs.list_data_sources()

name,description,timestamp_col_name,source,creation_time,modified_time
PatientProfileSource,,record_timestamp,SELECT * FROM PATIENT_PROFILE,2024-10-18 09:42:55.680803,2024-10-18 09:45:54.985598


In [63]:
# Get FeatureGroup.
fg = fs.get_feature_group('PatientProfileObjs')

# Update DataSource description and FeatureGroup description.
fg.data_source.description = "Data Source for Patient Profile."
fg.description = "FeatureGroup for Patient Profile."

# Register FeatureGroup with FeatureStore.
fs.apply(fg)

True

In [64]:
# After updating description, let's look at FeatureGroup.
fs.list_feature_groups()

name,description,data_source_name,entity_name,creation_time,modified_time
PatientProfileObjs,FeatureGroup for Patient Profile.,PatientProfileSource,PatientEntity,2024-10-18 09:44:00.994420,2024-10-18 09:46:25.247166


In [65]:
# After updating description, let's look at DataSource.
fs.list_data_sources()

name,description,timestamp_col_name,source,creation_time,modified_time
PatientProfileSource,Data Source for Patient Profile.,record_timestamp,SELECT * FROM PATIENT_PROFILE,2024-10-18 09:42:55.680803,2024-10-18 09:46:25.205373


#### How to add a new Feature to an existing FeatureGroup, or change its Entity or DataSource
- ##### You can always modify a FeatureGroup with the `FeatureGroup.apply()` method.
- ##### Note: `FeatureGroup.apply()` does not write the change to the `repo`. Call `FeatureStore.apply()` afterwards to update the repo.
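
The two-step nature of this update can be pictured with plain Python containers. In the sketch below, `repo` and `local_group` are hypothetical stand-ins for the repository and for the FeatureGroup object held in the notebook session; no teradataml call is involved.

```python
from typing import Dict, List

# Hypothetical stand-ins: `repo` plays the repository, `local_group` the
# FeatureGroup object held in the notebook session.
repo: Dict[str, List[str]] = {"PatientProfileObjs": ["PatientAge"]}
local_group = {"name": "PatientProfileObjs",
               "features": list(repo["PatientProfileObjs"])}

# Step 1: change the local object (the role FeatureGroup.apply() plays).
local_group["features"].append("PatientBMI")
before_push = list(repo["PatientProfileObjs"])

# Step 2: write the object back (the role FeatureStore.apply() plays).
repo[local_group["name"]] = list(local_group["features"])
after_push = repo["PatientProfileObjs"]

print(before_push, after_push)
```

Until the second step runs, the repository still holds the old feature list.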

In [66]:
# Before adding Feature, let's look at available Features.
fs.list_features()

name,column_name,description,tags,data_type,feature_type,status,creation_time,modified_time,group_name
PatientBMI,bmi,,,FLOAT,CONTINUOUS,ACTIVE,2024-10-18 09:44:15.268962,,
PatientAge,age,Patient's age for patient profile.,PatientProfile,BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:41:31.294087,2024-10-18 09:46:25.053138,PatientProfileObjs


In [67]:
# Let's add a new Feature for FeatureGroup PatientProfileObjs
f2 = fs.get_feature('PatientBMI')
# First register the Feature with FeatureGroup.
fg.apply(f2)

True

In [68]:
# Then, Register FeatureGroup with FeatureStore.
fs.apply(fg)

True

In [69]:
# Let's look at Features.
fs.list_features()

name,column_name,description,tags,data_type,feature_type,status,creation_time,modified_time,group_name
PatientBMI,bmi,,,FLOAT,CONTINUOUS,ACTIVE,2024-10-18 09:44:15.268962,2024-10-18 09:47:56.250386,PatientProfileObjs
PatientAge,age,Patient's age for patient profile.,PatientProfile,BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:41:31.294087,2024-10-18 09:47:56.207898,PatientProfileObjs


#### How to remove a Feature from an existing FeatureGroup
- ##### You can use the `FeatureGroup.remove()` method to remove an object from a FeatureGroup.
- ##### Note: `FeatureGroup.remove()` does not write the change to the `repo`. Call `FeatureStore.apply()` afterwards to update the repo.

In [70]:
# Let's remove Feature `PatientBMI` from FeatureGroup `PatientProfileObjs`.
fg.remove(f2)

True

In [71]:
# Update FeatureGroup with FeatureStore.
fs.apply(fg)

True

In [72]:
# Let's look at Features.
fs.list_features()

name,column_name,description,tags,data_type,feature_type,status,creation_time,modified_time,group_name
PatientBMI,bmi,,,FLOAT,CONTINUOUS,ACTIVE,2024-10-18 09:44:15.268962,2024-10-18 09:47:56.250386,
PatientAge,age,Patient's age for patient profile.,PatientProfile,BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:41:31.294087,2024-10-18 09:48:31.080967,PatientProfileObjs


In [73]:
# Let's look at FeatureGroups.
fs.list_feature_groups()

name,description,data_source_name,entity_name,creation_time,modified_time
PatientProfileObjs,FeatureGroup for Patient Profile.,PatientProfileSource,PatientEntity,2024-10-18 09:44:00.994420,2024-10-18 09:48:31.277138


<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>5. Combining multiple FeatureGroups </b>

- ##### You can combine multiple FeatureGroups into a single group using the `+` operator. The result is a new FeatureGroup whose name joins the names of the combined FeatureGroups. For example, combining FeatureGroups `group1`, `group2`, and `group3` gives the name `group1_group2_group3`. You can change the name afterwards if you want to.
- ##### The new FeatureGroup has the Features from all the individual FeatureGroups.
- ##### When you combine multiple FeatureGroups, the `Entity` and the timestamp column (`timestamp_col_name`) must be the same for all individual FeatureGroups.
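
The rules above can be sketched with a small class that overloads `+`. `GroupSketch` is a hypothetical stand-in, not the teradataml `FeatureGroup`. The join it builds follows the shape of the combined DataSource query listed by `fs.list_data_sources()` later in this section, and the sketch does not handle feature names that appear in both groups.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class GroupSketch:
    """Hypothetical stand-in for a feature group; not the teradataml class."""
    name: str
    entity_column: str
    timestamp_col_name: str
    features: List[str]
    query: str

    def __add__(self, other: "GroupSketch") -> "GroupSketch":
        # Both groups must share the entity column and the timestamp column.
        if (self.entity_column, self.timestamp_col_name) != (
            other.entity_column, other.timestamp_col_name
        ):
            raise ValueError("entity and timestamp column must match")
        cols = [f"A.{self.entity_column}", f"A.{self.timestamp_col_name}"]
        cols += [f"A.{f}" for f in self.features]
        cols += [f"B.{f}" for f in other.features]
        query = (
            f"SELECT {', '.join(cols)} "
            f"FROM ({self.query}) AS A, ({other.query}) AS B "
            f"WHERE A.{self.entity_column} = B.{other.entity_column} "
            f"AND A.{self.timestamp_col_name} = B.{other.timestamp_col_name}"
        )
        return GroupSketch(
            name=f"{self.name}_{other.name}",   # names joined with "_"
            entity_column=self.entity_column,
            timestamp_col_name=self.timestamp_col_name,
            features=self.features + other.features,
            query=query,
        )

g1 = GroupSketch("group1", "patient_id", "record_timestamp", ["age"], "select * from t1")
g2 = GroupSketch("group2", "patient_id", "record_timestamp", ["glucose"], "select * from t2")
combined = g1 + g2
print(combined.name)
print(combined.query)
```

Joining on both the entity column and the timestamp column is what makes a shared `Entity` and timestamp column a precondition for combining.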

In [74]:
# Let's first create the individual FeatureGroups.
patient_profile_fg = FeatureGroup.from_DataFrame(
    name='PatientProfile', 
    df=patient_profile_df, 
    entity_columns='patient_id', 
    timestamp_col_name='record_timestamp'
)
medical_readings_fg = FeatureGroup.from_DataFrame(
    name='MedicalReadings', 
    df=medical_readings_df, 
    entity_columns='patient_id', 
    timestamp_col_name='record_timestamp'
)

In [75]:
# Look at Features first for FeatureGroups.
patient_profile_fg.features

[Feature(name=pregnancies),
 Feature(name=age),
 Feature(name=bmi),
 Feature(name=skin_thickness)]

In [76]:
medical_readings_fg.features

[Feature(name=glucose),
 Feature(name=blood_pressure),
 Feature(name=insulin),
 Feature(name=diabetes_pedigree_function),
 Feature(name=outcome)]

In [77]:
# Create new FeatureGroup.
combined_fg = patient_profile_fg + medical_readings_fg

In [78]:
# Look at new FeatureGroup name.
combined_fg.name

'PatientProfile_MedicalReadings'

In [79]:
# Look at combined features.
combined_fg.features

[Feature(name=pregnancies),
 Feature(name=age),
 Feature(name=bmi),
 Feature(name=skin_thickness),
 Feature(name=glucose),
 Feature(name=blood_pressure),
 Feature(name=insulin),
 Feature(name=diabetes_pedigree_function),
 Feature(name=outcome)]

In [80]:
# Push the individual FeatureGroup PatientProfile.
fs.apply(patient_profile_fg)

True

In [81]:
# Push the individual FeatureGroup MedicalReadings.
fs.apply(medical_readings_fg)

True

In [82]:
# Push the combined FeatureGroup.
fs.apply(combined_fg)

True

In [83]:
# Let's look at FeatureGroups.
fs.list_feature_groups()

name,description,data_source_name,entity_name,creation_time,modified_time
MedicalReadings,,MedicalReadings,MedicalReadings,2024-10-18 09:49:06.464922,
PatientProfile,,PatientProfile,PatientProfile,2024-10-18 09:49:03.150774,
PatientProfile_MedicalReadings,Combined FeatureGroup for groups PatientProfile and MedicalReadings.,PatientProfile_MedicalReadings,PatientProfile_MedicalReadings,2024-10-18 09:49:09.855764,
PatientProfileObjs,FeatureGroup for Patient Profile.,PatientProfileSource,PatientEntity,2024-10-18 09:44:00.994420,2024-10-18 09:48:31.277138


In [84]:
# Let's look at DataSources.
fs.list_data_sources()

name,description,timestamp_col_name,source,creation_time,modified_time
PatientProfile,,record_timestamp,"select * from ""patient_profile""",2024-10-18 09:49:03.109021,
PatientProfileSource,Data Source for Patient Profile.,record_timestamp,SELECT * FROM PATIENT_PROFILE,2024-10-18 09:42:55.680803,2024-10-18 09:48:31.255960
MedicalReadings,,record_timestamp,"select * from ""medical_readings""",2024-10-18 09:49:06.422885,
PatientProfile_MedicalReadings,Combined DataSource for PatientProfile and MedicalReadings,record_timestamp,"SELECT A.patient_id, A.record_timestamp, A.pregnancies, A.age, A.bmi, A.skin_thickness, B.glucose, B.blood_pressure, B.insulin, B.diabetes_pedigree_function, B.outcome  FROM (select * from ""patient_profile"") AS A, (select * from ""medical_readings"") AS B  WHERE A.patient_id = B.patient_id AND A.record_timestamp = B.record_timestamp",2024-10-18 09:49:09.813927,
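
The `source` stored for the combined DataSource is a join of the two individual sources on `patient_id` and `record_timestamp`. As a rough local illustration, the same query shape can be run against an in-memory SQLite database. This is only a sketch under stated assumptions: SQLite stands in for Vantage, the two matching rows reuse values for patients 19 and 59 from the dataset output in this notebook, the row with `patient_id` 9999 is made up to show an unmatched reading, and the `ORDER BY` is added only to make the result order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient_profile (
        patient_id INTEGER, record_timestamp TEXT,
        pregnancies INTEGER, age INTEGER, bmi REAL, skin_thickness REAL);
    CREATE TABLE medical_readings (
        patient_id INTEGER, record_timestamp TEXT,
        glucose INTEGER, blood_pressure INTEGER, insulin INTEGER,
        diabetes_pedigree_function REAL, outcome INTEGER);
""")

ts = "2024-04-10 11:10:59"
conn.executemany(
    "INSERT INTO patient_profile VALUES (?, ?, ?, ?, ?, ?)",
    [(19, ts, 1, 32, 34.6, 30.0),
     (59, ts, 0, 22, 41.5, 41.0)],
)
conn.executemany(
    "INSERT INTO medical_readings VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(19, ts, 115, 70, 96, 0.529, 1),
     (59, ts, 105, 64, 142, 0.173, 0),
     (9999, ts, 100, 60, 0, 0.1, 0)],  # hypothetical reading with no profile row
)

# Same join shape as the combined DataSource, plus ORDER BY for a stable order.
combined_sql = """
SELECT A.patient_id, A.record_timestamp, A.pregnancies, A.age, A.bmi, A.skin_thickness,
       B.glucose, B.blood_pressure, B.insulin, B.diabetes_pedigree_function, B.outcome
FROM (select * from "patient_profile") AS A, (select * from "medical_readings") AS B
WHERE A.patient_id = B.patient_id AND A.record_timestamp = B.record_timestamp
ORDER BY A.patient_id
"""
combined_rows = conn.execute(combined_sql).fetchall()
for row in combined_rows:
    print(row)
```

Because the join condition requires a match in both sources, the reading for the hypothetical patient 9999 does not appear in the combined result.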


In [85]:
# Let's look at Entities.
fs.list_entities()

name,description,creation_time,modified_time,entity_column
PatientProfile,,2024-10-18 09:49:03.019868,,patient_id
PatientEntity,Entity for Patient Profile.,2024-10-18 09:42:05.465919,2024-10-18 09:48:31.126957,patient_id
MedicalReadings,,2024-10-18 09:49:06.294178,,patient_id
PatientProfile_MedicalReadings,,2024-10-18 09:49:09.706350,,patient_id


In [86]:
# Let's look at Features after pushing all FeatureGroups.
fs.list_features()

name,column_name,description,tags,data_type,feature_type,status,creation_time,modified_time,group_name
PatientAge,age,Patient's age for patient profile.,PatientProfile,BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:41:31.294087,2024-10-18 09:48:31.080967,PatientProfileObjs
bmi,bmi,,,FLOAT,CONTINUOUS,ACTIVE,2024-10-18 09:49:02.936402,2024-10-18 09:49:09.552748,PatientProfile_MedicalReadings
bmi,bmi,,,FLOAT,CONTINUOUS,ACTIVE,2024-10-18 09:49:02.936402,2024-10-18 09:49:09.552748,PatientProfile
age,age,,,BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:49:02.894388,2024-10-18 09:49:09.530402,PatientProfile
skin_thickness,skin_thickness,,,FLOAT,CONTINUOUS,ACTIVE,2024-10-18 09:49:02.978316,2024-10-18 09:49:09.574718,PatientProfile_MedicalReadings
skin_thickness,skin_thickness,,,FLOAT,CONTINUOUS,ACTIVE,2024-10-18 09:49:02.978316,2024-10-18 09:49:09.574718,PatientProfile
blood_pressure,blood_pressure,,,BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:49:06.164910,2024-10-18 09:49:09.618582,PatientProfile_MedicalReadings
blood_pressure,blood_pressure,,,BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:49:06.164910,2024-10-18 09:49:09.618582,MedicalReadings
age,age,,,BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:49:02.894388,2024-10-18 09:49:09.530402,PatientProfile_MedicalReadings
PatientBMI,bmi,,,FLOAT,CONTINUOUS,ACTIVE,2024-10-18 09:44:15.268962,2024-10-18 09:47:56.250386,


In [87]:
# Filter the Features to understand the data better. Note that Feature `blood_pressure` is mapped to two FeatureGroups.
features_df = fs.list_features()
features_df = features_df[features_df.name == 'blood_pressure']
features_df

name,column_name,description,tags,data_type,feature_type,status,creation_time,modified_time,group_name
blood_pressure,blood_pressure,,,BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:49:06.164910,2024-10-18 09:49:09.618582,MedicalReadings
blood_pressure,blood_pressure,,,BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:49:06.164910,2024-10-18 09:49:09.618582,PatientProfile_MedicalReadings


<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>6. Archive and Delete objects in FeatureStore </b>

- ##### Archive and Delete are two different operations in FeatureStore.
- ##### Archive stages objects instead of removing them completely from FeatureStore. Archived objects are not part of any further processing.
  - ##### Use `FeatureStore.archive_feature()` to archive a Feature. `FeatureStore.list_features(archived=True)` lists archived Features.
  - ##### Use `FeatureStore.archive_entity()` to archive an Entity. `FeatureStore.list_entities(archived=True)` lists archived Entities.
  - ##### Use `FeatureStore.archive_data_source()` to archive a DataSource. `FeatureStore.list_data_sources(archived=True)` lists archived DataSources.
  - ##### Use `FeatureStore.archive_feature_group()` to archive a FeatureGroup. `FeatureStore.list_feature_groups(archived=True)` lists archived FeatureGroups.
- ##### Archiving a FeatureGroup also archives the corresponding Features, Entity and DataSource. Features that are also mapped to another FeatureGroup are not archived.
- ##### If a Feature is associated with a FeatureGroup, it cannot be archived. First remove the Feature from the FeatureGroup, then archive it.
- ##### If an Entity is associated with a FeatureGroup, it cannot be archived. First remove the Entity from the FeatureGroup, then archive it.
- ##### If a DataSource is associated with a FeatureGroup, it cannot be archived. First remove the DataSource from the FeatureGroup, then archive it.
- ##### Delete removes the archived objects.
  - ##### Use `FeatureStore.delete_feature()` to delete a Feature.
  - ##### Use `FeatureStore.delete_entity()` to delete an Entity.
  - ##### Use `FeatureStore.delete_data_source()` to delete a DataSource.
  - ##### Use `FeatureStore.delete_feature_group()` to delete a FeatureGroup.
- ##### Deleting a FeatureGroup <u><b> will not </b></u> remove the corresponding archived Features, archived Entities or archived DataSources. Delete these with the corresponding APIs.
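
The archive-then-delete rules above can be summarized with a toy model. `ToyFeatureStore` is a hypothetical in-memory class written for illustration only; it is not the teradataml `FeatureStore`. It covers Features only, and it assumes that deleting a Feature that has not been archived is rejected.

```python
class ToyFeatureStore:
    """Hypothetical in-memory model of the archive/delete rules (not teradataml)."""

    def __init__(self):
        self.active = {}       # Feature name -> set of FeatureGroup names
        self.archived = set()  # names of archived Features

    def add_feature(self, name, groups=()):
        self.active[name] = set(groups)

    def archive_feature(self, name):
        # A Feature that still belongs to a FeatureGroup cannot be archived.
        if self.active[name]:
            raise ValueError(f"Feature '{name}' is still associated with a FeatureGroup.")
        del self.active[name]
        self.archived.add(name)
        return True

    def delete_feature(self, name):
        # Assumption: only archived Features can be deleted.
        if name not in self.archived:
            raise ValueError(f"Feature '{name}' must be archived before it is deleted.")
        self.archived.remove(name)
        return True


store = ToyFeatureStore()
store.add_feature("PatientBMI")                      # not in any FeatureGroup
store.add_feature("age", groups=["PatientProfile"])  # still in a FeatureGroup

store.archive_feature("PatientBMI")
print(sorted(store.archived))   # archived, but not yet removed
store.delete_feature("PatientBMI")
print(sorted(store.archived))   # removed for good
```

In this model, calling `store.archive_feature("age")` raises a `ValueError` because `age` is still mapped to a FeatureGroup.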

#### Archive a Feature. Delete the archived Feature.

In [88]:
# Let's first look at Features which are not associated with any FeatureGroup.
features_df = fs.list_features()
features_df[features_df.group_name == None]

name,column_name,description,tags,data_type,feature_type,status,creation_time,modified_time,group_name
PatientBMI,bmi,,,FLOAT,CONTINUOUS,ACTIVE,2024-10-18 09:44:15.268962,2024-10-18 09:47:56.250386,


In [89]:
# Let's archive the Feature `PatientBMI`. Before archiving Feature, let's look at features which are archived.
fs.list_features(archived=True)

name,column_name,description,tags,data_type,feature_type,status,creation_time,modified_time,archived_time,group_name


In [90]:
# Archive it.
fs.archive_feature('PatientBMI')

Feature 'PatientBMI' is archived.


True

In [91]:
# Let's look at archived Features.
fs.list_features(archived=True)

name,column_name,description,tags,data_type,feature_type,status,creation_time,modified_time,archived_time,group_name
PatientBMI,bmi,,,FLOAT,CONTINUOUS,ACTIVE,2024-10-18 09:44:15.268962,2024-10-18 09:47:56.250386,2024-10-18 09:58:49.380000,PatientProfileObjs


In [92]:
# Delete the archived Feature.
fs.delete_feature('PatientBMI')

Feature 'PatientBMI' is deleted.


True

In [93]:
# Let's look at archived Features again.
fs.list_features(archived=True)

name,column_name,description,tags,data_type,feature_type,status,creation_time,modified_time,archived_time,group_name


#### Archive a FeatureGroup. Delete the archived objects.

In [94]:
# Before archiving group, let's look at Features for FeatureGroup. 
features_df[features_df.group_name == 'PatientProfile']

name,column_name,description,tags,data_type,feature_type,status,creation_time,modified_time,group_name
pregnancies,pregnancies,,,BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:49:02.829843,2024-10-18 09:49:09.484713,PatientProfile
bmi,bmi,,,FLOAT,CONTINUOUS,ACTIVE,2024-10-18 09:49:02.936402,2024-10-18 09:49:09.552748,PatientProfile
skin_thickness,skin_thickness,,,FLOAT,CONTINUOUS,ACTIVE,2024-10-18 09:49:02.978316,2024-10-18 09:49:09.574718,PatientProfile
age,age,,,BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:49:02.894388,2024-10-18 09:49:09.530402,PatientProfile


In [95]:
# Note: These Features are mapped to other FeatureGroups also.
features_df[(
    (features_df.name == 'skin_thickness') | (features_df.name == 'pregnancies') | (features_df.name == 'age') | (features_df.name == 'bmi')
)]

name,column_name,description,tags,data_type,feature_type,status,creation_time,modified_time,group_name
skin_thickness,skin_thickness,,,FLOAT,CONTINUOUS,ACTIVE,2024-10-18 09:49:02.978316,2024-10-18 09:49:09.574718,PatientProfile
pregnancies,pregnancies,,,BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:49:02.829843,2024-10-18 09:49:09.484713,PatientProfile
pregnancies,pregnancies,,,BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:49:02.829843,2024-10-18 09:49:09.484713,PatientProfile_MedicalReadings
bmi,bmi,,,FLOAT,CONTINUOUS,ACTIVE,2024-10-18 09:49:02.936402,2024-10-18 09:49:09.552748,PatientProfile_MedicalReadings
bmi,bmi,,,FLOAT,CONTINUOUS,ACTIVE,2024-10-18 09:49:02.936402,2024-10-18 09:49:09.552748,PatientProfile
skin_thickness,skin_thickness,,,FLOAT,CONTINUOUS,ACTIVE,2024-10-18 09:49:02.978316,2024-10-18 09:49:09.574718,PatientProfile_MedicalReadings
age,age,,,BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:49:02.894388,2024-10-18 09:49:09.530402,PatientProfile
age,age,,,BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:49:02.894388,2024-10-18 09:49:09.530402,PatientProfile_MedicalReadings


In [96]:
# Before archiving group, let's look at DataSources.
fs.list_data_sources()

name,description,timestamp_col_name,source,creation_time,modified_time
MedicalReadings,,record_timestamp,"select * from ""medical_readings""",2024-10-18 09:49:06.422885,
PatientProfile,,record_timestamp,"select * from ""patient_profile""",2024-10-18 09:49:03.109021,
PatientProfile_MedicalReadings,Combined DataSource for PatientProfile and MedicalReadings,record_timestamp,"SELECT A.patient_id, A.record_timestamp, A.pregnancies, A.age, A.bmi, A.skin_thickness, B.glucose, B.blood_pressure, B.insulin, B.diabetes_pedigree_function, B.outcome  FROM (select * from ""patient_profile"") AS A, (select * from ""medical_readings"") AS B  WHERE A.patient_id = B.patient_id AND A.record_timestamp = B.record_timestamp",2024-10-18 09:49:09.813927,
PatientProfileSource,Data Source for Patient Profile.,record_timestamp,SELECT * FROM PATIENT_PROFILE,2024-10-18 09:42:55.680803,2024-10-18 09:48:31.255960


In [97]:
# Let's look at archived DataSources.
fs.list_data_sources(archived=True)

name,description,timestamp_col_name,source,creation_time,modified_time,archived_time


In [98]:
# Before archiving group, let's look at Entities.
fs.list_entities()

name,description,creation_time,modified_time,entity_column
MedicalReadings,,2024-10-18 09:49:06.294178,,patient_id
PatientProfile,,2024-10-18 09:49:03.019868,,patient_id
PatientProfile_MedicalReadings,,2024-10-18 09:49:09.706350,,patient_id
PatientEntity,Entity for Patient Profile.,2024-10-18 09:42:05.465919,2024-10-18 09:48:31.126957,patient_id


In [99]:
# Let's look at archived Entities.
fs.list_entities(archived=True)

name,description,creation_time,modified_time,archived_time,entity_column


In [100]:
# Before archiving group, let's look at FeatureGroups. 
# Notice, FeatureGroup `PatientProfile` is associated with DataSource `PatientProfile` and Entity `PatientProfile`.
fs.list_feature_groups()

name,description,data_source_name,entity_name,creation_time,modified_time
MedicalReadings,,MedicalReadings,MedicalReadings,2024-10-18 09:49:06.464922,
PatientProfile,,PatientProfile,PatientProfile,2024-10-18 09:49:03.150774,
PatientProfile_MedicalReadings,Combined FeatureGroup for groups PatientProfile and MedicalReadings.,PatientProfile_MedicalReadings,PatientProfile_MedicalReadings,2024-10-18 09:49:09.855764,
PatientProfileObjs,FeatureGroup for Patient Profile.,PatientProfileSource,PatientEntity,2024-10-18 09:44:00.994420,2024-10-18 09:48:31.277138


In [101]:
# Let's look at archived FeatureGroups.
fs.list_feature_groups(archived=True)

name,description,data_source_name,entity_name,creation_time,modified_time,archived_time


In [102]:
# Let's archive FeatureGroup `PatientProfile`.
fs.archive_feature_group('PatientProfile')

FeatureGroup 'PatientProfile' is archived.


True

In [103]:
# Let's look at FeatureGroups after archive.
fs.list_feature_groups()

name,description,data_source_name,entity_name,creation_time,modified_time
MedicalReadings,,MedicalReadings,MedicalReadings,2024-10-18 09:49:06.464922,
PatientProfile_MedicalReadings,Combined FeatureGroup for groups PatientProfile and MedicalReadings.,PatientProfile_MedicalReadings,PatientProfile_MedicalReadings,2024-10-18 09:49:09.855764,
PatientProfileObjs,FeatureGroup for Patient Profile.,PatientProfileSource,PatientEntity,2024-10-18 09:44:00.994420,2024-10-18 09:48:31.277138


In [104]:
# Look at archived FeatureGroup.
fs.list_feature_groups(archived=True)

name,description,data_source_name,entity_name,creation_time,modified_time,archived_time
PatientProfile,,PatientProfile,PatientProfile,2024-10-18 09:49:03.150774,,2024-10-18 10:00:14.540000


In [105]:
# After archiving group, let's look at Features for FeatureGroup.
features_df[features_df.group_name == 'PatientProfile']

name,column_name,description,tags,data_type,feature_type,status,creation_time,modified_time,group_name


In [106]:
# Look at Features and observe group_name.
features_df[(
    (features_df.name == 'skin_thickness') | (features_df.name == 'pregnancies') | (features_df.name == 'age') | (features_df.name == 'bmi')
)]

name,column_name,description,tags,data_type,feature_type,status,creation_time,modified_time,group_name
pregnancies,pregnancies,,,BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:49:02.829843,2024-10-18 09:49:09.484713,PatientProfile_MedicalReadings
bmi,bmi,,,FLOAT,CONTINUOUS,ACTIVE,2024-10-18 09:49:02.936402,2024-10-18 09:49:09.552748,PatientProfile_MedicalReadings
skin_thickness,skin_thickness,,,FLOAT,CONTINUOUS,ACTIVE,2024-10-18 09:49:02.978316,2024-10-18 09:49:09.574718,PatientProfile_MedicalReadings
age,age,,,BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:49:02.894388,2024-10-18 09:49:09.530402,PatientProfile_MedicalReadings


In [108]:
# Let's look at archived Features. No Feature is archived because these Features are also mapped to another FeatureGroup.
fs.list_features(archived=True)

name,column_name,description,tags,data_type,feature_type,status,creation_time,modified_time,archived_time,group_name


In [109]:
# After archiving the group, let's look at DataSources.
# Notice, DataSource `PatientProfile`, which is associated with FeatureGroup `PatientProfile`, is also archived.
fs.list_data_sources()

name,description,timestamp_col_name,source,creation_time,modified_time
PatientProfileSource,Data Source for Patient Profile.,record_timestamp,SELECT * FROM PATIENT_PROFILE,2024-10-18 09:42:55.680803,2024-10-18 09:48:31.255960
MedicalReadings,,record_timestamp,"select * from ""medical_readings""",2024-10-18 09:49:06.422885,
PatientProfile_MedicalReadings,Combined DataSource for PatientProfile and MedicalReadings,record_timestamp,"SELECT A.patient_id, A.record_timestamp, A.pregnancies, A.age, A.bmi, A.skin_thickness, B.glucose, B.blood_pressure, B.insulin, B.diabetes_pedigree_function, B.outcome  FROM (select * from ""patient_profile"") AS A, (select * from ""medical_readings"") AS B  WHERE A.patient_id = B.patient_id AND A.record_timestamp = B.record_timestamp",2024-10-18 09:49:09.813927,


In [110]:
# Look at archived DataSources.
fs.list_data_sources(archived=True)

name,description,timestamp_col_name,source,creation_time,modified_time,archived_time
PatientProfile,,record_timestamp,"select * from ""patient_profile""",2024-10-18 09:49:03.109021,,2024-10-18 10:00:14.670000


In [111]:
# After archiving the group, let's look at Entities.
# Notice, Entity `PatientProfile`, which is associated with FeatureGroup `PatientProfile`, is also archived.
fs.list_entities()

name,description,creation_time,modified_time,entity_column
MedicalReadings,,2024-10-18 09:49:06.294178,,patient_id
PatientProfile_MedicalReadings,,2024-10-18 09:49:09.706350,,patient_id
PatientEntity,Entity for Patient Profile.,2024-10-18 09:42:05.465919,2024-10-18 09:48:31.126957,patient_id


In [112]:
fs.list_entities(archived=True)

name,description,creation_time,modified_time,archived_time,entity_column
PatientProfile,,2024-10-18 09:49:03.019868,,2024-10-18 10:00:14.620000,patient_id


In [113]:
# Delete archived FeatureGroup. 
fs.delete_feature_group('PatientProfile')

FeatureGroup 'PatientProfile' is deleted.


True

In [114]:
# Let's look at archived FeatureGroups after delete.
fs.list_feature_groups(archived=True)

name,description,data_source_name,entity_name,creation_time,modified_time,archived_time


In [115]:
# Delete archived DataSources.
fs.delete_data_source('PatientProfile')

DataSource 'PatientProfile' is deleted.


True

In [116]:
# Let's look at archived DataSources after delete.
fs.list_data_sources(archived=True)

name,description,timestamp_col_name,source,creation_time,modified_time,archived_time


In [117]:
# Delete archived Entities.
fs.delete_entity('PatientProfile')

Entity 'PatientProfile' is deleted.


True

In [118]:
# Let's look at archived Entities after delete.
fs.list_entities(archived=True)

name,description,creation_time,modified_time,archived_time,entity_column


<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>7. Creating Datasets and historic Datasets for ML models </b>


##### Since FeatureStore also stores the DataSource, you can retrieve a Teradata DataFrame from FeatureStore.
###### `FeatureStore.get_dataset()` gets a Teradata DataFrame from a FeatureGroup.

In [119]:
# Let's look at available FeatureGroups first.
fs.list_feature_groups()

name,description,data_source_name,entity_name,creation_time,modified_time
MedicalReadings,,MedicalReadings,MedicalReadings,2024-10-18 09:49:06.464922,
PatientProfile_MedicalReadings,Combined FeatureGroup for groups PatientProfile and MedicalReadings.,PatientProfile_MedicalReadings,PatientProfile_MedicalReadings,2024-10-18 09:49:09.855764,
PatientProfileObjs,FeatureGroup for Patient Profile.,PatientProfileSource,PatientEntity,2024-10-18 09:44:00.994420,2024-10-18 09:48:31.277138


In [120]:
# Get DataSet for FeatureGroup MedicalReadings.
fs.get_dataset('MedicalReadings')

patient_id,record_timestamp,outcome,blood_pressure,diabetes_pedigree_function,insulin,glucose
19,2024-04-10 11:10:59.000000,1,70,0.529,96,115
59,2024-04-10 11:10:59.000000,0,64,0.173,142,105
38,2024-04-10 11:10:59.000000,1,68,0.503,0,90
78,2024-04-10 11:10:59.000000,1,0,0.27,0,131
36,2024-04-10 11:10:59.000000,0,76,0.42,0,138
97,2024-04-10 11:10:59.000000,0,48,0.323,76,71
57,2024-04-10 11:10:59.000000,0,88,0.962,110,100
80,2024-04-10 11:10:59.000000,0,44,0.14,0,113
40,2024-04-10 11:10:59.000000,0,64,0.271,70,180
61,2024-04-10 11:10:59.000000,1,72,0.27,0,133


In [121]:
# Let's get DataSet for combined FeatureGroup. 
# Interesting point to observe:
#     DataSet will have all the combined Features of both FeatureGroups.
#     patient_id and record_timestamp remain as they are.
fs.get_dataset('PatientProfile_MedicalReadings')

patient_id,record_timestamp,outcome,age,bmi,skin_thickness,diabetes_pedigree_function,blood_pressure,insulin,glucose,pregnancies
19,2024-04-10 11:10:59.000000,1,32,34.6,30.0,0.529,70,96,115,1
59,2024-04-10 11:10:59.000000,0,22,41.5,41.0,0.173,64,142,105,0
38,2024-04-10 11:10:59.000000,1,27,38.2,42.0,0.503,68,0,90,2
0,2024-04-10 11:10:59.000000,1,50,33.6,35.0,0.627,72,0,148,6
17,2024-04-10 11:10:59.000000,1,31,29.6,0.0,0.254,74,0,107,7
15,2024-04-10 11:10:59.000000,1,32,30.0,0.0,0.484,0,0,100,7
34,2024-04-10 11:10:59.000000,0,45,27.6,31.0,0.512,78,0,122,10
13,2024-04-10 11:10:59.000000,1,59,30.1,23.0,0.398,60,846,189,1
99,2024-04-10 11:10:59.000000,1,31,49.7,51.0,0.325,90,220,122,1
80,2024-04-10 11:10:59.000000,0,22,22.4,13.0,0.14,44,0,113,3


#### In some cases, you need a historic DataSet to train an ML model. In such cases, use the API `FeatureStore.get_dataset()` and filter the resulting data. Let's look at an example.

In [122]:
df = fs.get_dataset('PatientProfile_MedicalReadings')

In [123]:
# Assume you want to feed only week 15 data to your model. 
week15_df = df[df.record_timestamp.week()==15]
week15_df

patient_id,record_timestamp,outcome,age,bmi,skin_thickness,diabetes_pedigree_function,blood_pressure,insulin,glucose,pregnancies
19,2024-04-10 11:10:59.000000,1,32,34.6,30.0,0.529,70,96,115,1
59,2024-04-10 11:10:59.000000,0,22,41.5,41.0,0.173,64,142,105,0
38,2024-04-10 11:10:59.000000,1,27,38.2,42.0,0.503,68,0,90,2
0,2024-04-10 11:10:59.000000,1,50,33.6,35.0,0.627,72,0,148,6
17,2024-04-10 11:10:59.000000,1,31,29.6,0.0,0.254,74,0,107,7
15,2024-04-10 11:10:59.000000,1,32,30.0,0.0,0.484,0,0,100,7
34,2024-04-10 11:10:59.000000,0,45,27.6,31.0,0.512,78,0,122,10
13,2024-04-10 11:10:59.000000,1,59,30.1,23.0,0.398,60,846,189,1
99,2024-04-10 11:10:59.000000,1,31,49.7,51.0,0.325,90,220,122,1
80,2024-04-10 11:10:59.000000,0,22,22.4,13.0,0.14,44,0,113,3
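
As a quick local sanity check on the week filter, the `record_timestamp` value shown in these rows can be inspected with Python's standard library. This sketch assumes ISO 8601 week numbering; the notebook does not state which numbering the DataFrame column's `week()` method uses, so treat the comparison as illustrative only.

```python
from datetime import datetime

# The timestamp shared by the rows displayed above.
record_timestamp = datetime(2024, 4, 10, 11, 10, 59)

# isocalendar() returns (ISO year, ISO week number, ISO weekday).
iso_week = record_timestamp.isocalendar()[1]
print(iso_week)
```

Under ISO numbering this date falls in week 15, which is consistent with every displayed row passing the filter.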


<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>8. Use Enterprise Feature Store with teradataml analytic functions </b>


##### teradataml analytic functions accept Features as input.
###### Let's look at Diabetes prediction using teradataml analytic function `XGBoost()`.

In [124]:
# First get the Dataset.
medical_readings_df = fs.get_dataset('MedicalReadings')
medical_readings_df

patient_id,record_timestamp,outcome,blood_pressure,diabetes_pedigree_function,insulin,glucose
19,2024-04-10 11:10:59.000000,1,70,0.529,96,115
59,2024-04-10 11:10:59.000000,0,64,0.173,142,105
38,2024-04-10 11:10:59.000000,1,68,0.503,0,90
78,2024-04-10 11:10:59.000000,1,0,0.27,0,131
36,2024-04-10 11:10:59.000000,0,76,0.42,0,138
97,2024-04-10 11:10:59.000000,0,48,0.323,76,71
57,2024-04-10 11:10:59.000000,0,88,0.962,110,100
80,2024-04-10 11:10:59.000000,0,44,0.14,0,113
40,2024-04-10 11:10:59.000000,0,64,0.271,70,180
61,2024-04-10 11:10:59.000000,1,72,0.27,0,133


In [125]:
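# Split the DataSet into two samples: 70% for training and 30% for testing.
# sampleid 1 corresponds to the first fraction (0.7) and sampleid 2 to the second (0.3).
sampled_df = medical_readings_df.sample(frac=[0.7, 0.3])
train_df = sampled_df[sampled_df.sampleid==1]
test_df = sampled_df[sampled_df.sampleid==2]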

In [126]:
# Get the FeatureGroup. Notice that the Feature `outcome` must be set as the label.
medical_readings_fg = fs.get_feature_group('MedicalReadings')
medical_readings_fg.labels = 'outcome'

In [127]:
from teradataml import XGBoost

# Train an XGBoost classifier. The FeatureGroup supplies the input Features and the label.
model = XGBoost(data=train_df,
                input_columns=medical_readings_fg.features,
                response_column=medical_readings_fg.labels,
                max_depth=3,
                lambda1=1000.0,
                model_type='Classification',
                seed=-1,
                shrinkage_factor=0.1,
                iter_num=2)

In [128]:
# Score the model using test data.
XGBoostPredict_out_1 = model.predict(newdata=test_df,
                                     id_column='patient_id',
                                     model_type='Classification'
                                    )

In [129]:
XGBoostPredict_out_1.result

patient_id,Prediction,Confidence_Lower,Confidence_upper
17,0,1.0,1.0
34,0,0.5,0.5
13,0,0.5,0.5
40,1,1.0,1.0
80,0,0.5,0.5
59,0,1.0,1.0
38,0,1.0,1.0
57,0,1.0,1.0
19,0,0.5,0.5
15,0,1.0,1.0


<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>9. Repository Governance </b>


<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
So far, we have been working in a repo called "LabRepoOne". With Teradata EFS Functions, you can manage your Feature Store repos and "promote" them to work as a production repo.
</p> 

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
<b>Note: </b> The Feature Store Functions do not materialize the data of your Features into production; promotion covers only the metadata of the feature repo. Make sure that your ETL processes are also executed against the production data sources.
</p> 
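
The distinction in the note can be sketched with plain dictionaries. Everything here is hypothetical and written for illustration: `lab_repo`, `prod_repo` and `promote()` are not teradataml objects. The point is only that promotion copies the description of a FeatureGroup (Entity column, timestamp column, source query text, Feature names) and not the rows the query returns.

```python
import copy

# Hypothetical metadata for one FeatureGroup in the lab repo.
lab_repo = {
    "feature_groups": {
        "MedicalReadings": {
            "entity_column": "patient_id",
            "timestamp_col_name": "record_timestamp",
            "source": 'select * from "medical_readings"',
            "features": ["glucose", "blood_pressure", "insulin",
                         "diabetes_pedigree_function", "outcome"],
        }
    }
}
prod_repo = {"feature_groups": {}}


def promote(group_name, source_repo, target_repo):
    """Copy a FeatureGroup's metadata between repos; no table data is moved."""
    metadata = source_repo["feature_groups"][group_name]
    target_repo["feature_groups"][group_name] = copy.deepcopy(metadata)
    return True


promote("MedicalReadings", lab_repo, prod_repo)
print(prod_repo["feature_groups"]["MedicalReadings"]["source"])
```

The promoted entry still points at the same query text, so the table it references has to exist and be populated in the production environment by your own ETL processes.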

In [130]:
# First, create a new repo 'ProdLabRepoOne' to move Features to.
ProdLabRepoOne = FeatureStore("ProdLabRepoOne")
# Setup prod repo if it is not setup.
ProdLabRepoOne.setup()

EFS is already setup for the repo ProdLabRepoOne.


In [132]:
# Assume you want to promote FeatureGroup 'MedicalReadings' from 'LabRepoOne' to 'ProdLabRepoOne'.
# First get the FeatureGroup from LabRepoOne. Then apply it to 'ProdLabRepoOne'.
ProdLabRepoOne.apply(fs.get_feature_group('MedicalReadings'))

True

In [133]:
# Let's verify ProdLabRepoOne FeatureGroups.
ProdLabRepoOne.list_feature_groups()

name,description,data_source_name,entity_name,creation_time,modified_time
MedicalReadings,,MedicalReadings,MedicalReadings,2024-10-18 09:55:37.349290,


In [134]:
# Let's verify ProdLabRepoOne Features.
ProdLabRepoOne.list_features()

name,column_name,description,tags,data_type,feature_type,status,creation_time,modified_time,group_name
glucose,glucose,,,BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:55:37.192882,,MedicalReadings
diabetes_pedigree_function,diabetes_pedigree_function,,,FLOAT,CONTINUOUS,ACTIVE,2024-10-18 09:55:37.129559,,MedicalReadings
outcome,outcome,,,BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:55:37.022138,,MedicalReadings
insulin,insulin,,,BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:55:37.171440,,MedicalReadings
blood_pressure,blood_pressure,,,BIGINT,CONTINUOUS,ACTIVE,2024-10-18 09:55:37.085649,,MedicalReadings


In [135]:
# Let's verify ProdLabRepoOne DataSources.
ProdLabRepoOne.list_data_sources()

name,description,timestamp_col_name,source,creation_time,modified_time
MedicalReadings,,record_timestamp,"select * from ""medical_readings""",2024-10-18 09:55:37.306158,


In [136]:
# Let's verify ProdLabRepoOne Entities.
ProdLabRepoOne.list_entities()

name,description,creation_time,modified_time,entity_column
MedicalReadings,,2024-10-18 09:55:37.215982,,patient_id


<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'> Disconnect from Vantage </b>

In [137]:
remove_context()

True

<footer style="padding-bottom:35px; background:#f9f9f9; border-bottom:3px solid #00233C">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2023, 2024. All Rights Reserved
        </div>
    </div>
</footer>