# Module 2: Search and Discovery using Feature-Level Metadata

---

## Contents

1. [Background](#Background)
1. [Setup](#Setup)
1. [Feature Level Metadata](#Feature-Level-Metadata)
1. [Search and Discovery](#Search-and-Discovery)


# Background

In this notebook, you will learn:
* how to add new features to an existing feature group
* how to add feature-level metadata (a description and key/value pairs) to improve search and discovery of features, and how to search for and discover features using Amazon SageMaker Studio and the API/SDK

**Note:** The feature groups created in this notebook will be used in the upcoming modules.


# Setup

#### Imports

In [None]:
from sagemaker.feature_store.feature_group import FeatureGroup
from time import gmtime, strftime, sleep
from random import randint
import pandas as pd
import numpy as np
import subprocess
import sagemaker
import importlib
import logging
import time
import sys
import boto3

In [None]:
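sm_version = sagemaker.__version__
# Compare (major, minor) as a tuple; testing each part separately would wrongly downgrade releases such as 3.0
major, minor = (int(part) for part in sm_version.split('.')[:2])
if (major, minor) < (2, 125):
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'sagemaker==2.125.0'])
    importlib.reload(sagemaker)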

In [None]:
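# Compare numerically; plain string comparison would rank '1.9.0' above '1.24.23'
boto3_version = tuple(int(part) for part in boto3.__version__.split('.')[:3])
if boto3_version < (1, 24, 23):
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'boto3==1.24.23'])
    importlib.reload(boto3)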
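
Version strings compare character by character, so a numeric comparison is the safer way to write checks like the ones above. The helper below is a minimal sketch of that idea; `version_tuple` is a hypothetical name introduced here for illustration and is not part of the SageMaker SDK or Boto3.

```python
import re


def version_tuple(version: str) -> tuple:
    """Return the leading numeric components of a version string as a tuple of ints."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. "dev0"
        parts.append(int(match.group()))
    return tuple(parts)


# As strings, "1.9.0" sorts after "1.24.23" because "9" > "2".
print("1.9.0" < "1.24.23")                                # False
# As integer tuples, the comparison follows the real version order.
print(version_tuple("1.9.0") < version_tuple("1.24.23"))  # True
```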

In [None]:
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())

In [None]:
logger.info(f'Using SageMaker version: {sagemaker.__version__}')
logger.info(f'Using Boto3 version: {boto3.__version__}')

#### Essentials

In [None]:
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()
default_bucket = sagemaker_session.default_bucket()
logger.info(f'Default S3 bucket = {default_bucket}')
prefix = 'sagemaker-feature-store'
region = sagemaker_session.boto_region_name

boto_session = boto3.Session(region_name=region)
sagemaker_client = boto_session.client(service_name='sagemaker', region_name=region)
featurestore_runtime = boto_session.client(service_name='sagemaker-featurestore-runtime', region_name=region)


# Feature Level Metadata

With Amazon SageMaker Feature Store, customers have always been able to add metadata at the feature group level. Data scientists looking for features to use in their models can now also search metadata at the feature level. This metadata can include, for example, a description of the feature, the date it was last modified, its original data source, certain metrics, or its sensitivity level.

The diagram below illustrates the relationships between feature groups, features, and their associated metadata.

![Feature Level Metadata](../images/feature_level_metadata.png "Feature Level Metadata") 

In this section you will be adding metadata to a feature, including a description and a set of key/value pairs.

#### Retrieve the feature group name stored in a previous module

In [None]:
%store -r customers_feature_group_name

print(customers_feature_group_name)

Describe the feature group before updating the feature metadata:

In [None]:
sagemaker_client.describe_feature_group(
    FeatureGroupName=customers_feature_group_name
)
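
The `describe_feature_group` response lists the group's features under `FeatureDefinitions`, so it can be used to confirm that a feature exists before you attach metadata to it. The sketch below shows one way to do that; `feature_types` is a hypothetical helper, and the sample response is a trimmed-down, made-up stand-in (including the assumed `Integral` type for `is_married`) so the example runs without AWS access.

```python
def feature_types(describe_response: dict) -> dict:
    """Map each feature name to its type from a describe_feature_group response."""
    definitions = describe_response.get("FeatureDefinitions", [])
    return {d["FeatureName"]: d["FeatureType"] for d in definitions}


# Hypothetical, trimmed-down response used only for illustration.
sample_response = {
    "FeatureGroupName": "customers-example",
    "FeatureDefinitions": [
        {"FeatureName": "customer_id", "FeatureType": "String"},
        {"FeatureName": "is_married", "FeatureType": "Integral"},
    ],
}

types = feature_types(sample_response)
print("is_married" in types)  # True
```

With a live client, you would pass the return value of `sagemaker_client.describe_feature_group(...)` instead of `sample_response`.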

#### Adding a description to the existing feature `is_married` in the `customers` feature group

In [None]:
sagemaker_client.update_feature_metadata(
    FeatureGroupName=customers_feature_group_name,
    FeatureName="is_married",
    Description="boolean value whether the customer is married or not"
)

Describe the feature and check that the description was updated:

In [None]:
sagemaker_client.describe_feature_metadata(
    FeatureGroupName=customers_feature_group_name,
    FeatureName="is_married" 
)

#### Adding parameters to a feature

In [None]:
sagemaker_client.update_feature_metadata(
    FeatureGroupName=customers_feature_group_name,
    FeatureName="is_married",
    ParameterAdditions=[
        {"Key": "team", "Value": "mlops"},
        {"Key": "org", "Value": "customer product team"},
    ]
)
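
Besides `ParameterAdditions`, the `update_feature_metadata` call also accepts `ParameterRemovals`, a list of keys to delete. The sketch below assembles the request arguments from plain Python values; `build_metadata_update` is a hypothetical helper, and the group name `customers-example` and the `sensitivity` key are illustrative assumptions.

```python
def build_metadata_update(feature_group_name, feature_name,
                          description=None, additions=None, removals=None):
    """Assemble keyword arguments for update_feature_metadata."""
    kwargs = {
        "FeatureGroupName": feature_group_name,
        "FeatureName": feature_name,
    }
    if description is not None:
        kwargs["Description"] = description
    if additions:
        # The API expects a list of {"Key": ..., "Value": ...} dictionaries.
        kwargs["ParameterAdditions"] = [
            {"Key": key, "Value": value} for key, value in additions.items()
        ]
    if removals:
        # Removals are given as a plain list of keys.
        kwargs["ParameterRemovals"] = list(removals)
    return kwargs


request = build_metadata_update(
    "customers-example",            # hypothetical feature group name
    "is_married",
    additions={"sensitivity": "low"},
    removals=["org"],
)
print(request)
```

With a live client, the request could then be sent with `sagemaker_client.update_feature_metadata(**request)`.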

Describe the feature and check that the parameters were updated:

In [None]:
sagemaker_client.describe_feature_metadata(
    FeatureGroupName=customers_feature_group_name,
    FeatureName="is_married"
)
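
The response returns the parameters as a list of `Key`/`Value` dictionaries, which is awkward to query directly. A minimal sketch of flattening that list into an ordinary dictionary follows; `parameters_as_dict` is a hypothetical helper, and the sample response is a hand-written stand-in containing only the fields used here.

```python
def parameters_as_dict(metadata_response: dict) -> dict:
    """Flatten the Parameters list of a describe_feature_metadata response."""
    return {p["Key"]: p["Value"] for p in metadata_response.get("Parameters", [])}


# Hand-written stand-in for a describe_feature_metadata response.
sample_metadata = {
    "FeatureName": "is_married",
    "Description": "boolean value whether the customer is married or not",
    "Parameters": [
        {"Key": "team", "Value": "mlops"},
        {"Key": "org", "Value": "customer product team"},
    ],
}

params = parameters_as_dict(sample_metadata)
print(params["team"])  # mlops
```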

# Search and Discovery

Users can search and query features in Amazon SageMaker Studio. The search is type-ahead: results appear as soon as a few characters are typed.

* Users can open the *Feature Catalog* tab to browse features across feature groups. The table lists each feature's name, type, description, parameters, creation date, and the name of its feature group.
* Users can type in the search box to get results immediately through type-ahead.
* Users can choose among several filter options: All, Feature name, Description, or Parameters. Note: All returns every feature whose name, description, or parameters match the search criteria.
* Users can narrow the search further by specifying a date range with the Created from and Created to fields, and specific parameters with the Search parameter key and Search parameter value fields.
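
The date-range and parameter filters described above can be approximated in code once feature metadata records have been retrieved. The sketch below filters on the client side and is not a description of how Studio implements its search. `filter_features` is a hypothetical helper; it assumes each record carries the `CreationTime` and `Parameters` fields returned for feature metadata, and the sample records are made up.

```python
from datetime import datetime, timezone


def filter_features(records, created_from=None, created_to=None,
                    param_key=None, param_value=None):
    """Keep records inside a creation-date range that carry a given parameter."""
    matches = []
    for record in records:
        created = record["CreationTime"]
        if created_from is not None and created < created_from:
            continue
        if created_to is not None and created > created_to:
            continue
        if param_key is not None:
            params = {p["Key"]: p["Value"] for p in record.get("Parameters", [])}
            if param_key not in params:
                continue
            if param_value is not None and params[param_key] != param_value:
                continue
        matches.append(record)
    return matches


# Made-up records shaped like feature metadata entries.
records = [
    {"FeatureName": "is_married",
     "CreationTime": datetime(2022, 11, 1, tzinfo=timezone.utc),
     "Parameters": [{"Key": "team", "Value": "mlops"}]},
    {"FeatureName": "age",
     "CreationTime": datetime(2022, 6, 1, tzinfo=timezone.utc),
     "Parameters": []},
]

recent = filter_features(records, created_from=datetime(2022, 10, 1, tzinfo=timezone.utc))
by_team = filter_features(records, param_key="team", param_value="mlops")
print([r["FeatureName"] for r in recent])   # ['is_married']
print([r["FeatureName"] for r in by_team])  # ['is_married']
```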

The following animation shows a user searching for the feature `is_married` and adding a sensitivity level to it as a key/value pair.


![Search Update metadata](../images/search_update_metadata.gif "Feature Level Metadata") 

#### Search for Features using Boto3

In [None]:
# Search function that returns features whose name, description, or parameters (key/value pairs) match the search string
def search_features_using_string(search_string):
    response = sagemaker_client.search(
        Resource="FeatureMetadata",
        SearchExpression={
            'Filters': [
                {
                    'Name': 'FeatureName',
                    'Operator': 'Contains',
                    'Value': search_string
                },
                {
                    'Name': 'Description',
                    'Operator': 'Contains',
                    'Value': search_string
                },
                {
                    'Name': 'AllParameters',
                    'Operator': 'Contains',
                    'Value': search_string
                }
            ],
            "Operator": "Or"
        },
    )
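    # Display the results in a DataFrame; each result is nested under a 'FeatureMetadata' key
    df = pd.json_normalize(response['Results'], max_level=1)
    # Strip the 'FeatureMetadata.' prefix from the column names
    df.columns = df.columns.map(lambda col: col.split(".", 1)[-1])
    # errors='ignore' avoids a KeyError when the search returns no results
    df = df.drop(columns='FeatureGroupArn', errors='ignore')
    return df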

# Search for features that contain the string "married" in the feature name, description, or parameters
search_string = "married"
search_features_using_string(search_string)
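
A single `search` call returns one page of results, with a `NextToken` in the response when more pages remain. The sketch below follows those tokens until the results are exhausted. `search_all_feature_metadata` and `FakeSearchClient` are hypothetical names; the fake client and the feature name `marital_status` exist only so the loop can be run without AWS access.

```python
def search_all_feature_metadata(client, search_expression, page_size=50):
    """Collect FeatureMetadata results across all pages of a search."""
    results = []
    kwargs = {
        "Resource": "FeatureMetadata",
        "SearchExpression": search_expression,
        "MaxResults": page_size,
    }
    while True:
        response = client.search(**kwargs)
        results.extend(item["FeatureMetadata"] for item in response.get("Results", []))
        token = response.get("NextToken")
        if not token:
            return results
        kwargs["NextToken"] = token  # request the next page


class FakeSearchClient:
    """Offline stand-in for the SageMaker client, used only to exercise the loop."""

    def __init__(self, pages):
        self._pages = list(pages)
        self.calls = []

    def search(self, **kwargs):
        self.calls.append(dict(kwargs))
        return self._pages[len(self.calls) - 1]


pages = [
    {"Results": [{"FeatureMetadata": {"FeatureName": "is_married"}}], "NextToken": "page-2"},
    {"Results": [{"FeatureMetadata": {"FeatureName": "marital_status"}}]},
]
fake_client = FakeSearchClient(pages)
expression = {"Filters": [{"Name": "FeatureName", "Operator": "Contains", "Value": "mar"}]}
found = search_all_feature_metadata(fake_client, expression)
print([feature["FeatureName"] for feature in found])  # ['is_married', 'marital_status']
```

With a live client, pass `sagemaker_client` in place of `fake_client`.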