# Account C
### Read-only access to Account B inside the centralized Offline Store in Account A

#### Prerequisites

In [1]:
#!pip install awswrangler

#### Imports 

In [2]:
from sagemaker.feature_store.feature_group import FeatureGroup
from sagemaker import get_execution_role
from sagemaker.session import Session
import awswrangler as wr
import pandas as pd
import sagemaker
import logging
import boto3
import time
import s3fs

#### Setup Logger

In [3]:
logger = logging.getLogger('sagemaker')
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler())

In [4]:
logger.info(f'[Using SageMaker version: {sagemaker.__version__}]')

[Using SageMaker version: 2.19.0]


#### Essentials 
* Create SageMaker & Feature Store Runtime Clients
* Create a Feature Store Session encapsulating the above clients
* Ensure the Execution Role you use for this notebook has both `AmazonSageMakerFullAccess` and `AmazonSageMakerFeatureStoreAccess` managed policies attached to it. If not, please make sure to attach them to the role before proceeding.
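The policy check in the last bullet can also be done programmatically. The sketch below is only an illustration: `missing_managed_policies` is a hypothetical helper (not part of the SageMaker SDK), it assumes the caller is allowed to call the IAM `list_attached_role_policies` API, and it ignores pagination of the response.

```python
REQUIRED_POLICIES = {'AmazonSageMakerFullAccess', 'AmazonSageMakerFeatureStoreAccess'}

def missing_managed_policies(iam_client, role_arn, required=REQUIRED_POLICIES):
    """Return the required managed policy names that are not attached to the role.

    Hypothetical helper; `iam_client` is expected to behave like boto3.client('iam').
    """
    # The role name is the last path segment of the role ARN
    role_name = role_arn.split('/')[-1]
    response = iam_client.list_attached_role_policies(RoleName=role_name)
    attached = {policy['PolicyName'] for policy in response['AttachedPolicies']}
    return set(required) - attached
```

Once `role` is defined in the next cell, calling `missing_managed_policies(boto3.client('iam'), role)` should return an empty set when both policies are attached.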

In [5]:
region = boto3.Session().region_name
boto_session = boto3.Session(region_name=region)
s3 = boto_session.resource('s3', region_name=region)
role = get_execution_role()

s3_client = boto3.client('s3', region_name=region)
sagemaker_client = boto_session.client(service_name='sagemaker', region_name=region)
featurestore_runtime = boto_session.client(service_name='sagemaker-featurestore-runtime', region_name=region)

Feature Store guide: https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_featurestore.html <br>
API Documentation: https://sagemaker.readthedocs.io/en/stable/api/prep_data/feature_store.html

In [6]:
feature_store_session = Session(
    boto_session=boto_session,
    sagemaker_client=sagemaker_client,
    sagemaker_featurestore_runtime_client=featurestore_runtime
)

In [7]:
feature_store_session.__dict__

{'_default_bucket': None,
 '_default_bucket_name_override': None,
 's3_resource': None,
 's3_client': None,
 'config': None,
 'boto_session': Session(region_name='us-east-1'),
 '_region_name': 'us-east-1',
 'sagemaker_client': <botocore.client.SageMaker at 0x7fc9e9074d30>,
 'sagemaker_runtime_client': <botocore.client.SageMakerRuntime at 0x7fc9e9026e48>,
 'sagemaker_featurestore_runtime_client': <botocore.client.SageMakerFeatureStoreRuntime at 0x7fc9e9026080>,
 'local_mode': False}

`offline_feature_store_s3_uri` below is the S3 location of your offline store.

In [None]:
bucket = 'sagemaker-feature-store-account-a'
prefix = '' # account ID of Account B
offline_feature_store_s3_uri = f's3://{bucket}/'
offline_feature_store_s3_uri

#### Load Features 

In [None]:
features = pd.read_csv('features.csv', names=['employee_id', 'name', 'age', 'sex', 'happiness_score'])
features['created_by'] = 'account-c'

In [None]:
features.dtypes

### Ingest Features into SageMaker Feature Store

In [None]:
record_identifier_feature_name = 'employee_id'
event_time_feature_name = 'event_time'

#### Instantiate Feature Group Object

In [None]:
feature_group_name = 'employees-account-c'
feature_group = FeatureGroup(name=feature_group_name, sagemaker_session=feature_store_session)
feature_group.__dict__

In [None]:
dir(feature_group)

Feature Store supports three feature types: `String`, `Fractional`, and `Integral`. The default type is `String`, which means that any column in your dataset that is not a `float` or `long` type will default to `String` in your feature store.
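To preview how the columns of a dataframe would map under the rule described above, a small sketch like the following can help. `infer_feature_type` is a hypothetical helper that mirrors the rule as stated here; it is not the SDK's internal logic, and the SDK may treat some dtypes differently.

```python
import pandas as pd

def infer_feature_type(dtype):
    """Approximate the Feature Store type for a pandas dtype (illustrative only)."""
    if pd.api.types.is_integer_dtype(dtype):
        return 'Integral'
    if pd.api.types.is_float_dtype(dtype):
        return 'Fractional'
    # Everything else falls back to the default type
    return 'String'

def preview_feature_types(df):
    """Map each column name to its approximate Feature Store type."""
    return {column: infer_feature_type(dtype) for column, dtype in df.dtypes.items()}
```

Running `preview_feature_types(features)` before loading the feature definitions gives a quick sanity check of the expected schema.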

In [None]:
def cast_object_to_string(df):
    """
    Cast object dtype to string. The SageMaker FeatureStore Python SDK will then 
    map the string dtype to String feature type.
    """
    for label in df.columns:
        if df.dtypes[label] == 'object':
            df[label] = df[label].astype('string')

In [None]:
cast_object_to_string(features)

#### Append event_time to the `features` dataframe 

In [None]:
current_time_sec = int(round(time.time()))
features[event_time_feature_name] = pd.Series([current_time_sec]*len(features), dtype='float64')

In [None]:
features.dtypes

In [None]:
features

#### Load Feature Definitions
The SageMaker Feature Store Python SDK auto-detects the data schema from the input data.

In [None]:
feature_group.load_feature_definitions(data_frame=features)

#### Create Feature Group

In [None]:
feature_group.create(
    s3_uri=offline_feature_store_s3_uri,
    record_identifier_name=record_identifier_feature_name,
    event_time_feature_name=event_time_feature_name,
    role_arn=role,
    enable_online_store=True
)

In [None]:
feature_group.__dict__

#### Validate that the feature group is created

In [None]:
feature_group.describe()
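Feature group creation is asynchronous, so `describe()` may report a `FeatureGroupStatus` of `Creating` for a short while. A minimal polling sketch is shown here; `wait_for_feature_group` is a hypothetical helper, and the polling interval and attempt limit are arbitrary assumptions.

```python
import time

def wait_for_feature_group(describe_fn, poll_seconds=5, max_attempts=60):
    """Poll describe_fn() until FeatureGroupStatus is 'Created'.

    describe_fn is any callable returning a dict with a 'FeatureGroupStatus' key,
    for example feature_group.describe.
    """
    for _ in range(max_attempts):
        status = describe_fn().get('FeatureGroupStatus')
        if status == 'Created':
            return status
        if status != 'Creating':
            # Any other status (e.g. 'CreateFailed') means creation will not complete
            raise RuntimeError(f'Feature group creation ended with status {status}')
        time.sleep(poll_seconds)
    raise TimeoutError('Feature group was still being created after the last attempt')
```

In this notebook it could be called as `wait_for_feature_group(feature_group.describe)` before ingesting any records.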

In [None]:
sagemaker_client.list_feature_groups()

In [None]:
#sagemaker_client.delete_feature_group(FeatureGroupName=feature_group_name)

#### Put Records into Feature Group (Both Online & Offline)

After the feature group has been created, we can put data into it using the PutRecord API. This API can handle high TPS (transactions per second) and is designed to be called by different streams. The data from all of these Put requests is buffered and written to S3 in chunks. The files are written to the offline store within a few minutes of ingestion. For this example, to accelerate ingestion, we specify multiple workers to do the job simultaneously.
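For reference, a single PutRecord call takes a list of `FeatureName` / `ValueAsString` pairs. The sketch below shows one way to build that payload from a row; `row_to_record` is a hypothetical helper, and skipping `None` values is an assumption of this sketch (NaN values would need separate handling).

```python
def row_to_record(row):
    """Convert a mapping of feature name -> value into the PutRecord payload shape."""
    return [
        {'FeatureName': str(name), 'ValueAsString': str(value)}
        for name, value in row.items()
        if value is not None  # assumption: missing features are simply omitted
    ]
```

A single row could then be written with `featurestore_runtime.put_record(FeatureGroupName=feature_group_name, Record=row_to_record(features.iloc[0].to_dict()))`. The `ingest` call below handles this for the whole dataframe.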

In [None]:
%%time

feature_group.ingest(data_frame=features, max_workers=5, wait=True)

#### Get Record from Online Store (Available Immediately)

To confirm that data has been ingested, we can quickly retrieve a record from the online store:

In [None]:
record_identifier = str(101)

featurestore_runtime.get_record(FeatureGroupName=feature_group_name, 
                                RecordIdentifierValueAsString=record_identifier)
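The response lists the features as `FeatureName` / `ValueAsString` pairs under the `Record` key, which is absent when no record is found. A small sketch for flattening it into a dictionary follows; `record_to_dict` is a hypothetical helper, not an SDK function.

```python
def record_to_dict(response):
    """Flatten a GetRecord response into {feature_name: value_as_string}.

    Returns an empty dict when the response has no 'Record' key.
    """
    return {
        feature['FeatureName']: feature['ValueAsString']
        for feature in response.get('Record', [])
    }
```

An empty result for a known identifier suggests the record has not been ingested into the online store.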

#### Get Record from Offline Store
Now let's wait for the data to appear in our offline store before moving forward to creating a dataset. This will take approximately 5 minutes.

In [None]:
account_id = boto3.client('sts').get_caller_identity()['Account']

In [None]:
feature_group_s3_prefix = f'{account_id}/sagemaker/{region}/offline-store/{feature_group_name}/data'
feature_group_s3_prefix

In [None]:
offline_store_contents = None
while offline_store_contents is None:
    objects = s3_client.list_objects(Bucket=bucket, Prefix=feature_group_s3_prefix)
    if 'Contents' in objects and len(objects['Contents']) > 1:
        logger.info('[Features are available in Offline Store!]')
        offline_store_contents = objects['Contents']
    else:
        logger.info('[Waiting for data in Offline Store ...]')
        time.sleep(60)

In [None]:
offline_store_contents

#### Inspect the Parquet Files (Offline Store)

In [None]:
s3_prefix = '/'.join(offline_store_contents[0]['Key'].split('/')[:-1])
s3_uri = f's3://{bucket}/{s3_prefix}'
s3_uri

In [None]:
df = wr.s3.read_parquet(path=s3_uri)

In [None]:
df