# Module 9: Cross Account Access - Admin Account Setup  

---

# Contents

1. [Overview](#Overview)
1. [Setup](#Setup)
1. [Create cross account feature groups](#Create-cross-account-feature-groups)
1. [Ingest data into feature groups](#Ingest-data-into-feature-groups)
1. [Create Resource Share](#Create-Resource-Share)

# Overview

Amazon SageMaker Feature Store now makes it easier to share, discover and access feature groups
across AWS accounts. This new capability promotes collaboration and minimizes duplicate work for
teams involved in ML model and application development, particularly in enterprise environments
with multiple accounts spanning different business units or functions.

With this launch, account owners can grant access to select feature groups by other accounts using
AWS Resource Access Manager (RAM). Once granted access, users of those accounts can
conveniently view all of their feature groups, including the shared ones, through Amazon SageMaker
Studio or SDKs. This enables teams to discover and utilize features developed by other teams,
fostering knowledge sharing and efficiency. 

This notebook is meant to be run in the admin/owner account. In it, you will learn how to:

- create feature groups and ingest features into them at the admin/owner account level
- create a discoverability resource share for existing feature groups in the admin/owner account and share it with a consumer account using RAM
- grant access permissions on existing feature groups in the admin/owner account and share them with a consumer account using RAM

Note: The notebook runs successfully only if your AWS IAM role has the permissions required to use RAM.

# Setup

#### IAM Roles

If you are running this notebook in Amazon SageMaker Studio, the IAM role assumed by your Studio user needs permission to perform RAM operations. To provide this permission to the role, do the following:

1. Open the [Amazon SageMaker console](https://console.aws.amazon.com/sagemaker/).
2. Select Amazon SageMaker Studio and choose your user name.
3. Under **User summary**, copy just the name part of the execution role ARN.
4. Go to the [IAM console](https://console.aws.amazon.com/iam) and click on **Roles**.
5. Find the role associated with your SageMaker Studio user.
6. Under the Permissions tab, click **Attach policies** and add the following: **AWSResourceAccessManagerFullAccess**.
7. Under Trust relationships, click **Edit trust relationship** and add the following JSON:
```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "sagemaker.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ram.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```
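
The same trust policy can also be generated in code, which helps if you script the role setup. The following is a minimal sketch, not part of the original notebook: `build_trust_policy` is a hypothetical helper and `ROLE_NAME` in the commented-out lines is a placeholder for your execution role name.

```python
import json


def build_trust_policy(services):
    """Return a trust policy that lets each listed service principal assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service},
                "Action": "sts:AssumeRole",
            }
            for service in services
        ],
    }


trust_policy = build_trust_policy(["sagemaker.amazonaws.com", "ram.amazonaws.com"])
print(json.dumps(trust_policy, indent=2))

# Applying it with boto3 requires IAM permissions; ROLE_NAME is a placeholder:
# import boto3
# iam = boto3.client("iam")
# iam.attach_role_policy(
#     RoleName=ROLE_NAME,
#     PolicyArn="arn:aws:iam::aws:policy/AWSResourceAccessManagerFullAccess",
# )
# iam.update_assume_role_policy(
#     RoleName=ROLE_NAME, PolicyDocument=json.dumps(trust_policy)
# )
```

The printed document should match the JSON above; the commented-out calls correspond to steps 6 and 7.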
 

#### Imports

The cells below import the necessary libraries, including boto3, which is used to call AWS services, and the SageMaker SDK components. They also check the installed SageMaker SDK version and initialize logging.

In [None]:
import boto3
from sagemaker.feature_store.feature_group import FeatureGroup
from sagemaker.feature_store.inputs import TableFormatEnum
from time import gmtime, strftime, sleep
from random import randint
import pandas as pd
import numpy as np
import subprocess
import sagemaker
import importlib
import logging
import time
import sys
import re

In [None]:
sm_version = sagemaker.__version__
# Compare (major, minor) as a tuple so that a newer major version is never downgraded
major, minor = (int(part) for part in sm_version.split('.')[:2])
if (major, minor) < (2, 125):
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'sagemaker==2.125.0'])
    importlib.reload(sagemaker)

In [None]:
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())
logger.info(f'Using SageMaker version: {sagemaker.__version__}')
logger.info(f'Using Pandas version: {pd.__version__}')

#### Essentials

This cell sets up the AWS environment and initializes key components. The script configures the region, creates a session, and defines a role using get_execution_role() to obtain the SageMaker execution role.

In [None]:
region = boto3.Session().region_name
boto_session = boto3.Session(region_name=region)
sagemaker_session = sagemaker.Session()
sagemaker_client = boto_session.client(service_name="sagemaker")
ram_client = boto3.client("ram")
role = sagemaker.get_execution_role()
default_bucket = sagemaker_session.default_bucket() #This bucket is used for offline storage
prefix = 'sagemaker-feature-store-cross-acc'
logger.info(f'Default S3 bucket and prefix = {default_bucket}/{prefix}')

# Create cross account feature groups

The following cells will create a Pandas DataFrame with sample data. The sample data consists of features like record_id, event_time, feature_11, and feature_12. The feature group's feature definitions are loaded from this DataFrame.

Then a feature group is created using the create method of the feature group.
The parameters for creating the feature group include the S3 URI for offline storage, record identifier name, event time feature name, AWS IAM role ARN, and whether to enable the online store.
The feature group is created and the script waits for its creation to complete.
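
To make the loading of feature definitions more concrete, the sketch below approximates how column values could map to Feature Store feature types (`Integral`, `Fractional`, `String`). It is an illustration on plain Python values, not the SDK's implementation: `infer_feature_type` is a hypothetical helper, and the real `load_feature_definitions` method works from the DataFrame's dtypes.

```python
def infer_feature_type(values):
    """Approximate the feature type for one column of plain Python values."""
    # bool is a subclass of int in Python, so exclude it explicitly
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "Integral"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "Fractional"
    # Assumption: anything that is not purely numeric is stored as a string
    return "String"


columns = ["record_id", "event_time", "feature_11", "feature_12"]
rows = [
    [1, 187512346.0, 123, 128],
    [2, 187512347.0, 168, 258],
]

feature_definitions = [
    {"FeatureName": name, "FeatureType": infer_feature_type([row[i] for row in rows])}
    for i, name in enumerate(columns)
]

for definition in feature_definitions:
    print(definition)
```

With the sample data, `event_time` is inferred as `Fractional` because its values are floats, while the other three columns are `Integral`.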

### Create first cross account feature group

In [None]:
feature_group_name_1 = 'cross-account-fg-1'
feature_group_1 = FeatureGroup(name=feature_group_name_1, sagemaker_session=sagemaker_session)
data_1 = [[1, 187512346.0, 123, 128],
        [2, 187512347.0, 168, 258],
        [3, 187512348.0, 125, 184],
        [1, 187512349.0, 195, 206]]
data_df_1 = pd.DataFrame(data_1, columns=["record_id", "event_time", "feature_11", "feature_12"])
feature_group_1.load_feature_definitions(data_frame=data_df_1)
data_df_1.head()

This function waits until the feature group is created.

In [None]:
def wait_for_feature_group_creation_complete(feature_group):
    status = feature_group.describe().get('FeatureGroupStatus')
    print(f'Initial status: {status}')
    while status == 'Creating':
        logger.info(f'Waiting for feature group: {feature_group.name} to be created ...')
        time.sleep(5)
        status = feature_group.describe().get('FeatureGroupStatus')
    if status != 'Created':
        raise SystemExit(f'Failed to create feature group {feature_group.name}: {status}')
    logger.info(f'FeatureGroup {feature_group.name} was successfully created.')
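
The wait function above loops for as long as the status stays `Creating`. If you prefer an upper bound on the wait, a variant with a timeout could look like the following sketch. The name `wait_for_status` and its parameters are assumptions made for illustration, and the demonstration uses a fake `describe` callable so that it runs without AWS access.

```python
import time


def wait_for_status(describe, timeout_s=600, poll_s=5, sleep=time.sleep):
    """Poll describe() until the status leaves 'Creating' or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    status = describe().get("FeatureGroupStatus")
    while status == "Creating":
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Status is still 'Creating' after {timeout_s} seconds")
        sleep(poll_s)
        status = describe().get("FeatureGroupStatus")
    if status != "Created":
        raise RuntimeError(f"Feature group creation failed with status: {status}")
    return status


# Offline demonstration with a fake describe() that needs no AWS access
fake_statuses = iter(["Creating", "Creating", "Created"])
final_status = wait_for_status(
    lambda: {"FeatureGroupStatus": next(fake_statuses)},
    sleep=lambda seconds: None,  # skip real sleeping in the demonstration
)
print(final_status)
```

In the notebook, the equivalent call would be `wait_for_status(feature_group_1.describe)`.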

In [None]:
feature_group_1.create(s3_uri=f's3://{default_bucket}/{prefix}', 
                               record_identifier_name='record_id', 
                               event_time_feature_name='event_time', 
                               role_arn=role, 
                               enable_online_store=True
                              )

In [None]:
wait_for_feature_group_creation_complete(feature_group_1)

In [None]:
# Retrieve the FG ARN
fg_desc_1 = feature_group_1.describe()
fg_arn_1 = fg_desc_1['FeatureGroupArn']

### Create second cross account feature group

In [None]:
feature_group_name_2 = 'cross-account-fg-2'
feature_group_2 = FeatureGroup(name=feature_group_name_2, sagemaker_session=sagemaker_session)
data_2 = [[1, 187512346.0, 321, 821],
        [2, 187512347.0, 861, 852],
        [3, 187512348.0, 521, 481],
        [1, 187512349.0, 591, 602]]
data_df_2 = pd.DataFrame(data_2, columns=["record_id", "event_time", "feature_21", "feature_22"])
feature_group_2.load_feature_definitions(data_frame=data_df_2)
data_df_2.head()

In [None]:
feature_group_2.create(s3_uri=f's3://{default_bucket}/{prefix}', 
                               record_identifier_name='record_id', 
                               event_time_feature_name='event_time', 
                               role_arn=role, 
                               enable_online_store=True
                              )

In [None]:
wait_for_feature_group_creation_complete(feature_group_2)

In [None]:
# Retrieve the FG ARN
fg_desc_2 = feature_group_2.describe()
fg_arn_2 = fg_desc_2['FeatureGroupArn']

# Ingest data into feature groups

Data is ingested into the feature groups using the ingest method.
The code attempts to ingest data into the feature groups with a specified number of processes.
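
For intuition, each ingested row ends up as a list of feature name and string value pairs, which is the record shape that the Feature Store runtime `PutRecord` API accepts. The sketch below builds such a record for one sample row. `to_put_record_payload` is a hypothetical helper written for illustration, not an SDK function.

```python
def to_put_record_payload(row):
    """Convert one row (a dict of feature name to value) into a PutRecord-style record."""
    return [
        {"FeatureName": name, "ValueAsString": str(value)}
        for name, value in row.items()
    ]


sample_row = {
    "record_id": 1,
    "event_time": 187512346.0,
    "feature_11": 123,
    "feature_12": 128,
}
record = to_put_record_payload(sample_row)
print(record)

# Writing the record directly requires AWS access and an existing feature group:
# import boto3
# runtime = boto3.client("sagemaker-featurestore-runtime")
# runtime.put_record(FeatureGroupName="cross-account-fg-1", Record=record)
```

The `ingest` method remains the more convenient choice for DataFrames because it handles batching and parallelism for you.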

In [None]:
%%time

logger.info(f'Ingesting data into feature group: {feature_group_1.name} ...')
feature_group_1.ingest(data_frame=data_df_1, max_processes=4, wait=True)
logger.info(f'{len(data_df_1)} records ingested into feature group: {feature_group_1.name}')

In [None]:
%%time

logger.info(f'Ingesting data into feature group: {feature_group_2.name} ...')
feature_group_2.ingest(data_frame=data_df_2, max_processes=4, wait=True)
logger.info(f'{len(data_df_2)} records ingested into feature group: {feature_group_2.name}')

# Create Resource Share

### Discoverability resource share

Discoverability means being able to see feature group names and metadata. When you grant
discoverability permission, all feature group entities in the account that you share from
(resource owner account) become discoverable by the accounts that you are sharing with
(resource consumer account). 

In this cell, the code lists SageMaker resource catalogs and searches for a specific one named "DefaultFeatureGroupCatalog."
If found, it attempts to share the default catalog with another AWS account using AWS RAM.
The shared resource catalog is intended to enable cross-account searching of SageMaker Feature Groups.

In [None]:
# List resource catalogs
list_catalogs_response = sagemaker_client.list_resource_catalogs()
logger.info(f'ResourceCatalogs: {list_catalogs_response} ...')
find_catalog = "DefaultFeatureGroupCatalog"
res = [
    sub["ResourceCatalogArn"]
    for sub in list_catalogs_response["ResourceCatalogs"]
    if re.search(find_catalog, sub["ResourceCatalogArn"])
]

if len(res) == 0:
    # A bare `exit` does nothing in a notebook, so stop the cell explicitly
    raise RuntimeError("No default catalog found, please contact the service team")

# Share DefaultFeatureGroupCatalog with other account
disc_permission_arn = "arn:aws:ram::aws:permission/AWSRAMPermissionSageMakerCatalogResourceSearch"
# Please fill in your consumer account ID on the following line
consumer_account = 'CONS_ACC_ID'
ram_share_catalogue_name = 'cross-account-catalog-ram-share'

response = ram_client.create_resource_share(
    name=ram_share_catalogue_name,
    resourceArns=[
        res[0],
    ],
    principals=[
        consumer_account,
    ],
    # The resource catalog supports a single permission, which allows sagemaker:Search
    permissionArns = [disc_permission_arn]
)
logger.info(f'Resource share {ram_share_catalogue_name} created.')
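
Because `consumer_account` starts out as the placeholder `'CONS_ACC_ID'`, it can be worth checking the value before calling RAM. The following sketch assumes you share with a single AWS account identified by its 12-digit ID (RAM also accepts other principal types, which this check does not cover). `validate_account_id` is a hypothetical helper.

```python
import re


def validate_account_id(account_id):
    """Return the ID as a string if it is exactly 12 digits, otherwise raise ValueError."""
    if not re.fullmatch(r"\d{12}", str(account_id)):
        raise ValueError(f"Expected a 12-digit AWS account ID, got: {account_id!r}")
    return str(account_id)


# The unedited placeholder is rejected
try:
    validate_account_id("CONS_ACC_ID")
    placeholder_rejected = False
except ValueError:
    placeholder_rejected = True

print(placeholder_rejected)
```

Calling `validate_account_id(consumer_account)` right after assigning the variable surfaces a forgotten placeholder before any resource share is created.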

### Access permissions

When you grant an access permission, you do so at a feature group resource level (not at
account level). This gives you more granular control over granting access to data. The types of
access permissions that can be granted are read-only, read-write, and admin.

This cell shares the first feature group with another AWS account using AWS Resource Access Manager (RAM).
It reuses the consumer account ID defined earlier, builds a RAM share name from the feature group name, and specifies a permission ARN that grants read and write access to the SageMaker Feature Group.
The code then creates a RAM resource share, which shares the feature group in ReadWrite mode with the specified account.

In [None]:
# AWSRAMPermissionSageMakerFeatureGroupReadWrite
# AWSRAMPermissionSageMakerFeatureGroupReadOnly
# AWSRAMPermissionSageMakerFeatureGroupAdmin
write_permission_arn = "arn:aws:ram::aws:permission/AWSRAMPermissionSageMakerFeatureGroupReadWrite"
ram_share_name_1 = feature_group_name_1 + '-ram-share'

response = ram_client.create_resource_share(
    name = ram_share_name_1,
    resourceArns=[
        fg_arn_1
    ],
    principals=[
        consumer_account,
    ],
    permissionArns=[write_permission_arn]
)
logger.info(f'Resource share {ram_share_name_1} created.')
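
The same pattern extends to the other access levels and to the second feature group. The sketch below builds the keyword arguments for `create_resource_share` from an access level, using the three permission names listed in the cell above. `build_share_request` is a hypothetical helper, and the feature group ARN and account ID shown are placeholders for illustration.

```python
PERMISSION_ARN_PREFIX = "arn:aws:ram::aws:permission/"

# Permission names as listed in the comments of the cell above
ACCESS_LEVELS = {
    "read-only": "AWSRAMPermissionSageMakerFeatureGroupReadOnly",
    "read-write": "AWSRAMPermissionSageMakerFeatureGroupReadWrite",
    "admin": "AWSRAMPermissionSageMakerFeatureGroupAdmin",
}


def build_share_request(feature_group_name, feature_group_arn, principal, access_level):
    """Build keyword arguments for ram_client.create_resource_share."""
    if access_level not in ACCESS_LEVELS:
        raise ValueError(f"Unknown access level: {access_level!r}")
    return {
        "name": f"{feature_group_name}-ram-share",
        "resourceArns": [feature_group_arn],
        "principals": [principal],
        "permissionArns": [PERMISSION_ARN_PREFIX + ACCESS_LEVELS[access_level]],
    }


# Placeholder ARN and account ID, for illustration only
request = build_share_request(
    feature_group_name="cross-account-fg-2",
    feature_group_arn="arn:aws:sagemaker:us-east-1:111122223333:feature-group/cross-account-fg-2",
    principal="444455556666",
    access_level="read-only",
)
print(request)
```

In the notebook, you could pass `fg_arn_2` and `consumer_account` instead of the placeholders and then call `ram_client.create_resource_share(**request)`.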