## Module 9: Access control to Feature Store using Tags through Identity-based policy



Tagging is a useful technique for cloud management and governance. However, it can be difficult to ensure consistent tagging unless it is enforced through a proper mechanism. To meet security and compliance requirements, you may need fine-grained control over the tags that are applied when a feature store is created. This is critical for customers who are required to audit access to feature data and ensure the right level of security is in place. In this notebook, we achieve this using AWS Identity and Access Management (IAM) policies. In our example, we assume the customer has identified some tags that must be present to help ensure proper security and governance, and we show a way to enforce such a requirement.

Identity-based policies determine whether a principal can create, access, or delete AWS resources in an AWS account. By specifying the `aws:RequestTag` condition key in an IAM policy, you can control access to SageMaker API calls based on the tags passed in the request. For example, suppose you require that every feature store created by an IAM user have a tag with the key `environment` and one of the values `dev`, `production`, or `staging`. You create an IAM policy with an `aws:RequestTag` condition that denies the `CreateFeatureGroup` action unless the tag value is `dev`, `production`, or `staging`, and attach it to the SageMaker execution role. If an IAM user then tries to create a feature store for customer data without any tags, or with a tag value other than the expected ones, the request is denied.
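To make the condition logic concrete, here is a minimal pure-Python sketch of how a Deny statement with `StringNotEquals` on `aws:RequestTag/environment` behaves. It is an illustration only, not IAM's actual evaluation engine. The function name `is_create_denied` and the constant `ALLOWED_ENVIRONMENTS` are hypothetical, and the sketch matches the tag key exactly for simplicity. It relies on the documented behavior that negated operators such as `StringNotEquals` evaluate to true when the key is absent from the request, which is why an untagged request is denied as well.

```python
# Allowed values, copied from the example requirement in this notebook
ALLOWED_ENVIRONMENTS = {"dev", "production", "staging"}


def is_create_denied(tags):
    """Return True if the Deny statement would match a CreateFeatureGroup request.

    `tags` uses the same shape as the SageMaker API:
    [{'Key': 'environment', 'Value': 'staging'}], or None for no tags.
    """
    request_tags = {tag["Key"]: tag["Value"] for tag in (tags or [])}
    value = request_tags.get("environment")
    # StringNotEquals is true when the key is missing or when the value
    # matches none of the listed strings, so the Deny applies in both cases.
    return value is None or value not in ALLOWED_ENVIRONMENTS


print(is_create_denied(None))                                          # True
print(is_create_denied([{"Key": "environment", "Value": "qa"}]))       # True
print(is_create_denied([{"Key": "environment", "Value": "staging"}]))  # False
```

The three calls mirror the three test scenarios later in this notebook: no tag, a disallowed tag value, and an allowed tag value.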


![FS_Diagram](../../images/fs_security_iam_policy_governance_tag.png)

#### Imports

In [22]:
import sagemaker
import boto3
import sys
import pandas as pd
import numpy as np
import io
import time
from time import gmtime, strftime, sleep
import json
import logging
from sagemaker.session import Session
from sagemaker import get_execution_role

In [23]:
import importlib
import subprocess

# Upgrade the SageMaker SDK if it is older than 2.125.0
sm_version = sagemaker.__version__
major, minor = (int(part) for part in sm_version.split('.')[:2])
if (major, minor) < (2, 125):
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'sagemaker==2.125.0'])
    importlib.reload(sagemaker)

In [24]:
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())

#### Initialize default parameters


In [25]:
prefix = 'sagemaker-feature-store'
role = get_execution_role()

sagemaker_session = sagemaker.Session()
region = sagemaker_session.boto_region_name
s3_bucket_name = sagemaker_session.default_bucket()

#### Load and explore dataset
In this notebook, we will be using the dataset created in Module 1/Prepare datasets of the workshop.

In [26]:
customer_data = pd.read_csv('../.././data/transformed/customers.csv')

In [27]:
customer_data.head()

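| Unnamed: 0 | customer_id | sex | is_married | event_time | age_18-29 | age_30-39 | age_40-49 | age_50-59 | age_60-69 | age_70-plus | n_days_active |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | C1 | 0 | 0 | 2022-11-03T11:06:49.595Z | 0 | 0 | 0 | 1 | 0 | 0 | 0.026027 |
| 1 | C2 | 1 | 0 | 2022-11-03T11:06:49.597Z | 1 | 0 | 0 | 0 | 0 | 0 | 0.077397 |
| 2 | C3 | 0 | 1 | 2022-11-03T11:06:49.599Z | 0 | 0 | 0 | 0 | 1 | 0 | 0.821233 |
| 3 | C4 | 1 | 1 | 2022-11-03T11:06:49.600Z | 0 | 0 | 0 | 1 | 0 | 0 | 0.887671 |
| 4 | C5 | 0 | 1 | 2022-11-03T11:06:49.602Z | 0 | 1 | 0 | 0 | 0 | 0 | 0.265753 |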


### IAM policy update
Add the policy below to the IAM role (SageMaker execution role) used for this notebook as an inline policy. It allows you to modify IAM policies programmatically in the following sections.
```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "iam:CreatePolicy",
                "iam:DetachRolePolicy",
                "iam:ListAttachedRolePolicies",
                "iam:DeletePolicy",
                "iam:AttachRolePolicy"
            ],
            "Resource": "*"
        }
    ]
}
```
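If you prefer to add this inline policy from a script rather than the console, the following is a minimal sketch. It assumes the script runs with credentials that are allowed to call `iam:PutRolePolicy` on the execution role (the notebook role itself typically is not). The helper name `put_iam_management_policy` and the policy name `FeatureStoreTagPolicyManagement` are hypothetical. The IAM client is passed in as a parameter so the function can be exercised without AWS access.

```python
import json

# Same permissions as the inline policy shown above
IAM_MANAGEMENT_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "iam:CreatePolicy",
                "iam:DetachRolePolicy",
                "iam:ListAttachedRolePolicies",
                "iam:DeletePolicy",
                "iam:AttachRolePolicy",
            ],
            "Resource": "*",
        }
    ],
}


def put_iam_management_policy(iam_client, role_name,
                              policy_name="FeatureStoreTagPolicyManagement"):
    """Embed the inline policy in the given role and return the policy name."""
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,  # hypothetical name, pick your own
        PolicyDocument=json.dumps(IAM_MANAGEMENT_POLICY),
    )
    return policy_name
```

With boto3 installed, a call might look like `put_iam_management_policy(boto3.client("iam"), "<your-execution-role-name>")`. The same name can later be passed to `delete_role_policy` to remove the inline policy during cleanup.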

### Create and add IAM policies based on tags to the existing role


Define IAM policies to restrict feature store creation without specific tags

In [28]:
# IAM policy that denies feature store creation unless the request carries
# an `environment` tag with one of the allowed values
_iam_tag_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": "sagemaker:CreateFeatureGroup",
            "Resource": "arn:aws:sagemaker:*:*:feature-group/*",
            "Condition": {
                "StringNotEquals": {
                    "aws:RequestTag/environment": [
                        "dev",
                        "production",
                        "staging"
                    ]
                }
            }
        }
    ]
}

Create IAM policies and attach them to the IAM role (SageMaker Execution Role)

In [12]:
# Attach IAM policy to restrict SageMaker Execution Role
timestamp = int(time.time())
iam_client = boto3.client('iam')
role_name = role.split('/')[-1] # get the role name from role arn

policy_res = iam_client.create_policy(
    PolicyName=f'Amazon_SageMaker_Tag_Policy_{timestamp}',
    PolicyDocument=json.dumps(_iam_tag_policy)
)
policy_arn = policy_res['Policy']['Arn']

policy_attach_res = iam_client.attach_role_policy(
    RoleName=role_name,
    PolicyArn=policy_arn
)
# IAM is eventually consistent, so wait for the new attachment to propagate before testing
time.sleep(120) 
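In addition to the fixed sleep, you can confirm that the attachment is at least visible on the role. The sketch below polls `list_attached_role_policies` (already permitted by the inline policy added earlier) until the policy ARN appears. The helper name `wait_for_policy_attachment` and its parameters are hypothetical, and pagination is ignored for brevity. A visible attachment does not guarantee that enforcement has fully propagated, so some extra waiting may still be needed.

```python
import time


def wait_for_policy_attachment(iam_client, role_name, policy_arn,
                               attempts=12, delay_sec=10):
    """Return True once policy_arn is listed on the role, False if attempts run out."""
    for _ in range(attempts):
        response = iam_client.list_attached_role_policies(RoleName=role_name)
        attached = {p["PolicyArn"] for p in response.get("AttachedPolicies", [])}
        if policy_arn in attached:
            return True
        time.sleep(delay_sec)  # give IAM time before checking again
    return False
```

In this notebook the call would be `wait_for_policy_attachment(iam_client, role_name, policy_arn)`, using the variables defined in the cell above.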

Create a feature group


In [29]:
customers_feature_group_name = "customers-feature-group-" + strftime("%d-%H-%M-%S", gmtime())

In [30]:
# Instantiate a FeatureGroup object for the customer data
from sagemaker.feature_store.feature_group import FeatureGroup

customers_feature_group = FeatureGroup(
    name=customers_feature_group_name, sagemaker_session=sagemaker_session
)

In [31]:
current_time_sec = int(round(time.time()))
record_identifier_feature_name = "customer_id"

In [32]:
# Append the EventTime feature to the data frame. This feature is required and timestamps each record.
customer_data["EventTime"] = pd.Series([current_time_sec] * len(customer_data), dtype="float64")

In [33]:
#Load feature definitions to your feature group.
customers_feature_group.load_feature_definitions(data_frame=customer_data)

[FeatureDefinition(feature_name='customer_id', feature_type=<FeatureTypeEnum.STRING: 'String'>),
 FeatureDefinition(feature_name='sex', feature_type=<FeatureTypeEnum.INTEGRAL: 'Integral'>),
 FeatureDefinition(feature_name='is_married', feature_type=<FeatureTypeEnum.INTEGRAL: 'Integral'>),
 FeatureDefinition(feature_name='event_time', feature_type=<FeatureTypeEnum.STRING: 'String'>),
 FeatureDefinition(feature_name='age_18-29', feature_type=<FeatureTypeEnum.INTEGRAL: 'Integral'>),
 FeatureDefinition(feature_name='age_30-39', feature_type=<FeatureTypeEnum.INTEGRAL: 'Integral'>),
 FeatureDefinition(feature_name='age_40-49', feature_type=<FeatureTypeEnum.INTEGRAL: 'Integral'>),
 FeatureDefinition(feature_name='age_50-59', feature_type=<FeatureTypeEnum.INTEGRAL: 'Integral'>),
 FeatureDefinition(feature_name='age_60-69', feature_type=<FeatureTypeEnum.INTEGRAL: 'Integral'>),
 FeatureDefinition(feature_name='age_70-plus', feature_type=<FeatureTypeEnum.INTEGRAL: 'Integral'>),
 FeatureDefinition

### 1. Test Deny when creating a feature store without a tag

You should receive an error message in this scenario. `An error occurred (AccessDeniedException) when calling the CreateFeatureGroup operation: User: arn:aws:sts::*ACCOUNT-ID*:assumed-role/AmazonSageMaker-ExecutionRole-*ID*/SageMaker is not authorized to perform: sagemaker:CreateFeatureGroup on resource: arn:aws:sagemaker:*REGION*:*ACCOUNT-ID*:feature-group/customers-feature-group-*TIMESTAMP* with an explicit deny in an identity-based policy`

In [34]:
customers_feature_group.create(
    s3_uri=f"s3://{s3_bucket_name}/{prefix}",
    record_identifier_name=record_identifier_feature_name,
    event_time_feature_name="EventTime",
    role_arn=role,
    enable_online_store=True
)

ClientError: An error occurred (AccessDeniedException) when calling the CreateFeatureGroup operation: User: arn:aws:sts::227246955871:assumed-role/AmazonSageMaker-ExecutionRole-20220810T165739/SageMaker is not authorized to perform: sagemaker:CreateFeatureGroup on resource: arn:aws:sagemaker:us-west-2:227246955871:feature-group/customers-feature-group-09-14-17-58 with an explicit deny in an identity-based policy

### 2. Test Deny when creating a feature store with an incorrect tag value

You should receive an error message in this scenario. `An error occurred (AccessDeniedException) when calling the CreateFeatureGroup operation: User: arn:aws:sts::*ACCOUNT-ID*:assumed-role/AmazonSageMaker-ExecutionRole-*ID*/SageMaker is not authorized to perform: sagemaker:CreateFeatureGroup on resource: arn:aws:sagemaker:*REGION*:*ACCOUNT-ID*:feature-group/customers-feature-group-*TIMESTAMP* with an explicit deny in an identity-based policy`

In [35]:
customers_feature_group.create(
    s3_uri=f"s3://{s3_bucket_name}/{prefix}",
    record_identifier_name=record_identifier_feature_name,
    event_time_feature_name="EventTime",
    role_arn=role,
    enable_online_store=True,
    tags=[{'Key':'environment','Value':'qa'}]
)

ClientError: An error occurred (AccessDeniedException) when calling the CreateFeatureGroup operation: User: arn:aws:sts::227246955871:assumed-role/AmazonSageMaker-ExecutionRole-20220810T165739/SageMaker is not authorized to perform: sagemaker:CreateFeatureGroup on resource: arn:aws:sagemaker:us-west-2:227246955871:feature-group/customers-feature-group-09-14-17-58 with an explicit deny in an identity-based policy
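Because both deny scenarios above raise an exception, running the notebook end to end stops at these cells. A minimal sketch of a helper that turns the expected failure into a boolean is shown here. The name `is_access_denied` is hypothetical. It inspects the `response` attribute that botocore's `ClientError` carries, and it catches a broad `Exception` only to keep the sketch free of a botocore import.

```python
def is_access_denied(create_call):
    """Run create_call() and report whether it failed with AccessDeniedException."""
    try:
        create_call()
    except Exception as exc:  # botocore.exceptions.ClientError in practice
        error = getattr(exc, "response", {}).get("Error", {})
        if error.get("Code") == "AccessDeniedException":
            return True
        raise  # any other failure is unexpected, so surface it
    return False
```

For example, wrapping the call from scenario 2 as `is_access_denied(lambda: customers_feature_group.create(..., tags=[{'Key': 'environment', 'Value': 'qa'}]))` should return `True` while the deny policy is attached.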

### 3. Test allow when creating a feature store with an allowed tag value

In [36]:
customers_feature_group.create(
    s3_uri=f"s3://{s3_bucket_name}/{prefix}",
    record_identifier_name=record_identifier_feature_name,
    event_time_feature_name="EventTime",
    role_arn=role,
    enable_online_store=True,
    tags=[{'Key':'environment','Value':'staging'}]
)

{'FeatureGroupArn': 'arn:aws:sagemaker:us-west-2:227246955871:feature-group/customers-feature-group-09-14-17-58',
 'ResponseMetadata': {'RequestId': '42f2a810-5660-4103-afcc-1459dcbfb901',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '42f2a810-5660-4103-afcc-1459dcbfb901',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '112',
   'date': 'Thu, 09 Feb 2023 14:19:15 GMT'},
  'RetryAttempts': 0}}
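To confirm that the tag was actually stored on the new feature group, you can read it back. The following sketch uses the SageMaker `list_tags` API on the feature group ARN returned above. The helper name `get_resource_tags` is hypothetical, only the first page of results is read, and the execution role is assumed to have the `sagemaker:ListTags` permission.

```python
def get_resource_tags(sm_client, resource_arn):
    """Return the resource's tags as a {key: value} dict (first page only)."""
    response = sm_client.list_tags(ResourceArn=resource_arn)
    return {tag["Key"]: tag["Value"] for tag in response.get("Tags", [])}
```

With a client created through `boto3.client("sagemaker")` and the ARN taken from `customers_feature_group.describe()["FeatureGroupArn"]`, the returned dictionary should contain `environment` mapped to `staging`.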

### Clean up

First, detach and delete the tag-based IAM policy.

In [37]:
# Detach iam policy added to the SageMaker execution role
policy_detach_res = iam_client.detach_role_policy(
    RoleName=role_name,
    PolicyArn=policy_arn
)

# Delete the IAM policy
delete_policy_res = iam_client.delete_policy(
    PolicyArn=policy_arn
)

Then delete the feature store

In [38]:
feature_group_name = customers_feature_group.describe()['FeatureGroupName']
# Poll up to 9 times and delete once creation has finished
for attempt in range(9):
    status = customers_feature_group.describe()['FeatureGroupStatus']
    if status == 'Created':
        customers_feature_group.delete()
        print(f"{feature_group_name} deleted.")
        break
    if status != 'Creating':
        # Any other status (for example CreateFailed): stop instead of looping forever
        print(f"{feature_group_name} is in {status} status; not deleting.")
        break
    print(f"{feature_group_name} is still in Creating status. Will try delete operation after 10 seconds.")
    time.sleep(10)

customers-feature-group-09-14-17-58 deleted.


Finally, delete the inline policy that you added to the IAM role (SageMaker execution role) earlier in this notebook.
```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "iam:CreatePolicy",
                "iam:DetachRolePolicy",
                "iam:ListAttachedRolePolicies",
                "iam:DeletePolicy",
                "iam:AttachRolePolicy"
            ],
            "Resource": "*"
        }
    ]
}
```