## Implementing granular access control to Offline Feature Store and Feature Groups using AWS Lake Formation

In this notebook, we show how to implement fine-grained access control using AWS Lake Formation. The configuration steps are performed by an AWS Lake Formation administrator.

##  A. Lake Formation Admin User

To perform the steps, you need to create a **Lake Formation Admin user** in IAM (Identity and Access Management) and sign in as that Admin user. Detailed instructions can be found here: https://docs.aws.amazon.com/lake-formation/latest/dg/getting-started-setup.html#create-data-lake-adminl.

## B. Setup AWS Lake Formation

In this section, we will show you how to implement the access control in AWS Lake Formation. 

1. Register the Offline Feature Store in Lake Formation.
2. Create the required data filters for fine-grained access control.
3. Grant feature groups (tables) and features (columns) permissions.


### 1) Register Amazon SageMaker Offline Feature Store in Lake Formation

To start using Lake Formation permissions with your existing Feature Store databases and tables, you must revoke the Super permission from the **IAMAllowedPrincipals** group on the database in Lake Formation.

* Sign in to the console as a Lake Formation administrator.
* In the navigation pane, under **Data Catalog**, choose **Databases**.
* Select the database **sagemaker_featurestore**, which is the database associated with the offline feature store. Because Feature Store automatically builds an AWS Glue Data Catalog when you create the feature groups, the offline feature store is visible as a database in AWS Lake Formation.



![Offline Feature Store Database](../images/fs_lf_database.png "Granular Access using Lake Formation")

* On the **Actions** menu, choose **Edit**.
* On the *Edit database* page, if you want Lake Formation permissions to work for newly created feature groups too, then clear **Use only IAM access control for new tables in this database**, and then choose **Save**.
* Back on the **Databases** page, ensure that the **sagemaker_featurestore** database is still selected, choose **View permissions** on the **Actions** menu, select the **IAMAllowedPrincipals** group, and choose **Revoke**.

![Offline Feature Store Database](../images/fs_lf_db_view_permissions.png "Granular Access using Lake Formation")

![Offline Feature Store Database](../images/fs_lf_db_iamallowedprincipals.png "Granular Access using Lake Formation")

Similarly, you need to perform these steps for every feature group table that is associated with your offline feature store.

* In the navigation pane, under **Data Catalog**, choose **Tables**.
* Select the table that corresponds to your [feature group name].
* Choose **View permissions** on the **Actions** menu, select the **IAMAllowedPrincipals** group, and choose **Revoke**.
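
The console steps above can also be scripted. The following is a minimal sketch, not the notebook's own method: it assumes a boto3 Lake Formation client is passed in as `lf_client`, and the helper names `build_revoke_request` and `revoke_iam_allowed_principals` are hypothetical. The **Super** permission shown in the console corresponds to `ALL` in the API, and the **IAMAllowedPrincipals** group is addressed with the identifier `IAM_ALLOWED_PRINCIPALS`.

```python
from typing import Any, Dict, List, Optional

# Identifier Lake Formation uses for the IAMAllowedPrincipals group.
IAM_ALLOWED_PRINCIPALS = "IAM_ALLOWED_PRINCIPALS"


def build_revoke_request(database: str, table: Optional[str] = None) -> Dict[str, Any]:
    """Build the keyword arguments for lakeformation.revoke_permissions.

    With only a database name the request targets the database itself;
    with a table name it targets that table inside the database.
    """
    if table is None:
        resource = {"Database": {"Name": database}}
    else:
        resource = {"Table": {"DatabaseName": database, "Name": table}}
    return {
        "Principal": {"DataLakePrincipalIdentifier": IAM_ALLOWED_PRINCIPALS},
        "Resource": resource,
        "Permissions": ["ALL"],  # "Super" in the console
    }


def revoke_iam_allowed_principals(lf_client: Any, database: str, tables: List[str]) -> List[Dict[str, Any]]:
    """Revoke Super from IAMAllowedPrincipals on the database, then on each table."""
    requests = [build_revoke_request(database)]
    requests += [build_revoke_request(database, table) for table in tables]
    for request in requests:
        lf_client.revoke_permissions(**request)
    return requests


# Example usage (assumes credentials of a Lake Formation administrator;
# the table name below is a placeholder, not a real table):
# import boto3
# lf_client = boto3.client("lakeformation")
# revoke_iam_allowed_principals(lf_client, "sagemaker_featurestore", ["my_orders_table"])
```

Note that the Glue table name of a feature group is not necessarily identical to the feature group name. It is reported in the `OfflineStoreConfig.DataCatalogConfig.TableName` field of the DescribeFeatureGroup response.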



In [None]:
# Retrieve FG names (when running previous modules)
%store -r customers_feature_group_name
%store -r orders_feature_group_name
%store -r products_feature_group_name


To switch the offline feature store to the Lake Formation permission model, you need to turn on Lake Formation permissions for the Amazon S3 location of the offline feature store. To do this, register the Amazon S3 location with Lake Formation:

* In the navigation pane, under **Register and Ingest**, choose **Data lake locations**.
* Click on *Register location*.  
* Select the location of the offline feature store in Amazon S3 for the **Amazon S3 path**; the location is the S3Uri that was provided in the FeatureGroup’s offline store configuration and can be found in the DescribeFeatureGroup API's ResolvedOutputS3Uri field.
* Use the default AWSServiceRoleForLakeFormationDataAccess IAM role and click **Register location**.

Run the code below to retrieve the S3 location.

In [None]:
# Run this code to retrieve the S3 location
from sagemaker.feature_store.feature_group import FeatureGroup
from sagemaker import get_execution_role
import sagemaker
import logging
import boto3
import sys
sys.path.append('..')
from utilities import Utils

sagemaker_session = sagemaker.Session()
account_id = sagemaker_session.account_id()
role = sagemaker.get_execution_role()
region = sagemaker_session.boto_region_name
default_bucket = sagemaker_session.default_bucket()
s3_client = boto3.client('s3', region_name=region)
query_results = 'sagemaker-featurestore-workshop'
prefix = 'sagemaker-feature-store'

boto_session = boto3.Session(region_name=region)
sagemaker_client = boto_session.client(service_name='sagemaker', region_name=region)
featurestore_runtime = boto_session.client(service_name='sagemaker-featurestore-runtime', region_name=region)

feature_store_session = sagemaker.Session(boto_session=boto_session, 
                                          sagemaker_client=sagemaker_client, 
                                          sagemaker_featurestore_runtime_client=featurestore_runtime)

s3_uri = Utils.describe_feature_group(orders_feature_group_name)['OfflineStoreConfig']['S3StorageConfig']['S3Uri']
print(s3_uri)

![Offline Feature Store Database](../images/fs_lf_db_register_location.png "Granular Access using Lake Formation")
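
As an alternative to the console, the location can be registered through the Lake Formation API. The sketch below is illustrative only: `s3_uri_to_arn` and `register_offline_store_location` are hypothetical helper names, `lf_client` is assumed to be a boto3 Lake Formation client, and the ARN is built for the standard `aws` partition. Setting `UseServiceLinkedRole=True` corresponds to choosing the default AWSServiceRoleForLakeFormationDataAccess role.

```python
from typing import Any, Dict


def s3_uri_to_arn(s3_uri: str) -> str:
    """Convert an s3://bucket/prefix URI into an S3 ARN (standard 'aws' partition assumed)."""
    scheme = "s3://"
    if not s3_uri.startswith(scheme):
        raise ValueError(f"Not an S3 URI: {s3_uri!r}")
    path = s3_uri[len(scheme):].rstrip("/")
    if not path:
        raise ValueError("The S3 URI does not contain a bucket name")
    return f"arn:aws:s3:::{path}"


def register_offline_store_location(lf_client: Any, s3_uri: str) -> Dict[str, Any]:
    """Register the offline store location using the Lake Formation service-linked role."""
    request = {
        "ResourceArn": s3_uri_to_arn(s3_uri),
        "UseServiceLinkedRole": True,
    }
    lf_client.register_resource(**request)
    return request


# Example usage (assumes Lake Formation administrator credentials and the
# s3_uri value printed by the previous cell):
# import boto3
# lf_client = boto3.client("lakeformation")
# register_offline_store_location(lf_client, s3_uri)
```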

### 2) Create granular access control using Lake Formation

You can implement row-level and cell-level security by creating **data filters**. You select a data filter when you grant the SELECT Lake Formation permission on tables.  In this case, we will use this capability to implement a set of filters that limit access to feature groups and features within a feature group.

To create a new data filter, in the navigation pane, under **Data Catalog**, choose **Data filters**, and then choose **Create new filter**. The two examples below show how to configure filters that provide granular access control at the row level and at the cell level.

* **Row-Level Security.** When you specify the "all columns" wildcard and provide a row filter expression, you are establishing row-level security (row filtering) only. In this example, we create a filter that limits a data scientist's access to only those records in the orders feature group where the feature customer_id = 'C7782'.


![Offline Feature Store Database](../images/fs_lf_create_data_filter.png "Granular Access using Lake Formation")

* **Cell-Level Security.** When you include or exclude specific columns and also provide a row filter expression, you are establishing cell-level security (cell filtering). In this example, we create a filter that limits a data scientist's access to certain features of the customers feature group (we exclude sex and is_married) and to the subset of its records where the feature customer_id = 'C3126'.

![Offline Feature Store Database](../images/fs_lf_create_data_filter_2.png "Granular Access using Lake Formation")

Below is a screenshot of the data filters created.

![Offline Feature Store Database](../images/fs_lf_create_data_filter_3.png "Granular Access using Lake Formation")
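
The two filters described above can also be expressed as requests to the Lake Formation `create_data_cells_filter` API. This is a minimal sketch under stated assumptions: the helper names `build_row_filter` and `build_cell_filter`, the filter names, the table names, and the account ID in the usage comments are hypothetical placeholders, and `lf_client` is assumed to be a boto3 Lake Formation client.

```python
from typing import Any, Dict, Iterable


def _base_filter(catalog_id: str, database: str, table: str, name: str, row_expression: str) -> Dict[str, Any]:
    """Fields shared by both filter types: target table, filter name, row expression."""
    return {
        "TableCatalogId": catalog_id,  # AWS account ID that owns the Data Catalog
        "DatabaseName": database,
        "TableName": table,
        "Name": name,
        "RowFilter": {"FilterExpression": row_expression},
    }


def build_row_filter(catalog_id: str, database: str, table: str, name: str, row_expression: str) -> Dict[str, Any]:
    """Row-level security: all columns are visible, rows are restricted."""
    table_data = _base_filter(catalog_id, database, table, name, row_expression)
    table_data["ColumnWildcard"] = {}  # empty wildcard means "all columns"
    return {"TableData": table_data}


def build_cell_filter(catalog_id: str, database: str, table: str, name: str,
                      row_expression: str, excluded_columns: Iterable[str]) -> Dict[str, Any]:
    """Cell-level security: rows are restricted and some columns are excluded."""
    table_data = _base_filter(catalog_id, database, table, name, row_expression)
    table_data["ColumnWildcard"] = {"ExcludedColumnNames": list(excluded_columns)}
    return {"TableData": table_data}


# Example usage (placeholder account ID, table names, and filter names):
# import boto3
# lf_client = boto3.client("lakeformation")
# lf_client.create_data_cells_filter(**build_row_filter(
#     "111122223333", "sagemaker_featurestore", "my_orders_table",
#     "orders_row_filter", "customer_id = 'C7782'"))
# lf_client.create_data_cells_filter(**build_cell_filter(
#     "111122223333", "sagemaker_featurestore", "my_customers_table",
#     "customers_cell_filter", "customer_id = 'C3126'", ["sex", "is_married"]))
```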

### 3) Grant feature groups (tables) and features (columns) permissions

In this section, you grant the granular access control and permissions defined in Lake Formation to a SageMaker user by assigning the data filter to the SageMaker execution role associated with the user who originally created the feature groups. The SageMaker execution role is created as part of the SageMaker Studio domain setup (https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-iam.html), and by default its name starts with AmazonSageMaker-ExecutionRole-*. For this role to be able to access the data, you need to give it IAM permissions on the Lake Formation APIs GetDataAccess, StartQueryPlanning, GetQueryState, GetWorkUnits, and GetWorkUnitResults, and on the Glue APIs GetTables and GetDatabases. Create the following policy in IAM, name it LakeFormationDataAccess, and attach it to the SageMaker execution role. You also need to attach the AmazonAthenaFullAccess policy so that the role can access Athena.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "LakeFormationDataAccess",
            "Effect": "Allow",
            "Action": [
                "lakeformation:GetDataAccess",
                "lakeformation:StartQueryPlanning",
                "lakeformation:GetQueryState",
                "lakeformation:GetWorkUnits",
                "lakeformation:GetWorkUnitResults",
                "glue:GetTables",
                "glue:GetDatabases"
            ],
            "Resource": "*"
        }
    ]
}
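
If you prefer to create and attach the policy programmatically, the following sketch shows one possible way to do it with the IAM API. It is an illustration rather than a required step: `build_policy_document` and `attach_data_access_policies` are hypothetical helper names, `iam_client` is assumed to be a boto3 IAM client with permission to manage policies, and the role name in the usage comment is derived from the execution role ARN.

```python
import json
from typing import Any, Dict

POLICY_NAME = "LakeFormationDataAccess"
# AWS managed policy that grants access to Athena.
ATHENA_MANAGED_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonAthenaFullAccess"


def build_policy_document() -> Dict[str, Any]:
    """Return the LakeFormationDataAccess policy shown above as a Python dict."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "LakeFormationDataAccess",
                "Effect": "Allow",
                "Action": [
                    "lakeformation:GetDataAccess",
                    "lakeformation:StartQueryPlanning",
                    "lakeformation:GetQueryState",
                    "lakeformation:GetWorkUnits",
                    "lakeformation:GetWorkUnitResults",
                    "glue:GetTables",
                    "glue:GetDatabases",
                ],
                "Resource": "*",
            }
        ],
    }


def attach_data_access_policies(iam_client: Any, role_name: str) -> str:
    """Create the customer managed policy and attach it, plus Athena access, to the role."""
    response = iam_client.create_policy(
        PolicyName=POLICY_NAME,
        PolicyDocument=json.dumps(build_policy_document()),
    )
    policy_arn = response["Policy"]["Arn"]
    for arn in (policy_arn, ATHENA_MANAGED_POLICY_ARN):
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=arn)
    return policy_arn


# Example usage (the role name is the last path segment of the execution role ARN):
# import boto3
# iam_client = boto3.client("iam")
# role_name = role.split("/")[-1]
# attach_data_access_policies(iam_client, role_name)
```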


Next, you need to *GRANT* the SageMaker execution role access to the Feature Store database and to the specific feature group table, and assign it one of the data filters created previously. As an example, the screenshots below show how to grant permissions to a SageMaker execution role with the data filter for row-level access.

![Offline Feature Store Database](../images/fs_lf_grant_permission_1.png "Granular Access using Lake Formation")

![Offline Feature Store Database](../images/fs_lf_grant_permission_2.png "Granular Access using Lake Formation")

Similarly, you can *GRANT* permissions with the data filter created for *cell-level* access to the SageMaker execution role.
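
The grants can also be issued through the Lake Formation `grant_permissions` API. The sketch below makes several assumptions that you should adapt: `build_grants` and `grant_filtered_access` are hypothetical helper names, `lf_client` is assumed to be a boto3 Lake Formation client, the account ID, role ARN, table name, and filter name in the usage comment are placeholders, and granting `DESCRIBE` on the database is one reasonable choice rather than something the steps above prescribe.

```python
from typing import Any, Dict, List


def build_grants(role_arn: str, catalog_id: str, database: str,
                 table: str, filter_name: str) -> List[Dict[str, Any]]:
    """Build two grant requests: database access, then SELECT through a data filter."""
    principal = {"DataLakePrincipalIdentifier": role_arn}
    return [
        {
            # Assumption: DESCRIBE is sufficient for the role to see the database.
            "Principal": principal,
            "Resource": {"Database": {"Name": database}},
            "Permissions": ["DESCRIBE"],
        },
        {
            # SELECT is granted on the data filter, not on the whole table,
            # so the role only sees the rows and columns the filter allows.
            "Principal": principal,
            "Resource": {
                "DataCellsFilter": {
                    "TableCatalogId": catalog_id,
                    "DatabaseName": database,
                    "TableName": table,
                    "Name": filter_name,
                }
            },
            "Permissions": ["SELECT"],
        },
    ]


def grant_filtered_access(lf_client: Any, role_arn: str, catalog_id: str,
                          database: str, table: str, filter_name: str) -> List[Dict[str, Any]]:
    """Send each grant request to Lake Formation and return what was sent."""
    grants = build_grants(role_arn, catalog_id, database, table, filter_name)
    for grant in grants:
        lf_client.grant_permissions(**grant)
    return grants


# Example usage (placeholder account ID, table name, and filter name; `role` is
# the SageMaker execution role ARN):
# import boto3
# lf_client = boto3.client("lakeformation")
# grant_filtered_access(lf_client, role, "111122223333",
#                       "sagemaker_featurestore", "my_orders_table", "orders_row_filter")
```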