# Trust, Governance, and HIPAA compliance with CMLE

Intro...

## Prerequisites

- Developer access for creating a set of API credentials, and Admin access for managing role-based permissions for the credentials
- A set of API credentials assigned to a role with sandbox access but no other permissions and no access to labels (see Credentials and ABAC for CMLE for more details).
    1. In the Developer Console, create a set of Oauth2 credentials
    2. In the Permissions UI, create a Role with permissions for the sandbox you'll use to run through this demo and add the API credentials. 
        - Do not add other permissions or label access at this time
        - We will progressively add permissions to the Role in order to demonstrate access control in action
    3. Update the [config.ini](../conf/config.ini) file with IMS org info, sandbox, environment (stage or prod), and credentals
- Ensure the sandbox you are using contains the synthetic Profile and Experience Event datasets generated by the [Synthetic Data](SyntheticData.ipynb) notebook

### Configure AEPP

In [2]:
import os
from configparser import ConfigParser
import aepp

os.environ["ADOBE_HOME"] = os.path.dirname(os.getcwd())

if "ADOBE_HOME" not in os.environ:
    raise Exception("ADOBE_HOME environment variable needs to be set.")

config = ConfigParser()
config_file = "cmle_gov_config.ini"
config_path = os.path.join(os.environ["ADOBE_HOME"], "conf", config_file)

if not os.path.exists(config_path):
    raise Exception(f"Looking for configuration under {config_path} but config not found, please verify path")

config.read(config_path)

aepp.configure(
  org_id=config.get("Platform", "ims_org_id"),
  tech_id=config.get("Authentication", "tech_acct_id"), 
  secret=config.get("Authentication", "client_secret"),
  scopes=config.get("Authentication", "scopes"),
  client_id=config.get("Authentication", "client_id"),
  environment=config.get("Platform", "environment"),
  sandbox=config.get("Platform", "sandbox_name")
)

Add some utility functions for creating unique resource names and generating UI links to AEP resources

In [3]:
import re
username = os.getlogin()
unique_id = s = re.sub("[^0-9a-zA-Z]+", "_", username)

In [9]:
import urllib
def get_ui_link(tenant_id, resource_type, resource_id):
    environment = config.get("Platform", "environment")
    sandbox_name = config.get("Platform", "sandbox_name")
    if environment == "prod":
        prefix = f"https://experience.adobe.com"
    else:
        prefix = f"https://experience-{environment}.adobe.com"
    return f"{prefix}/#/@{tenant_id}/sname:{sandbox_name}/platform/{resource_type}/{resource_id}"

## 1. Schemas

### 1.1 Test schema permissions

In [155]:
from aepp import schema
schema_conn = schema.Schema()
schema_conn.sandbox
tenant_id = schema_conn.getTenantId()
tenant_id

'cloudmlecosystem'

In [6]:
def getSchemabyTitle(schema_conn: schema.Schema, title: str):
    schemas = schema_conn.getSchemas()
    # Handle case where no schemas have been created
    if 'results' in schemas: 
        return None
    # Filter schemas list for matching title
    match = list(filter(lambda d: d['title'] == title, schemas))
    # XDM schema titles must be unique, so 'match' will have exactly 1 element if a schema
    # with the same title already exists
    if len(match) == 1:
        return match[0]
    else:
        return None

In [10]:
profile_schema_title = f"[CMLE Synthetic Data] Profile Schema (created by {username})"
profile_schema_res = getSchemabyTitle(schema_conn=schema_conn, title=profile_schema_title)
profile_schema_id = profile_schema_res['$id']
profile_schema_altId = profile_schema_res["meta:altId"]
print(f"Profile Schema ID: {profile_schema_id}")
print(f"Profile Schema Alt ID: {profile_schema_altId}")

profile_schema_link = get_ui_link(tenant_id, "schema/mixin/browse", urllib.parse.quote(profile_schema_id, safe="a"))
print(f"View Profile schema in UI: {profile_schema_link}")

Profile Schema ID: https://ns.adobe.com/cloudmlecosystem/schemas/f415f7d964337d192cd4b53a29fd0c07a5eea100031223ec
Profile Schema Alt ID: _cloudmlecosystem.schemas.f415f7d964337d192cd4b53a29fd0c07a5eea100031223ec
View Profile schema in UI: https://experience-stage.adobe.com/#/@cloudmlecosystem/sname:cmle-governance/platform/schema/mixin/browse/https%3A%2F%2Fns.adobe.com%2Fcloudmlecosystem%2Fschemas%2Ff415f7d964337d192cd4b53a29fd0c07a5eea100031223ec


### 1.2 Test FLAC for Schemas

Get Demographic Details field group

In [15]:
fieldgroup_id = 'https://ns.adobe.com/xdm/context/profile-person-details'
schema_conn.getFieldGroup(fieldgroup_id)

{'$id': 'https://ns.adobe.com/xdm/context/profile-person-details',
 'meta:altId': '_xdm.context.profile-person-details',
 'meta:resourceType': 'mixins',
 'version': '1.43.3',
 'title': 'Demographic Details',
 'type': 'object',
 'description': 'Demographic information such as name, gender, and birth date of an individual.',
 'properties': {'person': {'title': 'Person',
   'description': 'An individual actor, contact, or owner.',
   'type': 'object',
   'meta:xdmType': 'object',
   'properties': {'birthDate': {'title': 'Birth date(YYYY-MM-DD)',
     'type': 'string',
     'format': 'date',
     'description': 'The full date a person was born.',
     'meta:xdmType': 'date',
     'meta:xdmField': 'xdm:birthDate'},
    'birthDayAndMonth': {'title': 'Birth date (MM-DD)',
     'type': 'string',
     'pattern': '[0-1][0-9]-[0-9][0-9]',
     'description': "The day and month a person was born, in the format MM-DD. This field should be used when the day and month of a person's birth is known, bu

In [14]:
fieldgroup_id = fieldgroup_res['$id']
print(f"User ID field group ID: {fieldgroup_id}")

# Get link to field group in AEP UI
fieldgroup_link = get_ui_link(tenant_id, "schema/mixin/browse", urllib.parse.quote(fieldgroup_id, safe="a"))
print(f"View field group in UI: {fieldgroup_link}")

User ID field group ID: https://ns.adobe.com/cloudmlecosystem/mixins/7db99ef15ef3c4dbd140abb1ee4bf7f189bd06aac2be5b3c
View field group in UI: https://experience-stage.adobe.com/#/@cloudmlecosystem/sname:cmle-governance/platform/schema/mixin/browse/https%3A%2F%2Fns.adobe.com%2Fcloudmlecosystem%2Fmixins%2F7db99ef15ef3c4dbd140abb1ee4bf7f189bd06aac2be5b3c


Add 'C9' label to `gender` field in Profile Schema (via Demographic Details standard field group)

In [17]:
fields = ['/person/gender']
labels = ['core/C9']
label_desc_data = {
    "@type": "xdm:descriptorLabel",
    "xdm:sourceSchema": fieldgroup_id,
    "xdm:sourceVersion": 1,
    "xdm:sourceProperty": fields,
    "xdm:labels": labels
  }
label_desc_res = schema_conn.createDescriptor(
    descriptorObj = label_desc_data
)
label_desc_res

{'@id': '9da3d0567edc6debcf650e5834a81bcbcb4faaf3a3cfc46c',
 '@type': 'xdm:descriptorLabel',
 'xdm:sourceSchema': 'https://ns.adobe.com/xdm/context/profile-person-details',
 'xdm:sourceVersion': 1,
 'xdm:sourceProperty': ['/person/gender'],
 'imsOrg': '3ADF23C463D98F640A494032@AdobeOrg',
 'version': '1',
 'xdm:labels': ['core/C9'],
 'meta:containerId': '97e9e135-cb1e-49df-a9e1-35cb1e29dfe5',
 'meta:sandboxId': '97e9e135-cb1e-49df-a9e1-35cb1e29dfe5',
 'meta:sandboxType': 'production'}

Get field group and check whether `gender` is returned as one of the properties

In [18]:
schema_conn.getFieldGroup(fieldgroup_id)

{'$id': 'https://ns.adobe.com/xdm/context/profile-person-details',
 'meta:altId': '_xdm.context.profile-person-details',
 'meta:resourceType': 'mixins',
 'version': '1.43.3',
 'title': 'Demographic Details',
 'type': 'object',
 'description': 'Demographic information such as name, gender, and birth date of an individual.',
 'properties': {'person': {'title': 'Person',
   'description': 'An individual actor, contact, or owner.',
   'type': 'object',
   'meta:xdmType': 'object',
   'properties': {'birthDate': {'title': 'Birth date(YYYY-MM-DD)',
     'type': 'string',
     'format': 'date',
     'description': 'The full date a person was born.',
     'meta:xdmType': 'date',
     'meta:xdmField': 'xdm:birthDate'},
    'birthDayAndMonth': {'title': 'Birth date (MM-DD)',
     'type': 'string',
     'pattern': '[0-1][0-9]-[0-9][0-9]',
     'description': "The day and month a person was born, in the format MM-DD. This field should be used when the day and month of a person's birth is known, bu

As can be seen in the GET response above, the SchemaRegistry API does NOT filter out fields to which the API credential does not have access. Labeled fields for which a user does not have access are hidden in the UI, but that was apparently implemented in the UI level rather than at the API level.

Because the SchemaRegistry API does not implement FLAC, the following functions can be helpful in getting a list of allowed fields for a given schema:

In [100]:
def getSchemaFields(conn: schema.Schema, schema_id: str) -> list[str]:
    fields = list(conn.getSchema(schema_id, flat=True, schema_type="xed")['properties'].keys())
    return [f.replace('/', '.') for f in fields]

In [102]:
def getSchemaLabels(conn: schema.Schema, schema_id: str) -> list[dict]:
    # schema descriptors
    schema_desc = schema_conn.getSchema(schemaId=profile_schema_id, desc=True, flat=True)['meta:descriptors']
    # filter for label descriptors
    filter_type = lambda x: x['@type']=='xdm:descriptorLabel'
    schema_label_desc = list(filter(filter_type, schema_desc))
    # extract labeled fields with associated labels
    labeled_fields = [{'field': field.lstrip('/').replace('/','.'), 'labels':d['xdm:labels']} for d in schema_label_desc for field in d['xdm:sourceProperty']]
    return labeled_fields
    

In [171]:
def getAllowedSchemaFields(conn: schema.Schema, schema_id: str, labels: list[str] = None, fields: list[str] = None) -> list[str]:
    """
    Get a (flattened) list of allowed fields of a schema. By default, any labeled fields are excluded,
    but the user may provide a list of labels that should be allowed

    Params:
        - conn: an instance of schema.Schema to be used for connecting to the Schema Registry API
        - schema_id: the id of the schema from which to find the allowed fields
        - labels: a list of DULE labels that should be permitted. By default any labeled fields are excluded from the allowed list
    """
    labeled_fields = getSchemaLabels(conn, schema_id)
    if labels is not None:
        restricted_fields = [field['field'] for field in labeled_fields if any([l not in labels for l in field['labels']])]
    else:
        restricted_fields = restricted_fields = [field['field'] for field in labeled_fields]
    if fields is not None:
        schema_fields = fields
    else:
        schema_fields = getSchemaFields(conn, schema_id)
    allowed_fields = [field for field in schema_fields if field not in restricted_fields]
    return allowed_fields
    

The API credential we are using does not have access to any labeled data. The Profile schema has one labeled field (`person.gender` has a C9 label). When you get the schema fields from the GET schema by id request, you see that `person.gender` is present in the response

In [106]:
schema_fields = getSchemaFields(schema_conn, profile_schema_id)
# filter the list of fields to find 'person.gender'
print([f for f in schema_fields if f=='person.gender'])

['person.gender']


The `getAllowedFields()` function returns the same list of fields but with `person/gender` removed

In [107]:
allowed_fields = getAllowedSchemaFields(schema_conn, profile_schema_id)
# filter the list of fields to find 'person/gender'
print([f for f in allowed_fields if f=='person/gender'])

[]


In [159]:
from typing import Union, IO
from aepp import policy
policy_conn = policy.Policy()

TypeError: Cannot instantiate typing.Union

### 1.3 Audit logs for Schemas

Now we'll check audit logs to confirm that our requests to AEP via the `aepp` are captured in the audit logs

## 2. Queries

Create connection to query service

### 2.1 Basic interactive query of the Profile 

Get ID of the Profile dataset

In [64]:
from aepp import catalog

def getDatasetbyName(conn: catalog.Catalog, name: str):
    datasets = conn.getDataSets()
    match = {k:v for k, v in datasets.items() if v['name'] == name}
    if match:
        return list(match.keys())[0]
    else:
        return None

In [123]:
cat_conn = catalog.Catalog()

profile_dataset_name = f"[CMLE Synthetic Data] Profile dataset (created by {username})"
profile_dataset_res = cat_conn.getDataSets(name=profile_dataset_name)
profile_dataset_id = list(profile_dataset_res.keys())[0]
profile_dataset_id

'64f8b1d4b2464f289ec5cca4'

In [138]:
def getColumns(properties: dict) -> list[str]:
    columns = []
    for column in list(properties.keys()):
        if properties[column]['type']=='object':
            subcolumns = getColumns(properties[column]['properties'])
            subcolumn_dot_paths = [f"{column}.{sub}" for sub in subcolumns]
            columns += subcolumn_dot_paths
        else:
            columns.append(column)
    return columns

In [144]:
def getObservableColumns(dataset_id: str, labels: list[str] = None) -> list[str]:
    conn = catalog.Catalog()
    obs_schema = conn.getDataSetObservableSchema(dataset_id)
    columns = getColumns(obs_schema['observableSchema']['properties'])
    schema_id = conn.getDataSet(dataset_id)[dataset_id]['schemaRef']['id']
    allowed_columns = getAllowedSchemaFields(conn, schema_id, labels=labels, fields=columns)
    return allowed_columns


Get table name and columns from the Catalog service

In [70]:
dataset_info = cat_conn.getDataSet(profile_dataset_id)
table_name = dataset_info[profile_dataset_id]["tags"]["adobe/pqs/table"][0]
table_columns = 

'cmle_synthetic_data_profile_dataset_created_by_jeremypage'

Create connection to Query Service

In [110]:
from aepp import queryservice

qs_conn = queryservice.QueryService().connection()
sandbox_name = config.get("Platform", "sandbox_name")
# qs_conn["dbName"] = f"{sandbox_name}:{table_name}"
qs_cursor = queryservice.InteractiveQuery2(qs_conn)

Create list of fields that are populated in the Profile dataset to use in the SELECT statement

In [117]:
profile_fields = [
    'personID',
    'person.gender',
    'person.name.firstName',
    'person.name.lastName',
    'personalEmail.address',
    'mobilePhone.number',
    'homeAddress.street1',
    'homeAddress.city',
    'homeAddress.state',
    'homeAddress.postalCode',
    'loyalty.loyaltyID',
    'loyalty.tier',
    'loyalty.points',
    'loyalty.joinDate'
]

In [118]:
sample_query = f'''SELECT {', '.join(profile_fields)} FROM {table_name} LIMIT 5'''
sample_query
qs_cursor.query(sample_query)

Unnamed: 0,personID,gender,firstname,lastname,address,number,street1,city,state,postalcode,loyaltyid,tier,points,joindate
0,26104514094657887070828039980073903437,not_specified,Charles,Jacobs,oldest1877@emailsim.io,221-582-7870,687 Lagunitas Trail,Surprise,GA,48050,[5339452],diamond,588914.0,2018-02-17 06:28:26
1,03921550853451792211980811532713968084,male,Laurena,Whitney,heath2028@emailsim.io,284-261-0050,1377 Middle West Grove,Sapulpa,NM,19543,[5701611],gold,204915.0,2015-07-24 03:07:18
2,17929078501172649417669336790984380330,female,Assunta,Kemp,concerned1876@emailsim.io,052-706-8528,355 Mary Crescent,Ashtabula,AR,90627,[5415133],platinum,899506.0,2017-06-25 18:27:07
3,58582052711456172824522025926755953322,male,Celine,Williams,italy2030@emailsim.io,250-278-0953,376 Harris Road,Friendswood,NJ,38748,[5683637],member,605213.0,2013-05-10 06:39:13
4,12382047478963142440491610050245299485,male,Lakeesha,Park,bibliography1875@emailsim.io,580-907-3502,1112 Sanchez Junction,Greeley,AK,4987,[5075781],platinum,692000.0,2007-06-09 11:36:50


This is using the expiring credentials that can be retrieved from the Query Service `/connection_parameters` endpoint. As far as I can tell, there is now way to apply ABAC with those credentials, as you can with a set non-expiring credentials.

The QS functionality in `aepp` is set up to use expiring credentials, so we would have to instruct users to create non-expiring credentials and how to connect with those credentials.

In the meantime, users can DIY it by cleaning the list of columns for the select statement via something like the `getAllowedFields` function above

In [119]:
allowed_select = getAllowedSchemaFields(schema_conn, profile_schema_id, fields=profile_fields)

clean_query = f'''SELECT {', '.join(allowed_select)} FROM {table_name} LIMIT 5'''
qs_cursor.query(clean_query)

Unnamed: 0,personID,firstname,lastname,address,number,street1,city,state,postalcode,loyaltyid,tier,points,joindate
0,26104514094657887070828039980073903437,Charles,Jacobs,oldest1877@emailsim.io,221-582-7870,687 Lagunitas Trail,Surprise,GA,48050,[5339452],diamond,588914.0,2018-02-17 06:28:26
1,03921550853451792211980811532713968084,Laurena,Whitney,heath2028@emailsim.io,284-261-0050,1377 Middle West Grove,Sapulpa,NM,19543,[5701611],gold,204915.0,2015-07-24 03:07:18
2,17929078501172649417669336790984380330,Assunta,Kemp,concerned1876@emailsim.io,052-706-8528,355 Mary Crescent,Ashtabula,AR,90627,[5415133],platinum,899506.0,2017-06-25 18:27:07
3,58582052711456172824522025926755953322,Celine,Williams,italy2030@emailsim.io,250-278-0953,376 Harris Road,Friendswood,NJ,38748,[5683637],member,605213.0,2013-05-10 06:39:13
4,12382047478963142440491610050245299485,Lakeesha,Park,bibliography1875@emailsim.io,580-907-3502,1112 Sanchez Junction,Greeley,AK,4987,[5075781],platinum,692000.0,2007-06-09 11:36:50


TODO: demonstrate usage with non-expiring credentials (must use a prod org because non-expireing credentials do not appear to be supported in stage)

## 2.2 Create table query from Experience Events dataset