# <p style="color:dodgerblue">01 Create DataBrew Project</p>
This notebook walks through an example of using the AWS Glue DataBrew service.
It uses an Excel spreadsheet of randomly generated supply chain data as its data source.
  
(Jupyter Notebook developed with Kernel Python 3.11.6)
<hr style="border:1px dotted; color:floralwhite">

# <span style="color:deeppink">GETTING STARTED</span>
# Local client requirements for this Lab (macOS)
- *See <span style="color:gold">Appendix - Jupyter Install Requirements (macOS)</span> at the bottom of this lab to install the macOS requirements. Windows requirements are similar, apart from Homebrew.*  
- These requirements are generic: they get your local environment into a state that can run Jupyter notebooks, Boto3, etc., and are not specific to DataBrew

<hr style="border:1px dotted">
<hr style="border:1px dotted;color:greenyellow">

# <p style="color:DarkTurquoise">DataBrew Prerequisites</p>
### <p style="color:DarkTurquoise">NOTE we are using ap-southeast-2</p>
No other architecture prerequisites required.

<hr style="border:1px dotted;color:DarkTurquoise">
<hr style="border:1px dotted;color:greenyellow">

# <p style="color:greenyellow">Create backend architecture needed to support DataBrew</p>

# <p style="color:greenyellow">Let's Create Clients and Variables</p>
- We run these setup cells first so that, if we lose the kernel, re-running them alone restores the variables and clients needed to clean up resources later

In [None]:
import boto3
import json
import random

# region - this lab runs in ap-southeast-2 (see the DataBrew Prerequisites note above)
myRegion='ap-southeast-2'
myAccountNumber = boto3.client("sts").get_caller_identity()["Account"]

# set up a boto3 session using a profile that is able to create services in the region
# this is typically a developer profile or deployment profile
# we DO NOT need the amplify profile we created above yet - that is used in stack 02
sessionBoto3 = boto3.Session(profile_name="default", region_name=myRegion)

# names for services we will create below
# s3 bucket - MUST BE A UNIQUE NAME so we randomise a couple of numbers to be sure
myBucketDataBrew='doit-databrew-bucket-' + str(random.randint(0, 1000)) + '-' + str(random.randint(0, 1000))

# iam
myRoleDataBrew="doit-databrew-service-role"
myPolicyDataBrew="doit-databrew-service-policy"
myRoleDataBrewARN='RETRIEVED FROM ROLE BELOW ONCE CREATED'

# databrew
myDataSet="doit-databrew-dataset"
myDataSetARN="RETRIEVED FROM OBJECT BELOW ONCE CREATED"

myProfileJob="doit-databrew-profile-job"
myRecipeJob="doit-databrew-recipe-job"

myRuleSet="doit-databrew-ruleset"
myRuleSetARN="RETRIEVED FROM OBJECT BELOW ONCE CREATED"

myRecipe1="doit-databrew-recipe1"
myRecipe2="doit-databrew-recipe2"
myRecipe1Version="RETRIEVED FROM OBJECT BELOW ONCE CREATED"
myRecipe2Version="RETRIEVED FROM OBJECT BELOW ONCE CREATED"

myProject="doit-databrew-project"

print ('Done! Move to the next cell ->')
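
Two random integers between 0 and 1000 give roughly a million possible suffixes, which is usually enough but can still collide with an existing bucket. A minimal sketch of an alternative, assuming a UUID-based suffix is acceptable for this lab; `make_bucket_name` is a hypothetical helper, not part of the original notebook, and the pattern it checks covers only the basic S3 naming rules (3 to 63 characters, lowercase letters, digits and hyphens).

```python
import re
import uuid


def make_bucket_name(prefix: str = "doit-databrew-bucket") -> str:
    """Return a bucket name made of the prefix plus a 12-character random hex suffix."""
    name = f"{prefix}-{uuid.uuid4().hex[:12]}"
    # Basic S3 naming check: 3-63 chars, lowercase letters, digits, hyphens,
    # starting and ending with a letter or digit (dots are deliberately not allowed here)
    if not re.fullmatch(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]", name):
        raise ValueError(f"invalid bucket name: {name}")
    return name


example_name = make_bucket_name()
print(example_name)
```

If you adopt this, assign the result to `myBucketDataBrew` and note the value somewhere, because the clean up section needs the exact name.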

In [None]:
# local client path for resources
# these are resources required by this lab and will be later uploaded to the cloud
myLocalPathForResources='/Users/simondavies/Documents/GitHub/labs/glue/databrew/Resources/'
print ('Done! Move to the next cell ->')

- create required clients to AWS SDK for Python (Boto3) to create, configure, and manage AWS services
- https://boto3.amazonaws.com/v1/documentation/api/latest/index.html

In [None]:
# s3
s3 = sessionBoto3.client('s3')

# databrew
databrew = sessionBoto3.client('databrew')

# iam
iam = sessionBoto3.client('iam')

print ('Done! Move to the next cell ->')

- tags for all services that are created - you can never have too many tags!
  - make sure you have a tagging policy in place
  - https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_tag-policies.html

In [None]:
# define tags added to all services we create
# best practice tagging of all resources should be used at all times
myTags = [
    {"Key": "env", "Value": "non_prod"},
    {"Key": "owner", "Value": "doit_lab"},
    {"Key": "project", "Value": "doit_databrew-lab"},
    {"Key": "author", "Value": "simon"},
]
myTagsDct = {
    "env": "non_prod",
    "owner": "doit_lab",
    "project": "doit_databrew-lab",
    "author": "simon",
}

print ('Done! Move to the next cell ->')
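
The same tags are kept in two shapes because S3 and IAM take a list of `Key`/`Value` pairs while DataBrew takes a plain dictionary. A small sketch of deriving one from the other so the two cannot drift apart; `tags_to_dict` is a hypothetical helper name, not something the notebook defines.

```python
def tags_to_dict(tag_list):
    """Convert [{'Key': k, 'Value': v}, ...] into {k: v, ...}."""
    return {tag["Key"]: tag["Value"] for tag in tag_list}


example_tags = [
    {"Key": "env", "Value": "non_prod"},
    {"Key": "owner", "Value": "doit_lab"},
    {"Key": "project", "Value": "doit_databrew-lab"},
    {"Key": "author", "Value": "simon"},
]
example_tags_dct = tags_to_dict(example_tags)
print(example_tags_dct)
```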

<hr style="border:1px dotted;color:greenyellow">
<hr style="border:1px dotted;color:crimson">

# <p style="color:crimson">Create S3 Bucket</p>
- defaults used, will use sse-s3 encryption and block public access
- bucket is used to upload the data file we need as a resource for the databrew dataset
- also used to store output from databrew job runs

In [None]:
# create bucket
# don't change this region, the condition is just checking how to create the bucket based on the region we're working in
if (myRegion != 'us-east-1'):
    s3.create_bucket(
        Bucket=myBucketDataBrew, CreateBucketConfiguration={"LocationConstraint": myRegion}
    )
else:
    s3.create_bucket(
        Bucket=myBucketDataBrew
    )

s3.put_bucket_tagging(Bucket=myBucketDataBrew, Tagging={"TagSet": myTags})

# create a "folder" - really keys as S3 is flat
s3.put_object(Bucket=myBucketDataBrew, Key="datasource/")
s3.put_object(Bucket=myBucketDataBrew, Key="output/")

# define the policy for databrew access
# this allows the databrew service to access S3
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowdatabrewToWriteToS3",
            "Effect": "Allow",
            "Principal": {
                "Service": "databrew.amazonaws.com"
            },
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{myBucketDataBrew}/*"
        }
    ]
}
bucket_policy = json.dumps(bucket_policy)
s3.put_bucket_policy(Bucket=myBucketDataBrew, Policy=bucket_policy)

print ('Done! Move to the next cell ->')
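
The region check in the cell above exists because `create_bucket` must not receive a `LocationConstraint` for `us-east-1`. One way to express that without duplicating the call is to build the keyword arguments first. This is a sketch; `create_bucket_kwargs` is a hypothetical helper and the bucket name is a placeholder.

```python
def create_bucket_kwargs(bucket_name, region):
    """Build the keyword arguments for s3.create_bucket for the given region."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and is created without a LocationConstraint
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


sydney_kwargs = create_bucket_kwargs("example-bucket", "ap-southeast-2")
virginia_kwargs = create_bucket_kwargs("example-bucket", "us-east-1")
print(sydney_kwargs)
print(virginia_kwargs)
```

In the notebook this would be used as `s3.create_bucket(**create_bucket_kwargs(myBucketDataBrew, myRegion))`.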

- upload resource files to s3 that will be used as a datasource for databrew

In [None]:
# Upload each file to the S3 bucket
myDataSourceFile='datasource/Supply_Chain_Transactions_Updated.xlsx'
myDataSourceFileWithPII='datasource/Supply_Chain_Transactions_WithPII.xlsx'
files = [
    {
        's3key': myDataSourceFile,
        'localpath': '{}Supply_Chain_Transactions_Updated.xlsx'.format(myLocalPathForResources)
    },
    {
        's3key': myDataSourceFileWithPII,
        'localpath': '{}Supply_Chain_Transactions_WithPII.xlsx'.format(myLocalPathForResources)
    }
]

for file in files:
    print ('uploading: {}'.format(file['s3key']))
    s3.upload_file(file['localpath'], myBucketDataBrew, file['s3key'], ExtraArgs={'StorageClass': 'STANDARD'})
    print ('uploaded: {}'.format(file['s3key']))

print ('Done! Move to the next cell ->')
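
The upload fails partway through if one of the workbooks is not in `myLocalPathForResources`. A minimal pre-flight check, assuming you would rather find that out before any upload starts; `find_missing_files` is a hypothetical helper and the demo runs against a temporary directory rather than the lab's real path.

```python
import tempfile
from pathlib import Path


def find_missing_files(base_dir, file_names):
    """Return the names in file_names that are not regular files under base_dir."""
    base = Path(base_dir)
    return [name for name in file_names if not (base / name).is_file()]


# Demo against a throwaway directory that holds only one of the two workbooks
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Supply_Chain_Transactions_Updated.xlsx").touch()
    missing = find_missing_files(tmp, [
        "Supply_Chain_Transactions_Updated.xlsx",
        "Supply_Chain_Transactions_WithPII.xlsx",
    ])

print(missing)
```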

<hr style="border:1px dotted;color:crimson">
<hr style="border:1px dotted;color:orchid">

# <p style="color:orchid">Create IAM</p>
- roles and policies that allow services to interact with other services
- https://docs.aws.amazon.com/databrew/latest/dg/setting-up-iam-policies-for-databrew.html

In [None]:
# myRoleDataBrew
# trust policy for the role
roleTrust = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "databrew.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# define the policy document
policyJson = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
            ],
            "Resource": "*",
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:ListBucket"
            ],
            "Resource": [
                f"arn:aws:s3:::{myBucketDataBrew}",
                f"arn:aws:s3:::{myBucketDataBrew}/*"
            ]
        }
    ],
}

# create a customer managed policy (create_policy creates a managed policy, not an inline one)
policy = iam.create_policy(
    PolicyName=myPolicyDataBrew,
    PolicyDocument=json.dumps(policyJson),
    Description="Policy for databrew",
    Tags=[
        *myTags,
    ],
)

# create role
role = iam.create_role(
    RoleName=myRoleDataBrew,
    AssumeRolePolicyDocument=json.dumps(roleTrust),
    Description="Role for databrew",
    Tags=[
        *myTags,
    ],
)

# attach the managed policy to the role
response = iam.attach_role_policy(
    RoleName=role["Role"]["RoleName"], PolicyArn=policy["Policy"]["Arn"]
)

myRoleDataBrewARN = role['Role']['Arn']

print ('Done! Move to the next cell ->')

<hr style="border:1px dotted;color:orchid">
 
# <span style="color:greenyellow">Now we can start creating our DataBrew resources</span>
 
<hr style="border:1px dotted;color:LightSkyBlue">

> # **LAB 1** - Data Quality Validation

# <span style="color:LightSkyBlue">DATASETS</span>
## <span style="color:LightSkyBlue">Create a dataset from our datasource</span>
- We create datasets from the Excel files we have uploaded into S3
- We need to create a dataset before we can create anything else that uses it
- https://docs.aws.amazon.com/databrew/latest/dg/datasets.html

In [None]:
# dataset with pii
response = databrew.create_dataset(
    Name=myDataSet,
    Format='EXCEL',
    FormatOptions={
        'Excel': {
            'SheetNames': [
                'Sheet1',
            ],
            'HeaderRow': True
        }
    },
    Input={
        'S3InputDefinition': {
            'Bucket': myBucketDataBrew,
            'Key': myDataSourceFileWithPII,
            'BucketOwner': myAccountNumber
        }
    },
    Tags=myTagsDct,
)

# we need the ARN, which is only available if we describe the dataset
response = databrew.describe_dataset(
    Name=myDataSet
)
myDataSetARN = response['ResourceArn']

print ('Done! Move to the next cell ->')

<hr style="border:1px dotted;color:LightSkyBlue">
<hr style="border:1px dotted;color:DarkSeaGreen">

# <span style="color:DarkSeaGreen">DATA PROFILE</span>
## <span style="color:DarkSeaGreen">Create a ruleset</span>
- create some rules that the data profile job will use to validate the dataset
- rules allow us to apply validation checks to the data
- they do not make any changes to the data
- https://docs.aws.amazon.com/databrew/latest/dg/profile.data-quality-available-checks.html

In [None]:
response = databrew.create_ruleset(
    Name=myRuleSet,
    Description='Example ruleset',
    TargetArn=myDataSetARN,
    Rules=[
        {
            'Name': 'myRule1',
            'Disabled': False,
            'CheckExpression': 'AGG(ROWS_COUNT) < :val1',
            'SubstitutionMap': {
                ':val1': '1001'
            }
        },
        {
            'Name': 'myRule2',
            'Disabled': False,
            'CheckExpression': 'AGG(DUPLICATE_ROWS_COUNT) == :val1',
            'SubstitutionMap': {
                ':val1': '0'
            }
        },
    ],
    Tags=myTagsDct,
)

# need the ARN which is only available if we describe it
response = databrew.describe_ruleset(
    Name=myRuleSet
)
myRuleSetARN = response['ResourceArn']

print ('Done! Move to the next cell ->')
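
Each rule pairs a `CheckExpression` with a `SubstitutionMap` whose keys are the `:name` placeholders used in the expression and whose values are strings. A sketch of a builder that keeps the two in step; `make_rule` is a hypothetical helper, and the two rules it produces mirror the ones in the cell above.

```python
def make_rule(name, expression, disabled=False, **values):
    """Build one DataBrew rule dict; keyword arguments become ':key' substitutions."""
    return {
        "Name": name,
        "Disabled": disabled,
        "CheckExpression": expression,
        # Substitution values are passed as strings, keyed by the ':placeholder' name
        "SubstitutionMap": {f":{key}": str(value) for key, value in values.items()},
    }


example_rules = [
    make_rule("myRule1", "AGG(ROWS_COUNT) < :val1", val1=1001),
    make_rule("myRule2", "AGG(DUPLICATE_ROWS_COUNT) == :val1", val1=0),
]
print(example_rules)
```

The resulting list could be passed as `Rules=example_rules` to `databrew.create_ruleset`.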

## <span style="color:DarkSeaGreen">Create a data profile job from our dataset and ruleset</span>
- We can (optionally) create a data profile job for our dataset to get statistics and metrics about our data and to optionally validate it
- We are providing some configuration
  - identify any columns that may be PII and that match the entity types provided
  - https://docs.aws.amazon.com/databrew/latest/dg/profile.configuration.html?icmpid=docs_console_unmapped#entity-detector-configuration

In [None]:
response = databrew.create_profile_job(
    Name=myProfileJob,
    DatasetName=myDataSet,
    LogSubscription="ENABLE",
    MaxCapacity=2,
    MaxRetries=2,
    OutputLocation={
        "Bucket": myBucketDataBrew,
        "Key": "output",
        "BucketOwner": myAccountNumber,
    },
    ValidationConfigurations=[
        {"RulesetArn": myRuleSetARN, "ValidationMode": "CHECK_ALL"},
    ],
    Configuration={
        "EntityDetectorConfiguration": {
            "EntityTypes": ["CREDIT_CARD", "EMAIL", "PERSON_NAME"],
            "AllowedStatistics": [
                {
                    "Statistics": [
                        "AGGREGATED_GROUP",
                        "CONTAINING_NUMERIC_VALUES_GROUP",
                        "TOP_VALUES_GROUP",
                    ]
                }
            ],
        }
    },
    RoleArn=myRoleDataBrewARN,
    Tags=myTagsDct,
    Timeout=10,
    JobSample={"Mode": "FULL_DATASET"},
)

print("Done! Move to the next cell ->")

## <span style="color:DarkSeaGreen">Let's run the profile job</span>
- We can also schedule the job to run if we prefer

In [None]:
response = databrew.start_job_run(
    Name=myProfileJob
)

print ('Done! Move to the next cell ->')

## <span style="color:DarkSeaGreen">And we can monitor its run</span>
- Run the next cell <span style="color:deeppink">as many times as you want</span> until the State reports a finished run
- One of 'STOPPED'|'SUCCEEDED'|'FAILED'|'TIMEOUT'

In [None]:
response = databrew.list_job_runs(
    Name=myProfileJob
)

# Extract job runs
job_runs = response.get('JobRuns', [])

# Find the latest job run based on StartTime
if job_runs:
    latest_job = max(job_runs, key=lambda x: x.get('StartTime', 0))
    latest_run_id = latest_job.get('RunId')
    print(f"The latest RunId is: {latest_run_id}")

    # describe the run only when one exists, otherwise latest_run_id would be undefined
    response = databrew.describe_job_run(
        Name=myProfileJob,
        RunId=latest_run_id
    )

    # Extract the state from the response
    job_state = response.get('State', 'Unknown')  # Defaults to 'Unknown' if 'State' is not present

    print(f"The job state is: {job_state}")
    print("-----\n-----\nFull response for your information:")
else:
    print("No job runs found.")

response
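
Instead of re-running the monitoring cell by hand, the same check can be wrapped in a polling loop that stops on one of the finished states listed above. This is a sketch rather than the lab's method: `wait_for_job_run` is a hypothetical helper, the poll interval and poll limit are arbitrary, and the demo uses a stand-in function in place of `databrew.describe_job_run` so it runs without AWS access.

```python
import time

FINISHED_STATES = {"STOPPED", "SUCCEEDED", "FAILED", "TIMEOUT"}


def wait_for_job_run(describe, job_name, run_id, poll_seconds=15, max_polls=40, sleep=time.sleep):
    """Poll describe(Name=..., RunId=...) until the run reaches a finished state."""
    state = "Unknown"
    for _ in range(max_polls):
        state = describe(Name=job_name, RunId=run_id).get("State", "Unknown")
        if state in FINISHED_STATES:
            return state
        sleep(poll_seconds)
    raise TimeoutError(f"{job_name} run {run_id} is still {state} after {max_polls} polls")


# Stand-in for databrew.describe_job_run that walks through three states
_demo_states = iter(["STARTING", "RUNNING", "SUCCEEDED"])


def fake_describe(Name, RunId):
    return {"State": next(_demo_states)}


final_state = wait_for_job_run(fake_describe, "demo-job", "demo-run", sleep=lambda seconds: None)
print(final_state)
```

Against the real service the call would look like `wait_for_job_run(databrew.describe_job_run, myProfileJob, latest_run_id)`.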

<hr style="border:1px dotted;color:DarkSeaGreen">
<hr style="border:1px dotted;color:ForestGreen">

> # **LAB 2** - Project Playground

# <span style="color:ForestGreen">RECIPE</span>
## <span style="color:ForestGreen">Create a recipe</span>
- create a recipe to perform transformations on our dataset
- if creating a recipe via the console, it can only be created via a project session
- however, we can create one directly in code and then add it to a project we also create
- this recipe can be played with in a project, reviewed, updated, and published if ready for production
- the following tasks are being applied to demonstrate just a very small handful of transformations
  - UPPER_CASE - upper case the order status column
  - CASE_OPERATION - NEW COLUMN - giving the order status unique values a number to represent the text
  - UPPER_CASE - CONDITION - upper case the returned status when the feedback score is < 3
  - REPLACE_BETWEEN_POSITIONS - obfuscation of credit card number
  - REPLACE_PATTERN - obfuscation of email
- https://docs.aws.amazon.com/databrew/latest/dg/recipe-actions.pii.html
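
To get a feel for the two obfuscation steps before running them in DataBrew, they can be approximated locally. The regular expression is the one used by the recipe's REPLACE_PATTERN step. The position-based masking is only an approximation: treating the start and end positions as 0-based and inclusive is an assumption made for this sketch, not documented DataBrew behaviour, and both function names and sample values are hypothetical.

```python
import re


def mask_email_local_part(email, replacement="hidden"):
    """Replace everything before the '@' using the recipe's REPLACE_PATTERN regex."""
    return re.sub(r"^[^@]+(?=@)", replacement, email)


def mask_between(value, start, end, replacement):
    """Replace characters start..end of value with replacement.

    ASSUMPTION: positions are 0-based and inclusive; DataBrew's own indexing may differ.
    """
    return value[:start] + replacement + value[end + 1:]


masked_email = mask_email_local_part("jane.doe@example.com")
masked_card = mask_between("4111111111111111", 4, 11, "xxxxxxxx")
print(masked_email)
print(masked_card)
```

Compare these local results with the project preview in the console to confirm how DataBrew counts positions on your data.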

In [None]:
response = databrew.create_recipe(
    Name=myRecipe1,
    Description="recipe 1 - to be demonstrated in a project",
    Steps=[
        {
            "Action": {
                "Operation": "UPPER_CASE",
                "Parameters": {"sourceColumn": "Order Status"},
            }
        },
        {
            "Action": {
                "Operation": "CASE_OPERATION",
                "Parameters": {
                    "functionStepType": "CASE_OPERATION",
                    "targetColumn": "Order Status Key",
                    "valueExpression": "case when `Order Status` = 'DELIVERED' then '4' when `Order Status` = 'SHIPPED' then '3' when `Order Status` = 'CANCELLED' then '2' when `Order Status` = 'PENDING' then '1' else '-99' end",
                    "withExpressions": "[]",
                },
            }
        },
        {
            "Action": {
                "Operation": "UPPER_CASE",
                "Parameters": {"sourceColumn": "Return Status"},
            },
            "ConditionExpressions": [
                {
                    "Condition": "IS",
                    "Value": '["Returned"]',
                    "TargetColumn": "Return Status",
                },
                {
                    "Condition": "LESS_THAN",
                    "Value": "3",
                    "TargetColumn": "Feedback Score",
                },
            ],
        },
        {
            "Action": {
                "Operation": "REPLACE_BETWEEN_POSITIONS",
                "Parameters": {
                    "endPosition": "11",
                    "sourceColumn": "Credit Card Number",
                    "startPosition": "4",
                    "value": "xxxxxxxx",
                },
            }
        },
        {
            "Action": {
                "Operation": "REPLACE_PATTERN",
                "Parameters": {
                    "pattern": "^[^@]+(?=@)",
                    "sourceColumn": "Customer Email",
                    "value": "hidden",
                },
            }
        },
    ],
    Tags=myTagsDct,
)

print("Done! Move to the next cell ->")

## <span style="color:ForestGreen">Create a project</span>
- a project is a playground where you can experiment with recipes
- it can use existing recipes, and review and update them
- or you can create new recipes in a project, and publish them if production ready
- once created, you can play with this project via the console

In [None]:
response = databrew.create_project(
    Name=myProject,
    DatasetName=myDataSet,
    RecipeName=myRecipe1,
    Sample={
        'Size': 1000,
        'Type': 'FIRST_N'
    },
    RoleArn=myRoleDataBrewARN,
    Tags=myTagsDct,
)


print ('Done! Move to the next cell ->')

<hr style="border:1px dotted;color:ForestGreen">
<hr style="border:1px dotted;color:SandyBrown">

> # **LAB 3** - Transformations

# <span style="color:SandyBrown">RECIPE JOB</span>
## <span style="color:SandyBrown">Create a recipe</span>
- let's create another recipe
- we're creating a new one in case the recipe created for the project has already been changed, played with, or published
- this recipe can still be played with in a project, reviewed, updated, and published
- but we're going to publish it via code
- and add it to a recipe job to execute
- it's the same recipe as above too :)
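
Because both recipes carry the same long CASE expression, it can be generated from a mapping rather than typed twice. A sketch under that assumption; `build_case_expression` is a hypothetical helper, and the mapping reproduces the order status keys used in the recipe.

```python
def build_case_expression(column, mapping, default):
    """Build a 'case when ... then ... else ... end' string for a CASE_OPERATION step."""
    clauses = " ".join(
        f"when `{column}` = '{value}' then '{key}'" for value, key in mapping.items()
    )
    return f"case {clauses} else '{default}' end"


order_status_keys = {"DELIVERED": "4", "SHIPPED": "3", "CANCELLED": "2", "PENDING": "1"}
case_expression = build_case_expression("Order Status", order_status_keys, "-99")
print(case_expression)
```

The result can be used as the `valueExpression` parameter of the CASE_OPERATION step.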

In [None]:
response = databrew.create_recipe(
    Name=myRecipe2,
    Description="recipe 2 - to be published immediately and used in a recipe job",
    Steps=[
        {
            "Action": {
                "Operation": "UPPER_CASE",
                "Parameters": {"sourceColumn": "Order Status"},
            }
        },
        {
            "Action": {
                "Operation": "CASE_OPERATION",
                "Parameters": {
                    "functionStepType": "CASE_OPERATION",
                    "targetColumn": "Order Status Key",
                    "valueExpression": "case when `Order Status` = 'DELIVERED' then '4' when `Order Status` = 'SHIPPED' then '3' when `Order Status` = 'CANCELLED' then '2' when `Order Status` = 'PENDING' then '1' else '-99' end",
                    "withExpressions": "[]",
                },
            }
        },
        {
            "Action": {
                "Operation": "UPPER_CASE",
                "Parameters": {"sourceColumn": "Return Status"},
            },
            "ConditionExpressions": [
                {
                    "Condition": "IS",
                    "Value": '["Returned"]',
                    "TargetColumn": "Return Status",
                },
                {
                    "Condition": "LESS_THAN",
                    "Value": "3",
                    "TargetColumn": "Feedback Score",
                },
            ],
        },
        {
            "Action": {
                "Operation": "REPLACE_BETWEEN_POSITIONS",
                "Parameters": {
                    "endPosition": "11",
                    "sourceColumn": "Credit Card Number",
                    "startPosition": "4",
                    "value": "xxxxxxxx",
                },
            }
        },
        {
            "Action": {
                "Operation": "REPLACE_PATTERN",
                "Parameters": {
                    "pattern": "^[^@]+(?=@)",
                    "sourceColumn": "Customer Email",
                    "value": "hidden",
                },
            }
        },
    ],
    Tags=myTagsDct,
)

print("Done! Move to the next cell ->")

## <span style="color:SandyBrown">Publish a recipe</span>
- publish the recipe to a new version so it can be used by a recipe job
- a published version will be created (v1.0 in this case)
- and a new working version will be created for experimentation in a project (v1.1 in this case)

In [None]:
response = databrew.publish_recipe(
    Description='Published my recipe to demo its availability and use in a recipe job',
    Name=myRecipe2
)

# Retrieve the published recipe version (with no RecipeVersion given, describe_recipe returns the latest published version)
response = databrew.describe_recipe(Name=myRecipe2)
myRecipe2Version = response.get("RecipeVersion", "x.x")

print ('Done! Move to the next cell ->')

## <span style="color:SandyBrown">Create a recipe job</span>
- create a recipe job to run the recipe

In [None]:
response = databrew.create_recipe_job(
    Name=myRecipeJob,
    DatasetName=myDataSet,
    LogSubscription="ENABLE",
    MaxCapacity=2,
    MaxRetries=1,
    Outputs=[
        {
            "Format": "CSV",
            "Location": {
                "Bucket": myBucketDataBrew,
                "Key": "output/",
                "BucketOwner": myAccountNumber,
            },
            "Overwrite": False,
            'MaxOutputFiles': 1
        }
    ],
    RecipeReference={"Name": myRecipe2, "RecipeVersion": myRecipe2Version},
    RoleArn=myRoleDataBrewARN,
    Timeout=10,
    Tags=myTagsDct,
)

print("Done! Move to the next cell ->")

## <span style="color:SandyBrown">Let's run the recipe job</span>
- We can also schedule the job to run if we prefer

In [None]:
response = databrew.start_job_run(
    Name=myRecipeJob
)

print ('Done! Move to the next cell ->')

## <span style="color:SandyBrown">And we can monitor its run</span>
- Run the next cell <span style="color:deeppink">as many times as you want</span> until the State reports a finished run
- One of 'STOPPED'|'SUCCEEDED'|'FAILED'|'TIMEOUT'

In [None]:
response = databrew.list_job_runs(
    Name=myRecipeJob
)

# Extract job runs
job_runs = response.get('JobRuns', [])

# Find the latest job run based on StartTime
if job_runs:
    latest_job = max(job_runs, key=lambda x: x.get('StartTime', 0))
    latest_run_id = latest_job.get('RunId')
    print(f"The latest RunId is: {latest_run_id}")

    # describe the run only when one exists, otherwise latest_run_id would be undefined
    response = databrew.describe_job_run(
        Name=myRecipeJob,
        RunId=latest_run_id
    )

    # Extract the state from the response
    job_state = response.get('State', 'Unknown')  # Defaults to 'Unknown' if 'State' is not present

    print(f"The job state is: {job_state}")
    print("-----\n-----\nFull response for your information:")
else:
    print("No job runs found.")

response

<hr style="border:1px dotted;color:SandyBrown">
<hr style="border:1px dotted;color:deeppink">

# <p style="color:deeppink">STACK 01 COMPLETE!</p>

<hr style="border:1px dotted;color:deeppink">
<hr style="border:1px dotted;color:orangered">
<hr style="border:1px dotted;color:orangered">
<hr style="border:1px dotted;color:orangered">

# <p style="color:orangered">CLEAN UP!!</p>
# <p style="color:orangered">DO NOT RUN THESE UNLESS YOU WANT TO DESTROY EVERYTHING</p>
- If you have lost the Kernel:
  - Run the setup cells in the <span style="color:greenyellow">Create Clients and Variables</span> section before continuing...
  - Any IDs or ARNs will have to be manually stated
### <p style="color:orangered">Click on the Variables in the tool bar above to display all variables, you'll see those that may have no value if you have lost or stopped your kernel</p>
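
If the kernel was lost, the role and policy ARNs can be rebuilt from the account number and the names set in the setup cells, which is the same pattern the IAM clean up cell uses. A minimal sketch, assuming the role and policy were created without an IAM path (as in this lab); `iam_arn` is a hypothetical helper and the account number is a placeholder.

```python
def iam_arn(account_id, resource_type, name):
    """Build an IAM ARN for a role or customer managed policy that has no path."""
    return f"arn:aws:iam::{account_id}:{resource_type}/{name}"


# Placeholder account number; in the notebook use myAccountNumber
example_role_arn = iam_arn("123456789012", "role", "doit-databrew-service-role")
example_policy_arn = iam_arn("123456789012", "policy", "doit-databrew-service-policy")
print(example_role_arn)
print(example_policy_arn)
```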


In [None]:
# delete project
try:
    databrew.delete_project(Name=myProject)
except Exception as err:
    print(f'1:{err}')

print ('Done! Move to the next cell ->')

In [None]:
# delete recipe job
try:
    databrew.delete_job(Name=myRecipeJob)
except Exception as err:
    print(f'1:{err}')

print ('Done! Move to the next cell ->')

In [None]:
# delete recipes
# if no further work has been done on these recipes via the console, this piece of code should delete all that is there
# if not, either manually delete remaining recipes via the console or edit this code
try:
    databrew.delete_recipe_version(
        Name=myRecipe2,
        RecipeVersion='1.0'
    )
except Exception as err:
    print(f'3:{err}')


try:
    databrew.delete_recipe_version(
        Name=myRecipe1,
        RecipeVersion='LATEST_WORKING'
    )
except Exception as err:
    print(f'1:{err}')

try:
    databrew.delete_recipe_version(
        Name=myRecipe2,
        RecipeVersion='LATEST_WORKING'
    )
except Exception as err:
    print(f'2:{err}')

print ('Done! Move to the next cell ->')

In [None]:
# delete databrew profile job
try:
    databrew.delete_job(Name=myProfileJob)
except Exception as err:
    print(f'1:{err}')

print ('Done! Move to the next cell ->')

In [None]:
# delete databrew rule set
try:
    databrew.delete_ruleset(Name=myRuleSet)
except Exception as err:
    print(f'1:{err}')

print ('Done! Move to the next cell ->')

In [None]:
# delete databrew dataset
try:
    databrew.delete_dataset(Name=myDataSet)
except Exception as err:
    print(f'1:{err}')

print ('Done! Move to the next cell ->')

In [None]:
# delete roles and policies
try:
    iam.detach_role_policy(
        RoleName=myRoleDataBrew, PolicyArn='arn:aws:iam::{}:policy/{}'.format(myAccountNumber, myPolicyDataBrew)
    )
except Exception as err:
    print(f'1:{err}')

try:
    iam.delete_role(RoleName=myRoleDataBrew)
except Exception as err:
    print(f'4:{err}')

try:
    iam.delete_policy(PolicyArn='arn:aws:iam::{}:policy/{}'.format(myAccountNumber, myPolicyDataBrew))
except Exception as err:
    print(f'5:{err}')

print ('Done! Move to the next cell ->')

In [None]:
# delete s3 bucket
# NOTE WARNING - this will delete all objects in the bucket with NO prompt or confirmation
# myBucketDataBrew = 'doit-databrew-bucket-???-???' # look in the console and set here if lost
try:
    s3r = sessionBoto3.resource('s3')  # use the same profile and region as the other clients
    bucket = s3r.Bucket(myBucketDataBrew)
    bucket.objects.all().delete()
except Exception as err:
    print(f'9:{err}')

try:
    # delete the bucket
    response = s3.delete_bucket(Bucket=myBucketDataBrew)
except Exception as err:
    print(f'9:{err}')

print ('Done! Move to the next cell ->')
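
The clean up cells all repeat the same try/except shape. One possible way to reduce the repetition is a small wrapper that runs a delete call and reports any error without stopping. This is a sketch; `try_delete` is a hypothetical helper, and the demo uses stand-in functions instead of real Boto3 calls.

```python
def try_delete(label, action, **kwargs):
    """Call action(**kwargs); print the error and return False instead of raising."""
    try:
        action(**kwargs)
        return True
    except Exception as err:
        print(f"{label}: {err}")
        return False


# Stand-ins for Boto3 delete calls, used only for this demo
def fake_delete_ok(Name):
    return {}


def fake_delete_missing(Name):
    raise RuntimeError(f"{Name} not found")


deleted_first = try_delete("project", fake_delete_ok, Name="doit-databrew-project")
deleted_second = try_delete("dataset", fake_delete_missing, Name="doit-databrew-dataset")
print(deleted_first, deleted_second)
```

With the real client, a call would look like `try_delete('project', databrew.delete_project, Name=myProject)`.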

<hr style="border:1px dotted;color:coral">
<hr style="border:1px dotted;color:coral">
<hr style="border:1px dotted;color:coral">
<hr style="border:1px dotted;color:gold">
<hr style="border:1px dotted;color:gold">
<hr style="border:1px dotted;color:gold">

# <p style="color:gold">Appendix - Jupyter Install Requirements (macOS)</p>
#### <p style="color:deeppink">- If you are running VSCode on a laptop, follow all of below.<br>- If you are running Jupyter inside an AWS Account, you don't need to do anything!</p>

  - Credentials for the AWS account this notebook runs in are provided by `aws configure`
  - You must already have an IAM user with programmatic (Command Line Interface) access and AWS access keys to be able to use these credentials in `aws configure`  
    
  - arn:aws:iam::###########:user/simon-davies-cli was created for this lab when the workshop was presented

### <p style="color:gold">1. Homebrew</p> 
If you haven't installed Homebrew, you can install it by running the following command here or in the terminal:

In [None]:
%%bash
# run as your normal user: the Homebrew installer refuses to run as root
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

### <p style="color:gold">1.1 Virtual Environments</p> 
- You can create a virtual environment that ensures any libraries you install are restricted to the venv.
  - https://code.visualstudio.com/docs/python/environments
- To enable the virtual environment once you have created it, open the folder in VS Code rather than individual files.
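
A virtual environment can also be created from Python itself with the standard library `venv` module. The sketch below is one way to do it, not the lab's prescribed setup: `create_venv` is a hypothetical helper, and the demo builds the environment in a temporary directory without pip so that it runs quickly. For real use, point it at a `.venv` folder in your project and leave `with_pip=True`.

```python
import tempfile
import venv
from pathlib import Path


def create_venv(target_dir, with_pip=True):
    """Create a virtual environment in target_dir and return the path to its pyvenv.cfg."""
    venv.create(target_dir, with_pip=with_pip)
    return Path(target_dir) / "pyvenv.cfg"


# Demo in a throwaway directory; with_pip=False skips bootstrapping pip
with tempfile.TemporaryDirectory() as tmp:
    config_file = create_venv(Path(tmp) / ".venv", with_pip=False)
    venv_created = config_file.is_file()

print(venv_created)
```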


### <p style="color:gold">1.2 Python</p> 
Once Homebrew is installed, you can install Python using the following command.  
*Check what you have before installing or upgrading.*  
*You will need to quit and restart VS Code to use Python once it is installed (or updated).*

In [None]:
%%bash
python3 --version
which python3

In [None]:
%%bash
brew install python

### <p style="color:gold">2. boto3 and other Python requirements</p> 
* boto3 must be installed on your client
  * *Boto3 is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python, which allows Python developers to write software that makes use of services like Amazon S3 and Amazon EC2.*
  * https://boto3.amazonaws.com/v1/documentation/api/latest/index.html  
  
*check what you have before installing/upgrading*  

In [None]:
%%bash
python3 -m pip show boto3

In [None]:
%pip install -U boto3

### <p style="color:gold">3. aws configure</p> 
*Run `aws configure` with credentials for a user that has the DataBrew IAM policies required*  
https://docs.aws.amazon.com/databrew/latest/dg/setting-up-iam-policies-for-databrew.html
  
*You will need AWS CLI*  
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html

In [None]:
%%bash
aws sts get-caller-identity

<hr style="border:1px dotted;color:gold">