# <p style="color:dodgerblue">01 Create Billing Project</p>
- This notebook creates the architecture required to analyse an AWS billing "standard data export"
- An example "standard data export" csv file is provided for this lab
- Granularity is set to "monthly" and resource IDs are included in the CSV file
  
(Jupyter Notebook developed with Kernel Python 3.11.6)
<hr style="border:1px dotted; color:floralwhite">

# <span style="color:deeppink">GETTING STARTED</span>
# Client requirements for this Jupyter Notebook Lab (macOS)
- *See <span style="color:gold">Appendix - Jupyter Install Requirements (macOS)</span> at the bottom of this lab to install the macOS requirements. Windows requirements will be similar, apart from Homebrew.*  
- These requirements are generic and allow you to run Python notebooks, use Boto3, etc. They simply get your local environment into a state that can support Jupyter Notebooks and are not specific to this lab

<hr style="border:1px dotted">
<hr style="border:1px dotted;color:greenyellow">

# <p style="color:DarkTurquoise">Billing Prerequisites</p>
### <p style="color:DarkTurquoise">NOTE we are using ap-southeast-2</p>
No other architecture prerequisites required.
Any "standard data export" csv file can be used with this project, however one is provided in the resources folder as a sample.  
To create your own, do the following:
- In an AWS account, go to the Billing and Cost Management console
- Go to Data Exports (option in left hand column)
- Create an export
- Select "Standard data export"
- Select CUR 2.0
- Check "Include resource IDs" only
- Select "Monthly" as time granularity
- Select "CSV" as the file format
- Typically 114 columns should be included (i.e. all columns; if there are more, no worries)
- Choose an S3 bucket etc
- Wait for it to be created
- Use your file as follows, either:
  - Replace the file in the resources/datasource folder of this lab with yours, ensuring the name is the same; or
  - When invoking the lambda, pass in the bucket location and file name of your own file (assuming it's in the same account and region as the lambda)
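
Before swapping in your own export, it can help to confirm the file has the expected shape. The following is a minimal sketch using only the standard library: it reads the header row and reports the column count and whether `line_item_resource_id` is present. The function name `inspect_export_header` and the tiny in-memory sample are assumptions made for illustration; a real export would be read from disk instead.

```python
import csv
import io

def inspect_export_header(csv_text):
    """Return (column_count, has_resource_id) for the header row of a CSV export."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return len(header), "line_item_resource_id" in header

# tiny in-memory sample standing in for a real export file (illustrative only)
sample_export = (
    "bill_bill_type,line_item_resource_id,line_item_unblended_cost\n"
    "Anniversary,i-0abc,1.25\n"
)

column_count, has_resource_id = inspect_export_header(sample_export)
print("columns: {}, resource ids included: {}".format(column_count, has_resource_id))
```

For a file on disk, pass `open(path, newline="").read()` (or adapt the function to take a file object) so that large exports are not held in memory longer than needed.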

### Columns
https://docs.aws.amazon.com/cur/latest/userguide/table-dictionary-cur2.html#cur2-column-groups

<style scoped>
table {
  font-size: 10px;
}
</style>
| **Field Name** | **Data Type** | **Description** |
|--------------|------------|--------------|
|bill_bill_type|String|The type of bill that this report covers.|
|bill_billing_entity|String|Helps you identify whether your invoices or transactions are for AWS Marketplace or for purchases of other AWS services.|
|bill_billing_period_end_date|Timestamp|The end date of the billing period that is covered by this report, in UTC. The format is YYYY-MM-DDTHH:mm:ssZ.|
|bill_billing_period_start_date|Timestamp|The start date of the billing period that is covered by this report, in UTC. The format is YYYY-MM-DDTHH:mm:ssZ.|
|bill_invoice_id|String|The ID associated with a specific line item. Until the report is final, the InvoiceId is blank.|
|bill_invoicing_entity|String|The AWS entity that issues the invoice.|
|bill_payer_account_id|String|The account ID of the paying account. For an organization in AWS Organizations, this is the account ID of the management account.|
|bill_payer_account_name|String|The account name of the paying account. For an organization in AWS Organizations, this is the name of the management account.|
|cost_category|Map|Cost Category entries are automatically populated when you create a Cost Category and categorization rule. These entries include user-defined Cost Category names as keys, and corresponding Cost Category values|
|discount|Map|A map column that contains key-value pairs of additional discount data for a given line item when applicable.|
|discount_bundled_discount|Number|The bundled discount applied to the line item. A bundled discount is a usage-based discount that provides free or discounted usage of a service or feature based on the usage of another service or feature.|
|discount_total_discount|Number|The sum of all the discount columns for the corresponding line item.|
|identity_line_item_id|String|This field is generated for each line item and is unique in a given partition. This does not guarantee that the field will be unique across an entire delivery (that is, all partitions in an update) of the AWS CUR. The line item ID isn't consistent between different Cost and Usage Reports and can't be used to identify the same line item across different reports.|
|identity_time_interval|String|The time interval that this line item applies to, in the following format: YYYY-MM-DDTHH:mm:ssZ/YYYY-MM-DDTHH:mm:ssZ. The time interval is in UTC and can be either daily or hourly, depending on the granularity of the report.|
|line_item_availability_zone|String|The Availability Zone that hosts this line item.|
|line_item_blended_cost|Number|The BlendedRate multiplied by the UsageAmount.|
|line_item_blended_rate|String|The BlendedRate is the average cost incurred for each SKU across an organization.|
|line_item_currency_code|String|The currency that this line item is shown in. All AWS customers are billed in US dollars by default. To change your billing currency, see Changing which currency you use to pay your bill in the AWS Billing User Guide.|
|line_item_legal_entity|String|The Seller of Record of a specific product or service. In most cases, the invoicing entity and legal entity are the same. The values might differ for third-party AWS Marketplace transactions.|
|line_item_line_item_description|String|The description of the line item type.|
|line_item_line_item_type|String|The type of charge covered by this line item.|
|line_item_net_unblended_cost|Number|The actual after-discount cost that you're paying for the line item.|
|line_item_net_unblended_rate|String|The actual after-discount rate that you're paying for the line item.|
|line_item_normalization_factor|Number|As long as the instance has shared tenancy, AWS can apply all Regional Linux or Unix Amazon EC2 and Amazon RDS RI discounts to all instance sizes in an instance family and AWS Region. This also applies to RI discounts for member accounts in an organization. All new and existing Amazon EC2 and Amazon RDS size-flexible RIs are sized according to a normalization factor, based on the instance size.|
|line_item_normalized_usage_amount|Number|The amount of usage that you incurred, in normalized units, for size-flexible RIs. The NormalizedUsageAmount is equal to UsageAmount multiplied by NormalizationFactor.|
|line_item_operation|String|The specific AWS operation covered by this line item. This describes the specific usage of the line item.|
|line_item_product_code|String|The code of the product measured.|
|line_item_resource_id|String|If you chose to include individual resource IDs in your report, this column contains the ID of the resource that you provisioned.|
|line_item_tax_type|String|The type of tax that AWS applied to this line item.|
|line_item_unblended_cost|Number|The UnblendedCost is the UnblendedRate multiplied by the UsageAmount.|
|line_item_unblended_rate|String|In consolidated billing for accounts using AWS Organizations, the unblended rate is the rate associated with an individual account's service usage. For Amazon EC2 and Amazon RDS line items that have an RI discount applied to them, the UnblendedRate is zero. Line items with an RI discount have a LineItemType of DiscountedUsage.|
|line_item_usage_account_id|String|The account ID of the account that used this line item. For organizations, this can be either the management account or a member account. You can use this field to track costs or usage by account.|
|line_item_usage_account_name|String|The name of the account that used this line item. For organizations, this can be either the management account or a member account. You can use this field to track costs or usage by account.|
|line_item_usage_amount|Number|The amount of usage that you incurred during the specified time period. For size-flexible Reserved Instances, use the reservation_total_reserved_units column instead. Certain subscription charges will have a UsageAmount of 0.|
|line_item_usage_end_date|Timestamp|The end date and time for the line item in UTC, exclusive. The format is YYYY-MM-DDTHH:mm:ssZ.|
|line_item_usage_start_date|Timestamp|The start date and time for the line item in UTC, inclusive. The format is YYYY-MM-DDTHH:mm:ssZ.|
|line_item_usage_type|String|The usage details of the line item.|
|pricing_currency|String|The currency that the pricing data is shown in.|
|pricing_lease_contract_length|String|The length of time that your RI is reserved for.|
|pricing_offering_class|String|The offering class of the Reserved Instance.|
|pricing_public_on_demand_cost|Number|The total cost for the line item based on public On-Demand Instance rates. If you have SKUs with multiple On-Demand public costs, the equivalent cost for the highest tier is displayed. For example, services offering free-tiers or tiered pricing.|
|pricing_public_on_demand_rate|String|The public On-Demand Instance rate in this billing period for the specific line item of usage. If you have SKUs with multiple On-Demand public rates, the equivalent rate for the highest tier is displayed. For example, services offering free-tiers or tiered pricing.|
|pricing_purchase_option|String|How you chose to pay for this line item. Valid values are All Upfront, Partial Upfront, and No Upfront.|
|pricing_rate_code|String|A unique code for a product/ offer/ pricing-tier combination. The product and term combinations can have multiple price dimensions, such as a free tier, low-use tier, and high-use tier.|
|pricing_rate_id|String|The ID of the rate for a line item.|
|pricing_term|String|Whether your AWS usage is Reserved or On-Demand.|
|pricing_unit|String|The pricing unit that AWS used for calculating your usage cost. For example, the pricing unit for Amazon EC2 instance usage is in hours.|
|product|Map|A map column for where each key-value pair is an additional product attribute and its value.|
|product_comment|String|A comment regarding the product.|
|product_fee_code|String|The code that refers to the fee.|
|product_fee_description|String|The description for the product fee.|
|product_from_location|String|Describes the location where the usage originated from.|
|product_from_location_type|String|Describes the location type where the usage originated from.|
|product_from_region_code|String|Describes the source Region code for the AWS service.|
|product_instance_family|String|Describes your Amazon EC2 instance family. Amazon EC2 provides you with a large number of options across 10 different instance types, each with one or more size options, organized into distinct instance families optimized for different types of applications.|
|product_instance_type|String|Describes the instance type, size, and family, which define the CPU, networking, and storage capacity of your instance.|
|product_instancesku|String|The SKU of the product instance|
|product_location|String|Describes the location that your resource resides in.|
|product_location_type|String|Describes the location type of your task.|
|product_operation|String|Describes the specific AWS operation that this line item covers.|
|product_pricing_unit|String|The smallest billing unit for an AWS service. For example, 0.01c per API call.|
|product_product_family|String|The category for the type of product.|
|product_region_code|String|A Region is a physical location around the world where data centers are clustered. AWS calls each group of logical data centers an Availability Zone (AZ). Each AWS Region consists of multiple, isolated, and physically separate AZs within a geographical area. The Region code attribute has the same name as an AWS Region, and specifies where the AWS service is available.|
|product_servicecode|String|This identifies the specific AWS service to the customer as a unique short abbreviation.|
|product_sku|String|A unique code for a product. The SKU is created by combining the ProductCode, UsageType, and Operation. For size-flexible RIs, the SKU uses the instance that was used.|
|product_to_location|String|Describes the location usage destination.|
|product_to_location_type|String|Describes the destination location of the service usage.|
|product_to_region_code|String|Describes the source Region code for the AWS service.|
|product_usagetype|String|Describes the usage details of the line item.|
|reservation_amortized_upfront_cost_for_usage|Number|The initial upfront payment for all upfront RIs and partial upfront RIs amortized for usage time. The value is equal to: RIAmortizedUpfrontFeeForBillingPeriod * The normalized usage amount for DiscountedUsage line items / The normalized usage amount for the RIFee. Because there are no upfront payments for no upfront RIs, the value for a no upfront RI is 0. We do not provide this value for Dedicated Host reservations at this time. The change will be made in a future update.|
|reservation_amortized_upfront_fee_for_billing_period|Number|Describes how much of the upfront fee for this reservation is costing you for the billing period. The initial upfront payment for all upfront RIs and partial upfront RIs, amortized over this month. Because there are no upfront fees for no upfront RIs, the value for no upfront RIs is 0. We do not provide this value for Dedicated Host reservations at this time. The change will be made in a future update.|
|reservation_availability_zone|String|The Availability Zone of the resource that is associated with this line item.|
|reservation_effective_cost|Number|The sum of both the upfront and hourly rate of your RI, averaged into an effective hourly rate. EffectiveCost is calculated by taking the amortizedUpfrontCostForUsage and adding it to the recurringFeeForUsage.|
|reservation_end_time|String|The end date of the associated RI lease term.|
|reservation_modification_status|String|Shows whether the RI lease was modified or if it is unaltered.|
|reservation_net_amortized_upfront_cost_for_usage|Number|The initial upfront payment for All Upfront RIs and Partial Upfront RIs amortized for usage time, if applicable.|
|reservation_net_amortized_upfront_fee_for_billing_period|Number|The cost of the reservation's upfront fee for the billing period.|
|reservation_net_effective_cost|Number|The sum of both the upfront fee and the hourly rate of your RI, averaged into an effective hourly rate.|
|reservation_net_recurring_fee_for_usage|Number|The after-discount cost of the recurring usage fee.|
|reservation_net_unused_amortized_upfront_fee_for_billing_period|Number|The net unused amortized upfront fee for the billing period.|
|reservation_net_unused_recurring_fee|Number|The recurring fees associated with unused reservation hours for Partial Upfront and No Upfront RIs after discounts.|
|reservation_net_upfront_value|Number|The upfront value of the RI with discounts applied.|
|reservation_normalized_units_per_reservation|String|The number of normalized units for each instance of a reservation subscription.|
|reservation_number_of_reservations|String|The number of reservations that are covered by this subscription. For example, one RI subscription might have four associated RI reservations.|
|reservation_recurring_fee_for_usage|Number|The recurring fee amortized for usage time, for partial upfront RIs and no upfront RIs. The value is equal to: The unblended cost of the RIFee * The sum of the normalized usage amount of Usage line items / The normalized usage amount of the RIFee for size flexible Reserved Instances. Because all upfront RIs don't have recurring fee payments greater than 0, the value for all upfront RIs is 0.|
|reservation_reservation_a_r_n|String|The Amazon Resource Name (ARN) of the RI that this line item benefited from. This is also called the "RI Lease ID". This is a unique identifier of this particular AWS Reserved Instance. The value string also contains the AWS service name and the Region where the RI was purchased.|
|reservation_start_time|String|The start date of the term of the associated Reserved Instance.|
|reservation_subscription_id|String|A unique identifier that maps a line item with the associated offer. We recommend you use the RI ARN as your identifier of an AWS Reserved Instance, but both can be used.|
|reservation_total_reserved_normalized_units|String|The total number of reserved normalized units for all instances for a reservation subscription. AWS computes total normalized units by multiplying the reservation_normalized_units_per_reservation with reservation_number_of_reservations.|
|reservation_total_reserved_units|String|TotalReservedUnits populates for both Fee and RIFee line items with distinct values. Fee line items: The total number of units reserved, for the total quantity of leases purchased in your subscription for the entire term. This is calculated by multiplying the NumberOfReservations with UnitsPerReservation. For example, 5 RIs x 744 hours per month x 12 months = 44,640. RIFee line items (monthly recurring costs): The total number of available units in your subscription, such as the total number of Amazon EC2 hours in a specific RI subscription. For example, 5 RIs x 744 hours = 3,720.|
|reservation_units_per_reservation|String|UnitsPerReservation populates for both Fee and RIFee line items with distinct values. Fee line items: The total number of units reserved for the subscription, such as the total number of RI hours purchased for the term of the subscription. For example 744 hours per month x 12 months = 8,928 total hours/units. RIFee line items (monthly recurring costs): The total number of available units in your subscription, such as the total number of Amazon EC2 hours in a specific RI subscription. For example, 1 unit x 744 hours = 744.|
|reservation_unused_amortized_upfront_fee_for_billing_period|Number|The amortized-upfront-fee-for-billing-period-column amortized portion of the initial upfront fee for all upfront RIs and partial upfront RIs. Because there are no upfront payments for no upfront RIs, the value for no upfront RIs is 0. We do not provide this value for Dedicated Host reservations at this time. The change will be made in a future update.|
|reservation_unused_normalized_unit_quantity|Number|The number of unused normalized units for a size-flexible Regional RI that you didn't use during this billing period|
|reservation_unused_quantity|Number|The number of RI hours that you didn't use during this billing period.|
|reservation_unused_recurring_fee|Number|The recurring fees associated with your unused reservation hours for partial upfront and no upfront RIs. Because all upfront RIs don't have recurring fees greater than 0, the value for All Upfront RIs is 0.|
|reservation_upfront_value|Number|The upfront price paid for your AWS Reserved Instance. For no upfront RIs, this value is 0.|
|resource_tags|Map|A map where each entry is a resource tag key-value pair. This can be used to find information about the specific resources covered by a line item.|
|savings_plan_amortized_upfront_commitment_for_billing_period|String|The amount of upfront fee a Savings Plan subscription is costing you for the billing period. The initial upfront payment for All Upfront Savings Plan and Partial Upfront Savings Plan amortized over the current month. For No Upfront Savings Plan, the value is 0.|
|savings_plan_end_time|String|The expiration date for the Savings Plan agreement.|
|savings_plan_instance_type_family|String|The instance family that is associated with the specified usage.|
|savings_plan_net_amortized_upfront_commitment_for_billing_period|Number|The cost of a Savings Plan subscription upfront fee for the billing period.|
|savings_plan_net_recurring_commitment_for_billing_period|Number|The net unblended cost of the Savings Plan fee.|
|savings_plan_net_savings_plan_effective_cost|Number|The effective cost for Savings Plans, which is your usage divided by the fees.|
|savings_plan_offering_type|String|Describes the type of Savings Plan purchased.|
|savings_plan_payment_option|String|The payment options available for your Savings Plan.|
|savings_plan_purchase_term|String|Describes the duration, or term, of the Savings Plan.|
|savings_plan_recurring_commitment_for_billing_period|Number|The monthly recurring fee for your Savings Plan subscriptions. For example, the recurring monthly fee for a Partial Upfront Savings Plan or No Upfront Savings Plan.|
|savings_plan_region|String|The AWS Region (geographic area) that hosts your AWS services. You can use this field to analyze spend across a particular AWS Region.|
|savings_plan_savings_plan_a_r_n|String|The unique Savings Plan identifier.|
|savings_plan_savings_plan_effective_cost|Number|The proportion of the Savings Plan monthly commitment amount (upfront and recurring) that is allocated to each usage line.|
|savings_plan_savings_plan_rate|String|The Savings Plan rate for the usage.|
|savings_plan_start_time|String|The start date of the Savings Plan agreement.|
|savings_plan_total_commitment_to_date|Number|The total amortized upfront commitment and recurring commitment to date, for that hour.|
|savings_plan_used_commitment|Number|The total dollar amount of the Savings Plan commitment used. (SavingsPlanRate multiplied by usage)|
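
To show how a few of these columns are typically combined, here is a hedged sketch that totals `line_item_unblended_cost` per `line_item_product_code` using only the standard library. It is not the lab's lambda code; the function name `cost_by_product` and the sample rows are invented for illustration. `Decimal` is used rather than `float` so that currency sums do not pick up binary rounding noise.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

def cost_by_product(csv_text):
    """Sum line_item_unblended_cost for each line_item_product_code."""
    totals = defaultdict(Decimal)  # Decimal() is Decimal('0')
    for row in csv.DictReader(io.StringIO(csv_text)):
        # blank cost cells are treated as zero
        cost = row.get("line_item_unblended_cost") or "0"
        totals[row["line_item_product_code"]] += Decimal(cost)
    return dict(totals)

# invented sample rows using two real CUR 2.0 column names
sample_rows = (
    "line_item_product_code,line_item_unblended_cost\n"
    "AmazonEC2,1.50\n"
    "AmazonS3,0.25\n"
    "AmazonEC2,2.00\n"
    "AmazonRDS,\n"
)

product_totals = cost_by_product(sample_rows)
for product, total in sorted(product_totals.items()):
    print(product, total)
```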

<hr style="border:1px dotted;color:DarkTurquoise">
<hr style="border:1px dotted;color:greenyellow">

# <p style="color:greenyellow">Create backend architecture needed to support CostOpt</p>

# <p style="color:greenyellow">Let's Create Clients and Variables</p>
- We do these setup cells here because we can then use the vars and clients to clean up resources later without having to run multiple cells if we lose the kernel

In [None]:
import boto3
import json
import random
import os

# region
myRegion='ap-southeast-2'
myAccountNumber = boto3.client("sts").get_caller_identity()["Account"]

# set up a boto3 session using a profile that is able to create services in the region
# this is typically a developer profile or deployment profile
sessionBoto3 = boto3.Session(profile_name="default", region_name=myRegion)

# names for services we will create below
# s3 bucket - MUST BE A UNIQUE NAME so we randomise a couple of numbers to be sure
myBucketCostOpt='doit-costopt-bucket-' + str(random.randint(0, 1000)) + '-' + str(random.randint(0, 1000))
myBucketFolder='dataexport_cur2.0'

# github url for markdown file content
myGitHubURL='https://github.com/SimonDdraig/labs/blob/main/billing/resources/optimisations/dataworkloads/rds/'

# iam
myRoleLambda1="doit-costopt-lambda-role"
myPolicyLambda1="doit-costopt-lambda-s3-policy"
myRoleLambda1ARN='RETRIEVED FROM ROLE BELOW ONCE CREATED'
myRoleDataBrew="doit-costopt-databrew-service-role"
myPolicyDataBrew="doit-costopt-databrew-service-policy"
myRoleDataBrewARN='RETRIEVED FROM ROLE BELOW ONCE CREATED'

# lambda
myLambda1='doit-costopt-process-csv-fn'
myLambda1ARN='RETRIEVED FROM LAMBDA BELOW ONCE QUERIED'

# databrew
myDataSet="doit-costopt-databrew-dataset"
myDataSetARN="RETRIEVED FROM OBJECT BELOW ONCE CREATED"
myRecipe="doit-costopt-databrew-recipe"
myProject="doit-costopt-databrew-project"

print ('Done! Move to the next cell ->')

Done! Move to the next cell ->
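
The bucket name above relies on two random integers for uniqueness. As an alternative sketch (not what this lab uses), a UUID-based suffix makes collisions far less likely, and a small check can catch an invalid name before calling S3. The helper names `make_bucket_name` and `is_valid_bucket_name` are hypothetical, and the check covers only the basic general purpose bucket rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starts and ends with a letter or digit; no adjacent dots).

```python
import re
import uuid

# basic general purpose bucket naming rules only; not an exhaustive validator
BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")

def make_bucket_name(prefix="doit-costopt-bucket"):
    """Append a 12 character random hex suffix to the prefix."""
    return "{}-{}".format(prefix, uuid.uuid4().hex[:12])

def is_valid_bucket_name(name):
    """Return True if the name passes the basic naming rules listed above."""
    return bool(BUCKET_NAME_RE.match(name)) and ".." not in name

candidate = make_bucket_name()
print(candidate, is_valid_bucket_name(candidate))
```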


# <span style="color:ForestGreen">RECIPE</span>
## <span style="color:ForestGreen">Create a recipe</span>
- create a recipe to perform transformations on our dataset
- if creating a recipe via the console, it can only be created via a project session
- however we can create one directly in code that we can then add to a project we also create
- this recipe can be played with in a project, reviewed, updated, and published if ready for production
- the following steps are being applied to demonstrate just a very small handful of transformations
  - UPPER_CASE - upper case the order status column
  - CASE_OPERATION - NEW COLUMN - giving the order status unique values a number to represent the text
  - UPPER_CASE - CONDITION - upper case the returned status when the feedback score is < 3
  - REPLACE_BETWEEN_POSITIONS - obfuscation of credit card number
  - REPLACE_PATTERN - obfuscation of email
- https://docs.aws.amazon.com/databrew/latest/dg/recipe-actions.pii.html
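
To make the two obfuscation steps concrete, here is a plain Python illustration of what they do to a single value. This is not the DataBrew API or recipe syntax; the function names and the sample card number and email are invented for the example.

```python
import re

def mask_between_positions(value, start, end, mask_char="#"):
    """Replace the characters from index start up to (not including) end."""
    return value[:start] + mask_char * (end - start) + value[end:]

def mask_email_local_part(email):
    """Replace everything before the @ with a fixed token."""
    return re.sub(r"^[^@]+", "****", email)

# invented sample values
masked_card = mask_between_positions("4111111111111111", 0, 12)
masked_email = mask_email_local_part("jane.doe@example.com")

print(masked_card)   # ############1111
print(masked_email)  # ****@example.com
```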

In [2]:
# local client path for resources
# these are resources required by this lab and will be later uploaded to the cloud
myLocalPathForResources='/Users/simondavies/Documents/GitHub/labs/billing/resources/'

# Jupyter notebook path if the notebook is run in AWS, for example
#myLocalPathForResources='/home/ec2-user/SageMaker/labs/billing/resources/'

print ('Done! Move to the next cell ->')

Done! Move to the next cell ->


- create required clients to AWS SDK for Python (Boto3) to create, configure, and manage AWS services
- https://boto3.amazonaws.com/v1/documentation/api/latest/index.html

In [3]:
# s3
s3 = sessionBoto3.client('s3')

# lambda (created from the session so it uses the same profile and region as the other clients)
lambdac = sessionBoto3.client('lambda')

# iam
iam = sessionBoto3.client('iam')

# logs (cloudwatch)
logs = sessionBoto3.client('logs')

# databrew
databrew = sessionBoto3.client('databrew')

print ('Done! Move to the next cell ->')

Done! Move to the next cell ->


- tags for all services that are created - you can never have too many tags!
  - make sure you have a tagging policy in place
  - https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_tag-policies.html

In [4]:
# define tags added to all services we create
# best practice tagging of all resources should be used at all times
myTags = [
    {"Key": "env", "Value": "non_prod"},
    {"Key": "owner", "Value": "doit_lab"},
    {"Key": "project", "Value": "doit_costopt-lab"},
    {"Key": "author", "Value": "simon"},
]
myTagsDct = {
    "env": "non_prod",
    "owner": "doit_lab",
    "project": "doit_costopt-lab",
    "author": "simon",
}

print ('Done! Move to the next cell ->')

Done! Move to the next cell ->
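
The same tags are written out twice above because some APIs expect a list of `Key`/`Value` pairs and others expect a plain dict. A small sketch of deriving one form from the other, so only one definition needs maintaining, could look like this (the helper names are hypothetical and not part of Boto3):

```python
def tags_list_to_dict(tag_list):
    """Convert [{"Key": k, "Value": v}, ...] into {k: v, ...}."""
    return {tag["Key"]: tag["Value"] for tag in tag_list}

def tags_dict_to_list(tag_dict):
    """Convert {k: v, ...} into [{"Key": k, "Value": v}, ...]."""
    return [{"Key": key, "Value": value} for key, value in tag_dict.items()]

# shortened copy of the lab's tags, for illustration
example_tags = [
    {"Key": "env", "Value": "non_prod"},
    {"Key": "owner", "Value": "doit_lab"},
]

example_tags_dict = tags_list_to_dict(example_tags)
print(example_tags_dict)
```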


<hr style="border:1px dotted;color:greenyellow">
<hr style="border:1px dotted;color:crimson">

# <p style="color:crimson">Create S3 Bucket</p>
- defaults used, will use sse-s3 encryption and block public access
- bucket is used to upload the data file we need as a resource for the costopt dataset
- also used to store output from costopt job runs

In [5]:
# create bucket
# don't change this region, the condition is just checking how to create the bucket based on the region we're working in
if (myRegion != 'us-east-1'):
    s3.create_bucket(
        Bucket=myBucketCostOpt, CreateBucketConfiguration={"LocationConstraint": myRegion}
    )
else:
    s3.create_bucket(
        Bucket=myBucketCostOpt
    )

s3.put_bucket_tagging(Bucket=myBucketCostOpt, Tagging={"TagSet": myTags})

# create a "folder" to upload the file to and where lambda gets it from - really keys as S3 is flat
s3.put_object(Bucket=myBucketCostOpt, Key="{}/".format(myBucketFolder))

print ('Done! Move to the next cell ->')

Done! Move to the next cell ->
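
The region check in the cell above exists because `us-east-1` must be created without a `LocationConstraint`, while other regions need one. One way to keep that rule in a single place is a small helper that builds the keyword arguments; this is a sketch with a hypothetical helper name, and it makes no AWS calls itself.

```python
def build_create_bucket_kwargs(bucket_name, region):
    """Build keyword arguments for s3.create_bucket for the given region."""
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        # every region other than us-east-1 needs an explicit location constraint
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs

# example-bucket is a placeholder name
sydney_kwargs = build_create_bucket_kwargs("example-bucket", "ap-southeast-2")
virginia_kwargs = build_create_bucket_kwargs("example-bucket", "us-east-1")
print(sydney_kwargs)
print(virginia_kwargs)

# usage with the client from this lab would then be:
# s3.create_bucket(**build_create_bucket_kwargs(myBucketCostOpt, myRegion))
```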


- upload resource files to s3 that will be used as a datasource for the lambda

In [None]:
# Upload each file to the S3 bucket
myDataSourceFile='{}/monthly-00001.csv'.format(myBucketFolder)
filename = os.path.basename(myDataSourceFile)

files = [
    {
        's3key': myDataSourceFile,
        # os.path.join avoids a doubled slash, since myLocalPathForResources already ends with '/'
        'localpath': os.path.join(myLocalPathForResources, 'datasource', filename)
    }
]

for file in files:
    print ('uploading: {}'.format(filename))
    s3.upload_file(file['localpath'], myBucketCostOpt, file['s3key'], ExtraArgs={'StorageClass': 'STANDARD'})
    print ('uploaded: {} to bucket {}/{}/{}'.format(filename,myBucketCostOpt,myBucketFolder,filename))

print ('Done! Move to the next cell ->')

uploading: monthly-00001.csv
uploaded: monthly-00001.csv to doit-costopt-bucket-396-484/dataexport_cur2.0
Done! Move to the next cell ->


<hr style="border:1px dotted;color:crimson">
<hr style="border:1px dotted;color:orchid">

# <p style="color:orchid">Create IAM</p>
- roles and policies that allow services to interact with other services
- https://docs.aws.amazon.com/databrew/latest/dg/setting-up-iam-policies-for-databrew.html

### <p style="color:orchid">Lambda 1 IAM</p>
- allows lambda to create a log group and log stream and write logs to cloudwatch
- allows lambda to list, read from and write to the s3 bucket created above

In [10]:
# myRoleLambda1
# trust policy for the role
roleTrust = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# define the policy document
policyJson = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
            ],
            "Resource": [
                "arn:aws:logs:*:*:log-group:/aws/lambda/*",
                "arn:aws:logs:*:*:log-group:/aws/lambda/*:log-stream:*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:ListBucket"
            ],
            "Resource": [
                f"arn:aws:s3:::{myBucketCostOpt}",
                f"arn:aws:s3:::{myBucketCostOpt}/*"
            ]
        }
    ],
}

# create the policy (create_policy makes a customer managed policy, not an inline one)
policy = iam.create_policy(
    PolicyName=myPolicyLambda1,
    PolicyDocument=json.dumps(policyJson),
    Description="Policy for costopt",
    Tags=[
        *myTags,
    ],
)

# create role
role = iam.create_role(
    RoleName=myRoleLambda1,
    AssumeRolePolicyDocument=json.dumps(roleTrust),
    Description="Role for costopt lambda",
    Tags=[
        *myTags,
    ],
)

# attach the managed policy to the role
response = iam.attach_role_policy(
    RoleName=role["Role"]["RoleName"], PolicyArn=policy["Policy"]["Arn"]
)

myRoleLambda1ARN = role['Role']['Arn']

print ('Done! Move to the next cell ->')

Done! Move to the next cell ->
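
The S3 statement above grants all three actions on both the bucket ARN and the object ARN. That works, but `s3:ListBucket` only applies to the bucket itself, while `s3:GetObject` and `s3:PutObject` only apply to objects. A slightly tighter variant, shown here as a sketch with a hypothetical helper name and a placeholder bucket, splits the statement along those lines while keeping the same overall access.

```python
import json

def build_s3_statements(bucket_name):
    """Return S3 policy statements split by bucket-level and object-level actions."""
    bucket_arn = "arn:aws:s3:::{}".format(bucket_name)
    return [
        {
            # bucket-level action, so the resource is the bucket ARN
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [bucket_arn],
        },
        {
            # object-level actions, so the resource is the objects in the bucket
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": ["{}/*".format(bucket_arn)],
        },
    ]

# example-bucket is a placeholder; the lab would pass myBucketCostOpt
s3_statements = build_s3_statements("example-bucket")
example_policy = {"Version": "2012-10-17", "Statement": s3_statements}
print(json.dumps(example_policy, indent=2))
```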


# <span style="color:LightSkyBlue">Create Lambda</span>
- NOTE a zip file is already provided in this lab. 
- You DO NOT NEED to create your own zip file.
- However, if you want to edit the lambda, follow below to create an amended zip file 
  - You need to create the zip file from the lambda resource folder as create lambda function requires a zipped file
  - You can do this from a terminal window as long as you have cd'ed to the folder that contains the function code
    - Eg you need to be in the folder that contains the lambda function code (and all of the libraries if any are required) ...
      - *zip -r doit-costopt-process-csv-fn.zip .*
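
If you would rather build the zip from the notebook than from a terminal, the standard library `zipfile` module can do the equivalent of `zip -r <name>.zip .`. The sketch below uses a hypothetical helper, `zip_folder`, and demonstrates it on a throwaway temporary folder rather than the lab's real lambda folder.

```python
import os
import tempfile
import zipfile

def zip_folder(source_dir, zip_path):
    """Zip every file under source_dir, storing paths relative to source_dir."""
    zip_abs = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(root, name)
                if os.path.abspath(full_path) == zip_abs:
                    continue  # never add the archive to itself
                archive.write(full_path, os.path.relpath(full_path, source_dir))
    return zip_path

# demo on a temporary folder holding a stand-in handler file
with tempfile.TemporaryDirectory() as work_dir:
    source = os.path.join(work_dir, "fn")
    os.makedirs(source)
    with open(os.path.join(source, "handler.py"), "w") as handler_file:
        handler_file.write("def lambda_handler(event, context):\n    return 'ok'\n")
    created = zip_folder(source, os.path.join(work_dir, "fn.zip"))
    with zipfile.ZipFile(created) as archive:
        archived_names = archive.namelist()

print(archived_names)
```

Files are stored relative to the source folder, so the handler module sits at the root of the archive, which is where Lambda looks for it.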

### <span style="color:LightSkyBlue">Create Lambda 1 - doit-costopt-process-csv-fn</span>
- create a lambda to process a csv

In [None]:
# define vars
myLambdaZip='{}lambda/{}.zip'.format(myLocalPathForResources,myLambda1)

# Loads the zip file as binary code. 
with open(myLambdaZip, 'rb') as f: 
    code = f.read()
    
# create lambda
myLambdaFunction = lambdac.create_function(
    FunctionName=myLambda1,
    Runtime='python3.12',
    Role=myRoleLambda1ARN,
    Handler='{}.lambda_handler'.format(myLambda1),
    Code={'ZipFile':code},
    Description='processes a csv file of format AWS cur 2.0 to find cost optimisation suggestions',
    Timeout=30,
    MemorySize=128,
    Publish=True,
    PackageType='Zip',
    Environment={
        'Variables': {
            'BUCKET_NAME': '{}'.format(myBucketCostOpt),
            'FILE_KEY': '{}/{}'.format(myBucketFolder, filename),
            'GITHUB': '{}'.format(myGitHubURL)
        }
    },
    Tags=myTagsDct,
    Architectures=[
        'x86_64',
    ],
    LoggingConfig={
        'LogFormat': 'JSON',
        'ApplicationLogLevel': 'INFO',
        'SystemLogLevel': 'WARN'
    }
)

myLambda1ARN=myLambdaFunction['FunctionArn']

print ('Done! Move to the next cell ->')

Done! Move to the next cell ->
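
A newly created IAM role can take a few seconds to become assumable, so `create_function` may fail if it runs immediately after the IAM cell. The sketch below is one possible way to handle that and is not part of the lab: `retry_call` is a hypothetical helper that retries any callable a few times with a fixed pause.

```python
import time


def retry_call(fn, attempts=5, delay_seconds=3.0, should_retry=lambda err: True):
    """Call fn(); on an exception that should_retry accepts, wait and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as err:
            # give up on the last attempt or on errors we do not want to retry
            if attempt == attempts or not should_retry(err):
                raise
            time.sleep(delay_seconds)
```

In the notebook you could wrap the create call in a lambda, for example `retry_call(lambda: lambdac.create_function(...))`, and pass a `should_retry` that inspects the error text. Which message to match is an assumption you should check against the error you actually see.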


- log group for lambda

In [13]:
logs.create_log_group(
    logGroupName='/aws/lambda/' + myLambda1,
    tags=myTagsDct
)

print ('Done! Move to the next cell ->')

Done! Move to the next cell ->


<hr style="border:1px dotted;color:orchid">
 
# <span style="color:greenyellow">The following is not required to do a cost opt, but if the cells are run, it will create a dataset in databrew and allow you to view the data in a databrew project</span>

<hr style="border:1px dotted;color:LightSkyBlue">

# <span style="color:LightSkyBlue">DATABREW</span>

# <p style="color:LightSkyBlue">Create databrew IAM</p>
- roles and policies that allow services to interact with other services
- https://docs.aws.amazon.com/databrew/latest/dg/setting-up-iam-policies-for-databrew.html

In [14]:
# myRoleDataBrew
# trust policy for the role
roleTrust = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "databrew.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# define the policy document
policyJson = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
            ],
            "Resource": "*",
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:ListBucket"
            ],
            "Resource": [
                f"arn:aws:s3:::{myBucketCostOpt}",
                f"arn:aws:s3:::{myBucketCostOpt}/*"
            ]
        }
    ],
}

# create a customer managed policy (create_policy does not create an inline policy)
policy = iam.create_policy(
    PolicyName=myPolicyDataBrew,
    PolicyDocument=json.dumps(policyJson),
    Description="Policy for databrew",
    Tags=[
        *myTags,
    ],
)

# create role
role = iam.create_role(
    RoleName=myRoleDataBrew,
    AssumeRolePolicyDocument=json.dumps(roleTrust),
    Description="Role for databrew",
    Tags=[
        *myTags,
    ],
)

# attach the managed policy to the role
response = iam.attach_role_policy(
    RoleName=role["Role"]["RoleName"], PolicyArn=policy["Policy"]["Arn"]
)

myRoleDataBrewARN = role['Role']['Arn']

print ('Done! Move to the next cell ->')

Done! Move to the next cell ->


## <span style="color:LightSkyBlue">Create a dataset from our datasource</span>
- We create the dataset from the CSV file we have uploaded into S3
- We need to create a dataset before we can create anything else that uses it
- https://docs.aws.amazon.com/databrew/latest/dg/datasets.html

In [18]:
# dataset with pii
response = databrew.create_dataset(
    Name=myDataSet,
    Format='CSV',
    FormatOptions={
        'Csv': {
            'Delimiter': ',',
            'HeaderRow': True
        }
    },
    Input={
        'S3InputDefinition': {
            'Bucket': myBucketCostOpt,
            'Key': '{}/{}'.format(myBucketFolder, filename),
            'BucketOwner': myAccountNumber
        }
    },
    Tags=myTagsDct,
)

# we need the ARN, which is only available if we describe the dataset
response = databrew.describe_dataset(
    Name=myDataSet
)
myDataSetARN = response['ResourceArn']

print ('Done! Move to the next cell ->')

Done! Move to the next cell ->


## <span style="color:LightSkyBlue">Create a recipe</span>
- create a simple recipe to perform transformations on our dataset
- needed to be able to create a project

In [22]:
response = databrew.create_recipe(
    Name=myRecipe,
    Description="recipe 1 - to be used in a project",
    Steps=[
        {
            "Action": {
                "Operation": "UPPER_CASE",
                "Parameters": {"sourceColumn": "bill_billing_entity"},
            }
        },
    ],
    Tags=myTagsDct,
)

print("Done! Move to the next cell ->")

Done! Move to the next cell ->
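
To see what the `UPPER_CASE` step will do before opening the project, you can approximate it locally. This is only a sketch using the standard `csv` module, not DataBrew itself: the helper name `preview_upper_case` is hypothetical, and the sample rows are made up for illustration.

```python
import csv
import io


def preview_upper_case(csv_text, column, limit=5):
    """Return up to `limit` values of `column`, upper-cased as the recipe step would."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not found in CSV header")
    values = []
    for row in reader:
        if len(values) >= limit:
            break
        values.append(row[column].upper())
    return values


# made-up sample rows, not taken from the lab's export
sample = "bill_billing_entity,line_item_unblended_cost\naws,1.23\nAWS Marketplace,4.56\n"
print(preview_upper_case(sample, "bill_billing_entity"))
```

The header check is also a cheap way to confirm that your own export contains the column the recipe refers to.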


## <span style="color:LightSkyBlue">Create a databrew project</span>
- a project is a playground where you can experiment with recipes
- it can use existing recipes, and review and update them
- or you can create new recipes in a project, and publish them if production ready
- once created, you can play with this project via the console
- the project samples only the first 5000 rows, which is the maximum sample size

In [25]:
response = databrew.create_project(
    Name=myProject,
    DatasetName=myDataSet,
    RecipeName=myRecipe,
    Sample={
        'Size': 5000,
        'Type': 'FIRST_N'
    },
    RoleArn=myRoleDataBrewARN,
    Tags=myTagsDct,
)


print ('Done! Move to the next cell ->')

Done! Move to the next cell ->


You can now view the dataset in the databrew project if you wish

<hr style="border:1px dotted;color:LightSkyBlue">
<hr style="border:1px dotted;color:deeppink">

# <p style="color:deeppink">STACK 01 COMPLETE!</p>
- ### <p style="color:deeppink">You can now invoke the lambda</p>
- ### <p style="color:deeppink">This will create a cost optimisation document in the bucket we have created!</p>
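
As a rough sketch of that invocation from the notebook: the helper below takes any boto3 Lambda client (such as `lambdac`) and calls `invoke` synchronously. The helper name `invoke_costopt` is hypothetical, and so are any event keys you pass in; check the lambda's handler for the event fields it actually reads. An empty event presumably falls back to the `BUCKET_NAME` and `FILE_KEY` environment variables set when the function was created.

```python
import json


def invoke_costopt(lambda_client, function_name, event=None):
    """Invoke the function synchronously and return (status_code, decoded_payload)."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the result
        Payload=json.dumps(event or {}).encode("utf-8"),
    )
    body = response["Payload"].read().decode("utf-8")
    # the payload is usually JSON, but fall back to raw text if it is not
    try:
        payload = json.loads(body) if body else None
    except json.JSONDecodeError:
        payload = body
    return response["StatusCode"], payload
```

For example, `status, result = invoke_costopt(lambdac, myLambda1)` would run the function against its default file. The function was created with a 30 second timeout, so a large export may need that raised.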

<hr style="border:1px dotted;color:deeppink">
<hr style="border:1px dotted;color:orangered">
<hr style="border:1px dotted;color:orangered">
<hr style="border:1px dotted;color:orangered">

# <p style="color:orangered">CLEAN UP!!</p>
# <p style="color:orangered">DO NOT RUN THESE UNLESS YOU WANT TO DESTROY EVERYTHING</p>
- If you have lost the Kernel:
  - Run the cells contained in the <span style="color:greenyellow">Set Up Requirements</span> section before continuing...
  - Any IDs or ARNs will have to be manually stated
### <p style="color:orangered">Click on the Variables in the tool bar above to display all variables, you'll see those that may have no value if you have lost or stopped your kernel</p>
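
If the kernel was lost, the IAM ARNs used in the clean-up cells can be rebuilt from the account number and the resource names, because IAM ARNs follow a fixed pattern. A small sketch (the helper names are hypothetical, and the account number and names in the example are placeholders):

```python
def iam_policy_arn(account_id, policy_name):
    """ARN of a customer managed IAM policy on the default path."""
    return f"arn:aws:iam::{account_id}:policy/{policy_name}"


def iam_role_arn(account_id, role_name):
    """ARN of an IAM role on the default path."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


# placeholder values for illustration only
print(iam_policy_arn("123456789012", "example-policy"))
print(iam_role_arn("123456789012", "example-role"))
```

The account number itself can be recovered with `boto3.client('sts').get_caller_identity()['Account']`, provided your credentials are still configured.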


In [None]:
# delete databrew project
try:
    databrew.delete_project(Name=myProject)
except Exception as err:
    print(f'1:{err}')

print ('Done! Move to the next cell ->')

In [None]:
# delete recipes
# if no further work has been done on these recipes via the console, this piece of code should delete all that is there
# if not, either manually delete remaining recipes via the console or edit this code
try:
    databrew.delete_recipe_version(
        Name=myRecipe,
        RecipeVersion='LATEST_WORKING'
    )
except Exception as err:
    print(f'1:{err}')

print ('Done! Move to the next cell ->')
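
If the recipe has been published from the console, published versions will exist alongside `LATEST_WORKING`. The sketch below shows one way to collect and delete them with the DataBrew `list_recipe_versions` and `batch_delete_recipe_version` calls. The helper name is hypothetical, it takes the client as a parameter (for example `databrew`), and it assumes fewer than 50 versions in total, so treat it as a starting point rather than tested clean-up code.

```python
def delete_all_recipe_versions(databrew_client, recipe_name):
    """Delete every published version of a recipe, plus LATEST_WORKING."""
    versions = []
    kwargs = {"Name": recipe_name}
    while True:
        page = databrew_client.list_recipe_versions(**kwargs)
        versions.extend(r["RecipeVersion"] for r in page.get("Recipes", []))
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    # list_recipe_versions does not return LATEST_WORKING, so add it explicitly
    versions.append("LATEST_WORKING")
    result = databrew_client.batch_delete_recipe_version(
        Name=recipe_name, RecipeVersions=versions
    )
    # partial failures are reported per version rather than raised
    return result.get("Errors", [])
```

Calling `delete_all_recipe_versions(databrew, myRecipe)` returns the list of per-version errors, which is empty when everything was deleted. Delete the project first, because `LATEST_WORKING` cannot be removed while a project is using it.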

In [None]:
# delete databrew dataset
try:
    databrew.delete_dataset(Name=myDataSet)
except Exception as err:
    print(f'1:{err}')

print ('Done! Move to the next cell ->')

In [None]:
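# delete the lambda and its log group (the log group was created separately, so it is deleted separately)
lambdac.delete_function(FunctionName=myLambda1)
logs.delete_log_group(logGroupName='/aws/lambda/' + myLambda1)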

In [None]:
# delete lambda roles and policies
try:
    iam.detach_role_policy(
        RoleName=myRoleLambda1, PolicyArn='arn:aws:iam::{}:policy/{}'.format(myAccountNumber, myPolicyLambda1)
    )
except Exception as err:
    print(f'1:{err}')

try:
    iam.delete_role(RoleName=myRoleLambda1)
except Exception as err:
    print(f'4:{err}')

try:
    iam.delete_policy(PolicyArn='arn:aws:iam::{}:policy/{}'.format(myAccountNumber, myPolicyLambda1))
except Exception as err:
    print(f'5:{err}')

# delete databrew roles and policies
try:
    iam.detach_role_policy(
        RoleName=myRoleDataBrew, PolicyArn='arn:aws:iam::{}:policy/{}'.format(myAccountNumber, myPolicyDataBrew)
    )
except Exception as err:
    print(f'1:{err}')

try:
    iam.delete_role(RoleName=myRoleDataBrew)
except Exception as err:
    print(f'4:{err}')

try:
    iam.delete_policy(PolicyArn='arn:aws:iam::{}:policy/{}'.format(myAccountNumber, myPolicyDataBrew))
except Exception as err:
    print(f'5:{err}')
print ('Done! Move to the next cell ->')

In [None]:
# delete s3 bucket
# NOTE WARNING - this will delete all objects in the bucket with NO prompt or confirmation
# myBucketCostOpt = 'doit-costopt-bucket-???-???' # look in the console and set here if lost
try:
    s3r = boto3.resource('s3')
    bucket = s3r.Bucket(myBucketCostOpt)
    bucket.objects.all().delete()
except Exception as err:
    print(f'9:{err}')

try:
    # delete the bucket
    response = s3.delete_bucket(Bucket=myBucketCostOpt)
except Exception as err:
    print(f'9:{err}')

print ('Done! Move to the next cell ->')

<hr style="border:1px dotted;color:coral">
<hr style="border:1px dotted;color:coral">
<hr style="border:1px dotted;color:coral">
<hr style="border:1px dotted;color:gold">
<hr style="border:1px dotted;color:gold">
<hr style="border:1px dotted;color:gold">

# <p style="color:gold">Appendix - Jupyter Install Requirements (macOS)</p>
#### <p style="color:deeppink">- If you are running VSCode on a laptop, follow all of below.<br>- If you are running Jupyter inside an AWS Account, you don't need to do anything!</p>

  - Credentials for the AWS account this notebook executes in are provided by `aws configure`
  - You must already have an IAM user with programmatic (Command Line Interface) access and AWS access keys to be able to use these credentials in `aws configure`  
    
  - arn:aws:iam::###########:user/simon-davies-cli was created for this lab when the workshop was presented

### <p style="color:gold">1. Homebrew</p> 
If you haven't installed Homebrew, you can install it by running the following command here or in the terminal:

In [None]:
%%bash
# Homebrew's installer must not be run as root, so there is no sudo here
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

### <p style="color:gold">1.1 Virtual Environments</p> 
- You can create a virtual environment that ensures any libraries you install are restricted to the venv.
  - https://code.visualstudio.com/docs/python/environments
- To enable the virtual environment once you have created it, ensure you open the folder in vs code rather than individual files.

In [None]:
%%bash
# create a virtual environment in the current folder (the name .venv is an assumption, any name works)
python3 -m venv .venv

### <p style="color:gold">1.2 Python</p> 
Once Homebrew is installed, you can install Python using the following commands  
*check what you have before installing/upgrading*  
*you will need to quit and restart VS Code to use Python once it is installed (or updated)*

In [None]:
%%bash
python3 --version
which python3

In [None]:
%%bash
brew install python

### <p style="color:gold">2. boto3 and other Python requirements</p> 
* boto3 must be installed on your client
  * *Boto3 is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python, which allows Python developers to write software that makes use of services like Amazon S3 and Amazon EC2.*
  * https://boto3.amazonaws.com/v1/documentation/api/latest/index.html  
  
*check what you have before installing/upgrading*  

In [None]:
%%bash
python3 -m pip show boto3

In [None]:
pip install -U boto3

### <p style="color:gold">3. aws configure</p> 
*Run `aws configure` with the credentials of a user that has all of the Bedrock IAM policies required*  
https://docs.aws.amazon.com/bedrock/latest/userguide/security_iam_id-based-policy-examples.html
  
*You will need AWS CLI*  
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html

In [None]:
%%bash
aws sts get-caller-identity

<hr style="border:1px dotted;color:gold">