
# Getting Started with the AMC Quickstart

#### Import Libraries

In [None]:
import boto3
import json
import pandas as pd
from client_manager_microservices.tps.atsclientslibraries.clientmgr import customer
from datalake_hydration_microservices.wfm.atsclientslibraries.workflows import workflows
from datalake_hydration_microservices.wfm.atsclientslibraries.workflow_invoke import workflowInvoke

#### Define Global Configurations

In [None]:
ENV = "dev" # Change to your Default Environment

## The Team Name configured in the Data Lake platform
# This is the same value passed in the 'data_pipeline_parameters' team of ddk.json file
TEAM_NAME = "<INSERT TEAM NAME>" 

## The Dataset Name configured in the Data Lake platform
# This is the same value passed in the 'data_pipeline_parameters' dataset of ddk.json file
amc_dataset_name = "<INSERT DATASET NAME>" 

# The AMC Instance information can be found on your AMC UI page
amc_api_endpoint = "<ENTER AMC API ENDPOINT URL>"

amc_s3_bucket_name = "<ENTER AMC S3 BUCKET NAME>"

amc_data_upload_acct = "<ENTER ACCOUNT ID>"

## The AMC Instance AWS region (also in the API Endpoint URL)
# This may differ from your solution deployment region
amc_instance_region = 'us-east-1'

#OPTIONAL: Change to your desired Customer ID (keep less than 25 characters) 
customer_id = "testdemocustomer"

if amc_instance_region == '':
    amc_instance_region = str(boto3.Session().region_name)
print("Region : " + amc_instance_region)
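Before moving on, it can help to confirm that no placeholder value was left unfilled. The following is a minimal sketch and not part of the Quickstart libraries: `find_unset_config` and `customer_id_is_valid` are hypothetical helpers, the example values are made up, and the 25-character limit is taken from the comment on `customer_id` above.

```python
def find_unset_config(config):
    """Return the names of settings that are empty or still hold a '<...>' placeholder."""
    problems = []
    for name, value in config.items():
        text = str(value).strip()
        if not text or (text.startswith("<") and text.endswith(">")):
            problems.append(name)
    return problems


def customer_id_is_valid(customer_id, max_length=25):
    # "keep less than 25 characters" is read here as strictly shorter than max_length
    return 0 < len(customer_id) < max_length


# Example values only; in the notebook, pass the variables defined above.
example_config = {
    "TEAM_NAME": "<INSERT TEAM NAME>",
    "amc_dataset_name": "amcdataset",
    "amc_api_endpoint": "<ENTER AMC API ENDPOINT URL>",
}
print(find_unset_config(example_config))
print(customer_id_is_valid("testdemocustomer"))
```

Any name reported by `find_unset_config` still needs a real value before you run Step 1.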

#### Step 1: Onboard A New Client

The Tenant Provisioning Service (TPS) onboards clients for each team space. Each client should be configured in the same AWS Region as its AMC Instance. Each client is defined by:

1. An AMC Instance
2. The corresponding profile IDs, which are grouped for this customer according to the advertiser IDs used to set up the AMC Instance

The following notebook cells define your client configuration, onboard your client, and create the AMC S3 Bucket for that client.

##### Define your Client Configuration

Run the cell below to set up and verify the client configuration.

Refer to the `client_manager_microservices/client_manager_adminstrator_sample` notebook for more information on the configuration parameters.

In [None]:
customer_details = {
    "customer_id": customer_id,
    "customer_name":"DemoCustomer",     #OPTIONAL: Change to your desired Customer Name
    "customer_type": "ENDEMIC",         #Can Be <ENDEMIC or NON-ENDEMIC>
    "region":amc_instance_region,
    "amc":{
        "amc_dataset_name":amc_dataset_name,
        "endpoint_url": amc_api_endpoint,
        "aws_orange_account_id": amc_data_upload_acct,
        "bucket_name": amc_s3_bucket_name
    }
}

print(json.dumps(customer_details, indent=4))

##### Submit Your Client Configuration to Create The AMC S3 Bucket

Running the cells below will start the process of onboarding your client and creating the AMC S3 Bucket for your client.

In [None]:
dynamodb_resp_wr = customer.set_customers_config(customer_details=customer_details, TEAM_NAME=TEAM_NAME, ENV=ENV)
dynamodb_resp_wr

_Wait a few minutes for the AMC S3 Bucket to be deployed BEFORE moving to Step #2._

_You can verify the status in AWS Step Functions: wait until the state machine named tps-&lt;TeamName&gt;-initialize-amc shows one execution with a Succeeded status._
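The same check can be scripted rather than done in the console. This is a hedged sketch: `has_succeeded_execution` is a hypothetical helper, it assumes the state machine name follows the `tps-<TeamName>-initialize-amc` pattern described above, and it relies on the boto3 Step Functions calls `list_state_machines` and `list_executions`.

```python
def has_succeeded_execution(sfn_client, state_machine_name):
    """Return True if the named state machine has at least one SUCCEEDED execution."""
    # Look up the ARN by name. Only the first page of results is checked here;
    # accounts with many state machines would need pagination.
    machines = sfn_client.list_state_machines()["stateMachines"]
    arn = next(
        (m["stateMachineArn"] for m in machines if m["name"] == state_machine_name),
        None,
    )
    if arn is None:
        return False  # state machine not found (yet)
    executions = sfn_client.list_executions(
        stateMachineArn=arn, statusFilter="SUCCEEDED"
    )["executions"]
    return len(executions) > 0


# Usage in the notebook (requires AWS credentials, so it is left commented out):
# import boto3
# sfn = boto3.client("stepfunctions")  # assumes the solution's deployment region is the default
# has_succeeded_execution(sfn, f"tps-{TEAM_NAME}-initialize-amc")
```

Re-run the check every minute or so and move to Step 2 once it returns `True`.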

#### Step 2: Define an AMC Workflow Query and Set an AMC Workflow Record

The Workflow Manager (WFM) Service is used to manage and schedule AMC workflows. The following notebook cells walk through the process of creating an AMC workflow query and creating an AMC Workflow record for the specified query.

The query used here is the Time To Conversion Query from the Interactive Query Library (IQL) in the AMC UI.

The query measures how long it takes your customers to convert after last seeing your ad. You can use this information to adjust the duration of campaigns and promotions to maximize sales. In our amazon_attributed_events_by_conversion_time and amazon_attributed_events_by_traffic_time tables, we report conversions up to 14 days after the customers’ last exposure to your ad.


In [None]:
amc_query = """
SELECT
      advertiser,
      campaign,
        ( 
            CASE WHEN SECONDS_BETWEEN (impression_dt,
                    conversion_event_dt) <= 60 THEN
                '1 | < 1 MIN'
                WHEN SECONDS_BETWEEN (impression_dt,
                    conversion_event_dt) <= 600 THEN
                '2 | 1 - 10 MIN'
                WHEN SECONDS_BETWEEN (impression_dt,
                    conversion_event_dt) <= 1800 THEN
                '3 | 10 - 30 MIN'
                WHEN SECONDS_BETWEEN (impression_dt,
                    conversion_event_dt) <= 3600 THEN
                '4 | 30 - 60 MIN'
                WHEN SECONDS_BETWEEN (impression_dt,
                    conversion_event_dt) <= 7200 THEN
                '5 | 1 - 2 HRS'
                WHEN SECONDS_BETWEEN (impression_dt,
                    conversion_event_dt) <= 43200 THEN
                '6 | 2 - 12 HRS'
                WHEN SECONDS_BETWEEN (impression_dt,
                    conversion_event_dt) <= 86400 THEN
                '7 | 12 - 24 HRS'
                WHEN SECONDS_BETWEEN (impression_dt,
                    conversion_event_dt) <= 604800 THEN
                '8 | 1 - 7 DAYS'
            ELSE
                '9 | 7+ DAYS'
END
) AS time_to_conversion,
        SUM(purchases) AS purchases,
        SUM(total_purchases) AS total_brand_purchases
FROM
    amazon_attributed_events_by_conversion_time
    
GROUP BY 1,2,3
"""

print (amc_query)
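To make the bucketing in the `CASE` expression easier to follow, here is the same threshold logic sketched in plain Python. `time_to_conversion_bucket` is a hypothetical helper written for illustration; AMC evaluates the SQL itself, not this function.

```python
def time_to_conversion_bucket(seconds):
    """Mirror the CASE expression: return the label of the first threshold not exceeded."""
    thresholds = [
        (60, "1 | < 1 MIN"),
        (600, "2 | 1 - 10 MIN"),
        (1800, "3 | 10 - 30 MIN"),
        (3600, "4 | 30 - 60 MIN"),
        (7200, "5 | 1 - 2 HRS"),
        (43200, "6 | 2 - 12 HRS"),
        (86400, "7 | 12 - 24 HRS"),
        (604800, "8 | 1 - 7 DAYS"),
    ]
    for limit, label in thresholds:
        if seconds <= limit:
            return label
    return "9 | 7+ DAYS"  # the ELSE branch


for s in (45, 600, 5000, 700000):
    print(s, "->", time_to_conversion_bucket(s))
```

The numeric prefix in each label keeps the buckets in chronological order when sorted as text; the visualization cells later split the label on `|` to separate the prefix from the readable name.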

__NOTE__: This is just one example of a workflow query you can run on your AMC Instance. Refer to the Interactive Query Library (IQL) in the AMC UI for a list of other queries for different use cases. Queries can also be customized for your unique use case.

##### Create The AMC Workflow Record

Run the cell below to set up and verify the workflow configuration (default configuration values are already populated).

Refer to the `datalake_hydration_microservices/workflows_wfm_sample` notebook for more information on the workflow configuration parameters.



In [None]:
# Workflow record
workflow = {
  "customerId": customer_id,
  "defaultSchedule": {
    "automaticDeploySchedule": False,
    "Description": "Runs the time_to_conversion workflow. ",
    "Input": {
      "payload": {
        "timeWindowEnd": "today(-1)",
        "timeWindowStart": "today(-91)",
        "timeWindowType": "EXPLICIT",
        "workflow_executed_date": "now()"
      }
    },
    "Name": "time_to_conversion",
  },
  "filteredMetricsDiscriminatorColumn": "filtered",
  "sqlQuery": amc_query,
  "version": 1,
  "workflowId": "time_to_conversion_v1",
  "workflowMetaData": {
    "automaticDeployWorkflow": True,
    "endemicType": "ENDEMIC"
  },
  "workflowType": "ENDEMIC|NON-ENDEMIC"
}

print(json.dumps(workflow, indent=4))
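The `timeWindowStart` and `timeWindowEnd` values are expressions rather than literal dates. The sketch below shows one plausible reading, assuming `today(n)` means the current date shifted by `n` days. Both this interpretation and the helper `resolve_today_expr` are assumptions for illustration; the Workflow Manager may resolve these expressions differently (for example, with a time of day attached).

```python
import re
from datetime import date, timedelta


def resolve_today_expr(expr, today):
    """Resolve an expression such as 'today(-91)' relative to the given date (assumed semantics)."""
    match = re.fullmatch(r"today\((-?\d+)\)", expr.strip())
    if match is None:
        raise ValueError(f"Unsupported expression: {expr!r}")
    return today + timedelta(days=int(match.group(1)))


# A fixed reference date keeps the example reproducible.
reference = date(2024, 4, 1)
start = resolve_today_expr("today(-91)", reference)
end = resolve_today_expr("today(-1)", reference)
print(start, end, (end - start).days)
```

Under this reading, the default configuration covers a 90-day span ending yesterday.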

In [None]:
dynamodb_resp_wr = workflows.set_workflow_record(workflow_details=workflow, TEAM_NAME=TEAM_NAME, ENV=ENV)
dynamodb_resp_wr

#### Step 3: Invoke the AMC Workflow to be Executed Ad Hoc

Run the cell below to set up your workflow execution configuration (default configuration values are already populated).

Refer to the `datalake_hydration_microservices/workflows_invoke_wfm_sample` notebook for more information on the workflow execution configuration parameters.

In [None]:
# Workflow execution request (reuses the `workflow` variable name from Step 2)
workflow =  {
  "customerId": customer_id,
  "Description": "Runs the time_to_conversion workflow looking back to 90 days prior",
  "Input": {
    "payload": {
      "timeWindowEnd": "today(-1)",
      "timeWindowStart": "today(-91)",
      "timeWindowType": "EXPLICIT",
      "workflow_executed_date": "now()",
      "workflowId": "time_to_conversion_v1"
    }
  },
  "Name": f'wfm-{customer_id}-time_to_conversion'
}

print(json.dumps(workflow, indent=4))

In [None]:
response = workflowInvoke.invoke_workflow(workflow, TEAM_NAME, ENV)
response

__NOTE__: In this example we are invoking the time_to_conversion workflow to be run once. Workflows can also be set up to run on pre-defined schedules custom to your use case. Refer to the `workflowLibrary_wfm_sample` and `workflowSchedules_wfm_sample` notebooks for more information.

#### Your workflow is now being processed and executed. 
Once the workflow has run, data will be uploaded to your AMC S3 Bucket with the results of the workflow query and processed through the data lake. 

- Continue exploring the notebooks in the `client_manager_microservices` folder for further documentation on how to onboard new clients

- Continue exploring the notebooks in the `datalake_hydration_microservices` folder for further documentation on how to schedule and manage your workflows and submit multiple workflows.

### Optional : Data Visualization Using Workflow Query Results

Before you begin setting up data visualizations, wait until the workflow query has been executed and its results have been written to S3 and processed by the data lake. This can take up to 30 minutes.

Verify that there is data for your query result in the S3 Path: 

    amc-{ENV}-{aws_region}-{account_id}-stage/post-stage/{TEAM_NAME}/

Once you have your query results, run the below cell to visualize the data you have gathered:

In [None]:
#Import Data Visualization Python Libraries
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
import awswrangler as wr
from datetime import datetime
import matplotlib.ticker as mtick

In [None]:
#Fill in parameters for workflow_id and the solution deployment_region 
# The deployment_region may differ from the amc_instance_region set earlier in the notebook
workflow_id = "time_to_conversion_v1"
deployment_region = "<ENTER SOLUTION DEPLOYMENT REGION>"

In [None]:
s3_client = boto3.client("s3")

account_id = boto3.client("sts").get_caller_identity()["Account"]
bucket_name = f"amc-{ENV}-{deployment_region}-{account_id}-stage"

d = datetime.today()
month = '{:02d}'.format(d.month)
year = '{:04d}'.format(d.year)

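# Retrieve the S3 key prefix for the processed data
s3_prefix = "post-stage/{}/{}/{}_{}_adhoc/customer_hash={}/export_year={}/export_month={}".format(TEAM_NAME, amc_dataset_name, customer_id, workflow_id, customer_id, year, month)
response = s3_client.list_objects(Bucket=bucket_name, Prefix=s3_prefix)
# "Contents" is absent when nothing matches, e.g. if the workflow has not finished yet
if "Contents" not in response:
    raise FileNotFoundError(f"No processed data found under s3://{bucket_name}/{s3_prefix}")
s3_key = "/".join(response["Contents"][0]["Key"].split("/")[:-1])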


# CONFIRM THIS IS THE CORRECT S3_PATH FOR YOUR PROCESSED DATA
s3_path = f"s3://{bucket_name}/{s3_key}"
s3_path
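The cell above only looks under the current year and month. If the workflow ran near a month boundary, the results may sit under the previous month's partition instead. The following sketch builds the prefixes to try, newest first; `candidate_prefixes` is a hypothetical helper, the argument values in the example are made up, and the path layout is copied from the `s3_prefix` format string above.

```python
from datetime import date


def candidate_prefixes(team, dataset, customer, workflow, today, months_back=1):
    """Build post-stage prefixes for the current month and the preceding months."""
    prefixes = []
    year, month = today.year, today.month
    for _ in range(months_back + 1):
        prefixes.append(
            f"post-stage/{team}/{dataset}/{customer}_{workflow}_adhoc/"
            f"customer_hash={customer}/export_year={year:04d}/export_month={month:02d}"
        )
        # Step back one calendar month, wrapping January to December.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return prefixes


# Example values only; in the notebook, pass TEAM_NAME, amc_dataset_name, and so on.
for prefix in candidate_prefixes(
    "demoteam", "amcdataset", "testdemocustomer", "time_to_conversion_v1", date(2024, 1, 3)
):
    print(prefix)
```

Each prefix can then be passed to the S3 listing call in turn until one returns objects.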

In [None]:
# Creating DataFrame from the processed data in Amazon S3
df = wr.s3.read_parquet(s3_path, dataset=True)

#View Data Returned From Query Results
df.head()

In [None]:
# Sum Purchases by Time To Conversion Groups
purchases_df = df.groupby("time_to_conversion", as_index=False, sort=True)["purchases"].sum()
purchases_df["Type"] = "Purchases"

# Sum Total_Brand_Purchases by Time To Conversion Groups
total_purchases_df = df.groupby("time_to_conversion", as_index=False, sort=True)["total_brand_purchases"].sum()
total_purchases_df["Type"] = "Total Brand Purchases"


# Join 2 DataFrames Together
total_purchases_df.rename(columns={'total_brand_purchases': 'purchases'}, inplace=True)
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
p_df = pd.concat([purchases_df, total_purchases_df], ignore_index=True)


# Re-Format Time To Conversion
p_df[['time_to_conversion','Time To Conversion']] = p_df['time_to_conversion'].str.split("|", expand=True)
p_df = p_df.drop(columns='time_to_conversion')
p_df

In [None]:
# Set Figure Size
plt.figure(figsize=(15,10))

# Create Bar Plot
ax = sns.barplot(x = "Time To Conversion",
            y = "purchases",
            hue = "Type",
            palette = "coolwarm",
            data = p_df)

# Change Bar Width
width_scale = 0.9
for bar in ax.containers[0]:
    bar.set_width(bar.get_width()* width_scale)
for bar in ax.containers[1]: 
    bar.set_width(bar.get_width()* width_scale)

    
# Format Y Axis
def currency(x, pos):
    """The two args are the value and tick position"""
    if x >= 1e6:
        s = '${:1.1f}M'.format(x*1e-6)
    else:
        s = '${:1.1f}K'.format(x*1e-3)
    return s

ax.yaxis.set_major_formatter(currency)

# Label Axes
plt.ylabel("Purchases", fontsize=16)
plt.xlabel("Time To Conversion", fontsize=16)
plt.yticks(rotation=25)


# Add Values to Bar Chart
for p in ax.patches:
        ax.annotate('${:,.0f}'.format(p.get_height()), (p.get_x()+0.15, p.get_height()),ha='center', va='bottom',color= 'black')


# Add Title
plt.title("Purchases By Time To Conversion", fontsize = 24)

# Resize Plot Legend
plt.legend(loc=1, prop={'size': 12})

# Show the plot
plt.show()

#### Data Visualization Using QuickSight Dashboards

If you require more advanced and customizable data visualizations, consider using Amazon QuickSight as your BI tool. With Amazon QuickSight you can perform advanced analytics, gather machine learning (ML) insights, and embed interactive visualizations and dashboards with natural language query capabilities. Refer to the AMC QuickStart FAQ for more information on how to set up your own Amazon QuickSight dashboards.