# Invoke SageMaker Autopilot Model from Athena

Machine Learning (ML) with Amazon Athena (Preview) lets you write SQL statements in Athena that run ML inference using Amazon SageMaker. This gives you access to ML models for data analysis directly from SQL, without writing separate inference code.

To use ML with Athena (Preview), you define a function with the `USING FUNCTION` clause. The function points to the Amazon SageMaker model endpoint that you want to use and specifies the variable names and data types to pass to the model. Subsequent clauses in the query reference the function to pass values to the model. The model runs inference on the values that the query passes and returns the inference results.
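As a rough sketch of what such a query looks like, the snippet below assembles a `USING FUNCTION` statement as a Python string. The helper name `build_inference_query` and the endpoint name `my-autopilot-endpoint` are placeholders invented for illustration; the SQL shape follows the preview syntax used later in this notebook.

```python
def build_inference_query(endpoint_name, database, table, limit=10):
    """Return an Athena SQL string that calls a SageMaker endpoint for each row.

    Hypothetical helper: the function, argument, and column names mirror the
    ones used in this notebook, but the helper itself is only an illustration.
    """
    return (
        "USING FUNCTION predict_star_rating(review_body VARCHAR)\n"
        "    RETURNS VARCHAR TYPE\n"
        "    SAGEMAKER_INVOKE_ENDPOINT WITH (sagemaker_endpoint = '{}')\n"
        "SELECT review_id, predict_star_rating(review_body) AS predicted_star_rating\n"
        "FROM {}.{} LIMIT {}"
    ).format(endpoint_name, database, table, int(limit))


# Placeholder endpoint name; the real one is restored from %store later on.
query = build_inference_query("my-autopilot-endpoint", "dsoaws", "product_reviews")
print(query)
```

The first three lines of the statement declare the function and bind it to the endpoint; the `SELECT` that follows calls it like any other SQL function.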

<img src="img/athena_model.png" width="50%" align="left">

# Pre-Requisite

## *ML with Athena is in Preview and only works in the following regions, which support Preview functionality:*

## *us-east-1, us-west-2, ap-south-1, eu-west-1*


### Check if your current region supports AthenaML Preview

In [None]:
import boto3
import sagemaker
import pandas as pd

sess = sagemaker.Session()
bucket = sess.default_bucket()
role = sagemaker.get_execution_role()
region = boto3.Session().region_name

sm = boto3.Session().client(service_name="sagemaker", region_name=region)

In [None]:
if region in ["eu-west-1", "ap-south-1", "us-east-1", "us-west-2"]:
    print(" [OK] AthenaML IS SUPPORTED IN {}".format(region))
    print(" [OK] Please proceed with this notebook.")
else:
    print("+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
    print(" [ERROR] AthenaML IS *NOT* SUPPORTED IN {} !!".format(region))
    print(" [INFO] This is OK. SKIP this notebook and move ahead with the workshop.")
    print(" [INFO] This notebook is not required for the rest of this workshop.")
    print("+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
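The same list of regions is repeated in several cells of this notebook. One possible way to keep it in a single place is a small helper like the sketch below. The names `ATHENA_ML_PREVIEW_REGIONS` and `is_athena_ml_supported` are made up for this example and are not part of any AWS SDK.

```python
# Regions listed above as supporting the Athena ML preview.
# Hypothetical constant and helper, defined only for this notebook.
ATHENA_ML_PREVIEW_REGIONS = frozenset(
    {"us-east-1", "us-west-2", "ap-south-1", "eu-west-1"}
)


def is_athena_ml_supported(region_name):
    """Return True if the given region is in the preview list above."""
    return region_name in ATHENA_ML_PREVIEW_REGIONS


print(is_athena_ml_supported("us-east-1"))
print(is_athena_ml_supported("eu-central-1"))
```

Later cells could then test `is_athena_ml_supported(region)` instead of repeating the literal list.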

# Pre-Requisite

## _Please wait for the Autopilot model to deploy. Otherwise, this notebook won't work properly._

In [None]:
%store -r autopilot_endpoint_name

In [None]:
try:
    autopilot_endpoint_name
    print("[OK]")
except NameError:
    print("+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
    print("[ERROR] There is no Autopilot Model Endpoint deployed.")
    print("[INFO] This is OK. Just skip this notebook and move ahead with the next notebook.")
    print("[INFO] This notebook is not required for the rest of this workshop.")
    print("+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")

In [None]:
print(autopilot_endpoint_name)

In [None]:
try:
    resp = sm.describe_endpoint(EndpointName=autopilot_endpoint_name)
    status = resp["EndpointStatus"]
    if status == "InService":
        print("[OK] Your Autopilot Model Endpoint is in status: {}".format(status))
    elif status == "Creating":
        print("[INFO] Your Autopilot Model Endpoint is in status: {}".format(status))
        print("[INFO] Waiting for the endpoint to be InService. Please be patient. This might take a few minutes.")
        sm.get_waiter("endpoint_in_service").wait(EndpointName=autopilot_endpoint_name)
    else:
        print("+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
        print("[ERROR] Your Autopilot Model is in status: {}".format(status))
        print("[INFO] This is OK. Just skip this notebook and move ahead with the next notebook.")
        print("[INFO] This notebook is not required for the rest of this workshop.")
        print("+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
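except Exception:  # e.g. NameError (endpoint name not restored) or a botocore ClientError (endpoint not found)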
    print("+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
    print("[ERROR] There is no Autopilot Model Endpoint deployed.")
    print("[INFO] This is OK. Just skip this notebook and move ahead with the next notebook.")
    print("[INFO] This notebook is not required for the rest of this workshop.")
    print("+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")

## Import PyAthena

In [None]:
from pyathena import connect

# Create an Athena Table with Sample Reviews

## Check for Athena TSV Table

In [None]:
%store -r ingest_create_athena_table_tsv_passed

In [None]:
try:
    ingest_create_athena_table_tsv_passed
except NameError:
    print("++++++++++++++++++++++++++++++++++++++++++++++")
    print("[ERROR] YOU HAVE TO RUN ALL NOTEBOOKS IN THE `INGEST` SECTION.")
    print("++++++++++++++++++++++++++++++++++++++++++++++")

In [None]:
print(ingest_create_athena_table_tsv_passed)

In [None]:
if not ingest_create_athena_table_tsv_passed:
    print("++++++++++++++++++++++++++++++++++++++++++++++")
    print("[ERROR] YOU HAVE TO RUN ALL NOTEBOOKS IN THE `INGEST` SECTION.")
    print("++++++++++++++++++++++++++++++++++++++++++++++")
else:
    print("[OK]")

In [None]:
s3_staging_dir = "s3://{}/athena/staging".format(bucket)

In [None]:
tsv_prefix = "amazon-reviews-pds/tsv"
database_name = "dsoaws"
table_name_tsv = "amazon_reviews_tsv"
table_name = "product_reviews"

In [None]:
statement = """
CREATE TABLE IF NOT EXISTS {}.{} AS 
SELECT review_id, review_body 
FROM {}.{}
""".format(
    database_name, table_name, database_name, table_name_tsv
)

print(statement)

In [None]:
import pandas as pd

if region in ["eu-west-1", "ap-south-1", "us-east-1", "us-west-2"]:
    conn = connect(region_name=region, s3_staging_dir=s3_staging_dir)
    pd.read_sql(statement, conn)

    print("[OK]")
else:
    print("+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
    print(" [ERROR] AthenaML IS *NOT* SUPPORTED IN {} !!".format(region))
    print(" [INFO] This is OK. SKIP this notebook and move ahead with the workshop.")
    print(" [INFO] This notebook is not required for the rest of this workshop.")
    print("+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")

In [None]:
if region in ["eu-west-1", "ap-south-1", "us-east-1", "us-west-2"]:
    statement = "SELECT * FROM {}.{} LIMIT 10".format(database_name, table_name)
    conn = connect(region_name=region, s3_staging_dir=s3_staging_dir)
    df_table = pd.read_sql(statement, conn)
    print(df_table)

## Add the Required `AmazonAthenaPreviewFunctionality` Work Group to Use This Preview Feature

In [None]:
from botocore.exceptions import ClientError

client = boto3.client("athena")

if region in ["eu-west-1", "ap-south-1", "us-east-1", "us-west-2"]:
    try:
        response = client.create_work_group(Name="AmazonAthenaPreviewFunctionality")
        print(response)
    except ClientError as e:
        if e.response["Error"]["Code"] == "InvalidRequestException":
            print("[OK] Workgroup already exists.")
        else:
            print("[ERROR] {}".format(e))

# Create SQL Query

The `USING FUNCTION` clause specifies one or more ML with Athena (Preview) functions that a subsequent `SELECT` statement in the query can reference. You define the function name, the variable names, and the data types for the variables and return values.

In [None]:
statement = """
USING FUNCTION predict_star_rating(review_body VARCHAR) 
    RETURNS VARCHAR TYPE
    SAGEMAKER_INVOKE_ENDPOINT WITH (sagemaker_endpoint = '{}')
SELECT review_id, review_body, predict_star_rating(REPLACE(review_body, ',', ' ')) AS predicted_star_rating 
    FROM {}.{} LIMIT 10
    """.format(
    autopilot_endpoint_name, database_name, table_name
)

print(statement)
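The query calls `REPLACE(review_body, ',', ' ')` before passing the text to the function. A likely reason, stated here as an assumption rather than something this notebook confirms, is that the request payload sent to the endpoint is comma-separated, so a comma inside a review would look like a column boundary. The sketch below uses Python's standard `csv` module and an invented sample review to show the effect.

```python
import csv
import io

# Invented sample review containing a comma.
review = "Great product, works well"

# Parsed as one CSV row, the unquoted comma splits the review into two fields.
raw_fields = next(csv.reader(io.StringIO(review)))
print(len(raw_fields))

# Replacing commas with spaces (as the SQL REPLACE does) keeps it as one field.
cleaned = review.replace(",", " ")
cleaned_fields = next(csv.reader(io.StringIO(cleaned)))
print(len(cleaned_fields))
```

With the comma removed, the model receives the whole review as a single input column.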

# Query the Autopilot Endpoint using Data from the Athena Table

In [None]:
if region in ["eu-west-1", "ap-south-1", "us-east-1", "us-west-2"]:
    conn = connect(region_name=region, s3_staging_dir=s3_staging_dir, work_group="AmazonAthenaPreviewFunctionality")
    df = pd.read_sql(statement, conn)
    print(df)

# Delete Endpoint

In [None]:
sm = boto3.client("sagemaker")

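try:
    if autopilot_endpoint_name:
        # Delete the endpoint so it stops incurring charges.
        sm.delete_endpoint(EndpointName=autopilot_endpoint_name)
except NameError:
    print("[INFO] No Autopilot endpoint name was restored. Nothing to delete.")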

In [None]:
%%html

<p><b>Shutting down your kernel for this notebook to release resources.</b></p>
<button class="sm-command-button" data-commandlinker-command="kernelmenu:shutdown" style="display:none;">Shutdown Kernel</button>
        
<script>
try {
    els = document.getElementsByClassName("sm-command-button");
    els[0].click();
}
catch(err) {
    // NoOp
}    
</script>

In [None]:
%%javascript

try {
    Jupyter.notebook.save_checkpoint();
    Jupyter.notebook.session.delete();
}
catch(err) {
    // NoOp
}