# _Please wait for the Autopilot model to finish deploying. Otherwise, this notebook won't work properly._


# Invoke SageMaker Autopilot Model from Athena

Machine Learning (ML) with Amazon Athena (Preview) lets you write SQL statements in Athena that run ML inference using Amazon SageMaker. This feature simplifies access to ML models for data analysis: you run inference directly from a query instead of writing separate inference code.

To use ML with Athena (Preview), you define an ML with Athena (Preview) function with the `USING FUNCTION` clause. The function points to the Amazon SageMaker model endpoint that you want to use and specifies the variable names and data types to pass to the model. Subsequent clauses in the query reference the function to pass values to the model. The model runs inference based on the values that the query passes and then returns inference results.

### Install PyAthena

In [1]:
!pip install -q PyAthena==1.10.7

You are using pip version 10.0.1, however version 20.2b1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [2]:
from pyathena import connect
from pyathena.pandas_cursor import PandasCursor
from pyathena.util import as_pandas

In [3]:
import boto3
import sagemaker
import pandas as pd

# Get region 
session = boto3.session.Session()
region_name = session.region_name

# Get SageMaker session, execution role & default S3 bucket
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = sagemaker_session.default_bucket()

# Create an Athena Table with Sample Reviews

In [4]:
# Set S3 prefixes
tsv_prefix = 'amazon-reviews-pds/tsv'

# Set Athena parameters
database_name = 'dsoaws'
table_name_tsv = 'amazon_reviews_tsv'
table_name = 'product_reviews'

In [5]:
# Set S3 staging directory -- this is a temporary directory used for Athena queries
s3_staging_dir = 's3://{}/athena/staging'.format(bucket)

In [6]:
# Create Table SQL Statement
statement = """
CREATE TABLE IF NOT EXISTS {}.{} AS 
SELECT review_id, review_body 
FROM {}.{}
""".format(database_name, table_name, database_name, table_name_tsv)

print(statement)


CREATE TABLE IF NOT EXISTS dsoaws.product_reviews AS 
SELECT review_id, review_body 
FROM dsoaws.amazon_reviews_tsv



In [7]:
# Execute statement using connection cursor
cursor = connect(region_name=region_name, s3_staging_dir=s3_staging_dir).cursor()
cursor.execute(statement)

<pyathena.cursor.Cursor at 0x7fc579d1e080>

In [8]:
statement = 'SELECT * FROM {}.{} LIMIT 10'.format(database_name, table_name)
cursor.execute(statement)

<pyathena.cursor.Cursor at 0x7fc579d1e080>

In [9]:
df_show = as_pandas(cursor)
df_show

| | review_id | review_body |
|---|---|---|
| 0 | R19OFJV91M7D8X | I chose the deluxe version CD because of mortg... |
| 1 | R1I6G894K5AGG5 | Schedule C IS for business, so figures it wou... |
| 2 | R17OE43FFEP81I | I wish that companies can test several scenari... |
| 3 | R15MGDDK63B52Z | i just installed turbotax deluxe 2007. If you ... |
| 4 | R1GGJJA2R68033 | The description mentions that you can use this... |
| 5 | R24OSHCGREF78Q | Got this on sale and I don't regret it one bit... |
| 6 | RN0IGK9V02MAM | Fun game but super addicting |
| 7 | ROMNRCAPR8FKU | excellent service. the best |
| 8 | R2ZN124DJCEXZE | Worked great!!!!! Best way to buy a game, only... |
| 9 | R1UJSSOE66DA1J | Excellent |


# Retrieve Autopilot Endpoint Name

In [10]:
%store -r autopilot_endpoint_name

no stored variable autopilot_endpoint_name


In [11]:
print(autopilot_endpoint_name)

NameError: name 'autopilot_endpoint_name' is not defined
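
The two outputs above show what happens when the endpoint name was never stored, which is the situation the banner at the top of this notebook warns about. As a minimal sketch of a guard, the helper below polls an endpoint until it reports `InService`. The helper name `wait_for_endpoint`, its polling defaults, and the choice of which states to treat as terminal are assumptions for illustration, not part of the original notebook. It expects a client object exposing `describe_endpoint(EndpointName=...)`, as the boto3 SageMaker client does.

```python
import time

# States treated here as "will not become InService on its own" (an assumption
# made for this sketch).
TERMINAL_STATES = ("Failed", "OutOfService", "Deleting")


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30, max_polls=60):
    """Poll a SageMaker endpoint until it reports 'InService'.

    Hypothetical helper: `sm_client` is assumed to behave like
    boto3.client('sagemaker'). Raises RuntimeError on a terminal state and
    TimeoutError if the endpoint is still not ready after `max_polls` checks.
    """
    for attempt in range(max_polls):
        response = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status == "InService":
            return status
        if status in TERMINAL_STATES:
            raise RuntimeError(
                "Endpoint {} is in state {}".format(endpoint_name, status)
            )
        # Sleep between polls, but not after the final attempt.
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)
    raise TimeoutError(
        "Endpoint {} not InService after {} polls".format(endpoint_name, max_polls)
    )
```

In this notebook it could be called as `wait_for_endpoint(boto3.client('sagemaker'), autopilot_endpoint_name)` once the endpoint name is available. The boto3 SageMaker client also provides an `endpoint_in_service` waiter that serves a similar purpose.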

## Add the Required `AmazonAthenaPreviewFunctionality` Work Group to Use This Preview Feature

In [None]:
import boto3
from botocore.exceptions import ClientError

client = boto3.client('athena')

try:
    response = client.create_work_group(Name='AmazonAthenaPreviewFunctionality') 
    print(response)
except ClientError as e:
    if e.response['Error']['Code'] == 'InvalidRequestException':
        print("Workgroup already exists.")
    else:
        print("Unexpected error: %s" % e)
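
The cell above treats any `InvalidRequestException` as "workgroup already exists". A possible alternative, sketched here under the assumption that listing workgroups is permitted for the notebook's role, is to check for the workgroup before creating it. The helper name `ensure_work_group` is hypothetical. It relies only on the Athena client's `list_work_groups` and `create_work_group` calls and follows `NextToken` pagination.

```python
def ensure_work_group(athena_client, name):
    """Create the Athena workgroup `name` only if it is not already listed.

    Hypothetical helper: `athena_client` is assumed to behave like
    boto3.client('athena'). Returns True if the workgroup was created,
    False if it already existed.
    """
    kwargs = {}
    while True:
        page = athena_client.list_work_groups(**kwargs)
        if any(wg["Name"] == name for wg in page.get("WorkGroups", [])):
            return False
        token = page.get("NextToken")
        if not token:
            break
        # Request the next page of workgroups.
        kwargs = {"NextToken": token}
    athena_client.create_work_group(Name=name)
    return True
```

With the `client` defined in the cell above, `ensure_work_group(client, 'AmazonAthenaPreviewFunctionality')` would replace the `try`/`except` block.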
    


# Create SQL Query

The `USING FUNCTION` clause specifies one or more ML with Athena (Preview) functions that a subsequent `SELECT` statement in the query can reference. You define the function name, the variable names, and the data types for the variables and return values.

In [None]:
statement = """
USING FUNCTION predict_star_rating(review_body VARCHAR)
    RETURNS VARCHAR TYPE
    SAGEMAKER_INVOKE_ENDPOINT WITH (sagemaker_endpoint = '{}')
SELECT review_id, review_body, predict_star_rating(REPLACE(review_body, ',', ' ')) AS predicted_star_rating
    FROM {}.{} LIMIT 10
""".format(autopilot_endpoint_name, database_name, table_name)

print(statement)
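
Because the endpoint name is interpolated straight into the SQL text, a missing or malformed name only surfaces when Athena rejects the query. The sketch below wraps the same statement in a function that checks the name first. The function name `build_inference_query` and the `limit` parameter are hypothetical, and the check assumes endpoint names consist only of letters, digits, and hyphens.

```python
import re


def build_inference_query(endpoint_name, database_name, table_name, limit=10):
    """Build the USING FUNCTION query for a given SageMaker endpoint.

    Hypothetical helper. Rejects endpoint names that are not plain strings of
    letters, digits, and hyphens (an assumption made for this sketch) so that
    a missing or malformed name fails before the query is sent to Athena.
    """
    if not isinstance(endpoint_name, str) or not re.fullmatch(
        r"[A-Za-z0-9-]+", endpoint_name
    ):
        raise ValueError("Invalid endpoint name: {!r}".format(endpoint_name))
    return (
        "USING FUNCTION predict_star_rating(review_body VARCHAR)\n"
        "    RETURNS VARCHAR TYPE\n"
        "    SAGEMAKER_INVOKE_ENDPOINT WITH (sagemaker_endpoint = '{endpoint}')\n"
        "SELECT review_id, review_body,\n"
        "       predict_star_rating(REPLACE(review_body, ',', ' '))"
        " AS predicted_star_rating\n"
        "FROM {db}.{table}\n"
        "LIMIT {limit}"
    ).format(
        endpoint=endpoint_name, db=database_name, table=table_name, limit=int(limit)
    )
```

In this notebook it would be called as `build_inference_query(autopilot_endpoint_name, database_name, table_name)`.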

# Query the Autopilot Endpoint using Data from the Athena Table

In [None]:
# Execute statement using connection cursor
cursor = connect(region_name=region_name, 
                 s3_staging_dir=s3_staging_dir).cursor()
cursor.execute(statement, 
               work_group='AmazonAthenaPreviewFunctionality')

In [None]:
df = as_pandas(cursor)

In [None]:
df.head(10)