# SageMaker Jupyter Notebook Integration with Snowflake via Python

---

This notebook shows how to integrate SageMaker and Snowflake via the Snowflake Python connector.


## Contents

1. [Credentials](#Credentials)
1. [Database Connectivity](#Database-Connectivity)
1. [Data Import](#Data-Import)


## Credentials
Credentials can be hard-coded, but a much more secure approach is to store them in the [Systems Manager Parameter Store](https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-paramstore.html). The following step reads the values for the listed keys from the Parameter Store. The keys shown are only examples. You can reuse them, but you must create the key/value pairs in the Parameter Store before this step can read them.
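
One way to create the key/value pairs ahead of time is with the `put_parameter` call of the SSM client. The sketch below is only an illustration: the helper names `build_put_requests` and `store_parameters` are hypothetical, and treating only keys that end in `/PASSWORD` as `SecureString` is an assumption you may want to change.

```python
def build_put_requests(values, secret_suffixes=("/PASSWORD",)):
    """Build keyword arguments for ssm.put_parameter, one dict per key."""
    requests = []
    for name, value in values.items():
        # Assumption: only keys ending in one of secret_suffixes are secrets
        is_secret = name.endswith(tuple(secret_suffixes))
        requests.append({
            "Name": name,
            "Value": value,
            # SecureString values are stored encrypted; String values are not
            "Type": "SecureString" if is_secret else "String",
            "Overwrite": True,
        })
    return requests


def store_parameters(ssm_client, values):
    """Write each key/value pair through the client's put_parameter call."""
    for request in build_put_requests(values):
        ssm_client.put_parameter(**request)


# Usage (needs AWS credentials that are allowed to call ssm:PutParameter):
# import boto3
# ssm = boto3.client("ssm", region_name="us-east-1")
# store_parameters(ssm, {"/SNOWFLAKE/USER_ID": "my_user",
#                        "/SNOWFLAKE/PASSWORD": "my_password"})
```

Passing the client in as an argument keeps the helper independent of a particular region or session.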

In [3]:
import boto3

# Parameter Store keys to read (example names; create them before running)
params = ['/SNOWFLAKE/URL', '/SNOWFLAKE/ACCOUNT_ID',
          '/SNOWFLAKE/USER_ID', '/SNOWFLAKE/PASSWORD',
          '/SNOWFLAKE/DATABASE', '/SNOWFLAKE/SCHEMA',
          '/SNOWFLAKE/WAREHOUSE', '/SNOWFLAKE/BUCKET',
          '/SNOWFLAKE/PREFIX']
region = 'us-east-1'

def get_credentials(params):
    """Read the given keys from the Parameter Store and return a name -> value dict."""
    ssm = boto3.client('ssm', region_name=region)
    response = ssm.get_parameters(
        Names=params,
        WithDecryption=True  # decrypt SecureString values
    )
    # Build dict of credentials
    param_values = {p['Name']: p['Value'] for p in response['Parameters']}
    return param_values

param_values = get_credentials(params)
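
If a key has not been created yet, `get_parameters` does not raise an error. It lists the unknown names under `InvalidParameters` in the response, and the missing key only surfaces later as a `KeyError` when the connection is built. A minimal sketch of an earlier check follows; the helper name `to_param_dict` is hypothetical and not part of boto3.

```python
def to_param_dict(response, expected_names):
    """Convert a get_parameters response to a dict, failing on missing keys."""
    found = {p["Name"]: p["Value"] for p in response.get("Parameters", [])}
    # get_parameters reports names it could not find under 'InvalidParameters'
    missing = set(response.get("InvalidParameters", []))
    # Also catch any expected name that is absent from the response
    missing |= set(expected_names) - set(found)
    if missing:
        raise KeyError(f"Missing Parameter Store keys: {sorted(missing)}")
    return found


# Usage with the names from this notebook:
# response = ssm.get_parameters(Names=params, WithDecryption=True)
# param_values = to_param_dict(response, params)
```

Note that `get_parameters` accepts at most 10 names per call, so a longer key list would need to be split into several requests.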

## Database Connectivity
The following step establishes a connection to the Snowflake database. It uses the credentials that the previous step read from the Systems Manager Parameter Store.

In [4]:
import snowflake.connector
# Connecting to Snowflake using the default authenticator
ctx = snowflake.connector.connect(
  user=param_values['/SNOWFLAKE/USER_ID'],
  password=param_values['/SNOWFLAKE/PASSWORD'],
  account=param_values['/SNOWFLAKE/ACCOUNT_ID'],
  warehouse=param_values['/SNOWFLAKE/WAREHOUSE'],
  database=param_values['/SNOWFLAKE/DATABASE'],
  schema=param_values['/SNOWFLAKE/SCHEMA']
)
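
To confirm that the connection works before running larger queries, a trivial query can be sent first. The sketch below assumes `ctx` is the connection created above; the helper name `check_connection` is hypothetical.

```python
def check_connection(ctx):
    """Run a trivial query and return the Snowflake version string."""
    cs = ctx.cursor()
    try:
        cs.execute("select current_version()")
        row = cs.fetchone()
        return row[0]
    finally:
        # Release the cursor even if the query fails
        cs.close()


# Usage:
# print(check_connection(ctx))
```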

## Data Import
The following step reads weather data from the [Snowflake Sample Weather Data](https://docs.snowflake.net/manuals/user-guide/sample-data-openweathermap.html) database. Notice how easy it is to read and transform JSON data. The result set can be used directly to create a pandas data frame. For more detail, see the [JSON tutorial](https://docs.snowflake.net/manuals/user-guide/json-basics-tutorial.html) on the Snowflake documentation site.
Because we are using the Python connector, the whole result set is read into memory on the local notebook server, so we limit the result set to weather data for New York. Check out the Spark integration for a fully scalable solution.
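
The query in this step converts temperatures to Fahrenheit inside SQL with the expression `(t - 273.15) * 1.8 + 32`, which is the standard Kelvin-to-Fahrenheit formula. As an illustration only, the same arithmetic in Python looks like this (the function name `kelvin_to_fahrenheit` is hypothetical):

```python
def kelvin_to_fahrenheit(kelvin):
    """Convert Kelvin to Fahrenheit: Kelvin -> Celsius, then Celsius -> Fahrenheit."""
    celsius = kelvin - 273.15
    return celsius * 1.8 + 32.0


print(round(kelvin_to_fahrenheit(296.15), 3))  # 73.4
print(kelvin_to_fahrenheit(273.15))            # 32.0
```

Doing the conversion in the query keeps the transformation in Snowflake, so the notebook receives values that are ready to use.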

In [5]:
cs = ctx.cursor()
# Extract fields from the JSON column V and convert Kelvin to Fahrenheit
query = """
select (V:main.temp_max - 273.15) * 1.8000 + 32.00 as temp_max_far,
       (V:main.temp_min - 273.15) * 1.8000 + 32.00 as temp_min_far,
       cast(V:time as timestamp) time,
       V:city.coord.lat lat,
       V:city.coord.lon lon
from snowflake_sample_data.weather.weather_14_total
where V:city.name = 'New York'
and   V:city.country = 'US'
"""
allrows = cs.execute(query).fetchall()

In [6]:
import pandas as pd                               # For munging tabular data

data = pd.DataFrame(allrows)
data.columns=['temp_max_far','temp_min_far','time','lat','lon']
pd.set_option('display.max_columns', 500)     # Make sure we can see all of the columns
pd.set_option('display.max_rows', 10)         # Keep the output on one page
data

| | temp_max_far | temp_min_far | time | lat | lon |
|---|---|---|---|---|---|
| 0 | 73.400 | 69.800 | 2016-08-29 13:26:34 | 43.000351 | -75.499901 |
| 1 | 80.996 | 73.400 | 2016-08-29 13:27:13 | 40.714272 | -74.005966 |
| 2 | 66.992 | 55.400 | 2016-09-03 01:28:37 | 43.000351 | -75.499901 |
| 3 | 77.000 | 60.998 | 2016-09-03 01:29:14 | 40.714272 | -74.005966 |
| 4 | 68.000 | 64.004 | 2016-08-22 02:30:49 | 43.000351 | -75.499901 |
| ... | ... | ... | ... | ... | ... |
| 19937 | 33.800 | 26.600 | 2018-01-10 15:02:27 | 40.714272 | -74.005966 |
| 19938 | 41.000 | 33.800 | 2018-01-11 04:02:15 | 43.000351 | -75.499901 |
| 19939 | 41.000 | 32.000 | 2018-01-11 04:02:19 | 40.714272 | -74.005966 |
| 19940 | 41.000 | 35.600 | 2018-01-11 05:04:31 | 43.000351 | -75.499901 |
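
The import step above calls `fetchall()`, which loads the whole result set into notebook memory at once. For larger result sets, one option is to pull rows in batches with the cursor's `fetchmany` method. This is a minimal sketch, not a replacement for the Spark integration; the helper name `iter_batches` and the batch size are assumptions.

```python
def iter_batches(cursor, batch_size=1000):
    """Yield lists of rows from an executed cursor, batch_size rows at a time."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            # An empty result means the cursor is exhausted
            break
        yield rows


# Usage with an executed Snowflake cursor:
# cs.execute("select ...")
# for batch in iter_batches(cs, batch_size=5000):
#     process(batch)  # hypothetical per-batch processing function
```

Each batch can be processed and discarded before the next one is fetched, which keeps peak memory use roughly proportional to the batch size.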
