# AWS Glue Studio Notebook
##### You are now running an AWS Glue Studio notebook. To start using your notebook, you need to start an AWS Glue interactive session.


#### Optional: Run this cell to see available notebook commands ("magics").


In [None]:
%help

#### Run this cell to set up and start your interactive session.


In [1]:
%idle_timeout 2880
%glue_version 5.0
%worker_type G.1X
%number_of_workers 5

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

# Create (or reuse) the SparkContext and wrap it in a GlueContext for the Glue APIs
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
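The magics above configure the session that the kernel creates. As a rough comparison only (the kernel handles session creation itself, and this is not necessarily the exact request it sends), the same settings map onto the AWS Glue `CreateSession` API fields shown below. The helper name, session ID, and IAM role ARN are hypothetical placeholders.

```python
# Hypothetical helper: build a CreateSession request mirroring the magics above.
# The session Id and role ARN passed in below are placeholders, not notebook values.
def build_create_session_request(session_id: str, role_arn: str) -> dict:
    return {
        "Id": session_id,
        "Role": role_arn,
        # "glueetl" matches the "Session Type" the kernel reports
        "Command": {"Name": "glueetl", "PythonVersion": "3"},
        "GlueVersion": "5.0",    # %glue_version 5.0
        "WorkerType": "G.1X",    # %worker_type G.1X
        "NumberOfWorkers": 5,    # %number_of_workers 5
        "IdleTimeout": 2880,     # %idle_timeout 2880 (minutes)
        # One of the default arguments the kernel reports applying
        "DefaultArguments": {"--enable-glue-datacatalog": "true"},
    }


request = build_create_session_request(
    "my-accelerometer-session",
    "arn:aws:iam::123456789012:role/MyGlueRole",
)
print(request["IdleTimeout"] / 60)  # 48.0 (idle timeout in hours)
```

With real credentials and an existing role, the dictionary could be passed as `boto3.client("glue").create_session(**request)`. Note that the 2880-minute idle timeout keeps an unused session alive for up to 48 hours.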

Welcome to the Glue Interactive Sessions Kernel
For more information on available magic commands, please type %help in any new cell.

Please view our Getting Started page to access the most up-to-date information on the Interactive Sessions kernel: https://docs.aws.amazon.com/glue/latest/dg/interactive-sessions.html
Installed kernel version: 1.0.7 
Current idle_timeout is None minutes.
idle_timeout has been set to 2880 minutes.
Setting Glue version to: 5.0
Previous worker type: None
Setting new worker type to: G.1X
Previous number of workers: None
Setting new number of workers to: 5
Trying to create a Glue session for the kernel.
Session Type: glueetl
Worker Type: G.1X
Number of Workers: 5
Idle Timeout: 2880
Session ID: ebef392c-aa36-4d0c-af21-9f6d80cb0619
Applying the following default arguments:
--glue_kernel_version 1.0.7
--enable-glue-datacatalog true
Waiting for session ebef392c-aa36-4d0c-af21-9f6d80cb0619 to get into ready status...
Session ebef392c-aa36-4d0c-af21-9f6d80cb0619 ha

#### Example: Create a DynamicFrame from the accelerometer landing zone in Amazon S3 and display its schema


In [4]:
# Read the raw accelerometer JSON files from the S3 landing zone
dyf = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://accelerometer-1226/landing/"]},
    format="json",
)
dyf.printSchema()

root
|-- user: string
|-- timestamp: long
|-- x: double
|-- y: double
|-- z: double
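The schema above is inferred from the JSON values themselves: integer values come out as `long` and decimal values as `double`. The sketch below models that mapping in plain Python under a simplifying assumption (Glue's real inference handles many more cases); the sample records are invented and only mimic the shape of the landing-zone files. Fields whose type differs between records are marked `choice`, loosely echoing how a DynamicFrame represents conflicting types.

```python
import json

# Invented sample records in the shape of the landing-zone files
SAMPLE_LINES = [
    '{"user": "a@example.com", "timestamp": 1655564444103, "x": 1.0, "y": -1.0, "z": 0.0}',
    '{"user": "b@example.com", "timestamp": 1655564444113, "x": 0.5, "y": 0.25, "z": -1.5}',
]


def json_type(value) -> str:
    """Simplified mapping from a parsed JSON value to a Spark-style type name."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    return "unknown"


def infer_schema(lines):
    """Return {field: type}; a field seen with two different types becomes 'choice'."""
    schema = {}
    for line in lines:
        for field, value in json.loads(line).items():
            seen, new = schema.get(field), json_type(value)
            schema[field] = new if seen in (None, new) else "choice"
    return schema


for field, type_name in infer_schema(SAMPLE_LINES).items():
    print(f"|-- {field}: {type_name}")
```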


#### Example: Keep only accelerometer readings from trusted customers and convert the result back to a DynamicFrame


In [13]:
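from awsglue.dynamicframe import DynamicFrame

# Convert the landing-zone DynamicFrame to a Spark DataFrame
accelerometer_landing = dyf.toDF()

# Customer records from the trusted zone
customer_trusted = spark.read.json("s3://customer-1226/trusted/")
# customer_trusted.show()

# Distinct e-mail addresses of trusted customers
customer_emails = customer_trusted.select("email").distinct()

# Left-semi join: keep accelerometer rows whose user matches a trusted customer,
# without adding any customer columns to the result
accelerometer_trusted = accelerometer_landing.join(
    customer_emails,
    accelerometer_landing.user == customer_emails.email,
    "leftsemi",
)

# print(accelerometer_trusted.count())

# Convert back to a DynamicFrame for the Glue writer
accelerometer_trusted_dyf = DynamicFrame.fromDF(
    accelerometer_trusted,
    glueContext,
    "accelerometer_trusted_dyf",
)

accelerometer_trusted_dyf.printSchema()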

root
|-- user: string
|-- timestamp: long
|-- x: double
|-- y: double
|-- z: double
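The `"leftsemi"` join type explains why the output schema is unchanged: a left-semi join returns only left-side rows that have a match, never adds right-side columns, and never duplicates a left row when the right side has repeated keys. The pure-Python model below illustrates those semantics on invented rows (the `name` field is hypothetical); Spark evaluates the real join as a distributed plan.

```python
def left_semi_join(left_rows, right_rows, left_key, right_key):
    """Keep each left row whose key appears on the right; right columns are never added."""
    right_keys = {row[right_key] for row in right_rows}
    return [row for row in left_rows if row[left_key] in right_keys]


# Invented accelerometer readings
readings = [
    {"user": "a@example.com", "timestamp": 1, "x": 0.1, "y": 0.2, "z": 0.3},
    {"user": "b@example.com", "timestamp": 2, "x": 0.4, "y": 0.5, "z": 0.6},
    {"user": "a@example.com", "timestamp": 3, "x": 0.7, "y": 0.8, "z": 0.9},
]
# Invented customer records; the duplicate e-mail does not duplicate readings
customers = [
    {"email": "a@example.com", "name": "A"},
    {"email": "a@example.com", "name": "A (second record)"},
]

trusted = left_semi_join(readings, customers, "user", "email")
print([r["timestamp"] for r in trusted])  # [1, 3]
```

An inner join on the same data would return each reading from `a@example.com` twice. Because a semi-join already ignores repeated keys, the `.distinct()` on the e-mail column is not needed for correctness, though it may reduce the data shuffled for the join.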


#### Example: Write the trusted accelerometer data to Amazon S3 as JSON


In [17]:
# Write the filtered records to the S3 trusted zone as JSON
glueContext.write_dynamic_frame.from_options(
    frame=accelerometer_trusted_dyf,
    connection_type="s3",
    connection_options={"path": "s3://accelerometer-1226/trusted"},
    format="json",
)

<awsglue.dynamicframe.DynamicFrame object at 0x7fb7b3165950>
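The JSON output is expected to be newline-delimited (one JSON object per line per part file), which is the layout the JSON SerDe used by the catalog table reads. The local sketch below illustrates that format with a write/read round trip; the rows and the `part-00000.json` file name are invented, and no S3 access is involved.

```python
import json
import os
import tempfile

# Invented rows standing in for accelerometer_trusted_dyf
rows = [
    {"user": "a@example.com", "timestamp": 1, "x": 0.1, "y": 0.2, "z": 0.3},
    {"user": "a@example.com", "timestamp": 3, "x": 0.7, "y": 0.8, "z": 0.9},
]


def write_json_lines(path, records):
    """Write one compact JSON object per line (newline-delimited JSON)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_json_lines(path):
    """Parse each non-empty line as an independent JSON object."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    part = os.path.join(tmp, "part-00000.json")  # made-up local file name
    write_json_lines(part, rows)
    round_tripped = read_json_lines(part)

print(round_tripped == rows)  # True
```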


#### Example: Create a table for the trusted data in the AWS Glue Data Catalog


In [21]:
%%sql

CREATE TABLE IF NOT EXISTS accelerometer_trusted(
user string,
timestamp long,
x double,
y double,
z double
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION "s3://accelerometer-1226/trusted/"
TBLPROPERTIES ('encryption' = 'false')


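The column list in the `CREATE TABLE` statement has to match the schema printed by `printSchema()`. One way to keep them in sync is to generate the DDL from a field-to-type mapping. The helper below is a hypothetical sketch: it writes Spark's `long` as the Hive type name `bigint` and backtick-quotes column names such as `timestamp` and `user`, which can collide with SQL keywords.

```python
# Hypothetical mapping from Spark-style type names to Hive DDL type names
HIVE_TYPES = {"string": "string", "long": "bigint", "double": "double", "boolean": "boolean"}


def render_create_table(table, schema, location, serde="org.openx.data.jsonserde.JsonSerDe"):
    """Render a CREATE TABLE statement for newline-delimited JSON stored at `location`."""
    columns = ",\n".join(f"  `{name}` {HIVE_TYPES[t]}" for name, t in schema.items())
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n{columns}\n)\n"
        f"ROW FORMAT SERDE '{serde}'\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES ('encryption' = 'false')"
    )


schema = {"user": "string", "timestamp": "long", "x": "double", "y": "double", "z": "double"}
ddl = render_create_table("accelerometer_trusted", schema, "s3://accelerometer-1226/trusted/")
print(ddl)
```

Inside the session, the generated string could be run with `spark.sql(ddl)` instead of the `%%sql` magic.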
