
# Glue Studio Notebook
You are now running a **Glue Studio** notebook. Before you can use it, you *must* start an interactive session.

## Available Magics
|          Magic              |   Type       |                                                                        Description                                                                        |
|-----------------------------|--------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------|
| %%configure                 |  Dictionary  |  A JSON-formatted dictionary consisting of all configuration parameters for a session. Each parameter can be specified here or through individual magics. |
| %profile                    |  String      |  Specify a profile in your AWS configuration to use as the credentials provider.                                                                          |
| %iam_role                   |  String      |  Specify an IAM role to execute your session with.                                                                                                        |
| %region                     |  String      |  Specify the AWS region in which to initialize a session.                                                                                                 |
| %session_id                 |  String      |  Returns the session ID for the running session.                                                                                                          |
| %connections                |  List        |  Specify a comma-separated list of connections to use in the session.                                                                                     |
| %additional_python_modules  |  List        |  Comma-separated list of pip packages, S3 paths or private pip arguments.                                                                                 |
| %extra_py_files             |  List        |  Comma-separated list of additional Python files from S3.                                                                                                 |
| %extra_jars                 |  List        |  Comma-separated list of additional JARs to include in the cluster.                                                                                       |
| %number_of_workers          |  Integer     |  The number of workers of a defined worker_type that are allocated when a job runs. worker_type must be set too.                                          |
| %worker_type                |  String      |  Standard, G.1X, *or* G.2X. number_of_workers must be set too. Default is G.1X.                                                                           |
| %glue_version               |  String      |  The version of Glue to be used by this session. Currently, the only valid options are 2.0 and 3.0 (e.g. %glue_version 2.0).                              |
| %security_config            |  String      |  Define a security configuration to be used with this session.                                                                                            |
| %%sql                       |  String      |  Run SQL code. All lines after the initial %%sql magic will be passed as part of the SQL code.                                                            |
| %streaming                  |  String      |  Changes the session type to Glue Streaming.                                                                                                              |
| %etl                        |  String      |  Changes the session type to Glue ETL.                                                                                                                    |
| %status                     |              |  Returns the status of the current Glue session including its duration, configuration and executing user / role.                                          |
| %stop_session               |              |  Stops the current session.                                                                                                                               |
| %list_sessions              |              |  Lists all currently running sessions by name and ID.                                                                                                     |
| %spark_conf                 |  String      |  Specify custom spark configurations for your session. E.g. %spark_conf spark.serializer=org.apache.spark.serializer.KryoSerializer                       |
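
As the `%%configure` row says, several settings can be supplied at once as a JSON dictionary instead of one magic per line. The sketch below only builds the text of such a cell in plain Python. The key names are an assumption (taken to mirror the magic names in the table above) and the values are hypothetical, so check them against the Glue interactive sessions documentation before relying on them.

```python
import json

# Hypothetical values. The key names are ASSUMED to mirror the magic names
# listed in the table above (e.g. %worker_type -> "worker_type").
session_config = {
    "region": "us-east-1",
    "glue_version": "3.0",
    "worker_type": "G.1X",
    "number_of_workers": 2,
}

# Text of a notebook cell that would be run before the session starts.
cell_text = "%%configure\n" + json.dumps(session_config, indent=2)
print(cell_text)
```

The printed text would be pasted into its own cell. Setting `worker_type` and `number_of_workers` together follows the note in the table that each requires the other.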

In [None]:
%number_of_workers 2

In [44]:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
  
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)




## Create DynamicFrames from JSON Data in AWS S3

In [46]:
# Load the orders through the Glue Data Catalog (database 'customer_raw').
dyf_customer = glueContext.create_dynamic_frame_from_catalog('customer_raw', 'customer_orders_with_address_json')

# Load JSON files directly from an S3 path, without going through the catalog.
s3_path = "s3://your_s3_bucket/raw/customer/customer_orders_with_address_json/"
dyf_customer_from_s3 = glueContext.create_dynamic_frame_from_options(connection_type='s3',
                                                                     connection_options={"paths": [s3_path]},
                                                                     format='json')

dyf_customer.toDF().show(5)
dyf_customer_from_s3.toDF().show(5)

+--------+--------------------+------------+----------+--------------------+
|order_id|         customer_id|total_amount|order_date|             address|
+--------+--------------------+------------+----------+--------------------+
|       1|ef48a4d0-b794-4e0...|    $1306.10|2022-08-22|[[Nedašov, 2 Onei...|
|       2|f4711520-c6f2-47f...|    $1388.30|2022-02-03|[[Hiseti, 6959 Fa...|
|       3|4d4a7f9e-15eb-431...|    $1610.24|2022-07-30|[[Giraldo, 47332 ...|
|       4|5e4fabb7-a877-4bd...|     $171.71|2021-10-27|                  []|
|       5|1dca0bc8-0051-4be...|    $1415.38|2021-12-08|[[Inya, 9054 Red ...|
+--------+--------------------+------------+----------+--------------------+
only showing top 5 rows

+--------+--------------------+------------+----------+--------------------+
|order_id|         customer_id|total_amount|order_date|             address|
+--------+--------------------+------------+----------+--------------------+
|       1|ef48a4d0-b794-4e0...|    $1306.10|2022-08

## Unnest JSON in a DynamicFrame

In [48]:
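# relationalize() flattens nested fields and returns a DynamicFrameCollection.
# The second argument is an S3 staging path used for intermediate output.
unested = dyf_customer.relationalize('root', 's3://adriano-datalake-us-east-1/raw/customer/customer_orders_with_address_temp/')
unested.keys()  # names of the frames in the collection
unested.select("root_address").toDF().show()  # one row per element of the address array
unested.select("root").toDF().show()  # top-level order columns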

+---+-----+------------------+--------------------------+
| id|index|  address.val.city|address.val.street_address|
+---+-----+------------------+--------------------------+
|  1|    0|           Nedašov|           2 Oneill Center|
|  1|    1|Klášterec nad Ohří|      13 Holy Cross Center|
|  1|    2|            Arnhem|        39845 Thierer Hill|
|  1|    3|           Windsor|      5 Ronald Regan Place|
|  2|    0|            Hiseti|         6959 Farmco Court|
|  2|    1|              Bahe|         4069 Surrey Alley|
|  2|    2|          Ayang-ni|       8290 Carpenter Park|
|  2|    3|         Kiukainen|         9398 Lerdahl Lane|
|  3|    0|           Giraldo|         47332 Fordem Road|
|  4| null|                  |                          |
|  5|    0|              Inya|      9054 Red Cloud Ju...|
|  5|    1|         Vera Cruz|        3742 Mcguire Trail|
|  5|    2|          Honolulu|              1 Hudson Way|
|  5|    3|      Sumpur Kudus|      316 Pepper Wood S...|
|  5|    4|   
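
To make the shape of the `root_address` output easier to follow, here is a minimal pure-Python sketch of the idea behind unnesting an array column: the array is moved to a child table, and each child row carries an `id` pointing back to its parent plus its `index` within the array. This is not Glue's implementation. `relationalize_sketch` is a hypothetical helper, the records are a trimmed sample based on the rows shown above, and the way ids are assigned here (a simple counter) is an assumption.

```python
def relationalize_sketch(records, array_field, root_name="root"):
    """Split records with one nested array field into a root and a child table."""
    root_rows, child_rows = [], []
    for join_id, record in enumerate(records, start=1):
        # Keep the scalar columns; replace the array with a key into the child table.
        root_row = {k: v for k, v in record.items() if k != array_field}
        root_row[array_field] = join_id
        root_rows.append(root_row)
        # Emit one child row per array element, tagged with the key and position.
        for index, element in enumerate(record[array_field]):
            child_row = {"id": join_id, "index": index}
            for key, value in element.items():
                child_row[f"{array_field}.val.{key}"] = value
            child_rows.append(child_row)
    return {root_name: root_rows, f"{root_name}_{array_field}": child_rows}


# Trimmed sample based on the first rows of the output above.
records = [
    {"order_id": 1, "address": [
        {"city": "Nedašov", "street_address": "2 Oneill Center"},
        {"city": "Klášterec nad Ohří", "street_address": "13 Holy Cross Center"},
    ]},
    {"order_id": 2, "address": [
        {"city": "Hiseti", "street_address": "6959 Farmco Court"},
    ]},
]

tables = relationalize_sketch(records, "address")
for row in tables["root_address"]:
    print(row)
```

One difference to keep in mind: this sketch emits no child row for an empty array, whereas the Glue output above shows a row with a null `index` for order 4.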

## Join DynamicFrames from the DynamicFrameCollection

In [None]:
dyf_root_address = unested.select("root_address")
dyf_root = unested.select("root")

In [None]:
# Match root.address (the key left behind by relationalize) to root_address.id.
dyf_joined = dyf_root.join(paths1=['address'], paths2=['id'], frame2=dyf_root_address)

In [49]:
dyf_joined.toDF().show()

+--------------------------+-------+------------+--------------------+--------+-----+----------------+----------+---+
|address.val.street_address|address|total_amount|         customer_id|order_id|index|address.val.city|order_date| id|
+--------------------------+-------+------------+--------------------+--------+-----+----------------+----------+---+
|           8 Rowland Alley|     88|    $1664.09|6da36978-2b4c-41a...|      88|    0|         Setúbal|2022-03-15| 88|
|      40 High Crossing ...|     88|    $1664.09|6da36978-2b4c-41a...|      88|    1|       Iwierzyce|2022-03-15| 88|
|            67 Darwin Hill|    253|     $453.42|0f52a81a-0781-4f9...|     253|    0|          Crasto|2022-04-02|253|
|            8 Fuller Drive|    253|     $453.42|0f52a81a-0781-4f9...|     253|    1|           Yunlu|2022-04-02|253|
|            52 Dayton Lane|    253|     $453.42|0f52a81a-0781-4f9...|     253|    2|    Az̧ Z̧alī‘ah|2022-04-02|253|
|          872 Straubel Way|    275|    $1548.06|df52192

In [51]:
dyf_customer.count()

1000


In [52]:
dyf_joined.count()

2666
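
The jump from 1000 rows to 2666 is what you would expect if the join emits one row per (order, address) pair, so an order with several addresses appears several times. The following plain-Python sketch illustrates that fan-out under the assumption of inner-join semantics. `inner_join` is a hypothetical helper, the rows are a trimmed sample based on the joined output above, and order 999 is invented to show an unmatched row.

```python
from collections import defaultdict


def inner_join(left_rows, right_rows, left_key, right_key):
    """Inner-join two lists of dicts, emitting one row per matching pair."""
    right_by_key = defaultdict(list)
    for row in right_rows:
        right_by_key[row[right_key]].append(row)
    joined = []
    for left in left_rows:
        # .get() leaves unmatched left rows out of the result entirely.
        for right in right_by_key.get(left[left_key], []):
            joined.append({**left, **right})
    return joined


root = [
    {"order_id": 88, "address": 88},
    {"order_id": 253, "address": 253},
    {"order_id": 999, "address": 999},  # hypothetical order with no child rows
]
root_address = [
    {"id": 88, "index": 0, "address.val.city": "Setúbal"},
    {"id": 88, "index": 1, "address.val.city": "Iwierzyce"},
    {"id": 253, "index": 0, "address.val.city": "Crasto"},
    {"id": 253, "index": 1, "address.val.city": "Yunlu"},
]

joined = inner_join(root, root_address, "address", "id")
print(len(root), "orders ->", len(joined), "joined rows")
```

Here three orders become four joined rows: orders 88 and 253 are each repeated once per address, and the unmatched order drops out. To get back to one row per order, the joined data would need to be aggregated or deduplicated on `order_id`.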
