
# Glue Studio Notebook
You are now running a **Glue Studio** notebook; before you can start using your notebook you *must* start an interactive session.

## Available Magics
|          Magic              |   Type       |                                                                        Description                                                                        |
|-----------------------------|--------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------|
| %%configure                 |  Dictionary  |  A JSON-formatted dictionary consisting of all configuration parameters for a session. Each parameter can be specified here or through individual magics. |
| %profile                    |  String      |  Specify a profile in your AWS configuration to use as the credentials provider.                                                                          |
| %iam_role                   |  String      |  Specify an IAM role to execute your session with.                                                                                                        |
| %region                     |  String      |  Specify the AWS region in which to initialize a session.                                                                                                 |
| %session_id                 |  String      |  Returns the session ID for the running session.                                                                                                          |
| %connections                |  List        |  Specify a comma-separated list of connections to use in the session.                                                                                     |
| %additional_python_modules  |  List        |  Comma-separated list of pip packages, S3 paths or private pip arguments.                                                                                 |
| %extra_py_files             |  List        |  Comma-separated list of additional Python files from S3.                                                                                                 |
| %extra_jars                 |  List        |  Comma-separated list of additional JARs to include in the cluster.                                                                                       |
| %number_of_workers          |  Integer     |  The number of workers of a defined worker_type that are allocated when a job runs. worker_type must be set too.                                          |
| %worker_type                |  String      |  Standard, G.1X, or G.2X. number_of_workers must be set too. The default is G.1X.                                                                         |
| %glue_version               |  String      |  The version of Glue to be used by this session. Currently, the only valid options are 2.0 and 3.0 (e.g. %glue_version 2.0).                              |
| %security_config            |  String      |  Define a security configuration to be used with this session.                                                                                            |
| %%sql                       |  String      |  Run SQL code. All lines after the initial %%sql magic will be passed as part of the SQL code.                                                            |
| %streaming                  |  String      |  Changes the session type to Glue Streaming.                                                                                                              |
| %etl                        |  String      |  Changes the session type to Glue ETL.                                                                                                                    |
| %status                     |              |  Returns the status of the current Glue session including its duration, configuration and executing user / role.                                          |
| %stop_session               |              |  Stops the current session.                                                                                                                               |
| %list_sessions              |              |  Lists all currently running sessions by name and ID.                                                                                                     |
| %spark_conf                 |  String      |  Specify custom spark configurations for your session. E.g. %spark_conf spark.serializer=org.apache.spark.serializer.KryoSerializer                       |

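The table notes that `%%configure` takes one JSON dictionary in place of several individual magics. As a rough sketch, the snippet below builds the text of such a cell in plain Python. The key names (`glue_version`, `worker_type`, `number_of_workers`) simply mirror the magic names above and are an assumption, not a verified list of accepted keys; `render_configure_cell` is a hypothetical helper. The printed text would be pasted into its own notebook cell before a session starts.

```python
import json

# Hypothetical session settings. The key names mirror the magic names in the
# table above; that they are accepted verbatim is an assumption.
session_config = {
    "glue_version": "3.0",
    "worker_type": "G.1X",      # the table says this must accompany number_of_workers
    "number_of_workers": 2,
}


def render_configure_cell(config):
    """Return the text of a %%configure cell holding config as JSON."""
    return "%%configure\n" + json.dumps(config, indent=4)


cell_text = render_configure_cell(session_config)
print(cell_text)
```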
In [2]:
%number_of_workers 2

You are already connected to session fb60ca5b-b5d3-4e82-872c-6c55ddb8f804. Your change will not reflect in the current session, but it will affect future new sessions. 

Previous number of workers: 5
Setting new number of workers to: 2


In [1]:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

# Reuse the running SparkContext and wrap it in a GlueContext.
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)




In [33]:
s3_path = 's3://s3-bucket/prefix/'
# Read the CSV files under s3_path into a DynamicFrame, treating the first line as the header.
dyf_customer = glueContext.create_dynamic_frame_from_options(
    connection_type='s3',
    connection_options={"paths": [s3_path]},
    format='csv',
    format_options={"withHeader": True, "optimizePerformance": True})




In [34]:
dyf_customer.toDF().show(5)

+--------+--------------------+------------+----------+
|order_id|         customer_id|total_amount|order_date|
+--------+--------------------+------------+----------+
|    1000|43fb5b29-3b6d-4a3...|      642.66|2022-08-24|
|     999|d510e386-4d2b-416...|     1324.86|2022-08-27|
|     998|a743267d-544d-464...|       76.37|2022-08-24|
|     997|e2ac237c-fae2-49e...|      981.05|2022-08-26|
|     996|ace8c4f9-5f10-497...|      750.84|2022-08-27|
+--------+--------------------+------------+----------+
only showing top 5 rows

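The `"withHeader": True` option in the read above tells the CSV reader to take column names from the first line of each file. As a small local illustration that does not use Glue, the sketch below parses two of the previewed rows with Python's standard `csv` module. The sample text is copied from the preview, with `customer_id` left truncated as displayed. It also shows that values parsed this way arrive as strings, which matches the `"string"` source types declared in the mapping later in this notebook.

```python
import csv
import io

# Two rows copied from the preview above; customer_id stays truncated
# exactly as the notebook displays it.
sample_csv = (
    "order_id,customer_id,total_amount,order_date\n"
    "1000,43fb5b29-3b6d-4a3...,642.66,2022-08-24\n"
    "999,d510e386-4d2b-416...,1324.86,2022-08-27\n"
)

# DictReader consumes the first line as the header and uses it for field names.
reader = csv.DictReader(io.StringIO(sample_csv))
rows = list(reader)

print(reader.fieldnames)
print(rows[0])
# Every parsed value is a str until it is cast explicitly.
print(type(rows[0]["total_amount"]).__name__)
```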

## Rename Columns

In [36]:
dyf_rename_column = dyf_customer.rename_field(
    oldName='total_amount',
    newName='purchase_total',
    transformation_ctx='rename_one_column')
dyf_rename_column.toDF().show(5)

+--------+--------------------+----------+--------------+
|order_id|         customer_id|order_date|purchase_total|
+--------+--------------------+----------+--------------+
|    1000|43fb5b29-3b6d-4a3...|2022-08-24|        642.66|
|     999|d510e386-4d2b-416...|2022-08-27|       1324.86|
|     998|a743267d-544d-464...|2022-08-24|         76.37|
|     997|e2ac237c-fae2-49e...|2022-08-26|        981.05|
|     996|ace8c4f9-5f10-497...|2022-08-27|        750.84|
+--------+--------------------+----------+--------------+
only showing top 5 rows

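In the output above, the renamed column `purchase_total` appears last rather than in the position `total_amount` held. A plain-dictionary model of "remove the old field, then add it back under the new name" reproduces that ordering. The sketch below is only an illustration on one record, not Glue's implementation, and `rename_key` is a hypothetical helper.

```python
def rename_key(record, old_name, new_name):
    """Return a copy of record with old_name renamed to new_name.

    The value is removed and re-added, so the renamed key ends up last.
    """
    renamed = dict(record)  # copy so the input record is left untouched
    renamed[new_name] = renamed.pop(old_name)
    return renamed


# First row of the preview, with customer_id truncated as displayed.
row = {
    "order_id": "1000",
    "customer_id": "43fb5b29-3b6d-4a3...",
    "total_amount": "642.66",
    "order_date": "2022-08-24",
}

renamed_row = rename_key(row, "total_amount", "purchase_total")
print(list(renamed_row))
# ['order_id', 'customer_id', 'order_date', 'purchase_total']
```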

## Rename Multiple Columns 

In [38]:
# Each tuple is (source column, source type, target column, target type).
mapping = [("order_id", "string", "id", "int"),
           ("customer_id", "string", "customer_id", "string"),
           ("total_amount", "string", "total_purchase_amount", "double"),
           ("order_date", "string", "order_date", "string")]
dyf_rename_m_columns = dyf_customer.apply_mapping(mapping, transformation_ctx='rename_multiple_columns')
dyf_rename_m_columns.toDF().show(5)

+----+--------------------+---------------------+----------+
|  id|         customer_id|total_purchase_amount|order_date|
+----+--------------------+---------------------+----------+
|1000|43fb5b29-3b6d-4a3...|               642.66|2022-08-24|
| 999|d510e386-4d2b-416...|              1324.86|2022-08-27|
| 998|a743267d-544d-464...|                76.37|2022-08-24|
| 997|e2ac237c-fae2-49e...|               981.05|2022-08-26|
| 996|ace8c4f9-5f10-497...|               750.84|2022-08-27|
+----+--------------------+---------------------+----------+
only showing top 5 rows

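Each mapping tuple above reads as (source column, source type, target column, target type), so a single call both renames and casts. The sketch below mimics that on plain dictionaries to make the tuple semantics concrete. It is a simplified model rather than Glue's implementation: `apply_mapping_sketch` and the `CASTS` table are hypothetical, and only the three target types used in this notebook are handled.

```python
# Casts for the target types that appear in this notebook's mapping.
CASTS = {"int": int, "double": float, "string": str}


def apply_mapping_sketch(records, mapping):
    """Rename and cast fields of each record according to 4-tuples of
    (source name, source type, target name, target type)."""
    result = []
    for record in records:
        mapped = {}
        for source, _source_type, target, target_type in mapping:
            mapped[target] = CASTS[target_type](record[source])
        result.append(mapped)
    return result


mapping = [("order_id", "string", "id", "int"),
           ("customer_id", "string", "customer_id", "string"),
           ("total_amount", "string", "total_purchase_amount", "double"),
           ("order_date", "string", "order_date", "string")]

# First row of the preview, with customer_id truncated as displayed.
records = [{"order_id": "1000", "customer_id": "43fb5b29-3b6d-4a3...",
            "total_amount": "642.66", "order_date": "2022-08-24"}]

mapped_records = apply_mapping_sketch(records, mapping)
print(mapped_records[0])
```

In this sketch the output contains only the fields named in the mapping, in mapping order, so reordering the tuples reorders the columns.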