# AWS Glue Studio Notebook


#### Run this cell to set up and start your interactive session.


In [1]:
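```python
%idle_timeout 2880
%glue_version 5.0
%worker_type G.1X
%number_of_workers 5

# The magics above configure the session; idle_timeout is in minutes (2880 = 48 hours).
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

# Reuse the session's SparkContext and wrap it in a GlueContext
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session  # SparkSession for the DataFrame APIs
job = Job(glueContext)
```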

```text
Welcome to the Glue Interactive Sessions Kernel
For more information on available magic commands, please type %help in any new cell.

Please view our Getting Started page to access the most up-to-date information on the Interactive Sessions kernel: https://docs.aws.amazon.com/glue/latest/dg/interactive-sessions.html
Installed kernel version: 1.0.10
Current idle_timeout is None minutes.
idle_timeout has been set to 2880 minutes.
Setting Glue version to: 5.0
Previous worker type: None
Setting new worker type to: G.1X
Previous number of workers: None
Setting new number of workers to: 5
Trying to create a Glue session for the kernel.
Session Type: glueetl
Worker Type: G.1X
Number of Workers: 5
Idle Timeout: 2880
Session ID: c607b84c-4408-44f1-80ab-d63eb5ee643c
Applying the following default arguments:
--glue_kernel_version 1.0.10
--enable-glue-datacatalog true
Waiting for session c607b84c-4408-44f1-80ab-d63eb5ee643c to get into ready status...
Session c607b84c-4408-44f1-80ab-d63eb5ee643c
```

#### Example: Create a DynamicFrame from a table in the AWS Glue Data Catalog and display its schema


In [2]:
```python
# Read the crawled table from the Data Catalog as a DynamicFrame
dyf = glueContext.create_dynamic_frame.from_catalog(database='database_tutorial', table_name='incremental')
dyf.printSchema()
```

```text
root
|-- orderid: string
|-- customer: string
|-- item: string
|-- quantity: string
|-- price: string
|-- orderdate: string
|-- col6: string
```
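The catalog types every column as `string` and includes an extra `col6`, so a typical next step is to cast the columns with Glue's `ApplyMapping` transform, which takes a list of `(source_name, source_type, target_name, target_type)` tuples. The sketch below builds that list from a plain dict; the helper `build_mappings` and the target type chosen for each column are assumptions for illustration, not something this notebook specifies.

```python
from typing import Dict, List, Tuple

# Hypothetical target types for this dataset (assumed, not read from the catalog).
TARGET_TYPES: Dict[str, str] = {
    "orderid": "string",
    "customer": "string",
    "item": "string",
    "quantity": "int",
    "price": "double",
    "orderdate": "timestamp",
}


def build_mappings(target_types: Dict[str, str], source_type: str = "string") -> List[Tuple[str, str, str, str]]:
    """Hypothetical helper: map each column from its source type to the requested
    target type, keeping the column name unchanged."""
    return [(name, source_type, name, target) for name, target in target_types.items()]


mappings = build_mappings(TARGET_TYPES)

# In the Glue session (ApplyMapping comes from `from awsglue.transforms import *`):
#   typed_dyf = ApplyMapping.apply(frame=dyf, mappings=mappings)
```

`col6` is deliberately left out of the list; `ApplyMapping` omits unspecified fields from its output. Values that cannot be cast to the target type may come through as nulls, so it is worth checking the result with `typed_dyf.toDF().show()`.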


#### Example: Convert the DynamicFrame to a Spark DataFrame and display a sample of the data


In [3]:
```python
# Convert to a Spark DataFrame and print the first 20 rows
df = dyf.toDF()
df.show()
```

```text
+-------+----------+----------+--------+-------+-------------------+----+
|orderid|  customer|      item|quantity|  price|          orderdate|col6|
+-------+----------+----------+--------+-------+-------------------+----+
|OrderID|  Customer|      Item|Quantity|  Price|          OrderDate|NULL|
|  O1001|Customer_4|   Monitor|       3| 1901.7|2025-08-01 11:11:00|NULL|
|  O1002|Customer_2|  Keyboard|       1|1314.52|2025-08-01 01:10:00|NULL|
|  O1003|Customer_2|   Monitor|       1| 498.44|2025-08-01 02:27:00|NULL|
|  O1004|Customer_1|     Mouse|       2| 691.07|2025-08-01 00:59:00|NULL|
|  O1005|Customer_5|    Laptop|       5|1491.08|2025-08-01 02:51:00|NULL|
|  O1006|Customer_2|Headphones|       1|1623.36|2025-08-01 10:47:00|NULL|
|  O1007|Customer_1|  Keyboard|       1|1465.93|2025-08-01 06:15:00|NULL|
|  O1008|Customer_1|Headphones|       1| 994.24|2025-08-01 17:28:00|NULL|
|  O1009|Customer_3|  Keyboard|       3|  99.16|2025-08-01 18:48:00|NULL|
|  O1010|Customer_4|Headphones|       
```
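The first data row repeats the column names (`OrderID`, `Customer`, ...), which suggests the CSV's header line was loaded as data, and `col6` is `NULL` in every row shown. One way to drop that row on the DynamicFrame side is Glue's `Filter` transform with a predicate. The sketch below assumes records can be indexed by field name, as `Filter` predicates such as `lambda r: r["field"] ...` do; `make_header_filter` is a hypothetical helper.

```python
from typing import Callable, Mapping, Sequence


def make_header_filter(columns: Sequence[str]) -> Callable[[Mapping[str, object]], bool]:
    """Hypothetical helper: return a predicate that is False for a row whose values
    equal its column names (case-insensitive), i.e. a header line loaded as data."""
    def keep(record: Mapping[str, object]) -> bool:
        is_header = all(
            isinstance(record[c], str) and record[c].strip().lower() == c.lower()
            for c in columns
        )
        return not is_header
    return keep


keep_data_rows = make_header_filter(["orderid", "customer", "item", "quantity", "price", "orderdate"])

# In the Glue session (Filter comes from `from awsglue.transforms import *`):
#   clean_dyf = Filter.apply(frame=dyf, f=keep_data_rows)
```

On the DataFrame, a narrower equivalent is `df.filter(df.orderid != "OrderID")`. Configuring the crawler's CSV classifier to treat the first line as column headings would likely avoid the problem at the source.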

#### Write the Data to S3

In [4]:
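```python
# Pass the S3 location to save() so Spark receives exactly one output path
df.write.format("parquet") \
    .mode("overwrite") \
    .save("s3://buckettutorialansh/sink_notebook/")
```

An earlier run of this cell set the location with `.option("path", ...)` and called `save()` without an argument, and it failed with:

```text
IllegalArgumentException: Expected exactly one path to be specified, but got: .
```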
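An alternative is to stay in Glue's own API and write the DynamicFrame with `glueContext.write_dynamic_frame.from_options`, which takes the S3 location in `connection_options["path"]`. The wrapper below is a hypothetical helper that takes the context as a parameter so the call can be exercised without a Glue session. As far as we understand, this sink adds files under the prefix rather than replacing existing ones, so it is not an exact substitute for `mode("overwrite")`.

```python
from typing import Any


def write_parquet_to_s3(glue_context: Any, frame: Any, s3_path: str) -> Any:
    """Hypothetical helper: write a DynamicFrame to S3 as Parquet through the Glue sink.

    glue_context is the notebook's GlueContext; frame is a DynamicFrame such as `dyf`.
    """
    if not s3_path.startswith("s3://"):
        raise ValueError(f"expected an s3:// URI, got {s3_path!r}")
    return glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": s3_path},
        format="parquet",
    )

# In the Glue session:
#   write_parquet_to_s3(glueContext, dyf, "s3://buckettutorialansh/sink_notebook/")
```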


#### Parameters in the Notebook

In [8]:
```
%%configure
{
  "--para1": "default_value"
}
```

```text
You are already connected to a glueetl session c607b84c-4408-44f1-80ab-d63eb5ee643c.

No change will be made to the current session that is set as glueetl. The session configuration change will apply to newly created sessions.


The following configurations have been updated: {'--para1': 'default_value'}
```


In [5]:
```python
import sys
from awsglue.utils import getResolvedOptions

# Read --para1 from the arguments passed to the session or job
args = getResolvedOptions(sys.argv, ['para1'])
print(args['para1'])
```

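```text
GlueArgumentError: the following arguments are required: --para1
```

As the `%%configure` output notes, the new configuration applies only to newly created sessions, so the session that was already running never received `--para1` and `getResolvedOptions` cannot resolve it. Running `%%configure` before any code cell starts a session (or stopping the current one with `%stop_session` first) should make the argument available.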
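`getResolvedOptions` only sees arguments that were actually passed to the running session or job, so code that must also work when `--para1` is absent needs a fallback. A common pattern is to check `'--para1' in sys.argv` before calling `getResolvedOptions`. The sketch below shows the same idea using only the standard library's `argparse`; `resolve_with_defaults` is a hypothetical helper, not part of `awsglue`.

```python
import argparse
import sys
from typing import Dict, List


def resolve_with_defaults(argv: List[str], defaults: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: return a value for every name in `defaults`, taken from
    `--name value` (or `--name=value`) in argv when present, else the default."""
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    for name, default in defaults.items():
        parser.add_argument(f"--{name}", dest=name, default=default)
    # parse_known_args skips any other arguments present on the command line.
    known, _unknown = parser.parse_known_args(argv[1:])
    return {name: getattr(known, name) for name in defaults}


# Usage in the notebook: falls back to "default_value" when --para1 was not passed.
args = resolve_with_defaults(sys.argv, {"para1": "default_value"})
print(args["para1"])
```

Using `parse_known_args` rather than `parse_args` is the key design choice: the job's command line carries other arguments besides the ones declared here, and they should be ignored rather than rejected.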


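#### Override Parameters in the Job Run

When the notebook runs as a Glue job, override the default value by setting the parameter for the job run (key `--para1`, value `my_new_value`):

```text
--para1 my_new_value
```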
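The same override can be made programmatically when starting a run. This sketch assumes the notebook has been saved as a Glue job; the job name `my-notebook-job` and the helpers `to_job_arguments` and `start_with_params` are hypothetical. It relies on boto3's Glue client, whose `start_job_run` accepts run-specific `Arguments` keyed with the leading `--`; for that run they replace the job's default arguments. The client is passed in so the helper can be exercised with a stub.

```python
from typing import Any, Dict


def to_job_arguments(params: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: prefix each parameter name with '--', the form Glue job arguments use."""
    return {(k if k.startswith("--") else f"--{k}"): v for k, v in params.items()}


def start_with_params(glue_client: Any, job_name: str, params: Dict[str, str]) -> str:
    """Hypothetical helper: start a run of `job_name` with run-specific arguments
    and return the JobRunId from the response."""
    response = glue_client.start_job_run(JobName=job_name, Arguments=to_job_arguments(params))
    return response["JobRunId"]

# Usage (requires AWS credentials and an existing job):
#   import boto3
#   run_id = start_with_params(boto3.client("glue"), "my-notebook-job", {"para1": "my_new_value"})
```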