## Install Glue Spark Kernel 

jupyter kernelspec install aws_glue_interactive_sessions_kernel/glue_spark --user

/home/ubuntu/.local/share/jupyter/kernels/glue_spark

In [2]:
%help


# Available Magic Commands

## Sessions Magic

----
    %help                             Return a list of descriptions and input types for all magic commands. 
    %profile            String        Specify a profile in your aws configuration to use as the credentials provider.
    %region             String        Specify the AWS region in which to initialize a session. 
                                      Default from ~/.aws/config on Linux or macOS, 
                                      or C:\Users\%USERNAME%\.aws\config on Windows.
    %idle_timeout       Int           The number of minutes of inactivity after which a session will time out.
                                      Default: 2880 minutes (48 hours).
    %session_id_prefix  String        Define a String that will precede all session IDs in the format 
                                      [session_id_prefix]-[session_id]. If a session ID is not provided,
                                      a random UUID will be generated.
    %status                           Returns the status of the current Glue session including its duration, 
                                      configuration and executing user / role.
    %session_id                       Returns the session ID for the running session. 
    %list_sessions                    Lists all currently running sessions by ID.
    %stop_session                     Stops the current session.
    %glue_version       String        The version of Glue to be used by this session. 
                                      Currently, the only valid options are 2.0 and 3.0. 
                                      Default: 2.0.
----

## Selecting Job Types

----
    %streaming          String        Sets the session type to Glue Streaming.
    %etl                String        Sets the session type to Glue ETL.
    %glue_ray           String        Sets the session type to Glue Ray.
----

## Glue Config Magic 
*(common across all job types)*

----

    %%configure         Dictionary    A JSON-formatted dictionary consisting of all configuration parameters for 
                                      a session. Each parameter can be specified here or through individual magics.
    %iam_role           String        Specify an IAM role ARN to execute your session with.
                                      Default from ~/.aws/config on Linux or macOS, 
                                      or C:\Users\%USERNAME%\.aws\config on Windows.
    %number_of_workers  Int           The number of workers of a defined worker_type that are allocated 
                                      when a session runs.
                                      Default: 5.
    %additional_python_modules  List  Comma-separated list of additional Python modules to include in your cluster 
                                      (can be from PyPI or S3).
----

                                      
## Magic for Spark Jobs (ETL & Streaming)

----
    %worker_type        String        Set the type of instances the session will use as workers. 
                                      ETL and Streaming support G.1X and G.2X. 
                                      Default: G.1X.
    %connections        List          Specify a comma-separated list of connections to use in the session.
    %extra_py_files     List          Comma-separated list of additional Python files from S3.
    %extra_jars         List          Comma-separated list of additional JARs to include in the cluster.
    %spark_conf         String        Specify custom spark configurations for your session. 
                                      E.g. %spark_conf spark.serializer=org.apache.spark.serializer.KryoSerializer
----
                                      
## Magic for Ray Job

----
    %min_workers        Int           The minimum number of workers that are allocated to a Ray job. 
                                      Default: 1.
    %object_memory_head Int           The percentage of free memory on the instance head node after a warm start. 
                                      Minimum: 0. Maximum: 100.
    %object_memory_worker Int         The percentage of free memory on the instance worker nodes after a warm start. 
                                      Minimum: 0. Maximum: 100.
----

## Action Magic

----

    %%sql               String        Run SQL code. All lines after the initial %%sql magic will be passed
                                      as part of the SQL code.  
----
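
The `%%configure` entry in the help text says every session parameter can be given as one JSON dictionary instead of through individual magics. The following is a minimal sketch of building such a cell body in plain Python. It assumes that the dictionary keys mirror the magic names listed above (`region`, `iam_role`, `glue_version`, and so on); that mapping is an assumption, not something the help text states. The values are the defaults from the help text or the ones used elsewhere in this notebook, and the role ARN is a placeholder.

```python
import json

# Assumption: %%configure keys mirror the individual magic names shown by %help.
session_config = {
    "region": "us-east-1",
    "iam_role": "arn:aws:iam::xxx:role/RoleForGlueNotebook",  # placeholder ARN
    "glue_version": "3.0",
    "worker_type": "G.1X",
    "number_of_workers": 5,
    "idle_timeout": 2880,
}

# Render the text to paste into a notebook cell; the %%configure magic
# has to be the first line of that cell.
cell_text = "%%configure\n" + json.dumps(session_config, indent=4)
print(cell_text)
```

Keeping the configuration in one dictionary makes it easier to review than a run of separate magic cells. Like the individual magics, it is best run before the first statement that starts a session.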



In [4]:
%iam_role arn:aws:iam::xxx:role/RoleForGlueNotebook

You are already connected to a glueetl session 29de39e3-cdc9-44de-8b11-e88a7fb49a5f.

No change will be made to the current session that is set as glueetl. The session configuration change will apply to newly created sessions.


Current iam_role is arn:aws:iam::371913002065:role/TeamRole
iam_role has been set to arn:aws:iam::371913002065:role/RoleForGlueNotebook.


In [6]:
%profile default

You are already connected to a glueetl session 29de39e3-cdc9-44de-8b11-e88a7fb49a5f.

No change will be made to the current session that is set as glueetl. The session configuration change will apply to newly created sessions.


Previous profile: default
Setting new profile to: default


In [8]:
%region "us-east-1"

You are already connected to a glueetl session 29de39e3-cdc9-44de-8b11-e88a7fb49a5f.

No change will be made to the current session that is set as glueetl. The session configuration change will apply to newly created sessions.


Previous region: us-east-1
Setting new region to: us-east-1
Reauthenticating Glue client with new region: us-east-1
IAM role has been set to arn:aws:iam::371913002065:role/RoleForGlueNotebook. Reauthenticating.
Authenticating with profile=default
glue_role_arn defined by user: arn:aws:iam::371913002065:role/RoleForGlueNotebook
Authentication done.
Region is set to: us-east-1


In [3]:
print("hello")

hello


In [4]:
from awsglue.context import GlueContext
from pyspark.context import SparkContext




In [5]:
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)




In [6]:
columns = ["language","users_count"]
data = [("Java", "20000"), ("Python", "100000"), ("Scala", "3000")]




In [7]:
rdd = sc.parallelize(data)




In [8]:
df = rdd.toDF()




In [9]:
df.show()

+------+------+
|    _1|    _2|
+------+------+
|  Java| 20000|
|Python|100000|
| Scala|  3000|
+------+------+
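
The `columns` list defined earlier is never passed to `toDF()`, which is why the output shows the default names `_1` and `_2`, and the counts stay strings. The following is a minimal sketch of one way to get named, typed columns. `typed_rows` and `to_named_df` are hypothetical names introduced for illustration, and `spark` stands for any active `SparkSession`, such as `glueContext.spark_session`.

```python
columns = ["language", "users_count"]
data = [("Java", "20000"), ("Python", "100000"), ("Scala", "3000")]

# Convert the count strings to integers so Spark can infer a numeric column.
typed_rows = [(language, int(count)) for language, count in data]


def to_named_df(spark, rows, names):
    """Build a DataFrame whose columns use the given names instead of _1, _2."""
    return spark.createDataFrame(rows, schema=names)


print(typed_rows)
```

In this session, `to_named_df(glueContext.spark_session, typed_rows, columns).show()` should print `language` and `users_count` as the headers. When string values are acceptable, `rdd.toDF(columns)` is the shorter route.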


In [10]:
spark = glueContext.spark_session




In [11]:
df = spark.read.parquet("s3://amazon-reviews-pds/parquet/product_category=Apparel/part-00000-495c48e6-96d6-4650-aa65-3c36a3516ddd.c000.snappy.parquet")




In [12]:
df.show(10)

+-----------+-----------+--------------+----------+--------------+--------------------+-----------+-------------+-----------+----+-----------------+--------------------+--------------------+-----------+----+
|marketplace|customer_id|     review_id|product_id|product_parent|       product_title|star_rating|helpful_votes|total_votes|vine|verified_purchase|     review_headline|         review_body|review_date|year|
+-----------+-----------+--------------+----------+--------------+--------------------+-----------+-------------+-----------+----+-----------------+--------------------+--------------------+-----------+----+
|         US|   51114360|R2O1ZWACRT6PGH|B00004U3M4|     988649596|Adult Robin Costu...|          1|            9|         17|   N|                Y|   Not as advertised|This is an adult ...| 2001-10-10|2001|
|         US|   17052567|R26I9SLC2PZTW7|B00WGDJJM4|     973297679|LookbookStore Wom...|          5|            2|          2|   N|                Y|Cute and Sophisti...

In [13]:
test = glueContext.spark_session




In [14]:
print(glueContext.spark_session)

<pyspark.sql.session.SparkSession object at 0x7f477fedc910>


In [15]:
from pyspark.sql import SparkSession




In [16]:
test = SparkSession.builder.appName("Test").enableHiveSupport().getOrCreate()




In [17]:
output = test.read.parquet("s3://amazon-reviews-pds/parquet/product_category=Apparel/part-00000-495c48e6-96d6-4650-aa65-3c36a3516ddd.c000.snappy.parquet")




In [18]:
output.show(10)

+-----------+-----------+--------------+----------+--------------+--------------------+-----------+-------------+-----------+----+-----------------+--------------------+--------------------+-----------+----+
|marketplace|customer_id|     review_id|product_id|product_parent|       product_title|star_rating|helpful_votes|total_votes|vine|verified_purchase|     review_headline|         review_body|review_date|year|
+-----------+-----------+--------------+----------+--------------+--------------------+-----------+-------------+-----------+----+-----------------+--------------------+--------------------+-----------+----+
|         US|   51114360|R2O1ZWACRT6PGH|B00004U3M4|     988649596|Adult Robin Costu...|          1|            9|         17|   N|                Y|   Not as advertised|This is an adult ...| 2001-10-10|2001|
|         US|   17052567|R26I9SLC2PZTW7|B00WGDJJM4|     973297679|LookbookStore Wom...|          5|            2|          2|   N|                Y|Cute and Sophisti...
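
Both reads above point at a single file inside a `product_category=Apparel` directory, and the resulting columns do not include `product_category`, whereas the records returned from the catalog table later in this notebook do. The sketch below uses only the standard library to show how such `key=value` path segments encode partition values. `partition_values` is a hypothetical helper, not part of Glue or Spark.

```python
from urllib.parse import urlparse


def partition_values(s3_uri):
    """Return {key: value} for every key=value directory segment in an S3 URI."""
    path = urlparse(s3_uri).path
    segments = path.strip("/").split("/")[:-1]  # drop the file name
    return dict(seg.split("=", 1) for seg in segments if "=" in seg)


uri = ("s3://amazon-reviews-pds/parquet/product_category=Apparel/"
       "part-00000-495c48e6-96d6-4650-aa65-3c36a3516ddd.c000.snappy.parquet")
print(partition_values(uri))
```

Spark's `basePath` read option, for example `spark.read.option("basePath", "s3://amazon-reviews-pds/parquet/")`, is the usual way to have partition discovery add that column when reading one file. Whether it behaves the same way in a Glue session has not been checked here.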

In [21]:
df2 = glueContext.create_dynamic_frame.from_catalog(database="default", table_name="amazon_reviews_parquet_table")




In [22]:
df2.show(10)

{"marketplace": "US", "customer_id": "5994119", "review_id": "R3F8EK3AZFR3K7", "product_id": "B00Q84KA7U", "product_parent": "961074580", "product_title": "Instantly Ageless Facelift In A Box- Anti Wrinkle Microcream (1 Box x 25 Vials)", "star_rating": 1, "helpful_votes": 1, "total_votes": 1, "vine": "N", "verified_purchase": "Y", "review_headline": "I used the product and it works accept for when ...", "review_body": "I used the product and it works accept for when it dries it leaves a crust under the eye where you put it.", "review_date": 2015-05-02, "year": 2015, "product_category": "Health_&_Personal_Care"}
{"marketplace": "US", "customer_id": "11277276", "review_id": "R1189S5JQK28YL", "product_id": "B001V9LO2W", "product_parent": "346067075", "product_title": "California Exotics Waterproof Jack Rabbit with Floating Beads", "star_rating": 3, "helpful_votes": 2, "total_votes": 3, "vine": "N", "verified_purchase": "N", "review_headline": "could be better", "review_body": "The overall