# AWS Glue Studio Notebook
##### You are now running an AWS Glue Studio notebook. To start using your notebook, you need to start an AWS Glue Interactive Session.


#### Optional: Run this cell to see available notebook commands ("magics").


In [None]:
%help

#### Run this cell to set up and start your interactive session.


In [1]:
%idle_timeout 2880
%glue_version 3.0
%worker_type G.1X
%number_of_workers 5

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
  
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)

Welcome to the Glue Interactive Sessions Kernel
For more information on available magic commands, please type %help in any new cell.

Please view our Getting Started page to access the most up-to-date information on the Interactive Sessions kernel: https://docs.aws.amazon.com/glue/latest/dg/interactive-sessions.html
Installed kernel version: 0.37.0 
Current idle_timeout is 2880 minutes.
idle_timeout has been set to 2880 minutes.
Setting Glue version to: 3.0
Previous worker type: G.1X
Setting new worker type to: G.1X
Previous number of workers: 5
Setting new number of workers to: 5
Authenticating with environment variables and user-defined glue_role_arn: arn:aws:iam::494212176882:role/LabRole
Trying to create a Glue session for the kernel.
Worker Type: G.1X
Number of Workers: 5
Session ID: 3cf086c4-4ffe-4e33-aff7-4b721428249e
Job Type: glueetl
Applying the following default arguments:
--glue_kernel_version 0.37.0
--enable-glue-datacatalog true
Waiting for session 3cf086c4-4ffe-4e33-aff7

#### Example: Create a DynamicFrame from a table in the AWS Glue Data Catalog and display its schema


In [2]:
dyf = glueContext.create_dynamic_frame.from_catalog(database='trabajo1', table_name='capita')
dyf.printSchema()

root
|-- country name: string
|-- country code: string
|-- 1986: double
|-- 1987: double
|-- 1988: double
|-- 1989: double
|-- 1990: double
|-- 1991: double
|-- 1992: double
|-- 1993: double
|-- 1994: double
|-- 1995: double
|-- 1996: double
|-- 1997: double
|-- 1998: double
|-- 1999: double
|-- 2000: double
|-- 2001: double
|-- 2002: double
|-- 2003: double
|-- 2004: double
|-- 2005: double
|-- 2006: double
|-- 2007: double
|-- 2008: double
|-- 2009: double
|-- 2010: double
|-- 2011: double
|-- 2012: double
|-- 2013: double
|-- 2014: double
|-- 2015: string
|-- 2016: string
|-- 2017: string
|-- 2018: string
|-- 1960: double
|-- 1961: double
|-- 1962: double
|-- 1963: double
|-- 1964: double
|-- 1965: double
|-- 1966: double
|-- 1967: double
|-- 1968: double
|-- 1969: double
|-- 1970: double
|-- 1971: double
|-- 1972: double
|-- 1973: double
|-- 1974: double
|-- 1975: double
|-- 1976: double
|-- 1977: double
|-- 1978: double
|-- 1979: double
|-- 1980: double
|-- 1981: double
|-- 1982: 

#### Example: Drop selected fields from the DynamicFrame and display the resulting schema


In [4]:
dyf2 = dyf.drop_fields(paths=["1986", "1987", "2000", "2001"])
dyf2.printSchema()

root
|-- country name: string
|-- country code: string
|-- 1988: double
|-- 1989: double
|-- 1990: double
|-- 1991: double
|-- 1992: double
|-- 1993: double
|-- 1994: double
|-- 1995: double
|-- 1996: double
|-- 1997: double
|-- 1998: double
|-- 1999: double
|-- 2002: double
|-- 2003: double
|-- 2004: double
|-- 2005: double
|-- 2006: double
|-- 2007: double
|-- 2008: double
|-- 2009: double
|-- 2010: double
|-- 2011: double
|-- 2012: double
|-- 2013: double
|-- 2014: double
|-- 2015: string
|-- 2016: string
|-- 2017: string
|-- 2018: string
|-- 1960: double
|-- 1961: double
|-- 1962: double
|-- 1963: double
|-- 1964: double
|-- 1965: double
|-- 1966: double
|-- 1967: double
|-- 1968: double
|-- 1969: double
|-- 1970: double
|-- 1971: double
|-- 1972: double
|-- 1973: double
|-- 1974: double
|-- 1975: double
|-- 1976: double
|-- 1977: double
|-- 1978: double
|-- 1979: double
|-- 1980: double
|-- 1981: double
|-- 1982: double
|-- 1983: double
|-- 1984: double
|-- 1985: double
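
The `paths` list passed to `drop_fields` in the cell above is written out by hand. If more year columns need to be dropped, generating the names may be less error-prone. The following is a minimal sketch: `year_paths` is a hypothetical helper (not part of the Glue API), and it assumes the columns are named with plain four-digit year strings, as in the schema above.

```python
def year_paths(*ranges):
    """Return year column names (as strings) for inclusive (start, end) ranges."""
    paths = []
    for start, end in ranges:
        paths.extend(str(year) for year in range(start, end + 1))
    return paths


# Builds the same four column names as the hand-written list in the cell above.
paths = year_paths((1986, 1987), (2000, 2001))
print(paths)

# In the Glue session, the list would then be passed as:
#   dyf2 = dyf.drop_fields(paths=paths)
```

Keeping the ranges as `(start, end)` tuples makes it easy to see at a glance which spans of years are removed.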


TypeError: RenameField() takes no arguments
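
The cell that raised this error is not shown in the notebook, so the cause can only be guessed. One likely explanation, assuming the cell called `RenameField(...)` directly: Glue transform classes are meant to be invoked through their `apply` class method, and a Python class without a custom `__init__` rejects constructor arguments with exactly this message. The sketch below reproduces the behavior in plain Python. `FakeTransform` is a hypothetical stand-in, not the real `awsglue` code, and the sample row is made up.

```python
class FakeTransform:
    """Hypothetical stand-in for a transform class; not the real awsglue code."""

    def __call__(self, frame, old_name, new_name):
        # Rename one key of a dict that plays the role of a single record.
        return {new_name if k == old_name else k: v for k, v in frame.items()}

    @classmethod
    def apply(cls, *args, **kwargs):
        # Create an instance with no arguments, then call it with the real ones.
        return cls()(*args, **kwargs)


row = {"country name": "Exampleland", "country code": "EXL"}  # made-up record

message = None
try:
    # Passing arguments to the constructor fails: the class defines no __init__.
    FakeTransform(row, "country name", "country_name")
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

# Going through apply() works.
renamed = FakeTransform.apply(row, "country name", "country_name")
print(renamed)
```

If this is indeed what happened, the equivalent calls in the Glue session would be `RenameField.apply(frame=dyf2, old_name="country name", new_name="country_name")` or `dyf2.rename_field("country name", "country_name")`. The new column name here is only an illustration.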


#### Example: Convert the DynamicFrame to a Spark DataFrame and display a sample of the data
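
No cell follows this heading in the notebook. A minimal sketch of what such a cell might look like is given here. `show_sample` is a hypothetical helper, and the choice of five rows is arbitrary. It relies only on `DynamicFrame.toDF()` and the Spark `DataFrame.show()` method.

```python
def show_sample(dynamic_frame, n=5):
    """Convert a DynamicFrame to a Spark DataFrame and print its first n rows."""
    df = dynamic_frame.toDF()        # DynamicFrame -> pyspark.sql.DataFrame
    df.show(n, truncate=False)       # print n rows without truncating long values
    return df


# In the Glue session, this could be called as:
#   df = show_sample(dyf2)
```

Returning the DataFrame lets later cells reuse it for Spark SQL operations without converting again.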


#### Example: Write the data in the DynamicFrame to a location in Amazon S3 and a table for it in the AWS Glue Data Catalog


In [5]:
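# Configure an S3 sink that also creates or updates the Data Catalog table.
sink = glueContext.getSink(
    connection_type="s3",
    path="s3://amaldonadelab4/Trabajo1/",
    enableUpdateCatalog=True,
)
sink.setFormat("csv")
sink.setCatalogInfo(catalogDatabase="trabajo1", catalogTableName="capita_new")
# Write dyf2 (the frame with the dropped year columns) to S3.
sink.writeFrame(dyf2)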

<awsglue.dynamicframe.DynamicFrame object at 0x7f039073d210>
