datahealer/jupyter-s3-parquet-redshift
Jupyter, Python, Pandas, S3, Parquet, Redshift

The examples demonstrate:
1️⃣ Querying a Parquet file from S3 using AwsWrangler.
2️⃣ Querying Redshift tables using Glue & AwsWrangler.
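The two examples above can be sketched as follows. This is a minimal sketch, assuming awswrangler (2.x) is installed and AWS credentials are configured; the bucket, path, Glue connection name, and table name are placeholders, not values from this repository. It requires live AWS resources to run.

```python
import awswrangler as wr

# 1) Query a Parquet file directly from S3 into a pandas DataFrame
#    (bucket and key are hypothetical placeholders)
df = wr.s3.read_parquet(path="s3://my-bucket/data/example.parquet")

# 2) Query a Redshift table through a Glue Catalog connection
#    (connection and table names are hypothetical placeholders)
con = wr.redshift.connect("my-glue-redshift-connection")
try:
    df_rs = wr.redshift.read_sql_query(
        "SELECT * FROM public.my_table LIMIT 10", con=con
    )
finally:
    con.close()
```

`wr.redshift.connect` resolves the connection details (host, port, credentials) from the named Glue Catalog connection, so no hard-coded database password is needed in the notebook.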

Requirements

  • AWS Account: You need an AWS account to create and use AWS services.

  • AWS Credentials: To manage your services from command-line tools, you need an aws_access_key_id and an aws_secret_access_key. Creating a dedicated user for IaC purposes is recommended. To create one with IAM:

    1. Go to IAM
    2. Under Users click 'Add User'
    3. Give a username (like terraform_user)
    4. For credential type select 'Access key - Programmatic access' and click next
    5. Click 'Create Group', specify a group name and select 'AdministratorAccess' policy.
    6. Click Review and create the user. This user has programmatic access and admin permissions.

    After you create the user, go to Users and select the user you created. Go to 'Security Credentials' and click 'Create access key'. This gives you an access key ID and a secret access key. Save these credentials and do not share them. You cannot view the secret access key again after you close this window.

  • Make Redshift Cluster Public: (at your own risk) https://aws.amazon.com/premiumsupport/knowledge-center/redshift-cluster-private-public/
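The access key pair created in the steps above is typically stored in the shared AWS credentials file, where tools such as the AWS CLI and awswrangler pick it up automatically. A sketch, with placeholder key values:

```ini
# ~/.aws/credentials
[default]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

The default region can be set separately in `~/.aws/config` (e.g. `region = us-east-1` under a `[default]` section).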
