Sigrid242/Data-Analysis-using-AWS-services-Athena-Glue-S3-IAM-Quicksight

Data-analysis-using-AWS-services-Athena-Glue-S3-IAM-Quicksight

This is a simple end-to-end data analytics solution built on AWS services, from uploading a CSV file to an S3 bucket to visualizing the results in QuickSight. The dataset used in this project is the Data Science Job Salaries dataset from Kaggle: kaggle.com/datasets/ruchi798/data-science-job-salaries.

Objective

The main objective of this project is to identify the top 5 data science salaries in the US based on job title, experience level, employment type, and remote job ratio by job title.

Dataset

The dataset contains the following variables: work_year, experience_level, employment_type, job_title, salary, salary_currency, salary_in_usd, employee_residence, remote_ratio, company_location, and company_size. The data analysis process is carried out as follows:

Step 1- First, we create an IAM user and grant it access permissions to S3. In the AWS console search bar, we type IAM, then go to Users > Add users.

1a- Set the user details and access type, then continue to the permissions step.

1b- Choose "Attach existing policies directly", since we already have a policy set up.

1c- Review all the details and create the user.

1d- The user is successfully created.
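The README does not show which existing policy was attached in step 1b. As a minimal sketch, assuming a custom policy scoped to the two buckets used later in this project, the policy document could look roughly like the one built below. The helper name `build_s3_policy`, the statement IDs, and the chosen actions are illustrative assumptions, not the author's actual policy.

```python
import json


def build_s3_policy(bucket_names):
    """Build an IAM policy document (as a dict) for the given S3 buckets.

    Assumption: list access on the buckets plus read/write on their objects
    is enough for this workflow. Adjust the actions to your own needs.
    """
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    object_arns = [f"{arn}/*" for arn in bucket_arns]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListBuckets",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arns,
            },
            {
                "Sid": "ReadWriteObjects",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": object_arns,
            },
        ],
    }


policy = build_s3_policy(
    ["data-science-salaries-bucket", "data-science-salaries-bucket-result"]
)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The printed JSON can be pasted into the IAM policy editor. Keeping the bucket-level and object-level statements separate matters because `s3:ListBucket` applies to the bucket ARN, while `s3:GetObject` and `s3:PutObject` apply to the objects under it.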

Step 2- Two S3 buckets are created. The data-science-salaries-bucket holds the raw file, while the data-science-salaries-bucket-result holds the query results from Athena.

2a- The CSV file is uploaded to the data-science-salaries-bucket.

Step 3- Moving on to Athena. Before we can create our table, we need to choose the bucket where the query output will be sent. In the Athena query editor, we select Settings > Manage > Browse S3 and choose the appropriate bucket.
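The same output location can also be supplied per query when Athena is called programmatically. The sketch below only builds the request arguments. It assumes the boto3 Athena client's `start_query_execution` call would be used to submit them, and the database name `hypothetical_salaries_db` and the helper `build_athena_request` are placeholders that do not come from this project.

```python
def build_athena_request(query, database, output_bucket, prefix=""):
    """Build keyword arguments for boto3's athena.start_query_execution.

    The result location plays the same role as the bucket chosen in the
    Athena console settings.
    """
    output_location = f"s3://{output_bucket}/{prefix}"
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


request = build_athena_request(
    query="SELECT 1",
    database="hypothetical_salaries_db",  # placeholder database name
    output_bucket="data-science-salaries-bucket-result",
)
print(request["ResultConfiguration"]["OutputLocation"])
```

With credentials configured, the request could be submitted as `boto3.client("athena").start_query_execution(**request)`.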

3a- In the Athena data catalog, we select Create table > AWS Glue crawler > Add crawler so that the schema is retrieved from the data automatically.

3b- The crawler is successfully created.
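For readers who prefer scripting over the console, the crawler setup can be sketched as the arguments to the boto3 Glue client's `create_crawler` call. The crawler name, role ARN, and database name below are hypothetical placeholders, since the README does not state the values used. Only the S3 path comes from Step 2.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for boto3's glue.create_crawler.

    The crawler scans the S3 path, infers the schema, and registers a table
    in the given Glue database.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


crawler_request = build_crawler_request(
    name="hypothetical-salaries-crawler",  # placeholder crawler name
    role_arn="arn:aws:iam::123456789012:role/HypotheticalGlueRole",  # placeholder
    database="hypothetical_salaries_db",  # placeholder database name
    s3_path="s3://data-science-salaries-bucket/",
)
print(crawler_request["Targets"])
```

With credentials configured, this could be passed as `boto3.client("glue").create_crawler(**crawler_request)`, followed by `start_crawler(Name=...)` to run it.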

Step 4- The data is queried in Athena, and the results are written to the data-science-salaries-bucket-result bucket.
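The queries themselves are not reproduced in this README. As an illustration of the kind of aggregation involved, the sketch below runs a top-5 query by job title against an in-memory SQLite table, which stands in for Athena here. The table name `salaries`, the filter on `company_location`, and all sample rows are assumptions made for the example. The values are invented and are not taken from the Kaggle dataset.

```python
import sqlite3

# Hypothetical sample rows: (job_title, experience_level, company_location, salary_in_usd).
rows = [
    ("Data Scientist", "SE", "US", 150000),
    ("Data Scientist", "MI", "US", 110000),
    ("Data Engineer", "SE", "US", 140000),
    ("Data Analyst", "EN", "US", 70000),
    ("ML Engineer", "SE", "US", 160000),
    ("Data Architect", "SE", "US", 170000),
    ("Research Scientist", "MI", "US", 120000),
    ("Data Scientist", "SE", "GB", 90000),
]

# Average salary per job title for US companies, highest first, top 5 only.
TOP_TITLES_SQL = """
SELECT job_title, AVG(salary_in_usd) AS avg_salary_usd
FROM salaries
WHERE company_location = 'US'
GROUP BY job_title
ORDER BY avg_salary_usd DESC
LIMIT 5
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE salaries ("
    "job_title TEXT, experience_level TEXT, "
    "company_location TEXT, salary_in_usd INTEGER)"
)
conn.executemany("INSERT INTO salaries VALUES (?, ?, ?, ?)", rows)
top_titles = conn.execute(TOP_TITLES_SQL).fetchall()
conn.close()

for title, avg_salary in top_titles:
    print(f"{title}: {avg_salary:.0f}")
```

Swapping `job_title` for `experience_level` or `employment_type` in the `SELECT` and `GROUP BY` clauses gives the corresponding breakdowns. The SQL used here is plain enough that it should carry over to Athena with little or no change.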

Step 5- Now QuickSight needs access to S3 to build the report. Before QuickSight can read the S3 bucket, we have to make sure it has permission to do so. We open the account menu at the top right, then select Manage QuickSight > Security & permissions > Manage > select the S3 bucket.

5a- Next, we set up a new data source to access S3 from QuickSight: New analysis > New dataset > S3 > upload a JSON manifest file > Import to SPICE.
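The manifest file tells QuickSight which S3 objects to load and how to parse them. The README does not include the manifest that was used, so the following is a minimal sketch of one for a single CSV file with a header row. The object key `query-results.csv` is a hypothetical placeholder, and `build_manifest` is just an illustrative helper.

```python
import json


def build_manifest(uris, contains_header=True):
    """Build a QuickSight S3 manifest (as a dict) for comma-separated files."""
    return {
        "fileLocations": [{"URIs": list(uris)}],
        "globalUploadSettings": {
            "format": "CSV",
            "delimiter": ",",
            "textqualifier": '"',
            # QuickSight manifests express this flag as a string.
            "containsHeader": "true" if contains_header else "false",
        },
    }


manifest = build_manifest(
    # Hypothetical object key; replace with the actual query result file.
    ["s3://data-science-salaries-bucket-result/query-results.csv"]
)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

Saving the printed JSON as a `.json` file gives something that can be uploaded in the New dataset > S3 dialog. Listing explicit `URIs` rather than a whole prefix avoids picking up other files that may sit next to the query results in the bucket.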

5b- After the data is imported to SPICE, we create a report in QuickSight. Our interest was to identify the top 5 data science salaries in the US based on job title, experience level, employment type, and remote job ratio by job title.

Conclusion

AWS provides a suite of powerful tools for analyzing data effectively. By using Athena, Glue, IAM, and QuickSight, businesses can gain valuable insights into their data, make informed decisions, and optimize processes.
