AWS Data Pipeline for YouTube Trending videos Dataset

Overview

The objective of this project is to build a data pipeline on AWS that cleanses the YouTube trending videos dataset from Kaggle, converts its semi-structured data to a structured format, and supports analysis by video category and trending metrics.

AWS Services Used

Amazon S3: Amazon S3 is an object storage service that provides industry-leading scalability, data availability, security, and performance.
AWS IAM: AWS Identity and Access Management, which lets us securely manage access to AWS services and resources.
AWS Glue: A serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.
AWS Lambda: Lambda is a compute service that lets programmers run code without provisioning or managing servers.
Amazon Athena: Athena is an interactive query service that queries data directly in S3, so the data stays where it is and does not need to be loaded elsewhere.
QuickSight: Amazon QuickSight is a scalable, serverless, embeddable, machine learning-powered business intelligence (BI) service built for the cloud.
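
As a rough illustration of how Lambda can fit into a pipeline like this, the sketch below shows a handler that reads the bucket name and object key from an S3 event notification. It assumes the function is triggered by S3 object-created events, which this README does not state; the helper name `extract_s3_objects` is hypothetical and is not taken from Lambda_function_script.py.

```python
import urllib.parse


def extract_s3_objects(event):
    """Return (bucket, key) pairs found in an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (spaces appear as '+').
        key = urllib.parse.unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event, context):
    # Assumption: the real function would clean or convert each object here;
    # this sketch only reports what it was asked to process.
    objects = extract_s3_objects(event)
    for bucket, key in objects:
        print(f"New object to process: s3://{bucket}/{key}")
    return {"processed": len(objects)}
```

Keeping the event parsing in its own function makes it possible to test the logic locally with a hand-written event dictionary before deploying.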

Datasets Used

This Kaggle dataset contains statistics (CSV files) on daily trending YouTube videos collected over several months. Up to 200 trending videos are listed per day for each region, and the data for each region is in its own file. Each record includes the video title, channel title, publication time, tags, views, likes and dislikes, description, and comment count. Each record also includes a category_id field, whose meaning varies by region; the category names for a region are found in the JSON file associated with that region.

https://www.kaggle.com/datasets/datasnaek/youtube-new
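
To make the relationship between the CSV statistics and the per-region JSON file concrete, here is a minimal sketch that flattens a category file into an id-to-name lookup and attaches the category name to each video row. The JSON layout (`items[].id` and `items[].snippet.title`) is an assumption about the Kaggle files, the sample values are made up for illustration, and the function names are hypothetical; this is not the logic in Glue_Job_script.py.

```python
import csv
import io
import json

# Made-up sample data, shaped like the assumed layout of the Kaggle files.
SAMPLE_CATEGORY_JSON = """
{"items": [
  {"id": "10", "snippet": {"title": "Music"}},
  {"id": "24", "snippet": {"title": "Entertainment"}}
]}
"""

SAMPLE_VIDEOS_CSV = """video_id,title,category_id,views
abc123,Example song,10,1500
def456,Example show,24,900
ghi789,Unlabelled clip,99,10
"""


def flatten_categories(category_json_text):
    """Turn the nested category JSON into a flat {category_id: name} dict."""
    doc = json.loads(category_json_text)
    return {item["id"]: item["snippet"]["title"] for item in doc.get("items", [])}


def attach_category_names(csv_text, categories):
    """Add a category_name column to each CSV row using the lookup."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Ids missing from the region's JSON file are marked as Unknown.
        row["category_name"] = categories.get(row["category_id"], "Unknown")
        rows.append(row)
    return rows


categories = flatten_categories(SAMPLE_CATEGORY_JSON)
videos = attach_category_names(SAMPLE_VIDEOS_CSV, categories)
```

Because category ids can mean different things in different regions, the lookup should be built from the JSON file that matches the region of the CSV being processed.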
