GitHub - arvindcb-2023/bigdata

YouTube Trending video analysis using Apache Spark

Dataset link: https://www.kaggle.com/datasets/datasnaek/youtube-new

Description: This project analyzes YouTube trending videos using the dataset linked above, which was collected through the YouTube API. The aim is to analyze the videos trending in Canada and give content creators insight into what kinds of videos to make.

Team Members:

Arvind Boominathan (1212299)
Renjith Chacko (1213107)
Vignesh Selvaraju (1225705)

Technologies used:

Hadoop - for running a 3-node cluster on which the analysis is performed
PySpark - for performing the analysis
MongoDB - for storing and fetching the video data and category data
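
As a rough illustration of how these pieces could fit together, the sketch below counts trending videos per category with PySpark. It is not taken from the notebook. It assumes the category file follows the Kaggle layout (an `items` list whose entries carry `id` and `snippet.title`) and that the video data has a `category_id` column. The function names `category_lookup` and `trending_by_category` are hypothetical.

```python
def category_lookup(category_doc):
    """Map numeric category IDs to titles.

    Assumes the Kaggle layout: {"items": [{"id": "10", "snippet": {"title": "Music"}}, ...]}.
    """
    return {
        int(item["id"]): item["snippet"]["title"]
        for item in category_doc.get("items", [])
    }


def trending_by_category(spark, videos_df, lookup):
    """Count rows of videos_df per category name, most frequent first.

    videos_df is assumed to be a Spark DataFrame with an integer `category_id` column.
    """
    # Imported lazily so category_lookup can be used without PySpark installed.
    from pyspark.sql import functions as F

    # Turn the Python dict into a small two-column DataFrame for the join.
    categories = spark.createDataFrame(
        list(lookup.items()), ["category_id", "category"]
    )
    return (
        videos_df.join(categories, "category_id")
        .groupBy("category")
        .count()
        .orderBy(F.desc("count"))
    )
```

With a running `SparkSession` named `spark` and a loaded `videos_df`, calling `trending_by_category(spark, videos_df, lookup).show()` would print the per-category counts.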

Requirements:

Hadoop - Install the latest version of Hadoop to set up the cluster with a master node and worker nodes
Java 8 - Hadoop runs on Java, so install Java 8
MongoDB - Install the latest version to set up the database
PySpark - Install this to perform the analysis in Python

Make sure to import the given JSON files into MongoDB under the collection name "bigdata" to get the desired results.
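
One possible way to do the import from Python is sketched below, using `pymongo`. This is only a sketch under assumptions: the connection URI and the database name `youtube_trending` are hypothetical placeholders, and dropping any exported `_id` field so MongoDB assigns fresh IDs is a design choice made here, not something the project specifies. The loader accepts a JSON array, a single JSON object, or newline-delimited JSON.

```python
import json
from pathlib import Path


def load_documents(path):
    """Read a JSON file into a list of dicts ready for insertion."""
    text = Path(path).read_text(encoding="utf-8").strip()
    if not text:
        return []
    try:
        parsed = json.loads(text)
        # A single object becomes a one-element list.
        docs = parsed if isinstance(parsed, list) else [parsed]
    except json.JSONDecodeError:
        # Fall back to newline-delimited JSON (one document per line).
        docs = [json.loads(line) for line in text.splitlines() if line.strip()]
    for doc in docs:
        # Assumption: discard exported IDs and let MongoDB generate new ones.
        doc.pop("_id", None)
    return docs


def insert_documents(docs, uri="mongodb://localhost:27017",
                     db_name="youtube_trending", collection_name="bigdata"):
    """Insert docs into MongoDB and return how many were written.

    uri and db_name are hypothetical defaults; adjust them to your setup.
    """
    # Imported lazily so load_documents works without pymongo installed.
    from pymongo import MongoClient

    if not docs:
        return 0
    client = MongoClient(uri)
    try:
        result = client[db_name][collection_name].insert_many(docs)
        return len(result.inserted_ids)
    finally:
        client.close()
```

For example, `insert_documents(load_documents("bigdata.CAVideos.json"))` would load the video file into the "bigdata" collection of the assumed database.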

Repository files:

Big_Data_Notebook.ipynb
CA_category_id.json
bigdata.CAVideos.json
readme.MD
