Dataset link: https://www.kaggle.com/datasets/datasnaek/youtube-new
Description: This project analyzes trending YouTube videos using the dataset linked above, which was collected through the YouTube API. The aim is to analyze the videos trending in Canada and give content creators insight into what kinds of videos to make.
Team Members:
- Arvind Boominathan (1212299)
- Renjith Chacko (1213107)
- Vignesh Selvaraju (1225705)
Technologies used:
- Hadoop - for running the 3-node cluster on which the analysis is performed
- PySpark - for performing the analysis
- MongoDB - for storing and fetching the video data/category data
Requirements:
- Hadoop - Install the latest version of Hadoop to set up the cluster with a master node and worker nodes
- Java 8 - Hadoop runs on Java, so install Java 8
- MongoDB - Install the latest version to set up the database
- PySpark - Install this to perform the analysis using Python
Make sure to import the provided JSON files into a MongoDB collection named "bigdata"; the analysis reads its data from that collection.
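One possible way to load the JSON files into the "bigdata" collection is sketched below. It is only an illustration and rests on several assumptions that are not part of this project: that pymongo is installed, that MongoDB is running locally on the default port, that the database is called "youtube", and that the JSON files sit in a folder called "data". The helper names are hypothetical as well. Adjust all of these to match your setup.

```python
import json
from pathlib import Path


def load_documents(json_path):
    """Read one JSON file and return its contents as a list of documents."""
    with open(json_path, encoding="utf-8") as handle:
        data = json.load(handle)
    # A file may hold either a list of records or a single object.
    return data if isinstance(data, list) else [data]


def import_folder(collection, folder):
    """Insert every *.json file in `folder` into `collection`; return the count."""
    inserted = 0
    for json_path in sorted(Path(folder).glob("*.json")):
        documents = load_documents(json_path)
        if documents:  # insert_many rejects an empty list
            collection.insert_many(documents)
            inserted += len(documents)
    return inserted


def main():
    # Imported here so the helpers above can be used without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # assumed local server
    # "youtube" is a hypothetical database name; "bigdata" is the required collection.
    collection = client["youtube"]["bigdata"]
    count = import_folder(collection, "data")  # "data" is a hypothetical folder
    print(f"Inserted {count} documents into 'bigdata'")
```

Calling `main()` from a Python shell or script runs the import. Running it twice inserts the documents twice, so clear the collection first if you need to re-import.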