arvindcb-2023/bigdata

YouTube Trending video analysis using Apache Spark

Dataset link: https://www.kaggle.com/datasets/datasnaek/youtube-new

Description: This project analyzes YouTube trending videos using the dataset linked above, which was collected through the YouTube API. The aim is to analyze the videos trending in Canada and give content creators insight into which kinds of videos to make.


Team Members:

  1. Arvind Boominathan (1212299)
  2. Renjith Chacko (1213107)
  3. Vignesh Selvaraju (1225705)

Technologies used:

  1. Hadoop - for running a 3-node cluster on which the analysis is performed
  2. PySpark - for performing the analysis
  3. MongoDB - for storing and fetching the video data and category data

Requirements:

  1. Hadoop - Install the latest version of Hadoop to set up the cluster with a master node and worker nodes
  2. Java 8 - Hadoop runs on Java, so install Java 8
  3. MongoDB - Install the latest version to set up the database
  4. PySpark - Install this to run the analysis in Python

Make sure to import the provided JSON files into a MongoDB collection named "bigdata" so that the analysis produces the expected results.
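
The import step could be sketched in Python roughly as follows. This is a minimal sketch and not the project's own loader: the database name `youtube`, the connection URI, and the file name in the usage note are assumptions, and only the collection name "bigdata" comes from this README. It also assumes the `pymongo` package is installed, and that each JSON file is either a list of documents, an object wrapping them under `items`, or a single document.

```python
import json
from pathlib import Path


def load_documents(path):
    """Read a JSON file and return a list of documents ready for insertion."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Assumption: a file holds a list of documents, an object that wraps
    # them under "items", or a single document.
    if isinstance(data, list):
        return data
    if isinstance(data, dict) and isinstance(data.get("items"), list):
        return data["items"]
    return [data]


def import_into_mongo(paths, uri="mongodb://localhost:27017", db_name="youtube"):
    """Insert all documents from the given files into the "bigdata" collection.

    The URI and database name are hypothetical defaults; adjust them to
    match the local MongoDB setup.
    """
    from pymongo import MongoClient  # imported lazily; requires pymongo

    client = MongoClient(uri)
    try:
        collection = client[db_name]["bigdata"]
        inserted = 0
        for path in paths:
            docs = load_documents(path)
            if docs:  # insert_many rejects an empty list
                inserted += len(collection.insert_many(docs).inserted_ids)
        return inserted
    finally:
        client.close()
```

With a MongoDB server running locally, a call such as `import_into_mongo(["CA_category_id.json"])` (a hypothetical file name) would return the number of documents inserted.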
