This project analyses YouTube data to find the most popular genres (video categories) on YouTube, based on both views and uploads.
-
GBvideos.csv (Dataset)
-
YouTube Data Analysis (implementation of a MapReduce model to find the most popular genre on YouTube based on uploads)
-
Top Viewed Categories (implementation of a MapReduce model to find the most popular genre on YouTube based on views)
-
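Counting per category in MapReduce usually follows one pattern: the mapper emits a (category, 1) pair for every video to count uploads, or a (category, views) pair to total views, and the reducer sums the values for each category. The following is a minimal sketch of that data flow in plain Java, simulating map, shuffle and reduce in memory without Hadoop; it is not the repository's code. The class name `CategoryMapReduceSketch`, the column indexes `CATEGORY_COL` and `VIEWS_COL`, and the sample rows are assumptions, so check them against the header row of your copy of GBvideos.csv.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * In-memory sketch of the two MapReduce jobs (no Hadoop dependency).
 * Map emits (categoryId, 1) to count uploads or (categoryId, views) to sum views;
 * the shuffle groups pairs by categoryId and the reduce step sums each group.
 */
public class CategoryMapReduceSketch {

    // ASSUMPTION: hypothetical 0-based column indexes; verify against the CSV header.
    static final int CATEGORY_COL = 3;
    static final int VIEWS_COL = 5;

    /** Splits one CSV line, keeping commas that appear inside double quotes. */
    static List<String> splitCsv(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"'); // "" inside quotes is an escaped quote
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(cur.toString()); // field boundary
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    /** Map phase: returns (categoryId, value), or null for the header or malformed rows. */
    static Map.Entry<String, Long> map(String line, boolean byViews) {
        List<String> f = splitCsv(line);
        if (f.size() <= Math.max(CATEGORY_COL, VIEWS_COL)) return null;
        String category = f.get(CATEGORY_COL).trim();
        String views = f.get(VIEWS_COL).trim();
        // Skips the header row and any row whose fields are not plain integers.
        if (!category.matches("\\d+") || !views.matches("\\d+")) return null;
        return Map.entry(category, byViews ? Long.parseLong(views) : 1L);
    }

    /** Shuffle + reduce phase: groups the emitted pairs by key and sums each group. */
    static Map<String, Long> run(List<String> lines, boolean byViews) {
        Map<String, Long> totals = new TreeMap<>(); // keys come out sorted, as after Hadoop's sort
        for (String line : lines) {
            Map.Entry<String, Long> kv = map(line, byViews);
            if (kv != null) totals.merge(kv.getKey(), kv.getValue(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up rows for illustration only, in the assumed column layout.
        List<String> lines = List.of(
            "video_id,title,channel_title,category_id,tags,views",
            "a1,\"Song, live\",Chan1,10,music,500",
            "b2,Trailer,Chan2,24,film,900",
            "c3,Another song,Chan3,10,music,300");
        System.out.println("uploads: " + run(lines, false)); // uploads: {10=2, 24=1}
        System.out.println("views:   " + run(lines, true));  // views:   {10=800, 24=900}
    }
}
```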
Top Categories Output (output files)
-
The output is obtained by running the MapReduce code from a .jar file with the following commands in a Linux terminal:
- Create an input directory in the Hadoop file system (HDFS):
  `hdfs dfs -mkdir /YouTubeInput`
- Copy the input data from the Linux file system to HDFS:
  `hdfs dfs -put /Downloads/YouTubeDataAnalysis/GBvideos.csv /YouTubeInput`
- Run the job from the jar file and save the results to an output directory in HDFS (the output directory must not already exist, otherwise Hadoop rejects the job):
  `hadoop jar /home/hadoop/TopViewedCategories.jar TopCategoryDriver /YouTubeInput /YouTubeOutput`
- View the results:
  `hdfs dfs -cat /YouTubeOutput/*`
- Copy the results from HDFS to the Linux file system:
  `hdfs dfs -get /YouTubeOutput/* /Downloads/YouTubeAnalysis/TopCategoryOutput`
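Once the results are on the local file system, the single most popular category can be picked out of Hadoop's text output, where the default TextOutputFormat writes one `key<TAB>value` line per record into `part-*` files (next to a `_SUCCESS` marker). This is a hedged sketch, not part of the repository: the class name `TopCategoryFromOutput` is hypothetical, and it assumes the job writes one whole-number total per category.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Picks the (category, total) pair with the largest total from Hadoop text output. */
public class TopCategoryFromOutput {

    /** Returns the pair with the largest total, or null if no line parses. */
    static Map.Entry<String, Long> top(List<String> lines) {
        Map.Entry<String, Long> best = null;
        for (String line : lines) {
            String[] kv = line.split("\t", 2); // key<TAB>value
            if (kv.length != 2) continue;      // skip blank or unexpected lines
            long total;
            try {
                total = Long.parseLong(kv[1].trim());
            } catch (NumberFormatException e) {
                continue;                      // value is not a whole number
            }
            if (best == null || total > best.getValue()) {
                best = Map.entry(kv[0], total);
            }
        }
        return best;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TopCategoryFromOutput <local-output-dir>");
            return;
        }
        List<Path> parts;
        try (Stream<Path> files = Files.list(Paths.get(args[0]))) {
            // Reducer output files are named part-*; markers such as _SUCCESS are skipped.
            parts = files.filter(p -> p.getFileName().toString().startsWith("part-"))
                         .sorted()
                         .collect(Collectors.toList());
        }
        List<String> lines = new ArrayList<>();
        for (Path p : parts) {
            lines.addAll(Files.readAllLines(p));
        }
        Map.Entry<String, Long> best = top(lines);
        System.out.println(best == null ? "no results found" : best.getKey() + "\t" + best.getValue());
    }
}
```

For example, after compiling with `javac TopCategoryFromOutput.java`, run `java TopCategoryFromOutput /Downloads/YouTubeAnalysis/TopCategoryOutput` to print the winning category and its total.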