YouTube Video Analytics using Spark, Azure and Kafka

Project Description

YouTube ranks videos as top trending using a few key factors that measure user interaction, such as the number of views, shares, comments, and likes. The top trending videos mostly belong to categories such as movies, music, celebrities, reality shows, and TV shows. The dataset USvideos.csv contains a few years of daily trending YouTube videos. The dataset category_title.csv maps each category_id to its corresponding title.

Objective

Use Spark on a cloud service to perform exploratory analysis and extract actionable insights, stored in a suitable format and retrievable through an API. Note that the dataset is not available on the cloud service you'll use, so it must be streamed to that service.

Dataset:

Insights to be collected: Insights.md

Solution Overview

We use the Confluent Platform Community edition and Kafka Connect’s CSV Source Connector to read the CSV files and publish their data, along with its schema, to their respective topics.
With the ADLS Gen2 Sink Connector as the consumer, we read data from the published topics and convert it into Avro files, which are stored in an ADLS Gen2 container named ‘topics’ that the connector creates automatically.
Our Scala-based Spark application, running on an Azure HDInsight Spark cluster, reads the Avro data, generates insights, and stores the results as JSON files in an ADLS Gen2 container. Since IntelliJ IDEA is our development IDE, it is convenient to use the Azure Toolkit for IntelliJ to run the application remotely on the HDInsight cluster from within the IDE.
Finally, we copy the insights JSON files from the ADLS Gen2 container into a Cosmos DB SQL API container using Azure Data Factory.
The insights from the YouTube video analysis are then available for querying through the Cosmos DB SQL API.
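
As an illustration of the Spark step, the following is a minimal sketch and not the application code in src. It assumes the spark-avro module is on the classpath (so that the "avro" format name resolves), that the videos data has video_id and category_id columns, and that the category data has category_id and title columns. The object name TrendingByCategory, the function trendingPerCategory, and the three path arguments are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count, countDistinct, desc, lit}

object TrendingByCategory {

  // Counts trending entries and distinct videos per category title.
  // Column names (video_id, category_id, title) are assumptions about the data.
  def trendingPerCategory(videos: DataFrame, categories: DataFrame): DataFrame = {
    // Rename the category title so it cannot clash with a video title column.
    val categoryNames = categories.select(
      col("category_id").cast("int").as("category_id"),
      col("title").as("category_title"))

    videos
      .withColumn("category_id", col("category_id").cast("int"))
      .join(categoryNames, Seq("category_id"), "left")
      .groupBy("category_title")
      .agg(
        count(lit(1)).as("trending_rows"),
        countDistinct("video_id").as("distinct_videos"))
      .orderBy(desc("trending_rows"))
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical arguments: <videos Avro path> <categories Avro path> <output JSON path>
    require(args.length == 3, "expected: <videosPath> <categoriesPath> <outputPath>")

    val spark = SparkSession.builder().appName("TrendingByCategory").getOrCreate()

    // Read the Avro files written by the sink connector.
    val videos = spark.read.format("avro").load(args(0))
    val categories = spark.read.format("avro").load(args(1))

    // Store the insight as JSON files, replacing any previous run.
    trendingPerCategory(videos, categories).write.mode("overwrite").json(args(2))

    spark.stop()
  }
}
```

Keeping the aggregation in a function that takes DataFrames makes it possible to try the logic on a small in-memory sample before submitting the job to the cluster.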

Detailed steps of the solution

Please check solution_steps.pdf

License

GNU General Public License v3.0 or later

See LICENSE.txt for the full text.

abcoep/YoutubeVideoAnalyticsSpark
