
#User can extract a YouTube Channel's Data by using the Channel's ID and can analyze the data using a set of questions provided # | YouTube | API | Python | MySQL | Pandas | SQLAlchemy | Streamlit |

Abinaya-Ganesh/YouTube_Data_Harvesting_and_Warehousing


YouTube_Data_Harvesting_and_Warehousing

Extract a YouTube channel's data by providing its channel ID, then analyze the data using a set of provided questions.

Introduction

YouTube is a huge video streaming platform where many people post creative videos that either entertain or are useful to their target audience. It also generates a lot of data, both from the videos that content creators post and from viewers interacting through comments.

This project shows how to extract data from YouTube using the YouTube API in Python and store it in a suitable format in MySQL, so that the end user can analyze it by choosing from a set of questions on the Streamlit user interface.


User Guide

  1. Data Collection Tab

    a. Open a YouTube channel's homepage and click About ---> Share channel ---> Copy channel ID to get the channel's ID

    b. Enter the channel ID and click the Get data button

  2. Data Migration Tab

    To migrate the channel data to MySQL and store it, click the Migrate data button

  3. This project can collect data for up to 10 YouTube channels, provided the user does not exhaust the daily YouTube API quota

  4. Data Analysis Tab

    Users are presented with 10 questions and can choose one to see the corresponding analyzed data
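
Item 3 above ties the 10-channel limit to the daily API quota. A rough way to reason about that budget is sketched below. It assumes the default allocation of 10,000 units per day and a cost of 1 unit per list request (channels, playlistItems, videos, commentThreads). The call pattern and the `estimate_units` helper are hypothetical, so the project's actual requests may differ.

```python
import math

DAILY_QUOTA_UNITS = 10_000  # assumed default daily quota; a project may have a different value
LIST_CALL_COST = 1          # assumed cost of one channels/playlistItems/videos/commentThreads list call


def estimate_units(video_count: int, comment_pages_per_video: int = 1) -> int:
    """Estimate quota units needed to harvest one channel (hypothetical call pattern)."""
    channel_calls = 1                                   # one channels.list request
    playlist_pages = math.ceil(video_count / 50)        # uploads playlist, 50 items per page
    video_detail_calls = math.ceil(video_count / 50)    # videos.list, up to 50 IDs per request
    comment_calls = video_count * comment_pages_per_video
    total_calls = channel_calls + playlist_pages + video_detail_calls + comment_calls
    return total_calls * LIST_CALL_COST


units = estimate_units(video_count=500)
print(units)                       # 521
print(units <= DAILY_QUOTA_UNITS)  # True
```

Under these assumptions, comment requests dominate the cost, so channels with many videos consume the quota fastest.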

Developer Guide

  1. Tools required

    • Python

    • Visual Studio Code

    • MySQL Workbench

    • YouTube API key from the Google Developers Console

  2. Python libraries to install

    • google-api-python-client

    • pandas

    • mysql-connector-python

    • SQLAlchemy

    • streamlit

    • plotly-express

    • isodate

  3. Modules to import

    a. Google API Library

       • import googleapiclient.discovery

    b. ISO Date Library to convert an ISO time string into a time object

       • import isodate

    c. Pandas Library

       • import pandas as pd

    d. MySQL and SQLAlchemy Libraries

       • from mysql.connector import connect

       • from urllib.parse import quote

       • from sqlalchemy import create_engine

       • import sqlalchemy

    e. UI Dashboard Libraries

       • import streamlit as st

       • import plotly.express as px
    
  4. Process

    • Using the YouTube API with a key from the Google Developers Console, extract the required channel, video, and comment data for a YouTube channel

    • Store the extracted data temporarily in a pandas DataFrame

    • Migrate the data to a SQL database and store it there

    • Use SQLAlchemy to query the database according to the question the user selects

    • Create a Streamlit dashboard where the user can enter a channel ID and get the data

    • Display the analyzed data using Streamlit's data visualization tools
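
The steps above can be sketched for channel-level data as follows. This is an illustration under stated assumptions, not the project's actual code. The function names, the `channels` table, its column names, and `SAMPLE_ITEM` are all hypothetical. The demo at the bottom uses an in-memory SQLite connection only so that it runs without a MySQL server or an API key; in the project, `con` would be a SQLAlchemy engine pointing at MySQL.

```python
import sqlite3  # demo stand-in only; the project itself stores data in MySQL

import pandas as pd

# Made-up sample shaped like one item of a channels.list response.
SAMPLE_ITEM = {
    "id": "UC_example_id",
    "snippet": {"title": "Example Channel"},
    "statistics": {"subscriberCount": "1200", "viewCount": "34567", "videoCount": "42"},
    "contentDetails": {"relatedPlaylists": {"uploads": "UU_example_id"}},
}


def fetch_channel_item(api_key: str, channel_id: str) -> dict:
    """Request one channel from the YouTube Data API and return its response item."""
    # Imported lazily so the offline demo below runs without the client installed.
    import googleapiclient.discovery

    youtube = googleapiclient.discovery.build("youtube", "v3", developerKey=api_key)
    response = youtube.channels().list(
        part="snippet,statistics,contentDetails", id=channel_id
    ).execute()
    return response["items"][0]


def flatten_channel_item(item: dict) -> dict:
    """Turn the nested API item into one flat row (column names are assumptions)."""
    stats = item["statistics"]
    return {
        "channel_id": item["id"],
        "channel_name": item["snippet"]["title"],
        # The API returns counts as strings; subscriberCount can be absent if hidden.
        "subscribers": int(stats.get("subscriberCount", 0)),
        "views": int(stats.get("viewCount", 0)),
        "videos": int(stats.get("videoCount", 0)),
        "uploads_playlist": item["contentDetails"]["relatedPlaylists"]["uploads"],
    }


def migrate_channels(df: pd.DataFrame, con) -> None:
    """Append the rows to a 'channels' table; con is an engine or DB connection."""
    df.to_sql("channels", con, if_exists="append", index=False)


def top_channels_by_views(con, limit: int = 10) -> pd.DataFrame:
    """Answer one analysis question: which stored channels have the most views?"""
    query = (
        "SELECT channel_name, views FROM channels "
        f"ORDER BY views DESC LIMIT {int(limit)}"
    )
    return pd.read_sql_query(query, con)


if __name__ == "__main__":
    # Offline demo: replace SAMPLE_ITEM with fetch_channel_item(api_key, channel_id).
    frame = pd.DataFrame([flatten_channel_item(SAMPLE_ITEM)])
    demo_con = sqlite3.connect(":memory:")
    migrate_channels(frame, demo_con)
    print(top_channels_by_views(demo_con))
```

In a Streamlit page, the DataFrame returned by `top_channels_by_views` could be shown with `st.dataframe` or passed to a `plotly.express` chart.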
    

NOTE:

I have created a multipage Streamlit app. Except for menu.py, all files should be in a folder named "pages" under the directory where menu.py is saved.
