# Data Extraction

## Step 1: Data is extracted using the YouTube Data API in Google Apps Script and exported as an Excel sheet.
// Code for keyword search (JavaScript, Google Apps Script)

function YouTubeData() {
  var spreadSheet = SpreadsheetApp.getActiveSpreadsheet();
  var activeSheet = spreadSheet.getActiveSheet();

  // Search for videos only: without type "video" the results can include
  // channels and playlists, which have no videoId.
  var search = YouTube.Search.list("snippet,id", {q: "Data Science", type: "video", maxResults: 50});
  var videoIds = search.items.map(item => item.id.videoId).join(",");

  // Fetch per-video statistics, then the details of the channels that own those videos.
  var videoDetails = YouTube.Videos.list("snippet,statistics", {id: videoIds});
  var channelDetails = YouTube.Channels.list("snippet,statistics", {id: videoDetails.items.map(item => item.snippet.channelId).join(",")});

  // Build one row per video by joining it with its channel.
  var data = videoDetails.items.map(video => {
    var channel = channelDetails.items.find(channel => channel.id === video.snippet.channelId);
    return [video.id, video.snippet.title, channel.snippet.title, video.statistics.viewCount, video.statistics.likeCount, video.statistics.commentCount, channel.statistics.subscriberCount];
  });

  // Write the rows starting at row 2, which leaves row 1 free for column headers.
  activeSheet.getRange(2, 1, data.length, data[0].length).setValues(data);
}


This Google Apps Script uses the YouTube Data API to gather information about videos related to "Data Science". It searches for videos, collects their video IDs, and then retrieves each video's title, channel name, view count, like count, and comment count, along with the channel's subscriber count. The retrieved information is then written to a Google Sheet.
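The join between videos and channels in the script can also be sketched in Python, which is the language used for the rest of this analysis. This is only an illustration of the flattening step, not the script that produced the data: the `build_rows` helper, the `channel-a`/`channel-b` IDs, and the trimmed response dictionaries are assumptions made up for the example, and the second video's like count is deliberately left out to show how a missing field could be handled.

```python
# Hypothetical, trimmed-down API responses holding only the fields the script reads.
# The channel IDs are placeholders, not real YouTube channel IDs.
videos = [
    {
        "id": "RBSUwFGa6Fk",
        "snippet": {"title": "What is Data Science?", "channelId": "channel-a"},
        "statistics": {"viewCount": "60188", "likeCount": "1990", "commentCount": "47"},
    },
    {
        "id": "t6CD1EwU5kc",
        "snippet": {"title": "How I Would NOT Learn Data Science in 2023.", "channelId": "channel-b"},
        # likeCount removed on purpose to mimic a response with a missing field.
        "statistics": {"viewCount": "65164", "commentCount": "195"},
    },
]
channels = [
    {"id": "channel-a", "snippet": {"title": "IBM Technology"}, "statistics": {"subscriberCount": "379000"}},
    {"id": "channel-b", "snippet": {"title": "Ken Jee"}, "statistics": {"subscriberCount": "235000"}},
]


def to_int(value):
    """Convert a count given as a string to int; keep None when the field is missing."""
    return int(value) if value is not None else None


def build_rows(videos, channels):
    """Join each video with its channel and flatten the pair into one row."""
    channel_by_id = {channel["id"]: channel for channel in channels}
    rows = []
    for video in videos:
        channel = channel_by_id.get(video["snippet"]["channelId"], {})
        video_stats = video.get("statistics", {})
        channel_stats = channel.get("statistics", {})
        rows.append([
            video["id"],
            video["snippet"]["title"],
            channel.get("snippet", {}).get("title"),
            to_int(video_stats.get("viewCount")),
            to_int(video_stats.get("likeCount")),
            to_int(video_stats.get("commentCount")),
            to_int(channel_stats.get("subscriberCount")),
        ])
    return rows


rows = build_rows(videos, channels)
for row in rows:
    print(row)
```

Looking channels up through a dictionary avoids scanning the channel list once per video, and using `.get` keeps a missing statistic from stopping the whole export.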


# Data Cleaning and Processing

In [40]:
import pandas as pd
import numpy as np

# Visualization
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker


In [41]:
ds = pd.read_excel(r'C:\Users\Rizwanaa\OneDrive\Desktop\Data_Science.xlsx')

In [42]:
ds.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 7 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Video_ID          50 non-null     object
 1   Channel_Ttile     50 non-null     object
 2   Channel_Name      50 non-null     object
 3   View_Count        50 non-null     int64 
 4   Like_Count        50 non-null     int64 
 5   Comment_Count     50 non-null     int64 
 6   Subscriber_Count  50 non-null     int64 
dtypes: int64(4), object(3)
memory usage: 2.9+ KB


In [43]:
ds.isnull().any()

Video_ID            False
Channel_Ttile       False
Channel_Name        False
View_Count          False
Like_Count          False
Comment_Count       False
Subscriber_Count    False
dtype: bool

In [44]:
ds = pd.DataFrame(ds)  # redundant: pd.read_excel already returns a DataFrame

In [45]:
ds.tail()

| | Video_ID | Channel_Ttile | Channel_Name | View_Count | Like_Count | Comment_Count | Subscriber_Count |
|---|---|---|---|---|---|---|---|
| 45 | Z79AqDouS-Y | How I'd Learn Data Science In 2023 (If I Could... | Data Nash | 3139 | 233 | 35 | 4150 |
| 46 | T08eJt9DlgU | Data Scientist vs Data Analyst - Which Is Righ... | CareerFoundry | 10870 | 372 | 12 | 158000 |
| 47 | pLon_Mit7sk | day in the life of a Business Analyst at Spoti... | Lillian Chiu | 246153 | 6975 | 246 | 173000 |
| 48 | HqoLnQ0X-F8 | Full Data Science Roadmap for Beginner | Ayush Singh | 47227 | 1889 | 117 | 35900 |
| 49 | qrhRfPY4F4w | The most important skills of data scientists \|... | TEDx Talks | 250949 | 3889 | 94 | 37500000 |


In [46]:
ds.shape

(50, 7)

In [47]:
ds.size

350

In [48]:
ds.columns

Index(['Video_ID ', 'Channel_Ttile', 'Channel_Name', 'View_Count',
       'Like_Count', 'Comment_Count', 'Subscriber_Count'],
      dtype='object')
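The `ds.columns` output shows a trailing space in `'Video_ID '` and a misspelled `Channel_Ttile` label, either of which can cause a `KeyError` in later column lookups. A minimal cleanup sketch follows. It uses a one-row stand-in frame rather than the real spreadsheet, and the new name `Video_Title` is an assumption based on the fact that the column holds video titles.

```python
import pandas as pd

# One-row stand-in with the same column labels as the ds.columns output,
# including the trailing space in 'Video_ID ' and the 'Channel_Ttile' typo.
ds = pd.DataFrame(
    [["RBSUwFGa6Fk", "What is Data Science?", "IBM Technology", 60188, 1990, 47, 379000]],
    columns=["Video_ID ", "Channel_Ttile", "Channel_Name", "View_Count",
             "Like_Count", "Comment_Count", "Subscriber_Count"],
)

# Remove leading and trailing whitespace from every column label.
ds.columns = ds.columns.str.strip()

# Rename the misspelled column; 'Video_Title' is an assumed, more descriptive name.
ds = ds.rename(columns={"Channel_Ttile": "Video_Title"})

print(list(ds.columns))
```

After this step the columns can be addressed as `ds["Video_ID"]` and `ds["Video_Title"]` without having to remember the stray space or the typo.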

In [49]:
len(ds)

50

In [50]:
ds

| | Video_ID | Channel_Ttile | Channel_Name | View_Count | Like_Count | Comment_Count | Subscriber_Count |
|---|---|---|---|---|---|---|---|
| 0 | X3paOmcrTjQ | Data Science In 5 Minutes \| Data Science For B... | Simplilearn | 3231797 | 47521 | 1085 | 2800000 |
| 1 | RBSUwFGa6Fk | What is Data Science? | IBM Technology | 60188 | 1990 | 47 | 379000 |
| 2 | ua-CiDNNj30 | Learn Data Science Tutorial - Full Course for ... | freeCodeCamp.org | 2504045 | 57663 | 945 | 7180000 |
| 3 | xC-c7E5PK0Y | What REALLY is Data Science? Told by a Data Sc... | Joma Tech | 3277329 | 126792 | 3645 | 2080000 |
| 4 | -ETQ97mXXF0 | Data Science Full Course - Learn Data Science ... | edureka! | 2992711 | 63014 | 673 | 3690000 |
| 5 | ZWgRvW8d_N4 | How I Would Learn Data Science in 2023? (If I ... | Sundas Khalid | 57616 | 2575 | 124 | 109000 |
| 6 | t6CD1EwU5kc | How I Would NOT Learn Data Science in 2023. | Ken Jee | 65164 | 2509 | 195 | 235000 |
| 7 | pn0PUY0jwGQ | The Harsh Reality of Being a Data Scientist | Sundas Khalid | 364605 | 6944 | 965 | 109000 |
| 8 | 0w9BAIVGuCA | How I Would Learn Data Science in 2023 (if I h... | Internet Made Coder | 7642 | 462 | 43 | 208000 |
| 9 | ho9vNL4MYZ8 | Top Courses to Learn Data Science Skills FAST! | Thu Vu data analytics | 88502 | 4886 | 137 | 118000 |


# Analysis of the Extracted Data

This section compares the Like_Count, View_Count, and Subscriber_Count attributes using Python visualization libraries.
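One way to make the three attributes comparable is to turn them into ratios before plotting. The sketch below is an assumption about how such a comparison could look, not the analysis actually run on the full sheet: it uses only the five rows shown in the `ds.tail()` output, and the derived columns `Likes_Per_100_Views` and `Views_Per_Subscriber` as well as the `plot_likes_vs_views` helper are names introduced here for illustration.

```python
import pandas as pd

# Five rows copied from the ds.tail() output; the real analysis would use all 50 rows.
ds = pd.DataFrame({
    "Channel_Name": ["Data Nash", "CareerFoundry", "Lillian Chiu", "Ayush Singh", "TEDx Talks"],
    "View_Count": [3139, 10870, 246153, 47227, 250949],
    "Like_Count": [233, 372, 6975, 1889, 3889],
    "Subscriber_Count": [4150, 158000, 173000, 35900, 37500000],
})

# Likes per 100 views: a rough engagement measure that does not depend on video reach.
ds["Likes_Per_100_Views"] = (100 * ds["Like_Count"] / ds["View_Count"]).round(2)

# Views per subscriber: how far a video travelled relative to the channel's size.
ds["Views_Per_Subscriber"] = (ds["View_Count"] / ds["Subscriber_Count"]).round(3)

# Rank videos from most to least engaging.
ranked = ds.sort_values("Likes_Per_100_Views", ascending=False)
print(ranked[["Channel_Name", "Likes_Per_100_Views", "Views_Per_Subscriber"]])


def plot_likes_vs_views(frame):
    """Scatter plot of likes against views on log axes (requires matplotlib and seaborn)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.scatterplot(data=frame, x="View_Count", y="Like_Count", hue="Channel_Name")
    # Log scales keep small and very large channels readable on the same plot.
    ax.set(xscale="log", yscale="log")
    plt.show()
```

Calling `plot_likes_vs_views(ds)` draws the scatter plot. The plotting imports sit inside the function so the ratio calculations can run even where the plotting libraries are not installed.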