### YOUTUBE ANALYTICS DATA NOTEBOOK

This notebook analyzes and pre-processes the data received from the YouTube API. It walks through an ETL pipeline that cleans and flattens the data so that it can be used for further analysis in our Streamlit dashboard.

In [2]:
# Import pandas and json
import pandas as pd
import json

In [3]:
# Load the JSON data
with open('output.json', 'r', encoding='utf-8') as json_file:
    data = json.load(json_file)

# Extract the "data" part and create a DataFrame
video_data = data['data']
df = pd.DataFrame(video_data)

In [4]:
# View the first 5 rows
df.head()

Unnamed: 0,type,videoId,title,channelTitle,channelId,channelThumbnail,description,viewCount,publishedTimeText,publishDate,publishedAt,lengthText,thumbnail,richThumbnail
0,video,sOZrJUkCLYU,Crazy Moment #shorts #crazy #funny #viral,boxtoxtv,UCCfKlFlKYBxZ-UU2dWc17IQ,[{'url': 'https://yt3.ggpht.com/ZyLe6_qIkBG96J...,,706727072,11 months ago,2022-11-12,2022-11-12T00:00:00Z,0:20,[{'url': 'https://i.ytimg.com/vi/sOZrJUkCLYU/h...,[{'url': 'https://i.ytimg.com/an_webp/sOZrJUkC...
1,video,I_ZJ4wYxKw4,It was the worst dream😱😭 #shorts #funny #viral,CuRe 구래,UCkLNHZZsC3LKdUAroKNwOOw,[{'url': 'https://yt3.ggpht.com/fi_rOiP7BWJIWq...,,105426597,2 months ago,2023-08-12,2023-08-12T00:00:00Z,0:09,[{'url': 'https://i.ytimg.com/vi/I_ZJ4wYxKw4/h...,
2,video,eHUPUpjjmnw,When you’re broke 😅 #ytshorts #shorts #funny #...,Sadiq Ahmed Vines,UCq8K2QcYKGFZXSkgJk1AvWA,[{'url': 'https://yt3.ggpht.com/4yk82mJVMKSNnn...,,53247719,4 months ago,2023-06-12,2023-06-12T00:00:00Z,0:44,[{'url': 'https://i.ytimg.com/vi/eHUPUpjjmnw/h...,[{'url': 'https://i.ytimg.com/an_webp/eHUPUpjj...
3,video,F_9PnhbCcUM,Testing 100 Viral Food Hacks,Nick DiGiovanni,UCMyOj6fhvKFMjxUCp3b_3gA,[{'url': 'https://yt3.ggpht.com/flQNxMIkGvrAau...,,13424854,1 year ago,2022-10-12,2022-10-12T00:00:00Z,21:53,[{'url': 'https://i.ytimg.com/vi/F_9PnhbCcUM/h...,[{'url': 'https://i.ytimg.com/an_webp/F_9PnhbC...
4,video,e8qgMiCmuWY,Meet the 6-year-old whose morning routine is g...,TODAY,UChDKyKQ59fYz3JO2fl0Z6sg,[{'url': 'https://yt3.ggpht.com/ytc/APkrFKZRuc...,,2697882,4 months ago,2023-06-12,2023-06-12T00:00:00Z,6:18,[{'url': 'https://i.ytimg.com/vi/e8qgMiCmuWY/h...,[{'url': 'https://i.ytimg.com/an_webp/e8qgMiCm...


### Create Helper Function to Process Data

In [5]:
# Check the type of video_data
type(video_data)

list
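Since `video_data` is a list of records, the nested fields can be detected from the data rather than listed by hand. The following is a minimal sketch, assuming each record is a dict; the `find_nested_fields` name and the sample records are hypothetical and only mimic the shape of the output shown earlier.

```python
def find_nested_fields(records):
    """Return the sorted names of fields holding a list or dict in any record."""
    nested = set()
    for record in records:
        for key, value in record.items():
            if isinstance(value, (list, dict)):
                nested.add(key)
    return sorted(nested)


# Hypothetical sample shaped like the API records (not real data)
sample_records = [
    {"videoId": "a1", "thumbnail": [{"url": "u", "width": 168, "height": 94}], "richThumbnail": None},
    {"videoId": "b2", "thumbnail": [{"url": "v", "width": 168, "height": 94}], "richThumbnail": [{"url": "w"}]},
]
print(find_nested_fields(sample_records))
```

Running the same function on `video_data` would be one way to confirm which fields need flattening.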

In [30]:
# Nested (list-valued) fields, listed for reference; the loop below detects lists itself
nested_values = ['channelThumbnail', 'thumbnail', 'richThumbnail']
# Fields to leave out of the flattened DataFrame
skip_values = ['description', 'type', 'richThumbnail']

# Initialize a dictionary to store flattened data
flattened_data = {}

# Loop through each video
for idx, value in enumerate(video_data):
    flattened_data[idx] = {}
    # Loop through each property in each video
    for prop_idx, prop_value in value.items():
        # Skip unwanted properties before doing any work on them
        if prop_idx in skip_values:
            continue
        # Nested values arrive as a list of dicts; keep only the first entry
        if isinstance(prop_value, list):
            # Guard against empty lists, which would otherwise raise an IndexError
            if prop_value and isinstance(prop_value[0], dict):
                for nested_idx, nested_value in prop_value[0].items():
                    flattened_data[idx][prop_idx + '_' + nested_idx] = nested_value
        # If it's not a list, add it directly
        else:
            flattened_data[idx][prop_idx] = prop_value
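The flattening logic can also be wrapped in a reusable helper, in keeping with this section's heading. This is a sketch under assumptions, not the notebook's own implementation: the `flatten_video` name, its `skip` parameter, and the example record are hypothetical, and only the first element of each list is kept, as in the loop above.

```python
def flatten_video(video, skip=("description", "type", "richThumbnail")):
    """Flatten one video record; only the first element of each list is kept."""
    flat = {}
    for key, value in video.items():
        if key in skip:
            continue
        if isinstance(value, list):
            # Guard against empty lists and non-dict first elements
            first = value[0] if value and isinstance(value[0], dict) else {}
            for nested_key, nested_value in first.items():
                flat[f"{key}_{nested_key}"] = nested_value
        else:
            flat[key] = value
    return flat


# Hypothetical record mirroring the fields shown earlier (not real data)
example = {
    "type": "video",
    "videoId": "abc123",
    "viewCount": "42",
    "thumbnail": [{"url": "https://example.com/t.jpg", "width": 168, "height": 94}],
    "richThumbnail": None,
}
flat_example = flatten_video(example)
print(flat_example)
```

With such a helper, `pd.DataFrame([flatten_video(v) for v in video_data])` would be one way to build the flattened table in a single expression.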

In [31]:
type(flattened_data)

dict

In [34]:
# Create a DataFrame from the flattened data
flattened_df = pd.DataFrame.from_dict(flattened_data, orient='index')
flattened_df

Unnamed: 0,videoId,title,channelTitle,channelId,channelThumbnail_url,channelThumbnail_width,channelThumbnail_height,viewCount,publishedTimeText,publishDate,publishedAt,lengthText,thumbnail_url,thumbnail_width,thumbnail_height
0,sOZrJUkCLYU,Crazy Moment #shorts #crazy #funny #viral,boxtoxtv,UCCfKlFlKYBxZ-UU2dWc17IQ,https://yt3.ggpht.com/ZyLe6_qIkBG96JOzSTDxi58k...,68,68,706727072,11 months ago,2022-11-12,2022-11-12T00:00:00Z,0:20,https://i.ytimg.com/vi/sOZrJUkCLYU/hq2.jpg?sqp...,168,94
1,I_ZJ4wYxKw4,It was the worst dream😱😭 #shorts #funny #viral,CuRe 구래,UCkLNHZZsC3LKdUAroKNwOOw,https://yt3.ggpht.com/fi_rOiP7BWJIWqLTf6rpW93Y...,68,68,105426597,2 months ago,2023-08-12,2023-08-12T00:00:00Z,0:09,https://i.ytimg.com/vi/I_ZJ4wYxKw4/hq2.jpg?sqp...,168,94
2,eHUPUpjjmnw,When you’re broke 😅 #ytshorts #shorts #funny #...,Sadiq Ahmed Vines,UCq8K2QcYKGFZXSkgJk1AvWA,https://yt3.ggpht.com/4yk82mJVMKSNnn5--NqI6u4A...,68,68,53247719,4 months ago,2023-06-12,2023-06-12T00:00:00Z,0:44,https://i.ytimg.com/vi/eHUPUpjjmnw/hq2.jpg?sqp...,168,94
3,F_9PnhbCcUM,Testing 100 Viral Food Hacks,Nick DiGiovanni,UCMyOj6fhvKFMjxUCp3b_3gA,https://yt3.ggpht.com/flQNxMIkGvrAaublvNUp1l0M...,68,68,13424854,1 year ago,2022-10-12,2022-10-12T00:00:00Z,21:53,https://i.ytimg.com/vi/F_9PnhbCcUM/hqdefault.j...,168,94
4,e8qgMiCmuWY,Meet the 6-year-old whose morning routine is g...,TODAY,UChDKyKQ59fYz3JO2fl0Z6sg,https://yt3.ggpht.com/ytc/APkrFKZRucSXlbwsWWP1...,68,68,2697882,4 months ago,2023-06-12,2023-06-12T00:00:00Z,6:18,https://i.ytimg.com/vi/e8qgMiCmuWY/hqdefault.j...,168,94
5,Q-AIG7wGfqk,"???: It is 9,999,999$ sir🤑 #shorts #funny #viral",CuRe 구래,UCkLNHZZsC3LKdUAroKNwOOw,https://yt3.ggpht.com/fi_rOiP7BWJIWqLTf6rpW93Y...,68,68,417438190,11 months ago,2022-11-12,2022-11-12T00:00:00Z,0:28,https://i.ytimg.com/vi/Q-AIG7wGfqk/hq2.jpg?sqp...,168,94
6,5fqvOx-2dVg,Bhai Ko Dekho 😱😅 #shorts #funny #viral,Vikram Singh Fitness,UCBIyn-ZVn6YhkVWF0L1lvcg,https://yt3.ggpht.com/J3v-XxSgFJhL0I3ow9Ta5IyM...,68,68,72186953,4 months ago,2023-06-12,2023-06-12T00:00:00Z,0:22,https://i.ytimg.com/vi/5fqvOx-2dVg/hq2.jpg?sqp...,168,94
7,Z2OvfutQAIY,POV:-HUMAN VS ANIMAL 👊👻|| hanuman | #hanumanji...,DRAZNOX,UCE_firCoOwAAjU9Rva5AmfQ,https://yt3.ggpht.com/MkIEyDd4MnyHFHVuWD2dVAbX...,68,68,210490747,4 months ago,2023-06-12,2023-06-12T00:00:00Z,0:19,https://i.ytimg.com/vi/Z2OvfutQAIY/hqdefault.j...,168,94
8,oVuBKvE7xVw,independence day #independenceday #india #indi...,Shrija Art Gallery,UCeyVihJyOANl9qGSVFYV59Q,https://yt3.ggpht.com/M5RLUyHJ_yUvvZDrAyi3UHUS...,68,68,41536059,4 months ago,2023-06-12,2023-06-12T00:00:00Z,0:15,https://i.ytimg.com/vi/oVuBKvE7xVw/hqdefault.j...,168,94
9,pwKzDG_8Rfw,Every couples be like.. 😱😂 #shorts #funny #viral,CuRe 구래,UCkLNHZZsC3LKdUAroKNwOOw,https://yt3.ggpht.com/fi_rOiP7BWJIWqLTf6rpW93Y...,68,68,54859313,7 months ago,2023-03-12,2023-03-12T00:00:00Z,0:19,https://i.ytimg.com/vi/pwKzDG_8Rfw/hq2.jpg?sqp...,168,94


In [35]:
# Rename column names for standardization

# Define a dictionary to map the old column names to the new ones
column_mapping = {
    'videoId': 'Video ID',
    'title': 'Title',
    'channelTitle': 'Channel Title',
    'channelId': 'Channel ID',
    'channelThumbnail_url': 'Channel Thumbnail URL',
    'channelThumbnail_width': 'Channel Thumbnail Width',
    'channelThumbnail_height': 'Channel Thumbnail Height',
    'viewCount': 'View Count',
    'publishedTimeText': 'Published Time Text',
    'publishDate': 'Publish Date',
    'publishedAt': 'Published At',
    'lengthText': 'Video Length Text',
    'thumbnail_url': 'Thumbnail URL',
    'thumbnail_width': 'Thumbnail Width',
    'thumbnail_height': 'Thumbnail Height'
}

# Use the rename method to apply the column name changes
flattened_df = flattened_df.rename(columns=column_mapping)
flattened_df

Unnamed: 0,Video ID,Title,Channel Title,Channel ID,Channel Thumbnail URL,Channel Thumbnail Width,Channel Thumbnail Height,View Count,Published Time Text,Publish Date,Published At,Video Length Text,Thumbnail URL,Thumbnail Width,Thumbnail Height
0,sOZrJUkCLYU,Crazy Moment #shorts #crazy #funny #viral,boxtoxtv,UCCfKlFlKYBxZ-UU2dWc17IQ,https://yt3.ggpht.com/ZyLe6_qIkBG96JOzSTDxi58k...,68,68,706727072,11 months ago,2022-11-12,2022-11-12T00:00:00Z,0:20,https://i.ytimg.com/vi/sOZrJUkCLYU/hq2.jpg?sqp...,168,94
1,I_ZJ4wYxKw4,It was the worst dream😱😭 #shorts #funny #viral,CuRe 구래,UCkLNHZZsC3LKdUAroKNwOOw,https://yt3.ggpht.com/fi_rOiP7BWJIWqLTf6rpW93Y...,68,68,105426597,2 months ago,2023-08-12,2023-08-12T00:00:00Z,0:09,https://i.ytimg.com/vi/I_ZJ4wYxKw4/hq2.jpg?sqp...,168,94
2,eHUPUpjjmnw,When you’re broke 😅 #ytshorts #shorts #funny #...,Sadiq Ahmed Vines,UCq8K2QcYKGFZXSkgJk1AvWA,https://yt3.ggpht.com/4yk82mJVMKSNnn5--NqI6u4A...,68,68,53247719,4 months ago,2023-06-12,2023-06-12T00:00:00Z,0:44,https://i.ytimg.com/vi/eHUPUpjjmnw/hq2.jpg?sqp...,168,94
3,F_9PnhbCcUM,Testing 100 Viral Food Hacks,Nick DiGiovanni,UCMyOj6fhvKFMjxUCp3b_3gA,https://yt3.ggpht.com/flQNxMIkGvrAaublvNUp1l0M...,68,68,13424854,1 year ago,2022-10-12,2022-10-12T00:00:00Z,21:53,https://i.ytimg.com/vi/F_9PnhbCcUM/hqdefault.j...,168,94
4,e8qgMiCmuWY,Meet the 6-year-old whose morning routine is g...,TODAY,UChDKyKQ59fYz3JO2fl0Z6sg,https://yt3.ggpht.com/ytc/APkrFKZRucSXlbwsWWP1...,68,68,2697882,4 months ago,2023-06-12,2023-06-12T00:00:00Z,6:18,https://i.ytimg.com/vi/e8qgMiCmuWY/hqdefault.j...,168,94
5,Q-AIG7wGfqk,"???: It is 9,999,999$ sir🤑 #shorts #funny #viral",CuRe 구래,UCkLNHZZsC3LKdUAroKNwOOw,https://yt3.ggpht.com/fi_rOiP7BWJIWqLTf6rpW93Y...,68,68,417438190,11 months ago,2022-11-12,2022-11-12T00:00:00Z,0:28,https://i.ytimg.com/vi/Q-AIG7wGfqk/hq2.jpg?sqp...,168,94
6,5fqvOx-2dVg,Bhai Ko Dekho 😱😅 #shorts #funny #viral,Vikram Singh Fitness,UCBIyn-ZVn6YhkVWF0L1lvcg,https://yt3.ggpht.com/J3v-XxSgFJhL0I3ow9Ta5IyM...,68,68,72186953,4 months ago,2023-06-12,2023-06-12T00:00:00Z,0:22,https://i.ytimg.com/vi/5fqvOx-2dVg/hq2.jpg?sqp...,168,94
7,Z2OvfutQAIY,POV:-HUMAN VS ANIMAL 👊👻|| hanuman | #hanumanji...,DRAZNOX,UCE_firCoOwAAjU9Rva5AmfQ,https://yt3.ggpht.com/MkIEyDd4MnyHFHVuWD2dVAbX...,68,68,210490747,4 months ago,2023-06-12,2023-06-12T00:00:00Z,0:19,https://i.ytimg.com/vi/Z2OvfutQAIY/hqdefault.j...,168,94
8,oVuBKvE7xVw,independence day #independenceday #india #indi...,Shrija Art Gallery,UCeyVihJyOANl9qGSVFYV59Q,https://yt3.ggpht.com/M5RLUyHJ_yUvvZDrAyi3UHUS...,68,68,41536059,4 months ago,2023-06-12,2023-06-12T00:00:00Z,0:15,https://i.ytimg.com/vi/oVuBKvE7xVw/hqdefault.j...,168,94
9,pwKzDG_8Rfw,Every couples be like.. 😱😂 #shorts #funny #viral,CuRe 구래,UCkLNHZZsC3LKdUAroKNwOOw,https://yt3.ggpht.com/fi_rOiP7BWJIWqLTf6rpW93Y...,68,68,54859313,7 months ago,2023-03-12,2023-03-12T00:00:00Z,0:19,https://i.ytimg.com/vi/pwKzDG_8Rfw/hq2.jpg?sqp...,168,94
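Before the table is used for analysis, the text columns may need proper types. The sketch below is one possible approach, assuming the view counts may arrive as strings and the length text is always `M:SS` or `H:MM:SS`; the `length_to_seconds` helper, the `Video Length Seconds` column, and the two-row sample are hypothetical additions for illustration.

```python
import pandas as pd


def length_to_seconds(text):
    """Convert 'M:SS' or 'H:MM:SS' text to a number of seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# Hypothetical two-row sample using the renamed columns
sample = pd.DataFrame({
    "View Count": ["706727072", "13424854"],
    "Published At": ["2022-11-12T00:00:00Z", "2022-10-12T00:00:00Z"],
    "Video Length Text": ["0:20", "21:53"],
})

# Parse counts as numbers and timestamps as timezone-aware datetimes
sample["View Count"] = pd.to_numeric(sample["View Count"], errors="coerce")
sample["Published At"] = pd.to_datetime(sample["Published At"], utc=True)
# Derive a numeric duration that is easier to aggregate than the text form
sample["Video Length Seconds"] = sample["Video Length Text"].map(length_to_seconds)
print(sample.dtypes)
```

Numeric and datetime columns of this kind are generally easier to sort, filter, and plot than their text equivalents.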


In [36]:
# Export the DataFrame to CSV (index=False omits the row index)
flattened_df.to_csv('youtube_analytics.csv', index=False)