# <center> Final Exam – YouTube-Watch-History </center>

### <center>Made by:</center>

**<center> Exam number S176512 </center>**
<center>Lasse Gustavo Strandbygaard (last24ag)</center>


<center> <img src="YouTube_Logo.jpeg"/> </center>

## Project Background


YouTube is a video-sharing platform where users upload, view, and share content. Its owner, Google, lets users export their personal data, including their YouTube watch history, via [Google Takeout](https://takeout.google.com/), giving them transparency and control over that data. This project builds a Flask website that displays insights into a YouTube watch history, demonstrating skills learned during the course. Users can download their own watch history through [Google Takeout](https://takeout.google.com/) and analyze their viewing habits with this program. Instructions are provided in Report.pdf and the README file.
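The Takeout export is a JSON array with one object per watched video. The sketch below flattens such records into the `title`, `time` and `channel_name` columns that appear in the cleaned data later in this notebook. It assumes the common Takeout layout in which the channel is the first entry of a `subtitles` list; the sample records and the `flatten_records` name are hypothetical, and the project's own parser (which lists `ijson` as a dependency, suggesting a streaming parse) may work differently.

```python
import json

# Hypothetical records shaped like typical Takeout watch-history.json entries.
# Real entries carry more fields; only the three used here are included.
SAMPLE = """
[
  {"title": "Watched Tame Impala - On Track (Acoustic Live)",
   "time": "2020-07-12T20:46:59.003Z",
   "subtitles": [{"name": "tameimpalaVEVO"}]},
  {"title": "Watched https://www.youtube.com/watch?v=VIDEO_ID",
   "time": "2020-07-12T20:48:08.113Z"}
]
"""


def flatten_records(records):
    """Turn raw Takeout entries into (title, time, channel_name) rows."""
    rows = []
    for entry in records:
        subtitles = entry.get("subtitles") or []
        # Entries without a 'subtitles' list get no channel name.
        channel = subtitles[0].get("name") if subtitles else None
        rows.append({
            "title": entry.get("title"),
            "time": entry.get("time"),
            "channel_name": channel,
        })
    return rows


rows = flatten_records(json.loads(SAMPLE))
for row in rows:
    print(row)
```

For large exports, a streaming parser avoids loading the whole file at once; `json.loads` is used here only to keep the sketch self-contained.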

## Executive Summary

The four visualizations reveal key insights into the user's YouTube habits. For my own watch history, used here as test data, the line plot shows recurring peaks in video consumption that tend to coincide with holidays and weekends. The heatmap highlights evenings as the prime viewing hours, especially on weekends. The seasonal table confirms summer and winter as the peak seasons for watching content.
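A heatmap like the one described counts views per weekday and hour. One detail matters for the "evenings" reading: Takeout timestamps end in `Z`, meaning they are in UTC, so they should be shifted to local time before bucketing. The sketch below assumes a fixed offset (UTC+1 is a placeholder, and daylight saving is ignored); `weekday_hour_counts` is a hypothetical name, not a function from `library.py`.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Fixed weekday labels (avoids locale-dependent strftime output).
WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def weekday_hour_counts(timestamps, utc_offset_hours=1):
    """Count views per (weekday, local hour) from Takeout-style UTC timestamps.

    utc_offset_hours=1 is an assumed placeholder (UTC+1, no daylight saving).
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    counts = Counter()
    for ts in timestamps:
        # fromisoformat before Python 3.11 rejects a trailing 'Z', so spell out the offset.
        moment = datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(local_tz)
        counts[(WEEKDAYS[moment.weekday()], moment.hour)] += 1
    return counts


sample = ["2024-02-18T15:16:21.549Z", "2024-02-18T15:02:17.549Z", "2024-02-18T23:30:00Z"]
print(weekday_hour_counts(sample))
```

The last sample timestamp shows why the conversion matters: 23:30 UTC on a Sunday becomes 00:30 on Monday at UTC+1, moving the view to a different cell of the heatmap.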

The topic modelling function reveals popular topics among the videos watched. With my data, the most common topic is music-related, featuring words such as *review* and *album* along with the names of several music artists. The word cloud visually reinforces these recurring themes: its largest terms are "*Album review*" and "*VS*". The bar plot of the most-watched channels reflects the same pattern, with a music reviewer as the top channel alongside several combat-sports channels such as UFC and Top Rank Boxing.
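The bar plot and the word cloud both rest on simple frequency counts. A minimal sketch of those counts, assuming records with `title` and `channel_name` fields and a deliberately tiny stop-word list (the helper names and the list are hypothetical, not the project's actual code):

```python
import re
from collections import Counter

# Deliberately tiny, illustrative stop-word list (a real one would be much longer).
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to", "with"}


def top_channels(rows, n=10):
    """Return the n most-watched channels, skipping rows without a channel."""
    return Counter(r["channel_name"] for r in rows if r.get("channel_name")).most_common(n)


def word_frequencies(titles):
    """Count lower-cased words across titles, dropping the 'Watched ' prefix and stop words."""
    counts = Counter()
    for title in titles:
        if title.startswith("Watched "):
            title = title[len("Watched "):]
        # \w+ is Unicode-aware, so words with letters such as æ, ø, å stay intact.
        for word in re.findall(r"\w+", title.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts
```

This counts single words only; a two-word term such as "Album review" would require counting adjacent word pairs as well.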

### **Installing necessary libraries**

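***WARNING***: 

Ensure the following libraries are installed before running the code: 

*pandas, numpy, matplotlib, seaborn, scikit-learn (used via sklearn.decomposition and sklearn.feature_extraction.text), wordcloud, ijson, flask*. 

The modules *calendar, os, json* and *threading* are also used, but they are part of the Python standard library and need no installation.

Missing libraries will cause errors when using app.py or library.py. Install them with `pip install pandas numpy matplotlib seaborn scikit-learn wordcloud ijson flask` before running the first cell below, since that cell already imports app.py and library.py.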
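`install_libraries` is imported from `app.py`, and its body is not shown here. A hypothetical equivalent, assuming pip is available for the running interpreter, checks each package with `importlib.util.find_spec` and installs only what is missing via `python -m pip`. Pip package names can differ from import names (`scikit-learn` is imported as `sklearn`), so a small mapping is needed.

```python
import importlib.util
import subprocess
import sys

# pip package name -> import name, only where the two differ
IMPORT_NAMES = {"scikit-learn": "sklearn"}


def find_missing(packages):
    """Return the pip package names whose top-level module cannot be found."""
    return [pkg for pkg in packages
            if importlib.util.find_spec(IMPORT_NAMES.get(pkg, pkg)) is None]


def install_missing(packages):
    """Install missing packages with the pip of the running interpreter; return what was missing."""
    missing = find_missing(packages)
    if missing:
        # Raises subprocess.CalledProcessError if pip exits with an error.
        subprocess.check_call([sys.executable, "-m", "pip", "install", *missing])
    return missing
```

Calling pip through `sys.executable -m pip` rather than a bare `pip` command makes the packages land in the same environment as the notebook kernel.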

In [1]:
```python
import os
from threading import Thread
from app import YouTubePlots, YouTubeApp, install_libraries
from library import YouTubeWrangler, YouTubeTextStats
```

In [2]:
```python
# List of required libraries
required_libraries = [
    "json", "ijson", "numpy", "pandas", "seaborn", "matplotlib", 
    "scikit-learn", "wordcloud", "threading"]

# Map library names to import names where different
library_mapping = {
    "json": "json",
    "ijson": "ijson",
    "numpy": "numpy",
    "pandas": "pandas",
    "seaborn": "seaborn",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "wordcloud": "wordcloud",
    "threading": "threading"
}
```

In [3]:
```python
# Install any missing libraries (install_libraries is defined in app.py)
install_libraries(required_libraries)

# Verify that each library can now be imported; report any that cannot
for lib in required_libraries:
    try:
        __import__(library_mapping[lib])
    except ImportError:
        print(f"Failed to install {lib}. Please install manually.")
```

```
json is already installed.
ijson is already installed.
numpy is already installed.
pandas is already installed.
seaborn is already installed.
matplotlib is already installed.
Installing scikit-learn...
wordcloud is already installed.
threading is already installed.

[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: pip install --upgrade pip
```


### **Creating images for website:**

In [4]:
```python
# Locate the JSON file in the current working directory
path = os.path.join(os.getcwd(), 'watch-history.json')

# Initialize the pipeline with the JSON file path
pipeline = YouTubePlots(path)

# Run the full pipeline (set save_visuals=True to save the images for the website)
pipeline.run_pipeline(save_visuals=True)
```

```
Saving plots to: /Users/lassestrandbygaard/Desktop/Python:SQL Projects/main/static
```
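`YouTubePlots` is given the path of `watch-history.json` in the current working directory. A small guard, sketched here with a hypothetical `find_watch_history` helper, can fail early with a readable message when the export has not been placed there yet, rather than leaving the error to whatever the pipeline raises:

```python
import os


def find_watch_history(folder=None, filename="watch-history.json"):
    """Return the path to the Takeout export, raising a readable error if it is missing."""
    folder = folder or os.getcwd()
    path = os.path.join(folder, filename)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"{filename} was not found in {folder}. Download your watch history "
            "from Google Takeout and place the file in this folder."
        )
    return path
```

Used as `path = find_watch_history()`, it would replace the `os.path.join` line in the cell above.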


### **Show data of your YouTube viewing history:**

In [5]:
```python
df = pipeline.get_cleaned_data()  # Access the cleaned DataFrame
df
```

| | title | time | channel_name |
|---|---|---|---|
| 0 | Watched Install Git on MacOS (Macbook M1, M1 M... | 2024-02-18T15:16:21.549Z | Code With Arjun |
| 1 | Watched Merab wrestled Bradley Martyn 😂 (via m... | 2024-02-18T15:02:17.549Z | ESPN MMA |
| 2 | Watched 5 Basic Essential Tactics for NOOBS! -... | 2024-02-18T15:01:03.263Z | Zerkovich |
| 4 | Watched Skaven Inventions in a Nutshell | 2024-02-18T15:00:47.482Z | Tarriff |
| 6 | Watched Ilia Topuria says featherweight divisi... | 2024-02-18T14:53:49.346Z | ESPN MMA |
| ... | ... | ... | ... |
| 48402 | Watched Bevis Arealfunktionen er en stamfunk... | 2020-07-12T20:57:18.686Z | Annepande |
| 48403 | Watched Tretrinsreglen - Bevis: Differentialkv... | 2020-07-12T20:51:34.640Z | KG MAT |
| 48404 | Watched https://www.youtube.com/watch?v=19FT6b... | 2020-07-12T20:48:08.113Z | |
| 48405 | Watched Tame Impala - On Track (Acoustic Live) | 2020-07-12T20:46:59.003Z | tameimpalaVEVO |
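The rows above show two quirks of the raw titles: every title starts with the prefix `Watched `, and some entries (such as row 48404) carry a video URL instead of a title and have no channel, presumably because the video is no longer available. A hypothetical `clean_title` helper that handles both cases might look like this; whether `get_cleaned_data` does the same is not shown.

```python
import re

# Matches a title that consists only of a URL (e.g. an unavailable video)
URL_ONLY = re.compile(r"https?://\S+")


def clean_title(raw_title):
    """Drop Takeout's 'Watched ' prefix; return None if only a URL remains."""
    title = raw_title[len("Watched "):] if raw_title.startswith("Watched ") else raw_title
    return None if URL_ONLY.fullmatch(title) else title
```

Returning `None` lets such entries be counted as watch events while keeping them out of title-based analyses such as the word cloud and topic modelling.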


### **Show general descriptive statistics:**

In [6]:
```python
seasons = YouTubeWrangler(df)
seasons.watch_stats()
```

| | Statistic | Time Period | Video Count |
|---|---|---|---|
| 0 | Year with least videos watched | 2024 | 2336 |
| 1 | Year with most videos watched | 2023 | 13917 |
| 2 | Month with least videos watched | Feb 2022 | 360 |
| 3 | Month with most videos watched | Jul 2022 | 1799 |
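Statistics like these can be derived by counting timestamps per year (`YYYY`) and per month (`YYYY-MM`). The sketch below assumes ISO-8601 strings as in the `time` column and buckets them in UTC without timezone conversion; `watch_extremes` is a hypothetical name. Raw counts favour fully covered periods: judging by the first and last rows of the data shown above, the export runs from July 2020 to February 2024, so 2020 and 2024 are partial years.

```python
from collections import Counter


def watch_extremes(timestamps):
    """Least- and most-watched year and month as (period, count) pairs.

    Periods are read straight from the ISO-8601 string, i.e. in UTC.
    """
    years = Counter(ts[:4] for ts in timestamps)    # "YYYY"
    months = Counter(ts[:7] for ts in timestamps)   # "YYYY-MM"

    def by_count(item):
        return item[1]

    return {
        "year_min": min(years.items(), key=by_count),
        "year_max": max(years.items(), key=by_count),
        "month_min": min(months.items(), key=by_count),
        "month_max": max(months.items(), key=by_count),
    }
```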


In [7]:
```python
seasons.top_seasons()
```

| | season | Videos Watched Count |
|---|---|---|
| 1 | Winter | 12423 |
| 2 | Summer | 12185 |
| 3 | Autumn | 11815 |
| 4 | Spring | 8576 |
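The season table groups months into seasons and counts videos per group. The sketch below assumes meteorological seasons for the Northern Hemisphere (December to February is winter, and so on); `library.py` may use a different mapping, and `season_counts` is a hypothetical name. Because the counts pool all years, a season that occurs fewer times within the export window contributes fewer months to its total.

```python
from collections import Counter

# Assumed mapping: meteorological seasons, Northern Hemisphere.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
}


def season_counts(timestamps):
    """(season, count) pairs, most-watched first; the month is read from 'YYYY-MM-...'."""
    return Counter(SEASON_BY_MONTH[int(ts[5:7])] for ts in timestamps).most_common()
```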


In [8]:
```python
# Initialize the YouTubeTextStats class
text_stats = YouTubeTextStats(df)

# Get the most common topics
text_stats.topic_modelling(num_topics=3)
```

```
Top Topics Identified:
Topic 1: review, album, jpegmafia, west, kanye, king, danny, brown, film, drake
Topic 2: vs, fight, ufc, highlights, crawford, usyk, joshua, spence, terence, jr
Topic 3: reaction, episode, time, official, group, trailer, naruto, avatar, arcane, shippuden
```
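The internals of `topic_modelling` are not shown, but the dependency list names `sklearn.feature_extraction.text` and `sklearn.decomposition`, which suggests a word-count vectorizer followed by a decomposition model such as LDA or NMF. In scikit-learn, fitted models of that kind expose a topic-by-word weight matrix as `components_`, and vectorizers return the matching vocabulary from `get_feature_names_out()`. The final step, turning that matrix into word lists like the ones printed above, can be sketched with plain lists (`top_words_per_topic` is a hypothetical helper):

```python
def top_words_per_topic(components, vocabulary, n_words=10):
    """For each topic's word-weight row, return the n highest-weighted words."""
    topics = []
    for weights in components:
        # Sort word indices by weight, largest first, and keep the first n.
        ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
        topics.append([vocabulary[i] for i in ranked[:n_words]])
    return topics
```

With scikit-learn, the call would be `top_words_per_topic(model.components_, vectorizer.get_feature_names_out())`; NumPy arrays work here because the function only uses `len` and indexing.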


### **Run Flask Website**

In [9]:
```python
# Initialize the Flask app
dashboard_app = YouTubeApp()

# Run the Flask app in a separate thread
flask_thread = Thread(target=dashboard_app.run, kwargs={'host': '127.0.0.1', 'port': 5000})
flask_thread.start()
print("Flask app is running! Open http://127.0.0.1:5000 in your browser.")
```

```
Flask app is running! Open http://127.0.0.1:5000 in your browser.
 * Serving Flask app 'app'
 * Debug mode: off
 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
127.0.0.1 - - [20/Dec/2024 05:45:41] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [20/Dec/2024 05:45:41] "GET /static/wordcloud.png HTTP/1.1" 200 -
127.0.0.1 - - [20/Dec/2024 05:45:41] "GET /static/top_channels.png HTTP/1.1" 200 -
127.0.0.1 - - [20/Dec/2024 05:45:41] "GET /static/heatmap.png HTTP/1.1" 200 -
127.0.0.1 - - [20/Dec/2024 05:45:41] "GET /static/lineplot.png HTTP/1.1" 200 -
```



**Now it is your turn! Go to [Google Takeout](https://takeout.google.com/) and download your YouTube data by following the README file!**