Ask Your Favorite YouTube Playlist

This web application allows users to ask questions about any YouTube playlist.

Overview

This application answers questions about any YouTube playlist or set of them.

The task is divided into two steps:

Information Retrieval: This step involves identifying relevant episodes, sections, or segments from the playlist or playlists that might contain the answer to the user's question. Techniques we plan to use include:

Pre-trained sentence transformers: Models like DistilBERT, MiniLM, or OpenAI's Ada can create sentence embeddings, which we use to measure the semantic similarity between the user's question and the playlist transcripts.

The following table summarizes the available embedding models:

Model Name	Model Type	Max Sequence Length
msmarco-MiniLM-L-6-v3	sentence-transformers	512
msmarco-distilbert-base-v4	sentence-transformers	512
msmarco-distilbert-base-tas-b	sentence-transformers	512
text-embedding-ada-002	openai	8191
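
As a rough illustration of the retrieval step, the sketch below ranks chunks by cosine similarity between a question embedding and chunk embeddings. The tiny 3-dimensional vectors are made-up stand-ins for real model output, and the helper names `cosine_similarity` and `top_k` are hypothetical, not taken from this repository. Cosine similarity is one common choice of score; the project may use a different one.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(
    question_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    k: int = 2,
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs of the k chunks most similar to the question."""
    scored = [
        (i, cosine_similarity(question_vec, vec))
        for i, vec in enumerate(chunk_vecs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings (assumption for illustration).
question_embedding = [1.0, 0.0, 1.0]
chunk_embeddings = [
    [0.9, 0.1, 0.8],  # points in nearly the same direction as the question
    [0.0, 1.0, 0.0],  # orthogonal to the question
    [0.5, 0.5, 0.5],  # partially aligned with the question
]

best = top_k(question_embedding, chunk_embeddings, k=2)
```

In a real run, the vectors would come from one of the models in the table, and `best` would hold the indices of the chunks passed on to the second step.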

Natural Language Understanding: Once the relevant portions of the dataset have been identified, a language model processes the user's question together with the relevant information to generate an appropriate answer. We explore two approaches:

Extractive question-answering: In this approach, the model is trained to identify and extract the exact answer from the relevant text. Models like BERT or RoBERTa, fine-tuned on a question-answering dataset such as SQuAD 2.0, serve this purpose.
Generative question-answering: This technique generates a human-like answer by paraphrasing or summarizing the relevant information. Models like GPT can be employed for this task. We use the OpenAI API to access powerful models such as GPT-3.5 or GPT-4, although locally hosted models such as GPT-2 can also be used.
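
For the generative approach, the retrieved text and the question have to be combined into a single input for the language model. The following is a minimal sketch of one way to do that. The function name `build_prompt`, the prompt wording, and the `max_chars` budget are all assumptions made for illustration and are not taken from this repository. No API call is made here.

```python
from typing import List


def build_prompt(question: str, chunks: List[str], max_chars: int = 2000) -> str:
    """Join retrieved chunks into a numbered context block, then append the question."""
    context_parts = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        entry = f"[{i}] {chunk}"
        if used + len(entry) > max_chars:
            break  # stop before the context exceeds the character budget
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    "What is the video about?",
    ["Hey there how are you", "Today we talk about embeddings"],
)
```

The resulting string would then be sent to whichever generative model is in use.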

🚀 Installation

Clone the repository.
Duplicate the .env.template file and rename it to .env.
Fill in the environment variables in the .env file.
Install Poetry and Python if you don't have them already.
Run poetry install to install the dependencies in a virtual environment.
Run poetry shell to activate the virtual environment.

Usage

After following the steps described in the Installation section, we can run the web application by executing the following command:

make run_app

To complete this task, we use the YouTube API to download the transcripts and timestamps of the videos in the playlist entered by the user. The transcripts and timestamps are stored in the $data/playlist_name/raw folder.

Here, playlist_name is the name of the playlist entered by the user.

Inside this folder, you will find Video_i.json files with the following structure:

{
    "title": "Title of the video",
    "video_id": "ID of the video",
    "transcript": [
        {
            "text": "Hey there",
            "start": 7.58,
            "duration": 6.13
        },
        {
            "text": "how are you",
            "start": 14.08,
            "duration": 7.58
        },
        # ...
    ]
}
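
To make the file layout concrete, here is a small sketch that writes one video in this structure to a raw folder and reads it back. The helpers `save_video` and `load_videos` are hypothetical names, and a temporary directory stands in for the project's data folder, so this is not necessarily how the repository itself stores the files.

```python
import json
import tempfile
from pathlib import Path


def save_video(raw_dir: Path, index: int, video: dict) -> Path:
    """Write one video's data to raw_dir/Video_<index>.json."""
    raw_dir.mkdir(parents=True, exist_ok=True)
    path = raw_dir / f"Video_{index}.json"
    path.write_text(json.dumps(video, indent=4), encoding="utf-8")
    return path


def load_videos(raw_dir: Path) -> list:
    """Read every Video_*.json file in raw_dir, sorted by file name."""
    return [
        json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(raw_dir.glob("Video_*.json"))
    ]


video = {
    "title": "Title of the video",
    "video_id": "ID of the video",
    "transcript": [
        {"text": "Hey there", "start": 7.58, "duration": 6.13},
        {"text": "how are you", "start": 14.08, "duration": 7.58},
    ],
}

# A temporary directory stands in for the data folder (assumption).
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / "playlist_name" / "raw"
    saved_path = save_video(raw_dir, 0, video)
    loaded = load_videos(raw_dir)
```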

Next, we build chunks from that data. Because the raw transcript is split into many short segments, we merge consecutive segments into larger chunks. We can define the maximum length of each chunk and the overlap between chunks depending on our needs. We also add the thumbnail and a link to the video at the chunk's timestamp:

[
    {
        "text": "Hey there how are you... more text until reach the max number of characters.",
        "start": 7.58,
        "duration": 34.08,
        "url": "https://www.youtube.com/watch?v=...",
        "title": "Title of the video",
        "thumbnail": "https://i.ytimg.com/vi/..."
    },
    # ...
]
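
The merging step could look roughly like the sketch below. It assumes that the maximum length is measured in characters and that the overlap is counted in transcript segments; the project may define both differently. The function name `make_chunks`, the video ID "abc123", and the use of the `t` URL parameter to jump to a time are illustrative assumptions, and the thumbnail field is left out.

```python
from typing import Dict, List


def make_chunks(
    video_id: str,
    title: str,
    transcript: List[Dict],
    max_chars: int = 60,
    overlap: int = 1,
) -> List[Dict]:
    """Merge consecutive transcript segments into chunks of at most max_chars.

    `overlap` is the number of trailing segments repeated at the start of the
    next chunk. A single segment longer than max_chars becomes its own chunk.
    """
    chunks = []
    i = 0
    while i < len(transcript):
        j = i
        length = 0
        # Greedily add segments while the space-joined text fits in max_chars.
        while j < len(transcript):
            extra = len(transcript[j]["text"]) + (1 if j > i else 0)
            if j > i and length + extra > max_chars:
                break
            length += extra
            j += 1
        segments = transcript[i:j]
        start = segments[0]["start"]
        end = segments[-1]["start"] + segments[-1]["duration"]
        chunks.append({
            "text": " ".join(s["text"] for s in segments),
            "start": start,
            "duration": round(end - start, 2),
            # Assumed link format: `t` jumps to a time given in whole seconds.
            "url": f"https://www.youtube.com/watch?v={video_id}&t={int(start)}s",
            "title": title,
        })
        if j >= len(transcript):
            break
        # Step back by `overlap` segments, but always advance by at least one.
        i = max(j - overlap, i + 1)
    return chunks


transcript = [
    {"text": "Hey there", "start": 7.58, "duration": 6.13},
    {"text": "how are you", "start": 14.08, "duration": 7.58},
    {"text": "welcome back", "start": 21.66, "duration": 5.0},
]
chunks = make_chunks("abc123", "Title of the video", transcript, max_chars=25)
```

With these settings the middle segment appears in both chunks, which is the overlap at work: a sentence cut at a chunk boundary still shows up whole in one of them.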

📚 Resources

Resources and tutorials that we have found useful for this project.

🔥 PyTorch

How to learn PyTorch?: YouTube, How to learn PyTorch? (3 easy steps) | 2021
Official tutorial: https://pytorch.org/tutorials/
Blog: Understanding PyTorch with an example: a step-by-step tutorial

⚙️ Set Up

Poetry: YouTube, How to Create and Use Virtual Environments in Python With Poetry

Pabloo22/ask-youtube-playlists
