<a href="https://colab.research.google.com/github/Joshika-Mentor/AI-Query-Tube/blob/Jayashree/Query_Tube_Final.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Query-Tube Project**

It installs the client library that lets Python call Google APIs, including the YouTube Data API.

In [1]:
pip install google-api-python-client



It installs a library that converts sentences into numerical vectors (embeddings) that capture their meaning.

In [2]:
pip install sentence-transformers



It installs scikit-learn, a machine-learning library. This notebook uses it only to compute cosine similarity.

In [3]:
pip install scikit-learn



It installs a library that can read subtitles (captions) from YouTube videos.

In [4]:
pip install youtube-transcript-api

Collecting youtube-transcript-api
  Downloading youtube_transcript_api-1.2.3-py3-none-any.whl.metadata (24 kB)
Downloading youtube_transcript_api-1.2.3-py3-none-any.whl (485 kB)
Installing collected packages: youtube-transcript-api
Successfully installed youtube-transcript-api-1.2.3


It installs a tool that helps you create a simple web app for your Python program.

In [5]:
pip install gradio



It installs a library that helps Python work with data easily.

In [6]:
pip install pandas



In [7]:
#Imports pandas library
#pd is just a short name
import pandas as pd

#Imports Gradio library
#gr is short form
import gradio as gr

#Imports NumPy
#np is short name
import numpy as np


#Connects your Python code to Google services
from googleapiclient.discovery import build

#Fetches subtitles (captions) from YouTube videos
from youtube_transcript_api import YouTubeTranscriptApi

#Converts text → numbers (embeddings)
from sentence_transformers import SentenceTransformer

#Measures how similar two meanings are
#Used to:
#rank videos
#find best match
#semantic ranking
from sklearn.metrics.pairwise import cosine_similarity



In [8]:
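import os

from googleapiclient.discovery import build
# Your YouTube Data API key
# This key allows your program to access YouTube data
# (like searching videos, getting titles, views, etc.)
# Read the key from an environment variable instead of hard-coding it,
# so the secret is not published together with the notebook.
# The variable name YOUTUBE_API_KEY is an assumption; set it before running.
API_KEY = os.environ["YOUTUBE_API_KEY"]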

# Create a YouTube API service object
# "youtube" → service name
# "v3" → YouTube Data API version
# developerKey → your API key for authentication
youtube = build(
    "youtube",
    "v3",
    developerKey=API_KEY
)


In [9]:
# Load a pre-trained sentence transformer model
# This model converts text sentences into numerical vectors (embeddings)
# so the computer can understand the meaning of the text

# "all-MiniLM-L6-v2" → fast and lightweight AI model for semantic search
# device="cpu" → run the model on CPU (works on normal laptops)
model = SentenceTransformer("all-MiniLM-L6-v2",device="cpu")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [10]:
# Function to get trending YouTube videos
# max_results = number of videos to fetch (default is 10)
def get_trending_videos(max_results=10):

    # Create a request to YouTube API
    # part → what data we want (title, description, views etc.)
    # chart="mostPopular" → fetch trending videos
    # regionCode="IN" → trending videos in India
    # maxResults → how many videos to return
    request = youtube.videos().list(
        part="snippet,statistics",
        chart="mostPopular",
        regionCode="IN",
        maxResults=max_results
    )

    # Send the request to YouTube and get response data
    response = request.execute()

    # Empty list to store all video details
    videos = []

    # Loop through each video returned by YouTube
    for item in response["items"]:

        # Store required video information in dictionary format
        videos.append({
            "video_id": item["id"],                                     # unique video ID
            "title": item["snippet"]["title"],                          # video title
            "description": item["snippet"]["description"],              # video description
            "thumbnail": item["snippet"]["thumbnails"]["high"]["url"],  # thumbnail image
            "channel": item["snippet"]["channelTitle"]                  # channel name
        })

    # Return the list of trending videos
    return videos
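
The lookup `item["snippet"]["thumbnails"]["high"]["url"]` raises a `KeyError` when a video has no `high` thumbnail. A minimal sketch of a more defensive lookup is shown below. The helper name `pick_thumbnail`, the fallback order, and the sample dictionary are assumptions made for illustration and are not part of the original notebook.

```python
# Hypothetical helper (not in the original notebook): return the URL of the
# first thumbnail size that is present, or an empty string if none is.
PREFERRED_SIZES = ("high", "medium", "default")

def pick_thumbnail(snippet):
    thumbnails = snippet.get("thumbnails", {})
    for size in PREFERRED_SIZES:
        entry = thumbnails.get(size)
        if entry and "url" in entry:
            return entry["url"]
    return ""

# Made-up snippet that only carries a medium thumbnail
sample_snippet = {
    "title": "Example video",
    "thumbnails": {"medium": {"url": "https://example.com/medium.jpg"}},
}
print(pick_thumbnail(sample_snippet))
```

With a helper like this, a missing size degrades to a smaller image or an empty `src` instead of stopping the whole trending page.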

In [11]:

# Function to search YouTube videos based on user query
# query → search text entered by user
# max_results → number of videos to fetch (default 40)
def youtube_search(query, max_results=40):

    # Create a YouTube search request
    # part="snippet" → fetch title, description, thumbnail, channel name
    # q=query → user search text
    # type="video" → return only videos (not channels or playlists)
    # maxResults → limit number of results
    request = youtube.search().list(
        part="snippet",
        q=query,
        type="video",
        maxResults=max_results
    )

    # Execute the request and get response from YouTube
    response = request.execute()

    # Empty list to store video details
    videos = []

    # Loop through all returned search results
    for item in response.get("items", []):

        try:
            # Add required video information into list
            videos.append({
                "video_id": item["id"]["videoId"],        # unique video ID
                "title": item["snippet"]["title"],         # video title
                "description": item["snippet"].get("description", ""),  # description (safe)
                "thumbnail": item["snippet"]["thumbnails"]["high"]["url"],  # thumbnail image
                "channel": item["snippet"]["channelTitle"] # channel name
            })

        # If any video has missing data, skip it
        # (catch only KeyError so unrelated errors are not hidden)
        except KeyError:
            continue

    # Convert list of videos into pandas DataFrame (table format)
    return pd.DataFrame(videos)
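
Note that `search().list` nests the ID under `item["id"]["videoId"]`, while `videos().list` in `get_trending_videos` returns `item["id"]` as a plain string. The parsing step can be tried offline, without an API key, on a hand-written response. This is only a sketch: `parse_search_items` and `fake_response` are hypothetical names, and the sample data is made up.

```python
# Hypothetical offline version of the parsing loop in youtube_search
def parse_search_items(response):
    videos = []
    for item in response.get("items", []):
        try:
            videos.append({
                "video_id": item["id"]["videoId"],
                "title": item["snippet"]["title"],
                "description": item["snippet"].get("description", ""),
            })
        except KeyError:
            # Skip results that lack a video ID or a title
            continue
    return videos

# Made-up response: one valid video and one item without a videoId
fake_response = {
    "items": [
        {"id": {"kind": "youtube#video", "videoId": "abc123"},
         "snippet": {"title": "Intro to Python", "description": "Basics"}},
        {"id": {"kind": "youtube#channel", "channelId": "UCxyz"},
         "snippet": {"title": "No video id here"}},
    ]
}

parsed = parse_search_items(fake_response)
print(parsed)
```

Only the first item survives, because the second one has no `videoId` key.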


In [12]:
# Dictionary to store already fetched transcripts
# This avoids downloading the same transcript again and again
TRANSCRIPT_CACHE = {}


# Function to get transcript of a YouTube video
# video_id → unique ID of the YouTube video
def get_transcript(video_id):

    # If transcript is already stored in cache
    # return it immediately (faster)
    if video_id in TRANSCRIPT_CACHE:
        return TRANSCRIPT_CACHE[video_id]

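    try:
        # Fetch transcript (subtitles) from YouTube
        # youtube-transcript-api 1.x (1.2.3 is installed above) replaced the
        # static get_transcript() with the instance method fetch()
        transcript = YouTubeTranscriptApi().fetch(video_id)

        # Combine all subtitle lines into one long text
        text = " ".join(snippet.text for snippet in transcript)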

    except Exception:
        # If transcript is disabled or not available
        text = "Transcript not available"

    # Save transcript text in cache
    TRANSCRIPT_CACHE[video_id] = text

    # Return transcript text
    return text
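
The caching behaviour can be checked without network access by swapping the real fetch for a stand-in. The sketch below follows the same pattern as `get_transcript`; `fake_fetch`, `get_cached`, and the video ID are hypothetical and exist only for this illustration.

```python
CACHE = {}
fetch_calls = []  # records every time the (fake) network fetch runs

def fake_fetch(video_id):
    # Stand-in for the real transcript download
    fetch_calls.append(video_id)
    return f"transcript for {video_id}"

def get_cached(video_id, fetch=fake_fetch):
    # Same pattern as get_transcript: look in the cache first
    if video_id in CACHE:
        return CACHE[video_id]
    try:
        text = fetch(video_id)
    except Exception:
        text = "Transcript not available"
    CACHE[video_id] = text
    return text

first = get_cached("abc123")
second = get_cached("abc123")   # served from CACHE, no second fetch
print(first == second, len(fetch_calls))
```

One consequence of this design is that the placeholder text is cached too, so a temporary failure keeps returning "Transcript not available" for that video until the session restarts.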


In [13]:
# Function to rank YouTube videos based on meaning (semantic search)
# query → user search text
# df → dataframe containing YouTube videos
# top_k → number of best results to return
def semantic_rank_df(query, df, top_k=5):

    # Safety check:
    # If dataframe is empty, return empty result
    if df.empty:
        return pd.DataFrame()

    # Combine title and description into one text
    # fillna("") prevents errors if value is missing
    texts = (
        df["title"].fillna("") +
        " " +
        df["description"].fillna("")
    ).tolist()

    try:
        # Convert all video texts into embeddings (numbers)
        video_embeddings = model.encode(texts)

        # Convert user search query into embedding
        query_embedding = model.encode([query])

        # Calculate similarity between query and all videos
        scores = cosine_similarity(
            query_embedding,
            video_embeddings
        )[0]

    except Exception as e:
        # If embedding fails, print error and return empty dataframe
        print("Embedding error:", e)
        return pd.DataFrame()

    # Get indexes of highest similarity scores
    # argsort → sorts indexes
    # [::-1] → descending order
    # [:top_k] → take top results only
    top_idx = np.argsort(scores)[::-1][:top_k]

    # Select top ranked videos from dataframe
    results = df.iloc[top_idx][
        ["title", "video_id", "channel"]
    ].copy()

    # Add similarity score column
    results["score"] = scores[top_idx]

    # Return ranked videos
    return results
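
To see what the ranking step computes, here is a small worked example using only the standard library. It is a sketch with made-up 3-dimensional vectors standing in for real embeddings (the model used above produces much longer vectors); `cosine` is a hypothetical helper that mirrors what `cosine_similarity` returns for a single pair.

```python
import math

def cosine(a, b):
    # Cosine similarity = dot product divided by the product of the lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (assumed values, for illustration only)
query_vec = [1.0, 0.0, 1.0]
video_vecs = [
    [1.0, 0.0, 1.0],   # same direction as the query
    [0.0, 1.0, 0.0],   # unrelated (orthogonal) to the query
    [1.0, 1.0, 0.0],   # partial overlap with the query
]

scores = [cosine(query_vec, v) for v in video_vecs]

# Indexes of the highest scores first, like np.argsort(scores)[::-1][:top_k]
top_k = 2
top_idx = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
print(top_idx)
```

The first vector scores close to 1, the second 0, and the third about 0.5, so the top two indexes are 0 and 2.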


In [14]:
# Main AI YouTube search function
# query → text entered by the user in search box
def search_youtube_ai(query):

    # Check if user entered nothing
    if query is None or query.strip() == "":
        return "<h3>⚠ Please enter search text</h3>", pd.DataFrame()

    # Search YouTube videos using keyword search
    df = youtube_search(query)

    # If no videos are found
    if df.empty:
        return "<h3>No videos found</h3>", pd.DataFrame()

    # Rank videos using semantic (AI meaning-based) search
    ranked_df = semantic_rank_df(query, df)

    # If semantic ranking fails
    if ranked_df.empty:
        return "<h3>Semantic ranking failed</h3>", pd.DataFrame()

    # HTML string to display results nicely
    html = ""

    # Loop through top ranked videos
    for _, row in ranked_df.iterrows():

        # Get thumbnail URL of the video
        thumb = df[df.video_id == row.video_id].iloc[0]["thumbnail"]

        # Create HTML block for each video result
        html += f"""
        <div style="display:flex;margin-bottom:20px;">
            <img src="{thumb}" width="300"
                 style="border-radius:12px;margin-right:15px;">
            <div>
                <h3>{row.title}</h3>
                <b>{row.channel}</b><br>
                <b>Similarity:</b> {row.score:.4f}<br><br>
                <a target="_blank"
                   href="https://www.youtube.com/watch?v={row.video_id}">
                   ▶ Watch on YouTube
                </a>
            </div>
        </div>
        <hr>
        """

    # Return HTML output and ranked dataframe
    return html, ranked_df


In [15]:
# Function to display trending YouTube videos page
def trending_page():

    # Get trending videos data from YouTube API
    videos = get_trending_videos()

    # HTML heading for trending section
    html = "<h2>🔥 Trending on YouTube</h2><br>"

    # Loop through each trending video
    for v in videos:

        # Create HTML block for each video
        html += f"""
        <div style="display:flex;margin-bottom:25px;">

            <!-- Video thumbnail -->
            <img src="{v['thumbnail']}" width="320"
                 style="border-radius:12px;margin-right:15px;"/>

            <div>
                <!-- Video title -->
                <h3>{v['title']}</h3>

                <!-- Channel name -->
                <p><b>{v['channel']}</b></p>

                <!-- YouTube watch link -->
                <a href="https://www.youtube.com/watch?v={v['video_id']}"
                   target="_blank">
                    ▶ Watch
                </a>
            </div>
        </div>

        <hr>
        """

    # Return final HTML page
    return html
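
The titles and channel names are placed into the HTML string unchanged. If they arrive as raw text, characters such as `<` or `&` could break the layout. A minimal sketch of escaping them first is shown below, assuming the API returns raw text (if a field is already entity-encoded, escaping it again would double-encode it). `render_card` is a hypothetical helper, and `escape` is imported by name because the functions above already use `html` as a variable.

```python
from html import escape

def render_card(title, channel, video_id):
    # Escape every value before it is placed inside the HTML markup
    return (
        "<div>"
        f"<h3>{escape(title)}</h3>"
        f"<p><b>{escape(channel)}</b></p>"
        f'<a href="https://www.youtube.com/watch?v={escape(video_id)}" '
        'target="_blank">Watch</a>'
        "</div>"
    )

# Made-up values chosen to contain characters that need escaping
card = render_card("<script>alert(1)</script>", "Tom & Jerry", "abc123")
print(card)
```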


In [16]:
# Create a Gradio app using Blocks layout
# theme=Soft → gives a clean and modern UI design
with gr.Blocks(theme=gr.themes.Soft()) as demo:

    # App title and description shown at the top
    gr.Markdown(
        """
        # ▶ QueryTube — AI YouTube Search
        **Search exactly like YouTube with AI semantic understanding**
        """
    )

    # Text box where user enters search query
    query = gr.Textbox(
        placeholder="Search anything — python, cricket, motivation, news...",
        label="🔍 Search YouTube"
    )

    # Button to start search
    search_btn = gr.Button("Search")

    # HTML output area (used to display thumbnails and video cards)
    html_output = gr.HTML()

    # Table output to show structured data
    table_output = gr.Dataframe(
        headers=["Title", "Video ID", "Channel", "Score"],
        interactive=False
    )

    # When app loads, show trending videos automatically
    demo.load(
        trending_page,          # function to call
        outputs=html_output     # where to display output
    )

    # When search button is clicked:
    # - take input from textbox
    # - run AI search function
    # - show results in HTML and table
    search_btn.click(
        fn=search_youtube_ai,
        inputs=query,
        outputs=[html_output, table_output]
    )


# Launch the Gradio web app
demo.launch()




It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://1868a6270f3afe2c9b.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


