
# Web-Based Educational Content Recommendation System

*Student Name* : Vikrant Patil

*Mentor Name* : Kartik Gupta

*Project Track* : AI Applications - Recommendation Systems

*Platform* : Google Colab  

## 1. Problem Definition & Objective

### a. Selected Project Track
**AI Applications - Content Recommendation Systems**

### b. Clear Problem Statement
With the rapid growth of online learning content, learners often find it
difficult to identify relevant and high-quality educational resources.
Searching manually across platforms such as YouTube can be time-consuming
and inefficient.

This project focuses on building a **web-based educational content
recommendation system** that suggests programming and technical learning
resources based on a user's interest.

### c. Real-World Relevance and Motivation
Students, beginners, and working professionals frequently depend on online
platforms like YouTube, Udemy, and Coursera for skill development. An
intelligent recommendation system reduces search effort and helps users
quickly discover suitable learning content, leading to a better and more
personalized learning experience.


In [12]:
!pip install gradio google-api-python-client sentence-transformers
import pandas as pd
import numpy as np

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

from googleapiclient.discovery import build
import gradio as gr




## 2. Data Understanding & Preparation

### a. Dataset Source
A synthetic educational dataset was created manually to represent common
learning topics in programming languages, data science, and artificial
intelligence. This dataset acts as the base content repository for the
recommendation system. In a later step, it is enriched with real-time data
fetched from the YouTube Data API.

### b. Data Loading and Exploration
The dataset consists of:
- `title`: name of the educational topic
- `content`: short keyword description used for similarity comparison

An additional `link` field is generated from each title so that users can
directly explore related resources.

### c. Cleaning, Preprocessing, Feature Engineering
- Topic keywords were normalized to lowercase and combined into a single `content` field.
- Titles were converted into searchable URLs for easy redirection.
- No duplicate or noisy entries were introduced.

### d. Handling Missing Values or Noise
The dataset is synthetic and manually verified, so it contains no missing
values or null entries at this stage.
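The claims in c and d can be checked directly with pandas instead of being taken on trust. A minimal sketch, assuming a small `sample` frame (hypothetical, copied from a few rows of the dataset below) as a stand-in for the full `df`:

```python
import pandas as pd

# A few rows copied from the synthetic dataset; in the notebook, the same
# checks would run on the full `df` built in the next cell.
sample = pd.DataFrame([
    {"title": "Python Programming Basics", "content": "python programming basics variables loops"},
    {"title": "SQL for Beginners", "content": "sql database queries"},
    {"title": "AI for Beginners", "content": "artificial intelligence basics"},
])

# Number of missing values in each column.
missing_per_column = sample.isna().sum()

# Number of rows whose title repeats an earlier title.
duplicate_titles = int(sample["title"].duplicated().sum())

print(missing_per_column.to_dict())
print("duplicate titles:", duplicate_titles)
```

Running the same two lines on `df` turns the "no missing values, no duplicates" statement into something the notebook verifies on every run.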

In [13]:
data = [
    # Programming
    {"title": "Python Programming Basics", "content": "python programming basics variables loops"},
    {"title": "Advanced Python Programming", "content": "advanced python decorators generators"},
    {"title": "C Programming Fundamentals", "content": "c programming pointers memory"},
    {"title": "C++ STL and Algorithms", "content": "c++ stl algorithms competitive"},
    {"title": "Java Programming Essentials", "content": "java programming oops basics"},
    {"title": "Java Multithreading", "content": "java threads concurrency"},
    {"title": "JavaScript Basics", "content": "javascript basics web programming"},
    {"title": "Full Stack Web Development", "content": "frontend backend full stack"},
    {"title": "HTML and CSS", "content": "html css web design"},
    {"title": "SQL for Beginners", "content": "sql database queries"},

    # Data Science & AI
    {"title": "Data Analysis with Pandas", "content": "data analysis pandas dataframe"},
    {"title": "Data Visualization", "content": "matplotlib seaborn visualization"},
    {"title": "Statistics for Data Science", "content": "statistics probability data science"},
    {"title": "Machine Learning Basics", "content": "machine learning supervised unsupervised"},
    {"title": "Deep Learning Fundamentals", "content": "deep learning neural networks"},
    {"title": "Natural Language Processing", "content": "nlp text processing"},
    {"title": "AI for Beginners", "content": "artificial intelligence basics"},
    {"title": "Recommendation Systems", "content": "content based collaborative filtering"},
    {"title": "Model Evaluation Techniques", "content": "precision recall accuracy"},
    {"title": "Python for Data Science", "content": "python numpy pandas"}
]

df = pd.DataFrame(data)
df["link"] = "https://www.google.com/search?q=" + df["title"].str.replace(" ", "+")

df


|   | title | content | link |
|---|---|---|---|
| 0 | Python Programming Basics | python programming basics variables loops | https://www.google.com/search?q=Python+Program... |
| 1 | Advanced Python Programming | advanced python decorators generators | https://www.google.com/search?q=Advanced+Pytho... |
| 2 | C Programming Fundamentals | c programming pointers memory | https://www.google.com/search?q=C+Programming+... |
| 3 | C++ STL and Algorithms | c++ stl algorithms competitive | https://www.google.com/search?q=C+++STL+and+Al... |
| 4 | Java Programming Essentials | java programming oops basics | https://www.google.com/search?q=Java+Programmi... |
| 5 | Java Multithreading | java threads concurrency | https://www.google.com/search?q=Java+Multithre... |
| 6 | JavaScript Basics | javascript basics web programming | https://www.google.com/search?q=JavaScript+Basics |
| 7 | Full Stack Web Development | frontend backend full stack | https://www.google.com/search?q=Full+Stack+Web... |
| 8 | HTML and CSS | html css web design | https://www.google.com/search?q=HTML+and+CSS |
| 9 | SQL for Beginners | sql database queries | https://www.google.com/search?q=SQL+for+Beginners |
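Replacing spaces with `+` does not escape characters that are special in a query string. In row 3 above, `C++` becomes `C+++STL`, and because a search engine decodes every `+` as a space, the query loses the `++`. A minimal sketch of a safer version, assuming the standard-library `urllib.parse.quote_plus` is an acceptable substitute for the manual `str.replace`:

```python
from urllib.parse import quote_plus

import pandas as pd

titles = pd.Series(["C++ STL and Algorithms", "HTML and CSS"])

# quote_plus percent-encodes reserved characters such as "+" (as %2B)
# and turns spaces into "+", producing a valid query-string value.
links = "https://www.google.com/search?q=" + titles.map(quote_plus)

for link in links:
    print(link)
# https://www.google.com/search?q=C%2B%2B+STL+and+Algorithms
# https://www.google.com/search?q=HTML+and+CSS
```

In the notebook, the equivalent line would be `df["link"] = "https://www.google.com/search?q=" + df["title"].map(quote_plus)`.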


In [14]:
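import os

# Read the YouTube Data API key from an environment variable instead of
# hard-coding it in the notebook (the variable name is a choice, not a
# requirement). In Colab, the key can instead be stored as a secret and read
# with google.colab.userdata.get("YOUTUBE_API_KEY").
API_KEY = os.environ.get("YOUTUBE_API_KEY")
if not API_KEY:
    raise RuntimeError("Set the YOUTUBE_API_KEY environment variable first.")


def fetch_youtube_videos(query, max_results=10):
    """Search YouTube for `query` and return title/content/link rows."""
    youtube = build("youtube", "v3", developerKey=API_KEY)

    # Restrict results to videos so that every item carries an id.videoId.
    request = youtube.search().list(
        q=query,
        part="snippet",
        type="video",
        maxResults=max_results
    )

    response = request.execute()

    videos = []
    for item in response["items"]:
        title = item["snippet"]["title"]
        video_id = item["id"]["videoId"]
        link = f"https://www.youtube.com/watch?v={video_id}"

        # The video title doubles as the text that TF-IDF matches against.
        videos.append({
            "title": title,
            "content": title,
            "link": link
        })

    return pd.DataFrame(videos)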
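The loop above relies on the shape of a `search.list` response: each entry in `items` has `id.videoId` and `snippet.title`. Titles come back HTML-escaped, which is why entries such as `&quot;HTML Basics...` appear in the combined table further down. A minimal sketch of the parsing step that decodes titles with the standard-library `html.unescape`; the function name `parse_search_items` and the hand-written `sample_response` (including its video ID) are hypothetical stand-ins for real API output, so the sketch needs no key or network access:

```python
import html

import pandas as pd


def parse_search_items(response):
    """Convert a search.list-shaped response dict into title/content/link rows."""
    rows = []
    for item in response.get("items", []):
        # Only video results carry a videoId; skip anything else defensively.
        video_id = item.get("id", {}).get("videoId")
        if video_id is None:
            continue
        # Titles are HTML-escaped (e.g. &quot;), so decode them before use.
        title = html.unescape(item["snippet"]["title"])
        rows.append({
            "title": title,
            "content": title,
            "link": f"https://www.youtube.com/watch?v={video_id}",
        })
    return pd.DataFrame(rows, columns=["title", "content", "link"])


# Hand-written sample shaped like a search.list response (not real API output).
sample_response = {
    "items": [
        {"id": {"kind": "youtube#video", "videoId": "abc123"},
         "snippet": {"title": "&quot;HTML Basics&quot; for Beginners"}},
        {"id": {"kind": "youtube#channel", "channelId": "xyz"},
         "snippet": {"title": "Some Channel"}},
    ]
}

print(parse_search_items(sample_response))
```

Inside `fetch_youtube_videos`, everything after `request.execute()` could then become `return parse_search_items(response)`.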



In [15]:
topics = [
    "python programming tutorial",
    "machine learning tutorial",
    "data science tutorial",
    "web development tutorial"
]

yt_frames = []
for topic in topics:
    yt_frames.append(fetch_youtube_videos(topic, max_results=8))

yt_df = pd.concat(yt_frames, ignore_index=True)


combined_df = pd.concat(
    [df[["title","content","link"]], yt_df],
    ignore_index=True
)

combined_df


|   | title | content | link |
|---|---|---|---|
| 0 | Python Programming Basics | python programming basics variables loops | https://www.google.com/search?q=Python+Program... |
| 1 | Advanced Python Programming | advanced python decorators generators | https://www.google.com/search?q=Advanced+Pytho... |
| 2 | C Programming Fundamentals | c programming pointers memory | https://www.google.com/search?q=C+Programming+... |
| 3 | C++ STL and Algorithms | c++ stl algorithms competitive | https://www.google.com/search?q=C+++STL+and+Al... |
| 4 | Java Programming Essentials | java programming oops basics | https://www.google.com/search?q=Java+Programmi... |
| ... | ... | ... | ... |
| 64 | Back-End Web Development (Tutorial for Beginners) | Back-End Web Development (Tutorial for Beginners) | https://www.youtube.com/watch?v=1oTuMPIwHmk |
| 65 | &quot;HTML Basics to Advanced: Mastering Junio... | &quot;HTML Basics to Advanced: Mastering Junio... | https://www.youtube.com/watch?v=gAXbB_4BavE |
| 66 | The Complete Web Development Roadmap | The Complete Web Development Roadmap | https://www.youtube.com/watch?v=GxmfcnU3feo |
| 67 | HTML Input Tags List Tutorial For Beginners (W... | HTML Input Tags List Tutorial For Beginners (W... | https://www.youtube.com/watch?v=GO_sp-Wl00M |
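Because the four YouTube queries overlap in subject (for example, Python and data science), the same video can be returned for more than one topic and would then be recommended twice. A minimal sketch of removing such repeats by link, assuming a repeated URL always means a repeated resource; the rows and the `VIDEO_A` ID are hypothetical:

```python
import pandas as pd

# Hypothetical rows: the same video URL returned by two different topic queries.
combined = pd.DataFrame([
    {"title": "Python for Data Science", "content": "python numpy pandas",
     "link": "https://www.google.com/search?q=Python+for+Data+Science"},
    {"title": "Learn Pandas", "content": "Learn Pandas",
     "link": "https://www.youtube.com/watch?v=VIDEO_A"},
    {"title": "Learn Pandas", "content": "Learn Pandas",
     "link": "https://www.youtube.com/watch?v=VIDEO_A"},
])

# Keep the first occurrence of each link, then renumber the index labels
# 0..n-1 so they match the row positions again.
deduplicated = combined.drop_duplicates(subset="link", keep="first").reset_index(drop=True)

print(len(combined), "->", len(deduplicated))  # 3 -> 2
```

In the notebook this would be `combined_df = combined_df.drop_duplicates(subset="link").reset_index(drop=True)`, run before the TF-IDF cell so the matrix rows still line up with `combined_df`.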


## 4. Core Implementation

### a. Model Training / Inference Logic
The combined dataset is vectorized using TF-IDF. User input is transformed
into the same vector space and compared against all content entries using
cosine similarity.
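With the settings used below (`TfidfVectorizer` defaults apart from `stop_words`), scikit-learn weights each term as follows, where $n$ is the number of documents in `combined_df`, $\mathrm{df}(t)$ is the number of documents containing term $t$, and $\mathrm{tf}(t,d)$ is the raw count of $t$ in document $d$:

$$
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1,
\qquad
w_{t,d} = \mathrm{tf}(t,d)\,\mathrm{idf}(t)
$$

Each vector is then L2-normalized, $\hat{\mathbf{v}}_d = \mathbf{v}_d / \lVert \mathbf{v}_d \rVert_2$, so the cosine similarity between a query $q$ and a document $d$ reduces to a dot product of the normalized vectors:

$$
\cos(q, d) = \frac{\mathbf{v}_q \cdot \mathbf{v}_d}{\lVert \mathbf{v}_q \rVert_2 \, \lVert \mathbf{v}_d \rVert_2} = \hat{\mathbf{v}}_q \cdot \hat{\mathbf{v}}_d
$$

A query that shares no vocabulary term with the corpus maps to the zero vector and scores 0 against every resource.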

### b. Recommendation Pipeline
1. User provides a learning topic  
2. Input is converted into a TF-IDF vector  
3. Similarity scores are computed  
4. Top-N relevant resources are selected  
5. Results are displayed as a table of titles and links

In [16]:
tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(combined_df["content"])
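One side effect of the default tokenizer is worth knowing: scikit-learn's default `token_pattern` only keeps tokens of two or more word characters, so neither `c` nor `c++` produces a token, and the C and C++ entries can match only on their other keywords. A minimal sketch comparing the default analyzer with a custom pattern; the pattern `(?u)\b\w+\+*` is an assumption chosen to keep one-letter tokens and trailing plus signs, not a scikit-learn default:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

text = "c++ stl algorithms competitive"

# Default token_pattern r"(?u)\b\w\w+\b": one-character words and symbols are dropped.
default_tokens = TfidfVectorizer(stop_words="english").build_analyzer()(text)

# Hypothetical custom pattern that also keeps one-letter tokens and trailing "+".
custom_tokens = TfidfVectorizer(
    stop_words="english", token_pattern=r"(?u)\b\w+\+*"
).build_analyzer()(text)

print(default_tokens)  # ['stl', 'algorithms', 'competitive']
print(custom_tokens)   # ['c++', 'stl', 'algorithms', 'competitive']
```

The trade-off is that other one-letter words also become tokens; common ones such as "a" are removed by the English stop-word list, but any others would add some noise to the vocabulary.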


In [17]:
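def recommend_content(user_interest, top_n=6):
    # Project the query into the TF-IDF space learned from combined_df.
    user_vector = tfidf.transform([user_interest])
    # Cosine similarity between the query and every resource: shape (1, n_resources).
    similarity = cosine_similarity(user_vector, tfidf_matrix)

    # Row positions of the top_n highest scores, best match first.
    top_indices = similarity.argsort()[0][-top_n:][::-1]

    return combined_df.iloc[top_indices][["title", "link"]]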
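`recommend_content` always returns `top_n` rows, even when some of them share no term with the query (their cosine similarity is 0). A minimal self-contained sketch of a variant that also reports the score and drops zero-similarity rows; the toy `corpus` frame and the function name `recommend_with_scores` are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for combined_df.
corpus = pd.DataFrame({
    "title": ["Python Programming Basics", "SQL for Beginners", "Deep Learning Fundamentals"],
    "content": ["python programming basics variables loops",
                "sql database queries",
                "deep learning neural networks"],
})

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(corpus["content"])


def recommend_with_scores(query, top_n=6):
    """Return up to top_n rows ranked by cosine similarity, skipping zero scores."""
    scores = cosine_similarity(vectorizer.transform([query]), matrix)[0]
    # Sort positions by descending score and keep the first top_n.
    ranked = np.argsort(scores)[::-1][:top_n]
    ranked = [i for i in ranked if scores[i] > 0]
    result = corpus.iloc[ranked][["title"]].copy()
    result["score"] = scores[ranked].round(3)
    return result.reset_index(drop=True)


print(recommend_with_scores("learn python loops"))
```

Here only "Python Programming Basics" is returned, because "learn" is not in the toy vocabulary and the other two entries share no term with the query; a query such as "cooking recipes" returns an empty table instead of unrelated rows.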


## 5. Evaluation & Analysis

### a. Evaluation Method
The system is evaluated qualitatively by checking the relevance of
recommended content for different user inputs.
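The qualitative check can be made repeatable with a simple precision@k proxy: for each test query, count how many of the top-k recommended titles contain a keyword the query is expected to match. A minimal sketch; the function name, the keyword rule, and the sample titles are hypothetical, and a keyword match is only a rough stand-in for human relevance judgments:

```python
def keyword_precision_at_k(recommended_titles, expected_keyword, k=5):
    """Fraction of the top-k titles that mention expected_keyword (case-insensitive).

    Divides by the number of titles actually returned (at most k), so a short
    result list is not penalized for missing rows.
    """
    top_k = recommended_titles[:k]
    if not top_k:
        return 0.0
    hits = sum(expected_keyword.lower() in title.lower() for title in top_k)
    return hits / len(top_k)


# Hypothetical recommendations for the query "python programming".
titles = [
    "Python Programming Basics",
    "Advanced Python Programming",
    "Python for Data Science",
    "Java Programming Essentials",
]
print(keyword_precision_at_k(titles, "python", k=4))  # 0.75
```

In the notebook, `recommended_titles` would be `recommend_content(query)["title"].tolist()` for each test query.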

### b. Sample Outputs
The system recommends:
- Programming tutorials
- AI and Machine Learning videos
- Educational resources with direct YouTube links

### c. Performance Analysis & Limitations
**Strengths**
- Fast inference
- Real-time data using API
- Simple and interpretable logic

**Limitations**
- No user personalization
- Depends on the text quality of titles
- English-language content bias


## 6. Ethical Considerations & Responsible AI

### a. Bias and Fairness
The system may reflect bias present in online content sources and
search trends from the YouTube API.

### b. Dataset Limitations
The local dataset is limited in size, and API results depend on
availability and ranking of external content.

### c. Responsible Use of AI
No personal data is collected or stored. The system is designed only
for educational guidance and uses transparent, explainable methods.


In [18]:
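# Unique titles offered as dropdown suggestions.
suggestions = sorted(set(combined_df["title"]))


def ui_recommend(topic):
    # The dropdown is empty until the user picks or types a topic.
    if not topic:
        return pd.DataFrame(columns=["title", "link"])
    return recommend_content(topic)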

with gr.Blocks(theme=gr.themes.Soft()) as demo:
    gr.Markdown("""
    # 🎓 AI Educational Content Recommendation System
    **Topic-based learning recommendations using TF-IDF similarity + YouTube API**
    """)

    topic = gr.Dropdown(
        choices=suggestions,
        label="Choose or type a learning topic",
        allow_custom_value=True
    )

    output = gr.Dataframe(
        headers=["Title", "Link"],
        datatype=["str", "str"],
        wrap=True
    )

    gr.Button("🔍 Recommend").click(
        fn=ui_recommend,
        inputs=topic,
        outputs=output
    )

demo.launch()




It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://7f32ffbb4e72cf0a3e.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
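The results table in the UI shows links as plain text. One way to make them clickable, assuming a Gradio version whose `gr.Dataframe` accepts `"markdown"` as a column `datatype`, is to format each URL as a Markdown link before returning it. A minimal sketch of the formatting step; the helper name `with_markdown_links` and the link text "Open" are hypothetical:

```python
import pandas as pd


def with_markdown_links(results):
    """Return a copy of results with each URL wrapped as a Markdown link."""
    formatted = results.copy()
    formatted["link"] = "[Open](" + formatted["link"] + ")"
    return formatted


results = pd.DataFrame({
    "title": ["HTML and CSS"],
    "link": ["https://www.google.com/search?q=HTML+and+CSS"],
})
print(with_markdown_links(results).loc[0, "link"])
# [Open](https://www.google.com/search?q=HTML+and+CSS)
```

Under that assumption, `ui_recommend` would return `with_markdown_links(recommend_content(topic))` and the output component would use `datatype=["str", "markdown"]`.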




## 7. Conclusion & Future Scope

### a. Conclusion
In this project, a web-based AI educational content recommendation system
was successfully developed. The system combines a curated local dataset
with real-time YouTube API data and uses TF-IDF cosine similarity to
recommend relevant learning resources. The working prototype demonstrates
how AI techniques can be applied to guide learners toward suitable
educational content in an efficient and user-friendly manner.

### b. Future Scope
The system can be further enhanced by:
- Adding user profiles and learning history for personalization  
- Using semantic embeddings or LLM-based models for deeper understanding   
- Introducing filters such as difficulty level or content duration  
- Supporting multilingual educational content
