# 🎓 SmartTutorAI: Your AI Tutor Based on Your Own Learning Material
This project uses **Gemini** (called through the `google-generativeai` SDK) to build a personalized tutor agent grounded in user-supplied PDFs or YouTube videos. It draws on three GenAI capabilities: **role-based prompting**, **document understanding**, and **structured JSON output**.

Let's learn anything with a tutor that adapts to you.

## 🧰 Step 1: Install & Import Required Libraries

Here we install the necessary Python packages and import required modules to power our AI tutor.

- `google-generativeai`: Google's Python SDK for the Gemini API
- `youtube_transcript_api`: For fetching YouTube transcripts
- `PyMuPDF` (`fitz`): For reading PDFs


In [1]:
# Install dependencies (the notebook imports google.generativeai, not openai)
!pip install google-generativeai




In [2]:
pip install youtube_transcript_api

Collecting youtube_transcript_api
  Downloading youtube_transcript_api-1.0.3-py3-none-any.whl.metadata (23 kB)
Downloading youtube_transcript_api-1.0.3-py3-none-any.whl (2.2 MB)
Installing collected packages: youtube_transcript_api
Successfully installed youtube_transcript_api-1.0.3
Note: you may need to restart the kernel to use updated packages.


In [3]:
pip install PymuPdf


Collecting PymuPdf
  Downloading pymupdf-1.25.5-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (3.4 kB)
Downloading pymupdf-1.25.5-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (20.0 MB)
Installing collected packages: PymuPdf
Successfully installed PymuPdf-1.25.5
Note: you may need to restart the kernel to use updated packages.


## 🔐 Step 2: Configure Gemini API Key

We securely connect to the Gemini API (Google Generative AI) using an API key to enable text generation and other GenAI capabilities.


In [4]:
import google.generativeai as genai
from kaggle_secrets import UserSecretsClient
import os

user_secrets = UserSecretsClient()
secret_value_0 = user_secrets.get_secret("GEMINI_API_KEY")
genai.configure(api_key=secret_value_0)
model = genai.GenerativeModel("gemini-2.0-flash")
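
Outside Kaggle, `kaggle_secrets` is not importable, so the cell above fails before the model is created. A minimal sketch of a fallback, assuming the key is also exported as an environment variable named `GEMINI_API_KEY` (the helper name `load_gemini_api_key` is hypothetical and not part of the notebook):

```python
import os


def load_gemini_api_key(secret_name: str = "GEMINI_API_KEY") -> str:
    """Return the key from Kaggle secrets if available, else from the environment."""
    try:
        # Only importable inside a Kaggle notebook session.
        from kaggle_secrets import UserSecretsClient
        return UserSecretsClient().get_secret(secret_name)
    except Exception:
        # Assumption: outside Kaggle the key is exported as an environment variable.
        key = os.environ.get(secret_name)
        if not key:
            raise RuntimeError(
                f"{secret_name} is not set in Kaggle secrets or the environment."
            )
        return key
```

The result could then be passed to `genai.configure(api_key=...)` in place of `secret_value_0`.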

In [5]:
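def generate_text_with_few_shot_prompting(user_text, subject):
    """Ask Gemini to explain `user_text` to a beginner in the given `subject`.

    Despite the name, the prompt contains no worked examples, so this is
    role-based (zero-shot) prompting rather than few-shot prompting.
    """
    prompt = f"""
    You are a personal AI tutor. Your task is to explain the following content to a beginner in a simple and effective manner.
    Subject: {subject}
    Content: {user_text}
    """
    response = model.generate_content(prompt)
    return response.text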

## 📺 Step 3: YouTube Transcript Extractor

This block fetches the transcript of a YouTube video using `youtube_transcript_api`.

- `TextFormatter` drops the timestamps and keeps only the caption text.
- The function returns a plain transcript string for use in prompting Gemini.


In [6]:
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import TextFormatter

def get_video_transcript(video_id: str):
    try:
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
        formatter = TextFormatter()
        formatted_transcript = formatter.format_transcript(transcript)
        return formatted_transcript
    except Exception as e:
        return f"Error retrieving transcript: {str(e)}"

In [7]:
import fitz  # PyMuPDF

def extract_text_from_pdf(pdf_path: str):
    try:
        # The context manager closes the document once the text is read.
        with fitz.open(pdf_path) as doc:
            # Concatenate the plain text of every page, in page order.
            return "".join(page.get_text() for page in doc)
    except Exception as e:
        return f"Error extracting text from PDF: {str(e)}"

## 🎓 Step 4: Prompt Gemini to Become a Tutor

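We create a prompt that instructs Gemini to act as a **custom tutor** based on:
- What the user wants to learn (the subject)
- The text extracted from the source (a YouTube transcript or a PDF)

This combines **role-based prompting** with **contextual generation**: the prompt assigns the tutor role and supplies the source text as context, without worked examples.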


In [8]:

def parse_and_tutor_from_source(source_type: str, source_content, subject="General Knowledge"):
    # Pick the extractor and the error prefix it returns on failure.
    if source_type == "youtube":
        text = get_video_transcript(source_content)
        label, error_prefix = "Transcript", "Error retrieving transcript:"
    elif source_type == "pdf":
        text = extract_text_from_pdf(source_content)
        label, error_prefix = "PDF Text", "Error extracting text from PDF:"
    else:
        print("Invalid source type. Please use 'youtube' or 'pdf'.")
        return

    # The extractors return an error string instead of raising. Match its
    # prefix so that source text containing the word "Error" is not rejected.
    if text.startswith(error_prefix):
        print(text)
        return

    print(f"\n{label} Preview:\n{text[:500]}\n...")
    response = generate_text_with_few_shot_prompting(text, subject)
    print("\n\U0001F9D1 AI Tutor's Explanation:")
    print(response)
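
The tutor prompt used in this notebook has three parts: a role, a subject, and the source content. A minimal sketch of a builder that also accepts optional worked examples, which is what would make the prompt few-shot. The function name `build_tutor_prompt`, the `examples` format, and the `max_chars` budget are assumptions for illustration, not part of the notebook:

```python
def build_tutor_prompt(subject, content, examples=None, max_chars=20000):
    """Assemble a tutor prompt; `examples` is an optional list of (question, answer) pairs."""
    lines = [
        "You are a personal AI tutor. Explain the content below to a beginner "
        "in a simple and effective manner.",
        f"Subject: {subject}",
    ]
    # Few-shot part: each pair shows the model the style of answer we expect.
    for question, answer in examples or []:
        lines.append(f"Example question: {question}")
        lines.append(f"Example answer: {answer}")
    # Truncate very long sources so the prompt stays within a character budget.
    lines.append(f"Content: {content[:max_chars]}")
    return "\n".join(lines)


demo_prompt = build_tutor_prompt(
    "Guitar",
    "The E major chord uses three fingers.",
    examples=[("What is a fret?", "A metal strip across the neck of the guitar.")],
)
print(demo_prompt)
```

With `examples=None` the result is a plain role-based prompt, so the same builder covers both styles.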

## 🔗 Step 5: Define URL and Tutor Helpers

Before running the test, we define two helpers: `get_youtube_transcript`, which accepts a full YouTube URL and extracts the video ID, and `ask_tutor`, which sends a prompt (plus optional source content) to Gemini.



In [9]:
import re
from youtube_transcript_api import YouTubeTranscriptApi

def get_youtube_transcript(video_url):
    # Extract the 11-character video ID from a watch (v=...) or youtu.be URL.
    match = re.search(r"(?:v=|youtu\.be/)([a-zA-Z0-9_-]{11})", video_url)
    if not match:
        raise ValueError("Invalid YouTube URL.")

    video_id = match.group(1)
    transcript = YouTubeTranscriptApi.get_transcript(video_id)
    # Join the caption snippets into one string, dropping the timestamps.
    full_text = " ".join([item['text'] for item in transcript])
    return full_text
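
The regular expression above matches `v=` anywhere in the string, including in URLs that do not point to YouTube. A possible alternative, sketched with `urllib.parse`, checks the host first. The helper name `extract_video_id` and the list of accepted hosts and paths are assumptions, and the 11-character ID length is taken from the regex above:

```python
from urllib.parse import urlparse, parse_qs

YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(video_url: str) -> str:
    """Return the video ID from a YouTube URL, or raise ValueError."""
    parsed = urlparse(video_url)
    host = (parsed.hostname or "").lower()
    candidate = ""
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        else:
            # Paths such as /embed/<id> or /shorts/<id>.
            parts = [part for part in parsed.path.split("/") if part]
            if len(parts) >= 2 and parts[0] in {"embed", "shorts"}:
                candidate = parts[1]
    if len(candidate) != 11:
        raise ValueError("Invalid YouTube URL.")
    return candidate
```

Parsing the URL keeps tracking parameters such as `si=` out of the ID and rejects look-alike links on other hosts.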


In [10]:
def ask_tutor(prompt, content=None):
    full_prompt = prompt
    if content:
        full_prompt += f"\n\nHere is the content you should base your tutoring on:\n{content}"
    
    response = model.generate_content(full_prompt)
    return response.text
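
`ask_tutor` appends the whole source text to the prompt. For long transcripts or PDFs, one option is a retrieval step that passes only the most relevant chunks. The sketch below is a deliberately simple stand-in: it scores chunks by word overlap with the question rather than by embeddings, and the names `split_into_chunks` and `top_chunks` as well as the chunk size are assumptions:

```python
import re


def split_into_chunks(text, chunk_words=120):
    """Split text into consecutive chunks of at most `chunk_words` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]


def top_chunks(question, text, k=3, chunk_words=120):
    """Return the k chunks sharing the most distinct words with the question."""
    query_terms = set(re.findall(r"[a-z0-9']+", question.lower()))
    scored = []
    for index, chunk in enumerate(split_into_chunks(text, chunk_words)):
        chunk_terms = set(re.findall(r"[a-z0-9']+", chunk.lower()))
        scored.append((len(query_terms & chunk_terms), index, chunk))
    # Highest overlap first; ties keep the original order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    # Restore document order so the selected context reads naturally.
    best = sorted(scored[:k], key=lambda item: item[1])
    return [chunk for _, _, chunk in best]
```

The selected chunks could be joined and passed as the `content` argument, for example `ask_tutor(question, "\n\n".join(top_chunks(question, transcript)))`.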


## ✅ Step 6: Run Test Case

We run the system on a YouTube video to check if the personalized tutor is working correctly.

You can change the video URL and subject to try different test cases.


In [11]:
video_url = "https://youtu.be/BBz-Jyr23M4?si=rdN5CWmFQ59QXF3c"
transcript = get_youtube_transcript(video_url)

user_prompt = "You're a Guitar teacher, Teach me the fundamentals of guitar from the beginning based on this video"

response = ask_tutor(user_prompt, transcript)
print(response)


Alright, let's get you started on your guitar journey! Based on Andy's video, here's a breakdown of the fundamentals, acting as your personal guitar teacher:

**Our Goal:**

By the end of this lesson, you'll be able to play two chords (E major and A major) and play a simplified version of the song "For What It's Worth" by Buffalo Springfield.

**1. Guitar Anatomy & Basics**

*   **Frets:** These are the metal strips running across the neck of the guitar.
*   **Strings:** The six strings run from the headstock to the bridge. We number them from thinnest to thickest: 1, 2, 3, 4, 5, 6.

**2. Tuning (Important!)**

*   Andy emphasizes this is crucial. If your guitar isn't in tune, it won't sound right, no matter what you do.
*   He recommends checking out his tuning video (link should be in the description of the video you watched).
*   You can use a guitar tuner (electronic or app) or tune by ear (the video will show you how).

**3. Finger Placement - The Key to Clean Sound**

*   **"This

# Structured output generation with JSON 

In [12]:
import json

def generate_structured_output(prompt):
    model = genai.GenerativeModel('gemini-2.0-flash')
    generation_config = {
        "temperature": 0.7,
        "top_p": 1,
        "top_k": 1,
        "max_output_tokens": 2048,
        "response_mime_type": "application/json"
    }

    response = model.generate_content(
        prompt,
        generation_config=generation_config
    )

    try:
        structured_json = json.loads(response.text)
        return structured_json
    except json.JSONDecodeError:
        return {"error": "Failed to parse structured output."}


In [13]:
prompt = f"""
You are a personal AI tutor. Based on this content below, return a structured lesson plan in JSON format with:
- topic
- key concepts
- difficulty level (easy, medium, hard)
- estimated study time in minutes
- 3 quiz questions

Content: {transcript}
"""
lesson_plan = generate_structured_output(prompt)
print(json.dumps(lesson_plan, indent=2))


{
  "topic": "Beginner Guitar Lesson: E Major and A Major Chords",
  "key_concepts": [
    "Proper finger placement on frets (close to the metal strip, on the tip of the finger)",
    "Tuning the guitar",
    "E Major chord fingering and strumming all six strings",
    "A Major chord fingering (using the first finger as an anchor) and strumming from the fifth string",
    "Changing between E Major and A Major chords smoothly",
    "Understanding bars and beats (counting to four)",
    "Strumming pattern (four strums per chord)",
    "Playing 'For What It's Worth' by Buffalo Springfield using E Major and A Major chords"
  ],
  "difficulty_level": "easy",
  "estimated_study_time": 10,
  "quiz_questions": [
    {
      "question": "Where should your fingers be placed on the fret when playing a chord?",
      "options": [
        "In the middle of the fret",
        "On the metal strip",
        "Close to the metal strip"
      ],
      "answer": "Close to the metal strip"
    },
    {
   

# ✅ Step 7: Convert Structured Lesson Plan to Audio
Now that we have the structured lesson plan as JSON, let's convert key parts of it (the topic, key concepts, difficulty level, and estimated study time) into spoken audio using gTTS.
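
The lesson-plan prompt describes its fields in prose rather than with a schema, so the key names in the response can vary between runs. A minimal sketch of a check to run before generating audio, assuming the key names seen in the sample output above (`topic`, `key_concepts`, `difficulty_level`, `estimated_study_time`, `quiz_questions`). The helper name `find_lesson_plan_problems` is hypothetical:

```python
# Assumption: these key names and types mirror the sample output, not a guaranteed schema.
REQUIRED_FIELDS = {
    "topic": str,
    "key_concepts": list,
    "difficulty_level": str,
    "estimated_study_time": (int, float),
    "quiz_questions": list,
}


def find_lesson_plan_problems(plan):
    """Return a list of human-readable problems; an empty list means the plan looks usable."""
    if not isinstance(plan, dict):
        return ["lesson plan is not a JSON object"]
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in plan:
            problems.append(f"missing field: {field}")
        elif not isinstance(plan[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    return problems
```

If the returned list is not empty, the prompt could be retried or the missing parts skipped when building the audio summary.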

In [14]:
!pip install gTTS


Collecting gTTS
  Downloading gTTS-2.5.4-py3-none-any.whl.metadata (4.1 kB)
Downloading gTTS-2.5.4-py3-none-any.whl (29 kB)
Installing collected packages: gTTS
Successfully installed gTTS-2.5.4


In [15]:
# Assume `lesson_plan_json` is the dynamic JSON returned from your model or API
lesson_plan_json = """
{
  "topic": "Beginner Guitar Lesson: E and A Chords",
  "key_concepts": [
    "Proper guitar tuning",
    "Finger placement on frets (tips close to the fret)",
    "E major chord fingering and strumming",
    "A major chord fingering and strumming",
    "Changing between E and A chords (anchor finger technique)",
    "Understanding bars and beats (4/4 time)",
    "Strumming patterns (four strums per chord)",
    "Playing 'For What It's Worth' by Buffalo Springfield"
  ],
  "difficulty_level": "easy",
  "estimated_study_time": 10,
  "quiz_questions": [
    {
      "question": "Where should your fingers be placed in relation to the fret when playing a chord?",
      "options": [
        "In the middle of the fret",
        "Right on top of the metal fret",
        "Close to the metal fret, on the side nearest to you"
      ],
      "answer": "Close to the metal fret, on the side nearest to you"
    },
    {
      "question": "Which finger is used as an 'anchor' when changing between the E and A major chords in the lesson?",
      "options": [
        "Index finger",
        "Middle finger",
        "Ring finger"
      ],
      "answer": "Index finger"
    },
    {
      "question": "How many strums should you play for each chord (E and A) in a bar when playing 'For What It's Worth'?",
      "options": [
        "2",
        "4",
        "6"
      ],
      "answer": "4"
    }
  ]
}
"""

# Convert the JSON string into a Python dictionary
import json

lesson_plan = json.loads(lesson_plan_json)

# Now you can access it directly
print(lesson_plan["topic"])  # Print the topic


Beginner Guitar Lesson: E and A Chords


In [16]:
from gtts import gTTS
import IPython.display as ipd

def text_to_speech(text, lang='en'):
    tts = gTTS(text=text, lang=lang)
    audio_path = "guitar_lesson_audio.mp3"
    tts.save(audio_path)
    return ipd.Audio(audio_path)

# Automatically create the lesson summary
lesson_summary = f"""
Today's guitar tutorial is about {lesson_plan['topic']}.
We will cover the following key concepts: {', '.join(lesson_plan['key_concepts'])}.
The difficulty level is {lesson_plan['difficulty_level']}.
You will need approximately {lesson_plan['estimated_study_time']} minutes to study this topic.
"""

# Generate the speech
audio_output = text_to_speech(lesson_summary)

# Play the audio
audio_output
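
The summary above covers the topic, key concepts, difficulty, and study time, but not the quiz. A minimal sketch of turning the quiz questions into narration text, assuming each question has the `question`, `options`, and `answer` keys shown in the sample JSON (the helper name `quiz_to_script` is hypothetical):

```python
from string import ascii_uppercase


def quiz_to_script(lesson_plan, include_answers=True):
    """Build a plain-text narration of the quiz questions in a lesson plan."""
    lines = []
    for number, item in enumerate(lesson_plan.get("quiz_questions", []), start=1):
        lines.append(f"Question {number}. {item['question']}")
        # Label the options A, B, C, ... so they are easy to follow by ear.
        for letter, option in zip(ascii_uppercase, item.get("options", [])):
            lines.append(f"Option {letter}: {option}.")
        if include_answers and "answer" in item:
            lines.append(f"The answer is: {item['answer']}.")
    return "\n".join(lines)
```

The returned string could be passed to `text_to_speech`. Note that `text_to_speech` always saves to `guitar_lesson_audio.mp3`, so a second call would overwrite the summary audio unless the path is changed.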


# 🚀 Next Steps: Making It Smarter & More Personal
Now that we’ve successfully converted structured lesson plans into audio, here’s how we can level it up:

## Personalization Based on User Level

Let users choose their level (beginner, intermediate, expert), or even their age group (kids, teens, adults). For example:

- Kids can get lessons with simpler examples and playful language.
- Adults might prefer real-world analogies or technical depth.
- Intermediate learners could skip basics and focus on technique improvements.
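
One way this could look in practice is a small lookup that appends style instructions to the tutor prompt. This is only a sketch: the dictionary names, the helper `personalize_prompt`, and the exact wording of each instruction (especially for teens and experts, which the list above does not describe) are placeholders:

```python
# Placeholder style instructions keyed by learner level and age group.
LEVEL_STYLES = {
    "beginner": "Assume no prior knowledge and define every term.",
    "intermediate": "Skip the basics and focus on technique improvements.",
    "expert": "Be concise and go into technical depth.",
}
AUDIENCE_STYLES = {
    "kids": "Use simpler examples and playful language.",
    "teens": "Use relatable, everyday examples.",
    "adults": "Use real-world analogies.",
}


def personalize_prompt(base_prompt, level="beginner", audience="adults"):
    """Append level and audience instructions to a tutor prompt."""
    if level not in LEVEL_STYLES:
        raise ValueError(f"Unknown level: {level}")
    if audience not in AUDIENCE_STYLES:
        raise ValueError(f"Unknown audience: {audience}")
    return f"{base_prompt}\n{LEVEL_STYLES[level]} {AUDIENCE_STYLES[audience]}"
```

The personalized string could be used as the `prompt` argument of `ask_tutor`, leaving the source content unchanged.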

## Support More File Types

Enable parsing of more file types, such as `.pdf`, `.txt`, and `.docx`, so that users can upload any of these formats and still get lessons and audio output.
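
A minimal sketch of such a loader, dispatching on the file extension. It assumes PyMuPDF for `.pdf` (as used earlier in this notebook) and the `python-docx` package for `.docx`, which the notebook does not install. The function name `load_text` is hypothetical:

```python
from pathlib import Path


def load_text(path):
    """Return the plain text of a .txt, .pdf, or .docx file."""
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8")
    if suffix == ".pdf":
        import fitz  # PyMuPDF, imported lazily so .txt works without it
        with fitz.open(str(path)) as doc:
            return "".join(page.get_text() for page in doc)
    if suffix == ".docx":
        import docx  # assumption: the python-docx package is installed
        return "\n".join(p.text for p in docx.Document(str(path)).paragraphs)
    raise ValueError(f"Unsupported file type: {suffix or 'none'}")
```

Unlike the extractors earlier in the notebook, this sketch raises on failure instead of returning an error string, so callers would wrap it in `try`/`except`.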

## Specialized Personal Tutors per Domain

Scale the idea by training different lightweight models for each niche:

- 🥋 Martial Arts Tutor
- 🎸 Guitar Tutor
- 🧮 Mathematics Tutor
- 📚 Language Tutor

Each one becomes an expert only in that subject, offering rich, contextual, and focused guidance.
While there are generalist AI tools, this approach keeps it simple, low-cost, and highly performant, especially since the datasets per tutor are focused and not massive.

# 🎯 Conclusion: A New Era of AI-Powered Micro Tutoring
What we’re building is not just another assistant. It’s a personal tutor in your pocket, something that understands your level, your pace, and your interest. By keeping things personalized and lightweight:

- We reduce compute costs.
- We increase model performance.
- We provide a truly human-like experience without needing large-scale data.

This is micro-AI done right—not a one-size-fits-all bot, but a series of focused, affordable, and scalable mini tutors that can grow with the learner.

With each step, we move closer to creating an ecosystem of smart, personalized learning companions—and this is just the beginning. 🚀

**Thank you! Let’s make learning smarter.**
