<a href="https://colab.research.google.com/github/abhichiku18/smart-study-assistant/blob/main/smart_study_assistant.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Smart Study Assistant – AI-Powered Q&A Chatbot**
Built with Gradio, Transformers & FAISS for semantic PDF question answering.

In [None]:
!pip install gradio sentence-transformers faiss-cpu transformers pymupdf



## Install dependencies
The cell above installs sentence-transformers for embeddings, transformers for the question-answering model, faiss-cpu for vector search, pymupdf for PDF reading, and Gradio for the web interface.

In [None]:
from sentence_transformers import SentenceTransformer
from transformers import pipeline
import faiss
import numpy as np
import fitz  # PyMuPDF
import re
import gradio as gr

# Load models once
embedder = SentenceTransformer('all-MiniLM-L6-v2')
qa_model = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")




In [None]:
def process_pdf(pdf_file):
    try:
        doc = fitz.open(pdf_file.name)
        text = ""
        for page in doc:
            text += page.get_text()
        doc.close()

        if not text.strip():
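            # Return a full (index, chunks, status) triple so callers can unpack it
            return None, None, "Failed: No text found in the PDF."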

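        # Split on sentence-ending punctuation followed by any whitespace
        # (PDF text usually breaks lines with newlines, not only spaces)
        sentences = re.split(r'(?<=[.!?])\s+', text)
        chunks, current = [], ""
        for sentence in sentences:
            if len(current) + len(sentence) <= 500:
                current += sentence + " "
            else:
                # Skip the empty chunk produced when the first sentence exceeds 500 chars
                if current.strip():
                    chunks.append(current.strip())
                current = sentence + " "
        if current.strip():
            chunks.append(current.strip())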

        if not chunks:
            return None, None, "Failed: No text chunks created."

        embeddings = embedder.encode(chunks)
        if embeddings.shape[0] == 0:
            return None, None, "Failed: No embeddings created."

        index = faiss.IndexFlatL2(embeddings.shape[1])
        index.add(np.array(embeddings))

        return index, chunks, "PDF processed! You can now ask questions."

    except Exception as e:
        return None, None, f"❌ Error: {str(e)}"
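
The chunking step inside `process_pdf` can be tried on its own, without a PDF or any model. The following is a minimal sketch of that greedy sentence packing, not the notebook's exact code: the function name `chunk_sentences` and the parameter `max_chars` are hypothetical, and the sketch keeps a single over-long sentence as its own chunk.

```python
import re


def chunk_sentences(text, max_chars=500):
    """Greedily pack whole sentences into chunks of roughly max_chars characters.

    Hypothetical helper for illustration; a sentence longer than max_chars
    is kept whole rather than cut in the middle.
    """
    # Split after '.', '!' or '?' when followed by whitespace
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # Start a new chunk when adding this sentence would exceed the limit
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}" if current else sentence
    if current:
        chunks.append(current)
    return chunks


sample = "Alpha is first. Beta is second! Gamma is third? Delta."
print(chunk_sentences(sample, max_chars=30))
```

Smaller values of `max_chars` give the QA model tighter contexts but more chunks to embed; the notebook uses 500 characters.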


In [None]:
def answer_question(question, index, chunks):
    """
    Finds top similar chunks and uses QA model to answer.
    """
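    # Embed the question and retrieve the nearest chunks. Cap k at the number
    # of chunks, because FAISS pads missing results with index -1.
    q_embedding = embedder.encode([question])
    k = min(3, len(chunks))
    distances, indices = index.search(np.array(q_embedding), k=k)
    top_chunks = [chunks[i] for i in indices[0] if i >= 0]

    # Run the extractive QA model on each retrieved chunk
    answers = []
    for context in top_chunks:
        result = qa_model(question=question, context=context)
        answers.append((result['score'], result['answer']))

    # Return the answer with the highest QA confidence score
    best_answer = max(answers, key=lambda pair: pair[0])[1]
    return best_answer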
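
The retrieval step relies on `faiss.IndexFlatL2`, which performs an exact, brute-force search over squared L2 distances. The sketch below shows the same idea in plain Python so the ranking logic can be inspected without FAISS installed. The name `l2_top_k` is hypothetical, and the toy two-dimensional vectors stand in for real sentence embeddings.

```python
def l2_top_k(query, vectors, k=3):
    """Return (squared_distance, position) pairs for the k vectors nearest to query.

    Illustrative stand-in for an exact L2 index; k is capped at the number
    of stored vectors so no placeholder results are returned.
    """
    scored = [
        (sum((q - v) ** 2 for q, v in zip(query, vec)), pos)
        for pos, vec in enumerate(vectors)
    ]
    scored.sort()  # smallest distance first
    return scored[:min(k, len(scored))]


# Toy "embeddings" for three chunks
vectors = [[0, 0], [1, 0], [3, 4]]
print(l2_top_k([1, 0], vectors, k=2))
```

The positions returned here play the role of `indices[0]` in `answer_question`: they select which chunks are passed to the QA model as context.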


In [None]:
# Store index & chunks after upload
global_index = None
global_chunks = None

def upload_file(file):
    global global_index, global_chunks
    index, chunks, status = process_pdf(file)
    global_index, global_chunks = index, chunks
    return status

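def ask(question):
    if global_index is None:
        return "⚠️ Please upload a PDF first!"
    # Guard against an empty question, which the QA pipeline rejects
    if not question or not question.strip():
        return "⚠️ Please enter a question."
    return answer_question(question, global_index, global_chunks)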

with gr.Blocks() as demo:
    gr.Markdown("# 📚 Smart Study Assistant – AI Q&A on your PDF")

    with gr.Row():
        upload = gr.File(label="📄 Upload PDF")
        upload_output = gr.Textbox(label="Status")
    upload_btn = gr.Button("Process PDF")
    upload_btn.click(upload_file, inputs=upload, outputs=upload_output)

    question = gr.Textbox(label="❓ Your Question")
    answer = gr.Textbox(label="Answer")
    ask_btn = gr.Button("Get Answer")
    ask_btn.click(ask, inputs=question, outputs=answer)

demo.launch()




