# GenAI-Powered RAG Assistant for "Fundamentals of Electric Circuits"

This notebook showcases a Retrieval-Augmented Generation (RAG) assistant built on the textbook **Fundamentals of Electric Circuits by Alexander and Sadiku**. The assistant allows users to ask questions and receive answers grounded directly in the book’s content.

The system uses **Google's Gemini API**, **text embeddings**, and **ChromaDB** to retrieve relevant passages and generate responses. It also supports **chapter-wise summarization** and **structured JSON output** for integration with apps or UIs.

## Features
- Text cleaning and preprocessing
- Chunking by chapters with metadata
- Semantic embeddings using `models/text-embedding-004`
- Vector storage in ChromaDB
- Question Answering using RAG
- Structured JSON Output for clean, readable answers


# Setup

In [None]:
!pip install pymupdf
!pip install chromadb

In [None]:
from google import genai
from google.genai import types
from IPython.display import Markdown

genai.__version__

'1.10.0'

In [None]:
from google.colab import userdata

# Read the API key from Colab's secret store and create the Gemini client.
client = genai.Client(
    api_key=userdata.get('GOOGLE_API_KEY')
)

In [None]:
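from google.api_core import retry

# Retry only on rate-limit (429) and service-unavailable (503) errors.
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

# Wrap generate_content once; the __wrapped__ check avoids double-wrapping when the cell is re-run.
if not hasattr(genai.models.Models.generate_content, '__wrapped__'):
  genai.models.Models.generate_content = retry.Retry(
      predicate=is_retriable)(genai.models.Models.generate_content)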
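
The cell above relies on `google.api_core.retry.Retry` to re-run calls that fail with a 429 or 503. As a rough illustration of the idea only (not the library's implementation), the sketch below retries with exponential backoff in plain Python. `FakeAPIError`, `retry_on`, and `flaky_call` are hypothetical names invented for this example.

```python
import time


class FakeAPIError(Exception):
    """Hypothetical stand-in for an API error that carries an HTTP status code."""

    def __init__(self, code):
        super().__init__(f"API error {code}")
        self.code = code


def is_retriable(e):
    # Same idea as the notebook's predicate: retry only on 429 and 503.
    return isinstance(e, FakeAPIError) and e.code in {429, 503}


def retry_on(predicate, max_attempts=5, initial_delay=0.01, multiplier=2.0):
    """Return a decorator that retries a function while predicate(error) is true."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            delay = initial_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception as e:
                    # Give up on non-retriable errors or after the last attempt.
                    if not predicate(e) or attempt == max_attempts:
                        raise
                    time.sleep(delay)
                    delay *= multiplier  # exponential backoff
        wrapper.__wrapped__ = fn  # lets callers detect an already wrapped function
        return wrapper
    return decorator


calls = {"count": 0}


@retry_on(is_retriable)
def flaky_call():
    # Fails twice with a 429, then succeeds on the third attempt.
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeAPIError(429)
    return "ok"


result = flaky_call()
print(result, calls["count"])
```

Errors that the predicate rejects (for example a 400) are raised immediately, so genuine request bugs are not hidden behind retries.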

# Cleaning Data

> Loading the text file

In [None]:
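import kagglehub
import fitz  # PyMuPDF

# Download the dataset; kagglehub returns the local directory it was saved to.
dataset_path = kagglehub.dataset_download("syedsharjeelnajam/fundamentals-of-electric-circuits")

# Open the textbook file and extract the text of every page.
doc = fitz.open('/content/drive/MyDrive/Colab Notebooks/Fundamentals_of_Electric_Circuits.txt')
all_pages = [page.get_text() for page in doc]
doc.close()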

> Cleaning the text by removing repeated newlines, figure/table labels, non-ASCII characters, and chapter/section headers

In [None]:
import re

def clean_text(text):
    # Collapse runs of newlines into a single newline
    text = re.sub(r'\n+', '\n', text)

    # Remove figure/table labels such as "Figure 1.2" or "Table 3.1"
    text = re.sub(r'Figure\s+\d+\.\d+|Table\s+\d+\.\d+', '', text, flags=re.IGNORECASE)

    # Replace runs of non-ASCII characters with a single space
    text = re.sub(r'[^\x00-\x7F]+', ' ', text)

    # Remove numbered headers like "Chapter 1", "Section 1.1"
    text = re.sub(r'(Chapter|Section)\s+\d+(\.\d+)?', '', text)

    # Strip whitespace from each line and drop lines that end up empty
    lines = [line.strip() for line in text.split('\n') if line.strip()]

    return '\n'.join(lines)
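
To see what these cleaning steps do in practice, the sketch below applies the same regular expressions to a small made-up page. `sample_page` and `clean_page` are hypothetical names used only for this illustration; the sample text is invented and is not taken from the book.

```python
import re

# Invented sample "page" with a chapter header, a figure label,
# a whitespace-only line, and a curly apostrophe (non-ASCII).
sample_page = (
    "Chapter 1  Basic Concepts\n\n\n"
    "Charge is measured in coulombs.\n"
    "Figure 1.2 shows a battery.\n"
    "   \n"
    "Ohm\u2019s law: v = iR\n"
)


def clean_page(text):
    """Apply the same substitutions as the notebook's cleaning step."""
    text = re.sub(r'\n+', '\n', text)
    text = re.sub(r'Figure\s+\d+\.\d+|Table\s+\d+\.\d+', '', text, flags=re.IGNORECASE)
    text = re.sub(r'[^\x00-\x7F]+', ' ', text)
    text = re.sub(r'(Chapter|Section)\s+\d+(\.\d+)?', '', text)
    lines = [line.strip() for line in text.split('\n') if line.strip()]
    return '\n'.join(lines)


cleaned = clean_page(sample_page)
print(cleaned)
```

Note that the non-ASCII rule also replaces the curly apostrophe in "Ohm’s" with a space, and it would do the same to symbols such as Ω or μ. For a circuits textbook it may be worth narrowing that rule if unit symbols matter for retrieval.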

> The cleaned pages are stored in the list `texts`

In [None]:
texts = []
for page in all_pages:
  texts.append(clean_text(page))

# Embedding Function for the Document

>**Embedding** refers to the process of converting words, sentences, or documents into numerical vectors that capture their meaning, allowing AI models to understand and compare them semantically.
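
As a small illustration of "comparing semantically", the sketch below computes cosine similarity between toy vectors. The three-dimensional vectors and their labels are made up for the example. Real embeddings from an embedding model have many more dimensions and are not hand-written.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up "embeddings" for three pieces of text.
resistor = [0.9, 0.1, 0.0]
ohms_law = [0.8, 0.3, 0.1]
fourier = [0.0, 0.2, 0.9]

sim_close = cosine_similarity(resistor, ohms_law)
sim_far = cosine_similarity(resistor, fourier)
print(round(sim_close, 3), round(sim_far, 3))
```

Vectors that point in similar directions score close to 1, so the "resistor" vector ranks nearer to "Ohm's law" than to "Fourier" in this toy setup.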

In [None]:
from chromadb import Documents, EmbeddingFunction, Embeddings
from google.genai import types
# Define a helper to retry when per-minute quota is reached.
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

In [None]:
class GeminiEmbeddingFunction(EmbeddingFunction):
    # True when embedding documents for storage, False when embedding queries.
    doc_mode = True

    # Retry the embedding call when the per-minute quota is reached.
    @retry.Retry(predicate=is_retriable)
    def __call__(self, input: Documents) -> Embeddings:
        if self.doc_mode:
            task = "retrieval_document"
        else:
            task = "retrieval_query"

        response = client.models.embed_content(
            model="models/text-embedding-004",
            contents=input,
            config=types.EmbedContentConfig(
                task_type=task,
            ),
        )
        return [e.values for e in response.embeddings]

# Converting text file into Vector Database (ChromaDB)

> **Vector storage** is a system that stores text as numerical vectors (embeddings) and allows for fast similarity search. Instead of searching for exact words, it finds content that’s semantically similar based on meaning.

> **ChromaDB** is an open-source vector database designed to store embeddings and perform fast, efficient similarity searches on them.
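
The core idea of a vector store can be sketched in a few lines of plain Python. The class below is a hypothetical toy (`TinyVectorStore` is not part of ChromaDB) that scans every stored vector by brute force. Real vector databases typically use specialized indexes to make this fast at scale.

```python
import math


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


class TinyVectorStore:
    """Toy in-memory store: keeps (id, vector, document) and ranks by cosine similarity."""

    def __init__(self):
        self._items = []

    def add(self, ids, embeddings, documents):
        for item in zip(ids, embeddings, documents):
            self._items.append(item)

    def count(self):
        return len(self._items)

    def query(self, query_embedding, n_results=2):
        # Rank all stored items from most to least similar to the query.
        ranked = sorted(
            self._items,
            key=lambda item: _cosine(query_embedding, item[1]),
            reverse=True,
        )
        return [(item_id, doc) for item_id, _, doc in ranked[:n_results]]


store = TinyVectorStore()
store.add(
    ids=["0", "1", "2"],
    embeddings=[[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]],  # made-up vectors
    documents=["Ohm's law", "Phasors", "Fourier series"],
)
top = store.query([0.1, 0.9], n_results=2)
print(store.count(), top)
```

The query vector points mostly along the second axis, so the documents whose vectors share that direction are returned first, regardless of the words they contain.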

In [None]:
import chromadb

DB_NAME = "fundamentals_of_electric_circuits"

embed_fn = GeminiEmbeddingFunction()
embed_fn.doc_mode = True

chroma_client = chromadb.Client()
collection = chroma_client.get_or_create_collection(name=DB_NAME, embedding_function=embed_fn)

# Define batch size
BATCH_SIZE = 100  # Adjust this based on the API limit

# Split the texts list into batches
for i in range(0, len(texts), BATCH_SIZE):
    batch_texts = texts[i : i + BATCH_SIZE]
    batch_ids = [str(j) for j in range(i, i + len(batch_texts))]

    # Embed and add the batch to the collection
    collection.add(documents=batch_texts, ids=batch_ids)


Check that the data was added to the vector store successfully

In [None]:
collection.count()

3655
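
The count should match the number of pages in `texts`, because every page is added exactly once across the batches. The sketch below isolates the batching logic to show how slices and IDs line up. `make_batches` is a hypothetical helper and the page list is made up.

```python
def make_batches(items, batch_size):
    """Yield (ids, batch) pairs, where ids are the string positions of the items."""
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        ids = [str(i) for i in range(start, start + len(batch))]
        yield ids, batch


pages = [f"page {n}" for n in range(7)]  # stand-in for the cleaned pages
batches = list(make_batches(pages, batch_size=3))
for ids, batch in batches:
    print(ids, batch)

total = sum(len(batch) for _, batch in batches)
```

The last batch is simply shorter when the list length is not a multiple of the batch size, and the IDs stay unique because they are derived from the absolute position.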

# Functions (Tools)

> **Functions (tools)** are ordinary code functions that you define and that the GenAI model can call automatically when it needs them.

In [None]:
chap_detail = {'Chapters': ['Chapter 1', 'Chapter 2', 'Chapter 3', 'Chapter 4', 'Chapter 5', 'Chapter 6', 'Chapter 7', 'Chapter 8', 'Chapter 9',
                            'Chapter 10', 'Chapter 11', 'Chapter 12', 'Chapter 13', 'Chapter 14', 'Chapter 15', 'Chapter 16', 'Chapter 17', 'Chapter 18',
                            'Chapter 19'],
               'Names': ['Basic Concepts', 'Basic Laws', 'Methods of Analysis', 'Circuit Theorems', 'Operational Amplifiers', 'Capacitors & Inductors',
                         'First Order Circuits', 'Second Order Circuits', 'Sinusoids & Phasors', 'Sinusoidal Steady State Analysis', 'AC Power Analysis',
                         'Three Phase Circuits', 'Magnetically Coupled Circuits', 'Frequency Response', 'Introduction to Laplace Transform',
                         'Applications of Laplace Transform', 'The Fourier Series', 'Fourier Transform', 'Two Port Networks',]
               }
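
`chap_detail` is plain data rather than a callable. One way to expose it as a tool is to wrap it in a small function with type hints and a docstring, as sketched below. `chap_detail_sample`, `chapter_lookup`, and `get_chapter_name` are hypothetical names, and the sample holds only the first three chapters. The google-genai SDK can accept Python callables as tools in its generation config, but that wiring is an assumption here and is not shown or tested.

```python
# Abbreviated copy of the chapter table, for illustration only.
chap_detail_sample = {
    "Chapters": ["Chapter 1", "Chapter 2", "Chapter 3"],
    "Names": ["Basic Concepts", "Basic Laws", "Methods of Analysis"],
}

# Pair each chapter label with its title for direct lookup.
chapter_lookup = dict(zip(chap_detail_sample["Chapters"], chap_detail_sample["Names"]))


def get_chapter_name(chapter: str) -> str:
    """Return the title of a chapter given its label, e.g. "Chapter 2"."""
    return chapter_lookup.get(chapter, "Unknown chapter")


print(get_chapter_name("Chapter 2"))
print(get_chapter_name("Chapter 99"))
```

A clear docstring and typed parameters matter because they are what a model would see when deciding whether and how to call the function.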

# Retrieval Process

> **Retrieval Process** refers to taking the user's query, finding the most relevant passages in the vector database, and using them to produce the AI-generated response

In [None]:
embed_fn.doc_mode = False

query = "Explain Fourier Analysis"
result = collection.query(query_texts=[query], n_results=5)
[all_passages] = result["documents"]

> **Prompt** is the set of instructions given to the model before it produces an answer. It tells the model its role, how to handle different types of queries, and how to format the answer

In [None]:
query_oneline = query.replace("\n", " ")
db_tools = [chap_detail]
prompt = f"""You are a textbook assistant answering questions using "Fundamentals of Electric Circuits". Use the passages below as context and return your answer as a JSON object in the following format:
{{"answer": "", "source_chapter": "", "keywords": ["", ""]}}

Question: {query_oneline}
"""

# Append each retrieved passage on its own line.
for passage in all_passages:
    passage_oneline = passage.replace("\n", " ")
    prompt += f"PASSAGE: {passage_oneline}\n"
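
The prompt assembly can also be packaged as a reusable function so it can be checked without calling the model. The sketch below is one possible shape. `build_prompt` and `demo_prompt` are hypothetical names, and the instruction wording is illustrative rather than the notebook's exact prompt.

```python
from typing import List


def build_prompt(query: str, passages: List[str]) -> str:
    """Assemble instructions, the question, and one PASSAGE line per retrieved chunk."""
    query_oneline = query.replace("\n", " ")
    header = (
        'You are a textbook assistant answering questions using '
        '"Fundamentals of Electric Circuits". Use the passages below and reply '
        'with a JSON object with the keys "answer", "source_chapter" and "keywords".'
    )
    lines = [header, "", f"Question: {query_oneline}"]
    for passage in passages:
        # Flatten each passage so that one passage occupies exactly one line.
        lines.append("PASSAGE: " + passage.replace("\n", " "))
    return "\n".join(lines)


demo_prompt = build_prompt(
    "Explain\nFourier Analysis",
    ["A periodic function can be\nwritten as a sum of sinusoids."],
)
print(demo_prompt)
```

Keeping the question and each passage on separate lines makes it easier for the model, and for a human reading logs, to tell where the context starts.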

In [None]:
answer = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=prompt)

> **JSON (JavaScript Object Notation)** is a lightweight format for structuring data so that other apps or UIs can parse and display it easily

In [None]:
import json

raw_text = answer.text.strip()

if raw_text.startswith("```json"):
    raw_text = raw_text[7:]
if raw_text.endswith("```"):
    raw_text = raw_text[:-3]
data = json.loads(raw_text)
Markdown(data['answer'])

Fourier analysis is a mathematical tool that represents a periodic function \(f(t)\) as a sum of a DC component and an AC component, which consists of an infinite series of harmonic sinusoids. The exponential Fourier series describes the spectrum of \(f(t)\) using the amplitude and phase angle of AC components at positive and negative harmonic frequencies.
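
Model replies do not always arrive wrapped in a code fence, and the fence label can vary. A slightly more defensive parser is sketched below. `parse_model_json` is a hypothetical helper and the sample replies are made up. Requesting JSON directly through the API's JSON response mode is another option, which this notebook does not use.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built here to keep this example readable


def parse_model_json(raw_text: str) -> dict:
    """Strip an optional Markdown code fence and parse the remaining JSON object."""
    text = raw_text.strip()
    # Match a fence with or without the "json" label around the payload.
    pattern = r"^" + FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE + r"$"
    match = re.match(pattern, text, flags=re.DOTALL)
    if match:
        text = match.group(1)
    return json.loads(text)


fenced = (
    FENCE + 'json\n'
    '{"answer": "v = iR", "source_chapter": "Chapter 2", "keywords": ["Ohm", "resistance"]}\n'
    + FENCE
)
plain = '{"answer": "v = iR", "source_chapter": "Chapter 2", "keywords": []}'

print(parse_model_json(fenced)["answer"])
print(parse_model_json(plain)["source_chapter"])
```

If the reply is not valid JSON at all, `json.loads` raises `json.JSONDecodeError`, which a caller could catch to retry the request or fall back to showing the raw text.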

# Project Summary

This project demonstrates a practical application of Generative AI using RAG to create a smart textbook assistant.

## What Was Built:
- Cleaned and chunked the full **Fundamentals of Electric Circuits** textbook
- Stored document chunks in **ChromaDB** with metadata per chapter
- Used **text embeddings** from Gemini to enable semantic search
- Built a **RAG system** to generate accurate, book-based answers
- Added **structured JSON output** to make responses usable in apps

## GenAI Capabilities Demonstrated:
- **Embeddings**
- **Retrieval-Augmented Generation (RAG)**
- **Vector Search (ChromaDB)**
- **Document Understanding**
- **Structured Output / JSON Mode**

## Next Steps:
- Add a user interface (e.g., Streamlit or chatbot)
- Enable multi-turn memory or context caching
- Expand to multi-book support or cross-referencing

This notebook is part of the **Google x Kaggle GenAI Intensive Course Capstone (2025Q1)** and demonstrates applied GenAI in education.
