## This notebook demonstrates how the AI research system works

In [None]:
import sys
from pathlib import Path

project_root = Path.cwd().parent
sys.path.insert(0, str(project_root))

## Load Environment Variables

In [5]:
from dotenv import load_dotenv

load_dotenv(project_root / ".env")

from src.config import settings

try:
    settings.validate()
    print(f"\nLLM Fast Model: {settings.GEMINI_MODEL_FAST}")
    print(f"LLM Smart Model: {settings.GEMINI_MODEL_SMART}")
    print(f"Google Search API: {'+' if settings.GOOGLE_SEARCH_API_KEY else '-'}")
    print(f"Google CSE ID: {'+' if settings.GOOGLE_CSE_ID else '-'}")
    print(f"Jina API Key: {'+' if settings.JINA_API_KEY else '-'}")
    print(f"Gemini API Key: {'+' if settings.GEMINI_API_KEY else '-'}")
    
except ValueError as e:
    print(f"\nConfiguration Error: {e}")

Configuration loaded
LLM FAST: gemini-2.0-flash-lite
LLM SMART: gemini-2.5-flash

LLM Fast Model: gemini-2.0-flash-lite
LLM Smart Model: gemini-2.5-flash
Google Search API: +
Google CSE ID: +
Jina API Key: +
Gemini API Key: +
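
For readers without access to `src.config`, the following is a minimal sketch of how a settings object whose `validate()` raises `ValueError` might be built. The class name `DemoSettings`, the choice of required keys, and the error message are assumptions inferred only from the attributes printed above; the project's actual implementation may differ.

```python
import os
from dataclasses import dataclass, field


def _env(name: str):
    """Build a default factory that reads `name` from the environment."""
    return lambda: os.getenv(name, "")


@dataclass
class DemoSettings:
    """Hypothetical stand-in for src.config.settings (not the project's code)."""

    # load_dotenv() populates os.environ, so the defaults pick up .env values.
    GEMINI_API_KEY: str = field(default_factory=_env("GEMINI_API_KEY"))
    GOOGLE_SEARCH_API_KEY: str = field(default_factory=_env("GOOGLE_SEARCH_API_KEY"))
    GOOGLE_CSE_ID: str = field(default_factory=_env("GOOGLE_CSE_ID"))
    JINA_API_KEY: str = field(default_factory=_env("JINA_API_KEY"))

    def missing_keys(self) -> list:
        """Return the names of all settings that are empty."""
        return [name for name, value in vars(self).items() if not value]

    def validate(self) -> None:
        """Raise ValueError naming every missing key (assumed behavior)."""
        missing = self.missing_keys()
        if missing:
            raise ValueError(f"Missing required settings: {', '.join(missing)}")
```

Reporting every missing key in one error, rather than stopping at the first, lets the `.env` file be fixed in a single pass.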


## Test Google Search Component

In [6]:
from src.search.google_search import GoogleSearcher

searcher = GoogleSearcher(
    api_key=settings.GOOGLE_SEARCH_API_KEY,
    cse_id=settings.GOOGLE_CSE_ID
)

test_query = "quantum computing basics"
print(f"Searching for: '{test_query}'")

results = await searcher.search(test_query, num_results=5)

print(f"\nFound {len(results)} results\n")

for i, result in enumerate(results, 1):
    print(f"[{i}] {result['title']}")
    print(f"    URL: {result['link'][:60]}...")
    print(f"    Snippet: {result['snippet'][:100]}...")
    print()

Searching for: 'quantum computing basics'
[12:31:25] INFO - Found 5 results for 'quantum computing basics'

Found 5 results

[1] BlueQubit: Quantum Computing Basics
    URL: https://www.bluequbit.io/quantum-computing-basics...
    Snippet: Oct 19, 2023 ... This article provides an accessible and informative guide to the basics of quantum ...

[2] What Is Quantum Computing? | IBM
    URL: https://www.ibm.com/think/topics/quantum-computing...
    Snippet: Quantum computing is built on the principles of quantum mechanics, which describe how very small obj...

[3] Basic Quantum Computing — Introduction | by Charlie Thomas ...
    URL: https://medium.com/@charlie.thomas_94667/basic-quantum-compu...
    Snippet: Sep 25, 2023 ... The series is split into 3 main sections. The first section will teach you all the ...

[4] The basics of Quantum Computing - Quantum Inspire
    URL: https://www.quantum-inspire.com/kbase/introduction-to-quantu...
    Snippet: 500: Couldn't resolve componen

## Test Jina Web Scraper

In [8]:
from src.search.jina_scraper import JinaWebScraper

scraper = JinaWebScraper(max_content_length=3000)

test_url = results[0]['link']

print(f"Scraping: {test_url}")

content = await scraper.scrape_url(test_url)

if content:
    print(f"\nSuccessfully scraped {len(content)} characters\n")
    print("First 500 characters:")
    print(content[:500])
    print(f"\n... ({max(len(content) - 500, 0)} more characters)")
else:
    print("Scraping failed")

Scraping: https://www.bluequbit.io/quantum-computing-basics
[12:33:11] INFO - Scraped 3003 chars from https://www.bluequbit.io/quantum-computing-basics...

Successfully scraped 3003 characters

First 500 characters:
Title: Breaking Down the Barriers: Quantum Computing Basics Explained!

URL Source: https://www.bluequbit.io/quantum-computing-basics

Markdown Content:
Quantum computing is a rapidly evolving field with the potential to revolutionize the way we approach computing.

**The ability to solve problems exponentially faster than classical computers could put quantum computing in a position to have a significant impact on many industries.**

This article provides an accessible and informative guide to 

... (2503 more characters)
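
The scraped text above starts with a small header (`Title:`, `URL Source:`, `Markdown Content:`) before the page body. If the header needs to be separated from the body before indexing, a sketch along these lines may help. The function name `split_reader_output` is hypothetical, and the layout is assumed only from the output shown above.

```python
def split_reader_output(text: str) -> dict:
    """Split scraper output into title, source URL, and Markdown body.

    Assumes the 'Title:' / 'URL Source:' / 'Markdown Content:' layout seen in
    the output above; fields that are absent fall back to an empty string.
    """
    header, marker, body = text.partition("Markdown Content:")
    if not marker:
        # No marker found: treat the whole text as the body.
        return {"title": "", "url": "", "markdown": text.strip()}

    fields = {"title": "", "url": ""}
    for line in header.splitlines():
        if line.startswith("Title:"):
            fields["title"] = line[len("Title:"):].strip()
        elif line.startswith("URL Source:"):
            fields["url"] = line[len("URL Source:"):].strip()
    return {**fields, "markdown": body.strip()}


sample = (
    "Title: Breaking Down the Barriers: Quantum Computing Basics Explained!\n\n"
    "URL Source: https://www.bluequbit.io/quantum-computing-basics\n\n"
    "Markdown Content:\n"
    "Quantum computing is a rapidly evolving field."
)
parsed = split_reader_output(sample)
print(parsed["title"])
print(parsed["url"])
```

Stripping the header before chunking keeps the title and URL lines out of the first chunk.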


## Test RAG Store

Each scraped document is:

1. Split into chunks
2. Embedded into vectors
3. Stored in ChromaDB
4. Retrieved using semantic search
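
The four steps above can be illustrated with a toy, dependency-free sketch. Everything here is a stand-in rather than the project's code: a bag-of-words counter replaces the real embedding model, a Python list replaces ChromaDB, and the names `chunk_text`, `embed`, `cosine`, and `ToyStore` are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list:
    """Split text into character windows of `size` that overlap by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    starts = range(0, max(len(text) - overlap, 1), step)
    return [text[i:i + size] for i in starts if text[i:i + size]]


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyStore:
    """In-memory stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (chunk, vector) pairs

    def add(self, text: str, size: int = 200, overlap: int = 50) -> int:
        chunks = chunk_text(text, size, overlap)
        self.items.extend((c, embed(c)) for c in chunks)
        return len(chunks)

    def search(self, query: str, n: int = 3) -> list:
        q = embed(query)
        scored = [{"content": c, "score": cosine(q, v)} for c, v in self.items]
        # Highest similarity first.
        return sorted(scored, key=lambda r: r["score"], reverse=True)[:n]


store = ToyStore()
store.add(
    "Qubits are the basic unit of quantum information. "
    "Classical bits are either zero or one.",
    size=50,
    overlap=10,
)
top = store.search("what are qubits", n=1)
print(top[0]["content"])
```

A real embedding model captures meaning rather than exact word overlap, so its scores will differ from this toy version.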

In [9]:
from src.rag.store import RAGStore

rag_store = RAGStore(jina_key=settings.JINA_API_KEY)

collection_id = "demo_quantum_computing"

print("1: Chunking and Indexing")

documents = [{
    "url": test_url,
    "content": content,
    "title": results[0]['title']
}]

chunks_added = await rag_store.add_documents(
    collection_id=collection_id,
    docs=documents,
    quality_scores={test_url: 0.95}
)

print(f"Added {chunks_added} chunks to vector database\n")

print("2: Semantic Search")

search_query = "what are qubits?"
print(f"Query: '{search_query}'\n")

retrieved_chunks = await rag_store.search(
    collection_id=collection_id,
    query=search_query,
    n=3
)

print(f"Retrieved {len(retrieved_chunks)} most relevant chunks:\n")

for i, chunk in enumerate(retrieved_chunks, 1):
    print(f"[Chunk {i}] Relevance Score: {chunk['score']:.3f}")
    print(f"Content: {chunk['content'][:200]}...")
    print(f"Source: {chunk['url']}")
    print()

1: Chunking and Indexing
[12:36:13] INFO - Added 5 chunks to demo_quantum_computing
Added 5 chunks to vector database

2: Semantic Search
Query: 'what are qubits?'

[12:36:14] INFO - Retrieved 3 chunks from demo_quantum_computing
Retrieved 3 most relevant chunks:

[Chunk 1] Relevance Score: 0.509
Content: ‍

### **The Uncertainty Principle**

The uncertainty principle, also known as Heisenberg's uncertainty principle, states that the position and momentum of a particle cannot be precisely measured simu...
Source: https://www.bluequbit.io/quantum-computing-basics

[Chunk 2] Relevance Score: 0.761
Content: So, what are the basics of quantum computing? Quantum computing is a computing paradigm that relies on the principles of the [quantum mechanical model](https://www.bluequbit.io/quantum-mechanical-mode...
Source: https://www.bluequbit.io/quantum-computing-basics

[Chunk 3] Relevance Score: 0.712
Content: ![Image 1](https://cdn.prod.website-files.com/63d774102fb109fd799

## Test LLM

In [10]:
from src.services.llm import GeminiLLM
from src.prompts import get_topic_breakdown_prompt, get_reflection_prompt, format_context_chunks

llm_fast = GeminiLLM(settings.GEMINI_MODEL_FAST, "FAST")
llm_smart = GeminiLLM(settings.GEMINI_MODEL_SMART, "SMART")

print("Test 1: Topic Breakdown (SMART model)")

query = "How does quantum computing work?"
prompt = get_topic_breakdown_prompt(query, num_topics=3)

print("Prompt being sent:")
print(prompt)

response = await llm_smart.generate(prompt, max_tokens=500)

print("\nLLM Response:")
print(response)

print("\nTest 2: Reflection Decision")

context = format_context_chunks(retrieved_chunks[:2])
reflection_prompt = get_reflection_prompt(
    topic="what are qubits?",
    parent_query="How does quantum computing work?",
    context=context,
    searches=["quantum computing basics"],
    num_chunks=len(retrieved_chunks)
)

print("This prompt asks the LLM: 'Do we have enough information or should we search more?'")
print()

json_response = await llm_fast.generate_json(reflection_prompt, max_tokens=500)

print("LLM Decision:")
import json
print(json.dumps(json_response, indent=2))


Test 1: Topic Breakdown (SMART model)
Prompt being sent:
Break this research question into 3 specific sub-topics:

Question: "How does quantum computing work?"

RULES:
1. Each sub-topic MUST include the main subject from the query.
2. Be SPECIFIC and SEARCHABLE (good for Google Search).
3. Cover different aspects of the question.
4. Generate EXACTLY 3 topics, one per line
5. NO numbering, NO markdown formatting

Example Input: "How does quantum computing work?"
Example Output:
Quantum computing basic principles and qubits
Quantum algorithms and quantum gates explained
Current quantum computers and their applications

Now generate 3 sub-topics for: "How does quantum computing work?"


LLM Response:
Quantum computing basic principles: qubits, superposition, and entanglement
Quantum computing operations: quantum

Test 2: Reflection Decision
This prompt asks the LLM: 'Do we have enough information or should we search more?'

LLM Decision:
{
  "facts_learned": [
    "Quantum computing relie
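
The reflection step decides whether to stop or to search again. A minimal sketch of how such a decision could drive a loop is shown here. The function names and the decision keys `"sufficient"` and `"next_query"` are hypothetical; only `"facts_learned"` is visible in the output above, so the real schema may differ.

```python
import asyncio


async def research_topic(topic, search, reflect, max_iterations=3):
    """Search, reflect, and repeat until the reflection step says to stop.

    `search(query)` returns a list of text chunks. `reflect(topic, chunks)`
    returns a dict with the hypothetical keys "sufficient" (bool) and
    "next_query" (str). Both are coroutines supplied by the caller.
    """
    chunks, query, iterations = [], topic, 0
    for iterations in range(1, max_iterations + 1):
        chunks.extend(await search(query))
        decision = await reflect(topic, chunks)
        # Stop when the model is satisfied or offers no follow-up query.
        if decision.get("sufficient") or not decision.get("next_query"):
            break
        query = decision["next_query"]
    return {"chunks": chunks, "iterations": iterations}


# Stubs for illustration only; real code would call the searcher and the LLM.
async def fake_search(query):
    return [f"result for {query}"]


async def fake_reflect(topic, chunks):
    if len(chunks) >= 2:
        return {"sufficient": True}
    return {"sufficient": False, "next_query": f"{topic} explained"}
```

In a notebook cell this runs as `await research_topic("qubits", fake_search, fake_reflect)`; in a script, wrap the call in `asyncio.run(...)`. The `max_iterations` cap bounds the cost if the model never reports that it has enough information.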

## Create a Research Plan from a User Query

In [12]:
from src.pipeline import DeepResearchPipeline
from src.services import set_current_session
import uuid

session_id = str(uuid.uuid4())
set_current_session(session_id)

print(f"Session ID: {session_id[:8]}...\n")

pipeline = DeepResearchPipeline(session_id)

user_query = "What are the latest developments in fusion energy?"
depth = "standard"

print(f"User Query: {user_query}")
print(f"Depth: {depth}")
print("Creating Research Plan...")

plan = await pipeline.create_plan(user_query, depth)

print("Plan Created!\n")
print(f"Main Query: {plan['query']}")
print(f"\nResearch Strategy:\n{plan['reasoning']}\n")
print(f"Sub-topics to investigate ({len(plan['sub_topics'])}):")
for i, topic in enumerate(plan['sub_topics'], 1):
    print(f"{i}. {topic}")

Session ID: 5ecb36d5...

[12:43:50] INFO - Pipeline created for session 5ecb36d5
User Query: What are the latest developments in fusion energy?
Depth: standard
Creating Research Plan...
[12:43:50] INFO - Creating plan for: 'What are the latest developments in fusion energy?' (depth: standard)
[12:43:50] INFO - Cache initialized (session-based)
Created new session: 5ecb36d5
[12:43:54] INFO - Generated 5 sub-topics
Plan Created!

Main Query: What are the latest developments in fusion energy?

Research Strategy:
Our research strategy will analyze technical breakthroughs in plasma confinement, energy gain, and materials across diverse fusion approaches (e.g., magnetic, inertial) to identify core scientific progress. Simultaneously, we'll investigate developments in funding, commercialization efforts, and regulatory landscapes to gauge the field's practical momentum and future viability.

Sub-topics to investigate (5):
1. Latest developments in magnetic c

## Execute the Full Research Process

In [13]:
async def progress_callback(status):
    print(f"Status: {status}")

result = await pipeline.execute_research(plan, on_progress=progress_callback)
print("RESEARCH COMPLETE!")

[12:45:57] INFO - Executing research with 5 topics
Status: Researching sub-topics...
[12:45:57] INFO - Planning: What are the latest developments in fusion energy?
[12:45:57] INFO -     1. Latest developments in magnetic confinement fusion energy projects
[12:45:57] INFO -     2. Recent breakthroughs in inertial confinement fusion energy
[12:45:57] INFO -     3. Advances in materials science for fusion energy reactors
[12:45:57] INFO -     4. Private sector investment and commercialization pathways for fusion energy
[12:45:57] INFO -     5. New fuel cycles and plasma heating techniques in fusion energy research
[12:45:57] INFO - Dispatch (Iter 1)
[12:45:57] INFO - Searching: Latest developments in magnetic confinement fusion...
[12:45:57] INFO - Searching: Recent breakthroughs in inertial confinement fusio...
[12:45:58] INFO - Searching: Advances in materials science for fusion energy re.
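
The near-identical timestamps on the `Searching:` lines suggest the sub-topics are dispatched concurrently. One way such a dispatch could be written with `asyncio.gather` and an async progress callback is sketched here. The names `dispatch_topics` and `research_one` and the concurrency cap are assumptions, not the pipeline's actual code.

```python
import asyncio


async def dispatch_topics(topics, research_one, on_progress=None, max_concurrency=3):
    """Run `research_one(topic)` for every topic concurrently, with a cap."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(topic):
        async with semaphore:
            if on_progress is not None:
                await on_progress(f"Searching: {topic}")
            return await research_one(topic)

    # gather preserves input order; return_exceptions=True keeps one failed
    # topic from discarding the results of the others.
    results = await asyncio.gather(
        *(run(t) for t in topics), return_exceptions=True
    )
    return dict(zip(topics, results))
```

In a notebook cell this would be awaited directly, for example `await dispatch_topics(plan["sub_topics"], research_one, on_progress=progress_callback)`, where `research_one` is a coroutine you supply. The semaphore limits how many searches run at once, which can help stay within API rate limits.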

## View Research Results

In [17]:
print("RESEARCH RESULTS")
print(f"\nQuery: {result['query']}")
print(f"Timestamp: {result['timestamp']}")
print(f"Total Iterations: {result['iterations']}")
print("\nQuality Metrics:")
print(f"  Confidence: {result['quality_metrics']['confidence']:.1%}")
print(f"  Sources Found: {result['quality_metrics']['source_count']}")
print(f"  Sources Cited: {len(result['citations'])}")

print("\nSUB-TOPICS RESEARCHED")
for i, topic in enumerate(result['sub_topics'], 1):
    print(f"{i}. {topic}")

print("\nREPORT")
print()
print(result['report_text'])

print(f"SOURCES ({len(result['sources'])})")

cited_ids = set(result['citations'])
cited_sources = [s for s in result['sources'] if s.get('id') in cited_ids]

print(f"\nCited in Report ({len(cited_sources)}):")
for source in cited_sources:
    print(f"  [{source.get('id')}] {source.get('title', 'Untitled')[:60]}")
    print(f"      {source.get('url', '')[:70]}")
    print()

RESEARCH RESULTS

Query: What are the latest developments in fusion energy?
Timestamp: 2025-12-14T12:47:17.023673
Total Iterations: 2

Quality Metrics:
  Confidence: 72.0%
  Sources Found: 37
  Sources Cited: 11

SUB-TOPICS RESEARCHED
1. Latest developments in magnetic confinement fusion energy projects
2. Recent breakthroughs in inertial confinement fusion energy
3. Advances in materials science for fusion energy reactors
4. Private sector investment and commercialization pathways for fusion energy
5. New fuel cycles and plasma heating techniques in fusion energy research

REPORT

## The Latest Developments in Fusion Energy

### Executive Summary

The pursuit of fusion energy, a clean and potentially limitless power source, is experiencing significant advancements across multiple fronts, ranging from groundbreaking confinement research to accelerating private investment and critical materials science [2, 4, 7, 8]. Breakthroughs in both magnetic and inertial confinement fusion approach
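
The report cites sources with bracketed number groups such as `[2, 4, 7, 8]`. As a cross-check against `result['citations']`, the cited ids can be recovered from the report text itself. This is a minimal sketch: the function name `extract_citation_ids` is hypothetical, and it assumes citations always appear as comma-separated integers in square brackets.

```python
import re

# Matches "[2]" or "[2, 4, 7, 8]"; ignores brackets with non-numeric content.
CITATION_GROUP = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def extract_citation_ids(report_text: str) -> list:
    """Return the sorted, de-duplicated citation ids found in the report."""
    ids = set()
    for group in CITATION_GROUP.findall(report_text):
        ids.update(int(part) for part in group.split(","))
    return sorted(ids)


sample_report = "materials science [2, 4, 7, 8]. Later work [4] and [10,11]."
print(extract_citation_ids(sample_report))
```

Markdown image or link text such as `![Image 1](...)` is skipped because it is not purely numeric, but a numeric link label like `[1](url)` would still match.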