# GeoResearch Assistant Demo

This notebook demonstrates the GeoResearch Assistant, a small LLM-powered system that answers questions about geography and climate using Retrieval-Augmented Generation (RAG).
It is built around the **GeoAssistant** class (imported from the `geo_assistant` module), which combines retrieval and generation to answer questions.

## How It Works

The GeoResearch Assistant answers each question in three RAG steps:

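1. **Embedding**: Your question is converted to a vector with a sentence-transformers model (`all-MiniLM-L6-v2` in this demo).
2. **Retrieval**: The system finds the most relevant documents by cosine similarity between the question vector and the document vectors.
3. **Generation**: The retrieved documents are passed as context to Flan-T5 (`google/flan-t5-small` in this demo), which generates the answer.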
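
The three steps can be sketched in plain Python. This is only an illustration under stated assumptions, not the code inside `GeoAssistant`: the `embed` function is a toy bag-of-words stand-in for the sentence-transformers model, the three documents are shortened examples, and `retrieve` and `build_prompt` are hypothetical helper names. The final call to Flan-T5 is left out; the sketch stops at the prompt that would be sent to it.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a sentence embedding model: a bag-of-words count vector."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    return Counter(w for w in words if w)

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(question, documents, top_k=2):
    """Return the top_k (score, document) pairs, highest score first."""
    q = embed(question)
    scored = [(cosine_similarity(q, embed(d)), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

def build_prompt(question, retrieved):
    """Place the retrieved documents in front of the question as context."""
    context = "\n".join(doc for _, doc in retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Shortened example documents (assumed for illustration only)
docs = [
    "Mount Everest is the Earth's highest mountain above sea level.",
    "Lake Baikal is the world's deepest freshwater lake.",
    "The Pacific Ocean is the largest ocean on Earth.",
]
question = "What is the highest mountain in the world?"
top = retrieve(question, docs, top_k=2)
prompt = build_prompt(question, top)
print(prompt)
```

A real embedding model captures meaning rather than word overlap, so its scores and ranking will differ from this toy version, but the retrieve-then-prompt structure is the same.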

## Setup

First, let's import the assistant and the dataset helpers, then print a summary of the dataset.

In [15]:
# Import the GeoAssistant and dataset
from geo_assistant import GeoAssistant
from data import get_dataset, get_dataset_info

# Show dataset information
dataset_info = get_dataset_info()
print(f"Dataset contains {dataset_info['num_documents']} documents")
print(f"Topics covered: {', '.join(dataset_info['topics'])}")

Dataset contains 35 documents
Topics covered: Temperature and Climate, Geography and Landforms, Countries and Regions, Climate Phenomena, Oceans and Water Bodies, Ecosystems and Biomes, Climate Change


## Initialize the Assistant

This loads the embedding and generation models, downloading them on first run (which may take a few minutes).

In [16]:
# Initialize the GeoResearch Assistant
assistant = GeoAssistant(
    embedder_model="all-MiniLM-L6-v2",  # Lightweight embedding model
    llm_model="google/flan-t5-small"     # Lightweight generation model
)

Initializing GeoResearch Assistant...
Loading model google/flan-t5-small on cpu...
Model loaded successfully!
GeoResearch Assistant initialized successfully!


## Load Knowledge Base

Load the geographic and climate dataset into the assistant's knowledge base.

In [17]:
# Load the dataset
documents = get_dataset()
assistant.load_knowledge_base(documents)

# Check knowledge base info
kb_info = assistant.get_knowledge_base_info()
print(f"\nKnowledge base loaded: {kb_info['num_documents']} documents")
print(f"Embedding dimension: {kb_info['embedding_dimension']}")

Loading 35 documents into knowledge base...
Knowledge base loaded successfully!

Knowledge base loaded: 35 documents
Embedding dimension: 384


## Ask Questions

Now let's ask some questions about geography and climate!

In [18]:
# Example 1: Question about a specific location
question1 = "What is the highest mountain in the world?"
result1 = assistant.ask(question1, top_k=2, return_context=True)

print(f"Question: {result1['question']}")
print(f"\nAnswer: {result1['answer']}")
print("\n--- Retrieved Context ---")
for i, doc in enumerate(result1['retrieved_docs'], 1):
    print(f"\n[Document {i}] (Score: {doc['score']:.4f})")
    print(doc['text'])

Question: What is the highest mountain in the world?

Answer: Mount Everest

--- Retrieved Context ---

[Document 1] (Score: 0.6933)
Mount Everest, located in the Himalayas on the border between Nepal and Tibet, is the Earth's highest mountain above sea level at 8,848.86 meters (29,031.7 feet).

[Document 2] (Score: 0.3645)
Lake Baikal in Siberia, Russia, is the world's deepest and oldest freshwater lake, containing about 23% of the world's fresh surface water and reaching depths of 1,642 meters.
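
In the output above, the second retrieved document scores 0.3645 and is unrelated to the question. One possible way to keep such passages out of the context is a minimum-score filter. The sketch below assumes only the result shape shown above (a list of dicts with `score` and `text` keys); `filter_by_score` is a hypothetical helper, not part of `GeoAssistant`, and the 0.5 threshold is an arbitrary value chosen for illustration.

```python
def filter_by_score(retrieved_docs, min_score=0.5):
    """Keep only documents whose similarity score is at least min_score."""
    return [doc for doc in retrieved_docs if doc["score"] >= min_score]

# Same shape as result1['retrieved_docs'] in the output above (texts shortened)
retrieved_docs = [
    {"score": 0.6933, "text": "Mount Everest, located in the Himalayas ..."},
    {"score": 0.3645, "text": "Lake Baikal in Siberia, Russia ..."},
]

kept = filter_by_score(retrieved_docs, min_score=0.5)
for doc in kept:
    print(f"{doc['score']:.4f}  {doc['text']}")
```

A suitable threshold depends on the embedding model and the dataset, so it is worth checking against a few known questions before relying on it.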


In [19]:
# Example 2: Question about climate
question2 = "What is the Amazon rainforest climate like?"
result2 = assistant.ask(question2, top_k=3)

print(f"Question: {result2['question']}")
print(f"\nAnswer: {result2['answer']}")

Question: What is the Amazon rainforest climate like?

Answer: tropical


In [20]:
# Example 3: Question about oceans
question3 = "Which is the largest ocean on Earth?"
result3 = assistant.ask(question3, top_k=2)

print(f"Question: {result3['question']}")
print(f"\nAnswer: {result3['answer']}")

Question: Which is the largest ocean on Earth?

Answer: Pacific Ocean


In [21]:
# Example 4: Question about climate change
question4 = "How much have global temperatures increased?"
result4 = assistant.ask(question4, top_k=3)

print(f"Question: {result4['question']}")
print(f"\nAnswer: {result4['answer']}")

Question: How much have global temperatures increased?

Answer: 1.1°C


In [22]:
# Example 5: Question about a specific country
question5 = "What is special about the Netherlands geography?"
result5 = assistant.ask(question5, top_k=2)

print(f"Question: {result5['question']}")
print(f"\nAnswer: {result5['answer']}")

Question: What is special about the Netherlands geography?

Answer: It is famous for its extensive system of dikes and polders


## Try Your Own Questions!

Feel free to ask your own questions about geography and climate.

In [23]:
# Ask your own question
your_question = "What causes monsoons?"  # Change this to your question

result = assistant.ask(your_question, top_k=3, return_context=True)

print(f"Question: {result['question']}")
print(f"\nAnswer: {result['answer']}")
print("\n--- Retrieved Context ---")
for i, doc in enumerate(result['retrieved_docs'], 1):
    print(f"\n[Document {i}] (Score: {doc['score']:.4f})")
    # Truncate documents longer than 200 characters for display
    text = doc['text']
    print((text[:200] + "...") if len(text) > 200 else text)

Question: What causes monsoons?

Answer: heavy rainfall to South and Southeast Asia

--- Retrieved Context ---

[Document 1] (Score: 0.6765)
Monsoons are seasonal wind patterns that bring heavy rainfall to South and Southeast Asia, typically from June to September. India receives about 80% of its annual rainfall during the summer monsoon.

[Document 2] (Score: 0.4386)
Extreme weather events, including heatwaves, droughts, floods, and hurricanes, are becoming more frequent and intense due to climate change, affecting millions of people worldwide.

[Document 3] (Score: 0.3854)
El Niño is a climate pattern characterized by unusually warm ocean temperatures in the Equatorial Pacific, affecting weather patterns globally and typically occurring every 2-7 years.
