# 1. Install required packages

We install all the dependencies needed for building a
Retrieval-Augmented Generation (RAG) pipeline.
These include LangChain components, Hugging Face libraries
(transformers, sentence-transformers), ChromaDB for vector storage,
PyTorch for running the models, and CodeCarbon for emissions tracking.

In [2]:
%pip install -qq langchain langchain-community langchain-core langchain-text-splitters langchain-huggingface sentence-transformers chromadb transformers torch accelerate unstructured codecarbon

Note: you may need to restart the kernel to use updated packages.


# 2. Import libraries and set configuration

Here we import the necessary modules and define paths, constants,
and model settings.
We also suppress warnings to keep the notebook output clean.

In [4]:
from pathlib import Path
import json
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_huggingface import HuggingFaceEmbeddings, HuggingFacePipeline
from langchain_core.prompts import PromptTemplate
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
from codecarbon import OfflineEmissionsTracker
import torch
import warnings

warnings.filterwarnings("ignore")

PROMPTS_FILE = "data/test_data.json"
PERSIST_DIR = "data/chroma_db"
EMBED_MODEL = "all-MiniLM-L6-v2"
CHUNK_SIZE = 400
CHUNK_OVERLAP = 50
TOP_K_RESULTS = 5
RELEVANCE_THRESHOLD = 0.3
LLM_MODEL = "MBZUAI/LaMini-Flan-T5-248M"
MAX_NEW_TOKENS = 100
LLM_TEMPERATURE = 0.2
COUNTRY_ISO_CODE = "EGY"
USE_GPU = torch.cuda.is_available()

PROMPT_TEMPLATE = """Answer the question about Apollo 11 based on the context below. If you cannot answer based on the context, say "I don't have enough information to answer that."

Context:
{context}

Question: {question}

Answer:"""
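
To see what the model actually receives, the template can be filled by hand. This is a minimal sketch using plain `str.format`; the notebook itself goes through `PromptTemplate` in section 9. The template is shortened here, and the two chunks and the question are made-up placeholders.

```python
# Shortened stand-in for PROMPT_TEMPLATE; the placeholders are the same.
template = """Answer the question based on the context below.

Context:
{context}

Question: {question}

Answer:"""

# Hypothetical retrieved chunks, joined with the separator used in section 9.
chunks = ["First retrieved chunk.", "Second retrieved chunk."]
context = "\n\n---\n\n".join(chunks)

prompt = template.format(context=context, question="When did Eagle land?")
print(prompt)
```

The prompt ends with "Answer:" so that the model continues from that point.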

# 3. Initialize embedding model and text splitter

The embedding model converts text into numeric vectors, while the text
splitter breaks long documents into manageable chunks for retrieval.

In [5]:
embedder = HuggingFaceEmbeddings(model_name=EMBED_MODEL)
splitter = RecursiveCharacterTextSplitter(
    chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP
)
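
The effect of CHUNK_SIZE and CHUNK_OVERLAP can be illustrated with a simplified splitter. The sketch below uses fixed character windows, which is an assumption made for clarity: `RecursiveCharacterTextSplitter` prefers to break at separators such as paragraph breaks and spaces, so its chunks vary in length. The function name `sliding_window_chunks` and the sample text are hypothetical.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# Hypothetical 1,000-character document made of repeating digits.
text = "".join(str(i % 10) for i in range(1000))
chunks = sliding_window_chunks(text, chunk_size=400, chunk_overlap=50)

print([len(c) for c in chunks])         # → [400, 400, 300]
print(chunks[0][-50:] == chunks[1][:50])  # → True
```

Each chunk repeats the last 50 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.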

# 4. Load the local language model

We initialize a small, local LLM (LaMini-Flan-T5) that can run on CPU or GPU.
This model will later generate answers based on retrieved context.

In [6]:
def initialize_local_llm():
    tracker = OfflineEmissionsTracker(country_iso_code=COUNTRY_ISO_CODE)
    tracker.start()
    
    tokenizer = AutoTokenizer.from_pretrained(LLM_MODEL)
    model = AutoModelForSeq2SeqLM.from_pretrained(
        LLM_MODEL,
        torch_dtype=torch.float16 if USE_GPU else torch.float32,
        device_map="auto" if USE_GPU else None,
        low_cpu_mem_usage=True,
    )
    # On GPU, device_map="auto" has already placed the model, so the pipeline
    # is only given an explicit device on CPU.
    device_kwargs = {} if USE_GPU else {"device": -1}
    pipe = pipeline(
        "text2text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=MAX_NEW_TOKENS,
        repetition_penalty=1.2,
        # Greedy decoding: temperature and top_p only apply when
        # do_sample=True, so they are not passed here.
        do_sample=False,
        **device_kwargs,
    )
    
    emissions = tracker.stop()
    print(f"Model loading emissions: {emissions:.6f} kg CO2")
    
    return HuggingFacePipeline(pipeline=pipe), emissions

llm, emissions_loading = initialize_local_llm()

[codecarbon INFO @ 18:55:03] offline tracker init
[codecarbon INFO @ 18:55:03] [setup] RAM Tracking...
[codecarbon INFO @ 18:55:03] [setup] CPU Tracking...
 Windows OS detected: Please install Intel Power Gadget to measure CPU

[codecarbon INFO @ 18:55:07] CPU Model on constant consumption mode: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz
[codecarbon INFO @ 18:55:07] [setup] GPU Tracking...
[codecarbon INFO @ 18:55:07] No GPU found.
[codecarbon INFO @ 18:55:07] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: global constant
                GPU Tracking Method: Unspecified
            
[codecarbon INFO @ 18:55:07] >>> Tracker's metadata:
[codecarbon INFO @ 18:55:07]   Platform system: Windows-11-10.0.26200-SP0
[codecarbon INFO @ 18:55:07]   Python version: 3.12.10
[codecarbon INFO @ 18:55:07]   CodeCarbon version: 3.0.8
[codecarbon INFO @ 18:55:07]   Available RAM : 15.843 GB
[codecarbon INFO 

Model loading emissions: 0.000047 kg CO2


# 5. Load documents from JSON

We read the context and metadata directly from a JSON file.
We also clean metadata and split text into chunks.

In [7]:
def load_documents_from_json(json_path=PROMPTS_FILE):
    data_path = Path(json_path)
    if not data_path.exists():
        print(f"JSON file not found at: {json_path}")
        return []

    with open(data_path, "r", encoding="utf-8") as f:
        data = json.load(f)

    source_text = data.get("source_text", "")
    metadata = data.get("metadata", {})

    if not source_text.strip():
        print("No source text found in JSON.")
        return []

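    # Read the section list before flattening the metadata; joining the
    # stringified list would join its individual characters.
    sections = metadata.get("sections") or ["General"]
    if isinstance(sections, str):
        sections = [sections]
    section_label = ", ".join(str(s) for s in sections)

    # Convert lists and dicts to strings so every metadata value is a scalar.
    clean_metadata = {
        k: str(v) if isinstance(v, (list, dict)) else v for k, v in metadata.items()
    }

    split_docs = splitter.create_documents([source_text])

    for doc in split_docs:
        doc.metadata = clean_metadata.copy()
        doc.metadata["topic"] = "Apollo 11"
        doc.metadata["section"] = section_label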

    print(f"Loaded and split {len(split_docs)} chunks from JSON.")
    return split_docs
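
The loader above and the evaluation cell in section 10 together imply a file layout. The sketch below is inferred only from the keys the code reads (`source_text`, `metadata`, `sections`, `prompts`, `id`, `category`, `difficulty`, `prompt`, `expected_answer`). Every value is a placeholder, not the actual content of data/test_data.json.

```python
import json

# Key names come from the notebook code; every value is a placeholder.
example = {
    "source_text": "Full text of the source document goes here.",
    "metadata": {"sections": ["Landing", "Surface operations"]},
    "prompts": [
        {
            "id": 1,
            "category": "rag",
            "difficulty": "easy",
            "prompt": "A test question?",
            "expected_answer": "An optional reference answer.",
        }
    ],
}

# Round-trip through JSON text to confirm the structure is serializable.
loaded = json.loads(json.dumps(example))
print(sorted(loaded))  # → ['metadata', 'prompts', 'source_text']
```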

# 6. Build Chroma vector store

Here we embed the document chunks and save them into a local vector database (Chroma).
This enables fast similarity-based retrieval of relevant context later.

In [8]:
def build_chroma_store(docs, persist_dir=PERSIST_DIR):
    db = Chroma.from_documents(
        documents=docs, embedding=embedder, persist_directory=persist_dir
    )
    db.persist()
    return db
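
Similarity-based retrieval can be sketched without a database. The example below ranks stored vectors by cosine similarity to a query vector using a brute-force scan. This is an illustration only: the metric Chroma applies depends on how the collection is configured, and a vector store does not have to scan every vector. The names `cosine_similarity` and `top_k` and the tiny 3-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k):
    """Return the k (score, label) pairs most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), label) for label, vec in stored]
    return sorted(scored, reverse=True)[:k]


# Hypothetical 3-dimensional embeddings; all-MiniLM-L6-v2 produces 384 dimensions.
stored = [
    ("chunk about the landing", [0.9, 0.1, 0.0]),
    ("chunk about the flag", [0.1, 0.9, 0.0]),
    ("chunk about splashdown", [0.0, 0.2, 0.9]),
]
query = [1.0, 0.0, 0.0]

for score, label in top_k(query, stored, k=2):
    print(f"{score:.3f}  {label}")
```

The chunk whose vector points in nearly the same direction as the query ranks first, which is the behavior the retrieval step relies on.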

# 7. Call the document loader

This cell loads the source document (text and metadata) from the JSON file, and
splits it into smaller chunks for embedding.

In [9]:
documents = load_documents_from_json()

Loaded and split 30 chunks from JSON.


# 8. Call the Chroma builder

This cell embeds the document chunks and builds a Chroma vector database
that stores the embeddings for efficient similarity search.

Once the database is built, it’s saved to disk,
so you only need to run this cell once, unless you change or add new data.

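Running it again does not replace the existing database: Chroma.from_documents adds the chunks to the collection already stored in the persist directory, which duplicates entries. Delete data/chroma_db first if you want a clean rebuild.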

In [10]:
tracker_embeddings = OfflineEmissionsTracker(country_iso_code=COUNTRY_ISO_CODE)
tracker_embeddings.start()

db = build_chroma_store(documents)

emissions_embeddings = tracker_embeddings.stop()
print(f"Embeddings creation emissions: {emissions_embeddings:.6f} kg CO2")

[codecarbon INFO @ 18:55:11] offline tracker init
[codecarbon INFO @ 18:55:11] [setup] RAM Tracking...
[codecarbon INFO @ 18:55:11] [setup] CPU Tracking...
 Windows OS detected: Please install Intel Power Gadget to measure CPU

[codecarbon INFO @ 18:55:14] CPU Model on constant consumption mode: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz
[codecarbon INFO @ 18:55:14] [setup] GPU Tracking...
[codecarbon INFO @ 18:55:14] No GPU found.
[codecarbon INFO @ 18:55:14] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: global constant
                GPU Tracking Method: Unspecified
            
[codecarbon INFO @ 18:55:14] >>> Tracker's metadata:
[codecarbon INFO @ 18:55:14]   Platform system: Windows-11-10.0.26200-SP0
[codecarbon INFO @ 18:55:14]   Python version: 3.12.10
[codecarbon INFO @ 18:55:14]   CodeCarbon version: 3.0.8
[codecarbon INFO @ 18:55:14]   Available RAM : 15.843 GB
[codecarbon INFO 

Embeddings creation emissions: 0.000055 kg CO2


# 9. Define query and response generation

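These functions retrieve the most relevant text chunks and use the
LLM to answer a question. If the best match scores below
RELEVANCE_THRESHOLD, the query is treated as unanswerable and the LLM
is not called.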

In [11]:
def query_database(query_text, k=TOP_K_RESULTS, threshold=RELEVANCE_THRESHOLD):
    results = db.similarity_search_with_relevance_scores(query_text, k=k)

    if len(results) == 0 or results[0][1] < threshold:
        return []

    return results


def generate_rag_response(
    query_text, k=TOP_K_RESULTS, threshold=RELEVANCE_THRESHOLD, verbose=False
):
    results = db.similarity_search_with_relevance_scores(query_text, k=k)

    if len(results) == 0 or results[0][1] < threshold:
        return {
            "answer": "No relevant information found.",
            "sources": [],
            "context": "",
            "prompt": "",
        }

    context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
    prompt_template = PromptTemplate.from_template(PROMPT_TEMPLATE)
    prompt = prompt_template.format(context=context_text, question=query_text)

    if llm is None:
        return {
            "answer": "LLM not initialized.",
            "sources": [],
            "context": context_text,
            "prompt": prompt,
        }

    response_text = llm.invoke(prompt)
    sources = [doc.metadata.get("source", "Unknown") for doc, _score in results]

    if verbose:
        print(f"\nQuery: {query_text}")
        print(f"\nAnswer: {response_text}")
        print(f"\nSources: {', '.join([Path(s).name for s in sources])}")

    return {
        "answer": response_text,
        "sources": sources,
        "context": context_text,
        "prompt": prompt,
        "scores": [score for _, score in results],
    }


def ask(query_text):
    result = generate_rag_response(query_text, verbose=True)
    return result["answer"]
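
The functions above check only the top score and then pass all k chunks to the model. One possible variant, sketched below under stated assumptions, filters every chunk by the threshold and stops adding chunks once a character budget is reached. The motivation is the tokenizer warning in the section 11 output (525 > 512 tokens). The helper `assemble_context`, the `max_chars` parameter, and the sample results are hypothetical, and a character budget is only a rough proxy for a token limit.

```python
def assemble_context(results, threshold, max_chars, separator="\n\n---\n\n"):
    """Join chunks scoring at least `threshold` until `max_chars` is reached.

    `results` is a list of (text, score) pairs sorted by descending score.
    """
    kept = []
    used = 0
    for text, score in results:
        if score < threshold:
            break  # results are sorted, so the remaining chunks score lower
        cost = len(text) + (len(separator) if kept else 0)
        if used + cost > max_chars:
            break  # adding this chunk would exceed the budget
        kept.append(text)
        used += cost
    return separator.join(kept)


# Hypothetical retrieval results: (chunk text, relevance score).
results = [("a" * 400, 0.71), ("b" * 400, 0.55), ("c" * 400, 0.28)]

context = assemble_context(results, threshold=0.3, max_chars=1000)
print(len(context))  # → 807
```

Here the third chunk is dropped because it scores below the threshold; with a smaller budget the second chunk would be dropped as well.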

# 10. Load evaluation prompts

We load a list of test questions from a JSON file.
Each question is labeled with a category (e.g., summarization, reasoning, or RAG).

In [12]:
with open(PROMPTS_FILE, "r", encoding="utf-8") as f:
    prompts_data = json.load(f)

prompts = prompts_data["prompts"]
print(f"Loaded {len(prompts)} evaluation prompts")
print("\nCategories:")
for category in ["summarization", "reasoning", "rag"]:
    count = len([p for p in prompts if p["category"] == category])
    print(f"  - {category.title()}: {count} prompts")

Loaded 15 evaluation prompts

Categories:
  - Summarization: 5 prompts
  - Reasoning: 5 prompts
  - Rag: 5 prompts


# 11. Run automated evaluation

For each question, we generate an answer using the RAG system and print
both the model’s response and the expected answer (if provided).

In [13]:
results = []

tracker_inference = OfflineEmissionsTracker(country_iso_code=COUNTRY_ISO_CODE)
tracker_inference.start()

for p in prompts:
    question = p["prompt"]
    expected = p.get("expected_answer", None)
    print(f"\nTesting Prompt {p['id']}: {question}")

    result = generate_rag_response(question, verbose=False)
    answer = result["answer"]

    results.append(
        {
            "id": p["id"],
            "category": p["category"],
            "difficulty": p["difficulty"],
            "prompt": question,
            "answer": answer,
            "expected": expected,
            "context_used": len(result["context"]),
            "top_sources": result["sources"],
        }
    )

    print(f" Model Answer: {answer}")
    if expected:
        print(f" Expected: {expected}")

emissions_inference = tracker_inference.stop()

[codecarbon INFO @ 18:55:19] offline tracker init
[codecarbon INFO @ 18:55:20] [setup] RAM Tracking...
[codecarbon INFO @ 18:55:20] [setup] CPU Tracking...
 Windows OS detected: Please install Intel Power Gadget to measure CPU

[codecarbon INFO @ 18:55:23] CPU Model on constant consumption mode: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz
[codecarbon INFO @ 18:55:23] [setup] GPU Tracking...
[codecarbon INFO @ 18:55:23] No GPU found.
[codecarbon INFO @ 18:55:23] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: global constant
                GPU Tracking Method: Unspecified
            
[codecarbon INFO @ 18:55:23] >>> Tracker's metadata:
[codecarbon INFO @ 18:55:23]   Platform system: Windows-11-10.0.26200-SP0
[codecarbon INFO @ 18:55:23]   Python version: 3.12.10
[codecarbon INFO @ 18:55:23]   CodeCarbon version: 3.0.8
[codecarbon INFO @ 18:55:23]   Available RAM : 15.843 GB
[codecarbon INFO 


Testing Prompt 1: Summarize the main events during the Apollo 11 lunar landing in 3 sentences.


[codecarbon INFO @ 18:55:38] Energy consumed for RAM : 0.000042 kWh. RAM Power : 10.0 W
[codecarbon INFO @ 18:55:38] Delta energy consumed for CPU with constant : 0.000251 kWh, power : 60.0 W
[codecarbon INFO @ 18:55:38] Energy consumed for All CPU : 0.000251 kWh
[codecarbon INFO @ 18:55:38] 0.000293 kWh of electricity and 0.000000 L of water were used since the beginning.
Token indices sequence length is longer than the specified maximum sequence length for this model (525 > 512). Running this sequence through the model will result in indexing errors


 Model Answer: The main events during the Apollo 11 lunar landing were: 1. The computer prevented an abort. 2. A complete set of recovery programs was incorporated into the software to eliminate lower priority tasks and re-establish the more important ones. 3. Armstrong collected a contingency soil sample using a sample bag on a stick. 4. Aldrin joined Armstrong on the surface. 5. The surface dust was described as "very fine-grained" and "almost like a

Testing Prompt 2: What were the main challenges Armstrong faced while landing the Eagle?
 Model Answer: Armstrong initially had some difficulties squeezing through the hatch with his portable life support system (PLSS).

Testing Prompt 3: Describe the activities the astronauts performed on the lunar surface.


[codecarbon INFO @ 18:55:53] Energy consumed for RAM : 0.000083 kWh. RAM Power : 10.0 W
[codecarbon INFO @ 18:55:53] Delta energy consumed for CPU with constant : 0.000250 kWh, power : 60.0 W
[codecarbon INFO @ 18:55:53] Energy consumed for All CPU : 0.000501 kWh
[codecarbon INFO @ 18:55:53] 0.000585 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 18:56:08] Energy consumed for RAM : 0.000125 kWh. RAM Power : 10.0 W
[codecarbon INFO @ 18:56:08] Delta energy consumed for CPU with constant : 0.000250 kWh, power : 60.0 W
[codecarbon INFO @ 18:56:08] Energy consumed for All CPU : 0.000751 kWh
[codecarbon INFO @ 18:56:08] 0.000877 kWh of electricity and 0.000000 L of water were used since the beginning.


 Model Answer: The astronauts planted the Lunar Flag Assembly containing a flag of the United States on the lunar surface, in clear view of the TV camera. They also lifted film and two sample boxes containing 21.55 kilograms (47.5 lb) of lunar surface material to the LM hatch using a flat cable pulley device called the Lunar Equipment Conveyor (LEC).

Testing Prompt 4: Explain what scientific equipment the astronauts deployed on the Moon.
 Model Answer: The astronauts deployed the EASEP, which included a Passive Seismic Experiment Package used to measure moonquakes and a retroreflector array used for the lunar laser ranging experiment.

Testing Prompt 5: Compare the planned timeline for the lunar surface operations with what actually happened.
 Model Answer: No relevant information found.

Testing Prompt 6: Why did the computer alarms (1201 and 1202) occur during the descent?


[codecarbon INFO @ 18:56:23] Energy consumed for RAM : 0.000167 kWh. RAM Power : 10.0 W
[codecarbon INFO @ 18:56:23] Delta energy consumed for CPU with constant : 0.000250 kWh, power : 60.0 W
[codecarbon INFO @ 18:56:23] Energy consumed for All CPU : 0.001001 kWh
[codecarbon INFO @ 18:56:23] 0.001168 kWh of electricity and 0.000000 L of water were used since the beginning.


 Model Answer: The computer alarms (1201 and 1202) occurred during the descent to indicate "executive overflows", meaning the guidance computer could not complete all its tasks in real-time and had to postpone some of them.

Testing Prompt 7: What would have happened if Armstrong had not taken manual control during the landing?


[codecarbon INFO @ 18:56:38] Energy consumed for RAM : 0.000208 kWh. RAM Power : 10.0 W
[codecarbon INFO @ 18:56:38] Delta energy consumed for CPU with constant : 0.000250 kWh, power : 60.0 W
[codecarbon INFO @ 18:56:38] Energy consumed for All CPU : 0.001252 kWh
[codecarbon INFO @ 18:56:38] 0.001460 kWh of electricity and 0.000000 L of water were used since the beginning.


 Model Answer: The context does not provide information about what would have happened if Armstrong had not taken manual control during the landing.

Testing Prompt 8: Why did Armstrong's famous quote become controversial?


[codecarbon INFO @ 18:56:53] Energy consumed for RAM : 0.000250 kWh. RAM Power : 10.0 W
[codecarbon INFO @ 18:56:53] Delta energy consumed for CPU with constant : 0.000250 kWh, power : 60.0 W
[codecarbon INFO @ 18:56:53] Energy consumed for All CPU : 0.001501 kWh
[codecarbon INFO @ 18:56:53] 0.001751 kWh of electricity and 0.000000 L of water were used since the beginning.


 Model Answer: Armstrong's famous quote became controversial because the word "a" was not audible in the transmission and was not initially reported by most observers of the live broadcast.

Testing Prompt 9: Analyze how the fuel situation during landing reflects the risk management challenges of the mission.


[codecarbon INFO @ 18:57:08] Energy consumed for RAM : 0.000292 kWh. RAM Power : 10.0 W
[codecarbon INFO @ 18:57:08] Delta energy consumed for CPU with constant : 0.000250 kWh, power : 60.0 W
[codecarbon INFO @ 18:57:08] Energy consumed for All CPU : 0.001751 kWh
[codecarbon INFO @ 18:57:08] 0.002043 kWh of electricity and 0.000000 L of water were used since the beginning.


 Model Answer: The fuel situation during landing reflects the risk management challenges of the mission because the LM guidance computer (LGC) distracted the crew with the first of several unexpected 1201 and 1202 program alarms.

Testing Prompt 10: Based on the text, what does Margaret Hamilton's statement reveal about the Apollo Guidance Computer's design philosophy?
 Model Answer: Margaret Hamilton's statement reveals that the Apollo Guidance Computer was programmed to do more than recognize error conditions and incorporated a complete set of recovery programs to eliminate lower priority tasks and re-establish the more important ones.

Testing Prompt 11: At what time (UTC) did Eagle land on the Moon?


[codecarbon INFO @ 18:57:23] Energy consumed for RAM : 0.000333 kWh. RAM Power : 10.0 W
[codecarbon INFO @ 18:57:23] Delta energy consumed for CPU with constant : 0.000250 kWh, power : 60.0 W
[codecarbon INFO @ 18:57:23] Energy consumed for All CPU : 0.002001 kWh
[codecarbon INFO @ 18:57:23] 0.002335 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 18:57:23] 0.011083 g.CO2eq/s mean an estimation of 349.514354629282 kg.CO2eq/year


 Model Answer: Eagle landed at 20:17:40 UTC on Sunday July 20.
 Expected: 20:17:40 UTC on July 20

Testing Prompt 12: How much lunar material did the astronauts collect?
 Model Answer: The astronauts collected 21.55 kilograms (47.5 lb) of lunar surface material.
 Expected: 21.55 kilograms (47.5 lb)

Testing Prompt 13: What was Armstrong's famous first words when stepping on the Moon?


[codecarbon INFO @ 18:57:38] Energy consumed for RAM : 0.000375 kWh. RAM Power : 10.0 W
[codecarbon INFO @ 18:57:38] Delta energy consumed for CPU with constant : 0.000250 kWh, power : 60.0 W
[codecarbon INFO @ 18:57:38] Energy consumed for All CPU : 0.002251 kWh
[codecarbon INFO @ 18:57:38] 0.002626 kWh of electricity and 0.000000 L of water were used since the beginning.


 Model Answer: Armstrong's famous first words when stepping on the Moon were "Here men from the planet Earth first set foot upon the Moon July 1969, A. D. We came in peace for all mankind."
 Expected: That's one small step for [a] man, one giant leap for mankind

Testing Prompt 14: What scientific instruments were included in the EASEP package?
 Model Answer: No relevant information found.
 Expected: Passive Seismic Experiment Package and retroreflector array

Testing Prompt 15: How much usable fuel remained when Eagle landed, and how many seconds of powered flight did this represent?


[codecarbon INFO @ 18:57:53] Energy consumed for RAM : 0.000417 kWh. RAM Power : 10.0 W
[codecarbon INFO @ 18:57:53] Delta energy consumed for CPU with constant : 0.000250 kWh, power : 60.0 W
[codecarbon INFO @ 18:57:53] Energy consumed for All CPU : 0.002501 kWh
[codecarbon INFO @ 18:57:53] 0.002918 kWh of electricity and 0.000000 L of water were used since the beginning.
[codecarbon INFO @ 18:57:56] Energy consumed for RAM : 0.000425 kWh. RAM Power : 10.0 W
[codecarbon INFO @ 18:57:56] Delta energy consumed for CPU with constant : 0.000053 kWh, power : 60.0 W
[codecarbon INFO @ 18:57:56] Energy consumed for All CPU : 0.002554 kWh
[codecarbon INFO @ 18:57:56] 0.002980 kWh of electricity and 0.000000 L of water were used since the beginning.


 Model Answer: The LM had enough fuel for another 25 seconds of powered flight before an abort without touchdown would have become unsafe, but post-mission analysis showed that the real figure was probably closer to 50 seconds.
 Expected: 216 pounds (98 kg); about 25 seconds according to initial estimates, but post-mission analysis showed closer to 50 seconds
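
The run above only prints answers next to the expected ones. As a rough way to compare them automatically, the sketch below computes the fraction of expected tokens that also occur in the model answer. This metric is not part of the notebook; `token_recall` is a hypothetical helper, and plain token overlap is crude because it also rewards incidental matches on common words. The two sample pairs are copied from prompts 12 and 14 above.

```python
import re


def tokens(text):
    """Lowercase alphanumeric tokens, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def token_recall(expected, answer):
    """Fraction of distinct expected tokens that also appear in the answer."""
    expected_tokens = tokens(expected)
    if not expected_tokens:
        return 0.0
    return len(expected_tokens & tokens(answer)) / len(expected_tokens)


# Answer and expected text copied from prompts 12 and 14 in the output above.
pairs = [
    (
        "21.55 kilograms (47.5 lb)",
        "The astronauts collected 21.55 kilograms (47.5 lb) of lunar surface material.",
    ),
    (
        "Passive Seismic Experiment Package and retroreflector array",
        "No relevant information found.",
    ),
]

for expected, answer in pairs:
    print(f"{token_recall(expected, answer):.2f}")
```

The first pair scores 1.00 and the second 0.00, matching what a reader would conclude from the printed answers.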


# 12. Calculate total emissions

We calculate the total emissions of the pipeline by summing the emissions from model loading, embedding creation, and query inference.

In [14]:
total_emissions = emissions_loading + emissions_embeddings + emissions_inference

print(f"\nModel Loading:        {emissions_loading:.6f} kg CO2")
print(f"Embeddings Creation:  {emissions_embeddings:.6f} kg CO2")
print(f"{len(prompts)} queries: {emissions_inference:.6f} kg CO2")
print(f"TOTAL EMISSIONS:      {total_emissions:.6f} kg CO2")
print(f"Equivalent to:        {total_emissions * 1000:.2f} g CO2")
print(f"  - Per query average: {emissions_inference/len(prompts):.6f} kg CO2")


Model Loading:        0.000047 kg CO2
Embeddings Creation:  0.000055 kg CO2
15 queries: 0.001699 kg CO2
TOTAL EMISSIONS:      0.001801 kg CO2
Equivalent to:        1.80 g CO2
  - Per query average: 0.000113 kg CO2
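
The figures above can be cross-checked by hand. The sketch below recomputes the total and the per-query average from the printed values, and back-calculates the carbon intensity implied by the inference run (emissions divided by the last cumulative energy figure in the section 11 log). The inputs are rounded as printed, so the implied intensity is an estimate and not a value read from CodeCarbon. The logs also show constant power assumptions (60 W CPU, 10 W RAM), so the emissions themselves are estimates.

```python
# Values copied from the notebook output (rounded as printed).
emissions_loading = 0.000047     # kg CO2
emissions_embeddings = 0.000055  # kg CO2
emissions_inference = 0.001699   # kg CO2
energy_inference_kwh = 0.002980  # last cumulative figure in the inference log
num_queries = 15

total = emissions_loading + emissions_embeddings + emissions_inference
per_query_g = emissions_inference / num_queries * 1000
implied_intensity = emissions_inference / energy_inference_kwh

print(f"Total:              {total * 1000:.2f} g CO2")
print(f"Per query:          {per_query_g:.3f} g CO2")
print(f"Implied intensity:  {implied_intensity:.2f} kg CO2 per kWh")
```

Inference accounts for most of the total, so reducing the number or length of queries has more effect than optimizing the one-off loading and embedding steps.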
