# Lesson 1: Router Engine

Welcome to Lesson 1.

To access the `requirements.txt` file, the data/pdf file required for this lesson, and the `helper` and `utils` modules, go to the `File` menu and select `Open...`.

I hope you enjoy this course!

## Setup

In [1]:
from helper import get_openai_api_key

OPENAI_API_KEY = get_openai_api_key()

In [2]:
import nest_asyncio

nest_asyncio.apply()

## Load Data

To download the paper yourself, uncomment and run the command below. The output filename contains spaces, so it must be quoted:

#!wget "https://arxiv.org/pdf/2405.13063" -O "AURORA A FOUNDATION MODEL OF THE ATMOSPHERE.pdf"

**Note**: The pdf file is included with this lesson. To access it, go to the `File` menu and select `Open...`.

In [3]:
from llama_index.core import SimpleDirectoryReader

# load documents
documents = SimpleDirectoryReader(input_files=["AURORA A FOUNDATION MODEL OF THE ATMOSPHERE.pdf"]).load_data()

## Define LLM and Embedding model

In [4]:
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)
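
To make the chunking step concrete, here is a minimal, standard-library-only sketch of the idea behind sentence-aware splitting. It is not LlamaIndex's `SentenceSplitter` implementation: the function name `split_into_chunks`, the regex-based sentence detection, and counting words instead of tokens are all simplifying assumptions made for illustration.

```python
import re


def split_into_chunks(text: str, max_words: int = 20) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    Hypothetical helper for illustration only. A single sentence longer
    than max_words is kept whole and becomes its own oversized chunk.
    """
    # Naive sentence detection: split on whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current chunk if adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks("A b c. D e f. G h i.", max_words=6):
        print(repr(chunk))
```

The point to take away is that chunk boundaries fall between sentences rather than in the middle of one, which tends to keep each node readable on its own.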

In [5]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

## Define Summary Index and Vector Index over the Same Data

In [6]:
from llama_index.core import SummaryIndex, VectorStoreIndex

summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)
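
The two indexes differ mainly in what they hand to the LLM at query time: a summary index passes along every node, while a vector index passes along only the nodes most similar to the query. The sketch below illustrates that difference with the standard library only. It is not how LlamaIndex implements either index: `embed`, `cosine`, `retrieve_all`, and `retrieve_top_k` are hypothetical names, and bag-of-words counts stand in for a real embedding model. The default `k=2` is an assumption that mirrors what I understand to be LlamaIndex's default top-k, and it is consistent with the two source nodes reported later in this lesson.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (an assumed stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_all(chunks: list[str], query: str) -> list[str]:
    """Summary-style retrieval: every chunk is returned, regardless of the query."""
    return list(chunks)


def retrieve_top_k(chunks: list[str], query: str, k: int = 2) -> list[str]:
    """Vector-style retrieval: only the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    chunks = [
        "aurora forecasts weather",
        "training data is limited",
        "power spectra of the atmosphere",
    ]
    print(len(retrieve_all(chunks, "limited training data")))
    print(retrieve_top_k(chunks, "limited training data", k=1))
```

This is why summarization questions suit the summary index (it sees the whole document) and targeted questions suit the vector index (it sees a small, relevant slice).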

## Define Query Engines and Set Metadata

In [7]:
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
vector_query_engine = vector_index.as_query_engine()

In [8]:
from llama_index.core.tools import QueryEngineTool


summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to AURORA"
    ),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context from the AURORA A FOUNDATION MODEL OF THE ATMOSPHERE paper."
    ),
)

## Define Router Query Engine

In [9]:
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector


query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
    verbose=True
)
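
Conceptually, a single-selector router does two things: it picks exactly one tool based on the tool descriptions, then forwards the query to that tool. The following sketch shows that control flow using only the standard library. It is an illustration under stated assumptions, not LlamaIndex's `RouterQueryEngine`: the `Tool` class, `select_tool`, and `route` are hypothetical, and simple keyword overlap stands in for the LLM's judgment that `LLMSingleSelector` relies on.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    """Hypothetical tool: a description for the selector plus a callable to run."""
    description: str
    run: Callable[[str], str]


def select_tool(query: str, tools: list[Tool]) -> int:
    """Return the index of the tool whose description shares the most words with the query.

    Keyword overlap is an assumed stand-in for an LLM choosing among descriptions.
    Ties go to the first tool in the list.
    """
    query_words = set(query.lower().split())
    overlaps = [
        len(query_words & set(tool.description.lower().split())) for tool in tools
    ]
    return overlaps.index(max(overlaps))


def route(query: str, tools: list[Tool]) -> tuple[int, str]:
    """Select one tool, run the query through it, and report which tool was used."""
    index = select_tool(query, tools)
    return index, tools[index].run(query)


if __name__ == "__main__":
    tools = [
        Tool("summarization questions about the paper", lambda q: "summary"),
        Tool("retrieving specific facts from the paper", lambda q: "facts"),
    ]
    print(route("give me specific facts", tools))
```

Because the selector sees only the descriptions, their wording largely determines which engine handles a query, so it is worth making them distinct and precise.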

In [10]:
response = query_engine.query("What is the summary of the document?")
print(str(response))

Selecting query engine 1: The document is likely to contain specific context from the AURORA A FOUNDATION MODEL OF THE ATMOSPHERE paper..
The document discusses the comparison of a model against operational weather forecasting systems at different resolutions, validation against weather station measurements, forecasting extreme events, specifically Storm Ciarán, and the fast prediction of atmospheric chemistry and air pollution. Additionally, it includes information on power spectra related to the atmosphere.


In [11]:
print(len(response.source_nodes))

2


In [13]:
response = query_engine.query(
    "How does Aurora handle the challenge of limited training data for specific atmospheric prediction tasks?"
)
print(str(response))

Selecting query engine 0: The choice related to AURORA is more likely to address how the model handles limited training data for specific atmospheric prediction tasks..
Aurora handles the challenge of limited training data for specific atmospheric prediction tasks by leveraging extensive pretraining on diverse weather and climate data to learn a general-purpose representation of atmospheric dynamics. It further fine-tunes its model in multiple stages, including roll-out fine-tuning, to adapt to new atmospheric prediction tasks. By scaling the model and pretraining it on a vast amount of atmospheric data, Aurora demonstrates improved forecasting compared to training on a single dataset. Additionally, Aurora incorporates historical data affected by natural and anthropogenic factors, implicitly learning to account for these effects without explicitly using emissions data as inputs. These strategies enable Aurora to overcome the limitations posed by limited training data

## Let's put everything together

In [14]:
from utils import get_router_query_engine

query_engine = get_router_query_engine("AURORA A FOUNDATION MODEL OF THE ATMOSPHERE.pdf")

In [15]:
response = query_engine.query("How does Aurora's ability to generate forecasts at different lead times contribute to its operational capabilities?")
print(str(response))

Selecting query engine 1: Useful for retrieving specific context from the MetaGPT paper..
Aurora's ability to generate forecasts at different lead times contributes to its operational capabilities by showing improved performance particularly at lead times past 3 days. This enhanced performance is most notable for upper atmospheric levels, with a reduction in RMSE up to 40%. The model excels in predicting variables like temperature, geopotential height, and wind velocity components, especially at longer lead times. Additionally, Aurora's forecasts closely resemble the ensemble mean at long lead times, showcasing its reliability for operational forecasting.
