# Lesson 3: Building an Agent Reasoning Loop

## Setup

In [1]:
from helper import get_openai_api_key
OPENAI_API_KEY = get_openai_api_key()

In [2]:
import nest_asyncio
nest_asyncio.apply()

## Load the data

To download the paper yourself, uncomment and run the following command:

#!wget "https://arxiv.org/pdf/2405.13063" -O "AURORA A FOUNDATION MODEL OF THE ATMOSPHERE.pdf"

**Note**: The PDF file is included with this lesson. To access it, go to the `File` menu and select `Open...`.

## Set Up the Query Tools

In [4]:
from utils import get_doc_tools

vector_tool, summary_tool = get_doc_tools("AURORA A FOUNDATION MODEL OF THE ATMOSPHERE.pdf", "AURORA")

## Set Up the Function-Calling Agent

In [5]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo", temperature=0)

In [6]:
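from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

# The worker decides which tool to call at each step;
# verbose=True prints every tool call and its output.
agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool],
    llm=llm,
    verbose=True
)
# The runner manages tasks and conversation memory around the worker.
agent = AgentRunner(agent_worker)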

In [8]:
response = agent.query(
    "What experimental results validate Aurora's performance in predicting air pollution?, "
    "How does Aurora adapt to the non-stationary and scarce data from CAMS analysis?"
)

Added user message to memory: What experimental results validate Aurora's performance in predicting air pollution?, How does Aurora adapt to the non-stationary and scarce data from CAMS analysis?
=== Calling Function ===
Calling function: vector_tool_AURORA with args: {"query": "experimental results validate Aurora's performance in predicting air pollution"}
=== Function Output ===
Experimental results validate Aurora's performance in predicting air pollution.
=== Calling Function ===
Calling function: vector_tool_AURORA with args: {"query": "How does Aurora adapt to the non-stationary and scarce data from CAMS analysis?"}
=== Function Output ===
Aurora adapts to the non-stationary and scarce data from CAMS analysis by fine-tuning on CAMS analysis data from a limited temporal extent and incorporating lower resolution CAMS reanalysis data in the fine-tuning process.
=== LLM Response ===
The experimental results validate Aurora's performance in predicting air pollution. Aurora adapts to 

In [9]:
print(response.source_nodes[0].get_content(metadata_mode="all"))

page_label: 3
file_name: AURORA A FOUNDATION MODEL OF THE ATMOSPHERE.pdf
file_path: AURORA A FOUNDATION MODEL OF THE ATMOSPHERE.pdf
file_type: application/pdf
file_size: 9864973
creation_date: 2024-06-08
last_modified_date: 2024-06-08

Aurora: A Foundation Model of the Atmosphere
We pretrain Aurora on a vast body of weather and climate data. By including as much and as diverse data as possible,
Aurora attempts to learn a general-purpose representation of atmospheric dynamics. The pretraining data is a mixture of
six weather and climate datasets: ERA5, CMCC, IFS-HR, HRES Forecasts, GFS Analysis and GFS Forecasts. These
datasets are a mixture of forecasts, analysis data, reanalysis data, and climate simulations. The pretraining datasets
comprise a standard collection of meteorological variables (see Supplementary C for more details on the datasets).
During pretraining, we minimise the next–time step mean absolute error (MAE) with a 6-hour lead time for 150 k
steps on 32 A100 GPUs, which 

In [10]:
response = agent.chat(
    "Tell me about the evaluation datasets used."
)

Added user message to memory: Tell me about the evaluation datasets used.
=== Calling Function ===
Calling function: summary_tool_AURORA with args: {"input": "evaluation datasets used in Aurora"}
=== Function Output ===
The evaluation datasets used in Aurora consist of a wide range of meteorological datasets such as ERA5, CMCC, IFS-HR, HRES Forecasts, GFS Analysis, GFS Forecasts, CAMS analysis data, CAMS reanalysis data, CAMS analysis data with specific heuristic factors, 3D Perceiver encoder, Multi-scale 3D Swin Transformer U-Net backbone, 3D Perceiver decoder, Position, scale, level, and time encodings, Data normalization, Extensions for 0.1° weather forecasting, Extensions for air pollution forecasting, HRRR, CONUS404, and various other datasets like HRES-0.25, IFS-ENS-0.25, HRES-T0, HRES Analysis, IFS ENS, IFS ENS mean, GFS Forecast, GFS-T0 Analysis, GEFS Reforecast, CMCC-CM2-VHR4, ECMWF-IFS-HR, MERRA-2, CAMS, and CAMSRA. These datasets cover a diverse collection of weather variabl

In [11]:
response = agent.chat("Tell me the results over one of the above datasets.")

Added user message to memory: Tell me the results over one of the above datasets.
=== Calling Function ===
Calling function: vector_tool_AURORA with args: {"query": "results over CAMS analysis data"}
=== Function Output ===
Aurora is competitive with CAMS on 95% of all targets, and matches or outperforms CAMS on 74% of all targets. At the three-day mark, Aurora is competitive with CAMS on all variables and matches or outperforms CAMS on 86% of all variables. However, Aurora is outperformed by CAMS on ozone in the very upper atmosphere and the twelve-hour prediction of all species in the lower part of the atmosphere. This is primarily because variables near the surface and in the lower part of the atmosphere are mainly affected by anthropogenic factors, which Aurora does not explicitly account for.
=== LLM Response ===
Aurora is competitive with CAMS on 95% of all targets and matches or outperforms CAMS on 74% of all targets. At the three-day mark, Aurora is competitive with CAMS on all
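
The verbose traces above follow a repeating pattern: the model requests a tool call, the tool output is added to memory, and the model then either requests another call or writes its final response. The following is a toy sketch of that pattern in plain Python, not LlamaIndex's implementation. All names in it (`ToolCall`, `ModelTurn`, `run_agent_loop`, `scripted_model`, `toy_tools`) are hypothetical, and the "model" is a hard-coded script rather than an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ToolCall:
    """A request from the model to run one named tool with one query string."""
    name: str
    query: str


@dataclass
class ModelTurn:
    """One model reply: either tool calls to execute or a final answer."""
    tool_calls: List[ToolCall] = field(default_factory=list)
    answer: Optional[str] = None


def run_agent_loop(
    model: Callable[[List[str]], ModelTurn],
    tools: Dict[str, Callable[[str], str]],
    user_message: str,
    max_steps: int = 5,
) -> str:
    """Alternate between model turns and tool execution until an answer appears."""
    memory = [f"user: {user_message}"]
    for _ in range(max_steps):
        turn = model(memory)
        if turn.answer is not None:
            # The model chose to answer instead of calling another tool.
            memory.append(f"assistant: {turn.answer}")
            return turn.answer
        for call in turn.tool_calls:
            # Execute each requested tool and record its output in memory.
            output = tools[call.name](call.query)
            memory.append(f"tool[{call.name}]: {output}")
    raise RuntimeError("no final answer within max_steps")


# Hypothetical stand-in for the LLM: call one tool first, then answer.
def scripted_model(memory: List[str]) -> ModelTurn:
    tool_outputs = [m for m in memory if m.startswith("tool[")]
    if not tool_outputs:
        return ModelTurn(tool_calls=[ToolCall("vector_tool", "air pollution results")])
    return ModelTurn(answer=f"Based on {len(tool_outputs)} tool output(s): done")


# Hypothetical stand-ins for the vector and summary tools.
toy_tools = {
    "vector_tool": lambda q: f"passages matching '{q}'",
    "summary_tool": lambda q: f"summary for '{q}'",
}

final_answer = run_agent_loop(
    scripted_model, toy_tools, "How well does the model predict air pollution?"
)
print(final_answer)
```

The `max_steps` cap is a design choice of this sketch: it keeps a model that never produces an answer from looping forever.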

## Lower-Level: Debuggability and Control

In [12]:
# Build a fresh worker and runner for the step-by-step walkthrough.
agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool],
    llm=llm,
    verbose=True
)
agent = AgentRunner(agent_worker)

In [13]:
task = agent.create_task(
    "What experimental results validate Aurora's performance in predicting air pollution?, "
    "How does Aurora adapt to the non-stationary and scarce data from CAMS analysis?"
)

In [14]:
step_output = agent.run_step(task.task_id)

Added user message to memory: What experimental results validate Aurora's performance in predicting air pollution?, How does Aurora adapt to the non-stationary and scarce data from CAMS analysis?
=== Calling Function ===
Calling function: vector_tool_AURORA with args: {"query": "experimental results validate Aurora's performance in predicting air pollution"}
=== Function Output ===
Experimental results validate Aurora's performance in predicting air pollution.
=== Calling Function ===
Calling function: vector_tool_AURORA with args: {"query": "How does Aurora adapt to the non-stationary and scarce data from CAMS analysis?"}
=== Function Output ===
Aurora adapts to the non-stationary and scarce data from CAMS analysis by fine-tuning on CAMS analysis data from Oct 2017 to May 2022 and testing on CAMS analysis data from May 2022 to Nov 2022. Additionally, to better learn the dynamics of the pollutants, Aurora also incorporates lower resolution and CAMS reanalysis data from Jan 2003 to Dec 

In [15]:
completed_steps = agent.get_completed_steps(task.task_id)
print(f"Num completed for task {task.task_id}: {len(completed_steps)}")
print(completed_steps[0].output.sources[0].raw_output)

Num completed for task 736f8b3d-47a3-4b93-a852-44ee4707a594: 1
Experimental results validate Aurora's performance in predicting air pollution.


In [16]:
upcoming_steps = agent.get_upcoming_steps(task.task_id)
print(f"Num upcoming steps for task {task.task_id}: {len(upcoming_steps)}")
upcoming_steps[0]

Num upcoming steps for task 736f8b3d-47a3-4b93-a852-44ee4707a594: 1


TaskStep(task_id='736f8b3d-47a3-4b93-a852-44ee4707a594', step_id='05d173b1-b1be-4bcf-90ec-3ad5d4f4b4ff', input=None, step_state={}, next_steps={}, prev_steps={}, is_ready=True)

In [17]:
step_output = agent.run_step(
    task.task_id, input="Explain the key tenets of foundation models mentioned in the introduction."
)

Added user message to memory: Explain the key tenets of foundation models mentioned in the introduction.
=== Calling Function ===
Calling function: summary_tool_AURORA with args: {"input": "key tenets of foundation models mentioned in the introduction"}
=== Function Output ===
The key tenets of foundation models mentioned in the introduction include pretraining on diverse datasets to capture intricate patterns, fine-tuning for excelling at new tasks with limited data, operating at different resolutions, showcasing superior forecasting skills, leveraging scaling properties of AI for improved accuracy, resolution, and adaptability, serving as few-shot learners, providing accurate medium-range global weather forecasting, utilizing a 3D Perceiver encoder and Multi-scale 3D Swin Transformer U-Net backbone for simulating physics, incorporating positional, scale, level, and time encodings for essential information, normalizing data for consistency, handling air pollution variables effectively

In [18]:
step_output = agent.run_step(task.task_id)
print(step_output.is_last)

=== LLM Response ===
The key tenets of foundation models mentioned in the introduction include pretraining on diverse datasets, fine-tuning for new tasks with limited data, operating at different resolutions, showcasing superior forecasting skills, leveraging scaling properties of AI, serving as few-shot learners, providing accurate global weather forecasting, utilizing advanced encoders and backbones, incorporating essential encodings, normalizing data, handling air pollution variables effectively, utilizing various datasets, fine-tuning with varying learning rates, evaluating performance through metrics like RMSE and ACC, and assessing forecast quality through power spectra analysis.
True


In [19]:
response = agent.finalize_response(task.task_id)

In [20]:
print(str(response))

assistant: The key tenets of foundation models mentioned in the introduction include pretraining on diverse datasets, fine-tuning for new tasks with limited data, operating at different resolutions, showcasing superior forecasting skills, leveraging scaling properties of AI, serving as few-shot learners, providing accurate global weather forecasting, utilizing advanced encoders and backbones, incorporating essential encodings, normalizing data, handling air pollution variables effectively, utilizing various datasets, fine-tuning with varying learning rates, evaluating performance through metrics like RMSE and ACC, and assessing forecast quality through power spectra analysis.
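
The cells above step through a task by hand: `create_task`, repeated `run_step` calls until `is_last` is `True`, then `finalize_response`. Those calls can be wrapped in a small driver loop, sketched below under some assumptions. `StubAgent`, `StubTask`, `StubStepOutput`, and `run_to_completion` are hypothetical names introduced here. The stub only mimics the method names and the `task_id` and `is_last` attributes used in this lesson, so the example runs without an API key. It is not the LlamaIndex implementation.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class StubTask:
    task_id: str
    input: str


@dataclass
class StubStepOutput:
    output: str
    is_last: bool


class StubAgent:
    """Hypothetical stand-in exposing the same method names used in this lesson."""

    def __init__(self, steps_per_task: int = 2) -> None:
        self._steps_per_task = steps_per_task
        self._completed: Dict[str, List[StubStepOutput]] = {}

    def create_task(self, input: str) -> StubTask:
        task = StubTask(task_id=str(uuid.uuid4()), input=input)
        self._completed[task.task_id] = []
        return task

    def run_step(self, task_id: str) -> StubStepOutput:
        done = self._completed[task_id]
        n = len(done) + 1
        # Mark the step as last once the configured number of steps is reached.
        step = StubStepOutput(output=f"step {n}", is_last=(n >= self._steps_per_task))
        done.append(step)
        return step

    def get_completed_steps(self, task_id: str) -> List[StubStepOutput]:
        return list(self._completed[task_id])

    def finalize_response(self, task_id: str) -> str:
        return self._completed[task_id][-1].output


def run_to_completion(agent, query: str, max_steps: int = 10) -> Tuple[str, int]:
    """Run steps until the agent reports the last one, then finalize the response."""
    task = agent.create_task(query)
    for _ in range(max_steps):
        step_output = agent.run_step(task.task_id)
        if step_output.is_last:
            num_steps = len(agent.get_completed_steps(task.task_id))
            return agent.finalize_response(task.task_id), num_steps
    raise RuntimeError("task did not finish within max_steps")


response, num_steps = run_to_completion(
    StubAgent(), "Tell me about the evaluation datasets used."
)
print(response, num_steps)
```

Because `run_to_completion` only uses calls that appear in the cells above, passing the lesson's real `agent` in place of `StubAgent()` should behave similarly, though that has not been verified here. Stepping manually is still the only way to inject new input between steps, as cell 17 does.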
