# Chapter 08: Integrating LLMs with Forecasting Models

Integrating LLMs with forecasting models can improve decision-making in many domains. LLMs are good at interpreting complex data and turning it into actionable insights, so pairing them with a forecasting model makes it possible to interpret forecasts and recommend actions automatically. Common applications include demand forecasting and machine degradation prediction; this chapter focuses on earthquake forecasting. We walk through the practical steps of the integration and show how the combination turns the concept into a working implementation.

In [1]:
from assets.tools.earthquake import count_earthquakes, query_earthquakes, USGeopoliticalSurveyEarthquakeAPI
from assets.tools.forecasting import forecast_earthquakes, get_regions
from pydantic import BaseModel, Field, EmailStr
from language_models.models.llm import OpenAILanguageModel
from language_models.agent import (
    Agent,
    Workflow,
    WorkflowLLMStep,
    WorkflowFunctionStep,
    OutputType,
    PromptingStrategy,
)
from language_models.tools import Tool, current_date
from language_models.proxy_client import ProxyClient
from language_models.settings import settings

In [2]:
proxy_client = ProxyClient(
    client_id=settings.CLIENT_ID,
    client_secret=settings.CLIENT_SECRET,
    auth_url=settings.AUTH_URL,
    api_base=settings.API_BASE,
)

## Use Case: Automating Earthquake Forecast Inquiries

Inquiries about earthquake forecasts are currently handled manually: each response is drafted and sent individually. Integrating an LLM with the forecasting tools lets us automate this process. We will design a workflow that starts when an email is received and ends with a generated email response, so that replies are accurate and timely.

**Step 1: Extracting relevant information**

The email body is unstructured, so we first use an LLM to extract the key details: the regions and the forecasting horizon. The LLM is also asked to return each region name in the spelling used by the dataset, which it looks up with the Get Valid Regions tool. This step is crucial because it turns free-form text into structured data.

In [3]:
get_regions_tool = Tool(
    function=get_regions,
    name="Get Valid Regions",
    description="Use this tool to access the valid regions that can be used for forecasting",
)

In [4]:
system_prompt = """You are tasked with responding to an inquiry about earthquake forecasts.

Use the tool called Get Valid Regions to validate and standardize the spelling of the region names.

Your response should include:
- horizon: Provide the forecasting horizon as specified in the inquiry.
- regions: List the names of the regions, ensuring that the spelling of each region follows the standardized format provided by the Get Valid Regions tool.

The email may contain misspellings or abbreviations."""

llm = OpenAILanguageModel(
    proxy_client=proxy_client,
    model='gpt-4',
    max_tokens=250,
    temperature=0.2,
)

class Forecast(BaseModel):
    horizon: int = Field(3, description="The number of days to forecast for")
    regions: list[str] = Field(description="The regions to forecast for")

extract_regions = Agent.create(
    llm=llm,
    system_prompt=system_prompt,
    prompt="{email.body}",
    prompt_variables=["email"],
    tools=[get_regions_tool],
    output_type=OutputType.OBJECT,
    output_schema=Forecast,
    prompting_strategy=PromptingStrategy.CHAIN_OF_THOUGHT,
    verbose=True,
)

extract_regions_step = WorkflowLLMStep(name="extract_regions", agent=extract_regions)
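
Because the LLM's output is not guaranteed to match the dataset's spelling, it can be useful to check the extracted names in plain Python as well. The following is a minimal sketch of such a check using only the standard library; `VALID_REGIONS`, `ALIASES`, and `normalize_region` are hypothetical names introduced for illustration and are not part of the workflow above.

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical subset of what get_regions() returns.
VALID_REGIONS = {"California", "Japan", "Greece", "Alaska", "Idaho"}

# Hypothetical alias table for abbreviations that fuzzy matching cannot resolve.
ALIASES = {"ca": "California", "jp": "Japan", "gr": "Greece"}


def normalize_region(name: str, valid: set = VALID_REGIONS) -> Optional[str]:
    """Return the canonical spelling of a region, or None if nothing matches."""
    key = name.strip().lower()
    if key in ALIASES:
        return ALIASES[key]
    by_lower = {region.lower(): region for region in valid}
    if key in by_lower:
        return by_lower[key]
    # Fall back to fuzzy matching for misspellings such as "Californa".
    matches = get_close_matches(key, list(by_lower), n=1, cutoff=0.8)
    return by_lower[matches[0]] if matches else None


extracted = ["JP", "Californa", "greece", "Atlantis"]
print({name: normalize_region(name) for name in extracted})
```

Names that map to `None` could be reported back to the inquirer instead of being passed to the forecasting model.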

**Step 2: Forecasting earthquakes**

With standardized region names, the workflow runs the forecasting model to retrieve predictions for earthquake magnitudes and depths based on specified regions and forecasting horizons.

In [5]:
class ExtractedRegions(BaseModel):
    extract_regions: Forecast

class RegionForecast(BaseModel):
    region: str
    forecast: list[dict]

def forecast_earthquakes_for_regions(extract_regions: Forecast) -> list[dict]:
    forecasts = []
    for region in extract_regions.regions:
        forecast = forecast_earthquakes(region, extract_regions.horizon)
        forecasts.append(RegionForecast(region=region, forecast=forecast))
    return [forecast.model_dump() for forecast in forecasts]

forecast_earthquakes_step = WorkflowFunctionStep(name="forecast_earthquakes", inputs=ExtractedRegions, function=forecast_earthquakes_for_regions)
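
To try this step without the real forecasting model, the per-region loop can be exercised against a stub. The sketch below assumes that a forecast is a list of per-day records with `magnitude` and `depth` fields; that record shape, `stub_forecast`, and `forecast_for_regions` are assumptions made for illustration.

```python
from typing import Callable


def stub_forecast(region: str, horizon: int) -> list:
    """Hypothetical stand-in for forecast_earthquakes: one record per day."""
    return [
        {"day": day, "magnitude": 1.0, "depth": 10.0}
        for day in range(1, horizon + 1)
    ]


def forecast_for_regions(
    regions: list,
    horizon: int,
    model: Callable[[str, int], list] = stub_forecast,
) -> list:
    """Mirror the step above: one {"region", "forecast"} record per region."""
    return [
        {"region": region, "forecast": model(region, horizon)}
        for region in regions
    ]


result = forecast_for_regions(["California", "Japan"], horizon=3)
print(result[0]["region"], len(result[0]["forecast"]))
```

Passing the model in as a parameter keeps the loop testable and makes it easy to swap in the real function later.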

**Step 3: Writing the email**

Once the forecast data is obtained, the LLM integrates this information into a comprehensive response. The response includes the predicted magnitudes and depths, reasoning behind the forecasts, and any additional context that might be useful to the inquirer. This draft response is reviewed and then sent back to the inquirer via the Email Management System.

In [6]:
system_prompt = """Create an email regarding the earthquake forecasts based on the specified regions.

The email should include:
- details of the earthquake forecast
- whether the region is in danger
- actions that you recommend to take

Use the closing signature:
Best regards,
Earthquake Forecasting Team"""

prompt = """Respond to this email: {email.sender}

Forecast horizon: {extract_regions.horizon}

Forecasts:
{forecast_earthquakes}"""

llm = OpenAILanguageModel(
    proxy_client=proxy_client,
    model='gpt-4',
    max_tokens=1000,
    temperature=0.2,
)

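# Schema for the generated reply. It is named EmailResponse so that it does
# not clash with the input Email model defined in Step 4.
class EmailResponse(BaseModel):
    to: EmailStr
    subject: str
    body: str


create_email = Agent.create(
    llm=llm,
    system_prompt=system_prompt,
    prompt=prompt,
    prompt_variables=["email", "extract_regions", "forecast_earthquakes"],
    tools=[get_regions_tool],
    output_type=OutputType.OBJECT,
    output_schema=EmailResponse,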
    prompting_strategy=PromptingStrategy.SINGLE_COMPLETION,
    verbose=True,
)

create_email_step = WorkflowLLMStep(name="create_email", agent=create_email)
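
The prompt above passes the raw forecasts to the LLM. An alternative is to condense each forecast into a range first, which keeps the prompt short and makes the numbers easier to verify. This is a rough sketch under the assumption that each forecast record has `magnitude` and `depth` fields; `summarize_forecast` is a hypothetical helper and the sample rows are illustrative values only.

```python
def summarize_forecast(region: str, forecast: list) -> str:
    """Condense per-day records into one line with min/max ranges."""
    magnitudes = [row["magnitude"] for row in forecast]
    depths = [row["depth"] for row in forecast]
    return (
        f"{region}: magnitude {min(magnitudes):.2f} to {max(magnitudes):.2f}, "
        f"depth {min(depths):.2f} to {max(depths):.2f}"
    )


# Illustrative rows; real values would come from the forecasting step.
rows = [
    {"magnitude": 1.29, "depth": 7.05},
    {"magnitude": 1.32, "depth": 7.33},
]
print(summarize_forecast("California", rows))
```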

**Step 4: Creating the workflow**

Here, we define the workflow input, which carries the details of the incoming email. We then create the workflow, which handles earthquake forecast inquiries by running the three steps in order: extracting the regions, generating the forecasts, and writing the email response.

In [7]:
class Email(BaseModel):
    sender: EmailStr = Field(description="Email address of the person who sent the inquiry")
    subject: str = Field(description="Subject of the email")
    body: str = Field(description="Body of the email")

class WorkflowInput(BaseModel):
    email: Email

workflow = Workflow(
    name="Automate Earthquake Forecast Inquiries",
    description="Allows you to respond to inquiries/questions about earthquake forecasts",
    inputs=WorkflowInput,
    output="create_email",
    steps=[extract_regions_step, forecast_earthquakes_step, create_email_step],
    verbose=True,
)
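
Conceptually, a sequential workflow like this stores each step's result under the step's name so that later steps can read it, which is why the prompt in Step 3 can refer to `{extract_regions.horizon}` and `{forecast_earthquakes}`. The sketch below illustrates that idea in plain Python. It is not how the `Workflow` class is implemented; `run_steps` and the three stub functions are hypothetical.

```python
from typing import Any


def run_steps(inputs: dict, steps: list, output: str) -> Any:
    """Run (name, function) pairs in order, sharing one context dict."""
    context = dict(inputs)
    for name, function in steps:
        # Each result is stored under the step name for later steps to use.
        context[name] = function(context)
    return context[output]


def extract_regions(context: dict) -> dict:
    return {"horizon": 5, "regions": ["Japan", "Greece"]}


def forecast_earthquakes(context: dict) -> list:
    extracted = context["extract_regions"]
    return [
        {"region": region, "days": extracted["horizon"]}
        for region in extracted["regions"]
    ]


def create_email(context: dict) -> dict:
    regions = ", ".join(item["region"] for item in context["forecast_earthquakes"])
    return {"to": context["email"]["sender"], "subject": f"Forecast for {regions}"}


reply = run_steps(
    {"email": {"sender": "a@example.org"}},
    [
        ("extract_regions", extract_regions),
        ("forecast_earthquakes", forecast_earthquakes),
        ("create_email", create_email),
    ],
    output="create_email",
)
print(reply)
```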

**Step 5: Running the workflow**

Finally, we create an email with specified sender, subject, and body content. We then invoke the workflow using this email as input to process the inquiry and generate the appropriate output.

In [8]:
email = Email(
    sender="jennifer.smith@researchinstitute.org",
    subject="Request for 5-Day Earthquake Forecast for Selected Regions",
    body="""Dear Earthquake Forecasting Team,

I hope this email finds you well.

I am reaching out to request a 5-day earthquake forecast for the following regions:
- CA, USA
- JP
- GR

Could you please provide the predicted magnitudes and depths for these regions over the next five days? Additionally, any relevant context or insights related to these forecasts would be very helpful.

Thank you very much for your assistance.

Best regards,
Jennifer Smith
Research Scientist
Research Institute
jennifer.smith@researchinstitute.org""",
)

output = workflow.invoke({"email": email})

Use LLM: extract_regions
Inputs: {'email': Email(sender='jennifer.smith@researchinstitute.org', subject='Request for 5-Day Earthquake Forecast for Selected Regions', body='Dear Earthquake Forecasting Team,\n\nI hope this email finds you well.\n\nI am reaching out to request a 5-day earthquake forecast for the following regions:\n- CA, USA\n- JP\n- GR\n\nCould you please provide the predicted magnitudes and depths for these regions over the next five days? Additionally, any relevant context or insights related to these forecasts would be very helpful.\n\nThank you very much for your assistance.\n\nBest regards,\nJennifer Smith\nResearch Scientist\nResearch Institute\njennifer.smith@researchinstitute.org')}
Prompt: Dear Earthquake Forecasting Team,

I hope this email finds you well.

I am reaching out to request a 5-day earthquake forecast for the following regions:
- CA, USA
- JP
- GR


In [9]:
print(f"Email to: {output.output.to}")
print(f"Email subject: {output.output.subject}")
print(f"Email body: {output.output.body}")

Email to: jennifer.smith@researchinstitute.org
Email subject: Earthquake Forecasts for the Next 5 Days
Email body: Dear Jennifer,

We are writing to inform you about the earthquake forecasts for the next five days in California, Japan, and Greece.

In California, the magnitude forecast ranges from 1.29 to 1.32, with the depth forecast ranging from 7.05 to 7.33. These are relatively low magnitudes and depths, so the region is not in immediate danger. However, we recommend monitoring the situation closely.

In Japan, the magnitude forecast ranges from 4.52 to 4.75, with the depth forecast ranging from 62.89 to 85.62. These are significantly higher magnitudes and depths, indicating a potential risk. We recommend implementing earthquake preparedness measures and alerting the public to the potential for seismic activity.

In Greece, the magnitude forecast ranges from 4.43 to 4.85, with the depth forecast ranging from 25.02 to 32.55. These forecasts also suggest a potential risk. We recommen

You can view the entire workflow in detail by rendering each step sequentially, allowing you to examine the inputs, outputs, and decision points at every stage of the process.

In [10]:
for i, step in enumerate(output.steps):
    if step.name == "llm":
        names = {
            "system_prompt": "System Prompt",
            "prompting_strategy": "Prompting Strategy",
            "prompt": "Prompt",
            "raw_output": "Raw Output",
            "observation": "Observation",
            "final_answer": "Final Answer",
            "tool_use": "Tool Use",
            "tool_output": "Tool Output",
        }
        string = "Use LLM"
        print(string)
        print("=" * len(string))
        for entry in step.steps:
            print(names[entry.name])
            # Underline with the length of the printed label, not the internal key
            print("-" * len(names[entry.name]))
            if entry.name == "tool_use":
                print(f"Thought: {entry.content.thought}")
                print()
                print(f"Used: {entry.content.used}")
                print()
                print(f"Arguments: {entry.content.arguments}")
            elif entry.name == 'final_answer':
                if entry.content.thought is not None:
                    print(f"Thought: {entry.content.thought}")
                    print()
                print(f"Output: {entry.content.output}")
            else:
                print(entry.content)

            if entry.name != "final_answer" or i != len(output.steps) - 1:
                print()
    else:
        string = "Use Function"
        print(string)
        print("=" * len(string))
        for entry in step.steps:
            if entry.name == "inputs":
                string = "Inputs"
                print(string)
                print("-" * len(string))
                for argument, value in entry.content.items():
                    print(f"{argument}: {value}")
                    print()
            else:
                string = "Output"
                print(string)
                print("-" * len(string))
                print(entry.content)
                if i != len(output.steps) - 1:
                    print()

Use LLM
=======
System Prompt
-------------
You are tasked with responding to an inquiry about earthquake forecasts.

Use the tool called Get Valid Regions to validate and standardize the spelling of the region names.

Your response should include:
- horizon: Provide the forecasting horizon as specified in the inquiry.
- regions: List the names of the regions, ensuring that the spelling of each region follows the standardized format provided by the Get Valid Regions tool.

The email may contain misspellings or abbreviations.

Prompting Strategy
------------------
chain_of_thought

Prompt
------
Dear Earthquake Forecasting Team,

I hope this email finds you well.

I am reaching out to request a 5-day earthquake forecast for the following regions:
- CA, USA
- JP
- GR

Could you please provide the predicted magnitudes and depths for these regions over the next five days? Additionally, any relevant context or insights related to these forecasts would be very helpful.

Thank you very much for your 

From here, there are two ways to take the workflow further. If the goal is to streamline operations, we can deploy the current workflow and trigger it automatically whenever a new email arrives, so that every inquiry gets a timely response. If we want a more interactive solution, we can instead expose the workflow as a tool for an AI agent and give the agent additional tools, so that it can handle a range of tasks conversationally.
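
Triggering the workflow on new emails could look roughly like the sketch below, which drains a queue of incoming messages and answers each one. How emails actually reach the queue depends on the mail system and is left out. `drain_inbox` and `fake_respond` are hypothetical; in practice the `respond` callback would wrap a call such as `workflow.invoke({"email": email})`.

```python
from queue import Empty, Queue
from typing import Callable


def drain_inbox(inbox: Queue, respond: Callable) -> list:
    """Answer every queued email once and return the generated replies."""
    replies = []
    while True:
        try:
            email = inbox.get_nowait()
        except Empty:
            return replies
        replies.append(respond(email))


# Hypothetical stand-in for: lambda email: workflow.invoke({"email": email})
def fake_respond(email: dict) -> str:
    return f"Reply to {email['sender']}"


inbox = Queue()
inbox.put({"sender": "a@example.org", "body": "5-day forecast for Japan, please"})
inbox.put({"sender": "b@example.org", "body": "Forecast for Greece?"})
replies = drain_inbox(inbox, fake_respond)
print(replies)
```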

In [11]:
query_earthquakes_tool = Tool(
    function=query_earthquakes,
    name="Query Earthquakes",
    description="Use this tool to search recent earthquakes",
    args_schema=USGeopoliticalSurveyEarthquakeAPI,
)

count_earthquakes_tool = Tool(
    function=count_earthquakes,
    name="Count Earthquakes",
    description="Use this tool to count and aggregate recent earthquakes",
    args_schema=USGeopoliticalSurveyEarthquakeAPI,
)

If we also want to ask questions about forecasts, we can wrap the forecasting function in a tool that the LLM can call to generate and retrieve predictions.

In [12]:
forecast_earthquakes_tool = Tool(
    function=forecast_earthquakes_for_regions,
    name="Forecast Earthquakes",
    description="Use this tool to run region based forecasts",
    args_schema=ExtractedRegions,
)

In [13]:
system_prompt = """You are a United States Geological Survey expert who can answer questions regarding earthquakes.

Additionally, you can run forecasts. Before you use the Forecast Earthquakes tool, always check which regions are available using Get Valid Regions first."""

agent = Agent.create(
    llm=llm,
    system_prompt=system_prompt,
    prompt="{question}",
    prompt_variables=["question"],
    output_type=OutputType.STRING,
    tools=[
        current_date,
        count_earthquakes_tool,
        query_earthquakes_tool,
        get_regions_tool,
        forecast_earthquakes_tool,
        workflow.as_tool()
    ],
    prompting_strategy=PromptingStrategy.CHAIN_OF_THOUGHT,
    verbose=True,
)
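
When the agent runs, each reasoning turn names a tool and supplies its arguments, and the framework calls the matching function and feeds the result back as an observation. The sketch below shows that dispatch idea in plain Python, assuming the arguments arrive as a JSON object. `TOOLS`, `call_tool`, and the two stub functions are hypothetical and do not reflect the internals of `Agent`.

```python
import json
from typing import Any


def get_valid_regions() -> list:
    """Stub returning a fixed list of regions."""
    return ["Alaska", "Idaho"]


def forecast(regions: list, horizon: int = 3) -> list:
    """Stub returning one placeholder record per region."""
    return [{"region": region, "horizon": horizon} for region in regions]


# Registry mapping the tool names the model sees to Python callables.
TOOLS = {"Get Valid Regions": get_valid_regions, "Forecast Earthquakes": forecast}


def call_tool(name: str, tool_input: str) -> Any:
    """Look up a tool by name and call it with JSON-decoded keyword arguments."""
    if name not in TOOLS:
        # Returning the error as an observation gives the model a chance to recover.
        return f"Unknown tool: {name}"
    arguments = json.loads(tool_input or "{}")
    return TOOLS[name](**arguments)


regions = call_tool("Get Valid Regions", "{}")
result = call_tool("Forecast Earthquakes", json.dumps({"regions": regions, "horizon": 1}))
print(result)
```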

In [14]:
prompt = """I received this email:

Sender: exampleuser@example.com
Subject: Request for 5-Day Earthquake Forecast for Selected Regions

Dear Earthquake Forecasting Team,

I hope this message finds you well.

I am writing to request a 5-day earthquake forecast for the following regions:

1. **California (CA)**
2. **Japan (JP)**
3. **Greece (GR)**

Could you please provide the predicted magnitudes and depths for these regions over the next five days? Any additional context or insights related to these forecasts would be greatly appreciated.

Thank you for your assistance.

Best regards,
[Your Name]
exampleuser@example.com

Can you reply to this email."""

output = agent.invoke({"question": prompt})

Thought: The user is requesting a 5-day earthquake forecast for California, Japan, and Greece. To fulfill this request, I need to use the Forecast Earthquakes tool. However, before I can do that, I need to confirm that these regions are valid for forecasting using the Get Valid Regions tool.
Tool: Get Valid Regions
Tool Input: {}
Tool Output: {'Oregon', 'California', 'Puerto Rico', 'Japan', 'Aleutian Islands', 'Hawaii', 'Chile', 'Tonga', 'Indonesia', 'Idaho', 'Greece', 'Wyoming', 'Papua New Guinea', 'Utah', 'Mexico', 'Russia', 'Baja California', 'Alaska', 'Washington', 'Oklahoma', 'Italy', 'Turkey', 'Nevada', 'Philippines', 'Montana'}
Thought: The regions requested by the user (California, Japan, and Greece) are valid for forecasting as they are included in the output of the Get Valid Regions tool. Now, I can proceed t

In [15]:
print(output.final_answer)

The email has been successfully generated and is ready to be sent to the user. The email includes the 5-day earthquake forecasts for the regions California, Japan, and Greece, along with some additional context and recommendations. The user will be informed that these are forecasts and actual seismic activity may vary. They will also be advised to stay prepared and have a plan in place in case of an earthquake.


We can also ask the AI agent other questions, which it answers with the help of the tools available to it.

In [16]:
output = agent.invoke({"question": "How many earthquakes occurred today?"})

Thought: To find out how many earthquakes occurred today, I need to use the Count Earthquakes tool. However, I first need to get the current date using the Current Date tool.
Tool: Current Date
Tool Input: {}
Tool Output: 2024-08-15 13:28:03.803246
Thought: Now that I have the current date, I can use the Count Earthquakes tool to find out how many earthquakes occurred today. I will set the start_time to the beginning of today (2024-08-15 00:00:00) and the end_time to the current time (2024-08-15 13:28:03).
Tool: Count Earthquakes
Tool Input: {'start_time': '2024-08-15T00:00:00', 'end_time': '2024-08-15T13:28:03'}
Tool Output: {'count': 113, 'maxAllowed': 20000}
Thought: The Count Earth

In [17]:
print(output.final_answer)

There have been 113 earthquakes today.
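
The shape of the tool output above (`count` and `maxAllowed`) matches what the public USGS event web service returns from its `count` endpoint, so it is plausible, though not confirmed here, that the Count Earthquakes tool wraps that endpoint. Under that assumption, the sketch below builds the request URL for "today" from a given timestamp without making a network call; `count_url_for_today` is a hypothetical helper. Note that the USGS service interprets times without a timezone as UTC, to the best of our knowledge.

```python
from datetime import datetime
from urllib.parse import urlencode

# Count endpoint of the USGS FDSN event web service.
USGS_COUNT_URL = "https://earthquake.usgs.gov/fdsnws/event/1/count"


def count_url_for_today(now: datetime) -> str:
    """Build a count query covering midnight of the given day up to `now`."""
    start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    params = {
        "format": "geojson",
        "starttime": start.isoformat(timespec="seconds"),
        "endtime": now.isoformat(timespec="seconds"),
    }
    return f"{USGS_COUNT_URL}?{urlencode(params)}"


url = count_url_for_today(datetime(2024, 8, 15, 13, 28, 3))
print(url)
```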


In [18]:
output = agent.invoke({"question": "Will there be earthquakes in Alaska and Idaho tomorrow?"})

Thought: The user is requesting a 1-day earthquake forecast for Alaska and Idaho. To fulfill this request, I need to use the Forecast Earthquakes tool. However, before I can do that, I need to confirm that these regions are valid for forecasting using the Get Valid Regions tool.
Tool: Get Valid Regions
Tool Input: {}
Tool Output: {'Oregon', 'California', 'Puerto Rico', 'Japan', 'Aleutian Islands', 'Hawaii', 'Chile', 'Tonga', 'Indonesia', 'Idaho', 'Greece', 'Wyoming', 'Papua New Guinea', 'Utah', 'Mexico', 'Russia', 'Baja California', 'Alaska', 'Washington', 'Oklahoma', 'Italy', 'Turkey', 'Nevada', 'Philippines', 'Montana'}
Thought: The regions requested by the user (Alaska and Idaho) are valid for forecasting as they are included in the output of the Get Valid Regions tool. Now, I can proceed to use the Forecast Earthqu

In [19]:
print(output.final_answer)

Yes, there are forecasts for potential earthquakes in both Alaska and Idaho tomorrow. 

In Alaska, the forecasted magnitude is approximately 1.10 with a depth of approximately 25.87 km. 

In Idaho, the forecasted magnitude is approximately 1.67 with a depth of approximately 8.33 km. 

Please note that these are forecasts and actual events may vary. It is always important to stay prepared and have a plan in place in case of an earthquake.
