# Climate change assistant

This notebook shows how to set up a LangChain LCEL chain that answers user questions about the climate in their area and points them to related documentation from the US Federal Emergency Management Agency (FEMA).

# Setup

1. Get a [Probable Futures API key](https://docs.probablefutures.org/api-access/)
2. See [README](./README) to set up a conda environment and `.env` file
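
The setup cell below reads several settings from the environment. As a minimal sketch, a check like the following could fail fast when something is missing from `.env`. The variable names are the ones this notebook reads; the helper `missing_env_vars` is hypothetical and not part of the repository.

```python
import os

# Variable names read by the setup cell in this notebook.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "DATA_DIR",
    "DB_DIR",
    "DB_COLLECTION_NAME",
    "OPENAI_CHAT_MODEL",
    "OPENAI_EMBEDDING_MODEL",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
print(missing_env_vars(["DATA_DIR", "DB_DIR"], {"DATA_DIR": "./data"}))
```

In the notebook, calling `missing_env_vars(REQUIRED_VARS)` right after `load_dotenv()` would list anything still unset.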

In [107]:
import nest_asyncio
import os
import openai
from dotenv import load_dotenv
import json
import time
import pandas as pd
import openpyxl
import shutil
import requests

from langchain_community.document_loaders import TextLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI
from langchain_community.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
#from langchain_chroma import Chroma
from langchain.vectorstores.chroma import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel
from langchain_core.prompts import PromptTemplate

from langchain_core.messages import AIMessage
from langchain_core.runnables import (
    Runnable,
    RunnableLambda,
    RunnableMap,
    RunnablePassthrough,
)
from langchain.tools import Tool

from deepeval.metrics import AnswerRelevancyMetric, SummarizationMetric, FaithfulnessMetric, GEval
from deepeval.test_case import LLMTestCaseParams

from chromadb.errors import InvalidDimensionException

from deepeval import evaluate
from deepeval.metrics.ragas import RagasMetric
from deepeval.test_case import LLMTestCase
import deepeval

import assistant_tools as at

nest_asyncio.apply()

# Load environment variables from .env file
load_dotenv()

# Initialize the OpenAI API client
openai.api_key = os.getenv("OPENAI_API_KEY")

DATA_DIR = os.getenv("DATA_DIR")
DB_DIR = os.getenv("DB_DIR")
DB_COLLECTION_NAME = os.getenv("DB_COLLECTION_NAME")
CHAT_MODEL = os.getenv("OPENAI_CHAT_MODEL")
EMBEDDING_MODEL = os.getenv("OPENAI_EMBEDDING_MODEL")
DEEPEVAL_API_KEY = os.getenv("DEEPEVAL_API_KEY")
PDF_DIR = f"{DATA_DIR}/pdfs"

os.makedirs(PDF_DIR, exist_ok=True)
os.makedirs(DB_DIR, exist_ok=True)


embeddings = OpenAIEmbeddings(model=EMBEDDING_MODEL)

# Analysis

## Indexing FEMA Disaster preparedness documents

### Get FEMA PDF documents

In [5]:
df = pd.read_csv(f"{DATA_DIR}/fema_docs.csv")
display(df)

| Unnamed: 0 | Source | URL | Extra instructions | Document |
| --- | --- | --- | --- | --- |
| 0 | FEMA | https://www.fema.gov/emergency-managers/risk-m... | Selected "Protect my home from natural hazards" | https://www.fema.gov/sites/default/files/2020-... |
| 1 | FEMA | https://www.fema.gov/emergency-managers/risk-m... | Selected "Protect my home from natural hazards" | https://www.fema.gov/sites/default/files/2020-... |
| 2 | FEMA | https://www.fema.gov/emergency-managers/risk-m... | Selected "Protect my home from natural hazards" | https://www.fema.gov/sites/default/files/2020-... |
| 3 | FEMA | https://www.fema.gov/emergency-managers/risk-m... | Selected "Protect my home from natural hazards" | https://www.fema.gov/sites/default/files/2020-... |
| 4 | FEMA | https://www.fema.gov/emergency-managers/risk-m... | Selected "Protect my home from natural hazards" | https://www.fema.gov/sites/default/files/2020-... |
| 5 | FEMA | https://www.fema.gov/emergency-managers/risk-m... | Selected "Protect my home from natural hazards" | https://www.fema.gov/sites/default/files/2020-... |
| 6 | FEMA | https://www.fema.gov/emergency-managers/risk-m... | Selected "Protect my home from natural hazards" | https://www.fema.gov/sites/default/files/docum... |
| 7 | FEMA | https://www.fema.gov/emergency-managers/risk-m... | Selected "Protect my home from natural hazards" | https://www.fema.gov/sites/default/files/2020-... |
| 8 | FEMA | https://www.fema.gov/emergency-managers/indivi... |  | https://www.fema.gov/sites/default/files/2020-... |
| 9 | FEMA | https://www.fema.gov/emergency-managers/indivi... |  | https://www.fema.gov/sites/default/files/2020-... |


### Build Vector Database

First, we will build a FEMA RAG chain for answering questions about disaster preparedness, using FEMA PDFs.

In [8]:
# Download every document listed in the 'Document' column
for doc_url in df["Document"]:
    print(f"Downloading {doc_url}")
    response = requests.get(doc_url, timeout=60)
    response.raise_for_status()  # Stop rather than save an error page as a PDF
    with open(f"{PDF_DIR}/{doc_url.split('/')[-1]}", "wb") as f:
        f.write(response.content)


Downloading https://www.fema.gov/sites/default/files/2020-10/fema_scenario_1-active_shooter-01102020.pdf
Downloading https://www.fema.gov/sites/default/files/2020-10/fema_scenario_1_active_shooter_TTX_answer_key-01102020.pdf
Downloading https://www.fema.gov/sites/default/files/2020-11/fema_protect-your-property_coastal-erosion.pdf
Downloading https://www.fema.gov/sites/default/files/2020-11/fema_protect-your-property_earthquakes.pdf
Downloading https://www.fema.gov/sites/default/files/2020-11/fema_protect-your-home_flooding.pdf
Downloading https://www.fema.gov/sites/default/files/2020-11/fema_protect-your-property_severe-wind.pdf
Downloading https://www.fema.gov/sites/default/files/documents/fema_protect-your-property-storm-surge.pdf
Downloading https://www.fema.gov/sites/default/files/2020-11/fema_protect-your-property_wildfire.pdf
Downloading https://www.fema.gov/sites/default/files/2020-10/fema_scenario_2_tornado-01102020.pdf
Downloading https://www.fema.gov/sites/default/files/2020
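
The download loop takes the file name from the last segment of each URL and fetches every file on every run. A small sketch of how the file name could be derived a little more defensively, and how already-downloaded files could be skipped, is shown here. The helpers `pdf_filename` and `needs_download` are hypothetical names introduced for illustration.

```python
import os
from urllib.parse import urlparse


def pdf_filename(doc_url):
    """Return the last path segment of a URL, ignoring any query string."""
    return os.path.basename(urlparse(doc_url).path)


def needs_download(doc_url, pdf_dir):
    """True when the file for doc_url is not already present in pdf_dir."""
    return not os.path.exists(os.path.join(pdf_dir, pdf_filename(doc_url)))


# One of the URLs from the download log above.
url = "https://www.fema.gov/sites/default/files/2020-11/fema_protect-your-home_flooding.pdf"
print(pdf_filename(url))
```

Inside the loop, `if not needs_download(doc_url, PDF_DIR): continue` would avoid re-fetching files between runs.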

### Index documents

We will use a very simple parser and chunking methodology to ingest documents for this demo.

In [2]:
# Load the PDFs
docs = []
for pdf_file in os.listdir(PDF_DIR):
    if not pdf_file.endswith(".pdf"):
        continue
    print(f"Loading PDF: {pdf_file}")
    file_path = f"{PDF_DIR}/{pdf_file}"
    loader = PyPDFLoader(file_path)
    docs = docs + loader.load()
    print(f"Loaded {len(docs)} documents")

print(len(docs))

Loading PDF: fema_scenario_10_power_outage_answer_key_01102020.pdf
Loaded 2 documents
Loading PDF: fema_scenario_7-shelter_in_place_TTX_answer_key_01102020.pdf
Loaded 5 documents
Loading PDF: ready_12-ways-to-prepare_postcard.pdf
Loaded 7 documents
Loading PDF: fema_safeguard-critical-documents-and-valuables.pdf
Loaded 10 documents
Loading PDF: ready_document-and-insure-your-property.pdf
Loaded 16 documents
Loading PDF: fema_scenario_1-active_shooter-01102020.pdf
Loaded 18 documents
Loading PDF: fema_protect-your-property_wildfire.pdf
Loaded 26 documents
Loading PDF: fema_scenario_4-hurricane-01102020.pdf
Loaded 27 documents
Loading PDF: fema_scenario_10_power_outage_01102020.pdf
Loaded 28 documents
Loading PDF: fema_scenario_4_hurricane_flood_TTX_answer_key-01102020.pdf
Loaded 30 documents
Loading PDF: fema_scenario_11_winter_storm_01102020.pdf
Loaded 32 documents
Loading PDF: fema_protect-your-property_severe-wind.pdf
Loaded 44 documents
Loading PDF: fema_protect-your-property-storm-

In [3]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
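
To make `chunk_size=1000` and `chunk_overlap=200` concrete, here is a simplified fixed-window sketch. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and sentences); it only illustrates how consecutive chunks share text. The function name `sliding_chunks` is an assumption made for this example.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = sliding_chunks("a" * 2500, chunk_size=1000, chunk_overlap=200)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

With these settings each window starts 800 characters after the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.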

In [4]:
if os.path.exists(DB_DIR):
    print("Removing existing DB directory")
    shutil.rmtree(DB_DIR)

print(f"Creating vector store and saving to {DB_DIR}")
vector_store = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    persist_directory=DB_DIR,
    collection_name=DB_COLLECTION_NAME
)

Removing existing DB directory
Creating vector store and saving to ./db


### Base retriever

In [3]:
vector_store = Chroma(persist_directory=DB_DIR, embedding_function=embeddings, collection_name=DB_COLLECTION_NAME)
retriever = vector_store.as_retriever()



In [4]:
results = retriever.invoke("How can I prepare my house for a flood?")
for doc in results[0:3]:
    print("=====================================")
    print(doc.metadata)
    print(doc.page_content)

{'page': 3, 'source': './data/pdfs/fema_protect-your-property-storm-surge.pdf'}
your agent to  
get coverage.
 
 PREPARE OR  
UPDATE A LIST OF 
 YOUR HOME’S 
CONTENTSDocument your belongings. This will give you peace of mind 
and help with the insurance process if you need to file a claim. Consider documenting your home’s contents visually. You can either take photos of high-value items or walk through your home and videotape each room’s belongings.
STORE  
VALUABLESStore valuables and important documents above the BFE (preferably on an upper floor). Place them in waterproof or 
water-resistant containers. Also, make copies and store them 
online or offsite.
ELEVATE 
APPLIANCES AND 
UTILITIES ABOVE 
THE BFEKeep appliances and utilities such as water heaters, washers, dryers, and 
electric panels on higher 
floors. It can prevent them from getting damaged or ruined by flood water.
Talk to your floodplain 
manager about how high to elevate your utilities. Many coastal
{'page': 3, 'source

The first few results look reasonable for the query.
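
Conceptually, the retriever embeds the query and returns the stored chunks whose vectors are closest to it. The toy sketch below shows that idea with cosine similarity on hand-made 2-d vectors. It is an illustration only: the distance metric Chroma actually uses depends on how the collection is configured, and the helper names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k vectors most similar to query_vec."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy 2-d "embeddings"; real embedding vectors have many more dimensions.
toy_vectors = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
print(top_k([1.0, 0.1], toy_vectors, k=2))  # [0, 1]
```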

### Build FEMA RAG Chain

In [150]:
def output_results(results):
    """
    Output the results of a question-answering task.

    Args:
        results (dict): The results of the question-answering task.

    Returns:
        None
    """
    print("Question:")
    print(results["query"])
    print("Answer:")
    print(results["result"])
    print("\nContext:")
    for doc in results["source_documents"]:
        print(doc.page_content)
        print("\n\n========\n")
    print("\n")

system_prompt = (
    "You are an assistant helping people prepare for disasters. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. "
    "You ONLY answer using the context provided. "
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{question}\n{context}"),
    ]
)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_llm = ChatOpenAI(model=CHAT_MODEL, temperature=0.0)

# This returns both the context and the response. The response key is named 'output'
# because other key names broke tracing in deepeval.
rag_chain_from_docs = (
    RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
    | prompt
    | rag_llm
    | StrOutputParser()
)
fema_rag_chain = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(output=rag_chain_from_docs)



Let's test it ...

In [14]:
query = "What can I do to prepare my house for floods?"
results = fema_rag_chain.invoke(query)
print(results["output"],"\n\n")
for doc in results["context"]:
    print(doc.metadata)


To prepare your house for floods, you can take the following steps:

1. **Get Flood Insurance**: Most homeowners insurance policies don’t cover flood damage. Protect your investment by purchasing flood insurance for your home and its contents, even if you do not live in a high-risk flood zone. For more information, visit FloodSmart.gov or contact your insurance agent.

2. **Prepare or Update a List of Your Home’s Contents**: Document your belongings to help with the insurance process if you need to file a claim. Consider taking photos of high-value items or doing a video walkthrough of your home to document its contents.

3. **Store Valuables**: Store valuables and important documents above the Base Flood Elevation (BFE), preferably on an upper floor. Place them in waterproof or water-resistant containers. Make copies and store them online or offsite.

4. **Elevate Appliances and Utilities**: Keep appliances and utilities such as water heaters, washers, dryers, and electric panels on h

Great, that works well. A production RAG system would likely need more sophisticated parsing and chunking strategies, as well as a refined summarization prompt to ensure no key information is omitted, but this at least retrieves the correct content.
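
The test cell prints the metadata of every retrieved chunk, so the same PDF can appear several times. As a small sketch, the sources could be de-duplicated before being shown as further reading. The helper `unique_sources` is hypothetical; the first metadata entry is copied from the retriever output earlier in this notebook, and the other two are illustrative.

```python
def unique_sources(metadatas):
    """Return source paths in first-seen order, without duplicates."""
    seen = []
    for meta in metadatas:
        source = meta.get("source")
        if source and source not in seen:
            seen.append(source)
    return seen


# Metadata dicts shaped like the ones printed by the retriever.
example = [
    {"page": 3, "source": "./data/pdfs/fema_protect-your-property-storm-surge.pdf"},
    {"page": 4, "source": "./data/pdfs/fema_protect-your-property-storm-surge.pdf"},  # illustrative
    {"page": 1, "source": "./data/pdfs/fema_protect-your-home_flooding.pdf"},  # illustrative
]
print(unique_sources(example))
```

With the chain output, this could be called as `unique_sources(doc.metadata for doc in results["context"])`.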

## Getting data from Probable Futures for a location

The [Probable Futures](https://probablefutures.org/) API is a GraphQL API, so I have created a few helper functions in [./assistant_tools.py](./assistant_tools.py) to authenticate and extract information for a location.
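
For orientation, the request that `at.get_pf_data` prints in the cells that follow is a `getDatasetStatistics` mutation. The sketch below builds a string of the same shape. It is not the code in `assistant_tools.py`; `build_pf_mutation` is a hypothetical name, and using `json.dumps` to quote the arguments is an assumption made here so that quotes inside an address are escaped.

```python
import json


def build_pf_mutation(address, country, warming_scenario):
    """Build a getDatasetStatistics mutation like the one printed in this notebook."""
    return (
        "mutation {\n"
        "  getDatasetStatistics(input: {\n"
        f"    country: {json.dumps(country)}\n"
        f"    address: {json.dumps(address)}\n"
        f"    warmingScenario: {json.dumps(warming_scenario)}\n"
        "  }) {\n"
        "    datasetStatisticsResponses {\n"
        "      datasetId midValue name unit\n"
        "      warmingScenario latitude longitude info\n"
        "    }\n"
        "  }\n"
        "}"
    )


print(build_pf_mutation("Burlington", "United States", "1.5"))
```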

Let's do a quick test ...

In [2]:
resp = at.get_pf_data(address="Burlington", country="United States", warming_scenario="1.5")
print(resp)


        mutation {
            getDatasetStatistics(input: { 
        country: "United States"
        address: "Burlington"
                         warmingScenario: "1.5" 
                }) {
                datasetStatisticsResponses{
                    datasetId
                    midValue
                    name
                    unit
                    warmingScenario
                    latitude
                    longitude
                    info
                }
            }
        }
    
{
  "data": {
    "getDatasetStatistics": {
      "datasetStatisticsResponses": [
        {
          "datasetId": 40107,
          "midValue": "0.0",
          "name": "Days above 45\u00b0C (113\u00b0F)",
          "unit": "days",
          "warmingScenario": "1.5",
          "latitude": 44.4,
          "longitude": -73.2,
          "info": {}
        },
        {
          "datasetId": 40206,
          "midValue": "20.0",
          "name": "10 hottest nights",
          "unit":
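
The response above is verbose, so a compact view can be handy when inspecting it. This sketch assumes, based on the printed output, that `at.get_pf_data` returns a JSON string with the structure shown; `summarize_pf_response` is a hypothetical helper, not part of `assistant_tools.py`.

```python
import json


def summarize_pf_response(raw_json):
    """Map each dataset name to a (value, unit) pair from a response string."""
    payload = json.loads(raw_json)
    rows = payload["data"]["getDatasetStatistics"]["datasetStatisticsResponses"]
    summary = {}
    for row in rows:
        # midValue arrives as a string such as "0.0"; convert when possible.
        try:
            value = float(row["midValue"])
        except (TypeError, ValueError):
            value = None
        summary[row["name"]] = (value, row["unit"])
    return summary


# One entry copied from the response shown above.
sample = json.dumps({
    "data": {"getDatasetStatistics": {"datasetStatisticsResponses": [
        {"datasetId": 40107, "midValue": "0.0",
         "name": "Days above 45\u00b0C (113\u00b0F)", "unit": "days",
         "warmingScenario": "1.5", "latitude": 44.4, "longitude": -73.2,
         "info": {}},
    ]}}
})
print(summarize_pf_response(sample))
```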

### Building a chain to classify climate predictions

Let's build a chain to classify the results from Probable Futures. We want to identify any areas where a person might want to prepare, and use that to get the corresponding documentation from FEMA.

For this we will create a simple tool that wraps the Probable Futures helper.

In [50]:
from langchain.pydantic_v1 import BaseModel, Field
from langchain.tools import BaseTool, StructuredTool, tool

@tool
def get_pf_info(query: str) -> str:
    """
    Get information about a location's predicted future climate.
    
    Args:
        query (str): The location to query.

    Returns:
        str: The predicted future climate information, as a JSON string.
    """
    resp = at.get_pf_data(address=query, country="United States", warming_scenario="1.5")
    return resp

get_pf_info.run({"query": "Burlington vermont"})


        mutation {
            getDatasetStatistics(input: { 
        country: "United States"
        address: "Burlington vermont"
                         warmingScenario: "1.5" 
                }) {
                datasetStatisticsResponses{
                    datasetId
                    midValue
                    name
                    unit
                    warmingScenario
                    latitude
                    longitude
                    info
                }
            }
        }
    


'{\n  "data": {\n    "getDatasetStatistics": {\n      "datasetStatisticsResponses": [\n        {\n          "datasetId": 40107,\n          "midValue": "0.0",\n          "name": "Days above 45\\u00b0C (113\\u00b0F)",\n          "unit": "days",\n          "warmingScenario": "1.5",\n          "latitude": 44.4,\n          "longitude": -73.2,\n          "info": {}\n        },\n        {\n          "datasetId": 40206,\n          "midValue": "20.0",\n          "name": "10 hottest nights",\n          "unit": "\\u00b0C",\n          "warmingScenario": "1.5",\n          "latitude": 44.4,\n          "longitude": -73.2,\n          "info": {}\n        },\n        {\n          "datasetId": 40207,\n          "midValue": "-5.0",\n          "name": "Average winter temperature",\n          "unit": "\\u00b0C",\n          "warmingScenario": "1.5",\n          "latitude": 44.4,\n          "longitude": -73.2,\n          "info": {}\n        },\n        {\n          "datasetId": 40601,\n          "midValue": "6

In [217]:
from langchain.agents import AgentExecutor, create_tool_calling_agent

tools = [
    get_pf_info,  
]


system_prompt = (
"""
================================ System Message ================================

You are an AI assistant that analyzes output from a climate change AI.
You assess whether the data indicates extreme changes in climate.
Advise on exactly what types of events people should prepare for, be explicit: floods, drought, fire etc
Be conservative, only advise on the most extreme events:
   - A few days extra of an event per year is not extreme
   - A percentage likelihood increase less than 30% is not extreme
If you don't know the answer, say that you don't know.
You ONLY answer using the context provided.


============================= Messages Placeholder =============================

{chat_history}

================================ Human Message =================================

{input}

============================= Messages Placeholder =============================

{agent_scratchpad}
""")


prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ]
)

llm = ChatOpenAI(model=CHAT_MODEL, temperature=0.0)
agent = create_tool_calling_agent(llm, tools, prompt)
climate_api_agent = AgentExecutor(agent=agent, tools=tools, verbose=True)

In [218]:
api_result = climate_api_agent.invoke(
    {
        "input": "How will climate change affect Burlington vermont"
    }
)
print(api_result["output"])



> Entering new AgentExecutor chain...

Invoking: `get_pf_info` with `{'query': 'Burlington, Vermont'}`

        mutation {
            getDatasetStatistics(input: { 
        country: "United States"
        address: "Burlington, Vermont"
                         warmingScenario: "1.5" 
                }) {
                datasetStatisticsResponses{
                    datasetId
                    midValue
                    name
                    unit
                    warmingScenario
                    latitude
                    longitude
                    info
                }
            }
        }
    
[36;1m[1;3m{
  "data": {
    "getDatasetStatistics": {
      "datasetStatisticsResponses": [
        {
          "datasetId": 40107,
          "midValue": "0.0",
          "name": "Days above 45\u00b0C (113\u00b0F)",
          "unit": "days",
          "warmingScenario": "1.5",
          "latitude": 44.4,
          "longitude": -73.2,
    

In [219]:

system_prompt = (
    "You are an assistant helping people prepare for increased climate risks in their area. "
    "Use the following information to generate hypothetical user questions about climate change. "
    "If the information mentions twice as many storms and high precipitation, the question might be: How can I prepare for floods? "
    "If the information indicates a large increase in wildfires (> 20 days a year), then the question might be: What are the risks of wildfires in my area? "
    "If the information indicates increases in heat, the question might be: Is there a way I can prepare for extreme heat? "
    "Do not generate questions for every scenario, only those where the information says there will be a significant increase. "
    "Do not generate a question for anything with a small increase or no increase. "
    "Changes less than 20 days a year are not significant. "
    "Questions should always be about actions. "
    "ONLY output the questions. "
    "You ONLY answer using the context provided. "
    "\n\n"
    "{input}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

climate_advice_llm = ChatOpenAI(model=CHAT_MODEL, temperature=0.0)

advice_chain = (
    RunnablePassthrough.assign(input=(lambda x: x["api_agent_result"]))
    | prompt
    | climate_advice_llm
    | RunnableLambda(lambda x: {"output": x.content})
)

advice = advice_chain.invoke({
    "api_agent_result": api_result["output"]
})
print(advice)

{'output': 'How can I prepare for floods?'}
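
Both prompts leave the judgement of what counts as a significant change to the LLM. As a rough sketch, the thresholds they state (20 days a year in the advice prompt, 30% in the agent prompt) could also be applied deterministically as a sanity check. The function name and the assumption that percentage changes carry the unit `"%"` are made up for this illustration.

```python
def is_significant(change, unit, min_days=20, min_percent=30):
    """Apply the prompts' thresholds to one predicted change."""
    if unit == "days":
        return abs(change) >= min_days
    if unit == "%":
        return abs(change) >= min_percent
    # Other units (for example mm, degrees C, or "times as frequent")
    # have no threshold in the prompts, so they are not flagged here.
    return False


print(is_significant(9, "days"))    # False: below the 20-day threshold
print(is_significant(-25, "days"))  # True: a large decrease also counts
```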


We now have an LLM step that takes the user input and calls the tool to get climate predictions, and a second LLM step that turns those predictions into a documentation query. Let's combine them with the FEMA RAG chain.

In [220]:
from langchain_core.runnables.passthrough import RunnablePick

# It's a bit cumbersome, but we will pass through variables from each stage for potential use in the UI
full_chain = RunnableParallel({
    "user_question": RunnablePick("input"),
    "api_agent_result": climate_api_agent | RunnablePick("output"),             # API Agent to get predictions
}) | RunnableParallel({
    "user_question": RunnablePick("user_question"),
    "api_agent_result": RunnablePick("api_agent_result"),
    "advice_chain_result": advice_chain | RunnablePick("output"),               # Advice chain to generate RAG question
}) | RunnableParallel({
   "user_question": RunnablePick("user_question"),
   "api_agent_result": RunnablePick("api_agent_result"),
   "advice_chain_result": RunnablePick("advice_chain_result"),
   "fema_rag_chain_result": RunnablePick("advice_chain_result") | fema_rag_chain,  # FEMA RAG chain to answer the generated question
}) 
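
The pattern in this cell is that each stage adds one key and passes the earlier keys through. The plain-Python sketch below mimics that pattern with stub functions so the data flow is easy to follow. It does not use LangChain; `assign` and `pipeline` are hypothetical helpers, loosely analogous to the `RunnablePassthrough.assign` call already used in this notebook.

```python
def assign(**steps):
    """Return a stage that copies its input dict and adds one key per step."""
    def stage(state):
        new_state = dict(state)
        for key, func in steps.items():
            # Every step sees the stage's input, not keys added by sibling steps.
            new_state[key] = func(state)
        return new_state
    return stage


def pipeline(*stages):
    """Compose stages left to right."""
    def run(state):
        for stage in stages:
            state = stage(state)
        return state
    return run


# Stub functions standing in for the agent, the advice chain and the RAG chain.
demo_chain = pipeline(
    assign(api_agent_result=lambda s: f"predictions for: {s['input']}"),
    assign(advice_chain_result=lambda s: "How can I prepare for floods?"),
    assign(fema_rag_chain_result=lambda s: f"answer to: {s['advice_chain_result']}"),
)

print(demo_chain({"input": "Burlington"}))
```

Because each stage copies its input, intermediate results stay available for display in a UI without being listed again at every step.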

In [221]:
result = full_chain.invoke({
    "input": "How will climate change affect Burlington vermont"
})




> Entering new AgentExecutor chain...

Invoking: `get_pf_info` with `{'query': 'Burlington, Vermont'}`

        mutation {
            getDatasetStatistics(input: { 
        country: "United States"
        address: "Burlington, Vermont"
                         warmingScenario: "1.5" 
                }) {
                datasetStatisticsResponses{
                    datasetId
                    midValue
                    name
                    unit
                    warmingScenario
                    latitude
                    longitude
                    info
                }
            }
        }
    
[36;1m[1;3m{
  "data": {
    "getDatasetStatistics": {
      "datasetStatisticsResponses": [
        {
          "datasetId": 40107,
          "midValue": "0.0",
          "name": "Days above 45\u00b0C (113\u00b0F)",
          "unit": "days",
          "warmingScenario": "1.5",
          "latitude": 44.4,
          "longitude": -73.2,
    



In [222]:
for key in ["user_question", "api_agent_result", "advice_chain_result"]:
    print(f"\n\n{key}:\n\n {result[key]}")

print("\n\nFEMA RAG Chain Result:\n\n")
print(result["fema_rag_chain_result"]["output"])
print("\n\nFEMA Further Reading:\n\n")
for doc in result["fema_rag_chain_result"]["context"]:
    print(doc.metadata)




user_question:

 How will climate change affect Burlington vermont


api_agent_result:

 Based on the data provided, here are the most extreme climate change impacts expected for Burlington, Vermont:

1. **Increased Frequency of Extreme Storms**:
   - The frequency of "1-in-100-year" storms is expected to double (2 times as frequent). This suggests a significant increase in the likelihood of severe storms, which could lead to flooding and infrastructure damage.

2. **Increased Annual Precipitation**:
   - There is an expected increase of 64 mm in total annual precipitation. While this alone is not extreme, combined with more frequent extreme storms, it could exacerbate flooding risks.

3. **Decrease in Snowy Days**:
   - A reduction of 10 snowy days per year is expected. This could impact winter sports, water supply from snowmelt, and ecosystems dependent on snow cover.

4. **Increase in Wildfire Danger Days**:
   - There will be an increase of 9 wildfire danger days per year. This s