### Install aixplain SDK

In [38]:
!pip install aixplain



### Set API Key for Authentication

In [39]:
import os
os.environ["TEAM_API_KEY"] = "<YOUR_TEAM_API_KEY>"  # placeholder: never commit a real key to a shared notebook
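
A key written into a notebook cell is visible to anyone who can read the file. A minimal sketch of one alternative, assuming the key is either already in the environment or typed at a hidden prompt. `load_team_api_key` is a hypothetical helper for illustration, not part of the aixplain SDK.

```python
import os
from getpass import getpass


def load_team_api_key(env=os.environ, prompt=getpass):
    """Return TEAM_API_KEY from `env`, asking interactively if it is missing.

    Hypothetical helper for illustration; not part of the aixplain SDK.
    """
    key = env.get("TEAM_API_KEY", "").strip()
    if not key:
        # getpass hides the typed value, so it never appears in the cell output.
        key = prompt("aiXplain team API key: ").strip()
    if not key:
        raise ValueError("TEAM_API_KEY is empty")
    env["TEAM_API_KEY"] = key
    return key
```

Calling `load_team_api_key()` before the aixplain imports would leave the variable set for the rest of the session.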

### Import aixplain Modules

In [40]:
from aixplain.factories import AgentFactory, ModelFactory, IndexFactory
from aixplain.modules.model.record import Record

### Clone Climate Policy Dataset from GitHub

In [41]:
!git clone https://github.com/HUANGZHIHAO1994/GCCMPD-climate-policy-dataset.git

fatal: destination path 'GCCMPD-climate-policy-dataset' already exists and is not an empty directory.
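
The `fatal` message above only means the repository was already cloned in an earlier run. A small sketch of making the step safe to re-run, assuming `git` is on the PATH. `clone_if_missing` is a hypothetical helper, and the `run` parameter exists only so the function can be exercised without network access.

```python
import os
import subprocess


def clone_if_missing(repo_url, dest, run=subprocess.run):
    """Clone `repo_url` into `dest` unless that directory already exists.

    Returns True if a clone was started, False if it was skipped.
    """
    if os.path.isdir(dest):
        return False
    run(["git", "clone", repo_url, dest], check=True)
    return True
```

In this notebook the call would be `clone_if_missing("https://github.com/HUANGZHIHAO1994/GCCMPD-climate-policy-dataset.git", "GCCMPD-climate-policy-dataset")`.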


### Load and Filter Climate Policy Data

In [42]:
import pandas as pd

file_path = "/content/GCCMPD-climate-policy-dataset/code and files/data/ALL_POLICIES_EN.xlsx"
df = pd.read_excel(file_path)
df = df[df["Policy_Content_raw"].notnull()]
# USA policies only; .copy() makes df an independent frame, so adding
# columns later does not trigger pandas' SettingWithCopyWarning
df = df[df["ISO_code"] == "USA"].copy()

### Create Formatted Text Field for Each Policy

In [43]:
# === Create Text Column ===
df["text"] = df.apply(
    lambda row: f"""Policy: {row['Policy_raw']}
Year: {row['Year']}
Scope: {row['Scope']}
Region: {row['IPCC_Region']}
Country Code: {row['ISO_code']}
Content: {row['Policy_Content_raw']}""",
    axis=1
)


### Scrape and Index EPA Pages

In [44]:
for idx in IndexFactory.list(query="EPA Climate Policies")["results"]:
    idx.delete()

epa_index = IndexFactory.create( # create new index for EPA pages
    "EPA Climate Policies",
    "Index of EPA climate change regulations and guidance"
)

epa_urls = [ # list of EPA URLs to scrape
    "https://www.epa.gov/climate-change",
    "https://www.epa.gov/statelocalenergy/state-climate-policy",
    "https://www.epa.gov/ghgemissions",
    "https://www.epa.gov/international-cooperation/climate-partnerships",
    "https://www.epa.gov/climate-indicators",
    "https://www.epa.gov/climate-adaptation"
]

scraper = ModelFactory.get("66f423426eb563fa213a3531")  # model used to scrape page text

# scrape each URL and index its content
for url in epa_urls:
    try:
        result = scraper.run({"text": url})

        if result.data and isinstance(result.data, str) and result.data.strip():
            epa_index.upsert([Record(value=result.data)])
            print(f"✅ Indexed: {url}")
        else:
            print(f"⚠️ Skipped (no content returned): {url}")

    except Exception as e:
        print(f"❌ Failed on {url}: {e}")

ERROR:root:Error in request: Expecting value: line 1 column 1 (char 0)
ERROR:root:Error in request: 504: NetworkError [ErrorCode.AX_NET_ERROR]: Gateway timeout: Please try again later. Details: unspecified error


⚠️ Skipped (no content returned): https://www.epa.gov/climate-change
✅ Indexed: https://www.epa.gov/statelocalenergy/state-climate-policy
✅ Indexed: https://www.epa.gov/ghgemissions
✅ Indexed: https://www.epa.gov/international-cooperation/climate-partnerships
✅ Indexed: https://www.epa.gov/climate-indicators
✅ Indexed: https://www.epa.gov/climate-adaptation
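
The first URL was skipped after a 504 gateway timeout, which is often transient. A minimal sketch of retrying with exponential backoff, assuming the failure clears on a later attempt. `run_with_retry` and its parameters are hypothetical names, not part of the aixplain SDK.

```python
import time


def run_with_retry(call, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call `call()` until it returns non-empty text or attempts run out.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    Returns the text, or None if every attempt failed or came back empty.
    """
    for attempt in range(attempts):
        try:
            text = call()
            if isinstance(text, str) and text.strip():
                return text
        except Exception as exc:  # broad on purpose: the failure type is unknown here
            print(f"attempt {attempt + 1} failed: {exc}")
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return None
```

Inside the loop above, `run_with_retry(lambda: scraper.run({"text": url}).data)` would replace the single `scraper.run` call, and a `None` result would map to the existing "Skipped" branch.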


### Save Full and Lightweight CSV Versions and Create SQL Tool

In [45]:
csv_path = "/content/climate_policies.csv"
df.to_csv(csv_path, index=False)

light_df = df[[
    'Policy_raw', 'Year', 'Scope', 'IPCC_Region',
    'ISO_code', 'Income_Group', 'WB_Region', 'Annex'
]]
light_csv_path = "/content/climate_metadata_light.csv"
light_df.to_csv(light_csv_path, index=False)

csv_tool = AgentFactory.create_sql_tool(
    name="climate_policy_db",
    source=light_csv_path,
    source_type="csv",
    enable_commit=False,
    description="Lightweight metadata for USA climate policies"
)
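
Before handing the CSV to an agent, it can help to see what a SQL query over the same columns returns locally. The sketch below loads CSV text into an in-memory SQLite table and counts rows per value of one column. It only approximates what the SQL tool does, and both `count_by_column` and the sample rows are hypothetical (the real `Scope` labels in the dataset may differ).

```python
import csv
import io
import sqlite3


def count_by_column(csv_text, column, table="policies"):
    """Load CSV text into an in-memory SQLite table and count rows per value."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    columns = list(rows[0].keys())
    if column not in columns:
        raise KeyError(column)
    conn = sqlite3.connect(":memory:")
    quoted = ", ".join(f'"{name}"' for name in columns)
    conn.execute(f'CREATE TABLE "{table}" ({quoted})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row[name] for name in columns) for row in rows],
    )
    result = dict(
        conn.execute(f'SELECT "{column}", COUNT(*) FROM "{table}" GROUP BY "{column}"')
    )
    conn.close()
    return result


# Hypothetical sample rows; the real Scope labels in the dataset may differ.
sample = (
    "Policy_raw,Year,Scope\n"
    "Plan A,2013,National\n"
    "Bill B,2022,Subnational\n"
    "Order C,2021,National\n"
)
print(count_by_column(sample, "Scope"))
```

With the notebook's file, `count_by_column(open(light_csv_path, encoding="utf-8").read(), "Scope")` lists the labels actually stored in the column, which is a useful cross-check for count-style questions put to the agent.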



### Clean HTML Tags and Rebuild the Text Field

In [46]:
import re

def clean_html(raw_text): # clean HTML Tags from Content Text
    if not isinstance(raw_text, str):
        return ""
    return re.sub('<[^<]+?>', '', raw_text)

df["text"] = df.apply(
    lambda row: f"""Policy: {row['Policy_raw']}
Year: {row['Year']}
Scope: {row['Scope']}
Region: {row['IPCC_Region']}
Country Code: {row['ISO_code']}
Content: {clean_html(row['Policy_Content_raw'])}""",
    axis=1
)
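
The regex in `clean_html` removes tags but leaves HTML entities such as `&amp;` and any stray whitespace in place. A sketch of a slightly stricter variant, assuming the policy text contains such entities. `clean_html_text` is a hypothetical name chosen to avoid clashing with the notebook's function.

```python
import html
import re

TAG_RE = re.compile(r"<[^<]+?>")  # same pattern as the notebook's clean_html


def clean_html_text(raw_text):
    """Strip tags, decode entities, and collapse whitespace; non-strings become ''."""
    if not isinstance(raw_text, str):
        return ""
    # Replace tags with a space so words on either side of <br/> stay separated.
    without_tags = TAG_RE.sub(" ", raw_text)
    decoded = html.unescape(without_tags)
    return " ".join(decoded.split())


print(clean_html_text("<p>Cuts CO2 &amp; methane<br/>by 2030</p>"))
# prints: Cuts CO2 & methane by 2030
```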

### Index USA Policies into a Searchable RAG Index

In [47]:
from aixplain.modules.model.record import Record

for idx in IndexFactory.list(query="gccmpd-policy-index")["results"]:
    idx.delete()

# create the index
policy_index = IndexFactory.create(
    "gccmpd-policy-index",
    "Index of USA climate mitigation policies"
)

# prepare records
records = [Record(value=text) for text in df["text"].tolist()]
chunk_size = 10  # Small and safe

# upload in small chunks
for i in range(0, len(records), chunk_size):
    batch = records[i:i + chunk_size]
    try:
        policy_index.upsert(batch)
        print(f"Uploaded {min(i + chunk_size, len(records))} / {len(records)}")
    except Exception as e:
        print(f"Failed at chunk {i}-{i+chunk_size}: {e}")
        break

Uploaded 10 / 1703
Uploaded 20 / 1703
Uploaded 30 / 1703
Uploaded 40 / 1703
Uploaded 50 / 1703
Uploaded 60 / 1703
Uploaded 70 / 1703
Uploaded 80 / 1703
Uploaded 90 / 1703
Uploaded 100 / 1703
Uploaded 110 / 1703
Uploaded 120 / 1703
Uploaded 130 / 1703
Uploaded 140 / 1703
Uploaded 150 / 1703
Uploaded 160 / 1703
Uploaded 170 / 1703
Uploaded 180 / 1703
Uploaded 190 / 1703
Uploaded 200 / 1703
Uploaded 210 / 1703
Uploaded 220 / 1703
Uploaded 230 / 1703
Uploaded 240 / 1703
Uploaded 250 / 1703
Uploaded 260 / 1703
Uploaded 270 / 1703
Uploaded 280 / 1703
Uploaded 290 / 1703
Uploaded 300 / 1703
Uploaded 310 / 1703
Uploaded 320 / 1703
Uploaded 330 / 1703
Uploaded 340 / 1703
Uploaded 350 / 1703
Uploaded 360 / 1703
Uploaded 370 / 1703
Uploaded 380 / 1703
Uploaded 390 / 1703
Uploaded 400 / 1703
Uploaded 410 / 1703
Uploaded 420 / 1703
Uploaded 430 / 1703
Uploaded 440 / 1703
Uploaded 450 / 1703
Uploaded 460 / 1703
Uploaded 470 / 1703
Uploaded 480 / 1703
Uploaded 490 / 1703
Uploaded 500 / 1703
Uploaded 
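
The upload loop stops at the first failed chunk, so a re-run starts again from record 0. A sketch of a resumable version, assuming the upsert call takes a list of records and raises on failure. `iter_batches` and `upload_from` are hypothetical helpers.

```python
def iter_batches(items, size):
    """Yield (start_index, batch) pairs covering `items` in order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield start, items[start:start + size]


def upload_from(items, size, upsert, start_at=0):
    """Upload batches beginning at `start_at`; return the index to resume from.

    Returns len(items) when everything was uploaded, otherwise the start
    index of the first batch that failed.
    """
    for start, batch in iter_batches(items[start_at:], size):
        try:
            upsert(batch)
        except Exception as exc:
            print(f"Failed at offset {start_at + start}: {exc}")
            return start_at + start
    return len(items)
```

With the notebook's objects this would be `resume = upload_from(records, chunk_size, policy_index.upsert)`, followed by `upload_from(records, chunk_size, policy_index.upsert, start_at=resume)` if `resume < len(records)`.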

### Create Agent

In [48]:
agent = AgentFactory.create(
    name="Climate Policy Knowledge Agent",
    description="Answers questions based on United States of America climate policies and EPA regulations.",
    instructions="""
Use the tools below to answer climate-related policy questions:
- Use `gccmpd-policy-index` to search U.S. climate policy text.
- Use `EPA Climate Policies` for U.S. EPA regulations.
- Use `climate_policy_db` to query U.S. metadata like year or scope.
""",
    tools=[
        AgentFactory.create_model_tool(model=policy_index.id),
        AgentFactory.create_model_tool(model=epa_index.id),
        csv_tool
    ]
)



### Example Queries Using the Agent

In [49]:
response = agent.run("What climate-related policies exist for U.S.?")
print(response.data.output)

The U.S. has several climate-related policies, including:
1. **Bilateral Climate and Energy Partnerships (2001)**: Focuses on international cooperation to address climate change.
2. **State and Local Climate and Energy Program (2015)**: Assists states in developing energy efficiency and renewable energy policies.
3. **Climate Finance Plan (2021)**: Aims to double public climate finance to developing countries.
4. **US CLIMATE Act of 2021**: Establishes a foreign assistance program for forest management.
5. **American Public Lands and Waters Climate Solution Act (2019)**: Studies methods to meet emission reduction targets.
6. **Executive Order on Climate-Related Financial Risk (2021)**: Addresses climate-related financial risks in federal investments.
7. **The President’s Climate Action Plan (2013)**: A comprehensive approach to cut carbon pollution and prepare for climate impacts.
8. **USDA Climate-Smart Program Changes (2021)**: Seeks public input on climate-smart agriculture practice

In [50]:
response = agent.run("How many national vs subnational climate policies does U.S. have?")
print(response.data.output)

The U.S. has a total of 1703 climate policies, but there are currently no national or subnational policies recorded.


In [51]:
response = agent.run("Are there any policies that mention 'carbon capture'?")
print(response.data.output)

There are several policies that mention 'carbon capture':
1. **Infrastructure and Jobs Act (2021)**: Allocates $12 billion for carbon capture, utilization, and storage technology.
2. **H.R. 5883 (2020)**: Amends the Internal Revenue Code to provide increased credits for carbon oxide sequestration for direct air capture facilities.
3. **Program to Capture and Store CO2 (2022)**: $3.5 billion funding for capturing and storing carbon dioxide pollution directly from the air.
4. **Carbon Capture Modernization Act (2019)**: Modifies tax credits for carbon capture and utilization systems.
5. **Creation of a Carbon Capture Regulatory Framework (SB 905, 2022)**: Establishes a program for evaluating carbon capture technologies in California.
6. **Clean Economy Jobs and Innovation Act (H.R. 4447, 2020)**: Requires the DOE to establish a program for large-scale carbon dioxide removal from the atmosphere.


In [52]:
response = agent.run("Summarize all U.S. climate policies in under 300 words.")
print(response.data.output)

The U.S. climate policies encompass a range of initiatives aimed at reducing greenhouse gas emissions, enhancing climate resilience, and promoting clean energy. Key policies include the President's Climate Action Plan (2013), which focuses on cutting carbon pollution, preparing for climate impacts, and leading international efforts. The 2021 Executive Order on Tackling the Climate Crisis aims for carbon neutrality by 2050 and establishes various task forces to address climate-related issues. The U.S. Climate Finance Plan (2021) seeks to double public climate finance to developing countries and triple adaptation finance by 2024. Additionally, the Long-Term Strategy outlines five transformations to achieve net-zero emissions by 2050, including decarbonizing electricity and reducing methane emissions. The State and Local Climate and Energy Program supports states in developing energy efficiency and renewable energy policies. Overall, these policies reflect a comprehensive approach to tack
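
A length constraint in a prompt is a request, not a guarantee, so it can be worth checking the returned text. A tiny sketch, assuming "words" means whitespace-separated tokens. `within_word_limit` is a hypothetical helper.

```python
def within_word_limit(text, limit):
    """Return (ok, count): whether the whitespace-separated word count is under `limit`."""
    count = len(text.split())
    return count < limit, count
```

For the query above this would be called as `within_word_limit(response.data.output, 300)`.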