# AI tests: indicator suggestions. 
This notebook displays initial tests for an AI indicator suggestion system. The goal is to evaluate how well an AI model can suggest relevant indicators based on given data or context.  
The process involves providing sample data and instructions to an AI agent, and analyzing the responses for accuracy and relevance.

## 1. Setup. 
Import necessary libraries and load environment variables.

In [3]:
import json
import os

import dotenv
from langchain import tools
from langchain.agents import create_agent
from langchain_openai import ChatOpenAI
from typing_extensions import TypedDict


In [None]:
# set openai api key as environment variable
dotenv.load_dotenv(dotenv.find_dotenv())


True
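
The `True` above only reports that `load_dotenv` set at least one variable; it does not by itself confirm that the OpenAI key is among them. A minimal sketch of an explicit check, assuming the key is stored under `OPENAI_API_KEY` (the variable `ChatOpenAI` reads by default). `missing_env_vars` is a hypothetical helper, not part of the notebook.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Check the real environment; an empty list means every variable is available.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print("Missing environment variables:", missing)
```

Running this right after `load_dotenv` should surface a missing key before the first model call rather than as an authentication error later.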

## 2. Simple test.  
Run a small test with mock data, passing it directly to the LLM.

In [7]:
test_data = [
    {
        "indicator_id": "deforestation_rate",
        "region": "Amazonas",
        "year": 2022,
        "description": "Rate of deforestation in square kilometers",
    },
    {
        "indicator_id": "biodiversity_index",
        "region": "Amazonas",
        "year": 2022,
        "description": "Biodiversity index score indicating species richness and rarity",
    },
    {
        "indicator_id": "carbon_emissions",
        "region": "Amazonas",
        "year": 2022,
        "description": "Total carbon emissions from deforestation in metric tons",
    },
    {
        "indicator_id": "forest_cover",
        "region": "Amazonas",
        "year": 2022,
        "description": "Total forest cover in square kilometers",
    },
    {
        "indicator_id": "carbon_sequestration",
        "region": "Amazonas",
        "year": 2022,
        "description": "Total carbon sequestration from forested areas in metric tons",
    },
    {
        "indicator_id": "fire_severity",
        "region": "Amazonas",
        "year": 2022,
        "description": "Average fire severity index for wildfires",
    },
    {
        "indicator_id": "burned_area_1990_2020",
        "region": "Amazonas",
        "year": 2022,
        "description": "Total area burned from 1990 to 2020 in square kilometers",
    },
    {
        "indicator_id": "protected_species_count",
        "region": "Amazonas",
        "year": 2022,
        "description": "Number of protected species in the region",
    },
]

test_data_json = json.dumps(test_data, indent=2)

In [8]:
system_template = f"""
You are an intelligent assistant that recommends indicators from the provided dataset list
 based on the user's interest stated in the prompt.

DATA:
{test_data_json}

Be accurate and only use the information from the dataset above.
Provide as response just a list of indicator_ids.
"""



Set up a simple model and invoke it with a prompt to check basic functionality.

In [None]:
model = ChatOpenAI(model="gpt-4o-mini", temperature=0, max_tokens=7000)


In [7]:
prompt = "I want to monitor the effect of tree cover loss in the Amazonas region."
response = model.invoke([("system", system_template), ("user", prompt)])
print("Recommended Indicators:", response.content)

Recommended Indicators: ["deforestation_rate", "carbon_emissions", "forest_cover", "carbon_sequestration", "burned_area_1990_2020"]
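
The model replies with text, so the list printed above is a string rather than a Python list. A minimal sketch, assuming the reply is a JSON array as in the output above, that parses it and separates ids present in the mock dataset from any the model may have invented. `parse_indicator_ids` is a hypothetical helper, not part of the notebook.

```python
import json


def parse_indicator_ids(content, known_ids):
    """Parse a JSON list of ids and split it into known and unknown entries."""
    try:
        parsed = json.loads(content)
    except json.JSONDecodeError:
        # The reply was not valid JSON; treat it as no usable suggestions.
        return [], []
    if not isinstance(parsed, list):
        return [], []
    valid = [i for i in parsed if isinstance(i, str) and i in known_ids]
    unknown = [i for i in parsed if not (isinstance(i, str) and i in known_ids)]
    return valid, unknown


# Ids from the mock dataset and the reply shown in the output above.
known = {
    "deforestation_rate", "biodiversity_index", "carbon_emissions",
    "forest_cover", "carbon_sequestration", "fire_severity",
    "burned_area_1990_2020", "protected_species_count",
}
reply = (
    '["deforestation_rate", "carbon_emissions", "forest_cover", '
    '"carbon_sequestration", "burned_area_1990_2020"]'
)
valid, unknown = parse_indicator_ids(reply, known)
print("valid:", valid)
print("unknown:", unknown)
```

A non-empty `unknown` list would indicate that the model ignored the instruction to use only the provided dataset.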


## 3. Agent test with real data.
Run a test with an AI agent that holds the instructions in its system prompt, to see whether this improves results.  
Get the real list of indicators by parsing the indicators.json file and simplifying it.  
Further capabilities can be added later.

### 3.1 Load and prepare real indicator data


In [13]:
# Load full indicators data
indicators_path = '../../client/datum/indicators.json'
# Use a context manager so the file handle is closed after reading
with open(indicators_path, 'r', encoding='utf-8') as f:
    indicator_data = json.load(f)


In [None]:
# Create a cleaned list of indicators with only name and short description in English
indicator_list_clean = []
for indicator in indicator_data:
    clean_indicator = {
        'name': indicator.get('name_en', ''),
        'description': indicator.get('description_short_en', '')
    }
    indicator_list_clean.append(clean_indicator)
print(f'Total indicators: {len(indicator_list_clean)}')
indicator_list_json = json.dumps(indicator_list_clean, indent=2)

Total indicators: 135
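
Because the cleaning step falls back to an empty string when `name_en` or `description_short_en` is absent, incomplete indicators would reach the prompt as blank entries. A minimal sketch of a completeness check over the cleaned list; `find_incomplete` is a hypothetical helper and the sample records are invented for illustration, not taken from `indicators.json`.

```python
def find_incomplete(indicators, fields=("name", "description")):
    """Return (index, missing_fields) for entries with empty or absent fields."""
    incomplete = []
    for idx, item in enumerate(indicators):
        missing = []
        for field in fields:
            value = item.get(field)
            # Treat absent, non-string, and whitespace-only values as missing.
            if not isinstance(value, str) or not value.strip():
                missing.append(field)
        if missing:
            incomplete.append((idx, missing))
    return incomplete


# Invented sample records with the same shape as indicator_list_clean.
sample = [
    {"name": "Forest Cover Loss", "description": "Illustrative description"},
    {"name": "", "description": "Entry without an English name"},
    {"name": "Croplands", "description": "   "},
]
print(find_incomplete(sample))
```

In the notebook, the same call on `indicator_list_clean` would list which of the 135 entries, if any, carry no usable text for the model.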


### 3.2 Define helpers to process outputs. 
- A schema to extract the list of indicator ids from the AI response.  
- A function that returns a dictionary with token usage.

In [22]:
class IndicatorSelection(TypedDict):
    """Represents the selection of indicators as a list."""
    indicator_ids: list[str]

In [17]:
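def get_token_info(result):
    """Extract token usage information from the agent result.

    Note: this parses the string representation of the first AI message, so it
    depends on the exact layout of that string and returns counts as strings.
    """
    r = str(result['messages'][1])
    # Keep the text after the third '{' (the token usage dict) and split it into items
    r = r.split('{')[3].split(',')
    token_dict = {}
    # The first three items are completion, prompt and total token counts
    for item in r[:3]:
        k = item.split(':')[0].strip().replace("'", '')
        v = item.split(':')[1].strip().replace("'", '')
        token_dict[k] = v
    return token_dict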
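
The function above relies on the string layout of the message, which may change between library versions. A sketch of an alternative, assuming the AI messages in the result are LangChain `AIMessage` objects, which expose a `usage_metadata` mapping with `input_tokens`, `output_tokens` and `total_tokens`. `get_token_usage` is a hypothetical name, and the stand-in objects below only imitate that attribute so the example runs without an API call.

```python
from types import SimpleNamespace


def get_token_usage(result):
    """Return token counts from the last message that reports usage."""
    for message in reversed(result["messages"]):
        # Messages without usage data (user or tool messages) are skipped.
        usage = getattr(message, "usage_metadata", None)
        if usage:
            return {
                "input_tokens": usage["input_tokens"],
                "output_tokens": usage["output_tokens"],
                "total_tokens": usage["total_tokens"],
            }
    return {}


# Stand-in result; the counts are the ones reported in this notebook's full example.
fake_result = {
    "messages": [
        SimpleNamespace(content="user prompt"),
        SimpleNamespace(
            content="",
            usage_metadata={
                "input_tokens": 6856,
                "output_tokens": 59,
                "total_tokens": 6915,
            },
        ),
    ]
}
print(get_token_usage(fake_result))
```

Note that the key names differ from the OpenAI-style `prompt_tokens` and `completion_tokens`, and the values are integers rather than strings.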

### 3.3 Set up agent with data and instructions.

In [18]:
system_template_full = f"""
You are an intelligent assistant that recommends indicators from the provided dataset list
 based on the user's interest stated in the prompt. Take into account all the available indicators,
 based on the fields: 'name', 'description'.

DATA:
{indicator_list_json}

Be accurate and only use the information from the dataset above.
If the user's prompt is a sentence or text, return a list of indicator IDs.
If the user prompt is a list of indicators, infer the context and return a list of different
related indicators.
Recommend at least 3 indicators and at most 12 to match the user's interest.
If the prompt is not related to the data, respond with an empty list.
"""
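
The prompt above sets rules the model may not always follow: between 3 and 12 recommendations, drawn only from the dataset, or an empty list for unrelated prompts. A minimal sketch of a check for those rules; `check_selection` is a hypothetical helper, and the known names below are a small subset chosen for illustration.

```python
def check_selection(selected, known_names, min_items=3, max_items=12):
    """Return a list of rule violations; an empty list means the selection passes."""
    problems = []
    if not selected:
        # An empty list is the permitted answer for unrelated prompts.
        return problems
    if len(selected) < min_items:
        problems.append(f"too few indicators: {len(selected)} < {min_items}")
    if len(selected) > max_items:
        problems.append(f"too many indicators: {len(selected)} > {max_items}")
    unknown = [name for name in selected if name not in known_names]
    if unknown:
        problems.append(f"not in dataset: {unknown}")
    if len(set(selected)) != len(selected):
        problems.append("duplicate indicators")
    return problems


# Illustrative subset of indicator names; the notebook would use the full list.
known_names = {"Protected Areas", "Forest Cover Loss", "Croplands", "Species Richness"}
print(check_selection(["Protected Areas", "Croplands"], known_names))
print(check_selection(["Protected Areas", "Forest Cover Loss", "Croplands"], known_names))
```

In the notebook, `known_names` could be built from `indicator_list_clean` with a set comprehension over the `name` field.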

In [19]:
model = ChatOpenAI(model="gpt-4o-mini", temperature=0)

agent_full = create_agent(
    model,
    tools=[],
    system_prompt=system_template_full,
    response_format=IndicatorSelection,
)

### 3.4 Full example

In [20]:
prompt = input("Enter your monitoring interest: ")
full_result = agent_full.invoke({'messages': [{"role": "user", "content": prompt}]})
print(prompt)
print(full_result['structured_response']['indicator_ids'])
print(get_token_info(full_result))

areas of biodiversity importance with deforestation and extractive activities or agriculture
['Protected Areas', 'Forest Cover Loss', 'Conservation Priority', 'Species Richness', 'Key Biodiversity Areas', 'Oil Extraction Areas', 'Croplands']
{'completion_tokens': '59', 'prompt_tokens': '6856', 'total_tokens': '6915'}
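
To move from eyeballing a single response to measuring accuracy and relevance, each suggestion list can be scored against a hand-labelled set of expected indicators. A minimal sketch using precision and recall; `precision_recall` is a hypothetical helper, and the expected set below is invented for illustration only (it is not a real labelling, and "Example Missed Indicator" is a placeholder name).

```python
def precision_recall(suggested, expected):
    """Compute precision and recall of suggested names against expected names."""
    suggested_set, expected_set = set(suggested), set(expected)
    hits = suggested_set & expected_set
    precision = len(hits) / len(suggested_set) if suggested_set else 0.0
    recall = len(hits) / len(expected_set) if expected_set else 0.0
    return precision, recall


# Suggestions copied from the output above.
suggested = [
    "Protected Areas", "Forest Cover Loss", "Conservation Priority",
    "Species Richness", "Key Biodiversity Areas", "Oil Extraction Areas",
    "Croplands",
]
# Invented expected set, for illustration only.
expected = [
    "Protected Areas", "Forest Cover Loss", "Key Biodiversity Areas",
    "Oil Extraction Areas", "Croplands", "Example Missed Indicator",
]
precision, recall = precision_recall(suggested, expected)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Averaging these two numbers over a small set of labelled prompts would give a simple baseline for comparing prompt or model changes.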
