In [1]:
from council.llm import LLMMessage, LLMBase
from council.llm.openai_llm import OpenAILLM
import dotenv
dotenv.load_dotenv()

True

In [2]:
llm = OpenAILLM.from_env()

gpt-4 may change over time. Returning num tokens assuming gpt-4-0613.


In [3]:
system_message = "You are an expert data scientist responsible for creating detailed data analysis plans."
main_prompt = "I want to identify which economic factors are the best predictors of food prices at an 18 month horizon. Can you help me with that?"

messages = [
    LLMMessage.system_message(system_message),
    LLMMessage.user_message(main_prompt),
]

response = llm.post_chat_request(messages).first_choice
print(response)

Absolutely, here's a detailed data analysis plan to identify the economic factors that best predict food prices at an 18 month horizon:

1. **Data Collection**: 
   - Collect historical data on food prices for at least the past 5 years. This data should include monthly prices of different food items.
   - Collect data on potential economic predictors such as inflation rates, GDP growth rates, unemployment rates, exchange rates, interest rates, agricultural production data, weather patterns, and commodity prices. 
   - Collect data on other potential factors such as government policies related to food and agriculture, global food supply and demand, and global crises (like pandemics or wars).

2. **Data Cleaning and Preprocessing**: 
   - Clean the data by handling missing values, outliers, and errors.
   - Normalize the data if necessary to bring all variables to the same scale.
   - Create a time-lagged dataset for the prediction horizon of 18 months.

3. **Exploratory Data Analysis (E

fred_chain = Chain(
    name="fred_data_specialist",
    description="Identify and access economic datasets from the Federal Reserve Economic Database (FRED). Use this chain when you need to generate or edit code for accessing or downloading data from FRED.",
    runners=[fred_skill, parse_python_skill],
)

data_scientist_chain = Chain(
    name="data_scientist",
    description="Write Python code related to data analysis and visualization, especially related to time series. Use this when you want to write code that will involve data analysis, manipulation, or plotting.",
    runners=[data_scientist_skill, parse_python_skill],
)


In [4]:
messages = [
    LLMMessage.system_message(system_message),
    LLMMessage.user_message(main_prompt),
    LLMMessage.assistant_message(response),
    LLMMessage.user_message("""
    Turn this into an actionable plan that leverages two CHAINs:
    1. {"name": "fred_data_specialist", "description": "Identify and access economic datasets from the Federal Reserve Economic Database (FRED). Use this chain when you need to generate or edit code for accessing or downloading data from FRED."}
    2. {"name": "data_scientist", "description": "Write Python code related to data analysis and visualization, especially related to time series. Use this when you want to write code that will involve data analysis, manipulation, or plotting."}
    
    Format your response precisely as {name};{integer score between 0 and 10};{natural language message or instructions for the selected chain on a single line}
    """),
]

response = llm.post_chat_request(messages).first_choice
print(response)

fred_data_specialist;8;Identify and access historical data on food prices, inflation rates, GDP growth rates, unemployment rates, exchange rates, interest rates, agricultural production data, and commodity prices from the Federal Reserve Economic Database (FRED). Generate code for downloading this data for at least the past 5 years.

data_scientist;9;Once the data is downloaded, write Python code to clean and preprocess the data. This includes handling missing values, outliers, and errors, and normalizing the data if necessary. Create a time-lagged dataset for the prediction horizon of 18 months. Perform exploratory data analysis to understand the distribution, trends, and correlations among variables. Use visualizations to better understand the relationships among variables. 

data_scientist;9;Continue with feature selection and engineering. Identify the most relevant features using techniques like correlation analysis, mutual information, or machine learning feature importance. Engin
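The reply above follows the requested `{name};{score};{message}` layout, one step per line. A minimal sketch of how such a reply could be parsed is shown below. The `ChainInstruction` type, the `parse_scored_plan` helper, and the sample text are hypothetical names introduced for illustration; they are not part of council.

```python
from typing import NamedTuple


class ChainInstruction(NamedTuple):
    """One parsed step (hypothetical helper type, not a council class)."""
    name: str
    score: int
    message: str


def parse_scored_plan(text: str) -> list[ChainInstruction]:
    """Parse lines of the form name;score;message and skip anything else."""
    steps = []
    for line in text.splitlines():
        # Split on the first two semicolons only, so the message may contain ';'.
        parts = line.strip().split(";", 2)
        if len(parts) != 3:
            continue
        name, score, message = (part.strip() for part in parts)
        # The prompt asks for an integer score between 0 and 10.
        if not score.isdigit() or not 0 <= int(score) <= 10:
            continue
        steps.append(ChainInstruction(name, int(score), message))
    return steps


# Made-up sample in the same layout as the model reply.
sample = """fred_data_specialist;8;Download food price and macro series from FRED.

data_scientist;9;Clean the data; build an 18-month lagged dataset.
This line is not in the expected format."""

for step in parse_scored_plan(sample):
    print(step.name, step.score, step.message)
```

Lines that do not match the layout are skipped rather than raising, since a truncated or chatty reply may mix formatted steps with free text.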

## Let's try that again, but go straight to the actionable plan.

In [13]:
import toml 
from string import Template

work_dir = "../src/python_agent"

# Load each prompt file once, then pull out the system prompt and the main template.
fred_prompts = toml.load(f"{work_dir}/prompts/fred_data_specialist.toml")
fred_system_message = fred_prompts["system"]["prompt"]
fred_prompt_template = fred_prompts["main"]["prompt_template"]

data_scientist_prompts = toml.load(f"{work_dir}/prompts/fred_data_scientist.toml")
data_scientist_system_message = data_scientist_prompts["system"]["prompt"]
data_scientist_prompt_template = data_scientist_prompts["main"]["prompt_template"]
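The cell above reads a `[system]` table with a `prompt` key and a `[main]` table with a `prompt_template` key from each file. The sketch below shows a file with that layout and how `string.Template` could fill the template. The TOML text and the `$task` placeholder are invented for illustration; the real prompt files are not shown in this notebook. It uses the standard-library `tomllib` (Python 3.11+) so it runs without the `toml` package.

```python
from string import Template

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:  # older interpreters need: pip install tomli
    import tomli as tomllib

# Hypothetical file content with the same keys the notebook reads.
EXAMPLE_TOML = '''
[system]
prompt = "You are a specialist in FRED economic data."

[main]
prompt_template = "Write Python code for this task: $task"
'''

config = tomllib.loads(EXAMPLE_TOML)
system_prompt = config["system"]["prompt"]
prompt_template = Template(config["main"]["prompt_template"])

# substitute() raises KeyError if a placeholder is left unfilled.
user_prompt = prompt_template.substitute(task="download the CPI food index")

print(system_prompt)
print(user_prompt)
```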

In [17]:
### Without prompt templates

system_message = f"""You are an expert data scientist responsible for creating detailed data analysis plans.
First create a detailed plan.
Then distill it into an actionable plan consisting of very simple steps that leverages the following CHAINs:
1. {{"name": "fred_data_specialist", "description": "Identify and access economic datasets from the Federal Reserve Economic Database (FRED). Use this chain when you need to generate or edit code for accessing or downloading data from FRED."}}
2. {{"name": "data_scientist", "description": "Write Python code related to data analysis and visualization, especially related to time series. Use this when you want to write code that will involve data analysis, manipulation, or plotting."}}
3. {{"name":"code_execution_chain", "description": "Execute the script. Use this to run Python code."}}
4. {{"name": "direct_to_user", "description": "Use this to use the message part of your response to just send a message directly to the user."}}
    
Format your response precisely as {{name}};{{natural language message or instructions for the selected chain on a single line}}"""

main_prompt = "I want to identify which economic factors are the best predictors of food prices at an 18 month horizon. Can you help me with that?"

messages = [
    LLMMessage.system_message(system_message),
    LLMMessage.user_message(main_prompt),
]

response = llm.post_chat_request(messages).first_choice
print(response)

Detailed Plan:
1. Identify relevant economic datasets from the Federal Reserve Economic Database (FRED) that could potentially influence food prices. These could include datasets on inflation, unemployment, GDP, interest rates, etc.
2. Access and download these datasets using the FRED API.
3. Load the datasets into a Python environment for analysis.
4. Clean and preprocess the data, ensuring it is in a suitable format for analysis.
5. Perform exploratory data analysis (EDA) to understand the data and identify any patterns or trends.
6. Create a time series model to predict food prices at an 18-month horizon.
7. Evaluate the model's performance using appropriate metrics.
8. Identify the most influential economic factors based on the model's feature importance.
9. Visualize the results and findings.
10. Communicate the findings to the user.

Actionable Plan:
1. {fred_data_specialist};Identify and access relevant economic datasets from FRED that could potentially influence food prices. Th
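In this reply the model numbered the steps and kept the braces around the chain name, so the `name;message` layout cannot be split naively. One possible way to recover the steps is sketched below. The `parse_actionable_plan` helper, the regular expression, and the sample reply are assumptions made for illustration, not code from the agent.

```python
import re

# Chain names offered to the model in the system message above.
KNOWN_CHAINS = {
    "fred_data_specialist",
    "data_scientist",
    "code_execution_chain",
    "direct_to_user",
}

# Optional "1. " list prefix, optional braces around the name, then ";message".
STEP_PATTERN = re.compile(r"^\s*(?:\d+\.\s*)?\{?(\w+)\}?\s*;\s*(.+)$")


def parse_actionable_plan(reply: str) -> list[tuple[str, str]]:
    """Return (chain_name, instruction) pairs found after 'Actionable Plan:'."""
    _, marker, actionable = reply.partition("Actionable Plan:")
    if not marker:
        # No section header found: fall back to scanning the whole reply.
        actionable = reply
    steps = []
    for line in actionable.splitlines():
        match = STEP_PATTERN.match(line)
        # Ignore steps that name a chain the agent does not have.
        if match and match.group(1) in KNOWN_CHAINS:
            steps.append((match.group(1), match.group(2).strip()))
    return steps


# Made-up reply in the same two-part layout as the output above.
sample_reply = """Detailed Plan:
1. Identify relevant datasets.
2. Download them.

Actionable Plan:
1. {fred_data_specialist};Find FRED series related to food prices.
2. {data_scientist};Build an 18-month lagged dataset.
3. {unknown_chain};This step is ignored."""

for chain_name, instruction in parse_actionable_plan(sample_reply):
    print(chain_name, "->", instruction)
```

Validating the name against a known set guards against the model inventing a chain, at the cost of silently dropping such steps.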

In [18]:
### With prompt templates

system_message = f"""You are an expert data scientist responsible for creating detailed data analysis plans.
First create a detailed plan.
Then distill it into an actionable plan consisting of very simple steps that leverages the following CHAINs:
1. {{"name": "fred_data_specialist", "description": "Identify and access economic datasets from the Federal Reserve Economic Database (FRED). Use this chain when you need to generate or edit code for accessing or downloading data from FRED.", "prompt_template": {fred_system_message + fred_prompt_template}}}
2. {{"name": "data_scientist", "description": "Write Python code related to data analysis and visualization, especially related to time series. Use this when you want to write code that will involve data analysis, manipulation, or plotting.", "prompt_template": {data_scientist_system_message + data_scientist_prompt_template}}}
3. {{"name":"code_execution_chain", "description": "Execute the script. Use this to run Python code."}}
4. {{"name": "direct_to_user", "description": "Use this to use the message part of your response to just send a message directly to the user."}}
    
Format your response precisely as {{name}};{{natural language message or instructions for the selected chain on a single line}}"""

main_prompt = "I want to identify which economic factors are the best predictors of food prices at an 18 month horizon. Can you help me with that?"

messages = [
    LLMMessage.system_message(system_message),
    LLMMessage.user_message(main_prompt),
]

response = llm.post_chat_request(messages).first_choice
print(response)

Sure, I can help with that. Here's a detailed plan:

1. Identify relevant economic factors: We'll start by identifying a list of potential economic factors that could influence food prices. These could include factors like GDP, inflation, unemployment rate, oil prices, and agricultural commodity prices.

2. Access economic data: We'll use the FRED database to access historical data for these economic factors. We'll use the fredapi Python package to download this data.

3. Access food price data: We'll also need historical data on food prices. This could be a specific food price index or a broader measure of food prices. We'll also download this data from the FRED database.

4. Preprocess the data: We'll need to preprocess the data to ensure it's in a suitable format for analysis. This could involve cleaning the data, handling missing values, and aligning the data by date.

5. Create lagged variables: Since we're interested in predicting food prices at an 18 month horizon, we'll need to
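Interpolating raw prompt text into a hand-written JSON-like string, as in the cell above, leaves quotes and newlines in the prompts unescaped. A possible alternative is to build each chain entry with `json.dumps`, sketched below. The `build_system_message` helper and the shortened chain entries are hypothetical; only the chain names and the wording of the instructions come from the cells above.

```python
import json

# Shortened, illustrative chain entries; the second one carries a
# multi-line prompt with quotes to show the escaping.
chains = [
    {"name": "fred_data_specialist", "description": "Access datasets from FRED."},
    {
        "name": "data_scientist",
        "description": "Write Python code for data analysis.",
        "prompt_template": 'Line one\nsays "hello"',
    },
]


def build_system_message(chains: list[dict]) -> str:
    """Assemble the planner system message with one JSON object per chain."""
    lines = [
        "You are an expert data scientist responsible for creating detailed data analysis plans.",
        "First create a detailed plan.",
        "Then distill it into an actionable plan consisting of very simple steps "
        f"that leverages {len(chains)} CHAINs:",
    ]
    # json.dumps escapes quotes and newlines, so each chain stays on one line.
    lines += [f"{i}. {json.dumps(chain)}" for i, chain in enumerate(chains, start=1)]
    lines.append("")
    # Plain string, not an f-string, so the braces need no doubling.
    lines.append(
        "Format your response precisely as {name};{natural language message "
        "or instructions for the selected chain on a single line}"
    )
    return "\n".join(lines)


print(build_system_message(chains))
```

Because each entry is valid JSON, the stated chain count always matches the list, and the entries can be read back with `json.loads` when debugging the prompt.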