**Packages**

In [25]:
import pandas as pd
import re
import numpy as np
from openai import OpenAI
import os
from dotenv import load_dotenv

from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from langchain.chains import LLMChain


**Helper imports**

In [26]:
from helpers.models import adjusted_r2, run_xgboost_cv
from helpers.data_extraction import extract_section_7, load_automobile_df
from helpers.llm_helpers import get_schema_text_for_llm, make_summary
from helpers.general_helpers import get_next_run_idx, get_interesting_idxs

Parameters: API key and base URL

In [27]:
load_dotenv()
api_key = os.getenv("key")
base_url = os.getenv("api")
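
`os.getenv` returns `None` when a variable is missing, and that only surfaces later when the value is assigned to `os.environ`. A minimal sketch of an earlier check, assuming the same `.env` variable names as above; the helper name `require_env` is hypothetical and not part of the notebook's helpers:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; add it to your .env file."
        )
    return value


# Hypothetical usage mirroring the cell above:
# api_key = require_env("key")
# base_url = require_env("api")
```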

Get the dataset

In [28]:
data_path = "automobile/imports-85.data"
names_path = "automobile/imports-85.names"
df, informations = load_automobile_df(data_path, names_path)

In [29]:
door_map = {
    'four': 4, 
    'two': 2
}

cyl_map = {
    'eight': 8,
    'five': 5,
    'four': 4,
    'six': 6,
    'three': 3,
    'twelve': 12,
    'two': 2
}

# Convert to integer columns. The nullable Int64 dtype is used because one of the variables has missing values.
df['num-of-doors'] = df['num-of-doors'].map(door_map).astype('Int64')
df['num-of-cylinders'] = df['num-of-cylinders'].map(cyl_map).astype('Int64')

df['curb-weight'] = df['curb-weight'].astype('float64')
df['engine-size'] = df['engine-size'].astype('float64')
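
As a small illustration of why the nullable `Int64` dtype is needed: `Series.map` with a dictionary turns every value that is not a key (including missing entries) into NaN, and a plain `int64` column cannot hold NaN. The toy series below is made up for the example and is not taken from the dataset.

```python
import pandas as pd

door_map = {"four": 4, "two": 2}

# Toy data (not from the dataset): one entry is missing.
doors = pd.Series(["four", "two", None, "four"])

mapped = doors.map(door_map)          # unmapped or missing entries become NaN
as_nullable = mapped.astype("Int64")  # nullable integer dtype keeps the gap as <NA>

n_missing = int(as_nullable.isna().sum())
print(as_nullable.dtype, n_missing)
```

Casting with `astype('int64')` instead would raise an error on the missing entry, which is the situation the comment in the cell above refers to.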

In [30]:
df.head()

Unnamed: 0,symboling,normalized-losses,make,fuel-type,aspiration,num-of-doors,body-style,drive-wheels,engine-location,wheel-base,...,engine-size,fuel-system,bore,stroke,compression-ratio,horsepower,peak-rpm,city-mpg,highway-mpg,price
0,3,,alfa-romero,gas,std,2,convertible,rwd,front,88.6,...,130.0,mpfi,3.47,2.68,9.0,111.0,5000.0,21,27,13495.0
1,3,,alfa-romero,gas,std,2,convertible,rwd,front,88.6,...,130.0,mpfi,3.47,2.68,9.0,111.0,5000.0,21,27,16500.0
2,1,,alfa-romero,gas,std,2,hatchback,rwd,front,94.5,...,152.0,mpfi,2.68,3.47,9.0,154.0,5000.0,19,26,16500.0
3,2,164.0,audi,gas,std,4,sedan,fwd,front,99.8,...,109.0,mpfi,3.19,3.4,10.0,102.0,5500.0,24,30,13950.0
4,2,164.0,audi,gas,std,4,sedan,4wd,front,99.4,...,136.0,mpfi,3.19,3.4,8.0,115.0,5500.0,18,22,17450.0


***THREE agent chain***

**1 - Analyst Agent**

**2 - Feature Engineering Agent**

**3 - Decision Support Agent**

After that, we apply the feature engineering, and we use the same Explanation Agent
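
The data flow between the three agents can be sketched without any LLM dependency. This is only an illustration of the hand-off order, not the notebook's implementation: `run_three_agent_chain` and `echo_llm` are hypothetical names, and the "model" is a toy function that reports which role was asked.

```python
from typing import Callable, Dict

# `LLM` is a stand-in type: any callable that maps a prompt string to a reply string.
LLM = Callable[[str], str]


def run_three_agent_chain(llm: LLM, schema_summary: str) -> Dict[str, str]:
    """Pass each agent's output to the next one, in the order listed above."""
    analyst = llm(f"[analyst]\n{schema_summary}")
    features = llm(f"[feature-engineer]\n{analyst}")
    # The decision agent sees both earlier outputs.
    decision = llm(f"[decision-support]\n{analyst}\n\n{features}")
    return {"analyst": analyst, "feature": features, "decision": decision}


def echo_llm(prompt: str) -> str:
    """Toy model for demonstration: reports the role tag and the prompt size."""
    role = prompt.split("]", 1)[0].lstrip("[")
    return f"<{role} reply to {len(prompt)} chars>"


outputs = run_three_agent_chain(echo_llm, "columns: price, horsepower")
print(outputs["decision"])
```

Keeping the chain as a plain function of an `llm` callable makes it easy to test the wiring with a stub before spending API calls.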

In [31]:
os.environ["OPENAI_API_KEY"] = api_key
os.environ["OPENAI_API_BASE"] = base_url

In [32]:
chat = ChatOpenAI(model_name="deepseek/deepseek-r1-0528:free", 
                  temperature=0.2)

eda_prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(
        """You are an automotive data analyst agent.

            You will receive a summary of an automobile dataset, including:
            - A list of all columns, with their data types, attribute ranges or value examples, and missing value counts
            - Sample records
            - Key statistics (e.g. mean, std, min, max, skew, etc.) for numerical and categorical variables
            - Optional EDA findings (distribution shape, outlier notes, trends, correlations)

            Your responsibilities are:
            1. For each variable/column, explain what it likely measures, its units or categories, and its practical meaning in the context of vehicle data. Be explicit and do not group variables—analyze each one individually.
            2. Detect possible semantic relationships, physical or business meanings, and likely associations with other columns (e.g. which pairs have strong correlations, which affect price/risk, etc).
            3. Call out notable trends, strong correlations, anomalies, and outliers. Reference variable names and statistics explicitly.
            4. Suggest ways the dataset could be enriched: for each, mention relevant external datasets, additional columns, or real-world data sources (e.g. safety ratings, CO2 emissions, vehicle class, market info) that could be joined or mapped by column name.

            Format your answer as:
            - A numbered list where each variable is explained with name, meaning, type, and any key distribution or outlier notes.
            - A section on key trends, correlations and anomalies
            - A section with enrichment recommendations, referencing specific variable names

            Please, focus on schema interpretation (variable meanings), EDA, and actionable suggestions for deeper analysis.
        """
    ),
    HumanMessagePromptTemplate.from_template("{schema_summary}")
])

eda_chain = LLMChain(llm=chat, prompt=eda_prompt)

feat_eng_prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(
        """
        You are an automotive feature engineering agent.

        Your input is the complete output of a Data Analyst Agent, including:
        - The dataset schema with column names, attribute ranges, data types, and missing value counts.
        - Sample records, statistics, and EDA findings.
        - Explicit interpretations of each variable and summary of trends/outliers from data analysis.

        Your tasks are:
        1. Propose new features for predictive modeling and interpretability by combining, transforming, or enriching the columns provided, or by suggesting realistic external enrichment.
        2. For each suggestion, specify:
            - Feature Name
            - Formula – Show how it is calculated (including mathematical formula or logic or code-like description referencing column names and attribute ranges)
            - Justification – Business or automotive domain reason for its value (e.g., why it matters for price, efficiency, safety, etc.)
            - Reasoning Chain – Step-by-step logic or EDA findings that motivate this feature (why it could improve prediction, segmentation, or insight)

        Guidelines:
            - Use only the columns in the schema/EDA unless you explicitly recommend a realistic external enrichment (note the external source in justification).
            - Cite variable names and attribute ranges where helpful.
            - Suggest both classic (e.g., ratios, interactions, indicators) and creative features, but keep formulas practical—no fictitious data or overly complex recipes.

        Output all suggestions as a markdown table:
        | Feature Name | Formula        | Justification | Reasoning Chain |
        |--------------|----------------|---------------|-----------------|

        Then, in a short section below, summarize the main rationale for your key suggestions (e.g., which features link to performance, efficiency, market segmentation, or domain knowledge).
        """
    ),
    HumanMessagePromptTemplate.from_template("{analyst_output}")
])

feat_eng_chain = LLMChain(llm=chat, prompt=feat_eng_prompt)

decision_prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(
        """
        You are a decision support AI advisor for an automotive executive team.

        You will be given:
        - Data Analyst Agent output (variable schema, findings, key trends)
        - Feature Engineering Agent output (proposed features, formulas, justification)

        Your tasks are:
        - Come up with 4 to 6 clear, business-focused recommendations. These could be about pricing, product features, or ideas for improving how the company collects or uses data.
        - In each recommendation, point to specific variables or engineered features—and the findings that back up your advice.
        - Tell us which of the newly created features (the ones suggested by feature engineering agent) would be the most valuable or strategic to start using, and explain why.
        - If you spot any features that seem redundant or not particularly useful, mention them briefly.
        - In addition, if you find any business risks or opportunities not currently covered by the features or data, mention them as potential priorities for further analysis or data collection.
        - Write without using too many technical terms. Focus on advice that leadership can actually use.
        """
    ),
    HumanMessagePromptTemplate.from_template("{analyst_output}\n\n{feature_output}")
])

decision_chain = LLMChain(llm=chat, prompt=decision_prompt)

summary = make_summary(df, informations)

eda_out = eda_chain.run(schema_summary=summary) 
print("\n=== Analyst Agent Output ===\n", eda_out[:1200], "...")

features_out = feat_eng_chain.run(analyst_output=eda_out) 
print("\n=== Feature Engineering Agent Output ===\n", features_out[:1200], "...")

decision_out = decision_chain.run(analyst_output=eda_out, feature_output=features_out)
print("\n=== Decision Support Agent Output ===\n", decision_out[:1200], "...\n")


=== Analyst Agent Output ===
 ### 1. Variable Explanations

1. **symboling** (int64)  
   - Meaning: Insurance risk rating (-3 = lowest risk, 3 = highest risk). Reflects vehicle safety/claim likelihood.  
   - Type: Ordinal integer.  
   - Distribution: Mean 0.83, std 1.25. Slight right skew (0.21). Most vehicles cluster at 0–2 (neutral risk).  

2. **normalized-losses** (float64)  
   - Meaning: Relative average loss payment (insurance claims) scaled to 100. Higher = worse safety/theft record.  
   - Type: Continuous numerical.  
   - Missing: 41 (20%). Range 65–256.  
   - Distribution: Mean 122, std 35.4. Right-skewed (0.77). Mode=161 suggests common high-loss vehicles.  

3. **make** (object)  
   - Meaning: Vehicle manufacturer (e.g., Toyota, BMW).  
   - Type: Nominal categorical.  
   - Distribution: Toyota most frequent (32/205), Nissan second (18).  

4. **fuel-type** (object)  
   - Meaning: Engine fuel type (diesel/gas).  
   - Type: Binary categorical.  
   - Distribution:

In [33]:
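def ensure_dir(path):
    # Create the directory (and any parents) if it does not exist yet.
    os.makedirs(path, exist_ok=True)


def get_next_chain_run_idx(folder):
    # Make sure the folder exists so os.listdir does not fail on the first run.
    ensure_dir(folder)
    existing = [fname for fname in os.listdir(folder) if fname.endswith(".txt")]
    idxs = []
    for fname in existing:
        try:
            idxs.append(int(fname[:3]))
        except ValueError:
            # Skip files that do not start with a three-digit run index.
            pass
    next_idx = max(idxs) + 1 if idxs else 1
    return f"{next_idx:03d}"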

def save_agent_output(agent_type, output, idx, folder="outputs/3_agents_chain/"):
    os.makedirs(folder, exist_ok=True)
    fname = os.path.join(folder, f"{idx}_{agent_type}.txt")
    with open(fname, "w", encoding="utf-8") as f:
        f.write(output)
    return fname

folder = "outputs/3_agents_chain/"
run_idx = get_next_chain_run_idx(folder)  # Call ONCE per workflow run

analyst_outfile = save_agent_output("analyst", eda_out, run_idx, folder)
feature_outfile = save_agent_output("feature", features_out, run_idx, folder)
decision_outfile = save_agent_output("decision", decision_out, run_idx, folder)

print("Saved:", analyst_outfile, feature_outfile, decision_outfile)

Saved: outputs/3_agents_chain/009_analyst.txt outputs/3_agents_chain/009_feature.txt outputs/3_agents_chain/009_decision.txt


In [34]:
print(features_out)

### Feature Engineering Suggestions

| Feature Name             | Formula                                                                 | Justification                                                                 | Reasoning Chain                                                                                                                                                               |
|--------------------------|-------------------------------------------------------------------------|-------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Power-to-Weight Ratio** | `horsepower / curb_weight`                                              | Critical performance metric: Higher ratios indicate better acceleration and handling, directly impacting desirability and price.             

In [35]:
print(decision_out)

### Business Recommendations for Automotive Leadership

#### 1. **Target Performance Enthusiasts with Turbocharged RWD Models**  
   - **Action**: Prioritize RWD + turbocharged vehicle development (e.g., sports sedans/SUVs) and market them as premium offerings.  
   - **Why**: The *Premium Drivetrain Flag* (RWD + turbo) correlates with 47% higher horsepower and premium pricing. EDA confirms RWD vehicles dominate high-price segments.  
   - **Data Backing**: Strong link between `drive-wheels=rwd`, `aspiration=turbo`, and high `price`/`horsepower`. Only 15% of current models have this combo—a whitespace opportunity.  

#### 2. **Optimize City-Focused Vehicles for Fuel Efficiency**  
   - **Action**: Improve city MPG in compact vehicles using lightweight materials and hybrid tech, especially for high-sales brands like Toyota.  
   - **Why**: The *Fuel Economy Gap* (highway MPG – city MPG) reveals most vehicles sacrifice city efficiency. EDA shows `city-mpg` drops sharply with higher `curb

**Training XGBoost with all original features, and then with the new features suggested by the multi-agent system. I will only use the ones recommended by the decision support agent.**

In [36]:
df.columns

Index(['symboling', 'normalized-losses', 'make', 'fuel-type', 'aspiration',
       'num-of-doors', 'body-style', 'drive-wheels', 'engine-location',
       'wheel-base', 'length', 'width', 'height', 'curb-weight', 'engine-type',
       'num-of-cylinders', 'engine-size', 'fuel-system', 'bore', 'stroke',
       'compression-ratio', 'horsepower', 'peak-rpm', 'city-mpg',
       'highway-mpg', 'price'],
      dtype='object')

In [37]:
features = [col for col in df.columns if col != 'price']
xgboost_results_original = run_xgboost_cv(df, features, False)

Fitting 5 folds for each of 96 candidates, totalling 480 fits
Best parameters: {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 400, 'subsample': 1}
Best CV RMSE: 2616.435608016184
Test RMSE: 2724.65
Test MAE: 1810.04
Test R²: 0.905
Adjusted R²: 0.748

Top Feature Importances:
make           0.280320
engine-size    0.167140
engine-type    0.133444
curb-weight    0.121030
highway-mpg    0.055936
fuel-system    0.036141
horsepower     0.031452
city-mpg       0.028791
width          0.024451
body-style     0.023890
dtype: float32


Feature engineering agent:

| Feature Name                 | Formula                                                                 | Justification                                                                 | Reasoning Chain                                                                                                                                 |
|------------------------------|-------------------------------------------------------------------------|-------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------|
| **Power-to-Weight Ratio**    | `horsepower / curb-weight`                                             | Measures performance efficiency. Higher ratios indicate better acceleration and driving dynamics, directly impacting consumer appeal.        | EDA shows strong correlations between horsepower, weight, and price. Outliers (e.g., 288 HP sports cars) highlight performance extremes.       |
| **Fuel Efficiency Gap**      | `highway-mpg - city-mpg`                                               | Quantifies drivetrain optimization. Larger gaps indicate highway-optimized designs (e.g., high-geared transmissions).                        | Highway MPG consistently exceeds city MPG. Diesel vehicles show distinct efficiency patterns; this feature captures engineering trade-offs.     |
| **Engine Specific Output**   | `horsepower / engine-size`                                              | Reflects engine technological advancement. Higher values = more power per unit displacement (e.g., turbocharging benefits).                  | Outliers in engine-size (326cc) and horsepower (288 HP) suggest performance engineering extremes. Turbocharged engines may boost this metric. |
| **Size Compactness Index**   | `curb-weight / (length * width * height)`                              | Indicates structural density. Higher values suggest robust/safety-focused designs; lower values imply spacious interiors.                    | EDA reveals correlations between size (length/width) and weight. Outliers in wheel-base (120.9") may skew this metric.                        |
| **Diesel Efficiency Factor** | `compression-ratio * (1 if fuel-type=='diesel' else 0)`                | Captures diesel engines' thermodynamic advantage. High compression ratios (∼23) improve efficiency but may increase manufacturing costs.    | Diesel engines show 2x higher compression ratios than gasoline. This isolates their unique efficiency profile.                                |
| **Turbo Performance Flag**   | `1 if aspiration=='turbo' else 0`                                      | Identifies forced-induction engines. Turbocharging boosts power but may increase risk (symboling) and complexity.                            | Only 18% of vehicles are turbocharged. EDA links turbo aspiration to high-risk (symboling) and premium pricing.                               |
| **Luxury Brand Indicator**   | `External: Binary flag (e.g., 1=BMW/Mercedes/Jaguar, 0=others) via make` | Segments premium manufacturers. Luxury brands command price premiums and exhibit distinct engineering (e.g., RWD bias, larger engines).      | EDA identifies make as a key price driver. External brand-tier data contextualizes pricing outliers (>$45k).                                  |
| **Safety Rating Score**      | `External: Numeric safety score (e.g., IIHS) mapped to make/body-style` | Validates insurance risk (symboling). High safety scores reduce claim likelihood and may lower ownership costs.                              | Symboling correlates with performance extremes. External safety data explains anomalies (e.g., convertible vs. sedan risk differences).        |
| **Vehicle Footprint**        | `wheel-base * track-width` (using `width` as proxy)                    | Regulatory metric for efficiency classes. Larger footprints improve stability but reduce maneuverability.                                    | EPA uses footprint for emissions standards. EDA shows wheel-base outliers (120.9") – this feature aids efficiency segmentation.              |
| **Cylinder Efficiency**      | `horsepower / num-of-cylinders`                                        | Measures per-cylinder effectiveness. High values indicate advanced combustion tech (e.g., DOHC), affecting performance and cost.             | 12-cylinder engines show extreme power but diminishing returns. This metric highlights engineering efficiency across cylinder counts.         |


Decision support agent said:

Strategic Engineered Features  
- **Most Valuable**: **Power-to-Weight Ratio** and **Luxury Brand Indicator**.  
  - **Why**: They directly link performance and brand prestige to pricing strategy—critical for targeting high-margin segments (e.g., sports/luxury cars).  
- **Less Useful**: **Cylinder Efficiency** (horsepower/num-of-cylinders).  
  - **Why**: Redundant with `engine-size` and `horsepower`; adds minimal new insight. 

In [38]:
df_feat = df.copy()

In [39]:
# Power-to-Weight Ratio
df_feat['power-to-weight-ratio'] = df_feat['horsepower'] / df_feat['curb-weight']

# Luxury Brand Indicator
luxury_brands = ['bmw', 'mercedes-benz', 'jaguar', 'audi', 'porsche', 'volvo']
df_feat['luxury-brand-indicator'] = df_feat['make'].isin(luxury_brands).astype(int)

In [40]:
features = [col for col in df_feat.columns if col != 'price']
xgboost_results_llm = run_xgboost_cv(df_feat, features, False)

Fitting 5 folds for each of 96 candidates, totalling 480 fits
Best parameters: {'learning_rate': 0.05, 'max_depth': 3, 'n_estimators': 400, 'subsample': 0.8}
Best CV RMSE: 2602.221461098943
Test RMSE: 2856.32
Test MAE: 1854.09
Test R²: 0.896
Adjusted R²: 0.680

Top Feature Importances:
engine-size         0.264905
highway-mpg         0.136300
num-of-cylinders    0.119709
make                0.118348
engine-type         0.075887
curb-weight         0.073893
width               0.038131
drive-wheels        0.029586
fuel-system         0.025755
horsepower          0.023456
dtype: float32


The decision support agent recommended two variables, and neither appears among the top ten feature importances of the retrained XGBoost model. The cross-validated RMSE improved only marginally (2616.44 to 2602.22), while every test metric got worse: test RMSE rose from 2724.65 to 2856.32, test R² fell from 0.905 to 0.896, and adjusted R² fell from 0.748 to 0.680. This suggests that the recommended variables add noise or complexity rather than improving predictive accuracy.
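
Part of the change in adjusted R² between the two runs (0.748 versus 0.680) is mechanical. Assuming `adjusted_r2` from `helpers.models` (whose source is not shown here) follows the standard definition 1 - (1 - R²)(n - 1)/(n - p - 1), with p counted as the number of input columns, the penalty grows quickly when p gets close to the number of test rows n. The sketch below uses n = 41, which is an assumption chosen only because it is consistent with the reported figures; the test-set size is not printed anywhere above.

```python
def adjusted_r2_sketch(r2: float, n_samples: int, n_features: int) -> float:
    """Standard adjusted R²: penalises R² for the number of predictors used."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Reported test R² values; n_samples=41 is an assumption, not taken from the notebook.
baseline = adjusted_r2_sketch(0.905, n_samples=41, n_features=25)
with_llm_features = adjusted_r2_sketch(0.896, n_samples=41, n_features=27)

# Same R² as the baseline, but with the two extra columns: isolates the penalty.
penalty_only = adjusted_r2_sketch(0.905, n_samples=41, n_features=27)

print(f"{baseline:.3f} {with_llm_features:.3f} {penalty_only:.3f}")
```

Under these assumptions the first two values land close to the reported 0.748 and 0.680, and the third shows that adding two columns lowers adjusted R² to roughly 0.71 even if test R² had not moved at all.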

When I used only the feature engineering agent, the results were somewhat better, suggesting that a focused approach to feature selection may be more effective in this context. It's also worth noting that there is some randomness in the system's outputs—each time I run the multi-agent workflow, I may get slightly different suggestions and results. This variability is likely due to the stochastic nature of large language models and agentic pipelines.
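
Because the suggestions change from run to run, a single comparison of test RMSE values is hard to interpret. One way to make the comparison more robust, sketched here with the standard library only, is to repeat the workflow several times and report the mean and spread of the test RMSE. The function name `summarize_runs` is hypothetical, and the numbers are placeholders, not results from this notebook.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def summarize_runs(test_rmses: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of test RMSE across repeated runs."""
    if len(test_rmses) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return mean(test_rmses), stdev(test_rmses)


# Placeholder numbers for illustration only; they are not results from this notebook.
toy_rmses = [100.0, 110.0, 90.0]
avg, spread = summarize_runs(toy_rmses)
print(f"mean={avg:.1f} std={spread:.1f}")
```

If the difference between two configurations is smaller than the run-to-run spread, it is probably not meaningful.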

**Explanation agent**

In [41]:
def explain_prediction_llm(row, y_true, y_pred, importances, rmse, r2, cv_rmse, client,
                           ml_model="XGBoost", model="deepseek/deepseek-r1-0528:free",
                           txt_filename="llm_explanation.txt", temperature=0.2
):
    
    top_features = importances.head(5).index.tolist()
    feature_vals = "\n".join([f"- **{feat}**: {row[feat]}" for feat in top_features])
    prompt = f"""
    You are an AI assistant for explaining car price predictions to non-technical stakeholders.

    A machine learning model ({ml_model}) was trained to predict vehicle prices based on technical and categorical features.
    Here are its key performance metrics:
    - Test RMSE: ${rmse:.2f} (the typical error in predicted prices)
    - Test R²: {r2:.3f} (explanatory power on unseen data; a unitless score, not a dollar amount)
    - CV RMSE: ${cv_rmse:.2f} (cross-validated training error)

    Below is one car's prediction:
    - Actual price: ${y_true:.2f}
    - Predicted price: ${y_pred:.2f}

    Key features for this car:  
    {feature_vals}

    Top overall features driving model predictions:  
    {top_features}

    Your task (for non-technical stakeholders):

    - Write a clear, plain-language explanation of why the model predicted this car’s price as it did.  
        - Reference the most important features and their values for this specific car.
        - Highlight which features make it expensive or cheap.
        - Mention if the prediction is likely to be especially reliable or uncertain (for example, maybe the car has rare features or brand).
    - Briefly explain what the overall performance metrics (RMSE, R²) mean for buyers, sellers, or decision makers.

    Don’t use technical terms unless they are explained above. Always relate your explanation to real-world or business ideas.
    """
    messages = [
        {"role": "system", "content": "You are a helpful AI analytics explainer."},
        {"role": "user", "content": prompt}
    ]
    response = client.chat.completions.create(model=model, messages=messages, temperature=temperature)

    output = response.choices[0].message.content
    with open(txt_filename, "w", encoding="utf-8") as f:
        f.write(output)
    return output

In [42]:
y_test = xgboost_results_llm['y_test']
y_pred = xgboost_results_llm['y_pred']
importances = xgboost_results_llm['importances']
X_test = xgboost_results_llm['X_test']
cv_rmse = xgboost_results_llm['cv_rmse']
rmse = xgboost_results_llm['rmse']
r2 = xgboost_results_llm['r2']

In [43]:
interesting_idxs = get_interesting_idxs(y_test, y_pred)

In [44]:
def explain_interesting_cases(
    interesting_idxs, X_test, y_test, y_pred,
    importances, rmse, r2, cv_rmse, client, ml_model, next_idx,
    out_dir="outputs/explanation_agent",
    temperature=0.2
):
    """
    Run through all interesting indices and call explain_prediction_llm for each case, using a given next_idx.

    Inputs:

    interesting_idxs : list of int
        Indices to explain in the test set.
    X_test : pandas DataFrame
        Test set features.
    y_test : pandas Series or array
        True test values.
    y_pred : array
        Corresponding model predictions.
    importances : any
        Feature importance data to pass to explainer.
    rmse, r2, cv_rmse : float
        Metrics for the explainer.
    client : object
        LLM client object.
    ml_model : str
        Model name string.
    next_idx : str
        The zero-padded index string for this batch of explanations.
    out_dir : str
        Directory to save explanations.
    temperature : float
        LLM temperature setting.

    Output:
    
    outputs : dict
        Dictionary mapping idx to explanation string.
    """
    os.makedirs(out_dir, exist_ok=True)  # make sure the output directory exists before writing
    outputs = {}
    for idx in interesting_idxs:
        row = X_test.iloc[idx]
        y_true = y_test.iloc[idx]
        y_hat = y_pred[idx]
        txt_filename = os.path.join(out_dir, f"{next_idx}_three_agent_chain_explanation_{idx}.txt")
        output = explain_prediction_llm(
            row=row,
            y_true=y_true,
            y_pred=y_hat,
            importances=importances,
            rmse=rmse,
            r2=r2,
            cv_rmse=cv_rmse,
            client=client,
            ml_model=ml_model,
            txt_filename=txt_filename,
            temperature=temperature
        )
        outputs[idx] = output
        print(f"\n=== Explanation for test row {idx} ===\n")
        print(output)
    return outputs

In [45]:
out_dir = "outputs/explanation_agent"
next_idx = get_next_run_idx(out_dir, agent_name="three_agent_chain_explanation")
ml_model = "XGBoost"

In [46]:
client = OpenAI(
    base_url=base_url,
    api_key=api_key
)

In [47]:
explain_interesting_cases(interesting_idxs, X_test, y_test, y_pred, importances, rmse, r2, cv_rmse, client, ml_model, next_idx)


=== Explanation for test row 6 ===

### Explanation for the Specific Car Prediction  
**Actual Price**: $7,788.00  
**Predicted Price**: $7,915.62 (only **$127.62 higher** than actual)  

The model predicted this car's price very accurately because its key features align with typical patterns learned from historical data. Here’s why:  

1. **Key features influencing this prediction**:  
   - **Engine size (110.0)**: Smaller engines usually cost less. Yours is relatively small, pulling the price down.  
   - **Highway MPG (47)**: High fuel efficiency (47 MPG) suggests lower operating costs, often linked to more affordable cars.  
   - **Cylinders (4)**: 4-cylinder engines are common in budget-friendly vehicles, reducing the price.  
   - **Make (Toyota)**: Toyota is a reliable brand with moderate pricing (not luxury, not budget).  
   - **Engine type (OHC)**: Overhead cam engines are efficient but standard, not premium.  

**Why it’s affordable**:  
- The combination of a small engine 


***FINAL REMARKS***

I decided to keep the Explanation Agent separate from the main agent workflow because it relies on model outputs, such as predictions and feature importances, which only become available after the earlier steps: data cleaning, feature engineering, and model training (here, with XGBoost). So, my core process is:

Data Analyst Agent -> Feature Engineering Agent -> Decision Support Agent

These three agents work together to prepare the data and guide the overall modeling process. 
The Decision Support Agent pulls insights from both the Data Analyst Agent and the Feature Engineering Agent to help set modeling priorities and give business-oriented recommendations.

Once the model is trained, the Explanation Agent comes into play. Its job is to turn the raw model results into clear, natural-language explanations for stakeholders who may not have a technical background.

Looking ahead, I would like to add another agent that can generate and, if appropriate, execute the code for creating new features. 

However, any automatic code execution needs to be handled with care, since mistakes or unexpected code could cause errors or affect the reliability of the workflow.
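
One cautious pattern for such a code-generating agent, sketched under assumptions: instead of executing arbitrary generated Python, accept only arithmetic expressions over known column names and check them with the standard-library `ast` module first. The function name `is_safe_formula` and the underscore-style column names are hypothetical; the dataset's hyphenated names such as `curb-weight` are not valid Python identifiers and would need renaming before a check like this could be applied.

```python
import ast

# Node types permitted in a generated feature formula: arithmetic only.
_ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Name, ast.Load, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub,
)


def is_safe_formula(formula: str, allowed_columns: set) -> bool:
    """Return True if `formula` is plain arithmetic over whitelisted column names."""
    try:
        tree = ast.parse(formula, mode="eval")
    except (SyntaxError, ValueError):
        return False
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED_NODES):
            return False  # rejects calls, attribute access, subscripts, etc.
        if isinstance(node, ast.Name) and node.id not in allowed_columns:
            return False  # unknown name: not one of the whitelisted columns
        if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            return False  # only numeric literals are accepted
    return True


columns = {"horsepower", "curb_weight", "highway_mpg", "city_mpg"}
ok = is_safe_formula("horsepower / curb_weight", columns)
blocked = is_safe_formula("__import__('os').system('ls')", columns)
print(ok, blocked)
```

A whitelist like this only narrows what a formula may contain; it does not check that the resulting feature is sensible, so a human review step would still be advisable.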