In [1]:
import os
import sys
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_ollama import ChatOllama
from pathlib import Path
module_path = os.path.abspath('..')  # parent directory, so the fedot_llm package is importable
if module_path not in sys.path:
    sys.path.append(module_path)
from fedot_llm.data.loaders import PathDatasetLoader
from fedot_llm.main import FedotAI

In [2]:
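def _set_env(var: str):
    # Only warns when a variable is missing; it does not set or prompt for it,
    # so a later os.environ[var] lookup will still raise KeyError.
    if not os.environ.get(var):
        print(f"No {var} in env")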

load_dotenv()
_set_env("LANGSMITH_API_KEY")
_set_env("OPENAI_TOKEN")
_set_env("VSEGPT_TOKEN")

In [3]:
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "FEDOT.LLM"

In [8]:
llm = ChatOpenAI(model='gpt-4o-mini', base_url='https://models.inference.ai.azure.com', api_key=os.environ['OPENAI_TOKEN'])
# llm = ChatOpenAI(model='gpt-4o', base_url='https://models.inference.ai.azure.com', api_key=os.environ['OPENAI_TOKEN'])
# llm = ChatOpenAI(model='openai/gpt-4o-mini', base_url='https://api.vsegpt.ru/v1/', api_key=os.environ['VSEGPT_TOKEN'])
# llm = ChatOpenAI(model='meta-llama/llama-3.1-70b-instruct', base_url='https://api.vsegpt.ru/v1/', api_key=os.environ['VSEGPT_TOKEN'])

msg="""Create a model that perform this task:
Our client is an insurance company that has provided health insurance to its customers.
They are interested in whether the policyholders (customers) from last year
will also be interested in the car insurance provided by the company."""

dataset_path = Path(module_path) / 'datasets' / 'Health_Insurance'
dataset = PathDatasetLoader.load(dataset_path)

In [5]:
from fedot_llm.output.jupyter import JupyterOutput
fedot_ai = FedotAI(dataset=dataset, 
                   model=llm,
                   handlers=JupyterOutput().subscribe)
async for _ in fedot_ai.ask("Fedot automl classification example?"):
    continue

================== HumanMessage ==================

Fedot automl classification example?

=================== Supervisor ===================

calls:
  ResearcherAgent:
    args:
      question: Fedot AutoML classification example


================ ResearcherAgent =================

To perform a classification task using FEDOT AutoML, you can follow these steps:

1. **Import the FEDOT API**:
   ```python
   from fedot.api.main import Fedot
   ```

2. **Load your data**:
   Load your training and test data from CSV files into Pandas dataframes:
   ```python
   import pandas as pd

   # pd.DataFrame('train.csv') does not read a file; pd.read_csv does.
   train = pd.read_csv('train.csv')
   test = pd.read_csv('test.csv')
   ```

3. **Initialize the FEDOT model**:
   Define the type of problem as `classification` and specify the evaluation metric, for example, `roc_auc`:
   ```python
   model = Fedot(problem='classification', metric='roc_auc')
   ```

4. **Fit the model**:
   Begin the optimization process and obtain the resulting composite pipeline:
   ```python
   best_pipeline = model.fit(features=train, target='target')
   ```

5. **Inspect the pipeline**:
   After fitting, you can examine the structure of the resulting pipeline:
   ```python
   best_pipeline.print_structure()
   ```

This process automates the design of a machine learning pipeline for classification tasks using FEDOT [\[1\]](https://fedot.readthedocs.io/en/latest/basics/tabular_data.html).

2024-09-19 17:50:35,092 - HTTP Request: POST https://models.inference.ai.azure.com/chat/completions "HTTP/1.1 200 OK"
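The `ask` call above is drained with a top-level `async for`, which Jupyter permits because its kernel already runs an event loop. In a plain Python script the same loop has to live inside a coroutine started with `asyncio.run`. A minimal sketch of that pattern follows; `fake_ask` is a hypothetical stand-in for `FedotAI.ask`, which is assumed (as the `async for` above implies) to return an async iterator of events.

```python
import asyncio
from typing import AsyncIterator


async def fake_ask(question: str) -> AsyncIterator[str]:
    # Hypothetical stand-in for FedotAI.ask: yields a few progress events.
    for step in ("Supervisor", "ResearcherAgent", f"answer to: {question}"):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield step


async def consume(question: str) -> list[str]:
    # Drain the async iterator, collecting events instead of discarding them.
    events = []
    async for event in fake_ask(question):
        events.append(event)
    return events


if __name__ == "__main__":
    # Outside Jupyter no loop is running, so start one explicitly.
    print(asyncio.run(consume("Fedot automl classification example?")))
```

Inside the notebook itself, `asyncio.run` would raise a `RuntimeError` because a loop is already running, which is why the cell uses `async for` directly.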


In [9]:
from fedot_llm.output.jupyter import JupyterOutput
fedot_ai = FedotAI(dataset=dataset, 
                   model=llm,
                   handlers=JupyterOutput().subscribe)
async for _ in fedot_ai.ask(msg):
    continue

================== HumanMessage ==================

Create a model that perform this task:
Our client is an insurance company that has provided health insurance to its customers.
They are interested in whether the policyholders (customers) from last year
will also be interested in the car insurance provided by the company.

=================== Supervisor ===================

calls:
  AutoMLAgent:
    args:
      dataset_description: The dataset consists of policyholder information from an
        insurance company, including features such as age, gender, previous health
        insurance policies, claim history, customer satisfaction ratings, and whether
        they expressed interest in car insurance. The task is to predict if these
        policyholders will be interested in purchasing car insurance based on their
        profiles.


================== AutoMLAgent ===================

Here is the pipeline of the model I built:

# Model Pipeline
The pipeline consists of two stages:
1. **Scaling**: This stage prepares the data by standardizing or normalizing it, ensuring that all features contribute equally to the model's performance.
2. **Random Forest (RF)**: The model is a Random Forest classifier configured to use 12 jobs for parallel processing, enhancing computational efficiency.
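A minimal pure-Python sketch of z-score standardization, one common form of the scaling stage described above (the exact scaler FEDOT used is not shown in this output and is not assumed here). It also checks a property relevant to the second stage: a threshold split of the kind decision trees use partitions the samples identically before and after scaling, because standardization is strictly increasing when the standard deviation is nonzero. The `ages` values are illustrative, not taken from the dataset.

```python
from statistics import mean, pstdev


def standardize(values: list[float]) -> list[float]:
    # z-score: subtract the mean, divide by the population standard deviation.
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def split_left(values: list[float], threshold: float) -> set[int]:
    # Indices that a tree node with the rule "x <= threshold" sends left.
    return {i for i, v in enumerate(values) if v <= threshold}


ages = [23.0, 35.0, 47.0, 52.0, 61.0]  # illustrative values only
scaled = standardize(ages)

# Mapping the threshold through the same transform yields the same partition,
# so a Random Forest's splits are essentially unchanged by this scaling.
mu, sigma = mean(ages), pstdev(ages)
threshold = 40.0
assert split_left(ages, threshold) == split_left(scaled, (threshold - mu) / sigma)
print([round(s, 3) for s in scaled])
```

Scaling matters more for distance- or gradient-based models; for tree ensembles it is mostly a no-op.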

# Model Metrics:
| Metric   | Value  |
|----------|--------|
| ROC AUC  | 0.844  |
| Accuracy | 0.871  |

These metrics indicate that the model performs well. The **ROC AUC** score of 0.844 suggests a strong ability to distinguish between positive and negative classes, where a score of 1 indicates perfect classification. The **accuracy** of 0.871 indicates that 87.1% of the predictions made by the model are correct, reflecting a high level of overall correctness in the predictions.
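Both metrics in the table can be computed from first principles. A minimal pure-Python sketch follows, assuming binary labels (1 = interested in car insurance) and predicted probabilities for the positive class; the toy numbers are illustrative and are not the model's actual predictions, and the 0.5 decision threshold is an assumption. ROC AUC is computed through its probabilistic interpretation: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half.

```python
from itertools import product


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    # Share of predictions that match the true label.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    # Probability that a random positive is scored above a random negative
    # (ties count 0.5). O(P*N) pairs: fine for illustration, not for big data.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred))  # 5 of 6 labels correct
print(roc_auc(y_true, scores))   # 8 of 9 pairs ranked correctly
```

Accuracy depends on the chosen threshold while ROC AUC does not, so the two numbers answer different questions; on an imbalanced target, accuracy is best read against the share of the majority class.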