In [1]:
import asyncio
import os
import sys
from pathlib import Path

from dotenv import load_dotenv
from langchain_ollama import ChatOllama
from langchain_openai import ChatOpenAI

# Make the repository root (the parent of the notebook's directory) importable
module_path = os.path.abspath('..')
if module_path not in sys.path:
    sys.path.append(module_path)

from fedot_llm.data.loaders import PathDatasetLoader
from fedot_llm.main import FedotAI

In [3]:
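def _set_env(var: str):
    """Warn if an expected environment variable is missing (does not set it)."""
    if not os.environ.get(var):
        print(f"No {var} in env")

# Load API keys from a local .env file, then report any that are missing
load_dotenv()
_set_env("LANGSMITH_API_KEY")
_set_env("OPENAI_TOKEN")
_set_env("VSEGPT_TOKEN")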
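`_set_env` only prints a warning; it never sets the variable, so a missing key surfaces later as a failed API call. A minimal sketch of a fail-fast alternative that collects every missing name and raises once is shown below. The helper `require_env` is hypothetical and not part of `fedot_llm`.

```python
import os


def require_env(*names: str) -> None:
    """Raise one error listing every required variable that is unset or empty.

    Hypothetical helper for illustration; not part of fedot_llm.
    """
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")


# Usage: call after load_dotenv(), listing only the keys the active model needs
# require_env("LANGSMITH_API_KEY", "VSEGPT_TOKEN")
```

Raising once with the full list avoids fixing one missing key, rerunning, and discovering the next.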

In [4]:
# Enable LangSmith tracing and group the runs under one project name
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "AutoML-LLM"

In [5]:
# Active model: gpt-4o-mini through the VseGPT OpenAI-compatible endpoint.
# The commented-out lines are alternatives using a different OpenAI-compatible endpoint.
# llm = ChatOpenAI(model='gpt-4o-mini', base_url='https://models.inference.ai.azure.com', api_key=os.environ['OPENAI_TOKEN'])
# llm = ChatOpenAI(model='gpt-4o', base_url='https://models.inference.ai.azure.com', api_key=os.environ['OPENAI_TOKEN'])
llm = ChatOpenAI(model='openai/gpt-4o-mini', base_url='https://api.vsegpt.ru/v1/', api_key=os.environ['VSEGPT_TOKEN'])

# Task description passed to the agents
msg = """Create a model that perform this task:
Our client is an insurance company that has provided health insurance to its customers.
They are interested in whether the policyholders (customers) from last year
will also be interested in the car insurance provided by the company."""

# Load the dataset from <repo root>/datasets/Health_Insurance
dataset_path = Path(module_path) / 'datasets' / 'Health_Insurance'
dataset = PathDatasetLoader.load(dataset_path)

In [6]:
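from fedot_llm.output.jupyter import JupyterOutput

# Build the agent system over the loaded dataset; JupyterOutput prints agent messages in the cell output
fedot_ai = FedotAI(dataset=dataset,
                   model=llm,
                   handlers=JupyterOutput().subscribe)
# ask() is awaitable; top-level await works because Jupyter runs cells inside an event loop
await fedot_ai.ask("Fedot automl classification example?")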

================== HumanMessage ==================

Fedot automl classification example?

=================== Supervisor ===================

Calling ResearcherAgent for help. With args: {'question': 'Can you provide an example of a classification task using the Fedot AutoML framework?'}

================ ResearcherAgent =================

To perform a classification task using the Fedot AutoML framework, you can follow these steps:

1. **Import the necessary libraries**:
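   ```python
   from fedot import Fedot
   from fedot.core.data.data import InputData
   from fedot.core.data.data_split import train_test_data_setup
   from fedot.core.repository.tasks import Task, TaskTypesEnum
   ```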

2. **Load your data** and split it into training and testing sets. You can create an `InputData` object from a DataFrame:
   ```python
   data = InputData.from_dataframe(features_df, target_df, task=Task(TaskTypesEnum.classification))
   train, test = train_test_data_setup(data)
   ```

3. **Initialize the Fedot model** for classification, specifying the metric and timeout:
   ```python
   model = Fedot(problem='classification', metric='roc_auc', timeout=5)
   ```

4. **Fit the model** to your training data:
   ```python
   best_pipeline = model.fit(features=train, target='target')
   ```

This example demonstrates how to set up a classification task using Fedot, including data preparation and model initialization. You can find more details in the [FEDOT documentation](https://fedot.readthedocs.io/en/latest/examples/classification_example.html) [\[1\]](https://fedot.readthedocs.io/en/latest/examples/classification_example.html).

2024-09-13 15:20:22,680 - HTTP Request: POST https://api.vsegpt.ru/v1/chat/completions "HTTP/1.1 200 OK"
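
The cell above awaits `fedot_ai.ask(...)` at the top level, which works only because Jupyter already runs an event loop. In a plain `.py` script a top-level `await` is a syntax error, so the call would be wrapped in a coroutine and driven with `asyncio.run`. The sketch below substitutes a hypothetical stand-in coroutine, `fake_ask`, for `fedot_ai.ask` so it runs without API keys; with the real object one would build `FedotAI` inside `main` and await its `ask` there.

```python
import asyncio


async def fake_ask(question: str) -> str:
    """Hypothetical stand-in for `await fedot_ai.ask(question)`."""
    await asyncio.sleep(0)  # yield to the event loop, as a real network call would
    return f"answer to: {question}"


async def main() -> str:
    # In a script: create FedotAI here and `return await fedot_ai.ask(...)` instead
    return await fake_ask("Fedot automl classification example?")


if __name__ == "__main__":
    # asyncio.run creates the event loop that Jupyter would otherwise provide
    print(asyncio.run(main()))
```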


In [7]:
from fedot_llm.output.jupyter import JupyterOutput

# Fresh agent instance for the insurance task defined in `msg`
fedot_ai = FedotAI(dataset=dataset,
                   model=llm,
                   handlers=JupyterOutput().subscribe)
await fedot_ai.ask(msg)

================== HumanMessage ==================

Create a model that perform this task:
Our client is an insurance company that has provided health insurance to its customers.
They are interested in whether the policyholders (customers) from last year
will also be interested in the car insurance provided by the company.

=================== Supervisor ===================

Calling AutoMLAgent for help. With args: {'dataset_description': 'The dataset includes information about policyholders who purchased health insurance last year. Features may include demographics (age, gender, income), health insurance details (policy type, duration, claim history), and any previous interest expressed in car insurance. The task is to predict whether these policyholders will be interested in purchasing car insurance this year.'}

================== AutoMLAgent ===================

Here is the pipeline of the model I built:

# Model Pipeline
The pipeline consists of two stages:
1. **Scaling**: This stage prepares the data for the model by transforming it, although specific scaling techniques are not defined in the configuration.
2. **Random Forest (rf)**: This is the model stage, which utilizes a Random Forest algorithm configured to use 12 jobs for parallel processing to enhance performance.
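
The pipeline above chains a `scaling` node into an `rf` node. As an assumption not stated in the output, the sketch below treats `scaling` as z-score standardization (to our understanding, FEDOT's `scaling` operation wraps scikit-learn's `StandardScaler`): each column's mean and population standard deviation are fitted on the training rows only and then reused to transform later data. The helper names `fit_scaler` and `transform` are hypothetical.

```python
from statistics import fmean, pstdev


def fit_scaler(rows: list[list[float]]) -> list[tuple[float, float]]:
    """Return (mean, std) for each column, computed on training rows only."""
    stats = []
    for col in zip(*rows):
        std = pstdev(col)  # population std (ddof=0), as StandardScaler uses
        # Constant columns get std 1.0 so transform never divides by zero
        stats.append((fmean(col), std if std > 0 else 1.0))
    return stats


def transform(rows: list[list[float]], stats: list[tuple[float, float]]) -> list[list[float]]:
    """Apply z = (x - mean) / std column-wise using previously fitted stats."""
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


# Hypothetical toy training data: an "age"-like column and a constant column
train = [[20.0, 1.0], [40.0, 1.0], [60.0, 1.0]]
stats = fit_scaler(train)
scaled = transform(train, stats)
```

Fitting the statistics once and reusing them for validation or test rows keeps information from those rows out of the model. Separately, assuming `rf` wraps scikit-learn's `RandomForestClassifier`, its `n_jobs` setting only controls how many CPU cores are used, so the 12 jobs speed up training rather than improve predictive quality.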

# Model Metrics:
| Metric    | Value  |
|-----------|--------|
| ROC AUC   | 0.844  |
| Accuracy  | 0.871  |

These metrics indicate that the model performs well in distinguishing between the classes. 

- **ROC AUC (Receiver Operating Characteristic Area Under the Curve)**: A value of 0.844 suggests that the model has a good ability to discriminate between positive and negative classes. The closer the value is to 1, the better the model is at making correct predictions.
  
- **Accuracy**: An accuracy of 0.871 indicates that the model correctly classified approximately 87.1% of the instances in the dataset. This reflects a high level of overall correctness in the model's predictions.
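
To make the two reported metrics concrete, here is a pure-Python sketch on hypothetical toy labels and scores (not this notebook's data). Accuracy counts thresholded predictions that match the label; ROC AUC is the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting one half. The sketch also computes the majority-class baseline: on an imbalanced target, accuracy is only informative relative to the accuracy of always predicting the most frequent class, and the output above does not report the class balance, so whether 0.871 clears that baseline cannot be judged from it. The 0.5 threshold is an assumption.

```python
from itertools import product


def accuracy(y_true, y_score, threshold=0.5):
    """Share of samples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, y_score):
    """P(score of a random positive > score of a random negative); ties count 0.5.

    Pairwise O(n_pos * n_neg) version, fine for illustration only.
    """
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def majority_baseline(y_true):
    """Accuracy of always predicting the most frequent class."""
    share_pos = sum(y_true) / len(y_true)
    return max(share_pos, 1 - share_pos)


# Hypothetical toy data: 8 negatives, 2 positives (imbalanced)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.6, 0.2, 0.7, 0.35]

print(accuracy(y_true, y_score))   # 0.8
print(majority_baseline(y_true))   # 0.8
print(roc_auc(y_true, y_score))    # 0.875
```

On this toy data the classifier's accuracy merely equals the always-negative baseline even though its ROC AUC is 0.875, which is why the two numbers in the table are best read alongside the class balance.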