First, we create a new project and prepare the dataset in `.jsonl` format.
The required data files are already in place, so we can proceed directly.

In [None]:
from seed import *
CreateProject(name="amazon_google", workspace="../")
# Prepare the data as `.jsonl` files if they do not already exist
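
If the data files were not already in place, they could be prepared along the following lines. This is only a minimal sketch: the record layout (`entity1`, `entity2`, `is_same`) is an assumption inferred from the `inputs` and `outputs` defined later in this notebook, the two records are made up for illustration, and `write_jsonl` / `read_jsonl` are hypothetical helpers, not part of `seed`. The file is written to a temporary directory so that nothing in the project is overwritten.

```python
import json
import tempfile
from pathlib import Path

# Made-up records for illustration only. The field names mirror the
# `inputs`/`outputs` of the config; the real on-disk schema is an assumption.
records = [
    {
        "entity1": {"title": "photo editor 5.0", "manufacturer": "acme", "price": 29.99},
        "entity2": {"title": "acme photo editor v5", "manufacturer": "acme", "price": 27.5},
        "is_same": 1,
    },
    {
        "entity1": {"title": "photo editor 5.0", "manufacturer": "acme", "price": 29.99},
        "entity2": {"title": "tax planner deluxe", "manufacturer": "globex", "price": 49.0},
        "is_same": 0,
    },
]


def write_jsonl(path, rows):
    """Write one JSON object per line (hypothetical helper)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a `.jsonl` file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Round-trip through a temporary directory to check the format.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "data" / "example.jsonl"
    write_jsonl(demo_path, records)
    loaded = read_jsonl(demo_path)

print(len(loaded), "records round-tripped")
```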

After the project is created, a default `config.json` is placed in the project's root folder; we can modify it to fit our entity resolution task.
The most important parameters are `name`, `task_desc`, `inputs`, `outputs`, `evaluation_metric` and `evaluation_path`; they are application-specific and must be defined by the user.
For ease of running, we evaluate only on the `Amazon-Google_demo.jsonl` dataset.
Rather than tuning the remaining options, we directly pick a configuration that is reasonably well balanced.

In [None]:
config = LoadJson("config.json")
config = config | {
    "name": "entity_resolution",
    "task_desc": "Given two products, determine whether they are the same product.",
    "inputs": [
        {
            "name": "entity1",
            "type": "dict",
            "desc": "It contains three attributes: `title`, `manufacturer`, `price`. `title` and `manufacturer` are strings, `price` is float."
        },
        {
            "name": "entity2",
            "type": "dict",
            "desc": "Same as entity1."
        }
    ],
    "outputs": [
        {
            "name": "is_same",
            "type": "bool",
            "desc": "0 if the two products are not identical, 1 if the two products are identical.",
            "default": 0
        }
    ],
    "evaluation_metric": "f1",
    "evaluation_path": "./data/Amazon-Google_demo.jsonl",
    "examples_path": "./data/Amazon-Google_valid.jsonl",
    "labelled_path": "./data/Amazon-Google_train.jsonl",
    
    "activate_cache": True,
    "cache_frozen_ckpt": "./ckpts/amazon_google",
    "cache_confidence_ratio": 0.8,
    
    "activate_model": True,
    "model_type": "AutoModelForSequenceClassification",
    "model_initial_ckpt": "./ckpts/amazon_google",
    "model_confidence_ratio": 0.6,
    "model_confidence_default": 0.8,
    "model_sync_off": True,
    "model_sync_confi": 512,
    
    "activate_codeg": False,
    
    "activate_tools": False,
    
    "activate_llmqa": True,
    "examples_mode": "balanced",
}
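
One detail of the cell above is worth spelling out: the `|` operator on dictionaries (Python 3.9+) performs a shallow merge. Keys on the right-hand side replace those on the left wholesale, so a list such as `inputs` is overwritten rather than extended, while keys that are not overridden keep their default values. The sketch below uses a made-up `default_config` as a stand-in; the actual contents of the default `config.json` are not shown in this notebook.

```python
# Stand-in for the default config; keys and values are illustrative only.
default_config = {
    "name": "default",
    "inputs": [{"name": "x", "type": "str"}],
    "activate_cache": False,
}

overrides = {
    "name": "entity_resolution",
    "inputs": [
        {"name": "entity1", "type": "dict"},
        {"name": "entity2", "type": "dict"},
    ],
}

# Shallow merge: right-hand values win, and the left operand is not modified.
merged = default_config | overrides

print(merged["name"])            # taken from `overrides`
print(len(merged["inputs"]))     # the whole list was replaced, not appended to
print(merged["activate_cache"])  # untouched key keeps its default value
print(default_config["name"])    # the original dict is unchanged
```

On interpreters older than 3.9, `{**default_config, **overrides}` gives the same result.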

In this example we have labelled data, so we can fine-tune a model on it; the fine-tuned model can then be used by the `cache` and `model` agents.

In [None]:
! python train_model.py

Now we can compile the config to get a working solution.

In [None]:
SaveJson(config, "config.json")
CompileProject("./")
from evaluation import *
evaluate_entity_resolution()
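
For reference, the `f1` metric selected in the config is, in its standard binary form, the harmonic mean of precision and recall on the positive class (`is_same == 1`). The sketch below shows that standard definition on made-up labels; `f1_score` here is a hypothetical helper, and the generated `evaluate_entity_resolution` may compute or report the score differently.

```python
def f1_score(y_true, y_pred):
    """Binary F1 on the positive class (label 1); hypothetical helper."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

score = f1_score(y_true, y_pred)
print(f"F1 = {score:.3f}")
```

Entity resolution data typically contains far more non-matching pairs than matching ones, which is why F1 on the positive class is usually more informative than plain accuracy here.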

In [None]:
PrintJson(LoadJson("profile.json"))