First, we create a new project and prepare the dataset in `.jsonl` format.
The required data files are already in place, so we can proceed directly.
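
As a rough illustration of the `.jsonl` layout (one JSON object per line), the sketch below writes and reads back two made-up records. The field names `sentence`, `term1`, `term2`, and `relation` mirror the `inputs` and `outputs` declared in the config later in this notebook; that the shipped data files use exactly these keys is an assumption, and the helpers `write_jsonl` and `read_jsonl` are hypothetical, not part of `seed`.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical example records (not taken from the PubMed data files).
# Assumption: keys match the `inputs`/`outputs` names used in the config.
records = [
    {"sentence": "Aspirin is commonly used to relieve headache.",
     "term1": "Aspirin", "term2": "headache", "relation": "treats"},
    {"sentence": "The patient reported fatigue and was given water.",
     "term1": "water", "term2": "fatigue", "relation": "no_relation"},
]

def write_jsonl(path, rows):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

def read_jsonl(path):
    """Read a .jsonl file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Use a temporary directory so the sketch does not touch the project data.
demo_path = Path(tempfile.mkdtemp()) / "demo.jsonl"
write_jsonl(demo_path, records)
loaded = read_jsonl(demo_path)
```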

In [None]:
from seed import *
CreateProject(name="pubmed", workspace="../")
# Prepare the data as `.jsonl` files if they do not already exist

After creating the project, a default `config.json` is placed in the project's root folder. We can modify it to fit our medical relation extraction task.
The most important parameters are `name`, `task_desc`, `inputs`, `outputs`, `evaluation_metric`, and `evaluation_path`; these are application-specific and must be defined by the user.
Here we directly use a reasonably well-balanced config.
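
The next cell updates the loaded config with the dict union operator `|`, which requires Python 3.9 or later. A small sketch of its behaviour on toy dicts: keys on the right-hand side win, and the merge is shallow, so a nested dict such as `model_args` is replaced as a whole rather than merged key by key. The `dropout` key is hypothetical and only there to show the shallow replacement.

```python
# Toy stand-ins for the default config and the user overrides.
defaults = {
    "name": "default",
    "activate_llmqa": True,
    "model_args": {"num_labels": 2, "dropout": 0.1},  # `dropout` is a made-up key
}
overrides = {
    "name": "medical_relation",
    "model_args": {"num_labels": 8},
}

# `|` builds a new dict; on duplicate keys the right operand wins.
merged = defaults | overrides

# Equivalent spelling that also works on Python versions before 3.9.
merged_legacy = {**defaults, **overrides}
```

Because the merge is shallow, any default nested under `model_args` that should survive has to be repeated in the override.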

In [None]:
config = LoadJson("config.json")
config = config | {
    "name": "medical_relation",
    "task_desc": "Given a medical sentence and two medical entities, determine whether a medical relation exists between the two entities.",
    "inputs": [
        {
            "name": "sentence",
            "type": "str",
            "desc": "The medical sentence."
        },
        {
            "name": "term1",
            "type": "str",
            "desc": "The subject of relation (if exists)."
        },
        {
            "name": "term2",
            "type": "str",
            "desc": "The object of relation (if exists)."
        }
    ],
    "outputs": [
        {
            "name": "relation",
            "type": "str",
            "desc": "The relation is one of 'treats', 'causes', 'contraindicates', 'is diagnosed by', 'diagnose_by_test_or_drug', 'location', 'is location of', and 'no_relation'.",
            "default": "no_relation",
            "verbalizer": [
                "treats",
                "causes",
                "contraindicates",
                "is diagnosed by",
                "diagnose_by_test_or_drug",
                "location",
                "is location of",
                "no_relation"
            ],
            "deverbalizer": {
                "treats": 0,
                "causes": 1,
                "contraindicates": 2,
                "is diagnosed by": 3,
                "diagnose_by_test_or_drug": 4,
                "location": 5,
                "is location of": 6,
                "no_relation": 7
            }
        }
    ],
    "evaluation_metric": "obj_accuracy",
    "evaluation_path": "./data/PubMed.jsonl",
    "examples_path": "./data/PubMed_valid.jsonl",
    "labelled_path": "./data/PubMed_train.jsonl",
    
    "activate_cache": True,
    "cache_frozen_ckpt": "./ckpts/pubmed",
    "cache_confidence_ratio": 0.8,
    
    "activate_model": True,
    "model_type": "AutoModelForSequenceClassification",
    "model_initial_ckpt": "./ckpts/pubmed",
    "model_confidence_ratio": 0.5,
    "model_confidence_default": 1.0,
    "model_sync_off": True,
    "model_args": {
        "num_labels": 8
    },
    
    "activate_codeg": False,
    
    "activate_tools": False,
    
    "activate_llmqa": True,
    "examples_mode": "balanced",
    "examples_count": 8,
}
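
In the config above, the `deverbalizer` assigns each label its position in the `verbalizer` list, and `model_args["num_labels"]` equals the number of labels. A minimal sketch of a sanity check for that consistency follows; the helper `check_label_config` is hypothetical and not part of `seed`, and whether the framework requires this exact alignment is an assumption.

```python
# Label set copied from the config above.
verbalizer = [
    "treats", "causes", "contraindicates", "is diagnosed by",
    "diagnose_by_test_or_drug", "location", "is location of", "no_relation",
]
# Rebuild the label-to-index mapping from the list order.
deverbalizer = {label: index for index, label in enumerate(verbalizer)}
num_labels = 8

def check_label_config(verbalizer, deverbalizer, num_labels):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    if len(set(verbalizer)) != len(verbalizer):
        problems.append("verbalizer contains duplicate labels")
    if set(deverbalizer) != set(verbalizer):
        problems.append("deverbalizer keys differ from verbalizer labels")
    for index, label in enumerate(verbalizer):
        if deverbalizer.get(label) != index:
            problems.append(f"label {label!r} is not mapped to index {index}")
    if num_labels != len(verbalizer):
        problems.append("num_labels does not match the number of labels")
    return problems

problems = check_label_config(verbalizer, deverbalizer, num_labels)
```

Running a check like this before fine-tuning can catch a label that was added to the description but not to the mapping.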

In this example we have labelled data, so we can fine-tune a model on it; the fine-tuned model can then be used by the `cache` and `model` agents.

In [None]:
! python train_model.py

Now we can compile the config directly to get a working solution.

In [None]:
SaveJson(config, "config.json")
CompileProject("./")
from evaluation import *
evaluate_medical_relation()

In [None]:
PrintJson(LoadJson("profile.json"))