# SKILL Benchmarking — Solution Generation
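This notebook batch-generates solution steps for a set of tasks, using the ReAct agent implemented in the `skill_test.skill_bench` module.
The input tasks come from `skill_tasks_and_reference_solutions.json`, the auxiliary data catalog snapshot is stored in `data_catalogs_snapshot.json`,
and all underlying source data lives in `data/Data` at the repository root.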

## 1. Import dependencies

In [1]:
from pathlib import Path
import json
import sys
import os
from tqdm import tqdm
# Ensure repository root is on sys.path so 'skill_test' package can be imported
PROJECT_ROOT_CANDIDATES = [Path.cwd().resolve(), *Path.cwd().resolve().parents]
for candidate in PROJECT_ROOT_CANDIDATES:
    if (candidate / 'skill_test').exists():
        if str(candidate) not in sys.path:
            sys.path.append(str(candidate))
        break

from skill_test.skill_bench.dataclasses import TaskSet
from skill_test.skill_bench.generate_solutions import generate_solutions
from skill_test.skill_bench.constants import (
    DATA_FOLDER,
    TASKS_FILENAME,
    GENERATED_SOLUTIONS_FILENAME,
    RESULTS_FOLDER,
    DEFAULT_MODEL,
    DEFAULT_TEMPERATURE,
    MODEL_REGISTRY,
)
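# Also put the parent of the working directory on sys.path. This runs after the
# skill_test imports above, so it only affects imports made in later cells.
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
if project_root not in sys.path:
    sys.path.insert(0, project_root)
print(f"Project root added to path: {project_root}")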

Project root added to path: D:\code\GeoBenchX\GeoBenchX\skill_test
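
The cell above walks up from the working directory until it finds a folder containing `skill_test`. A minimal sketch of the same idea as reusable helpers is shown here; the names `find_project_root` and `ensure_on_sys_path` are hypothetical and are not part of `skill_bench`.

```python
from pathlib import Path
from typing import Optional
import sys


def find_project_root(start: Path, marker: str = "skill_test") -> Optional[Path]:
    """Return the nearest directory (start or an ancestor) that contains `marker`."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    return None  # no ancestor contains the marker


def ensure_on_sys_path(path: Path) -> bool:
    """Append `path` to sys.path if it is absent; return True when it was added."""
    entry = str(path)
    if entry in sys.path:
        return False
    sys.path.append(entry)
    return True
```

Calling `find_project_root(Path.cwd())` before the `skill_test` imports, and stopping with a clear error when it returns `None`, would make a wrong working directory easier to diagnose than a bare `ModuleNotFoundError`.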


## 2. Load the task set and data catalog

In [2]:
tasks_path = DATA_FOLDER / TASKS_FILENAME
tasks = TaskSet.read_from_file(TASKS_FILENAME, folder=DATA_FOLDER)
print(f"Loaded {len(tasks)} tasks from {tasks_path}")

catalog_path = DATA_FOLDER / 'data_catalogs_snapshot.json'
catalog = json.loads(catalog_path.read_text(encoding='utf-8'))
print(f"Loaded dataset catalog snapshot with {len(catalog)} entries from {catalog_path}")


Loaded 3 tasks from D:\code\GeoBenchX\GeoBenchX\skill_test\dataset\skill_tasks_and_reference_solutions.json
Loaded dataset catalog snapshot with 2 entries from D:\code\GeoBenchX\GeoBenchX\skill_test\dataset\data_catalogs_snapshot.json


## 3. Configure solution parameters

In [3]:
model = DEFAULT_MODEL
temperature = DEFAULT_TEMPERATURE
max_steps = 15
capture_history = True
output_filename = GENERATED_SOLUTIONS_FILENAME

print(f"Model: {model}")
print(f"Temperature: {temperature}")
print(f"Max steps: {max_steps}")
print(f"Capture history: {capture_history}")
print(f"Output file: {output_filename}")


Model: gpt-4o
Temperature: 0.0
Max steps: 15
Capture history: True
Output file: skill_generated_solutions.json
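
The settings above are plain module-level variables. As a sketch only, they could be bundled into a small validated container so that a bad value fails before any tokens are spent. `RunConfig` is a hypothetical helper, not something `skill_bench` provides, and the validation rules are assumptions about what counts as a sensible value.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for the run settings; not part of skill_bench."""
    model: str
    temperature: float
    max_steps: int
    capture_history: bool
    output_filename: str

    def __post_init__(self) -> None:
        # Assumed sanity checks; adjust to whatever the agent actually accepts.
        if self.max_steps < 1:
            raise ValueError("max_steps must be at least 1")
        if self.temperature < 0:
            raise ValueError("temperature must be non-negative")
        if not self.output_filename.endswith(".json"):
            raise ValueError("output_filename is expected to be a .json file")


# Values copied from the cell output above.
config = RunConfig(
    model="gpt-4o",
    temperature=0.0,
    max_steps=15,
    capture_history=True,
    output_filename="skill_generated_solutions.json",
)
for key, value in asdict(config).items():
    print(f"{key}: {value}")
```

Because the field names match the keyword arguments used in this notebook, `asdict(config)` could be unpacked directly into the `generate_solutions` call.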


## 4. Run the agent to generate solution steps
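
For readers new to the pattern: a ReAct-style agent alternates between choosing a tool call and observing its result, and a step limit such as `max_steps` bounds that loop. The toy sketch below illustrates the control flow only. It involves no LLM, `run_react_loop`, `toy_policy`, and `toy_tools` are hypothetical names, and it should not be read as how `skill_bench` implements its agent.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# (tool_name, tool_args) for an action, or None when the policy is finished.
Action = Optional[Tuple[str, Dict[str, Any]]]


def run_react_loop(
    policy: Callable[[List[str]], Action],
    tools: Dict[str, Callable[..., str]],
    max_steps: int,
) -> List[str]:
    """Alternate policy decisions and tool calls until done or max_steps is reached."""
    history: List[str] = []
    for step in range(1, max_steps + 1):
        action = policy(history)
        if action is None:  # the policy signals that it has a final answer
            break
        name, args = action
        observation = tools[name](**args)  # act, then record the observation
        history.append(f"Step {step}: {name} -> {args} | {observation}")
    return history


# Toy policy: call one tool, then stop.
def toy_policy(history: List[str]) -> Action:
    if not history:
        return ("load_geodata", {"geodataset": "Chengdu Surface"})
    return None


toy_tools = {"load_geodata": lambda geodataset: f"loaded {geodataset}"}
trace = run_react_loop(toy_policy, toy_tools, max_steps=15)
print(trace)
```

With a policy that never returns `None`, the loop stops after exactly `max_steps` tool calls, which is the safeguard a step cap is meant to provide.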

In [4]:
tasks, total_input, total_output, run_dir = generate_solutions(
    tasks,
    model=model,
    temperature=temperature,
    max_steps=max_steps,
    capture_history=capture_history,
    output_filename=output_filename,
    results_folder=RESULTS_FOLDER,
)
print("Generation completed.")
print(f"Total input tokens: {total_input}")
print(f"Total output tokens: {total_output}")
print(f"Results directory: {run_dir}")


Generating SKILL solutions:   0%|                                      | 0/3 [00:00<?, ?it/s]


=== Task SKILL_TASK_001 ===
Mark the centroid of Chengdu surface.
  Step 1: load_geodata -> {'geodataset': 'Chengdu Surface', 'output_geodataframe_name': 'chengdu_surface_gdf'}
  Step 2: get_centroids -> {'geodataframe_name': 'chengdu_surface_gdf', 'output_geodataframe_name': 'chengdu_surface_centroid', 'title': 'Centroid of Chengdu Surface', 'basemap_style': 'Carto Positron'}
  Final message: The centroid of the Chengdu Surface has been marked. You can find the shapefile and map TIFF in the following locations:

- Shapefile: `D:/code/GeoBenchX/GeoBenchX/skill_test/scratch/chengdu_surface_centroid.shp`
- Map TIFF: `D:/code/GeoBenchX/GeoBenchX/skill_test/scratch/chengdu_surface_centroid_map.tif`

Please note that there was an issue with adding the basemap due to a connection timeout.
  Tokens - input: 7178, output: 191
  HTML summary saved to: task_SKILL_TASK_001.html


Generating SKILL solutions:  33%|██████████                    | 1/3 [01:02<02:05, 62.93s/it]


=== Task SKILL_TASK_002 ===
Create a 1,000-meter dissolved buffer for Wuhan urban parks and calculate its total area.
  Step 1: load_geodata -> {'geodataset': 'park_in_wuhan.shp', 'output_geodataframe_name': 'wuhan_parks'}
  Step 2: create_dissolved_buffer -> {'geodataframe_name': 'wuhan_parks', 'buffer_size_meters': 1000, 'output_geodataframe_name': 'wuhan_parks_buffer', 'dissolve_by_attribute': None, 'output_file_path': None, 'overwrite_existing': True, 'basemap_style': 'Carto Positron', 'plot_title': 'Wuhan Urban Parks 1,000-meter Buffer'}
  Step 3: calculate_polygon_areas -> {'geodataframe_name': 'wuhan_parks_buffer', 'output_variable_name': 'wuhan_parks_buffer_area'}
  Final message: I have created a 1,000-meter dissolved buffer around the Wuhan urban parks and calculated its total area. The total area of the buffer is approximately 693.07 square kilometers.
  Tokens - input: 7686, output: 207
  HTML summary saved to: task_SKILL_TASK_002.html


Generating SKILL solutions:  67%|████████████████████          | 2/3 [02:01<01:00, 60.56s/it]


=== Task SKILL_TASK_003 ===
Use the DEM data of Guangming District and describe how high the highest point of this raster data is.
  Step 1: get_raster_path -> {'rasterdataset': 'Guangming District DEM'}
  Step 2: get_raster_path -> {'rasterdataset': 'Guangming District DEM (Shenzhen)'}
  Step 3: get_raster_description -> {'raster_path': 'D:/code/GeoBenchX/GeoBenchX/data/Data/GeoData/guangming.tif'}
  Final message: The highest point in the Guangming District DEM (Shenzhen) raster data is 345.28 meters.
  Tokens - input: 9216, output: 110
  HTML summary saved to: task_SKILL_TASK_003.html


Generating SKILL solutions: 100%|██████████████████████████████| 3/3 [02:39<00:00, 53.16s/it]

Generation completed.
Total input tokens: 24080
Total output tokens: 508
Results directory: D:\code\GeoBenchX\GeoBenchX\skill_test\results\2026-02-03_08-03_gpt-4o_temp0.0
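
The reported totals can be cross-checked against the per-task `Tokens - input: ..., output: ...` lines in the log above. The following sketch assumes that exact line format; `sum_token_usage` is a hypothetical helper written for this check.

```python
import re
from typing import Tuple

# Matches the per-task usage lines printed during generation.
TOKEN_LINE = re.compile(r"Tokens - input: (\d+), output: (\d+)")


def sum_token_usage(log_text: str) -> Tuple[int, int]:
    """Sum every 'Tokens - input: N, output: M' line found in log_text."""
    total_in = total_out = 0
    for match in TOKEN_LINE.finditer(log_text):
        total_in += int(match.group(1))
        total_out += int(match.group(2))
    return total_in, total_out


# The three usage lines copied from the run above.
log_text = """
  Tokens - input: 7178, output: 191
  Tokens - input: 7686, output: 207
  Tokens - input: 9216, output: 110
"""
print(sum_token_usage(log_text))  # (24080, 508)
```

The result matches the `Total input tokens` and `Total output tokens` lines printed by the cell.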

## 5. Review the results summary

Note: the output recorded for this cell comes from an earlier run (`2026-02-02_07-35`, 2 tasks, 16690 input tokens), not from the run in section 4 (`2026-02-03_08-03`, 3 tasks, 24080 input tokens). Re-run the cell after generation to refresh it.

In [5]:
results_path = run_dir / output_filename
print(f"Results saved to: {results_path}")
print(f"Metadata: {tasks.metadata}")
solved = sum(1 for t in tasks if t.generated_solution is not None)
print(f"Tasks with generated solutions: {solved}/{len(tasks)}")


Results saved to: D:\code\GeoBenchX\GeoBenchX\skill_test\results\2026-02-02_07-35_gpt-4o_temp0.0\skill_generated_solutions.json
Metadata: {'author': 'Codex-SKILL', 'notes': 'Skill-oriented tasks leveraging GeoSpatialProcessingSkill wrappers.', 'model': 'gpt-4o', 'temperature': 0.0, 'generated_at': '2026-02-02T07:35:53.455873Z', 'total_input_tokens_for_generation': 16690, 'total_output_tokens_for_generation': 318}
Tasks with generated solutions: 2/2
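
The results directories printed in this notebook look like `2026-02-03_08-03_gpt-4o_temp0.0`, which suggests a `<date>_<time>_<model>_temp<temperature>` naming pattern. The sketch below reproduces and parses that pattern so a results path can be matched to the run that produced it. The pattern is inferred from the printed paths only, and both function names are hypothetical.

```python
from datetime import datetime


def build_run_dir_name(model: str, temperature: float, when: datetime) -> str:
    """Build a name shaped like the run directories seen in the outputs."""
    return f"{when:%Y-%m-%d_%H-%M}_{model}_temp{temperature}"


def parse_run_timestamp(run_dir_name: str) -> datetime:
    """Recover the timestamp from the first two underscore-separated fields."""
    date_part, time_part, _rest = run_dir_name.split("_", 2)
    return datetime.strptime(f"{date_part}_{time_part}", "%Y-%m-%d_%H-%M")


print(build_run_dir_name("gpt-4o", 0.0, datetime(2026, 2, 3, 8, 3)))
# 2026-02-03_08-03_gpt-4o_temp0.0
print(parse_run_timestamp("2026-02-02_07-35_gpt-4o_temp0.0"))
# 2026-02-02 07:35:00
```

Comparing the parsed timestamp of `run_dir.name` with the one embedded in a printed results path is a quick way to spot output left over from a previous run.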
