Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion docs/autopdl.md
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,8 @@ An AutoPDL configuration file describes the state-space and parameters for the s
--8<-- "./examples/optimizer/grammar_correction.yaml"
```

Field `pdl_path` is the path to the PDL program to optimize. `dataset` points to the dataset to be used. In this case, it's an object with paths for train/validation/test splits. In general, `dataset` could be a string pointing to Huggingface dataset (that would then be automatically downloaded). `demonstrations_variable_name` gives the name of the PDL variable that will hold the demonstrations in the optimized program. `demonstration_columns` indicates the field names in the dataset that will be used to create demonstrations, and `instance_columns` are those fields that will be used to formulate an instance query (see the query in the PDL program above, which uses `input`). The `groundtruth_column` holds the field with the ground truth (in this case `output`). `eval_pdl` is the path of the PDL program that encapsulates the loss function.
Field `pdl_path` is the path to the PDL program to optimize. `dataset` points to the dataset to be used. In this case, it's an object with paths for train/validation/test splits.
`demonstrations_variable_name` gives the name of the PDL variable that will hold the demonstrations in the optimized program. `demonstration_columns` indicates the field names in the dataset that will be used to create demonstrations, and `instance_columns` are those fields that will be used to formulate an instance query (see the query in the PDL program above, which uses `input`). The `groundtruth_column` holds the field with the ground truth (in this case `output`). `eval_pdl` is the path of the PDL program that encapsulates the loss function.

`initial_validation_set_size` is the initial size of the validation set (i.e., the number of tests used initially to validate candidates). `max_validation_set_size` indicates the maximum to which this validation set will grow. For more details on the successive halving algorithm used in AutoPDL see [here](https://arxiv.org/abs/2504.04365). `max_test_set_size: 10` is the maximum of the test set used to evaluate at the end of the evaluation run. `num_candidates` indicates the number of candidates to consider (sampled randomly). `parallelism` indicates the level of parallelism used by the optimizer.

Expand Down