From fe50e7a91c7842872e9d23cb4ad41c53d8b8be5e Mon Sep 17 00:00:00 2001
From: Mandana Vaziri
Date: Mon, 8 Sep 2025 10:29:05 -0400
Subject: [PATCH] change the autopdl tutorial

Signed-off-by: Mandana Vaziri
---
 docs/autopdl.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/autopdl.md b/docs/autopdl.md
index 52d95a612..d8faf6e18 100644
--- a/docs/autopdl.md
+++ b/docs/autopdl.md
@@ -50,7 +50,8 @@ An AutoPDL configuration file describes the state-space and parameters for the s
 --8<-- "./examples/optimizer/grammar_correction.yaml"
 ```
-Field `pdl_path` is the path to the PDL program to optimize. `dataset` points to the dataset to be used. In this case, it's an object with paths for train/validation/test splits. In general, `dataset` could be a string pointing to Huggingface dataset (that would then be automatically downloaded). `demonstrations_variable_name` gives the name of the PDL variable that will hold the demonstrations in the optimized program. `demonstration_columns` indicates the field names in the dataset that will be used to create demonstrations, and `instance_columns` are those fields that will be used to formulate an instance query (see the query in the PDL program above, which uses `input`). The `groundtruth_column` holds the field with the ground truth (in this case `output`). `eval_pdl` is the path of the PDL program that encapsulates the loss function.
+Field `pdl_path` is the path to the PDL program to optimize. `dataset` points to the dataset to be used. In this case, it's an object with paths for train/validation/test splits.
+`demonstrations_variable_name` gives the name of the PDL variable that will hold the demonstrations in the optimized program. `demonstration_columns` indicates the field names in the dataset that will be used to create demonstrations, and `instance_columns` are those fields that will be used to formulate an instance query (see the query in the PDL program above, which uses `input`). The `groundtruth_column` holds the field with the ground truth (in this case `output`). `eval_pdl` is the path of the PDL program that encapsulates the loss function.
 `initial_validation_set_size` is the initial size of the validation set (i.e., the number of tests used initially to validate candidates). `max_validation_set_size` indicates the maximum to which this validation set will grow. For more details on the successive halving algorithm used in AutoPDL see [here](https://arxiv.org/abs/2504.04365). `max_test_set_size: 10` is the maximum of the test set used to evaluate at the end of the evaluation run. `num_candidates` indicates the number of candidates to consider (sampled randomly). `parallelism` indicates the level of parallelism used by the optimizer.