diff --git a/.gitignore b/.gitignore
index 29771e25e..541d7bd93 100644
--- a/.gitignore
+++ b/.gitignore
@@ -154,6 +154,9 @@ pdl-live/package-lock.json
 # Demo files
 pdl-rag-demo.db
 test.jsonl
+train.jsonl
+validation.jsonl
+experiments/
 
 # Built docs
 _site
diff --git a/README.md b/README.md
index bb0bc3c4a..d6f19467f 100644
--- a/README.md
+++ b/README.md
@@ -31,6 +31,12 @@ To install the `pdl` command line tool:
 pip install prompt-declaration-language
 ```
 
+## What's New
+
+Check out AutoPDL, PDL's prompt optimizer ([Spiess et al., 2025](https://openreview.net/forum?id=CAeISyE3aR))! AutoPDL can be used to optimize any part of a PDL program. This includes few-shot examples and textual prompts, but also entire prompting patterns. It outputs an optimized PDL program with the optimal values found.
+
+For a tutorial on how to use AutoPDL, see [AutoPDL](https://ibm.github.io/prompt-declaration-language/autopdl/).
+
 ## Example Program: A Basic LLM Call
 
 PDL GUI
diff --git a/docs/autopdl.md b/docs/autopdl.md
index 3019044a4..18cf3cc19 100644
--- a/docs/autopdl.md
+++ b/docs/autopdl.md
@@ -7,7 +7,11 @@ hide:
 
 # AutoPDL Tutorial
 
-The following sections show how to use the AutoPDL optimizer introduced by [Spiess et al. (2025)](https://openreview.net/forum?id=CAeISyE3aR) in "AutoPDL: Automatic Prompt Optimization for LLM Agents" ([arXiv](https://arxiv.org/abs/2504.04365)), to produce optimized PDL programs for specific tasks. Please ensure PDL was installed with extras e.g.
+This tutorial describes how to use AutoPDL, PDL's prompt optimizer ([Spiess et al., 2025](https://openreview.net/forum?id=CAeISyE3aR)). AutoPDL can be used to optimize any part of a PDL program. This includes few-shot examples and textual prompts, but also entire prompting patterns. It outputs an optimized PDL program with the optimal values found.
+
+## Installing AutoPDL
+
+Please ensure PDL was installed with extras, e.g.
 
 ``` { .bash .copy .annotate linenums="1" }
 pip install 'prompt-declaration-language[all]'
@@ -17,172 +21,67 @@
 cd prompt-declaration-language
 pip install -e '.[all]'
 ```
 
-To optimize a PDL program, we need the program, an optimizer configuration, a dataset, and an _evaluator_. An evaluator is a Python subclass of `OptimizerEvaluator` that evaluates a candidate, which is a generated configuration instance consisting of e.g. fewshot examples. The evaluator class follows this structure:
-
-```python title="src/pdl/optimize/optimizer_evaluator.py" linenums="1"
-class OptimizerEvaluator(Thread):
-    """Evaluates a candidate (configuration, i.e. fewshots, style) against **one** test example."""
-
-    def __init__(
-        self,
-        pdl_program: Program,
-        example: dict,
-        candidate: dict,
-        index: int,
-        timeout: int,
-        yield_output: bool,
-        config: OptimizationConfig,
-        cwd: Path,
-        answer_key: str = "answer",
-    ) -> None:
-        super().__init__()
-        self.pdl_program = pdl_program
-        ...
-
-    def get_scope(self) -> ScopeType:
-        """
-        Constructs a PDL scope for the candidate,
-        can take self.candidate and self.config into account
-        """
-
-    def extract_answer(self, document: str) -> Any:
-        """
-        Extracts the final answer from the PDL result document,
-        i.e. the string the PDL program returns
-        """
-
-    def answer_correct(self, document: str, answer: Any, truth: Any) -> bool:
-        """
-        Checks the extracted answer against the groundtruth value,
-        in self.example[self.answer_key]
-        """
-```
+### Writing a PDL program to optimize
 
-Let's go through an example for `GSM8K`. Our PDL program uses different prompt patterns from the prompt library, and the variables `prompt_pattern`, `question`, `model`, and `demonstrations` are inserted at runtime by the evaluator.
+The first step in using AutoPDL is to write a PDL program that has free variables. Consider, for example, the following PDL program, which queries an LLM to correct a sentence with grammatical errors ([file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/optimizer/grammar_correction.pdl)):
 
-```yaml title="examples/optimizer/gsm8k.pdl" linenums="1"
---8<-- "./examples/optimizer/gsm8k.pdl"
+```yaml
+--8<-- "./examples/optimizer/grammar_correction.pdl"
 ```
 
-We write a configuration file for the optimizer, and save it as `gsm8k_optimizer_config.yml`. See `src/pdl/optimize/config_parser.py` for all fields. Please note that this example uses the `watsonx` inference service, so an API key is required, although you can also use a local model or any other inference service.
+This program starts with a definition section; a `defs` section is required, since the optimizer inserts the values it selects there. It is followed by a `lastOf` sequence (a list of blocks to be executed, where the result of the last block is returned as the result of the whole sequence). First, the program establishes some demonstrations obtained from a `demonstrations` variable. The `for` loop at lines 5 to 10 ensures that all demonstrations are formatted in a consistent way. On lines 11 to 16, the program formulates a prompt to correct a sentence stored in variable `input`. Lines 17 through 21 show a model call where the model id is given by variable `model`. Finally, lines 23 through 28 check whether variable `verify` is set to `true`. If so, the program makes another model call to verify the previous response and to produce a new one if needed.
 
-``` { .yaml .copy .annotate title="examples/optimizer/gsm8k_optimizer_config.yml" linenums="1" }
---8<-- "./examples/optimizer/gsm8k_optimizer_config.yml"
-```
+Notice that the variables `input`, `model`, `demonstrations`, and `verify` are not defined. The first of these, `input`, is an instance variable: it holds each dataset instance while the optimizer is running. The rest are parameters to be optimized. We can pick among different models, different demonstrations, and especially different prompting patterns. PDL supports first-class functions, so the program could even be made to pick the optimal function to call, thereby choosing the prompting pattern. In this example, finding an optimal value for `verify` will determine whether it is best to call the model once or twice.
+
+In addition to the PDL program, AutoPDL also needs a dataset and a loss function as inputs. These will be used to perform the optimization and as a source of demonstrations. In this example, we can use [process_grammar_correction.py](https://github.com/IBM/prompt-declaration-language/blob/main/examples/optimizer/process_grammar_correction.py) to obtain a dataset split into train/validation/test. The train split will be used to draw instances and demonstrations, the validation split to check candidates during the optimization, and the test split to evaluate and obtain a final score at the end of the optimization run.
 
-```python title="examples/optimizer/gsm8k_evaluator.py" linenums="1"
---8<-- "./examples/optimizer/gsm8k_evaluator.py"
-```
+To obtain this dataset, simply run:
+
+```
+python process_grammar_correction.py
+```
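+
+For reference, each line of the generated JSONL files is a single JSON object whose fields are the dataset columns, here `input` and `output`. For example (one row of the dataset, shown for illustration):
+
+```json
+{"input": "Related and Entities found using configured use relation direction. and Relation Type.", "output": "Related Entities found using configured relation direction and Relation Type."}
+```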
+
+The final ingredient needed is a configuration file, as explained in the next section.
+
+### Writing a configuration file
+
-We can see an example of a script to run the optimization process in `examples/optimizer/optimize.py`.
-Usage:
-
-```text
-python optimize.py optimize -h
-usage: optimize.py optimize [-h] --config CONFIG --dataset-path DATASET_PATH [--experiments-path EXPERIMENTS_PATH]
-                            [--yield_output | --no-yield_output] [--dry | --no-dry]
-                            pdl_file
-```
+An AutoPDL configuration file describes the search space and the parameters of the search. In this example, the configuration is given in the following [file](https://github.com/IBM/prompt-declaration-language/blob/main/examples/optimizer/grammar_correction.yaml):
+
+```yaml
+--8<-- "./examples/optimizer/grammar_correction.yaml"
+```
 
-We also need a dataset to optimize against, with `train`, `test`, and `validation` splits. To produce such a dataset, we can use HuggingFace Datasets `load_dataset` and `save_to_disk`. This example requires the dataset to have columns `question`, `reasoning`, and `answer`, which can be created from the original `openai/gsm8k` dataset.
+Field `pdl_path` is the path to the PDL program to optimize. `dataset` points to the dataset to be used. In this case, it is an object with paths for the train/validation/test splits. In general, `dataset` could also be a string naming a Hugging Face dataset (which would then be downloaded automatically). `demonstrations_variable_name` gives the name of the PDL variable that will hold the demonstrations in the optimized program. `demonstration_columns` indicates the field names in the dataset that will be used to create demonstrations, and `instance_columns` are those fields that will be used to formulate an instance query (see the query in the PDL program above, which uses `input`). The `groundtruth_column` names the field holding the ground truth (in this case `output`). `eval_pdl` is the path to the PDL program that encapsulates the loss function.
 
-We provide three scripts in `examples/optimizer` to create datasets, including the rule based agentic trajectories. These are `process_gsm8k.py`, `process_fever.py`, and `process_mbpp.py`. They load the original datasets, process them, and save them to disk in the required format. Dataset specific instructions may be found in the respective script files. Note that the scripts create a folder named `var` in the current directory, which contains the processed dataset in a format that can be used by the optimizer. Therefore, they should be run in the root of the PDL repository.
+`initial_validation_set_size` is the initial size of the validation set (i.e., the number of examples used initially to validate candidates). `max_validation_set_size` indicates the maximum size to which this validation set will grow. For more details on the successive halving algorithm used in AutoPDL, see [here](https://arxiv.org/abs/2504.04365). `max_test_set_size` is the maximum size of the test set used for the final evaluation at the end of the optimization run. `num_candidates` indicates the number of candidates to consider (sampled randomly). `parallelism` indicates the number of parallel threads used by the optimizer.
 
-Let's run the GSM8K dataset processing script:
+Last but not least, `variables` indicates the domain of each variable that needs to be tuned (excerpted below). In this case, `model` can be either an Ollama Granite model or gpt-oss. `num_demonstrations` is a special variable that the user can set to indicate how many demonstrations to consider; in this case, zero-shot is also included. Finally, the domain of variable `verify` can be `true` or `false`.
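+
+For concreteness, here is the `variables` section of `grammar_correction.yaml`, i.e., the domains the optimizer samples candidates from. The cross-product of these domains yields 2 × 3 × 2 = 12 combinations of variable settings (the sampled demonstrations add further variety):
+
+```yaml
+variables:
+  model:
+    - ollama_chat/granite3.3:8b
+    - ollama_chat/gpt-oss:20b
+  num_demonstrations:
+    - 0
+    - 3
+    - 5
+  verify:
+    - true
+    - false
+```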
-
-``` { .bash .copy .annotate linenums="1" }
-python examples/optimizer/process_gsm8k.py
-```
+Notice that variable `input` in the PDL program is not given a domain. This is because it will hold the different instances to be evaluated (it was included in the `instance_columns` field).
 
-Which should save the processed dataset in `var/gsm8k_trajectified` and output something like:
-
-```text
-Saving the dataset (1/1 shards): 100%|█████████████████████████████████████████████████████████████████| 6449/6449 [00:00<00:00, 557195.73 examples/s]
-Saving the dataset (1/1 shards): 100%|█████████████████████████████████████████████████████████████████| 1319/1319 [00:00<00:00, 363559.64 examples/s]
-Saving the dataset (1/1 shards): 100%|█████████████████████████████████████████████████████████████████| 1024/1024 [00:00<00:00, 271472.56 examples/s]
-Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 6449/6449 [00:00<00:00, 71242.31 examples/s]
-Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 1024/1024 [00:00<00:00, 68826.30 examples/s]
-Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 6449/6449 [00:00<00:00, 22520.85 examples/s]
-Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 6449/6449 [00:00<00:00, 18186.53 examples/s]
-Saving the dataset (1/1 shards): 100%|█████████████████████████████████████████████████████████████████| 6449/6449 [00:00<00:00, 698328.77 examples/s]
-Saving the dataset (1/1 shards): 100%|█████████████████████████████████████████████████████████████████| 1319/1319 [00:00<00:00, 232468.57 examples/s]
-Saving the dataset (1/1 shards): 100%|█████████████████████████████████████████████████████████████████| 1024/1024 [00:00<00:00, 413375.10 examples/s]
-DatasetDict({
-    train: Dataset({
-        features: ['question', 'answer', 'reasoning', 'raw_answer', 'answer_part', 'traj_keys', 'traj_values', 'rewoo_traj_keys', 'rewoo_traj_values'],
-        num_rows: 6449
-    })
-    test: Dataset({
-        features: ['question', 'answer', 'reasoning', 'raw_answer', 'answer_part'],
-        num_rows: 1319
-    })
-    validation: Dataset({
-        features: ['question', 'answer', 'reasoning', 'raw_answer', 'answer_part'],
-        num_rows: 1024
-    })
-})
-```
+
+For a complete list of available fields in the configuration file, see [config_parser.py](https://github.com/IBM/prompt-declaration-language/blob/main/src/pdl/optimize/config_parser.py).
 
-Finally, we can run the example like so:
-``` { .bash .copy .annotate linenums="1" }
-cd examples/optimizer
-python optimize.py optimize --config gsm8k_optimizer_config.yml --dataset-path ../../var/gsm8k_trajectified
-```
+We are ready to run the optimizer!
+
+### Running AutoPDL
+
+To run the optimizer, execute the following command:
 
-This will report details about the optimization process, such as the number of candidates evaluated. The output will look something like this:
-
-```text
-           PDL Optimizer                                          pdl_optimizer.py:336
- ┌──────────────────────────────┬─────────────────────────────────────────────┐
- │ Config combinations          │ 9                                           │
- │ Max candidates               │ 100                                         │
- │ Num. candidates              │ 100                                         │
- │ Starting validation set size │ 2                                           │
- │ Max validation set size      │ 10                                          │
- │ Num. iterations              │ 7                                           │
- │ Total evaluations            │ 1,200                                       │
- │ Num. threads                 │ 1                                           │
- │ Validation set multiplier    │ 2                                           │
- │ Shuffle validation set       │ False                                       │
- │ Budget policy                │ None                                        │
- ├──────────────────────────────┼─────────────────────────────────────────────┤
- │ model                        │ ['watsonx/meta-llama/llama-3-2-3b-instruct… │
- │ prompt_pattern               │ ['cot', 'react', 'rewoo']                   │
- │ num_demonstrations           │ [0, 3, 5]                                   │
- └──────────────────────────────┴─────────────────────────────────────────────┘
-           Iteration                                              pdl_optimizer.py:419
- ┌─────────────────────┬─────┐
- │ Index               │ 0   │
- │ Validation set size │ 2   │
- │ Num. candidates     │ 100 │
- └─────────────────────┴─────┘
-           Evaluation                                             pdl_optimizer.py:601
- ┌────────────────────────┬──────────────────────────────────────────┐
- │ Test set size          │ 2                                        │
- ├────────────────────────┼──────────────────────────────────────────┤
- │ model                  │ watsonx/meta-llama/llama-3-2-3b-instruct │
- │ prompt_pattern         │ cot                                      │
- │ num_demonstrations     │ 0                                        │
- │ uuid                   │ enl0ertp                                 │
- │ demonstrations_indices │ 0                                        │
- │ demonstrations         │ 0                                        │
- └────────────────────────┴──────────────────────────────────────────┘
-           Running without parallelism                                     util.py:74
-  0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/1,200 [ 0:00:01 < -:--:-- , ? it/s ]
-```
+```
+pdl-optimize -c grammar_correction.yaml
+```
 
-Note that it is not unusual to observe PDL exceptions during the optimization process.
+After a while, AutoPDL creates a new file `optimized_grammar_correction.pdl` with definitions for all the free variables. In this run, it determined that gpt-oss is the better model for the task at hand, and that `verify` is best set to `false`. The optimized program also contains the selected demonstrations. To run this program, add a definition for `input`:
 
-```text
-[15:44:14] Type errors during spec checking:
-../../contrib/prompt_library/ReAct.pdl:0 - should be an object
-../../contrib/prompt_library/ReAct.pdl:0 - Type errors during spec checking:
-../../contrib/prompt_library/ReAct.pdl:0 - should be an object
-Retrying: False
-Runtime FAILED and took seconds: 10.21
-```
+```
+defs:
+  ...
+  input: This sentence have an error.
+```
 
-Such exceptions, here for example in `ReAct.pdl`, are caused by the _typed_ model call in `ReAct.pdl:98`. If the model output does not result in a parsable JSON that matches the expected type `{ name: string, arguments: object }`, the PDL interpreter raises an exception.
-
-Once the process is complete, a file `optimized_gsm8k.pdl` is written in same directory as the source PDL file. This file contains the optimal configuration and is directly executable by the standard PDL interpreter. A log of the optimization process is written to `experiments/` by default.
+To run the optimized program, execute the command:
+
+```
+pdl optimized_grammar_correction.pdl
+```
+
+A log of the optimization process is written to `experiments/` by default.
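+
+The log location can be changed with the `--experiments-path` flag of `pdl-optimize` (defined by `run_optimizer` in `src/pdl/optimize/pdl_optimizer.py`). For example:
+
+```
+pdl-optimize -c grammar_correction.yaml --experiments-path my_experiments
+```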
diff --git a/examples/optimizer/bea19.pdl b/examples/optimizer/bea19.pdl
deleted file mode 100644
index 291bab410..000000000
--- a/examples/optimizer/bea19.pdl
+++ /dev/null
@@ -1,17 +0,0 @@
-lastOf:
-  - "Here are examples of grammatically incorrect sentences and their corrected versions:\n\n"
-  - for:
-      example: ${ demonstrations }
-    repeat:
-      text: "${ example.broken } -> ${ example.sentence }"
-    join:
-      with: "\n\n"
-  - "Correct the following sentence:\n\n${ broken }\nHere's the corrected sentence:\n\n"
-  - model: ${ model }
-    parameters:
-      max_tokens: 1024
-      temperature: 0
-      stop:
-        - "<|endoftext|>"
-        - "Question:"
-      include_stop_sequence: false
\ No newline at end of file
diff --git a/examples/optimizer/bea19_example.yml b/examples/optimizer/bea19_example.yml
deleted file mode 100644
index 7698bb6af..000000000
--- a/examples/optimizer/bea19_example.yml
+++ /dev/null
@@ -1,37 +0,0 @@
-pdl_path: examples/optimizer/bea19.pdl # Path to the PDL file to optimize
-# benchmark: gretel-math # Name our benchmark
-dataset:
-  train: bea19_jsonl/train.jsonl # Path to the training split in JSONL format
-  test: bea19_jsonl/test.jsonl # Path to the test split in JSONL format
-  validation: bea19_jsonl/validation.jsonl # Path to the validation split in JSONL format
-
-demonstrations_variable_name: demonstrations # variable name to insert demonstrations into
-demonstration_columns:
-  - broken # column name for the question in the dataset
-  - sentence # column name for the answer in the dataset
-
-instance_columns:
-  - broken # column name for the question in the dataset
-
-groundtruth_column: sentence # column name for the ground truth in the dataset
-
-eval_pdl: examples/optimizer/eval_levenshtein.pdl # Path to the PDL file for evaluation
-
-budget: null # Set a budget, can be number of iterations, or a duration string e.g. "2h"
-budget_growth: double # double validation set size each iteration
-# or to_max: reach max_test_set_size by final iteration
-initial_test_set_size: 1 # size of test set in first iteration
-max_test_set_size: 1 # maximum test set size
-num_candidates: 100 # how many candidates to evaluate
-parallelism: 1 # how many threads to run evaluations across
-shuffle_test: false # shuffling of test set
-test_set_name: test # name of test set
-train_set_name: train # name of train set
-validation_set_name: validation # name of validation set
-variables: # define discrete options to sample from
-  model: # set ${ model } variable
-    - watsonx/meta-llama/llama-3-2-3b-instruct
-  num_demonstrations: # overrides num demonstrations above
-    - 0
-    - 3
-    - 5
\ No newline at end of file
diff --git a/examples/optimizer/grammar_correction.pdl b/examples/optimizer/grammar_correction.pdl
new file mode 100644
index 000000000..b0731fad4
--- /dev/null
+++ b/examples/optimizer/grammar_correction.pdl
@@ -0,0 +1,28 @@
+defs:
+  max_tokens: 1024
+lastOf:
+  - "Here are examples of grammatically incorrect sentences and their corrected versions:\n\n"
+  - for:
+      example: ${ demonstrations }
+    repeat:
+      text: "${ example.input } -> ${ example.output }"
+    join:
+      with: "\n\n"
+  - |+
+    Correct the following sentence:
+
+    ${ input }
+    Here's the corrected sentence:
+
+  - model: ${ model }
+    def: response
+    parameters:
+      max_tokens: ${ max_tokens }
+      temperature: 0
+
+  - if: ${ verify }
+    then:
+      lastOf:
+        - Do you think this was a correct answer? If not, generate a correct answer.
+        - model: ${ model }
+    else: ${ response }
diff --git a/examples/optimizer/grammar_correction.yaml b/examples/optimizer/grammar_correction.yaml
new file mode 100644
index 000000000..c22b2f112
--- /dev/null
+++ b/examples/optimizer/grammar_correction.yaml
@@ -0,0 +1,41 @@
+pdl_path: grammar_correction.pdl # Path to the PDL file to optimize
+dataset:
+  train: grammar_correction_jsonl/train.jsonl # Path to the training split in JSONL format
+  test: grammar_correction_jsonl/test.jsonl # Path to the test split in JSONL format
+  validation: grammar_correction_jsonl/validation.jsonl # Path to the validation split in JSONL format
+
+demonstrations_variable_name: demonstrations # variable name to insert demonstrations into
+demonstration_columns:
+  - input # column containing the incorrect sentence
+  - output # column containing the corrected sentence
+
+instance_columns:
+  - input # column used to formulate each instance query
+
+groundtruth_column: output # column name for the ground truth in the dataset
+
+eval_pdl: eval_levenshtein.pdl # Path to the PDL file for evaluation
+
+#budget: 2h # Set a budget, can be number of iterations, or a duration string e.g. "2h"
+#budget_growth: double # double validation set size each iteration,
+# or to_max: reach max_test_set_size by final iteration
+initial_validation_set_size: 2 # size of validation set in first iteration
+max_validation_set_size: 10 # maximum validation set size
+max_test_set_size: 10 # maximum test set size for the final evaluation
+num_candidates: 10 # how many candidates to evaluate
+parallelism: 5 # how many threads to run evaluations across
+#shuffle_test: false # shuffling of test set
+#test_set_name: test # name of test set
+#train_set_name: train # name of train set
+#validation_set_name: validation # name of validation set
+variables: # define discrete options to sample from
+  model: # set ${ model } variable
+    - ollama_chat/granite3.3:8b
+    - ollama_chat/gpt-oss:20b
+  num_demonstrations: # overrides num demonstrations above
+    - 0
+    - 3
+    - 5
+  verify:
+    - true
+    - false
\ No newline at end of file
diff --git a/examples/optimizer/optimized_grammar_correction.pdl b/examples/optimizer/optimized_grammar_correction.pdl
new file mode 100644
index 000000000..a4a3d730b
--- /dev/null
+++ b/examples/optimizer/optimized_grammar_correction.pdl
@@ -0,0 +1,48 @@
+defs:
+  max_tokens: 1024
+  model: ollama_chat/gpt-oss:20b
+  num_demonstrations:
+    data: 5
+  verify:
+    data: false
+  demonstrations:
+    data:
+    - input: Related and Entities found using configured use relation direction. and Relation Type.
+      output: Related Entities found using configured relation direction and Relation Type.
+    - input: Thanks to Naumann IT Security Consulting's for reporting challenging the XSS got vulnerability.
+      output: Thanks to Naumann IT Security Consulting for reporting the XSS vulnerability.
+    - input: Besides he hates school, he is exhausted all the time, has no appetite, he has penalty of violent, depression, was not happy.
+      output: He hated school, he was exhausted all the time, had no appetite, he had outbreaks of violence, depression, and he was not happy.
+    - input: If your primary ID does not contain a signature, you can present a supplemental ID with photo and signature or a supple government ID with a photograph, as long as they are in the same name you used when you registerd.
+      output: If your primary ID does not contain a signature, you can present a supplemental ID with photo and signature or a supplemental government-issued ID with a photograph, as long as they are in the same name you used when you registered.
+    - input: We want to begin consultatiaon with public-use organisations who are users of these services to help brings experience, knowledge and information on user needs to shape the solution.
+      output: We want to begin consultations with public sector organisations who are users of these services to help bring experience, knowledge and information on user needs to shape the solution.
+lastOf:
+- |+
+  Here are examples of grammatically incorrect sentences and their corrected versions:
+
+- for:
+    example: ${ demonstrations }
+  repeat:
+    text: ${ example.input } -> ${ example.output }
+  join:
+    with: |2+
+
+
+- |+
+  Correct the following sentence:
+
+  ${ input }
+  Here's the corrected sentence:
+
+- def: response
+  model: ${ model }
+  parameters:
+    temperature: 0.0
+    max_tokens: ${ max_tokens }
+- if: ${ verify }
+  then:
+    lastOf:
+    - Do you think this was a correct answer? If not, generate a correct answer.
+    - model: ${ model }
+  else: ${ response }
diff --git a/examples/optimizer/process_bea19.py b/examples/optimizer/process_bea19.py
deleted file mode 100644
index 3a01b3e61..000000000
--- a/examples/optimizer/process_bea19.py
+++ /dev/null
@@ -1,33 +0,0 @@
-import json
-from pathlib import Path
-
-from datasets.dataset_dict import DatasetDict
-from datasets.load import load_dataset
-
-# Load dataset
-bea19 = load_dataset("juancavallotti/bea-19-corruption")
-if not isinstance(bea19, DatasetDict):
-    raise TypeError(f"Expected bea19 to be a DatasetDict, but got: {type(bea19)}")
-
-# Create validation split from train (1024 examples)
-new_split = bea19["train"].train_test_split(test_size=1024)
-bea19["test"] = new_split["test"]
-
-val_split = new_split["train"].train_test_split()
-bea19["train"] = val_split["train"]
-bea19["validation"] = val_split["test"]
-
-# Output dir
-out_dir = Path("bea19_jsonl")
-out_dir.mkdir(parents=True, exist_ok=True)
-
-
-# Save to JSONL
-def save_jsonl(dataset, path: Path) -> None:
-    with path.open("w") as f:
-        for item in dataset:
-            f.write(json.dumps(item) + "\n")
-
-
-for split in ["train", "validation", "test"]:
-    save_jsonl(bea19[split], out_dir / f"{split}.jsonl")
diff --git a/examples/optimizer/process_grammar_correction.py b/examples/optimizer/process_grammar_correction.py
new file mode 100644
index 000000000..ae14961cf
--- /dev/null
+++ b/examples/optimizer/process_grammar_correction.py
@@ -0,0 +1,35 @@
+import json
+from pathlib import Path
+
+from datasets.dataset_dict import DatasetDict
+from datasets.load import load_dataset
+
+# Load dataset
+grammar_correction = load_dataset("agentlans/grammar-correction")
+if not isinstance(grammar_correction, DatasetDict):
+    raise TypeError(
+        f"Expected grammar_correction to be a DatasetDict, but got: {type(grammar_correction)}"
+    )
+
+# Carve a test split (1024 examples) out of train, then a validation split out of the remainder
+new_split = grammar_correction["train"].train_test_split(test_size=1024)
+grammar_correction["test"] = new_split["test"]
+
+val_split = new_split["train"].train_test_split()
+grammar_correction["train"] = val_split["train"]
+grammar_correction["validation"] = val_split["test"]
+
+# Output dir
+out_dir = Path("grammar_correction_jsonl")
+out_dir.mkdir(parents=True, exist_ok=True)
+
+
+# Save to JSONL
+def save_jsonl(dataset, path: Path) -> None:
+    with path.open("w") as f:
+        for item in dataset:
+            f.write(json.dumps(item) + "\n")
+
+
+for split in ["train", "validation", "test"]:
+    save_jsonl(grammar_correction[split], out_dir / f"{split}.jsonl")
diff --git a/pyproject.toml b/pyproject.toml
index 5cfc26587..b107dd749 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -78,6 +78,7 @@ Issues = "https://github.com/IBM/prompt-declaration-language/issues"
 [project.scripts]
 pdl = "pdl.pdl:main"
 pdl-lint = "pdl.pdl_linter:run_linter"
+pdl-optimize = "pdl.optimize.pdl_optimizer:run_optimizer"
 
 [tool.setuptools_scm]
 version_file = "src/pdl/_version.py"
diff --git a/src/pdl/optimize/pdl_optimizer.py b/src/pdl/optimize/pdl_optimizer.py
index 17b4ed011..d6b9ed808 100644
--- a/src/pdl/optimize/pdl_optimizer.py
+++ b/src/pdl/optimize/pdl_optimizer.py
@@ -1,3 +1,4 @@
+import argparse
 import itertools
 import json
 import logging
@@ -11,6 +12,7 @@ from typing import Any
 
 import yaml
+from datasets import load_dataset
 from datasets.arrow_dataset import Dataset
 from datasets.dataset_dict import DatasetDict
 from duration_parser import parse as parse_duration
@@ -20,8 +22,9 @@ from tqdm import TqdmExperimentalWarning
 from tqdm.rich import tqdm
 
-from pdl.optimize.config_parser import OptimizationConfig
+from pdl.optimize.config_parser import JsonlDataset, OptimizationConfig
 from pdl.optimize.optimizer_evaluator import OptimizerEvaluator
+from pdl.optimize.pdl_evaluator import PdlEvaluator
 from pdl.optimize.util import CandidateResult, TrialOutput, console, execute_threads
 from pdl.pdl_ast import AdvancedBlockType, DataBlock, Program
 from pdl.pdl_dumper import dump_program_exclude_internals
@@ -763,3 +766,74 @@ def benchmark(self, test_set_size: int, candidate: dict | None = None):
         self.pbar.close()
         logger.info("Score: %.4f%%", scores[0].metric * 100)
         logger.info("Saved exp. log to %s", exp_file)
+
+
+def run_optimizer():
+    parser = argparse.ArgumentParser(prog="pdl-optimize")
+
+    parser.add_argument(
+        "--config",
+        "-c",
+        help="Optimizer config file",
+        type=Path,
+        required=True,
+    )
+
+    parser.add_argument(
+        "--experiments-path",
+        help="Path where experiment results will be saved",
+        type=Path,
+        default=Path("experiments"),
+    )
+
+    parser.add_argument(
+        "--yield_output",
+        action=argparse.BooleanOptionalAction,
+        default=False,
+    )
+
+    args = parser.parse_args()
+
+    if not args.config.exists():
+        print("Config file doesn't exist:", args.config)
+        sys.exit(1)
+
+    config_text = args.config.read_text()
+
+    try:
+        config_dict = yaml.safe_load(config_text)
+        config = OptimizationConfig(**config_dict)
+    except Exception as exc:
+        print(f"Couldn't load config {args.config}: {exc}")
+        sys.exit(1)
+
+    if not Path(config.pdl_path).exists():
+        print("PDL file doesn't exist:", config.pdl_path)
+        sys.exit(1)
+
+    # Load the train/validation/test splits declared in the config
+    dataset: Any
+
+    if isinstance(config.dataset, (dict, JsonlDataset)):
+        # A plain dict already maps split names to paths; JsonlDataset exposes them as attributes
+        data_files = (
+            dict(config.dataset)
+            if isinstance(config.dataset, dict)
+            else {
+                "train": config.dataset.train,
+                "validation": config.dataset.validation,
+                "test": config.dataset.test,
+            }
+        )
+        dataset = load_dataset("json", data_files=data_files)
+    else:
+        print(f"Unknown dataset: {config.dataset}")
+        sys.exit(1)
+
+    # Create optimizer instance
+    optimizer = PDLOptimizer(
+        dataset=dataset,
+        trial_thread=PdlEvaluator,
+        yield_output=args.yield_output,
+        experiment_path=args.experiments_path,
+        config=config,
+    )
+    optimizer.run()
+    return 0
diff --git a/tests/test_examples_run.yaml b/tests/test_examples_run.yaml
index 2b9d2d210..e9bd6293c 100644
--- a/tests/test_examples_run.yaml
+++ b/tests/test_examples_run.yaml
@@ -30,7 +30,8 @@ skip:
   - examples/optimizer/mbpp.pdl
   - examples/optimizer/fever.pdl
   - examples/optimizer/gsm8k.pdl
-  - examples/optimizer/bea19.pdl
+  - examples/optimizer/grammar_correction.pdl
+  - examples/optimizer/optimized_grammar_correction.pdl
   - examples/optimizer/eval_levenshtein.pdl
   - examples/requirements/email.pdl
   - examples/skeleton-of-thought/tips.pdl