
TSL LLM Benchmark

Code generation models struggle with long-context generation and have been shown to perform better when code snippets are used to condition their output. We propose a novel approach in which an LLM generates TSL (Temporal Stream Logic) specs that are then synthesized into code. The LLM then uses the synthesized code as seed code to generate the desired program.

Benchmark Overview

We propose a set of benchmarks that test the capability of LLMs to generate TSL specs for reactive synthesis. Each benchmark uses a simple TSL spec to generate a complex state machine, so the benchmarks test the ability of LLMs to generate specs rather than entire state machines. In this way, the performance of long-context, high-risk code generation applications can be simplified and made more transparent. (In the Snake demo below, the snake's deaths are due to user error; the final death is caused by illegal keypresses, not a bug.)

Demos: Ball, Game of Life, Vending Machine, Snake Game, Space Invaders (ship only), Rotating Cube, Expanding Cube

Using the benchmarks

Each benchmark folder contains the set of files that make up that benchmark. Call run.py (e.g. python run.py) from the main directory to walk through the process of a benchmark.


File Organization

Root Directory: The files in the root directory are used for the generation of any state machine. For each particular state machine there is a folder containing the files specific to it. Each of the created state machines serves as a benchmark.

  • Impl_template.prompt: The template filled in with wrapper_template.html and Headers.txt to create Impl.prompt.
    • Impl_withoutFunctions_template.prompt: The template used when there are no functions under the 'Functions:' section of Headers.txt.
  • run.py: Runs the pipeline. It takes the files from a benchmark folder that are not in the computed folder and creates the files of the computed folder.
  • shotPrompt.txt: A text file providing TSL documentation and NL->TSL examples. It can help the model via in-context learning (ICL) and improves NL-to-TSL translation.
  • Spec_template.prompt: The prompt template used as the query to the LLM to generate the TSL specification (see the sketch after this list).
    • Spec_withoutAssumptions_template.prompt: The template used when there are no assumptions under the 'Assumptions:' section of NL.txt.
    • Spec_withoutFunctions_template.prompt: The template used when there are no functions under the 'Functions:' section of Headers.txt.
    • Spec_withoutFA_template.prompt: The template used when there are neither functions under the 'Functions:' section of Headers.txt nor assumptions under the 'Assumptions:' section of NL.txt.
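
As a rough illustration, here is a minimal sketch of how a spec prompt might be assembled from these templates. The placeholder names (<NL>, <NL_SUMMARY>, <HEADERS>) are hypothetical; the real placeholder scheme and filling logic live in run.py.

```python
# Hypothetical sketch of prompt assembly. The placeholder names are
# assumptions; run.py defines the real scheme.
from pathlib import Path

def build_spec_prompt(benchmark_dir: Path) -> str:
    template = Path("Spec_template.prompt").read_text()
    return (template
            .replace("<NL>", (benchmark_dir / "NL.txt").read_text())
            .replace("<NL_SUMMARY>", (benchmark_dir / "NL.summary.txt").read_text())
            .replace("<HEADERS>", (benchmark_dir / "Headers.txt").read_text()))

spec_prompt = build_spec_prompt(Path("Ball"))
(Path("Ball") / "computed").mkdir(exist_ok=True)
(Path("Ball") / "computed" / "Spec.prompt").write_text(spec_prompt)
```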

Benchmark Folders: The files within each benchmark folder (e.g. Ball, GameOfLife) are used to create that benchmark.

  • NL.summary.txt: The natural-language, high-level summary of the benchmark.
  • NL.txt: The natural-language description of the benchmark, with a clear list of requirements (Assumptions & Guarantees).
  • Headers.txt: The function and predicate term header definitions.
  • wrapper_template.html: The template HTML file that the LLM fills in using the Impl.js and Synth.js files it generates.
  • computed: This folder contains the files generated by the LLM; these generated artifacts are what the benchmark evaluates (see the sketch after this list).
    • Spec.tsl: The TSL specification that the LLM generates.
    • Spec.prompt: The prompt used to query the LLM for a TSL specification.
    • Impl.prompt: The prompt used to query the LLM for a JavaScript implementation of the functions and predicates.
    • Synth.js: The JavaScript translation of the LLM-generated TSL specification, performed by the TSL API.
    • <BENCHMARK_NAME>.html: The LLM-generated HTML implementation.
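
As a quick sanity check, one might verify that a benchmark's computed folder has been fully populated. The file names below follow the list above; the helper itself is illustrative, not part of the repository, and it assumes the HTML file is named after the benchmark folder.

```python
# Illustrative check (not part of the repository) that computed/
# contains the expected LLM-generated artifacts listed above.
from pathlib import Path

EXPECTED = ["Spec.prompt", "Spec.tsl", "Impl.prompt", "Synth.js"]

def missing_outputs(benchmark: str) -> list[str]:
    computed = Path(benchmark) / "computed"
    # Assumes <BENCHMARK_NAME> matches the benchmark folder name.
    expected = EXPECTED + [f"{benchmark}.html"]
    return [f for f in expected if not (computed / f).exists()]

print(missing_outputs("Ball"))  # [] once the walkthrough has completed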

Benchmark Walkthrough

  1. NL.txt, NL.summary.txt, and Headers.txt are handmade and used to fill in Spec_template.prompt, creating Spec.prompt, which is fed to the LLM (the full flow is sketched after this list).
  2. The LLM outputs Spec.tsl, its TSL specification.
  3. The TSL specification is passed to the TSL API, and the resulting JavaScript translation is stored in Synth.js.
  4. The handmade Headers.txt is used to fill in Impl_template.prompt, creating Impl.prompt, which is fed to the LLM.
  5. The LLM outputs Impl.js, its JavaScript implementation of the functions and predicates.
  6. Finally, using wrapper_template.html, Synth.js, and Impl.js, the LLM fills in <BENCHMARK_NAME>.html, the final benchmark implementation.
  7. OPTIONAL: Steps 4-6 can be combined.
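
The walkthrough above, condensed into a minimal end-to-end sketch. query_llm and tsl_synthesize are hypothetical stand-ins for an LLM API call and the TSL API, and the template placeholders are assumptions; run.py implements the real pipeline.

```python
# Hedged end-to-end sketch of the walkthrough. query_llm and
# tsl_synthesize are hypothetical; run.py holds the real orchestration.
from pathlib import Path

def run_benchmark(benchmark: Path, query_llm, tsl_synthesize) -> None:
    computed = benchmark / "computed"
    computed.mkdir(exist_ok=True)

    def fill(template: str, parts: dict) -> str:
        # Hypothetical placeholder scheme (see the earlier sketch).
        text = Path(template).read_text()
        for placeholder, value in parts.items():
            text = text.replace(placeholder, value)
        return text

    # Steps 1-2: build Spec.prompt; the LLM returns the TSL spec.
    spec_prompt = fill("Spec_template.prompt", {
        "<NL>": (benchmark / "NL.txt").read_text(),
        "<NL_SUMMARY>": (benchmark / "NL.summary.txt").read_text(),
        "<HEADERS>": (benchmark / "Headers.txt").read_text(),
    })
    (computed / "Spec.prompt").write_text(spec_prompt)
    spec_tsl = query_llm(spec_prompt)
    (computed / "Spec.tsl").write_text(spec_tsl)

    # Step 3: the TSL API translates the spec to JavaScript.
    synth_js = tsl_synthesize(spec_tsl)
    (computed / "Synth.js").write_text(synth_js)

    # Steps 4-5: build Impl.prompt from wrapper_template.html and
    # Headers.txt; the LLM returns Impl.js.
    impl_prompt = fill("Impl_template.prompt", {
        "<WRAPPER>": (benchmark / "wrapper_template.html").read_text(),
        "<HEADERS>": (benchmark / "Headers.txt").read_text(),
    })
    (computed / "Impl.prompt").write_text(impl_prompt)
    impl_js = query_llm(impl_prompt)

    # Step 6: the LLM fills wrapper_template.html with Synth.js and
    # Impl.js to produce <BENCHMARK_NAME>.html. A real prompt would be
    # built from proper scaffolding; this concatenation is a stand-in.
    wrapper = (benchmark / "wrapper_template.html").read_text()
    html = query_llm("\n".join([wrapper, synth_js, impl_js]))
    (computed / f"{benchmark.name}.html").write_text(html)
```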