This is the official repository for NESTFUL.
- Paper Title: NESTFUL: A Benchmark for Evaluating LLMs on Nested Sequences of API Calls
- Link: https://arxiv.org/abs/2409.03797
The NESTFUL evaluation set is available in the `data` directory.
- executable: contains the data and specs, with the information needed to execute them through RapidAPI.
- non-executable: contains the nested sequencing data from SGD and GLAIVE, hand-picked by human annotators from data synthetically generated using an LLM.
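
To check what a local checkout contains, a minimal sketch like the following lists the files in each part of the evaluation set. It assumes the two parts are subdirectories of `data` named `executable` and `non-executable`, matching the labels above. The function name `list_split_files` is hypothetical, and the sketch makes no assumption about file names or formats.

```python
from pathlib import Path

# Assumed subdirectory names, taken from the labels in the description above.
SPLITS = ("executable", "non-executable")


def list_split_files(data_dir="data"):
    """Map each split directory that exists to a sorted list of its files."""
    root = Path(data_dir)
    found = {}
    for split in SPLITS:
        split_dir = root / split
        if split_dir.is_dir():
            # Walk recursively, since the layout inside each split is not documented here.
            found[split] = sorted(p for p in split_dir.rglob("*") if p.is_file())
    return found


if __name__ == "__main__":
    for split, files in list_split_files().items():
        print(f"{split}: {len(files)} file(s)")
        for path in files:
            print(f"  {path}")
```

Splits that are missing on disk are skipped, so an empty result means the assumed directory names do not match the actual layout.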