[BFCL] Adds support for parallel inference and batching #498

Open · wants to merge 3 commits into main
Conversation

@TikZSZ TikZSZ commented Jul 2, 2024

Parallel Inference Support for berkeley-function-call-leaderboard

This PR adds support for running berkeley-function-call-leaderboard inference in parallel, reducing running time by 4x or more depending on --batch-size.

Changes

Modifies berkeley-function-call-leaderboard/model_handler/handler.py

  • Made the write function async using aiofiles
  • Added a sort_results function that sorts the results by idx after each individual test category finishes
  • sort_results returns the sorted indices, which supports the resume functionality (a sketch follows this list)
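
A minimal sketch of these two pieces (not the PR's exact code; the flat JSONL layout and the "idx" field name are assumptions):

```python
import json
import aiofiles

async def write(result, file_path):
    # Append one result as a JSON line without blocking the event loop.
    async with aiofiles.open(file_path, mode="a") as f:
        await f.write(json.dumps(result) + "\n")

def sort_results(file_path):
    # After a test category finishes, re-order the JSONL output by the
    # original test index and return the indices already on disk so a
    # later run can resume from them.
    with open(file_path) as f:
        entries = [json.loads(line) for line in f if line.strip()]
    entries.sort(key=lambda entry: entry["idx"])
    with open(file_path, "w") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")
    return [entry["idx"] for entry in entries]
```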

Modifies berkeley-function-call-leaderboard/openfunctions_evaluation.py

  • Added a --batch-size argument (defaults to 1) that controls the number of parallel requests (see the sketch after this list)
  • Refactored the processing and result-writing logic into a fetch_and_process function
  • Added a make_async function to wrap sync functions as async (used for handler.inference)
  • Added a nested progress bar for tracking iterations
  • Refactored the core processing logic under the main function
  • Implemented proper resume support, replacing num_existing_lines
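
A rough sketch of how --batch-size bounds concurrency (not the PR's actual fetch_and_process): test cases are dispatched in slices of batch_size, and each slice is awaited with asyncio.gather before the next one starts. The async_inference argument stands in for the wrapped handler call:

```python
import asyncio

async def run_batched(async_inference, test_cases, batch_size=1):
    # Process the test cases in slices of batch_size; each slice runs
    # concurrently, and the next slice starts only after it completes.
    results = []
    for start in range(0, len(test_cases), batch_size):
        batch = test_cases[start:start + batch_size]
        results.extend(
            await asyncio.gather(*(async_inference(case) for case in batch))
        )
    return results
```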

Resume Support

Improved resuming functionality in async code:

  • Addresses cases where some test cases complete earlier than others, which could make a line-count-based resume inconsistent
  • Filters out already-saved test cases instead of relying on a simple line count (a sketch follows this list)
  • Inserts None as a placeholder for already-saved test cases, which the processing loop uses as the condition to skip them
  • This approach ensures consistent resuming even if execution is interrupted mid-test
  • This matters for models that are expensive to run, where re-running the entire test suite is undesirable
  • A test log screenshot is attached at the bottom of this PR to confirm it works as intended
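
An illustrative sketch of that filtering step (not the PR's code; the result-file layout and the "idx" key are assumed):

```python
import json

def load_completed_indices(result_file):
    # Collect the idx of every test case already written to disk.
    try:
        with open(result_file) as f:
            return {json.loads(line)["idx"] for line in f if line.strip()}
    except FileNotFoundError:
        return set()

def mark_resumable(test_cases, completed):
    # Replace already-saved test cases with None; the processing loop skips
    # None entries, so resuming stays consistent even after a mid-run interrupt.
    return [None if case["idx"] in completed else case for case in test_cases]
```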

Note: To minimize code changes, this PR wraps the existing inference calls as async. The calls themselves are still synchronous and would block the event loop, so loop.run_in_executor is used to run them in parallel on a thread pool of min(32, os.cpu_count() + 4) threads by default. If handlers are made natively async in the future, they will continue to work like normal async code. A sketch of the wrapper is shown below.
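
A minimal sketch of that wrapper, assuming the handler exposes a synchronous inference method; passing None to run_in_executor uses asyncio's default ThreadPoolExecutor, which is where the min(32, os.cpu_count() + 4) worker count comes from:

```python
import asyncio
import functools

def make_async(sync_fn):
    # Wrap a blocking function so awaiting it runs the call on the event
    # loop's default thread pool instead of blocking the loop itself.
    async def wrapper(*args, **kwargs):
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            None, functools.partial(sync_fn, *args, **kwargs)
        )
    return wrapper

# Usage (illustrative): async_inference = make_async(handler.inference)
```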

Testing

Tested on a custom OpenAI-compatible model served with vLLM:

  • Completed the simple test in 40 seconds
  • Hardware: RTX 4090
  • Model: Llama 8B, BF16
  • Batch size: 15-20

Benchmark Results

[Screenshot: Benchmark]

Debug Logs for the new Resume System

[Screenshot: Debug-resume]

@TikZSZ changed the title from "Adds support for parallel inference and batching" to "[BFCL] Adds support for parallel inference and batching" on Jul 2, 2024
@ShishirPatil
Owner

Thanks for contributing to the Berkeley Function Calling Leaderboard, @TikZSZ! Appreciate the PR and welcome! We are currently reviewing and testing this PR.

@TikZSZ
Author

TikZSZ commented Jul 14, 2024

@ShishirPatil I've added proper resume support and updated the PR description with more details about the changes. Could you please take a look when you have a chance? Thank you!
