# Parallel Inference with vLLM

In this example, we use ``Qwen/Qwen2.5-VL-72B-Instruct``. It is a very large model: with the default settings, it takes about 310GB of VRAM. This notebook was tested on 4xH100 GPUs.
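
As a rough sanity check on those numbers, the sketch below splits the 310GB figure evenly across the four GPUs. It assumes the 80GB H100 variant and an even split under tensor parallelism; neither detail is stated above, and `per_gpu_gb` is a hypothetical helper rather than part of octotools or vLLM.

```python
def per_gpu_gb(total_gb: float, num_gpus: int) -> float:
    """Return the memory each GPU holds if the load is split evenly."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return total_gb / num_gpus


TOTAL_VRAM_GB = 310   # figure quoted above
NUM_GPUS = 4          # 4xH100
H100_VRAM_GB = 80     # assumption: 80GB H100 variant

share = per_gpu_gb(TOTAL_VRAM_GB, NUM_GPUS)
print(f"{share:.1f} GB per GPU, {H100_VRAM_GB - share:.1f} GB headroom")
# prints "77.5 GB per GPU, 2.5 GB headroom"
```

Under these assumptions the model fits on four GPUs with very little room to spare, so a smaller setup would most likely need different vLLM settings.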

## Set up
First, we need to create a vLLM config file. You can set any vLLM parameter in the config file. For a detailed description of the available options, please refer to the vLLM documentation: [https://docs.vllm.ai/en/v0.7.3/serving/openai_compatible_server.html#](https://docs.vllm.ai/en/v0.7.3/serving/openai_compatible_server.html#)

Note: Don't change the host and port parameters, or octotools will not be able to connect to the vLLM server :(
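
If you run into a connection error, a quick TCP check can tell you whether anything is listening where you expect. This is a minimal sketch using only the standard library. `server_is_reachable` is a hypothetical helper, and `localhost:8000` is an assumption: 8000 is vLLM's default serving port, but the address octotools actually connects to is not stated here.

```python
import socket


def server_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


# Assumption: the server listens on localhost:8000 (vLLM's default port).
print(server_is_reachable("localhost", 8000))
```

A `True` result only shows that the port accepts connections; it does not confirm that the model has finished loading.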

``vllm_config.yaml`` looks like this. We need to specify the model we are running and the tensor parallel size, which is set to 4 here to match the four GPUs.
```yaml
model: Qwen/Qwen2.5-VL-72B-Instruct
tensor-parallel-size: 4
```
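
If you prefer to create the file from a notebook cell, something like the following could work. It is a sketch only: `write_vllm_config` is a hypothetical helper (not part of octotools or vLLM), and it writes just the two keys shown above as plain text, without a YAML library.

```python
from pathlib import Path


def write_vllm_config(path: str, model: str, tensor_parallel_size: int) -> Path:
    """Write a two-key vLLM config file and return its path."""
    if tensor_parallel_size < 1:
        raise ValueError("tensor_parallel_size must be at least 1")
    lines = [
        f"model: {model}",
        f"tensor-parallel-size: {tensor_parallel_size}",
    ]
    config_path = Path(path)
    # One key per line, with a trailing newline at the end of the file.
    config_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return config_path
```

Calling `write_vllm_config("vllm_config.yaml", "Qwen/Qwen2.5-VL-72B-Instruct", 4)` would reproduce the file shown above.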

Now we can run our pipeline:

In [1]:
from octotools.solver import construct_solver

# Set the LLM engine name
model_name = "Qwen/Qwen2.5-VL-72B-Instruct"
llm_engine_name = f"vllm-{model_name}"

# Construct the solver
solver = construct_solver(
    llm_engine_name=llm_engine_name, 
    enabled_tools=["Generalist_Solution_Generator_Tool", "Image_Captioner_Tool", "Object_Detector_Tool"],
    verbose=True,
    vllm_config_path="vllm_config.yaml")


==> Initializing octotools...
Enabled tools: ['Generalist_Solution_Generator_Tool', 'Image_Captioner_Tool', 'Object_Detector_Tool']
LLM engine name: vllm-Qwen/Qwen2.5-VL-72B-Instruct

==> Setting up tools...
Loading tools and getting metadata...
Updated Python path: ['/workspace/new/octotools', '/workspace/new/octotools/octotools', '/workspace/new/octotools/examples/notebooks', '/root/miniforge3/envs/oct/lib/python310.zip', '/root/miniforge3/envs/oct/lib/python3.10', '/root/miniforge3/envs/oct/lib/python3.10/lib-dynload', '', '/root/miniforge3/envs/oct/lib/python3.10/site-packages', '__editable__.octotoolkit-0.3.0.finder.__path_hook__']

==> Attempting to import: tools.generalist_solution_generator.tool
Found tool class: Generalist_Solution_Generator_Tool
Metadata for Generalist_Solution_Generator_Tool: {'tool_name': 'Generalist_Solution_Generator_Tool', 'tool_description': 'A generalized tool that takes query from the user as prompt, and answers the question step by step to the best 

INFO 05-26 03:54:54 [__init__.py:239] Automatically detected platform cuda.


2025-05-26 03:54:55,803	INFO util.py:154 -- Missing packages: ['ipywidgets']. Run `pip install -U ipywidgets`, then restart the notebook server for rich notebook output.


Error instantiating Image_Captioner_Tool: Connection error.

==> Attempting to import: tools.object_detector.tool
CUDA_HOME is not set
Found tool class: Object_Detector_Tool
Metadata for Object_Detector_Tool: {'tool_name': 'Object_Detector_Tool', 'tool_description': 'A tool that detects objects in an image using the Grounding DINO model and saves individual object images with empty padding.', 'tool_version': '1.0.0', 'input_types': {'image': 'str - The path to the image file.', 'labels': 'list - A list of object labels to detect.', 'threshold': 'float - The confidence threshold for detection (default: 0.35).', 'model_size': "str - The size of the model to use ('tiny' or 'base', default: 'tiny').", 'padding': 'int - The number of pixels to add as empty padding around detected objects (default: 20).'}, 'output_type': 'list - A list of detected objects with their scores, bounding boxes, and saved image paths.', 'demo_commands': [{'command': 'execution = tool.execute(image="path/to/image.p

In [2]:
# Solve the user query
output = solver.solve(question="How many baseballs are there?", image_path="baseball.png")


==> 🔍 Received Query: How many baseballs are there?

==> 🖼️ Received Image: baseball.png

==> 🐙 Reasoning Steps from OctoTools (Deep Thinking...)

==> 🔍 Step 0: Query Analysis

### Analysis of the Query and Accompanying Inputs

#### Summary of Main Points and Objectives
The query asks for the total number of baseballs present in the provided image (`baseball.png`). The image shows four buckets, each containing several baseballs. The objective is to count all the baseballs visible in the image.

#### Required Skills
1. **Image Analysis**: The ability to analyze the image and identify distinct objects (in this case, baseballs) is crucial. This involves recognizing patterns and distinguishing between similar objects.
   - *Explanation*: This skill ensures accurate identification and counting of baseballs in the image.

2. **Object Counting**: The capability to count the number of identified objects precisely.
   - *Explanation*: After identifying the baseballs, this skill helps in provid

Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
Device set to use cuda


In [5]:
print(output["final_output"])

### Summary:
The query asks about the number of baseballs present in the image. The Object_Detector_Tool was used to detect and count the baseballs in the provided image. The tool identified 20 baseballs across different positions within the image.

### Detailed Analysis:
1. **Tool Execution**:
   - **Tool Used**: Object_Detector_Tool
   - **Purpose**: To detect and count the number of baseballs in the image.
   - **Key Results**: The tool identified 20 baseballs in various locations within the image.

2. **Step-by-Step Process**:
   - The Object_Detector_Tool was applied to the image "baseball.png".
   - The tool detected multiple instances of baseballs, each with a confidence score above 0.6.
   - The detected baseballs were saved as separate images for reference.

3. **Contribution to Query**:
   - The detection and counting process helped identify the total number of baseballs present in the image.
   - Each detected baseball was confirmed by its position and confidence score, ensu

In [6]:
print(output["direct_output"])

There are 20 baseballs in the image.


In [7]:
print(f"Step count: {output['step_count']} step(s)")
print(f"Execution time: {output['execution_time']} seconds")

Step count: 1 step(s)
Execution time: 7.19 seconds
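
The fields printed above can also be combined into a single summary line. The sketch below uses a hand-written dictionary that mirrors the values shown in this notebook instead of a live `solver.solve` result. `summarize_run` is a hypothetical helper, and it assumes that `step_count` and `execution_time` are numeric.

```python
def summarize_run(output: dict) -> str:
    """Format the answer, step count, and timing of a run as one line."""
    steps = output["step_count"]
    seconds = output["execution_time"]
    answer = output["direct_output"].strip()
    # Avoid dividing by zero if no steps were executed.
    per_step = seconds / steps if steps else float("nan")
    return f"{answer} ({steps} step(s), {seconds:.2f} s total, {per_step:.2f} s per step)"


# Stand-in for the `output` dictionary, using the values printed above.
example = {
    "direct_output": "There are 20 baseballs in the image.",
    "step_count": 1,
    "execution_time": 7.19,
}
print(summarize_run(example))
# prints "There are 20 baseballs in the image. (1 step(s), 7.19 s total, 7.19 s per step)"
```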
