In [7]:
# Remember to put your API keys in .env
import dotenv
dotenv.load_dotenv()

# Or, you can set the API keys directly
# import os
# os.environ["OPENAI_API_KEY"] = "your_api_key"

True
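`load_dotenv()` returning `True` only means a `.env` file was found. It does not confirm that the key itself was set. A small standard-library check can fail fast before the solver is built. This is a minimal sketch: `require_api_key` is a hypothetical helper, not part of octotools, and `OPENAI_API_KEY` is the variable name used in the commented-out example above.

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for this notebook; octotools may do its own check.
    """
    # Treat an empty or whitespace-only value the same as a missing key.
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to .env or set os.environ[{name!r}]"
        )
    return value
```

Calling `require_api_key()` right after `dotenv.load_dotenv()` surfaces a missing key here rather than as an authentication error partway through `solver.solve(...)`.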

In [8]:
from octotools.solver import construct_solver

# Set the LLM engine name
llm_engine_name = "gpt-4o"

# Construct the solver
solver = construct_solver(
    llm_engine_name=llm_engine_name,
    enabled_tools=["Generalist_Solution_Generator_Tool", "Image_Captioner_Tool", "Object_Detector_Tool"],
    verbose=True,
)


==> Initializing octotools...
Enabled tools: ['Generalist_Solution_Generator_Tool', 'Image_Captioner_Tool', 'Object_Detector_Tool']
LLM engine name: gpt-4o

==> Setting up tools...
Loading tools and getting metadata...
Updated Python path: ['/root/Projects/octotools', '/root/Projects/octotools/octotools', '/root/Projects/octotools', '/root/Projects/octotools/octotools', '/root/Projects/octotools', '/root/Projects/octotools/octotools', '/opt/conda/envs/octotools/lib/python310.zip', '/opt/conda/envs/octotools/lib/python3.10', '/opt/conda/envs/octotools/lib/python3.10/lib-dynload', '', '/opt/conda/envs/octotools/lib/python3.10/site-packages', '/root/Projects/octotools', '/opt/conda/envs/octotools/lib/python3.10/site-packages/setuptools/_vendor', '/tmp/tmph0ckjxdz']

==> Attempting to import: tools.generalist_solution_generator.tool
Found tool class: Generalist_Solution_Generator_Tool
Metadata for Generalist_Solution_Generator_Tool: {'tool_name': 'Generalist_Solution_Generator_Tool', 'too

In [9]:
# Solve the user query
output = solver.solve(question="How many baseballs are there?", image_path="baseball.png")


==> 🔍 Received Query: How many baseballs are there?

==> 🖼️ Received Image: baseball.png

==> 🐙 Reasoning Steps from OctoTools (Deep Thinking...)

==> 🔍 Step 0: Query Analysis

Concise Summary: The query asks to determine the number of baseballs in the provided image.

Required Skills:
Image analysis and object counting skills are needed to accurately identify and count the baseballs in the image.

Relevant Tools:
Object_Detector_Tool: This tool can detect and count the baseballs in the image by identifying objects labeled as 'baseball.'

Additional Considerations:
Ensure the confidence threshold is set appropriately to accurately detect all baseballs. Consider using the 'base' model size for better accuracy if needed.
[Time]: 5.38s

==> 🎯 Step 1: Action Prediction (Object_Detector_Tool)

[Context]: Image path: "baseball.png"
[Sub Goal]: Detect and count the number of baseballs in the image "baseball.png" using the Object_Detector_Tool with appropriate settings for confidence threshol

In [10]:
print(output["final_output"])

### 1. Summary:
The query was to determine the number of baseballs in the provided image. Using an object detection tool, a total of 20 baseballs were identified in the image.

### 2. Detailed Analysis:
- **Step 1: Tool Utilization**
  - **Tool Used:** Object_Detector_Tool
  - **Purpose:** To detect and count the number of baseballs in the image "baseball.png."
  - **Execution:** The tool was executed with settings to identify objects labeled as "baseball."
  - **Results:** The tool detected 20 baseballs with varying confidence scores ranging from 0.64 to 0.69.

- **Step 2: Result Compilation**
  - Each detected baseball was associated with a bounding box and a confidence score.
  - The results were compiled to ensure all detected objects were indeed baseballs.

### 3. Key Findings:
- A total of 20 baseballs were detected in the image.
- The confidence scores for the detections were consistently high, indicating reliable identification.

### 4. Answer to the Query:
The image contains 2

In [11]:
print(output["direct_output"])

To determine the number of baseballs in the image, the following steps were taken:

1. **Image Analysis**: The image was analyzed using an object detection tool to identify and count the baseballs.

2. **Detection Process**: The tool was set to detect objects labeled as 'baseball' with an appropriate confidence threshold.

3. **Results**: The tool identified and counted 20 baseballs in the image.

**Conclusion**: There are 20 baseballs in the image.
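The count depends on the confidence threshold applied to the detector's results. The reported scores fall in a narrow band, 0.64 to 0.69, so a threshold inside that band would change the answer. The sketch below shows the idea of counting one label above a threshold. It assumes a hypothetical `Detection` record (label, score, box). This is not the actual output format of `Object_Detector_Tool`.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    # Hypothetical record shape; the real tool's output format may differ.
    label: str
    score: float
    box: tuple[int, int, int, int]  # assumed (x1, y1, x2, y2) pixel convention


def count_label(detections: list[Detection], label: str, threshold: float = 0.5) -> int:
    """Count detections with the given label whose score is at least `threshold`."""
    return sum(1 for d in detections if d.label == label and d.score >= threshold)


# Illustrative data only, not the detections from baseball.png.
dets = [
    Detection("baseball", 0.69, (10, 10, 40, 40)),
    Detection("baseball", 0.64, (50, 10, 80, 40)),
    Detection("glove", 0.90, (0, 60, 60, 120)),
]
print(count_label(dets, "baseball", threshold=0.60))  # 2
print(count_label(dets, "baseball", threshold=0.65))  # 1
```

With scores clustered this tightly, raising the threshold from 0.60 to 0.65 halves the count in this toy example. It is worth checking how sensitive the reported total is to that setting.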


In [12]:
print(f"Step count: {output['step_count']} step(s)")
print(f"Execution time: {output['execution_time']} seconds")

Step count: 1 step(s)
Execution time: 20.22 seconds
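The last two cells read `step_count` and `execution_time` directly from the `output` dict. When many queries are run in a row, a one-line summary that does not raise on a missing key can be more convenient. This is a sketch that assumes only the keys used in this notebook. `summarize_run` is a hypothetical helper, not an octotools function.

```python
def summarize_run(output: dict) -> str:
    """Build a one-line summary from a solver output dict.

    Assumes the "step_count" and "execution_time" keys seen in this notebook;
    .get() is used so that a missing key yields a placeholder instead of KeyError.
    """
    steps = output.get("step_count", "?")
    seconds = output.get("execution_time")
    # Format the time only when it is numeric; otherwise report "n/a".
    time_str = f"{seconds:.2f}s" if isinstance(seconds, (int, float)) else "n/a"
    return f"{steps} step(s) in {time_str}"
```

For the run above, `summarize_run(output)` would give `"1 step(s) in 20.22s"`, assuming `execution_time` is stored as a number rather than a string.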
