Skip to content

Conversation

@YassinNouh21
Copy link
Contributor

Related Issues

Proposed Changes:

The ComponentTool class was incorrectly handling complex types like dictionaries and unions when generating parameter schemas. Specifically:

  1. Fixed the schema generation for dictionary types (Dict[str, Any]) by:

    • Adding proper "type": "object" schema
    • Setting "additionalProperties": true to allow arbitrary key-value pairs
  2. Added proper handling for Union types by:

    • Implementing oneOf schema generation for Union types
    • Correctly handling Union[Dict[str, Any], List[Dict[str, Any]]] type hints
    • Removing redundant descriptions in nested schemas

These changes ensure that tools like HTMLToDocument's parameters are correctly represented in the schema, allowing LLMs to properly understand and use the parameters.

How did you test it?

  1. Added a new comprehensive unit test test_from_component_with_complex_types that verifies:

    • Correct schema generation for dictionary parameters
    • Proper handling of Union types with both dict and list[dict] variants
    • Parameter validation during tool invocation
    • Successful execution with both dictionary and list inputs
  2. Ran all existing tests to ensure no regressions

  3. Verified pre-commit hooks pass including:

    • Type checking
    • Code formatting
    • Linting

Notes for the reviewer

Key areas to review:

  • The _create_property_schema method in component_tool.py where the main changes were made
  • The new test case in test_component_tool.py that verifies the fix
  • The handling of Union types with the oneOf schema generation

Checklist

@YassinNouh21 YassinNouh21 requested a review from a team as a code owner April 24, 2025 12:24
@YassinNouh21 YassinNouh21 requested review from anakin87 and removed request for a team April 24, 2025 12:24
@anakin87 anakin87 requested a review from sjrl April 24, 2025 12:27
f"component's run method."
)
elif python_type is dict or (origin is dict):
schema = {"type": "object", "description": description, "additionalProperties": True}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anakin87 do you happen to know if by adding additionalProperties here we run the risk of breaking ChatGenerators that don't support it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure - it might be better to avoid it if not strictly necessary

I found this: https://platform.openai.com/docs/guides/structured-outputs/additionalproperties-false-must-always-be-set-in-objects

Anyway, it might be a good idea to involve @vblagoje who worked much on this, if we are not in a rush

Copy link
Contributor

@sjrl sjrl Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on the structured output or if you set the JSON response to strict. But that might be a separate issue we need to open where if structured output or strict is set in OpenAIChatGenerator then it should probably go in and update all additionalProperties to false. I think this would need to be in the ChatGenerator since the ComponenTool (or any Tool) can't know yet if additionalProperties must be set to False.

f"Pydantic models (e.g. {python_type.__name__}) are not supported as input types for "
f"component's run method."
)
elif python_type is dict or (origin is dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are handling dict here now could we also remove dict from the _create_basic_type_schema function?

Comment on lines +363 to +376
# Check the parameter schema
assert "meta" in tool.parameters["properties"]
assert "extraction_kwargs" in tool.parameters["properties"]

# Meta should be oneOf with both object and array options
meta_schema = tool.parameters["properties"]["meta"]
assert meta_schema["description"].startswith("Optional metadata")
assert "oneOf" in meta_schema

# extraction_kwargs should be an object
kwargs_schema = tool.parameters["properties"]["extraction_kwargs"]
assert kwargs_schema["type"] == "object"
assert "additionalProperties" in kwargs_schema
assert kwargs_schema["additionalProperties"] is True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the additional comments and individual asserts, but could we move update this to a full dict comparison. So

assert tool.parameters["properties"] == {...}

I think that would make it easier to judge at a glance.

Copy link
Contributor Author

@YassinNouh21 YassinNouh21 Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better for readability. But no problem, I will do it

@github-actions github-actions bot added the type:documentation Improvements on the docs label Apr 24, 2025
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
Comment on lines +444 to +448
if origin is dict and args and args[0] is str and args[1] is Any:
if description and "meta" in description.lower():
schema = {"type": "string", "description": description}
else:
schema = {"type": "object", "description": description, "additionalProperties": True}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity why do we need this special case?

Copy link
Contributor Author

@YassinNouh21 YassinNouh21 Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! The reason we single out pure Dict[str, Any] parameters whose description mentions “meta” is that metadata blobs by definition have an arbitrary, unknown shape, and we don’t want to bake an open-ended object schema (with unpredictable keys) into our function spec.

  • If I treated metadata as a normal object with additionalProperties: true, under tools_strict I would have to enumerate every possible metadata key (which we can’t know ahead of time), and LLMs often struggle to fill such free-form nested schemas.
  • All other dicts (that aren’t metadata) still fall back to an object schema with additionalProperties: true.

@YassinNouh21
Copy link
Contributor Author

@sjrl — Could you please take a look and let me know if this now aligns with your requirements? I’m happy to tweak it further if needed.

@anakin87 anakin87 removed their request for review May 2, 2025 08:55
@sjrl
Copy link
Contributor

sjrl commented May 15, 2025

Hey @YassinNouh21 thanks for your work on this! However, in the end we opted to go with a different approach outlined here #9342 which uses pydantic to handle the automatic schema generation rather than reimplementing this ourselves.

@sjrl sjrl closed this May 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

topic:tests type:documentation Improvements on the docs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The auto construction of Tool parameters in ComponentTool does not work for more complex types

3 participants