Preserve docstrings from underlying pipeline for SuperComponent runtime input parameters #9291

@sjrl

It would be really helpful if SuperComponents could retain the underlying docstrings for the different input parameters that result from the wrapped pipeline.

For example, consider this web search component tool:

```python
import os

os.environ["SERPERDEV_API_KEY"] = "fake-key"

from haystack.components.converters.html import HTMLToDocument
from haystack.components.fetchers.link_content import LinkContentFetcher
from haystack.components.websearch.serper_dev import SerperDevWebSearch
from haystack.core.pipeline import Pipeline

from haystack.core.super_component import SuperComponent
from haystack.tools import ComponentTool


search_pipeline = Pipeline()

search_pipeline.add_component("search", SerperDevWebSearch(top_k=10))
search_pipeline.add_component("fetcher", LinkContentFetcher(timeout=3, raise_on_failure=False, retry_attempts=2))
search_pipeline.add_component("converter", HTMLToDocument())

search_pipeline.connect("search.links", "fetcher.urls")
search_pipeline.connect("fetcher.streams", "converter.sources")


search_component = SuperComponent(
    pipeline=search_pipeline,
    input_mapping={"query": ["search.query"], "extraction_kwargs": ["converter.extraction_kwargs"]},
    output_mapping={"converter.documents": "documents"}
)
search_tool = ComponentTool(
    name="search", description="Use this tool to search for information on the internet.", component=search_component
)

print(search_tool.parameters)
# {
#    "type": "object",
#    "properties": {
#        "query": {"type": "string", "description": "Input 'query' for the component."},
#        "extraction_kwargs": {"type": "string", "description": "Input 'extraction_kwargs' for the component."},
#    },
#    "required": ["query"],
# }
```

We can see that the auto-generated descriptions for the parameters are not descriptive and don't help the LLM understand what values to pass.

The only reason this ends up working well is that we manually provide a description for the tool ("Use this tool to search for information on the internet."), so the LLM is able to infer that query is meant for searching the web.

If we use the SerperDevWebSearch component directly in ComponentTool, we get:

```python
print(ComponentTool(name="serper", component=SerperDevWebSearch(top_k=10)).description)
# Uses [Serper](https://serper.dev/) to search the web for relevant documents.
# See the [Serper Dev website](https://serper.dev/) for more details.
# ...

print(ComponentTool(name="serper", component=SerperDevWebSearch(top_k=10)).parameters)
# {'type': 'object', 'properties': {'query': {'type': 'string', 'description': 'Search query.'}}, 'required': ['query']}
```

Workaround

Of course, we can work around this by manually passing parameters to ComponentTool, but it would be better if the auto-generation could use the docstrings from SerperDevWebSearch and HTMLToDocument for query and extraction_kwargs, respectively.
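
To make the request concrete, here is a rough, stdlib-only sketch of the lookup I have in mind. It is not Haystack code: `FakeWebSearch`, `FakeConverter`, `param_docs`, and `describe_inputs` are hypothetical names made up for illustration, and the sketch assumes the wrapped components document their `run` inputs with `:param name:` entries. The example docstring text in the stand-in classes is also invented. Each wrapper input is resolved through `input_mapping` to the underlying `component.parameter`, and the generic description is used only when no docstring entry is found.

```python
import inspect
import re
from typing import Any, Dict, List, Optional


# Hypothetical stand-ins for pipeline components (not the real Haystack classes).
class FakeWebSearch:
    def run(self, query: str) -> Dict[str, Any]:
        """
        Search the web.

        :param query: Search query.
        """
        return {"links": []}


class FakeConverter:
    def run(self, sources: List[str], extraction_kwargs: Optional[dict] = None) -> Dict[str, Any]:
        """
        Convert HTML sources to documents.

        :param sources: List of HTML sources.
        :param extraction_kwargs: Extra keyword arguments to customize
            the extraction process.
        """
        return {"documents": []}


# Matches ":param name: text" up to the next ":field:" line or the end of the docstring.
PARAM_PATTERN = re.compile(r":param\s+(\w+):\s*(.+?)(?=\n\s*:|\Z)", re.DOTALL)


def param_docs(run_method: Any) -> Dict[str, str]:
    """Map parameter names to their docstring descriptions, with whitespace normalized."""
    doc = inspect.getdoc(run_method) or ""
    return {name: " ".join(text.split()) for name, text in PARAM_PATTERN.findall(doc)}


def describe_inputs(input_mapping: Dict[str, List[str]], components: Dict[str, Any]) -> Dict[str, str]:
    """Resolve each wrapper input to the docstring text of the inputs it maps to."""
    descriptions: Dict[str, str] = {}
    for wrapper_name, targets in input_mapping.items():
        found: List[str] = []
        for target in targets:
            component_name, param_name = target.split(".", 1)
            docs = param_docs(components[component_name].run)
            if param_name in docs and docs[param_name] not in found:
                found.append(docs[param_name])
        # Fall back to the generic text when no docstring entry exists.
        descriptions[wrapper_name] = " ".join(found) or f"Input '{wrapper_name}' for the component."
    return descriptions


components = {"search": FakeWebSearch(), "converter": FakeConverter()}
input_mapping = {"query": ["search.query"], "extraction_kwargs": ["converter.extraction_kwargs"]}
descriptions = describe_inputs(input_mapping, components)
print(descriptions)
```

Until something like this is built in, the same descriptions could be copied by hand into a JSON schema and passed to ComponentTool manually, which is the workaround described above. When one wrapper input maps to several component inputs, this sketch simply concatenates the distinct descriptions; that is one possible choice, not a proposal for the final behavior.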

Labels: P1 (High priority, add to the next sprint)
