Description
It would be really helpful if SuperComponents could retain the underlying docstrings of the input parameters that are exposed from the wrapped pipeline.
For example, consider this web search component tool:
import os
os.environ["SERPERDEV_API_KEY"] = "fake-key"
from haystack.components.converters.html import HTMLToDocument
from haystack.components.fetchers.link_content import LinkContentFetcher
from haystack.components.websearch.serper_dev import SerperDevWebSearch
from haystack.core.pipeline import Pipeline
from haystack.core.super_component import SuperComponent
from haystack.tools import ComponentTool
search_pipeline = Pipeline()
search_pipeline.add_component("search", SerperDevWebSearch(top_k=10))
search_pipeline.add_component("fetcher", LinkContentFetcher(timeout=3, raise_on_failure=False, retry_attempts=2))
search_pipeline.add_component("converter", HTMLToDocument())
search_pipeline.connect("search.links", "fetcher.urls")
search_pipeline.connect("fetcher.streams", "converter.sources")
search_component = SuperComponent(
    pipeline=search_pipeline,
    input_mapping={"query": ["search.query"], "extraction_kwargs": ["converter.extraction_kwargs"]},
    output_mapping={"converter.documents": "documents"},
)
search_tool = ComponentTool(
    name="search",
    description="Use this tool to search for information on the internet.",
    component=search_component,
)
print(search_tool.parameters)
# {
#     "type": "object",
#     "properties": {
#         "query": {"type": "string", "description": "Input 'query' for the component."},
#         "extraction_kwargs": {"type": "string", "description": "Input 'extraction_kwargs' for the component."},
#     },
#     "required": ["query"],
# }
We can see that the auto-generated descriptions for the parameters are generic and don't help the LLM understand what values belong there.
The only reason this ends up working well is that we manually provide a description for the Tool, "Use this tool to search for information on the internet.", so the LLM is able to infer that query is meant for searching the web.
If we use the SerperDevWebSearch component directly in ComponentTool, we get:
print(ComponentTool(name="serper", component=SerperDevWebSearch(top_k=10)).description)
# Uses [Serper](https://serper.dev/) to search the web for relevant documents.
# See the [Serper Dev website](https://serper.dev/) for more details.
# ...
print(ComponentTool(name="serper", component=SerperDevWebSearch(top_k=10)).parameters)
# {'type': 'object', 'properties': {'query': {'type': 'string', 'description': 'Search query.'}}, 'required': ['query']}
Workaround
Of course, we can work around this by manually passing parameters to ComponentTool, but it would be better if the auto-generation could use the docstrings from SerperDevWebSearch and HTMLToDocument for query and extraction_kwargs, respectively.
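To make the request concrete, here is a minimal sketch of the lookup being asked for. It is not Haystack's implementation. The classes FakeWebSearch and FakeConverter are hypothetical stand-ins for the wrapped components, and the helpers param_descriptions and wrapped_descriptions are made-up names. The sketch assumes the components document their run() parameters with ":param name:" entries, and the docstring text used here is illustrative only.

```python
import inspect
import re


def param_descriptions(func):
    """Extract ':param name: text' entries from a function's docstring."""
    doc = inspect.getdoc(func) or ""
    # Capture the text after ':param name:' up to the next ':field' line or the end.
    pattern = re.compile(r":param\s+(\w+):\s*(.*?)(?=\n\s*:\w|\Z)", re.DOTALL)
    return {name: " ".join(text.split()) for name, text in pattern.findall(doc)}


class FakeWebSearch:
    """Hypothetical stand-in for a web search component."""

    def run(self, query):
        """
        Search the web.

        :param query: Search query.
        :returns: Links and documents.
        """


class FakeConverter:
    """Hypothetical stand-in for an HTML converter component."""

    def run(self, sources, extraction_kwargs=None):
        """
        Convert HTML sources to documents.

        :param sources: HTML sources to convert.
        :param extraction_kwargs: Additional keyword arguments to customize the extraction process.
        :returns: Converted documents.
        """


def wrapped_descriptions(components, input_mapping):
    """Resolve each wrapper input to the docstring of the parameter it maps to."""
    result = {}
    for wrapper_name, targets in input_mapping.items():
        found = []
        for target in targets:
            component_name, param_name = target.split(".", 1)
            docs = param_descriptions(components[component_name].run)
            if param_name in docs and docs[param_name] not in found:
                found.append(docs[param_name])
        # Fall back to the current generic text when no docstring entry exists.
        result[wrapper_name] = " ".join(found) or f"Input '{wrapper_name}' for the component."
    return result


components = {"search": FakeWebSearch(), "converter": FakeConverter()}
input_mapping = {"query": ["search.query"], "extraction_kwargs": ["converter.extraction_kwargs"]}
descriptions = wrapped_descriptions(components, input_mapping)
print(descriptions)
```

The mapping mirrors the input_mapping passed to SuperComponent above, so each wrapper input picks up the description of the socket it feeds. When an input maps to several sockets, this sketch simply joins the distinct descriptions; a real implementation might choose differently.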