
Support file upload #22

Open
masci opened this issue May 25, 2024 · 1 comment

Comments

@masci
Contributor

masci commented May 25, 2024

At the moment it's extremely hard to use Hayhooks for indexing pipelines, as they either accept:

Since Hayhooks controls the request payload for pipeline endpoints, one possible solution is to accept multipart form data whenever a pipeline input is of type Path or ByteStream. Hayhooks would receive the uploaded file and either store it temporarily on the server or pass the bytes to the pipeline on the fly.
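A minimal sketch of the conversion step described above, assuming the multipart parsing has already produced a filename, the raw bytes, and an optional content type. `convert_upload` and `ByteStreamLike` are hypothetical names: `ByteStreamLike` is a stand-in for Haystack's ByteStream so the sketch runs without Haystack installed, and none of this is existing Hayhooks API.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, Optional


# Hypothetical stand-in for Haystack's ByteStream (assumption, not the real class).
@dataclass
class ByteStreamLike:
    data: bytes
    mime_type: Optional[str] = None
    meta: Dict[str, Any] = field(default_factory=dict)


def convert_upload(filename: str, content: bytes, target_type: type,
                   content_type: Optional[str] = None) -> Any:
    """Turn one uploaded file into the value a pipeline input expects (hypothetical helper)."""
    if target_type is Path:
        # Path inputs: store the bytes in a temporary file on the server.
        # Keep the original extension so format-sniffing converters still work.
        suffix = Path(filename).suffix
        with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
            tmp.write(content)
        # delete=False means the caller must remove the file after the pipeline runs.
        return Path(tmp.name)
    if target_type is ByteStreamLike:
        # ByteStream-like inputs: pass the bytes through in memory, no disk I/O.
        return ByteStreamLike(data=content, mime_type=content_type,
                              meta={"file_name": filename})
    raise TypeError(f"Unsupported input type for file upload: {target_type!r}")
```

For example, `convert_upload("report.txt", b"hello", Path)` returns the path of a temporary `.txt` file containing `b"hello"`. Writing to disk suits converters that expect a path; the in-memory branch avoids the cleanup burden for components that accept raw bytes.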

@OscarIntellico

I confirm this issue.

I have an indexing pipeline. The pipeline accepts documents and then indexes them into an Elasticsearch database.

I created a test pipeline with a single DocumentCleaner that expects a list of documents and I have the following two problems:

  1. The /docs endpoint fails with an "Internal Server Error /openapi.json" error. The server logs show the following error:

pydantic.errors.PydanticInvalidForJsonSchema: Cannot generate a JsonSchema for core_schema.IsInstanceSchema (<class 'pandas.core.frame.DataFrame'>)

  2. When I call the endpoint localhost:1416/doc_cleaner with a list of documents, using either curl or Python requests, I get this error:

TypeError: DocumentCleaner expects a List of Documents as input.
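One plausible cause of the second error (an assumption, not confirmed in this thread): a JSON request body is deserialized into plain dicts, so an isinstance guard on the component rejects them before any cleaning happens. The sketch below reproduces that failure mode with a hypothetical `DocumentLike` stand-in for Haystack's Document and a guard of the same shape as the reported message; `cleaner_input_check` and `documents_from_json` are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


# Hypothetical stand-in for haystack.Document (only two fields modelled).
@dataclass
class DocumentLike:
    content: Optional[str] = None
    meta: Dict[str, Any] = field(default_factory=dict)


def cleaner_input_check(documents: Any) -> None:
    # A guard of this shape raises the reported error when given plain dicts.
    if not isinstance(documents, list) or (
        documents and not isinstance(documents[0], DocumentLike)
    ):
        raise TypeError("DocumentCleaner expects a List of Documents as input.")


def documents_from_json(payload: List[Dict[str, Any]]) -> List[DocumentLike]:
    # Rebuild document objects from the JSON dicts before running the component.
    return [DocumentLike(content=d.get("content"), meta=d.get("meta", {}))
            for d in payload]
```

Passing the raw payload `[{"content": " hello "}]` to `cleaner_input_check` raises the TypeError, while passing `documents_from_json(payload)` does not. If this is the cause, the fix belongs in the endpoint's request-to-input conversion rather than in the client call.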
