Extend TransformersQueryClassifier
#2903
* Update docs of DeepsetCloudDocumentStore * Update Documentation & Code Style * Update docstring Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com> * Update Documentation & Code Style * move DEFAULT_API_ENDPOINT Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
* ignore mypy issues regarding files param of requests.post * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Add support for aliases in elasticsearch document store * Add alias support for OpenSearch * Missing variable index * Update Documentation & Code Style * Add unit test for elasticsearch alias support * Fix unit test when index is not compatible with haystack * Fix auto format conflict * Add comment explaining for loop for alias * Update Documentation & Code Style Co-authored-by: Jonathan Gallon <jonathan.gallon@totalenergies.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai>
* Linearize tables in EmbeddingRetriever * Update Documentation & Code Style * Fix typing * Update Documentation & Code Style * simplify table linearization method + make it private * Update Documentation & Code Style * fix typing Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Renaming the ElasticsearchFilterOnlyRetriever to FilterRetriever * adding missed init file * Update Documentation & Code Style * fixed docstring * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* add scale_scores_to_probabilities flag * Update Documentation & Code Style * fix tests * fix sql mypy * Update Documentation & Code Style * fix responses * Update Documentation & Code Style * rename to scale_score_to_probability + docstrings * use BaseDocumentStore.score_to_probability in elasticsearch and milvus2 * Update Documentation & Code Style * fix tests * Update Documentation & Code Style * add tests * improve naming * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* add support for positional args in pipeline.get_config() * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…2487) * changing the name of the retrievers from es_retriever to retriever * Update Documentation & Code Style * name fix 2 * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This Update will fix this exception `Exception: pdftotext is not installed. It is part of xpdf or poppler-utils software suite. ` Now, converting PDFs wouldn't have any issues.
* Align TransformersReader defaults with vFARMReader * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…TF-8` (#2420) * Change default encoding for PDFToTextConverter * Update Documentation & Code Style * Improve docstring * Update Documentation & Code Style * Add list of ligatures to ignore and add the possibility to modify such list at need * Add docstring * Add tests * Rename parameter * Update Documentation & Code Style * Move implementation into the base converter to make mypy happier * Update Documentation & Code Style * mypy and pylint * mypy * move encoding parameter to init of PDFToTextConverter * Update Documentation & Code Style * make utf8 default and fix mypy * Update Documentation & Code Style * Update Documentation & Code Style * remove note on encoding in tutorial8 * Update Documentation & Code Style * skip OCRConverter and test converter.run * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai>
* Remove BasePipeline and make a module for RayPipeline * Can load pipelines from yaml, plenty of issues left * Extract graph validation logic into _add_node_to_pipeline_graph & refactor load_from_config and add_node to use it * Fix pipeline tests * Move some tests out of test_pipeline.py and create MockDenseRetriever * myoy and pylint (silencing too-many-public-methods) * Fix issue found in some yaml files and in schema files * Fix paths to YAML and fix some typos in Ray * Fix eval tests * Simplify MockDenseRetriever * Fix Ray test * Accidentally pushed merge coinflict, fixed * Typo in schemas * Typo in _json_schema.py * Slightly reduce noisyness of version validation warnings * Fix version logs tests * Fix version logs tests again * remove seemingly unused file * Add check and test to avoid adding the same node to the pipeline twice * Update Documentation & Code Style * Revert config to pipeline_config * Remo0ve unused import * Complete reverting to pipeline_config * Some more stray config= * Update Documentation & Code Style * Feedback * Move back other_nodes tests into pipeline tests temporarily * Update Documentation & Code Style * Fixing tests * Update Documentation & Code Style * Fixing ray and standard pipeline tests * Rename colliding load() methods in dense retrievers and faiss * Update Documentation & Code Style * Fix mypy on ray.py as well * Add check for no root node * Fix tests to use load_from_directory and load_index * Try to workaround the disabled add_node of RayPipeline * Update Documentation & Code Style * Fix Ray test * Fix FAISS tests * Relax class check in _add_node_to_pipeline_graph * Update Documentation & Code Style * Try to fix mypy in ray.py * unused import * Try another fix for Ray * Fix connector tests * Update Documentation & Code Style * Fix ray * Update Documentation & Code Style * use BaseComponent.load() in pipelines/base.py * another round of feedback * stray BaseComponent.load() * Update Documentation & Code Style * Fix FAISS 
tests too Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
* Upgrade xpdf to 4.04 in Exception text * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* change milvus links from 2.0.0 to 2.0.x * Update Documentation & Code Style * fix two broken links * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…talled (#2486) * automatically download correct torch-scatter version * raise error if torch-scatter is not installed * Update Documentation & Code Style * catch all import errors and fix linter * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
) * replace TableTextRetriever with EmbeddingRetriever in Tutorial 15 * Update Documentation & Code Style * fix bug * Update Documentation & Code Style * update tutorial 15 outputs Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-20-212.eu-west-1.compute.internal>
* delete unneeded files of last release * add v1.4.0 docs with updated links * upgrade version number * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update version to 1.4.1rc0 * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* add_member_class_prefix: true * Update Documentation & Code Style * Trigger redeploy * Trigger redeploy * Fix pydoc param * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Upgrade transformers version to 4.18.0 * Adapt tokenization test to upgrade * Adapt tokenization test to upgrade
* fix small typo in Document doc string Was going through the tutorial, then digging through the code and just noticed a small typo * generate markdown file changes from docstrings Co-authored-by: Julian Risch <julian.risch@deepset.ai>
* Update version to 1.4.1rc0 * Elasticsearch is not an optional dependency * Fix import path * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Update version to 1.4.1rc0 * Add hint of enabling action on the fork in the PR template * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Add sort arg to JoinAnswers * Update Documentation & Code Style * Change naming and docstring * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
…loyments (#2918) * Extending the Ray Serve integration to allow attributes for Serve deployments This closes #2917 We should be able to set Ray Serve attributes for the nodes of pipelines, like amount of GPU to use, max_concurrent_queries, etc. Now this is possible from the pipeline yaml file for each node of the pipeline. * Ran black and regenerated the json schemas * Fixing the JSON Schema generation * Trying to fix the schema CI test issue * Fixing the test and the schemas Python 3.8 was generating a different schema than Python 3.7 is creating in the CI. You MUST use Python 3.7 to generate the schemas, otherwise the CIs will fail. * Merge the two Ray pipeline test cases * Generate the JSON schemas again after `$ pip install .[all]` * Removing `haystack/json-schemas/haystack-pipeline-1.16.schema.json` This was generated by the JSON generator, but based on @ZanSara's instructions, I am removing it. * Making changes based on @ZanSara's request - the newly requested test is failing * Fixing the JSON schema generation again * Renaming `replicas` and moving it under `serve_deployment_kwargs` * add extras validation, untested * Dcoumentation update * Black * [EMPTY] Re-trigger CI Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
I like this approach a lot! Very clear and flexible. Thanks for the tests especially 😊 I would add a few more of them to test unhappy paths too (wrong labels given, no label, non-existent task names, etc...) but in general the PR is sound. Thank you! 🙌
def _get_edge_number(self, label):
    return self.labels.index(label) + 1
We're about to merge a PR that improves the management of nodes with a dynamic number of output edges: #2850. Can you wait for it to be merged and adapt the code to use the new system? It's going to be merged today or tomorrow anyway.
This little method (beautifully written by you 😃) simply matches a label to a specific output edge.
Does #2850 impact this? How?
I will try to figure out what to do for the CLAssistant bot in the meantime 😅
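For readers following the thread, here is a minimal sketch of how a label-to-edge mapping like `_get_edge_number` might behave. The `LabelRouter` class, its `route` method, and the `output_N` edge naming are hypothetical stand-ins for illustration, not the PR's actual implementation; only `_get_edge_number` mirrors the snippet under review.

```python
from typing import List, Tuple


class LabelRouter:
    """Hypothetical stand-in for a classifier node with one output edge per label."""

    def __init__(self, labels: List[str]):
        if not labels:
            raise ValueError("At least one label is required.")
        self.labels = labels

    def _get_edge_number(self, label: str) -> int:
        # Edges are numbered from 1, so the label at list index 0 maps to edge 1.
        # list.index raises ValueError if the label is unknown.
        return self.labels.index(label) + 1

    def route(self, label: str) -> Tuple[dict, str]:
        # Return an empty payload plus the name of the edge to follow
        # (the "output_N" naming is an assumption made for this sketch).
        return {}, f"output_{self._get_edge_number(label)}"


router = LabelRouter(labels=["LABEL_0", "LABEL_1", "LABEL_2"])
edge = router.route("LABEL_2")[1]
```

Because `list.index` raises `ValueError` for a label that is not in the list, an unexpected label fails loudly here rather than being routed silently, which is one of the unhappy paths mentioned in the review.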
* Tutorial 06: Replace DPR with EmbeddingRetriever Closes #2887 * Add updated tutorials/6.md file Replace `DensePassageRetriever` with `EmbeddingRetriever` * Update Tutorial 06 based on PR feedback * Further updates to Tutorial-06 according to review feedback * [Tutorial 06] Put in review feedback for the py file
* Ability to run Ray Serve detached Fixes #2944 Ability to run Ray Serve detached - to allow running multiple instances of the app (HA). See https://docs.ray.io/en/latest/serve/package-ref.html#core-apis * Generating the docs * Re-trigger the CI pipeline * Retrigger the CI Pipeline * Typo in docstrings * Fixing docstring and typing issues * Regenerating docs * [EMPTY] Re-trigger CI * [EMPTY] Re-trigger CI * Refactoring to allow any number of args for the `serve.start()` method There seems to be additional arguments of the `serve.start()` method, so we should probably cover all of them at once, instead of only the `detached` option. * [EMPTY] Re-trigger CI * Test whether the ServeControllerClient in fact has the supplied `detached` parameter
) * fixed tokens in question generation * simplified assignment * same behavior also for pad and eos * use skip_special_tokens in batch_decode * fixed black error and update docs * fixed schemas ci error * JSON schemas * Add git diff to debug schema issues * opensearch schema was missing * Add missing instruction in the workflow error message * typo
One way around this is to squash all commits into a single one, so the history won't have any trace of the non-existing email address and CLAssistant will stop bothering. There are a few ways to do that, let me know if you need help 🙂
* enable Opensearch unit tests under Win * move unit tests into a dedicated job * skip audio tests on missing dependencies * avoid failing test collection when soundfile is not available * Update .github/workflows/tests.yml Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
…rsQueryClassifier
…om/anakin87/haystack into extend_TransformersQueryClassifier
* fix validation for dynamic outgoing edges * Update Documentation & Code Style * use class outgoing_edges as fallback if no instance is provided * implement classmethod approach * readd comment * fix mypy * fix tests * set outgoing_edges for all components * set outgoing_edges for mocks too * set document store outgoing_edges to 1 * set last missing outgoing_edges * enforce BaseComponent subclasses to define outgoing_edges * override _calculate_outgoing_edges for FileTypeClassifier * remove superfluous test * set rest_api's custom component's outgoing_edges * Update docstring Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai> * remove unnecessary else Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
…rsQueryClassifier
Sorry, I made a big mess with git. 🙁
Related Issue(s): #2587
Proposed changes:
Extend TransformersQueryClassifier to support text classification models (even with non-binary output) and also zero-shot classification.
Pre-flight checklist
I had some minor issues with pre-commit hooks and docs generation (hook id: pydoc-markdown).