Add `RouteDocuments` and `JoinAnswers` nodes #2256

bogdankostic · 2022-02-28T16:25:41Z

This PR adds two nodes: a RouteDocuments node and a JoinAnswers node. Both are needed to do QA over a combination of text and tables as sources of information.

The RouteDocuments node takes a list of Documents as input, splits them by either content_type or a metadata field, and routes the resulting splits to different outputs. An alternative would have been to let the readers accept both texts and tables, with the FARMReader skipping Documents of type table and the TableReader skipping Documents of type text. However, a dedicated node makes this process more explicit and allows splitting not only by content_type but also by other metadata values (for example, to route Documents to Readers that are trained on a specific domain or language).
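To make the routing behaviour concrete, here is a minimal, library-free sketch. It is not the code from this PR: `Doc` and `route_documents` are hypothetical stand-ins, the `output_N` keys follow the naming seen in the test diff, and the ordering for content_type (text first, tables second) as well as dropping Documents that match no value are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document (not the real class)."""
    content: str
    content_type: str = "text"
    meta: Dict[str, str] = field(default_factory=dict)


def route_documents(
    documents: List[Doc],
    split_by: str = "content_type",
    metadata_values: Optional[List[str]] = None,
) -> Dict[str, List[Doc]]:
    """Group documents into numbered outputs, one output per routing value."""
    if split_by == "content_type":
        # Assumed ordering: text goes to output_1, tables to output_2.
        values = ["text", "table"]
    else:
        if not metadata_values:
            raise ValueError("metadata_values is required when splitting by a metadata field")
        values = metadata_values

    outputs: Dict[str, List[Doc]] = {f"output_{i + 1}": [] for i in range(len(values))}
    for doc in documents:
        value = doc.content_type if split_by == "content_type" else doc.meta.get(split_by)
        # Simplification for this sketch: documents matching no value are dropped.
        if value in values:
            outputs[f"output_{values.index(value) + 1}"].append(doc)
    return outputs


docs = [
    Doc("Berlin is the capital of Germany."),
    Doc("city,population", content_type="table"),
    Doc("Berlin ist die Hauptstadt.", meta={"lang": "de"}),
]
by_type = route_documents(docs)
by_lang = route_documents(docs, split_by="lang", metadata_values=["en", "de"])
```

With this shape, each downstream Reader only ever sees the Documents it can handle, which is the explicitness argued for above.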

The JoinAnswers node takes as input the predicted Answers of two individual Reader nodes and joins them to a single list of Answers.
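As a rough illustration of the joining step, the sketch below concatenates the answers of two readers into one list. `Ans` and `join_answers` are hypothetical names, not the node's API, and sorting the joined list by score is an assumption of this sketch; the "merge" join mode mentioned in the review is not modelled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ans:
    """Hypothetical stand-in for a Haystack Answer (not the real class)."""
    answer: str
    score: float


def join_answers(*reader_results: List[Ans]) -> List[Ans]:
    """Concatenate the answer lists of several readers into a single list."""
    joined = [ans for result in reader_results for ans in result]
    # Assumption: present the best-scoring answers first.
    return sorted(joined, key=lambda ans: ans.score, reverse=True)


text_answers = [Ans("Berlin", 0.9), Ans("Paris", 0.2)]
table_answers = [Ans("3.6 million", 0.7)]
joined = join_answers(text_answers, table_answers)
```

Note that scores produced by different readers are not necessarily comparable, so a real pipeline may need weighting or calibration before ranking the joined list.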

This graph shows what a Pipeline allowing QA on both text and tables would look like:




julian-risch · 2022-02-28T16:35:21Z

A JoinAnswers node would also be helpful for the use case described in this issue #1081 on combining FAQ and ExtractiveQA in a pipeline by @SasikiranJ


julian-risch

Looks quite good to me. Three things: a test case for the JoinAnswers node is missing, I saw some typing errors in the CI, and I would like to briefly talk with you about the names of the new nodes. Feel free to ping me anytime.

julian-risch · 2022-02-28T16:43:17Z

json-schemas/haystack-pipeline-1.2.0.schema.json

@@ -59,6 +59,9 @@
          {
            "$ref": "#/definitions/ImageToTextConverterComponent"
          },
+          {
+            "$ref": "#/definitions/JoinAnswersComponent"


Because the update of the Haystack documentation website isn't complete yet, we haven't upgraded the Haystack version to 1.2.1rc0. Before merging your PR, we need to make sure that json-schemas/haystack-pipeline-1.2.0.schema.json is unchanged and that a json-schemas/haystack-pipeline-1.2.1rc0.schema.json is created.

@brandenchan we'll merge this PR now and correct the schema files once the website updates for v1.2.0 are done.

julian-risch · 2022-02-28T16:45:49Z

test/test_pipeline.py

@@ -1041,6 +1043,35 @@ def test_documentsearch_document_store_authentication(retriever_with_docs, docum
        assert kwargs["headers"] == auth_headers


+def test_split_document_list_content_type(test_docs_xs):


We should add a test case to check both join modes of the JoinAnswers node.

Added a test case that covers both join modes.

julian-risch · 2022-02-28T16:49:00Z

tutorials/Tutorial15_TableQA.ipynb

+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [


Is it intentional that the outputs are included in the tutorial?

Yes, the outputs let users see what each cell produces without having to run the tutorial.

julian-risch · 2022-02-28T16:53:55Z

haystack/nodes/other/__init__.py

@@ -1,2 +1,4 @@
 from haystack.nodes.other.docs2answers import Docs2Answers
 from haystack.nodes.other.join_docs import JoinDocuments
+from haystack.nodes.other.split_documents import SplitDocumentList


Seeing the different names of the other nodes, I am wondering whether we could have a more consistent naming scheme. Unfortunately, I don't have an alternative for SplitDocumentList in mind. Maybe we can briefly talk about it.

What about RouteDocuments? It is similar to JoinDocuments, and in theory there could later be a RouteAnswers node.

DocumentRouter would be more consistent with the other nodes (TableReader, Summarizer, Retriever) but then I am not immediately convinced by DocumentJoiner and AnswerJoiner.

RouteDocuments it is :)

julian-risch · 2022-02-28T16:55:23Z

haystack/nodes/other/split_documents.py

+
+    def __init__(self, split_by: str = "content_type", metadata_values: Optional[List[str]] = None):
+        """
+        :param split_by: Field to split the documents by. Either `"content_type"` or a metadata field name.


"by. Either" should become "by either"


julian-risch · 2022-03-01T16:21:29Z

test/test_pipeline.py

@@ -1072,6 +1074,20 @@ def test_split_document_list_content_type(test_docs_xs):
    assert result["output_3"][0].meta["meta_field"] == "test5"


+@pytest.mark.parametrize("join_mode", ["concatenate", "merge"])
+def test_join_answers_concatenate(join_mode):


The name test_join_answers_concatenate is a bit misleading, as the test covers both "concatenate" and "merge".

bogdankostic and others added 5 commits February 25, 2022 17:46

Add SplitDocumentList and JoinAnswer nodes

594101d

Update Documentation & Code Style

840fede

Add tests + adapt tutorial

598af88

Merge remote-tracking branch 'origin/split_tables_and_texts' into spl…

511f16e


Update Documentation & Code Style

e199546

bogdankostic requested a review from julian-risch February 28, 2022 16:25

bogdankostic and others added 4 commits February 28, 2022 17:27

Remove branch from installation path in Tutorial

d24fb22

Merge remote-tracking branch 'origin/split_tables_and_texts' into spl…

bf55469


Merge branch 'master' into split_tables_and_texts

a56532c

Update Documentation & Code Style

5674eff

bogdankostic and others added 3 commits February 28, 2022 17:41

Fix typing

48198b7

Merge remote-tracking branch 'origin/split_tables_and_texts' into spl…

e25834e


Update Documentation & Code Style

665133e

julian-risch requested changes Feb 28, 2022


bogdankostic added the topic:pipeline, topic:tableQA, and type:feature labels Mar 1, 2022

bogdankostic and others added 6 commits March 1, 2022 15:40

Change name of SplitDocumentList to RouteDocuments

867d5ef

Update Documentation & Code Style

4b4c6b0

Adapt tutorials to new name

1842da3

Add test for JoinAnswers

13b0297

Merge remote-tracking branch 'origin/split_tables_and_texts' into spl…

2dec1db


Update Documentation & Code Style

a6042b6

bogdankostic changed the title ~~Add SplitDocumentList and JoinAnswers nodes~~ Add RouteDocuments and JoinAnswers nodes Mar 1, 2022

bogdankostic requested a review from julian-risch March 1, 2022 16:14

julian-risch approved these changes Mar 1, 2022


julian-risch reviewed Mar 1, 2022


Adapt name of test for JoinAnswers node

2ad75f5

bogdankostic merged commit c5542bd into master Mar 1, 2022

bogdankostic deleted the split_tables_and_texts branch March 1, 2022 16:42
