Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Implementation of Table Cell Proposal #4616

Merged
merged 42 commits into from
Apr 19, 2023
Merged

feat: Implementation of Table Cell Proposal #4616

merged 42 commits into from
Apr 19, 2023

Conversation

sjrl
Copy link
Contributor

@sjrl sjrl commented Apr 6, 2023

Related Issues

Proposed Changes:

  • Add TableCell data class following the code changes outline in the proposal here
  • Add boolean flag to toggle between Span and TableCell following the deprecation policy for 2 additional versions of Haystack

The identified Bug in the TableCell Proposal will not be handled here but in a separate PR. Initially, I tried adding the changes for this, but it quickly started to become quite complicated so I decided to split up the changes.

Documentation Changes (to do after merge)

  • Opened issue Update the Note under Table Question Answering to reflect the linearized offsets will be deprecated in favor of offsets that specify the row and column indices of the table cell
  • Opened issue Update the TableQa tutorial to reflect the linearized offsets will be deprecated in favor of offsets that specify the row and column indices of the table cell

How did you test it?

  • Added new unit tests to test/others/test_schema.py
  • Updated existing unit test in test/pipelines/test_eval.py

Notes for the reviewer

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added tests that demonstrate the correct behavior of the change
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.
  • I documented my code
  • I ran pre-commit hooks and fixed any issue

@sjrl sjrl requested a review from bglearning April 6, 2023 12:26
haystack/schema.py Show resolved Hide resolved
haystack/schema.py Show resolved Hide resolved
@bglearning
Copy link
Contributor

bglearning commented Apr 14, 2023

Manually tried out the REST API with a TableQA pipeline [1] specifying return_table_cell=True.

ES through docker: docker-compose up elasticsearch

API through gunicorn: gunicorn rest_api.application:app -b 0.0.0.0:8000 -k uvicorn.workers.UvicornWorker -t 300

Works as expected (with row-col):
row-col-response

With return_table_cell=False (current default)
start-end-response

File used/uploaded:
sample_input_file_str_v2.txt

[1] Pipeline:

version: ignore

components:
  - name: DocumentStore
    type: ElasticsearchDocumentStore
    params:
      host: localhost
  - name: Retriever
    type: BM25Retriever
    params:
      document_store: DocumentStore
      top_k: 5 
  - name: TableReader
    type: TableReader
    params:
      model_name_or_path: google/tapas-base-finetuned-wtq
      max_seq_len: 512
      return_table_cell: true
  - name: JsonConverter
    type: JsonConverter

pipelines:
  - name: query
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: TableReader
        inputs: [Retriever]
  - name: indexing
    nodes:
      - name: JsonConverter
        inputs: [File]
      - name: Retriever
        inputs: [JsonConverter]
      - name: DocumentStore
        inputs: [Retriever]

@sjrl sjrl marked this pull request as ready for review April 14, 2023 12:50
@sjrl sjrl requested a review from a team as a code owner April 14, 2023 12:50
@sjrl sjrl requested review from ZanSara and agnieszka-m and removed request for a team April 14, 2023 12:50
@sjrl
Copy link
Contributor Author

sjrl commented Apr 14, 2023

@ZanSara, @bglearning and I wanted to ask if there is a good place in Haystack to add the test that @bglearning ran manually in this comment?

Copy link
Contributor

@agnieszka-m agnieszka-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commited super minor lg changes here.

@ZanSara ZanSara added this to the 1.16 milestone Apr 18, 2023
@ZanSara
Copy link
Contributor

ZanSara commented Apr 18, 2023

@sjrl @bglearning I think a test like that could fit well in the REST API test suite: https://github.com/deepset-ai/haystack/blob/main/rest_api/test/test_rest_api.py

Copy link
Contributor

@ZanSara ZanSara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great! I left just a few questions but nothing serious. Good job! 😊

haystack/schema.py Show resolved Hide resolved
haystack/schema.py Outdated Show resolved Hide resolved
haystack/schema.py Outdated Show resolved Hide resolved
haystack/nodes/reader/farm.py Show resolved Hide resolved
haystack/nodes/reader/table.py Show resolved Hide resolved
@sjrl
Copy link
Contributor Author

sjrl commented Apr 18, 2023

@sjrl @bglearning I think a test like that could fit well in the REST API test suite: https://github.com/deepset-ai/haystack/blob/main/rest_api/test/test_rest_api.py

It looks like there is a mock test for a pipeline that returns a table document here. Our test would include loading a HF model, which I believe we are trying to avoid in the REST API test suite to keep it light. Is that correct?

@sjrl
Copy link
Contributor Author

sjrl commented Apr 18, 2023

Is this an unrelated bug fix? Or the issue was introduced with the rest of this PR?

Sorry @ZanSara I'm not sure why I can't reply in the above comment. But yes this is an unrelated bug fix that only appeared as I was adding more tests for table documents in the schema tests.

@bglearning
Copy link
Contributor

bglearning commented Apr 18, 2023

@sjrl @bglearning I think a test like that could fit well in the REST API test suite: https://github.com/deepset-ai/haystack/blob/main/rest_api/test/test_rest_api.py

It looks like there is a mock test for a pipeline that returns a table document here. Our test would include loading a HF model, which I believe we are trying to avoid in the REST API test suite to keep it light. Is that correct?

Yes, the REST API tests are understandably only testing the API interface part, mocking out all internal pipeline runs. The one I ran manually is more of an end-to-end check.

@ZanSara
Copy link
Contributor

ZanSara commented Apr 18, 2023

Ok let's think about this test a bit more. What is it testing?

  1. That the REST API can return Answer containing TableCell? -> no need for any pipeline, should go in the REST API tests.
  2. That TableCell objects work with the Pipeline? -> no need for the REST API, let's extend the Haystack tests.

end2end tests make sense when they add value, because they're heavy. For example, we want to test that HF models actually work when not mocked, so we make an end2end test for them. This PR is about introducing a new primitive, not about integrating external libraries or services, so I don't believe we need an e2e. But if you have a valid usecase for testing the whole thing that adds something on top of well-done unit tests, let's consider it.

@sjrl
Copy link
Contributor Author

sjrl commented Apr 19, 2023

Thanks for the info @ZanSara!

That the REST API can return Answer containing TableCell? -> no need for any pipeline, should go in the REST API tests.

We really just want to make sure Answer containing TableCell can be returned through our REST API. I added a test to do just this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: Add new TableCell class described in Table Cell Proposal
4 participants