feat: Implementation of Table Cell Proposal #4616
…ts for table reader
…ments. Also updated Doc.__eq__ to work for tables.
…serializable doc.id and doc.content. Updated schema test for Table label.
…ist] as valid type for context in Answer.
Manually tried out the REST API with a TableQA pipeline [1] specifying
ES through docker:
API through gunicorn:
Works as expected (with row-col):
With File used/uploaded:

[1] Pipeline:

version: ignore
components:
  - name: DocumentStore
    type: ElasticsearchDocumentStore
    params:
      host: localhost
  - name: Retriever
    type: BM25Retriever
    params:
      document_store: DocumentStore
      top_k: 5
  - name: TableReader
    type: TableReader
    params:
      model_name_or_path: google/tapas-base-finetuned-wtq
      max_seq_len: 512
      return_table_cell: true
  - name: JsonConverter
    type: JsonConverter
pipelines:
  - name: query
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: TableReader
        inputs: [Retriever]
  - name: indexing
    nodes:
      - name: JsonConverter
        inputs: [File]
      - name: Retriever
        inputs: [JsonConverter]
      - name: DocumentStore
        inputs: [Retriever]
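A manual check like the one described could be scripted roughly as follows. This is only a sketch under assumptions that the thread does not state: that the REST API is served locally on port 8000, and that it exposes a `POST /query` endpoint taking a JSON body with `query` and `params` fields, as the Haystack 1.x REST API does. The helper names `build_query_request` and `send` are hypothetical.

```python
import json
import urllib.request

# Assumption: the REST API is served locally on port 8000 (not stated in the thread).
BASE_URL = "http://localhost:8000"


def build_query_request(
    question: str, top_k: int = 5, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST /query request for the query pipeline."""
    payload = {
        "query": question,
        # Per-node parameters, keyed by the node name used in the pipeline YAML.
        "params": {"Retriever": {"top_k": top_k}},
    }
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running, `send(build_query_request("..."))` would return the decoded response, whose answers could then be inspected for table cell offsets.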
@ZanSara, @bglearning and I wanted to ask if there is a good place in Haystack to add the test that @bglearning ran manually in this comment?
Committed super minor lg changes here.
@sjrl @bglearning I think a test like that could fit well in the REST API test suite: https://github.com/deepset-ai/haystack/blob/main/rest_api/test/test_rest_api.py
Looking great! I left just a few questions but nothing serious. Good job! 😊
It looks like there is a mock test for a pipeline that returns a table document here. Our test would include loading a HF model, which I believe we are trying to avoid in the REST API test suite to keep it light. Is that correct?
Sorry @ZanSara, I'm not sure why I can't reply in the above comment. But yes, this is an unrelated bug fix that only appeared as I was adding more tests for table documents in the schema tests.
Yes, the REST API tests are understandably only testing the API interface part, mocking out all internal pipeline runs. The one I ran manually is more of an end-to-end check.
OK, let's think about this test a bit more. What is it testing?
End-to-end tests are heavy, so they only make sense when they add value. For example, we want to test that HF models actually work when not mocked, so we make an end-to-end test for them. This PR is about introducing a new primitive, not about integrating external libraries or services, so I don't believe we need an e2e test. But if you have a valid use case for testing the whole thing that adds something on top of well-done unit tests, let's consider it.
Thanks for the info @ZanSara!
We really just want to make sure that an Answer containing TableCell can be returned through our REST API. I added a test to do just this.
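The property such a test checks can be illustrated in isolation. The sketch below is not the test added in this PR and does not use Haystack's classes: `TableCell` and `Answer` here are trimmed-down stand-ins, and the field names `row`, `col` and `offsets_in_context` are assumptions made for illustration. It only shows that an answer carrying table cell offsets should survive a round trip through JSON, which is what returning it through a REST API requires.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class TableCell:
    # Stand-in for the PR's TableCell: a cell addressed by row and column index.
    row: int
    col: int


@dataclass
class Answer:
    # Hypothetical, trimmed-down stand-in for an answer; not Haystack's class.
    answer: str
    offsets_in_context: Optional[List[TableCell]] = None


def to_json(answer: Answer) -> str:
    """Serialize an Answer, including nested TableCell offsets, to JSON."""
    return json.dumps(asdict(answer))


def from_json(raw: str) -> Answer:
    """Rebuild an Answer from JSON, restoring TableCell objects from dicts."""
    data = json.loads(raw)
    cells = data.get("offsets_in_context")
    if cells is not None:
        cells = [TableCell(**cell) for cell in cells]
    return Answer(answer=data["answer"], offsets_in_context=cells)
```

A round trip such as `from_json(to_json(answer)) == answer` holding for an answer with one or more cells is the kind of assertion an API-level test would make, without needing to load a model.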
Related Issues
Proposed Changes:
Span and TableCell
following the deprecation policy for 2 additional versions of Haystack

The identified bug in the TableCell Proposal will not be handled here but in a separate PR. Initially, I tried adding the changes for this, but it quickly became quite complicated, so I decided to split up the changes.
Documentation Changes (to do after merge)
How did you test it?
test/others/test_schema.py
test/pipelines/test_eval.py
Notes for the reviewer
Checklist
fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.