This repository has been archived by the owner on Jan 2, 2019. It is now read-only.
Querying tables and documents in a flexible, concise and precise way is important. To do that, we need the following matching functions:
- exact
- boolean combinations
- fuzzy
- semantic (hackathon feature)
on the following fields:
- Header(s), context strings
- Document text (everything else not associated with tables)
- Column labels
- Column types and subtypes
- String ('other') columns
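As a rough illustration of how the first three matching functions map onto Elasticsearch's query DSL, here is a minimal Python sketch that builds request bodies as plain dicts. The helper names (`exact`, `fuzzy`, `boolean`) and the field names and values (`column_labels`, `column_types`, `column_subtypes`, `numeric`, `percentage`) are hypothetical; only the `term`, `match` (with `fuzziness`) and `bool` query types are real Elasticsearch features.

```python
# Hypothetical helpers that build Elasticsearch query DSL bodies as plain dicts.

def exact(field, value):
    # `term` matches the exact, unanalyzed value; intended for keyword fields
    return {"term": {field: value}}


def fuzzy(field, text, fuzziness="AUTO"):
    # `match` with `fuzziness` tolerates small edit-distance typos in the query text
    return {"match": {field: {"query": text, "fuzziness": fuzziness}}}


def boolean(must=(), should=(), must_not=()):
    # `bool` combines sub-queries: `must` (AND), `must_not` (NOT), and `should`
    # (optional matches that raise the score; they act as OR when `must` is empty)
    clauses = {}
    if must:
        clauses["must"] = list(must)
    if should:
        clauses["should"] = list(should)
    if must_not:
        clauses["must_not"] = list(must_not)
    return {"bool": clauses}


# Hypothetical example: numeric columns whose label roughly matches "temprature"
# (note the typo), excluding columns with the subtype "percentage".
query = boolean(
    must=[exact("column_types", "numeric"), fuzzy("column_labels", "temprature")],
    must_not=[exact("column_subtypes", "percentage")],
)
```

The resulting dict would go in the `query` part of a search request. `term` assumes the field is mapped as `keyword`, while `match` runs the query text through the field's analyzer before matching.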
I propose using Elasticsearch, indexing documents and tables as separate types. It is fast and scalable, and every requirement above can be translated into queries that are native to Elasticsearch.
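One way the "separate types" layout could look, sketched in Python as create-index bodies. Elasticsearch 6.x allows only one mapping type per index and later versions drop mapping types altogether, so this sketch assumes one index per kind of object instead. All field names (`document_id`, `headers`, `context`, `column_labels`, `string_values`, ...) and the helper `exact_match_fields` are hypothetical. `keyword` fields serve exact matching and boolean filters, analyzed `text` fields serve full-text and fuzzy matching, and the `raw` sub-field gives column labels both behaviours.

```python
# Hypothetical create-index bodies: one index for documents, one for tables.
DOCUMENTS_INDEX = {
    "mappings": {
        "properties": {
            "document_id": {"type": "keyword"},
            # Everything in the document that is not associated with a table
            "text": {"type": "text"},
        }
    }
}

TABLES_INDEX = {
    "mappings": {
        "properties": {
            # Links a table back to the document it was extracted from
            "document_id": {"type": "keyword"},
            "headers": {"type": "text"},
            "context": {"type": "text"},
            # Analyzed for fuzzy search, plus an unanalyzed copy for exact search
            "column_labels": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
            "column_types": {"type": "keyword"},
            "column_subtypes": {"type": "keyword"},
            # Cell values of string ('other') columns
            "string_values": {"type": "text"},
        }
    }
}


def exact_match_fields(index_body):
    """List field paths that hold unanalyzed keyword values (usable with `term`)."""
    paths = []
    for name, spec in index_body["mappings"]["properties"].items():
        if spec["type"] == "keyword":
            paths.append(name)
        # Multi-fields such as `column_labels.raw`
        for sub, sub_spec in spec.get("fields", {}).items():
            if sub_spec["type"] == "keyword":
                paths.append(f"{name}.{sub}")
    return paths
```

Either dict could be passed as the body of a create-index request. Splitting the two kinds of objects into separate indices keeps table-only fields out of the document mapping.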
Even semantic search (approximate nearest neighbours, ANN; see #9) can be achieved by transforming vector representations into proxy "words", as done here: https://github.com/ascribe/image-match/blob/master/image_match/signature_database_base.py (although that code works on discrete vectors with five possible values per component, not on dense vectors).
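A minimal sketch of the proxy-word idea applied to dense vectors. Unlike the linked image-match code, which cuts a discrete signature into fixed-length words, this version swaps in random-hyperplane hashing (a SimHash-style LSH for cosine similarity). Each hyperplane contributes one sign bit, each consecutive group of `k` bits becomes one integer "word", and each word is stored in its own exact-match field. The function and field names (`make_hyperplanes`, `to_words`, `word_query`, `word_0`, ...) are hypothetical.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    # Random Gaussian hyperplanes; the sign of each projection yields one bit
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def to_words(vector, hyperplanes, k):
    # One bit per hyperplane: 1 if the vector lies on its non-negative side
    bits = [
        1 if sum(h_i * v_i for h_i, v_i in zip(h, vector)) >= 0 else 0
        for h in hyperplanes
    ]
    # Pack each consecutive k-bit chunk into an integer: one proxy "word" per field
    words = {}
    for i in range(0, len(bits), k):
        chunk = bits[i:i + k]
        words[f"word_{i // k}"] = int("".join(map(str, chunk)), 2)
    return words


def word_query(words):
    # OR over exact word matches; vectors sharing more words tend to score higher
    return {"bool": {"should": [{"term": {field: code}} for field, code in words.items()]}}


# Usage: index `to_words(doc_vector, planes, 4)` alongside each document,
# then search with `word_query(to_words(query_vector, planes, 4))`.
planes = make_hyperplanes(dim=8, n_bits=16, seed=42)
```

Vectors pointing in similar directions tend to share more words, so a `bool`/`should` query over the word fields returns likely neighbours; the candidates would typically be re-ranked by exact vector distance afterwards. Larger `k` makes each word more selective, while more words improve recall.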