@max-zilla
Contributor

@max-zilla max-zilla commented Aug 4, 2023

This is for #603

THIS PR OVERHAULS THE ELASTICSEARCH INDEX. I recommend stopping your containers, deleting volumes, restarting, then creating a Metadata Definition and creating datasets/files. This should resolve the duplicate results bug and re-enable Lucene search.

It was becoming difficult to build effective search while files, datasets, and metadata lived in separate Elasticsearch indexes. Here they are merged into a single ElasticsearchObject, closer to v1, that includes the metadata directly on the object.
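A minimal sketch of what a merged document could look like, assuming a single record type that carries its metadata inline. The class name `SearchDocument` and every field name here are hypothetical and chosen for illustration; they are not taken from the PR's actual ElasticsearchObject.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class SearchDocument:
    """Hypothetical unified search record; field names are illustrative only."""

    resource_type: str  # "dataset" or "file", so one index can hold both
    name: str
    creator: str
    description: str = ""
    # Metadata entries live directly on the object instead of a separate index.
    metadata: List[Dict[str, Any]] = field(default_factory=list)


def to_index_body(doc: SearchDocument) -> Dict[str, Any]:
    """Convert the record into a plain dict suitable for indexing as JSON."""
    return asdict(doc)


dataset = SearchDocument(
    resource_type="dataset",
    name="fruit survey",
    creator="user@example.org",
    metadata=[{"color": "Orange"}],
)
body = to_index_body(dataset)
```

Keeping the metadata on the same document means a single query can match on name, description, and metadata values together, rather than joining results from several indexes.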

Examples:

File and dataset with metadata:
image
image

Use fancy Lucene syntax:
image

Combine them with another dataset that has Orange as color:
image

Multiple criteria:
image
image
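For readers without the screenshots, the kind of request behind these examples might look roughly like the sketch below. It wraps a Lucene-style string in Elasticsearch's `query_string` query and adds a `term` filter so only documents the user may see are returned. The helper name `build_search_body` and the field names `metadata.color`, `resource_type`, and `user_ids` are assumptions for illustration, not the names used in this PR.

```python
from typing import Any, Dict


def build_search_body(lucene_query: str, user: str) -> Dict[str, Any]:
    """Build a search request body combining a Lucene query with a permission filter.

    The "user_ids" field is a hypothetical name for the list of users allowed
    to see a document.
    """
    return {
        "query": {
            "bool": {
                # Lucene syntax (AND/OR, field:value) is parsed by query_string.
                "must": [{"query_string": {"query": lucene_query}}],
                # The filter clause restricts hits without affecting scoring.
                "filter": [{"term": {"user_ids": user}}],
            }
        }
    }


body = build_search_body(
    "metadata.color:Orange AND resource_type:dataset",
    "user@example.org",
)
```

Putting the permission check in `filter` rather than `must` keeps it out of relevance scoring.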

Member

@lmarini lmarini left a comment

I wasn't able to get this to work. I did drop the indexes. Documents show up in Elasticsearch, but the UI returns "no results". I created a dataset with a name/description of "foobar" and two metadata fields, one of them a text field with the content "foobar". Searching for "foobar" doesn't return any documents. Searching with an empty string also doesn't return any documents.

@lmarini lmarini requested a review from ddey2 August 10, 2023 14:50
@tcnichol
Contributor

With advanced search, I see datasets matched by name and by metadata entries. I started with a new db, so I didn't need to re-index.

And I don't see results if I don't have permission for a particular user.

Is there anything else that needs to be tested? That seems like everything.

Contributor

@tcnichol tcnichol left a comment

Tested with new datasets and metadata.
No duplicates, correct datasets returned, only datasets with permission returned.
Approved.

@longshuicy
Member

What's the proper way to drop indexes? I deleted the whole volume but still see the behavior below:

  • When I turn on Lucene, I see duplicate results.
  • I can't search any metadata (file and dataset names work fine). I'm also missing the creator dropdowns.
image image

Member

@longshuicy longshuicy left a comment

Approved.
No duplicate results, and lucene search works with metadata.

Member

@ddey2 ddey2 left a comment

Looks good.

@max-zilla max-zilla requested a review from lmarini August 21, 2023 13:48
@max-zilla max-zilla merged commit 1bf0e59 into main Aug 21, 2023
@max-zilla max-zilla deleted the fix-search-result-duplicates branch August 21, 2023 14:26
longshuicy pushed a commit that referenced this pull request Aug 21, 2023
* change ESMetadata field names

* restore download filter

* re-enable lucene search

* add permissions to other endpoint

* merge ES indexes into single index and refactor metadata

* clean up GUI search behavior as well.

* include content_type in minio

* fix tests & codegen
lmarini added a commit that referenced this pull request Aug 28, 2023
* add sharing endpoint

* add pytest

* expires need to be converted to timedelta

* codegen and black

* connect to action

* share visualization

* add environmental variable to pass in minio external images

* lift the condition that only main and release get to push to docker temporarily

* allow pushing to docker.io

* turn off pushing

* turn secure to true

* only go by main type if there is no content type (#639)

* improve the logic of handling incomplete visualization configuration (#637)

* change ESMetadata field names (#626)

* change ESMetadata field names

* restore download filter

* re-enable lucene search

* add permissions to other endpoint

* merge ES indexes into single index and refactor metadata

* clean up GUI search behavior as well.

* include content_type in minio

* fix tests & codegen

* Edit Group Name no longer erases group description (#646)

* sending groupDescription to EditNameModal same as in EditDescriptionModal which sends both

* undefined or string

* breadcrumb back for dataset and folder and subfolder (#634)

* breadcrumb back for dataset and folder and subfolder

* no breadcrumb in dataset

* move breadcrumb on dataset folder page
only visible if we are in folder

* moving breadcrumb

* remove console log and add padding
not sure if padding is right

* update breadcrumb (#642)

* update breadcrumb

* update file layout the same way as dataset

---------

Co-authored-by: Chen Wang <cwang138@illinois.edu>

* add endpoint for presign url from minio

* need to toggle between http and https;environment variable cannot be boolean

* wire in minio secure env correctly

* remove not used redux stuff

* wire in sharing url for file

* use the correct icon

* wire it properly

* linting

* Removed unused variable.

* update default to be 7 days

---------

Co-authored-by: Todd Nicholson <40038535+tcnichol@users.noreply.github.com>
Co-authored-by: Max Burnette <mburnet2@illinois.edu>
Co-authored-by: Luigi Marini <lmarini@illinois.edu>

6 participants