Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add indexed search capabilities to uPortal #1598

Merged
merged 6 commits into from Mar 14, 2019

Conversation

drewwills
Copy link
Contributor

@drewwills drewwills commented Mar 7, 2019

Checklist
Description of change

This PR will add search indexing and indexed-based search to uPortal, together with indexing support for some content types (esp. SimpleContentPortlet).

@rodzi @laurenra7

…partial indexing in this commit (but not search yet)
@drewwills
Copy link
Contributor Author

Heads up -- this initial commit will definitely fail the Google format check.

@jgribonvald
Copy link
Contributor

Some questions, but maybe it's an other development step:

  • Why running a task consistently instead of running indexing when a reload of portlets informations happen like at the same time than inside cache (it will be more accurate in this case) ? On your way I fear that when someone change something into portlets user will need to wait the indexing search refresh to benefit of change, so there will be an unconsistent time between data's refresh and what users will obtains from portlets content.
  • Also is there a plan on how to share indexing between several uPortal load-balanced instance ? It seems that a shared directory will be needed and that only one uPortal instance do the indexing at once.

@ChristianMurphy
Copy link
Member

@drewwills testing this locally, I'm not spotting a difference between this, and the previous search.
Is it enabled? Is there a test case that would best demonstrate the new functionality?

@drewwills
Copy link
Contributor Author

Thank you @jgribonvald & @ChristianMurphy for your engagement on this effort!

[I]s there a plan on how to share indexing between several uPortal load-balanced instance?

I would love that. I think that's something we may want to try and improve down the road.

The trouble is that the out-of-the-box Directory implementations in Lucene appear to be focused on storing index data on the file system. I've seen a few mentions of a JDBCDirectory (that stores this data in a relational database) available from third parties, but there seem to be mixed opinions about weather this approach is "production ready."

If the indexing data is on the file system, I figure it has to be:

  • Per-server node

  • Able to recreate itself quickly & thoroughly after a re-deploy

    Why running a task consistently instead of running indexing when a reload of portlets informations happen?

Mostly for the reason(s) above.

If we index when the data changes, how will we...

  • Index data that was there before the application started?
  • Re-index data that changed on another server node?

This approach eliminates those concerns.

I'm not spotting a difference between this, and the previous search.
Is it enabled?

Kinda.

There are basically 3 parts to this work:

  1. Index portlet metadata
  2. Index portlet content where possible
  3. Implement searching based on the index

(1) and (2) are implemented in this PR and enabled; (3) is not implemented, yet.

@drewwills
Copy link
Contributor Author

drewwills commented Mar 9, 2019

Update

The latest commit implements query matching/search results based on indexed portlet data.

But there's (at least) one issue remaining...

I'm getting duplicates in the results. I need to deal with that.

@apetro
Copy link
Member

apetro commented Mar 11, 2019

This changeset appears to add no unit tests so far. Maybe the duplicate search results are an opportunity to express that bug as a failing test case and then make the test pass?

@rodzi
Copy link

rodzi commented Mar 11, 2019

Do you know if the duplicates are true duplicates (The same items in the index) or, are they different items in the index that look the same?

@drewwills
Copy link
Contributor Author

Do you know if the duplicates are true duplicates (The same items in the index) or, are they different items in the index that look the same?

I didn't have a chance to look closely, but I strongly suspect they are not really duplicates... but rather the same piece of content (a.k.a. portlet definition) matching the search term(s) more than once. (For example, matching in the title and also in the description fields.)

The trouble is that the current UI (which I have no time to update) has no way to show multiple matches to the user or to get value out of the extra result(s) in any way.

@rodzi
Copy link

rodzi commented Mar 11, 2019

Do you know if the duplicates are true duplicates (The same items in the index) or, are they different items in the index that look the same?

I didn't have a chance to look closely, but I strongly suspect they are not really duplicates... but rather the same piece of content (a.k.a. portlet definition) matching the search term(s) more than once. (For example, matching in the title and also in the description fields.)

The trouble is that the current UI (which I have no time to update) has no way to show multiple matches to the user or to get value out of the extra result(s) in any way.

How does the result ranking look. If the things that you expect are all at the top of the results then it should be possible to remove the duplicates by setting a threshold on the score.

@drewwills drewwills changed the title [WIP] Add indexed search capabilities to uPortal Add indexed search capabilities to uPortal Mar 12, 2019
@drewwills
Copy link
Contributor Author

@rodzi -- I handled the issue with duplicates. Do you have any advice about how we're using Lucene in this solution?

@drewwills drewwills merged commit b60903a into uPortal-Project:master Mar 14, 2019
@drewwills drewwills deleted the indexed-search branch March 14, 2019 00:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants