DS-4166 Index workspace, workflow and tasks in SOLR #65

Closed

abollini wants to merge 103 commits into DS-3851_workflow_new from DS-4166_mydspace

Member

abollini commented Feb 14, 2019

This PR shares the implementation of the indexing features required for MyDSpace.

Integration tests have been added to DiscoverControllerIT to demonstrate that:

in-progress items and tasks do not pop up elsewhere
the special workspace configuration works as expected (the user sees only her own submissions)
the special workflow configuration works as expected (reviewers see only tasks that they have claimed or can claim)

A companion PR, DSpace/RestContract#55, highlights the small changes required to the Discover (search) endpoint,
namely renaming the embedded dspaceObject to rObject (result object), since workspace, workflow and tasks are now returned as well.

TO DO:

move this PR to the DSpace project after DS-3851 is merged

mwoodiupui and others added 30 commits

February 4, 2019 14:58


          [DS-3695] Upgrade Solr *client* to 7.3.0.

7b9bd50


          [DS-3695] Rip out lots of Solr config. that is no longer defined in v7.

3aa6b89


          [DS-3695] Start work on dspace-spring-rest

32a3c74


          Fix minor compilation errors in OAI

01b8002


          Revert Spring Boot updates until DS-3802 is solved. Solr core only fo…

ece6448

…r testing.


          Disable Solr autoconfiguration in Spring Boot. Minor config cleanup

5e78f40


          Update to Solr 7.5. Sync dependencies and cleanup spring-rest POM


          [DS-3695] Complete botched conflict fixup.

ee3b60c


          [DS-3695] Switch new class from SolrServer to SolrClient.

8d5de13


          [DS-3695] Exclude Jetty from solr-core and solr-cell: Solr and Spring

9d0483c

Boot are fighting over versions.


          Fix Solr startup errors by downgrading to 7.3.1

cc38ec8


          [DS-3695] Document what I puzzled out ot MockSolrServer, and small cl…

3e8e1aa

…eanups.


          [DS-3695] Start ripping out Solr server.

0792ba7


          [DS-3695] Make 'search' core load in stock Solr 7.2.1.

c375fe1

This should work without altering Solr, across Solr releases, as long
as Solr ships the necessary additional analyzers in /contrib.


          [DS-3695] Cure failing IT: the test was wrong.

fa9cc7c


          [DS-3695] Remaining minimal changes to make all cores load in Solr 7.

f2cfab2


          [DS-3695] We no longer configure Solr itself.

ba0edff


          [DS-3695] Upgrade indexes all the way to 7_x.

8c26a9a


          [DS-3695] See bf4ead40575f0b180fd6840373ef17d98a6e778e. We *do* confi…

cb17cb1

…gure Solr for testing.


          [DS-3695] We no longer control Solr's logging.

241fdbb


          [DS-3695] Rip out big handfuls of unused fieldtypes, commentary about

8fb1ac5

Solr not DSpace.  Break loooong tags into attribute-per-line format.


          [DS-3695] Remove unused "fieldType"s, dusty old comments from stock

sample schema.  Tidy indentation, break very long elements into
multiple lines.


          [DS-3695] Remove redundant types, irrelevant attributes; tidy layout.

13c0b9b


          [DS-3695] Whoops, missed a few Trie*Field references.

77328c3


          [DS-3695] Reintroduce "ignored" fieldType, even though the field is

be453a0

probably not used.


          [DS-3695] Give the Solr admin. a clue about the purpose of each core.

0fc979a


          [DS-3695] Add docValues to all *PointField by default to support face…

6254ced

…ting.

Some of these may be unnecessary, but I don't know on which we facet.


          [DS-3695] All tests should start with an empty Solr.

8c222b9


          Merge branch 'DS-3695' of https://github.com/mwoodiupui/DSpace into m…

ed41d85

…woodiupui-DS-3695


          Support Docker testing of externalized solr

KevinVdV reviewed

dspace-api/src/main/java/org/dspace/discovery/IndexClient.java Outdated

                       options.addOption(OptionBuilder.isRequired(false).withDescription(
                           "optimize search core").create("o"));
+                      options.addOption("e", "readfile", true, "Read the identifier from a file");

KevinVdV Mar 26, 2019

These identifiers must refer to items; it would be best to alter the documentation here to make this clear.

Member Author

abollini Mar 28, 2019

option removed as it is out of scope of this PR

KevinVdV reviewed

dspace-api/src/main/java/org/dspace/discovery/IndexClient.java Outdated

+                              }
+                              indexer.updateIndex(context, ids, line.hasOption("f"), type);
+                          } catch (Exception e) {
+                              log.error("Error: " + e.getMessage());

KevinVdV Mar 26, 2019

The error log here should also log the stacktrace.

Member Author

abollini Mar 28, 2019

option removed, the code doesn't exist anymore

KevinVdV reviewed

dspace-api/src/main/java/org/dspace/discovery/SolrServiceImpl.java Outdated

-                              indexContent(context, community, force);
+                  public void updateIndex(Context context, List<UUID> ids, boolean force, int type) {
+                      if (type != Constants.ITEM) {
+                          throw new RuntimeException("Only ITEM is supported in this mode - type founded: " + type);

KevinVdV Mar 26, 2019

Wouldn't an IllegalArgumentException make more sense here than a generic RuntimeException? That is what this really is: an illegal argument was passed along.

Member Author

abollini Mar 28, 2019

the code doesn't exist anymore

KevinVdV reviewed

dspace-api/src/main/java/org/dspace/discovery/SolrServiceImpl.java Outdated

+                                  }
+                                  break;
+                              default:
+                                  throw new RuntimeException("No type known: " + type);

KevinVdV Mar 26, 2019

Wouldn't an IllegalArgumentException make more sense here than a generic RuntimeException? That is what this really is: an illegal argument was passed along.

Member Author

abollini Mar 28, 2019

done, 5e3164b
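
For readers following this thread, here is a minimal sketch of the kind of change discussed: the default branch of a type switch rejects unknown types with IllegalArgumentException rather than a bare RuntimeException. The class name, the describe method, and the constant values are hypothetical and only for illustration; this is not the code from commit 5e3164b.

```java
/** Hypothetical stand-in for a type switch like the one reviewed above. */
final class TypeCheckSketch {
    // Assumed values for illustration only; the real constants live in DSpace's Constants class.
    static final int ITEM = 2;
    static final int COLLECTION = 3;

    /** Maps a known type constant to a label, rejecting anything else. */
    static String describe(int type) {
        switch (type) {
            case ITEM:
                return "item";
            case COLLECTION:
                return "collection";
            default:
                // The caller passed a value outside the supported set,
                // which is exactly what IllegalArgumentException signals.
                throw new IllegalArgumentException("No type known: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(ITEM));
        try {
            describe(99);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Callers can then catch IllegalArgumentException specifically, instead of having to catch every RuntimeException.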

KevinVdV reviewed

dspace-api/src/main/java/org/dspace/discovery/SolrServiceImpl.java Outdated

+                              ids.add(wfi.getItem().getID());
+                          }
+                      }
+                      List<UUID>[] arrayIDList = Util.splitList(ids, numThreads);

KevinVdV Mar 26, 2019

If this method is called multiple times from a single process, the number of threads that can be active at the same time is not limited to the "numThreads" variable, because each call starts its own set of threads. This could lead to memory problems, with far too many threads running at the same time. It would be best to use a Spring TaskExecutor: https://docs.spring.io/spring/docs/4.3.x/spring-framework-reference/html/scheduling.html. This will ensure that the number of threads started by the discovery process is limited to the "numThreads" variable.

Member Author

abollini Mar 28, 2019

support for multi-threading has been withdrawn from this PR as suggested by @cwilper #65 (comment) and will eventually be introduced in a separate PR/ticket
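
The concern raised in this thread can be illustrated with a minimal sketch of a shared, bounded pool. To stay self-contained it uses the JDK's ExecutorService rather than Spring's TaskExecutor, and the class BoundedIndexerSketch, its methods, and the no-op indexOne are hypothetical names, not code from this PR. Because every call to updateIndex submits to the same fixed-size pool, repeated calls queue work instead of starting additional threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical indexer that shares one fixed-size pool across all calls. */
final class BoundedIndexerSketch {
    private final ExecutorService pool;
    private final AtomicInteger indexed = new AtomicInteger();

    BoundedIndexerSketch(int numThreads) {
        // At most numThreads worker threads exist, however often updateIndex is called.
        this.pool = Executors.newFixedThreadPool(numThreads);
    }

    /** Queues one task per id on the shared pool and returns the pending results. */
    List<Future<?>> updateIndex(List<UUID> ids) {
        List<Future<?>> futures = new ArrayList<>();
        for (UUID id : ids) {
            futures.add(pool.submit(() -> indexOne(id)));
        }
        return futures;
    }

    // Placeholder for the real per-item indexing work.
    private void indexOne(UUID id) {
        indexed.incrementAndGet();
    }

    /** Calls updateIndex several times, waits for all tasks, and returns how many ids were processed. */
    static int runDemo(int numThreads, int calls, int idsPerCall) {
        BoundedIndexerSketch indexer = new BoundedIndexerSketch(numThreads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int c = 0; c < calls; c++) {
                List<UUID> ids = new ArrayList<>();
                for (int i = 0; i < idsPerCall; i++) {
                    ids.add(UUID.randomUUID());
                }
                pending.addAll(indexer.updateIndex(ids));
            }
            for (Future<?> f : pending) {
                f.get(); // propagate any failure from a worker
            }
            return indexer.indexed.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("indexing demo failed", e);
        } finally {
            indexer.pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo(2, 3, 5)); // three calls of five ids on two threads
    }
}
```

A Spring-managed executor would be configured and injected rather than constructed inline, but the bounding behaviour is the same idea.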

KevinVdV reviewed

dspace-api/src/main/java/org/dspace/discovery/SolrServiceImpl.java Outdated

+                          threads.add(thread);
+                      }
+                      boolean finished = false;
+                      while (!finished) {

KevinVdV Mar 26, 2019

Use the Spring TaskExecutors to find a better way to handle this.

Member Author

abollini Mar 28, 2019

support for multi-threading has been withdrawn from this PR as suggested by @cwilper #65 (comment) and will eventually be introduced in a separate PR/ticket

KevinVdV reviewed

dspace-api/src/main/java/org/dspace/discovery/SolrServiceImpl.java Outdated

+                      document.addField("namedresourcetype_keyword", fvalue);
+                  }
+                  class IndexerThread extends Thread {

KevinVdV Mar 26, 2019

This should really implement the Runnable interface instead of extending Thread, as this is the preferred way to work with threads in Java.

https://stackoverflow.com/questions/541487/implements-runnable-vs-extends-thread-in-java

Member Author

abollini Mar 28, 2019

I agree with the suggestion, but support for multi-threading has been withdrawn from this PR as suggested by @cwilper #65 (comment) and will eventually be introduced in a separate PR/ticket
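
As a minimal sketch of the suggestion in this thread, the work can be expressed as a Runnable and handed to whatever runs it (a plain Thread here, or an executor). The class IndexerTask, its fields, and the helper runOnPlainThread are hypothetical names for illustration, and the "indexing" is a placeholder that only records each id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical unit of indexing work, decoupled from how it is scheduled. */
final class IndexerTask implements Runnable {
    private final List<String> ids;
    private final List<String> processed;

    IndexerTask(List<String> ids, List<String> processed) {
        this.ids = ids;
        this.processed = processed;
    }

    @Override
    public void run() {
        for (String id : ids) {
            processed.add(id); // placeholder for the real indexing call
        }
    }

    /** Runs the task on a plain Thread and waits for it to finish. */
    static List<String> runOnPlainThread(List<String> ids) {
        List<String> processed = Collections.synchronizedList(new ArrayList<>());
        Thread worker = new Thread(new IndexerTask(ids, processed));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the indexer", e);
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(runOnPlainThread(Arrays.asList("a", "b", "c")));
    }
}
```

The same IndexerTask instance could be passed to an executor's submit method without any change, which is the main practical advantage over subclassing Thread.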

KevinVdV reviewed

dspace-api/src/main/java/org/dspace/discovery/SolrServiceImpl.java

+                  public void cleanIndex(boolean force) throws IOException, SQLException, SearchServiceException {
+                      if (force) {
+                          try {
+                              getSolr().deleteByQuery(

KevinVdV Mar 26, 2019

Why not use "*:*" as the query?

Member Author

abollini Mar 28, 2019

It is safer this way: a local installation may have customized the search index so that it now also includes things managed and indexed by other classes/services that we don't know about, and deleting everything would remove those too.

KevinVdV reviewed

dspace-api/src/main/java/org/dspace/discovery/SolrServiceImpl.java

                       return result;
                   }
+                  public DiscoverResult.FacetResult getDiscoveryFacet(Context context, FacetField facetField,

KevinVdV Mar 26, 2019

Method isn't used.

Member Author

abollini Mar 28, 2019

removed cc11f49

KevinVdV reviewed

dspace-api/src/main/java/org/dspace/discovery/SolrServiceImpl.java Outdated

+                                  System.out.println(head + ":" + (idx++) + " / " + size);
+                              }
+                          } catch (Exception e) {
+                              e.printStackTrace();

KevinVdV Mar 26, 2019

Can this be removed? It is already logged below.

Member Author

abollini Mar 28, 2019

the code doesn't exist anymore

KevinVdV reviewed

dspace-api/src/main/java/org/dspace/discovery/SolrServiceImpl.java Outdated

+                                      indexContent(context, item, force);
+                                      context.uncacheEntity(item);
+                                  } catch (Exception ex) {
+                                      log.error("ERROR: identifier item:" + id + " identifier thread:" + head);

KevinVdV Mar 26, 2019

Could you also log the stack trace here?
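
A minimal sketch of what logging the stack trace means in practice: the exception is passed to the logger as a separate argument instead of being dropped or flattened to its message. To stay self-contained this uses java.util.logging; the code under review uses a `log` object whose exact type is not shown here, though loggers in the Log4j family accept the throwable as a second argument in the same spirit. The class and method names are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical example of logging an exception together with its stack trace. */
final class StackTraceLoggingSketch {
    static final Logger LOG = Logger.getLogger(StackTraceLoggingSketch.class.getName());

    static void indexItem(String id) {
        try {
            // Simulated failure standing in for the real indexing call.
            throw new IllegalStateException("simulated indexing failure");
        } catch (Exception ex) {
            // Passing ex attaches the throwable (and its stack trace) to the log record;
            // concatenating only ex.getMessage() into the string would lose it.
            LOG.log(Level.SEVERE, "Error indexing item " + id, ex);
        }
    }

    public static void main(String[] args) {
        indexItem("example-id");
    }
}
```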

KevinVdV reviewed

dspace-api/src/main/java/org/dspace/discovery/SolrServiceImpl.java Outdated

+                                  } catch (Exception ex) {
+                                      log.error("ERROR: identifier item:" + id + " identifier thread:" + head);
+                                  }
+                                  System.out.println(head + ":" + (idx++) + " / " + size);

KevinVdV Mar 26, 2019

Can this go to the logs instead of System.out?

Member Author

abollini Mar 28, 2019

the code doesn't exist anymore

cwilper commented Mar 27, 2019

Regarding the multi-threaded indexing, does it make sense to do that as a distinct JIRA/PR pair? It doesn't seem essential to the functionality being introduced here, and would seem worth dedicated consideration.

I agree that moving toward something like spring executors would probably result in more robust thread management. I also think some dedicated testing would be good...at least to show/report that there's a throughput improvement for a typical site for the chosen default number of threads.

abollini added 11 commits

March 27, 2019 20:31


          DS-4166 community feedback: rename BrowsableObject to IndexableObject

4b85bf4


          DS-4166 community feedback: rename resultObject to indexableObject in…

b98d8f4

… the discover REST result


          Merge pull request DSpace#2312 from 4Science/DS-3851_workflow_new

d25463f

DS-3851 Endpoint to interact with the workflow


          DS-4166 community feedback: use dedicated fields for workspace/workfl…

006b938

…ow searches


          DS-4166 community feedback: improve documentation

272f21a


          DS-4166 community feedback: remove unused methods and configurations

cc11f49


          DS-4166 community feedback: remove multithreads indexing support


          DS-4166 community feedback: implement the IndexableObject interface o…

689ac4e

…nly where really needed


          Merge branch 'master' of https://github.com/DSpace/DSpace into mydspa…

eba97f4

…ce_clean


          DS-4166 move IndexableObject to the discovery package

30899b0


          DS-3851 add test and fix for invalid task claiming

a8190fe

abollini force-pushed the DS-4166_mydspace branch from c1e2d95 to a8190fe Compare

March 28, 2019 13:56

abollini mentioned this pull request

DS-4166 Index workspace, workflow and tasks in SOLR DSpace/DSpace#2391

Merged

abollini added 2 commits

March 28, 2019 15:20


          DS-4166 community feedback: report about not existing uuid

8de7a50


          DS-4166 community feedback: use a more appropriate exception

5e3164b

Member Author

abollini commented Mar 28, 2019

Hi all, thanks to all of you @benbosman @KevinVdV @cwilper @tdonohue for your review. I tried to reply with links on every comment to speed up the validation process.
Please note that the PR is now open on the official DSpace repository, so we should continue our discussion there: DSpace#2391

@benbosman about your latest general comment #65 (comment)

We've agreed to rename BrowsableObject to IndexableObject in Java

done, see 4b85bf4 and 30899b0

We've agreed to rename "resultObject" to "indexableObject" in the discover REST result

done, see b98d8f4

We've agreed to make sure IndexableObject is implemented in Item, Collection, Community; and not in DSpaceObject

done, see 689ac4e

the changes to the Constants class are currently under discussion

I have preferred to avoid any changes there. If we end up replacing the search.resourcetype field with a search.resourcename field, we can drop the new constants at that time rather than introduce a second way of dealing with this now.
I don't see any issue with the Constants class holding IDs for objects that are not DSpaceObjects: nothing states that the scope of the Constants class is limited to DSpaceObject. Moreover, the authorization infrastructure of the REST API also needs constants to identify the different kinds of objects, whether or not they are DSpaceObjects; see
https://github.com/DSpace/DSpace/blob/master/dspace-spring-rest/src/main/java/org/dspace/app/rest/repository/PoolTaskRestRepository.java#L81

If a change to the current implementation of the Constants class is needed, in my opinion it can be done after the preview release or later.

abollini closed this
