Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DS-4166 Index workspace, workflow and tasks in SOLR #65

Closed
wants to merge 103 commits into from

Conversation

Projects
None yet
9 participants
@abollini
Copy link
Member

abollini commented Feb 14, 2019

This PR is to share the implementation of the indexing features required for the implementation of the MyDSpace.

Integration Tests have been added to the DiscoverControllerIT to demonstrate:

  1. inprogress items and tasks not popup elsewhere
  2. the special workspace configuration works as expected (the user only see her own submissions)
  3. the special workflow configuration works as expected (reviewers see only tasks that they have claimed or can claim)

PR to highlight the small changes required to the Discover (search) endpoint DSpace/Rest7Contract#55
Namely the change of the embedded dspaceObject to rObject (result object) as now also workspace, workflow and tasks are returned

TO DO:

  • move this PR to the DSpace project after than the DS-3851 will be merged

mwoodiupui and others added some commits May 12, 2018

[DS-3695] Make 'search' core load in stock Solr 7.2.1.
This should work without altering Solr, across Solr releases, as long
as Solr ships the necessary additional analyzers in /contrib.
[DS-3695] Rip out big handfuls of unused fieldtypes, commentary about
Solr not DSpace.  Break loooong tags into attribute-per-line format.
[DS-3695] Remove unused "fieldType"s, dusty old comments from stock
sample schema.  Tidy indentation, break very long elements into
multiple lines.
[DS-3695] Add docValues to all *PointField by default to support face…
…ting.

Some of these may be unnecessary, but I don't know on which we facet.
@@ -111,6 +127,8 @@ public static void main(String[] args) throws SQLException, IOException, SearchS
options.addOption(OptionBuilder.isRequired(false).withDescription(
"optimize search core").create("o"));

options.addOption("e", "readfile", true, "Read the identifier from a file");

This comment has been minimized.

Copy link
@KevinVdV

KevinVdV Mar 26, 2019

These identifiers need to be items, best to alter the documentation here to make this clear.

This comment has been minimized.

Copy link
@abollini

abollini Mar 28, 2019

Author Member

option removed as it is out of scope of this PR

}
indexer.updateIndex(context, ids, line.hasOption("f"), type);
} catch (Exception e) {
log.error("Error: " + e.getMessage());

This comment has been minimized.

Copy link
@KevinVdV

KevinVdV Mar 26, 2019

The error log here should also log the stacktrace.

This comment has been minimized.

Copy link
@abollini

abollini Mar 28, 2019

Author Member

option removed, the code doesn't exist anymore

indexContent(context, community, force);
public void updateIndex(Context context, List<UUID> ids, boolean force, int type) {
if (type != Constants.ITEM) {
throw new RuntimeException("Only ITEM is supported in this mode - type founded: " + type);

This comment has been minimized.

Copy link
@KevinVdV

KevinVdV Mar 26, 2019

Would an IllegalArgumentException not make more sense here instead of a generic "RuntimeException" as this is what it really is. An illegal Argument was passed along.

This comment has been minimized.

Copy link
@abollini

abollini Mar 28, 2019

Author Member

the code doesn't exist anymore

}
break;
default:
throw new RuntimeException("No type known: " + type);

This comment has been minimized.

Copy link
@KevinVdV

KevinVdV Mar 26, 2019

Would an IllegalArgumentException not make more sense here instead of a generic "RuntimeException" as this is what it really is. An illegal Argument was passed along.

This comment has been minimized.

Copy link
@abollini

abollini Mar 28, 2019

Author Member

done, 5e3164b

ids.add(wfi.getItem().getID());
}
}
List<UUID>[] arrayIDList = Util.splitList(ids, numThreads);

This comment has been minimized.

Copy link
@KevinVdV

KevinVdV Mar 26, 2019

If this method is called upon multiple times from a single process the number of threads that can be active at the same time are not limited to the "numThreads" variable. This could really lead to memory leaks where we have way too many threads running at the same time. It would really be best to use a spring TaskExecutor: https://docs.spring.io/spring/docs/4.3.x/spring-framework-reference/html/scheduling.html. This will ensure that the number of threads started by the discovery process is limited to the "numThreads" variable

This comment has been minimized.

Copy link
@abollini

abollini Mar 28, 2019

Author Member

support for multi-threads has been withdrawn from this PR as suggested by @cwilper #65 (comment) and will be eventually introduced in a separate PR/ticket

threads.add(thread);
}
boolean finished = false;
while (!finished) {

This comment has been minimized.

Copy link
@KevinVdV

KevinVdV Mar 26, 2019

USe the spring TaskExecutors to find a better way to handle this.

This comment has been minimized.

Copy link
@abollini

abollini Mar 28, 2019

Author Member

support for multi-threads has been withdrawn from this PR as suggested by @cwilper #65 (comment) and will be eventually introduced in a separate PR/ticket

document.addField("namedresourcetype_keyword", fvalue);
}

class IndexerThread extends Thread {

This comment has been minimized.

Copy link
@KevinVdV

KevinVdV Mar 26, 2019

Should really implement the Runnable class instead of extending thread as this is the preferred way to work with threads in java.

https://stackoverflow.com/questions/541487/implements-runnable-vs-extends-thread-in-java

This comment has been minimized.

Copy link
@abollini

abollini Mar 28, 2019

Author Member

I agree with the suggestion but support for multi-threads has been withdrawn from this PR as suggested by @cwilper #65 (comment) and will be eventually introduced in a separate PR/ticket

public void cleanIndex(boolean force) throws IOException, SQLException, SearchServiceException {
if (force) {
try {
getSolr().deleteByQuery(

This comment has been minimized.

Copy link
@KevinVdV

KevinVdV Mar 26, 2019

Why not use ":" as the query ?

This comment has been minimized.

Copy link
@abollini

abollini Mar 28, 2019

Author Member

it is safe in case the local installation has customized the search index in a way that we don't know so that it includes now things maybe managed and indexed by other classes/services.

@@ -1931,6 +2182,20 @@ protected DiscoverResult retrieveResult(Context context, DiscoverQuery query, Qu
return result;
}

public DiscoverResult.FacetResult getDiscoveryFacet(Context context, FacetField facetField,

This comment has been minimized.

Copy link
@KevinVdV

KevinVdV Mar 26, 2019

Method isn't used.

This comment has been minimized.

Copy link
@abollini

abollini Mar 28, 2019

Author Member

removed cc11f49

System.out.println(head + ":" + (idx++) + " / " + size);
}
} catch (Exception e) {
e.printStackTrace();

This comment has been minimized.

Copy link
@KevinVdV

KevinVdV Mar 26, 2019

Can this be removed, as it is already logged below.

This comment has been minimized.

Copy link
@abollini

abollini Mar 28, 2019

Author Member

the code doesn't exist anymore

indexContent(context, item, force);
context.uncacheEntity(item);
} catch (Exception ex) {
log.error("ERROR: identifier item:" + id + " identifier thread:" + head);

This comment has been minimized.

Copy link
@KevinVdV

KevinVdV Mar 26, 2019

Could you also log the stacktrace here ?

} catch (Exception ex) {
log.error("ERROR: identifier item:" + id + " identifier thread:" + head);
}
System.out.println(head + ":" + (idx++) + " / " + size);

This comment has been minimized.

Copy link
@KevinVdV

KevinVdV Mar 26, 2019

Can this end up in the logs instead of in the system out.

This comment has been minimized.

Copy link
@abollini

abollini Mar 28, 2019

Author Member

he code doesn't exist anymore

@cwilper

This comment has been minimized.

Copy link

cwilper commented Mar 27, 2019

Regarding the multi-threaded indexing, does it makes sense to do that as a distinct JIRA/PR pair? It doesn't seem essential to the functionality being introduced here, and would seem worth dedicated consideration.

I agree that moving toward something like spring executors would probably result in more robust thread management. I also think some dedicated testing would be good...at least to show/report that there's a throughput improvement for a typical site for the chosen default number of threads.

@abollini

This comment has been minimized.

Copy link
Member Author

abollini commented Mar 28, 2019

Hi all, thanks to all of you @benbosman @KevinVdV @cwilper @tdonohue for your review. I tried to put response with links in all the comment to speedup the validation process.
Please note that the PR is now open on the official DSpace repository so we should continue here our discussion DSpace#2391

@benbosman about your latest general comment #65 (comment)

We've agreed to rename BrowsableObject to IndexableObject in Java

done, see 4b85bf4 and 30899b0

We've agreed to rename "resultObject" to "indexableObject" in the discover REST result

done, see b98d8f4

We've agreed to make sure IndexableObject is implemented in Item, Collection, Community; and not in DSpaceObject

done, see 689ac4e

the changes to the Constants class are currently under discussion

I have preferred to avoid any changes on that. If we will end in replace the search.resoucetype field with a search.resourcename field we can drop the new constants at this time instead than introduce a double way to deal with this things now.
I don't see any issue in the Constants class to have ID for not DSpaceObject, it is not stated in any place that the scope of the Constants class was limited to DSpaceObject, moreover also in the authorization infrastructure of the REST API we need constants to identify the different kind of objects regardless to be them dspaceobject or not, see
https://github.com/DSpace/DSpace/blob/master/dspace-spring-rest/src/main/java/org/dspace/app/rest/repository/PoolTaskRestRepository.java#L81

If a change to the current implementation of the Constants class is needed it can be done in my opinion after the preview release or later.

@abollini abollini closed this Mar 29, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.