DS-4166 Index workspace, workflow and tasks in SOLR #65
Boot are fighting over versions.
This should work without altering Solr, across Solr releases, as long as Solr ships the necessary additional analyzers in /contrib.
…gure Solr for testing.
Solr not DSpace. Break loooong tags into attribute-per-line format.
sample schema. Tidy indentation, break very long elements into multiple lines.
probably not used.
…ting. Some of these may be unnecessary, but I don't know on which we facet.
…woodiupui-DS-3695
@@ -111,6 +127,8 @@ public static void main(String[] args) throws SQLException, IOException, SearchS
options.addOption(OptionBuilder.isRequired(false).withDescription(
"optimize search core").create("o"));

options.addOption("e", "readfile", true, "Read the identifier from a file");
These identifiers need to be items; it would be best to update the option's description here to make this clear.
Option removed, as it is out of scope for this PR.
}
indexer.updateIndex(context, ids, line.hasOption("f"), type);
} catch (Exception e) {
log.error("Error: " + e.getMessage());
The error log here should also include the stack trace.
Option removed; the code no longer exists.
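The reviewer's point about stack traces applies to any catch block that logs only `e.getMessage()`. A minimal sketch, using `java.util.logging` so it runs without extra dependencies (Log4j's `Logger` offers an equivalent `error(message, throwable)` overload); `IndexErrorLogging`, `updateIndex` and `runAndLog` are hypothetical names, not DSpace code:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class IndexErrorLogging {
    private static final Logger log = Logger.getLogger(IndexErrorLogging.class.getName());

    // Hypothetical stand-in for the indexer call that may fail.
    static void updateIndex(String id) {
        throw new IllegalStateException("cannot index " + id);
    }

    // Runs the update and returns the caught exception (or null) so it can be inspected.
    static Exception runAndLog(String id) {
        try {
            updateIndex(id);
            return null;
        } catch (Exception e) {
            // Passing the exception as the last argument records the full
            // stack trace, not just e.getMessage().
            log.log(Level.SEVERE, "Error updating index for " + id, e);
            return e;
        }
    }

    public static void main(String[] args) {
        runAndLog("1234");
    }
}
```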
indexContent(context, community, force);
public void updateIndex(Context context, List<UUID> ids, boolean force, int type) {
if (type != Constants.ITEM) {
throw new RuntimeException("Only ITEM is supported in this mode - type found: " + type);
Wouldn't an IllegalArgumentException make more sense here than a generic RuntimeException? That is what this really is: an illegal argument was passed in.
The code no longer exists.
}
break;
default:
throw new RuntimeException("No type known: " + type);
Wouldn't an IllegalArgumentException make more sense here than a generic RuntimeException? That is what this really is: an illegal argument was passed in.
Done, see 5e3164b.
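For illustration only, a type switch that rejects unsupported values with `IllegalArgumentException`, as the reviewer suggests, could look like the sketch below. It is not the code in 5e3164b; `TypeCheck`, `describe` and the local `ITEM`/`COLLECTION` constants are hypothetical stand-ins for DSpace's `Constants` class, and their values are arbitrary.

```java
public class TypeCheck {
    // Hypothetical local constants; values are arbitrary for this sketch.
    static final int ITEM = 2;
    static final int COLLECTION = 3;

    static String describe(int type) {
        switch (type) {
            case ITEM:
                return "item";
            case COLLECTION:
                return "collection";
            default:
                // An unsupported value supplied by the caller is an illegal
                // argument, so this is more precise than RuntimeException.
                throw new IllegalArgumentException("No type known: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(ITEM));
    }
}
```

Callers that catch `RuntimeException` keep working, since `IllegalArgumentException` is a subclass of it.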
ids.add(wfi.getItem().getID());
}
}
List<UUID>[] arrayIDList = Util.splitList(ids, numThreads);
If this method is called multiple times from a single process, the number of threads active at the same time is not limited by the "numThreads" variable. This could lead to memory leaks, with far too many threads running at once. It would be best to use a Spring TaskExecutor: https://docs.spring.io/spring/docs/4.3.x/spring-framework-reference/html/scheduling.html. That ensures the number of threads started by the discovery process is limited to "numThreads".
Support for multiple threads has been withdrawn from this PR, as suggested by @cwilper in #65 (comment), and will eventually be introduced in a separate PR/ticket.
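The concern above is that each call starts its own threads. A minimal sketch of the alternative, assuming one pool shared across calls: it uses the JDK's `Executors.newFixedThreadPool` rather than the Spring `TaskExecutor` the reviewer links (a `ThreadPoolTaskExecutor` bean plays the same role), and `BoundedIndexing`, `submitAll` and `indexOne` are hypothetical names. Tasks beyond `numThreads` wait in the pool's queue instead of spawning new threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedIndexing {
    // One pool shared by every call: however often submitAll() runs,
    // at most numThreads tasks execute concurrently.
    private final ExecutorService pool;
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    public BoundedIndexing(int numThreads) {
        this.pool = Executors.newFixedThreadPool(numThreads);
    }

    // Hypothetical stand-in for indexing one item; records peak concurrency.
    private void indexOne(int id) {
        peak.accumulateAndGet(active.incrementAndGet(), Math::max);
        try {
            Thread.sleep(5); // simulate work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            active.decrementAndGet();
        }
    }

    // Queues the work; no new threads are created per call.
    public void submitAll(List<Integer> ids) {
        for (int id : ids) {
            pool.execute(() -> indexOne(id));
        }
    }

    // Stops accepting work, waits for queued tasks and returns the observed peak.
    public int shutdownAndGetPeak() {
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        BoundedIndexing indexer = new BoundedIndexing(3);
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            ids.add(i);
        }
        indexer.submitAll(ids); // first call
        indexer.submitAll(ids); // second call reuses the same 3 threads
        System.out.println("peak concurrency: " + indexer.shutdownAndGetPeak());
    }
}
```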
threads.add(thread);
}
boolean finished = false;
while (!finished) {
Use the Spring TaskExecutors to find a better way to handle this.
Support for multiple threads has been withdrawn from this PR, as suggested by @cwilper in #65 (comment), and will eventually be introduced in a separate PR/ticket.
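The `while (!finished)` loop polls the threads until they end. As a sketch of one alternative (again with the JDK executor API instead of Spring's, and with hypothetical names `WaitForIndexing`, `indexInChunks` and `indexChunk`), `ExecutorService.invokeAll` blocks until every submitted task has completed, so no polling is needed and task failures surface through `Future.get()`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WaitForIndexing {
    // Hypothetical stand-in for indexing a chunk; returns how many ids it handled.
    static int indexChunk(List<Integer> chunk) {
        return chunk.size();
    }

    // Splits ids into at most numThreads chunks, runs them on a fixed pool and
    // blocks in invokeAll() until every chunk is done.
    public static int indexInChunks(List<Integer> ids, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            int chunkSize = Math.max(1, (ids.size() + numThreads - 1) / numThreads);
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int start = 0; start < ids.size(); start += chunkSize) {
                List<Integer> chunk = ids.subList(start, Math.min(start + chunkSize, ids.size()));
                tasks.add(() -> indexChunk(chunk));
            }
            int total = 0;
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                total += f.get(); // already complete; rethrows any task failure
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("indexing task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            ids.add(i);
        }
        System.out.println("indexed " + indexInChunks(ids, 3) + " ids");
    }
}
```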
document.addField("namedresourcetype_keyword", fvalue);
}

class IndexerThread extends Thread {
This should really implement the Runnable interface instead of extending Thread, as that is the preferred way to work with threads in Java:
https://stackoverflow.com/questions/541487/implements-runnable-vs-extends-thread-in-java
I agree with the suggestion, but support for multiple threads has been withdrawn from this PR, as suggested by @cwilper in #65 (comment), and will eventually be introduced in a separate PR/ticket.
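A minimal sketch of the Runnable-based shape the reviewer recommends. `IndexerTask` is a hypothetical replacement for `IndexerThread`, and the list append stands in for the real `indexContent(context, item, force)` call. Because the work is a plain `Runnable`, the same class can run on a dedicated `Thread` or be passed to an executor later without changes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical IndexerTask: the work is separated from the threading mechanism.
public class IndexerTask implements Runnable {
    private final List<Integer> ids;
    private final List<Integer> indexed;

    public IndexerTask(List<Integer> ids, List<Integer> indexed) {
        this.ids = ids;
        this.indexed = indexed;
    }

    @Override
    public void run() {
        for (Integer id : ids) {
            indexed.add(id); // stand-in for indexContent(context, item, force)
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> indexed = new CopyOnWriteArrayList<>();
        Thread worker = new Thread(new IndexerTask(Arrays.asList(1, 2, 3), indexed));
        worker.start();
        worker.join();
        System.out.println(indexed);
    }
}
```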
public void cleanIndex(boolean force) throws IOException, SQLException, SearchServiceException {
if (force) {
try {
getSolr().deleteByQuery(
Why not use "*:*" as the query?
It is safer in case the local installation has customized the search index in ways we don't know about, so that it now also contains documents managed and indexed by other classes/services.
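To illustrate the reasoning, a targeted delete query only removes the resource types this code owns, whereas "*:*" would wipe everything in the core. The sketch below just builds such a query string; the field name `search.resourcetype` is taken from the later discussion in this PR, while the type values and the `CleanQuery`/`deleteQueryFor` names are placeholders. The result would be passed to SolrJ's `deleteByQuery(String)`, and values containing Solr special characters would need escaping (e.g. with SolrJ's `ClientUtils.escapeQueryChars`).

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

public class CleanQuery {
    // Builds a delete query restricted to the given resource types instead of
    // the match-all "*:*", so documents indexed by other services survive.
    static String deleteQueryFor(String field, List<String> types) {
        if (types.isEmpty()) {
            throw new IllegalArgumentException("at least one type is required");
        }
        StringJoiner or = new StringJoiner(" OR ", field + ":(", ")");
        for (String type : types) {
            or.add(type); // placeholder values; escape real ones if needed
        }
        return or.toString();
    }

    public static void main(String[] args) {
        System.out.println(deleteQueryFor("search.resourcetype", Arrays.asList("TypeA", "TypeB")));
    }
}
```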
@@ -1931,6 +2182,20 @@ protected DiscoverResult retrieveResult(Context context, DiscoverQuery query, Qu
return result;
}

public DiscoverResult.FacetResult getDiscoveryFacet(Context context, FacetField facetField,
Method isn't used.
Removed in cc11f49.
System.out.println(head + ":" + (idx++) + " / " + size);
}
} catch (Exception e) {
e.printStackTrace();
Can this be removed, as it is already logged below?
The code no longer exists.
indexContent(context, item, force);
context.uncacheEntity(item);
} catch (Exception ex) {
log.error("ERROR: identifier item:" + id + " identifier thread:" + head);
Could you also log the stack trace here?
} catch (Exception ex) {
log.error("ERROR: identifier item:" + id + " identifier thread:" + head);
}
System.out.println(head + ":" + (idx++) + " / " + size);
Can this go to the logs instead of System.out?
The code no longer exists.
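For reference, routing the progress line through a logger, as the reviewer asks, could look like the sketch below. It uses `java.util.logging` to stay dependency-free; `ProgressLogging` and `reportProgress` are hypothetical names. The parameterized message is only formatted if the record is actually published, and the output follows the logging configuration rather than always hitting the console.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class ProgressLogging {
    private static final Logger log = Logger.getLogger(ProgressLogging.class.getName());

    // Sends the "head:idx / size" progress line through the logger
    // instead of System.out.
    public static void reportProgress(String head, int idx, int size) {
        log.log(Level.INFO, "{0}:{1} / {2}", new Object[] {head, idx, size});
    }

    public static void main(String[] args) {
        for (int idx = 1; idx <= 3; idx++) {
            reportProgress("thread-0", idx, 3);
        }
    }
}
```

Note that `java.util.logging` formats these placeholders with `MessageFormat`, so large numbers appear with locale grouping (for example "1,234").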
Regarding the multi-threaded indexing, does it make sense to handle that as a separate JIRA/PR pair? It doesn't seem essential to the functionality being introduced here, and it would seem to deserve dedicated consideration. I agree that moving toward something like Spring executors would probably result in more robust thread management. I also think some dedicated testing would be good, at least to show that there is a throughput improvement for a typical site with the chosen default number of threads.
… the discover REST result
DS-3851 Endpoint to interact with the workflow
…nly where really needed
c1e2d95 to a8190fe
Hi all, thanks to all of you, @benbosman @KevinVdV @cwilper @tdonohue, for your review. I have tried to reply, with links, to every comment to speed up the validation process. @benbosman, about your latest general comment #65 (comment):
done, see b98d8f4
done, see 689ac4e
I have preferred to avoid any changes there. If we end up replacing the search.resourcetype field with a search.resourcename field, we can drop the new constants at that time, instead of introducing two ways of handling these things now. In my opinion, any change to the current implementation of the Constants class can be made after the preview release or later.
This PR shares the implementation of the indexing features required for MyDSpace.
Integration Tests have been added to the DiscoverControllerIT to demonstrate:
A PR highlighting the small changes required to the Discover (search) endpoint: DSpace/RestContract#55
Namely, the embedded dspaceObject is changed to rObject (result object), because workspace items, workflow items and tasks are now returned as well.
TO DO: