announcement of curation 3.0 #40

wowasa · 2019-03-12T11:30:54Z

We have started developing curation 3.0, which will have the following features:

Replacement of Vaadin
Apart from the two isolated cases of the instance analysis and the profile analysis, the curation web app uses the Vaadin framework to display static content (reports generated twice a week as XML files) dynamically: most of the views are created by the framework at the moment a user accesses a page, even though the displayed content is static. This approach wastes resources and time.
Hence we want the core module to generate not only the reports in XML format but also the HTML views (static pages for static content!). The two cases where pages must be created dynamically will be covered by a servlet that transforms the XML report into HTML; user interaction such as sorting and filtering will be handled by jQuery, and layout by CSS.
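The servlet-based XML-to-HTML step could look roughly like the following minimal sketch, which uses the JDK's built-in XSLT support (`javax.xml.transform`). It assumes the dynamic pages are produced with an XSLT stylesheet; `ReportRenderer`, `SAMPLE_XSLT` and the `report`/`record` element names are hypothetical and do not reflect the actual curation report schema. Only the transformation is shown; the servlet wrapper is omitted.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper that turns a report XML document into an HTML fragment via XSLT. */
final class ReportRenderer {
    /** Illustrative stylesheet; element and attribute names are assumptions, not the real report schema. */
    static final String SAMPLE_XSLT =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"html\" indent=\"no\"/>"
          + "<xsl:template match=\"/report\">"
          + "<table><xsl:for-each select=\"record\">"
          + "<tr><td><xsl:value-of select=\"@name\"/></td>"
          + "<td><xsl:value-of select=\"@score\"/></td></tr>"
          + "</xsl:for-each></table>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    // Templates is a compiled, thread-safe stylesheet that can be shared across requests
    private final Templates templates;

    ReportRenderer(String xslt) throws TransformerException {
        this.templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xslt)));
    }

    /** Transforms one report; a fresh Transformer per call because Transformer itself is not thread-safe. */
    String render(String reportXml) throws TransformerException {
        StringWriter out = new StringWriter();
        templates.newTransformer().transform(
                new StreamSource(new StringReader(reportXml)), new StreamResult(out));
        return out.toString();
    }
}
```

Compiling the stylesheet once into a `Templates` object and creating a new `Transformer` per call keeps the renderer safe to share between concurrent servlet requests.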
Optimization of memory usage
Currently the curation-core module needs between 2 and 4 GB of heap space while generating the collection reports, since it accumulates the information of every single CMDI instance in memory and generates the collection report in a final iteration. With some redesign we can pass the required information of each instance directly to the collection report, which would reduce memory consumption dramatically.
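As an illustration of passing each instance's information directly to the collection report, here is a minimal sketch of a streaming aggregate. `CollectionAccumulator` and the figures it tracks (score, validity, profile usage) are hypothetical; the real collection report contains different data. The idea is that each instance is folded into running totals and then discarded, so memory depends on the number of distinct profiles rather than on the number of instances.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical running aggregate for one collection report. Each instance result is folded in
 * as soon as it is produced, so memory grows with the number of distinct profiles rather than
 * with the number of CMDI instances.
 */
final class CollectionAccumulator {
    private long instanceCount;
    private long validCount;
    private double scoreSum;
    private double maxScore = Double.NEGATIVE_INFINITY;
    private final Map<String, Long> profileUsage = new TreeMap<>();

    /** Adds one instance's figures to the totals; nothing about the instance is retained afterwards. */
    void add(String profileId, double score, boolean valid) {
        instanceCount++;
        if (valid) {
            validCount++;
        }
        scoreSum += score;
        maxScore = Math.max(maxScore, score);
        profileUsage.merge(profileId, 1L, Long::sum);
    }

    /** Combines two partial aggregates, e.g. from parallel workers over the same collection. */
    void mergeFrom(CollectionAccumulator other) {
        instanceCount += other.instanceCount;
        validCount += other.validCount;
        scoreSum += other.scoreSum;
        maxScore = Math.max(maxScore, other.maxScore);
        other.profileUsage.forEach((profile, count) -> profileUsage.merge(profile, count, Long::sum));
    }

    long instanceCount() { return instanceCount; }
    long validCount() { return validCount; }
    double averageScore() { return instanceCount == 0 ? 0.0 : scoreSum / instanceCount; }
    double maxScore() { return maxScore; }
    Map<String, Long> profileUsage() { return Collections.unmodifiableMap(profileUsage); }
}
```

The accumulator is not thread-safe; with parallel processing, each worker could keep its own instance and combine them with `mergeFrom` at the end.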
Establishment of multi-threading on the Java level
In collection mode, the current version of curation-core takes the path to a single collection directory as an input parameter and generates a single collection report by analyzing all files in that directory. This means the program has to be called once per collection, which in our case is done by a shell script. Multi-threading is achieved by the shell script, which runs a configurable number of processes, and, for large collections (more than 10,000 files), by stream parallelization in Java.
In curation 3.0, multi-threading will be implemented in a configurable way on the Java level.
This means that curation-core will no longer process a single collection, but all collections descending from a given root directory.
This approach has the further advantage that curation-core can generate an overview of all collection results, as needed for the collections view, without having to re-read the collection reports from the file system.
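A minimal sketch of Java-level multi-threading across collections, using a fixed-size `ExecutorService` whose pool size plays the role of the configurable thread count. `CollectionRunner`, `analyseCollection` and the assumption that every direct sub-directory of the root is one collection are hypothetical; the placeholder analysis merely counts files. The returned map stands in for the collections overview, assembled from in-memory results instead of re-reading reports from disk.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical driver: analyses every collection under a root with a configurable thread pool. */
final class CollectionRunner {

    /** Returns an overview (collection name to result) built from the in-memory results. */
    static Map<String, Long> run(Path root, int threads)
            throws IOException, InterruptedException, ExecutionException {
        // Assumption: each direct sub-directory of root is one collection
        List<Path> collections;
        try (Stream<Path> entries = Files.list(root)) {
            collections = entries.filter(Files::isDirectory).sorted().collect(Collectors.toList());
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit all collections first so that up to `threads` of them run concurrently
            Map<String, Future<Long>> pending = new TreeMap<>();
            for (Path collection : collections) {
                pending.put(collection.getFileName().toString(),
                        pool.submit(() -> analyseCollection(collection)));
            }
            // Then wait for each result and build the overview without re-reading any report file
            Map<String, Long> overview = new TreeMap<>();
            for (Map.Entry<String, Future<Long>> entry : pending.entrySet()) {
                overview.put(entry.getKey(), entry.getValue().get());
            }
            return overview;
        } finally {
            pool.shutdown();
        }
    }

    /** Placeholder for the real per-collection analysis: here it only counts regular files. */
    private static long analyseCollection(Path collectionDir) throws IOException {
        try (Stream<Path> files = Files.walk(collectionDir)) {
            return files.filter(Files::isRegularFile).count();
        }
    }
}
```

Because a collection is the unit of work here, a single very large collection still occupies one thread; parallelism within a collection would have to be added separately.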

coy123 closed this as completed Jun 14, 2019
