announcement of curation 3.0 #40

Closed
wowasa opened this issue Mar 12, 2019 · 0 comments

We have started development of curation 3.0, which will have the following features:

  1. Replacement of Vaadin
    Apart from the isolated cases of instance and profile analysis, the curation web app uses the Vaadin framework to display static content (reports generated twice a week as XML files) dynamically: most views are built by the framework at the moment a user accesses a page, even though the displayed content is static. This approach wastes resources and time.
    Hence we want the core module to generate not only the reports in XML format but also the HTML views (static pages for static content!). The two cases where pages must be created dynamically will be covered by a servlet that transforms the XML report to HTML. User interaction such as sorting and filtering is handled by jQuery, and layout by CSS.

  2. Optimization of memory usage
    Currently the curation-core module needs 2-4 GB of heap space while generating the collection reports, because it accumulates the information of every single CMDI instance in memory and only generates the collection report in a final iteration. With some redesign we can pass the required information of each instance directly to the collection report, which would reduce memory usage dramatically.

  3. Establishment of multi-threading on the Java level
    In collection mode, the current version of curation-core takes the path to a single collection directory as its input parameter and generates a single collection report by analyzing all files in that directory. This means the program has to be called once per collection, which in our case is done by a shell script. Parallelism currently comes from the shell script, which runs a configurable number of processes, and, for large collections (>10000 files), from stream parallelization in Java.
    In curation 3.0, multi-threading will be established in a configurable way on the Java level.
    This means that curation-core will no longer process one single collection but all collections descending from a given root.
    This approach also has the advantage that curation-core can generate an overview of all collection results, as needed for the collections view, without re-reading the collection reports from the file system.
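
To illustrate points 2 and 3 together, here is a minimal sketch and not the actual curation-core design. It assumes that every direct subdirectory of the root is one collection, and it uses file size as a stand-in for the real CMDI instance analysis. The names `CollectionRunner`, `CollectionSummary`, `summarize` and `processAll` are hypothetical. Each file is folded into a running summary as soon as it is visited, so nothing per-instance is kept in memory, and the collections are spread over a thread pool of configurable size.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CollectionRunner {

    /** Hypothetical running summary of one collection: totals only, no per-file data. */
    static final class CollectionSummary {
        final String name;
        long fileCount;
        long totalBytes;

        CollectionSummary(String name) {
            this.name = name;
        }

        // Fold one instance into the summary; the instance can be discarded afterwards.
        void add(long sizeInBytes) {
            fileCount++;
            totalBytes += sizeInBytes;
        }
    }

    /** Walks one collection directory and folds each file into the summary as it is visited. */
    static CollectionSummary summarize(Path collectionDir) throws IOException {
        CollectionSummary summary = new CollectionSummary(collectionDir.getFileName().toString());
        try (Stream<Path> files = Files.walk(collectionDir)) {
            Iterator<Path> it = files.filter(Files::isRegularFile).iterator();
            while (it.hasNext()) {
                summary.add(Files.size(it.next())); // stand-in for the real instance analysis
            }
        }
        return summary;
    }

    /** Processes every collection directly below root using a fixed-size thread pool. */
    static Map<String, CollectionSummary> processAll(Path root, int threads)
            throws IOException, InterruptedException, ExecutionException {
        List<Path> collections;
        try (Stream<Path> children = Files.list(root)) {
            collections = children.filter(Files::isDirectory).collect(Collectors.toList());
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<CollectionSummary>> pending = new ArrayList<>();
            for (Path collection : collections) {
                Callable<CollectionSummary> task = () -> summarize(collection);
                pending.add(pool.submit(task));
            }
            // The overview is assembled from in-memory results; no report is re-read from disk.
            Map<String, CollectionSummary> overview = new TreeMap<>();
            for (Future<CollectionSummary> future : pending) {
                CollectionSummary summary = future.get();
                overview.put(summary.name, summary);
            }
            return overview;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: CollectionRunner <root> [threads]");
            return;
        }
        int threads = args.length > 1
                ? Integer.parseInt(args[1])
                : Runtime.getRuntime().availableProcessors();
        processAll(Paths.get(args[0]), threads).forEach((name, s) ->
                System.out.println(name + ": " + s.fileCount + " files, " + s.totalBytes + " bytes"));
    }
}
```

In this sketch each collection is handled by exactly one thread, so the summaries need no synchronization. If the files of a single large collection were also processed in parallel, the summary would need thread-safe counters.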

coy123 closed this as completed Jun 14, 2019