- upgraded Apache POI from 3.17 to 4.0.1.
- upgraded Vert.x from 3.5.4 to 3.6.3.
- new theme.
- new Docker image (upgraded from 1.3.3 to 1.3.5).
- the import index can be locked in the web interface through configuration.
- fixed some issues with line endings for CSV imports.
Verified support for Elasticsearch 7.0.0-alpha1.
- support all configuration options as environment variables for Docker image.
- the default index is now configurable.
- support for basepath/reverse proxy - updated resource/websocket URLs to be relative.
- now uses LF instead of CR to identify line breaks in CSV; see the sketch after this list.
- exceptions thrown when no desktop environment is available are logged as warnings instead of severe errors with stack traces.
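As a rough illustration of the LF-based splitting (the class and method names below are illustrative, not the project's actual code): rows end at `'\n'`, and a preceding `'\r'` from CRLF files is trimmed, so both Unix and Windows line endings parse correctly. Splitting on CR alone breaks on Unix-style files, which terminate rows with LF only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LfSplitter {
    /** Splits a buffer into rows on LF; a trailing CR from CRLF files is trimmed. ASCII-oriented for brevity. */
    public static List<String> rows(ByteBuffer buffer) {
        List<String> rows = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        while (buffer.hasRemaining()) {
            char c = (char) buffer.get();
            if (c == '\n') {                       // LF terminates a row...
                int end = current.length();
                if (end > 0 && current.charAt(end - 1) == '\r') {
                    current.setLength(end - 1);    // ...and a preceding CR is dropped.
                }
                rows.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            rows.add(current.toString());          // last row without a trailing LF.
        }
        return rows;
    }

    public static void main(String[] args) {
        ByteBuffer data = ByteBuffer.wrap("a,b\r\nc,d\ne,f".getBytes(StandardCharsets.UTF_8));
        System.out.println(rows(data));            // [a,b, c,d, e,f]
    }
}
```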
CSV imports should now work much better, tested with a sample insurance portfolio.
- now shows "verifying" in the web interface while the verification task runs (*).
- improved performance of CSV verification
- no longer deletes Excel files after the import is complete.
- hides Excel options on the website.
- shows registered file extensions on the website.
- new colorful theme.
- fixed some bugs in the new CSV parser that caused buffer overflows and similar errors.
- removed the upper limit on CSV file size by using an array of memory maps; see the sketch after this list.
- when running the verification task, the system parses the whole file once, as fast as it can, to validate that the file is properly formatted. This happens before the import starts, to make sure it is able to parse the whole file.
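A minimal sketch of the memory-map approach behind the removed size limit (independent of the project's actual internals): a single MappedByteBuffer is capped at roughly 2 GB, so a larger file is mapped as an array of windows that the parser reads in sequence.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedWindows {
    private static final long WINDOW = Integer.MAX_VALUE; // a single map is capped at ~2 GB.

    /** Maps a file of any size as an array of read-only windows. */
    public static MappedByteBuffer[] map(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            int windows = (int) ((size + WINDOW - 1) / WINDOW);
            MappedByteBuffer[] maps = new MappedByteBuffer[windows];
            for (int i = 0; i < windows; i++) {
                long position = i * WINDOW;
                long length = Math.min(WINDOW, size - position);
                maps[i] = channel.map(FileChannel.MapMode.READ_ONLY, position, length);
            }
            return maps; // the mappings stay valid after the channel is closed.
        }
    }
}
```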
- Docker support: now on Docker Hub as codingchili/excelastic.
- deletes xlsx/xls files on the server after parsing. *
- closes workbooks/files when the import is completed.
- CSV files cannot be deleted yet (see the demo after this list) - https://bugs.java.com/view_bug.do?bug_id=4724038
- support for importing CSV files.
- support for registering custom parsers through ParserFactory.
- upgraded the Vert.x dependency from 3.5.1 to 3.5.4.
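The CSV deletion limitation comes straight from the linked JDK bug: a memory-mapped file stays locked on Windows until its MappedByteBuffer is garbage collected, and there is no public API to unmap it explicitly. A small standalone demonstration (not project code):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedDeleteDemo {
    public static void main(String[] args) throws IOException {
        Path csv = Files.createTempFile("import", ".csv");
        Files.write(csv, "a,b\n1,2\n".getBytes());

        MappedByteBuffer map;
        try (FileChannel channel = FileChannel.open(csv, StandardOpenOption.READ)) {
            map = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
        // On Windows this delete fails while 'map' is still reachable: the
        // mapping holds the file open until the buffer is garbage collected.
        boolean deleted = csv.toFile().delete();
        System.out.println("deleted: " + deleted + ", first byte: " + map.get(0));
    }
}
```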
Changes to performance
Slightly reduces the memory consumption by no longer parsing the full Excel file into JSON objects at once. With this release we parse the Excel file twice: the first pass makes sure the file is well formatted before we start importing it, and to save memory it does not create any JSON objects on the heap. The yield is minimal, however, as Apache POI, which we use to parse xlsx/xls files, consumes the majority of the available memory.
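Sketched in Java (these types are stand-ins, not the project's actual classes), the two-pass flow looks like this:

```java
import java.util.List;
import java.util.function.Consumer;

/** Illustrative two-pass flow; RowSource is a stand-in for the real parser. */
public class TwoPassImport {

    interface RowSource {
        /** Parses all rows, invoking the handler per row; throws on malformed input. */
        void parse(Consumer<List<String>> handler) throws Exception;
    }

    static void importFile(RowSource source, Consumer<List<String>> indexer) throws Exception {
        // Pass 1: parse everything but build nothing, so a formatting error
        // aborts the import before a single document reaches Elasticsearch.
        source.parse(row -> { /* validation only - no JSON objects allocated */ });

        // Pass 2: parse again and hand each row to the indexer.
        source.parse(indexer);
    }
}
```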
Other performance improvements include concurrent parsing and indexing. While we are waiting for a response we parse the next N rows to be indexed; when the request completes (one is sent for every 128 imported items) we check the response code and start indexing the next N items, which are already parsed. Additionally, while parsing, each produced JSON object is streamed into a chunked connection to the Elasticsearch server, so we can parse the Excel file in buckets and still only need to reference one JSON object at a time. Finally, we generate the header that is required for each imported element only once per import.
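A hedged sketch of the chunked bulk upload using the Vert.x HTTP client the project already depends on; the endpoint, index name, and bucket shape here are assumptions for illustration:

```java
import io.vertx.core.Vertx;
import io.vertx.core.http.HttpClient;
import io.vertx.core.http.HttpClientRequest;
import io.vertx.core.json.JsonObject;

import java.util.List;

/** Streams row buckets to the _bulk API; host, port and index are illustrative. */
public class BulkStreamer {

    public static void stream(Vertx vertx, List<List<JsonObject>> buckets) {
        HttpClient client = vertx.createHttpClient();
        HttpClientRequest request = client.post(9200, "localhost", "/_bulk",
                response -> System.out.println("bulk status: " + response.statusCode()));
        request.setChunked(true); // no Content-Length: buckets go out as they are parsed.
        request.putHeader("Content-Type", "application/x-ndjson");

        // The action line is identical for every document, so build it once per import.
        String header = new JsonObject()
                .put("index", new JsonObject().put("_index", "imports")).encode() + "\n";

        for (List<JsonObject> bucket : buckets) {
            for (JsonObject document : bucket) {
                request.write(header + document.encode() + "\n");
            }
        }
        request.end();
    }
}
```

Because the connection is chunked, each bucket can be written as soon as it is parsed, and only the document currently being written needs to be referenced.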
In order to accomplish this we significantly simplified the source code and documented it accordingly. We added RxJava and turned the FileParser into an observable, and we added a new event bus codec so that we can pass the new ImportEvent type over the event bus without serializing it.
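The observable conversion can be pictured like this (RxJava 2 assumed; the real FileParser is more involved): rows are exposed as a stream and grouped into fixed-size buckets for indexing.

```java
import io.reactivex.Observable;

import java.util.ArrayList;
import java.util.List;

/** Illustrative only: exposes parsed rows as an Observable of fixed-size buckets. */
public class ObservableParser {

    public static Observable<List<String[]>> buckets(Iterable<String[]> rows, int size) {
        return Observable.fromIterable(rows)
                .buffer(size); // emits lists of 'size' rows to the indexing subscriber.
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[]{"a", "b"});
        rows.add(new String[]{"c", "d"});
        rows.add(new String[]{"e", "f"});

        buckets(rows, 2).subscribe(bucket ->
                System.out.println("indexing bucket of " + bucket.size() + " rows"));
    }
}
```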
A summary of changes
- added RxJava and turned FileParser into an observable.
- renamed 'parsing' to 'uploading' in the UI - 90% of the time is actually spent uploading the file.
- moved all logging statements into a single class for readability.
- replaced the custom atomic reference with Java's AtomicReference.
- moved the command line importer into its own controller.
- encapsulated requests into an ImportEvent passed over the event bus with a custom codec; see the sketch after this list.
- cleaned up the code and added javadoc to all classes.
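A minimal version of such a codec, with a hypothetical ImportEvent payload; since the event bus here is local, transform can hand over the same instance and the wire methods are never called:

```java
import io.vertx.core.buffer.Buffer;
import io.vertx.core.eventbus.MessageCodec;

/** Hypothetical payload: file name, target index, import options, and so on. */
class ImportEvent {
}

/** Minimal local-only codec; avoids serializing ImportEvent within the JVM. */
public class ImportEventCodec implements MessageCodec<ImportEvent, ImportEvent> {

    @Override
    public void encodeToWire(Buffer buffer, ImportEvent event) {
        throw new UnsupportedOperationException("local delivery only");
    }

    @Override
    public ImportEvent decodeFromWire(int pos, Buffer buffer) {
        throw new UnsupportedOperationException("local delivery only");
    }

    @Override
    public ImportEvent transform(ImportEvent event) {
        return event; // pass the same instance within the JVM - no serialization.
    }

    @Override
    public String name() {
        return "import-event";
    }

    @Override
    public byte systemCodecID() {
        return -1; // always -1 for user codecs.
    }
}
```

It would be registered once with `vertx.eventBus().registerDefaultCodec(ImportEvent.class, new ImportEventCodec())`, after which ImportEvent messages can be sent over the event bus directly.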
And a summary of the summary
This release includes performance improvements as well as improvements to code quality.