
@codingchili codingchili released this Nov 26, 2019

Fixes

  • #86 Feature to specify ES pipeline contributed by @octaavio

Now tested on ElasticSearch 7.4.0 / 7.4.2.


@codingchili codingchili released this Feb 19, 2019 · 11 commits to master since this release

Fixes

  • CSVParser: cast ByteBuffers to 'Buffer' to avoid JDK8/9 incompatibilities.

Now works on JRE 8 through JRE 11.
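
The release notes do not spell out the incompatibility. A common cause, assumed here, is that JDK 9 added covariant overrides such as ByteBuffer.flip() returning ByteBuffer, so a class compiled on JDK 9+ without targeting the Java 8 API can fail on JRE 8 with NoSuchMethodError. A minimal sketch of the cast workaround follows; BufferCompat and its method are hypothetical names, not taken from the CSVParser source.

```java
import java.nio.Buffer;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the cast; not the project's actual code.
class BufferCompat {

    // Calling flip() through the Buffer supertype makes the compiled call site
    // reference Buffer.flip(), which exists on JRE 8 as well as on newer JREs.
    static ByteBuffer flip(ByteBuffer buffer) {
        ((Buffer) buffer).flip();
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(16);
        buffer.put("a,b,c".getBytes(StandardCharsets.US_ASCII));
        flip(buffer);

        // Read back exactly the bytes that were written before the flip.
        byte[] out = new byte[buffer.remaining()];
        buffer.get(out);
        System.out.println(new String(out, StandardCharsets.US_ASCII));
    }
}
```

The same cast applies to the other Buffer methods that gained covariant overrides, such as clear(), rewind(), position(int) and limit(int).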


@codingchili codingchili released this Feb 11, 2019 · 15 commits to master since this release

New Features

  • none.

Fixes

  • upgrade Apache POI 3.17 -> 4.0.1.
  • upgrade vertx 3.5.4 -> 3.6.3
  • new theme.
  • new docker image (upgraded from 1.3.3 -> 1.3.5)

@codingchili codingchili released this Nov 29, 2018 · 20 commits to master since this release

New Features

  • import index can be locked in the web interface through configuration.

Fixes

  • fixes some issues with line endings for CSV imports.

@codingchili codingchili released this Nov 18, 2018 · 23 commits to master since this release

Verified support for ElasticSearch 7.0.0-alpha1.

New features

  • support all configuration options as environment variables for Docker image.
  • the default index is now configurable.
  • support for basepath/reverse proxy - updated resources/websock URLs to relative.

Fixes

  • now uses LF instead of CR to identify line breaks in CSV.
  • exceptions when there is no desktop environment are logged as a warning instead of a severe/stack trace.
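
As an illustration of the line-break fix, the sketch below splits on LF only and, as an added assumption not stated in the notes, drops a trailing CR so that Windows-style CRLF files yield the same rows. LineSplitter is a hypothetical name and this is not the project's parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: LF marks the end of a line, a trailing CR is discarded.
class LineSplitter {

    static List<String> lines(String text) {
        List<String> result = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                result.add(stripCr(text.substring(start, i)));
                start = i + 1;
            }
        }
        // Keep a final line that is not terminated by LF.
        if (start < text.length()) {
            result.add(stripCr(text.substring(start)));
        }
        return result;
    }

    private static String stripCr(String line) {
        return line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
    }

    public static void main(String[] args) {
        System.out.println(lines("a,b\r\nc,d\ne,f"));
    }
}
```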

CSV imports should now work much better; tested with a sample insurance portfolio.


@codingchili codingchili released this Oct 30, 2018 · 39 commits to master since this release

New features

  • now shows "verifying" in the website when running the verification task (*).
  • improved performance of CSV verification
  • no longer deletes excel files after import is complete.
  • hide excel options on the website.
  • show registered file extensions on the website.
  • new colorful theme.

Issues resolved

  • fixed some bugs in the new CSV parser that caused buffer overflows etc.
  • removed upper limit of CSV file size, uses an array of memory maps.
  • when running the verification task the system parses the whole file once, as fast as it can, to validate that
    the file is properly formatted. This is done before the import starts, to make sure it is able to parse the whole file.
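
A single mapping created with FileChannel.map cannot exceed Integer.MAX_VALUE bytes, which is presumably why an array of memory maps removes the size limit. The sketch below shows one way to address a file through several read-only mappings; ChunkedFileMap, its methods and the chunk size are hypothetical and not taken from the project source.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: a file exposed as an array of read-only memory maps.
class ChunkedFileMap {
    private final MappedByteBuffer[] maps;
    private final long chunkSize;
    private final long length;

    private ChunkedFileMap(MappedByteBuffer[] maps, long chunkSize, long length) {
        this.maps = maps;
        this.chunkSize = chunkSize;
        this.length = length;
    }

    static ChunkedFileMap open(Path file, long chunkSize) throws IOException {
        if (chunkSize <= 0 || chunkSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("chunkSize must be in 1..Integer.MAX_VALUE");
        }
        // The mappings stay valid after the channel is closed.
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long length = channel.size();
            int count = (int) ((length + chunkSize - 1) / chunkSize);
            MappedByteBuffer[] maps = new MappedByteBuffer[count];
            for (int i = 0; i < count; i++) {
                long offset = i * chunkSize;
                long size = Math.min(chunkSize, length - offset);
                maps[i] = channel.map(FileChannel.MapMode.READ_ONLY, offset, size);
            }
            return new ChunkedFileMap(maps, chunkSize, length);
        }
    }

    long length() {
        return length;
    }

    // Translates an absolute file offset into (map index, offset within map).
    byte get(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return maps[(int) (index / chunkSize)].get((int) (index % chunkSize));
    }
}
```

In practice the chunk size would be large (for example Integer.MAX_VALUE); a small value is only useful for exercising the boundary handling.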

@codingchili codingchili released this Oct 28, 2018 · 52 commits to master since this release

New features

  • docker support - now on dockerhub codingchili/excelastic
  • deletes xlsx/xls files on the server after parsing. *
  • closes workbooks/files when import is completed.

@codingchili codingchili released this Oct 28, 2018 · 58 commits to master since this release

New Features

  • support for importing CSV files.
  • support for registering custom parsers through ParserFactory.
  • upgraded vertx dependency from 3.5.1 to 3.5.4.

@codingchili codingchili released this Apr 28, 2018 · 60 commits to master since this release

Background

Changes to performance

Slightly reduces memory consumption by not parsing the full excel file into JSON objects at once. With this release we parse the excel file twice. The first pass makes sure that the file is well formatted before we start importing it; to save memory, it does not create any JSON objects on the heap. The gain is minimal however, as Apache POI, which is used to parse xlsx/xls files, consumes the majority of the available memory.

Other performance improvements include concurrent parsing and indexing. While we are waiting for a response we parse the next N rows to be indexed. When the request completes (once for each 128 imported items) we check the response code and start indexing the next N items, which will already be parsed. Additionally, while parsing, each produced JSON object is streamed into a chunked connection to the ElasticSearch server. This means we can parse the excel file in buckets and still only need to reference one JSON object at a time. Finally, we generate the header that is required for each imported element only once per import.

In order to accomplish this we have significantly simplified the source code and documented it accordingly. We added RxJava and turned the FileParser into an observable. We also added a new event bus codec so that we can pass the new ImportEvent type over the event bus without serializing it.
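
To illustrate the per-import header, the sketch below assumes the import uses the ElasticSearch bulk format, where every document line is preceded by an action line naming the target index. The action line is identical for every row, so it is built once and reused. BulkPayload and its methods are hypothetical names, and the index name is assumed to need no JSON escaping.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of building a bulk request body with a reused header.
class BulkPayload {

    // The action line is the same for every document in one import.
    static String actionLine(String index) {
        return "{\"index\":{\"_index\":\"" + index + "\"}}\n";
    }

    // Appends header + document for each row; every line ends with a newline.
    static String bulkBody(String index, List<String> jsonDocuments) {
        String header = actionLine(index); // generated once per import
        StringBuilder body = new StringBuilder();
        for (String document : jsonDocuments) {
            body.append(header).append(document).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        System.out.print(bulkBody("sales", Arrays.asList("{\"a\":1}", "{\"a\":2}")));
    }
}
```

A streaming variant would write each header and document pair to the open connection as soon as the row is parsed, instead of collecting the body in memory.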

A summary of changes

  • Added RxJava and turned FileParser into an observable.
  • Renamed 'parsing' into 'uploading' in the UI - 90% of the time is actually spent uploading the file.
  • Moved all logging statements into a single class for readability.
  • Replaced custom Atomic reference with Java's AtomicReference.
  • Moved the CommandLine importer into its own controller.
  • Encapsulate a request into an ImportEvent and pass it over the bus with a custom codec.
  • Cleaned up the code and added javadoc to all classes.

And a summary of the summary

This release includes performance improvements as well as improvements to code quality.


@codingchili codingchili released this Mar 31, 2018 · 79 commits to master since this release

Changes

  • support for enabling TLS when indexing to ElasticSearch (see: elastic_tls).