0.3.0

@sagebind sagebind released this 19 Sep 21:40
· 28 commits to master since this release
2e2de6a

Added

  • RowReader can now be turned into an asynchronous reactive stream of incoming rows by calling the appropriately named rows() method. With a few simple operators you can add map/reduce, transforms, and parallelism to your data processing. These operators are provided by RxJava, which is now a dependency of Tabitha.
  • Readers now take a ReaderOptions object, which makes it easier to customize runtime options for reading.
  • The page-related methods have been removed from RowReader, and files are now treated as a continuous stream of rows across all pages. To get data for specific pages, you can easily emulate the old behavior with rows() by either grouping by or filtering on the page number.
  • All Rows from a reader now "remember" their position in the source file. Check the page index and row index of the row using the page() and index() methods, respectively.
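
The page emulation described above comes down to two operations: filtering rows on their page number, or grouping rows by it. The following is only a rough, dependency-free sketch of that idea, not Tabitha's API. The Row class here is a hypothetical stand-in that mirrors only the page() and index() accessors mentioned in these notes, java.util.stream is swapped in for RxJava so the example runs on its own, and the 0-based page numbers are an assumption. On the stream returned by rows(), RxJava's filter and groupBy operators would presumably play the same roles.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class PageGroupingSketch {
    // Hypothetical stand-in for a Tabitha row. Only the page() and index()
    // accessors are taken from the release notes; everything else is assumed.
    public static final class Row {
        private final int page;
        private final int index;
        private final List<String> cells;

        public Row(int page, int index, List<String> cells) {
            this.page = page;
            this.index = index;
            this.cells = cells;
        }

        public int page() { return page; }
        public int index() { return index; }
        public List<String> cells() { return cells; }
    }

    // Emulates "read one page" by filtering on the page number.
    public static List<Row> rowsOnPage(List<Row> rows, int page) {
        return rows.stream()
                .filter(row -> row.page() == page)
                .collect(Collectors.toList());
    }

    // Emulates "iterate page by page" by grouping on the page number.
    // A TreeMap keeps the pages in ascending order.
    public static Map<Integer, List<Row>> rowsByPage(List<Row> rows) {
        return rows.stream()
                .collect(Collectors.groupingBy(Row::page, TreeMap::new, Collectors.toList()));
    }

    // Made-up sample data: two pages with two rows each (0-based numbering assumed).
    public static List<Row> sampleRows() {
        return List.of(
                new Row(0, 0, List.of("id", "name")),
                new Row(0, 1, List.of("1", "Ada")),
                new Row(1, 0, List.of("id", "total")),
                new Row(1, 1, List.of("1", "42")));
    }

    public static void main(String[] args) {
        List<Row> rows = sampleRows();
        System.out.println("Rows on page 1: " + rowsOnPage(rows, 1).size());
        System.out.println("Pages seen: " + rowsByPage(rows).keySet());
    }
}
```

Filtering suits the case where only one page matters. Grouping keeps every page available, with each row still carrying its own index() within that page.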

Changed

  • Quite a few classes have been renamed or moved between packages. The "entrypoint" classes RowReaderFactory and RowWriterFactory have been shortened to RowReaders and RowWriters.
  • Row writers no longer work in terms of Rows, but instead write List<Variant> as rows. This makes it much easier to generate data in the right format for writing.
  • Creating a writer with an ambiguous format no longer assumes CSV; the format must be explicit.

Fixed

  • Fixed a bug in the XLSX reader affecting text cells that store their data inline instead of using the string table.

Removed

  • DataFrame has been removed.
  • Parallel processing utilities have been removed. Parallel processing can now be done using rows(), which exposes RxJava's much more powerful parallel processing abilities.