Releases: Widen/tabitha

0.5.2

30 Apr 18:35

Changed

  • Change the group ID that JARs are published under back to com.widen instead of com.widen.oss. com.widen will be the standard group ID going forward. Legacy JARs are available in the JCenter read-only archive under com.widen and in Maven Central under com.widen.oss.

0.5.1

29 Apr 15:18

Changed

  • Add toString as an explicit abstract method to Variant to make its usage and behavior clearer. (#35)
  • Upgrade Gradle from 4 to 6.
  • Update release publishing to publish to Maven Central directly instead of Bintray since Bintray is being retired by JFrog. All existing releases are also available in Maven Central independent of JCenter. (#36)
  • Migrate CI to GitHub Actions.

0.5.0

11 Oct 16:53
1e485ea

Added

  • You can now convert a Row to a Header with Header.fromRow(Row).
  • You can now get the name of the page a row was found in with Row.pageName(), if the source file supports it.
  • Added RowReader.withSequentialIndexes().
  • Added RowReader.withBlankRows().
  • New tabitha-json module, which provides a plugin that allows Tabitha to read newline-separated JSON streams.

Changed

  • Renamed Row.page() to Row.pageIndex().
  • Multiple unnamed columns are now permitted inside a header, and the "inline headers" option will no longer throw a DuplicateColumnException when multiple blank column names are encountered.
  • The "inline headers" option no longer shifts row indexes by one.
  • The Row API has changed, with most constructors replaced by easier-to-use static factory methods.

0.4.0

04 Oct 00:48
b19a652

Changed

  • Split Tabitha into multiple modules. Tabitha is now distributed as a tabitha-core module and additional plugin modules that add support for additional file formats. The old tabitha package is now deprecated and will not be updated. The new packages are distributed under the com.widen group ID.

0.3.0

19 Sep 21:40
2e2de6a

Added

  • RowReader can now be transformed into an async reactive stream of incoming rows by calling the appropriately-named rows() method. This can be used to implement map/reduce, transforms, and parallelism into your data processing with a few simple operators. These operations are provided by RxJava, which is now a dependency of Tabitha.
  • Readers now take a ReaderOptions, which makes it easier to customize runtime options for reading.
  • The page-related methods have been removed from RowReader and files are treated as a continuous stream of rows across all pages. To get data for specific pages, you can emulate the old behavior easily with rows() and either grouping by or filtering on the page number.
  • All Rows from a reader now "remember" their position in the source file. Check the page index and row index of the row using the page() and index() methods, respectively.
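
The grouping and filtering idea mentioned above can be sketched without any Tabitha or RxJava dependency. The example below is only an illustration under stated assumptions: PositionedRow is a hypothetical stand-in for a row that remembers its page and row index (it is not Tabitha's Row class), and java.util.stream is swapped in for the reactive stream returned by rows().

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical stand-in for a row that remembers its position in the source file.
// This is NOT Tabitha's Row class; the field names are assumptions for illustration.
class PositionedRow {
    final int pageIndex;
    final int rowIndex;
    final List<String> cells;

    PositionedRow(int pageIndex, int rowIndex, List<String> cells) {
        this.pageIndex = pageIndex;
        this.rowIndex = rowIndex;
        this.cells = cells;
    }
}

class PageGrouping {
    // Group a flat sequence of rows by the page each row came from.
    static Map<Integer, List<PositionedRow>> groupByPage(List<PositionedRow> rows) {
        return rows.stream()
                .collect(Collectors.groupingBy(row -> row.pageIndex, TreeMap::new, Collectors.toList()));
    }

    // Keep only the rows that came from a single page.
    static List<PositionedRow> onlyPage(List<PositionedRow> rows, int pageIndex) {
        return rows.stream()
                .filter(row -> row.pageIndex == pageIndex)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<PositionedRow> rows = List.of(
                new PositionedRow(0, 0, List.of("a", "b")),
                new PositionedRow(1, 0, List.of("c", "d")),
                new PositionedRow(1, 1, List.of("e", "f")));

        System.out.println("pages: " + groupByPage(rows).keySet());
        System.out.println("rows on page 1: " + onlyPage(rows, 1).size());
    }
}
```

With a reactive stream the same two operations would typically be a group-by or a filter operator applied to the row stream. The exact operator calls depend on the RxJava version in use.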

Changed

  • Quite a few classes have been renamed or moved between packages. The "entrypoint" classes RowReaderFactory and RowWriterFactory have been shortened to RowReaders and RowWriters.
  • Row writers no longer work in terms of Rows, but instead write List<Variant> as rows. This makes it much easier to generate data in the right format for writing.
  • Creating a writer with an ambiguous format no longer assumes CSV; the format must be explicit.

Fixed

  • Fixed a bug in the XLSX reader that affected text cells storing their data inline instead of using the string table.

Removed

  • DataFrame has been removed.
  • Parallel processing utilities have been removed. This can be done using rows(), which exposes RxJava's much more powerful parallel processing abilities.

0.2.1

14 Sep 21:10
16d3f3d

Fixed

  • Fix an issue where opening a reader from an InputStream would cause the reader to start reading from a later point in the file after format detection. (#20)

0.2.0

13 Sep 19:28
fb5bd04

Added

  • Add a new pagination API for navigating between multiple pages for formats that have pages, such as Excel workbooks.
  • Added transform() method for applying general transformations to row readers.

Changed

  • Renamed take() to limit().
  • Row API cleanup.

Working runner + bugfix

15 Jun 21:40
9dedc57

Not much changed in this release in terms of lines of code, but the changes are important.

  • Bugfix: DelimitedRowReader and DelimitedRowWriter were not handling the close() method properly. This was especially a problem for writing, because closing the writer did not guarantee that all written rows were flushed.
  • Feature: tabitha-runner is now versioned and set up correctly for distribution. The runner is now packaged as a shadow jar and can be run independently. Distribution zips will now also be included here for regular releases.
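
The close() contract behind the bugfix above can be illustrated with a small, self-contained sketch. SketchDelimitedWriter is a hypothetical class written for this example and is not Tabitha's DelimitedRowWriter. It assumes a buffered writer underneath and does no quoting or escaping of cell values.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;

// Hypothetical delimited writer for illustration only; NOT Tabitha's DelimitedRowWriter.
class SketchDelimitedWriter implements AutoCloseable {
    private final BufferedWriter out;
    private final String delimiter;

    SketchDelimitedWriter(Writer target, char delimiter) {
        this.out = new BufferedWriter(target);
        this.delimiter = String.valueOf(delimiter);
    }

    // Writes one row as a delimited line (no quoting or escaping in this sketch).
    void write(List<String> cells) throws IOException {
        out.write(String.join(delimiter, cells));
        out.write('\n');
    }

    @Override
    public void close() throws IOException {
        // BufferedWriter.close() flushes any buffered rows before closing the target,
        // so nothing written earlier is lost.
        out.close();
    }

    // Writes two rows inside try-with-resources and returns what reached the sink.
    static String demo() {
        StringWriter sink = new StringWriter();
        try (SketchDelimitedWriter writer = new SketchDelimitedWriter(sink, ',')) {
            writer.write(List.of("id", "name"));
            writer.write(List.of("1", "Ada"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

Using try-with-resources means close() runs even if a write fails, which is the usual way to make sure buffered rows reach the output.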

First non-alpha release

07 Jun 20:42
2c8808e

A few things were cleaned up before the full 0.1.0 release, and a few features that were in progress were added.

  • Added a command-line script runner. The runner can run any Groovy script, which will be able to use all Tabitha classes.
  • Rows can be copied much more easily with the addition of Row#copyOf().
  • It is now easier to apply a function to a whole row with Row#map().
  • RowReader#EMPTY was renamed to RowReader#VOID and RowWriter#NULL renamed to RowWriter#VOID to improve consistency.
  • Added RowWriter#tee() for writing to multiple outputs simultaneously.
  • Fix errors when reading from boolean and blank Excel cell types.
  • Excel reader gives much more helpful error messages.
  • Updated code styling and JavaDoc comments.
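
The "tee" idea from the list above, forwarding each row to several outputs, can be sketched as follows. This is a conceptual illustration only: RowSink and tee() here are hypothetical names defined in the example, and they do not reproduce the signature of RowWriter#tee().

```java
import java.util.ArrayList;
import java.util.List;

class TeeSketch {
    // Hypothetical minimal sink interface; NOT Tabitha's RowWriter.
    interface RowSink {
        void write(List<String> row);
    }

    // Returns a sink that forwards every row to each of the given sinks, in order.
    static RowSink tee(RowSink... sinks) {
        return row -> {
            for (RowSink sink : sinks) {
                sink.write(row);
            }
        };
    }

    public static void main(String[] args) {
        List<List<String>> first = new ArrayList<>();
        List<List<String>> second = new ArrayList<>();

        // Both lists act as outputs and receive the same rows.
        RowSink both = tee(first::add, second::add);
        both.write(List.of("id", "name"));
        both.write(List.of("1", "Ada"));

        System.out.println(first.equals(second));
    }
}
```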

This release is meant to be used to gather interest in Tabitha's development, though using it for critical applications is not recommended.

First alpha release

27 Apr 21:35
b97340e
Pre-release

First Tabitha release! This release includes the following features:

  • Buffered and in-memory data creation and reading using DataFrame
  • Column and row schema types
  • Reading from and writing to multiple types of data sets using RowReader and RowWriter
  • Functional combinators for row readers
  • Multithreaded row reader processing
  • Support for the following formats: CSV, TSV, XLSX, XLS

This is a development release and is not recommended for production environments. There could be significant issues in the API or bugs.