
Major DB -> Engine module cleanup work #1473

Merged: rcaudy merged 158 commits into deephaven:main from rcaudy:rwc-dbsplit-2 on Nov 30, 2021


@rcaudy rcaudy commented Oct 22, 2021

This is the bulkiest part of #261

Includes:

  • Migration away from io.deephaven.db package name and "Db" in class names
  • Standardization for the benchmark source set
  • Standardization for replicators; most Java code replication now relies on relative paths to .java files rather than on Class objects
  • Class and interface renames
  • Further package splitting for logical code organization
  • Multiple modules under engine/ and extensions/ with slimmed down dependencies
  • Fix the "null boolean as byte" representation (fixes #927, "Eliminate NULL_BOOLEAN_AS_BYTE != NULL_BYTE discrepancy, simplify code")
  • Rename Index to RowSet and fix the type hierarchy. Similarly, rename OrderedKeys to RowSequence and RedirectionIndex to RowRedirection.
  • DbArray -> Vector
  • Migrate away from legacy table params and towards QST params where possible. Update python wrappers accordingly.
  • Make it possible to reference Chunk, Vector, DateTime, StringSet, with minimal dependencies.
  • Deleted some legacy dependencies. Cleaned up gradle somewhat.
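
For anyone porting code across this change, the class renames listed above can be kept in a small lookup table. The following is only a sketch: `LegacyNames` and `renamed` are hypothetical names invented for illustration and are not part of this PR, and the table covers only the four renames named in the list.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, NOT part of the PR: maps legacy simple class names
// to the names this PR description says they were renamed to.
final class LegacyNames {
    private static final Map<String, String> RENAMES = new LinkedHashMap<>();

    static {
        RENAMES.put("Index", "RowSet");
        RENAMES.put("OrderedKeys", "RowSequence");
        RENAMES.put("RedirectionIndex", "RowRedirection");
        RENAMES.put("DbArray", "Vector");
    }

    // Returns the new name for a legacy simple class name,
    // or the input unchanged when no rename is recorded.
    static String renamed(String simpleName) {
        return RENAMES.getOrDefault(simpleName, simpleName);
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : RENAMES.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The lookup is by simple class name only; package moves would need to be handled separately.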

@rcaudy rcaudy added this to the Oct 2021 milestone Oct 22, 2021
@rcaudy rcaudy self-assigned this Oct 22, 2021
@rcaudy rcaudy force-pushed the rwc-dbsplit-2 branch 3 times, most recently from 8e199de to f81c069 on November 8, 2021 20:45
…engine -> io.deephaven.engine, io.deephaven.benchmark.db -> io.deephaven.benchmark.engine
…es relative to the repository root instead of Class objects, in order to allow for module and package name mismatches.
…uto-refactoring removes but that replication produces)
… JavaDocs, and variable names. Some first draft package moves for RowSequence and related classes.
@rcaudy rcaudy modified the milestones: Oct 2021, Nov 2021 Nov 30, 2021
JamesXNelson previously approved these changes Nov 30, 2021

@JamesXNelson JamesXNelson left a comment

I only reviewed the .github/workflows/nightly-benchmarks that I am a code-owner for, and it LGTM (simple :path:rename). Did not view the rest of it.

@nbauernfeind nbauernfeind left a comment

Obviously, I cannot review the entire thing. The glances here and there that I've taken all look great. The work I rebased on top is also going well, so LGTM for merging.

@JamesXNelson JamesXNelson left a comment

Re-adding approval on the .github file, now that Nate (and CI) have given a more comprehensive "lgtm"

@devinrsmith devinrsmith left a comment

Very hard to review. Happy to triage issues later, given all checks are currently passing.

@rcaudy rcaudy merged commit 5c1b52c into deephaven:main Nov 30, 2021

rcaudy commented Nov 30, 2021

  • Split the DB module into multiple sub-modules:
    • engine-chunk: Chunks and their utilities, now in package io.deephaven.chunk
    • engine-vector: Vectors (formerly DbArrays) and their utilities, now in package io.deephaven.vector
    • engine-function: Function libraries, now in package io.deephaven.function
    • engine-datetime: DateTime (formerly DbDateTime), Period (formerly DbPeriod) and their utilities, now in package io.deephaven.time
    • engine-stringset: StringSet, now in package io.deephaven.stringset
    • engine-tuple: Tuples, now in package io.deephaven.tuple
    • engine-updategraph: The UpdateGraphProcessor (formerly LiveTableMonitor), LogicalClock, liveness tracking system, and related sub-systems for DAG-based update processing, in package io.deephaven.engine.updategraph. UGP "requestRefresh" functionality has been greatly simplified as a "speed up the next cycle" feature.
    • engine-rowset: RowSet (formerly Index), RowSequence (formerly OrderedKeys) and their implementations and utilities in package io.deephaven.engine.rowset. TreeIndexImpl -> OrderedLongSet. Major refactoring of the Index class hierarchy with some new usage patterns.
    • engine-api: The core interfaces, including Table, TableMap, ColumnSource, and their superinterfaces and parameter structures in package io.deephaven.engine.table. QueryScope, QueryLibrary, etc. This is the primary compile-time dependency for engine usage. Table now favors table-api parameter structures in the public interface. Some overloads have been cleaned up, some deprecated methods have been deleted. by -> groupBy and aggBy. Legacy listeners are now called ShiftObliviousListeners, and the Listener name applies to shift-aware listeners. Direct listeners have been retired. The public parts of DynamicTable are now methods on Table.
    • engine-base: A few implementation classes needed by engine-tuplesource and engine-table
    • engine-tuplesource: Non-ColumnSource TupleSource implementations, used as a runtime dependency by engine-table
    • engine-table: The implementation classes for Table, TableMap, and all existing operations, as well as many utility classes. This is the primary runtime dependency for engine usage. RedirectionIndex renamed RowRedirection. Many package moves and file renames.
    • engine-benchmark: JMH benchmarks for the engine and its utilities
    • extensions-parquet-table: The Parquet integration with engine tables
    • extensions-csv: CSV integration with engine tables
  • Move other modules for standardization:
    • Kafka -> extensions-kafka
    • Parquet -> extensions-parquet-base
  • All replication code concentrated in new modules:
    • replication-util: The utility classes for replication
    • replication-static: Static replicators, which no longer have any compile-time dependency on their input classes, and instead rely entirely on file contents
    • replication-reflective: Reflective replications, which continue to have compile-time dependencies on their input classes
  • Null Booleans are now encoded as the NULL_BYTE instead of (byte) -1
  • Some utilities have been moved down to Util
  • Fix to Parquet grouping/indexing table output code
  • Get rid of dependencies on ant, forms_rt, packer, infoNode, mxj, janino, jacksonDatabind, scala, and snakeyaml. Delete legacy gradle configurations formsRt, getDown, packer, infoNode, mxj, fishData, fishDataTest, dhDb, and dhDbTest.
  • Updated protos for ARRAY->GROUP agg name
  • Updated C++ API for ARRAY->GROUP agg name
  • Updated Python APIs for Table API changes
  • Pick up latest version of web UI to match DateTime name and package change
  • Too many small things to list...
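
The null Boolean encoding change noted above can be illustrated with a minimal sketch. This is not the engine's actual code: the class and method names are hypothetical, `NULL_BYTE` is assumed here to be `Byte.MIN_VALUE`, and the 1/0 encoding of true/false is also an assumption made for the example.

```java
// Hypothetical illustration, NOT the engine's implementation.
final class NullBooleanEncoding {
    // Assumption: the engine-wide null byte sentinel is Byte.MIN_VALUE.
    static final byte NULL_BYTE = Byte.MIN_VALUE;
    // The legacy null Boolean sentinel, per the description above.
    static final byte LEGACY_NULL_BOOLEAN_AS_BYTE = (byte) -1;
    // Assumption: true and false are stored as 1 and 0.
    static final byte TRUE_AS_BYTE = (byte) 1;
    static final byte FALSE_AS_BYTE = (byte) 0;

    // Encode a nullable Boolean using the new convention.
    static byte toByte(Boolean value) {
        if (value == null) {
            return NULL_BYTE;
        }
        return value ? TRUE_AS_BYTE : FALSE_AS_BYTE;
    }

    // Decode a byte written with the new convention.
    static Boolean toBoolean(byte encoded) {
        if (encoded == NULL_BYTE) {
            return null;
        }
        return Boolean.valueOf(encoded != FALSE_AS_BYTE);
    }

    public static void main(String[] args) {
        System.out.println(toByte(null) == NULL_BYTE);
        System.out.println(toBoolean(toByte(Boolean.TRUE)));
    }
}
```

With a single sentinel, a null check on a byte-backed Boolean column no longer needs a Boolean-specific constant. Under this sketch's assumptions, a byte written with the legacy sentinel (-1) would not decode as null, so data persisted under the old convention would need separate handling.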

Co-authored-by: Colin Alworth <colinalworth@deephaven.io>
Co-authored-by: Corey Kosak <coreykosak@deephaven.io>
Co-authored-by: Devin Smith <devinsmith@deephaven.io>

@github-actions github-actions bot locked and limited conversation to collaborators Nov 30, 2021
@rcaudy rcaudy deleted the rwc-dbsplit-2 branch November 30, 2021 19:35

Successfully merging this pull request may close these issues.

Eliminate NULL_BOOLEAN_AS_BYTE != NULL_BYTE discrepancy, simplify code

6 participants