Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dynamic caching of bounding boxes #136

Merged
merged 3 commits into from
Feb 1, 2021
Merged

Conversation

sudiptoguha
Copy link
Contributor

Description of changes: adds capability of dynamically adjusting the fraction of bounding boxes to saved. The remainder of boxes are recomputed as needed. Setting a low number reduces the in memory footprint but reduces throughput.

@jotok jotok added this to In progress in Random Cut Forest v2 via automation Jan 31, 2021
@jotok jotok merged commit b5b974b into aws:compact-rcf Feb 1, 2021
Random Cut Forest v2 automation moved this from In progress to Done Feb 1, 2021
sudiptoguha added a commit that referenced this pull request Jun 9, 2021
* Sampler tree (#56)

changes

* Cleanup of State and ser/de test (#60)

This is a change to CompactRCF branch. Moves the construction of state into the forest class. Updates ser/de tests to reflect that.

* convertor to compact + compression (#61)

* Changing both samplers to use floats for weights (#63)

* Add new interfaces and classes for mapping between models and states

* Modify the forest constructor to create a list of component models (#65)

This is a first step toward removing state from executor classes.
Eventually all state should be owned by the forest, the update
coordinator, or the component models. This change modifies the forest
constructor to initialize component models, but those models are not yet
in the execution path.

* Update executors to use ComponentList (#66)

Update the executor classes to operate on the generic ComponentList
class instead of a list of SamplerPlusTree.

* Implement forest deserialization in RandomCutForestMapper (#67)

Complete the updated serialiazation implemenation using
IConextualStateMapper for the RandomCutForest class.

* Compact rcf (#69)

Create the INodeView interface to decouple visitors from the concrete Node class

* Add benchmark class for serialization

* Add INodeView implementation for compact trees (#71)

Add INodeView implementation for compact trees and change the delete method to be more efficient.

* Optimizations for the addPoint operation in compact trees (#72)

* Compact rcf (#74)

refactor tree

* Introducing AbstractCompactRandomCutTree to enable float[] alongside double[] (#75)

* Add lombok.config to repository (#94)

Add lombok.config and set the property addLombokGeneratedAnnotation to
true. This will make it so that jacoco ignores lombok generated code
when computing test coverage.

* Add support for single and double precision in compact trees (#76)

* Fix javadoc warnings

Update javadocs in classes that were producing warnings.

Closes #89

* Improvements to sampler package

* Standardize field names in CompactSampler
* Update Javadocs
* Add ability to validate the heap property in CompactSampler
* Move common methods to the IStreamSampler interface
* Add @DaTa annotation to Weighted class and remove unused methods
* Improve package test coverage
* Set fields in CompactSamplerState to private

* Updates to forest/builder configuration

* Change the singlePrecisionEnabled flag to use a new Precision enum
* Remove logic from the builder that overrides user options. Instead
  we will fail fast in the constructor.

Closes #77 #82

* Remove the Sequential interface

This change makes the sequence index an explicit argument to the sampler
and tree interfaces. As a result we can remove the Sequential container
class. Other changes:

* There is a new interface ISampled that which is used in the return type
  of getSamples() and getEvictedPoint().
* The Weighted container class implements ISampled
* The signatures of acceptSample and addSample are changed to remove
  redundancy and to avoid leaking implementation details. In particular,
  acceptSample no longer returns a weight, because the use of weights is a
  feature of time-decayed sampling, but not a requirement for stream
  sampling.
* Updates to method names in IStreamSampler to more clearly reflect use.

Closes #80

* Unify pointer-based and compact trees (#103)

Create new interfaces and abstract base classes to contain common logic between the pointer-based and compact tree implementations.

* removing arbitrary nodeview (#108)

Remove the getNodeView method from the INodeView interface and add it to the abstract tree base class. Additionally, change the visibility of tree methods related to tree structure from package-private to protected.

Closes #106

* Change IUpdatable interface to not return nested Optional (#110)

Rename UpdateReturn to UpdateResult, and add a new method
isStateChange() that can be called to determine if the update changed
the model.

Closes #81

* Normalize constructors for state and mapper classes

This change makes it so that all state and mapper classes have a single
no-argument constructor and settable fields. Additionally, new test
cases are added to improve test coverage in the state package.

Closes #90

* Various changes to make mapper classes work in a consistent way

* Have samplers serialize their own lambda and sample size values
* Create a new mapper for CompactNodeManager
* Update the CompactRandomCutTreeState class to contain a node manager
  state instead of leaf and node store states
* Revert CompactNodeManager to work with ILeafStore and INodeStore
  interfaces instead of implementations
* PointStoreCoordinator returns an IPointStore instead of a concrete
  class
* Update mapper classes to respect the bounding box cache setting

Closes #115

* Fix bounding box calculation and update caching algorithm (#118)

* Fix bounding box calculation in compact trees
* Add bounding box caching to pointer-based trees

* Remove compact node manager class

In a previous revision, we introducted the CompactNodeManager class to
unify the handling of leaf nodes and internal nodes. The initial
implementation was a leaky abstraction, because the tree class was still
responsible for managing some node properties, like bounding boxes.
The change moves node management back into the tree class.

* Add support for serializing single-precision points

Closes #78 #87

* RandomCutForest mapper should keep single precision when mapping to state

* Refactor RandomCutForestSerDe

Refactor the RandomCutForestSerDe class to operate directly on
RandomCutForest objects instead of RandomCutForestState objects.
Additionally, refactor the RandomCutForestSerDe unit tests to remove
assertions that make more sense as a functional test in the core package
or as a benchmark.

* Support mapping CompactSampler to state class without copy

In addition, the copy flag is added to other applicable mapper classes,
but the behavior (mapping without copy) is only implemented for
CompactSamplerMapper in this commit.

* fixing size restrictions (#134)

* fixing size restrictions

* cleanup

* Add new examples module

Add new module with serialization examples and runner.

* Add main method to examples

Add a main method to examples to they can be easily invoked from an IDE.

* Dynamic caching of bounding boxes (#136)

Add support for setting the bounding box cache fraction at runtime.

* Remove the serialization module from the project.

Closes #128

* Add protostuff benchmark

* Reduce the sensitivy of the single-precision consistency test

Reduce the sensitivity of testConsistentScoringSinglePrecision by
increasing the delta used when comparing score values.

* dynamic samplers (#141)

* smart copy (#143)

* smart copy

* fixes

* cleanup of INodeView (#145)

* Return a valid int value in getRootIndex

Explicitly coerce null to an int value in getRootIndex

Closes #142

* Split imputation into two methods and fix bounding box computation (#149)

* Split the imputation method into two methods: the first method collects the imputation proposal from each tree, and the second method uses that result to construct the point estimate. The list of proposals may be of independent interest to users, hence we are separating it into its own public API call.
* Fixed a bug in the conditional logic for constructing bounding boxes.

Closes #148

* updated pointstore and shingle aware RCF (#152)

* updated pointstore

* adding tests

* cleanup

* changes

* spotless

* Exposing sequence information to PointStore internally (#154)

Updating forest and point store APIs to take a sequence index parameter.

* cleanup of point sum and allowing sequenceIndices to be non-unique (#155)

* cleanup of point sum and allowing sequenceIndices to be non-unique

* fixes

* Add generic parameter to IUpdateCoordinator for point type (#157)

* type fixes

* spotless woes

* cleanup

* javadocs fix

* Add support for dynamically configurable models

Add support for configuring component models by setting name. This
change allows us to add new configurable settings in a backwards-compatible way.

* Add convenience methods for short, long, and boolean

* Rename IUpdateCoordinator to IStateCoordinator (#160)

* updating samplers (#161)

* updating samplers

* fixes

* fixes + spotless

* changing small stores (#163)

* changing small stores

* spotless

* fixes

* Automatically apply spotless formatting instead of failing the build (#164)

* simplification of trees (#165)

* simplification of trees

* fixing builders and builderers

* fixes

* PointStore update (#166)

Update the point store to support internal shingling and compaction.

* Tree compression added (#168)

Additional logical compression to serialized tree states.

* Remove unused classes (#171)

Removed several classes that were no longer being used, like SmallIndexManager, SmallLeafStore, and related classes.

* Store point data as bytes in PointStoreState (#174)

* Store point data as bytes in PointStoreState

* Fix upper bound in for-loop in unpackInts

* single precision cuts (#175)

* single precision cuts

* adding shingling tests

* fixes

* Propagate initial accept fraction through array sampler converter (#177)

* Add object graph size profiler (#180)

* tree constructors and changing enableCache to numeric parameter (#182)

* refactoring caching logic (#183)

* refactoring caching logic

* fixes

* Update the `isOutputReady()` method to check component models (#184)

Add a new isOutputReady method to the ITraversable interface. The forest
will now check all component models for readiness, instead of waiting
for a fixed number of updates.

Closes #181

* Update CompactNodeView to operate directly on box cache and node store (#185)

* Standardize property names in model classes (#191)

- Change property `compactEnabled` to `compact`.
- Change Preceision enum values from `SINGLE` and `DOUBLE` to `FLOAT_32`
  and `FLOAT_64`.
- Use `centerOfMassEnabled` instead of `enableCenterOfMass`
- Change `lambda` property to `timeDecay`.
- Use `rng` instead of `random` for instance fields of type `Random`.
- Change `getSequenceIndexOfMostRecentLambdaUpdate` to
  `mostRecentTimeDecayUpdate`.

* Normalize names in the state package (#193)

- Match names in the corresponding model classes.
- Use the "Enabled" suffix for boolean flags where it improves
  readability.
- Use "compressed" in state classes versus "isCompressionEnabled" in
  mapper classes.
- Use "isPartialTreeState" in state classes versus
  "isPartialTreeStateEnabled" in mapper classes.
- Store precision as a String instead of a boolean.

* Change field visibility in BoxCache and NodeStore (#197)

- Lower the visibility of fields in BoxCache and NodeStore to private or
  protected
- Rename methods in the NodeStore class: change `getParentIndex` to
  `deriveParentIndex`, and change `getParent` to `getParentIndex`.

* Standardize on `random` instead of `rng` (#202)

Closes #199

* reference test in compact trees (#203)

* Rename ExecutionContext and add mode string (#204)

Closes #200
Closes #201

* using point reference for two point boxes (#205)

* enabling projections (#208)

* enabling projections

* Fixes to tree projections and V1 JSON converter (#211)

* Add utility for converting from V1 JSON to V2 State class

* Change package names and add tests for the JSON converter

* fixes

* fixes to converter

* cleanup

Co-authored-by: Joshua Tokle <joshtok@amazon.com>

* Multiple small changes (#214)

* Mutiple small changes

- Update version number to 2.0
- Make compact mode the default
- Add a placeholder "mode" string to RandomCutForest for forward
  compatibility.
- Change license header back to using 2020 until we decide what to do
  about spotless configuration. We don't want to update every existing
  source fill to 2021.

Closes #130
Closes #209
Closes #212

* pointsum fix

Co-authored-by: Joshua Tokle <joshtok@amazon.com>

* Create an IVisitorFactory interface (#216)

In a previous commit (#208) we added support for projections, which
added a new method to the VisitorFactory class. This change creates an
IVisitorFactory interface which is implemented VisitorFactory, giving
users the option to create specialized implementations.

* full reproducibility on serialization (#220)

* full reproducibility on serialization

* fixes

* Rename remaining occurrences of lambda to timeDecay (#228)

* Remove alternative constructors from classes with builders (#229)

Closes #189

Co-authored-by: Sudipto Guha <sudipto@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants