-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compact rcf #103
Compact rcf #103
Conversation
3db2113
to
d3bf1b7
Compare
// ITree<double[]> tree = | ||
// RandomCutTree.builder().storeSequenceIndexesEnabled(storeSequenceIndexesEnabled) | ||
// .centerOfMassEnabled(centerOfMassEnabled).randomSeed(rng.nextLong()).build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we delete these lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// checkArgument(builder.boundingBoxCachingEnabled || builder.compactEnabled, | ||
// "bounding box caching is only supported for compact trees"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* | ||
* @param index node | ||
*/ | ||
void increaseMassOfAncestorsAndItselfRecursively(int index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should probably just be increaseMass
(or incrementMass
if we always only increase by 1). The fact that increasing the mass of a child node should also increase the mass of a parent is implicit in the tree structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - agree with the incrementMass. But for decrement, we check for 0 (at leaf) and only proceed one node at a time primarily because the delete is simpler. Do we want to say adjustMass for the increment?
*/ | ||
|
||
@Override | ||
public abstract boolean pointEquals(int index, Point point); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the abstract methods in this class, are they already defined in the INodeStore interface? I think that if you define an abstract class that implements an interface, any of the interface methods that aren't implemented will automatically be treated as abstract and don't need to be repeated.
rangeSum = sum; | ||
} | ||
|
||
public abstract AbstractBoundingBox<Point> copyBox(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can just be copy
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
|
||
@Override | ||
public BoundingBox addBox(AbstractBoundingBox<double[]> otherBox) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider making this a void
method to clearly indicate that it mutates the target.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me look into this. Right now the minValues, maxValues are final, and for a point those are shared. If we change the final, then we can easily do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually addBox is not even used anywhere, so deleting this function.
@@ -300,7 +188,7 @@ public boolean equals(Object other) { | |||
return false; | |||
} | |||
|
|||
BoundingBox otherBox = (BoundingBox) other; | |||
AbstractBoundingBox<double[]> otherBox = (AbstractBoundingBox<double[]>) other; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will throw a class cast exception if you try to compare an AbstractBoundingBox<double[]>
to an AbstractBoundingBox<float[]>
.
We already checked that other
was an instance of BoundingBox
, so we can cast to that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -303,7 +203,7 @@ public boolean equals(Object other) { | |||
return false; | |||
} | |||
|
|||
BoundingBoxFloat otherBox = (BoundingBoxFloat) other; | |||
AbstractBoundingBox<float[]> otherBox = (AbstractBoundingBox<float[]>) other; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cast to BoundingBoxFloat instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
public Integer addNode(Integer parent, Integer leftChild, Integer rightChild, int cutDimension, double cutValue, | ||
int mass) { | ||
int parentIndex = (parent != null) ? parent.intValue() : NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: define a method
protected static int intValue(Integer ref) {
return ref == null ? ref.intValue() : NULL;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ref != null? Done
* An interface for describing access to the node store for different types of | ||
* trees. | ||
*/ | ||
public class CompactNodeManager { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mixing of null
and NULL
makes this code hard to understand. We should try to reduce or eliminate the use of null
. I'll create an issue to follow up after this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense. We need the pointer type and null to support PointerTrees (or equivalent name)
* Sampler tree (#56) changes * Cleanup of State and ser/de test (#60) This is a change to CompactRCF branch. Moves the construction of state into the forest class. Updates ser/de tests to reflect that. * convertor to compact + compression (#61) * Changing both samplers to use floats for weights (#63) * Add new interfaces and classes for mapping between models and states * Modify the forest constructor to create a list of component models (#65) This is a first step toward removing state from executor classes. Eventually all state should be owned by the forest, the update coordinator, or the component models. This change modifies the forest constructor to initialize component models, but those models are not yet in the execution path. * Update executors to use ComponentList (#66) Update the executor classes to operate on the generic ComponentList class instead of a list of SamplerPlusTree. * Implement forest deserialization in RandomCutForestMapper (#67) Complete the updated serialiazation implemenation using IConextualStateMapper for the RandomCutForest class. * Compact rcf (#69) Create the INodeView interface to decouple visitors from the concrete Node class * Add benchmark class for serialization * Add INodeView implementation for compact trees (#71) Add INodeView implementation for compact trees and change the delete method to be more efficient. * Optimizations for the addPoint operation in compact trees (#72) * Compact rcf (#74) refactor tree * Introducing AbstractCompactRandomCutTree to enable float[] alongside double[] (#75) * Add lombok.config to repository (#94) Add lombok.config and set the property addLombokGeneratedAnnotation to true. This will make it so that jacoco ignores lombok generated code when computing test coverage. * Add support for single and double precision in compact trees (#76) * Fix javadoc warnings Update javadocs in classes that were producing warnings. Closes #89 * Improvements to sampler package * Standardize field names in CompactSampler * Update Javadocs * Add ability to validate the heap property in CompactSampler * Move common methods to the IStreamSampler interface * Add @DaTa annotation to Weighted class and remove unused methods * Improve package test coverage * Set fields in CompactSamplerState to private * Updates to forest/builder configuration * Change the singlePrecisionEnabled flag to use a new Precision enum * Remove logic from the builder that overrides user options. Instead we will fail fast in the constructor. Closes #77 #82 * Remove the Sequential interface This change makes the sequence index an explicit argument to the sampler and tree interfaces. As a result we can remove the Sequential container class. Other changes: * There is a new interface ISampled that which is used in the return type of getSamples() and getEvictedPoint(). * The Weighted container class implements ISampled * The signatures of acceptSample and addSample are changed to remove redundancy and to avoid leaking implementation details. In particular, acceptSample no longer returns a weight, because the use of weights is a feature of time-decayed sampling, but not a requirement for stream sampling. * Updates to method names in IStreamSampler to more clearly reflect use. Closes #80 * Unify pointer-based and compact trees (#103) Create new interfaces and abstract base classes to contain common logic between the pointer-based and compact tree implementations. * removing arbitrary nodeview (#108) Remove the getNodeView method from the INodeView interface and add it to the abstract tree base class. Additionally, change the visibility of tree methods related to tree structure from package-private to protected. Closes #106 * Change IUpdatable interface to not return nested Optional (#110) Rename UpdateReturn to UpdateResult, and add a new method isStateChange() that can be called to determine if the update changed the model. Closes #81 * Normalize constructors for state and mapper classes This change makes it so that all state and mapper classes have a single no-argument constructor and settable fields. Additionally, new test cases are added to improve test coverage in the state package. Closes #90 * Various changes to make mapper classes work in a consistent way * Have samplers serialize their own lambda and sample size values * Create a new mapper for CompactNodeManager * Update the CompactRandomCutTreeState class to contain a node manager state instead of leaf and node store states * Revert CompactNodeManager to work with ILeafStore and INodeStore interfaces instead of implementations * PointStoreCoordinator returns an IPointStore instead of a concrete class * Update mapper classes to respect the bounding box cache setting Closes #115 * Fix bounding box calculation and update caching algorithm (#118) * Fix bounding box calculation in compact trees * Add bounding box caching to pointer-based trees * Remove compact node manager class In a previous revision, we introducted the CompactNodeManager class to unify the handling of leaf nodes and internal nodes. The initial implementation was a leaky abstraction, because the tree class was still responsible for managing some node properties, like bounding boxes. The change moves node management back into the tree class. * Add support for serializing single-precision points Closes #78 #87 * RandomCutForest mapper should keep single precision when mapping to state * Refactor RandomCutForestSerDe Refactor the RandomCutForestSerDe class to operate directly on RandomCutForest objects instead of RandomCutForestState objects. Additionally, refactor the RandomCutForestSerDe unit tests to remove assertions that make more sense as a functional test in the core package or as a benchmark. * Support mapping CompactSampler to state class without copy In addition, the copy flag is added to other applicable mapper classes, but the behavior (mapping without copy) is only implemented for CompactSamplerMapper in this commit. * fixing size restrictions (#134) * fixing size restrictions * cleanup * Add new examples module Add new module with serialization examples and runner. * Add main method to examples Add a main method to examples to they can be easily invoked from an IDE. * Dynamic caching of bounding boxes (#136) Add support for setting the bounding box cache fraction at runtime. * Remove the serialization module from the project. Closes #128 * Add protostuff benchmark * Reduce the sensitivy of the single-precision consistency test Reduce the sensitivity of testConsistentScoringSinglePrecision by increasing the delta used when comparing score values. * dynamic samplers (#141) * smart copy (#143) * smart copy * fixes * cleanup of INodeView (#145) * Return a valid int value in getRootIndex Explicitly coerce null to an int value in getRootIndex Closes #142 * Split imputation into two methods and fix bounding box computation (#149) * Split the imputation method into two methods: the first method collects the imputation proposal from each tree, and the second method uses that result to construct the point estimate. The list of proposals may be of independent interest to users, hence we are separating it into its own public API call. * Fixed a bug in the conditional logic for constructing bounding boxes. Closes #148 * updated pointstore and shingle aware RCF (#152) * updated pointstore * adding tests * cleanup * changes * spotless * Exposing sequence information to PointStore internally (#154) Updating forest and point store APIs to take a sequence index parameter. * cleanup of point sum and allowing sequenceIndices to be non-unique (#155) * cleanup of point sum and allowing sequenceIndices to be non-unique * fixes * Add generic parameter to IUpdateCoordinator for point type (#157) * type fixes * spotless woes * cleanup * javadocs fix * Add support for dynamically configurable models Add support for configuring component models by setting name. This change allows us to add new configurable settings in a backwards-compatible way. * Add convenience methods for short, long, and boolean * Rename IUpdateCoordinator to IStateCoordinator (#160) * updating samplers (#161) * updating samplers * fixes * fixes + spotless * changing small stores (#163) * changing small stores * spotless * fixes * Automatically apply spotless formatting instead of failing the build (#164) * simplification of trees (#165) * simplification of trees * fixing builders and builderers * fixes * PointStore update (#166) Update the point store to support internal shingling and compaction. * Tree compression added (#168) Additional logical compression to serialized tree states. * Remove unused classes (#171) Removed several classes that were no longer being used, like SmallIndexManager, SmallLeafStore, and related classes. * Store point data as bytes in PointStoreState (#174) * Store point data as bytes in PointStoreState * Fix upper bound in for-loop in unpackInts * single precision cuts (#175) * single precision cuts * adding shingling tests * fixes * Propagate initial accept fraction through array sampler converter (#177) * Add object graph size profiler (#180) * tree constructors and changing enableCache to numeric parameter (#182) * refactoring caching logic (#183) * refactoring caching logic * fixes * Update the `isOutputReady()` method to check component models (#184) Add a new isOutputReady method to the ITraversable interface. The forest will now check all component models for readiness, instead of waiting for a fixed number of updates. Closes #181 * Update CompactNodeView to operate directly on box cache and node store (#185) * Standardize property names in model classes (#191) - Change property `compactEnabled` to `compact`. - Change Preceision enum values from `SINGLE` and `DOUBLE` to `FLOAT_32` and `FLOAT_64`. - Use `centerOfMassEnabled` instead of `enableCenterOfMass` - Change `lambda` property to `timeDecay`. - Use `rng` instead of `random` for instance fields of type `Random`. - Change `getSequenceIndexOfMostRecentLambdaUpdate` to `mostRecentTimeDecayUpdate`. * Normalize names in the state package (#193) - Match names in the corresponding model classes. - Use the "Enabled" suffix for boolean flags where it improves readability. - Use "compressed" in state classes versus "isCompressionEnabled" in mapper classes. - Use "isPartialTreeState" in state classes versus "isPartialTreeStateEnabled" in mapper classes. - Store precision as a String instead of a boolean. * Change field visibility in BoxCache and NodeStore (#197) - Lower the visibility of fields in BoxCache and NodeStore to private or protected - Rename methods in the NodeStore class: change `getParentIndex` to `deriveParentIndex`, and change `getParent` to `getParentIndex`. * Standardize on `random` instead of `rng` (#202) Closes #199 * reference test in compact trees (#203) * Rename ExecutionContext and add mode string (#204) Closes #200 Closes #201 * using point reference for two point boxes (#205) * enabling projections (#208) * enabling projections * Fixes to tree projections and V1 JSON converter (#211) * Add utility for converting from V1 JSON to V2 State class * Change package names and add tests for the JSON converter * fixes * fixes to converter * cleanup Co-authored-by: Joshua Tokle <joshtok@amazon.com> * Multiple small changes (#214) * Mutiple small changes - Update version number to 2.0 - Make compact mode the default - Add a placeholder "mode" string to RandomCutForest for forward compatibility. - Change license header back to using 2020 until we decide what to do about spotless configuration. We don't want to update every existing source fill to 2021. Closes #130 Closes #209 Closes #212 * pointsum fix Co-authored-by: Joshua Tokle <joshtok@amazon.com> * Create an IVisitorFactory interface (#216) In a previous commit (#208) we added support for projections, which added a new method to the VisitorFactory class. This change creates an IVisitorFactory interface which is implemented VisitorFactory, giving users the option to create specialized implementations. * full reproducibility on serialization (#220) * full reproducibility on serialization * fixes * Rename remaining occurrences of lambda to timeDecay (#228) * Remove alternative constructors from classes with builders (#229) Closes #189 Co-authored-by: Sudipto Guha <sudipto@amazon.com>
Issue #, if available: 99, separating read only aspects of Bounding Boxes and internal functions
Description of changes: produces an abstract RCF using generics that can simulate the existing pointer based trees and compact trees via the same code.