-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fixing size restrictions #134
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
jotok
reviewed
Jan 29, 2021
Java/core/src/main/java/com/amazon/randomcutforest/tree/AbstractCompactRandomCutTree.java
Outdated
Show resolved
Hide resolved
...ation-json/src/test/java/com/amazon/randomcutforest/serialize/RandomCutForestSerDeTests.java
Show resolved
Hide resolved
Java/core/src/test/java/com/amazon/randomcutforest/store/LeafStoreTest.java
Show resolved
Hide resolved
jotok
approved these changes
Jan 29, 2021
sudiptoguha
added a commit
that referenced
this pull request
Jun 9, 2021
* Sampler tree (#56) changes * Cleanup of State and ser/de test (#60) This is a change to CompactRCF branch. Moves the construction of state into the forest class. Updates ser/de tests to reflect that. * convertor to compact + compression (#61) * Changing both samplers to use floats for weights (#63) * Add new interfaces and classes for mapping between models and states * Modify the forest constructor to create a list of component models (#65) This is a first step toward removing state from executor classes. Eventually all state should be owned by the forest, the update coordinator, or the component models. This change modifies the forest constructor to initialize component models, but those models are not yet in the execution path. * Update executors to use ComponentList (#66) Update the executor classes to operate on the generic ComponentList class instead of a list of SamplerPlusTree. * Implement forest deserialization in RandomCutForestMapper (#67) Complete the updated serialiazation implemenation using IConextualStateMapper for the RandomCutForest class. * Compact rcf (#69) Create the INodeView interface to decouple visitors from the concrete Node class * Add benchmark class for serialization * Add INodeView implementation for compact trees (#71) Add INodeView implementation for compact trees and change the delete method to be more efficient. * Optimizations for the addPoint operation in compact trees (#72) * Compact rcf (#74) refactor tree * Introducing AbstractCompactRandomCutTree to enable float[] alongside double[] (#75) * Add lombok.config to repository (#94) Add lombok.config and set the property addLombokGeneratedAnnotation to true. This will make it so that jacoco ignores lombok generated code when computing test coverage. * Add support for single and double precision in compact trees (#76) * Fix javadoc warnings Update javadocs in classes that were producing warnings. Closes #89 * Improvements to sampler package * Standardize field names in CompactSampler * Update Javadocs * Add ability to validate the heap property in CompactSampler * Move common methods to the IStreamSampler interface * Add @DaTa annotation to Weighted class and remove unused methods * Improve package test coverage * Set fields in CompactSamplerState to private * Updates to forest/builder configuration * Change the singlePrecisionEnabled flag to use a new Precision enum * Remove logic from the builder that overrides user options. Instead we will fail fast in the constructor. Closes #77 #82 * Remove the Sequential interface This change makes the sequence index an explicit argument to the sampler and tree interfaces. As a result we can remove the Sequential container class. Other changes: * There is a new interface ISampled that which is used in the return type of getSamples() and getEvictedPoint(). * The Weighted container class implements ISampled * The signatures of acceptSample and addSample are changed to remove redundancy and to avoid leaking implementation details. In particular, acceptSample no longer returns a weight, because the use of weights is a feature of time-decayed sampling, but not a requirement for stream sampling. * Updates to method names in IStreamSampler to more clearly reflect use. Closes #80 * Unify pointer-based and compact trees (#103) Create new interfaces and abstract base classes to contain common logic between the pointer-based and compact tree implementations. * removing arbitrary nodeview (#108) Remove the getNodeView method from the INodeView interface and add it to the abstract tree base class. Additionally, change the visibility of tree methods related to tree structure from package-private to protected. Closes #106 * Change IUpdatable interface to not return nested Optional (#110) Rename UpdateReturn to UpdateResult, and add a new method isStateChange() that can be called to determine if the update changed the model. Closes #81 * Normalize constructors for state and mapper classes This change makes it so that all state and mapper classes have a single no-argument constructor and settable fields. Additionally, new test cases are added to improve test coverage in the state package. Closes #90 * Various changes to make mapper classes work in a consistent way * Have samplers serialize their own lambda and sample size values * Create a new mapper for CompactNodeManager * Update the CompactRandomCutTreeState class to contain a node manager state instead of leaf and node store states * Revert CompactNodeManager to work with ILeafStore and INodeStore interfaces instead of implementations * PointStoreCoordinator returns an IPointStore instead of a concrete class * Update mapper classes to respect the bounding box cache setting Closes #115 * Fix bounding box calculation and update caching algorithm (#118) * Fix bounding box calculation in compact trees * Add bounding box caching to pointer-based trees * Remove compact node manager class In a previous revision, we introducted the CompactNodeManager class to unify the handling of leaf nodes and internal nodes. The initial implementation was a leaky abstraction, because the tree class was still responsible for managing some node properties, like bounding boxes. The change moves node management back into the tree class. * Add support for serializing single-precision points Closes #78 #87 * RandomCutForest mapper should keep single precision when mapping to state * Refactor RandomCutForestSerDe Refactor the RandomCutForestSerDe class to operate directly on RandomCutForest objects instead of RandomCutForestState objects. Additionally, refactor the RandomCutForestSerDe unit tests to remove assertions that make more sense as a functional test in the core package or as a benchmark. * Support mapping CompactSampler to state class without copy In addition, the copy flag is added to other applicable mapper classes, but the behavior (mapping without copy) is only implemented for CompactSamplerMapper in this commit. * fixing size restrictions (#134) * fixing size restrictions * cleanup * Add new examples module Add new module with serialization examples and runner. * Add main method to examples Add a main method to examples to they can be easily invoked from an IDE. * Dynamic caching of bounding boxes (#136) Add support for setting the bounding box cache fraction at runtime. * Remove the serialization module from the project. Closes #128 * Add protostuff benchmark * Reduce the sensitivy of the single-precision consistency test Reduce the sensitivity of testConsistentScoringSinglePrecision by increasing the delta used when comparing score values. * dynamic samplers (#141) * smart copy (#143) * smart copy * fixes * cleanup of INodeView (#145) * Return a valid int value in getRootIndex Explicitly coerce null to an int value in getRootIndex Closes #142 * Split imputation into two methods and fix bounding box computation (#149) * Split the imputation method into two methods: the first method collects the imputation proposal from each tree, and the second method uses that result to construct the point estimate. The list of proposals may be of independent interest to users, hence we are separating it into its own public API call. * Fixed a bug in the conditional logic for constructing bounding boxes. Closes #148 * updated pointstore and shingle aware RCF (#152) * updated pointstore * adding tests * cleanup * changes * spotless * Exposing sequence information to PointStore internally (#154) Updating forest and point store APIs to take a sequence index parameter. * cleanup of point sum and allowing sequenceIndices to be non-unique (#155) * cleanup of point sum and allowing sequenceIndices to be non-unique * fixes * Add generic parameter to IUpdateCoordinator for point type (#157) * type fixes * spotless woes * cleanup * javadocs fix * Add support for dynamically configurable models Add support for configuring component models by setting name. This change allows us to add new configurable settings in a backwards-compatible way. * Add convenience methods for short, long, and boolean * Rename IUpdateCoordinator to IStateCoordinator (#160) * updating samplers (#161) * updating samplers * fixes * fixes + spotless * changing small stores (#163) * changing small stores * spotless * fixes * Automatically apply spotless formatting instead of failing the build (#164) * simplification of trees (#165) * simplification of trees * fixing builders and builderers * fixes * PointStore update (#166) Update the point store to support internal shingling and compaction. * Tree compression added (#168) Additional logical compression to serialized tree states. * Remove unused classes (#171) Removed several classes that were no longer being used, like SmallIndexManager, SmallLeafStore, and related classes. * Store point data as bytes in PointStoreState (#174) * Store point data as bytes in PointStoreState * Fix upper bound in for-loop in unpackInts * single precision cuts (#175) * single precision cuts * adding shingling tests * fixes * Propagate initial accept fraction through array sampler converter (#177) * Add object graph size profiler (#180) * tree constructors and changing enableCache to numeric parameter (#182) * refactoring caching logic (#183) * refactoring caching logic * fixes * Update the `isOutputReady()` method to check component models (#184) Add a new isOutputReady method to the ITraversable interface. The forest will now check all component models for readiness, instead of waiting for a fixed number of updates. Closes #181 * Update CompactNodeView to operate directly on box cache and node store (#185) * Standardize property names in model classes (#191) - Change property `compactEnabled` to `compact`. - Change Preceision enum values from `SINGLE` and `DOUBLE` to `FLOAT_32` and `FLOAT_64`. - Use `centerOfMassEnabled` instead of `enableCenterOfMass` - Change `lambda` property to `timeDecay`. - Use `rng` instead of `random` for instance fields of type `Random`. - Change `getSequenceIndexOfMostRecentLambdaUpdate` to `mostRecentTimeDecayUpdate`. * Normalize names in the state package (#193) - Match names in the corresponding model classes. - Use the "Enabled" suffix for boolean flags where it improves readability. - Use "compressed" in state classes versus "isCompressionEnabled" in mapper classes. - Use "isPartialTreeState" in state classes versus "isPartialTreeStateEnabled" in mapper classes. - Store precision as a String instead of a boolean. * Change field visibility in BoxCache and NodeStore (#197) - Lower the visibility of fields in BoxCache and NodeStore to private or protected - Rename methods in the NodeStore class: change `getParentIndex` to `deriveParentIndex`, and change `getParent` to `getParentIndex`. * Standardize on `random` instead of `rng` (#202) Closes #199 * reference test in compact trees (#203) * Rename ExecutionContext and add mode string (#204) Closes #200 Closes #201 * using point reference for two point boxes (#205) * enabling projections (#208) * enabling projections * Fixes to tree projections and V1 JSON converter (#211) * Add utility for converting from V1 JSON to V2 State class * Change package names and add tests for the JSON converter * fixes * fixes to converter * cleanup Co-authored-by: Joshua Tokle <joshtok@amazon.com> * Multiple small changes (#214) * Mutiple small changes - Update version number to 2.0 - Make compact mode the default - Add a placeholder "mode" string to RandomCutForest for forward compatibility. - Change license header back to using 2020 until we decide what to do about spotless configuration. We don't want to update every existing source fill to 2021. Closes #130 Closes #209 Closes #212 * pointsum fix Co-authored-by: Joshua Tokle <joshtok@amazon.com> * Create an IVisitorFactory interface (#216) In a previous commit (#208) we added support for projections, which added a new method to the VisitorFactory class. This change creates an IVisitorFactory interface which is implemented VisitorFactory, giving users the option to create specialized implementations. * full reproducibility on serialization (#220) * full reproducibility on serialization * fixes * Rename remaining occurrences of lambda to timeDecay (#228) * Remove alternative constructors from classes with builders (#229) Closes #189 Co-authored-by: Sudipto Guha <sudipto@amazon.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Issue #, if available: 127
Description of changes: Fixes restriction of 16383 for size of compact trees