Skip to content

Methods to compute partition limits#546

Merged
leerho merged 5 commits intomasterfrom
methods_to_compute_partition_limits
Apr 14, 2024
Merged

Methods to compute partition limits#546
leerho merged 5 commits intomasterfrom
methods_to_compute_partition_limits

Conversation

@leerho
Copy link
Copy Markdown
Member

@leerho leerho commented Apr 12, 2024

No description provided.

@leerho leerho requested a review from jmalkin April 12, 2024 19:39
already documented in the body of the javadoc.
@leerho
Copy link
Copy Markdown
Member Author

leerho commented Apr 12, 2024

Class by Class description of the changes.

KllItemsSketch

  • Propagation of new split out methods defined by PartitioningFeature.
  • Propagation of not requiring the normRankErr.

Partitioner

  • Propagation of new split out methods defined by PartitioningFeature.

ItemsSketch

  • Propagation of new split out methods defined by PartitioningFeature.
  • Propagation of not requiring the normRankErr.

GenericPartitionBoundaries

  • Javadoc and code comment improvements
  • Removed super interface PartitionBoundaries and pulled its methods with javadoc here.

GenericSortedView

  • extend new interface SketchPartitionLimits

ItemsSketchSortedView

  • Propagation and implementation of new split out methods defined by PartitioningFeature.
  • Simplified error messages
  • Simplified error detection for numEquallySizedParts too large
  • I figured out a simpler way to validate the maximum number of partitions without requiring the sketch normRankErr.
  • Removed now unneeded 'normRankErr and PARTITIONING_ERROR_FACTOR

PartitionBoundaries

  • Removed. Its methods moved to its only child class, GenericPartitionBoundaries.

PartitioningFeature

  • Split getPartitionBoundaries(numEquallySizedParts) into two methods:
    • getPartitionBoundariesFromNumParts(numPartitions) and
    • getPartitionBoundariesFromPartSize(nominalPartSizeItems).
    • This reflects how Druid actually was trying to use this method and ended up trying to simulate the second method incorrectly.
    • Removed the @throws clause because it was duplicative of what is already in the Javadoc body. (Also, the CI/CD runner was throwing a class not found exception on SketchesArgumentException. I think it needed to be fully qualified. But I decided to remove these clauses anyway.)

QuantilesGenericAPI

  • extend new interface SketchPartitionLimits

SketchPartitionLimits

  • This new interface defines the methods required to implement the two new limit methods
    • getMaxPartitions()
    • getMinPartitionSizeItems()
      Which can actually be implemented here as defaults.

/TEST/...

ItemsSketchTest

  • Propagation of change in ItemsSketchSortedView constructor change.

CrossCheckQuantilesTest

  • Propagation of change in ItemsSketchSortedView constructor change.

PartitionBoundariesTest

  • Propagation of method name changes.
  • Added tests for new SketchPartitionLimits methods

Copy link
Copy Markdown
Contributor

@jmalkin jmalkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not going to block this, but I'm not sure if we fully test both success and failure of the new things? It's possible some of it is implicitly tested via existing code making that a non-issue. But if not, we should fix that before merging.

public interface SketchPartitionLimits {

/**
* Gets the maximum number of partitions this sketch will support based on the configured size <i>K</i>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to use the value of k anywhere? I guess it's implicit in getNumRetained()?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment was about the documentation saying it uses k when the method does not actually do so explicitly (k will impact the number retained outside of exact mode, though, so it's an implicit dependence).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix.


if (numEquallySized > maxParts) {
final long partSizeItems = getMinPartitionSizeItems();
if (nominalPartitionSize < partSizeItems) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this would read a little more clearly if we rename partSizeItems to minPartSizeItems

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

+ "Retained Items / 2: " + (svLen / 2));
+ " The requested number of partitions is too large for this sketch.");
}
final long totalN = this.totalN;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totalN doesn't seem to be used for quite a while in case you want to define it closer to where it's actually used. Not a big deal either way.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look at it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out is was already a class member. I didn't need this line.


println("Min Size Limit: " + sizeLimit);
@SuppressWarnings("unused")
GenericPartitionBoundaries<String> gpb = sk.getPartitionBoundariesFromPartSize(sizeLimit - 1);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we ever test this method so that it succeeds rather than throwing?

Copy link
Copy Markdown
Member Author

@leerho leerho Apr 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it manually, and it passes at sizeLimit and fails at sizeLimit - 1.
I didn't think it was necessary, but I could add a test that passes exactly at sizeLimit.
Same thing goes for the limit on num partitions.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

- getMaxPartitions()
- getMinPartSizeItems()
//this should pass
final int goodNumPartsRequest = sk.getMaxPartitions();
println("Good numPartsRequest " + goodNumPartsRequest);
GenericPartitionBoundaries<String> gpb = sk.getPartitionBoundariesFromNumParts(goodNumPartsRequest);

Check notice

Code scanning / CodeQL

Unread local variable

Variable 'GenericPartitionBoundaries<String> gpb' is never read.
//this should pass
final long goodPartSizeRequest= sk.getMinPartitionSizeItems();
println("Good partSizeRequest " + goodPartSizeRequest);
GenericPartitionBoundaries<String> gpb = sk.getPartitionBoundariesFromPartSize(goodPartSizeRequest);

Check notice

Code scanning / CodeQL

Unread local variable

Variable 'GenericPartitionBoundaries<String> gpb' is never read.
@leerho leerho merged commit 30fa37e into master Apr 14, 2024
@leerho leerho deleted the methods_to_compute_partition_limits branch April 14, 2024 05:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants