CASSANDRA-15834 Bloom false positive rate includes true negatives #600
Closed
jtgrabowski wants to merge 1 commit into apache:cassandra-3.0 from
Before this change, the bloom filter false positive rate was calculated without true negatives, which resulted in inflated rates. In the extreme case where all queries return no data, the false positive rate could reach 1.0. This change includes true negatives in the [recent] bloom filter false ratio.
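To make the effect concrete, here is a minimal Java sketch of the two calculations. It is not the patch itself: the class name, method names, and the exact formulas (false positives divided by the counted lookups) are assumptions inferred from the description above, not taken from the Cassandra source.

```java
// Hypothetical illustration only; names and formulas are assumptions,
// not the code changed by this pull request.
public final class BloomFilterRatioSketch {
    private BloomFilterRatioSketch() {}

    /** Ratio computed without true negatives: fp / (fp + tp). */
    public static double ratioWithoutTrueNegatives(long falsePositives, long truePositives) {
        long total = falsePositives + truePositives;
        // Guard against division by zero when nothing has been counted yet.
        return total == 0 ? 0.0 : (double) falsePositives / total;
    }

    /** Ratio computed with true negatives: fp / (fp + tp + tn). */
    public static double ratioWithTrueNegatives(long falsePositives, long truePositives, long trueNegatives) {
        long total = falsePositives + truePositives + trueNegatives;
        return total == 0 ? 0.0 : (double) falsePositives / total;
    }

    public static void main(String[] args) {
        // 1000 lookups for keys that do not exist: the filter wrongly says
        // "maybe present" 10 times and correctly says "absent" 990 times.
        long fp = 10, tp = 0, tn = 990;
        System.out.println("without true negatives: " + ratioWithoutTrueNegatives(fp, tp));
        System.out.println("with true negatives:    " + ratioWithTrueNegatives(fp, tp, tn));
    }
}
```

With no true positives, the first method reports 1.0 as soon as a single false positive occurs, regardless of how many lookups the filter rejected correctly. That is the extreme case the description mentions. Counting the 990 true negatives brings the ratio down to 10 out of 1000 lookups.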
mike-tr-adamson pushed a commit to mike-tr-adamson/cassandra that referenced this pull request on Apr 17, 2023
* port #5205 configure table metrics aggregation via table extensions

The default aggregation is set via cassandra.table_metrics_default_histograms_aggregation; the default is INDIVIDUAL (no aggregation). CNDB will default to AGGREGATED. The setting may be set per table via ALTER/CREATE statements. The custom value is stored in the table schema to survive restarts.

From the original commit message: An extension has been added with key HISTOGRAM_METRICS and binary value either 0x00 or 0x01, with 0x00 meaning aggregated keyspace histograms, and 0x01 meaning individual keyspace histograms. Unfortunately extension payloads must be binary, which is why I chose to use a single byte rather than an encoded string. These values are grouped into an enum, the MetricsAggregation enum in TableMetrics.

* port #5205 metric aggregation

This patch provides the infrastructure required to reduce the cardinality of table metrics. Tables can either use individual metrics, as before, or keyspace metrics. This is controlled by a table metadata extension. A system property determines whether, in the absence of this extension, tables should use individual or keyspace histograms by default. The default of this property is individual histograms for C*, but CNDB services will set this property to switch to keyspace histograms by default.

TableMetrics.Table[Meter|Timer|Histogram] classes were modified to work without table metrics when TableMetrics#metricsAggregation == MetricsAggregation#AGGREGATED. The classes forward update calls to parents as usual, but skip the table metric if it is missing. When asked about the current metric value, they return either the table metric or the aggregated keyspace metric, depending on TableMetrics#metricsAggregation.

Additionally, an equivalent class was added for LatencyMetrics: TableMetrics.TableLatencyMetrics. It serves the same purpose as the other Table* wrappers: it forwards the calls either to parent metrics and self (via the LatencyMetric class) or just to parents.

Also, coordinator*Latency metrics were added to keyspace metrics, which allows coordinator* table metrics to be aggregated. Lastly, the table metrics are reloaded on table extension property change.

* port #5205 global aggregates for tables are optional

Global aggregates for table metrics may be disabled with -Dcassandra.table_metrics_export_globals=false.
adelapena pushed a commit to adelapena/cassandra that referenced this pull request on Sep 26, 2023
* port #5205 configure table metrics aggregation via table extensions (commit message identical to the Apr 17, 2023 commit above) (cherry picked from commit a9fac9c) (cherry picked from commit 01446a2)
ekaterinadimitrova2 pushed a commit to ekaterinadimitrova2/cassandra that referenced this pull request on Jun 3, 2024
* port #5205 configure table metrics aggregation via table extensions (commit message identical to the Apr 17, 2023 commit above) (cherry picked from commit a9fac9c) (cherry picked from commit 01446a2)
michaelsembwever pushed a commit to thelastpickle/cassandra that referenced this pull request on Jan 7, 2026
* port #5205 configure table metrics aggregation via table extensions (commit message identical to the Apr 17, 2023 commit above) (cherry picked from commit a9fac9c) (cherry picked from commit 01446a2)