CASSANDRA-15834 Bloom false positive rate includes true negatives #600

Closed
jtgrabowski wants to merge 1 commit into apache:cassandra-3.0 from jtgrabowski:15834-3.0

Conversation

@jtgrabowski
Contributor

Before this change, the bloom filter false positive rate was calculated
without true negatives, which produced inflated rates. In an extreme case,
where all queries return no data, the false positive rate could reach
1.0.

This change includes true negatives in the [recent] bloom filter false ratio.
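
To make the effect concrete, here is a minimal sketch of the two calculations. It is not the patch itself: the class and method names are hypothetical, and it assumes the ratio is simply false positives divided by the number of lookups counted.

```java
public final class BloomFilterRatioSketch {
    // Behaviour described as "before": true negatives are left out of the denominator.
    static double ratioWithoutTrueNegatives(long falsePositives, long truePositives) {
        long total = falsePositives + truePositives;
        return total == 0 ? 0.0 : (double) falsePositives / total;
    }

    // Behaviour described as "after": true negatives are counted in the denominator.
    static double ratioWithTrueNegatives(long falsePositives, long truePositives, long trueNegatives) {
        long total = falsePositives + truePositives + trueNegatives;
        return total == 0 ? 0.0 : (double) falsePositives / total;
    }

    public static void main(String[] args) {
        // Extreme case from the description: every query returns no data,
        // so there are no true positives at all.
        long fp = 5, tp = 0, tn = 995;
        System.out.println(ratioWithoutTrueNegatives(fp, tp));
        System.out.println(ratioWithTrueNegatives(fp, tp, tn));
    }
}
```

With these illustrative counts the first call yields 1.0 and the second yields 0.005.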

mike-tr-adamson pushed a commit to mike-tr-adamson/cassandra that referenced this pull request Apr 17, 2023
* port #5205 configure table metrics aggregation via table extensions

The default aggregation is set via cassandra.table_metrics_default_histograms_aggregation;
the default is INDIVIDUAL (no aggregation). CNDB will default to
AGGREGATED.

The setting may be set per table via ALTER/CREATE statements. The
custom value is stored in the table schema to survive restarts.

From the original commit message:

An extension has been added with key HISTOGRAM_METRICS and binary value either 0x00 or
0x01, with 0x00 meaning aggregated keyspace histograms, and 0x01 meaning individual
keyspace histograms. Unfortunately, extension payloads must be binary, which is why
I chose to use a single byte rather than an encoded string. These values are
grouped into an enum, the MetricsAggregation enum in TableMetrics.
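
The single-byte encoding described above could look roughly like the following. This is a sketch only: the enum constants and byte values come from the commit message, while the outer class and the toPayload/fromPayload helpers are hypothetical names introduced for illustration.

```java
import java.nio.ByteBuffer;

public final class HistogramMetricsExtensionSketch {
    // Extension key named in the commit message.
    static final String EXTENSION_KEY = "HISTOGRAM_METRICS";

    enum MetricsAggregation {
        AGGREGATED((byte) 0x00),
        INDIVIDUAL((byte) 0x01);

        final byte value;

        MetricsAggregation(byte value) {
            this.value = value;
        }

        // Hypothetical helper: encode the setting as a one-byte binary payload.
        ByteBuffer toPayload() {
            return ByteBuffer.wrap(new byte[] { value });
        }

        // Hypothetical helper: decode a one-byte payload without moving its position.
        static MetricsAggregation fromPayload(ByteBuffer payload) {
            if (payload.remaining() != 1)
                throw new IllegalArgumentException("expected exactly one byte");
            byte b = payload.get(payload.position());
            for (MetricsAggregation m : values())
                if (m.value == b)
                    return m;
            throw new IllegalArgumentException("unknown aggregation byte: " + b);
        }
    }

    public static void main(String[] args) {
        ByteBuffer payload = MetricsAggregation.AGGREGATED.toPayload();
        System.out.println(EXTENSION_KEY + " -> " + MetricsAggregation.fromPayload(payload));
    }
}
```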

* port #5205 metric aggregation

This patch provides the infrastructure required to reduce the
cardinality of table metrics.

Tables can either use individual metrics, as before, or keyspace
metrics. This is controlled by a table metadata extension.
A system property determines if, in the absence of this extension,
tables should use individual or keyspace histograms by default.
The default of this property is individual histograms for C*,
but CNDB services will set this property to switch to keyspace
histograms by default.
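
The precedence described above (table extension first, system property as fallback) might be sketched as follows. The property name is taken from the commit message; the class, the resolve method, and the assumption that the property value is the enum constant name are all hypothetical.

```java
import java.util.Locale;
import java.util.Optional;

public final class AggregationDefaultSketch {
    enum MetricsAggregation { AGGREGATED, INDIVIDUAL }

    // Property name as given in the commit message.
    static final String DEFAULT_PROPERTY = "cassandra.table_metrics_default_histograms_aggregation";

    // Hypothetical helper: a per-table extension wins; otherwise fall back to the
    // system property, which itself defaults to INDIVIDUAL.
    static MetricsAggregation resolve(Optional<MetricsAggregation> tableExtension) {
        if (tableExtension.isPresent())
            return tableExtension.get();
        // Assumption: the property value is the enum constant name.
        String configured = System.getProperty(DEFAULT_PROPERTY, MetricsAggregation.INDIVIDUAL.name());
        return MetricsAggregation.valueOf(configured.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(resolve(Optional.empty()));
        System.out.println(resolve(Optional.of(MetricsAggregation.AGGREGATED)));
    }
}
```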

TableMetrics.Table[Meter|Timer|Histogram] classes were modified
to work without table metrics when
TableMetrics#metricsAggregation == MetricsAggregation#AGGREGATED.
The classes forward update calls to parents as usual, but skip
the table metric if it is missing.
When asked about the current metric value they return either table
or aggregated keyspace metric depending on
TableMetrics#metricsAggregation.
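
The forwarding behaviour described above can be illustrated with a simplified counter. This is a sketch under stated assumptions, not the TableMetrics code: it uses a plain LongAdder instead of the real metric types, and TableCounterSketch is a hypothetical name.

```java
import java.util.concurrent.atomic.LongAdder;

public final class TableCounterSketch {
    enum MetricsAggregation { AGGREGATED, INDIVIDUAL }

    private final MetricsAggregation aggregation;
    private final LongAdder keyspaceCounter; // parent metric, shared by all tables in the keyspace
    private final LongAdder tableCounter;    // null when the table uses aggregated metrics

    TableCounterSketch(MetricsAggregation aggregation, LongAdder keyspaceCounter) {
        this.aggregation = aggregation;
        this.keyspaceCounter = keyspaceCounter;
        this.tableCounter = aggregation == MetricsAggregation.INDIVIDUAL ? new LongAdder() : null;
    }

    // Updates always reach the parent; the table metric is skipped if it is missing.
    void inc(long n) {
        keyspaceCounter.add(n);
        if (tableCounter != null)
            tableCounter.add(n);
    }

    // Reads return the table value or the aggregated keyspace value, depending on the mode.
    long count() {
        return aggregation == MetricsAggregation.INDIVIDUAL ? tableCounter.sum() : keyspaceCounter.sum();
    }

    public static void main(String[] args) {
        LongAdder keyspace = new LongAdder();
        TableCounterSketch individual = new TableCounterSketch(MetricsAggregation.INDIVIDUAL, keyspace);
        TableCounterSketch aggregated = new TableCounterSketch(MetricsAggregation.AGGREGATED, keyspace);
        individual.inc(2);
        aggregated.inc(3);
        System.out.println(individual.count() + " " + aggregated.count());
    }
}
```

In this sketch the individual table reports only its own updates, while the aggregated table reports the keyspace total.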

Additionally, an equivalent class was added for LatencyMetrics:
TableMetrics.TableLatencyMetrics. It serves the same purpose as the
other Table* wrappers: it forwards calls either to the parent
metrics and itself (via the LatencyMetrics class) or just to the parents.

Also, coordinator*Latency metrics were added to keyspace metrics,
which allows coordinator* table metrics to be aggregated.

Lastly, the table metrics are reloaded on table extension property
change.

* port #5205 global aggregates for tables are optional

Global aggregates for table metrics may be disabled with
-Dcassandra.table_metrics_export_globals=false.
adelapena pushed a commit to adelapena/cassandra that referenced this pull request Sep 26, 2023
(cherry picked from commit a9fac9c)
(cherry picked from commit 01446a2)
ekaterinadimitrova2 pushed a commit to ekaterinadimitrova2/cassandra that referenced this pull request Jun 3, 2024
(cherry picked from commit a9fac9c)
(cherry picked from commit 01446a2)
michaelsembwever pushed a commit to thelastpickle/cassandra that referenced this pull request Jan 7, 2026
(cherry picked from commit a9fac9c)
(cherry picked from commit 01446a2)