pubsub/kvindexer:support for big numbers - v2 (backport #797) (#840)
* pubsub/kvindexer:support for big numbers - v2 (#797)

* Applied Michaels patch

* Added corner case tests, failing currently

* Support for big floats and ints added

* Added new util file

* Fixed linter error

* added internal package

* Revert "added internal package"

This reverts commit ef7f2b4.

* added internal/indexer

* Moved utils to internal

* Fixed linter

* Updated docs

* Applied @sergio-mena's PR comments

* Fixed linter

* Return with error in compare float

* Changelog entries

* Apply lasaroj's comments.

Co-authored-by: Lasaro <lasaro@informal.systems>

* applied some PR comments

* updated docs

Co-authored-by: Sergio Mena <sergio@informal.systems>

* Added errors and logger

* Fixed linter

* Fixed sentence in comment

* Removed changelog

* Avoid converting to string when parsing int to float

* Added unexpected types to error messages

* Added comment on the 8atom regex in pubsub

---------

Co-authored-by: Lasaro <lasaro@informal.systems>
Co-authored-by: Sergio Mena <sergio@informal.systems>
(cherry picked from commit f667d3f)

* Add changelogs

* Make breaking changes explicit for users

* Update libs/pubsub/query/query.go

Co-authored-by: Lasaro <lasaro@informal.systems>

---------

Co-authored-by: Jasmina Malicevic <jasmina.dustinac@gmail.com>
Co-authored-by: Sergio Mena <sergio@informal.systems>
Co-authored-by: Lasaro <lasaro@informal.systems>
4 people committed May 17, 2023
1 parent 19844b5 commit d067be9
Showing 24 changed files with 735 additions and 94 deletions.
@@ -0,0 +1,3 @@
- `[kvindexer]` Added support for big integers and big floats in the kvindexer.
Breaking changes: function `Number` in package `libs/pubsub/query/syntax` changed its return value.
([\#797](https://github.com/cometbft/cometbft/pull/797))
@@ -0,0 +1,3 @@
- `[pubsub]` Added support for big integers and big floats in the pubsub event query system.
Breaking changes: function `Number` in package `libs/pubsub/query/syntax` changed its return value.
([\#797](https://github.com/cometbft/cometbft/pull/797))
3 changes: 3 additions & 0 deletions .changelog/unreleased/improvements/797-pubsub-float.md
@@ -0,0 +1,3 @@
- `[pubsub/kvindexer]` Numeric query conditions and event values are represented as big floats with default precision of 125.
Integers are read as "big ints" and represented with as many bits as they need when converting to floats.
([\#797](https://github.com/cometbft/cometbft/pull/797))
9 changes: 9 additions & 0 deletions docs/app-dev/indexing-transactions.md
@@ -276,3 +276,12 @@ This behavior was fixed with CometBFT 0.34.26+. However, if the data was indexed
Tendermint Core and not re-indexed, that data will be queried as if all the attributes within a height
occurred within the same event.

## Event attribute value types

Users can use anything as an event value. However, if the event attribute value is a number, the following needs to be taken into account:

- Negative numbers will not be properly retrieved when querying the indexer.
- Event values are converted to big floats (from the `math/big` package). The precision of the floating point number is set to the bit length
of the integer it is supposed to represent, so that there is no loss of information due to insufficient precision. This conversion was not present before CometBFT v0.38.x, and all float values were ignored.
- As of CometBFT v0.38.x, queries can contain floating point numbers as well.
- Note that comparisons involving floats with a large number of decimal places can be imprecise.
19 changes: 19 additions & 0 deletions docs/core/subscription.md
@@ -40,6 +40,25 @@ You can also use tags, given you had included them into DeliverTx
response, to query transaction results. See [Indexing
transactions](../app-dev/indexing-transactions.md) for details.

## Query parameter and event type restrictions

While CometBFT imposes no restrictions on the application with regard to the type of
the event output, there are several considerations that need to be taken into account
when querying events with numeric values.

- Queries convert all numeric event values to `big.Float`, provided by `math/big`. Integers
are converted into a float with a precision equal to the number of bits needed
to represent this integer. This is done to avoid precision loss for big integers when they
are converted with the default precision (`64`).
- When comparing two values, if either one of them is a float, the other one will be represented
as a big float. Integers are again parsed as big floats with a precision equal to the number
of bits required to represent them.
- As with all floating point comparisons, comparing floats with decimal values can lead to imprecise
results.
- Queries cannot include negative numbers.

Prior to version `v0.38.x`, floats were not supported as query parameters.

## ValidatorSetUpdates

When validator set changes, ValidatorSetUpdates event is published. The
119 changes: 119 additions & 0 deletions internal/indexer/indexer_utils.go
@@ -0,0 +1,119 @@
package indexer

import (
	"fmt"
	"math/big"

	"github.com/cometbft/cometbft/state/indexer"
)

// If the actual event value is a float, we get the condition and parse it as a float
// to compare against.
func compareFloat(op1 *big.Float, op2 interface{}) (int, bool, error) {
	switch opVal := op2.(type) {
	case *big.Int:
		vF := new(big.Float)
		vF.SetInt(opVal)
		cmp := op1.Cmp(vF)
		return cmp, false, nil

	case *big.Float:
		return op1.Cmp(opVal), true, nil
	default:
		return -1, false, fmt.Errorf("unable to parse arguments, bad type: %T", op2)
	}
}

// If the event value we compare against the condition (op2) is an integer,
// we convert the int to float with a precision equal to the number of bits
// needed to represent the integer. This avoids rounding issues with floats,
// where 100 would be equal to 100.2 because 100.2 is rounded to 100, while 100.7
// would be rounded to 101.
func compareInt(op1 *big.Int, op2 interface{}) (int, bool, error) {

	switch opVal := op2.(type) {
	case *big.Int:
		return op1.Cmp(opVal), false, nil
	case *big.Float:
		vF := new(big.Float)
		vF.SetInt(op1)
		return vF.Cmp(opVal), true, nil
	default:
		return -1, false, fmt.Errorf("unable to parse arguments, unexpected type: %T", op2)
	}
}

func CheckBounds(ranges indexer.QueryRange, v interface{}) (bool, error) {
	// These functions fetch the lower and upper bounds of the query.
	// It is expected that for x > 5, the value of lowerBound is 6.
	// This is achieved by adding one to the actual lower bound.
	// For a query of x < 5, the value of upper bound is 4.
	// This is achieved by subtracting one from the actual upper bound.

	// For integers this behavior will work. However, for floats, we cannot simply add/sub 1.
	// Query: x < 5.5; x = 5 should match the query. If we subtracted one as for integers,
	// the upperBound would be 4.5 and x would not match. Thus we do not subtract anything for
	// floating point bounds.

	// We can rewrite these functions to not add/sub 1 but the function handles also time arguments.
	// To be sure we are not breaking existing queries that compare time, and as we are planning to replace
	// the indexer in the future, we adapt the code here to handle floats as a special case.
	lowerBound := ranges.LowerBoundValue()
	upperBound := ranges.UpperBoundValue()

	// *Explanation for the isFloat condition below.*
	// In LowerBoundValue(), for floating points, we cannot simply add 1 due to the reasons explained
	// in the comment at the beginning. The same is true for subtracting one for UpperBoundValue().
	// That means that for integers, if the condition is >=, cmp will be either 0 or 1
	// (cmp == -1 should always be false).
	// But if the lowerBound is a float, we have not added one, so returning a 0
	// is correct only if ranges.IncludeLowerBound is true.
	// example int: x < 100; upperBound = 99; if x.Cmp(99) == 0 the condition holds
	// example float: x < 100.0; upperBound = 100.0; if x.Cmp(100) == 0 then returning x
	// would be wrong.
	switch vVal := v.(type) {
	case *big.Int:
		if lowerBound != nil {
			cmp, isFloat, err := compareInt(vVal, lowerBound)
			if err != nil {
				return false, err
			}
			if cmp == -1 || (isFloat && cmp == 0 && !ranges.IncludeLowerBound) {
				return false, err
			}
		}
		if upperBound != nil {
			cmp, isFloat, err := compareInt(vVal, upperBound)
			if err != nil {
				return false, err
			}
			if cmp == 1 || (isFloat && cmp == 0 && !ranges.IncludeUpperBound) {
				return false, err
			}
		}

	case *big.Float:
		if lowerBound != nil {
			cmp, isFloat, err := compareFloat(vVal, lowerBound)
			if err != nil {
				return false, err
			}
			if cmp == -1 || (cmp == 0 && isFloat && !ranges.IncludeLowerBound) {
				return false, err
			}
		}
		if upperBound != nil {
			cmp, isFloat, err := compareFloat(vVal, upperBound)
			if err != nil {
				return false, err
			}
			if cmp == 1 || (cmp == 0 && isFloat && !ranges.IncludeUpperBound) {
				return false, err
			}
		}

	default:
		return false, fmt.Errorf("invalid argument type in query: %T", v)
	}
	return true, nil
}
32 changes: 21 additions & 11 deletions libs/pubsub/query/query.go
@@ -10,8 +10,8 @@ package query

 import (
 	"fmt"
+	"math/big"
 	"regexp"
-	"strconv"
 	"strings"
 	"time"

@@ -218,13 +218,23 @@ func compileCondition(cond syntax.Condition) (condition, error) {
 	return out, nil
 }

-// TODO(creachadair): The existing implementation allows anything number shaped
-// to be treated as a number. This preserves the parts of that behavior we had
-// tests for, but we should probably get rid of that.
+// We use this regex to support queries of the form "8atom", "6.5stake",
+// which are actively used in production.
+// The regex takes care of removing the non-number suffix.
 var extractNum = regexp.MustCompile(`^\d+(\.\d+)?`)

-func parseNumber(s string) (float64, error) {
-	return strconv.ParseFloat(extractNum.FindString(s), 64)
+func parseNumber(s string) (*big.Float, error) {
+	intVal := new(big.Int)
+	if _, ok := intVal.SetString(s, 10); !ok {
+		f, _, err := big.ParseFloat(extractNum.FindString(s), 10, 125, big.ToNearestEven)
+		if err != nil {
+			return nil, err
+		}
+		return f, err
+	}
+	f, _, err := big.ParseFloat(extractNum.FindString(s), 10, uint(intVal.BitLen()), big.ToNearestEven)
+	return f, err
+
 }

// A map of operator ⇒ argtype ⇒ match-constructor.
@@ -248,7 +258,7 @@ var opTypeMap = map[syntax.Token]map[syntax.Token]func(interface{}) func(string)
 		syntax.TNumber: func(v interface{}) func(string) bool {
 			return func(s string) bool {
 				w, err := parseNumber(s)
-				return err == nil && w == v.(float64)
+				return err == nil && w.Cmp(v.(*big.Float)) == 0
 			}
 		},
 		syntax.TDate: func(v interface{}) func(string) bool {
@@ -268,7 +278,7 @@ var opTypeMap = map[syntax.Token]map[syntax.Token]func(interface{}) func(string)
 		syntax.TNumber: func(v interface{}) func(string) bool {
 			return func(s string) bool {
 				w, err := parseNumber(s)
-				return err == nil && w < v.(float64)
+				return err == nil && w.Cmp(v.(*big.Float)) < 0
 			}
 		},
 		syntax.TDate: func(v interface{}) func(string) bool {
@@ -288,7 +298,7 @@ var opTypeMap = map[syntax.Token]map[syntax.Token]func(interface{}) func(string)
 		syntax.TNumber: func(v interface{}) func(string) bool {
 			return func(s string) bool {
 				w, err := parseNumber(s)
-				return err == nil && w <= v.(float64)
+				return err == nil && w.Cmp(v.(*big.Float)) <= 0
 			}
 		},
 		syntax.TDate: func(v interface{}) func(string) bool {
@@ -308,7 +318,7 @@ var opTypeMap = map[syntax.Token]map[syntax.Token]func(interface{}) func(string)
 		syntax.TNumber: func(v interface{}) func(string) bool {
 			return func(s string) bool {
 				w, err := parseNumber(s)
-				return err == nil && w > v.(float64)
+				return err == nil && w.Cmp(v.(*big.Float)) > 0
 			}
 		},
 		syntax.TDate: func(v interface{}) func(string) bool {
@@ -328,7 +338,7 @@ var opTypeMap = map[syntax.Token]map[syntax.Token]func(interface{}) func(string)
 		syntax.TNumber: func(v interface{}) func(string) bool {
 			return func(s string) bool {
 				w, err := parseNumber(s)
-				return err == nil && w >= v.(float64)
+				return err == nil && w.Cmp(v.(*big.Float)) >= 0
 			}
 		},
 		syntax.TDate: func(v interface{}) func(string) bool {
78 changes: 78 additions & 0 deletions libs/pubsub/query/query_test.go
@@ -173,6 +173,84 @@ var apiTypeEvents = []types.Event{
},
}

func TestBigNumbers(t *testing.T) {

	apiBigNumTest := map[string][]string{
		"big.value": {
			"99999999999999999999",
		},
		"big2.value": {
			"18446744073709551615", // max(uint64) == 18446744073709551615
		},
		"big.floatvalue": {
			"99999999999999999999.10",
		},
		"big2.floatvalue": {
			"18446744073709551615.6", // max(uint64) == 18446744073709551615
		},
	}

	testCases := []struct {
		s       string
		events  map[string][]string
		matches bool
	}{

		// Test cases for values that exceed the capacity of int64/float64.
		{`big.value >= 99999999999999999999`,
			apiBigNumTest,
			true},
		{`big.value > 99999999999999999998`,
			apiBigNumTest,
			true},
		{`big2.value <= 18446744073709551615`,
			apiBigNumTest, true},
		{`big.floatvalue >= 99999999999999999999`,
			apiBigNumTest,
			true},
		{`big.floatvalue > 99999999999999999998.10`,
			apiBigNumTest,
			true},
		{`big.floatvalue > 99999999999999999998`,
			apiBigNumTest,
			true},
		{`big2.floatvalue <= 18446744073709551615.6`,
			apiBigNumTest,
			true},
		{`big2.floatvalue <= 18446744073709551615.6`,
			apiBigNumTest,
			true},
		{`big2.floatvalue >= 18446744073709551615`,
			apiBigNumTest,
			true},
		{`big2.floatvalue >= 12.5`,
			apiBigNumTest,
			true},
		{`big.value >= 10`,
			apiBigNumTest,
			true},
	}

	for i, tc := range testCases {
		t.Run(fmt.Sprintf("%02d", i+1), func(t *testing.T) {
			c, err := query.New(tc.s)
			if err != nil {
				t.Fatalf("NewCompiled %#q: unexpected error: %v", tc.s, err)
			}

			got, err := c.Matches(tc.events)
			if err != nil {
				t.Errorf("Query: %#q\nInput: %+v\nMatches: got error %v",
					tc.s, tc.events, err)
			}
			if got != tc.matches {
				t.Errorf("Query: %#q\nInput: %+v\nMatches: got %v, want %v",
					tc.s, tc.events, got, tc.matches)
			}
		})
	}
}

func TestCompiledMatches(t *testing.T) {
var (
txDate = "2017-01-01"
35 changes: 26 additions & 9 deletions libs/pubsub/query/syntax/parser.go
@@ -3,8 +3,7 @@ package syntax
 import (
 	"fmt"
 	"io"
-	"math"
-	"strconv"
+	"math/big"
 	"strings"
 	"time"
 )
@@ -68,17 +67,35 @@ func (a *Arg) String() string {
 	}
 }

-// Number returns the value of the argument text as a number, or a NaN if the
+// Number returns the value of the argument text as a number, or nil if the
 // text does not encode a valid number value.
-func (a *Arg) Number() float64 {
+func (a *Arg) Number() *big.Float {
 	if a == nil {
-		return -1
+		return nil
 	}
-	v, err := strconv.ParseFloat(a.text, 64)
-	if err == nil && v >= 0 {
-		return v
+	intVal := new(big.Int)
+	if _, ok := intVal.SetString(a.text, 10); !ok {
+		f, _, err := big.ParseFloat(a.text, 10, 125, big.ToNearestEven)
+		if err != nil {
+			return nil
+		}
+		return f
 	}
+	// If it is indeed a big integer, we make sure to convert it to a float with enough precision
+	// to represent all the bits
+	bitLen := uint(intVal.BitLen())
+	var f *big.Float
+	var err error
+	if bitLen <= 64 {
+		f, _, err = big.ParseFloat(a.text, 10, 0, big.ToNearestEven)
+	} else {
+		f, _, err = big.ParseFloat(a.text, 10, bitLen, big.ToNearestEven)
+	}
+	if err != nil {
+		return nil
+	}
-	return math.NaN()
+	return f
+
 }

 // Time returns the value of the argument text as a time, or the zero value if
