pubsub/kvindexer:support for big numbers - v2 (backport #797) (#840)

* pubsub/kvindexer:support for big numbers - v2 (#797)
* Applied Michaels patch
* Added corner case tests, failing currently
* Support for big floats and ints added
* Added new util file
* Fixed linter error
* added internal package
* Revert "added internal package" This reverts commit ef7f2b4.
* added internal/indexer
* Moved utils to internal
* Fixed linter
* Updated docs
* Applied @sergio-mena s PR comments
* Fixed linter
* Return with error in compare float
* Changelog entries
* Apply lasaroj's comments. Co-authored-by: Lasaro <lasaro@informal.systems>
* applied some PR comments
* updated docs Co-authored-by: Sergio Mena <sergio@informal.systems>
* Added errors and logger
* Fixed linter
* Fixed sentence in comment
* Removed changelog
* Avoid converting to string when parsing int to float
* Added unexpected types to error messages
* Added comment on the 8atom regex in pubsub
---------
Co-authored-by: Lasaro <lasaro@informal.systems>
Co-authored-by: Sergio Mena <sergio@informal.systems>
(cherry picked from commit f667d3f)
* Add changelogs
* Make breaking changes explicit for users
* Update libs/pubsub/query/query.go Co-authored-by: Lasaro <lasaro@informal.systems>
---------
Co-authored-by: Jasmina Malicevic <jasmina.dustinac@gmail.com>
Co-authored-by: Sergio Mena <sergio@informal.systems>
Co-authored-by: Lasaro <lasaro@informal.systems>
cometbft · May 17, 2023 · d067be9
1 parent 19844b5
Showing 24 changed files with 735 additions and 94 deletions.
diff --git a/.changelog/unreleased/breaking-changes/797-kvindexer-support-for-big-numbers.md b/.changelog/unreleased/breaking-changes/797-kvindexer-support-for-big-numbers.md
@@ -0,0 +1,3 @@
+- `[kvindexer]` Added support for big integers and big floats in the kvindexer.
+  Breaking changes: function `Number` in package `libs/pubsub/query/syntax` changed its return value.
+  ([\#797](https://github.com/cometbft/cometbft/pull/797))
diff --git a/.changelog/unreleased/breaking-changes/797-pubsub-support-for-big-numbers.md b/.changelog/unreleased/breaking-changes/797-pubsub-support-for-big-numbers.md
@@ -0,0 +1,3 @@
+- `[pubsub]` Added support for big integers and big floats in the pubsub event query system.
+  Breaking changes: function `Number` in package `libs/pubsub/query/syntax` changed its return value.
+  ([\#797](https://github.com/cometbft/cometbft/pull/797))
diff --git a/.changelog/unreleased/improvements/797-pubsub-float.md b/.changelog/unreleased/improvements/797-pubsub-float.md
@@ -0,0 +1,3 @@
+- `[pubsub/kvindexer]` Numeric query conditions and event values are represented as big floats with default precision of 125.
+  Integers are read as "big ints" and represented with as many bits as they need when converting to floats.
+  ([\#797](https://github.com/cometbft/cometbft/pull/797))
diff --git a/docs/app-dev/indexing-transactions.md b/docs/app-dev/indexing-transactions.md
@@ -276,3 +276,12 @@ This behavior was fixed with CometBFT 0.34.26+. However, if the data was indexed
 Tendermint Core and not re-indexed, that data will be queried as if all the attributes within a height
 occurred within the same event.
 
+## Event attribute value types
+
+Users can use anything as an event value. However, if the event attribute value is a number, the following needs to be taken into account:
+
+- Negative numbers will not be properly retrieved when querying the indexer.
+- Event values are converted to big floats (from the `math/big` package). For integer values, the precision of the float is set to the bit length
+of the integer, so that no information is lost to insufficient precision. Before CometBFT v0.38.x, this conversion did not exist and all float values were ignored.
+- As of CometBFT v0.38.x, queries can contain floating point numbers as well.
+- Note that comparisons against floats can be imprecise when the values have a high number of decimals.
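The precision rule above can be illustrated with a short sketch. It is not the indexer's code: `toBigFloat` is a hypothetical helper that sizes the float to the integer's bit length (with a floor of 64 bits, an assumption made here for small values), and the `float64` comparison is included only for contrast.

```go
package main

import (
	"fmt"
	"math/big"
	"strconv"
)

// toBigFloat converts a decimal integer string to a *big.Float whose
// precision is the bit length of the integer, so no bits are lost.
// The helper name and the 64-bit floor are assumptions for this sketch.
func toBigFloat(s string) (*big.Float, bool) {
	i, ok := new(big.Int).SetString(s, 10)
	if !ok {
		return nil, false
	}
	prec := uint(i.BitLen())
	if prec < 64 {
		prec = 64
	}
	return new(big.Float).SetPrec(prec).SetInt(i), true
}

func main() {
	a, _ := toBigFloat("99999999999999999999")
	b, _ := toBigFloat("99999999999999999998")
	fmt.Println(a.Cmp(b)) // 1: the two values stay distinct

	// float64 has a 53-bit mantissa, so both literals round to 1e20.
	fa, _ := strconv.ParseFloat("99999999999999999999", 64)
	fb, _ := strconv.ParseFloat("99999999999999999998", 64)
	fmt.Println(fa == fb) // true
}
```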
diff --git a/docs/core/subscription.md b/docs/core/subscription.md
@@ -40,6 +40,25 @@ You can also use tags, given you had included them into DeliverTx
 response, to query transaction results. See [Indexing
 transactions](../app-dev/indexing-transactions.md) for details.
 
+## Query parameter and event type restrictions
+
+While CometBFT imposes no restrictions on the application with regards to the type of 
+the event output, there are several considerations that need to be taken into account 
+when querying events with numeric values.
+
+- Queries convert all numeric event values to `big.Float`, provided by `math/big`. Integers
+are converted into a float with a precision equal to the number of bits needed
+to represent this integer. This is done to avoid precision loss for big integers when they
+are converted with the default precision (`64`).
+- When comparing two values, if either one of them is a float, the other one will be represented
+as a big float. Integers are again parsed as big floats with a precision equal to the number
+of bits required to represent them.
+- As with all floating point comparisons, comparing floats with decimal values can lead to imprecise
+results.
+- Queries cannot include negative numbers.
+
+Prior to version `v0.38.x`, floats were not supported as query parameters. 
+
 ## ValidatorSetUpdates
 
 When validator set changes, ValidatorSetUpdates event is published. The

diff --git a/internal/indexer/indexer_utils.go b/internal/indexer/indexer_utils.go
@@ -0,0 +1,119 @@
+package indexer
+
+import (
+	"fmt"
+	"math/big"
+
+	"github.com/cometbft/cometbft/state/indexer"
+)
+
+// compareFloat compares a float event value (op1) against the query condition (op2),
+// returning the result of op1.Cmp(op2) and whether op2 is itself a float.
+func compareFloat(op1 *big.Float, op2 interface{}) (int, bool, error) {
+	switch opVal := op2.(type) {
+	case *big.Int:
+		vF := new(big.Float)
+		vF.SetInt(opVal)
+		cmp := op1.Cmp(vF)
+		return cmp, false, nil
+
+	case *big.Float:
+		return op1.Cmp(opVal), true, nil
+	default:
+		return -1, false, fmt.Errorf("unable to parse arguments, bad type: %T", op2)
+	}
+}
+
+// compareInt compares an integer event value (op1) against the query condition (op2).
+// When op2 is a float, op1 is converted to a float with a precision equal to the number
+// of bits needed to represent the integer. Rounding the float to an integer instead would
+// give wrong results: 100 would equal 100.2, because 100.2 rounds to 100, while 100.7
+// would round to 101.
+func compareInt(op1 *big.Int, op2 interface{}) (int, bool, error) {
+
+	switch opVal := op2.(type) {
+	case *big.Int:
+		return op1.Cmp(opVal), false, nil
+	case *big.Float:
+		vF := new(big.Float)
+		vF.SetInt(op1)
+		return vF.Cmp(opVal), true, nil
+	default:
+		return -1, false, fmt.Errorf("unable to parse arguments, unexpected type: %T", op2)
+	}
+}
+
+func CheckBounds(ranges indexer.QueryRange, v interface{}) (bool, error) {
+	// These functions fetch the lower and upper bounds of the query
+	// It is expected that for x > 5, the value of lowerBound is 6.
+	// This is achieved by adding one to the actual lower bound.
+	// For a query of x < 5, the value of upper bound is 4.
+	// This is achieved by subtracting one from the actual upper bound.
+
+	// For integers this behavior will work. However, for floats, we cannot simply add/sub 1.
+	// Query: x < 5.5; x = 5 should match the query. If we subtracted one as for integers,
+	// the upperBound would be 4.5 and x would not match. Thus we do not subtract anything for
+	// floating point bounds.
+
+	// We can rewrite these functions to not add/sub 1 but the function handles also time arguments.
+	// To be sure we are not breaking existing queries that compare time, and as we are planning to replace
+	// the indexer in the future, we adapt the code here to handle floats as a special case.
+	lowerBound := ranges.LowerBoundValue()
+	upperBound := ranges.UpperBoundValue()
+
+	// *Explanation for the isFloat condition below.*
+	// In LowerBoundValue(), for floating points, we cannot simply add 1 for the reasons explained
+	// in the comment at the beginning. The same is true for subtracting one in UpperBoundValue().
+	// That means that for integers, if the condition is >=, cmp will be either 0 or 1
+	// (cmp == -1 should always be false).
+	// But if the lowerBound is a float, we have not added one, so returning a 0
+	// is correct only if ranges.IncludeLowerBound is true.
+	// example int: x < 100; upperBound = 99; if x.Cmp(99) == 0 the condition holds
+	// example float: x < 100.0; upperBound = 100.0; if x.Cmp(100) == 0 then returning x
+	// would be wrong.
+	switch vVal := v.(type) {
+	case *big.Int:
+		if lowerBound != nil {
+			cmp, isFloat, err := compareInt(vVal, lowerBound)
+			if err != nil {
+				return false, err
+			}
+			if cmp == -1 || (isFloat && cmp == 0 && !ranges.IncludeLowerBound) {
+				return false, err
+			}
+		}
+		if upperBound != nil {
+			cmp, isFloat, err := compareInt(vVal, upperBound)
+			if err != nil {
+				return false, err
+			}
+			if cmp == 1 || (isFloat && cmp == 0 && !ranges.IncludeUpperBound) {
+				return false, err
+			}
+		}
+
+	case *big.Float:
+		if lowerBound != nil {
+			cmp, isFloat, err := compareFloat(vVal, lowerBound)
+			if err != nil {
+				return false, err
+			}
+			if cmp == -1 || (cmp == 0 && isFloat && !ranges.IncludeLowerBound) {
+				return false, err
+			}
+		}
+		if upperBound != nil {
+			cmp, isFloat, err := compareFloat(vVal, upperBound)
+			if err != nil {
+				return false, err
+			}
+			if cmp == 1 || (cmp == 0 && isFloat && !ranges.IncludeUpperBound) {
+				return false, err
+			}
+		}
+
+	default:
+		return false, fmt.Errorf("invalid argument type in query: %T", v)
+	}
+	return true, nil
+}
diff --git a/libs/pubsub/query/query.go b/libs/pubsub/query/query.go
@@ -10,8 +10,8 @@ package query
 
 import (
 	"fmt"
+	"math/big"
 	"regexp"
-	"strconv"
 	"strings"
 	"time"
 
@@ -218,13 +218,23 @@ func compileCondition(cond syntax.Condition) (condition, error) {
 	return out, nil
 }
 
-// TODO(creachadair): The existing implementation allows anything number shaped
-// to be treated as a number. This preserves the parts of that behavior we had
-// tests for, but we should probably get rid of that.
+// We use this regex to support queries of the form "8atom", "6.5stake",
+// which are actively used in production.
+// The regex takes care of removing the non-number suffix.
 var extractNum = regexp.MustCompile(`^\d+(\.\d+)?`)
 
-func parseNumber(s string) (float64, error) {
-	return strconv.ParseFloat(extractNum.FindString(s), 64)
+func parseNumber(s string) (*big.Float, error) {
+	intVal := new(big.Int)
+	if _, ok := intVal.SetString(s, 10); !ok {
+		f, _, err := big.ParseFloat(extractNum.FindString(s), 10, 125, big.ToNearestEven)
+		if err != nil {
+			return nil, err
+		}
+		return f, err
+	}
+	f, _, err := big.ParseFloat(extractNum.FindString(s), 10, uint(intVal.BitLen()), big.ToNearestEven)
+	return f, err
+
 }
 
 // A map of operator ⇒ argtype ⇒ match-constructor.
@@ -248,7 +258,7 @@ var opTypeMap = map[syntax.Token]map[syntax.Token]func(interface{}) func(string)
 		syntax.TNumber: func(v interface{}) func(string) bool {
 			return func(s string) bool {
 				w, err := parseNumber(s)
-				return err == nil && w == v.(float64)
+				return err == nil && w.Cmp(v.(*big.Float)) == 0
 			}
 		},
 		syntax.TDate: func(v interface{}) func(string) bool {
@@ -268,7 +278,7 @@ var opTypeMap = map[syntax.Token]map[syntax.Token]func(interface{}) func(string)
 		syntax.TNumber: func(v interface{}) func(string) bool {
 			return func(s string) bool {
 				w, err := parseNumber(s)
-				return err == nil && w < v.(float64)
+				return err == nil && w.Cmp(v.(*big.Float)) < 0
 			}
 		},
 		syntax.TDate: func(v interface{}) func(string) bool {
@@ -288,7 +298,7 @@ var opTypeMap = map[syntax.Token]map[syntax.Token]func(interface{}) func(string)
 		syntax.TNumber: func(v interface{}) func(string) bool {
 			return func(s string) bool {
 				w, err := parseNumber(s)
-				return err == nil && w <= v.(float64)
+				return err == nil && w.Cmp(v.(*big.Float)) <= 0
 			}
 		},
 		syntax.TDate: func(v interface{}) func(string) bool {
@@ -308,7 +318,7 @@ var opTypeMap = map[syntax.Token]map[syntax.Token]func(interface{}) func(string)
 		syntax.TNumber: func(v interface{}) func(string) bool {
 			return func(s string) bool {
 				w, err := parseNumber(s)
-				return err == nil && w > v.(float64)
+				return err == nil && w.Cmp(v.(*big.Float)) > 0
 			}
 		},
 		syntax.TDate: func(v interface{}) func(string) bool {
@@ -328,7 +338,7 @@ var opTypeMap = map[syntax.Token]map[syntax.Token]func(interface{}) func(string)
 		syntax.TNumber: func(v interface{}) func(string) bool {
 			return func(s string) bool {
 				w, err := parseNumber(s)
-				return err == nil && w >= v.(float64)
+				return err == nil && w.Cmp(v.(*big.Float)) >= 0
 			}
 		},
 		syntax.TDate: func(v interface{}) func(string) bool {

diff --git a/libs/pubsub/query/query_test.go b/libs/pubsub/query/query_test.go
@@ -173,6 +173,84 @@ var apiTypeEvents = []types.Event{
 	},
 }
 
+func TestBigNumbers(t *testing.T) {
+
+	apiBigNumTest := map[string][]string{
+		"big.value": {
+			"99999999999999999999",
+		},
+		"big2.value": {
+			"18446744073709551615", // max(uint64) == 18446744073709551615
+		},
+		"big.floatvalue": {
+			"99999999999999999999.10",
+		},
+		"big2.floatvalue": {
+			"18446744073709551615.6", // max(uint64) == 18446744073709551615
+		},
+	}
+
+	testCases := []struct {
+		s       string
+		events  map[string][]string
+		matches bool
+	}{
+
+		// Test cases for values that exceed the capacity of int64/float64.
+		{`big.value >= 99999999999999999999`,
+			apiBigNumTest,
+			true},
+		{`big.value > 99999999999999999998`,
+			apiBigNumTest,
+			true},
+		{`big2.value <= 18446744073709551615`,
+			apiBigNumTest, true},
+		{`big.floatvalue >= 99999999999999999999`,
+			apiBigNumTest,
+			true},
+		{`big.floatvalue > 99999999999999999998.10`,
+			apiBigNumTest,
+			true},
+		{`big.floatvalue > 99999999999999999998`,
+			apiBigNumTest,
+			true},
+		{`big2.floatvalue <= 18446744073709551615.6`,
+			apiBigNumTest,
+			true},
+		{`big2.floatvalue <= 18446744073709551615.6`,
+			apiBigNumTest,
+			true},
+		{`big2.floatvalue >= 18446744073709551615`,
+			apiBigNumTest,
+			true},
+		{`big2.floatvalue >= 12.5`,
+			apiBigNumTest,
+			true},
+		{`big.value >= 10`,
+			apiBigNumTest,
+			true},
+	}
+
+	for i, tc := range testCases {
+		t.Run(fmt.Sprintf("%02d", i+1), func(t *testing.T) {
+			c, err := query.New(tc.s)
+			if err != nil {
+				t.Fatalf("NewCompiled %#q: unexpected error: %v", tc.s, err)
+			}
+
+			got, err := c.Matches(tc.events)
+			if err != nil {
+				t.Errorf("Query: %#q\nInput: %+v\nMatches: got error %v",
+					tc.s, tc.events, err)
+			}
+			if got != tc.matches {
+				t.Errorf("Query: %#q\nInput: %+v\nMatches: got %v, want %v",
+					tc.s, tc.events, got, tc.matches)
+			}
+		})
+	}
+}
+
 func TestCompiledMatches(t *testing.T) {
 	var (
 		txDate = "2017-01-01"

diff --git a/libs/pubsub/query/syntax/parser.go b/libs/pubsub/query/syntax/parser.go
@@ -3,8 +3,7 @@ package syntax
 import (
 	"fmt"
 	"io"
-	"math"
-	"strconv"
+	"math/big"
 	"strings"
 	"time"
 )
@@ -68,17 +67,35 @@ func (a *Arg) String() string {
 	}
 }
 
-// Number returns the value of the argument text as a number, or a NaN if the
+// Number returns the value of the argument text as a number, or nil if the
 // text does not encode a valid number value.
-func (a *Arg) Number() float64 {
+func (a *Arg) Number() *big.Float {
 	if a == nil {
-		return -1
+		return nil
 	}
-	v, err := strconv.ParseFloat(a.text, 64)
-	if err == nil && v >= 0 {
-		return v
+	intVal := new(big.Int)
+	if _, ok := intVal.SetString(a.text, 10); !ok {
+		f, _, err := big.ParseFloat(a.text, 10, 125, big.ToNearestEven)
+		if err != nil {
+			return nil
+		}
+		return f
+	}
+	// If it is indeed a big integer, we make sure to convert it to a float with enough precision
+	// to represent all the bits
+	bitLen := uint(intVal.BitLen())
+	var f *big.Float
+	var err error
+	if bitLen <= 64 {
+		f, _, err = big.ParseFloat(a.text, 10, 0, big.ToNearestEven)
+	} else {
+		f, _, err = big.ParseFloat(a.text, 10, bitLen, big.ToNearestEven)
+	}
+	if err != nil {
+		return nil
 	}
-	return math.NaN()
+	return f
+
 }
 
 // Time returns the value of the argument text as a time, or the zero value if