Inferred schemas treat integers as floats, may silently alter data #2377

Closed
aphyr opened this Issue May 8, 2018 · 5 comments

aphyr commented May 8, 2018

Since at least 1.0.2 and through 1.0.5-dev 5b93fb4, Dgraph infers the type of new predicates with integer values as float. Unless users specify a schema up front, they can write 0 and read back 0.0. In languages with aggressive numeric type coercion, this may work fine until users attempt to write a number that is not exactly representable as a float. For instance, if one writes 9007199254740993 to a predicate without a schema, then attempts to read that value back, Dgraph will return 9007199254740992 instead. Write 27670116110564327426, and 2.7670116110564327E19 comes back--426 less than the value written.
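
To make the rounding concrete without involving Dgraph at all, here is a minimal Go sketch of what happens when an integer passes through a float64. The helper name `viaFloat64` is hypothetical and is not taken from Dgraph's code; it only illustrates why 9007199254740993 (2^53 + 1) cannot survive the trip.

```go
package main

import "fmt"

// viaFloat64 converts an int64 to float64 and back, which is roughly what
// happens to an integer stored in a float-typed predicate.
// (Hypothetical helper for illustration only.)
func viaFloat64(n int64) int64 {
	return int64(float64(n))
}

func main() {
	// 2^53 is the last point at which every integer is still exactly
	// representable; 2^53 + 1 rounds to the nearest representable value.
	for _, n := range []int64{0, 9007199254740992, 9007199254740993} {
		fmt.Printf("wrote %d, read %d\n", n, viaFloat64(n))
	}
}
```

The last line printed shows 9007199254740993 coming back as 9007199254740992, the same pair of values as in the report above.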

You can reproduce this with Jepsen 56dce4d5b875bc2eec841564f865b72168c91938 by running

lein run test --package-url https://github.com/dgraph-io/dgraph/releases/download/nightly/dgraph-linux-amd64.tar.gz --force-download --workload types --nemesis none --sequencing server --time-limit 500

@manishrjain manishrjain added the wontfix label May 17, 2018

manishrjain (Member) commented May 17, 2018

This isn't a Dgraph thing. Go parses JSON integers as float64, which is what is causing this issue. You can see an example here:

https://play.golang.org/p/gCvBHpNpsVG

Update: A way to avoid this would be to send the data in RDF format. In JSON, send it as a hex-encoded string, with the schema set to int up front.

aphyr commented May 17, 2018

I've been trying out hex encoded strings, but I haven't figured out how to encode them properly yet--"0x123" is rejected with "invalid syntax". Strings of digits without 0x in front ("123") are treated as decimal integers. Strings with hex letters ("1a") are rejected with "invalid syntax".
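
For what it's worth, all three results match what Go's `strconv.ParseInt` does when asked to parse in base 10. Whether Dgraph actually parses these strings that way is an assumption on my part, not something confirmed in this thread; the sketch below only shows the standard library's behavior, and the helper names `parseDecimal` and `isSyntaxErr` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// parseDecimal parses s as a base-10 signed 64-bit integer.
// (Hypothetical helper; assumed, not confirmed, to resemble the server side.)
func parseDecimal(s string) (int64, error) {
	return strconv.ParseInt(s, 10, 64)
}

// isSyntaxErr reports whether err is strconv's "invalid syntax" error.
func isSyntaxErr(err error) bool {
	return errors.Is(err, strconv.ErrSyntax)
}

func main() {
	for _, s := range []string{"123", "0x123", "1a"} {
		n, err := parseDecimal(s)
		fmt.Printf("base 10: %q -> %d, err=%v\n", s, n, err)
	}

	// For contrast: base 0 lets ParseInt infer the base from the "0x" prefix,
	// and base 16 accepts bare hex digits.
	n, err := strconv.ParseInt("0x123", 0, 64)
	fmt.Printf("base 0:  %q -> %d, err=%v\n", "0x123", n, err)
	n, err = strconv.ParseInt("1a", 16, 64)
	fmt.Printf("base 16: %q -> %d, err=%v\n", "1a", n, err)
}
```

In base 10, "123" parses as 123 while "0x123" and "1a" both fail with "invalid syntax".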

aphyr commented May 17, 2018

I can confirm that N-Quads writes round-trip correctly! That makes it look like Go's JSON library can write values it can't correctly read.

aphyr commented May 17, 2018

Yup. Here's a demo. You miiight want to choose a different JSON parser, or issue warnings to users when reading values that may not have round-tripped correctly. https://play.golang.org/p/kut6IgUn0r3

package main

import (
	"encoding/json"
	"fmt"
	"log"
	"strconv"
)

func main() {
	// 2^63 - 512: a valid int64 that float64 cannot represent exactly.
	var val int64 = 9223372036854775296

	fmt.Println("Wrote int64   " + strconv.FormatInt(val, 10))

	data := []byte(`{"hi": ` + strconv.FormatInt(val, 10) + `}`)

	// Unmarshal into an untyped map; encoding/json stores every JSON
	// number as float64 in this case.
	m := make(map[string]interface{})
	if err := json.Unmarshal(data, &m); err != nil {
		log.Fatal(err)
	}
	for _, v := range m {
		switch x := v.(type) {
		case int:
			// Never taken: json.Unmarshal does not produce int here.
			fmt.Println("Read int:    ", x)
		case float64:
			fmt.Println("Read float64 ", x)
			// The float64 rounded up to 2^63, which is out of range for
			// int64, so the result of this conversion is
			// implementation-dependent.
			fmt.Println("As int64     ", int64(x))
		default:
			fmt.Println("Type unknown")
		}
	}
}
Wrote int64   9223372036854775296
Read float64  9.223372036854776e+18
As int64      -9223372036854775808

manishrjain (Member) commented May 17, 2018

Following up on #2378.

@manishrjain manishrjain added cleanup and removed wontfix labels May 17, 2018

manishrjain added a commit that referenced this issue May 18, 2018

Ability to correctly distinguish float from int. (#2398)
With JSON mutations, we previously couldn't distinguish an int from float, using the standard json.Unmarshal. Now, we use a json.Decoder, with `UseNumber`, which allows us to correctly distinguish int from float.

This fixes #2377 and #2378.
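
As a rough illustration of the approach the commit message describes (this is a sketch, not the actual code from #2398), a `json.Decoder` with `UseNumber` keeps each JSON number as a `json.Number` string, so the caller can try an integer parse first and fall back to float. The function name `classify` and the key `"v"` are made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// classify decodes a JSON object with UseNumber and reports whether the
// number stored under "v" fits an int64 or has to be treated as a float.
// (Hypothetical helper for illustration only.)
func classify(doc string) (string, error) {
	dec := json.NewDecoder(strings.NewReader(doc))
	dec.UseNumber() // keep numbers as json.Number instead of float64

	var m map[string]interface{}
	if err := dec.Decode(&m); err != nil {
		return "", err
	}
	n, ok := m["v"].(json.Number)
	if !ok {
		return "", fmt.Errorf("value is %T, not a JSON number", m["v"])
	}
	// Try the exact integer parse first; fall back to float64 otherwise.
	if i, err := n.Int64(); err == nil {
		return fmt.Sprintf("int %d", i), nil
	}
	f, err := n.Float64()
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("float %v", f), nil
}

func main() {
	for _, doc := range []string{
		`{"v": 0}`,
		`{"v": 9223372036854775296}`,
		`{"v": 1.5}`,
	} {
		s, err := classify(doc)
		fmt.Println(doc, "->", s, err)
	}
}
```

With this, 9223372036854775296 is read back as the same int64 rather than being rounded through float64. Note that in this sketch an integer too large for int64 would still fall through to the float branch.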
