Inferred schemas treat integers as floats, may silently alter data #2377

Closed
aphyr opened this issue May 8, 2018 · 5 comments

@aphyr aphyr commented May 8, 2018

Since at least 1.0.2 and through 1.0.5-dev 5b93fb4, Dgraph infers the type of new predicates with integer values as float. Unless users specify a schema up front, they could write 0 and read back 0.0. In languages with aggressive numeric type coercion, this may work fine until users attempt to write a number which is not exactly representable as a float. For instance, if one writes 9007199254740993 (2^53 + 1) to a predicate without a schema, then attempts to read that value back, Dgraph will return 9007199254740992 instead. Write 27670116110564327426, and 2.7670116110564327E19 comes back--426 less than the value written.

You can reproduce this with Jepsen 56dce4d5b875bc2eec841564f865b72168c91938 by running

lein run test --package-url https://github.com/dgraph-io/dgraph/releases/download/nightly/dgraph-linux-amd64.tar.gz --force-download --workload types --nemesis none --sequencing server --time-limit 500
@manishrjain manishrjain added the wontfix label May 17, 2018
Member

@manishrjain manishrjain commented May 17, 2018

This isn't a Dgraph thing. Go's encoding/json decodes JSON numbers as float64 when unmarshaling into an interface{} value, which is what is causing this issue. You can see an example here:

https://play.golang.org/p/gCvBHpNpsVG

Update: A way to avoid this would be to send the data in RDF format. In JSON, send the value as a hex encoded string with the schema set to int upfront.
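For context on the float64 behaviour mentioned above: in Go's standard library it depends on the decode target. The sketch below, using two hypothetical helpers, decodes the same JSON number once into `interface{}` and once into `int64`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// decodeUntyped unmarshals a JSON value into interface{}.
// With the default settings, numbers arrive as float64.
func decodeUntyped(data []byte) (interface{}, error) {
	var v interface{}
	err := json.Unmarshal(data, &v)
	return v, err
}

// decodeInt64 unmarshals a JSON number directly into an int64,
// which parses the digits as an integer.
func decodeInt64(data []byte) (int64, error) {
	var n int64
	err := json.Unmarshal(data, &n)
	return n, err
}

func main() {
	data := []byte("9007199254740993")

	u, err := decodeUntyped(data)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("untyped: %T %v\n", u, u)

	n, err := decodeInt64(data)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("typed:   %T %v\n", n, n)
}
```

A server that accepts arbitrary JSON mutations usually cannot know the target type in advance, which is why the untyped path is the one that matters here.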

Author

@aphyr aphyr commented May 17, 2018

I've been trying out hex encoded strings, but I haven't figured out how to encode them properly yet--"0x123" is rejected with "invalid syntax". Strings of digits without 0x in front ("123") are treated as decimal integers. Strings with hex letters ("1a") are rejected with "invalid syntax".
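The "invalid syntax" text matches the message of Go's `strconv.ErrSyntax`. Assuming (the thread does not confirm this) that the server converts string values with a base-10 `strconv.ParseInt`, the three observations above would follow. A sketch comparing base 10 with base 0, which infers the base from a prefix such as `0x`:

```go
package main

import (
	"fmt"
	"strconv"
)

// parseDecimal parses s strictly as a base-10 integer.
// Assumption: this resembles what the server does; not confirmed.
func parseDecimal(s string) (int64, error) {
	return strconv.ParseInt(s, 10, 64)
}

// parseAnyBase lets strconv infer the base from the prefix
// ("0x" for hex, "0b" for binary, "0o" or "0" for octal).
func parseAnyBase(s string) (int64, error) {
	return strconv.ParseInt(s, 0, 64)
}

func main() {
	for _, s := range []string{"123", "0x123", "1a"} {
		_, errDec := parseDecimal(s)
		_, errAny := parseAnyBase(s)
		fmt.Printf("%-7q base 10 ok: %-5t base 0 ok: %t\n", s, errDec == nil, errAny == nil)
	}
}
```

Under that assumption, "1a" fails either way because it carries no prefix, and "0x123" is only accepted when the base is inferred.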

Author

@aphyr aphyr commented May 17, 2018

I can confirm that N-Quads writes round-trip correctly! That makes it look like Go's JSON library can write values it can't correctly read.

Author

@aphyr aphyr commented May 17, 2018

Yup. Here's a demo. You miiight want to choose a different JSON parser, or issue warnings to users when reading values that may not have round-tripped correctly. https://play.golang.org/p/kut6IgUn0r3

package main

import (
	"encoding/json"
	"fmt"
	"log"
	"strconv"
)

func main() {
	// 2^63 - 512: a valid int64 that float64 cannot represent exactly.
	var val int64 = 9223372036854775296

	fmt.Println("Wrote int64   " + strconv.FormatInt(val, 10))

	data := []byte(`{"hi": ` + strconv.FormatInt(val, 10) + `}`)

	// Unmarshaling into interface{} decodes every JSON number as float64.
	m := make(map[string]interface{})
	if err := json.Unmarshal(data, &m); err != nil {
		log.Fatal(err)
	}
	for _, v := range m {
		switch n := v.(type) {
		case int:
			fmt.Println("Read int:    ", n)
		case float64:
			fmt.Println("Read float64 ", n)
			// The float64 has rounded up to 2^63, which is outside the int64
			// range, so the result of this conversion is implementation-dependent.
			fmt.Println("As int64     ", int64(n))
		default:
			fmt.Println("Type unknown")
		}
	}
}
Wrote int64   9223372036854775296
Read float64  9.223372036854776e+18
As int64      -9223372036854775808
Member

@manishrjain manishrjain commented May 17, 2018

Following up on #2378.

manishrjain added a commit that referenced this issue May 18, 2018
With JSON mutations, we previously couldn't distinguish an int from float, using the standard json.Unmarshal. Now, we use a json.Decoder, with `UseNumber`, which allows us to correctly distinguish int from float.

This fixes #2377 and #2378 .