Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Go] Marshaling int64 or unit64 arrays to JSON can cause loss of data #35948

Closed
erezrokah opened this issue Jun 6, 2023 · 1 comment · Fixed by #35950
Closed

[Go] Marshaling int64 or unit64 arrays to JSON can cause loss of data #35948

erezrokah opened this issue Jun 6, 2023 · 1 comment · Fixed by #35950
Assignees
Milestone

Comments

@erezrokah
Copy link
Contributor

Describe the bug, including details regarding any error messages, version, and platform.

When marshaling array.Int64 or array.Uint64 to JSON the original int64 or unit64 values can be lost.
The cause is that the values are casted to float64 before marshaling, and that type is not big enough to hold all possible int64 or unit64 values.

Relevant code:

vals[i] = float64(a.values[i]) // prevent uint8 from being seen as binary data

vals[i] = float64(a.values[i]) // prevent uint8 from being seen as binary data

Adding the following tests to numeric_test.go result in failures:

func TestInt64MarshalJSON(t *testing.T) {
	pool := memory.NewCheckedAllocator(memory.NewGoAllocator())
	defer pool.AssertSize(t, 0)

	var (
		vs = []int64{-5474557666971701248}
	)

	b := array.NewInt64Builder(pool)
	defer b.Release()

	for _, v := range vs {
		b.Append(v)
	}

	arr := b.NewArray().(*array.Int64)
	defer arr.Release()

	jsonBytes, err := json.Marshal(arr)
	if err != nil {
		t.Fatal(err)
	}
	got := string(jsonBytes)
	want := `[-5474557666971701248]`
	if got != want {
		t.Fatalf("got=%s, want=%s", got, want)
	}
}

func TestUInt64MarshalJSON(t *testing.T) {
	pool := memory.NewCheckedAllocator(memory.NewGoAllocator())
	defer pool.AssertSize(t, 0)

	var (
		vs = []uint64{14697929703826477056}
	)

	b := array.NewUint64Builder(pool)
	defer b.Release()

	for _, v := range vs {
		b.Append(v)
	}

	arr := b.NewArray().(*array.Uint64)
	defer arr.Release()

	jsonBytes, err := json.Marshal(arr)
	if err != nil {
		t.Fatal(err)
	}
	got := string(jsonBytes)
	want := `[14697929703826477056]`
	if got != want {
		t.Fatalf("got=%s, want=%s", got, want)
	}
}

A possible solution is to only do the casting to float64 when the array is of unit8 like the comment says, or skip the casting for int64 and unit64 arrays

Component(s)

Go

@erezrokah
Copy link
Contributor Author

take

zeroshade pushed a commit that referenced this issue Jun 6, 2023
…rshaling arrays (#35950)

### Rationale for this change

See [issue](#35948).

Based on https://github.com/apache/arrow/blob/9fb8697dcb442f63317c7d6046393fb74842e0ae/go/arrow/array/numeric.gen.go.tmpl#L120 it seems we should only do the casting for `unit8` and `int8` types.

### What changes are included in this PR?

This PR fixes the numeric template so it only casts `int8` and `unit8` to `float64` when JSON marshaling arrays to avoid losing data for other types.

### Are these changes tested?

Yes

### Are there any user-facing changes?

The user facing change is that if a user was JSON marshaling arrays that has `uint64` or `int64` values they can now get different (more accurate) values than before 

* Closes: #35948

Authored-by: erezrokah <erezrokah@users.noreply.github.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
@zeroshade zeroshade added this to the 13.0.0 milestone Jun 6, 2023
zeroshade pushed a commit to zeroshade/arrow that referenced this issue Apr 11, 2024
…SON marshaling arrays (apache#35950)

### Rationale for this change

See [issue](apache#35948).

Based on https://github.com/apache/arrow/blob/9fb8697dcb442f63317c7d6046393fb74842e0ae/go/arrow/array/numeric.gen.go.tmpl#L120 it seems we should only do the casting for `unit8` and `int8` types.

### What changes are included in this PR?

This PR fixes the numeric template so it only casts `int8` and `unit8` to `float64` when JSON marshaling arrays to avoid losing data for other types.

### Are these changes tested?

Yes

### Are there any user-facing changes?

The user facing change is that if a user was JSON marshaling arrays that has `uint64` or `int64` values they can now get different (more accurate) values than before

* Closes: apache#35948

Authored-by: erezrokah <erezrokah@users.noreply.github.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
(cherry picked from commit 745fa94)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants