Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support BigInt for i64 and u64 #24

Merged
merged 19 commits into from
Dec 24, 2021
Merged

Support BigInt for i64 and u64 #24

merged 19 commits into from
Dec 24, 2021

Conversation

mfish33
Copy link
Contributor

@mfish33 mfish33 commented Dec 19, 2021

This is the same as #22. I ended up needing this for a project I am working on so I decided to implement it. Let me know if there are any changes that are needed. Since this is a breaking change for the consumers of the API, should this be behind a feature flag?

src/de.rs Outdated
use serde::{de, serde_if_integer128};
use js_sys::{Array, ArrayBuffer, BigInt, JsString, Number, Object, Reflect, Symbol, Uint8Array};
use serde::{
de::{self},
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
de::{self},
de,

src/de.rs Outdated

fn deserialize_small_signed<'de, V: de::Visitor<'de>>(&self, visitor: V) -> Result<V::Value> {
match self.as_safe_integer() {
Some(v) => visitor.visit_i32(v as _),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong. With as you're casting away the whole range between 2^32...2^53 which will lead to acceptance of invalid numbers for the target data type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. My bad

src/de.rs Outdated
}
let big_int: BigInt = self.value.clone().into();
// Safe to do unwraps since we know the type is a bigint
let rust_string = big_int.to_string(10).unwrap().as_string().unwrap();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm it's pretty wasteful to go via string conversion/passing/parsing just to pass numbers. Since there's no built-in API in BigInt to get 64-bit integer yet, I'd declare my own binding here to get number directly and use that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think there is any way to do this safely. When converting an i64 into a JsValue it goes first into a string https://rustwasm.github.io/wasm-bindgen/api/src/wasm_bindgen/lib.rs.html#853. I think the same has to be done in the reverse.

A possible optimization is to first check if the number is a safe integer and use the normal f64 conversion if so.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woah that's bad.

Emscripten does this by either passing 2x 32-bit integers (in standard mode) or just by passing 64-bit integer directly (when using -s WASM_BIGINT, aka https://github.com/WebAssembly/JS-BigInt-integration proposal). I think this should be doable in wasm-bindgen as well, either by making a PR upstream or doing something similar here inline.

Do you want to give it a try to make an upstream PR? If not, I might give it a go.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to give it a try to make an upstream PR? If not, I might give it a go.

You know what, I'm now tempted enough that I'll give it a go.

Copy link
Owner

@RReverser RReverser Dec 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe for now it's better to do it here and then upstream. Anyway, my idea is that you can add bindings like this:

#[wasm_bindgen]
extern {
  #[wasm_bindgen(js_name = BigInt)]
  fn from_u64(x: u64) -> BigInt;

  #[wasm_bindgen(js_name = BigInt)]
  fn from_i64(x: i64) -> BigInt;

  ...
}

Those will bind to the JS BigInt function with different signatures and use wasm-bindgen's native support for passing 64-bit integers as the argument. This will avoid string representation altogether, so the bound BigInt function will behave just like a no-op identity function, retrieving bigint and returning one.

Similarly, for the opposite conversion you can do

#[wasm_bindgen(js_name = BigInt)]
fn to_u64(x: BigInt) -> u64;

...

and check that the original number didn't get truncated during conversion.

Copy link
Contributor Author

@mfish33 mfish33 Dec 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I made those changes. The bindings seem to work and do not lose any precision. Do you know if wam_bindgen's function support for u64/i64 uses strings or the native BigInt support?

Do you want to give it a try to make an upstream PR? If not, I might give it a go.

I think you probably should do that as I am much less familiar with wasm_bindgen.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't use strings, but doesn't use native BigInt support either yet AFAIK - instead, it passes two 32-bit integers and reassembles them on the other side. This should be still a lot faster than string conversion & allocation.

@RReverser
Copy link
Owner

For the serializer, it will need an option as it's a breaking change. For the deserializer though, I'm tempted to say it's safe and easier to just allow both BigInt and regular numbers as a source for 64-bit integers.

- fixed loss of percision
- tests for feature flags
@mfish33
Copy link
Contributor Author

mfish33 commented Dec 21, 2021

I made the deserializer work for both BigInts as well as regular JS numbers

src/ser.rs Outdated
@@ -267,7 +267,13 @@ impl<'s> ser::Serializer for &'s Serializer {
serialize_str(&str);
}

// TODO: we might want to support `BigInt` here in the future.
#[cfg(feature="bigint_serialization")]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than making this a compile-time feature, that will affect all usages of serde-wasm-bindgen across the entire crate tree, can you make this a runtime option on Serializer like there is already serialize_maps_as_objects? It would allow each user to decide which format they prefer.

- runtime option to do BigInt conversion
README.md Outdated Show resolved Hide resolved
src/de.rs Outdated
if self.value.is_bigint() {
let big_int: BigInt = self.value.clone().into();
// Safe to do unwraps since we know the type is a bigint
let rust_string = big_int.to_string(10).unwrap().as_string().unwrap();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is still using strings, but shouldn't anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops

src/ser.rs Outdated
@@ -214,6 +215,7 @@ impl ser::SerializeStruct for ObjectSerializer<'_> {
#[derive(Default)]
pub struct Serializer {
serialize_maps_as_objects: bool,
serialize_64_bit_numbers_as_big_int:bool
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(here and around) please make sure to run cargo fmt in the end, or, better yet, you might want to configure format-on-save in your editor.

mfish33 and others added 2 commits December 21, 2021 15:01
Co-authored-by: Ingvar Stepanyan <me@rreverser.com>
- fixed use of string conversion
tests/serde.rs Outdated Show resolved Hide resolved
tests/serde.rs Outdated
(-(MAX_SAFE_INTEGER + 1)).serialize(&big_int_serializer).unwrap();

// Handle extreme values
std::i64::MIN.serialize(&big_int_serializer).unwrap();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here about missing assert_eq! or something.

tests/serde.rs Outdated
(MAX_SAFE_INTEGER + 1)
.serialize(&big_int_serializer)
.unwrap();
std::u64::MAX.serialize(&big_int_serializer).unwrap();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also please add a few tests for conversion from a BigInt that doesn't actually fit into 64 bits? (you can create one from a string, it's fine to do so in tests)

It should throw an error, but I suspect right now it's not handled correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a js file to do validation on the BigInt's. I think it is the easiest and lowest overhead way of handling the bounds check. It replaces the to_i64 and to_u64 bindings used previously

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, the JS integration doesn't work in all wasm-bindgen modes, see Caveats. While I agree it would be lower overhead, I'd rather keep this crate working for all output modes, as otherwise it's a pretty significant breaking change.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the initial comment where I proposed the bindings idea, you can just compare the resulting u64 with the original BigInt instance. It will require one more roundtrip to JS but shouldn't be that bad and will be working in all modes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, the JS integration doesn't work in all wasm-bindgen modes, see Caveats. While I agree it would be lower overhead, I'd rather keep this crate working for all output modes, as otherwise it's a pretty significant breaking change.

I missed the lack of nodejs support while reading that page of the docs. I appreciate all of the help, I am pretty new to wasm-bindgen.

src/ser.rs Outdated Show resolved Hide resolved
src/de.rs Outdated
match self.as_safe_integer() {
Some(v) => visitor.visit_i64(v),
None => self.invalid_type(visitor),
if self.value.is_bigint() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to use .dyn_into() here instead to combine the is_* check and .into() cast (which does its own check internally) into one.

@RReverser
Copy link
Owner

Slightly off-topic for this PR, but what about i128 and u128? Currently those use the i64/u64 paths, but if we add support for BigInt anyway, should they perhaps also be properly supported?

@mfish33
Copy link
Contributor Author

mfish33 commented Dec 22, 2021

I have given it some thought and I am unsure what the situation looks like for bindings for i128 and u128. Are they the same as i64 and u64 where 32-bit ints are used and assembled? If so it shouldn't be super hard to add support for them. Would you like me to add it to this PR?

@RReverser
Copy link
Owner

I have given it some thought and I am unsure what the situation looks like for bindings for i128 and u128. Are they the same as i64 and u64 where 32-bit ints are used and assembled? If so it shouldn't be super hard to add support for them. Would you like me to add it to this PR?

I thought so, but looks like they're not yet correctly supported by wasm-bindgen, so they would have to either go via strings or be implemented upstream first.

Sine it's a bit more complicated, I think we can keep their implementation out of this PR, unless you want to implement string-based option for them for now - up to you.

Either way, I'd probably at least rename the option to serialize_large_number_types_as_bigints for future-proofing and maybe add tests to make sure that the option does affect i128/u128 and they end up represented as BigInts too.

- removed js module bindings
- added tests for i128, u128
@mfish33
Copy link
Contributor Author

mfish33 commented Dec 22, 2021

Ok, I enabled serialization support for i128 and u128 when serialize_large_number_types_as_bigints is enabled. In addition, I added tests to test both the default behavior as well as when the feature is enabled. The tests are not as extensive as the other types but can be expanded in another PR that adds full support for i_128 and u_128.

src/de.rs Outdated
if &bindings::big_int_from_i64(converted_number) == big_int {
visitor.visit_i64(converted_number)
} else {
Err(de::Error::custom("i64 attempted to be constructed from Bigint that was larger than i64::MAX or less than i64::MIN"))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: it's BigInt (same in the other message below).

src/ser.rs Outdated
@@ -215,7 +215,7 @@ impl ser::SerializeStruct for ObjectSerializer<'_> {
#[derive(Default)]
pub struct Serializer {
serialize_maps_as_objects: bool,
serialize_64_bit_numbers_as_big_int: bool,
serialize_large_number_types_as_big_ints: bool,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also nitpicking, but JS uses bigint for primitive type, so I think we should write them together in the API too.

src/ser.rs Outdated
if self.serialize_large_number_types_as_big_ints {
Ok(JsValue::from(v))
} else {
Err(Error::custom("i128 is not supported"))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, shouldn't this fallback to JS number when option isn't activated? I'm of two minds on this, perhaps it's fine for it to error out, but then it would be good to at least expand the error message to clarify that it's supported, just needs to be enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The current behavior is to error so I was going to uphold that. In addition, I don't think there are many cases where it would be a good idea to convert a 128-bit integer to a 53 bit one (Js Number).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah erroring is fine if that's current behaviour, just saying the error message probably needs to be clearer and nudge the user in the right direction.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even something like "i128 is supported only in serialize_large_number_types_as_big_ints mode" would already go a long way.

Copy link
Contributor Author

@mfish33 mfish33 Dec 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad fixed the error in one place and not the other. It is updated now

tests/serde.rs Outdated
{
const MAX_SAFE_INTEGER: i64 = 9_007_199_254_740_991;

// Should be bigint
0_i64
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: instead of converting, you can probably just do assert!(expr.is_bigint()). Or does this one provide a better error message or something?

@RReverser
Copy link
Owner

This is looking good! Left last round of nitpicks, but I think we're close :)

README.md Show resolved Hide resolved
Copy link
Owner

@RReverser RReverser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added slight rewordings to error messages. Otherwise I think this is good to go now.

src/de.rs Outdated Show resolved Hide resolved
src/de.rs Outdated Show resolved Hide resolved
src/ser.rs Outdated Show resolved Hide resolved
src/ser.rs Outdated Show resolved Hide resolved
mfish33 and others added 6 commits December 23, 2021 18:13
Co-authored-by: Ingvar Stepanyan <me@rreverser.com>
Co-authored-by: Ingvar Stepanyan <me@rreverser.com>
Co-authored-by: Ingvar Stepanyan <me@rreverser.com>
Co-authored-by: Ingvar Stepanyan <me@rreverser.com>
tests/serde.rs Outdated
@@ -218,7 +218,7 @@ fn numbers() {
// Big ints that are too large or small should error
let u64_max_bigint = std::u64::MAX.serialize(&bigint_serializer).unwrap();
from_value::<i64>(u64_max_bigint).unwrap_err();
let negative_one = -1_i64.serialize(&bigint_serializer).unwrap();
let negative_one = -(1_i64.serialize(&bigint_serializer).unwrap());
let to_small = std::u64::MAX.serialize(&bigint_serializer).unwrap() * negative_one;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo for too_small?

tests/serde.rs Outdated
@@ -218,7 +218,7 @@ fn numbers() {
// Big ints that are too large or small should error
let u64_max_bigint = std::u64::MAX.serialize(&bigint_serializer).unwrap();
from_value::<i64>(u64_max_bigint).unwrap_err();
let negative_one = -1_i64.serialize(&bigint_serializer).unwrap();
let negative_one = -(1_i64.serialize(&bigint_serializer).unwrap());
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you didn't mean the opposite - (-1_i64).serialize(...)? I'm not sure if negation operator is overloaded for JsValue.

Copy link
Owner

@RReverser RReverser Dec 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But also, if you use those values only as temporaries and not for testing serialization, you might as well just create BigInt::from(...) directly like BigInt::from(-i128::from(std::u64::MAX)).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, if we go there, even BigInt::from(i128::MIN) since you only care about value being out of bounds of i64/u64 and that one is the easiest to get.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Changed

@RReverser RReverser changed the title Make i64 and u64 types use bigint for serialization and deserialization Support BigInt for i64 and u64 Dec 24, 2021
@RReverser RReverser merged commit 8c42024 into RReverser:master Dec 24, 2021
@RReverser
Copy link
Owner

Thanks a lot!

@RReverser
Copy link
Owner

Released in 0.4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants