Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

usize does not work well for de/serialization. #12

Closed
Carterj3 opened this issue Feb 28, 2020 · 4 comments · Fixed by #14
Closed

usize does not work well for de/serialization. #12

Carterj3 opened this issue Feb 28, 2020 · 4 comments · Fixed by #14

Comments

@Carterj3
Copy link

Carterj3 commented Feb 28, 2020

WASM is 32bit always (afaik).

So if you serialize a FastGraph on a 64bit platform the resulting bytes will not be deserializable in a WASM binary.

Is there a real need for usize or would you be amenable to a more concrete type? Or do you know of some way to tell serde that usize should read 8 bytes (u64) when it's decoding into a u32?

@easbar
Copy link
Owner

easbar commented Feb 28, 2020

I don't think its absolutely necessary to use usize, but it also seems a bit odd to be limited to 32bits 'just' because of web assembly. Then again web assembly compatibility seems useful. Maybe it'd be best to make the integer size configurable somehow? However, I think the main reason I was using usize is that many of the numbers stored in FastGraph are used as array indices, so not sure if this is possible.

Or do you know of some way to tell serde that usize should read 8 bytes (u64) when it's decoding into a u32?

I don't but it shouldn't be hard to write some code that trims all usizes to u32s (unless you are using really large graphs that should be no problem) and then serializes the FastGraph in some 32bit compatible way?

@dabreegster any ideas about this?

@dabreegster
Copy link
Contributor

dabreegster commented Feb 28, 2020

I actually hit this problem a few days ago and hacked around it like this: dabreegster@21d95f6. As long as you don't have more than 2^32 nodes or edges or values for weight, this works fine. An alternative would be using Option in lots of places to represent invalid cases. This is a bit less memory efficient, which might matter in something performance-sensitive like fast_paths. I haven't benchmarked that yet. Even with Option, there would be problems if something bigger than 2^32 was serialized, then read back in wasm32.

If nobody has use cases for more than 2**32 yet, my vote is to go for the quick change and document the limitation.

@easbar
Copy link
Owner

easbar commented Feb 28, 2020

as you don't have more than 2^32 nodes or edges or values for weight, this works fine

Ok so you just changed the special values INVALID_NODE/EDGE and MAX_WEIGHT to u32 max instead of usize max. This way there really aren't any integers larger than 2^32 in the program (unless you had very large graphs). But changing the actualy types to u32 would be a much bigger effort right? The problem is that we often need to do something like this: arr1[arr2[index]], i.e. we are using a value stored in one array as an index into another array and this requires the usize type.

An alternative would be using Option in lots of places

I understand Option can be used as a function return type for values that might be missing, but how would it be useful to store special values in an array (like INVALID_EDGE)?

@dabreegster
Copy link
Contributor

But changing the actualy types to u32 would be a much bigger effort right?

Right, so I don't think it's worth doing this.

how would it be useful to store special values in an array

There are places in the code that do something like self.fast_graph.edges_fwd[edge_id].replaced_in_edge = INVALID_EDGE or while self.data_fwd[node].inc_edge != INVALID_EDGE. Instead of using a sentinel value, Option with None representing invalid could be a little more clear. A bigger example:

if c == INVALID_NODE {
    self.fast_graph.edges_fwd[edge_id].replaced_in_edge = INVALID_EDGE;
    self.fast_graph.edges_fwd[edge_id].replaced_out_edge = INVALID_EDGE;
} else {
    self.fast_graph.edges_fwd[edge_id].replaced_in_edge = self.get_in_edge_id(c, i); 
    self.fast_graph.edges_fwd[edge_id].replaced_out_edge =
        self.get_out_edge_id(c, self.fast_graph.edges_fwd[edge_id].adj_node);
}`

could be rewritten:

`if let Some(c) = self.center_nodes_fwd[edge_id] {
    self.fast_graph.edges_fwd[edge_id].replaced_in_edge = self.get_in_edge_id(c, i); 
    self.fast_graph.edges_fwd[edge_id].replaced_out_edge =
        self.get_out_edge_id(c, self.fast_graph.edges_fwd[edge_id].adj_node);
} else {
    self.fast_graph.edges_fwd[edge_id].replaced_in_edge = INVALID_EDGE;
    self.fast_graph.edges_fwd[edge_id].replaced_out_edge = INVALID_EDGE;
}`

But I don't think it adds that much clarity, and it'll only hurt performance.

I'll clean up my change and send a PR soon, unless there are better ideas

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants