Generating a vec of a random length #130

sunshowers · 2019-02-11T21:02:13Z

Hi~! Thanks for writing this library -- as someone that's used QuickCheck extensively, the Proptest shrinking strategy (with what I think of as "structural shrinking") makes a ton more sense and makes it a pleasure to write :)

I'm trying to figure out exactly how collection strategies work. In particular, for a fixed size provided to the generator, what's the difference between:

vec(any::<T>(), 1..size)

and

(1..size).prop_flat_map(|len| vec(any::<T>(), len))

Would generation and shrinking still continue to work?

The context is that I need to set up a cyclic graph with multiple kinds of nodes, and wanted to generate the number of elements in each vector of nodes separately from generating the elements themselves.

Thank you 💜

The text was updated successfully, but these errors were encountered:

sunshowers · 2019-02-11T21:26:20Z

Oh, I wanted to mention that I could probably use a long chain of prop_flat_maps for this, but there's like 8 or 9 kinds of nodes in the graph and that's super unwieldy :(

Eh2406 · 2019-02-11T21:32:23Z

Non expert opinion:

Would generation and shrinking still continue to work?

Yes, but the performance will be very different. They both generate similarly, picking a number between 1 and size - 1 then filling a vec until it has that number of elements. Shrinking is very different. The vec( _ , 1..size) version knows that removing an item is a way to shrink the size. The prop_flat_map version does not. So when it tries to shrink the vec by deleting items it can't as it will no longer be len long. So it has to shrink the len directly, but the inner callback doesn't know what values it had before, so it gens a large number of vec( _ , len) until it finds a new failing case. So avoid prop_flat_map if shrinking performance is a concern.

I built the proptests for cargos resolver, witch involved making an DAG. I don't think I have enough of the context to give concrete advice, in your case. If you can share more about what are the Elements and what are the Nodes, I can try to pass on the advice I received.

sunshowers · 2019-02-11T22:09:10Z

@Eh2406 thank you for answering!

Yes, but the performance will be very different. They both generate similarly, picking a number between 1 and size - 1 then filling a vec until it has that number of elements. Shrinking is very different. The vec( _ , 1..size) version knows that removing an item is a way to shrink the size. The prop_flat_map version does not. So when it tries to shrink the vec by deleting items it can't as it will no longer be len long. So it has to shrink the len directly, but the inner callback doesn't know what values it had before, so it gens a large number of vec( _ , len) until it finds a new failing case. So avoid prop_flat_map if shrinking performance is a concern.

OK, yeah, that's bad. But on the other hand this might actually be necessary, since removing a node from a graph might produce an invalid graph unless any nodes that point to it are also removed.

I built the proptests for cargos resolver, witch involved making an DAG. I don't think I have enough of the context to give concrete advice, in your case. If you can share more about what are the Elements and what are the Nodes, I can try to pass on the advice I received.

I'm not at liberty to share the full context here, unfortunately. But here's what I can say:

Note that this is a cyclic graph, not a DAG.
There are several kinds of nodes, and three main categories:
- Some kinds of nodes (I'll call them "base nodes") do not have any pointers out of them. These are easy to generate so they aren't the issue.
- Some other kinds of nodes have pointers out to inherent nodes (only). These also aren't an issue.
- The remaining kinds of nodes have a fairly complex nest of pointers among them and out to the other kinds of nodes.

The actual structure in terms of kinds of nodes and what nodes point to what is defined upfront.

If I could generate the vector lengths separately from the actual nodes themselves, I could do something like:

Generate base nodes + lengths of other kinds of nodes.
For each non-base kind of node, for each node within the kind, assign pointers to other kinds of nodes by picking a random value between 0 and the length of the other kind of node.

What do you think? (Also, could you link to the proptests you wrote? Thank you 💙)

Eh2406 · 2019-02-11T23:51:52Z

The proptest for cargo are in resolve.rs and the generators are in support/resolver.rs. I may be the only person to try and read that code, so suggestions are very welcome. Including "I don't understand line number x, can you add a comment explaining it."

Fundamentally the generator has 2 parts, one that builds up the essential structure, and a prop_map that converts it into the correct types. The most useful trick is the proptest::Index type, it is designed to generate an index into a list before the len of the list is known. This lets the representation of the final graph be different from the essential graph. For cargo the real graph uses (Name, Ver) pares as a unique ID int a hashmap, but trying to generate that directly was very frustrating. So we generate a list of names, and an adjacency list of edges. Then two inforce the invariants we can either just ignore a row in the list that would break it, or fix that row. (For a DAG we make sure that the index of the pointe is greater than the index of the pointer. And flip if its not.)

Most of that may be besides the point as I think your suggestion will work with the proptest::Index for picking the random value.

sunshowers · 2019-02-12T00:51:25Z

Ahh, I didn't know about Index -- looks promising, thanks!

sunshowers · 2019-02-15T17:50:10Z

Yep, once I had Index available it was pretty easy to get working. I generated vectors of intermediate data structures with Index instead of integer pointers, then transformed them into the final data structures once I had all the lengths available.

Thank you! Should I leave this open as a doc request? Making Index more prominent would be a huge help in such situations.

AltSysrq · 2019-03-24T22:20:36Z

Sorry for not following up on this. I'm going to close the issue itself since the question has been addressed.

Do you have any suggestions for how Index could have been made more prominent for you? I'm all for improving that but I also don't want to clutter the documentation.

sunshowers · 2019-03-25T17:18:38Z

Possibly adding a note to https://docs.rs/proptest/0.9.1/proptest/strategy/trait.Strategy.html#method.prop_flat_map, though I do know that the documentation for it is already pretty long.

AltSysrq added the help-request This issue is asking for advice/help on using proptest label Feb 12, 2019

AltSysrq closed this as completed Mar 24, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generating a vec of a random length #130

Generating a vec of a random length #130

sunshowers commented Feb 11, 2019 •

edited

sunshowers commented Feb 11, 2019

Eh2406 commented Feb 11, 2019

sunshowers commented Feb 11, 2019 •

edited

Eh2406 commented Feb 11, 2019

sunshowers commented Feb 12, 2019

sunshowers commented Feb 15, 2019

AltSysrq commented Mar 24, 2019

sunshowers commented Mar 25, 2019

Generating a vec of a random length #130

Generating a vec of a random length #130

Comments

sunshowers commented Feb 11, 2019 • edited

sunshowers commented Feb 11, 2019

Eh2406 commented Feb 11, 2019

sunshowers commented Feb 11, 2019 • edited

Eh2406 commented Feb 11, 2019

sunshowers commented Feb 12, 2019

sunshowers commented Feb 15, 2019

AltSysrq commented Mar 24, 2019

sunshowers commented Mar 25, 2019

sunshowers commented Feb 11, 2019 •

edited

sunshowers commented Feb 11, 2019 •

edited