Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generating a vec of a random length #130

Closed
sunshowers opened this issue Feb 11, 2019 · 8 comments
Closed

Generating a vec of a random length #130

sunshowers opened this issue Feb 11, 2019 · 8 comments
Labels
help-request This issue is asking for advice/help on using proptest

Comments

@sunshowers
Copy link
Contributor

sunshowers commented Feb 11, 2019

Hi~! Thanks for writing this library -- as someone that's used QuickCheck extensively, the Proptest shrinking strategy (with what I think of as "structural shrinking") makes a ton more sense and makes it a pleasure to write :)

I'm trying to figure out exactly how collection strategies work. In particular, for a fixed size provided to the generator, what's the difference between:

vec(any::<T>(), 1..size)

and

(1..size).prop_flat_map(|len| vec(any::<T>(), len))

Would generation and shrinking still continue to work?

The context is that I need to set up a cyclic graph with multiple kinds of nodes, and wanted to generate the number of elements in each vector of nodes separately from generating the elements themselves.

Thank you 💜

@sunshowers
Copy link
Contributor Author

Oh, I wanted to mention that I could probably use a long chain of prop_flat_maps for this, but there's like 8 or 9 kinds of nodes in the graph and that's super unwieldy :(

@Eh2406
Copy link

Eh2406 commented Feb 11, 2019

Non expert opinion:

Would generation and shrinking still continue to work?

Yes, but the performance will be very different. They both generate similarly, picking a number between 1 and size - 1 then filling a vec until it has that number of elements. Shrinking is very different. The vec( _ , 1..size) version knows that removing an item is a way to shrink the size. The prop_flat_map version does not. So when it tries to shrink the vec by deleting items it can't as it will no longer be len long. So it has to shrink the len directly, but the inner callback doesn't know what values it had before, so it gens a large number of vec( _ , len) until it finds a new failing case. So avoid prop_flat_map if shrinking performance is a concern.

I built the proptests for cargos resolver, witch involved making an DAG. I don't think I have enough of the context to give concrete advice, in your case. If you can share more about what are the Elements and what are the Nodes, I can try to pass on the advice I received.

@sunshowers
Copy link
Contributor Author

sunshowers commented Feb 11, 2019

@Eh2406 thank you for answering!

Yes, but the performance will be very different. They both generate similarly, picking a number between 1 and size - 1 then filling a vec until it has that number of elements. Shrinking is very different. The vec( _ , 1..size) version knows that removing an item is a way to shrink the size. The prop_flat_map version does not. So when it tries to shrink the vec by deleting items it can't as it will no longer be len long. So it has to shrink the len directly, but the inner callback doesn't know what values it had before, so it gens a large number of vec( _ , len) until it finds a new failing case. So avoid prop_flat_map if shrinking performance is a concern.

OK, yeah, that's bad. But on the other hand this might actually be necessary, since removing a node from a graph might produce an invalid graph unless any nodes that point to it are also removed.

I built the proptests for cargos resolver, witch involved making an DAG. I don't think I have enough of the context to give concrete advice, in your case. If you can share more about what are the Elements and what are the Nodes, I can try to pass on the advice I received.

I'm not at liberty to share the full context here, unfortunately. But here's what I can say:

  • Note that this is a cyclic graph, not a DAG.
  • There are several kinds of nodes, and three main categories:
    • Some kinds of nodes (I'll call them "base nodes") do not have any pointers out of them. These are easy to generate so they aren't the issue.
    • Some other kinds of nodes have pointers out to inherent nodes (only). These also aren't an issue.
    • The remaining kinds of nodes have a fairly complex nest of pointers among them and out to the other kinds of nodes.

The actual structure in terms of kinds of nodes and what nodes point to what is defined upfront.

If I could generate the vector lengths separately from the actual nodes themselves, I could do something like:

  • Generate base nodes + lengths of other kinds of nodes.
  • For each non-base kind of node, for each node within the kind, assign pointers to other kinds of nodes by picking a random value between 0 and the length of the other kind of node.

What do you think? (Also, could you link to the proptests you wrote? Thank you 💙)

@Eh2406
Copy link

Eh2406 commented Feb 11, 2019

The proptest for cargo are in resolve.rs and the generators are in support/resolver.rs. I may be the only person to try and read that code, so suggestions are very welcome. Including "I don't understand line number x, can you add a comment explaining it."

Fundamentally the generator has 2 parts, one that builds up the essential structure, and a prop_map that converts it into the correct types. The most useful trick is the proptest::Index type, it is designed to generate an index into a list before the len of the list is known. This lets the representation of the final graph be different from the essential graph. For cargo the real graph uses (Name, Ver) pares as a unique ID int a hashmap, but trying to generate that directly was very frustrating. So we generate a list of names, and an adjacency list of edges. Then two inforce the invariants we can either just ignore a row in the list that would break it, or fix that row. (For a DAG we make sure that the index of the pointe is greater than the index of the pointer. And flip if its not.)

Most of that may be besides the point as I think your suggestion will work with the proptest::Index for picking the random value.

@sunshowers
Copy link
Contributor Author

Ahh, I didn't know about Index -- looks promising, thanks!

@AltSysrq AltSysrq added the help-request This issue is asking for advice/help on using proptest label Feb 12, 2019
@sunshowers
Copy link
Contributor Author

Yep, once I had Index available it was pretty easy to get working. I generated vectors of intermediate data structures with Index instead of integer pointers, then transformed them into the final data structures once I had all the lengths available.

Thank you! Should I leave this open as a doc request? Making Index more prominent would be a huge help in such situations.

@AltSysrq
Copy link
Collaborator

Sorry for not following up on this. I'm going to close the issue itself since the question has been addressed.

Do you have any suggestions for how Index could have been made more prominent for you? I'm all for improving that but I also don't want to clutter the documentation.

@sunshowers
Copy link
Contributor Author

Possibly adding a note to https://docs.rs/proptest/0.9.1/proptest/strategy/trait.Strategy.html#method.prop_flat_map, though I do know that the documentation for it is already pretty long.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help-request This issue is asking for advice/help on using proptest
Projects
None yet
Development

No branches or pull requests

3 participants