Ability to generate unique values #232
Comments
Unique values are a tricky thing... There are at least two issues that are unsolved, and it is currently not clear how to solve them.
That's a good point that I didn't really consider. The intended use case I had for this was generating small data sets of just a few values. For unique integers and longs, tracking the last generated value and incrementing it might work:

    private int uniqueInt = 0;

    public int nextInt() {
        uniqueInt += faker.random().nextInt(1, x);
        return uniqueInt;
    }

This would raise another issue though, where it's possible for the value to overflow, so it might not be the best solution for large amounts of data.
The solution I made does handle infinite loops by timing out and throwing an exception if the unique value check takes longer than 10 seconds. But like I said above, the solution is mostly intended for small amounts of data where the user is mindful of the random data they are generating.
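To make the overflow concern concrete, here is a minimal sketch of the incrementing approach that fails loudly instead of wrapping around. It is only an illustration: `IncrementingInts`, `maxStep` (standing in for `x`) and the `start` parameter are hypothetical names, and it uses `java.util.Random` rather than `faker.random()`.

```java
import java.util.Random;

// Hypothetical helper, not part of the library: returns strictly increasing ints.
final class IncrementingInts {
    private final Random random;
    private final int maxStep;
    private int last;

    IncrementingInts(Random random, int maxStep, int start) {
        if (maxStep < 1) {
            throw new IllegalArgumentException("maxStep must be at least 1");
        }
        this.random = random;
        this.maxStep = maxStep;
        this.last = start;
    }

    int next() {
        // random.nextInt(maxStep) is in [0, maxStep), so step is in [1, maxStep].
        int step = 1 + random.nextInt(maxStep);
        // Math.addExact throws ArithmeticException instead of silently overflowing.
        last = Math.addExact(last, step);
        return last;
    }
}
```

Because every step is at least 1, the values never repeat, and the generator stops with an exception once the `int` range is exhausted rather than producing a duplicate.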
One more question: why do all the methods share the same store of unique values?
Since Long 42 is a different type than Integer 42, it would be possible for nextLong to return 42 after nextInt returned it. It was intentional for everything to share the same uniqueValueStore. My thinking was that everything returned from a method on faker.unique should be something that wasn't returned previously. Theoretically it should be possible to track return values based on the method that was called, but that would add extra complexity which I didn't see as beneficial.
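The difference between the two scopes can be sketched as follows. This is an assumption-laden illustration, not the code from the proposal: `StoreScopes`, `recordShared` and `recordPerMethod` are hypothetical names, and each method returns `true` when the value has not been seen in that scope.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of a shared store versus per-method stores.
final class StoreScopes {
    private final Set<Object> shared = new HashSet<>();
    private final Map<String, Set<Object>> perMethod = new HashMap<>();

    // One store for everything: Integer 42 and Long 42 are distinct entries
    // because Integer.equals(Long) is always false.
    boolean recordShared(Object value) {
        return shared.add(value);
    }

    // One store per calling method: the same value may appear once per method.
    boolean recordPerMethod(String method, Object value) {
        return perMethod.computeIfAbsent(method, k -> new HashSet<>()).add(value);
    }
}
```

With the shared store, `recordShared(42)` followed by `recordShared(42L)` accepts both values, which matches the behaviour described above. The per-method variant needs an extra map lookup and a key for each method, which is the added complexity mentioned.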
I think it's an interesting idea, but it sounds quite brittle. I don't think there's any way to make this a reliable feature if you're generating large amounts of data. It's almost impossible to know how many unique values can come from a yml file, which complicates things. What's wrong with generating a large amount of data, putting it in a set, and taking the data you need after that? I have no objection to a feature like this, but I'm a bit hesitant to add a feature which "sometimes" works.
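The set-based alternative suggested here could look roughly like the sketch below. The names `DistinctSampler`, `sample`, `wanted` and `maxAttempts` are hypothetical, and throwing when too few distinct values turn up is a choice made for this example rather than anything stated in the thread.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical user-side helper: collect generated values in a set and
// stop once enough distinct ones have been seen.
final class DistinctSampler {
    static <T> Set<T> sample(Supplier<T> supplier, int wanted, int maxAttempts) {
        Set<T> seen = new LinkedHashSet<>(); // keeps generation order
        for (int i = 0; i < maxAttempts && seen.size() < wanted; i++) {
            seen.add(supplier.get()); // duplicates are dropped by the set
        }
        if (seen.size() < wanted) {
            throw new IllegalStateException(
                "Only " + seen.size() + " distinct values after " + maxAttempts + " attempts");
        }
        return seen;
    }
}
```

A caller would pass any generator as the supplier, for example a lambda wrapping a faker call, and decide for themselves how many attempts are acceptable.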
I'll go ahead and close this issue. I don't think there's a good way to guarantee uniqueness with large amounts of data without running into issues with memory or throwing an exception. It seems like it's better to let the user decide how they want uniqueness handled. Thanks for the feedback!
Currently there isn't a way to enforce that random values from faker are different. This is mainly an issue when writing tests with ids or keys that cannot be the same. The data produced by faker is usually unique enough, but there is still a small chance that tests will randomly fail if we're not careful.
My solution is to have a unique faker that keeps a store of every value that it has generated. It has a base method that takes in a supplier and ensures that the value from the supplier has not been generated before during the unique faker's lifespan. For example:
The store is kept at the unique faker level, so the uniqueness is only persisted during the lifespan of the faker object. If there are two different fakers they could potentially generate the same values.
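As a rough sketch of the mechanism described above (not the implementation from this issue, which is not reproduced here), a store with a supplier-based base method and a timeout might look like this. `UniqueStore` and `fetch` are hypothetical names, and the timeout is a constructor parameter so the behaviour is easy to try out.

```java
import java.time.Duration;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch: remembers every value handed out during its lifetime.
final class UniqueStore {
    private final Set<Object> seen = new HashSet<>();
    private final Duration timeout;

    UniqueStore(Duration timeout) {
        this.timeout = timeout;
    }

    // Calls the supplier until it yields a value not returned before,
    // or gives up once the timeout has elapsed.
    <T> T fetch(Supplier<T> supplier) {
        long deadline = System.nanoTime() + timeout.toNanos();
        do {
            T candidate = supplier.get();
            if (seen.add(candidate)) {
                return candidate;
            }
        } while (System.nanoTime() - deadline < 0);
        throw new IllegalStateException("No unique value found within " + timeout);
    }
}
```

Two separate `UniqueStore` instances share nothing, so each can return the same value once, which mirrors the per-faker lifespan described above.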
Would there be any issues with having a faker like this? Here is the full implementation that I had in mind: