Unique constraint on Faker objects #251

Closed
bartdebever opened this issue Aug 29, 2019 · 1 comment
Labels: gotcha (issues that new users encounter that should be looked at to improve new user experience), wontfix

Comments

@bartdebever
Contributor

Context

More often than not, I've run into issues using Bogus in tests that succeed locally but randomly fail on the build server because of a unique constraint on the in-memory database.

One idea could be to allow the user of the library to say that a rule should only produce unique values. For example, the method signature could be

RuleFor(property, setter, isUnique=false);

Alternatives

As provided in #152, the UniqueIndex could be used to make some output data unique.
But this doesn't seem as intuitive as a more generalized solution, since some generated values might not benefit from a UniqueIndex.

Has the feature been requested before?

#152

If the feature request is approved, would you be willing to submit a PR?

Yes / No (Help can be provided if you need assistance submitting a PR)

@bchavez
Owner

bchavez commented Aug 29, 2019

Hi Bart,

My initial thought is that, while this is a great idea, uniqueness is in practice a complex and sometimes ambiguous topic.

  • What does uniqueness mean to you?
  • What should uniqueness mean to us?
  • What is uniqueness in the context of some object T?

Even if there were a way for developers to identify what properties of T make T unique, what about the composition of two uniquely tagged properties?

  • .RuleFor(T.PropA, isUnique = true)
  • .RuleFor(T.PropB, isUnique = true)

Is it sufficient that T.PropA and T.PropB are individually unique on their own? Maybe. Or, should the compound properties, T.PropA and T.PropB, taken together be considered a compound unique key in the sequence generation of T objects? Perhaps.
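To make the two readings concrete, here is a small sketch in Python (chosen to stay language-neutral; it does not use the Bogus API). The helper names and sample rows are hypothetical, made up purely for illustration. It checks the same rows under both interpretations.

```python
# Hypothetical helpers for illustration only; not part of Bogus.

def individually_unique(rows):
    """True when no value repeats within any single column."""
    columns = list(zip(*rows))
    return all(len(set(col)) == len(col) for col in columns)

def compound_unique(rows):
    """True when no (PropA, PropB) pair repeats, even if single values do."""
    return len(set(rows)) == len(rows)

# (PropA, PropB) pairs: "alice" repeats in PropA, but every pair is distinct.
rows = [("alice", 1), ("alice", 2), ("bob", 3)]

print(individually_unique(rows))  # False: "alice" appears twice in PropA
print(compound_unique(rows))      # True: all three pairs differ
```

The same data passes one definition and fails the other, which is why a single isUnique flag leaves the intent ambiguous.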

And, user-defined types:

  • What does it mean to mark some property T.Cat as unique when the type of property .Cat is some user-defined class Cat? How do we interpret .RuleFor(T.Cat, isUnique = true)?
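A short Python sketch of why this is ambiguous for user-defined types. Both classes below are hypothetical stand-ins for Cat. Whether two cats count as duplicates depends entirely on how equality is defined for the type.

```python
from dataclasses import dataclass

class CatByIdentity:
    """Default equality: two instances are equal only if they are the same object."""
    def __init__(self, name):
        self.name = name

@dataclass(frozen=True)
class CatByValue:
    """Dataclass equality: two instances are equal when their fields match."""
    name: str

a, b = CatByIdentity("Tom"), CatByIdentity("Tom")
c, d = CatByValue("Tom"), CatByValue("Tom")

print(a == b)  # False: distinct objects, so "unique" by identity
print(c == d)  # True: same field values, so a duplicate by value
print(len({a, b}), len({c, d}))  # 2 1
```

A library-level uniqueness flag would have to pick one of these meanings, or ask the user for a comparer.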

Next, let's consider the implementation details. As far as I know, there are two ways to go about implementing uniqueness:

  1. Keep track of all the objects you've generated thus far so that newly generated T can be checked against some existing list of already generated objects of T. Imagine the following pseudo-code:

    let n = number of T to generate
    for i from 0 to n-1
       t = generate(T)
       // compare the new object against every object generated so far
       for v in trackingList[from 0 to i-1]
          if t unique to v then continue
          else fail
       trackingList.Add(t)
    

    At first glance, such an implementation might not perform well. Since every new object is compared against all previously generated ones, is it roughly O(n^2)? I'm not super good at calculating Big-O, so please correct me if I'm wrong.

    [Figure: iteration plot from n=2,000 to n=120,000 in 2K increments (x-axis) vs. time taken in seconds (y-axis).]

    We could run into some big problems when we start looking at 1M generations. 1M generations is a realistic use case and is becoming somewhat common for Bogus. See "Faker.Person.FirstName seems really slow compared to Faker.Name.FirstName" #248, raised just recently.

    The moral of this implementation story is that keeping track of state is hard.

  2. The alternative, with O(n) time complexity, is the approach we take now: we put the onus on the developer to solve the uniqueness problem. This way Bogus doesn't make any assumptions about what uniqueness means to anyone. If you need uniqueness, you'll need to figure out how to make your objects unique for your particular scenario. As is the case with Apply Unique Column Constrain #152, by using f.IndexGlobal and f.IndexFaker as part of your .RuleFor generation expression you can create uniqueness without having to track objects in a lookup list. I think the best we can do is give developers some state variables (f.IndexGlobal and f.IndexFaker) that can help them derive uniqueness, defined in their own terms, for use in rule expressions.
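The two options above can be sketched side by side. This is an illustrative Python sketch under stated assumptions, not Bogus code. The function names are hypothetical, the tracking version retries on a collision instead of failing, and the `index` variable only stands in for the role the comment gives to f.IndexGlobal and f.IndexFaker.

```python
import random

def generate_with_tracking(n, rng, space):
    """Option 1: regenerate until the value differs from everything seen so far.
    Returns the values and the number of pairwise comparisons performed.
    Assumes space >= n, otherwise the loop cannot finish."""
    tracked, comparisons = [], 0
    while len(tracked) < n:
        candidate = rng.randrange(space)
        is_duplicate = False
        for seen in tracked:  # linear scan, as in the pseudo-code
            comparisons += 1
            if seen == candidate:
                is_duplicate = True
                break
        if not is_duplicate:
            tracked.append(candidate)
    return tracked, comparisons

def generate_with_index(n, rng, names):
    """Option 2: fold a running index into each value, so no lookup is needed."""
    return [f"{rng.choice(names)}_{index}" for index in range(n)]

rng = random.Random(42)
values, comparisons = generate_with_tracking(1000, rng, space=10**9)
print(len(set(values)))                # 1000
print(comparisons >= 1000 * 999 // 2)  # True: at least n(n-1)/2 comparisons

usernames = generate_with_index(1000, rng, ["alice", "bob"])
print(len(set(usernames)))             # 1000
```

Each accepted value in the tracking version is compared against every value already kept, so the comparison count is at least n(n-1)/2, which is quadratic growth. A hash-based set would likely avoid the linear scan for values that hash well, but it would still need every generated value held in memory and a definition of equality for T.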

Finally, concepts of uniqueness, object identity, and database identity come up in ORMs all the time. So, I don't think this is a problem Bogus should attempt to solve wholesale.

Last but not least, getting theoretical, if I recall correctly, all sequences derived from a pseudo-random number generator (PRNG) eventually repeat. So, we might be hitting our heads against theoretical and mathematical computing limits when it comes to deriving uniqueness from deterministic PRNGs. As one might say, a fruitless endeavor.
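The repetition is easy to see with a deliberately tiny generator. The sketch below is a toy linear congruential generator in Python with made-up parameters. It is not the PRNG that Bogus or .NET uses, and real generators have vastly longer periods.

```python
def lcg(seed, a=5, c=3, m=16):
    """Toy linear congruential generator: x -> (a*x + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def period(seed):
    """Count the steps between two occurrences of the same state."""
    seen = {}
    for step, value in enumerate(lcg(seed)):
        if value in seen:
            return step - seen[value]
        seen[value] = step

print(period(0))  # 16
```

With only 16 possible states, the sequence must cycle after at most 16 steps. A generator with finite state has the same property at a larger scale.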

So, those are my thoughts... please feel free to carry on the conversation after I close the issue.

Thanks,
Brian

🍂 🍃 Distance - Falling (ft Alys Be)
