Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scoped / Thread safe verion of Randomizer.Seed = new Random(SEED); #100

Closed
oliverkane opened this issue Nov 2, 2017 · 6 comments · Fixed by #101
Closed

Scoped / Thread safe verion of Randomizer.Seed = new Random(SEED); #100

oliverkane opened this issue Nov 2, 2017 · 6 comments · Fixed by #101

Comments

@oliverkane
Copy link

Hi there. I saw that there's a static property which can be used for deterministic results.

int SEED = 12345;
Randomizer.Seed = new Random(SEED);

However, if I use this approach with multiple threads, I get inconsistent results because my threads are asyncronous. Is there a method that I'm overlooking which would allow me a locally scoped version of this property? I'd very much like to be able to have 100+ concurrent operations going on and still get the benefit of determinism.

Thanks for your time! Love the library either way 👍

@bchavez
Copy link
Owner

bchavez commented Nov 2, 2017

Hi @oliverkane ,

This is a good idea. We could probably have .Seed(int) hang off new Faker<T>() for scoped seeds similar to the way we have .StrictMode() in a static (global) and method (per faker instance).

//Global static
Randomizer.Seed = new Random(8675309);

//Per Faker instance.
var testOrders = new Faker<Order>()
    .Seed(12345)
    .RuleFor(o => o.OrderId, f => orderIds++)
    .RuleFor(o => o.Item, f => f.PickRandom(fruit))
    .RuleFor(o => o.Quantity, f => f.Random.Number(1, 10));

I think it should be possible but will take some refactoring. I'll try to take a look later today.

🚗 🚙 "Let the good times roll..."

@oliverkane
Copy link
Author

That's pretty much a dream for me, since that means that you can chain a global seed to "child" seeds by

var masterFaker = new Faker().Seed(12345);
var childFaker1 = new Faker().Seed(masterFaker.Random.Number(Int32.MaxValue);
var childFaker2 = new Faker().Seed(masterFaker.Random.Number(Int32.MaxValue);
var childFaker3 = new Faker().Seed(masterFaker.Random.Number(Int32.MaxValue);

Which would allow for pretty complex, but still deterministic results in even the most complex of situations. Should even work across app boundaries that way too.

Your response warms my heart, because I was digging around the source considering making a fork, but I'm waaay out of my depth.

@bchavez
Copy link
Owner

bchavez commented Nov 3, 2017

Hey @oliverkane ,

Just a quick update on this. I was able to get pretty far on this tonight only 7 failing tests out of 251. Not too bad... The only hold up are these static WordFunctions in Randomizer:

public static class WordFunctions
{
public static List<Func<string>> Functions = new List<Func<string>>();
static WordFunctions()
{
var commerce = new Commerce();
var company = new Company();
var address = new Address();
var finance = new Finance();
var hacker = new Hacker();
var name = new Name();
Functions.Add(() => commerce.Department());
Functions.Add(() => commerce.ProductName());
Functions.Add(() => commerce.ProductAdjective());
Functions.Add(() => commerce.ProductMaterial());
Functions.Add(() => commerce.ProductName());
Functions.Add(() => commerce.Color());
Functions.Add(() => company.CatchPhraseAdjective());
Functions.Add(() => company.CatchPhraseDescriptor());
Functions.Add(() => company.CatchPhraseNoun());
Functions.Add(() => company.BsAdjective());
Functions.Add(() => company.BsBuzz());
Functions.Add(() => company.BsNoun());
Functions.Add(() => address.StreetSuffix());
Functions.Add(() => address.County());
Functions.Add(() => address.Country());
Functions.Add(() => address.State());
Functions.Add(() => address.StreetSuffix());
Functions.Add(() => finance.AccountName());
Functions.Add(() => finance.TransactionType());
Functions.Add(() => finance.Currency().Description);
Functions.Add(() => hacker.Noun());
Functions.Add(() => hacker.Verb());
Functions.Add(() => hacker.Adjective());
Functions.Add(() => hacker.IngVerb());
Functions.Add(() => hacker.Abbreviation());
Functions.Add(() => name.JobDescriptor());
Functions.Add(() => name.JobArea());
Functions.Add(() => name.JobType());
}
}

All these WordFunctions rely on the global static Seed and since they are static, the static constructor only runs once and (in my current refactoring) captures the global Seed state which might change based on a localized/scoped seed.

I need a bit more time to think and chew on the overall architecture here. The Bogus.Extensions.* namespace uses statics too. You're right on point, this is quite a difficult change. I always suspected a bad code smell using all these statics in Bogus and working on this issue really brings them into focus.

Rest assured, I still think your idea of setting isolated seed values on Faker<T> is a good idea. I just need a bit more time to mull over these changes, sleep on them, and explore a few more passes at refactoring.

Also, if you (or anyone on the watchlist) have suggestions, I'd love to hear alternate points of views on the subject and any suggestions on design/architectural patterns would be cool.

Just going to jot down some random ideas that are floating in my head:

  • Allow passing in an optional int seed value alongside everywhere we ask for locale. This way, we can use the int seed to generate a localized Seed for Randomizer. This is working out okay so far, but my feeling is it kind of pollutes the constructors everywhere on DataSets. Not totally happy with it.
  • Use ThreadLocal<T>, pro: avoids having to refactor all the constructors for (string locale, int? seed), cons: performance can suffer and is a bit automagical having variables like this.
  • Drop the global Seed, and focus on localized seeds only. Maybe the code reduces to something more manageable and simplified (or could increase in complexity). Still, need to explore this.

It's 3AM and need to get some sleep. Good night 💤 🛌 :)

🌙 🌠 "Nothing good happens past 2am..."

@oliverkane
Copy link
Author

To respond to bullet points before I start my day:

  • I'm generally opposed to modifying constructors, and have taken more to public properties. This is a stylistic choice of my own which basically boils down to, "Do I strictly need this at instantiation? If not, prefer MyClass{ Prop = val } over new MyClass(val) In .NET land, it seems conventional, though not by any means a rule, for constructor values to live for the lifetime of the object.
  • ThreadLocal may confused other consumers of your library if they us C#'s Events, as events, while having a look of multi-threadedness, actually run in the same thread. This wouldn't provide them any isolation, and since ordering of the firing Delegates is not guarenteed to be preserved, ruin deterministic results in cases with many wired events/dynamic sets of them.
  • I like this approach the most. If one still needs a static, it's not terribly hard to construct your own one liner class to hold that value and pass it in, whichever other method is chosen.

I'll have more when I have some weekend time.

@bchavez
Copy link
Owner

bchavez commented Nov 4, 2017

Hey @oliverkane ,

Good news! I was able to arrive at a decent implementation of this. The solution, I think is two part: 1) flowing the randomizer/localized seed across the entire internal dependency graph and 2) having a .Clone() feature.

  • Re: 1) -- Some data sets have dependencies on other data sets. IE: Internet dataset requires the Name dataset to generate some internet names and such. Originally, this was a problem because the randomizers were isolated from one another because each data set had its own randomizer that drew values from the global seed. The problem grew larger at the Faker facade level because at Faker facade level, we prepare all the data sets in one super Faker class ie:Name, Address,Company, Internet etc... so they can be used inside f => f.Address rules ... so you can imagine, anytime the seed is changed, a whole graph data sets and their dependents need to be notified when the seed changes. Especially complicating the issue is a static global seed in the mix of all this.

    So, I removed the global Randomizer.Seed and just focused on isolating all the randomizers. This allowed me to get a bearing on the randomizer flow. I implemented a mini "publisher" pattern SeedNotifer<T> that is used to Flow [1], [2] localized randomizer to child dependents.

    Once all instances were isolated and captured correctly, the Bogus.Extensions.* and object models like Person were worked on.

    Lastly, I added the global static randomizer back. No more statics internally anywhere and we're still able to keep the existing behavior without major breaking changes. The current design does not pollute the constructors either. So, pretty much all the public APIs are left intact as-is. Really cool.

  • Re 2) -- The final peice of the puzzle was the need for cloning semathics. Over time, cloning Faker<T> was something that's cropped up here and there in the past but I never got around to it. You can now Faker<T>.Clone() new instances of any faker. So, now the following becomes possible:

[Fact]
public void clone_has_different_rules()
{
   var rootFaker = new Faker<Examples.Order>()
      .UseSeed(88)
      .RuleFor(o => o.OrderId, f => f.IndexVariable++)
      .RuleFor(o => o.Quantity, f => f.Random.Number(1, 3))
      .RuleFor(o => o.Item, f => f.Commerce.Product());

   var cloneFaker = rootFaker.Clone()
      .RuleFor(o => o.Quantity, f => f.Random.Number(4, 6));

   var rootOrder = rootFaker.Generate();
   var clonedOrder = cloneFaker.Generate();

   rootOrder.Quantity.Should()
      .BeGreaterOrEqualTo(1).And
      .BeLessOrEqualTo(3);

   clonedOrder.Quantity.Should()
      .BeGreaterOrEqualTo(4).And
      .BeLessOrEqualTo(6);
}

And for the grand finale, parallel determinism with Faker<T>:

[Fact]
public void parallel_determinism()
{
   var orderFaker = new Faker<Examples.Order>()
      .RuleFor(o => o.OrderId, f => f.IndexVariable++)
      .RuleFor(o => o.Quantity, f => f.Random.Number(1, 3))
      .RuleFor(o => o.Item, f => f.Commerce.Product());

   var orders = ParallelEnumerable.Range(1, 5)
      .Select(threadId =>
         orderFaker
            .Clone()
            .UseSeed(88)
            .Generate(4).ToArray()
      ).ToArray();

   foreach( var arrayOf4 in orders )
   {
      CheckSequence(arrayOf4);
   }
}
private void CheckSequence(Examples.Order[] items)
{
   items.Select(i => i.Item)
      .Should().Equal("Tuna", "Pants", "Shoes", "Soap");

   items.Select(i => i.Quantity)
      .Should().Equal(3, 1, 3, 2);
}

The relevant tests are in Issue100.cs and CloneTests.cs

This is pretty badass if you ask me. 🎸 😎 And way fun to implement too. Pretty happy about it and I think it feels just about right in terms of maintenance. I don't think I'm willing to go much more beyond this tho since this does really push Bogus right to the edge of the API "getting too complex" imho; although deprecating the "global seed" would bring it back in line a bit, but would also break a lot of code out in the wild. We'll leave that one static Randomizer.Seed seed for now.

I'll clean up the implementation tomorrow and probably merge bit later this weekend. Need some finishing touches on the API and need to break out some classes into their own files. But the meat of it looks pretty good. 🍖

5AM..! Yeow. Time flies when you haven' fun! Time for sleep! 💤 🛌

💼 👔 "Taking care of business every day... Taking care of business every way..."


EDIT
Everthing has been merged and published in Bogus v20: https://www.nuget.org/packages/Bogus/20.0.1

@joost00719
Copy link

joost00719 commented May 17, 2022

Are there plans to add the UseSeed(...) method to the generic Faker too? (Bogus.Faker, not Bogus.Faker)

EDIT:
There is a way to use a seed in the generic Bogus.Faker
https://stackoverflow.com/a/68799519/7081176

Example usage:

        public AbstractFakeAsyncRepository(int seed)
        {
            Rng = new Random(seed);
            Faker = new Faker();
            Faker.Random = new Randomizer(seed);
        }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants