Managing randomness with faker is impossible #389

Closed
scheparev-moberries opened this Issue Jun 29, 2017 · 19 comments

scheparev-moberries commented Jun 29, 2017

When fields are defined through faker providers, randomness cannot be controlled, because the order in which the fields are evaluated is not fixed (perhaps a dict is iterated internally?).
Example:
Seeding the random generator before instantiating

class CompanyFactory(factory.DjangoModelFactory):
    class Meta:
        model = Company

    name = faker.company
    status = 'APR'

with

import factory
faker = factory.faker.Faker._get_faker(locale='de_DE')
faker.random.seed(0)

returns the same result every time. But adding any other fuzzy field, for instance:

faker.phone_number

causes both fields to be filled unpredictably, perhaps because the fields are evaluated in dict iteration order.
It would be nice to have at least a workaround.

@rbarrois

Member

rbarrois commented Jun 29, 2017

Hi,

Maybe this is simply due to the shortened example you've pasted, but I think that we should do:

class CompanyFactory(factory.django.DjangoModelFactory):
    name = factory.Faker('company')

Here, factory.Faker will handle the connection with faker; the code you're using would use the same name for every company.
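The difference between a value computed once and a value computed per object can be sketched in plain Python. This is only an analogy: `fake_company`, `EagerCompany` and `LazyCompany` are hypothetical names, not part of factory_boy or faker, and the sketch assumes the provider is called at class-definition time.

```python
import itertools

# Hypothetical stand-in for a faker provider: returns a new name on each call.
_names = itertools.cycle(["Acme GmbH", "Beta AG", "Gamma KG"])


def fake_company():
    return next(_names)


class EagerCompany:
    # Called once, when the class body is executed; every instance shares it.
    name = fake_company()


class LazyCompany:
    def __init__(self):
        # Called again for every instance, as a lazy declaration would be.
        self.name = fake_company()


print(EagerCompany().name, EagerCompany().name)  # same name twice
print(LazyCompany().name, LazyCompany().name)    # two different names
```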

@scheparev-moberries

scheparev-moberries commented Jun 29, 2017

It was working sporadically without a lazy attribute, so the actual code is

name = factory.LazyFunction(faker.company)

I'll check your notation

@scheparev-moberries

scheparev-moberries commented Jun 29, 2017

Still got random results with

class CompanyFactory(factory.DjangoModelFactory):
    class Meta:
        model = Company

    name = factory.Faker('company', locale='de')
    phone = factory.Faker('phone_number')
@scheparev-moberries

scheparev-moberries commented Jun 29, 2017

UPD: it only happens in my unit tests. When I instantiate the factory from the Django console, seeding in the same statement, the result is stable. I have no idea what happens in between, because in both cases I instantiate the factory right after seeding.

@rbarrois

Member

rbarrois commented Jun 29, 2017

Thanks!

So this is indeed a bug; which version do you use?

@rbarrois rbarrois added the Bug label Jun 29, 2017

@scheparev-moberries

scheparev-moberries commented Jun 29, 2017

factory_boy==2.8.1

@rbarrois

Member

rbarrois commented Jul 30, 2017

I've just released version 2.9.0, can you check whether the bug is still there?

There have been many fixes to the faker-related code, and improvements to the core of factory_boy in that release.

Thanks!

@scheparev-moberries

scheparev-moberries commented Aug 30, 2017

Unfortunately, the problem persists with version 2.9.0.

@rbarrois

Member

rbarrois commented Aug 30, 2017

Have you tried with version 2.9.2?

Can you paste a stripped-down unit test where you have the issue?

@scheparev-moberries

scheparev-moberries commented Aug 30, 2017

Model:

class Company(models.Model):
    name = models.CharField(blank=False, max_length=255)
    phone = models.CharField(default='', blank=True, max_length=40)
    about = models.TextField(default='', blank=True)
    street = models.CharField(default='', blank=True, max_length=255)
    postcode = models.CharField(default='', blank=True, max_length=14)
    website = models.URLField(default='', blank=True, max_length=255)

    created_at = CreationDateTimeField()

Factory and test:

import factory

faker = factory.faker.Faker._get_faker()


class TestFactory(factory.DjangoModelFactory):
    class Meta:
        model = Company

    name = factory.LazyFunction(faker.company)
    phone = factory.LazyFunction(faker.phone_number)
    about = factory.LazyFunction(faker.bs)
    street = factory.LazyFunction(faker.street_name)
    postcode = factory.LazyFunction(faker.zipcode)
    website = factory.LazyFunction(faker.url)
    created_at = factory.LazyFunction(faker.date_time_this_year)


class TestRandomCrash(TestCase):
    def test_random(self):
        faker = factory.faker.Faker._get_faker(locale='de_DE')
        faker.random.seed(0)
        print(TestFactory().name)

I've updated to 2.9.2; the behavior persists. Run the test multiple times: the company names remain random.

UPD: fixed faker_en not defined

@rbarrois

Member

rbarrois commented Aug 30, 2017

As stated above, this is not the supported way of interacting with faker from factory_boy.

Please try again with:

class TestFactory(factory.DjangoModelFactory):
    class Meta:
        model = Company

    name = factory.Faker('company')
    phone = factory.Faker('phone_number')
    about = factory.Faker('bs')
    street = factory.Faker('street_name')
    postcode = factory.Faker('zipcode')
    website = factory.Faker('url')
    created_at = factory.Faker('date_time_this_year')
@scheparev-moberries

scheparev-moberries commented Aug 30, 2017

Tried it, still the same

@joecridge

joecridge commented Dec 2, 2017

Seeing this also in 2.9.2: any factory using factory.Faker() for more than one attribute is non-deterministic when called in a test, but works fine in the Django shell.


Edit:

It appears that the order in which the attributes are evaluated is not deterministic. For example, if I define a factory with a first_name and a last_name and call build() on it 10 times, then 50% of the time I get one list of (first_name, last_name) pairs (first_name evaluated first for every object), and 50% of the time I get another (last_name evaluated first for every object).

It seems strange that the evaluation order is the same within a test run, but varies between test runs. This isn’t always the case with more complicated factories (e.g. with lazy attributes), but even then there seem to be ‘streaks’ where the evaluation order is the same for a set of objects before it switches to something completely different.
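Why evaluation order matters even with a fixed seed can be shown with the standard library alone. The sketch below is not factory_boy code: `build` is a hypothetical helper that draws one value per field from a freshly seeded `random.Random`, in whatever order the fields are given.

```python
import random


def build(field_order, seed=0):
    """Draw one value per field, in the given order, from a freshly seeded generator."""
    rng = random.Random(seed)
    return {field: rng.random() for field in field_order}


a = build(["first_name", "last_name"])
b = build(["last_name", "first_name"])
c = build(["first_name", "last_name"])

print(a == c)                             # True: same seed, same order
print(a["first_name"] == b["last_name"])  # True: the first draw went to a different field
print(a == b)                             # False: same seed, different order
```

The seed fixes the sequence of draws, not which field receives which draw, so any change in evaluation order changes the resulting object.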

@joecridge

joecridge commented Dec 2, 2017

I’ve followed the issue back to utils.sort_ordered_objects:

def sort_ordered_objects(items, getter=lambda x: x):
    return sorted(items, key=lambda x: getattr(getter(x), OrderedBase.CREATION_COUNTER_FIELD, -1))

What happens here is that all the factory.Faker() attributes have the same creation counter, so sorted (which is stable) leaves those ties in the iteration order of the items dictionary, resulting in unpredictable ‘sorted’ output.

@rbarrois I’m not sure what the fault is here: does sort_ordered_objects assume that the creation counters are unique (in which case the counter code is at fault), or is sort_ordered_objects supposed to handle repeated counter values deterministically (in which case its implementation needs to sort the corresponding keys e.g. alphabetically)?
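The hypothesis in this comment can be illustrated without factory_boy. In the sketch below, the (name, counter) pairs and both helper functions are hypothetical stand-ins, and the example only demonstrates how Python's stable `sorted` treats equal keys; it is not a confirmed diagnosis of the bug.

```python
def sort_by_counter(items):
    """Sort (name, counter) pairs by counter only; ties keep their input order."""
    return sorted(items, key=lambda pair: pair[1])


def sort_by_counter_then_name(items):
    """Sort by counter, breaking ties alphabetically by name."""
    return sorted(items, key=lambda pair: (pair[1], pair[0]))


# The same two declarations, arriving in two different iteration orders.
order_a = [("first_name", 0), ("last_name", 0)]
order_b = [("last_name", 0), ("first_name", 0)]

print(sort_by_counter(order_a) == sort_by_counter(order_b))                      # False
print(sort_by_counter_then_name(order_a) == sort_by_counter_then_name(order_b))  # True
```

With a counter-only key the output depends on the input order, whereas a secondary key makes the result independent of it.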

@chongkim

chongkim commented Dec 12, 2017

I just cloned factory_boy and ran the test using make test (or python -m unittest tests). It came back with

Ran 377 tests in 0.770s

OK (skipped=1)

But when I ran python -m unittest tests.test_using, it came back with

======================================================================
FAIL: test_same_seed_is_used_between_fuzzy_and_faker_generators (tests.test_using.RepeatableRandomSeedFakerTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/chongkim/github/factory_boy/tests/test_using.py", line 2334, in test_same_seed_is_used_between_fuzzy_and_faker_generators
    self.assertEqual(students_1[0].two, students_2[0].two)
AssertionError: 'William Brown' != 'Mark Peterson'
- William Brown
+ Mark Peterson


----------------------------------------------------------------------
Ran 138 tests in 0.127s

FAILED (failures=1, skipped=1)
@chongkim

chongkim commented Dec 12, 2017

I've fixed the problem in #438

@chongkim

chongkim commented Jan 25, 2018

I've made a PR. I noticed that some PRs are years old. What do I need to do to get this into master? I'd hate to have it sit around because of some step I neglected.

@rbarrois rbarrois closed this in 988d580 Feb 11, 2018

@rbarrois

Member

rbarrois commented Feb 11, 2018

Thanks for the analysis, and for @chongkim's analysis.

Actually, the issue was not in the use of sorted declarations, but in the improper configuration of faker's random generator :)
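One way a misconfigured generator can defeat seeding is sketched below, using only the standard library. This is an assumption for illustration, not a description of the change made in 988d580: `library_rng` is a hypothetical stand-in for a generator owned by a library, and the point is simply that a seed has no effect unless it reaches the generator that is actually used.

```python
import random

# Hypothetical stand-in for a library that keeps its own private generator.
library_rng = random.Random()


def draw_after_seeding_global(seed):
    random.seed(seed)            # seeds only the module-level generator
    return library_rng.random()  # drawn from a generator the seed never reached


def draw_after_seeding_library(seed):
    library_rng.seed(seed)       # seeds the generator that is actually used
    return library_rng.random()


print(draw_after_seeding_library(0) == draw_after_seeding_library(0))  # True
print(draw_after_seeding_global(0) == draw_after_seeding_global(0))    # False
```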

@stefanjcollier

stefanjcollier commented Feb 11, 2018

Great work guys!
