Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Usernames are not random enough -- getting collisions in my specs #19

Open
d11wtq opened this Issue · 9 comments

6 participants

@d11wtq

This happens very sporadically, but in tests I'm using ffaker to generate random usernames. I was previously just using ActiveSupport::SecureRandom.hex(16), but a colleague convinced me that ffaker was the way to go, so I switched and have had much success, besides this single issue.

In specs where I needs to create two or more users, sometimes, but not always (read, mostly not), ffaker generates the same username twice, which causes my specs to fail due to duplicate key conflicts in the database. Re-running the test suite always corrects the issue. My database is correctly cleaned between examples... this issue is localized to within a single example/test.

I think a possible solution (and maybe it actually exists but I don't see it), is for ffaker to strike out already used usernames, so that the same one cannot be generated twice, then provide some method to reset ffaker's dictionary so that all struck-out entries are released for re-use. This way one can just reset ffaker in their setup/teardown logic.

@EmmanuelOga
Owner

Hello,

The reason you get duplicates is because ffaker works out of random indexes to arrays of data. Since we want the names to be random and not always generate the same data each time you start getting names, we use Kernel's rand method to get the index to the arrays, and so sometimes you might get duplicated values.

One easy 'fix' for this is to create a helper method to store names already generated. I've using sham.rb from an old version of the machinist gem to this purpose:

https://gist.github.com/1008143

require 'sham'
require 'ffaker'

Sham.define do
  email { Faker::Internet.email }
end

1.upto(10) do
  puts Sham.email
end

I'm thinking it might be useful to have Sham inside ffaker, wdyt?

@rstacruz
Collaborator

+1 for a built-in Sham inside ffaker

@d11wtq

+1 for having it built-in, I agree. I notice that Machinist 1 requires that the developer uses Sham, while Machinist 2 removes that requirement. I wonder how they get around it. I'm using DataMapper so I'm using Machinist 1, without Sham... and now I understand why Sham is needed, thanks ;) You could probably just have a Faker::Base.reset! call, or something of that nature to simplify things I guess. Might knock a few points off those performance numbers though ;)

@rstacruz
Collaborator

By the way, I think if this was to be built-into FFaker, it has to be optional for the obvious performance hit just to answer a case that's probably only meaningful 5% of the time.

Maybe this can be done with something like:

  1. Faker::Internet.ensure_uniqueness! which will wrap all of the given generator's methods with a sham shim (heh), or

  2. Implement Faker::Unique (with a const_missing hook) so you can just use Faker::Unique::Internet.email

@chulkilee

I don't think (f)faker are responsible for creating unique values.

If you need to create unique username in tests, you can do it easily with adding sequence (e.g. FactoryGirl sequence)

Generating not enough random values is quite different issue - that can be a problem, but I don't think your description is that case.

@deXterbed

@chulkilee i don't think using sequence with names makes any sense, meaningful names is the reason anyone started using ffaker in the first place. Why not just maintain a list of used names to keep up with the uniqueness constraints?

@chulkilee

I do think username with sequence is meaningful enough for testing.

It's not "meaningless" username as much as "user1", "user2". In the real world people are experiencing collision of usernames, aren't they? And websites suggest adding random unique number (not sequential) at the end in that case (e.g. google)

I don't think adding sham in ffaker is a good idea. Generally it's good to have the single responsibility for a library. For example I'm using factory_girl (which can be used in the place of sham).

@EmmanuelOga
Owner

Sadly I don't use ffaker myself very much these days and putting some time aside to work on it has not been the first thing in my mind for a while.

But I always have in the back burner the idea of removing randomness from ffaker and given each unique ffaker permutation a unique sequence number. I think this would be a great feature. The API would look something like:

def (Faker::Name).name(permutation_id = random(max_name_permutations))
blah...
end

Rewriting each generator to accept a permutation id would be a lot of effort... Once done, if you needed unique names you would be able to pass an incrementing number to each method:

Faker::NameXX.fullname(1) # Tony Kamo
Faker::NameXX.fullname(2) # Anita Lahuerfanita
Faker::NameXX.fullname(3) # Carozo Ynarizota
Faker::NameXX.fullname(1) # Tony Kamo
# etc...

A checksum to int mod max_permutation_number could be used to map names to names in funny ways

Faker::NameUS.fullname("Emmanuel Oga".hash % NameUS::FULLNAME_COUNT) # John Doe

:smile:

One MAJOR advantage is that the test suite would be a lot cleaner and less flaky!

@gmile

@EmmanuelOga would be really useful! Just stumbled upon this "buggy" behaviour as well...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.