New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sample data generation #23558
Sample data generation #23558
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great to me - will definitely be super useful!
dashboard/lib/sample_data.rb
Outdated
) | ||
end | ||
|
||
# will delete the teacher and all of the teacher's sections and students |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: comment style isn't consistent with the others:
Delete the teacher and all of the teacher's sections...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great!
I have a bunch of small style and readability suggestions, and one concern about the potential to delete the wrong users (with a suggestion).
dashboard/lib/sample_data.rb
Outdated
# Returns a seeded random number generator for consistent test data | ||
def self.prng | ||
@@prng ||= Random.new(0) | ||
@@prng |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line is unnecessary. Line 9 will evaluate to this anyway:
irb(main):002:0> defined? foo
=> nil
irb(main):003:0> foo ||= 'bar'
=> "bar"
irb(main):004:0> foo
=> "bar"
irb(main):005:0> foo = 'another value'
=> "another value"
irb(main):006:0> foo ||= 'bar'
=> "another value"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx, changed
dashboard/lib/sample_data.rb
Outdated
@@prng = nil | ||
|
||
# Returns a seeded random number generator for consistent test data | ||
def self.prng |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the p? Persistent?
Is this a common abbreviation? I'd prefer a more clear and descriptive name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to be a common abbreviation for Pseudorandom Number Generator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guess I should have googled that. How about changing the comment s/random/pseudorandom
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had chosen prng
also because Ruby docs use that variable name in their samples for Random
: https://ruby-doc.org/core-2.2.0/Random.html
But it may be unnecessarily unconfusing. Changed to rng
.
dashboard/lib/sample_data.rb
Outdated
login_type: Section::LOGIN_TYPE_PICTURE, grade: 2, age_min: 7, | ||
age_max_inclusive: 9, script_name: "coursea-2018", num_students: 30, | ||
use_imperfect_results: true} | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ruby-style:
- The curly braces aren't needed. For that matter neither are the parentheses, but they're helpful for multi-line declarations like this.
- The first arg should be on a new line and the final end parenthesis should be unindented one level. I'm surprised Rubocop didn't complain about this.
create_section(
teacher: teacher, name: "CSF 1",
login_type: Section::LOGIN_TYPE_PICTURE, grade: 2, age_min: 7,
age_max_inclusive: 9, script_name: "coursea-2018", num_students: 30,
use_imperfect_results: true
)
Minor Ruby-style nit: prefer single quote strings when not using interpolation.
I also tend to prefer splitting these long hashes into one key per line for readability. That's not necessary but it tends to make reading, comparing, and editing easier. This would be my ideal format:
create_section(
teacher: teacher,
name: 'CSF 1',
login_type: Section::LOGIN_TYPE_PICTURE,
grade: 2,
age_min: 7,
age_max_inclusive: 9,
script_name: 'coursea-2018',
num_students: 30,
use_imperfect_results: true
)
Same for the other calls below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx, all curly braces removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentations fixed. I tried one line per parameter but it explodes that section of code into multiple screenfuls. I left it with multiple parameters per line because I can see the whole sample case generation strategy on one screen.
dashboard/lib/sample_data.rb
Outdated
# Delete any existing test data | ||
user = User.find_by_email_or_hashed_email(email) | ||
unless user.nil? | ||
user.sections.each do |section| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note:
The User.sections association is designated dependent destroy.
Same with Section.followers.
So, deleting one of these test users will (soft) delete all their sections and followers. It's worth a comment here to explain that this is to also (hard) delete the test users and to hard delete the section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx, commented
dashboard/lib/sample_data.rb
Outdated
user.sections.each do |section| | ||
# Delete all students in each section | ||
section.followers.each do |follower| | ||
student_user = User.find_by_id(follower.student_user_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary lookup. We can iterate section.students directly since the follower
here is only used to find the student. That will do one sql query with a join, rather than an initial query plus one each to find all the students.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx, done
dashboard/lib/sample_data.rb
Outdated
# Create the section | ||
section = create :section, {user: options[:teacher], name: options[:name], | ||
login_type: options[:login_type], grade: options[:grade], | ||
script: script} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ruby-style: the curly braces aren't needed
Also for convenience, you could use Rails' Hash.slice to form a subset hash:
section = create :section, script: script, **options.slice(:teacher, :name, :login_type, :grade)
Note: the double-splat (**
) is needed to convert the sliced hash into keyword args so it can be combined with the previous script:
keyword argument. It wouldn't be needed otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx, done
dashboard/lib/sample_data.rb
Outdated
script: script} | ||
|
||
# Create students in section | ||
(0..options[:num_students] - 1).each do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use an inclusive range from 0 to num students - 1, especially when we're not using the iterator? Remove the -1 and either iterate inclusively from 1 to num_students, or exclusively from 0 to num_students.
[development] dashboard > num = 5
=> 5
[development] dashboard > (0..num - 1).map {|n| n} # unneessary math
=> [0, 1, 2, 3, 4]
[development] dashboard > (1..num).map {|n| n} # inclusive from 1
=> [1, 2, 3, 4, 5]
[development] dashboard > (0...num).map {|n| n} # exclusive from 0
=> [0, 1, 2, 3, 4]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx, did (1..num)
dashboard/lib/sample_data.rb
Outdated
age_min = options[:age_min] | ||
age_max_inclusive = options[:age_max_inclusive] | ||
|
||
age = prng.rand(age_max_inclusive + 1 - age_min) + age_min |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rand can take a range.
How about?
age = prng.rand(age_min..age_max_inclusive)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx, done
dashboard/lib/sample_data.rb
Outdated
|
||
age = prng.rand(age_max_inclusive + 1 - age_min) + age_min | ||
gender = random_gender | ||
student_user = create :student, {age: age, gender: gender} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ruby-style: unnecessary {}
s
Elsewhere too. I'm going to stop commenting on each one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx, done throughout
dashboard/lib/sample_data.rb
Outdated
|
||
max_level = | ||
if prng.rand(100) < 90 | ||
(level_count.to_f * (prng.rand(0.6) + 0.2)).to_i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth using a range for readability: prng.rand(0.2..0.8)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx, done
@aoby how do these changes look? |
dashboard/lib/sample_data.rb
Outdated
user.sections.each do |section| | ||
# Hard-delete all students in each section. | ||
section.students.each do |student_user| | ||
raise "Not a sample student" unless student_user.name =~ SAMPLE_STUDENT_NAME_REGEX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably worth adding some identifiable information in the error, like student id or email.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx, done
dashboard/lib/sample_data.rb
Outdated
UserGeo.where(user_id: user.id).destroy_all | ||
# Delete the existing test teacher | ||
raise "Not a sample teacher" unless user.name.eql? SAMPLE_TEACHER_NAME | ||
raise "Not a sample teacher" unless user.email.eql? SAMPLE_TEACHER_EMAIL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx done
# Choose random properties and create student | ||
age_min = options[:age_min] | ||
age_max_inclusive = options[:age_max_inclusive] | ||
current_student += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note this will reset the count for each section, reusing the student names between sections. Names don't have to be unique so that's ok but it might be clearer / easier to distinguish if we increase the number for all students in the seed (across all sections).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I did this deliberately because it seems less confusing to look at sections this way (with numbering within sections). If we prefer it "globally numbered" we can make that change later.
# and recreate. Sections and followers would be soft-deleted by | ||
# dependency when we delete the teacher; but to not leave a trail of | ||
# old test data behind, we explictly hard-delete. | ||
def self.create_teacher(email, password, name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since deleting happens in this method that could conceivably in the future be called from anywhere, how about duplicating the
raise "Should not be run outside of development" unless CDO.rack_env?(:development)
check at the top of this method too, just to be paranoid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see you've done this down on line 82. I'd pull it up here - as currently written, you could hit the hard-delete on a section without raising if that section had no students.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Added another one at top of method. I'd like to have defense in depth as the code changes in the future, perhaps inadvertently gets copied/moved into other contexts, etc.
dashboard/lib/sample_data.rb
Outdated
# Delete any existing test data | ||
user = User.find_by_email_or_hashed_email(email) | ||
unless user.nil? | ||
user.sections.each do |section| | ||
# Hard-delete all students in each section. | ||
section.students.each do |student_user| | ||
raise "Not a sample student" unless student_user.name =~ SAMPLE_STUDENT_NAME_REGEX | ||
raise "Not a sample student - #{student_user.name}" unless student_user.name =~ SAMPLE_STUDENT_NAME_REGEX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name is not unique. Why not provide something that we can identify the record with like id or hashed_email?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name is not unique, but it IS identifiable. The only way we could accidentally delete, say, prod data is if someone named a user "TestStudent[#] Codeberg" (AND other code here somehow went berzerk and found it). I think is is a sufficiently robust cross-check for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm referring to the error message, not the check. Getting an error "Not a sample student - Bobby Tables" is not that helpful if we want to find which student that is. I want the error message to have enough context to debug this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other words, I don't think this really satisfies my earlier request
dashboard/lib/sample_data.rb
Outdated
raise "Not a sample teacher" unless user.name.eql? SAMPLE_TEACHER_NAME | ||
raise "Not a sample teacher" unless user.email.eql? SAMPLE_TEACHER_EMAIL | ||
unless (user.name.eql? SAMPLE_TEACHER_NAME) && (user.email.eql? SAMPLE_TEACHER_EMAIL) | ||
raise "Not a sample teacher - #{user.name}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name is not unique. Why not provide something that we can identify the record with like id or email?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do check for email here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above. I'm referring to the error message
This PR adds the ability for developers to easily generate some representative test data useful to develop features against.
Developers can now run the command:
dashboard/bundle exec rake seed:sample_data
to create sample data.Presently, the sample data created is a single test teacher (
testteacher@code.org
); a normal-sized (30) and large (150) section of each of CSF, CSD and CSP; test students to fill the sections with a distribution of ages and genders; and student progress for all students with a reasonable distribution of completions, results, skips and no progress. This set of sample data is most useful for teacher feature scenarios. In the future, we could add additional sample data for other scenarios such as PLC.The sample data will be persistent on the developer's machine unless some action is taken separately to delete it. The sample data creation process is idempotent; if you re-run the script, it will delete any existing sample data and recreate it. So as we add to and change the sample data, developers can update to the latest simply be re-running the script.
Presently this is intended to be run only in the development environment (and will raise if run outside the development environment). If we find it useful, in the future we could consider automatically populating test environments with sample data.