-
Notifications
You must be signed in to change notification settings - Fork 480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sample data generation #23558
Sample data generation #23558
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,181 @@ | ||
include FactoryGirl::Syntax::Methods | ||
|
||
class SampleData | ||
TEST_ACCOUNT_PASSWORD = "00secret".freeze | ||
@@prng = nil | ||
|
||
# Returns a seeded random number generator for consistent test data | ||
def self.prng | ||
@@prng ||= Random.new(0) | ||
@@prng | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this line is unnecessary. Line 9 will evaluate to this anyway:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thx, changed |
||
end | ||
|
||
# Creates sample data | ||
def self.seed | ||
raise "Should not be run outside of development" unless CDO.rack_env?(:development) | ||
|
||
# Create a test teacher | ||
teacher = create_teacher "testteacher@code.org", "TestTeacher Codeberg" | ||
|
||
# Create normal-sized and large sections for each of CSF, CSD and CSP | ||
|
||
create_section({teacher: teacher, name: "CSF 1", | ||
login_type: Section::LOGIN_TYPE_PICTURE, grade: 2, age_min: 7, | ||
age_max_inclusive: 9, script_name: "coursea-2018", num_students: 30, | ||
use_imperfect_results: true} | ||
) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ruby-style:
create_section(
teacher: teacher, name: "CSF 1",
login_type: Section::LOGIN_TYPE_PICTURE, grade: 2, age_min: 7,
age_max_inclusive: 9, script_name: "coursea-2018", num_students: 30,
use_imperfect_results: true
) Minor Ruby-style nit: prefer single quote strings when not using interpolation. I also tend to prefer splitting these long hashes into one key per line for readability. That's not necessary but it tends to make reading, comparing, and editing easier. This would be my ideal format: create_section(
teacher: teacher,
name: 'CSF 1',
login_type: Section::LOGIN_TYPE_PICTURE,
grade: 2,
age_min: 7,
age_max_inclusive: 9,
script_name: 'coursea-2018',
num_students: 30,
use_imperfect_results: true
) Same for the other calls below There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thx, all curly braces removed. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Indentations fixed. I tried one line per parameter but it explodes that section of code into multiple screenfuls. I left it with multiple parameters per line because I can see the whole sample case generation strategy on one screen. |
||
|
||
create_section({teacher: teacher, name: "CSF 2 (Large)", | ||
login_type: Section::LOGIN_TYPE_PICTURE, grade: 2, age_min: 7, | ||
age_max_inclusive: 9, script_name: "coursea-2018", num_students: 150, | ||
use_imperfect_results: true} | ||
) | ||
|
||
create_section({teacher: teacher, name: "CSD 1", | ||
login_type: Section::LOGIN_TYPE_EMAIL, grade: 7, age_min: 13, | ||
age_max_inclusive: 14, script_name: "csd1-2018", num_students: 30, | ||
use_imperfect_results: false} | ||
) | ||
|
||
create_section ({teacher: teacher, name: "CSD 2 (Large)", | ||
login_type: Section::LOGIN_TYPE_EMAIL, grade: 7, age_min: 13, | ||
age_max_inclusive: 14, script_name: "csd1-2018", num_students: 150, | ||
use_imperfect_results: false} | ||
) | ||
|
||
create_section ({teacher: teacher, name: "CSP 1", | ||
login_type: Section::LOGIN_TYPE_EMAIL, grade: 10, age_min: 14, | ||
age_max_inclusive: 16, script_name: "csp1-2018", num_students: 30, | ||
use_imperfect_results: false} | ||
) | ||
|
||
create_section ({teacher: teacher, name: "CSP 2 (Large)", | ||
login_type: Section::LOGIN_TYPE_EMAIL, grade: 10, age_min: 14, | ||
age_max_inclusive: 16, script_name: "csp1-2018", num_students: 150, | ||
use_imperfect_results: false} | ||
) | ||
end | ||
|
||
# will delete the teacher and all of the teacher's sections and students | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: comment style isn't consistent with the others: |
||
# and recreate. | ||
def self.create_teacher(email, name) | ||
# Delete any existing test data | ||
user = User.find_by_email_or_hashed_email(email) | ||
unless user.nil? | ||
user.sections.each do |section| | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note: The User.sections association is designated dependent destroy. Same with Section.followers. So, deleting one of these test users will (soft) delete all their sections and followers. It's worth a comment here to explain that this is to also (hard) delete the test users and to hard delete the section. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thx, commented |
||
# Delete all students in each section | ||
section.followers.each do |follower| | ||
student_user = User.find_by_id(follower.student_user_id) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unnecessary lookup. We can iterate section.students directly since the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. thx, done |
||
raise "Not a test user" unless student_user.name.include? "Codeberg" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems dicey. We may override the test user's name, and what's to keep a real user from being named Codeberg? I'd prefer a more reliable way of differentiating these test users if we want to keep them apart. As a serialized property it will be stored in the existing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm mostly okay with this given the other defenses in place to only run this in development, and I'm not thrilled with the idea of adding a properties column for this purpose (it may not require a migration but it's still another test-specific support in production code). That said, I think the following would be an improvement:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we're sure this will stay only in development, then it's ok but then I agree use a constant in this file as the name (or name prefix) and make it something less like a real name and different from the FactoryBot default of "User n Codeberg". If we think this could ever run on production, then it scares me to automatically hard delete users and I do want more protection even if it means having "test-specific support" in production. We could call it I'd like to see a unit test for this too - make sure we delete the users we expect and only them. |
||
UserGeo.where(user_id: student_user.id).destroy_all | ||
student_user.really_destroy! | ||
end | ||
# Delete each section | ||
section.really_destroy! | ||
end | ||
UserGeo.where(user_id: user.id).destroy_all | ||
# Delete the existing test teacher | ||
raise "Not a test user" unless user.name.include? "Codeberg" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same as above |
||
user.really_destroy! | ||
end | ||
# Create the test teacher | ||
create :teacher, {email: email, name: name, | ||
password: TEST_ACCOUNT_PASSWORD, terms_of_service_version: 1} | ||
end | ||
|
||
# Generate a random gender choice with reasonable distributions | ||
def self.random_gender | ||
val = prng.rand(100) | ||
gender = nil | ||
case val | ||
when 0..29 | ||
gender = "m" | ||
when 30..59 | ||
gender = "f" | ||
when 60..89 | ||
gender = nil | ||
when 90..94 | ||
gender = "n" | ||
when 95..99 | ||
gender = "o" | ||
end | ||
gender | ||
end | ||
|
||
# Create a test section | ||
# Options is a hash with expected values: | ||
# :teacher - User object of teacher | ||
# :name - section name | ||
# :login_type - Section::LOGIN_TYPE_xxx value of login type for section | ||
# :grade - grade for section | ||
# :age_min - minimum age to generate for students | ||
# :age_max_inclusive - max age (inclusive) to generate for students | ||
# :script_name - name of script to assign for section | ||
# :num_students - number of students to generate for section | ||
# :use_imperfect_results - if true, generate some less than perfect | ||
# results. (CSF allows imperfect results, CSD and CSP do not.) | ||
def self.create_section(options) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are all these options required? If so it might be better to use keyword arguments. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thought about that, and yes, these are all required. |
||
script = Script.get_from_cache(options[:script_name]) | ||
level_count = script.script_levels.count | ||
|
||
# Create the section | ||
section = create :section, {user: options[:teacher], name: options[:name], | ||
login_type: options[:login_type], grade: options[:grade], | ||
script: script} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ruby-style: the curly braces aren't needed Also for convenience, you could use Rails' Hash.slice to form a subset hash: section = create :section, script: script, **options.slice(:teacher, :name, :login_type, :grade) Note: the double-splat ( There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thx, done |
||
|
||
# Create students in section | ||
(0..options[:num_students] - 1).each do | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why use an inclusive range from 0 to num students - 1, especially when we're not using the iterator? Remove the -1 and either iterate inclusively from 1 to num_students, or exclusively from 0 to num_students.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thx, did (1..num) |
||
# Choose random properties and create student | ||
age_min = options[:age_min] | ||
age_max_inclusive = options[:age_max_inclusive] | ||
|
||
age = prng.rand(age_max_inclusive + 1 - age_min) + age_min | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. rand can take a range. age = prng.rand(age_min..age_max_inclusive) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thx, done |
||
gender = random_gender | ||
student_user = create :student, {age: age, gender: gender} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ruby-style: unnecessary Elsewhere too. I'm going to stop commenting on each one There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thx, done throughout |
||
|
||
# Add student to section | ||
create :follower, {section: section, student_user: student_user} | ||
|
||
# Create random student progress. | ||
pct_skipped = prng.rand(15) | ||
pct_imperfect = options[:use_imperfect_results] ? prng.rand(40) + 25 + | ||
pct_skipped : 0 | ||
|
||
max_level = | ||
if prng.rand(100) < 90 | ||
(level_count.to_f * (prng.rand(0.6) + 0.2)).to_i | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Might be worth using a range for readability: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thx, done |
||
else | ||
# To simulate real-world data, some students have no progress | ||
0 | ||
end | ||
|
||
# Create progress for this student on each level, up to a random | ||
# max level | ||
current_level = 0 | ||
script.script_levels.each do |script_level| | ||
break if current_level == max_level | ||
|
||
# Roll the dice to decide if progress is completed, perfect, or | ||
# skipped for this level | ||
rand_val = prng.rand(100) | ||
best_result = | ||
if rand_val < pct_skipped | ||
nil | ||
elsif rand_val < pct_imperfect | ||
ActivityConstants::MINIMUM_PASS_RESULT | ||
else | ||
ActivityConstants::BEST_PASS_RESULT | ||
end | ||
|
||
# Save progress for this level if not skipping | ||
if best_result | ||
create :user_level, user: student_user, script_id: script.id, | ||
level_id: script_level.level_id, attempts: 1, | ||
best_result: best_result | ||
end | ||
|
||
current_level += 1 | ||
end | ||
end | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the p? Persistent?
Is this a common abbreviation? I'd prefer a more clear and descriptive name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to be a common abbreviation for Pseudorandom Number Generator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guess I should have googled that. How about changing the comment
s/random/pseudorandom
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had chosen
prng
also because Ruby docs use that variable name in their samples forRandom
: https://ruby-doc.org/core-2.2.0/Random.htmlBut it may be unnecessarily unconfusing. Changed to
rng
.