Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor contact rollups code to library, add tests #13388

Merged
merged 5 commits into from Feb 24, 2017

Conversation

jeremydstone
Copy link

This PR moves existing code out of cron jobs build_contact_rollups and sync_contact_rollups and into a new library, contact_rollups.rb. It also adds tests for the first step in the contact rollup process.

contact_rollups.rb is not new code - it is moved from the cron job code with very few changes. The major change is that I had to change it from using the Sequel database library to ActiveRecord in order to participate in the test framework and use FactoryGirl.

I have not moved the cron job code to use it yet. There is a little more work to do there but the cron code will be dropping down to just a call to the library. The cron job code is not running in production at the moment.

The test right now tests that the process works at all, picks up the right contacts, and that the analysis of ages taught (one of the most important fields) works. In future PRs I will be expanding this test to cover more of the rollup process output.

Copy link
Contributor

@aoby aoby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start!

I have a number of style comments, but the overall structure looks good.

One of the tests is failing in circle too.

expected_column_values = expected_values_hash[email]
expected_column_values.each do |key, value|
column_index = COLUMN_NAME_TO_INDEX_MAP[key]
# puts ("email: #{email} actual: #{row[column_index]} expected: #{value}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove commented-out line

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thx

expected_values.each do |expected_values_hash|
email = expected_values_hash.keys.first
email_sanitized = ActiveRecord::Base.sanitize(expected_values_hash.keys.first)
results = ActiveRecord::Base.connection.execute("select roles, ages_taught from pegasus_test.contact_rollups_daily where email=#{email_sanitized}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a big deal, but it would be easier to read if this ActiveRecord::Base.connection.execute was aliased

def execute(sql)
  ActiveRecord::Base.connection.execute(sql)
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disagree. :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ashercodeorg out of curiosity, why not? It could be better named, such as execute_sql.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could go either way. There is cost to the code reader to delve into the aliased function to see what it does. Since we don't do this routinely/consistently elsewhere, I am going to leave as is.


# Should now have expected_count records in daily rollups table, and still none in main rollups table
assert ActiveRecord::Base.connection.execute("select count(*) from pegasus_test.contact_rollups_daily").first[0] == expected_count
assert ActiveRecord::Base.connection.execute("select count(*) from pegasus_test.contact_rollups").first[0] == 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use assert_equal. Note the expected value comes first, so:

assert_equal expected_count, ActiveRecord::Base...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thx (throughout)

email = expected_values_hash.keys.first
email_sanitized = ActiveRecord::Base.sanitize(expected_values_hash.keys.first)
results = ActiveRecord::Base.connection.execute("select roles, ages_taught from pegasus_test.contact_rollups_daily where email=#{email_sanitized}")
assert results.count == 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert_equal 1, results.count

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thx

expected_column_values.each do |key, value|
column_index = COLUMN_NAME_TO_INDEX_MAP[key]
# puts ("email: #{email} actual: #{row[column_index]} expected: #{value}")
assert row[column_index] == value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert_equal

# Create teacher 3 with one section and multiple students
@teacher3 = create(:teacher, email: "rolluptestteacher3@code.org")
@teacher3_section = create(:section, user: @teacher3)
create_sections_helper @teacher3, [[{age: 6}, {age: 10}, {age: 14}, {age: 10}]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the section on line 23 for? Line 24 will create a new section inside create_sections_helper that contains all the students.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps to test a teacher with a section with no students? At least, I see that as being a useful test to have.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be a useful test, and it would make more sense to have a separate teacher with an empty section.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was in there accidentally. But it is a good test to have an empty section, I added that in deliberately now.


private

def rollups_test_helper(expected_count, expected_values)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method name is pretty generic. Since it builds the contact rollups and asserts a bunch of values, perhaps rename something like build_and_verify_contact_rollups.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, thx.

{ "rolluptestteacher2@code.org": { "roles": "Teacher", "ages_taught": "6" }},
{ "rolluptestteacher3@code.org": { "roles": "Teacher", "ages_taught": "6,10,14" }},
{ "rolluptestteacher4@code.org": { "roles": "Teacher", "ages_taught": "9,10,11,14,15" }}
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice :)

def self.insert_from_pegasus_forms
start = Time.now
log "Inserting contacts and IP geo data from pegasus.forms"
ActiveRecord::Base.connection.execute "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as in the test above, this repeated ActiveRecord::Base.connection.execute is a bit cumbersome and would be more readable aliased in a method.

# Verify expected values in contacts_rollup_daily
expected_values.each do |expected_values_hash|
email = expected_values_hash.keys.first
email_sanitized = ActiveRecord::Base.sanitize(expected_values_hash.keys.first)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it can't hurt, but I don't think you need to sanitize values that you provide in the test... unless you're worried about your own SQL injection attack ;)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can hurt, because it obfuscates what is happening in the test.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was doing this on general principle. Since you both had negative reactions to it, I have pulled it out. Once we start running static security code analysis routinely it will presumably light up on this and I may have to do something.

require 'cdo/contact_rollups'

PEGASUS_TEST_DB_NAME = "pegasus_#{Rails.env}"
COLUMN_NAME_TO_INDEX_MAP = { "roles": 0, "ages_taught": 1 }.freeze
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The quotes around roles and ages_taught are superfluous.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thx

expected_values.each do |expected_values_hash|
email = expected_values_hash.keys.first
email_sanitized = ActiveRecord::Base.sanitize(expected_values_hash.keys.first)
results = ActiveRecord::Base.connection.execute("select roles, ages_taught from pegasus_test.contact_rollups_daily where email=#{email_sanitized}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disagree. :(


private

def rollups_test_helper(expected_count, expected_values)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please provide YARD comments for helper methods.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thx

class PardotTest < ActiveSupport::TestCase
def test_empty_contacts
# Test the rollup process with an empty database
rollups_test_helper 0, []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even with the comment, I still have no idea what the expectations are. Part of this is the rollups_test_helper name, as @aoby commented on. Part of this is the method signature. Maybe

build_and_verify_contact_rollups { expected_count: 0, expected_contacts: [] }

# Create teacher 3 with one section and multiple students
@teacher3 = create(:teacher, email: "rolluptestteacher3@code.org")
@teacher3_section = create(:section, user: @teacher3)
create_sections_helper @teacher3, [[{age: 6}, {age: 10}, {age: 14}, {age: 10}]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps to test a teacher with a section with no students? At least, I see that as being a useful test to have.

rollups_test_helper 0, []
end

def test_teachers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be split into multiple tests, suggested by the fact that the test can fail for any number of reasons.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the rollup process operates on a large corpus of data, I am going to leave this as is. If I split it up, each test would be operating on a database with a small number of users (such as one). By generating a mildly interesting body of test fixtures, it demonstrates that we don't have something going on such as bad joins that will generate additional rows in the output. Things like that might not show up in the smaller data set of the individual things we are verifying later.

# Verify expected values in contacts_rollup_daily
expected_values.each do |expected_values_hash|
email = expected_values_hash.keys.first
email_sanitized = ActiveRecord::Base.sanitize(expected_values_hash.keys.first)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can hurt, because it obfuscates what is happening in the test.

assert ActiveRecord::Base.connection.execute("select count(*) from pegasus_test.contact_rollups").first[0] == 0

# Verify expected values in contacts_rollup_daily
expected_values.each do |expected_values_hash|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That this is so unreadable suggests the expected_value format is wrong. It would be less complex and more readable to have, e.g.,

expected_values = {
  "rolluptestteacher1@code.org": { "roles": "Teacher", "ages_taught": nil },
  "rolluptestteacher2@code.org": { "roles": "Teacher", "ages_taught": "6" },
  "rolluptestteacher3@code.org": { "roles": "Teacher", "ages_taught": "6,10,14" },
  "rolluptestteacher4@code.org": { "roles": "Teacher", "ages_taught": "9,10,11,14,15" }
}

and

expected_value.each do |email, expected_email_info|
  results = ActiveRecord::Base.connection.execute(
    "select roles, ages_taught from pegasus_test.contact_rollups_daily where email=#{email}"
  )
  assert_equal 1, results.count
  expected_email_info.each do |column, expected_column_value|
    assert_equal expected_column_value, results.first[COLUMN_NAME_TO_INDEX_MAP[column]]
  end
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: This (as written and as suggested) runs a DB query per email. It might make sense to optimize the test to grab all the results, then compare all the results. Depending, it might make sense to have expected_values and results directly comparable, e.g., make it so that the assertion assert_equal expected_values, results is sufficient.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea on the simplification, done, thx. On the test run time optimization - trying to batch the 5 DB queries into 1 only saves a handful of millisecond of test run time, not worth doing.

@jeremydstone
Copy link
Author

@aoby done, thx

@jeremydstone jeremydstone merged commit 133c048 into staging Feb 24, 2017
@jeremydstone jeremydstone deleted the contact_rollup_refactor branch February 24, 2017 19:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants