Re-enable test: Facilitator View of dashboard is as expected #35775

islemaster · 2020-07-10T23:14:49Z

Enables the eyes test disabled in #35749, and fixes a performance issue that we believe caused the flakiness in the first place.

Which test are we talking about?

Here is the scenario in question:

code-dot-org/dashboard/test/ui/features/plc/pd/dashboard_view.feature

Lines 6 to 14 in 4008770

    
           Scenario: Facilitator View of dashboard is as expected 
        
             Given I am a facilitator with started and completed courses 
        
             And I am on "http://studio.code.org/pd/workshop_dashboard" 
        
             And I wait to see element with ID "endedWorkshopsTable" 
        
             And I wait to see element with ID "inProgressWorkshopsTable" 
        
             And I wait to see element with ID "notStartedWorkshopsTable" 
        
             And I open my eyes to test "facilitator dashboard" 
        
             And I see no difference for "viewing facilitator dashboard" 
        
             And I close my eyes

At first glance, there's not a whole lot going on here, except I suspect that first step I am a facilitator with started and completed courses is doing the heavy lifting. We'll come back to that.

How often does it fail?

Here's the last 30 days of Sauce Labs' history on this test, from our test environment. You can see that the test gradually gets more flaky over time, then suddenly stops failing two days ago.

The test has been passing consistently in our Drone environment. This suggested that our failures were somehow related to persistent state on the test environment, which wouldn't be a problem on Drone. Given that it recently started succeeding again, maybe we didn't need to skip it? Still, it seems like a good idea to investigate, since it got so bad that it was only passing 20% of the time.

How does it fail?

We looked at several sample failures and they are very consistent. Here's a sample cucumber log:

When you open the Sauce Labs test log, you are greeted with this message:

The last cucumber step in the SL log is I am a facilitator with started and completed courses and the last selenium command is this execute_async which successfully creates a teacher account:

The video record also shows that the browser stays up for 60 seconds at this point, without receiving a command. Conclusion: Our test client is doing something that takes longer than 60 seconds, so Sauce Labs gives up and closes the remote browser. Next time our test tries to send a command to Sauce Labs, it discovers that the test session has expired.

Another observation: Although this is an "eyes" test, we don't need to check Applitools logs because the test is consistently failing before it reaches any Applitools steps.

What is the test doing that takes so long?

To understand this, we need to dig down into our step definitions.

I am a facilitator with started and completed courses invokes several other steps in turn.

code-dot-org/dashboard/test/ui/features/step_definitions/pd.rb

Lines 73 to 82 in d671161

    
           Given(/^I am a facilitator with started and completed courses$/) do 
        
             random_name = "TestFacilitator" + SecureRandom.hex[0..9] 
        
             steps %Q{ 
        
               And I create a teacher named "#{random_name}" 
        
               And I make the teacher named "#{random_name}" a facilitator for course "CS Fundamentals" 
        
               And I create a workshop for course "CS Fundamentals" facilitated by "#{random_name}" with 5 people and start it 
        
               And I create a workshop for course "CS Fundamentals" facilitated by "#{random_name}" with 5 people and end it 
        
               And I create a workshop for course "CS Fundamentals" facilitated by "#{random_name}" with 5 people 
        
             } 
        
           end

Ben and I looked at these steps one at a time and tried to identify potentially slow code.

And I create a teacher named "#{random_name}"

code-dot-org/dashboard/test/ui/features/step_definitions/steps.rb

Lines 1285 to 1288 in d671161

    
           And(/^I create a teacher named "([^"]*)"( and go home)?$/) do |name, home| 
        
             create_user(name, age: '21+', user_type: 'teacher', email_preference_opt_in: 'yes') 
        
             navigate_to replace_hostname('http://studio.code.org') if home 
        
           end

code-dot-org/dashboard/test/ui/features/step_definitions/steps.rb

Lines 1249 to 1270 in d671161

    
           def create_user(name, url: '/users.json', code: 201, **user_opts) 
        
             navigate_to replace_hostname('http://studio.code.org/reset_session') 
        
             Retryable.retryable(on: RSpec::Expectations::ExpectationNotMetError, tries: 3) do 
        
               email, password = generate_user(name) 
        
               browser_request( 
        
                 url: url, 
        
                 method: 'POST', 
        
                 body: { 
        
                   user: { 
        
                     user_type: 'student', 
        
                     email: email, 
        
                     password: password, 
        
                     password_confirmation: password, 
        
                     name: name, 
        
                     age: '16', 
        
                     terms_of_service_version: '1' 
        
                   }.merge(user_opts) 
        
                 }, 
        
                 code: code 
        
               ) 
        
             end 
        
           end

This step does interact with the remote browser. It accounts for the navigation to /reset_session and for the successful POST request we observed in the Sauce Labs log. I think it's safe to say this step completes successfully.

And I make the teacher named "#{random_name}" a facilitator for course "CS Fundamentals"

code-dot-org/dashboard/test/ui/features/step_definitions/pd.rb

Lines 147 to 153 in d671161

    
           And(/^I make the teacher named "([^"]*)" a facilitator for course "([^"]*)"$/) do |name, course| 
        
             require_rails_env 
        
             user = User.find_by(email: @users[name][:email]) 
        
             user.permission = UserPermission::FACILITATOR 
        
             Pd::CourseFacilitator.create(facilitator_id: user.id, course: course) 
        
           end

The require_rails_env step is known to be slow, but we do use it in other places. I'd also expect these tests to be failing on Drone if that was the only issue. We'll keep it in mind.

Everything else here looks fine - we're finding a user by email, granting them a permission, and inserting a CourseFacilitator row.

Note that this step does not interact with the remote browser.

And I create a workshop for course "CS Fundamentals" facilitated by "#{random_name}" with 5 people and start it

The last three steps are actually all the same, heavily-parameterized step:

code-dot-org/dashboard/test/ui/features/step_definitions/pd.rb

Lines 470 to 523 in d671161

    
           And(/^I create a workshop for course "([^"]*)" ([a-z]+) by "([^"]*)" with (\d+) (people|facilitators)(.*)$/) do |course, role, name, number, number_type, post_create_actions| 
        
             # Organizer 
        
             organizer = 
        
               if role == 'organized' 
        
                 User.find_by(name: name) 
        
               else 
        
                 User.find_or_create_teacher( 
        
                   {name: 'Organizer', email: "organizer#{SecureRandom.hex[0..5]}@code.org"}, nil, 'workshop_organizer' 
        
                 ) 
        
               end 
        
             workshop = Retryable.retryable(on: [ActiveRecord::RecordNotUnique, ActiveRecord::RecordInvalid], tries: 5) do 
        
               FactoryGirl.create(:workshop, :funded, 
        
                 on_map: true, 
        
                 course: course, 
        
                 organizer_id: organizer.id, 
        
                 capacity: number.to_i, 
        
                 location_name: 'Buffalo', 
        
                 num_sessions: 1, 
        
                 sessions_from: Date.new(2018, 4, 1), 
        
                 enrolled_and_attending_users: number_type == 'people' ? number.to_i : 0 
        
               ) 
        
             end 
        
             # Facilitators 
        
             if number_type == 'facilitators' 
        
               number.to_i.times do 
        
                 workshop.facilitators << create_facilitator(course) 
        
               end 
        
             else 
        
               facilitator = role == 'facilitated' ? User.find_by(name: name) : create_facilitator(course) 
        
               workshop.facilitators << facilitator 
        
             end 
        
             # Attendees 
        
             if number_type == 'people' 
        
               number.to_i.times do 
        
                 create_enrollment(workshop) 
        
               end 
        
             else 
        
               if role == 'attended' 
        
                 create_enrollment(workshop, name) 
        
               end 
        
             end 
        
             if post_create_actions.include?('and end it') 
        
               workshop.update!(started_at: DateTime.new(2016, 3, 15)) 
        
               workshop.update!(ended_at: DateTime.new(2016, 3, 15)) 
        
             elsif post_create_actions.include?('and start it') 
        
               workshop.update!(started_at: DateTime.new(2016, 3, 15)) 
        
             else 
        
               workshop.update!(started_at: nil, ended_at: nil) 
        
             end 
        
           end

There's a lot going on here. First, notice that this code also does not interact with the remote browser.

Second, there are two User.find_by(name: name) calls, and one of them (line 500) is invoked by all three steps we are investigating. Our users table does not contain an index on name, so this accounts for at least three table scans during this step. On Drone that's probably fine - our database will only contain users we've created for tests. In our test environment, however, we can have millions of rows in the users table.

Today there were more than three million rows in users on test, which was sufficient to make this query take several seconds. We haven't yet uncovered whether this number was higher two days ago.

Third, this method invokes create_enrollment for each of the three steps in our parent step:

code-dot-org/dashboard/test/ui/features/step_definitions/pd.rb

Lines 436 to 459 in d671161

    
           def create_enrollment(workshop, name=nil) 
        
             first_name = name.nil? ? "First - #{SecureRandom.hex}" : name 
        
             last_name = name.nil? ? "Last - #{SecureRandom.hex}" : "Last" 
        
             user = Retryable.retryable(on: [ActiveRecord::RecordInvalid], tries: 5) do 
        
               FactoryGirl.create :teacher 
        
             end 
        
             enrollment = Pd::Enrollment.create!( 
        
               first_name: first_name, 
        
               last_name: last_name, 
        
               email: user.email, 
        
               school_info: SchoolInfo.find_or_create_by( 
        
                 { 
        
                   country: 'US', 
        
                   school_type: 'other', 
        
                   state: 'WA', 
        
                   zip: '98101', 
        
                   school_name: 'Code.org' 
        
                 } 
        
               ), 
        
               pd_workshop_id: workshop.id 
        
             ) 
        
             PEGASUS_DB[:forms].where(kind: 'PdWorkshopSurvey', source_id: enrollment.id).delete 
        
           end

create_enrollment in turn contains a School.find_or_create_by call that doesn't use any of the schools table's indexes, resulting in another table scan. There are about 100 thousand rows in schools on test today.

Conclusion

This test will fail if the require_rails_env call and the six table scans we found take more than 60 seconds all together. It seems likely that the slow queries are the main issue here, since this has been a problem on test but not Drone.

It looks like something (we're not sure what) happened two days ago to make these queries faster on test. Despite this, we're going ahead and fixing the users table scans here, to avoid similar issues in the future.

Links

Investigation notes (Code.org-internal)
Example failing run:
- Cucumber log
- Sauce Labs
Example successful run:
- Sauce Labs

Reviewer Checklist:

Tests provide adequate coverage
Code is well-commented
New features are translatable or updates will not break translations
Relevant documentation has been added or updated
User impact is well-understood and desirable
Pull Request is labeled appropriately
Follow-up work items (including potential tech debt) are tracked and linked

islemaster · 2020-07-11T00:12:48Z

dashboard/test/ui/features/plc/pd/dashboard_view.feature

+  And I wait to see element with ID "inProgressWorkshopsTable"
+  And I wait to see element with ID "notStartedWorkshopsTable"
+  And I open my eyes to test "facilitator dashboard"
+  And I see no difference for "viewing facilitator dashboard"


The entire test changes here because we are reversing an error in #35749 when the test was disabled. This actually puts the test back the way it originally was.

That was me (bad copy/paste), thanks for fixing!

islemaster · 2020-07-11T00:13:46Z

dashboard/test/ui/features/step_definitions/pd.rb

-    And I make the teacher named "#{random_name}" a workshop organizer
-    And I create a workshop for course "CS Fundamentals" organized by "#{random_name}" with 5 people and end it
-  }
-end


We noticed these two steps while looking for other code that might be affected by these table scans. It turns out, both of these are entirely unused.

islemaster · 2020-07-11T00:14:27Z

dashboard/test/ui/features/step_definitions/pd.rb

@@ -497,7 +479,7 @@ def create_facilitator(course)
      workshop.facilitators << create_facilitator(course)
    end
  else
-    facilitator = role == 'facilitated' ? User.find_by(name: name) : create_facilitator(course)
+    facilitator = role == 'facilitated' ? find_test_user_by_name(name) : create_facilitator(course)


We believe this particular slow query was the cause of this test failing.

islemaster · 2020-07-11T00:15:04Z

dashboard/test/ui/features/step_definitions/pd.rb

  user.permission = UserPermission::WORKSHOP_ORGANIZER
 end

 And(/^I make the teacher named "([^"]*)" a workshop admin$/) do |name|
  require_rails_env

-  user = User.find_by(email: @users[name][:email])
+  user = find_test_user_by_name(name)


Most of these replacements are simply a refactor, since they were already looking the test user up by email.

islemaster · 2020-07-11T00:15:58Z

dashboard/test/ui/features/step_definitions/steps.rb

@@ -1062,6 +1062,10 @@ def generate_user(name)
  return email, password
 end

+def find_test_user_by_name(name)
+  User.find_by(email: @users[name][:email])
+end


We've located this helper in steps.rb adjacent to the generate_user method which is responsible for initializing and populating the @users variable, and confirmed that it is accessible from pd.rb.

breville

Very interesting!

Would it make sense to erase and recreate the test env every so often?

islemaster · 2020-07-13T15:44:46Z

Would it make sense to erase and recreate the test env every so often?

Yes, it probably would! It might be reasonable to reset the test database on every run. Didn't we do this at some point? If that slows down our test cycle too much, I could see us instead choosing to wipe it once a week.

I think the counterargument is that not resetting the test database makes it more productionlike, giving us the chance of catching issues with migrations and performance issues caused by having large numbers of rows. I think the counter-counterargument is that test is probably not large in a productionlike way, since it grows with tests that don't clean up after themselves, and not in a way that models production traffic.

I'm not sure this particular failure makes a good case one way or another: We caught a performance problem, but it was specific to our test code.

islemaster added 3 commits July 10, 2020 16:11

Restore formerly flaky test

837af4b

More efficient test user lookup

e117280

Remove two unused cucumber steps

a82b955

islemaster commented Jul 11, 2020

View reviewed changes

islemaster marked this pull request as ready for review July 11, 2020 00:16

islemaster requested review from bencodeorg and a team July 11, 2020 00:16

breville approved these changes Jul 11, 2020

View reviewed changes

molly-moen approved these changes Jul 13, 2020

View reviewed changes

islemaster merged commit 1139081 into staging Jul 13, 2020

islemaster deleted the reenable-plc-test branch July 13, 2020 15:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Re-enable test: Facilitator View of dashboard is as expected #35775

Re-enable test: Facilitator View of dashboard is as expected #35775

islemaster commented Jul 10, 2020 •

edited

islemaster Jul 11, 2020

bencodeorg Jul 13, 2020

islemaster Jul 11, 2020

islemaster Jul 11, 2020

islemaster Jul 11, 2020

islemaster Jul 11, 2020

breville left a comment

islemaster commented Jul 13, 2020

	Scenario: Facilitator View of dashboard is as expected
	Given I am a facilitator with started and completed courses
	And I am on "http://studio.code.org/pd/workshop_dashboard"
	And I wait to see element with ID "endedWorkshopsTable"
	And I wait to see element with ID "inProgressWorkshopsTable"
	And I wait to see element with ID "notStartedWorkshopsTable"
	And I open my eyes to test "facilitator dashboard"
	And I see no difference for "viewing facilitator dashboard"
	And I close my eyes

	Given(/^I am a facilitator with started and completed courses$/) do
	random_name = "TestFacilitator" + SecureRandom.hex[0..9]
	steps %Q{
	And I create a teacher named "#{random_name}"
	And I make the teacher named "#{random_name}" a facilitator for course "CS Fundamentals"
	And I create a workshop for course "CS Fundamentals" facilitated by "#{random_name}" with 5 people and start it
	And I create a workshop for course "CS Fundamentals" facilitated by "#{random_name}" with 5 people and end it
	And I create a workshop for course "CS Fundamentals" facilitated by "#{random_name}" with 5 people
	}
	end

	And(/^I create a teacher named "([^"]*)"( and go home)?$/) do \|name, home\|
	create_user(name, age: '21+', user_type: 'teacher', email_preference_opt_in: 'yes')
	navigate_to replace_hostname('http://studio.code.org') if home
	end

	def create_user(name, url: '/users.json', code: 201, **user_opts)
	navigate_to replace_hostname('http://studio.code.org/reset_session')
	Retryable.retryable(on: RSpec::Expectations::ExpectationNotMetError, tries: 3) do
	email, password = generate_user(name)
	browser_request(
	url: url,
	method: 'POST',
	body: {
	user: {
	user_type: 'student',
	email: email,
	password: password,
	password_confirmation: password,
	name: name,
	age: '16',
	terms_of_service_version: '1'
	}.merge(user_opts)
	},
	code: code
	)
	end
	end

	And(/^I make the teacher named "([^"])" a facilitator for course "([^"])"$/) do \|name, course\|
	require_rails_env

	user = User.find_by(email: @users[name][:email])
	user.permission = UserPermission::FACILITATOR
	Pd::CourseFacilitator.create(facilitator_id: user.id, course: course)
	end

	And(/^I create a workshop for course "([^"])" ([a-z]+) by "([^"])" with (\d+) (people\|facilitators)(.*)$/) do \|course, role, name, number, number_type, post_create_actions\|
	# Organizer
	organizer =
	if role == 'organized'
	User.find_by(name: name)
	else
	User.find_or_create_teacher(
	{name: 'Organizer', email: "organizer#{SecureRandom.hex[0..5]}@code.org"}, nil, 'workshop_organizer'
	)
	end

	workshop = Retryable.retryable(on: [ActiveRecord::RecordNotUnique, ActiveRecord::RecordInvalid], tries: 5) do
	FactoryGirl.create(:workshop, :funded,
	on_map: true,
	course: course,
	organizer_id: organizer.id,
	capacity: number.to_i,
	location_name: 'Buffalo',
	num_sessions: 1,
	sessions_from: Date.new(2018, 4, 1),
	enrolled_and_attending_users: number_type == 'people' ? number.to_i : 0
	)
	end

	# Facilitators
	if number_type == 'facilitators'
	number.to_i.times do
	workshop.facilitators << create_facilitator(course)
	end
	else
	facilitator = role == 'facilitated' ? User.find_by(name: name) : create_facilitator(course)
	workshop.facilitators << facilitator
	end

	# Attendees
	if number_type == 'people'
	number.to_i.times do
	create_enrollment(workshop)
	end
	else
	if role == 'attended'
	create_enrollment(workshop, name)
	end
	end

	if post_create_actions.include?('and end it')
	workshop.update!(started_at: DateTime.new(2016, 3, 15))
	workshop.update!(ended_at: DateTime.new(2016, 3, 15))
	elsif post_create_actions.include?('and start it')
	workshop.update!(started_at: DateTime.new(2016, 3, 15))
	else
	workshop.update!(started_at: nil, ended_at: nil)
	end
	end

	def create_enrollment(workshop, name=nil)
	first_name = name.nil? ? "First - #{SecureRandom.hex}" : name
	last_name = name.nil? ? "Last - #{SecureRandom.hex}" : "Last"
	user = Retryable.retryable(on: [ActiveRecord::RecordInvalid], tries: 5) do
	FactoryGirl.create :teacher
	end
	enrollment = Pd::Enrollment.create!(
	first_name: first_name,
	last_name: last_name,
	email: user.email,
	school_info: SchoolInfo.find_or_create_by(
	{
	country: 'US',
	school_type: 'other',
	state: 'WA',
	zip: '98101',
	school_name: 'Code.org'
	}
	),
	pd_workshop_id: workshop.id
	)

	PEGASUS_DB[:forms].where(kind: 'PdWorkshopSurvey', source_id: enrollment.id).delete
	end

Re-enable test: Facilitator View of dashboard is as expected #35775

Re-enable test: Facilitator View of dashboard is as expected #35775

Conversation

islemaster commented Jul 10, 2020 • edited

Which test are we talking about?

How often does it fail?

How does it fail?

What is the test doing that takes so long?

And I create a teacher named "#{random_name}"

And I make the teacher named "#{random_name}" a facilitator for course "CS Fundamentals"

And I create a workshop for course "CS Fundamentals" facilitated by "#{random_name}" with 5 people and start it

Conclusion

Links

Reviewer Checklist:

islemaster Jul 11, 2020

Choose a reason for hiding this comment

bencodeorg Jul 13, 2020

Choose a reason for hiding this comment

islemaster Jul 11, 2020

Choose a reason for hiding this comment

islemaster Jul 11, 2020

Choose a reason for hiding this comment

islemaster Jul 11, 2020

Choose a reason for hiding this comment

islemaster Jul 11, 2020

Choose a reason for hiding this comment

breville left a comment

Choose a reason for hiding this comment

islemaster commented Jul 13, 2020

islemaster commented Jul 10, 2020 •

edited