Fix Traffic Ops Tenancy and Activity Bugs, Fix TO API Test Framework to work with Tenancy#3163
Conversation
There was a problem hiding this comment.
This fixes a race, where tests would randomly fail when DSSes weren't in the 20 returned by default.
There was a problem hiding this comment.
This fixes a race, where certain tests would create additional parameters with the same name+file (which is permissible, as long as the values are different). This fixes it to delete all returned parameters.
There was a problem hiding this comment.
This makes DeleteTestTenants delete all tenants starting with children, instead of hard-coding which need deleting first.
|
Refer to this link for build results (access rights to CI server needed): |
There was a problem hiding this comment.
@rob05c Nearly two months ago @dangogh changed Fatalf to Errorf across all tests because Fatalf tests from cleaning up (f32c5ec). It seems that other tests would have the same panic issue if their create method failed. I want to make sure we have a consistent understanding of how tests should terminate and why.
There was a problem hiding this comment.
As an example outside this PR it seems in crconfig_test.go Dan changed Fatalf to Errorf (2018-10-30) then you used Fatalf again afterwards (2018-11-29). All cases of Errorf vs Fatalf should probably be covered in a separate PR imo.
There was a problem hiding this comment.
+1 -- if the Fatalf occurs before the test gets cleaned up, the following test may fail. Really should only use Fatalf if we can't recover from the situation, and that should be made very clear.
There was a problem hiding this comment.
Really should only use Fatalf if we can't recover from the situation
That's exactly what happens here, that's why I changed it. If an error is returned, the following line
log.Debugln("Response: ", resp.Alerts)
will panic. If an error is returned, the test can't keep going, which is exactly what Fatalf is for. Because of the panic, the test won't get cleaned up anyway, and subsequent tests will fail anyway. Using Fatal instead of Error avoids the panic stacktrace, and makes it easier to see what went wrong to fix it.
We really need to fix the API Test framework to clean up tests even if they panic or t.Fatalf. I don't think it'll be hard, I think it's as simple as changing all our tests from e.g.
func TestCacheGroups(t *testing.T) {
CreateTestTypes(t)
CreateTestCacheGroups(t)
GetTestCacheGroups(t)
CheckCacheGroupsAuthentication(t)
UpdateTestCacheGroups(t)
DeleteTestCacheGroups(t)
DeleteTestTypes(t)
}
To
func TestCacheGroups(t *testing.T) {
defer DeleteTestTypes(t)
CreateTestTypes(t)
defer DeleteTestCacheGroups(t)
CreateTestCacheGroups(t)
GetTestCacheGroups(t)
CheckCacheGroupsAuthentication(t)
UpdateTestCacheGroups(t)
}
(similar to
but thedefer Deletes need to come before their Creates).
But that should probably be its own PR.
There was a problem hiding this comment.
Actually, thinking about this some more, I think we should use Fatalf anywhere the test can't continue, irrespective of defer cleanups attempts.
There are always going to be cases where "delete" can't clean up appropriately, after a test fails at some point. Test errors are just like compile errors: you should only ever consider the first failure. Because any one failure can cause a chain of other unrelated failures, or worse, false successes. Any one test failure is a failure of the entire test suite.
The only right answer is to only consider the first failure, and cleanup is irrelevant, because all subsequent tests, pass or fail, are invalid.
This means all build pipelines (and people) should be using go test -failfast.
There was a problem hiding this comment.
Ok, the Fatal cleanup issue has been fixed in #3173
dangogh
left a comment
There was a problem hiding this comment.
also has conflicts w/ master in the tc-fixtures.json file...
traffic_ops/client/asn.go
Outdated
There was a problem hiding this comment.
+1 on moving these -- dunno why they got in asn.go in the first place..
There was a problem hiding this comment.
that's a lot of duplicate code.. can the former call the latter with n=-1 (for no limit)?
There was a problem hiding this comment.
Ok, I changed them to use a common func. IMO the abstraction isn't worth the readability cost. So, I changed it in the safest way I could think of.
A sigil (-1 or 0) would be confusing, since not passing a limit results in the default API limit of 20, not infinite (which is the bug adding a N func fixes); likewise passing the API limit would be bug-prone if the API changed; so, passing the query param itself seemed the safest and least confusing.
There was a problem hiding this comment.
+1 -- if the Fatalf occurs before the test gets cleaned up, the following test may fail. Really should only use Fatalf if we can't recover from the situation, and that should be made very clear.
There was a problem hiding this comment.
+1 -- this is much clearer, although still pretty hairy. I wonder if it would be worth creating this as an SQL function that can be tested/maintained outside of the code?
There was a problem hiding this comment.
We've already decided to eliminate the use_tenancy param and have it now always set to 1. Can we get rid of that portion?
There was a problem hiding this comment.
I think we made the decision, but the code doesn't currently enforce it, at present users can still leave tenancy disabled.
I think we need to merge #2791 before removing use_tenancy checks.
bc59bb5 to
e20885a
Compare
|
Fixed merge conflicts. |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
- Fix TO Tenancy to prevent access when parent tenants are inactive. - Add TO API Tests to verify tenancy, verify access is disabled when parents or self are inactive. - Fix TO API Test user panic on test failure. - Fix TO API Test tenants to automatically delete children before parents, rather than hard-coding. - Fix TO API Test parameters race condition, fixed to delete all parameters not only first name+file. - Fix TO API Tests to enable Tenancy on all tests (by creating Parameters, which creates the use_tenancy=true param). The TO API Tests were completely broken with tenancy, due to the above bugs. With the bugs fixed, the tests pass with tenancy enabled. Fixes apache#2732
As requested in PR review.
3961cbb to
f8ac49f
Compare
|
Refer to this link for build results (access rights to CI server needed): |
This fixes major TO Tenancy bugs, as well as fixing the TO API framework to work with Tenancy enabled, and enabling it.
Unfortunately, all the tenancy issues, and the test framework, are intimately related, and there isn't a good way to separate them; so this PR is unfortunately large.
Specifically:
Fix TO Tenancy checks to consider whether the user's tenant is active, not whether the resource's tenant is active.
Fix TO Tenancy to prevent access when parent tenants (of the
user's tenant) are inactive.
Add TO API Tests to verify tenancy, verify access is disabled when
parents or self are inactive.
Fix TO API Test user panic on test failure.
Fix TO API Test tenants to automatically delete children before
parents, rather than hard-coding.
Fix TO API Test parameters race condition, fixed to delete all
parameters not only first name+file.
Fix TO API Tests to enable Tenancy on all tests (by creating
Parameters, which creates the use_tenancy param).
The TO API Tests were completely broken with tenancy, due
to the above bugs. With the bugs fixed, the tests pass
with tenancy enabled.
Fixes #2732
What does this PR do?
Fixes #2732
Which TC components are affected by this PR?
What is the best way to verify this PR?
Run API tests. Run TO with Tenancy enabled, verify resources are accessible as expected.
Check all that apply