Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

e2e/storage: delay CreateDriver, simplify APIs and test definition #72434

Merged

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Dec 29, 2018

What type of PR is this?
/kind cleanup

What this PR does / why we need it:

CreateDriver is a potentially expensive operation, depending on the
driver. Creating and tearing down a framework instance also takes
time (measured at 6 seconds on a fast machine) and produces quite a
bit of log output.

Both can be avoided for tests that skipped based on static
information (like for instance the current OS, vendor, driver and test
pattern) by making the test suite responsible for creating framework
and driver.

The lifecycle of the TestConfig instance was confusing because it was
stored inside the DriverInfo, a struct which conceptually is static,
while the TestConfig is dynamic. It is cleaner to separate the two,
even if that means that an additional pointer must be passed into some
functions. Now CreateDriver is responsible for initializing the
PerTestConfig that is to be used by the test.

To make this approach simpler to implement (= less functions which
need the pointer) and the tests easier to read, the entire setup and
test definition is now contained in a single function. This is how it
is normally done in Ginkgo. This is easier to read because one can see
at a glance where variables are set, instead of having to trace values
though two additional structs (TestResource and TestInput).

Because we are changing the API already, also other changes are made:

  • some function prototypes get simplified
  • the naming of functions is changed to match their purpose
    (tests aren't executed by the test suite, they only get defined
    for later execution)
  • unused methods get removed (TestSuite.skipUnsupportedTest is redundant)

Which issue(s) this PR fixes:
Fixes #72288

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

e2e storage tests run faster and are easier to read

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Dec 29, 2018
@k8s-ci-robot k8s-ci-robot added sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 29, 2018
@liggitt liggitt removed their request for review December 30, 2018 02:44
@msau42
Copy link
Member

msau42 commented Jan 2, 2019

/assign @verult
cc @mkimuram

@pohly
Copy link
Contributor Author

pohly commented Jan 3, 2019

/cc @mkimuram

@k8s-ci-robot
Copy link
Contributor

@pohly: GitHub didn't allow me to request PR reviews from the following users: mkimuram.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @mkimuram

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@pohly
Copy link
Contributor Author

pohly commented Jan 3, 2019

This WIP because only the provisioning testsuite has been updated. To get the code compiled, the rest of the testsuites were removed.

@pohly
Copy link
Contributor Author

pohly commented Jan 3, 2019

I'd like to get feedback on this approach and a decision on #70992 before continuing with this PR.

@pohly pohly force-pushed the storage-volume-testsuites-config branch 2 times, most recently from 1d427cf to c764180 Compare January 4, 2019 19:48
@mkimuram
Copy link
Contributor

mkimuram commented Jan 4, 2019

@pohly

I understand that 12th commit is the core part of this PR, 1st-5th commits are from #70992, and 6th-11th commit are doing other minor fixes or improvements than separating config from driver.

I think that it would be easier to be reviewed by separating 6th-11th from this PR if they aren't prerequisites for this PR.
(10th commit might be better to be merged as part of #70992. 6th,7th, 9th, and 11th commits won't have dependency to #70992 and this PR, so it could be a separate PR, which applies change to master branch. 8th commit will have a dependency to #70992, but it would be easy to resolve merge conflict, so it would be better to make it a change to master branch. In addition, 8th commit seems adding new test, so this should be a separate PR. )

As for 12th commit, I agree to separate config from driver.

Can this refactoring be easily applied to other testsuites?
For provisioning testsuite, it only needs to handle dynamic provisioning. However, for most of other tests, they needs to also handle pre-provisioned and inline volume. I wonder how common logics for creating such resources could be implemented to each test suites by this way.
It would be helpful if you could commit some of the sample implementations for other testusites to see how it looks like, first. (Like two tests from subpath and one test from volumes.)

If there are easy way to achieve it, I agree to change this way, for it looks easier to read and understand.

@davidz627
Copy link
Contributor

/cc

Copy link
Contributor

@davidz627 davidz627 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a quick first pass. Thanks for working on this!

test/e2e/framework/specs.go Outdated Show resolved Hide resolved
test/e2e/storage/csi_volumes.go Outdated Show resolved Hide resolved
test/e2e/storage/drivers/csi.go Outdated Show resolved Hide resolved
test/e2e/storage/testsuites/provisioning.go Outdated Show resolved Hide resolved
test/e2e/storage/testsuites/provisioning.go Outdated Show resolved Hide resolved
test/e2e/storage/testsuites/provisioning.go Outdated Show resolved Hide resolved
test/e2e/storage/testsuites/provisioning.go Outdated Show resolved Hide resolved
test/e2e/storage/testsuites/testdriver.go Outdated Show resolved Hide resolved
test/e2e/storage/testsuites/testdriver.go Outdated Show resolved Hide resolved
@@ -59,7 +59,7 @@ type TestVolume interface {
type PreprovisionedVolumeTestDriver interface {
TestDriver
// CreateVolume creates a pre-provisioned volume of the desired volume type.
CreateVolume(volumeType testpatterns.TestVolType) TestVolume
CreateVolume(config *TestConfig, volumeType testpatterns.TestVolType) TestVolume
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as with CreateDriver, I think it would be clearer to return a TestConfig if CreateVolume may modify the config

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you meant it should take a TestConfig and return a potentially different one?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not done yet.

@pohly pohly force-pushed the storage-volume-testsuites-config branch from c764180 to 4410815 Compare January 10, 2019 17:22
@pohly
Copy link
Contributor Author

pohly commented Feb 14, 2019

@msau42 @wongma7 @davidz627: all tests green (including gce alpha and serial), no further changes planned as far as I am concerned -> please have a look.

@pohly pohly changed the title WIP: e2e/storage: delay CreateDriver, simplify APIs and test definition e2e/storage: delay CreateDriver, simplify APIs and test definition Feb 14, 2019
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. labels Feb 14, 2019
Copy link
Contributor

@wongma7 wongma7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just had some inconsequential suggestions.
driver & driver interface changes lgtm.
testsuite changes largely lgtm, may need to do another pass. Agree with removing skipUnsupportedTest from testsuites, only snapshots implemented it

// Now do the more expensive test initialization.
config, testCleanup = driver.PrepareTest(f)
resource = createGenericVolumeTestResource(driver, config, pattern)
if resource.volSource == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, check not needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not, but it was there in the original code and I wanted to avoid changing test logic.

// Now do the more expensive test initialization.
config, testCleanup = driver.PrepareTest(f)
resource = createGenericVolumeTestResource(driver, config, pattern)
if resource.volSource == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same 🤷‍♂️. I wanted to avoid changing test logic.

claimSize := dDriver.GetClaimSize()
sc = dDriver.GetDynamicProvisionStorageClass(config, "")
if sc == nil {
framework.Skipf("Driver %q does not define Dynamic Provision StorageClass - skipping", dInfo.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these Skip messages should specify that the sc is not defined because the fstype %s is not supported. there is no other reason (currently?) for a driver to implement DynamicPVTestDriver but return nil

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this code here know why the driver didn't return a storage class? I don't think it should make assumptions based on the current implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I guess it's fine as-is. I guess it is the driver author's responsibility to make sense of why their GetDynamicProvisionStorageClass would return nil. Maybe it could return an error in the future, seems pointless now though

// TestDynamicProvisioning tests dynamic provisioning with specified StorageClassTest
func (t StorageClassTest) TestDynamicProvisioning() *v1.PersistentVolume {
client := t.Client
claim := t.Claim
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could fail w/ nice error if this is nil

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if Github is showing me the right context. What can be nil here? t.Client and/or t.Claim?

Are you worried that they might be accidentally passed as nil because they are now struct members instead of separate function parameters? I agree, this isn't caught explicitly. If this is the only requested change, then I'd prefer to add assertions in a separate PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant t.Claim, though client could be nil as well. either way, the nil error will be caught a few lines below, so yeah these assertions can be added in a separate PR as "nice-to-haves" 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I was rebasing anyway, I added the asserts.

Copy link
Contributor

@jarrpa jarrpa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't speak to the full PR, but the parts I'm familiar with LGTM.

Copy link
Contributor

@davidz627 davidz627 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly LGTM. Just a couple small comments. Thanks for doing this!

defer destroyCSIDriver(csics, driver)
defer driver.CleanupDriver()
defer destroyCSIDriver(csics, config.GetUniqueDriverName())
defer testCleanup()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we want to cleanup the test whether the driver exists or not

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This was introduced in #74079 and I noticed it too when rebasing on top of that, then notified @jarrpa. There's now a fix pending in #74079... which has been merged. I'll need to rebase once more.

By("Deleting pod")
err := framework.DeletePodWithWait(f, cs, pod)
Expect(err).ToNot(HaveOccurred(), "while deleting pod")
pod = nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is suspicious to me. I want to avoid a pattern where we are reusing structs and just clearing field. Is there a way we can start from scratch instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test does start from scratch, with a fresh copy of all local variables. Setting the pointers to nil is meant to ensure that cleanup can be called more than once, should the need ever arise. Currently this isn't done, so I could remove the pod = nil and leave the pointers to already deleted items, but personally would prefer to keep this extra precaution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I need to be more precise: each test starts with a fresh copy of the the local variables if init initializes all of them. I see how that might be something that can be missed for a complex test suite where some variables are only set under certain conditions.

This can be avoided by storing all these local variables in a struct and re-initializing the struct before setting individual members. I've done that in commit ec3655a.

I'm undecided whether it makes the code easier to read ("shared" variables clearly marked with l. prefix) or harder to read (extra prefix). @davidz627 Please let me know whether we should do it this way or by carefully setting all variables in init.

This increases type safety and makes the code easier to read because
it becomes obvious that the "test resource" passed to some functions
must be the result of a previous CreateVolume.

This makes it possible to remove:
- functions that never did anything (the DeleteVolume methods in
  drivers that never create a volume)
- type casts (in the DeleteVolume implementation)
- the unused DeleteVolume parameters
- the stand-alone DeleteVolume function (which would be just a non-nil
  check)

GetPersistentVolumeSource and GetVolumeSource could also become
methods on more specific interfaces - they don't actually use anything
from TestDriver instance which provides them.

The main motivation however is to reduce the number of methods which
might need an explicit test config parameter.
CreateDriver (now called SetupTest) is a potentially expensive
operation, depending on the driver. Creating and tearing down a
framework instance also takes time (measured at 6 seconds on a fast
machine) and produces quite a bit of log output.

Both can be avoided for tests that skip based on static
information (like for instance the current OS, vendor, driver and test
pattern) by making the test suite responsible for creating framework
and driver.

The lifecycle of the TestConfig instance was confusing because it was
stored inside the DriverInfo, a struct which conceptually is static,
while the TestConfig is dynamic. It is cleaner to separate the two,
even if that means that an additional pointer must be passed into some
functions. Now CreateDriver is responsible for initializing the
PerTestConfig that is to be used by the test.

To make this approach simpler to implement (= less functions which
need the pointer) and the tests easier to read, the entire setup and
test definition is now contained in a single function. This is how it
is normally done in Ginkgo. This is easier to read because one can see
at a glance where variables are set, instead of having to trace values
though two additional structs (TestResource and TestInput).

Because we are changing the API already, also other changes are made:
- some function prototypes get simplified
- the naming of functions is changed to match their purpose
  (tests aren't executed by the test suite, they only get defined
  for later execution)
- unused methods get removed (TestSuite.skipUnsupportedTest is redundant)
The recommended approach for not running unsuitable tests is to skip
them at runtime with an explanation. Filtering out unsuitable test
patters and thus not even defining unsuitable tests was done earlier
because it was faster than skipping tests at runtime.

But now these tests can be skipped efficiently, so this special case
can be removed.
Generated via hack/update-bazel.sh.
There is a risk that the init function does not reset one of the local
variables that was set by a previous test. To avoid this, all
variables set by init are now in a struct which gets cleaned
completely first.
@pohly pohly force-pushed the storage-volume-testsuites-config branch from 1692255 to ec3655a Compare February 15, 2019 10:06
@pohly
Copy link
Contributor Author

pohly commented Feb 15, 2019

/test pull-kubernetes-e2e-gce-alpha-features
/test pull-kubernetes-e2e-gce-csi-serial

@pohly
Copy link
Contributor Author

pohly commented Feb 15, 2019

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 15, 2019
@childsb childsb added this to the v1.13 milestone Feb 15, 2019
@msau42
Copy link
Member

msau42 commented Feb 19, 2019

@childsb this should be 1.14 milestone right?

/approve

will let @davidz627 and/or @wongma7 lgtm

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msau42, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 19, 2019
@davidz627
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 20, 2019
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

1 similar comment
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage e2e: TestDriver should be unique per driver instance