e2e/storage: delay CreateDriver, simplify APIs and test definition #72434

pohly · 2018-12-29T16:17:17Z

What type of PR is this?
/kind cleanup

What this PR does / why we need it:

CreateDriver is a potentially expensive operation, depending on the
driver. Creating and tearing down a framework instance also takes
time (measured at 6 seconds on a fast machine) and produces quite a
bit of log output.

Both can be avoided for tests that skipped based on static
information (like for instance the current OS, vendor, driver and test
pattern) by making the test suite responsible for creating framework
and driver.

The lifecycle of the TestConfig instance was confusing because it was
stored inside the DriverInfo, a struct which conceptually is static,
while the TestConfig is dynamic. It is cleaner to separate the two,
even if that means that an additional pointer must be passed into some
functions. Now CreateDriver is responsible for initializing the
PerTestConfig that is to be used by the test.

To make this approach simpler to implement (= less functions which
need the pointer) and the tests easier to read, the entire setup and
test definition is now contained in a single function. This is how it
is normally done in Ginkgo. This is easier to read because one can see
at a glance where variables are set, instead of having to trace values
though two additional structs (TestResource and TestInput).

Because we are changing the API already, also other changes are made:

some function prototypes get simplified
the naming of functions is changed to match their purpose
(tests aren't executed by the test suite, they only get defined
for later execution)
unused methods get removed (TestSuite.skipUnsupportedTest is redundant)

Which issue(s) this PR fixes:
Fixes #72288

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

e2e storage tests run faster and are easier to read

msau42 · 2019-01-02T22:54:46Z

/assign @verult
cc @mkimuram

pohly · 2019-01-03T07:13:23Z

/cc @mkimuram

k8s-ci-robot · 2019-01-03T07:13:24Z

@pohly: GitHub didn't allow me to request PR reviews from the following users: mkimuram.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @mkimuram

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

pohly · 2019-01-03T07:14:34Z

This WIP because only the provisioning testsuite has been updated. To get the code compiled, the rest of the testsuites were removed.

pohly · 2019-01-03T07:15:24Z

I'd like to get feedback on this approach and a decision on #70992 before continuing with this PR.

mkimuram · 2019-01-04T20:28:20Z

@pohly

I understand that 12th commit is the core part of this PR, 1st-5th commits are from #70992, and 6th-11th commit are doing other minor fixes or improvements than separating config from driver.

I think that it would be easier to be reviewed by separating 6th-11th from this PR if they aren't prerequisites for this PR.
(10th commit might be better to be merged as part of #70992. 6th,7th, 9th, and 11th commits won't have dependency to #70992 and this PR, so it could be a separate PR, which applies change to master branch. 8th commit will have a dependency to #70992, but it would be easy to resolve merge conflict, so it would be better to make it a change to master branch. In addition, 8th commit seems adding new test, so this should be a separate PR. )

As for 12th commit, I agree to separate config from driver.

Can this refactoring be easily applied to other testsuites?
For provisioning testsuite, it only needs to handle dynamic provisioning. However, for most of other tests, they needs to also handle pre-provisioned and inline volume. I wonder how common logics for creating such resources could be implemented to each test suites by this way.
It would be helpful if you could commit some of the sample implementations for other testusites to see how it looks like, first. (Like two tests from subpath and one test from volumes.)

If there are easy way to achieve it, I agree to change this way, for it looks easier to read and understand.

davidz627 · 2019-01-04T23:15:16Z

/cc

davidz627

just a quick first pass. Thanks for working on this!

test/e2e/framework/specs.go

test/e2e/storage/csi_volumes.go

test/e2e/storage/drivers/csi.go

test/e2e/storage/testsuites/provisioning.go

test/e2e/storage/testsuites/testdriver.go

wongma7 · 2019-01-07T22:09:36Z

test/e2e/storage/testsuites/testdriver.go

@@ -59,7 +59,7 @@ type TestVolume interface {
 type PreprovisionedVolumeTestDriver interface {
 	TestDriver
 	// CreateVolume creates a pre-provisioned volume of the desired volume type.
-	CreateVolume(volumeType testpatterns.TestVolType) TestVolume
+	CreateVolume(config *TestConfig, volumeType testpatterns.TestVolType) TestVolume


same as with CreateDriver, I think it would be clearer to return a TestConfig if CreateVolume may modify the config

So you meant it should take a TestConfig and return a potentially different one?

Not done yet.

test/e2e/storage/testsuites/base.go

test/e2e/storage/testsuites/provisioning.go

pohly · 2019-02-14T13:28:26Z

@msau42 @wongma7 @davidz627: all tests green (including gce alpha and serial), no further changes planned as far as I am concerned -> please have a look.

wongma7

I just had some inconsequential suggestions.
driver & driver interface changes lgtm.
testsuite changes largely lgtm, may need to do another pass. Agree with removing skipUnsupportedTest from testsuites, only snapshots implemented it

wongma7 · 2019-02-14T19:35:35Z

test/e2e/storage/testsuites/volume_io.go

+		// Now do the more expensive test initialization.
+		config, testCleanup = driver.PrepareTest(f)
+		resource = createGenericVolumeTestResource(driver, config, pattern)
+		if resource.volSource == nil {


nit, check not needed

Probably not, but it was there in the original code and I wanted to avoid changing test logic.

wongma7 · 2019-02-14T19:36:05Z

test/e2e/storage/testsuites/volumes.go

+		// Now do the more expensive test initialization.
+		config, testCleanup = driver.PrepareTest(f)
+		resource = createGenericVolumeTestResource(driver, config, pattern)
+		if resource.volSource == nil {


Same 🤷‍♂️. I wanted to avoid changing test logic.

wongma7 · 2019-02-14T19:51:09Z

test/e2e/storage/testsuites/provisioning.go

+		claimSize := dDriver.GetClaimSize()
+		sc = dDriver.GetDynamicProvisionStorageClass(config, "")
+		if sc == nil {
+			framework.Skipf("Driver %q does not define Dynamic Provision StorageClass - skipping", dInfo.Name)


these Skip messages should specify that the sc is not defined because the fstype %s is not supported. there is no other reason (currently?) for a driver to implement DynamicPVTestDriver but return nil

Can this code here know why the driver didn't return a storage class? I don't think it should make assumptions based on the current implementation.

yeah I guess it's fine as-is. I guess it is the driver author's responsibility to make sense of why their GetDynamicProvisionStorageClass would return nil. Maybe it could return an error in the future, seems pointless now though

wongma7 · 2019-02-14T19:56:32Z

test/e2e/storage/testsuites/provisioning.go

+// TestDynamicProvisioning tests dynamic provisioning with specified StorageClassTest
+func (t StorageClassTest) TestDynamicProvisioning() *v1.PersistentVolume {
+	client := t.Client
+	claim := t.Claim


could fail w/ nice error if this is nil

Not sure if Github is showing me the right context. What can be nil here? t.Client and/or t.Claim?

Are you worried that they might be accidentally passed as nil because they are now struct members instead of separate function parameters? I agree, this isn't caught explicitly. If this is the only requested change, then I'd prefer to add assertions in a separate PR.

I meant t.Claim, though client could be nil as well. either way, the nil error will be caught a few lines below, so yeah these assertions can be added in a separate PR as "nice-to-haves" 👍

As I was rebasing anyway, I added the asserts.

jarrpa

I can't speak to the full PR, but the parts I'm familiar with LGTM.

davidz627

mostly LGTM. Just a couple small comments. Thanks for doing this!

davidz627 · 2019-02-14T23:02:42Z

test/e2e/storage/csi_volumes.go

-					defer destroyCSIDriver(csics, driver)
-					defer driver.CleanupDriver()
+					defer destroyCSIDriver(csics, config.GetUniqueDriverName())
+					defer testCleanup()


shouldn't we want to cleanup the test whether the driver exists or not

Yes. This was introduced in #74079 and I noticed it too when rebasing on top of that, then notified @jarrpa. There's now a fix pending in #74079... which has been merged. I'll need to rebase once more.

davidz627 · 2019-02-14T23:29:55Z

test/e2e/storage/testsuites/subpath.go

+			By("Deleting pod")
+			err := framework.DeletePodWithWait(f, cs, pod)
+			Expect(err).ToNot(HaveOccurred(), "while deleting pod")
+			pod = nil


this is suspicious to me. I want to avoid a pattern where we are reusing structs and just clearing field. Is there a way we can start from scratch instead?

The test does start from scratch, with a fresh copy of all local variables. Setting the pointers to nil is meant to ensure that cleanup can be called more than once, should the need ever arise. Currently this isn't done, so I could remove the pod = nil and leave the pointers to already deleted items, but personally would prefer to keep this extra precaution.

Hmm, I need to be more precise: each test starts with a fresh copy of the the local variables if init initializes all of them. I see how that might be something that can be missed for a complex test suite where some variables are only set under certain conditions.

This can be avoided by storing all these local variables in a struct and re-initializing the struct before setting individual members. I've done that in commit ec3655a.

I'm undecided whether it makes the code easier to read ("shared" variables clearly marked with l. prefix) or harder to read (extra prefix). @davidz627 Please let me know whether we should do it this way or by carefully setting all variables in init.

This increases type safety and makes the code easier to read because it becomes obvious that the "test resource" passed to some functions must be the result of a previous CreateVolume. This makes it possible to remove: - functions that never did anything (the DeleteVolume methods in drivers that never create a volume) - type casts (in the DeleteVolume implementation) - the unused DeleteVolume parameters - the stand-alone DeleteVolume function (which would be just a non-nil check) GetPersistentVolumeSource and GetVolumeSource could also become methods on more specific interfaces - they don't actually use anything from TestDriver instance which provides them. The main motivation however is to reduce the number of methods which might need an explicit test config parameter.

CreateDriver (now called SetupTest) is a potentially expensive operation, depending on the driver. Creating and tearing down a framework instance also takes time (measured at 6 seconds on a fast machine) and produces quite a bit of log output. Both can be avoided for tests that skip based on static information (like for instance the current OS, vendor, driver and test pattern) by making the test suite responsible for creating framework and driver. The lifecycle of the TestConfig instance was confusing because it was stored inside the DriverInfo, a struct which conceptually is static, while the TestConfig is dynamic. It is cleaner to separate the two, even if that means that an additional pointer must be passed into some functions. Now CreateDriver is responsible for initializing the PerTestConfig that is to be used by the test. To make this approach simpler to implement (= less functions which need the pointer) and the tests easier to read, the entire setup and test definition is now contained in a single function. This is how it is normally done in Ginkgo. This is easier to read because one can see at a glance where variables are set, instead of having to trace values though two additional structs (TestResource and TestInput). Because we are changing the API already, also other changes are made: - some function prototypes get simplified - the naming of functions is changed to match their purpose (tests aren't executed by the test suite, they only get defined for later execution) - unused methods get removed (TestSuite.skipUnsupportedTest is redundant)

The recommended approach for not running unsuitable tests is to skip them at runtime with an explanation. Filtering out unsuitable test patters and thus not even defining unsuitable tests was done earlier because it was faster than skipping tests at runtime. But now these tests can be skipped efficiently, so this special case can be removed.

Generated via hack/update-bazel.sh.

There is a risk that the init function does not reset one of the local variables that was set by a previous test. To avoid this, all variables set by init are now in a struct which gets cleaned completely first.

pohly · 2019-02-15T12:42:54Z

/test pull-kubernetes-e2e-gce-alpha-features
/test pull-kubernetes-e2e-gce-csi-serial

pohly · 2019-02-15T12:43:08Z

/priority important-soon

msau42 · 2019-02-19T22:22:41Z

@childsb this should be 1.14 milestone right?

/approve

will let @davidz627 and/or @wongma7 lgtm

k8s-ci-robot · 2019-02-19T22:23:25Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msau42, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~test/e2e/storage/OWNERS~~ [msau42]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

davidz627 · 2019-02-20T00:18:19Z

/lgtm

fejta-bot · 2019-02-20T07:18:14Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

fejta-bot · 2019-02-20T11:51:08Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

k8s-ci-robot requested review from liggitt and msau42 December 29, 2018 16:17

k8s-ci-robot added sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 29, 2018

pohly mentioned this pull request Dec 29, 2018

storage e2e: TestDriver should be unique per driver instance #72288

Closed

liggitt removed their request for review December 30, 2018 02:44

k8s-ci-robot assigned verult Jan 2, 2019

pohly mentioned this pull request Jan 4, 2019

e2e/storage: avoid definining unusable tests #70992

Closed

pohly force-pushed the storage-volume-testsuites-config branch 2 times, most recently from 1d427cf to c764180 Compare January 4, 2019 19:48

k8s-ci-robot requested a review from davidz627 January 4, 2019 23:15

davidz627 reviewed Jan 4, 2019

View reviewed changes

wongma7 reviewed Jan 7, 2019

View reviewed changes

mathu97 reviewed Jan 8, 2019

View reviewed changes

test/e2e/storage/testsuites/base.go Show resolved Hide resolved

wongma7 reviewed Jan 9, 2019

View reviewed changes

test/e2e/storage/testsuites/provisioning.go Outdated Show resolved Hide resolved

pohly force-pushed the storage-volume-testsuites-config branch from c764180 to 4410815 Compare January 10, 2019 17:22

pohly changed the title ~~WIP: e2e/storage: delay CreateDriver, simplify APIs and test definition~~ e2e/storage: delay CreateDriver, simplify APIs and test definition Feb 14, 2019

wongma7 approved these changes Feb 14, 2019

View reviewed changes

jarrpa approved these changes Feb 14, 2019

View reviewed changes

davidz627 reviewed Feb 14, 2019

View reviewed changes

pohly added 5 commits February 15, 2019 08:38

e2e: update bazel BUILD files

e79cd9e

Generated via hack/update-bazel.sh.

e2e/storage: re-initialize all local variables

ec3655a

There is a risk that the init function does not reset one of the local variables that was set by a previous test. To avoid this, all variables set by init are now in a struct which gets cleaned completely first.

pohly force-pushed the storage-volume-testsuites-config branch from 1692255 to ec3655a Compare February 15, 2019 10:06

k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 15, 2019

childsb added this to the v1.13 milestone Feb 15, 2019

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 19, 2019

k8s-ci-robot assigned davidz627 Feb 20, 2019

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 20, 2019

k8s-ci-robot merged commit 07cf5fa into kubernetes:master Feb 20, 2019

mkimuram mentioned this pull request Feb 26, 2019

Fix resource leaking PDs for volume mode test #74554

Merged

pohly mentioned this pull request Mar 21, 2019

E2E storage: simplify test code and configuration #72242

Closed

e2e/storage: delay CreateDriver, simplify APIs and test definition #72434

e2e/storage: delay CreateDriver, simplify APIs and test definition #72434

Conversation

pohly commented Dec 29, 2018 • edited

msau42 commented Jan 2, 2019

pohly commented Jan 3, 2019

k8s-ci-robot commented Jan 3, 2019

pohly commented Jan 3, 2019

pohly commented Jan 3, 2019

mkimuram commented Jan 4, 2019

davidz627 commented Jan 4, 2019

davidz627 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pohly commented Feb 14, 2019

wongma7 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jarrpa left a comment

Choose a reason for hiding this comment

davidz627 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pohly commented Feb 15, 2019

pohly commented Feb 15, 2019

msau42 commented Feb 19, 2019

k8s-ci-robot commented Feb 19, 2019

davidz627 commented Feb 20, 2019

fejta-bot commented Feb 20, 2019

fejta-bot commented Feb 20, 2019

pohly commented Dec 29, 2018 •

edited