e2e/storage: testing of external storage drivers #72836
Conversation
This feels like it doesn't need to be in-tree... with all the/your efforts refactoring the framework I envisioned the ideal solution would be to develop something analogous to csi-sanity out of tree so that someone has full control over the interface, versioning, etc. of this "shim" between drivers and the existing e2e tests. I think the interface being an arg to e2e.test would be unpleasant to use... I need to check out kubernetes & do. It doesn't mean we can't maintain both though. Anyway, I like very much the general idea of having a generic driver that simply takes yamls so that I don't have to write a TestDriver or any code at all unless I'm doing something complicated; it's exactly what I had in mind also.
/retest
@wongma7 I think @msau42 wanted this in-tree because then the e2e.test binary that gets released together with each Kubernetes release (it does get released, right? I'm not sure because I've never used it) can be used as-is. The intended users wouldn't be developers, but rather cluster admins who want to run sanity or conformance tests. For that audience it shouldn't be necessary to check out Kubernetes and to build from source. Developers of a CSI driver however are better served by including the E2E framework and the storage testsuites package in their own Ginkgo test suite. Then they can run it together with additional, custom tests. I had a comment with several questions, but I think I accidentally deleted that yesterday because the comment was shown to me twice. Let me write it down again:
e2e.test binaries don't seem to get included in binary releases (kubernetes-server-*.tar.gz). I am picturing some place in the kubernetes-csi org that documents how to run the csi e2e tests, like how https://github.com/cncf/k8s-conformance documents how to run sonobuoy to run k8s conformance tests. I think we are going to end up maintaining something out-of-tree like sonobuoy/csi-sanity anyway, because it's not trivial for developers to include the E2E framework and we need to provide an example. Given all that, I'm still unsure if this should be in-tree or if it could live out-of-tree to serve both cluster-admins and developers... i.e. cluster admins could run it as-is (each version of this out-of-tree thing would be pinned to a kubernetes version) and developers could also run it as-is or fork it if they want to write custom tests.
I guess somewhere in kubernetes-csi docs and here https://github.com/kubernetes/community/blob/master/contributors/devel/e2e-tests.md
IMO no (assuming you mean GetPersistentVolumeSource)
I think you are right, that seems like a bug, it means the dynamic XFS tests aren't actually testing XFS?
Right, we need to formalize what "conformant" means first, then I think tagging the tests would be okay. At the moment there aren't so many tests that we need to be selective though...
Matthew Wong <notifications@github.com> writes:
e2e.test binaries don't seem to get included in binary releases
(kubernetes-server-*.tar.gz). I am picturing some place in the
kubernetes-csi org that documents how to run the csi e2e tests, like
how https://github.com/cncf/k8s-conformance documents how to run
sonobuoy to run k8s conformance tests.
I thought Sonobuoy was only a simpler way of running the conformance
tests, but from
https://github.com/cncf/k8s-conformance/blob/master/instructions.md it
looks like it is more than that.
I don't think kubernetes-csi should establish its own "storage
conformance" testing. IMHO that belongs into the normal Kubernetes
conformance testing and thus needs to be in-tree.
For additional features a separate test suite would work.
I think we are going to end up maintaining something out-of-tree like
sonobuoy/csi-sanity anyway, because it's not trivial for developers to
include the E2E framework and we need to provide an example.
I have a prototype here: https://github.com/pohly/csi-e2e
My plan was to use that for a CSI release job in Prow. Publishing the
resulting test/e2e.test binary could be added.
Given all that, I'm still unsure if this should be in-tree or if it
could live out-of-tree to serve both cluster-admins and
developers... i.e. cluster admins could run it as-is (each version of
this out-of-tree thing would be pinned to a kubernetes version) and
developers could also run it as-is or fork it if they want to write
custom tests.
@msau42, what do you think?
> Do we need to support more than just dynamic provisioning?
IMO no (assuming you mean GetPersistentVolumeSource)
That's what I meant.
I think you are right, that seems like a bug, it means the dynamic XFS
tests aren't actually testing XFS?
https://github.com/kubernetes/kubernetes/blob/e41d9f15530d5c5bd116c90a988c8245f0be7974/test/e2e/storage/testpatterns/testpattern.go#L132
https://github.com/kubernetes/kubernetes/blob/35061468cc9c03a377745a731a6ce7151f959948/test/e2e/storage/testsuites/base.go#L167
Looks like it. Can someone from Google check?
Regarding fstype testing:
// StorageClass is required for enabling dynamic provisioning tests.
// When set to "default", a StorageClass with DriverInfo.Name as provisioner is used.
// Everything else is treated like the filename of a .yaml or .json file
the file is searched for in what path? I guess RepoRoot ($GOPATH/src/k8s.io/kubernetes/)? Should document it here.
Strictly speaking, it depends on how the e2e/framework/testfiles package is set up in the E2E test suite that includes this package here. But you are right, I need to mention that here and explain the relationship with the -repo-root parameter.
A safer approach will be to use an absolute file. #71667 was merged, so that works.
@@ -0,0 +1,217 @@
/*
Can we add a README under .../storage/external?
+1, an explanation of how to use this and maybe even a link to an example (when there is one) would be very helpful
That's good, but I'll also add short README here.
Done.
@wongma7 I found that there's a KEP pending about reorganizing that a bit (https://github.com/kubernetes/enhancements/pull/701/files), but in principle there is a path to getting a generic test binary into the hands of users, if that's what we want to do (still undecided).
@pohly OK, sgtm, then I don't really have any objections to pointing users to e2e.test and maintaining this code here.
@msau42 did we come to a conclusion whether we want this in-tree?
One advantage of keeping it in-tree is that the test definitions are tied to the K8s release. You won't have to manage things like K8s adding new tests for new features that may fail on older K8s clusters. And test fixes will automatically get picked up with each release. A disadvantage is that it becomes difficult to manage differences in the driver definition across releases. But if this doesn't change often, then it should be manageable. Will these test cases try to run in the in-tree e2e job because they're defined in the binary? cc @davidz627 if he has ideas on how we would ideally want to do e2e testing in the pd repo.
Michelle Au <notifications@github.com> writes:
A disadvantage is it becomes difficult to manage differences in the
driver definition across releases. But if this doesn't change often,
then it should be manageable.
Agreed.
Will these test cases try to run in the intree e2e job because they're
defined in the binary?
The tests are only getting defined if the -storage.testdriver parameter
is given, so no.
We could rip out the in-tree definition of the gcepd driver and replace
it with a pre-test deployment (defined by the Prow job) and execution
via -storage.testdriver, but that's a separate change.
Force-pushed dcfd365 to de799a4.
thanks for working on this! Looks mostly pretty good but I have a couple questions. I also think a README explaining how this works and how to use it would be very valuable.
return errors.Errorf("%q: DriverInfo.Name not set", filename)
}

// klog.Infof("Test driver definition from %q: %+v", filename, driver)
nit: uncomment or remove these logs
I haven't fully decided about that yet. I added it because it gives feedback to the user about which parts of their config file really had an effect (parsing silently ignores unknown fields) and makes it easier for users to find "their" tests. But the output is repeated once for each test process in a parallel Ginkgo run, so I commented it out again.
I'll probably remove it.
Removed.
// When set to "default", a StorageClass with DriverInfo.Name as provisioner is used.
// Everything else is treated like the filename of a .yaml or .json file
// that defines a StorageClass.
StorageClass string
Not sure this variable should be "dual use" here. Seems like there's a concept of StorageClassName and a concept of StorageClassDefinitionFile, and they should be separated out with sensible defaults for each.
Okay.
Turned StorageClass into a struct with two members, FromName and FromFile.
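For illustration, a driver definition file using these fields might look roughly like this — a sketch only, where the driver name and the StorageClass file name are placeholders, and the authoritative schema is whatever the DriverDefinition struct in this PR ends up being:

```yaml
# Hypothetical -storage.testdriver file (illustrative values).
DriverInfo:
  Name: csi-hostpath
StorageClass:
  # Either reference a .yaml/.json file that defines a StorageClass
  # (useful when parameters are needed)...
  FromFile: storageclass.yaml
  # ...or set FromName: true to use a generated StorageClass with
  # DriverInfo.Name as provisioner and empty parameters.
ClaimSize: 5GiB
```

A file like this would be passed via -storage.testdriver when invoking e2e.test.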
// Needed only for deserialization which isn't done, therefore the
// functions below can be empty stubs.
var _ runtime.Object = &driverDefinition{}
If it's "only needed for deserialization which isn't done", then why do we need it at all?
driverDefinition must implement the interface, otherwise it cannot be passed to runtime.DecodeInto. But runtime.DecodeInto then never calls these methods.
If there is a better method for parsing a .yaml or .json file into a normal struct, then I am all ears.
func (d *driverDefinition) SkipUnsupportedTest(pattern testpatterns.TestPattern) {
	supported := false
	// TODO (?): add support for more volume types
I think the volume types supported should also be defined by the external driver. Some may support Block or Snapshots in the future and you'd probably want to be able to toggle those on/off through the DriverInfo definition or some other method.
This isn't about snapshots or blocks. Other pattern.VolType values are currently:
- PreprovisionedPV
- InlineVolume
I think those cannot be tested with a generic test driver.
kubernetes/test/e2e/storage/testpatterns/testpattern.go
Lines 36 to 46 in 076af3d
// TestVolType represents a volume type to be tested in a TestSuite
type TestVolType string

var (
	// InlineVolume represents a volume type that is used inline in volumeSource
	InlineVolume TestVolType = "InlineVolume"
	// PreprovisionedPV represents a volume type for pre-provisioned Persistent Volume
	PreprovisionedPV TestVolType = "PreprovisionedPV"
	// DynamicPV represents a volume type for dynamic provisioned Persistent Volume
	DynamicPV TestVolType = "DynamicPV"
)
But snapshots are something that can be added. They're not in the PR yet because snapshot support was added to the TestDriver interface after my initial version of this PR.
I think block devices don't need any special support. It's a capability, and that can be set already.
I've added snapshotting support. I've tried it manually and the test ran, but failed because my cluster lacked some feature gate:
• Failure in Spec Teardown (AfterEach) [365.125 seconds]
External Storage [Driver: csi-hostpath]
/nvme/gopath/src/k8s.io/kubernetes/test/e2e/storage/external/external.go:78
[Testpattern: Dynamic PV (default fs)] provisioning
/nvme/gopath/src/k8s.io/kubernetes/test/e2e/storage/testsuites/base.go:75
should provision storage with snapshot data source [Feature:VolumeSnapshotDataSource] [AfterEach]
/nvme/gopath/src/k8s.io/kubernetes/test/e2e/storage/testsuites/provisioning.go:202
Expected error:
<*errors.StatusError | 0xc0005290e0>: {
ErrStatus: {
TypeMeta: {Kind: "", APIVersion: ""},
ListMeta: {SelfLink: "", ResourceVersion: "", Continue: ""},
Status: "Failure",
Message: "PersistentVolumeClaim \"pvc-6xw22\" is invalid: spec.dataSource: Forbidden: VolumeSnapshotDataSource is disabled by feature-gate",
Reason: "Invalid",
Details: {
Name: "pvc-6xw22",
Group: "",
Kind: "PersistentVolumeClaim",
UID: "",
Causes: [
{
Type: "FieldValueForbidden",
Message: "Forbidden: VolumeSnapshotDataSource is disabled by feature-gate",
Field: "spec.dataSource",
},
],
RetryAfterSeconds: 0,
},
Code: 422,
},
}
PersistentVolumeClaim "pvc-6xw22" is invalid: spec.dataSource: Forbidden: VolumeSnapshotDataSource is disabled by feature-gate
not to have occurred
Why not PreprovisionedPV? Like StorageClass, there could be a PersistentVolume field with FromName or FromFile and its source could be returned in GetPersistentVolumeSource. That said, other than maybe for the nfs and iscsi csi drivers IDK if it will be useful
It might be useful for CSI drivers in progress. e.g. I haven't implemented CreateVolume yet but want to make sure publish works.
It gets complicated if we want to add test cases that create multiple volumes. Dynamic provisioning is very useful for writing test cases like that.
But maybe those test cases can just be gated based on if the driver supports Preprovisioned PVs.
We also need to think about if we want to support inline ephemeral CSI volumes now too. Most of our test cases are specific to PVs
}

func init() {
	flag.Var(testDriverParameter{}, "storage.testdriver", "name of a .yaml or .json file that defines a driver for storage testing, can be used more than once")
How can this be used more than once? Do you just define the flag multiple times on the command line and go just figures out that you want to init multiple versions of this package?
It's not clear to me how to invoke this test package, or what kind of parameter to pass in to testDriverParameter, since it seems to be just an empty struct.
I think these are all questions that could be answered in a README.
Maybe another way to do this would be that testDriverParameter could point to the file that defines the DriverInfo? It seems like maybe that's the idea, but I'm not sure why we're using an empty struct that might be implementing an interface (I assume that's where Set comes from, even though there's no static interface check). It's just a little bit confusing to me.
Yes, -storage.testdriver can be given more than once. Each parameter instantiates the testsuite for another driver. The mechanism for doing that is the testDriverParameter type and its implementation of the flag.Value interface. Let me add the interface check and some comments to make this clearer.
If we just stored the parameter instead of handling it right away, then a) the parameter can only be used once (at least when using a plain string value) and more importantly, b) we have no way of hooking into the process init between "parsing parameters" and "test suite starts", i.e. we cannot define the tests for that driver (or drivers).
Added an explanation.
Force-pushed de799a4 to 056ae4d.
@davidz627 @wongma7 please have another look. This should now be ready for merging, or at least as ready as I can make it for 1.14. I've been using the branch for my kubernetes-csi CI work (kubernetes-csi/csi-driver-host-path#16) and it works for that.
/retest
Force-pushed 6754c13 to 27ce623.
lgtm.
some typos to fix later maybe
test/e2e/storage/external/README.md
Outdated
@@ -0,0 +1,49 @@
When a test suite like Kubernete's test/e2e/e2e.test includes this |
Kubernetes's?
I think it's "Kubernetes' test/e2e/e2e.test". https://data.grammarbook.com/blog/apostrophes/apostrophes-with-words-ending-in-s/
But let's better avoid that altogether with "like test/e2e/e2e.test from Kubernetes"... done.
test/e2e/storage/external/README.md
Outdated
times to enabling testing of a certain pre-installed storage driver.

The parameter takes as argument the name of a .yaml or .json file. The
filename can be absolute or relative too --repo-root. The content of
s/too/to
Fixed.
One test failed anyway, so I pushed another commit with the typo fixes.
/retest
/lgtm
StorageClass struct {
	// FromName set to true enables the usage of a
	// storage class with DriverInfo.Name as provisioner.
	FromName bool
Can you clarify in the comments that this will generate a StorageClass with empty parameters?
Done. I also added that the FromFile option is for the case where parameters are needed.
// provisioned volumes. Default is "5GiB".
ClaimSize string

// ClientNodeName selects a specific node for scheduling test pods.
Clarify that this is only needed for drivers which, like hostpath, do not support topology.
Actually do we want to allow this? Can we just require that drivers implement topology if they have such a constraint?
We do need this option right now ourselves for testing the hostpath driver as part of the upcoming kubernetes-csi CI. It's okay to explain that it shouldn't be needed and I have added a comment about it, but we should still have that option.
func (d *driverDefinition) SkipUnsupportedTest(pattern testpatterns.TestPattern) {
	supported := false
	// TODO (?): add support for more volume types
It gets complicated if we want to add test cases that create multiple volumes. Dynamic provisioning is very useful for writing test cases like that.
But maybe those test cases can just be gated based on if the driver supports Preprovisioned PVs.
We also need to think about if we want to support inline ephemeral CSI volumes now too. Most of our test cases are specific to PVs
We can always extend this later. Right now it supports everything that can be supported without invoking extra code, doesn't it? Supporting pre-provisioned volumes would imply that those volumes get created somehow and then are made available for testing. I haven't thought much yet about how to do this in detail. You once mentioned running code shipped in containers. That probably would work.
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: msau42, pohly The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
It is useful to apply the storage testsuite also to "external" (= out-of-tree) storage drivers. One way of doing that is setting up a custom E2E test suite, but that's still quite a bit of work. An easier alternative is to parameterize the Kubernetes e2e.test binary at runtime so that it instantiates the testsuite for one or more drivers.

Some parameters have to be provided before starting the test because they define configuration and capabilities of the driver and its storage backend that cannot be discovered at runtime. This is done by populating the DriverDefinition with the content of the file that the new -storage.testdriver parameter points to.

The universal .yaml and .json decoder from Kubernetes is used. It's flexible, but has some downsides:
- currently ignores unknown fields (see kubernetes#71589)
- poor error messages when fields have the wrong type

Storage drivers have to be installed in the test cluster before starting e2e.test. Only tests involving dynamically provisioned volumes are currently supported.
Force-pushed f3832b4 to 6644db9.
@msau42 I've rebased and squashed.
/lgtm
/retest
What type of PR is this?
/kind feature
What this PR does / why we need it:
It is useful to apply the storage testsuite also to "external" (=
out-of-tree) storage drivers. One way of doing that is setting up a
custom E2E test suite, but that's still quite a bit of work.
An easier alternative is to parameterize the Kubernetes e2e.test
binary at runtime so that it instantiates the testsuite for one or
more drivers.
Special notes for your reviewer:
I would prefer to merge this after PR #72434.
Does this PR introduce a user-facing change?: