Improves reliability of e2e testing #3382

phantomjinx · 2022-06-21T17:17:36Z

Major changes

Switch to global operator where possible
Refactored location of tests to reflect either global or namespaced testing

tadayosi

Great workflow improvements! There are only a few nitpicks.

e2e/support/test_support.go

tadayosi · 2022-06-23T04:45:18Z

Also, I'm curious -- after the changes should we choose global e2e testing as default, and go for namespaced testing only when it's truly necessary?

phantomjinx · 2022-06-23T07:20:17Z

> Also, I'm curious -- after the changes should we choose global e2e testing as default, and go for namespaced testing only when it's truly necessary?

My approach to splitting the tests is that a test should be namespaced only if it has to be. For example, tests that need to add an env var to the operator must be namespaced, whilst tests that only change the integration platform can still be global. As a result, the namespaced tests tend to fall into three groups: install-related, CLI-related, and those that change the operator itself.

tadayosi

LGTM, thanks!

christophd · 2022-06-28T14:06:29Z

config/rbac/operator-role-knative.yaml

@@ -49,6 +51,8 @@ rules:
  - messaging.knative.dev
  resources:
  - subscriptions
+  - channels
+  - inmemorychannels


This is being fixed by the proper addressable-resolver cluster role binding in #3400.

phantomjinx · 2022-06-28

So it is no longer necessary to add these permissions, and they can be removed from this PR?

christophd · 2022-06-28

Yes, this is obsolete and you can remove it, because it is already covered by the addressable-resolver cluster role that Knative itself provides. Camel K just uses a cluster role binding to bind the operator service account to this addressable-resolver cluster role, which should grant read permissions on all Knative resources such as brokers, channels, etc.

Not sure about the create/delete permissions that you have also added in this PR for brokers and routes. Do we need them for some of the e2e tests?
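For readers unfamiliar with the mechanism christophd describes, here is a minimal sketch of such a binding. It uses stdlib-only mirror types rather than k8s.io/api/rbac/v1 so that it runs without dependencies; the binding name, service account name, and namespace are hypothetical, and this is not the manifest Camel K actually ships (that is the subject of #3400).

```go
package main

import "encoding/json"

// Minimal stdlib-only mirrors of the rbac.authorization.k8s.io/v1 types, so the
// sketch has no dependencies. Real code would use k8s.io/api/rbac/v1 instead.
type roleRef struct {
	APIGroup string `json:"apiGroup"`
	Kind     string `json:"kind"`
	Name     string `json:"name"`
}

type subject struct {
	Kind      string `json:"kind"`
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}

type objectMeta struct {
	Name string `json:"name"`
}

type clusterRoleBinding struct {
	APIVersion string     `json:"apiVersion"`
	Kind       string     `json:"kind"`
	Metadata   objectMeta `json:"metadata"`
	RoleRef    roleRef    `json:"roleRef"`
	Subjects   []subject  `json:"subjects"`
}

// addressableResolverBinding binds an operator service account to Knative's
// addressable-resolver ClusterRole. The binding name is hypothetical; the
// permissions themselves come from the ClusterRole, not from this object.
func addressableResolverBinding(saName, saNamespace string) clusterRoleBinding {
	return clusterRoleBinding{
		APIVersion: "rbac.authorization.k8s.io/v1",
		Kind:       "ClusterRoleBinding",
		Metadata:   objectMeta{Name: "camel-k-operator-addressable-resolver"},
		RoleRef: roleRef{
			APIGroup: "rbac.authorization.k8s.io",
			Kind:     "ClusterRole",
			Name:     "addressable-resolver",
		},
		Subjects: []subject{{Kind: "ServiceAccount", Name: saName, Namespace: saNamespace}},
	}
}

// toJSON renders the binding as a manifest that kubectl could apply.
func (b clusterRoleBinding) toJSON() (string, error) {
	out, err := json.Marshal(b)
	return string(out), err
}
```

For example, `addressableResolverBinding("camel-k-operator", "camel-k").toJSON()` would produce a manifest for a hypothetical camel-k-operator service account in a camel-k namespace. Because the binding only references the role, any resources Knative aggregates into addressable-resolver are covered without changing Camel K's own rules, which is why the extra channels and inmemorychannels entries become unnecessary.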

phantomjinx · 2022-06-29

@christophd
All of the added permissions were required because of test failures on OCP4, with log messages to the effect that 'systemaccount... was forbidden from ${verb}ing x:y:z'

christophd · 2022-06-29

@phantomjinx Yep, makes sense. I wonder, though, whether the write permissions are only required for the e2e tests. I can't recall Camel K creating Knative brokers in production code.

tadayosi · 2022-06-29T04:57:15Z

The OpenShift workflow keeps failing, so there may be something here that is not just flakiness.

phantomjinx · 2022-06-29T16:06:30Z

@squakez
You might like to review the changes I have made to promotion in the final commit of this series. The promotion tests now all pass (locally, at least) with these changes.

squakez · 2022-06-30T07:36:23Z

It really fails one of the promotion tests:

    --- FAIL: TestKamelCLIPromote/plain_integration_promotion (0.31s)

phantomjinx · 2022-06-30T15:24:37Z

> It really fails one of the promotion tests:
>
>     --- FAIL: TestKamelCLIPromote/plain_integration_promotion (0.31s)

OpenShift is still failing, but the Promote test is now passing.

* Permissions necessary for knative to operate with camel-k

* Migrate to the global operator when appropriate
* Relocated tests to differentiate between cases that can run with a global operator, if preferred, and those requiring a namespaced operator
* New KamelInstall function that will not install an operator if the env var CAMEL_K_FORCE_GLOBAL_TEST is set. Instead, it checks whether a global operator is available and then installs just the IntegrationPlatform
* If the test is being run as a global test, use the global operator namespace rather than the given test namespace
* Add channel to global operator install
* Use global operator namespace for getting logs
* Fixes OperatorPod function in test_support
* Use KamelInstall rather than Kamel("install", ...) to support global
* Fix order of uninstall cmd steps
* Add ServicesByType function for testing a service's spec type
* Add debugging and logging
* Removes panic() in favour of failing the test to allow for proper teardowns
* Requires assigning the test (locus) to test support
* Adds pre-cleanup jobs to each workflow
* The catalogsource name was assumed and buried in the bundle build. Instead, move it to config-cluster, allowing pre-clean to occur after config but before build.
* pre-clean must be moved since it clears the catalog source which has just been installed.
* Rename e2e-kubernetes to e2e-common
* Separates out e2e-install tests - namespace scoped tests
* Specifies catalogsource name and namespace as inputs
* If using OLM, the CRDs will be uninstalled prior to the call to uninstall the integration platform, resulting in an error that the IntegrationPlatform type cannot be found.
* Cleanup preflight namespace after testing
* Do not throw error if uninstall returns an error condition
* Removes the catalogsource to ensure the index image of the catalog source is no longer cached.
* Clean up orphan resources in cluster - integrations, platforms, kamelets, operatorgroups
* Don't rely on kamel binary for uninstallation
* On some clusters it seems to take up to 10 minutes for the integrations to get up and running
* Sets tests to PROBLEMATIC for later fixing
* Fixes flaky tests to make them more reliable
* Fix Test: Rest Query needs to look at correct namespace
* Knative test fix: Move knative install after pre-clean
* service-trait test failure
* On OCP4, it's possible for the services to be created in a different order and the ExternalName svc ends up being named "platform-http-server"
* Adds ServicesByType function that fetches services according to the type field

* If the Knative CRDs are not completely removed, the integrations' profiles are updated to "Knative"

* The camel-k operator sometimes seems to start and then immediately restart, so mitigate this with extra waits and extra checks to ensure it is running and stable

* A system:puller rolebinding is necessary on OpenShift to ensure the images in one namespace can be pulled by another
* Continue collecting the source kamelets even if there are no traits
* Avoid JSON syntax errors if the RawMessages from the traits are empty
* When validating kamelets, drop any namedconfig path extensions from the kamelet URIs so that the names can be compared directly
* promotion_test.go: have the test await the readiness of the platform before running the tests. Promotion depends on the platform's info being complete.

* Stop 415 error occurring on OpenShift 3.11
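The bullet about the operator starting and then immediately restarting mitigates it with extra waits and checks. One way to express "running and stable" is to require the pod to stay Running with an unchanged restart count across several consecutive polls. This is a sketch under that assumption: `podStatus`, `waitForStableOperator`, and the thresholds are hypothetical, and a real helper would read the phase and restart counts from the operator pod's status.

```go
package main

import (
	"errors"
	"time"
)

// podStatus is a minimal, hypothetical view of the operator pod's status.
type podStatus struct {
	Phase    string
	Restarts int32
}

// waitForStableOperator polls get until the pod has been Running with an
// unchanged restart count for `stable` consecutive polls, or gives up after
// `attempts` polls.
func waitForStableOperator(get func() (podStatus, error), attempts, stable int, interval time.Duration) error {
	streak := 0
	lastRestarts := int32(-1)
	for i := 0; i < attempts; i++ {
		st, err := get()
		switch {
		case err != nil || st.Phase != "Running":
			streak = 0 // not running (or status unknown): start counting again
		case st.Restarts == lastRestarts:
			streak++ // still running, no new restart since the last poll
		default:
			streak = 1 // first Running observation, or a restart just happened
		}
		if err == nil {
			lastRestarts = st.Restarts
		}
		if streak >= stable {
			return nil
		}
		time.Sleep(interval)
	}
	return errors.New("operator pod did not become stable")
}
```

A restart resets the streak rather than failing outright, which matches the intent of tolerating one early restart as long as the pod settles afterwards.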

phantomjinx · 2022-07-04T11:38:32Z

@squakez @christophd @tadayosi
Can we look at merging this now please?

The OpenShift tests are failing, but the errors are much more random (quarkus-native tests not spinning up, Environment tests failing then passing, and so on). The same tests pass comfortably on the OCP4 test suite, so the tests themselves and Camel K seem to be less of a problem than the OCP 3.11 platform. I can continue looking, but the priority feels a lot lower now.

tadayosi · 2022-07-05T06:09:36Z

@phantomjinx Since this was merged, the openshift-build workflow appears to have become unstable again. IIRC, I haven't seen this type of test failure before:

=== RUN   TestNativeIntegrations/automatic_rollout_deployment_from_fast-jar_to_native_kit
Integration "jvm-to-native" created
=== CONT  TestNativeIntegrations
    test_support.go:116: Get "https://10.1.0.19:8443/apis/camel.apache.org/v1/namespaces/test-22193ffe-620a-425d-a687-b15526c6fa0d/integrationkits?labelSelector=camel.apache.org%2Fkit.layout%3Dnative": http2: client connection lost
=== CONT  TestNativeIntegrations/automatic_rollout_deployment_from_fast-jar_to_native_kit
    testing.go:1113: test executed panic(nil) or runtime.Goexit: subtest may have called FailNow on a parent test
=== CONT  TestNativeIntegrations
    dump.go:47: -------------------- start dumping namespace test-22193ffe-620a-425d-a687-b15526c6fa0d --------------------
    test_support.go:1851: Error while dumping namespace test-22193ffe-620a-425d-a687-b15526c6fa0d: Get "https://10.1.0.19:8443/apis/camel.apache.org/v1/namespaces/test-22193ffe-620a-425d-a687-b15526c6fa0d/integrationplatforms": dial tcp 10.1.0.19:8443: connect: connection refused
    test_support.go:116: Get "https://10.1.0.19:8443/apis/image.openshift.io/v1?timeout=32s": dial tcp 10.1.0.19:8443: connect: connection refused
--- FAIL: TestNativeIntegrations (277.28s)
    --- PASS: TestNativeIntegrations/unsupported_integration_source_language (0.64s)
    --- FAIL: TestNativeIntegrations/automatic_rollout_deployment_from_fast-jar_to_native_kit (262.36s)

Is this something we can improve?

phantomjinx force-pushed the e2e-testing branch from eb9a378 to 8194f43 on June 22, 2022 08:24

tadayosi requested changes Jun 23, 2022

e2e/support/test_support.go (outdated)

e2e/support/test_support.go (outdated)

phantomjinx force-pushed the e2e-testing branch from 1c85b6f to a3d08f4 on June 23, 2022 07:42

tadayosi approved these changes Jun 23, 2022

phantomjinx mentioned this pull request Jun 23, 2022

RBAC missing permissions for channels and inmemorychannels in messaging.knative.dev #3390

Closed

christophd reviewed Jun 28, 2022

phantomjinx force-pushed the e2e-testing branch from 45a6c6b to c19e4b1 on June 29, 2022 15:54

squakez approved these changes Jun 30, 2022

phantomjinx force-pushed the e2e-testing branch from c19e4b1 to aa3199e on June 30, 2022 13:14

phantomjinx force-pushed the e2e-testing branch from 39b9177 to 3239105 on July 1, 2022 13:45

phantomjinx added 9 commits July 3, 2022 11:18

Adds permissions for operator supporting knative

c357461

* Permissions necessary for knative to operate with camel-k

Completely remove all knative CRDs

234a7ec

* If not, the integrations' profiles are updated to "Knative"

Removes PROBLEMATIC flag from BadRouteIntegration test

3db8cc6

Removes PROBLEMATIC flag from IntegrationScale tests

b7b5d79

Corrects syntax of Promote test

3851010

Try and improve preflight testing to avoid any pod restarts

e600cad

* The camel-k operator sometimes seems to start and then immediately restart, so mitigate this with extra waits and extra checks to ensure it is running and stable

Switch system puller to server or client apply

1493833

* Stop 415 error occurring on OpenShift 3.11

phantomjinx force-pushed the e2e-testing branch from 3239105 to 1493833 on July 3, 2022 10:19

squakez merged commit a256264 into apache:main Jul 4, 2022

tadayosi mentioned this pull request Jul 11, 2022

Improve the reliability of e2e tests by installing operator globally as a pre-requisite. #3183

Closed