Improves reliability of e2e testing #3382

Merged
merged 9 commits into from
Jul 4, 2022

phantomjinx
Contributor

Major changes

  • Switch to global operator where possible
  • Refactored location of tests to reflect either global or namespaced testing

Member

@tadayosi tadayosi left a comment

Great workflow improvements! There are only a few nitpicks.

e2e/support/test_support.go (outdated, resolved)
e2e/support/test_support.go (outdated, resolved)
@tadayosi
Member

Also, I'm curious -- after these changes, should we choose global e2e testing as the default, and go for namespaced testing only when it's truly necessary?

@phantomjinx
Contributor Author

Also, I'm curious -- after these changes, should we choose global e2e testing as the default, and go for namespaced testing only when it's truly necessary?

My approach to splitting the tests is that a test should be namespaced only if it has to be. For example, tests that require adding an env var to the operator need to be namespaced, whilst tests that change the integration platform can still be global. Therefore, the namespaced tests tend to be install-related, CLI-related, or related to changes to the operator.

Member

@tadayosi tadayosi left a comment

LGTM, thanks!

@@ -49,6 +51,8 @@ rules:
- messaging.knative.dev
resources:
- subscriptions
- channels
- inmemorychannels
Contributor

this is being fixed by the proper addressable-resolver cluster role binding in #3400

Contributor Author

So it is no longer necessary to add these to the permissions and can be removed from this PR?

Contributor

yes, this is obsolete and you can remove that because it is already covered by the addressable-resolver cluster role provided by Knative itself. In the end Camel K just uses a cluster role binding in order to bind the operator service account to this addressable-resolver cluster role. This should give all read permissions to Knative resources such as brokers, channels, etc.

Not sure about these create/delete permissions that you have also added in this PR for brokers and routes. Do we need that for some of the e2e tests?

Contributor Author

@christophd
All additions to the permissions were required because of test failures on OCP4 and log messages to the effect that 'systemaccount... was forbidden from ${verb}ing x:y:z'

Contributor

@phantomjinx yep makes sense, I wonder though if the write permissions are only required for the e2e tests. I can not recall that Camel K creates Knative brokers in production code

@tadayosi
Member

The OpenShift workflow keeps failing, so there seems to be something which may not be just flaky.

@phantomjinx
Contributor Author

@squakez
You might like to review the changes I have made to promotion in the final commit of this series. The promotion tests now all pass successfully (locally at least) with these changes.

@squakez
Contributor

squakez commented Jun 30, 2022

It really fails one of the promotion tests:

    --- FAIL: TestKamelCLIPromote/plain_integration_promotion (0.31s)

@phantomjinx
Contributor Author

It really fails one of the promotion tests:

    --- FAIL: TestKamelCLIPromote/plain_integration_promotion (0.31s)

The OpenShift workflow is still failing, but the Promote test is now passing.

* Permissions necessary for knative to operate with camel-k
* Migrate to global operator when appropriate
 * Relocated tests to differentiate between cases that can run
   with a global operator, if preferred, and those requiring a
   namespaced operator
 * New KamelInstall function that will not install an operator if
   the env var CAMEL_K_FORCE_GLOBAL_TEST is set. Instead it'll check
   if a global operator is available then install just the
   IntegrationPlatform
 * If the test is being run as a global test then use the global operator
   namespace rather than the given test namespace
 * Add channel to global operator install
 * Use global operator namespace for getting logs
 * Fixes OperatorPod function in test_support
 * Use KamelInstall rather than Kamel("install", ...) to support global
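
The KamelInstall behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `installPlan`, `planInstall` and `errNoGlobalOperator` are hypothetical names, the environment lookup is passed in as a function so the decision can be exercised without touching the real environment, and returning an error when no global operator is found is an assumption (the commit message does not say what happens in that case).

```go
package main

import "errors"

// forceGlobalEnv is the env var named in the commit message.
const forceGlobalEnv = "CAMEL_K_FORCE_GLOBAL_TEST"

// errNoGlobalOperator is a hypothetical error for the case the commit
// message leaves unspecified: global testing forced, no operator found.
var errNoGlobalOperator = errors.New("global test requested but no global operator is available")

// installPlan is a hypothetical description of what a test should install.
type installPlan struct {
	InstallOperator bool // deploy an operator into the test namespace
	InstallPlatform bool // create the IntegrationPlatform
}

// planInstall decides what to install. lookupEnv has the same shape as
// os.LookupEnv; globalOperatorAvailable is the result of whatever check
// the caller uses to detect a running global operator.
func planInstall(lookupEnv func(string) (string, bool), globalOperatorAvailable bool) (installPlan, error) {
	if _, forced := lookupEnv(forceGlobalEnv); !forced {
		// Assumed default: a normal namespaced install of operator and platform.
		return installPlan{InstallOperator: true, InstallPlatform: true}, nil
	}
	if !globalOperatorAvailable {
		return installPlan{}, errNoGlobalOperator
	}
	// Global mode: reuse the existing operator, install only the platform.
	return installPlan{InstallOperator: false, InstallPlatform: true}, nil
}
```

In a real helper, `os.LookupEnv` could be passed directly as the lookup function.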

* Fix order of uninstall cmd steps

* Add ServicesByType function for testing a service's spec type
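
A filter of this kind might look like the sketch below. The `service` struct is a simplified stand-in for a Kubernetes Service (only the name and the spec type), and `servicesByType` is an illustrative name and signature, not necessarily that of the helper added in this PR. Selecting by type rather than by name avoids depending on which service happens to receive a given name.

```go
package main

// service is a simplified stand-in for a Kubernetes Service object.
type service struct {
	Name string
	Type string // mirrors spec.type, e.g. "ClusterIP" or "ExternalName"
}

// servicesByType returns the services whose spec type equals svcType,
// preserving their original order.
func servicesByType(all []service, svcType string) []service {
	var matched []service
	for _, s := range all {
		if s.Type == svcType {
			matched = append(matched, s)
		}
	}
	return matched
}
```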

* Add debugging and logging

* Removes panic() in favour of failing test to allow for proper teardowns
 * Requires assigning the test (locus) to test support
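
The idea of failing the registered test instead of panicking can be sketched as below. The names `failer`, `testLocus` and `failTest` and their exact shape are assumptions for illustration; `fakeT` exists only so the sketch can be exercised outside a real test run.

```go
package main

// failer is the small subset of *testing.T that this sketch relies on.
type failer interface {
	Error(args ...interface{})
	FailNow()
}

// testLocus holds the currently running test. A test assigns it before
// calling any support helper (name and shape are assumptions).
var testLocus failer

// failTest reports err against the current test rather than panicking,
// so that deferred cleanup and teardown code still gets a chance to run.
func failTest(err error) {
	if testLocus == nil {
		// No test registered: a panic is the only way to surface the error.
		panic(err)
	}
	testLocus.Error(err)
	testLocus.FailNow()
}

// fakeT is a stand-in for *testing.T used only to exercise the sketch.
type fakeT struct {
	errors int
	failed bool
}

func (f *fakeT) Error(args ...interface{}) { f.errors++ }
func (f *fakeT) FailNow()                  { f.failed = true }
```

With a real `*testing.T`, `FailNow` stops the test goroutine via `runtime.Goexit`, which still runs deferred functions. One caveat of this design is that the locus should be the test or subtest that is actually running, since Go reports "subtest may have called FailNow on a parent test" otherwise.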

* Adds pre-cleanup jobs to each workflow
 * The name of the catalogsource was being assumed buried in the
   bundle build. Instead, move it to config-cluster, allowing
   pre-clean to occur after config but before build.
 * pre-clean must be moved since it clears the catalog source
   which has just been installed.

* Rename e2e-kubernetes to e2e-common

* Separates out e2e-install tests - namespace scoped tests

* Specifies catalogsource name and namespace as inputs

* If using OLM, the CRDs will be uninstalled prior to the
  call to uninstall the integration platform resulting in an error
  that integrationPlatform type cannot be found.

* Cleanup preflight namespace after testing

* Do not throw error if uninstall returns an error condition

* Removes the catalogsource to ensure the index image of
  the catalog source is no longer cached.

* Clean up orphan resources in cluster - integrations, platforms, kamelets,
  operatorgroups

* Don't rely on kamel binary for uninstallation

* Seems on some clusters it takes tests up to 10 minutes to get
  the integrations up and running

* Sets tests to PROBLEMATIC for later fixing

* Fixes flaky tests to make more reliable

* Fix Test: Rest Query needs to look at correct namespace

* KNative test fix: Move knative install after pre-clean

* service-trait test failure
 * On OCP4, it's possible for the services to be created in a
   different order and the ExternalName svc ends up being named
   "platform-http-server"
 * Adds ServicesByType function that fetches services according
   to the type field
* If not the profile's of integrations are updated to "Knative"
* Seems camel-k operator sometimes starts then immediately restarts so
  mitigate by extra waits and extra checks to ensure it is running and
  stable
* A system:puller rolebinding is necessary on openshift to ensure the
  images in one namespace can be pulled by another
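
The "extra waits and extra checks" for an operator that restarts shortly after starting could take the form of requiring several consecutive healthy polls. A minimal sketch, in which `waitStable`, its parameters and the consecutive-success rule are all assumptions for illustration and not the checks this PR actually adds:

```go
package main

import "time"

// waitStable polls check up to maxAttempts times, pausing interval between
// polls, and returns true once check has succeeded `required` times in a
// row. A single failed poll (for example, the pod restarting) resets the
// streak, so a brief "running" state is not mistaken for stability.
func waitStable(check func() bool, required, maxAttempts int, interval time.Duration) bool {
	streak := 0
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if attempt > 0 {
			time.Sleep(interval)
		}
		if check() {
			streak++
			if streak >= required {
				return true
			}
		} else {
			streak = 0
		}
	}
	return false
}
```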

* Continue collecting the source kamelets even if there are no traits

* Avoid json syntax errors if the rawmessages from the traits are empty
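
For context, Go's `json.Unmarshal` returns an error when given empty input, so an empty `json.RawMessage` has to be handled before decoding. A minimal sketch of such a guard, where `decodeTraitConfig` is a hypothetical helper and treating an empty message as an empty configuration is an assumption:

```go
package main

import "encoding/json"

// decodeTraitConfig turns the raw JSON of one trait into a generic map.
// An empty raw message is treated as "no configuration" instead of being
// passed to json.Unmarshal, which would reject it as invalid JSON.
func decodeTraitConfig(raw json.RawMessage) (map[string]interface{}, error) {
	config := map[string]interface{}{}
	if len(raw) == 0 {
		return config, nil
	}
	if err := json.Unmarshal(raw, &config); err != nil {
		return nil, err
	}
	return config, nil
}
```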

* When validating kamelets, need to drop any namedconfig path extensions
  from the kamelet URIs so that the names can be directly compared
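
The name comparison could be sketched as below, assuming kamelet URIs of the form `kamelet:<name>/<config>?<options>`, where the path segment after the name is the named configuration. `kameletName` is a hypothetical helper, not the function changed in this PR.

```go
package main

import "strings"

// kameletName extracts the bare kamelet name from a kamelet URI, dropping
// any named-config path extension and any query options. It returns false
// when the URI does not use the kamelet scheme or has no name.
func kameletName(uri string) (string, bool) {
	const scheme = "kamelet:"
	if !strings.HasPrefix(uri, scheme) {
		return "", false
	}
	name := strings.TrimPrefix(uri, scheme)
	// Cut at the first "/" (named config) or "?" (options), whichever comes first.
	if i := strings.IndexAny(name, "/?"); i >= 0 {
		name = name[:i]
	}
	return name, name != ""
}
```

With this, `kamelet:timer-source/my-config?period=1000` and `kamelet:timer-source` both reduce to `timer-source`, so they compare equal.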

* promotion_test.go
 * Have the test await for the readiness of the platform before doing the
   tests. Promotion depends on the info of the platform being complete.
* Stop 415 error occurring on Openshift 3.11
@phantomjinx
Contributor Author

@squakez @christophd @tadayosi
Can we look at merging this now please?

The OpenShift tests are failing but the errors are much more random (quarkus-native tests not spinning up, Environment tests failing then passing ....). The same tests are comfortably passing on the OCP4 test suite, so the tests themselves and camel-k seem less of a problem than the OCP 3.11 platform. I can continue looking but the priority feels a lot lower now.

@squakez squakez merged commit a256264 into apache:main Jul 4, 2022
@tadayosi
Member

tadayosi commented Jul 5, 2022

@phantomjinx After merging this, it appears that the openshift-build workflow has started to be unstable again. IIRC I haven't seen this type of test failure before:

=== RUN   TestNativeIntegrations/automatic_rollout_deployment_from_fast-jar_to_native_kit
Integration "jvm-to-native" created
=== CONT  TestNativeIntegrations
    test_support.go:116: Get "https://10.1.0.19:8443/apis/camel.apache.org/v1/namespaces/test-22193ffe-620a-425d-a687-b15526c6fa0d/integrationkits?labelSelector=camel.apache.org%2Fkit.layout%3Dnative": http2: client connection lost
=== CONT  TestNativeIntegrations/automatic_rollout_deployment_from_fast-jar_to_native_kit
    testing.go:1113: test executed panic(nil) or runtime.Goexit: subtest may have called FailNow on a parent test
=== CONT  TestNativeIntegrations
    dump.go:47: -------------------- start dumping namespace test-22193ffe-620a-425d-a687-b15526c6fa0d --------------------
    test_support.go:1851: Error while dumping namespace test-22193ffe-620a-425d-a687-b15526c6fa0d: Get "https://10.1.0.19:8443/apis/camel.apache.org/v1/namespaces/test-22193ffe-620a-425d-a687-b15526c6fa0d/integrationplatforms": dial tcp 10.1.0.19:8443: connect: connection refused
    test_support.go:116: Get "https://10.1.0.19:8443/apis/image.openshift.io/v1?timeout=32s": dial tcp 10.1.0.19:8443: connect: connection refused
--- FAIL: TestNativeIntegrations (277.28s)
    --- PASS: TestNativeIntegrations/unsupported_integration_source_language (0.64s)
    --- FAIL: TestNativeIntegrations/automatic_rollout_deployment_from_fast-jar_to_native_kit (262.36s)

is this something we can improve?
