HDDS-3990. Test Kubernetes examples with acceptance tests #1223

Merged
merged 13 commits into apache:master on Jul 30, 2020

Conversation

elek
Member

@elek elek commented Jul 20, 2020

What changes were proposed in this pull request?

The hadoop-ozone/dist/src/main/k8s/examples directory contains example Kubernetes resources for starting Ozone in a Kubernetes environment. To make sure those resources work and stay up to date, I propose testing them during the standard build.

The K3s project provides a lightweight Kubernetes distribution that can be installed easily in the GitHub Actions environment, which makes it possible to test Kubernetes-based clusters there.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-3990

How was this patch tested?

The new type of acceptance test was executed 5 times on my fork and passed.

Contributor

@adoroszlai adoroszlai left a comment

Thanks @elek for adding this test framework. Looks good overall, but I have a few minor change suggestions.


stop_k8s_env

flekszible generate
Contributor

If I understand correctly, this flekszible generate call is only needed when the test is executed from the source dir (hadoop-ozone/dist/src/main/k8s), to restore the source files. I propose executing it as part of stop_k8s_env, for the following reasons:

  1. I ran tests from target/..., had some errors and tried to find out what's wrong. It was confusing to see resource files referencing non-existent docker image apache/ozone:0.6.0-SNAPSHOT (plus other differences compared to the files actually used for the test).
  2. Avoid possible omission in new scripts.
  3. Reduce code duplication.
  4. Save some very minimal runtime cost.

Member Author

The main problem here is that some of the Kubernetes examples can't be started in the test environment as they are. For example, an example may be configured to schedule one datanode per real node, while the test environment has only one node.

Therefore, the resource files are heavily modified before the tests:

  1. instead of using the latest image, the current build is mounted to /opt/ozone (similar to the docker tests)
  2. anti-affinity rules are removed (so that multiple datanodes can run on the same node)
  3. real persistence is removed

These modifications are applied by the regenerate_resources step at the beginning of the test, which runs something like this:

flekszible generate -t mount:hostPath="$OZONE_ROOT",path=/opt/hadoop -t image:image=apache/ozone-runner:20200420-1 -t ozone/onenode

As you can see, we define three new transformations on the fly:

  1. -t mount:hostPath="$OZONE_ROOT",path=/opt/hadoop --> use the files from the current build
  2. -t image:image=apache/ozone-runner:20200420-1 --> use the standard runner image
  3. -t ozone/onenode --> allow more than one datanode to be scheduled on the same node
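
Wrapped in a function, this step could look roughly like the following. It is a sketch only: the flekszible command line is the one quoted above, while the function body and the OZONE_ROOT check are assumptions and not the actual testlib code.

```shell
# Sketch only: the function body below is an assumption, not the real testlib code.
regenerate_resources() {
  # OZONE_ROOT is expected to point at the local Ozone build (assumed check).
  if [ -z "${OZONE_ROOT:-}" ]; then
    echo "OZONE_ROOT is not set" >&2
    return 1
  fi

  # Apply the three on-the-fly transformations described above.
  flekszible generate \
    -t mount:hostPath="$OZONE_ROOT",path=/opt/hadoop \
    -t image:image=apache/ozone-runner:20200420-1 \
    -t ozone/onenode
}
```

The flekszible binary has to be on the PATH, and the function has to be called from the directory of the example being tested.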

The line you commented on:

flekszible generate

is 100% optional. After the test, it restores the original state of the files. It can be added to stop_k8s_env (as it can always be added).

But based on the experience with the docker test.sh files, I would prefer to keep such steps as explicit lines in test.sh.

As test.sh files are read frequently but modified only a few times, I tend to make them slightly more verbose but easier to understand (you can assume that stop_k8s_env stops the Kubernetes pods, but I wouldn't like to hide any other functionality there).

But these are just my thoughts. As this is the beginning of a new kind of test, I am really open to changing it in any direction.

Contributor

Thanks for confirming my guess.

I'm fine with making test.sh more verbose at the cost of some duplication. However, I propose extracting this optional flekszible generate to a function, and let it check if it's being run from src dir, to address my first point.
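
Something along these lines is what I have in mind. This is only a sketch: the function name revert_resources and the path pattern used to detect the source tree are hypothetical.

```shell
# Hypothetical helper: the name and the path check are assumptions.
revert_resources() {
  case "$PWD" in
    */src/main/k8s/*)
      # Running from the source tree: restore the original resource files.
      flekszible generate
      ;;
    *)
      # Running from target/...: keep the files that were used by the test,
      # so they can be inspected after a failure.
      echo "Not in the source tree, leaving the generated resources as they are"
      ;;
  esac
}
```

test.sh would still call it explicitly as its last line, so nothing is hidden in stop_k8s_env.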

Member Author

and let it check if it's being run from src dir, to address my first point

I ran tests from target/..., had some errors and tried to find out what's wrong.

I am not sure what the problem was in your case. It is supposed to work from the target directory. In fact, hadoop-ozone/dev-support/checks/kubernetes.sh executes all the tests from target by default.

apache/ozone:0.6.0-SNAPSHOT is used because during the release it becomes apache/ozone:0.6.0, which is the distributed container image.

./test.sh (with the first flekszible generation) replaces all the images with a dev version (ozone-runner + mount).

It seems to be working for me and passed on the GitHub CI, but if you see any error, please let me know, as it should work everywhere (do you have the latest released flekszible?).

Contributor

Let me clarify: I think the script would have worked fine, but I tried to use k3d (k3s in docker) instead of plain k3s. Since the local dir was not available in the container, it failed.

The confusing part was that the resource files did not mention volume mounts, since they were converted back at the end. So only by looking at regenerate_resources in testlib did I realize what's wrong.

Member Author

Yes, that's what came to my mind, too.

Today, it's not possible to run the tests on remote clusters (k3d counts as a remote cluster here), as the local mount doesn't work.

I can introduce some options for that in the future (it would be great to execute the tests on real environments, too).

For now, I will add a log message to regenerate_resources that prints a warning to stdout.
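
Roughly something like this. It is a sketch only: the helper name warn_local_mount and the exact wording of the message are assumptions, not the final code.

```shell
# Hypothetical helper: prints the kind of warning discussed above to stdout.
warn_local_mount() {
  echo "WARNING: resources are regenerated to mount ${OZONE_ROOT:-<unset>} from the local host."
  echo "WARNING: this works only if the Kubernetes node can see that directory (not with k3d or other remote clusters)."
}
```

It would be called at the beginning of regenerate_resources, before the resource files are rewritten.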

hadoop-ozone/dist/src/main/k8s/examples/testlib.bats (outdated, resolved)
.github/workflows/post-commit.yml (outdated, resolved)
@adoroszlai
Contributor

Thanks @elek for updating the patch. It's fine as it is.

@adoroszlai adoroszlai merged commit 30ec0e2 into apache:master Jul 30, 2020
@adoroszlai
Contributor

Thanks @elek for improving it further.

errose28 pushed a commit to errose28/ozone that referenced this pull request Jul 31, 2020
* master: (55 commits)
  HDDS-4052. Remove master/slave terminology from Ozone (apache#1281)
  HDDS-4047. OzoneManager met NPE exception while getServiceList (apache#1277)
  HDDS-3990. Test Kubernetes examples with acceptance tests (apache#1223)
  HDDS-4045. Add more ignore rules to the RAT ignore list (apache#1273)
  HDDS-3970. Enabling TestStorageContainerManager with all failures addressed (apache#1257)
  HDDS-4033. Make the acceptance test reports hierarchical (apache#1263)
  HDDS-3423. Enabling TestContainerReplicationEndToEnd and addressing failures (apache#1260)
  HDDS-4027. Suppress ERROR message when SCM attempt to create additional pipelines. (apache#1265)
  HDDS-4024. Avoid while loop too soon when exception happen (apache#1253)
  HDDS-3809. Make number of open containers on a datanode a function of no of volumes reported by it. (apache#1081)
  HDDS-4019. Show the storageDir while need init om or scm (apache#1248)
  HDDS-3511. Fix javadoc comment in OmMetadataManager (apache#1247)
  HDDS-4041. Ozone /conf endpoint triggers kerberos replay error when SPNEGO is enabled. (apache#1267)
  HDDS-4031. Run shell tests in CI (apache#1261)
  HDDS-4038. Eliminate GitHub check warnings (apache#1268)
  HDDS-4011. Update S3 related documentation. (apache#1245)
  HDDS-4030. Remember the selected columns and make the X-axis scrollable in recon datanodes UI (apache#1259)
  HDDS-4032. Run author check without docker (apache#1262)
  HDDS-4026. Dir rename failed when sets 'ozone.om.enable.filesystem.paths' to true (apache#1256)
  HDDS-4017. Acceptance check may run against wrong commit (apache#1249)
  ...
vivekratnavel pushed a commit to vivekratnavel/hadoop-ozone that referenced this pull request Aug 8, 2020
rakeshadr pushed a commit to rakeshadr/hadoop-ozone that referenced this pull request Sep 3, 2020