HDDS-3990. Test Kubernetes examples with acceptance tests #1223
Thanks @elek for adding this test framework. Looks good overall, but I have a few minor change suggestions.
hadoop-ozone/dist/src/main/k8s/examples/getting-started/test.sh
stop_k8s_env

flekszible generate
If I understand correctly, this flekszible generate call is only needed when executed from the source dir (hadoop-ozone/dist/src/main/k8s) to restore source files. I propose it to be executed as part of stop_k8s_env:
- I ran tests from target/..., had some errors and tried to find out what's wrong. It was confusing to see resource files referencing the non-existent docker image apache/ozone:0.6.0-SNAPSHOT (plus other differences compared to the files actually used for the test).
- Avoid possible omission in new scripts.
- Reduce code duplication.
- Save some very minimal runtime cost.
Here we have a main problem: some of the Kubernetes examples can't be started in the test environment, for example when the example is configured to schedule one datanode per real node but we have only one node.
Therefore, before the tests the resource files are heavily modified:
- instead of using the latest image, the current build is mounted to /opt/ozone (similar to the docker tests)
- anti-affinity rules are removed (to allow running multiple datanodes on the same node)
- real persistence is removed
These are executed by the regenerate_resources step at the beginning of the test, which is something like this:
flekszible generate -t mount:hostPath="$OZONE_ROOT",path=/opt/hadoop -t image:image=apache/ozone-runner:20200420-1 -t ozone/onenode
As you can see we define three new transformations on the fly:
- -t mount:hostPath="$OZONE_ROOT",path=/opt/hadoop --> use the files from the current build
- -t image:image=apache/ozone-runner:20200420-1 --> use the standard runner image
- -t ozone/onenode --> allow scheduling more than one datanode on the same node
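The regeneration step described above could be wrapped roughly as follows. This is only a minimal sketch, not the actual testlib code: the function body and the way the build root is passed in are assumptions, and only the flekszible arguments are taken from the command quoted in this comment.

```shell
#!/usr/bin/env bash

# Hypothetical sketch of a regenerate_resources helper; the real one in
# testlib may look different. Only the flekszible arguments below come
# from the command quoted in the comment.
regenerate_resources() {
  # Root of the current Ozone build (assumed to be passed as the first
  # argument, falling back to the OZONE_ROOT environment variable).
  local ozone_root="${1:-$OZONE_ROOT}"

  # mount:         use the files from the current build
  # image:         use the standard runner image
  # ozone/onenode: allow several datanodes on the same node
  flekszible generate \
    -t mount:hostPath="$ozone_root",path=/opt/hadoop \
    -t image:image=apache/ozone-runner:20200420-1 \
    -t ozone/onenode
}
```

A test.sh would call it once before starting the pods, for example `regenerate_resources "$OZONE_ROOT"`.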
The line you commented on:
flekszible generate
is 100% optional. After the test, it restores the original state of the files. It can be added to stop_k8s_env (as it can always be added).
But based on the experience with the docker test.sh files, I would prefer to use more explicit lines in test.sh.
As test.sh files are read frequently but modified only rarely, I tend to make them slightly more verbose but easier to understand (you can assume that stop_k8s_env stops Kubernetes pods, but I wouldn't like to hide any extra functionality there).
But these are just my thoughts; as this is the beginning of a new test, I am really open to modifying it in any direction.
Thanks for confirming my guess.
I'm fine with making test.sh more verbose at the cost of some duplication. However, I propose extracting this optional flekszible generate to a function, and letting it check if it's being run from the src dir, to address my first point.
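One way the proposed function might look is sketched below. The function name and the path pattern are assumptions made for illustration (they are not from the patch); the only command taken from the discussion is the plain flekszible generate call that restores the original files.

```shell
#!/usr/bin/env bash

# Hypothetical helper (name and path check are assumptions): restore the
# original resource files, but only when the example is run from the
# source tree, where regenerated files would otherwise show up as changes.
revert_resources() {
  case "$PWD" in
    */src/main/k8s/*)
      # A plain "flekszible generate" restores the original files.
      flekszible generate
      ;;
    *)
      # Outside the source tree (e.g. under target/...), keep the files
      # that were actually used for the test to make debugging easier.
      echo "Not in the source tree, leaving generated resources in place"
      ;;
  esac
}
```

Leaving the generated files untouched outside the source tree addresses the first point above: the resource files on disk then match what the test really deployed.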
> and let it check if it's being run from src dir, to address my first point

> I ran tests from target/..., had some errors and tried to find out what's wrong.
I am not sure what the problem was in your case. It is supposed to work from the target directory. In fact hadoop-ozone/dev-support/checks/kubernetes.sh executes all the tests from target by default.
apache/ozone:0.6.0-SNAPSHOT is used because during the release it becomes apache/ozone:0.6.0, which is the distributed container.
./test.sh (with the first flekszible generation) replaces all the images with a dev version (ozone-runner + mount).
It seems to be working for me and passed on the GitHub CI, but if you see any error, please let me know, as it should work everywhere (do you have the latest released flekszible?)
Let me clarify: I think the script would have worked fine, but I tried to use k3d (k3s in docker) instead of plain k3s. Since the local dir was not available in the container, it failed.
The confusing part was that the resource files did not mention volume mounts, since they were converted back at the end. So only by looking at regenerate_resources in testlib did I realize what's wrong.
Yes, that's what came to my mind, too.
Today, it's not possible to run the tests on remote clusters (k3d behaves like a remote cluster here) as the local mount doesn't work.
I can introduce some options in the future (it would be great to execute the tests on real environments, too).
For now, I will add a log message to regenerate_resources to print a warning to stdout.
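Such a warning might look something like the sketch below. The helper name and the wording are assumptions for illustration only and are not the message that ended up in the patch.

```shell
#!/usr/bin/env bash

# Hypothetical warning helper; the name and the text are assumptions.
# It would be called from the regeneration step, before the resources
# are rewritten to mount the local build directory.
warn_local_mount() {
  local ozone_root="$1"
  echo "WARNING: resources are regenerated with a hostPath mount of $ozone_root."
  echo "WARNING: this only works if the Kubernetes node can see that directory (plain k3s on this host, not k3d or a remote cluster)."
}
```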
Thanks @elek for updating the patch. It's fine as it is.
Thanks @elek for improving it further.
* master: (55 commits)
  HDDS-4052. Remove master/slave terminology from Ozone (apache#1281)
  HDDS-4047. OzoneManager met NPE exception while getServiceList (apache#1277)
  HDDS-3990. Test Kubernetes examples with acceptance tests (apache#1223)
  HDDS-4045. Add more ignore rules to the RAT ignore list (apache#1273)
  HDDS-3970. Enabling TestStorageContainerManager with all failures addressed (apache#1257)
  HDDS-4033. Make the acceptance test reports hierarchical (apache#1263)
  HDDS-3423. Enabling TestContainerReplicationEndToEnd and addressing failures (apache#1260)
  HDDS-4027. Suppress ERROR message when SCM attempt to create additional pipelines. (apache#1265)
  HDDS-4024. Avoid while loop too soon when exception happen (apache#1253)
  HDDS-3809. Make number of open containers on a datanode a function of no of volumes reported by it. (apache#1081)
  HDDS-4019. Show the storageDir while need init om or scm (apache#1248)
  HDDS-3511. Fix javadoc comment in OmMetadataManager (apache#1247)
  HDDS-4041. Ozone /conf endpoint triggers kerberos replay error when SPNEGO is enabled. (apache#1267)
  HDDS-4031. Run shell tests in CI (apache#1261)
  HDDS-4038. Eliminate GitHub check warnings (apache#1268)
  HDDS-4011. Update S3 related documentation. (apache#1245)
  HDDS-4030. Remember the selected columns and make the X-axis scrollable in recon datanodes UI (apache#1259)
  HDDS-4032. Run author check without docker (apache#1262)
  HDDS-4026. Dir rename failed when sets 'ozone.om.enable.filesystem.paths' to true (apache#1256)
  HDDS-4017. Acceptance check may run against wrong commit (apache#1249)
  ...
What changes were proposed in this pull request?
The hadoop-ozone/dist/src/main/k8s/examples directory contains example Kubernetes resources to start Ozone in a Kubernetes environment. To make sure those resources are working and up-to-date, I propose to test them during the standard build. The K3s project provides a lightweight Kubernetes distribution which can be installed easily in the GitHub Actions environment, so Kubernetes-based clusters can be tested there.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-3990
How was this patch tested?
The new type of acceptance tests was executed 5 times on my fork and passed.