
HDDS-4525. Replace Hadoop variables and functions in Ozone shell scripts with Ozone-specific ones#1667

Merged
smengcl merged 28 commits into apache:master from adoroszlai:HDDS-4525
Jan 21, 2021

Conversation

@adoroszlai
Contributor

@adoroszlai adoroszlai commented Dec 7, 2020

What changes were proposed in this pull request?

  1. Replace shell functions and variables that were originally copied from Hadoop.
    • Deprecate HADOOP_* variables, but use their values unless the corresponding OZONE_ variable is also defined (which indicates "new code").
    • HADOOP_CONF_DIR is replaced by OZONE_CONFIG_DIR (instead of OZONE_CONF_DIR) to work around an envtoconf limitation: it cannot handle variables whose names match *_CONF_*. (The root cause is fixed in HDDS-4601 instead.)
  2. Drop unmaintained Windows scripts
  3. Drop empty mapreduce, yarn, etc. directories in the final artifact; they are no longer necessary.
  4. Fix wrong accumulation of return codes in start-ozone.sh: the code was incremented for the first and third commands but unconditionally set for the second, so the result of the first command was lost. It should be set for the first command instead, and incremented for the later ones.
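The deprecation fallback in item 1 can be sketched roughly as below. The function name and messages are illustrative; the real implementation lives in ozone-functions.sh and may differ in details.

```shell
# Hedged sketch of the HADOOP_* -> OZONE_* fallback described above.
# If only the deprecated variable is set, adopt its value; if the new
# OZONE_ variable is also set, it wins and the old one is ignored.
deprecate_envvar() {
  local oldvar=$1
  local newvar=$2
  if [[ -n "${!oldvar}" ]]; then
    if [[ -n "${!newvar}" ]]; then
      echo "WARNING: ${oldvar} is deprecated, using ${newvar} instead." >&2
    else
      echo "WARNING: ${oldvar} is deprecated, please use ${newvar}." >&2
      declare -g "${newvar}=${!oldvar}"
    fi
  fi
}
```

For example, with only HADOOP_OPTS set, calling `deprecate_envvar HADOOP_OPTS OZONE_OPTS` would populate OZONE_OPTS from it; with both set, OZONE_OPTS keeps its own value.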

https://issues.apache.org/jira/browse/HDDS-4525
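The return-code issue from item 4 boils down to the pattern below; the commands are stand-ins for the real daemon-start invocations in start-ozone.sh, not its actual contents.

```shell
# Sketch of the fix: set the exit code from the *first* command, then
# accumulate for the later ones, so no command's failure is lost.
# (Previously the code was incremented first and then overwritten by
# the second command, discarding the first result.)
run_first()  { return 1; }   # stand-in: pretend the first daemon fails
run_second() { return 0; }
run_third()  { return 0; }

run_first
ret=$?             # set from the first command
run_second
(( ret += $? ))    # accumulate from here on
run_third
(( ret += $? ))
echo "combined exit code: ${ret}"
```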

How was this patch tested?

Added:

  • Bats test for the function that deprecates a variable
  • acceptance test for compatibility with HDFS_OM_OPTS etc. variables
  • acceptance tests for ozone classpath and ozone envvars commands

Changed MR acceptance tests to globally define both HADOOP_CLASSPATH and OZONE_CLASSPATH. This confirms that the shaded Ozone FS jar being present in HADOOP_CLASSPATH does not break Ozone startup. Previously HADOOP_CLASSPATH had to be limited to the Hadoop containers.

Regular CI:
https://github.com/adoroszlai/hadoop-ozone/actions/runs/405676505

Manually tested start-ozone.sh and stop-ozone.sh in ozonescripts environment (see HDDS-4556 for automating the smoketest).

@adoroszlai adoroszlai self-assigned this Dec 7, 2020
@swagle swagle requested review from elek and fapifta December 7, 2020 16:39
@adoroszlai adoroszlai marked this pull request as ready for review December 14, 2020 20:01
@fapifta
Contributor

fapifta commented Dec 15, 2020

Hi @adoroszlai, thank you for working on this change, please find a few comments from me inline.

I have a concern about OZONE_CONFIG_DIR: we break a convention here, as not just Hadoop but also HBase and Hive, for example, use COMPONENT_CONF_DIR to specify the config dir.
Is it possible to fix the envtoconf behaviour instead of changing the name and breaking the convention? I see that OZONE_CONFIG_DIR is added to the docker-config files, and that's how envtoconf came into the picture. Previously we did not need HADOOP_CONF_DIR there, so why do we need OZONE_CONFIG_DIR in the docker-config files now?

. "${HADOOP_LIBEXEC_DIR}/hadoop-config.sh"
else
echo "ERROR: Cannot execute ${HADOOP_LIBEXEC_DIR}/hadoop-config.sh." 2>&1
if ! ozone_bootstrap; then
Contributor


This seems like a strange one...
So at line 45, if ozone_bootstrap is not declared, we error out because ozone-functions.sh could not be loaded.

As I understand it, here we check the exit status of the ozone_bootstrap function, and if it is false we exit because we cannot find ozone-config.sh. Why do we need this second check? As far as I can see, the ozone_bootstrap function does not do anything that should fail, but maybe my eye slipped over something.

We do the same in hadoop-ozone/dist/src/shell/ozone/ozone, in hadoop-ozone/dist/src/shell/ozone/start-ozone.sh and in hadoop-ozone/dist/src/shell/ozone/stop-ozone.sh as well

@adoroszlai
Contributor Author

Is it possible to fix the envtoconf behaviour instead of changing the name and breaking the convention? I see that OZONE_CONFIG_DIR is added to the docker-config files, and that's how envtoconf came into the picture. Previously we did not need HADOOP_CONF_DIR there, so why do we need OZONE_CONFIG_DIR in the docker-config files now?

HADOOP_CONF_DIR is already defined in the docker images we use, hence it's not needed in docker-config, e.g.:

$ docker run -it --rm apache/ozone-runner:20200625-1 env | grep HADOOP
HADOOP_LOG_DIR=/var/log/hadoop
HADOOP_CONF_DIR=/etc/hadoop

Also, HADOOP_CONF_DIR is an explicitly defined exception in envtoconf:

self.excluded_envs = ['HADOOP_CONF_DIR']

We could fix this script, but the fix would only apply to ozone-runner containers, since that image uses mounted Ozone binaries. Both ozone and hadoop containers (used in upgrade, ozone-mr, etc. tests) come with envtoconf.py baked in.

$ docker run -it --rm apache/hadoop:3 grep CONF_DIR /opt/envtoconf.py
    self.excluded_envs = ['HADOOP_CONF_DIR']
$ docker run -it --rm apache/ozone:1.0.0 grep CONF_DIR /opt/hadoop/libexec/envtoconf.py
    self.excluded_envs = ['HADOOP_CONF_DIR']

So we have 3 options:

  1. proper fix: update envtoconf in both Ozone and Hadoop and build new Docker images (apache/ozone-runner, apache/hadoop, apache/ozone, the latter two for multiple versions)
  2. hack: modify all docker-compose.yaml files to mount the fixed version of envtoconf
  3. workaround: use different variable name

@fapifta
Contributor

fapifta commented Dec 15, 2020

Hi Attila, thank you for addressing my review comments, and for going further and finding way more cases than I did!

Regarding the CONF_DIR environment variables, I would say we should fix it properly. I understand that updating the docker images would be a pain and a bit of a tedious project, and this task is already tedious.

On the other hand, we have learned that envtoconf.py may change over time as requirements change, so I don't think option two is hacky. It is hacky in the sense that we overwrite a file baked into the docker images, but on the other hand, I would argue that the file should not be in the docker image itself: even if this does not happen often, the file changes from time to time, so we should preserve the ability to change it easily.
What do you think?

Of course, a properly done option two would require modifying all the images to leave out envtoconf.py, and modifying all the docker-compose files to mount the file into the image, which is even more tedious.
What if, as part of this JIRA, you add the mount to the docker-compose files, and we create a follow-up JIRA to remove the file from the containers?
In return I can volunteer to do the docker image update, though I might need some help with it, but I really don't want to break this convention ;)

@codecov-io

codecov-io commented Dec 16, 2020

Codecov Report

Merging #1667 (dfd2aaf) into master (74315ac) will increase coverage by 0.06%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #1667      +/-   ##
============================================
+ Coverage     75.70%   75.76%   +0.06%     
- Complexity    11301    11310       +9     
============================================
  Files          1048     1048              
  Lines         53898    53836      -62     
  Branches       5324     5307      -17     
============================================
- Hits          40801    40787      -14     
+ Misses        10564    10531      -33     
+ Partials       2533     2518      -15     
Impacted Files Coverage Δ Complexity Δ
.../hadoop/ozone/s3/endpoint/MultiDeleteResponse.java 25.00% <0.00%> (-20.00%) 4.00% <0.00%> (-1.00%)
...hdds/scm/container/common/helpers/ExcludeList.java 86.95% <0.00%> (-13.05%) 19.00% <0.00%> (-3.00%)
...e/hadoop/ozone/s3/endpoint/MultiDeleteRequest.java 65.00% <0.00%> (-10.00%) 4.00% <0.00%> (-1.00%)
.../apache/hadoop/ozone/s3/endpoint/EndpointBase.java 64.10% <0.00%> (-7.90%) 10.00% <0.00%> (-4.00%)
...doop/hdds/scm/container/ContainerStateManager.java 81.67% <0.00%> (-6.88%) 32.00% <0.00%> (-3.00%)
...e/commandhandler/CloseContainerCommandHandler.java 82.45% <0.00%> (-3.51%) 11.00% <0.00%> (ø%)
...ent/algorithms/SCMContainerPlacementRackAware.java 76.69% <0.00%> (-3.01%) 31.00% <0.00%> (-2.00%)
...hadoop/hdds/scm/container/SCMContainerManager.java 72.60% <0.00%> (-1.83%) 40.00% <0.00%> (-1.00%)
.../java/org/apache/hadoop/ozone/debug/DBScanner.java 74.35% <0.00%> (-0.65%) 18.00% <0.00%> (ø%)
...mon/transport/server/ratis/XceiverServerRatis.java 86.97% <0.00%> (-0.27%) 64.00% <0.00%> (+1.00%) ⬇️
... and 34 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 74315ac...a70def9. Read the comment docs.

@fapifta
Contributor

fapifta commented Dec 16, 2020

Hi @adoroszlai, thank you for addressing my concern and reverting back to OZONE_CONF_DIR.

Let me ask, as I did not have time to fully go through and understand the transformation.py usage, how it affects the environment processing that is problematic with env vars whose names contain CONF, and what exactly happens as a consequence of your change?
Now I am guessing, but based on your change, envtoconf.py uses transformation.py, and by adding it to the docker-compose.yaml file we effectively override the original transformation.py that is baked into the image(?).
The change itself in transformation.py, iterating over the items instead of the collection, means that we transform something different in those cases, similarly to the one case you mentioned that worked well before this change too; I am not sure, though, how this solves the problem.
Can you please give me some pointers on where to look, or point out the difference for me, so I can more easily understand what is happening?

@adoroszlai
Contributor Author

Thanks @fapifta for taking another look, and also your initial review and nudging me to fix it properly.

Now I am guessing, but based on your change, envtoconf.py uses transformation.py, and by adding it to the docker-compose.yaml file we effectively override the original transformation.py that is baked into the image(?).

Exactly. This change is necessary only for apache/ozone and apache/hadoop images, since apache/ozone-runner requires Ozone binaries to be mounted anyway. We can remove these mounts if/when the images are rebuilt with fixed envtoconf.

Answered the other part of your question about transformation.py in the separate PR where this is being fixed (it should be merged before this one).

Contributor

@fapifta fapifta left a comment


After discussing what happens during env-to-conf conversion with the CONF environment variables, it is now clear how the changes solve that problem. (Files might get generated from the environment variables, but those files are not used by anything, so it is fine to leave them there as garbage.)

I went through the changes again, I haven't spotted anything problematic, I am +1 to commit it.

Contributor

@smengcl smengcl left a comment


Thanks @adoroszlai for working on this. Overall looks pretty good to me.

Do we have a HADOOP_ROOT_LOGGER alternative here?

Also do we take in HADOOP_OPTS when OZONE_OPTS is not set? IIRC some CM configs may still append to HADOOP_OPTS at the moment.

Also I think it would be a good idea to put a small table in the wiki or somewhere to show the Ozone equivalents of Hadoop environment variables. It should be very helpful for admins moving from HDFS to Ozone. To start:

| Hadoop / HDFS | Ozone | Note |
| --- | --- | --- |
| HADOOP_CONF_DIR | OZONE_CONF_DIR | When OZONE_CONF_DIR is not set, HADOOP_CONF_DIR will be used; when OZONE_CONF_DIR is set, HADOOP_CONF_DIR is ignored. |
| HADOOP_OPTS | OZONE_OPTS | |


ozone_add_default_gc_opts

[[ ! "$OZONE_OPTS" =~ "UseConcMarkSweepGC" ]]
Contributor


Unrelated to this patch, but CMS is removed in JDK 14 and later, and UseConcMarkSweepGC will be ignored by those JVMs. We might want to come up with a new set of GC params soon.
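A guard like the one quoted above can be illustrated with a small sketch; the function name and the default option here are illustrative, not the actual ozone_add_default_gc_opts implementation.

```shell
# Only append the default GC option when the user has not already
# chosen that collector via OZONE_OPTS; calling it twice is a no-op.
add_default_gc_opts() {
  if [[ ! "${OZONE_OPTS}" =~ "UseConcMarkSweepGC" ]]; then
    OZONE_OPTS="${OZONE_OPTS} -XX:+UseConcMarkSweepGC"
  fi
}

OZONE_OPTS="-Xmx2g"
add_default_gc_opts
echo "${OZONE_OPTS}"   # -Xmx2g -XX:+UseConcMarkSweepGC
```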

@adoroszlai
Contributor Author

Thanks @smengcl for the review.

Do we have a HADOOP_ROOT_LOGGER alternative here?

Yes, OZONE_ROOT_LOGGER. Note that the Java property is still hadoop.root.logger (since I didn't find a way to deprecate these).
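To illustrate the mapping (a sketch only; the actual wiring in the scripts may differ): the shell variable is Ozone-specific, while the system property handed to the JVM keeps the Hadoop name.

```shell
# OZONE_ROOT_LOGGER overrides the log level/appender, but it is passed
# to the JVM as the unchanged hadoop.root.logger property.
OZONE_ROOT_LOGGER="DEBUG,console"
java_prop="-Dhadoop.root.logger=${OZONE_ROOT_LOGGER}"
echo "${java_prop}"   # -Dhadoop.root.logger=DEBUG,console
```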

Contributor

@smengcl smengcl left a comment


+1. Pending CI

@smengcl
Contributor

smengcl commented Jan 21, 2021

Thanks @adoroszlai for rechecking. Looks like we got a good run. Will merge this shortly.

@smengcl smengcl merged commit bc3e3e5 into apache:master Jan 21, 2021
@adoroszlai adoroszlai deleted the HDDS-4525 branch January 22, 2021 08:33
@adoroszlai
Contributor Author

Thanks @fapifta and @smengcl for the reviews, and @smengcl for merging it.
