Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[improve][build] Add a default username in the image #21695

Merged
merged 6 commits into from
Dec 18, 2023

Conversation

zymap
Copy link
Member

@zymap zymap commented Dec 8, 2023

Fixes #xyz

Main Issue: #xyz

PIP: #xyz

Motivation

Add a default username in the pulsar image. When using HDFS offloader, it requires a username to transfer the file.

Modifications

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

---

### Motivation

Add a default username in the pulsar image. When using HDFS
offloader, it requires a username to transfer the file.
Copy link

github-actions bot commented Dec 8, 2023

@zymap Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

@zymap zymap added this to the 3.2.0 milestone Dec 8, 2023
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Dec 8, 2023
@zymap zymap requested a review from lhotari December 8, 2023 09:28
@nodece
Copy link
Member

nodece commented Dec 8, 2023

Could you use the ARG to define the default user?

@michaeljmarshall
Copy link
Member

Add a default username in the pulsar image. When using HDFS offloader, it requires a username to transfer the file.

Can you provide more information about this requirement? Given that OpenShift doesn't use the username, it means the HDFS offloader won't work correctly there.

@lhotari
Copy link
Member

lhotari commented Dec 8, 2023

Add a default username in the pulsar image. When using HDFS offloader, it requires a username to transfer the file.

How about using HADOOP_USER_NAME instead?

@zymap
Copy link
Member Author

zymap commented Dec 14, 2023

Add a default username in the pulsar image. When using HDFS offloader, it requires a username to transfer the file.

Can you provide more information about this requirement? Given that OpenShift doesn't use the username, it means the HDFS offloader won't work correctly there.

@michaeljmarshall
When starting the pulsar with hdfs offloader, it will fail

2023-12-14T00:59:34,063+0000 [main] ERROR org.apache.pulsar.broker.PulsarService - Failed to start Pulsar service: org.apache.pulsar.broker.PulsarServerException: failure to login: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name

@zymap
Copy link
Member Author

zymap commented Dec 14, 2023

Add a default username in the pulsar image. When using HDFS offloader, it requires a username to transfer the file.

How about using HADOOP_USER_NAME instead?

@lhotari Seems it doesn't work... I think It should be okay to add a username for the existing user ID. Do you have any concerns?

@zymap
Copy link
Member Author

zymap commented Dec 14, 2023

/pulsarbot rerun-failure-checks

@zymap
Copy link
Member Author

zymap commented Dec 14, 2023

The OpenShift does not mention the username.

Lastly, the final USER declaration in the Dockerfile should specify the user ID (numeric value) and not the user name. This allows OpenShift Container Platform to validate the authority the image is attempting to run with and prevent running images that are trying to run as root, because running containers as a privileged user exposes potential security holes. If the image does not specify a USER, it inherits the USER from the parent image.

And I also tried the postgres images they mentioned here https://docs.openshift.com/container-platform/3.11/using_images/db_images/postgresql.html. It also has the username

(⎈|minikube:default):apache-pulsar-3.0.1.8/ $ docker run -it centos/postgresql-94-centos7:latest bash                      [9:57:14]
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
bash-4.2$ whoami
postgres

So it should be okay to add the username by default?

@lhotari @michaeljmarshall

@yuweisung
Copy link

yuweisung commented Dec 15, 2023

I think openshift doesn't like uid/gid in docker image. Openshift uses the uid/gid to map SELinux context to different syscap on the host.
The standard "rootless" image is basically put the workdir owner to non-root.
So

adduser pulsar
WORKDIR /pulsar
USER pulsar 

Should work in openshift.

https://www.redhat.com/en/blog/a-guide-to-openshift-and-uids

@lhotari
Copy link
Member

lhotari commented Dec 15, 2023

I think openshift doesn't like uid/gid in docker image. Openshift uses the uid/gid to map SELinux context to different syscap on the host. The standard "rootless" image is basically put the workdir owner to non-root. So

adduser pulsar
WORKDIR /pulsar
USER pulsar 

@yuweisung OpenShift ignores the UID in the image. In Openshift, the container user is a member of the root group and that's the only thing that matters. This is explained in https://www.redhat.com/en/blog/a-guide-to-openshift-and-uids .

The Container user is always a member of the root group, so it can read or write files accessible by GID=0. Any command invoked by the Entrypoint will be executed with this unprivileged UID and GID pair. That means, it is an unprivileged user executing the commands and the UID that will be used during execution is not known in advance. From the technical design perspective, that means, directories and files that may be written to by processes in the Container should be owned by the root group and be read/writable by GID=0. Files to be executed should also have group execute permissions.

@lhotari
Copy link
Member

lhotari commented Dec 15, 2023

So it should be okay to add the username by default?

@zymap Yes, it's fine to have a username for the existing user ID. Technically having a username only means that there's a line in the /etc/passwd file that matchers the UID. It's not more than that. The changes in this PR LGTM now.

@zymap
Copy link
Member Author

zymap commented Dec 15, 2023

@lhotari Thank you for your fix! I pushed failed, then found you already fixed it.

@lhotari
Copy link
Member

lhotari commented Dec 15, 2023

Add a default username in the pulsar image. When using HDFS offloader, it requires a username to transfer the file.

How about using HADOOP_USER_NAME instead?

@lhotari Seems it doesn't work... I think It should be okay to add a username for the existing user ID. Do you have any concerns?

@zymap The concern was already raised by @michaeljmarshall when he said "Given that OpenShift doesn't use the username, it means the HDFS offloader won't work correctly there.".

How did you set HADOOP_USER_NAME and how did you test it? It would need to work if there's a need to support OpenShift and other container environments that use arbitrary user ids.

@lhotari
Copy link
Member

lhotari commented Dec 15, 2023

@zymap This is how the user name is set in Hadoop Java library: https://stackoverflow.com/a/16788971

@zymap
Copy link
Member Author

zymap commented Dec 18, 2023

@zymap zymap merged commit d5f0097 into apache:master Dec 18, 2023
47 of 48 checks passed
@lhotari
Copy link
Member

lhotari commented Dec 18, 2023

Add a default username in the pulsar image. When using HDFS offloader, it requires a username to transfer the file.

How about using HADOOP_USER_NAME instead?

@lhotari Seems it doesn't work... I think It should be okay to add a username for the existing user ID. Do you have any concerns?

@zymap The concern was already raised by @michaeljmarshall when he said "Given that OpenShift doesn't use the username, it means the HDFS offloader won't work correctly there.".

How did you set HADOOP_USER_NAME and how did you test it? It would need to work if there's a need to support OpenShift and other container environments that use arbitrary user ids.

@zymap this issue isn't resolved. The solution in this PR doesn't work on OpenShift.

@zymap
Copy link
Member Author

zymap commented Dec 19, 2023

@lhotari
For this PR I just want to add a default username by default, I can start the pulsar hdfs offloader in the container without the error invalid null input: name.
For the OpenShift, it looks like more related to how to configure the Hadoop users on the OpenShift. Because OpenShift uses a random user ID. We should have some documents for that, but I am not familiar with the OpenShift very much. We need a volunteer with experience in deploying it on OpenShift to provide a guide.

@lhotari
Copy link
Member

lhotari commented Dec 19, 2023

@lhotari For this PR I just want to add a default username by default, I can start the pulsar hdfs offloader in the container without the error invalid null input: name. For the OpenShift, it looks like more related to how to configure the Hadoop users on the OpenShift. Because OpenShift uses a random user ID. We should have some documents for that, but I am not familiar with the OpenShift very much. We need a volunteer with experience in deploying it on OpenShift to provide a guide.

@zymap did you look at https://stackoverflow.com/a/16788971 ? Instead of relying on the OS user, the Java API of the Hadoop library should be used to configure the remote user.

@zymap
Copy link
Member Author

zymap commented Dec 20, 2023

Oh. I misunderstood. Let me try to set it in the code to see if it works.

zymap added a commit that referenced this pull request Jan 16, 2024
### Motivation

Add a default username in the pulsar image. When using HDFS offloader, it requires a username to transfer the file.

(cherry picked from commit d5f0097)
zymap added a commit that referenced this pull request Jan 22, 2024
### Motivation

Add a default username in the pulsar image. When using HDFS offloader, it requires a username to transfer the file.

(cherry picked from commit d5f0097)
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Mar 1, 2024
### Motivation

Add a default username in the pulsar image. When using HDFS offloader, it requires a username to transfer the file.

(cherry picked from commit d5f0097)
(cherry picked from commit c92c485)
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Mar 6, 2024
### Motivation

Add a default username in the pulsar image. When using HDFS offloader, it requires a username to transfer the file.

(cherry picked from commit d5f0097)
(cherry picked from commit c92c485)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants