Add docker container for druid #6896

Merged 11 commits on Feb 8, 2019

donbowman
Contributor

This container is an 'omnibus' (since the various services overlap so heavily). It includes all contrib extensions as well as the MySQL connector.

It is intended to be run as `docker run NAME SERVICE` (e.g. `docker run druid:latest broker`).

Signed-off-by: Don Bowman <db@donbowman.ca>
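
As a minimal sketch of the `docker run NAME SERVICE` convention described above, a small wrapper could look like the following. The function name `start_druid_service`, the `DRUID_IMAGE` variable, and the `druid-SERVICE` container naming are illustrative assumptions, not part of this PR; `broker` is the only service name given in the description.

```shell
# Image name is an assumption; use whatever tag you passed to `docker build -t`.
DRUID_IMAGE="${DRUID_IMAGE:-druid:latest}"

# start_druid_service SERVICE
# Runs one Druid service from the omnibus image. The service name is passed
# as the container's command argument, as in `docker run druid:latest broker`.
start_druid_service() {
    service="$1"
    if [ -z "$service" ]; then
        echo "usage: start_druid_service SERVICE" >&2
        return 1
    fi
    # -d detaches the container; --name gives it a predictable (hypothetical) name.
    docker run -d --name "druid-$service" "$DRUID_IMAGE" "$service"
}

# Example: start_druid_service broker
```

Because every service comes from the same image, starting a different service only changes the final argument.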

@donbowman donbowman force-pushed the add-container-build branch 2 times, most recently from e7d091b to 06994b2 Compare January 22, 2019 13:24
Contributor

@drcrallen drcrallen left a comment

This is looking pretty good @donbowman

Would you mind asking on the dev list whether scripts that download GPL resources are OK in the repository?

&& ln -s /opt/apache-druid-${VER} /opt/druid

RUN cd /opt/druid/extensions/mysql-metadata-storage \
&& wget -O mysql-connector-java-5.1.38.jar http://central.maven.org/maven2/mysql/mysql-connector-java/5.1.38/mysql-connector-java-5.1.38.jar \
Contributor

This might cause issues. Not sure what the limits of GPL stuff are, but having a script to download GPL components may or may not fly with Apache guidelines, will need confirmation.

Contributor

If this can be added as another (optional) layer that will probably help things out

Contributor Author

It is its own layer, but I don't really think it makes any difference. I'm not sure how I would make it optional: a separate build arg? A separate Dockerfile?
The Linux components (e.g. bash) are already GPL, so it's not possible to have a container that contains no GPL code.
And the GPL allows aggregation like this without the license attaching to the rest of the image.
The other Apache containers (e.g. tomcat, httpd, maven) all have GPL components in them.
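
For illustration only, the "separate build arg" option raised above could be sketched as the shell logic below, which a `RUN` step might execute. `INCLUDE_MYSQL_CONNECTOR` and `fetch_mysql_connector` are hypothetical names that do not appear in this PR; the jar URL and target directory are copied from the Dockerfile lines under review.

```shell
# Hypothetical switch: INCLUDE_MYSQL_CONNECTOR is not defined anywhere in this PR.
# In a Dockerfile it could be declared with `ARG INCLUDE_MYSQL_CONNECTOR=true`
# ahead of the RUN step that currently calls wget.
MYSQL_JAR=mysql-connector-java-5.1.38.jar
MYSQL_URL="http://central.maven.org/maven2/mysql/mysql-connector-java/5.1.38/$MYSQL_JAR"
EXT_DIR=/opt/druid/extensions/mysql-metadata-storage

# Download the connector unless the switch is set to something other than "true".
fetch_mysql_connector() {
    if [ "${INCLUDE_MYSQL_CONNECTOR:-true}" != "true" ]; then
        echo "skipping MySQL connector download"
        return 0
    fi
    wget -O "$EXT_DIR/$MYSQL_JAR" "$MYSQL_URL"
}
```

With such an argument declared, a build without the connector would be requested via `docker build --build-arg INCLUDE_MYSQL_CONNECTOR=false ...`. As the comment above argues, this changes only what ends up in the image, not the licensing question itself.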

COPY . /src
WORKDIR /src
RUN mvn install -ff -DskipTests -Dforbiddenapis.skip=true -Pdist -Pbundle-contrib-exts
Contributor

(minor)

You can use cache mounts to make this go faster.

Something like

RUN --mount=type=cache,id=druid-m2-repo,target=/root/.m2/repository mvn install -ff -DskipTests -Dforbiddenapis.skip=true -Pdist -Pbundle-contrib-exts -f /src/pom.xml

Contributor Author

I'd prefer not to do this. It's extremely new (released in Docker 18.09), so it will cause trouble in, e.g., CI. It also appears not to work universally with private registries:
moby/moby#32507
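
To make the trade-off concrete, here is a rough sketch of what opting in to the suggested cache mount would involve on the build side. It assumes Docker 18.09 or later, where BuildKit is enabled per invocation through the `DOCKER_BUILDKIT=1` environment variable, and where, to the best of my knowledge, `RUN --mount` also required `# syntax=docker/dockerfile:experimental` as the first line of the Dockerfile. The function name `build_druid_image` is hypothetical.

```shell
# build_druid_image [TAG]
# Builds the image with BuildKit enabled so that a
# `RUN --mount=type=cache,...` instruction in the Dockerfile is accepted.
# Run from the repository root, since the build context is ".".
build_druid_image() {
    tag="${1:-druid:latest}"
    DOCKER_BUILDKIT=1 docker build -t "$tag" -f distribution/docker/Dockerfile .
}

# Example: build_druid_image druid:0.14.0
```

Any CI runner on an older Docker release would reject the `--mount` flag outright, which is the compatibility concern raised in the reply above.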

@donbowman
Contributor Author

This is looking pretty good @donbowman

Would you mind asking on the dev list whether scripts that download GPL resources are OK in the repository?

Done. I'll watch for responses.

- Remove server/target in favour of target for dockerignore
- Remove cd from wget line in favour of -O
- Add full path in sha256sums given above wget
- Remove 'cd' in .sh script
@donbowman
Contributor Author

OK, the above are all addressed:

  • posted to the mailing list about the script that fetches a GPL file
  • I don't recommend making BuildKit a requirement, given it's not available in CI and has issues with private repos

@drcrallen
Contributor

drcrallen commented Jan 24, 2019

I really like this as a way to get experimental clusters up quickly. I would think that the course of adoption would go something like:

  1. Use this docker build method to get an example image built
  2. Run a docker-compose or helm chart or similar to spin up a quick cluster (maybe we can have one in a future pr? proof-of-concept Superset + Druid in a single command would rock). If it were a Mesos framework it could be a full Apache stack :)
  3. Tinker around with taking in some toy data
  4. Show your boss how awesome the tech is
  5. Get a team together to productionize the build / deploy / log / monitor the stuff in whatever way your company does such things.
  6. Profit from awesome interactive analytics
  7. Hire more developers
  8. Get them to be official ASF contributors

This is a great building block for such a funnel.

@donbowman
Contributor Author

I really like this as a way to get experimental clusters up quickly. I would think that the course of adoption would go something like:

  1. Use this docker build method to get an example image built
  2. Run a docker-compose or helm chart or similar to spin up a quick cluster (maybe we can have one in a future pr? proof-of-concept Superset + Druid in a single command would rock). If it were a Mesos framework it could be a full Apache stack :)
  3. Tinker around with taking in some toy data
  4. Show your boss how awesome the tech is
  5. Get a team together to productionize the build / deploy / log / monitor the stuff in whatever way your company does such things.
  6. Profit from awesome interactive analytics
  7. Hire more developers
  8. Get them to be official ASF contributors

This is a great building block for such a funnel.

I have the docker-compose setup with sample data as the next PR. I can easily add Superset to it, given I use it too.
6, 7, and 8 are up to you :)

@maver1ck
Contributor

Re 2):
When this image is ready, I'll move my Druid chart to it: helm/charts#9314

Some environments (e.g. Kubernetes Deployments) don't resolve
hostname to IP.
The 32-bit uclibc busybox does not support 64-bit inodes
(see GoogleContainerTools/distroless#225)

Signed-off-by: Don Bowman <don@agilicus.com>
Signed-off-by: Don Bowman <don@agilicus.com>
@donbowman
Contributor Author

there have been no comments on the mailing list.

@donbowman
Contributor Author

I'm not sure what else is needed of me here.
No comments received.
I've made the requested changes.

@drcrallen
Contributor

For posterity, there is a discussion on the incubator list about Docker releases (only tangentially related):

http://mail-archives.apache.org/mod_mbox/incubator-general/201902.mbox/%3CCAGaRif1NXV4ZmakmR3bhGS-UvkOf%3DmJJg3HOKCTEKtaYBW3NDQ%40mail.gmail.com%3E

The discussion on the Druid list seems to have raised no specific objections and includes some pretty good counterexamples:

https://lists.apache.org/thread.html/ed1483e138685207c45a65b22ecc5e83a0236edd285aeb393318b052@%3Cdev.druid.apache.org%3E

@drcrallen
Contributor

The TeamCity issues look unrelated.

@donbowman
Contributor Author

donbowman commented Feb 7, 2019

I have added a sample docker-compose and environment file. This gets a new local environment up in about 30 seconds.

This completes the development work; I will address any issues that come up.

Signed-off-by: Don Bowman <don@agilicus.com>
This works around issue apache#3770
apache#3770

Signed-off-by: Don Bowman <don@agilicus.com>
Signed-off-by: Don Bowman <don@agilicus.com>
@donbowman donbowman closed this Feb 7, 2019
@donbowman donbowman reopened this Feb 7, 2019
@donbowman donbowman closed this Feb 7, 2019
@donbowman donbowman reopened this Feb 7, 2019
@dylwylie dylwylie merged commit b3dcbe7 into apache:master Feb 8, 2019
@maver1ck
Contributor

maver1ck commented Feb 8, 2019

Are we planning to automatically build those images and push them to Docker Hub?

@donbowman
Contributor Author

@maver1ck see discussion on dev@druid.apache.org.

My vote is obviously yes.

@rae89

rae89 commented Mar 25, 2019

I am having issues building the Docker image using the command in the Docker README: `docker build -t druid:0.14.0 -f distribution/docker/Dockerfile .`

Here is the build failure output:

[INFO] druid-time-min-max ................................. SUCCESS [ 2.133 s]
[INFO] druid-google-extensions ............................ SUCCESS [ 6.508 s]
[INFO] druid-virtual-columns .............................. SUCCESS [ 2.119 s]
[INFO] druid-thrift-extensions ............................ SUCCESS [ 9.566 s]
[INFO] ambari-metrics-emitter ............................. FAILURE [ 2.370 s]
[INFO] sqlserver-metadata-storage ......................... SKIPPED
[INFO] kafka-emitter ...................................... SKIPPED
[INFO] druid-redis-cache .................................. SKIPPED
[INFO] druid-opentsdb-emitter ............................. SKIPPED
[INFO] materialized-view-maintenance ...................... SKIPPED
[INFO] materialized-view-selection ........................ SKIPPED
[INFO] druid-momentsketch ................................. SKIPPED
[INFO] distribution ....................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10:02 min
[INFO] Finished at: 2019-03-25T23:08:02Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project ambari-metrics-emitter: Could not resolve dependencies for project org.apache.druid.extensions.contrib:ambari-metrics-emitter:jar:0.15.0-incubating-SNAPSHOT: Could not transfer artifact org.apache.ambari:ambari-metrics-common:jar:2.4.1.0.22 from/to hortonworks (http://repo.hortonworks.com/content/repositories/releases): Failed to transfer file http://repo.hortonworks.com/content/repositories/releases/org/apache/ambari/ambari-metrics-common/2.4.1.0.22/ambari-metrics-common-2.4.1.0.22.jar with status code 503 -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :ambari-metrics-emitter
The command '/bin/sh -c mvn install -ff -DskipTests -Dforbiddenapis.skip=true -Pdist -Pbundle-contrib-exts' returned a non-zero code: 1

@donbowman
Contributor Author

donbowman commented Mar 26, 2019 via email

@rae89

rae89 commented Mar 26, 2019

I still get the error

[ERROR] Failed to execute goal on project ambari-metrics-emitter: Could not resolve dependencies for project org.apache.druid.extensions.contrib:ambari-metrics-emitter:jar:0.15.0-incubating-SNAPSHOT: Could not transfer artifact org.apache.ambari:ambari-metrics-common:jar:2.4.1.0.22 from/to hortonworks (http://repo.hortonworks.com/content/repositories/releases): Failed to transfer file http://repo.hortonworks.com/content/repositories/releases/org/apache/ambari/ambari-metrics-common/2.4.1.0.22/ambari-metrics-common-2.4.1.0.22.jar with status code 503

I also posted here, since it might be related to that closed issue:
#4537

@donbowman
Contributor Author

I still get the error

[ERROR] Failed to execute goal on project ambari-metrics-emitter: Could not resolve dependencies for project org.apache.druid.extensions.contrib:ambari-metrics-emitter:jar:0.15.0-incubating-SNAPSHOT: Could not transfer artifact org.apache.ambari:ambari-metrics-common:jar:2.4.1.0.22 from/to hortonworks (http://repo.hortonworks.com/content/repositories/releases): Failed to transfer file http://repo.hortonworks.com/content/repositories/releases/org/apache/ambari/ambari-metrics-common/2.4.1.0.22/ambari-metrics-common-2.4.1.0.22.jar with status code 503

I also posted here, since it might be related to that closed issue:
#4537

Is it possible you have a proxy or firewall in the path?
I can download the referenced file without the 503 you get.
Maybe a cache?

@donbowman donbowman deleted the add-container-build branch March 28, 2019 01:04
@rae89

rae89 commented Mar 28, 2019

After finally getting it to build and run, I was wondering how one loads the Wikipedia tutorial data described in the "loading a file" tutorial in the Druid docs: http://druid.io/docs/latest/tutorials/tutorial-batch.html

The overlord container does not seem to have bash or Python installed, which are needed to run the `bin/post-index-task` script.

Or am I missing something here?

@donbowman
Contributor Author

You run those from your host, outside the container, pointing at the ports you exposed.
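
As a hedged sketch of what running it from the host can look like, the ingestion spec can be posted straight to the overlord's task endpoint with `curl`, with no need for bash or Python inside the container. The address `http://localhost:8090` assumes the overlord's default port is published unchanged on the host; adjust it to whatever port you actually exposed. The function name `submit_index_task` is hypothetical.

```shell
# Overlord address is an assumption: it depends on the host port you mapped.
OVERLORD_URL="${OVERLORD_URL:-http://localhost:8090}"

# submit_index_task SPEC_FILE
# POSTs a JSON ingestion spec to the overlord's task endpoint from the host.
submit_index_task() {
    spec_file="$1"
    if [ -z "$spec_file" ]; then
        echo "usage: submit_index_task SPEC_FILE" >&2
        return 1
    fi
    curl -sS -X POST -H 'Content-Type: application/json' \
        -d @"$spec_file" "$OVERLORD_URL/druid/indexer/v1/task"
}

# Example, from an unpacked Druid distribution on the host:
#   submit_index_task quickstart/tutorial/wikipedia-index.json
```

Note that the data file referenced inside the spec must still be readable by the Druid processes themselves, so for the tutorial's local-file ingestion it may need to be mounted into the relevant container.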
