
NUTCH-2883 Provide means to run server as a persistent service in Docker container #691

Closed
wants to merge 13 commits into from


@lewismc (Member) commented Jun 28, 2021

This is a WIP for https://issues.apache.org/jira/browse/NUTCH-2883; feedback would be appreciated.

@lewismc lewismc marked this pull request as draft June 28, 2021 03:26
@sebastian-nagel (Contributor) left a comment

Looks good (but haven't tried it).

Review comment on docker/Dockerfile (outdated, resolved)
@lewismc (Member, Author) commented Jun 28, 2021

I would like feedback on whether we want to run startserver and/or webapp by default; that is the behaviour implemented in this PR.

I haven't found a way to make this conditional, i.e. so that only if we build the image as follows

docker build -t apache/nutch . --build-arg startserver=true --build-arg webapp=true

would the ports be exposed (these could also be configurable via --build-arg server_port=XXXX --build-arg webapp_port=XXXX) and the ENTRYPOINT for supervisord be activated.

I should also mention that I am aware of the one-service-per-container rule. I chose supervisord in this case because these services are entirely complementary, and I didn't see a reason to complicate things further with docker-compose.

@lewismc lewismc self-assigned this Jul 1, 2021
@lewismc lewismc changed the title WIP NUTCH-2883 Provide means to run server and webapp as persistent services in Docker container NUTCH-2883 Provide means to run server and webapp as persistent services in Docker container Jul 1, 2021
@lewismc lewismc marked this pull request as ready for review July 1, 2021 05:51
@lewismc (Member, Author) commented Jul 1, 2021

OK, this PR is ready for full review. I've added the conditional logic as explained in the README.
There are three build modes, selected with the --build-arg BUILD_MODE=0 flag. All values shown here are the defaults.

  • 0 == Nutch master branch source install with crawl and nutch scripts on $PATH
  • 1 == Same as mode 0 with addition of Nutch REST Server; additional build args --build-arg SERVER_PORT=8081 and --build-arg SERVER_HOST=0.0.0.0
  • 2 == Same as mode 1 with addition of Nutch WebApp; additional build args --build-arg WEBAPP_PORT=8080
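For reference, the mode-to-service mapping above amounts to choosing which supervisord config (if any) ends up at /etc/supervisord.conf. The sketch below only illustrates that mapping and is not how the PR implements it (the build log further down suggests per-mode Dockerfile stages such as `branch-version-2`). The function name `supervisord_conf_for_mode` is hypothetical; the mode 2 file name comes from that build log, and mapping mode 1 to `supervisord_startserver.conf` is an assumption based on the file names reviewed in this PR.

```shell
# Hypothetical helper (not part of the PR): print the supervisord config that
# a given BUILD_MODE would copy to /etc/supervisord.conf.
supervisord_conf_for_mode() {
  case "$1" in
    0) return 0 ;;                                    # crawl/nutch scripts only, no supervisord
    1) echo "supervisord_startserver.conf" ;;         # + REST server (assumed file name)
    2) echo "supervisord_startserver_webapp.conf" ;;  # + REST server and WebApp
    *) echo "unknown BUILD_MODE: $1" >&2; return 1 ;;
  esac
}
```

In a build or entrypoint script this could be called as `supervisord_conf_for_mode "${BUILD_MODE:-0}"`, with an empty result meaning that no supervisord process is needed.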

I augmented the documentation so the above is clear, and also showed how to browse the container logs for the server and webapp if they are running.

@sebastian-nagel (Contributor) left a comment

Hi Lewis,

see the comments regarding the supervisord log and pid files.

Good solution with the build modes.

For now, the server and webapp do not start. It looks like the environment variables setting the ports are not visible to supervisord and therefore cannot be passed to its child processes, or rather, the build args are not visible at the point where the environment variables are defined:

docker build -t apache/nutch . --build-arg BUILD_MODE=2 --build-arg SERVER_PORT=8081 --build-arg SERVER_HOST=0.0.0.0 --build-arg WEBAPP_PORT=8080

docker run -t -i -d -p 8080:8080 -p 8081:8081 --name nutchcontainer apache/nutch

docker exec -it nutchcontainer /bin/bash

bash-5.1# tail /var/log/supervisord/*
==> /var/log/supervisord/nutchserver_stderr.log <==
        at java.base/java.lang.Integer.parseInt(Integer.java:614)
        at java.base/java.lang.Integer.parseInt(Integer.java:770)
        at org.apache.nutch.service.NutchServer.main(NutchServer.java:186)
Exception in thread "main" java.lang.NumberFormatException: null
        at java.base/java.lang.Integer.parseInt(Integer.java:614)
        at java.base/java.lang.Integer.parseInt(Integer.java:770)
        at org.apache.nutch.service.NutchServer.main(NutchServer.java:186)
Exception in thread "main" java.lang.NumberFormatException: null
        at java.base/java.lang.Integer.parseInt(Integer.java:614)
        at java.base/java.lang.Integer.parseInt(Integer.java:770)
        at org.apache.nutch.service.NutchServer.main(NutchServer.java:186)

==> /var/log/supervisord/nutchserver_stdout.log <==

==> /var/log/supervisord/nutchwebapp_stderr.log <==
        at java.base/java.lang.Integer.parseInt(Integer.java:770)
        at org.apache.nutch.webui.NutchUiServer.main(NutchUiServer.java:63)
Exception in thread "main" java.lang.NumberFormatException: null
        at java.base/java.lang.Integer.parseInt(Integer.java:614)
        at java.base/java.lang.Integer.parseInt(Integer.java:770)
        at org.apache.nutch.webui.NutchUiServer.main(NutchUiServer.java:63)
Exception in thread "main" java.lang.NumberFormatException: null
        at java.base/java.lang.Integer.parseInt(Integer.java:614)
        at java.base/java.lang.Integer.parseInt(Integer.java:770)
        at org.apache.nutch.webui.NutchUiServer.main(NutchUiServer.java:63)

==> /var/log/supervisord/nutchwebapp_stdout.log <==

bash-5.1# cat /proc/1/environ | tr '\000' '\n'
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=47570450a581
TERM=xterm
JAVA_HOME=/usr/lib/jvm/java-11-openjdk
NUTCH_HOME=/root/nutch_source/runtime/local
SERVER_PORT=
SERVER_HOST=
WEBAPP_PORT=
HOME=/root
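The empty SERVER_PORT, SERVER_HOST and WEBAPP_PORT values above are consistent with the `NumberFormatException: null` traces: the server and webapp apparently received no port at all. A quick way to spot this without scanning the whole dump is to check the relevant variables in PID 1's environment. This is a hypothetical helper (`check_port_vars` is not part of Nutch), and it only checks that the values are numeric and within the TCP port range.

```shell
# Hypothetical diagnostic: report port variables that are missing, empty or
# not a valid TCP port in a NUL-separated environment file such as
# /proc/1/environ. Returns non-zero if any variable is bad.
check_port_vars() {
  env_file="$1"
  rc=0
  for var in SERVER_PORT WEBAPP_PORT; do
    # Value of VAR=... (empty if the variable is absent or set to nothing).
    value=$(tr '\000' '\n' < "$env_file" | sed -n "s/^${var}=//p" | head -n 1)
    case "$value" in
      ''|*[!0-9]*) echo "$var is empty or non-numeric: '$value'"; rc=1 ;;
      *)
        if [ "$value" -lt 1 ] || [ "$value" -gt 65535 ]; then
          echo "$var is out of range: $value"; rc=1
        fi ;;
    esac
  done
  return "$rc"
}
```

Run inside the container, `check_port_vars /proc/1/environ` would flag both variables for the environment shown above, and print nothing for a correctly built image.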

Two review comments on docker/config/supervisord_startserver.conf (outdated, resolved)
@lewismc (Member, Author) commented Jul 8, 2021

@sebastian-nagel here's what I tried. First, prune the entire local Docker cache:

docker system prune -a
...
Total reclaimed space: 14.1GB

The build

docker % docker build -t apache/nutch . --build-arg BUILD_MODE=2 --build-arg SERVER_PORT=8081 --build-arg SERVER_HOST=0.0.0.0 --build-arg WEBAPP_PORT=8080
...
[+] Building 743.6s (17/17) FINISHED
 => [internal] load build definition from Dockerfile   0.0s
 => => transferring dockerfile: 4.42kB   0.0s
 => [internal] load .dockerignore   0.0s
 => => transferring context: 2B   0.0s
 => [internal] load metadata for docker.io/library/alpine:3.13   1.4s
 => [base 1/8] FROM docker.io/library/alpine:3.13@sha256:f51ff2d96627690d62fee79e6eecd9fa87429a38142b5df8a3bfbb26061df7fc   0.6s
 => => resolve docker.io/library/alpine:3.13@sha256:f51ff2d96627690d62fee79e6eecd9fa87429a38142b5df8a3bfbb26061df7fc   0.0s
 => => sha256:f51ff2d96627690d62fee79e6eecd9fa87429a38142b5df8a3bfbb26061df7fc 1.64kB / 1.64kB   0.0s
 => => sha256:def822f9851ca422481ec6fee59a9966f12b351c62ccb9aca841526ffaa9f748 528B / 528B   0.0s
 => => sha256:6dbb9cc54074106d46d4ccb330f2a40a682d49dda5f4844962b7dce9fe44aaec 1.47kB / 1.47kB   0.0s
 => => sha256:540db60ca9383eac9e418f78490994d0af424aab7bf6d0e47ac8ed4e2e9bcbba 2.81MB / 2.81MB   0.4s
 => => extracting sha256:540db60ca9383eac9e418f78490994d0af424aab7bf6d0e47ac8ed4e2e9bcbba   0.2s
 => [internal] load build context   0.0s
 => => transferring context: 2.74kB   0.0s
 => [base 2/8] WORKDIR /root/   0.0s
 => [base 3/8] RUN apk update   1.0s
 => [base 4/8] RUN apk --no-cache add apache-ant bash git openjdk11 supervisor   18.6s
 => [base 5/8] RUN echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk' >> $HOME/.bashrc   0.2s
 => [base 6/8] RUN git clone https://github.com/apache/nutch.git nutch_source &&      cd nutch_source &&      ant runtime &&      rm -rf build/ &&      rm -rf /root/.ivy2/   716.3s
 => [base 7/8] RUN ln -sf /root/nutch_source/runtime/local/bin/nutch /usr/local/bin/   0.3s
 => [base 8/8] RUN ln -sf /root/nutch_source/runtime/local/bin/crawl /usr/local/bin/   0.2s
 => [branch-version-2 1/3] RUN echo "Nutch master branch source install with 'crawl' and 'nutch' scripts on PATH, Nutch REST Server on 0.0.0.0:8081 and WebApp on this container port 8080"   0.3s
 => [branch-version-2 2/3] RUN mkdir -p /var/log/supervisord   0.3s
 => [branch-version-2 3/3] COPY ./config/supervisord_startserver_webapp.conf /etc/supervisord.conf   0.0s
 => [final 1/1] RUN echo "Successfully built image, see https://s.apache.org/m5933 for guidance on running a container instance."   0.2s
 => exporting to image   4.0s
 => => exporting layers   4.0s
 => => writing image sha256:0b526f0c7743d74fa9afa1db86956e1f2d88974f460bd64d94b6492bfc8b65de   0.0s
 => => naming to docker.io/apache/nutch

Note in the output above that the correct supervisord config file is being COPYed and that the REST server and WebApp details are all consistent.
I then run:

docker run -t -i -d -p 8080:8080 -p 8081:8081 --name nutchcontainer apache/nutch

and curl the REST endpoint

curl http://localhost:8081/admin
{"startDate":1625713304457,"configuration":["default"],"jobs":[],"runningJobs":[]}
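Because supervisord starts the REST server in the background after `docker run` returns, a script that curls `/admin` straight away may race the JVM startup. A minimal sketch of a retry loop, assuming a POSIX shell; `retry` is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary.

```shell
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping DELAY
# seconds between attempts; succeed as soon as one attempt succeeds.
retry() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Only sleep if another attempt follows.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

`retry 30 2 curl -fsS http://localhost:8081/admin` would then print the JSON status above once the server answers, or give up after roughly a minute.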

I can also reach the WebApp on 8080.
I can exec into the container as follows:

docker exec -it 1cda988617a6 /bin/bash
...
bash-5.1# tail /var/log/supervisord/*
==> /var/log/supervisord/nutchserver_stderr.log <==

==> /var/log/supervisord/nutchserver_stdout.log <==
Starting NutchServer on 0.0.0.0:8081  ...
Started Nutch Server on 0.0.0.0:8081 at 1625713304457

==> /var/log/supervisord/nutchwebapp_stderr.log <==
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.springframework.cglib.core.ReflectUtils$2 (file:/root/nutch_source/runtime/local/lib/spring-core-4.0.9.RELEASE.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of org.springframework.cglib.core.ReflectUtils$2
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

==> /var/log/supervisord/nutchwebapp_stdout.log <==

==> /var/log/supervisord/supervisord.log <==
2021-07-08 03:01:41,515 INFO Set uid to user 0 succeeded
2021-07-08 03:01:41,517 INFO supervisord started with pid 1
2021-07-08 03:01:42,500 INFO spawned: 'nutchserver' with pid 8
2021-07-08 03:01:42,503 INFO spawned: 'nutchwebapp' with pid 9
2021-07-08 03:01:43,505 INFO success: nutchserver entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2021-07-08 03:01:43,506 INFO success: nutchwebapp entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)

==> /var/log/supervisord/supervisord.pid <==
1

All is good...
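Instead of reading the `tail` output by eye, the RUNNING transitions can be pulled out of supervisord.log. A minimal sketch, assuming the log line format shown above; `running_programs` is a hypothetical name, and the default path is the /var/log/supervisord location used in this PR. It reports every program that ever reached RUNNING, so a process that crashed afterwards would still be listed.

```shell
# Hypothetical helper: list supervisord programs that entered the RUNNING
# state, based on log lines such as
#   ... INFO success: nutchserver entered RUNNING state, ...
running_programs() {
  sed -n 's/.* INFO success: \([^ ]*\) entered RUNNING state.*/\1/p' \
    "${1:-/var/log/supervisord/supervisord.log}"
}
```

For the log above, `running_programs` prints `nutchserver` and `nutchwebapp`, one per line.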

@sebastian-nagel (Contributor) commented

Hi @lewismc! Sorry, it took longer to get back...

Finally found why the server and webapp didn't work (docker prune didn't help):

Another point: after the webapp was moved into a separate repository (see NUTCH-2886), it needs to be installed separately in the Dockerfile. Right now it fails with:

==> /var/log/supervisord/nutchwebapp_stderr.log <==
Error: Could not find or load main class webapp
Caused by: java.lang.ClassNotFoundException: webapp

@lewismc (Member, Author) commented Nov 24, 2021

I'll update this to remove the webapp.

@lewismc lewismc changed the title NUTCH-2883 Provide means to run server and webapp as persistent services in Docker container NUTCH-2883 Provide means to run server as a persistent service in Docker container Nov 24, 2021
@sebastian-nagel (Contributor) commented

Hi @lewismc, all your changes have been integrated into #748 which is now merged. Thanks for your contribution!
