SOLR-14789: Absorb the docker-solr repo. (#1769)
HoustonPutman authored and thelabdude committed Sep 14, 2020
1 parent 7a8da0a commit 86c633d
Showing 51 changed files with 2,860 additions and 0 deletions.
1 change: 1 addition & 0 deletions build.gradle
@@ -25,6 +25,7 @@ plugins {
id 'de.thetaphi.forbiddenapis' version '3.0.1' apply false
id "de.undercouch.download" version "4.0.2" apply false
id "net.ltgt.errorprone" version "1.2.1" apply false
id "com.palantir.docker" version "0.25.0" apply false
}

apply from: file('gradle/defaults.gradle')
2 changes: 2 additions & 0 deletions settings.gradle
@@ -70,3 +70,5 @@ include "solr:solr-ref-guide"
include "solr:example"

include "solr:packaging"
include "solr:docker"
include "solr:docker:package"
320 changes: 320 additions & 0 deletions solr/docker/Docker-FAQ.md
@@ -0,0 +1,320 @@

Docker Solr FAQ
===============


How do I persist Solr data and config?
--------------------------------------

Your data is already persisted, in your container's filesystem.
If you `docker run`, add data to Solr, then `docker stop` and later
`docker start`, your data is still there. The same is true for
changes to configuration files.
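
For example (a minimal sketch; the container name, port mapping and core name are arbitrary):

```
# start Solr and pre-create a core
docker run -d --name my_solr -p 8983:8983 apache/solr solr-precreate gettingstarted
# ... index some documents via the API or the Admin UI ...
docker stop my_solr
docker start my_solr
# the gettingstarted core and its documents are still there
```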

Equally, if you `docker commit` your container, you can later create a new
container from that image, and that will have your data in it.
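
For example (the snapshot image name is just a placeholder):

```
docker commit my_solr my_solr_snapshot
docker run -d --name my_solr2 -p 8984:8983 my_solr_snapshot
```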

For some use cases it is convenient to provide Solr with a modified `solr.in.sh` file.
For example, to point Solr to a ZooKeeper host:

```
docker create --name my_solr -P solr
docker cp my_solr:/opt/solr/bin/solr.in.sh .
sed -i -e 's/#ZK_HOST=.*/ZK_HOST=cylon.lan:2181/' solr.in.sh
docker cp solr.in.sh my_solr:/opt/solr/bin/solr.in.sh
docker start my_solr
# With a browser go to http://cylon.lan:32873/solr/#/ and confirm "-DzkHost=cylon.lan:2181" in the JVM Args section.
```

But usually when people ask this question, what they are after is a way
to store Solr data and config in a separate [Docker Volume](https://docs.docker.com/userguide/dockervolumes/).
That is explained in the next two questions.


How can I mount a host directory as a data volume?
--------------------------------------------------

This is useful if you want to inspect or modify the data in the Docker host
when the container is not running, and later easily run new containers against that data.
This is indeed possible, but there are a few gotchas.

Solr stores its core data in the `server/solr` directory, in sub-directories
for each core. The `server/solr` directory also contains configuration files
that are part of the Solr distribution.
If we mounted a volume for each core individually, that would
interfere with Solr creating those directories. If instead we make
the whole directory a volume, we need to provide those configuration files
in our volume, which we can do by copying them from a temporary container.
For example:

```
# create a directory to store the server/solr directory
$ mkdir /home/docker-volumes/mysolr1
# make sure its host owner matches the container's solr user
$ sudo chown 8983:8983 /home/docker-volumes/mysolr1
# copy the solr directory from a temporary container to the volume
$ docker run -it --rm -v /home/docker-volumes/mysolr1:/target apache/solr cp -r server/solr /target/
# pass the solr directory to a new container running solr
$ SOLR_CONTAINER=$(docker run -d -P -v /home/docker-volumes/mysolr1/solr:/opt/solr/server/solr apache/solr)
# create a new core
$ docker exec -it --user=solr $SOLR_CONTAINER solr create_core -c gettingstarted
# check the volume on the host:
$ ls /home/docker-volumes/mysolr1/solr/
configsets gettingstarted README.txt solr.xml zoo.cfg
```

Note that if you add or modify files in that directory from the host, you must `chown 8983:8983` them.
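
For example, if you copy in a hypothetical configset `myconfig` from the host:

```
sudo cp -r myconfig /home/docker-volumes/mysolr1/solr/configsets/
# restore ownership so the container's solr user can read and write it
sudo chown -R 8983:8983 /home/docker-volumes/mysolr1/solr/configsets/myconfig
```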


How can I use a Data Volume Container?
--------------------------------------

You can avoid the UID mismatch concerns above by using data volumes only from containers.
You can create a container with a volume, then point future containers at that same volume.
This can be handy if you want to modify the Solr image, for example to add a program.
By separating the data and the code, you can change the code and re-use the data.

But there are pitfalls:

- if you remove the container that owns the volume, then you lose your data.
Docker does not even warn you that a running container is dependent on it.
- if you point multiple Solr containers at the same volume, you will have multiple instances
writing to the same files, which will undoubtedly lead to corruption.
- if you do want to remove that volume, you must do `docker rm -v containername`;
if you forget the `-v` there will be a dangling volume, which you cannot easily clean up
(see the cleanup sketch after the example below).

Here is an example:

```
# create a container with a volume on the path that solr uses to store data.
docker create -v /opt/solr/server/solr --name mysolr1data apache/solr /bin/true
# pass the volume to a new container running solr
SOLR_CONTAINER=$(docker run -d -P --volumes-from=mysolr1data apache/solr)
# create a new core
docker exec -it --user=solr $SOLR_CONTAINER solr create_core -c gettingstarted
# make a change to the config, using the config API
docker exec -it --user=solr $SOLR_CONTAINER curl http://localhost:8983/solr/gettingstarted/config -H 'Content-type:application/json' -d'{
"set-property" : {"query.filterCache.autowarmCount":1000},
"unset-property" :"query.filterCache.size"}'
# verify the change took effect
docker exec -it --user=solr $SOLR_CONTAINER curl http://localhost:8983/solr/gettingstarted/config/overlay?omitHeader=true
# stop the solr container
docker exec -it --user=solr $SOLR_CONTAINER bash -c 'cd server; java -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -jar start.jar --stop'
# create a new container
SOLR_CONTAINER=$(docker run -d -P --volumes-from=mysolr1data apache/solr)
# check our core is still there:
docker exec -it --user=solr $SOLR_CONTAINER ls server/solr/gettingstarted
# check the config modification is still there:
docker exec -it --user=solr $SOLR_CONTAINER curl http://localhost:8983/solr/gettingstarted/config/overlay?omitHeader=true
```
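
If you removed the container that owned a volume and forgot the `-v`, recent Docker versions let you find and remove the resulting dangling volumes; a minimal sketch (the volume id is a placeholder):

```
# list volumes that are no longer referenced by any container
docker volume ls -f dangling=true
# remove a specific dangling volume by the id printed above (placeholder)
docker volume rm <volume-id>
```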


Can I use volumes with SOLR_HOME?
---------------------------------

Solr supports a SOLR_HOME environment variable to point to a non-standard location of the Solr home directory.
You can use this in Solr docker, in combination with volumes:

```
docker run -it -v $PWD/mysolrhome:/mysolrhome -e SOLR_HOME=/mysolrhome apache/solr
```

This does require a pre-configured directory at that location.

To make this easier, Solr docker supports an INIT_SOLR_HOME setting, which copies the contents
of the default directory in the image to the SOLR_HOME (if it is empty).

```
mkdir mysolrhome
sudo chown 8983:8983 mysolrhome
docker run -it -v $PWD/mysolrhome:/mysolrhome -e SOLR_HOME=/mysolrhome -e INIT_SOLR_HOME=yes apache/solr
```

Note: If SOLR_HOME is set, the "solr-precreate" command will put the created core in the SOLR_HOME directory
rather than the "mycores" directory.
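
For example (a sketch; `mycore` is an arbitrary core name):

```
docker run -it -v $PWD/mysolrhome:/mysolrhome -e SOLR_HOME=/mysolrhome \
    -e INIT_SOLR_HOME=yes apache/solr solr-precreate mycore
# per the note above, the core is created in /mysolrhome/mycore
```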


Can I run ZooKeeper and Solr clusters under Docker?
---------------------------------------------------

At the network level the ZooKeeper nodes need to be able to talk to each other,
and the Solr nodes need to be able to talk to the ZooKeeper nodes and to each other.
At the application level, different nodes need to be able to identify and locate each other.
In ZooKeeper that is done with a configuration file that lists hostnames or IP addresses for each node.
In Solr that is done with a parameter that specifies a host or IP address, which is then stored in ZooKeeper.
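
As a rough sketch of those two mechanisms (the hostnames are placeholders, and the environment variables assume the stock `solr.in.sh` wiring):

```
# ZooKeeper: each node lists the whole ensemble in zoo.cfg
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888

# Solr: advertise a stable hostname and point at the ensemble
docker run -d --name solr1 -e SOLR_HOST=solr1 \
    -e ZK_HOST=zookeeper1:2181,zookeeper2:2181,zookeeper3:2181 apache/solr
```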

In typical clusters, those hostnames/IP addresses are pre-defined and remain static through the lifetime of the cluster.
In Docker, inter-container communication and multi-host networking can be facilitated by [Docker Networks](https://docs.docker.com/engine/userguide/networking/).
But, crucially, Docker does not normally guarantee that IP addresses of containers remain static during the lifetime of a container.
In non-networked Docker, the IP address seems to change every time you stop and start a container.
In a networked Docker, containers can lose their IP address in certain sequences of starting/stopping, unless you take steps to prevent that.

IP changes cause problems:

- If you use hardcoded IP addresses in configuration, and the addresses of your containers change after a stop/start, then your cluster will stop working and may corrupt itself.
- If you use hostnames in configuration, and the addresses of your containers change, then you might run into problems with cached hostname lookups.
- And if you use hostnames there is another problem: the names are not defined until the respective container is running,
so when, for example, the first ZooKeeper node starts up, it will attempt a hostname lookup for the other nodes, and that will fail.
This is especially a problem for ZooKeeper 3.4.6; future versions are better at recovering.

Docker 1.10 has a new `--ip` configuration option that allows you to specify an IP address for a container.
It also has a `--ip-range` option that allows you to specify the range that other containers get addresses from.
Used together, you can implement static addresses. See [this example](docs/docker-networking.md).
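
A condensed sketch of that approach (the network name, subnet and addresses are examples):

```
# create a user-defined network, reserving part of the subnet for static addresses
docker network create --subnet=192.168.22.0/24 --ip-range=192.168.22.128/25 netzksolr
# containers given an explicit --ip keep that address across restarts
docker run -d --name zk1 --net netzksolr --ip 192.168.22.10 jplock/zookeeper
docker run -d --name solr1 --net netzksolr --ip 192.168.22.20 apache/solr
```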


Can I run ZooKeeper and Solr with Docker Links?
-----------------------------------------------

Docker's [Legacy container links](https://docs.docker.com/engine/userguide/networking/default_network/dockerlinks/) provide a way to
pass connection configuration between containers. Links only work on a single machine, on the default bridge,
and provide no facilities for static IPs.
Note: this feature is expected to be deprecated and removed in a future release,
so it is better to use the approach in "Can I run ZooKeeper and Solr clusters under Docker?" above instead.

But for some use-cases, such as quick demos or one-shot automated testing, it can be convenient.

Run ZooKeeper, and define a name so we can link to it:

```console
$ docker run --name zookeeper -d -p 2181:2181 -p 2888:2888 -p 3888:3888 jplock/zookeeper
```

Run two Solr nodes, linked to the zookeeper container:

```console
$ docker run --name solr1 --link zookeeper:ZK -d -p 8983:8983 \
apache/solr \
bash -c 'solr start -f -z $ZK_PORT_2181_TCP_ADDR:$ZK_PORT_2181_TCP_PORT'

$ docker run --name solr2 --link zookeeper:ZK -d -p 8984:8983 \
apache/solr \
bash -c 'solr start -f -z $ZK_PORT_2181_TCP_ADDR:$ZK_PORT_2181_TCP_PORT'
```

Create a collection:

```console
$ docker exec -i -t solr1 solr create_collection \
-c gettingstarted -shards 2 -p 8983
```

Then go to `http://localhost:8983/solr/#/~cloud` (adjust the hostname for your docker host) to see the two shards and Solr nodes.
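
You can also verify from the command line with the Collections API, for example:

```console
$ curl 'http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS'
```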


How can I run ZooKeeper and Solr with Docker Compose?
-----------------------------------------------------

See the [docker compose example](docs/docker-compose.yml).
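
If you do not have that file handy, a minimal sketch of such a compose file might look roughly like this (the image tags, ports and `ZK_HOST` wiring are assumptions here, not a copy of the referenced file):

```
version: '3'
services:
  zookeeper:
    image: zookeeper:3.5
    ports:
      - "2181:2181"
  solr:
    image: apache/solr
    ports:
      - "8983:8983"
    environment:
      - ZK_HOST=zookeeper:2181
    depends_on:
      - zookeeper
```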


I'm confused about the different invocations of solr -- help?
-------------------------------------------------------------

The different invocations of the Solr docker image can look confusing, because the name of the
image is "apache/solr", the Solr command is also "solr", and the image interprets various arguments in
special ways. The following illustrates the various invocations:


To run an arbitrary command in the image:

```
docker run -it apache/solr date
```

here "apache/solr" is the name of the image, and "date" is the command.
This does not invoke any solr functionality.


To run the Solr server:

```
docker run -it apache/solr
```

Here "apache/solr" is the name of the image, and there is no specific command,
so the image defaults to run the "solr" command with "-f" to run it in the foreground.


To run the Solr server with extra arguments:

```
docker run -it apache/solr -h myhostname
```

This is the same as the previous one, but with an additional argument.
The image will run the "solr" command with "-f -h myhostname".

To run solr as an arbitrary command:

```
docker run -it apache/solr solr zk --help
```

Here the first "apache/solr" is the image name, and the second "solr"
is the "solr" command. The image runs the command exactly as specified;
no "-f" is implicitly added. The container will print help text and exit.

If you find this visually confusing, it might be helpful to use more specific image tags,
and specific command paths. For example:

```
docker run -it apache/solr:6 bin/solr -f -h myhostname
```

Finally, the Solr docker image offers several commands that do some work before
invoking the Solr server, like "solr-precreate" and "solr-demo".
See the README.md for usage.
These are implemented by the `docker-entrypoint.sh` script, and must be passed
as the first argument to the image. For example:

```
docker run -it apache/solr:6 solr-demo
```
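
Or, to create a core before starting the server (the core name is just an example):

```
docker run -d -p 8983:8983 apache/solr:6 solr-precreate gettingstarted
```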

It's important to understand an implementation detail here. The Dockerfile uses
`solr-foreground` as the `CMD`, and the `docker-entrypoint.sh` script implements
that by running "solr -f". So these two are equivalent:

```
docker run -it apache/solr:6
docker run -it apache/solr:6 solr-foreground
```

whereas:

```
docker run -it apache/solr:6 solr -f
```

is slightly different: the "solr" there is a generic command, not treated in any
special way by `docker-entrypoint.sh`. In particular, this means that the
`docker-entrypoint-initdb.d` mechanism is not applied.
So, if you want to use `docker-entrypoint-initdb.d`, then you must use one
of the other two invocations.
You also need to keep this in mind when you invoke solr from a bash
command. For example, this does NOT run the `docker-entrypoint-initdb.d` scripts:

```
docker run -it -v $PWD/set-heap.sh:/docker-entrypoint-initdb.d/set-heap.sh \
apache/solr:6 bash -c "echo hello; solr -f"
```

but this does:

```
docker run -it -v $PWD/set-heap.sh:/docker-entrypoint-initdb.d/set-heap.sh \
apache/solr:6 bash -c "echo hello; /opt/docker-solr/scripts/docker-entrypoint.sh solr-foreground"
```
