
Enable pyspark console in docker container #20

Closed
8 changes: 6 additions & 2 deletions Dockerfile
@@ -11,9 +11,10 @@ LABEL website="http://archivesunleashed.org/"
ARG SPARK_VERSION=2.4.5

# Git and Wget
-RUN apk add --update \
+RUN apk --no-cache --virtual build-dependencies add --update \
     git \
-    wget
+    wget \
+    && apk add --update python
Member:
Any reason this line can't just be python?

@sepastian (Contributor, Author), Feb 25, 2020:
Both git and wget are installed into the virtual package build-dependencies, which gets deleted at the end of the build. To keep python out of this package, a separate add without --virtual is required.

Actually, --update is not required again here.
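
The virtual-package pattern described above can be sketched as follows. This is a minimal illustration of the Alpine `apk` behavior being discussed, not the exact Dockerfile from this PR:

```dockerfile
# Build-only tools go into a named virtual package; python is installed
# separately so it survives the cleanup step.
RUN apk add --no-cache --virtual build-dependencies git wget \
    && apk add --no-cache python

# ... git and wget are used during the image build ...

# Deleting the virtual package removes git and wget, but python,
# installed outside the virtual group, remains in the final image.
RUN apk del build-dependencies
```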

Member:
Ah, gotcha!

Member:
(2/2) Installing python2 (2.7.16-r2)

We should go with Python 3 here. Any reason we're using Python 2?


# Sample resources
RUN git clone https://github.com/archivesunleashed/aut-resources.git
@@ -31,4 +32,7 @@ RUN mkdir /spark \
&& tar -xf "/tmp/spark-$SPARK_VERSION-bin-hadoop2.7.tgz" -C /spark --strip-components=1 \
&& rm "/tmp/spark-$SPARK_VERSION-bin-hadoop2.7.tgz"

# Cleanup package manager
RUN apk del build-dependencies

CMD /spark/bin/spark-shell --packages "io.archivesunleashed:aut:0.50.1-SNAPSHOT"
16 changes: 13 additions & 3 deletions README.md
@@ -45,13 +45,13 @@ You can also build this Docker image locally with the following steps:

### Overrides

-You can add any Spark flags to the build if you need to.
+You can add any Spark flags when starting the container, if you need to.

```
$ docker run --rm -it archivesunleashed/docker-aut:0.17.0 /spark/bin/spark-shell --packages "io.archivesunleashed:aut:0.17.0" --conf spark.network.timeout=100000000 --conf spark.executor.heartbeatInterval=6000s
```

-Once the build finishes, you should see:
+Once the container has started, you should see:

```bash
$ docker run --rm -it aut
@@ -64,7 +64,7 @@ Welcome to
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.4
/_/

Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_212)
Type in expressions to have them evaluated.
Type :help for more information.
@@ -73,6 +73,16 @@ scala>

```

### Python - PySpark

You can start a Python shell (pyspark) with the following command:

```shell
docker run --rm -it -v "$(pwd)/your/datadir:/data" aut /spark/bin/pyspark --py-files /aut/target/aut.zip --jars /aut/target/aut-0.50.1-SNAPSHOT-fatjar.jar
```

See [the official documentation](https://github.com/archivesunleashed/aut-docs/tree/master/current#the-archives-unleashed-toolkit-latest-documentation) for usage examples in Python.
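
Once the pyspark shell is up inside the container, a session might look roughly like the sketch below. This is not runnable outside the container; the `WebArchive` entry point, its method names, and the `/data` path are assumptions drawn from the AUT documentation linked above, not verified against this exact image:

```python
# Sketch only: assumes the aut Python package shipped in /aut/target/aut.zip
# exposes a WebArchive entry point, as described in the AUT documentation.
from aut import WebArchive

# sc and sqlContext are created automatically by the pyspark shell.
# "/data" is the directory mounted via -v in the docker run command above.
archive = WebArchive(sc, sqlContext, "/data")

# Count the pages in the mounted web archive collection
# (method name assumed from the AUT docs).
archive.webpages().count()
```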

## Example

When the image is running, you will be brought to the Spark Shell interface. Try running the following command.