Docker configuration for a Spark cluster
This Docker image contains a full Spark distribution with the following components:
- Oracle JDK 8
- Hadoop 2.7.5
- Scala 2.11.12
- Spark 2.2.1
It also includes an Apache Toree installation.
A docker-compose.yml file is provided to run the Spark cluster in a Docker Swarm environment. It contains a Spark master service and a worker instance. Type the following commands to run the stack provided with the docker-compose.yml:
docker network create -d overlay --attachable --scope swarm core
docker stack deploy -c docker-compose.yml <stack-name>
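For example, assuming the stack is named `spark` (the name is just a placeholder), the deployment can be inspected with the standard `docker stack` subcommands:

```shell
docker stack deploy -c docker-compose.yml spark
docker stack services spark   # lists the master and worker services with their replica counts
docker stack ps spark         # shows the node on which each task was scheduled
```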
To run the stack in cluster mode, create the swarm before creating the overlay network. Otherwise the stack will be deployed on a single swarm node, the manager.
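The provided docker-compose.yml is not reproduced in this README. As a rough orientation, a minimal master/worker stack of this kind could look like the sketch below; the image tag and the entrypoint commands are assumptions, while the `master`/`worker` service names, the `data`/`code` volumes, and the external `core` network follow from the commands used elsewhere in this document.

```yaml
# Sketch only - the actual docker-compose.yml shipped with this repository may differ.
version: "3.3"

services:
  master:
    image: spark-toree:latest     # assumption: the image built from this repository
    hostname: master
    command: bin/spark-class org.apache.spark.deploy.master.Master   # assumed entrypoint
    ports:
      - "8080:8080"               # Spark master web UI
      - "8888:8888"               # Jupyter/Toree notebook
    volumes:
      - data:/home/data
      - code:/home/code
    networks:
      - core

  worker:
    image: spark-toree:latest
    command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://master:7077
    volumes:
      - data:/home/data
      - code:/home/code
    networks:
      - core

volumes:
  data:
  code:

networks:
  core:
    external: true                # the overlay network created above
```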
To stop the stack, type:
docker stack rm <stack-name>
If you need more worker instances, scale the worker service by typing the following command:
docker service scale <stack-name>_worker=<num_of_task>
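For example, to run three worker tasks in a stack deployed as `spark`:

```shell
docker service scale spark_worker=3
docker service ls   # the worker service should now report 3/3 replicas
```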
If you need to inject data and code into the containers, use the `data` and `code` volumes, mounted at `/home/data` and `/home/code` respectively.
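For instance, a dataset can be copied into the data volume through the running master task; the stack name `spark` and the file name below are placeholders:

```shell
docker cp dataset.csv "$(docker ps -q -f name=spark_master)":/home/data/
```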
The Apache Toree notebook environment is already installed; to launch a Spark notebook, run the following commands:
docker exec -it <stack-name>_master.<id> bash
SPARK_OPTS='--master=spark://master:7077' jupyter notebook --ip 0.0.0.0 --allow-root
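The exact task name to pass to `docker exec` (including the `<id>` suffix) varies between deployments; assuming a stack named `spark`, it can be looked up with:

```shell
docker ps --filter name=spark_master --format '{{.Names}}'
```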
Setting SPARK_OPTS in the jupyter notebook command makes the notebook execute jobs on the cluster rather than in local mode.
Apache Toree includes SparkR, PySpark, Spark Scala, and SQL kernels.
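As a quick check that the notebook is really attached to the cluster, a cell like the following can be run in the Toree Scala kernel (the expected master URL comes from the SPARK_OPTS value above):

```scala
// `sc` is the SparkContext that the Toree Scala kernel creates automatically.
sc.master                                   // should print spark://master:7077

// A small job that is distributed over the worker tasks rather than run locally.
val evens = sc.parallelize(1 to 100000).filter(_ % 2 == 0).count()
println(s"number of even values: $evens")
```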
- Separating Jupyter notebook into a different