codiply/big-data-playground

Big Data Playground

This is a personal repo for experimenting with big data technologies.

Docker containers

Prerequisites

You will need to install

Build the docker images

You can build all docker images in advance by running

scripts/docker-compose/build-all

Run the containers

The containers listed below are defined in groups, one compose file per group: docker/compose/docker-compose.{group}.yml.

To start a group of containers, run the corresponding script:

scripts/docker-compose/{group}-up

To remove the containers for a group, run the corresponding script:

scripts/docker-compose/{group}-down
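
The naming convention above can be sketched as a pair of small shell helpers. This is only an illustration: the function names group_script and group_compose_file are hypothetical and not part of the repo, and spark is used as the example group because a spark-up script is mentioned in the list of containers.

```shell
# Hypothetical helpers (not part of the repo) illustrating the naming convention.

# Print the path of the up/down script for a group, e.g. "spark" and "up".
group_script() {
  group="${1:-}"
  action="${2:-}"
  case "$action" in
    up|down) printf 'scripts/docker-compose/%s-%s\n' "$group" "$action" ;;
    *) echo "action must be 'up' or 'down'" >&2; return 1 ;;
  esac
}

# Print the compose file that defines the containers of a group.
group_compose_file() {
  printf 'docker/compose/docker-compose.%s.yml\n' "${1:-}"
}

# Usage, from the repository root (starts, then removes, the containers):
#   "$(group_script spark up)"
#   "$(group_script spark down)"
```

The helpers only print paths, so they can be tried without Docker installed.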

List of containers

Aerospike

  • aerospike-1
  • aerospike-2
  • aerospike-3

Apache Cassandra cluster

  • cassandra-1
  • cassandra-2
  • cassandra-3

Graphite - Grafana

  • graphite

Jupyter

  • jupyter

Apache Kafka cluster

  • kafka-1
  • kafka-2
  • kafka-3
  • zookeeper-1
  • zookeeper-2
  • zookeeper-3

Probe

  • probe

Used for running applications from the command line from within the docker network.

PostgreSQL

  • postgres

Apache Spark cluster

  • spark-master
  • spark-worker (scaled to 3 workers in the spark-up script)

Superset

  • superset

Apache Zeppelin

  • zeppelin

Addresses on the host

Where suitable, I have bound container ports to the host so that, for example, the various UIs are accessible.

Access Spark workers

The Spark Cluster UI contains links that lead to the individual Spark workers.

To access these from the host you will need to install sshuttle and run the scripts/ssh/sshuttle-via-probe script. The password is root.

Using the probe

  • Start the probe container
  • SSH to it using the scripts/ssh/ssh-probe script. The password is root.
  • Use command line tools and access other containers using their hostname

Run an application/main written in Scala

Prerequisites

  • Install sbt (Scala Build Tool).

Run it on the probe container

  • Build a fat jar big-data-playground.jar with sbt assembly. This is placed under /target/big-data-playground/. (The directory /target/big-data-playground/ is mounted at /playground/ on the probe container.)
  • SSH to the probe container (see above)
  • Run a main with java -cp /playground/big-data-playground.jar com.codiply.bdpg.SomeClassWithMain
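
The last step can be wrapped in a small helper for use on the probe container. This is a minimal sketch, not something the repo provides: the function name run_main and the JAR variable are hypothetical, and the sketch assumes java is on the PATH of the probe container. It checks that the fat jar is present at the mount point before starting the JVM.

```shell
# Hypothetical helper for the probe container (not part of the repo).
# JAR defaults to the mount point described above; override it if needed.
JAR="${JAR:-/playground/big-data-playground.jar}"

run_main() {
  main_class="${1:-}"
  if [ -z "$main_class" ]; then
    echo "usage: run_main <fully.qualified.MainClass> [args...]" >&2
    return 2
  fi
  if [ ! -f "$JAR" ]; then
    echo "jar not found at $JAR; run 'sbt assembly' on the host first" >&2
    return 1
  fi
  shift
  # Remaining arguments are passed through to the application.
  java -cp "$JAR" "$main_class" "$@"
}

# Usage on the probe container:
#   run_main com.codiply.bdpg.SomeClassWithMain
```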

Inspect a Kafka topic

SSH to the probe container and then inspect a topic with kafkacat:

kafkacat -C -b kafka-1 -t topic-name -f 'Topic %t[%p], offset: %o, key: %k, payload: %S bytes: %s\n'

replacing topic-name with the actual topic name.
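
To avoid retyping the format string, the command above could be wrapped in a function. This is a sketch under assumptions: the names inspect_topic and KAFKA_FORMAT are hypothetical, and the optional second argument for choosing a different broker is an addition not described in the original. The comments spell out what each kafkacat format token stands for.

```shell
# Hypothetical wrapper (not part of the repo) around the kafkacat command above.
# Format tokens: %t topic, %p partition, %o offset, %k key,
#                %S payload length in bytes, %s payload.
KAFKA_FORMAT='Topic %t[%p], offset: %o, key: %k, payload: %S bytes: %s\n'

inspect_topic() {
  topic="${1:-}"
  broker="${2:-kafka-1}"   # defaults to the broker used in the command above
  if [ -z "$topic" ]; then
    echo "usage: inspect_topic <topic-name> [broker]" >&2
    return 2
  fi
  # -C selects consumer mode, -b the broker, -t the topic, -f the output format.
  kafkacat -C -b "$broker" -t "$topic" -f "$KAFKA_FORMAT"
}

# Usage on the probe container:
#   inspect_topic topic-name
#   inspect_topic topic-name kafka-2
```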
