This repo dockerizes the StreamSets tutorials (https://github.com/streamsets/tutorials). The tutorials have been used as a starting point for a major project.

Dockerizing StreamSets tutorials

This branch sets up StreamSets Data Collector tutorial 2 via Docker.

Requirements:

  • Docker
  • Docker Compose
  • StreamSets sample data. Place it in the folder: streamsets/data/tutorial_data
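The requirements above can be checked from the command line before starting. The following is a minimal sketch, assuming it is run from the root of the cloned repo and that the tools are installed under the command names docker and docker-compose (the name used in the instructions below). It only reports what it finds and creates the sample data folder; it does not install anything or download the data.

```shell
#!/bin/sh
# Report whether each required tool is on PATH (nothing is installed).
for tool in docker docker-compose; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "missing: $tool"
  fi
done

# Create the folder the sample data should be placed in.
# -p makes this a no-op if the folder already exists.
mkdir -p streamsets/data/tutorial_data
```

After running it, copy the downloaded sample data into streamsets/data/tutorial_data.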

Instructions:

Once this repo has been cloned and the sample data has been downloaded, open a command line in the repo folder and start the Docker containers using: $ docker-compose up. This can take a while.
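Because startup can take a while, it may help to poll the Data Collector URL instead of refreshing the browser. The sketch below assumes curl is installed; wait_for_url is a hypothetical helper written for this illustration and is not part of this repo. It treats any HTTP answer as "ready", which only shows that the web server is listening, not that every service has finished initializing.

```shell
#!/bin/sh
# wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: polls URL until the server answers or attempts run out.
wait_for_url() {
  url=$1
  attempts=${2:-60}   # default: 60 probes
  delay=${3:-5}       # default: 5 seconds between probes
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s: silent, -o /dev/null: discard the body, --max-time: cap each probe.
    # curl exits 0 as soon as the server answers, whatever the HTTP status.
    if curl -s -o /dev/null --max-time 5 "$url"; then
      echo "ready: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out: $url" >&2
  return 1
}
```

For example, after starting the containers in the background with $ docker-compose up -d, calling wait_for_url http://localhost:18630 returns once the Data Collector port responds.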

Once the containers are up and running:

1. Import the producer pipeline (producer.json) and the consumer pipeline (consumer.json) into StreamSets by going to http://localhost:18630

You should see the following pipelines (the filesystem was used instead of AWS S3):

[Screenshots: consumer pipeline, producer pipeline]

You can preview or start the pipelines right away.

[Screenshot: start-pipeline]

Access Kibana via http://localhost:5601.
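If the Kibana page does not load, a quick status check from the command line can tell whether the port is answering at all. This is a minimal sketch, assuming curl is installed; http_status is a hypothetical helper for this illustration, not something shipped with this repo.

```shell
#!/bin/sh
# http_status URL
# Hypothetical helper: prints the HTTP status code returned for URL,
# or 000 if the request did not complete (for example, connection refused).
http_status() {
  # -w '%{http_code}' prints only the status code; the body is discarded.
  code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1") || code=000
  echo "$code"
}
```

Calling http_status http://localhost:5601 prints 000 while the container is still starting or the port is not mapped; any other code means Kibana's web server is reachable.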

This is what the folder structure should look like after adding the necessary data and executing the pipeline:

.
├── build
│   ├── Dockerfile
│   └── start.sh
├── consumer.json
├── docker-compose.yml
├── images
│   ├── ...
├── producer.json
├── readme.md
├── streamsets
│   ├── Dockerfile
│   └── data
│       ├── pipelines
│       │   └── ...
│       ├── runInfo
│       │   └── ...
│       ├── sdc.id
│       └── tutorial_data
│           ├── ccsample