Splunk Lab

This project lets you stand up a Splunk instance in Docker on a quick and dirty basis.

But what is Splunk? Splunk is a platform for big data collection and analytics. You feed your events from syslog, webserver logs, or application logs into Splunk, then use queries to extract meaningful insights from that data.

Quick Start!

Paste either of these on the command line:

bash <(curl -s https://raw.githubusercontent.com/dmuth/splunk-lab/master/go.sh)

bash <(curl -Ls https://bit.ly/splunklab)

...and the script will print out which directory it will ingest logs from, your password, etc. Follow the on-screen instructions for setting environment variables and you'll be up and running in no time! Whatever logs you had sitting in your logs/ directory will be searchable in Splunk with the search index=main.
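
If go.sh honors the same SPLUNK_PASSWORD environment variable that the Docker image uses (see the docker run example further down), you can also set your password up front instead of waiting for the prompt. The value below is just a placeholder:

SPLUNK_PASSWORD=my-s3cret-passw0rd bash <(curl -Ls https://bit.ly/splunklab)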

If you want to see neat things you can do in Splunk Lab, check out the Cookbook section.

Also, the script will create a directory called bin/ with some helper scripts in it. Be sure to check them out!

Useful links after starting

  • https://localhost:8000/ - The default URL for logging into the local instance. Username is admin, password is whatever was set when starting Splunk Lab.
  • Splunk Dashboard Examples - Wanna see what you can do with Splunk? Here are some example dashboards.

Features

  • App dashboards can be stored on the local filesystem (they don't disappear when the container exits)
  • Ingested data can be stored in the local filesystem
  • Multiple REST and RSS endpoints "built in" to provide sources of data ingestion
  • Integration with REST API Modular Input
  • Splunk Machine Learning Toolkit included
  • /etc/hosts can be appended to with local IP/hostname entries
  • Ships with Eventgen to populate your index with fake webserver events for testing.

Screenshots

These are screenshots with actual data from production apps which I built on top of Splunk Lab:

Splunk Lab Cookbook

What can you do with Splunk Lab? Here are a few examples:

Ingest some logs for viewing, searching, and analysis

  • Drop your logs into the logs/ directory.
  • bash <(curl -Ls https://bit.ly/splunklab)
  • Go to https://localhost:8000/
  • Ingested data will be written to data/, which will persist between runs.
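
Once the data is indexed, you can start slicing it with SPL. For example, this query counts events per source file; source is one of the fields Splunk attaches to every event, so it should work regardless of what you dropped into logs/:

index=main | stats count by source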

Ingest some logs for viewing, searching, and analysis but DON'T keep ingested data between runs

  • SPLUNK_DATA=no bash <(curl -Ls https://bit.ly/splunklab)
  • Note that data/ will not be written to and launching a new container will cause logs/ to be indexed again.
    • This will increase ingestion rate on Docker for OS/X, as there are some issues with the filesystem driver in OS/X Docker.

Play around with synthetic webserver data

  • SPLUNK_EVENTGEN=1 bash <(curl -Ls https://bit.ly/splunklab)
  • Fake webserver logs will be written every 10 seconds and can be viewed with the query index=main sourcetype=nginx. The logs are based on actual HTTP requests which have come into the webserver hosting my blog.
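
As a quick sanity check on the synthetic data, you can break the fake traffic down by HTTP status code. This assumes the status field is extracted from the generated nginx events, which is typical for webserver sourcetypes but may vary:

index=main sourcetype=nginx | stats count by status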

Adding Hostnames into /etc/hosts

  • Edit a local hosts file
  • ETC_HOSTS=./hosts bash <(curl -Ls https://bit.ly/splunklab)
  • This can be used in conjunction with something like Splunk Network Monitor to ping hosts that don't have DNS names, such as your home's webcam. :-)
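
The file uses the standard /etc/hosts format: an IP address followed by one or more hostnames on each line. The entries below are purely hypothetical examples:

192.168.1.50   webcam.home
192.168.1.1    router.home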

Get the Docker command line for any of the above

  • Run any of the above with PRINT_DOCKER_CMD=1 set, and the Docker command line that's used will be written to stdout.
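
For example, combining it with the Eventgen recipe above:

PRINT_DOCKER_CMD=1 SPLUNK_EVENTGEN=1 bash <(curl -Ls https://bit.ly/splunklab)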

Run Splunk Lab in Development Mode with a bash Shell

This would normally be done with the script ./bin/devel.sh when running from the repo, but if you're running Splunk Lab just with the Docker image, here's how to do it:

docker run -p 8000:8000 -e SPLUNK_PASSWORD=password1 -v $(pwd)/data:/data -v $(pwd)/logs:/logs --name splunk-lab --rm -it -v $(pwd):/mnt -e SPLUNK_DEVEL=1 dmuth1/splunk-lab bash

This is useful mainly if you want to poke around in Splunk Lab while it's running. Note that you could always just run docker exec -it splunk-lab bash against a running container instead of doing all of the above. :-)

Splunk Apps Included

The following Splunk apps are included in this Docker image:

All apps are covered under their own license. Please check the Apps page for more info.

Splunk has its own license. Please abide by it.

Free Sources of Data

I put together this curated list of free sources of data which can be pulled into Splunk via one of the included apps:

Apps Built With Splunk Lab

Since building Splunk Lab, I have used it as the basis for building other projects:

Here's all of the above, presented as a graph:

Building Your Own Apps Based on Splunk Lab

A sample app (and instructions on how to use it) can be found in the sample-app directory. Feel free to expand on it for your own apps.

A Word About Security

HTTPS is turned on by default. Passwords such as password and 12345 are not permitted.

Please, for the love of god, use a strong password if you are deploying this on a public-facing machine.

FAQ

Can I get a valid SSL cert on localhost?

Yes, you can!

First, install mkcert and then run mkcert -install && mkcert localhost 127.0.0.1 ::1 to generate a local CA and a cert/key combo for localhost.

Then, when you run Splunk Lab, set the environment variables SSL_KEY and SSL_CERT and those files will be pulled into Splunk Lab.

Example: SSL_KEY=./localhost.key SSL_CERT=./localhost.pem ./go.sh

How do I get this to work in Vagrant?

TL;DR If you're on a Mac, use OrbStack.

If you're running Docker in Vagrant, or just plain Vagrant, you'll run into issues because Splunk does some low-level filesystem operations on its data directory that don't play well with Vagrant's shared folders, resulting in errors in splunkd.log that look like this:

11-15-2022 01:45:31.042 +0000 ERROR StreamGroup [217 IndexerTPoolWorker-0] - failed to drain remainder total_sz=24 bytes_freed=7977 avg_bytes_per_iv=332 sth=0x7fb586dfdba0: [1668476729, /opt/splunk/var/lib/splunk/_internaldb/db/hot_v1_1, 0x7fb587f7e840] reason=st_sync failed rc=-6 warm_rc=[-35,1]

To work around this, disable sharing of Splunk's data directory by setting SPLUNK_DATA=no, like this:

SPLUNK_DATA=no SPLUNK_EVENTGEN=yes ./go.sh

By doing this, any data ingested into Splunk will not persist between runs. But to be fair, Splunk Lab is meant for development usage of Splunk, not long-term usage.

Does this work on Macs?

Sure does! I built this on a Mac. :-)

For best results, run under OrbStack.

Development

I wrote a series of helper scripts in bin/ to make the process easier:

  • ./bin/download.sh - Download tarballs of various apps and split some of them into chunks
    • If downloading a new version of Splunk, edit bin/lib.sh and bump the SPLUNK_VERSION and SPLUNK_BUILD variables.
  • ./bin/build.sh [ --force ] - Build the containers.
    • Note that this downloads packages from an AWS S3 bucket that I created. This bucket is set to "Requester Pays", so you'll need to make sure the aws CLI app is set up.
    • If you are (re)building Splunk Lab, you'll want to use --force.
  • ./bin/upload-file-to-s3.sh - Upload a specific file to S3, for rolling out new versions of apps.
  • ./bin/devel.sh - Build and tag the container, then start it with an interactive bash shell.
    • This is a wrapper for the above-mentioned go.sh script. Any environment variables that work there will work here.
    • To force rebuilding a container during development, touch the associated Dockerfile in docker/. E.g. touch docker/1-splunk-lab to rebuild the contents of that container.
  • ./bin/push.sh - Tag and push the container.
  • ./bin/create-1-million-events.py - Create 1 million events in the file 1-million-events.txt in the current directory.
    • If not in logs/ but reachable from the Docker container, the file can then be oneshotted into Splunk with the following command: /opt/splunk/bin/splunk add oneshot ./1-million-events.txt -index main -sourcetype oneshot-0001
  • ./bin/kill.sh - Kill a running splunk-lab container.
  • ./bin/attach.sh - Attach to a running splunk-lab container.
  • ./bin/clean.sh - Remove logs/ and/or data/ directories.
  • ./bin/tarsplit - Local copy of my package from https://github.com/dmuth/tarsplit

Building a New Version of Splunk

  • Bump version number and build number in bin/lib.sh
  • Run ./bin/build.sh, use --force if necessary
    • This can take several MINUTES, especially if no apps are cached locally
  • Run SPLUNK_EVENTGEN=yes SPLUNK_ML=yes ./bin/devel.sh
    • This will build and tag the container, and spawn an interactive shell
    • Run /opt/splunk/bin/splunk version inside the container to verify the version number
  • Go to https://localhost:8000/ and verify you can log into Splunk
  • Type exit in the shell to shut down the server
  • Run ./bin/push.sh to deploy the image. This will take a while.
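
Condensed into one illustrative shell session (the exact flags may vary), the release flow looks roughly like this:

vi bin/lib.sh                                      # bump SPLUNK_VERSION and SPLUNK_BUILD
./bin/build.sh --force                             # rebuild the containers
SPLUNK_EVENTGEN=yes SPLUNK_ML=yes ./bin/devel.sh   # build, tag, and get an interactive shell
/opt/splunk/bin/splunk version                     # run inside the container to verify
exit                                               # shuts down the server
./bin/push.sh                                      # tag and push the image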

Building Container Internals

  • Here's the layout of the cache/ directory
    • cache/ - Where tarballs for Splunk and its apps hang out. These are downloaded when bin/download.sh is run for the first time.
    • cache/deploy/ - When creating a specific Docker image, files are copied here so the Dockerfile can ingest them. (Or rather hardlinked to the files in the parent directory.)
    • cache/build/ - 0-byte files are written here when a specific container is built, and on future builds, the age of that file is checked against the Dockerfile. If the Dockerfile is newer, then the container is (re-)built. Otherwise, it is skipped. This shortens a run of bin/devel.sh where no containers need to be built from 12 seconds on my 2020 iMac to 0.2 seconds.
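
In shell terms, the freshness check behaves roughly like the sketch below. This is illustrative only, not the actual code in bin/build.sh, and 1-splunk-lab is just one of the Dockerfiles in docker/ used as an example:

# Rebuild only if there is no marker file yet, or the Dockerfile is newer than the marker.
if [ ! -f cache/build/1-splunk-lab ] || [ docker/1-splunk-lab -nt cache/build/1-splunk-lab ]; then
    docker build -f docker/1-splunk-lab .    # the real invocation will differ
    touch cache/build/1-splunk-lab           # record that this container is now up to date
fi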

A word on default/ and local/ directories

I had to struggle with this for a while, so I'm mostly documenting it here.

When in devel mode, /opt/splunk/etc/apps/splunk-lab/ is mounted to ./splunk-lab-app/ via go.sh and the entrypoint script inside of the container symlinks local/ to default/. This way, any changes that are made to dashboards will be propagated outside of the container and can be checked in to Git.

When in production mode (e.g. running ./go.sh directly), no symlink is created; instead, local/ is mounted from whatever $SPLUNK_APP points to (the default is app/), so that any changes made by the user show up on their host while Splunk Lab's default/ directory remains untouched.
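
Expressed as approximate Docker volume mappings (the exact flags in go.sh may differ), the two modes look something like this:

# Devel mode: the whole app directory is shared, and the entrypoint symlinks local/ to default/
-v $(pwd)/splunk-lab-app:/opt/splunk/etc/apps/splunk-lab

# Production mode: only local/ is shared, from whatever $SPLUNK_APP points to (default: app/)
-v $(pwd)/app:/opt/splunk/etc/apps/splunk-lab/local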

Additional Reading

Notes/Bugs

  • The Docker containers are dmuth1/splunk-lab and dmuth1/splunk-lab-ml. The latter has all of the Machine Learning apps built in to the image. Feel free to extend those for your own projects.
  • If I run ./bin/create-test-logfiles.sh 10000 and then start Splunk Lab on a Mac, all of the files will be indexed without any major issues, but then the CPU will spin, and not from Splunk.
    • The root cause is that the filesystem code for Docker volume mappings on OS/X's Docker implementation is VERY inefficient in terms of both CPU and memory usage, especially when there are 10,000 files involved. The overhead is just crazy. When reading events from a directory mounted through Docker, I see about 100 events/sec. When the directory is local to the container, I see about 1,000 events/sec, for a 10x difference.
  • The HTTPS cert is self-signed with Splunk's own CA. If you're tired of seeing a Certificate Error every time you try connecting to Splunk, you can follow the instructions at https://stackoverflow.com/a/31900210/196073 to allow self-signed certificates for localhost in Google Chrome.
    • Please understand the implications before you do this.

Credits

  • Splunk N' Box - Splunk N' Box is used to create entire Splunk clusters in Docker. It was the first actual use of Splunk I saw in Docker, and gave me the idea that hey, maybe I could run a stand-alone Splunk instance in Docker for ad-hoc data analysis!
  • Splunk, for having such a fantastic product which is also a great example of Operational Excellence!
  • Eventgen is a super cool way of generating simulated, realistic data that can be used to build dashboards for testing and training purposes.
  • This text to ASCII art generator, for the logo I used in the script.
  • The logo was made over at https://www.freelogodesign.org/
  • Lars Wirzenius for a review of this README.

Copyrights

  • Splunk is copyright by Splunk, Inc. Please stay within the confines of the 500 MB/day free license when using Splunk Lab, unless you brought your own license along.
  • The various apps are copyright by the creators of those apps.

Contact

My email is doug.muth@gmail.com. I am also @dmuth on Twitter and Facebook!
