Splunk Lab

This project lets you stand up a Splunk instance in Docker on a quick and dirty basis.

Quick Start!

Paste either of these on the command line:

```
bash <(curl -s https://raw.githubusercontent.com/dmuth/splunk-lab/master/go.sh)
```

```
bash <(curl -Ls https://bit.ly/splunklab)
```

...and the script will print the directory it will ingest logs from, your password, and so on. Follow the on-screen instructions for setting environment variables and you'll be up and running in no time! You can find your logs with the search `index=main`.

If you want to see neat things you can do in Splunk Lab, check out the Cookbook section.

Useful links after starting

  • https://localhost:8000/ - Default port to log into the local instance. Username is admin, password is what was set when starting Splunk Lab.
  • Splunk Dashboard Examples - Wanna see what you can do with Splunk? Here are some example dashboards.

Features

  • App dashboards can be stored in the local filesystem (they don't disappear when the container exits)
  • Ingested data can be stored in the local filesystem
  • Multiple REST and RSS endpoints "built in" to provide sources of data ingestion
  • Integration with REST API Modular Input
  • Splunk Machine Learning Toolkit included
  • /etc/hosts can be appended to with local ip/hostname entries
  • Ships with Eventgen to populate your index with fake webserver events for testing.

Screenshots

These are screenshots with actual data from production apps which I built on top of Splunk Lab:

Splunk Lab Cookbook

What can you do with Splunk Lab? Here are a few examples of ways you can use Splunk Lab:

Ingest some logs for viewing, searching, and analysis

  • Drop your logs into the logs/ directory.
  • bash <(curl -Ls https://bit.ly/splunklab)
  • Go to https://localhost:8000/
  • Ingested data will be written to data/, which will persist between runs.

Ingest some logs for viewing, searching, and analysis but DON'T keep ingested data between runs

  • SPLUNK_DATA=no bash <(curl -Ls https://bit.ly/splunklab)
  • Note that data/ will not be written to, and launching a new container will cause logs/ to be indexed again.
    • This will increase the ingestion rate on Docker for macOS, as there are some issues with the filesystem driver in Docker's macOS implementation.

Play around with synthetic webserver data

  • SPLUNK_EVENTGEN=1 bash <(curl -Ls https://bit.ly/splunklab)
  • Fake webserver logs will be written every 10 seconds and can be viewed with the query index=main sourcetype=nginx. The logs are based on actual HTTP requests which have come into the webserver hosting my blog.

Adding Hostnames into /etc/hosts

  • Edit a local hosts file
  • ETC_HOSTS=./hosts bash <(curl -Ls https://bit.ly/splunklab)
  • This can be used in conjunction with something like Splunk Network Monitor to ping hosts that don't have DNS names, such as your home's webcam. :-)
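The hosts file uses the standard /etc/hosts format of IP/hostname pairs. A minimal sketch, where the addresses and device names are made-up examples:

```shell
# Hypothetical hosts file: IP/hostname pairs for LAN devices without DNS names.
cat > ./hosts <<'EOF'
192.168.1.50   webcam
192.168.1.1    router
EOF

# Then launch with:
#   ETC_HOSTS=./hosts bash <(curl -Ls https://bit.ly/splunklab)
```

Per the feature list above, these entries get appended to the container's /etc/hosts, so the hostnames resolve inside Splunk Lab.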

Get the Docker command line for any of the above

  • Run any of the above with PRINT_DOCKER_CMD=1 set, and the Docker command line that's used will be written to stdout.

Run Splunk Lab in Development Mode with a bash Shell

This would normally be done with the script ./bin/devel.sh when running from the repo, but if you're running Splunk Lab just with the Docker image, here's how to do it:

```
docker run -p 8000:8000 -e SPLUNK_PASSWORD=password1 -v $(pwd)/data:/data -v $(pwd)/logs:/logs --name splunk-lab --rm -it -v $(pwd):/mnt -e SPLUNK_DEVEL=1 dmuth1/splunk-lab bash
```

This is useful mainly if you want to poke around in Splunk Lab while it's running. Note that you could always just run docker exec splunk-lab bash instead of doing all of the above. :-)

Splunk Apps Included

The following Splunk apps are included in this Docker image:

All apps are covered under their own license. Please check the Apps page for more info.

Splunk has its own license. Please abide by it.

Free Sources of Data

I put together this curated list of free sources of data which can be pulled into Splunk via one of the included apps:

Apps Built With Splunk Lab

Since building Splunk Lab, I have used it as the basis for building other projects:

Here's all of the above, presented as a graph:

Building Your Own Apps Based on Splunk Lab

A sample app (and instructions on how to use it) are in the sample-app directory.
Feel free to expand on that app for your own apps.

A Word About Security

HTTPS is turned on by default. Passwords such as password and 12345 are not permitted.
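The actual password check lives in Splunk Lab's startup script; as a rough sketch of the kind of validation described above (the function name and blocklist here are assumptions, not Splunk Lab's real list):

```shell
# Reject empty or well-known weak passwords before starting the container.
# The specific blocklist below is an assumption for illustration only.
password_ok() {
    case "$1" in
        ""|password|12345) return 1 ;;
        *)                 return 0 ;;
    esac
}

if ! password_ok "$SPLUNK_PASSWORD"; then
    echo "Please choose a stronger SPLUNK_PASSWORD." >&2
fi
```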

Please, use a strong password if you are deploying this on a public-facing machine.

FAQ

Does this work on Macs?

Sure does! I built this on a Mac. :-)

Development

I wrote a series of helper scripts in bin/ to make the process easier:

  • ./bin/build.sh - Build the containers.
    • Note that this downloads packages from an AWS S3 bucket that I created. This bucket is set to "requester pays", so you'll need to make sure the aws CLI is set up.
  • ./bin/download.sh - Download tarballs of various apps and split some of them into chunks.
  • ./bin/upload-file-to-s3.sh - Upload a specific file to S3, for rolling out new versions of apps.
  • ./bin/push.sh - Tag and push the container.
  • ./bin/devel.sh - Build and tag the container, then start it with an interactive bash shell.
    • This is a wrapper for the above-mentioned go.sh script. Any environment variables that work there will work here.
  • ./bin/create-1-million-events.py - Create 1 million events in the file 1-million-events.txt in the current directory.
    • If not in logs/ but reachable from the Docker container, the file can then be oneshotted into Splunk with the following command: /opt/splunk/bin/splunk add oneshot ./1-million-events.txt -index main -sourcetype oneshot-0001
  • ./bin/kill.sh - Kill a running splunk-lab container.
  • ./bin/attach.sh - Attach to a running splunk-lab container.
  • ./bin/clean.sh - Remove logs/ and/or data/ directories.
  • ./bin/tarsplit - Local copy of my package from https://github.com/dmuth/tarsplit

Building Container Internals

  • Here's the layout of the cache/ directory
    • cache/ - Where tarballs for Splunk and its apps hang out. These are downloaded when bin/download.sh is run for the first time.
    • cache/deploy/ - When creating a specific Docker image, files are copied here so the Dockerfile can ingest them. (Or rather hardlinked to the files in the parent directory.)
    • cache/build/ - 0-byte files are written here when a specific container is built, and on future builds, the age of that file is checked against the Dockerfile. If the Dockerfile is newer, then the container is (re-)built. Otherwise, it is skipped. This shortens a run of bin/devel.sh where no containers need to be built from 12 seconds on my 2020 iMac to 0.2 seconds.
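The stamp-file check described above boils down to a make-style freshness test. A sketch, with the function name assumed:

```shell
# Return success (0) when the image needs a (re-)build: either no stamp
# file exists yet, or the Dockerfile is newer than the 0-byte stamp.
needs_rebuild() {
    dockerfile=$1
    stamp=$2
    [ ! -f "$stamp" ] || [ "$dockerfile" -nt "$stamp" ]
}

# After a successful build, refresh the stamp so the next run is skipped:
#   docker build -t dmuth1/splunk-lab . && touch cache/build/splunk-lab
```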

A word on default/ and local/ directories

I had to struggle with this for a while, so I'm mostly documenting it here.

When in devel mode, /opt/splunk/etc/apps/splunk-lab/ is mounted to ./splunk-lab-app/ via go.sh and the entrypoint script inside of the container symlinks local/ to default/. This way, any changes that are made to dashboards will be propagated outside of the container and can be checked in to Git.

When in production mode (e.g. running ./go.sh directly), no symlink is created; instead, local/ is mounted from whatever $SPLUNK_APP points to, so that any changes made by the user show up on their host, with Splunk Lab's default/ directory left untouched.
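The devel-mode symlink step described above can be sketched like this (the function wrapper is mine; the paths and behavior come from the text):

```shell
# In devel mode the entrypoint points local/ at default/, so any dashboard
# edits land in default/ and therefore show up outside the container,
# where they can be checked in to Git.
link_local_to_default() {
    app_dir=$1    # e.g. /opt/splunk/etc/apps/splunk-lab
    ln -sfn default "$app_dir/local"
}
```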

Additional Reading

Notes/Bugs

  • The Docker containers are dmuth1/splunk-lab and dmuth1/splunk-lab-ml. The latter has all of the Machine Learning apps built in to the image. Feel free to extend those for your own projects.
  • If I run ./bin/create-test-logfiles.sh 10000 and then start Splunk Lab on a Mac, all of the files will be indexed without any major issues, but then the CPU will spin, and not from Splunk.
    • The root cause is that the filesystem code for Docker volume mappings in Docker's macOS implementation is VERY inefficient in terms of both CPU and memory usage, especially when there are 10,000 files involved. The overhead is just crazy. When reading events from a directory mounted through Docker, I see about 100 events/sec. When the directory is local to the container, I see about 1,000 events/sec, a 10x difference.
  • The HTTPS cert is self-signed with Splunk's own CA. If you're tired of seeing a Certificate Error every time you try connecting to Splunk, you can follow the instructions at https://stackoverflow.com/a/31900210/196073 to allow self-signed certificates for localhost in Google Chrome.
    • Please understand the implications before you do this.

Credits

  • Splunk N' Box - Splunk N' Box is used to create entire Splunk clusters in Docker. It was the first actual use of Splunk I saw in Docker, and gave me the idea that hey, maybe I could run a stand-alone Splunk instance in Docker for ad-hoc data analysis!
  • Splunk, for having such a fantastic product which is also a great example of Operational Excellence!
  • Eventgen is a super cool way of generating simulated data that can be used to populate dashboards for testing and training purposes.
  • This text to ASCII art generator, for the logo I used in the script.
  • The logo was made over at https://www.freelogodesign.org/

Copyrights

  • Splunk is copyright by Splunk, Inc. Please stay within the confines of the 500 MB/day free license when using Splunk Lab, unless you brought your own license along.
  • The various apps are copyright by the creators of those apps.

Contact

My email is doug.muth@gmail.com. I am also @dmuth on Twitter and Facebook!
