
Add Kubernetes deployment files and instructions. #397

Closed
wants to merge 42 commits into from

Conversation

Jeroen0494
Contributor

@Jeroen0494 Jeroen0494 commented Mar 10, 2022

Hi,

I've successfully ported SELKS to Kubernetes, and I'd like to share my work with you.

Suricata, Scirius, Evebox, and NGINX are running in the suricata namespace (you could change this to a SELKS namespace or something similar). I've moved the ELK stack to its own namespace, because I'd like to reuse these components. The ConfigMaps holding component configuration files reflect this change.

Suricata is running as a DaemonSet on every node, using the host network and connecting to the host network interface. Because of this, there is usually more than one instance, so I've added Filebeat as a DaemonSet to ship the logs to a single Logstash instance. I don't have a good way yet to differentiate the PCAP files generated by Suricata. It seems simple variables like $HOSTNAME are not expanded in the configuration file. This means that the moment you run more than one Suricata instance, they will overwrite each other's PCAP files (or simply crash). Any ideas on how to solve this?
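For what it's worth, a common way to get a per-node name into a pod without relying on $HOSTNAME expansion is the Kubernetes Downward API: define the variable in the container spec and reference it with the $(VAR) syntax, which Kubernetes expands for variables defined earlier in the same spec. A sketch (the interface name and output index are placeholders, not taken from this PR):

```yaml
env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName   # the name of the node the pod runs on
  - name: SURICATA_OPTIONS
    # $(NODE_NAME) is expanded by Kubernetes because NODE_NAME is
    # defined above it in the same container spec.
    value: >-
      -i eth0 -vvv --set sensor-name=$(NODE_NAME)
      --set outputs.7.pcap-log.filename=log.$(NODE_NAME).%n.%t.pcap
```

This only helps for values passed on the command line; Suricata itself does not expand environment variables inside suricata.yaml.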

NGINX is running with a service using NodePort to expose the application. I've added Arkime (Moloch) to the configuration file, which seems to be currently missing in the Docker deployment.

I've added readiness and liveness probes based on the official Elastic ones, to the best of my knowledge. I've also applied some basic security settings in the security context. Secrets are used for default passwords instead of putting them directly in the deployments. Privileges are dropped where possible, but containers like Scirius and Evebox refuse to run as anything but root. I've also added some resource requests and limits, but I'd recommend updating these according to your needs; mine are pretty basic.

The alpine container will load the SELKS dashboards into Kibana.

suricata-stdout.yaml contains a very basic Suricata DaemonSet with log output to separate containers, which I found on the internet and improved a bit. This can serve as a simple alternative IDS for people with simpler use cases.

I hope you find this useful, or that it helps somebody searching this repo for a Kubernetes deployment option. The next step for me is to figure out how to run Suricata in IPS mode. I'm thinking of creating a virtual interface using the Linux dummy module and connecting Kubernetes to it, then using Suricata to move packets between the virtual interface and the real network port connected to the rest of the network. This would make sure cluster communication doesn't pass through Suricata, preventing it from spamming its logs and using resources. If you have any ideas in this regard, please let me know!

BTW the secrets are test versions, no critical information is being exposed here.

@Jeroen0494
Contributor Author

Jeroen0494 commented Mar 11, 2022

If I set options for suricata on the command line:

- name: SURICATA_OPTIONS
  value: -i enx24b6ff700899 -vvv --set sensor-name=suricata --set outputs.7.pcap-log.filename=log.$(NODE_NAME).%n.%t.pcap --set outputs.7.pcap-log.enabled=yes

They are successfully applied, with variables expanded:

# echo $SURICATA_OPTIONS
-i enx24b6ff700899 -vvv --set sensor-name=suricata --set outputs.7.pcap-log.filename=log.jeroen-xps-13-9370.%n.%t.pcap --set outputs.7.pcap-log.enabled=yes

However, they don't seem to be actually applied at runtime:

# suricata --dump-config | grep pcap
outputs.1.eve-log.pcap-file = false
outputs.7 = pcap-log
outputs.7.pcap-log = (null)
outputs.7.pcap-log.enabled = no
outputs.7.pcap-log.filename = log.%n.%t.pcap
outputs.7.pcap-log.limit = 10mb
outputs.7.pcap-log.max-files = 20
outputs.7.pcap-log.mode = multi
outputs.7.pcap-log.dir = /var/log/suricata/fpc/
outputs.7.pcap-log.use-stream-depth = no
outputs.7.pcap-log.honor-pass-rules = no
pcap = (null)
pcap.0 = interface
pcap.0.interface = eth0
pcap.1 = interface
pcap.1.interface = default
pcap-file = (null)
pcap-file.checksum-checks = auto
profiling.pcap-log = (null)
profiling.pcap-log.enabled = no
profiling.pcap-log.filename = pcaplog_stats.log
profiling.pcap-log.append = yes

The sensor name isn't applied either:

# suricata --dump-config | grep sensor-name
[root@XXXXXXXXXXX /]#

The suricata process does seem to have the arguments applied:

# cat /proc/1/cmdline
/usr/bin/suricata--usersuricata--groupsuricata-ienx24b6ff700899-vvv--setsensor-name=suricata--setoutputs.7.pcap-log.filename=log.XXXXXXXXXXX.%n.%t.pcap--setoutputs.7.pcap-log.enabled=yes

@pevma
Member

pevma commented Mar 11, 2022 via email

@Jeroen0494
Contributor Author

You're most welcome!

Do you happen to know the solution to Suricata not setting my configuration via the --set command line option?

I also noticed that Logstash is eating up all my RAM, so I'm in the process of replacing Filebeat + Logstash with Fluent Bit + Fluentd. I'm hoping memory usage will go down; converting the Logstash configuration is proving a bit difficult for now :)

@yodapotatofly
Collaborator

yodapotatofly commented Mar 12, 2022

@Jeroen0494
I don't know much about Kubernetes, however --set sensor-name=suricata always worked for me when running docker containers:

suricata:
  container_name: suricata
  image: jasonish/suricata:master-amd64
  entrypoint: /etc/suricata/new_entrypoint.sh
  restart: ${RESTART_MODE:-unless-stopped}
  depends_on:
    scirius:
      condition: service_healthy
  environment:
    - SURICATA_OPTIONS=${INTERFACES} -vvv --set sensor-name=suricata
  cap_add:
    - NET_ADMIN
    - SYS_NICE
  network_mode: host

@Jeroen0494
Contributor Author

@Jeroen0494 I don't know much about Kubernetes, however --set sensor-name=suricata always worked for me when running docker containers:

suricata:
  container_name: suricata
  image: jasonish/suricata:master-amd64
  entrypoint: /etc/suricata/new_entrypoint.sh
  restart: ${RESTART_MODE:-unless-stopped}
  depends_on:
    scirius:
      condition: service_healthy
  environment:
    - SURICATA_OPTIONS=${INTERFACES} -vvv --set sensor-name=suricata
  cap_add:
    - NET_ADMIN
    - SYS_NICE
  network_mode: host

Thanks for your comment.

Unfortunately, this doesn't seem to work either. The interface option is evaluated, because if the interface doesn't exist, the container fails and exits. But none of the --set options are used.

version: '3.4'

networks:
  network:

volumes:
  suricata-rules: #for suricata rules transfer between scirius and suricata and for persistency
  suricata-run: #path where the suricata socket resides
  suricata-logs:

services:
  suricata:
    container_name: suricata
    image: jasonish/suricata:master-amd64
    entrypoint: /etc/suricata/new_entrypoint.sh
    environment:
        - SURICATA_OPTIONS=-i eno1 -vvv --set sensor-name=suricata --set outputs.7.pcap-log.enabled=yes
    cap_add:
      - NET_ADMIN
      - SYS_NICE
    network_mode: host
    volumes:
       - ./containers-data/suricata/logs:/var/log/suricata
       - suricata-rules:/etc/suricata/rules
       - suricata-run:/var/run/suricata/
       - ./containers-data/suricata/etc:/etc/suricata
[jeroen@mediaserver docker]$ docker-compose up -d
WARNING: Some networks were defined but are not used by any service: network
Creating suricata ... done
[jeroen@mediaserver docker]$ docker ps
CONTAINER ID   IMAGE                            COMMAND                  CREATED         STATUS         PORTS                              NAMES
7b53c7cd3bb1   jasonish/suricata:master-amd64   "/etc/suricata/new_e…"   5 seconds ago   Up 4 seconds                                      suricata
aa48609a05e9   ghcr.io/linuxserver/nzbget       "/init"                  2 weeks ago     Up 7 hours     6789/tcp, 0.0.0.0:7000->7000/tcp   nzbget
[jeroen@mediaserver docker]$ docker exec -ti 7b53c7cd3bb1 bash
[root@mediaserver /]# suricata --dump-config | grep sensor
[root@mediaserver /]#
[root@mediaserver /]# suricata --dump-config | grep pcap
outputs.1.eve-log.pcap-file = false
outputs.7 = pcap-log
outputs.7.pcap-log = (null)
outputs.7.pcap-log.enabled = no
outputs.7.pcap-log.filename = log.%n.%t.pcap
outputs.7.pcap-log.limit = 10mb
outputs.7.pcap-log.max-files = 20
outputs.7.pcap-log.mode = multi
outputs.7.pcap-log.dir = /var/log/suricata/fpc/
outputs.7.pcap-log.use-stream-depth = no
outputs.7.pcap-log.honor-pass-rules = no
pcap = (null)
pcap.0 = interface
pcap.0.interface = eth0
pcap.1 = interface
pcap.1.interface = default
pcap-file = (null)
pcap-file.checksum-checks = auto
profiling.pcap-log = (null)
profiling.pcap-log.enabled = no
profiling.pcap-log.filename = pcaplog_stats.log
profiling.pcap-log.append = yes

@Jeroen0494
Contributor Author

I've updated my pull request and separated the component info folders. The ELK stack is now in the logging namespace. I've included configuration to replace Logstash + Filebeat with Fluentd + Fluent Bit. For me, this makes a HUGE impact in terms of memory usage: Logstash uses 1 GB when doing nothing and 1.5 GB when working, while Fluentd uses about 100 MB regardless. Filebeat uses about 150 MB, Fluent Bit about 10 MB. For my single-node server with 16 GB of memory, this makes a huge difference.

@pevma
Member

pevma commented Mar 22, 2022

@Jeroen0494 - what is the best way to test this out :)

@Jeroen0494
Contributor Author

@Jeroen0494 - what is the best way to test this out :)

Good question! The nice thing about Kubernetes is that the API is the same across distributions, so any Kubernetes installation will work. Personally, I use k3s because of its lower resource usage. Both my laptop and server have 16G of memory, so the smaller the Kubernetes footprint the better. The installation instructions are on their website.

I've included a readme in this pull request with instructions on how to deploy the resources to Kubernetes. You need to update the PV to set your hostname in the nodeAffinity, and to point the path at existing folders. I've included commands in the readme that you can use to create the folders and set the permissions. You could also change the PV to use whatever storage medium Kubernetes supports, for example NFS, SMB or Ceph.
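A sketch of the kind of node-pinned local PV described above (hostname, path, capacity and storage class are placeholders to replace with your own; the actual manifests are in this PR):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: suricata-logs
spec:
  capacity:
    storage: 10Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /data/suricata/logs        # must already exist on the node
  nodeAffinity:                      # pin the PV to the node holding the path
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - my-node-hostname   # replace with your node's hostname
```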

You can make changes to the install.sh file according to your own needs, the default will provide you with a vanilla SELKS installation.

Because I run it locally via k3s, NGINX is exposed via https://localhost:/

@pevma
Member

pevma commented Mar 22, 2022

OK, sounds like a plan. Thank you!

Signed-off-by: Jeroen Rijken <jeroen.rijken@xs4all.nl>
@Jeroen0494
Contributor Author

So my laptop is powerful enough to run the SELKS stack. I've added the laptop as a node to my cluster and manually set taints, tolerations and nodeAffinity to schedule everything SELKS-related onto the laptop. It runs just fine.
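For reference, a sketch of the kind of taint/toleration/nodeAffinity combination described above (the taint key/value and hostname are placeholders, not the exact ones used here):

```yaml
# First taint the dedicated node so nothing else schedules there:
#   kubectl taint nodes my-laptop dedicated=selks:NoSchedule
# Then, in each SELKS pod spec, tolerate the taint and pin to the node:
spec:
  tolerations:
    - key: dedicated
      operator: Equal
      value: selks
      effect: NoSchedule
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                  - my-laptop     # replace with the dedicated node's hostname
```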

I've updated the securityContext of a lot of resources to what is minimally required. I've also given everything a version bump, except for Kibana, which gives some weird errors:

  Warning  Unhealthy  24s (x2 over 64s)  kubelet            Liveness probe failed: Error: Got HTTP code 000 but expected a 200
sh: 16: [[: not found
  Warning  Unhealthy  24s (x3 over 64s)  kubelet  Readiness probe failed: Error: Got HTTP code 000 but expected a 200
sh: 16: [[: not found
  Warning  Unhealthy  4s (x4 over 54s)  kubelet  Liveness probe failed: Error: Got HTTP code 200 but expected a 200
sh: 16: [[: not found
  Normal   Killing    4s (x2 over 44s)  kubelet  Container kibana failed liveness probe, will be restarted
  Warning  Unhealthy  4s (x4 over 54s)  kubelet  Readiness probe failed: Error: Got HTTP code 200 but expected a 200
sh: 16: [[: not found
  Warning  Unhealthy  4s                kubelet  Readiness probe failed:
  Normal   Pulling    4s (x3 over 83s)  kubelet  Pulling image "docker.elastic.co/kibana/kibana:7.17.9"
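The "sh: 16: [[: not found" lines suggest the probe script uses the bash-only [[ ]] test but is executed by a plain POSIX /bin/sh in the Kibana image, which would also explain the contradictory "Got HTTP code 200 but expected a 200" message. A POSIX-safe sketch of such a probe (assuming the probe targets Kibana's /api/status endpoint on port 5601 and that curl is available in the image; this is not the exact probe from this PR):

```yaml
readinessProbe:
  exec:
    command:
      - sh
      - -c
      - |
        # Use POSIX [ ] instead of bash-only [[ ]]; the image's /bin/sh
        # is not bash, which is what produces "sh: [[: not found".
        code=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:5601/api/status)
        [ "$code" = "200" ]
  initialDelaySeconds: 30
  periodSeconds: 10
```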

This PR could probably use some commit squashing when done; 32 commits is a bit much. It could probably be squashed down to 1 commit.

@pevma
Member

pevma commented Mar 17, 2023

@Jeroen0494 - totally understand and much appreciate the update. Awesome to see it running.
Please don't hesitate to ping me when you think all is good to go, or if you need something / some test confirmation etc.
Thank you !

@Jeroen0494
Contributor Author

@pevma I think it's done and ready for review! There are still some outstanding issues:

@Jeroen0494
Contributor Author

I made some more improvements. CronJobs now have a proper securityContext, and Suricata has probes to check whether it is still running.
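A minimal sketch of what a process-level liveness probe for Suricata could look like (the use of pgrep is an assumption about what's available in the image; the actual probes are in the manifests of this PR):

```yaml
livenessProbe:
  exec:
    # Exits non-zero if no process named exactly "suricata" is running,
    # causing the kubelet to restart the container.
    command: ["pgrep", "-x", "suricata"]
  initialDelaySeconds: 60   # give Suricata time to load its rules
  periodSeconds: 30
```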

@pevma
Member

pevma commented Mar 24, 2023

Thank you for the contribution !
As we spoke about on Discord, we will give it a spin in the lab in the next couple of weeks and give feedback, much appreciated!

@maxgio92

maxgio92 commented May 20, 2023

Hi @pevma and @Jeroen0494! This is amazing; I was about to start working on this myself just before discovering this PR!
Is there work in progress or news in the meantime?
Can I support on something?

I just opened a tracking issue for this great feature request: #438.

@Jeroen0494
Contributor Author

Hi @maxgio92 ,

Thank you for your interest. The PR is currently ready to merge and is in review by the Stamus Networks team. This PR could probably also use a squash before merging.

Things you could potentially work on:

  • The network interface Suricata listens on is currently hardcoded and statically defined. You could potentially help with writing some sort of script that selects the correct interface.
  • Certain options set on the command line in the daemonset aren't applied to the runtime, you could look into this. See the second comment in this PR.
  • I started working on eBPF support for the Docker image at one point but never finished it: Enable eBPF support for Suricata. jasonish/docker-suricata#27
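For the first point, one possible direction (a sketch under the assumptions that busybox is acceptable and that the correct interface is the one carrying the node's default route, which holds when Suricata runs with hostNetwork) is an initContainer that detects the interface and hands it to the main container:

```yaml
initContainers:
  - name: detect-iface
    image: busybox
    command:
      - sh
      - -c
      - |
        # "ip route show default" prints e.g. "default via 10.0.0.1 dev eth0 ...";
        # field 5 is the interface name. Write it to a shared emptyDir volume
        # so the Suricata container can read it at startup.
        ip route show default | awk '{print $5; exit}' > /shared/iface
    volumeMounts:
      - name: shared
        mountPath: /shared
```

The Suricata container would then read /shared/iface in its entrypoint before building SURICATA_OPTIONS.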

@maxgio92

Thank you @Jeroen0494!
I'll definitely take a look at them.

@regit
Member

regit commented Jul 31, 2023

Merged and pushed to master. Thanks a lot!!! And sorry for the crazy delay.

@regit regit closed this Jul 31, 2023
@Jeroen0494
Contributor Author

@regit Nice, thank you.

Could I ask why, instead of merging the PR, you fetched the commits and pushed them to master?

@regit
Member

regit commented Jul 31, 2023

@regit Nice, thank you.

Could I ask why, instead of merging the PR, you fetched the commits and pushed them to master?

We have updated our internal process at Stamus to be more reactive on MRs, and this is part of that updated process.

@pevma
Member

pevma commented Aug 1, 2023

Big thank you @Jeroen0494 for the awesome contribution of time and code !

@Jeroen0494
Contributor Author

Big thank you @Jeroen0494 for the awesome contribution of time and code !

You're most welcome!

@Jeroen0494 Jeroen0494 deleted the feat/kubernetes branch August 1, 2023 08:31