Package request: Wazuh-agent #1

Open
V3ntus opened this issue Apr 17, 2024 · 39 comments

Comments

@V3ntus
Owner

V3ntus commented Apr 17, 2024

Moved from: NixOS#230623

@sjdwhiting

Huzzuh! (Wazuh?)

Ok so yea, I think I mentioned already that I'm also learning Nix as I go here.

What command are you running to manually start the service?

@V3ntus
Owner Author

V3ntus commented Apr 17, 2024

Right now, after the ExecStartPre step finishes copying the build result to /var/ossec, you can run /var/ossec/bin/wazuh-control start. But since the service is broken, I would clone the repo and run nix-build ./pkgs/tools/security/wazuh/default.nix inside it.
See nealfennimore@b2e53bd#commitcomment-140938027 to reproduce

@V3ntus
Owner Author

V3ntus commented Apr 18, 2024

5125b09 should make the Wazuh service build successfully. My VM ran out of space so I'll try again tomorrow

@sjdwhiting

Tested it this morning. It builds but does not start. After getting it up and running the manual way yesterday, I reverted to setting it up in configuration.nix to test.

For clarity, I'm sourcing the wazuh-agent branch here as my nixpkgs and then simply adding the following to my configuration.nix:

services.wazuh.agent.enable = true;
services.wazuh.agent.managerIP = <IP>;

Here is the error:

[sebastian@nixos:~]$ systemctl status wazuh-agent.service 
× wazuh-agent.service - Wazuh agent
     Loaded: loaded (/etc/systemd/system/wazuh-agent.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Thu 2024-04-18 10:02:14 CDT; 10s ago
    Process: 96965 ExecStartPre=/nix/store/q17hsdfh1rz63qp3xidr4b4jj0qzxqbl-unit-script-wazuh-agent-pre-start/bin/wazuh-agent-pre-start (code=exited, status=200/CHDIR)
         IP: 0B in, 0B out
        CPU: 457us

Apr 18 10:02:14 nixos systemd[1]: Starting Wazuh agent...
Apr 18 10:02:14 nixos (re-start)[96965]: wazuh-agent.service: Changing to the requested working directory failed: No such file or directory
Apr 18 10:02:14 nixos systemd[1]: wazuh-agent.service: Control process exited, code=exited, status=200/CHDIR
Apr 18 10:02:14 nixos systemd[1]: wazuh-agent.service: Failed with result 'exit-code'.
Apr 18 10:02:14 nixos systemd[1]: Failed to start Wazuh agent.

It is upset trying to change directories so I'll be looking at that next.

@V3ntus
Owner Author

V3ntus commented Apr 18, 2024

It might be related to this? I don't think this line is important anyways

WorkingDirectory = stateDir;

Yeah, I think it's trying to cd into the working directory, but in your case it hasn't yet run the preStart that contains the code to create that directory and copy stuff into it. So the directory doesn't exist and the service fails to start.
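
To make that ordering concrete: systemd changes into WorkingDirectory before it runs any of the unit's commands, ExecStartPre included, which matches the status=200/CHDIR reported on the pre-start process above. The sketch below only imitates that order in plain shell; the temporary path and the `pre_start` function are made-up stand-ins, not the module's code.

```shell
#!/bin/sh
# Hypothetical stand-in for stateDir; it does not exist yet.
state_dir="$(mktemp -d)/ossec"

# Stand-in for preStart: this is the step that would create the directory.
pre_start() {
  mkdir -p "$state_dir"
}

# Imitate the unit with WorkingDirectory set: chdir first, pre-start second.
if (cd "$state_dir" 2>/dev/null && pre_start); then
  with_workdir="started"
else
  with_workdir="chdir-failed"   # pre_start never ran
fi

# Imitate the unit without WorkingDirectory: pre-start runs, then chdir works.
if (pre_start && cd "$state_dir"); then
  without_workdir="started"
else
  without_workdir="chdir-failed"
fi

echo "with WorkingDirectory:    $with_workdir"
echo "without WorkingDirectory: $without_workdir"
```

systemd also accepts a leading `-` on the WorkingDirectory value to tolerate a missing directory, which might be an alternative to dropping the line entirely.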

@sjdwhiting

sjdwhiting commented Apr 18, 2024

Yea, I think that makes sense. I added some debugging statements and they never run, so it definitely looks like preStart isn't actually running. I'm going to try killing that line.

I got mixed up for a bit by my system config too. Forgot to run nix flake update and was very confused about why my changes weren't showing up.

@sjdwhiting

That did the trick, kind of. Now the preStart script is running into permission errors:

mkdir: cannot create directory ‘/var/ossec’: Permission denied

@V3ntus
Owner Author

V3ntus commented Apr 18, 2024

Weird. Guess I'll have to wipe my VM and start fresh with debugging

@V3ntus
Owner Author

V3ntus commented Apr 18, 2024

Starting from a fresh VM on a NixOS host, removing the WorkingDirectory option fixed that first error of not being able to cd into it, but I'm not seeing anything about denied permissions.

I am still getting wazuh-execd did not start, and it's possible it could be this? wazuh/wazuh#15640

Edit: Nope, ps is available to systemd
[image]

@sjdwhiting

I'm getting the same error. I also made a small change to my fork that alleviated permission errors. It uses systemd.tmpfiles.rules.

So the issue seems to be that wazuh-control can't sort out the pid of wazuh-execd in the context of the systemd service, since starting the binary manually works. And each attempt to start the service results in another instance of wazuh-execd.

@V3ntus
Owner Author

V3ntus commented Apr 19, 2024

I read up on systemd tmpfiles a bit; it sounds like a worthy solution for the state directory permission issues you were having. I suppose we can consider the state directory volatile then?

Seems like the appropriate debugging to narrow down the cause of wazuh-control not sorting out the process must happen here:
https://github.com/wazuh/wazuh/blob/3bf19121e8604c99566fc5e78267648a5161b062/src/init/wazuh-client.sh#L165-L182
And here:
https://github.com/wazuh/wazuh/blob/3bf19121e8604c99566fc5e78267648a5161b062/src/init/wazuh-client.sh#L195-L222

@sjdwhiting

sjdwhiting commented Apr 19, 2024

The info on the implementation of tmpfiles seems a bit vague and poorly documented. I came across a number of posts where people used them just fine for a persistent state directory and said it persists across reboots. The downside is that if you manually delete it, restarting the service doesn't restore it, and from what I can tell neither does running nixos-rebuild switch.

So I've just been using sed to slowly add debug statements into the wazuh-control script and suddenly it started working... I have no idea. Maybe we need to insert some sleep/wait statements to slow it down?

Looking at the number of times it iterated, I'm pretty sure it is some sort of race condition and the insertion of extra statements gave it enough breathing room to work.

Screenshot 2024-04-19 at 3 03 20 PM
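
If this really is a race between the daemons writing their pid files and wazuh-control looking for them (only a guess at this point), the difference between checking once and polling can be shown with a toy version. Everything here is invented for illustration: the `demo-daemon` name, the temp directory, and the half-second delay. It is not the actual wazuh-control logic, and it assumes a sleep that accepts fractional seconds.

```shell
#!/bin/sh
run_dir="$(mktemp -d)"   # stand-in for var/run

# Hypothetical daemon: its pid file only appears about half a second after launch.
# The pid written is arbitrary; only the file's existence matters here.
( sleep 0.5; echo "$$" > "$run_dir/demo-daemon-1.pid" ) &

# Succeeds if at least one pid file for the given name exists.
pid_file_present() {
  ls "$run_dir/$1"-*.pid >/dev/null 2>&1
}

# A single check right after launch loses the race.
if pid_file_present demo-daemon; then
  checked_once="found"
else
  checked_once="missing"
fi

# Polling with a short sleep gives the daemon time to write the file.
polled="missing"
tries=0
while [ "$tries" -lt 30 ]; do
  if pid_file_present demo-daemon; then
    polled="found"
    break
  fi
  sleep 0.1
  tries=$((tries + 1))
done

wait   # reap the background job
echo "single check: $checked_once, after polling: $polled"
```

Extra debug statements would act like a slightly longer sleep in that loop, which would be consistent with the behaviour described above, though it does not prove a race is the cause.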

@V3ntus
Owner Author

V3ntus commented Apr 19, 2024

LOL wow nice job! I have no idea how sed caused it to work, it adds hardly any overhead. That's awesome though!

@sjdwhiting

Well not sed itself but the execution of the debug statements I put in using it. Maybe those fractions of milliseconds added up haha.

@V3ntus
Owner Author

V3ntus commented Apr 19, 2024

Well, if you'd like, I'd definitely recommend eventually putting an issue up on https://github.com/wazuh/wazuh/issues. If anything, just to make them aware of this weird behavior and get their input.

@V3ntus
Owner Author

V3ntus commented Apr 22, 2024

Weird logs from wazuh-agentd while doing some experimenting:
[image]
preStart script looks like this:

cp -rf ${pkg}/* ${stateDir}
touch /tmp/wazuhpwd
sed -i '204i ls $\{DIR}/var/run/$\{pfile}-*.pid 2>&1 | tee /tmp/wazuhpwd; echo $? >> /tmp/wazuhpwd' ${stateDir}/bin/wazuh-control

find ${stateDir} -type f -exec chmod 644 {} \;
find ${stateDir} -type d -exec chmod 750 {} \;
chmod u+x ${stateDir}/bin/*
chmod u+x ${stateDir}/active-response/bin/*
chown -R ${wazuhUser}:${wazuhGroup} ${stateDir}

@V3ntus
Owner Author

V3ntus commented Apr 22, 2024

Somehow /var/ossec/etc/client.keys is being overwritten or used as a log:
[image]

@sjdwhiting

That is pretty odd. This is my current preStart which I updated today. Took some trial and error but replaced the debugging statements with sleep statements that seem to have kept it working. I also checked out that file on my machine and no such issues.

        cp -rf ${pkg}/* ${stateDir}
        sed -i '12i sleep 0.1s;' ${stateDir}/bin/wazuh-control
        
        sed -i '209i sleep 0.2s ' ${stateDir}/bin/wazuh-control

        find ${stateDir} -type f -exec chmod 644 {} \;
        find ${stateDir} -type d -exec chmod 750 {} \;
        chmod u+x ${stateDir}/bin/*
        chmod u+x ${stateDir}/active-response/bin/*
        chown -R ${wazuhUser}:${wazuhGroup} ${stateDir}
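
For anyone unfamiliar with the `sed -i 'Ni text'` form used in these preStart scripts: with GNU sed it inserts `text` as a new line before line N and shifts everything below it down by one. A throwaway example on a temp file (the file contents are invented, and this assumes GNU sed; BSD sed wants a different syntax):

```shell
#!/bin/sh
script="$(mktemp)"
printf '%s\n' '#!/bin/sh' 'echo one' 'echo two' > "$script"

# Insert a sleep before what is currently line 2.
sed -i '2i sleep 0.1' "$script"

inserted_line="$(sed -n '2p' "$script")"   # the new line
shifted_line="$(sed -n '3p' "$script")"    # the old line 2, now line 3
line_count="$(wc -l < "$script" | tr -d ' ')"

echo "line 2: $inserted_line"
echo "line 3: $shifted_line"
echo "total lines: $line_count"
```

Because of that shift, a second insert runs against the already-modified file, and hard-coded line numbers will point somewhere else as soon as upstream changes wazuh-control.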

@V3ntus
Owner Author

V3ntus commented Apr 25, 2024

I still find it kinda funny that adding more sleep statements has been making this work, knowing that wazuh-control already has sleep statements in that same loop lol.

I tried to replicate with your previous working iteration, but I'll try with your new iteration.

FWIW, I'm now using a NixOS host and building VMs declaratively referencing this article: https://nix.dev/tutorials/nixos/nixos-configuration-on-vm.html

Script I'm running:

# The VM automatically creates the qcow disk to persist state, so remove it for a clean run
rm -rf ./result nixos.qcow2
# -I nixpkgs: point this to your nixpkgs repo path
# -I nixos-config: see below for example config
# --show-trace: debugging flag, optional
# (comments cannot follow a trailing backslash, so they are collected here)
nix-build '<nixpkgs/nixos>' \
	-A vm \
	-I nixpkgs="$PWD/nixpkgs" \
	-I nixos-config=./configuration.nix \
	--show-trace && \
	./result/bin/run-nixos-vm -nographic && \
	reset  # run the QEMU VM, then reset the terminal after it shuts off

Example VM config:

{ config, pkgs, ... }:

{
  boot.loader.systemd-boot.enable = true;
  boot.loader.efi.canTouchEfiVariables = true;

  # Wazuh stuff I added
  services.wazuh.agent.enable = true;
  services.wazuh.agent.managerIP = "192.168.2.11";

  users.users.joe = {  # change as desired
    isNormalUser = true;
    extraGroups = [ "wheel" ]; # Enable ‘sudo’ for the user.
    packages = with pkgs; [
      git
    ];
    initialPassword = "password";  # change as desired
  };

  system.stateVersion = "23.11";
}

Seems like a proper, immutable testing environment instead of dealing with manual VM stuff. The VM build doesn't take long either.

@sjdwhiting

Nice, that seems like a solid approach. I have a flake-based setup which is spread across multiple files. I'm using my remote git repo as the source for my nixpkgs at the moment.

inputs = {
  nixpkgs = {
    url = "github:sjdwhiting/nixpkgs/wazuh-agent";
  };

  flake-utils.url = "github:numtide/flake-utils";

  users-flake.url = "../../users";
  users-flake.inputs.nixpkgs.follows = "nixpkgs";

  packages-flake.url = "../../systemPackages";
  packages-flake.inputs.nixpkgs.follows = "nixpkgs";
};

And then I have the configuration.nix, which has the Wazuh services set up. So after I push to the remote repo, I just run a rebuild. The only downside is that I have to push everything to the remote, so it means a lot more commits and pushes.

But once you get this running, we need to decide what is next to get this to a minimum level of functionality. My gut says we need to start assessing which default Wazuh checks do and don't work on a NixOS system. I have a feeling that in a lot of instances they will work, since NixOS often symlinks to the nix store from the normal location for a file, so Wazuh will still find things.

@V3ntus
Owner Author

V3ntus commented Apr 26, 2024

Awesome! Yeah for sure. I can't remember if I posted a related comment, but I agree, we do need to look through the Wazuh default config (it's currently set up for debian/common FHS) and make sure we're linking stuff properly. Still waiting on journald support from Wazuh, that's the big blocker for that task. I believe it's going through their QA currently?

Edit:
It's a PR in review wazuh/wazuh#23137

@V3ntus
Owner Author

V3ntus commented Apr 30, 2024

I believe I've identified the root cause of wazuh-control not seeing the spawned processes.
[image]

I was able to narrow it down to our usage of the systemd path option. Specifying pkgs.busybox was not the solution after all; we only needed to give it /run/current-system/sw:

systemd.services.wazuh-agent = mkIf cfg.agent.enable {
  path = [
    "/run/current-system/sw"
  ];
  ...
};

Boom, it's registered!
[image]

View the last commit with the fixes here:
bcd8281
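
A sketch of why the path option can matter: a systemd unit runs with a restricted PATH rather than the login shell's, and a script that shells out to a tool it cannot find may quietly take the failure branch. The thread does not say which tool was missing, so the `demo-ps` command and both directories below are purely hypothetical; the example only shows a lookup failing or succeeding depending on PATH.

```shell
#!/bin/sh
tool_dir="$(mktemp -d)"    # stand-in for a directory that provides the tool
empty_dir="$(mktemp -d)"   # stand-in for a PATH that lacks the tool

# Hypothetical helper tool that a control script might shell out to.
printf '%s\n' '#!/bin/sh' 'echo running' > "$tool_dir/demo-ps"
chmod +x "$tool_dir/demo-ps"

# Report whether a command resolves under a given PATH.
# The subshell keeps the caller's own PATH intact.
lookup() {
  (
    PATH="$1"
    if command -v "$2" >/dev/null 2>&1; then
      echo found
    else
      echo missing
    fi
  )
}

without_dir="$(lookup "$empty_dir" demo-ps)"
with_dir="$(lookup "$tool_dir" demo-ps)"

echo "PATH without the tool's directory: $without_dir"
echo "PATH with the tool's directory:    $with_dir"
```

Running `command -v` for each external tool at the top of a pre-start script is one cheap way to surface this kind of problem in the journal instead of guessing.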

@V3ntus
Owner Author

V3ntus commented Apr 30, 2024

I'm putting the branch up for review in this PR:
NixOS#308041

Should allow it to be globally available for others to test and work with while we wait for Wazuh to implement journald support.

@sjdwhiting

Good work! We should definitely keep it globally available. I don't see a compelling reason to wait for journald support. Yes, it does limit some of the functionality of the agent but there are other ways to ship journald logs to Wazuh. The Wazuh manager is also a SIEM so in the meantime, someone could ship the journald logs using something like fluent-bit/fluentd or filebeat.

@sjdwhiting

I noticed the build is failing on that PR. I'm getting the same error testing on my machine. Looking into it a bit this morning, will update if I find anything.

@sjdwhiting

Fixed it: #2

@V3ntus
Owner Author

V3ntus commented May 1, 2024

Interesting, wasn't getting that error, but that makes sense to me. Thanks for spotting it!

@sjdwhiting

Ok, new issue. The way we are handling the /var/ossec/ folder today is causing it to be wiped out on a reboot which is problematic because the wazuh-agent stores its key at /var/ossec/etc/client.keys.

My guess is that this is because the preStart script copies the files over on every start, so it's overwriting them. Maybe we could use something like rsync instead of cp to avoid wiping out client.keys.
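
One way to picture a fix, assuming the blanket copy is the problem: refresh the package files but leave runtime-generated files alone. The sketch below does that with a stash-and-restore around cp on temp directories. The file contents and the `keep` variable are invented for the demo, and whether the package actually ships its own client.keys is an assumption.

```shell
#!/bin/sh
pkg="$(mktemp -d)"       # stand-in for the package output in the nix store
state="$(mktemp -d)"     # stand-in for /var/ossec
stash="$(mktemp -d)"
keep="etc/client.keys"   # runtime-generated file to protect

# Fake package contents (assumed: the package ships an empty client.keys).
mkdir -p "$pkg/etc" "$state/etc"
echo "packaged config" > "$pkg/etc/ossec.conf"
: > "$pkg/etc/client.keys"

# Fake state from a previous boot: the agent has already enrolled.
echo "001 demo-agent any demo-key" > "$state/$keep"

# Stash the protected file, refresh from the package, then restore it.
if [ -f "$state/$keep" ]; then
  cp -p "$state/$keep" "$stash/saved"
fi
cp -rf "$pkg"/. "$state"/
if [ -f "$stash/saved" ]; then
  cp -p "$stash/saved" "$state/$keep"
fi

echo "client.keys: $(cat "$state/$keep")"
echo "ossec.conf:  $(cat "$state/etc/ossec.conf")"
```

With rsync available, `rsync -a --exclude` on the same relative path should get the same result in one step, at the cost of adding rsync to the service's path.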

@V3ntus
Owner Author

V3ntus commented May 2, 2024

That is a big design flaw, good catch. Ideally we do want a mutable /var/ossec but some of these can be kept as immutable and declarative, such as the configuration pieces. rsync would work, but in the case of needing to update the entirety of /var/ossec when the Wazuh package is updated, how do we determine what we can and cannot overwrite?

@sjdwhiting

I got it to work here: #3

While it is possible to take a declarative approach to client.keys, I don't think that makes sense. If someone wanted to use the same config on multiple machines, they would need some way to register dynamically, plus in a large environment you wouldn't want to register them all manually.

In my opinion, this is a good approach for any dynamically created files that we want to persist, which shouldn't be many since the manager is responsible for storing the actual data.

Long term, I think we will have to just wait and see what breaks and update the service as needed, either using that same rsync command or something else, to protect those files.

@sjdwhiting

Today I tested the group functionality. So I created a NixOS group on the manager, enrolled my NixOS machine to it, and then placed a custom agent.conf file into the relevant directory on the manager which would have been /var/ossec/etc/shared/NixOS.

I verified that it pushed to my machine then reverified its presence after both a reboot and rebuild.

Since I didn't protect that file in any way, my assumption is that the manager likely replaced it as soon as it saw it was missing. That said, I think we don't have to worry too much about most of the files in there since the manager will likely fix those issues. Time will tell though!

On another note, what do we have left to actually get this merged? I've read through some of the docs on contributing and I get the feeling we are just waiting on someone to review it?

@V3ntus
Owner Author

V3ntus commented May 3, 2024

Awesome! Good to know that it'll persist configurations through syncing.

I'm not quite sure, I'm definitely waiting and looking for a review from the NixOS community just to make sure things look good.

Also, just so we're on the same page, being listed as a maintainer for wazuh on nixpkgs is ideally opt-in. I think it would be beneficial for wazuh to have maintainers listed, so I'll put my name on there, but obviously I did not want to put anyone else's name if they were unwilling. Relevant doc: https://github.com/NixOS/nixpkgs/blob/master/maintainers/README.md

@V3ntus
Owner Author

V3ntus commented May 6, 2024

Well that wasn't supposed to happen.
NixOS#308041 (comment)

I'll work on squashing and cleaning up this branch, and resubmitting a new PR. And because I don't 100% trust my git-fu, I made a backup branch just in case lol. https://github.com/V3ntus/nixpkgs/tree/wazuh-agent-pre-rebase

@V3ntus
Owner Author

V3ntus commented May 6, 2024

New PR up at NixOS#309573

@sjdwhiting

Lol.

Yea, I'm down to be on the maintainers list. Is it safe to open up a PR or are we going to blow it up again?

@sjdwhiting

#4

PR for adding me to the maintainers as well.

@V3ntus
Owner Author

V3ntus commented May 6, 2024

> Yea, I'm down to be on the maintainers list. Is it safe to open up a PR or are we going to blow it up again?

Should be good! Looked to be a flaky workflow that caused it after a terrible go at a force push on my end.

@V3ntus
Owner Author

V3ntus commented May 13, 2024

Didn't see this. Probably would have to look into using this process with the recent reviews on the Nixpkgs PR.
https://documentation.wazuh.com/current/user-manual/reference/unattended-installation.html

@V3ntus
Owner Author

V3ntus commented Jun 20, 2024

First 4.9.0 alpha is out which includes the added journald support. Hoping I can get traction back to testing.
https://github.com/wazuh/wazuh/releases/tag/v4.9.0-alpha1
