Prevent lost log wazuh agent #23377

Closed
protocolpaladin opened this issue May 10, 2024 · 6 comments

@protocolpaladin

protocolpaladin commented May 10, 2024

Hello,

Would it be possible to add a log-loss prevention feature for when the Wazuh agent cannot communicate with the Wazuh server?

The problem is that the Wazuh agent does not forward the logs generated after it loses its connection to the server.

@pereyra-m
Member

Hello @protocolpaladin !

There is a feature that allows logcollector to read all the logs generated while the agent was stopped; it is disabled by default:
only-future-events
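
For reference, a minimal sketch of how this option is set, assuming a Windows agent that collects the Security event channel (the channel is only an illustration, not something stated in this thread). The option goes inside a `<localfile>` block of the agent's ossec.conf, and setting it to `no` asks logcollector to also read events written while the agent was stopped:

```xml
<localfile>
  <!-- Illustrative channel; replace with the source you actually collect -->
  <location>Security</location>
  <log_format>eventchannel</log_format>
  <!-- "no" = also collect events generated while the agent was stopped -->
  <only-future-events>no</only-future-events>
</localfile>
```

Check the `localfile` reference for your Wazuh version to see which log formats support this option.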

Nevertheless, I have done some tests and entries that were written during the loss of communication with the Wazuh manager are properly read and sent after the connection is re-established.
Could you describe in more detail the steps to reproduce the issue? What is your Wazuh version?

Regards.

@protocolpaladin
Author

protocolpaladin commented May 13, 2024

I have a Wazuh agent running on a workstation that temporarily has no network access to the Wazuh server. When the workstation is reconnected to a network that can reach the server, none of the activity logs from the disconnected period have been kept.

In all such cases, the agent should start up configured and able to store events locally, then work out the missing delta with the server at the next successful connection and send every event that has not been sent since the last one.

I am running the latest version of both the server and the agent.

@pereyra-m
Member

Hello again @protocolpaladin

The feature I mentioned above is designed for the period during which the agent is stopped. The situation you've described (which is also the one I tested) is covered by the anti-flooding mechanism.

The agent has a buffer that temporarily holds events until they can be sent to the manager. However, if you restart the agent while events are buffered, or if there are simply too many events, some of them will be lost.
This is the configuration block in the agent; please confirm that the feature isn't disabled:

<client_buffer>
  <!-- Agent buffer options -->
  <disabled>no</disabled>
  <queue_size>5000</queue_size>
  <events_per_second>500</events_per_second>
</client_buffer>

Also, if you find log entries like these in your agent, you might have to increase the buffer size:

wazuh-agentd: WARNING: Agent buffer is flooded: Producing too many events.
wazuh-agentd: WARNING: Agent buffer is full: Events may be lost
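
If those warnings appear, one option is to raise both limits in the same block. This is a minimal sketch, assuming the upper bounds given in the Wazuh `client_buffer` reference (a `queue_size` of up to 100000 events and an `events_per_second` of up to 1000); verify them against the reference for your version before applying it:

```xml
<client_buffer>
  <!-- Keep the buffer enabled -->
  <disabled>no</disabled>
  <!-- Assumed documented maximum number of buffered events -->
  <queue_size>100000</queue_size>
  <!-- Assumed documented maximum send rate to the manager -->
  <events_per_second>1000</events_per_second>
</client_buffer>
```

Even at these values the buffer is still bounded, and events held in it are lost if the agent is restarted.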

The manager generates alerts in this situation; please confirm whether they are present in yours:

** Alert 1715699924.1334625: - wazuh,agent_flooding,pci_dss_10.6.1,gdpr_IV_35.7.d,
2024 May 14 15:18:44 (eb22cd0fb770) any->wazuh-agent
Rule: 203 (level 9) -> 'Agent event queue is full. Events may be lost.'
wazuh: Agent buffer: 'full'.
level: full

** Alert 1715699970.1334875: mail  - wazuh,agent_flooding,pci_dss_10.6.1,gdpr_IV_35.7.d,
2024 May 14 15:19:30 (eb22cd0fb770) any->wazuh-agent
Rule: 204 (level 12) -> 'Agent event queue is flooded. Check the agent configuration.'
wazuh: Agent buffer: 'flooded'.
level: flooded

Regards.

@protocolpaladin
Author

Hello Alain @pereyra-m,

Thank you for your feedback. Perhaps I was unclear at the outset: I know these mechanisms, and they are already in place.

But that is the whole problem: what I am describing is, to me, standard functionality in an EDR, XDR, or SIEM product, and it is not implemented as such in Wazuh.

The anti-flooding system is set to its maximum, but that is not enough to retain logs in a nomadic mode without access to the server. You could argue that the server should be exposed on the Internet so that agents stay connected all the time. However, company policy may differ, and in alternative solutions (commercial, not open source) it is standard to define a retention threshold for periods without a connection, e.g. 5 GB or 10 GB of unsynchronized logs.

Restarting the service should also not cause the temporarily stored, unsynchronized events to be lost. Moreover, the anti-flooding mechanism is not really suited to this mode, because logs can become very verbose very quickly, depending on the policy and mechanisms the company has implemented. Since the intelligence is server-side, the agent has to keep all the logs locally in its anti-flooding buffer so that it can send everything (within its capacity limit). If the intelligence that turns events into alerts were also agent-side, this would free up space in the anti-flooding buffer, which does not have enough capacity to hold every event, and we would gain retention. That would not be valid for a "keep all logs" mode, though, and in any case losing logs on a service restart is not acceptable.

Could this mode of operation be integrated into the agent redesign project, so that a customizable log retention threshold can be defined on the agent? For example, allowing up to 10 GB of buffer storage, or at least significantly increasing the capacity of the existing mechanism?

@pereyra-m
Member

Hi @protocolpaladin

Much clearer now, thank you.
I've created an issue with the feature request so the corresponding team can analyze it and include it in the backlog.

#23446

Any update should be posted there.

Regards.

@sebasfalcone
Member

This inquiry will be tracked in the feature request issue.
