Lsyncd causes load while monitoring large filesystem. #202

Open

perfectayush opened this issue May 2, 2013 · 8 comments

@perfectayush

Hi,

I am using lsyncd to monitor an entire filesystem (~500 GB with 1,250,000 folder watches). Using an lsyncd level-2 config, I call a shell script that echoes the event.pathname to a timestamped file. This is done for backup purposes: a custom rsync script syncs from this timestamped file, which contains the list of paths that have changed. The built-in level-4 lsyncd config doesn't work for me because level 4 spawns a single rsync for each changed file, and it can't keep up with the bulk of files changed on my server.

Initially there weren't any issues besides high CPU utilization by lsyncd, but now it drives up the system's load average too much. Shutting lsyncd down brings the load average back down, so it's obvious that this issue is being caused by lsyncd.

I even tried replacing the shell scripts with Lua code in the lsyncd config; the CPU utilization went down, but the load-average problem still persisted.

Here is a link to the lsyncd config I wrote:
https://gist.github.com/perfectayush/5502216

Any idea why this happens and what can be done to tackle it?
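
A simplified sketch of that setup (the source path, list location, and handler details are placeholders; the real config is in the gist linked above):

    -- Sketch only: append each changed path to an hourly timestamped list
    -- that a separate rsync job consumes later. Paths are placeholders.
    local function record(event)
        spawnShell(event,
            'echo "$1" >> /var/backup/changed-$(date +%Y%m%d%H).list',
            event.pathname)
    end

    sync {
        source   = "/data",   -- placeholder for the monitored filesystem
        onCreate = record,
        onModify = record,
        onDelete = record,
    }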

@axkibe
Copy link
Collaborator

axkibe commented May 3, 2013

because level 4 spawns a single rsync for each changed file, and it can't
keep up with the bulk of files changed on my server.

This is not the case. The default behavior is to wait for the defined delay
timeout and then send out one single rsync, which gets the list of changed
files transferred through a pipe (or 1000 affected files, whichever comes
first). I put a lot of effort into making this possible :-)

If you set the delay to zero, Lsyncd has no chance to aggregate changes,
since it has zero time to do so.
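
For illustration, the delay is set per sync, e.g. with the stock default.rsync layer (source and target here are placeholders):

    sync {
        default.rsync,
        source = "/data",                -- placeholder source
        target = "backuphost::module",   -- placeholder rsync target
        delay  = 15,                     -- seconds to aggregate events before one rsync is spawned
    }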

Initially there weren't any issues besides high CPU utilization by lsyncd,
but now it drives up the system's load average too much. Shutting lsyncd
down brings the load average back down, so it's obvious that this issue is
being caused by lsyncd.

How many file changes per second are we talking about here? So far, Lsyncd's
CPU usage has never been a problem for anyone I have heard of. Memory can be
tough, though! Since the kernel keeps approximately 1 KB of unswappable memory
per watch, this can add up. In your case that comes to roughly 1.2 GB. Maybe
your system is running out of memory?

I'm afraid there isn't much of a way around that limit right now. It's a
limitation built into inotify.

  • Axel

@perfectayush
Author

I know level 4 has a delay that aggregates the file list before spawning rsync. But we deal with around 40 small files being changed every second (based on the log I created, about 145,000 files were affected in an hour). Setting the delay to a large value didn't work either; there seems to be a limit to it. Level 4 couldn't keep up in one of the tests I did. Memory is not an issue: we have 24 GB of RAM on the server, and lsyncd uses around 500 MB.

@axkibe
Collaborator

axkibe commented May 3, 2013

The aggregation is limited to 1000 events in the queue, since checking each incoming event against every event already in the queue has n^2 runtime. This could be reduced to n log n by clever use of lookup tables, but that hasn't been done so far.
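
As a sketch of the lookup-table idea (illustrative only, not lsyncd's actual internals): keeping a table keyed by pathname turns "is this path already queued?" into a constant-time lookup instead of a scan over the whole delay queue.

    -- Illustrative sketch only; names are made up.
    local queue, queued = {}, {}

    local function enqueue(pathname)
        if queued[pathname] then
            return false               -- path already pending, skip the duplicate
        end
        queue[#queue + 1] = pathname
        queued[pathname] = true
        return true
    end

    local function drain()
        local batch = queue
        queue, queued = {}, {}         -- reset queue and lookup table together
        return batch                   -- the aggregated batch goes to a single rsync
    end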

Running the Lua profiler with a default configuration would be helpful to see where the CPU time goes, and whether it is spent in Lua itself rather than in the kernel after all.

axkibe closed this as completed May 3, 2013
axkibe reopened this May 3, 2013
@izzy

izzy commented Jan 16, 2017

I can still confirm this. I tried it with ~1,000,000 directories on a 3 TB file system, and apart from the initial rsync taking a long time, the inotify_add_watch call for every folder took up an enormous amount of time and (single-core) CPU load, which seemingly led to lsyncd repeatedly doing an init, or at least part of one. I didn't see the rsync again after it finished, but checking on it after two days with strace, I realized it was adding watches for folders I had already seen being added.

@axkibe
Collaborator

axkibe commented Jan 16, 2017

If Lsyncd encounters an inotify queue overflow event, it fully restarts.

Otherwise, if a path is moved, or deleted and recreated, it will of course add new watches for that path.

@izzy

izzy commented Jan 16, 2017

If Lsyncd encounters an inotify queue overflow event, it fully restarts.

Is that preventable somehow? Also, would it be possible to multithread the add_watch process or otherwise increase its performance?

@axkibe
Collaborator

axkibe commented Jan 16, 2017

You can increase the inotify queue length via a kernel parameter (sysctl):

/proc/sys/fs/inotify/max_queued_events

This is an issue only if events pile up faster than Lsyncd can drain them. That this faucet can in principle run faster than Lsyncd can drain it is not avoidable, however; it can only be made less likely. To avoid it entirely you'd have to go with GlusterFS or DRBD or the like, where the sync is controlled at the device level and incoming events can thus be throttled.

@axkibe
Collaborator

axkibe commented Jan 16, 2017

PS: You should be able to see in the log whether there was an Overflow, or why a watch for a folder was re-added (e.g. because the folder was moved).
