Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing IN_CREATE events #2

Closed
seb-m opened this issue Sep 15, 2010 · 24 comments
Closed

Missing IN_CREATE events #2

seb-m opened this issue Sep 15, 2010 · 24 comments

Comments

@seb-m
Copy link
Owner

seb-m commented Sep 15, 2010

Duplicated from Pyinotify Trac - Issue 8

Consider this case:

  • /wdir: is a watched directory, where each new created directory is automatically watched
  • /sdir: is a directory with several small files (file1, file2,..., file10)

Now, /sdir and its content are copied to /wdir:

 cp -a /sdir /wdir

Under these circumstances if we had started pyinotify like this:

 $ python pyinotify.py -a -e IN_CREATE /wdir

We would have observed the following output (assuming files1-10 are copied in that order):

 <Event dir=True mask=0x40000100 maskname=IN_CREATE|IN_ISDIR name=sdir path=/wdir pathname=/wdir/sdir wd=1 >
 # ... missed IN_CREATE events for file1 thru file3 ...
 <Event dir=True mask=0x40000000 maskname=IN_CREATE name=file4 path=/wdir/sdir pathname=/wdir/sdir/file4 wd=1 >
 <Event dir=True mask=0x40000000 maskname=IN_CREATE name=file5 path=/wdir/sdir pathname=/wdir/sdir/file5 wd=1 >
 <Event dir=True mask=0x40000000 maskname=IN_CREATE name=file6 path=/wdir/sdir pathname=/wdir/sdir/file6 wd=1 >

What happened is that there is a short laps of time between the moment /sdir is copied into /wdir and the moment where pyinotify effectively starts watching it. And during that time where /wdir/sdir is not yet watched the first files file1, file2, file3 are copied into this directory and mistakenly do not generate IN_CREATE events as expected.

(Thanks to Peter Smith for his initial report)

@njvack
Copy link

njvack commented Nov 8, 2010

I've tried to replicate this with the following bash command:

mkdir foo; touch foo/{bar,baz,corge}

and it reliably picks up all the create events. Is this a fixed bug (there seems to be a little gymnastics in process_IN_CREATE to handle this case), or a flaw in my reproducing technique, or simply a difference in systems?

@evillagran
Copy link

I tried this:
mkdir /tmp/{foo,bar}
touch /tmp/foo/{uno,dos,tres,cuatro,cinco}
pyinotify -a -e IN_CREATE /tmp/bar/
cp /tmp/foo/* /tmp/bar/

Stdout show:





I'm using python-inotify-0.9.0-2.fc14.noarch

@seb-m
Copy link
Owner Author

seb-m commented Feb 8, 2011

Thanks guys for your updates on this issue. Indeed, I've just tried with a recent kernel (2.6.35) and given this limited example (but with 100 files) I can't reproduce this issue anymore. However I don't know if it is definitively fixed or if it is somehow mitigated. If someone knows more about it, please share!

@Karmak23
Copy link

Hi,

I'm using pyinotify (stock version on Ubuntu Maverick & Lucid servers) in a heavy parallel-changing environment (10k watches or more) where we need custom ACLs to be applyed in subdirs, based on special rules.

There, monitoring a dir where a user tar -xzf anything > 100 files and subdirs (roughly) triggers the "bug": some events are missed. Untaring a pyudev tarball doesn't trigger it if the machine is not loaded, untaring nautilus or samba will trigger it in 99% of cases (my test case consists of untaring pyudev, then nautilus, then samba, then a linux kernel archive, to check ACLs are applyed on every file and every subdir created, and add them automatically to monitored files and dirs).

But I believe this is not a pyinotify "real bug", because it relies on concurrent accesses in the kernel, the libc and every part of the multi-user system: if the inotify call to create a watch for a sub-dir arrives after file creation in the given subdir, then the IN_CREATE of the file is missed.

I intimately feel this as a perfectly normal situation, due to the concurrent nature of the environment (and the fast nature of a untar operation on very small files, compared to multi-layer OO interpreted python code). But this more a feeling than a real proof.

Still, it is a problem from the developer stand-point, in normal conditions (where we want all events, without any doubt ;-) ), and I wrote some kind of "rewalk_directory()" helper to catch missed files and dirs. My own implementation of the ThreadedNotifier and its helpers now have something that will "complement" the _eventq (directories and files are not walked twice, when possible).

Perhaps the generic parts of my rewalk_() method could be integrated into the SysProcessEvent() ? I don't know. Note that my rewalk() code is higly (who said "uglily ?") dependant on other things in my app, so it's useless to use it verbatim, but I find it quite readable and hope you will get the idea, perhaps find it interesting.

Project: http://dev.licorn.org/
Inotifier event dispatcher: http://dev.licorn.org/browser/core/classes.py#L1764
Rewalk method: http://dev.licorn.org/browser/core/classes.py#L1853

Best regards from a fellow user,
Olivier

@seb-m
Copy link
Owner Author

seb-m commented Mar 15, 2011

Thanks for your feedback!!, I'll try to find time to read your rewalk() method. Just out of curiosity when do you trigger the call to this method?

@Karmak23
Copy link

The rewalk is trigerred on every dir only CREATE | MOVED_TO. It is internally recursive, but avoids re-triggers on already watched dirs, and already checked files (with a controlled expiration mechanism). The rewalk is synchronous, but all ACL applying are asynchronous (delegated to another thread), and pre-fill a list of expected ATTRIB events, to avoid loops.

To monitor my watched dirs, I only take IN_CREATE | IN_ATTRIB | IN_MOVED_TO | IN_MOVED_FROM | IN_DELETE_SELF into account.

regards,
Olivier

@fredrick
Copy link

Oliver, I too am experiencing this exact sort of behavior. I might have to implement some sort of rewalk method too.
Here's a snippet of my code that adds the watch on the path:
mask = pyinotify.IN_CREATE | pyinotify.IN_DELETE | pyinotify.IN_MOVED_TO | pyinotify.IN_ONLYDIR | pyinotify.IN_CLOSE_WRITE
wm.add_watch(path, mask, rec=True, auto_add=True)
For example, when untaring/unzipping, seems like pyinotify can't keep up with files being added to newly unzipped sub-directories, but as soon as the archive has finished uncompressing, future IN_CREATE's within those subdirectories are registered. Here's a raw event dump of downloading Wordpress, then doing a tar -xzvf on it:
pyinotify event dump

@seb-m
Copy link
Owner Author

seb-m commented Mar 17, 2011

I think the rewalk hack should be investigated indeed. I did not reply in depth to olivier because I must say I don't have the whole problem in mind right now :( so I didn't want to rush making comment that would be false, I need to find time to think about it. However, what olivier have implemented (at least from my understanding) seems very appealing and promising. I think rewalk() is called at a good time where the missed directories/files would likely be found and also because add_watch() is synchronous thus every iterated file/directory can be tested against the dictionary storing all watches for deciding if the file/directory was missed or not and call add_watch() if needed.

@karlgrz
Copy link

karlgrz commented Apr 4, 2011

I can confirm this behavior. I have been implementing some event listening code and have been noticing situations where copying a folder with a few subfolders is dropping events. Sometimes they hit...sometimes they don't (almost 50/50).

I'm going to check into your rewalk logic and see if that solves my issue.

@karlgrz
Copy link

karlgrz commented Apr 4, 2011

Hey Olivier,

I tried to look at your rewalk method, but the link is broken. Can you point me to where it is? I'd like to give working that into the pyinotify a go.

Thanks,
Karl

@fredrick
Copy link

fredrick commented Apr 4, 2011

Karl, here's a code fragment that I'm using to process events that may be missed as a result of IN_CREATE|IN_ISDIR or IN_MOVED_TO|IN_ISDIR. It may be a little too implementation specific (self.event_create is a method I have defined somewhere else that would handle the file creation), but you should get the idea behind the logic:

import os
import multiprocessing


class Event(object):
"""Mock event type"""

def __init__(self, **kwargs):
    for key in kwargs:
        setattr(self, key, kwargs[key])


"""Scan directory for new files to be indexed

Walks through event.pathname on directory created/modified
"""
def directory_rewalk(self, pathname):
    for root, edges, nodes in os.walk(pathname):
        for file_node in nodes:
            event = Event(
                pathname=os.path.join(root, file_node),
                name=file_node,
                dir=False)
            process = multiprocessing.Process(target=self.event_create, args=(event,))
            process.start()

Fred

@karlgrz
Copy link

karlgrz commented Apr 5, 2011

Thanks for the code example, Fred. I'll be working on a this patch this week, hopefully we can get something out of this that will satisfy all of or needs.

@seb-m
Copy link
Owner Author

seb-m commented Apr 5, 2011

Hi guys,

This morning I took time to rethink this issue and re-read relevant parts of pyinotify's code.

I did a small change that should (I hope) help mitigating this issue. Pyinotify now generates IN_CREATE events for all files contained in a newly created and watched directory. Please consult and test this commit.

I did not apply this behavior to IN_MOVE_TO events because I think it would be wrong to generate IN_CREATE events on moved files.

To test this commit I only used this tiny script, so I encourage you to test it more extensively in your projects.

#!/bin/bash
BASEDIR=/tmp/test

for i in {0..9}
do
    BASEDIR=$BASEDIR/d$i
    mkdir -p "$BASEDIR"

    for j in {0..29}
    do
    touch "$BASEDIR/f$j"
    done
done

@karlgrz
Copy link

karlgrz commented Apr 5, 2011

That's a welcome responsive reply, Sebastien! I will document my tests this morning to share. Thanks!

Karl

@fredrick
Copy link

fredrick commented Apr 5, 2011

@seb-m I definitely think this is the way to go, in terms of approach. I'll give it a go and share what I find.

@karlgrz
Copy link

karlgrz commented Apr 5, 2011

Just from the few tests I've run this morning so far this is looking great, Sebastien. I will provide my logs and test documentation after lunch for you to confirm.

@seb-m
Copy link
Owner Author

seb-m commented Apr 5, 2011

@karlgrz @wayoutmind great, if it confirms this is great news.

@karlgrz
Copy link

karlgrz commented Apr 5, 2011

Here's a link to my pre-patch log file and my post-patched log file:

https://gist.github.com/904199

Here is the directory structure that I used in the test:

/srv
    /test1
        /content1           
            /Dailies
                SAB_D_001_20101111_10001_01_001.m4v
                SAB_D_001_20101111_10001_01_002.m4v
            sab.jpg
        /content2
            /Dailies
                MOB_D_2014_20110323_201410_02_063.m4v
                MOB_D_2014_20110323_201410_02_064.m4v
            mob.jpg
        /content3
            /Dailies
                PLL_D_122_20110121_25_08_001.m4v
                PLL_D_122_20110121_25_08_002.m4v
            pll.jpg

In my pre-patch log I am NOT logging the IN_CLOSE_WRITE event, but in the post-patch I am, just to show that the patch is working a little more clearly.

Pre-patch, every single time I would START my script it would seem to catch all the events. As the script was running, though, it would eventually start to drop the /srv/content3/pll.jpg file very consistently.

Post-patch, every event seems to be caught and accounted for correctly.

This patch has fixed the problem I was having with dropped events. Thanks, Sebastien!

@fredrick
Copy link

@seb-m Your patch seems to have caught all IN_CLOSE events as a result of creating a directory. Here is my log output for a tar -xzvf on the latest Wordpress download. It caught everything, in a very deep and complicated directory hierarchy!

As far as I can tell, this fixes my problem completely. Thanks again for the great work.

@seb-m
Copy link
Owner Author

seb-m commented Apr 11, 2011

@wayoutmind @karlgrz thanks for your testing, I think it has come time to release a new version of Pyinotify with these changes. I will prepare it soon.

I think we should let this issue open at least for now, just to give peoples a heads-up on the current situation of this issue and for new comers informing them on the subtilities of the way how inotify works and how this issue is currently mitigated.

@karlgrz
Copy link

karlgrz commented Apr 11, 2011

@seb-m @wayoutmind no problems. Thanks for the responsiveness. This is working great for us.

@herve76
Copy link

herve76 commented May 27, 2011

Hi,

has a new version with this fix been released? Which version is it?

@fredrick
Copy link

@herve76 Yes, v0.9.2 includes this fix

@seb-m seb-m closed this as completed Sep 15, 2011
@Karmak23
Copy link

Karmak23 commented Nov 2, 2011

Hi folks. Sorry for the delay, I was much too busy... I will test 0.9.2 as soon as possible and report about the results. @seb-m, thanks a lot for taking time to fix this issue. BTW, the links are back. Too late. Sorry again for delay. Best regards,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants