File::Find does not search recursively in perl #910

Closed
Pyxzure opened this Issue Aug 16, 2016 · 14 comments

Pyxzure commented Aug 16, 2016

  • A brief description

File::Find does not search recursively in perl

root@local:~/Test# tree
.
└── a
    ├── b
    │   ├── f1.txt
    │   └── f2.txt
    └── c
        ├── f1.txt
        └── f2.txt

3 directories, 4 files
root@local:~/Test# perl -e 'use File::Find;find(sub {print $File::Find::name,$/;},".");'
.
./a
root@local:~/Test# perl -e 'use File::Find;find(sub {print $File::Find::name,$/;},"./a");'
./a
./a/b
./a/c
  • Your Windows build number

14393.51

build-essential is installed

benhillis added the bug label Aug 16, 2016

mbeijen Sep 25, 2016

Hi @Pyxzure

This problem is caused by the file system that is in use by Ubuntu on Windows.

The solution is to edit Config.pm:

sudo vi /usr/lib/perl/5.18.2/Config.pm

Set dont_use_nlink to 'define':

dont_use_nlink => 'define',

Now your example works!
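
A less invasive option may be to set $File::Find::dont_use_nlink in the script itself rather than editing the system-wide Config.pm. The variable is described in the File::Find documentation. The sketch below is only an illustration: the temporary directory and its layout are assumptions that mirror the tree from the report.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Force File::Find to stat every entry instead of trusting directory nlink.
# Set it after loading the module and before calling find().
$File::Find::dont_use_nlink = 1;

# Rebuild the tree from the report (a/b and a/c, two files each) in a temp dir.
my $root = tempdir(CLEANUP => 1);
for my $dir ("a/b", "a/c") {
    make_path("$root/$dir");
    for my $file ("f1.txt", "f2.txt") {
        open my $fh, '>', "$root/$dir/$file" or die "open: $!";
        close $fh;
    }
}

# Collect every path that find() visits, including the start directory.
my @found;
find(sub { push @found, $File::Find::name }, $root);
print "$_\n" for sort @found;
```

This only affects the script that sets the variable, so other Perl programs on the system keep the default behaviour.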

benhillis Oct 18, 2016

Member

This should now be fixed in Insider builds.

yorickdowne Dec 5, 2016

Not fixed in 14965, and over at #186 we're discussing a change to File::Find to resolve it. If WSL will be changed so that nlink works, work on changing File::Find should stop.

yorickdowne Dec 6, 2016

@benhillis I'd love to see this working in the next public release. Can you confirm which insider build your fix will or should have landed on, so we can check whether it resolves the issue?

benhillis Dec 6, 2016

Member

@JasonLinMS is much more familiar with what is required here. Jason would you mind describing the issue?

yorickdowne Dec 6, 2016

Thanks! From our discussion over at #186, it looks like the number of links in a directory is not as expected in a Linux environment. From the File::Find description:

$dont_use_nlink
You can set the variable $File::Find::dont_use_nlink to 1 if you want to force File::Find to always stat directories. This was used for file systems that do not have an nlink count matching the number of sub-directories. Examples are ISO-9660 (CD-ROM), AFS, HPFS (OS/2 file system), FAT (DOS file system) and a couple of others.
You shouldn't need to set this variable, since File::Find should now detect such file systems on-the-fly and switch itself to using stat. This works even for parts of your file system, like a mounted CD-ROM.
If you do set $File::Find::dont_use_nlink to 1, you will notice slow-downs.

Here's my test of this in build 14965 (*). Note the Links count always being 0.

tbehrens@DESKTOP-J1SS16J:~$ tree Test
Test
└── a
    ├── b
    │   ├── f1.txt
    │   └── f2.txt
    └── c

3 directories, 2 files
tbehrens@DESKTOP-J1SS16J:~$ stat Test
  File: 'Test'
  Size: 0               Blocks: 0          IO Block: 512    directory
Device: 11h/17d Inode: 1125899906978852  Links: 0
Access: (0777/drwxrwxrwx)  Uid: ( 1000/tbehrens)   Gid: ( 1000/tbehrens)
Access: 2016-12-06 11:08:09.880282000 -0500
Modify: 2016-12-06 11:08:09.880575400 -0500
Change: 2016-12-06 11:08:09.880575400 -0500
 Birth: -
tbehrens@DESKTOP-J1SS16J:~$ stat Test/a
  File: 'Test/a'
  Size: 0               Blocks: 0          IO Block: 512    directory
Device: 11h/17d Inode: 1125899906978859  Links: 0
Access: (0777/drwxrwxrwx)  Uid: ( 1000/tbehrens)   Gid: ( 1000/tbehrens)
Access: 2016-12-06 11:08:09.880509600 -0500
Modify: 2016-12-06 11:08:12.544527000 -0500
Change: 2016-12-06 11:08:12.544527000 -0500
 Birth: -
tbehrens@DESKTOP-J1SS16J:~$ stat Test/a/b
  File: 'Test/a/b'
  Size: 0               Blocks: 0          IO Block: 512    directory
Device: 11h/17d Inode: 5910974511060019  Links: 0
Access: (0777/drwxrwxrwx)  Uid: ( 1000/tbehrens)   Gid: ( 1000/tbehrens)
Access: 2016-12-06 11:08:09.880637000 -0500
Modify: 2016-12-06 11:08:22.798697400 -0500
Change: 2016-12-06 11:08:22.798697400 -0500
 Birth: -
tbehrens@DESKTOP-J1SS16J:~$ stat Test/a/c
  File: 'Test/a/c'
  Size: 0               Blocks: 0          IO Block: 512    directory
Device: 11h/17d Inode: 3096224743953465  Links: 0
Access: (0777/drwxrwxrwx)  Uid: ( 1000/tbehrens)   Gid: ( 1000/tbehrens)
Access: 2016-12-06 11:08:12.544411100 -0500
Modify: 2016-12-06 11:08:12.544411100 -0500
Change: 2016-12-06 11:08:12.544411100 -0500
 Birth: -

(*) Because of the UUP work that started Friday the 2nd, I can't test on a newer build. Once Insider builds come to Fast ring again, I will test on the then-current build.
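
The same check can be sketched from Perl, where the link count is field 3 of stat(). This is a minimal sketch under assumptions: nlink_report is a hypothetical helper name, and the temporary tree only imitates the Test directory above. It prints the reported link count next to the "subdirectories + 2" value that a traditional Unix file system would be expected to report.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Return the pair (nlink reported by stat, number of subdirectories + 2).
sub nlink_report {
    my ($dir) = @_;
    my $nlink = (stat $dir)[3];          # field 3 of stat() is the link count
    opendir my $dh, $dir or die "opendir $dir: $!";
    # grep in scalar context counts the entries that are real subdirectories.
    my $subdirs = grep { $_ ne '.' && $_ ne '..' && -d "$dir/$_" } readdir $dh;
    closedir $dh;
    return ($nlink, $subdirs + 2);
}

# Hypothetical test tree: a/b and a/c under a temporary root.
my $root = tempdir(CLEANUP => 1);
make_path("$root/a/b", "$root/a/c");

for my $dir ($root, "$root/a", "$root/a/b", "$root/a/c") {
    my ($nlink, $expected) = nlink_report($dir);
    printf "%-40s nlink=%d  subdirs+2=%d\n", $dir, $nlink, $expected;
}
```

The reported nlink depends on the file system the script runs on, so the two columns only agree where the link count follows the traditional convention.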

yorickdowne Dec 6, 2016

I think I have to walk my statement back. Upon further testing, recursive File::Find appears to work, and I can install modules from cpan.

yorickdowne Dec 6, 2016

Can you confirm that your intended fix was to set the link count to 0 on directories so that dont_use_nlink detection will trigger? The "nlink hack" in File::Find (and find(1), as I understand it) assumes that link count = subdirectories + 2, so if the link count is 0, that would make File::Find disable its nlink assumption automatically.

I think. Working my way through this. If the outcome is "it is indeed fixed", that'd be swell!

benhillis Dec 6, 2016

Member

@yorickdowne - My understanding is that due to how NTFS is implemented there is no quick way to query this number (without doing an entire directory listing which would be very expensive from a performance perspective). @JasonLinMS can correct me if I'm wrong.

yorickdowne Dec 6, 2016

@benhillis Setting it to 0 is completely acceptable, as it fixes the issue. Previously, the links value returned was "2" and that threw File::Find off. But with the value always being "0" for a directory, File::Find turns off its optimization and works recursively.

benhillis Dec 6, 2016

Member

@yorickdowne - glad to hear it.

JasonLinMS Dec 6, 2016

Thanks @benhillis, you are correct.
@yorickdowne Yeah, returning 2 means "trust me, this directory has no sub-directories", and returning 0 means "I have no idea, go check yourself". We recently switched from 2 to 0, and it'll most likely stay 0 for the foreseeable future.
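
The meaning of the two return values can be illustrated with a small sketch. It is a simplification of the convention discussed in this thread, not File::Find's actual code, and implied_subdirs is a hypothetical helper name.

```perl
use strict;
use warnings;

# Interpret a directory's link count as described in this thread.
# Returns the number of subdirectories implied, or undef for "unknown".
sub implied_subdirs {
    my ($nlink) = @_;
    return undef if $nlink < 2;   # e.g. 0: no information, check every entry
    return $nlink - 2;            # 2 => no subdirectories, 3 => one, and so on
}

for my $nlink (0, 2, 4) {
    my $n = implied_subdirs($nlink);
    printf "nlink=%d => %s\n", $nlink,
        defined $n ? "$n subdirectories expected"
                   : "unknown, stat every entry";
}
```

Under this reading, a directory that really has subdirectories but reports 2 is never descended into, while a directory that reports 0 is always scanned in full.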

yorickdowne Dec 6, 2016

@JasonLinMS, thank you for confirming. So that raises a question: If this "likely stays zero for the foreseeable future", do you recommend that the perl5 folk go ahead with the patch to File::Find that will detect they are inside WSL and go check for themselves, anyway? Likely doesn't mean definitely.
If this ever goes back to "trust me, but my answer is wrong", then it'll break perl5 and its package management system cpan, so there's a desire to make sure perl5 stays operational going forward.

JasonLinMS Dec 19, 2016

@yorickdowne Sorry for the delayed response. We hope to avoid being special-cased as much as possible. For this particular issue, we will definitely keep the return value at zero until we can actually provide the correct value (which will not be any time soon).
