Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[urar helper] fix filenames with spaces handling with unrar v5 #3073

Closed
mc-butler opened this issue Sep 11, 2013 · 30 comments
Closed

[urar helper] fix filenames with spaces handling with unrar v5 #3073

mc-butler opened this issue Sep 11, 2013 · 30 comments
Assignees
Labels
area: vfs Virtual File System support prio: low Minor problem or easily worked around
Milestone

Comments

@mc-butler
Copy link

Important

This issue was migrated from Trac:

Origin https://midnight-commander.org/ticket/3073
Reporter post-factum (@pfactum)
Mentions dnh@….org
Keywords urar, unrar

unrar v5 helper script doesn't print filenames with spaces. The following pull request fixes that.

#28

Note

Original attachments:

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Sep 24, 2013 at 7:48 UTC (comment 1)

  • Owner set to andrew_b
  • Branch state changed from no branch to on review
  • Milestone changed from Future Releases to 4.8.11
  • Votes set to andrew_b
  • Status changed from new to accepted

Branch: 3073_urar5_spaces
[c4a546a]

@mc-butler
Copy link
Author

Changed by slavazanko (@slavaz) on Oct 1, 2013 at 10:43 UTC (comment 2)

  • Votes changed from andrew_b to andrew_b slavazanko
  • Branch state changed from on review to approved

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Oct 1, 2013 at 12:35 UTC (comment 3)

  • Branch state changed from approved to merged
  • Status changed from accepted to testing
  • Resolution set to fixed
  • Votes changed from andrew_b slavazanko to committed-master

Merged to master: [c4a546a].

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Oct 1, 2013 at 12:36 UTC (comment 4)

  • Status changed from testing to closed

@mc-butler
Copy link
Author

Changed by Hamer13 (h13-mc@….net) on Oct 26, 2013 at 12:50 UTC (comment 5)

  • Status changed from closed to reopened
  • Resolution fixed deleted

I think this is not so good solution.
The filenames can have more than one whitespaces in one chunk. For example: "File_name__with_spaces". "_" means whitespace. Awk will squeeze this double (treble and so on) whitespaces into one whitespace.
I propose this patch:

$ diff -ub /usr/lib/mc/extfs.d/urar.orig /usr/lib/mc/extfs.d/urar
--- /usr/lib/mc/extfs.d/urar.orig       2013-09-17 12:19:11.000000000 +0300
+++ /usr/lib/mc/extfs.d/urar    2013-10-26 15:22:22.000000000 +0300
@@ -43,13 +43,14 @@
 BEGIN { flag=0 }
 /^-----------/ { flag++; if (flag > 1) exit 0; next }
 flag==1 {
+    str=substr($0, 65)
     split($5, a, "-")
     if (index($1, "D") != 0)
        $1="drwxr-xr-x"
     else
        if (index($1, ".") != 0)
            $1="-rw-r--r--"
-    printf "%s 1 %s %s %d %02d/%02d/%02d %s ./%s\n", $1, uid, gid, $2, a[2], a[1], a[3], $6, $8
+    printf "%s 1 %s %s %d %02d/%02d/%02d %s ./%s\n", $1, uid, gid, $2, a[2], a[1], a[3], $6, str
 }'
 }

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Oct 28, 2013 at 7:08 UTC (comment 5.6)

Replying to Hamer13:

I think this is not so good solution.
The filenames can have more than one whitespaces in one chunk.

Agree.

+    str=substr($0, 65)

Why 65?

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Oct 28, 2013 at 7:23 UTC (comment 6.7)

Replying to andrew_b:

Why 65?

Ah, sorry... No question.

I'm not sure that width of 'Size' and 'Packet' fields is constant always. Can it be wider for huge files?

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Nov 14, 2013 at 5:25 UTC (comment 8)

  • Votes changed from committed-master to andrew_b
  • Branch state changed from merged to on review

Branch: 3073_urar5_spaces
[af4d458981f499d27d8b6d3c5b0c8ce2406d385e]

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Nov 15, 2013 at 4:29 UTC (comment 9)

How about this patch?

nf = split($0, fields, " ", seps);

It's not portable. seps a) is a gawk extension and b) gawk >= 4.0.0. is required.

@mc-butler
Copy link
Author

Changed by dnh (dnh@….org) on Nov 15, 2013 at 8:20 UTC (comment 10)

  • Cc set to dnh@….org

*ARGH* *darn* Too good to be portable ... Thanks a bundle andrew_b. Didn't
notice that I need gawk >= 4.0.0! And no wonder I did not know that feature
before, but it came in just too handy ;) Maybe "keep in in mind" though ;)

I still dislike indexing by the CRC field though, just on general principle.

I'll ponder programming around that and submit if I come up with anything
more useful, but for the time being, I guess that indexing by the CRC-field
should work. For openSUSE I'll just require gawk >= 4.0.0 and keep (adjust)
my patch, I think it's just somewhat more robust than the indexing ...

PS: I'm one of the maintainers of the mc package for openSUSE

@mc-butler
Copy link
Author

Changed by slavazanko (@slavaz) on Nov 22, 2013 at 10:28 UTC (comment 11)

Dnh, we have to maintain lot of the things related to backward compatibility (such as: support of OpenWrt-distros, BusyBox-toolset and so on). Would be great to find a way to split few first parameters and to stay 'filename' parameter 'as is' with spaces. Your patch looks pretty good, but as Andrew say, your way will have restricted use cases.

Probably, we should check a version of gawk in configure.ac script and next we may set up some flag to switch between algorithms in the VFS-helper...

@mc-butler
Copy link
Author

Changed by dnh (dnh@….org) on Nov 22, 2013 at 18:59 UTC (comment 12)

Once you think about the change it was easy to replace the four-arg split. The only problem was getting the seps[i+1] right.

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Nov 25, 2013 at 9:26 UTC

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Nov 25, 2013 at 9:33 UTC (comment 13)

I've attached the test archive with the following content

testfile1
<space>testfile2
testfile3<space>
test<space>file<3 spaces>4

<spase> above is the mark of space in the file name.

This patch has some trouble with "testfile3 " file (name is finished with space). Trailing space is lost. Other names are processed correctly.

@mc-butler
Copy link
Author

Changed by dnh (dnh@….org) on Nov 25, 2013 at 22:03 UTC (comment 14)

Yes, some oSuSE user (not sure if he wants to be mentioned here) already
noticed and notified me, even though he suggested a slightly different
solution (saving n-seps, checking that for greater nameparts, and add
'seps[n-seps]' if true ;). I thought about it quite a bit before getting up
last night ;) I'll ask him if and how he wants to be credited here for the
patch if "you" deem it better than the indexing by the CRC field version
currently used and integrate it into coming versions (He found the problem
independently of this existing bug (ticket) here). He suggested solutions, I
rewrote/improved those (significantly), so it's quite a shared credit if any!
I have a pending mail with him on the topic and will ask him about it (and
he's to go first in the mention if...)) (and will divulge his mail address to
the maintainers so they can ask themselves, just not as publicly as here).

I'll update the attached patch. My solution is to add a 'name=name seps[i]'
after the name-concatenating loop. If it's "filled", trailing seps get added,
if there are none, awk autovivifies the array-variable to the empty string, so
no extra checking is needed. "Blindly" concatenating seps[i] (and 'i' was
in-loop incremented as required, the 'i++' is executed "post-loop" (I checked
albeit only with gawk this time ;)) 'seps[i]' is just the last part of any
trailing seps characters.

This still get's torpedoed for names less than 13 chars length by the
following pruning of trailing spaces (which I have not checked in unrar's
sources if unrar5 still adds those flaky padding spaces sometimes. I just took
that "as true" from the existing urar helper/patch. IIRC. If that's wrong I'm
the first to say to leave that out ;) Maybe it's better to just "pass on"
that IMO bug of unrar to our users and document it. I've no idea. I just found
it in the existing "logic" and I tried not to change that (it's not in 4.8.10, but in the patch I started from). So, if in doubt, just prune those lines

+ ### remove padding blanks from short names
+ if (length(name)<13) {
+ sub(" *$", "", name);
+ }

from the patch.

BTW: shouldn't we gather some "testing" .rar (and other archive) files in the
"tests" directory? I've got a couple more ;) You rarely need more than
"touched" files, so archive size is basically archive header size + length of
included filenames, a couple of KB maybe ... Maybe create one really "sicko"
archive and pack that differently (of course, not all archivers may handle all
filenames, e.g. how does .zip handle '::' ?). We could even distribute that "sicko" archive as it's useful when your developing e.g. shell scripts :)

Sorry this got so long ...

@mc-butler
Copy link
Author

Changed by dnh (dnh@….org) on Nov 25, 2013 at 22:04 UTC

fix to add trailing seps[*] stuff

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Nov 26, 2013 at 7:35 UTC (comment 14.15)

Indeed, in output if unrar v or unrar l there are padding spaces after file name.

mprintf(L"%-12ls",Name);

They cannot be removed correctly. If file name finished with space(s) we get corrupted file name at the output.

But output of unrar vt contains correct file names w/o any padding:

$ unrar vt test.rar              

UNRAR 5.00 freeware      Copyright (c) 1993-2013 Alexander Roshal

Archive: test.rar
Details: RAR 4

        Name: testfile1
        Type: File
        Size: 0
 Packed size: 8
       Ratio: 0%
       mtime: 2013-11-25 13:04,000
  Attributes: -rw-r--r--
       CRC32: 00000000
     Host OS: Unix
 Compression: RAR 3.0(v29) -m3 -md=128K

        Name:  testfile2
        Type: File
        Size: 0
 Packed size: 8
       Ratio: 0%
       mtime: 2013-11-25 13:04,000
  Attributes: -rw-r--r--
       CRC32: 00000000
     Host OS: Unix
 Compression: RAR 3.0(v29) -m3 -md=128K

        Name: testfile3 
        Type: File
        Size: 0
 Packed size: 8
       Ratio: 0%
       mtime: 2013-11-25 13:04,000
  Attributes: -rw-r--r--
       CRC32: 00000000
     Host OS: Unix
 Compression: RAR 3.0(v29) -m3 -md=128K

        Name: test file   4
        Type: File
        Size: 0
 Packed size: 8
       Ratio: 0%
       mtime: 2013-11-25 13:04,000
  Attributes: -rw-r--r--
       CRC32: 00000000
     Host OS: Unix
 Compression: RAR 3.0(v29) -m3 -md=128K

I think, we can try to parse this output. We need only following three lines:

        Name: testfile1
        Size: 0
       mtime: 2013-11-25 13:04,000
  Attributes: -rw-r--r--

@mc-butler
Copy link
Author

Changed by dnh (dnh@….org) on Nov 27, 2013 at 0:57 UTC (comment 15.16)

Replying to andrew_b:

Indeed, in output if unrar v or unrar l there are padding spaces after file name.

mprintf(L"%-12ls",Name);

They cannot be removed correctly. If file name finished with space(s) we get corrupted file name at the output.

Exactly. And short filenames with trailing spaces are probably more seldom than
short ones without, so stripping spaces is/was the correct approach IMO. Upstream
bug. And perl's Archive::Rar is also no solution as it uses the external 'rar'.

But output of unrar vt contains correct file names w/o any padding:

..

I think, we can try to parse this output. We need only following three lines:

        Name: testfile1
        Size: 0
       mtime: 2013-11-25 13:04,000
  Attributes: -rw-r--r--

Got a seemingly working first draft already. At least working with "my" test archive containing cases mentionend in this ticket. Even though I wrote it pretty tired I'll attach the new patch for you all to test on whatever "weird" archives you have and for portability issues.

@mc-butler
Copy link
Author

Changed by dnh (dnh@….org) on Nov 27, 2013 at 0:59 UTC

Reimplementation of mcrar5fs_list in vfs/extfs/helpers/urar using 'unrar vt'

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Nov 28, 2013 at 6:17 UTC (comment 17)

Great!

Branch: 3073_urar5_spaces (rebased)
[6d9f09b81e2a2f9c04b929ed8075ac7d8e6c7063]

@mc-butler
Copy link
Author

Changed by slavazanko (@slavaz) on Nov 29, 2013 at 10:07 UTC (comment 18)

  • Branch state changed from on review to approved
  • Votes changed from andrew_b to andrew_b slavazanko

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Nov 29, 2013 at 10:12 UTC (comment 19)

  • Status changed from reopened to closed
  • Resolution set to fixed

Merged to master: [9d190f6].

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Nov 29, 2013 at 10:14 UTC (comment 20)

  • Branch state changed from approved to merged
  • Votes changed from andrew_b slavazanko to committed-master

@mc-butler
Copy link
Author

Changed by dnh (dnh@….org) on Dec 12, 2013 at 14:56 UTC

*argh* missed a '' in a regex ... Should not be critical, but please add ...

@mc-butler
Copy link
Author

Changed by dnh (dnh@….org) on Dec 12, 2013 at 14:57 UTC (comment 21)

  • Status changed from closed to reopened
  • Resolution fixed deleted

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Dec 13, 2013 at 9:46 UTC (comment 22)

  • Votes committed-master deleted
  • Blocked by set to #3113
  • Milestone changed from 4.8.11 to 4.8.12
  • Branch state changed from merged to no branch

@mc-butler
Copy link
Author

Changed by slodki (slodki_dom@….onet.pl) on Dec 31, 2013 at 15:09 UTC (comment 23)

  • Priority changed from minor to trivial

Simplest solution for this error (just print the rest of rar output without interpreting filenames by awk):

--- libexec/mc/extfs.d/urar.org 2013-10-12 02:25:29.000000000 +0200
+++ libexec/mc/extfs.d/urar.my  2013-12-31 15:50:54.000000000 +0100
@@ -49,7 +49,7 @@
     else
        if (index($1, ".") != 0)
            $1="-rw-r--r--"
-    printf "%s 1 %s %s %d %02d/%02d/%02d %s ./%s\n", $1, uid, gid, $2, a[2], a[1], a[3], $6, $8
+    printf "%s 1 %s %s %d %02d/%02d/%02d %s ./%s\n", $1, uid, gid, $2, a[2], a[1], a[3], $6, substr($0,index($0,$8))
 }'
 }

Testes with mc v4.8.10 and rar v5.01.

@mc-butler
Copy link
Author

Changed by dnh (dnh@….org) on Dec 31, 2013 at 18:34 UTC (comment 23.24)

Replying to slodki:

Simplest solution for this error (just print the rest of rar output without interpreting filenames by awk):

The problem is, that 'unrar [vl]' is padding short filenames with spaces at the end! So 'unrar vt' is used as of 4.8.11, see the changeset above in this ticket and the rest of this ticket.

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Feb 10, 2014 at 12:30 UTC

  • Blocked by #3113 deleted

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Feb 10, 2014 at 12:33 UTC (comment 26)

  • Resolution set to fixed
  • Status changed from reopened to closed
  • Branch state changed from no branch to merged

[48352fc]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: vfs Virtual File System support prio: low Minor problem or easily worked around
Development

No branches or pull requests

2 participants