Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref(path): Rework basename and dirname to be coreutils compatible #3089

Merged
merged 4 commits into from Jan 26, 2023

Conversation

kamilogorek
Copy link
Contributor

@kamilogorek kamilogorek commented Jan 3, 2023

Note: tests are failing due to parse assertions being affected by this change.


This whole PR will invalidate some of the behaviors of parse function, which means that it would need to be reworked as well. However it's slightly more work, so before doing this, I'd like to ask what you even think about the expected behavior.

Should we follow coreutils? Should we follow Node.js?

Would it also be reasonable to implement parse as a superset of basename/dirname/extname calls instead? This way it would never get out of sync.


basename now passes all coreutils tests // https://github.com/coreutils/coreutils/blob/master/tests/misc/basename.pl which it did not before.

After reworking, the whole core is shared between posix and win32 modules.

Some behaviors, like basename(".js", ".js") === "" are not valid anymore, as per spec, the suffix cannot consume the whole path.

If the suffix operand is present (...) and is identical to a suffix of the characters remaining in string, the suffix suffix shall be removed from string.

Tests had to be factored out into table-test, as they also should be shared between win32/posix.
Some test variants are definitely doubled, but I did not want to remove any of them just for the sake of removing lines of code.

Implemented behavior is based on https://pubs.opengroup.org/onlinepubs/9699919799/utilities/basename.html and https://pubs.opengroup.org/onlinepubs/9699919799/functions/basename.html.

Fixes #3063

path/_util.ts Outdated
@@ -26,8 +26,12 @@ export function isPosixPathSeparator(code: number): boolean {
return code === CHAR_FORWARD_SLASH;
}

export function isWindowsPathSeparator(code: number): boolean {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In windows system, both the forward slash and back slash are directory separator. See https://learn.microsoft.com/en-us/dotnet/standard/io/file-path-formats

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why I only use it in isPathSeprator, but agree that the naming might be confusing. Will change that for sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you should not export it. Right?

@kt3k
Copy link
Member

kt3k commented Jan 4, 2023

Aligning to coreutils makes sense to me.

Some behaviors, like basename(".js", ".js") === "" are not valid anymore, as per spec, the suffix cannot consume the whole path.

This also seems ok to me.

@kamilogorek
Copy link
Contributor Author

path.basename and path.dirname are now coreutils compatible. However, we cannot use dirname and basename in parse/format tests anymore, as they are not aligned and I'm not sure if they will/should ever be.

I explained the issue in this thread: https://twitter.com/kamilogorek/status/1610743549510680590

@kt3k
Copy link
Member

kt3k commented Jan 6, 2023

Can you give more examples of breaking changes by this PR?

path.basename and path.dirname are now coreutils compatible. However, we cannot use dirname and basename in parse/format tests anymore, as they are not aligned and I'm not sure if they will/should ever be.

If we introduce this change, I think we should also update base part of parse result.

And because of that, p === path.format(path.parse(p)) would effectively stop holding true for all the cases.

I think p === path.format(path.parse(p)) already doesn't hold for p === "//", or am I missing something?

@kt3k
Copy link
Member

kt3k commented Jan 6, 2023

It's true that path.format(path.parse("/foo///bar")) becomes "/foo///bar" (triple (or more) slashes in the middle are kept), but I'm not sure that's desired/intended behavior. (Is it useful for any meaningful case?)

@kamilogorek kamilogorek marked this pull request as ready for review January 8, 2023 09:38
@kamilogorek kamilogorek changed the title ref(path): Rework path.basename from the ground up to be coreutils-compatible ref(path): Rework basename and dirname from the ground up to be coreutils-compatible Jan 8, 2023
@kamilogorek kamilogorek changed the title ref(path): Rework basename and dirname from the ground up to be coreutils-compatible ref(path): Rework basename and dirname to be coreutils compatible Jan 8, 2023
@kamilogorek
Copy link
Contributor Author

kamilogorek commented Jan 8, 2023

This PR is now complete, with some breaking changes, that are all making it coreutils-compliant.

All coreutils tests are now also passing with both dirname and basename.
All previous tests were preserved and optionally updated if needed.

Changes:

  • basename prefix cannot be longer than path itself, it it is, skip trimming
  • basename prefix cannot consume whole path, if it does, skip trimming
  • basename will trim all trailing slashes
  • basename will fallback to / in case nothing is found
  • dirname will trim all trailing slashes
  • dirname will fallback to . in case no path is provided
  • dirname will fallback to / in case there is no match, be it only leading slashes or a single-segment path
  • parse and will processed path in the exact same way as basename and dirname (including changes listed above), which means invalid paths like /foo//bar// will be correctly parsed to /foo and bar, then formatted to /foo/bar
  • format will return dir in case base === "/" to prevent default value from returning ///

path/_util.ts Outdated
let matchedNonSeparator = false;
let end = path.length;

for (let i = path.length - 1; i >= 0; --i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i >= 0 should be i >= start.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. It also exposed one bug, which I also fixed.

[["/aaa/bbb", "bb"], "b"],
[["C:"], ""],
[["C:."], "."],
[["C:\\"], ""],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the result of posix.basename('/') is /, so i think the result of win32.basename('C:\\') is \\.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

[["aaa\\bbb", "bb"], "b"],
[["aaa\\bbb", "b"], "bb"],
[["/aaa/bbb", "bb"], "b"],
[["C:"], "\\"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the result be ""? The result of posix.basename('') is "".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input here is C:, which is just a "root" directory, which for posix returns a fallback value /, see line 15: [["/"], "/"],

~ ❯ basename /                                                                                                                                                                                          
/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your test assumes an empty input, which would be [[""], [""]], and we do have this test for both win32 and posix :)

~ ❯ basename ""                                                                                                                                                                                         
             

Copy link
Contributor

@familyboat familyboat Jan 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C: represents a relative path from the current directory of the C: drive. C:\\ represents a root directory of the C: dirve. I get it from https://learn.microsoft.com/en-us/dotnet/standard/io/file-path-formats

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, great catch, thanks! Updated

@kt3k
Copy link
Member

kt3k commented Jan 23, 2023

Trying to summarize the major breaking changes introduced by this, by examples:

  • basename("abc", "abc") => BEFORE "", AFTER "abc"
  • basename("/") => BEFORE "", AFTER "/"
  • dirname("aaa///bbb") => BEFORE "aaa//", AFTER "aaa"

These changes are introuced by aligning the API to basename (coreutils, unix command). I think these all make more sense than the current behavior.

parse will process path in the exact same way as basename and dirname (including changes listed above), which means invalid paths like /foo//bar// will be correctly parsed to /foo and bar, then formatted to /foo/bar

I think this change just make sense. It feels rather strange if it keeps double or triple slashes in the middle.

The below change is also introduced inspired by the change of basename("/") => "/".

  • win32.basename("c:\\") => BEFORE "", AFTER "\\"

This also looks reasonable to me.

I'm in favor of landing these changes.

Copy link
Member

@kt3k kt3k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kt3k kt3k merged commit 3e338fd into denoland:main Jan 26, 2023
@kamilogorek kamilogorek deleted the rework-basename branch January 26, 2023 09:02
bartlomieju pushed a commit to bartlomieju/deno_std that referenced this pull request Feb 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

basename cant work correctly
3 participants