Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Path.RemoveRelativeSegments Api #2162

Open
Tracked by #64596
Anipik opened this issue Jun 27, 2018 · 29 comments
Open
Tracked by #64596

Add Path.RemoveRelativeSegments Api #2162

Anipik opened this issue Jun 27, 2018 · 29 comments
Labels
api-approved API was approved in API review, it can be implemented area-System.IO help wanted [up-for-grabs] Good issue for external contributors
Milestone

Comments

@Anipik
Copy link
Contributor

Anipik commented Jun 27, 2018

Edit by carlossanlop: I marked this issue as up-for-grabs. If you're interested in taking it, please read this comment. This feature is fully implemented already, but the only thing pending to fix is a Unix performance regression before merging.

Rationale

Currently there is no direct way to normalize the path or to remove the relative Segments from any path string. We can use GetFullPath to normalize the path but it will resolve it relative to root directory or working directory which is not always desired.
eg

  • working with relative paths in archives where we may require normalized paths that are "full" paths in said archive, but not on the system

  • Will help in building and working with relative urls.

Proposed Api

namespace System.IO
{
    public class Path
    {
        public static string RemoveRelativeSegments(string path);
        public static string RemoveRelativeSegments(ReadOnlySpan<char> path);
        public static bool TryRemoveRelativeSegments(ReadOnlySpan<char> path, span<char> buffer, out int charsWritten);
    }
}

The rootlength is the length of the path which will never be trimmed while removing the relativeSegments.

Behavior

  • Path.DirectorySeparatorChar and Path.AltDirectorySeparatorChar are considered path "segment" separators (\ and / on Windows, / on Unix)
  • Sequential separators will be collapsed to a single separator
  • Any Path.AltDirectorySeparatorChar characters will be changed to Path.DirectorySeparatorChar (only relevant on Windows).
  • Segments of one period only . will be removed.
  • Segments of two periods only .. will be removed along with the parent segment, if any.

Implementation

This api is already implemented as an internal api and is being used by GetFullPath api to remove relative segments in case of device paths.

internal static bool RemoveRelativeSegments(ReadOnlySpan<char> path, int rootLength, ref ValueStringBuilder sb)

Usage

// removing the relative segment
Assert.Equal("C:\temp", PathInternal.RemoveRelativeSegments("C:\git\..\temp"));

// removing the relative segment but not eating the root
Assert.Equal("C:\temp", PathInternal.RemoveRelativeSegments("C:\..\..\temp"));

// Input is relative path
Assert.Equal("temp", PathInternal.RemoveRelativeSegments("git\..\temp"));

// Multiple types of relative Segments together
Assert.Equal("temp", PathInternal.RemoveRelativeSegments("git\.\..\temp"));
Assert.Equal("git\temp", PathInternal.RemoveRelativeSegments("git\\\\temp"));

// Normalizing the path
Assert.Equal("git\temp\src\corefx", PathInternal.RemoveRelativeSegments("git\temp/src\corefx"));

More Usages can be find here
dotnet/corefx#37225
I have modified the internal tests to api proposal

Some Design Answers

Resolving this https://github.com/dotnet/corefx/issues/4208

Can directly call RemoveRelativeSegments(RelativePath);

Exception Handling

  • return empty string if the path is empty or null.
  • throw Argument Exception if the bufferLength is less than the required in TryRemoveRelativeSegments api

cc @danmosemsft @JeremyKuhne

related implementation prs dotnet/coreclr#24273 dotnet/corefx#37225

@Anipik Anipik self-assigned this Jun 27, 2018
@danmoseley
Copy link
Member

Do 1, 2, 3 above also apply to forward slash exactly the same? On all platforms?

@Anipik
Copy link
Contributor Author

Anipik commented Jun 27, 2018

Do 1, 2, 3 above also apply to forward slash exactly the same? On all platforms?

yes, the same will apply to forward slash and on all platforms

@danmoseley
Copy link
Member

This does not make sense to me

// Not removing the relative segments if it occurs in the skip length
Assert.Equal("git\..\temp", PathInternal.RemoveRelativeSegments("git\..\temp\.\", 7));

What is the skip length? When does it remove \..\ ? This test does not exist in PathInternal.Tests.cs

For yoru clarifications, please update the top post.

Otherwise it seems reasonable to me, Jeremy can figure whether to flip the label to ready for review

@Anipik
Copy link
Contributor Author

Anipik commented Jun 29, 2018

What is the skip length? When does it remove ..\ ?

I updated the skipLength to rootLength in the post . The Api starts removing the relative segments from the path string after the rootlength.

@JeremyKuhne
Copy link
Member

Can you add a Span version of this please? TryRemoveRelativeSegments() will never return a longer output than input- as such it is pretty easy to pass a sufficient output buffer.

For ease of use I'd also add string RemoveRelativeSegments(ReadOnlySpan<char>). It should be a pretty common scenario to want the "normalized" result on the heap. You can take spans of an unnormalized path with our other APIs (GetDirectoryName for example). It would be nice to be able to call a chain of these with only one output string allocation.

You need to be very explicit about what normalizing the separator means. Path.AltDirectorySeparatorChar -> Path.DirectorySeparatorChar. MSBuild turns \ into / on Unix. My gut feel is that we don't need to provide that option to start, but we should call it out as a potential future feature should we get enough demand.

Please call out error handling for this as well. I presume given what we've done for Path.Join(), that we'd want empty/null to return empty/null. We should explicitly discuss in API review.

@jnm2
Copy link
Contributor

jnm2 commented Jun 30, 2018

@JeremyKuhne I'd like to add a vote for normalizing the separator. This is something I'm using string.Replace for today.

@svick
Copy link
Contributor

svick commented Jun 30, 2018

The rootlength is the length of the path which will never be trimmed while removing the relativeSegments.

Do I understand it correctly that it's length in characters, not in path segments? Why?

In any case, I think this should be clarified.

@JeremyKuhne
Copy link
Member

Do I understand it correctly that it's length in characters, not in path segments? Why?

Because we can't always understand the relevance of separators. Notably with UNCs for paths and device paths on Windows (\\server\share, \\?\UNC), but one can imagine others. We could try and GetPathRoot as the first segment but that assumes the path is well formatted. Users can make the two-step call and we'll give that example if they have "normal" path. This is, in fact, what we do with GetFullPath(path, basePath) currently.

@Anipik
Copy link
Contributor Author

Anipik commented Jul 31, 2018

@JeremyKuhne anything else that is required here ?

@JeremyKuhne
Copy link
Member

@Anipik I think it's fine to mark ready for review whenever you're ready.

@GrabYourPitchforks
Copy link
Member

Just a quick note on using this for URLs: what would be the result of RemoveRelativeSegments("/foo/?IAmActuallyAQueryStringComponent/../../")?

@terrajobst
Copy link
Member

terrajobst commented Aug 27, 2019

Video

@Anipik you mentioned you don't want to use GetFullPath() because you want to preserve the relativeness. What's the scenario for this API then? Functionally, it seems GetFullPath() or keeping the path with the dots seems both OK.

@JeremyKuhne
Copy link
Member

@Anipik you mentioned you don't want to use GetFullPath() because you want to preserve the relativeness. What's the scenario for this API then? Functionally, it seems GetFullPath() or keeping the path with the dots seems both OK.

What you want to do is normalize a path segment without it being combined with the current working directory. If you call Path.GetFullPath() on foo\..\bar you're going to get a path with the current working directory added to the front. This api will give you back foo\bar. This is particularly useful for working with archives as described in dotnet/corefx#4208.

@carlreinke
Copy link
Contributor

Similar to dotnet/corefx#24685, this API will be defective on Linux in the presence of symbolic links:

  • Windows: If C:\Users\All Users is a symbolic link to C:\ProgramData then the canonical form of C:\Users\All Users\..\foo is C:\Users\foo.
  • Linux: If /var/lock is a symbolic link to /run/lock then the canonical form of /var/lock/../foo is /run/foo.

Fire up WSL and give it a try. :)

user@roxy:~$ head -n 1 /var/lock/../resolvconf/resolv.conf /run/resolvconf/resolv.conf /var/resolvconf/resolv.conf
==> /var/lock/../resolvconf/resolv.conf <==
# This file was automatically generated by WSL. To stop automatic generation of this file, remove this line.

==> /run/resolvconf/resolv.conf <==
# This file was automatically generated by WSL. To stop automatic generation of this file, remove this line.
head: cannot open '/var/resolvconf/resolv.conf' for reading: No such file or directory
user@roxy:~$

@JeremyKuhne
Copy link
Member

this API will be defective on Linux in the presence of symbolic links

This API will deliberately not follow links or touch the file system in any way.

I'm updating the original description to me more precise about what the heuristics are.

@carlreinke
Copy link
Contributor

I guess since GetFullPath has the same behavior, it's not really out of line. And there's probably an argument to be made that nobody wants or expects Linux's actual behavior. Even bash's tab completion doesn't match Linux's actual behavior.

@terrajobst
Copy link
Member

terrajobst commented Sep 24, 2019

Video

Alright, here is the shape we'd like to see:

namespace System.IO
{
    public class Path
    {
        public static string RemoveRedundantSegments(string path);
        public static string RemoveRedundantSegments(ReadOnlySpan<char> path);
        public static bool TryRemoveRedundantSegments(ReadOnlySpan<char> path, Span<char> destination, out int charsWritten);
    }
}

@Anipik Anipik removed their assignment Jan 14, 2020
@carlossanlop
Copy link
Member

I'll work on this.

@carlossanlop carlossanlop self-assigned this Jan 24, 2020
@carlossanlop carlossanlop transferred this issue from dotnet/corefx Jan 24, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.IO untriaged New issue has not been triaged by the area owner labels Jan 24, 2020
@carlossanlop carlossanlop added api-approved API was approved in API review, it can be implemented help wanted [up-for-grabs] Good issue for external contributors and removed untriaged New issue has not been triaged by the area owner labels Jan 24, 2020
@carlossanlop carlossanlop added this to the 5.0 milestone Jan 24, 2020
@jkotas
Copy link
Member

jkotas commented Jan 24, 2020

public static string RemoveRedundantSegments(ReadOnlySpan path);

Should this return Span, like existing Path.TrimEndingDirectorySeparator? Returning string means that this API will allocate unnecessarily when the path does not contain any redundant segments.

@carlossanlop
Copy link
Member

carlossanlop commented Jan 24, 2020

@jkotas I see what you mean. I am working on this right now and as I was implementing that method, I started wondering about that. You were faster than me in asking about it. 😸

I did a quick replay of the video, but I couldn't find any mention of the return value, so it could've been a small oversight (I could've also missed it).

Question to the reviewers in the video: Should we update the second signature to return a span instead of a string? What's the process to alter an already approved API?

@terrajobst @bartonjs @GrabYourPitchforks @tannergooding @KrzysztofCwalina @scalablecory @JeremyKuhne

@jeffhandley
Copy link
Member

I'm late to the game here, but I'm struggling with the word "redundant" in the API name, as I don't think we're matching the definition of "redundant". The path /foo/bar/../baz does not have any redundant segments as nothing is repeated or excessive--instead .. negates bar; it cancels it out.

Node has a Path.normalize function that is strikingly similar to what is being proposed here. I always liked the "normalize" verb for this function. To make this specific to relative path segments, I recommend NormalizeRelativeSegments.

@jeffhandley
Copy link
Member

Moving this out to .NET 7.0.

@adamsitnik adamsitnik modified the milestones: 6.0.0, 7.0.0 Jul 22, 2021
@deeprobin
Copy link
Contributor

deeprobin commented Dec 21, 2021

@carlossanlop What is the current progress for #37939 (closed for inactivity 😩)?

@carlossanlop
Copy link
Member

carlossanlop commented Dec 22, 2021

@deeprobin The closed PR has the feature fully working and properly tested. The only problem is that it would be introduced a performance regression in Unix, which can be reproduced by calling Path.GetFullPath("/some/existing/path/without/relative/segments") and comparing the measured time it takes to execute that API, before and after the change. But once we get that regression investigated and fixed, it can get merged.

Right now I don't have bandwidth to work on that, but we would welcome external contributions that can take on the above task and help drive that PR to completion.

Would you be interested, @deeprobin?

cc @adamsitnik @Jozkee

@carlossanlop carlossanlop added the help wanted [up-for-grabs] Good issue for external contributors label Dec 22, 2021
@deeprobin
Copy link
Contributor

deeprobin commented Dec 22, 2021

@carlossanlop I would take a look at it. However, I am currently not sure if I can improve the regression.

@jeffhandley jeffhandley removed Priority:3 Work that is nice to have Cost:S Work that requires one engineer up to 1 week labels Jul 9, 2022
@jeffhandley jeffhandley modified the milestones: 7.0.0, Future Jul 9, 2022
@iSazonov
Copy link
Contributor

Looking #37939 I guess the code could be more simple if handle the path from end to begin.

@carlossanlop
Copy link
Member

Looking #37939 I guess the code could be more simple if handle the path from end to begin.

Feel free to copy my branch and submit a PR, @iSazonov .

@carlossanlop carlossanlop removed their assignment Mar 23, 2023
@krwq
Copy link
Member

krwq commented Sep 11, 2023

@jeffhandley I've unassigned you given there has been no activity in .NET 7-8.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-approved API was approved in API review, it can be implemented area-System.IO help wanted [up-for-grabs] Good issue for external contributors
Projects
System.IO - File system
  
Review in progress