Make enum RegexParseError and RegexParseException public #38872

Closed
abelbraaksma opened this issue Jul 7, 2020 · 84 comments · Fixed by #40902
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.RegularExpressions
Milestone

Comments

@abelbraaksma
Contributor

abelbraaksma commented Jul 7, 2020

Background and Motivation

A regular expression object created with System.Text.RegularExpressions.Regex is essentially an ad-hoc compiled sub-language that's widely used in the .NET community for searching and replacing strings. But unlike in other programming languages, any syntax error is raised as a plain ArgumentException. Programmers who want to act on specific parsing errors have to parse the error message themselves to get more information, which is error-prone, subject to change and sometimes non-deterministic.

We already have an internal RegexParseException with two properties, Error and Offset, which respectively give an enum value for the type of error and the location in the pattern where the error occurred. When an ArgumentException is raised today, it is in fact a RegexParseException, which inherits from ArgumentException.

I've checked the existing code and I propose we make RegexParseException and RegexParseError public. These are pretty self-describing at the moment, though the enum cases may need better names (suggested below). Apart from changing a few existing tests and adding documentation, no substantive changes are necessary.

Use cases

  • Online regex tools may use the more detailed info to suggest corrections of the regex to users (like: "Did you forget to escape this character?").
  • Debugging experience w.r.t. regular expressions improves.
  • Currently, getting the column and row requires parsing the message string, and on .NET Framework the string doesn't contain them at all. Parsing is next to impossible in i18n scenarios; an enum and a position help in writing better, deterministic code.
  • Improve tooling by using the offset to place squiggles under errors in regexes.
  • Self-correcting systems may use the extra info to close parentheses or brackets, or fix escape sequences that are incomplete.
  • It is simply better to be able to check for explicit errors than the more generic ArgumentException which is used everywhere.
  • BCL tests on regex errors currently use reflection; this would no longer be necessary.
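
As a rough illustration of the tooling use cases above, here is a minimal sketch that points at the reported position. It assumes the proposed public RegexParseException with its Error and Offset properties, and it assumes Offset is zero-based. RegexErrorDisplay, PointAt and Describe are hypothetical names used only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexErrorDisplay
{
    // Hypothetical helper: returns the pattern followed by a second line
    // that carries a caret under the given zero-based offset.
    public static string PointAt(string pattern, int offset)
    {
        // Clamp so that an offset at (or past) the end still yields a caret.
        int column = Math.Clamp(offset, 0, pattern.Length);
        return pattern + Environment.NewLine + new string(' ', column) + "^";
    }

    // Returns "ok" for a valid pattern, otherwise the error kind and a caret line.
    // Assumes RegexParseException is public, as proposed in this issue.
    public static string Describe(string pattern)
    {
        try
        {
            _ = new Regex(pattern);
            return "ok";
        }
        catch (RegexParseException e)
        {
            return e.Error + Environment.NewLine + PointAt(pattern, e.Offset);
        }
    }
}
```

An editor could use the same offset to draw a squiggle instead of a caret. Which offset the parser reports for a given pattern is up to the implementation, so the sketch makes no claim about specific values.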

Related requests and proposals

Proposed API

The current API already exists but isn't public. The definitions are as follows:

    [Serializable]
-    internal sealed class RegexParseException : ArgumentException
+    public class RegexParseException : ArgumentException
    {
        private readonly RegexParseError _error; // tests access this via private reflection

        /// <summary>Gets the error that happened during parsing.</summary>
        public RegexParseError Error => _error;

        /// <summary>Gets the offset in the supplied pattern.</summary>
        public int Offset { get; }

        public RegexParseException(RegexParseError error, int offset, string message) : base(message)
        {
+            // add logic to test range of 'error' and return UnknownParseError if out of range
            _error = error;
            Offset = offset;
        }

        public override void GetObjectData(SerializationInfo info, StreamingContext context)
        {
            base.GetObjectData(info, context);
            info.SetType(typeof(ArgumentException)); // To maintain serialization support with .NET Framework.
        }
    }

And the enum with suggested names for a more discoverable naming scheme. I followed "clarity over brevity" and have tried to start similar cases with the same moniker, so that an alphabetic listing gives a (somewhat) logical grouping in tooling.

I'd suggest we add a case for unknown conditions, something like UnknownParseError = 0, which could be used if users create this exception by hand with an invalid enum value.

Handy for implementers: Historical view of this prior to 22 July 2020 shows the full diff for the enum field by field. On request, it shows all as an addition diff now, and is ordered alphabetically.

-internal enum RegexParseError
+public enum RegexParseError
{
+    UnknownParseError = 0,    // do we want to add this catch all in case other conditions emerge?
+    AlternationHasComment,
+    AlternationHasMalformedCondition,  // *maybe? No tests, code never hits
+    AlternationHasMalformedReference,  // like @"(x)(?(3x|y)" (note that @"(x)(?(3)x|y)" gives next error)
+    AlternationHasNamedCapture,        // like @"(?(?<x>)true|false)"
+    AlternationHasTooManyConditions,   // like @"(?(foo)a|b|c)"
+    AlternationHasUndefinedReference,  // like @"(x)(?(3)x|y)" or @"(?(1))"
+    CaptureGroupNameInvalid,           // like @"(?< >)" or @"(?'x)"
+    CaptureGroupOfZero,                // like @"(?'0'foo)" or @"(?<0>x)"
+    ExclusionGroupNotLast,             // like @"[a-z-[xy]A]"
+    InsufficientClosingParentheses,    // like @"(((foo))"
+    InsufficientOpeningParentheses,    // like @"((foo)))"
+    InsufficientOrInvalidHexDigits,    // like @"\uabc" or @"\xr"
+    InvalidGroupingConstruct,          // like @"(?" or @"(?<foo"
+    InvalidUnicodePropertyEscape,      // like @"\p{Ll" or @"\p{ L}"
+    MalformedNamedReference,           // like @"\k<"
+    MalformedUnicodePropertyEscape,    // like @"\p{}" or @"\p {L}"
+    MissingControlCharacter,           // like @"\c"
+    NestedQuantifiersNotParenthesized, // like @"abc**"
+    QuantifierAfterNothing,            // like @"((*foo)bar)"
+    QuantifierOrCaptureGroupOutOfRange,// like @"x{234567899988}" or @"x(?<234567899988>)" (must be < Int32.MaxValue)
+    ReversedCharacterRange,            // like @"[z-a]"   (only in char classes, see also ReversedQuantifierRange)
+    ReversedQuantifierRange,           // like @"abc{3,0}"  (only in quantifiers, see also ReversedCharacterRange)
+    ShorthandClassInCharacterRange,    // like @"[a-\w]" or @"[a-\p{L}]"
+    UndefinedNamedReference,           // like @"\k<x>"
+    UndefinedNumberedReference,        // like @"(x)\2"
+    UnescapedEndingBackslash,          // like @"foo\" or @"bar\\\\\"
+    UnrecognizedControlCharacter,      // like @"\c!"
+    UnrecognizedEscape,                // like @"\C" or @"\k<" or @"[\B]"
+    UnrecognizedUnicodeProperty,       // like @"\p{Lll}"
+    UnterminatedBracket,               // like @"[a-b"
+    UnterminatedComment,
}

* About AlternationHasMalformedCondition (internally IllegalCondition): this is thrown inside a conditional alternation like (?(foo)x|y), but appears to never be hit. There is no test case covering this error.

Usage Examples

Here's an example where we use the additional info to give more detailed feedback to the user:

public class TestRE
{
    public static Regex CreateAndLog(string regex)
    {
        try
        {
            var re = new Regex(regex);
            return re;
        }
        catch(RegexParseException reExc)
        {
            switch(reExc.Error)
            {
                case RegexParseError.InsufficientOrInvalidHexDigits:
                    Console.WriteLine("The hexadecimal escape does not contain enough hex digits.");
                    break;
                case RegexParseError.UndefinedNumberedReference:
                    Console.WriteLine("Back-reference in position {0} does not match any captures.", reExc.Offset);
                    break;
                case RegexParseError.UnrecognizedUnicodeProperty:
                    Console.WriteLine("Error at {0}. Unicode properties must exist, see http://aka.ms/xxx for a list of allowed properties.", reExc.Offset);
                    break;
                // ... etc
            }
            return null;
        }
    }
}

Alternative Designs

Alternatively, we may remove the type entirely and merely throw an ArgumentException. But it is likely that some people rely on the internal type, even though it isn't public, as through reflection the contextual information can be reached and is probably used in regex libraries. Besides, removing it will make any future improvements in dealing with parsing errors and proposing fixes in GUIs much harder to do.

Risks

The only risk I can think of is that after exposing this exception, people would like even more details. But that's probably a good thing and only improves the existing API.

Note that:

  • Existing code that checks for ArgumentException continues to work.
  • While debugging, people who see the underlying exception type can now actually use it.
  • Existing code using reflection to get to the extra data may or may not continue to work, depending on how strictly it searches for the type.

[danmose: made some more edits]

@abelbraaksma abelbraaksma added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jul 7, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Text.RegularExpressions untriaged New issue has not been triaged by the area owner labels Jul 7, 2020
@ghost

ghost commented Jul 7, 2020

Tagging subscribers to this area: @eerhardt
Notify danmosemsft if you want to be subscribed.

@danmoseley
Member

danmoseley commented Jul 7, 2020

Thanks for writing this up.

By exposing the constructors, people can overwrite the exception in favor of a more specific one.

Do you have evidence folks need to do this? If not, we might leave them out - that's up to the API review folks.

For the enum,

  • is there some ordering that makes sense other than semi-random?
  • you might want to do some regularization eg, either "Ref" or "Reference" and not both "Ref" and "ref".
  • maybe "IncompleteSlashP" should say something about unicode categories instead?
  • "Capnum" is not a publicly meaningful name
  • "Invalid" is probably better than "Illegal" in new API

Offset should note it's zero based.

If we have the constructor it should probably verify RegexParseError isn't out of range, like FileStream does, but that's implementation. (And offset >= 0)

            if (access < FileAccess.Read || access > FileAccess.ReadWrite)
                throw new ArgumentOutOfRangeException(nameof(access), SR.ArgumentOutOfRange_Enum);
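
A minimal sketch of what such a check could look like in an exception constructor, following the FileStream pattern quoted above. SketchParseError and RangeCheckedParseException are hypothetical stand-ins so that the example is self-contained; they are not the real types and the enum is deliberately shortened.

```csharp
using System;

// Hypothetical, shortened stand-in for the proposed enum.
public enum SketchParseError
{
    Unknown = 0,
    UnterminatedBracket,
    UnterminatedComment // last member; used as the upper bound below
}

// Hypothetical stand-in for the proposed exception type.
public sealed class RangeCheckedParseException : ArgumentException
{
    public SketchParseError Error { get; }

    // Zero-based offset into the pattern.
    public int Offset { get; }

    public RangeCheckedParseException(SketchParseError error, int offset, string message)
        : base(message)
    {
        // Reject enum values outside the declared range.
        if (error < SketchParseError.Unknown || error > SketchParseError.UnterminatedComment)
            throw new ArgumentOutOfRangeException(nameof(error));

        // Reject negative offsets.
        if (offset < 0)
            throw new ArgumentOutOfRangeException(nameof(offset));

        Error = error;
        Offset = offset;
    }
}
```

A range comparison like this only works while the enum values are contiguous, and the upper bound has to be updated whenever a member is appended.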

@CyrusNajmabadi any feedback given you've done something similar? specifically on the categories in the enum.

@abelbraaksma
Contributor Author

I wasn't sure whether to leave the enum values as they are. I don't particularly like them, so I'll see if I can improve the names.

The order of the enum shouldn't matter, I think? But yeah, it's a bit random. I'll go over your other suggestions too.

@pgovind
Contributor

pgovind commented Jul 7, 2020

I think this is close to being ready for review, save for the comments that Dan mentioned. Considering that it'd probably not get API reviewed in time for .NET 5, I'm going to mark it as Future, but I'd really like to see this get in for .NET 6. I'll just add that we'll need to document this class (and enum) when we make it public.

@pgovind pgovind removed the untriaged New issue has not been triaged by the area owner label Jul 7, 2020
@pgovind pgovind added this to the Future milestone Jul 7, 2020
@danmoseley
Member

The order of the enum shouldn't matter, I think?

I think the only reason would be to make it easier on anyone reading it in the docs. In Visual Studio, I think it orders the completion list by usage.

BTW I was going to suggest to remove [Serializable] since we don't want to add that to anywhere new because BinaryFormatter is highly problematic. But it was already there - that would be a breaking change.

@SingleAccretion
Contributor

BTW I was going to suggest to remove [Serializable] since we don't want to add that to anywhere new because BinaryFormatter is highly problematic

@danmosemsft The VS snippet for an exception has it as the default, heh. I was actually wondering, should I make exceptions in my library serializable? The recently added AsnContentException and CborContentException are both marked as such.

@danmoseley
Member

danmoseley commented Jul 7, 2020

@SingleAccretion we do not recommend future use of BinaryFormatter unless you have no alternative. The primary reason is that it is vital you do not deserialize untrusted input, and it's too easy for a system to evolve and end up feeding it untrusted input. E.g., you have a game that saves state; users load state from their own disk. Later, someone reports a bug and you ask for their saved game to reproduce it, and now you are deserializing input you don't control.

@GrabYourPitchforks is working on all up serialization guidance with a focus on doing it securely. He's also devising a plan for helping the ecosystem progressively move off BinaryFormatter. These should be available on the dotnet/designs repo or similar place in due course. Meantime, we're not adding [Serializable] anywhere new.

@danmoseley
Member

The VS snippet for an exception has it as the default

@GrabYourPitchforks we should probably ask them to change that, right?

@SingleAccretion
Contributor

Meantime, we're not adding [Serializable] anywhere new.

I see, so I will remove it then. I assume the *Content*Exceptions should also be made not serializable (maybe there should be an issue/PR for that)?

@danmoseley
Member

@bartonjs should answer that.

@abelbraaksma
Contributor Author

abelbraaksma commented Jul 7, 2020

@danmosemsft, should the exception remain sealed? It doesn't seem sensible to leave it sealed, and I don't think it breaks anything if we open it up for inheritance.

What do you think?

@bartonjs
Member

bartonjs commented Jul 7, 2020

If there's any chance the exception type could be used on .NET Framework, it should still be [Serializable], for exceptions-across-AppDomains concerns. If it's (semi-guaranteed) .NET5+-only, then I think we're now OK with not making it [Serializable].

@danmoseley
Member

I don't know what the recommendation is on sealed. The API review can correct that if needed.

@abelbraaksma
Contributor Author

abelbraaksma commented Jul 10, 2020

@danmosemsft

I've gone over your comments and updated the original text. I went over each enum value to check the actual conditions used to throw it. This was a bit surprising at times (as in: not in line with the expected meaning of the token), which I've tried to reflect in suggested new names for the enum cases.

Your comments were:

By exposing the constructors, people can overwrite the exception in favor of a more specific one.

Do you have evidence folks need to do this? If not, we might leave them out - that's up to the API review folks.

Done, removed. I have no evidence, was just a hunch.

For the enum,
is there some ordering that makes sense other than semi-random?

For ease of comparison with the original, I've left the order as-is, but chose a naming scheme that should make some sense when ordered alphabetically. I'm fine if the final order would either be alphabetical, or somewhat grouped.

you might want to do some regularization eg, either "Ref" or "Reference" and not both "Ref" and "ref".

Done, I've tried to make same-meaning-same-wording and where it made sense, I chose clarity-over-brevity (in this case, Reference, not Ref). I assume that this will be subject to change during the review.

maybe "IncompleteSlashP" should say something about unicode categories instead?

Done, including other names that benefited from better naming.

"Capnum" is not a publicly meaningful name

Done: NumericCaptureCannotBeZero.

"Invalid" is probably better than "Illegal" in new API

Done: IllegalRange -> ReversedQuantifierRange, in line with the other Reversed token. The word Illegal is gone in all monikers.

Offset should note it's zero based.

Done.

While testing each condition one by one, I've come across a few oddities and an unexpected NullReferenceException, I'll log those as bugs separately.

@abelbraaksma
Contributor Author

abelbraaksma commented Jul 10, 2020

Considering that it'd probably not get API reviewed in time for .NET 5, I'm going to mark it as Future, but I'd really like to see this get in for .NET 6

@pgovind, I'd argue it'd be a good, and relatively cheap improvement to the regex-experience to have this in for .NET 5 (cheap, because the code is already there, including relevant tests, except for some renaming). Of course, I cannot judge whether this makes the mark and I realize that .NET 5 is close to being frozen, but if it's somehow still possible to put it on the list, that would be really welcome :).

@danmoseley
Member

This was a bit surprising at times (as in: not in line with the expected meaning of the token), which I've tried to reflect in suggested new names for the enum cases.

Are there any we should fix - at the same time as exposing the API? We own the code after all 🙂 and we want the enum values to be useful.

@danmoseley
Member

if it's somehow still possible to put it on the list, that would be really welcome

The 5.0 milestone means "must fix to ship it". This is what Microsoft developers are focusing on. It doesn't mean "not allowed in release" for another few weeks yet. It could get in if we could do API review (likely bottleneck) plus then a community member gets a PR up and we can merge it.

@abelbraaksma
Contributor Author

abelbraaksma commented Jul 10, 2020

plus then a community member gets a PR up and we can merge it.

Got it. I'll volunteer, it's a rather trivial change anyway.

@abelbraaksma
Contributor Author

abelbraaksma commented Jul 10, 2020

Are there any we should fix - at the same time as exposing the API? We own the code after all 🙂 and we want the enum values to be useful.

The new proposed names in the updated OP are sufficient. There can be some debate on whether we should improve certain errors (as in, raise the more specific one if we have enough information). The few I thought make that threshold are in the linked issue (#39075).

But to be clear, the current behavior with respect to the error text is as one would expect; the clarification is in the choice of enum names (for example, where an enum value is only ever used within conditional alternation contexts, I've proposed a name that says precisely that).

I've checked the source code and tests for each of them to be sure this is the case, and added a small example of each error to help the reviewers.

@abelbraaksma
Contributor Author

@danmosemsft, is there anything else that needs attention, or can it be marked api-ready-for-review?

@danmoseley
Member

danmoseley commented Jul 22, 2020

Missed this, excuse me.

I think it looks good -- would you mind breaking the diff into additions and subtractions? That isn't what we normally do but this is a bit special because the "subtractions" are internal. So they are irrelevant to API review, only to the implementer.

Edit: actually I suggest simply showing the adds, then you can alphabetize which is what they'll likely want.

IllegalCondition

It would be good to either find a way to hit this (and ideally add a test) or remove it, so we can decide whether we need it or not.

One other thing occurred to me -- do we know that the offsets are correct? Those would just be bugs, not influencing the API, but those bugs would be exposed as soon as we offer this API.

@eerhardt @pgovind any other feedback or can we mark this api-ready-for-review?

@danmoseley
Member

Some other low value comments

  • some refer to XXXNamedReference, some to XXBackReference. I think named backreferences are still backreferences?
  • some describe the problem, e.g. ShorthandClassInCharRange, InsufficientOrInvalidHexDigits, AlternationHasTooManyConditions, and some the broken rule, e.g. NestedQuantifiersMustBeParenthesized, AlternationCannotHaveComment, ExclusionGroupMustBeLast. Maybe it should stick to one style or the other. I suggest the former as I think it's more typical for compilers, etc.
  • ShorthandClassInCharRange contracts Character ?

@terrajobst
Member

terrajobst commented Aug 11, 2020

@abelbraaksma

Apologies -- my email was confusing. It's scheduled for tomorrow, August 11, 10 AM PDT, which should be 19:00 CEST (AFAIK your timezone). The meeting is scheduled for two hours and this issue will be discussed an hour into the meeting. Hope this helps :-)

@abelbraaksma
Contributor Author

abelbraaksma commented Aug 11, 2020

@terrajobst, @pgovind, Thanks, you're right, I misread: it's indeed today ;) (I actually clicked the meeting link and it said there was nobody, I drew the wrong conclusion :P). As it looks now, I'll be online by then.

And yes, I'm based in Amsterdam, which is CEST.

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Aug 11, 2020
@terrajobst
Member

terrajobst commented Aug 11, 2020

Video

  • The type is currently serializable and people do serialize the exception by catching ArgumentException (or Exception). To preserve cross-framework deserialization, like when someone deserializes this exception on .NET Framework (or earlier versions of .NET Core), we need to keep serializing as ArgumentException, not as RegexParseException.
  • This also means that Error and Offset are lost when crossing framework boundaries. Hence, we should also improve the message.
  • We don't want to have a separate public type from the internal one, but we should make sure that all enum members are actually used by the implementation/are reachable. If there are unused values, we should remove them.
  • We should probably preserve the current ordering, if only to discourage the urge to make future additions alphabetically sorted.
namespace System.Text.RegularExpressions
{
    [Serializable]
    public sealed class RegexParseException : ArgumentException
    {
        public RegexParseException(RegexParseError error, int offset);
        private RegexParseException(SerializationInfo info, StreamingContext context)
        {
            // It means someone modified the payload.
            throw new NotImplementedException();
        }
        public override void GetObjectData(SerializationInfo info, StreamingContext context)
        {
            // We'll serialize as an instance of ArgumentException
        }
        public RegexParseError Error { get; }
        public int Offset { get; }
    }
    public enum RegexParseError
    {
        Unknown,
        AlternationHasComment,
        AlternationHasMalformedCondition,
        AlternationHasMalformedReference,
        AlternationHasNamedCapture,
        AlternationHasTooManyConditions,
        AlternationHasUndefinedReference,
        CaptureGroupNameInvalid,
        CaptureGroupOfZero,
        ExclusionGroupNotLast,
        InsufficientClosingParentheses,
        InsufficientOpeningParentheses,
        InsufficientOrInvalidHexDigits,
        InvalidGroupingConstruct,
        InvalidUnicodePropertyEscape,
        MalformedNamedReference,
        MalformedUnicodePropertyEscape,
        MissingControlCharacter,
        NestedQuantifiersNotParenthesized,
        QuantifierAfterNothing,
        QuantifierOrCaptureGroupOutOfRange,
        ReversedCharacterRange,
        ReversedQuantifierRange,
        ShorthandClassInCharacterRange,
        UndefinedNamedReference,
        UndefinedNumberedReference,
        UnescapedEndingBackslash,
        UnrecognizedControlCharacter,
        UnrecognizedEscape,
        UnrecognizedUnicodeProperty,
        UnterminatedBracket,
        UnterminatedComment
    }
}
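
The "serialize as ArgumentException" note above can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: DowngradingParseException and SerializationSketch are hypothetical names, and the legacy serialization APIs used here are flagged obsolete on recent .NET versions, hence the pragma.

```csharp
using System;
using System.Runtime.Serialization;

// Legacy serialization APIs produce obsoletion warnings on recent .NET versions;
// they are silenced because this sketch is about exactly that legacy path.
#pragma warning disable SYSLIB0050, SYSLIB0051

// Hypothetical stand-in for RegexParseException; not the real type.
[Serializable]
public sealed class DowngradingParseException : ArgumentException
{
    public int Offset { get; }

    public DowngradingParseException(string message, int offset) : base(message)
    {
        Offset = offset;
    }

    public override void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        base.GetObjectData(info, context);
        // Record the base type, so a reader that does not know this type
        // still gets an ArgumentException. Offset is deliberately not written,
        // which is why it is lost across framework boundaries.
        info.SetType(typeof(ArgumentException));
    }
}

public static class SerializationSketch
{
    // Returns the type name that would be recorded for the serialized object.
    public static string RecordedTypeName(Exception ex)
    {
        var info = new SerializationInfo(ex.GetType(), new FormatterConverter());
        ex.GetObjectData(info, default(StreamingContext));
        return info.FullTypeName;
    }
}
```

Because only the base-class data travels, the message is the one place where the error kind and offset can survive, which is the motivation for improving it.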

@terrajobst
Member

@danmosemsft, can we consider this for .NET 5? Seems easy enough :-)

@abelbraaksma
Contributor Author

We should probably preserve the current ordering, if only to discourage the urge to make future additions alphabetically sorted.

In Jeremy Barton's words at the meeting: "it would be good if people using reflection currently could fix their code to the public version by doing a x + 1 on each value (due to the new Unknown on top)". I agree that this is reasonable and that we should keep the original order; tooling will do the alphabetizing anyway.

@danmoseley
Member

@danmosemsft, can we consider this for .NET 5? Seems easy enough

Team members are all working on work required for 5.0. If a community member volunteers and it can get merged before we branch on Monday, sure.

@abelbraaksma
Contributor Author

@danmosemsft, I can try to get something in by Friday, though I'm not sure about the point of improving the error messages, which is something that tends to need a lot of back and forth to iron out. The renaming part, and fixing the tests should be relatively straightforward.

Of course, then it still depends on whether it can be reviewed and merged in time.

@danmoseley
Member

danmoseley commented Aug 11, 2020

Sure. Totally up to you

@danmoseley
Member

BTW @abelbraaksma the only tricky part of this I expect will be adding the test for binary serialization to/from the new type, to address the concern that this not break. We have tests for this.

The bulk of such testing is done through blobs in this file - they test to and from for .NET Framework and .NET Core:
https://raw.githubusercontent.com/danmosemsft/runtime/7a1ff8272bd8afe74ed1b98b8c7d1f6c6a6d2a07/src/libraries/System.Runtime.Serialization.Formatters/tests/BinaryFormatterTestData.cs

It's fiddly to update those blobs. Happily, I see there's already a specific test for serializing a RegexParseException on .NET Core and deserializing it as an ArgumentException on .NET Framework!
https://github.com/danmosemsft/runtime/blob/7a1ff8272bd8afe74ed1b98b8c7d1f6c6a6d2a07/src/libraries/System.Runtime.Serialization.Formatters/tests/BinaryFormatterTests.cs#L137
It should maybe also verify that the message was preserved, but I think that test is sufficient to protect the behavior they're asking for.

@GrabYourPitchforks
Member

Adding a test for BinaryFormatter serialization should be relatively straightforward. Serialize an instance of the new exception, then deserialize it, and validate that deserialized.GetType() is ArgumentException.

@danmoseley
Member

That's what that test does, basically

@abelbraaksma
Contributor Author

abelbraaksma commented Aug 14, 2020

@danmosemsft, thanks for the pointers on those tests! I'm actually not sure I can finish today for the simple reason that I couldn't get the build to succeed (only to find out that I needed to update VS, oops). Behind a relatively slow connection, the download of 5GB for the update can take a while...

Oh, and I get 502's from https://pkgs.dev.azure.com/dnceng/9ee6d478-d288-47f7-aacc-f6e6d082ae6d/_packaging/a8a526e9-91b3-4569-ba2d-ff08dbb7c110/nuget/v3/flat2/runtime.win-x86.microsoft.netcore.coredistools/1.0.1-prerelease-00005/runtime.win-x86.microsoft.netcore.coredistools.1.0.1-prerelease-00005.nupkg, but my hope is that they'll disappear magically.

While I think the changes I made are sound (the renaming part, at least), I'd prefer to at least successfully build it locally again before submitting a PR.

Anyway, I expect to submit a working PR somewhere tonight (which may still be day for you :). We'll see how far we get.

@danmoseley
Member

@abelbraaksma I don't know your schedule - we don't branch until Monday, if that helps.

@abelbraaksma
Contributor Author

abelbraaksma commented Aug 14, 2020

Hmm, strange, I would swear I could build stuff before. After updating everything (I ran into this briefly too: #40283), I get:

C:\Users\Abel.nuget\packages\microsoft.net.compilers.toolset\3.8.0-2.20403.2\tasks\netcoreapp3.1\Microsoft.CSharp.Core.targets(70,5): error : Cannot use stream for resource [D:\Projects\Github\Runtime.dotnet\shared\Microsoft.NETCore.App\5.0.0-preview.6.20305.6\Microsoft.NETCore.App.deps.json]: No such file or directory [D:\Projects\Github\Runtime\tools-local\tasks\installer.tasks\installer.tasks.csproj]

Since this points to the directory in the local .dotnet path, it appears that either (a) that location is broken or (b) it's not downloading/installing things properly there. Not quite sure what's happening here, though I suspect git clean -xdf, which appears not to be able to clean everything, causes this. EDIT: it looks like dotnet.exe doesn't get killed properly when I use build.cmd, leading to open handles, leading to not cleaning properly. Let's see what we get now.

I don't know your schedule - we don't branch until Monday, if that helps.

I may be needing the weekend after all ;).

@danmoseley
Member

I have never seen that error and if I got it I would probably git clean -fdx on the repo and git pull and then build again explicitly using the dotnet out of the .dotnet folder (which isn't required, but why not)

@pgovind
Contributor

pgovind commented Aug 14, 2020

I've never seen that error either. If your build still fails, tell us your environment too? Or, even better, try it again by passing in -bl in the command line to generate a bin log.

I can still most likely review your PR over the weekend if you put it up, so you have some time :)

@abelbraaksma
Contributor Author

abelbraaksma commented Aug 14, 2020

Thanks for the support. The error is gone (I edited the comment with the cause: dotnet.exe was hanging and made git clean fail in creative ways).

EDIT: Scratch that, what I posted here before about ml64.exe seems to be caused by lib.exe not being found, same as this one: #13114. Though I actually did do a restart minutes ago.

I'm in an x64 Native Tools Command Prompt (and tried a normal Administrator cmd prompt too, same error). I'll check why it isn't available. It's on the path and available in the command window that I use to build this. 🤔

@abelbraaksma
Contributor Author

abelbraaksma commented Aug 14, 2020

Ok, this appears to be caused by a new cmd.exe, which probably doesn't have lib.exe on the same path (it doesn't inherit it?). Since I have basically every Visual Studio and Windows SDK version from the last 20 years or so, something has gone amiss here. I need to find the command in the build that fires up a new cmd.exe and have it inherit the parent, or just update the global path and add the latest version on top.

Anyway, 1AM here, tomorrow things will be better. Thanks for the quick replies so far, it is much appreciated!

@danmoseley
Member

Repairing your VS (the newest one?) might make sure the environment variables are correct -not sure though.

@abelbraaksma
Contributor Author

Repair didn't help, nor did any other things I tried to make the ml64.exe discoverable (it's there, it's on the path, I can run it etc etc).

So I decided to fire up a VM with a clean Win10 and a clean recent VS2019 and go from there. While this took some time, I have managed to get the initial build going. Hopefully the rest will go smooth now ;)

@danmoseley
Member

Yeesh. Maybe you can fix the env vars on your original machine by looking at the ones in your VM.

@abelbraaksma
Contributor Author

abelbraaksma commented Aug 16, 2020

@danmosemsft the starting point is there, but I'll need a little help on how to proceed w.r.t. inline documentation (I assume that's supposed to be added here as well): #40902.

Yeesh. Maybe you can fix the env vars on your original machine by looking at the ones in your VM.

I did something more "brute force", I just copied the whole build dir over to my primary dev machine. Turns out this allowed me to build & run tests for System.Text.RegularExpressions. Didn't try much else yet as this is all that's needed right now anyway.

(I also created an Azure VM, but for the last hour or so that has still been building, it is not very fast...)

@abelbraaksma
Contributor Author

abelbraaksma commented Aug 18, 2020

For posterity: this has now been implemented, details are in the PR: #40902. Thanks to all for reviewing this proposal and for the valuable feedback. Special thanks to @danmosemsft for helping to get this past the finish line, and for assisting with the parts of this process I was not yet familiar with.

It was in the nick of time to make it into .NET 5 🥳.

@danmoseley
Member

We'd welcome other contributions if you're interested @abelbraaksma . There's plenty up for grabs. If you want another in regex specifically there are opportunities for perf improvement or features eg #25598 (that one I think is a bit involved..)

@dotnet dotnet locked as resolved and limited conversation to collaborators Dec 8, 2020