Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XElement - LoadOptions.SetLineInfo - very slow & memory times 2 #498

Open
304NotModified opened this issue Nov 22, 2019 · 2 comments
Open

Comments

@304NotModified
Copy link

Hi,

I noticed that using XElement.Load(stream, LoadOptions.SetLineInfo) is very slow and memory consuming is times 2 compared to XElement.Load(stream, LoadOptions.None)

See table. For the completeness, I also added the results for the generated for class (XSD to C#) - which is very fast and not using much memory but is lacking the lineinfo.

Tested it with a XML file of 914MB

Method Memory Time
XElement without lineinfo (LoadOptions.None) 2.213.541 Mb (2.271.092.736 bytes) 27 Sec (26.502 ms)
XElement with lineinfo (LoadOptions.SetLineInfo) 5.523.492 Mb (5.667.102.720 bytes) 46 Sec (45.781 ms)
Own "XElement" without lineinfo 2.825.026 Mb (2.898.477.056 bytes) 25 Sec (24.584 ms)
Own "XElement" with lineinfo 2.825.226 Mb (2.898.681.856 bytes) 26 Sec (25.969 ms)
XSD generated classes 845.057 Mb (867.028.992 bytes) 27 Sec (26.966 ms)

image

image

Own "XElement"

The funny part is that I wrote my own "XElement" in the past, and there is nearly no difference in capturing the lineInfo compared to non-capure. I'm doing for all element and all attributes XmlPosition.Create(reader):

public struct XmlPosition
{
    public XmlPosition(int lineNumber, int columnNumber)
    {
        if (lineNumber < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(lineNumber));
        }

        if (columnNumber < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(columnNumber));
        }

        LineNumber = lineNumber;
        ColumnNumber = columnNumber;
    }
    
    public int LineNumber { get; }

    public int ColumnNumber { get;  }

    public static XmlPosition Create(System.Xml.XmlReader reader)
    {
        if(reader is IXmlLineInfo lineInfo)
        {
            return new XmlPosition(lineInfo.LineNumber, lineInfo.LinePosition);
        }
        
        throw new NotImplementedException("ow noos");
    }
}

So I can't understand why the XElement is using so much memory/time for SetLineInfo!

XmlReaderSettings

PS: also tried, but no significant change.

var options = new XmlReaderSettings();
options.IgnoreWhitespace = true;
options.DtdProcessing = DtdProcessing.Ignore;
options.MaxCharactersFromEntities = 10000000L;
options.Async = true;
options.CloseInput = true;


XElement xml;
using (XmlReader reader = XmlReader.Create(s, options))
{
    xml = XElement.Load(reader, LoadOptions.SetLineInfo);
}

Test

  • using Process.GetCurrentProcess().WorkingSet64 for memory
  • using System.Diagnostics.StopWatch for timings

Env

Tested with .NET Core 2.2, Windows Server 2016 datacenter

@304NotModified 304NotModified changed the title XElement - LoadOptions.SetLineInfo - very slow XElement - LoadOptions.SetLineInfo - very slow & memory times 2 Nov 22, 2019
@krwq
Copy link
Member

krwq commented Dec 3, 2019

@304NotModified this is something we would have to investigate. Do you have your xml file somewhere permanently available? Do you perhaps have perf logs with and without line info available to compare?

Would you be interested in doing the investigation/fixes?

PS. We have merged corefx and coreclr into dotnet/runtime repo so that's where the issue should be filed (it will get moved there eventually)

@danmoseley danmoseley transferred this issue from dotnet/corefx Dec 4, 2019
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Dec 4, 2019
@danmoseley
Copy link
Member

@304NotModified as an aside, we recommend using Benchmark.NET for profiling. This is what we use ourselves. Although, in this case the difference seems to be large enough that it doesn't matter...

@buyaa-n buyaa-n added this to the 6.0.0 milestone Jun 29, 2020
@buyaa-n buyaa-n removed the untriaged New issue has not been triaged by the area owner label Jun 29, 2020
MichalStrehovsky added a commit to MichalStrehovsky/runtime that referenced this issue Mar 25, 2021
Fixes a regression in:

JIT\Methodical\eh\deadcode\deadoponerrorinfunclet_r
@krwq krwq modified the milestones: 6.0.0, Future Jul 22, 2021
@krwq krwq moved this from 6.0 to Future in Triage POD for Meta, Reflection, etc Jul 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

No branches or pull requests

6 participants