XmlDocument slow with many children #120297

@adrianm64

Description

I ran into an issue in an application where an XmlDocument had a node with 150000 text node children.
The element contained a base64-encoded PDF which, for some reason, was split into text nodes of 256 characters each.

The problem is not the split itself but that iterating over all the children takes a long time:

```csharp
for (var node = root.FirstChild; node != null; node = node.NextSibling)
{
    count += 1;
}
```

The same slowdown happens with `node.ChildNodes.Count`.

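In my test the iteration time also grows much faster than the number of nodes. Doubling the count roughly quadruples the time, which looks quadratic rather than linear:

- 50000 text nodes: 1.8 s
- 100000 text nodes: 7 s
- 150000 text nodes: 16 s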
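Comparing the time ratios with the node-count ratios makes the growth rate explicit:

```math
\frac{t(100000)}{t(50000)} = \frac{7\ \mathrm{s}}{1.8\ \mathrm{s}} \approx 3.9 \approx 2^2,
\qquad
\frac{t(150000)}{t(50000)} = \frac{16\ \mathrm{s}}{1.8\ \mathrm{s}} \approx 8.9 \approx 3^2,
\qquad
\frac{t(150000)}{t(100000)} = \frac{16\ \mathrm{s}}{7\ \mathrm{s}} \approx 2.3 \approx \left(\tfrac{3}{2}\right)^2
```

So the measurements fit $t(n) \approx c\,n^2$ with $c \approx 1.8\ \mathrm{s} / (50000)^2 \approx 7.2 \times 10^{-10}\ \mathrm{s}$. Exponential growth would multiply the time by the same factor for every 50000-node step, predicting about 7 s × 3.9 ≈ 27 s for 150000 nodes instead of the measured 16 s.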

My test code

```csharp
using System;
using System.Diagnostics;
using System.Xml;

static class Program
{
	static void Main()
	{
		var doc = new XmlDocument();
		var root = doc.CreateElement("Root");
		doc.AppendChild(root);
		AddChildren(doc, root, 150000);
		CountChildren(root);
	}

	// Appends `count` text nodes of 256 characters each to `node`.
	static void AddChildren(XmlDocument doc, XmlNode node, int count)
	{
		string text = new string('x', 256);
		for (int ii = 0; ii < count; ++ii)
		{
			var textNode = doc.CreateTextNode(text);
			node.AppendChild(textNode);
		}
	}

	// Counts the children by following NextSibling and prints how long the walk took.
	static int CountChildren(XmlNode root)
	{
		int count = 0;
		var sw = Stopwatch.StartNew();
		for (var node = root.FirstChild; node != null; node = node.NextSibling)
		{
			count += 1;
		}

		Console.WriteLine($"{count} - {sw.ElapsedMilliseconds}ms");
		return count;
	}
}
```
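To see the growth pattern in a single run, the test above can be extended to time several sizes. This is a sketch rather than the original test: `BuildRoot` and `TimeSiblingWalk` are hypothetical helper names, and the sizes are smaller than in the report so the run finishes quickly. If the walk is quadratic, each doubling of the size should roughly quadruple the elapsed time.

```csharp
using System;
using System.Diagnostics;
using System.Xml;

// Hypothetical helper: builds <Root> with `childCount` text children of 256 characters each,
// the same shape as the test above.
static XmlElement BuildRoot(int childCount)
{
    var doc = new XmlDocument();
    var root = doc.CreateElement("Root");
    doc.AppendChild(root);
    string text = new string('x', 256);
    for (int i = 0; i < childCount; i++)
    {
        root.AppendChild(doc.CreateTextNode(text));
    }
    return root;
}

// Hypothetical helper: walks the children with FirstChild/NextSibling and returns
// the number of children visited plus the elapsed wall-clock time.
static (int Count, long ElapsedMs) TimeSiblingWalk(XmlNode parent)
{
    int count = 0;
    var sw = Stopwatch.StartNew();
    for (var node = parent.FirstChild; node != null; node = node.NextSibling)
    {
        count++;
    }
    return (count, sw.ElapsedMilliseconds);
}

// Each size doubles the previous one; a fresh document is built for every size.
foreach (int size in new[] { 10_000, 20_000, 40_000 })
{
    var (walked, ms) = TimeSiblingWalk(BuildRoot(size));
    Console.WriteLine($"{walked} text nodes: {ms} ms");
}
```

Building a separate document per size keeps one measurement from affecting the next; only the sibling walk is inside the stopwatch.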

I could not figure out why, but in my real application it takes even longer.
There, the XmlDocument is built from a WCF message like this:

```csharp
var doc = new XmlDocument();
using (var docWriter = doc.CreateNavigator().AppendChild())
{
    message.WriteBody(docWriter);
}
```
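A small experiment can check whether the writer returned by `CreateNavigator().AppendChild()` keeps separate writes as separate text nodes, which would be one way to end up with many 256-character children. It bypasses WCF entirely and assumes, without confirmation, that the message body reaches the writer as several consecutive `WriteString` calls; `WriteChunked` is a hypothetical helper.

```csharp
using System;
using System.Xml;

// Hypothetical helper: writes `chunkCount` strings of 256 characters into one <Root> element
// through the writer returned by XmlDocument.CreateNavigator().AppendChild().
static XmlDocument WriteChunked(int chunkCount)
{
    var target = new XmlDocument();
    using (XmlWriter writer = target.CreateNavigator()!.AppendChild())
    {
        writer.WriteStartElement("Root");
        string chunk = new string('x', 256);
        for (int i = 0; i < chunkCount; i++)
        {
            // One write per chunk (assumed to mimic a chunked message body).
            writer.WriteString(chunk);
        }
        writer.WriteEndElement();
    }
    return target;
}

XmlDocument written = WriteChunked(4);
XmlElement rootElement = written.DocumentElement!;

// How many child nodes did four writes produce, and is the text intact?
Console.WriteLine($"child nodes: {rootElement.ChildNodes.Count}, text length: {rootElement.InnerText.Length}");
```

If the printed child count equals the number of writes, each write produced its own text node; the text length only confirms that no content was lost.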

Configuration

Benchmark Process Environment Information:
BenchmarkDotNet v0.13.8
Runtime=.NET 9.0.9 (9.0.925.41916), X64 RyuJIT AVX2
GC=Concurrent Workstation
HardwareIntrinsics=AVX2,AES,BMI1,BMI2,FMA,LZCNT,PCLMUL,POPCNT,AvxVnni,SERIALIZE VectorSize=256

Regression?

I got the same result in .NET Core 3.1.32.

Data

Analysis

I looked at the source code, and the children seem to be stored in a singly linked list. I have no idea how that can result in superlinear (roughly quadratic) iteration time.
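A singly linked list can still give quadratic iteration if each `NextSibling` call does work proportional to the node's position. The sketch below is a hypothetical model, not the System.Xml source: it assumes that a text node following another text node stores that previous text node as its parent reference, so a `NextSibling`-style call must walk back through all earlier text siblings to find the real parent element before it can detect the end of the circular child list. Whether XmlDocument actually does this is an assumption to verify against the System.Xml sources.

```csharp
using System;

// Hypothetical model of one element with n text children; this is NOT the System.Xml source.
// Children are indices 0..n-1 in a circular singly linked list (the last child links back to child 0).
static (int Visited, long Hops) CountHops(int n)
{
    const int Element = -1;     // stands for the parent element
    const int FirstChild = 0;   // the element's first child
    var next = new int[n];      // next[i]: following child, wrapping around to the first child
    var parentRef = new int[n]; // parentRef[i]: what child i stores as its parent
    for (int i = 0; i < n; i++)
    {
        next[i] = (i + 1) % n;
        // Assumption: a text node that follows a text node points at that previous node.
        parentRef[i] = i == 0 ? Element : i - 1;
    }

    long hops = 0;

    // Modeled NextSibling: resolve the real parent by following parent references back
    // through the earlier text nodes; only then can the wrap to its first child be detected.
    int NextSibling(int node)
    {
        int p = parentRef[node];
        hops++;
        while (p != Element)
        {
            p = parentRef[p];
            hops++;
        }
        return next[node] == FirstChild ? -1 : next[node];
    }

    int visited = 0;
    for (int node = n > 0 ? FirstChild : -1; node != -1; node = NextSibling(node))
    {
        visited++;
    }
    return (visited, hops);
}

// Sizes are one tenth of the report's, keeping the model fast.
foreach (int size in new[] { 5_000, 10_000, 15_000 })
{
    var result = CountHops(size);
    Console.WriteLine($"{result.Visited} children: {result.Hops} parent hops");
}
```

In this model child i costs i + 1 hops, so walking n children costs n(n + 1)/2 hops in total; tripling n multiplies the work by about nine, the same pattern as the measured timings.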

Metadata


- Assignees: No one assigned
- Type: No type
- Projects: No projects
- Relationships: None yet
- Development: No branches or pull requests
