Tested large files for performance #48

NikolasKomonen · 2018-08-09T16:05:00Z

No description provided.

angelozerr · 2018-08-09T16:10:55Z

Are you having trouble with large files?

fbricon · 2018-08-09T16:15:29Z

not necessarily yet, but we need to think about setting up performance tests, once all the major features are complete

angelozerr · 2018-08-09T16:24:14Z

> not necessarily yet, but we need to think about setting up performance tests, once all the major features are complete

If there is a performance problem with big files, I think the vscode html language service will have the same problem, since the XMLDocument is rebuilt each time you type in the editor (TextDocumentSyncKind.Full).

To improve performance, I think TextDocumentSyncKind.Incremental should be used, but I think that's a hard thing to do.

fbricon · 2018-08-09T17:00:56Z

@angelozerr whatever performance the vscode html has, since the xml parser starts to deviate from it (in many ways), we might see a different behaviour, better or worse.
I'm not saying we need to improve the performance now, just that we need to keep track of it eventually. What matters now is to provide correct results. Worry about performance later.

angelozerr · 2018-08-11T10:38:20Z

> What matters now is to provide correct results. Worry about performance later.

Totally agree with you. Thanks for creating this issue.

NikolasKomonen · 2018-09-04T19:34:07Z

After testing largeFile.txt:
I deleted the second-to-last tag, </a>. In IntelliJ it took ~2.8 seconds, and in lsp4xml ~4.7 seconds, until a missing end tag response was received.

fbricon · 2018-09-04T19:48:20Z

15000L is not what I call large :-) I was thinking about tens of MB (even that's not large, depending on the context)
Anyway, I think we're getting murdered by TextDocumentSyncKind.Full: we send the full document over the connection on each document change, and the whole document is rebuilt on every keystroke. That doesn't scale well.
For now, we'll call it a known limitation, but we'll have to work on improving performance once we get the initial features right.
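
To make the Full versus Incremental distinction concrete, here is a minimal sketch, not lsp4xml code. The `Change` class and both `apply` methods are hypothetical, and plain character offsets stand in for the line/character ranges that real LSP change events carry.

```java
import java.util.Arrays;
import java.util.List;

class IncrementalSyncSketch {

    /** Hypothetical stand-in for one LSP content change: replace [start, end) with text. */
    static final class Change {
        final int start;
        final int end;
        final String text;

        Change(int start, int end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    /** Full sync: the whole document arrives on every change and replaces the old one. */
    static StringBuilder applyFull(String newContent) {
        return new StringBuilder(newContent);
    }

    /** Incremental sync: only the edited ranges arrive and are patched in, in order. */
    static void applyIncremental(StringBuilder buffer, List<Change> changes) {
        for (Change c : changes) {
            buffer.replace(c.start, c.end, c.text);
        }
    }

    public static void main(String[] args) {
        StringBuilder doc = applyFull("<a><b></b></a>");
        // Typing "x" inside <b> sends one character plus its range, not the whole file.
        applyIncremental(doc, Arrays.asList(new Change(6, 6, "x")));
        System.out.println(doc); // <a><b>x</b></a>
    }
}
```

Note that patching the text buffer only reduces what travels over the connection. The document model would still be rebuilt from scratch unless the parser itself can reuse unchanged parts.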

angelozerr · 2018-09-05T10:13:17Z

@NikolasKomonen thanks for testing that and attaching your large file.

> Anyways, I think we're getting murdered by TextDocumentSyncKind.Full, as we're sending the full document over the connection on each document change, the whole document is rebuilt entirely on every keystroke, it's not scaling well.

Indeed, I think it can be a problem, and that was my fear, because we would need to manage an "incremental" parser, which I think is a hard task.

BUT I have started to study the problem: building the XMLDocument directly from the given file takes 1590 ms, which is too long. The problem seems to come from the regexp. I will try to fix it and give you feedback.

angelozerr · 2018-09-05T10:15:44Z

See test at https://github.com/angelozerr/lsp4xml/blob/master/org.eclipse.lsp4xml/src/test/java/org/eclipse/lsp4xml/internal/parser/XMLDocumentTest.java

angelozerr · 2018-09-06T07:09:43Z

@NikolasKomonen I have improved performance, your large file takes 195 ms instead of 1590 ms!

Please give me feedback.

fbricon · 2018-09-06T13:35:55Z

@angelozerr awesome work! This makes the extension much snappier!!!
Did you use a profiler, or did you just notice that substring hanging there?
Are there other places where substring/string concatenation might be hurting us?

angelozerr · 2018-09-06T13:41:20Z

> @angelozerr awesome work! This makes the extension much snappier!!!

Glad you like it :) I think we now have the same performance as the HTML Language Server.

> did you use a profiler or just noticed that substring hanging there?

To be honest with you, I'm not very familiar with profilers. I just noticed this hang (at first I thought it was because of the regexp, but it was String#substring).

> Are there other places where substring/string concatenation might be hurting us?

I have tried to check that, and I think it's OK. The main problem was doing a substring from a position to the end of the string, which creates a very large String each time. Using Matcher#region just restricts the matcher to that range instead. We could reuse the same Matcher too, but that doesn't seem to improve performance.
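
The difference between the two approaches can be sketched as follows. This is an illustration, not the lsp4xml scanner: the `NAME` pattern and both method names are made up for the example. On current JDKs `String#substring` copies the characters it returns, so calling it at every scanner position on a large document copies most of the document each time, while `Matcher#region` only moves the matcher's bounds.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionScanSketch {

    // Hypothetical pattern for an XML-like name; not the pattern lsp4xml uses.
    private static final Pattern NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_.-]*");

    /** Costly on large inputs: substring(offset) copies everything from offset to the end. */
    static String nameAtWithSubstring(String text, int offset) {
        Matcher m = NAME.matcher(text.substring(offset));
        return m.lookingAt() ? m.group() : null;
    }

    /** Same result without the copy: the matcher is restricted to [offset, length). */
    static String nameAtWithRegion(String text, int offset) {
        Matcher m = NAME.matcher(text);
        m.region(offset, text.length());
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        String xml = "<root><child/></root>";
        System.out.println(nameAtWithSubstring(xml, 1)); // root
        System.out.println(nameAtWithRegion(xml, 7));    // child
    }
}
```

Both methods use `lookingAt()`, which anchors the match at the start of the input or region, so they return the same name for the same offset.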

NikolasKomonen · 2018-09-06T15:30:56Z

@angelozerr This is awesome, great find.

angelozerr · 2018-09-06T15:34:43Z

Thanks guys!

fbricon · 2018-09-27T11:56:42Z

Here's a site to find xml documents of various sizes, some pretty big: http://aiweb.cs.washington.edu/research/projects/xmltk/xmldata/

Tried with a 30MB doc in vscode. Never reached the point where I got an error reported. I'll try with incremental support later.

angelozerr · 2018-09-27T13:55:21Z

> Here's a site to find xml documents of various sizes, some pretty big: http://aiweb.cs.washington.edu/research/projects/xmltk/xmldata/

Cool!

> Tried with a 30MB doc in vscode. Never reached the point where I got error reported. I'll try with incremental support later

Could you give me the link to the XML file you are testing, please?

fbricon · 2018-11-15T16:48:30Z

Seems the server chokes on the nasa.xml from http://aiweb.cs.washington.edu/research/projects/xmltk/xmldata/www/repository.html#nasa

Tried with a 2 GB max heap (-Xmx2g).

angelozerr · 2018-11-15T18:10:56Z

@fbricon I have added the nasa.xml in the test:

for largeFile.xml:

Parsed 'largeFile.xml' with XMLScanner in 31 ms.
Parsed 'largeFile.xml' with XMLParser in 25 ms.

for nasa.xml:

Parsed 'nasa.xml' with XMLParser in 731 ms.
Parsed 'nasa.xml' with XMLScanner in 371 ms.
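
Timings like these can be collected with a very small harness. The sketch below is an assumption about the general shape, not the actual XMLDocumentTest code: the class, the method names and the stand-in parser are hypothetical, and only the report format is copied from the output above.

```java
import java.util.function.Consumer;

class ParseTimingSketch {

    /** Runs the parser once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Consumer<String> parser, String content) {
        long start = System.nanoTime();
        parser.accept(content);
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Formats a result line in the same shape as the output quoted above. */
    static String report(String fileName, String parserName, long millis) {
        return "Parsed '" + fileName + "' with " + parserName + " in " + millis + " ms.";
    }

    public static void main(String[] args) {
        // Synthetic input; a real test would load largeFile.xml or nasa.xml instead.
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < 100_000; i++) {
            sb.append("<a>").append(i).append("</a>");
        }
        String content = sb.append("</root>").toString();

        // Hypothetical stand-in for a parser: it only counts '<' characters.
        Consumer<String> fakeParser = s -> {
            long tags = s.chars().filter(ch -> ch == '<').count();
            if (tags == 0) throw new IllegalStateException("no tags found");
        };

        System.out.println(report("synthetic.xml", "fakeParser", timeMillis(fakeParser, content)));
    }
}
```

A single run like this includes JIT warm-up, so the first measurement tends to be pessimistic. That may be part of why whichever parser runs first looks slower, though the thread does not say so.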

fbricon · 2018-11-15T18:15:17Z

try validation, formatting, hover...

angelozerr · 2018-11-16T09:58:00Z

> try validation, formatting, hover...

Yes, sure, we need to add those tests too. But if creating the XMLDocument takes 731 ms, I think it will be slow.

The WTP XML Editor cannot open your nasa.xml either.

I fear that it will be very hard to support very large files. I think one problem is that parsing is not incremental. Have you tried with the "experimental" incremental support?

angelozerr · 2019-07-17T04:49:42Z

I'm closing this issue since 0.8.0 improves performance and memory usage, and gives the ability to disable the outline.

NikolasKomonen added the "enhancement" (New feature or request) label Aug 9, 2018

fbricon added the "performance" (This issue or enhancement is related to performance concerns) label Sep 4, 2018

angelozerr added a commit that referenced this issue Sep 5, 2018

Add XMLDocumentTest to test performance (see #48)

e092e44

angelozerr added a commit that referenced this issue Sep 6, 2018

Improve performance of scanner by avoiding using String#substring in large string (see #48)

bf0a722

NikolasKomonen pushed a commit to NikolasKomonen/lsp4xml that referenced this issue Sep 7, 2018

Add XMLDocumentTest to test performance (see eclipse#48)

bbe04e7

NikolasKomonen pushed a commit to NikolasKomonen/lsp4xml that referenced this issue Sep 7, 2018

Improve performance of scanner by avoiding using String#substring in large string (see eclipse#48)

cc9d9b0

angelozerr added a commit that referenced this issue Sep 12, 2018

Synchronize the language model cache to avoid creating XMLDocument when operations are called at the same time (ex: diagnostic, highlight, documentSymbol created 3 XMLDocument, now one XMLDocument is created). see #48

65f5b1f
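
The idea behind this commit, one parse shared by several concurrent requests, can be sketched roughly like this. It is an illustration under assumptions, not the lsp4xml cache: `ModelCacheSketch`, its `Entry` class and the `get` signature are hypothetical, and a trimmed string stands in for an XMLDocument.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class ModelCacheSketch<M> {

    /** A parsed model together with the document version it was built from. */
    private static final class Entry<M> {
        final int version;
        final M model;

        Entry(int version, M model) {
            this.version = version;
            this.model = model;
        }
    }

    private final ConcurrentHashMap<String, Entry<M>> cache = new ConcurrentHashMap<>();
    private final Function<String, M> parser;

    ModelCacheSketch(Function<String, M> parser) {
        this.parser = parser;
    }

    /** Returns the model for this uri and version, parsing at most once per version. */
    M get(String uri, int version, String text) {
        // compute() runs atomically per key, so concurrent callers share one parse.
        Entry<M> entry = cache.compute(uri, (key, old) ->
                old != null && old.version == version
                        ? old
                        : new Entry<M>(version, parser.apply(text)));
        return entry.model;
    }

    public static void main(String[] args) {
        AtomicInteger parses = new AtomicInteger();
        ModelCacheSketch<String> models = new ModelCacheSketch<>(text -> {
            parses.incrementAndGet();
            return text.trim(); // stand-in for building an XMLDocument
        });
        // Diagnostics, highlight and documentSymbol all ask for the same version.
        models.get("file:///a.xml", 1, "<a/>");
        models.get("file:///a.xml", 1, "<a/>");
        models.get("file:///a.xml", 1, "<a/>");
        System.out.println(parses.get()); // 1
    }
}
```

The trade-off in this sketch is that the parse runs inside `compute()`, so other requests for the same document wait for it instead of starting their own parse.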

angelozerr added a commit that referenced this issue Nov 15, 2018

Add big file 'nasa.xml' in the test (see #48)

ae60f52

angelozerr mentioned this issue Apr 8, 2019

Experimental Incremental support #133

Closed

angelozerr closed this as completed Jul 17, 2019

angelozerr self-assigned this Jul 17, 2019

angelozerr added this to the v0.8.0 milestone Jul 17, 2019

angelozerr removed the "enhancement" (New feature or request) label Jul 23, 2019

angelozerr changed the title ~~Test Large Files for Performance~~ Tested large files for performance Jul 23, 2019

rgrunber mentioned this issue May 11, 2022

A test file contains "private non-commercial use" clause #1197

Closed
