Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tested large files for performance #48

Closed
NikolasKomonen opened this issue Aug 9, 2018 · 21 comments
Closed

Tested large files for performance #48

NikolasKomonen opened this issue Aug 9, 2018 · 21 comments
Assignees
Labels
performance This issue or enhancement is related to performance concerns
Milestone

Comments

@NikolasKomonen
Copy link
Contributor

No description provided.

@NikolasKomonen NikolasKomonen added the enhancement New feature or request label Aug 9, 2018
@angelozerr
Copy link
Contributor

Have you some trouble with large file?

@fbricon
Copy link
Contributor

fbricon commented Aug 9, 2018

not necessarily yet, but we need to think about setting up performance tests, once all the major features are complete

@angelozerr
Copy link
Contributor

not necessarily yet, but we need to think about setting up performance tests, once all the major features are complete

If there is performance problem with big file, I think vscode html language service will have the same problem since XMLDocument is rebuild each time you type some content inside editor (TextDocumentSyncKind.Full).

I think to improve performance, TextDocumentSyncKind.Incremental should be used, but I think it's an hard thing to do.

@fbricon
Copy link
Contributor

fbricon commented Aug 9, 2018

@angelozerr whatever performance the vscode html has, since the xml parser starts to deviate from it (in many ways), we might see a different behaviour, better or worse.
I'm not saying we need to improve the performance now, just that we need to keep track of it eventually. What matters now is to provide correct results. Worry about performance later.

@angelozerr
Copy link
Contributor

What matters now is to provide correct results. Worry about performance later.

Totally agree with you. Thanks for creating this issue.

@NikolasKomonen
Copy link
Contributor Author

NikolasKomonen commented Sep 4, 2018

After testing largeFile.txt
I deleted the 2nd last tag </a> in Intellij it took ~2.8 seconds and in lsp4xml ~4.7 seconds till a missing end tag response was received.

@fbricon
Copy link
Contributor

fbricon commented Sep 4, 2018

15000L is not what I call large :-) I was thinking about 10's of MB (even that's not large, depending on the context)
Anyways, I think we're getting murdered by TextDocumentSyncKind.Full, as we're sending the full document over the connection on each document change, the whole document is rebuilt entirely on every keystroke, it's not scaling well.
For now, we'll call it a known limitation, but we'll have to work on improving performance once we get the initial features right.

@fbricon fbricon added the performance This issue or enhancement is related to performance concerns label Sep 4, 2018
@angelozerr
Copy link
Contributor

@NikolasKomonen thanks for testing that and attached your large file.

Anyways, I think we're getting murdered by TextDocumentSyncKind.Full, as we're sending the full document over the connection on each document change, the whole document is rebuilt entirely on every keystroke, it's not scaling well.

Indeed I think it can be a problem, and it was my fear, because we need to manage "incremental" parser which is an hard task I think.

BUT I have started to study the problem, and building the XMLDocument directly from the given file takes 1590 ms which is too big. It seems the problem comes from the regexp. I will give you feedback and try to fix the problem.

angelozerr added a commit that referenced this issue Sep 6, 2018
@angelozerr
Copy link
Contributor

@NikolasKomonen I have improved performance, your large file takes 195 ms instead of 1590 ms!

Please give me feedback.

@fbricon
Copy link
Contributor

fbricon commented Sep 6, 2018

@angelozerr awesome work! This makes the extension much snappier!!!
did you use a profiler or just noticed that substring hanging there?
Are there other places where substring/string concatenation might be hurting us?

@angelozerr
Copy link
Contributor

@angelozerr awesome work! This makes the extension much snappier!!!

Glad it pleases you:) I think now we have the same performance than HTML Language Server.

did you use a profiler or just noticed that substring hanging there?

To be honnest with you, I'm not very familiar with profiler, I have just noticed this hang (at first I though it was because of regexp, but it was about String#substring)

Are there other places where substring/string concatenation might be hurting us?

I have tried to check that, but I think it's OK. The main problem is to do a substring from a position to string length (it creates a very large String each time, using Matcher#region locate just the matcher. We could use the same Matcher too, but it seems that it doesn't improve performance.

@NikolasKomonen
Copy link
Contributor Author

@angelozerr This is awesome, great find.

@angelozerr
Copy link
Contributor

Thanks guys!

NikolasKomonen pushed a commit to NikolasKomonen/lsp4xml that referenced this issue Sep 7, 2018
NikolasKomonen pushed a commit to NikolasKomonen/lsp4xml that referenced this issue Sep 7, 2018
angelozerr added a commit that referenced this issue Sep 12, 2018
operations are called at the same time (ex: diagnostic, highlight,
documentSymbol created 3 XMLDocument, now one XMLDocument is created).
see #48
@fbricon
Copy link
Contributor

fbricon commented Sep 27, 2018

Here's a site to find xml documents of various sizes, some pretty big: http://aiweb.cs.washington.edu/research/projects/xmltk/xmldata/

Tried with a 30MB doc in vscode. Never reached the point where I got error reported. I'll try with incremental support later

@angelozerr
Copy link
Contributor

angelozerr commented Sep 27, 2018

Here's a site to find xml documents of various sizes, some pretty big: http://aiweb.cs.washington.edu/research/projects/xmltk/xmldata/

Cool!

Tried with a 30MB doc in vscode. Never reached the point where I got error reported. I'll try with incremental support later

Could you give me the link of your xml that you are testing please.

@fbricon
Copy link
Contributor

fbricon commented Nov 15, 2018

Seems the server chokes on the nasa.xml from http://aiweb.cs.washington.edu/research/projects/xmltk/xmldata/www/repository.html#nasa

Tried with xmx2GB

@angelozerr
Copy link
Contributor

@fbricon I have added the nasa.xml in the test:

  • for largeFile.xml:
Parsed 'largeFile.xml' with XMLScanner in 31 ms.
Parsed 'largeFile.xml' with XMLParser in 25 ms.

  • for nasa.xml:
Parsed 'nasa.xml' with XMLParser in 731 ms.
Parsed 'nasa.xml' with XMLScanner in 371 ms.

@fbricon
Copy link
Contributor

fbricon commented Nov 15, 2018

try validation, formatting, hover...

@angelozerr
Copy link
Contributor

try validation, formatting, hover...

Yes sure we need too add thoses tests. But if creation of XMLDocument takes 731ms, I think we will have slow problem.

WTP XML Editor cannot open it too your nasa.xml.

I fear that it will very hard to support very large file. I think a problem is because it's not incremental. Have you tried with "experimental" incremental support?

@angelozerr
Copy link
Contributor

I close this issue since 0.8.0 improves performance and memories and gives the capability to disable outline.

@angelozerr angelozerr self-assigned this Jul 17, 2019
@angelozerr angelozerr added this to the v0.8.0 milestone Jul 17, 2019
@angelozerr angelozerr removed the enhancement New feature or request label Jul 23, 2019
@angelozerr angelozerr changed the title Test Large Files for Performance Tested large files for performance Jul 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance This issue or enhancement is related to performance concerns
Projects
None yet
Development

No branches or pull requests

3 participants