Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tool] Add performance test for parsing time #340

Merged
merged 8 commits into from
Nov 20, 2019

Conversation

kobawan
Copy link
Contributor

@kobawan kobawan commented Oct 18, 2019

Description

Script to calculate the parsing speed of VAST client (fetch and parse) and VAST parser (just parse). Each test is run 100 times, to get the average.
Current alpha 8 results:
Screenshot 2019-11-06 at 15 33 26

Type

  • Breaking change
  • Enhancement
  • Fix
  • Documentation
  • Tooling

@kobawan kobawan added wip Work in progress chore Anything not involving production features do not merge labels Oct 18, 2019
@kobawan kobawan self-assigned this Oct 18, 2019
const promises = arr.map(() => getParsingTime(VastParser, inlineXml));

Promise.all(promises).then(results => {
const average = getAverage(results);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem: The higher the numberOfRuns the higher the average becomes. I have changed from Date.now() to performance.now() and it didn't solve the issue. Any inputs here would be helpful :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solution: Each loop runs the test in a separate process so they don't share resources with other tests and become slower with time. Promise.all is also not reliable because it runs all the promises, and only after it has executed all the promises, will it begin to resolve them (so the timing isn't right). Now we wait for each run to be resolved before going to the next one. Downside is that the tests are a bit slow now.

@kobawan kobawan changed the title DO NOT MERGE [tool] Add performance test for parsing time WIP: [tool] Add performance test for parsing time Nov 6, 2019
@kobawan kobawan changed the title WIP: [tool] Add performance test for parsing time [tool] Add performance test for parsing time Nov 6, 2019
@kobawan kobawan removed the wip Work in progress label Nov 6, 2019
performance/get_time.js Outdated Show resolved Hide resolved
performance/parse_time.js Outdated Show resolved Hide resolved
performance/performance_test.js Outdated Show resolved Hide resolved
performance/performance_test.js Outdated Show resolved Hide resolved
for(let i = 0; i < numberOfRuns; i++) {
try {
const time = await new Promise((resolve, reject) => {
const proc = spawn("node", [testFilePath]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Candid question, do we really need to go throught a new child process every time ? Just calling a promise that returns the value was not enough ? I recall you talked about the more you run the test the more it took, I guess it's related to that ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe now that we don't do Promise.all it isn't needed anymore but I will need to test it out. This is safer because we know that each test is not impacting other runs (by sharing the same resources)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But since you're doing a await they actually running one by one so it won't share anything right ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it without the child processes, and the results changed in correlation to how many runs was performed. So it seems it is still needed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Rumesh or Jakub have an explaination for that I would be happy to know it

Copy link
Contributor

@rumesh rumesh Nov 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR; Yes spawn new process will be faster than doing it in the same execution thread.

The child_process.spawn() method spawns the child process asynchronously, without blocking the Node.js event loop

https://nodejs.org/api/child_process.html#child_process_child_process_spawn_command_args_options
So first using spawn will not block the node event loop.

And NodeJS is single thread but using spawn let the capabilities to use ale the cores in the CPU

a new process will be created and managed by your OS. That new process can be executed in parallel to the main process as long as your computer has at least 2 virtual CPU cores

https://stackoverflow.com/questions/33381796/does-node-child-process-spawn-run-on-multi-core-cpu

So yes spawn will create clean and new thread to execute this test file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spawning node for each test feels heavy. Maybe https://nodejs.org/api/vm.html can be useful and more lightweight, although I never used it personally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into it and tried it, but I had some errors (new vm.Script('node performance/get_time.js') was throwing an error). As the waiting time for the performance tests to run is "not great, not terrible", and this PR isn't a priority, I will leave it as it is for now, because at least it works 🤷‍♀

Copy link
Contributor

@clementFrancon clementFrancon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

const path = require('path');
const { VASTClient } = require('../dist/vast-client-node.min.js');

const url = path.join('file://', __dirname, '..', 'spec', 'samples', 'inline-linear.xml');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just FYI, when using path.join you can also just do '../spec/samples/inline-linear.xml' and it will work cross-platform just fine. No need to supply N separate arguments. (the output will be the same)

(the output will contain \ on Windows though, so it can be used when you want to open a file, but not always in other situations)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However the problem is that path was created for dealing with filesystem disk paths, not with any generic paths. In particular it does not work as expected on Windows when dealing with URLs because URLs have / while all path operations output \ on Windows, and also on Windows you have the C:\ stuff which complicates things.

When I run this on WIndows with bash, I get:

  • with file:// in the beginning:
[Error: ENOENT: no such file or directory, open 'C:\C:\git\vast-client-js\spec\samples\inline-linear.xml'] {
  • without file://:
TypeError [ERR_INVALID_PROTOCOL]: Protocol "c:" not supported. Expected "http:"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to make it work with https://www.npmjs.com/package/file-url (https://github.com/sindresorhus/file-url/blob/master/index.js)

const fileUrl = require('file-url');
const url = fileUrl(path.join(__dirname, '../spec/samples/inline-linear.xml'));

but I still get the ENOENT. Need to dig further, maybe there's some bash-on-windows specific stuff, or an issue with vast-client 🤔

Copy link
Contributor

@jakub-g jakub-g Nov 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW the output from file-url is like file:///C:/git/vast-client-js/spec/samples/inline-linear.xml which when loaded in any browser on Windows works just fine.

Unclear to me why node wants to load C:\C:\
It is either issue in vast-client or the lib which it uses under the hood to issue requests, or some peculiary of msys/mingw like this one

BTW check also this (added in node 10.12)
https://nodejs.org/api/url.html#url_url_pathtofileurl_path

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to using pathtofileurl. I wasn't able to test it on windows though, so I'm not sure it will work. I also removed the keyword __dirname as it wasn't necessary after all


const VastClient = new VASTClient();

const timeBefore = Date.now();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps you could use process.hrtime() although it's more complex
https://nodejs.org/api/process.html#process_process_hrtime_time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

for(let i = 0; i < numberOfRuns; i++) {
try {
const time = await new Promise((resolve, reject) => {
const proc = spawn("node", [testFilePath]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spawning node for each test feels heavy. Maybe https://nodejs.org/api/vm.html can be useful and more lightweight, although I never used it personally.

@kobawan kobawan requested a review from jakub-g November 18, 2019 10:35
@jakub-g
Copy link
Contributor

jakub-g commented Nov 18, 2019

Alright so I found the issue on Windows: it's this nodejs/node#24223 (comment)

To fix this, this line:
https://github.com/dailymotion/vast-client-js/blob/master/src/urlhandlers/node_url_handler.js#L11
fs.readFile(url.pathname, ...
should become:
fs.readFile(uri.fileURLToPath(url.href), ...
but it requires node 10.12+ (currently package.json says 8.x). But since master is now at 3.0.0 alpha it's a good time to bump the min version of node required ;)

VastClient.get(url)
.then(() => {
console.log(Date.now() - timeBefore);
const res = process.hrtime(timeBefore);
console.log((res[0] * 1e3 + res[1] * 1e-6));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be 1e6 not 1e-6? (same in the other file)

1e6 === 1000000
1e-6 === 0.000001

Maybe more readable to just write 1000 / 1000000 instead :)
Or even better, since recent versions of node you can write integers with _ separators for readability:
1_000
1_000_000

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

res[1] is in nanoseconds so I'm multiplying it by 1e-6, I could also divide it by 1e6, but it is correct as it is. I can change it to be more explicit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah well, the node docs suck because they look like the arr[1] is a small value while in reality it's huge.

And now I see that you use ms while they use ns in output. All good 👍

@jakub-g jakub-g mentioned this pull request Nov 18, 2019
5 tasks
@kobawan
Copy link
Contributor Author

kobawan commented Nov 18, 2019

@jakub-g thanks for the input, I didn't think to look at node_url_handler for some reason, I thought the problem was in the performance tests.

@ZacharieTFR ZacharieTFR merged commit 60f399f into 3.0-version Nov 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change chore Anything not involving production features
Development

Successfully merging this pull request may close these issues.

None yet

5 participants