[1/n] Move subscription to web worker #9795
Conversation
BRR-289 run page performance w/ large jobs
sample runs to load with https://drive.google.com/drive/folders/1ON6Wf8CF2c1pqF1E8P8R-Jh_aTv3cZFt
I see 1/N in the title, but the PR summary does not make it clear what the expected scope of this is. From what I can tell, this on its own doesn't improve anything; it just adds a web worker hop to parsing the same volume of data.
worker.addEventListener('message', (event) => {
  try {
    onLogs(JSON.parse(arrayBufferToString(event.data)));
  } catch (_) {}
We should make some noise somewhere on failure - this would be really gnarly to debug otherwise.
Yeah, I'll remove the try/catch
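For reference, a minimal sketch of what the handler could look like with the failure surfaced rather than silently swallowed (onLogs and arrayBufferToString are the names from the diff above; everything else is illustrative):

worker.addEventListener('message', (event) => {
  try {
    onLogs(JSON.parse(arrayBufferToString(event.data)));
  } catch (error) {
    // Make some noise: a malformed payload should be visible during debugging
    // instead of being dropped on the floor.
    console.error('Failed to parse log payload from worker', error);
  }
});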
@alangenfeld Yeah, this is just setup for what we want to do, which is to only send data that would actually be visible from the web worker to the main thread, so that the main thread does less work overall. The linked Linear task has more information, and here's the quip describing the plan: https://elementl.quip.com/To6cAXcqGIbi/rfc-Improving-Job-Run-Performance. By the way, I did four trial runs for each version, with and without the worker, and the worker version was consistently faster at loading a run with 200k events.
Just tested this on Adtriba's debug file from a while ago and the improvements are huge there:
Worker:
I don't see a lot more room for improvement on the frontend, but my MacBook's python server is pretty slow, so it's the bottleneck for me. On a faster machine it's possible it could overwhelm the frontend. I'll see if I can set up dagster on my personal workstation PC.
What exactly is that time measuring? It may be useful to include profiler screenshots. It's not obvious to me why things are faster with this change if the volume of data parsed is the same in both approaches - unless that isn't right or needs some more specific detail.
I can put up a commit with the test setup. It just measures from page load until the last log message from the webserver is handled (I used a
Yeah, I'm going to be honest, it did catch me by surprise. I can do some more digging to figure out exactly where the time is coming from. I set up my workstation, so hopefully a beefier setup will make it easier to profile (my MacBook hangs every time I try to profile the non-worker version).
The perf benefit looks really cool. Thanks for putting this together!
import {gql} from '@apollo/client';

export const PYTHON_ERROR_FRAGMENT = gql`
  fragment PythonErrorFragment on PythonError {
    __typename
    message
    stack
    cause {
      message
      stack
    }
  }
`;
This fragment is already defined in PythonErrorInfo, can you reuse that?
Yeah, I didn't want the initial PR to have a ton of changes so that it's easier to review, but I'll do what I did in the original PR, where I moved the dependencies around and changed all the imports so that there's only one definition.
let apolloClient: ApolloClient<any> | undefined = undefined;

export function setup(data: any) {
Can this type be tightened up?
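As a sketch of one way it could be tightened, assuming setup only ever receives the INITIALIZE payload (staticPathRoot is taken from the message types in this diff; anything beyond that would be a guess):

// Hypothetical tightened signature: accept the INITIALIZE payload shape instead of `any`.
type SetupData = {
  staticPathRoot: string;
};

export function setup(data: SetupData) {
  // ... construct the worker's Apollo client using data.staticPathRoot ...
}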
});

apolloClient = new ApolloClient({
  cache: new InMemoryCache(),
Worth using createAppCache?
// https://developer.chrome.com/blog/how-to-convert-arraybuffer-to-and-from-string/

export function arrayBufferToString(buf: ArrayBuffer): string {
  return String.fromCharCode.apply(null, new Uint16Array(buf) as any);
}

// Chunk the data into 16k-character buffers to avoid hitting max argument errors
// when converting back to a string with String.fromCharCode.apply
const MAX_BUFFER_SIZE = 16000;
export function stringToArrayBuffers(str: string): ArrayBuffer[] {
  const buffers: ArrayBuffer[] = [];
  let buf = new ArrayBuffer(0);
  let view = new Uint16Array(buf);
  for (let i = 0, strLen = str.length; i < strLen; i++) {
    const index = i % MAX_BUFFER_SIZE;
    if (index === 0) {
      buf = new ArrayBuffer(Math.min(MAX_BUFFER_SIZE, str.length - i) * 2); // 2 bytes for each char
      view = new Uint16Array(buf);
      buffers.push(buf);
    }
    view[index] = str.charCodeAt(i);
  }
  return buffers;
}
The commented URL looks fairly old and has an update note -- is there a more recent native API we can use for this?
This is using native APIs underneath; it's just chunking to avoid going over the max number of arguments allowed for a function (since we call String.fromCharCode.apply on the buffer, the number of arguments is the size of the array).
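For what it's worth, one possible native alternative is TextEncoder/TextDecoder, which sidesteps the String.fromCharCode argument limit entirely. Note that it encodes UTF-8 rather than the UTF-16 code units used above, so both sides of the worker boundary would have to switch together; this is only a sketch, not what the PR does:

// UTF-8 conversion via the encoding APIs available in both window and worker contexts.
const encoder = new TextEncoder();
const decoder = new TextDecoder();

export function stringToBytes(str: string): Uint8Array {
  // The underlying .buffer can be transferred to/from the worker via postMessage.
  return encoder.encode(str);
}

export function bytesToString(bytes: ArrayBuffer | Uint8Array): string {
  return decoder.decode(bytes);
}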
import {PipelineRunLogsSubscription} from '../../runs/types/PipelineRunLogsSubscription';

export type Message = INITIALIZE;
type INITIALIZE = {
type InitializeMessage? It's a little confusing on the eyes for a type to be ALL_CAPS, since it looks like a constant value at a glance.
type SHUTDOWN = {
  type: 'SHUTDOWN';
  staticPathRoot?: undefined;
};
Define the INITIALIZE and SHUTDOWN types in the same location in a union so that you can have data be a single union type? Just to tidy things up a bit.
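Concretely, the suggestion might look something like this (names are illustrative, not necessarily what the PR ends up using):

// Message variants named like types and defined together as one discriminated union.
type InitializeMessage = {
  type: 'INITIALIZE';
  staticPathRoot: string;
};

type ShutdownMessage = {
  type: 'SHUTDOWN';
};

export type Message = InitializeMessage | ShutdownMessage;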
I was able to take a full profile for both versions with the Gantt chart off. One interesting thing is that the non-worker version has double the "system" time, though it's not much. That last render is very long; I should diff the bottom-up performance data to figure out what that extra scripting time is in the non-worker version. I was expecting both of them to do the same amount of work, since it's the same number of renders for each one.
@hellendag I removed the branching based off the URL and made the worker the default since it performs better. Also, thanks for the detailed feedback! I know the PR was really rough; I was mostly trying to get something quick out there for feedback and testing.
Force-pushed the …-289-run-page-performance-w-large-jobs-1 branch from d28ab99 to 704767a.
Copying this over from Slack DMs - context was investigating the mysterious overall speed-up of the worker version:
So I was playing around with this change:
if (!logs.hasMorePastEvents) {
  requestAnimationFrame(throttledSetNodes);
}
just against master, and noticed in the React profiler that even though we try to not set nodes, the Apollo subscription update still causes RunWithData to re-render every time we receive a websocket payload,
which, upon googling, appears to be a common source of performance issues:
https://stackoverflow.com/questions/61876931/how-to-prevent-re-rendering-with-usesubscription
https://medium.com/@seniv/improve-performance-of-your-react-apollo-application-440692e37026
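A rough sketch of the idea those links describe: subscribe through the Apollo client directly (outside a React hook, e.g. in the worker) so each websocket payload is handled by a callback rather than forcing a component re-render. The helper and its parameter names are illustrative, not code from this PR:

import {ApolloClient} from '@apollo/client';
import {DocumentNode} from 'graphql';

// Hypothetical helper: hand subscription payloads to a callback instead of
// re-rendering a component via useSubscription on every message.
export function subscribeToRunLogs(
  client: ApolloClient<any>,
  subscription: DocumentNode,
  runId: string,
  onNext: (data: unknown) => void,
) {
  const observable = client.subscribe({query: subscription, variables: {runId}});
  return observable.subscribe({
    next: ({data}) => {
      if (data) {
        onNext(data);
      }
    },
    error: (error) => {
      console.error('Run logs subscription failed', error);
    },
  });
}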
Summary & Motivation
This PR recreates #8734
Organizations have runs that can output thousands of log entries and upwards of 500MB of logs. This causes extreme slowdown in the UI as the main thread struggles to keep up with the flood of logs coming from the backend, both when the run is active and especially when the run is completed and all of the logs are loaded upfront. The main source of lag in the UI is deserializing the JSON data. To reduce this time, we're planning on only sending the logs/data for things that are actually visible in the UI.
https://elementl.quip.com/To6cAXcqGIbi/rfc-Improving-Job-Run-Performance
Right now the worker will only be enabled if the query parameter worker=1 is in the URL.
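The opt-in check presumably amounts to something like the following (a sketch; the actual flag wiring in the PR may differ):

// Enable the worker-backed subscription only when ?worker=1 is present in the URL.
const useWorker = new URLSearchParams(window.location.search).get('worker') === '1';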
How I Tested These Changes
Loaded a huge run with dagit-debug and saw that everything loaded fine.
Ran a few runs with and without the worker, and the worker consistently beat the non-worker version:
many_events.py:
  No worker: 115.5 seconds average
  Worker: 97.2 seconds average
  ~16% improvement

Adtriba run debug file:
  No worker: 324.26 seconds average
  Worker: 99.98 seconds average
  ~71% improvement