[DOC-430] Add Managing Concurrency Guide #24296
@cnolanminich I noticed this is a draft. Do you think it's ready for review?
@PedramNavid yes, I think so. I wrote this during the week after docathon and wasn't sure whether to mark it ready for review or leave it as a draft until some of the questions in my comment were addressed. Will mark it ready for review!
I think you will need to tag a reviewer to help get this over the finish line as well.
glossary
Setting priority for ops/assets
Troubleshooting
Graphite Automations: "Label and add CE on all Docs" took an action on this PR (09/28/24). 3 reviewers were added to this PR based on Pedram Navid's automation.
Do you mean added as a configuration reference on this page, or as a separate page? Will add a troubleshooting section and then tag reviewers, thanks!
A separate page, fine to link to a [/todo] for now!
@PedramNavid the deploy docs revamp action is failing, it had succeeded previously and I ran
@prha it looks like you were the original author of the limiting concurrency docs page. Would you be able to review this from a content perspective or recommend someone who can?
Here's the full error message, @cnolanminich:
Thanks! Fixed the link and it built successfully 🎉
<CodeExample filePath="guides/tbd/concurrency-tag-key-asset.py" language="python" title="No more than 1 asset running with a tag of 'database'" />
</TabItem>
<TabItem value="Op Tag" label="Asset tag concurrency limits">
op tag?
fixed the label
<Tabs>
<TabItem value="Asset Tag" label="Asset tag concurrency limits">
<CodeExample filePath="guides/tbd/concurrency-tag-key-asset.py" language="python" title="No more than 1 asset running with a tag of 'database'" />
maybe too wordy, but consider adding "across all runs"
it's more wordy but I think precision is good for this since it's a complex topic
</TabItem>
<TabItem value="Op Tag" label="Asset tag concurrency limits">
<CodeExample filePath="guides/tbd/concurrency-tag-key-op.py" language="python" title="No more than 1 op running with a tag of 'database'" />
ditto ("across all runs")
Or you can configure it in the job definition:
<Tabs>
<TabItem value="Asset Tag with Job" label="Asset tag concurrency limits in a job">
"in a job" is a little bit misleading... It's really within a run.
e.g. I can kick off two runs of job foo, and each run can have 1 "database"-tagged asset running, so 2 are running at once.
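To make the run-vs-job distinction concrete, here's a toy model in plain Python. It is not Dagster API: `Run` and `peak_tagged_steps` are made-up names, and the model only computes an upper bound on how many tagged steps could execute at once.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical stand-in for one run of job `foo`."""

    name: str
    tagged_steps: int  # steps tagged "database" that are ready to execute


def peak_tagged_steps(runs, per_run_limit, global_limit=None):
    """Upper bound on tagged steps executing at the same time.

    Each run independently allows up to `per_run_limit` tagged steps.
    An optional `global_limit` caps the total across all runs.
    """
    total = sum(min(run.tagged_steps, per_run_limit) for run in runs)
    return total if global_limit is None else min(total, global_limit)


runs = [Run("foo-run-1", tagged_steps=3), Run("foo-run-2", tagged_steps=3)]

# Run-scoped limit of 1: each run gets its own slot.
run_scoped_peak = peak_tagged_steps(runs, per_run_limit=1)

# Same runs, plus a limit of 1 that applies across all runs.
global_peak = peak_tagged_steps(runs, per_run_limit=1, global_limit=1)

print(run_scoped_peak)  # 2
print(global_peak)  # 1
```

With only the run-scoped limit, two runs of `foo` can still hit the database twice at once, which is why "within a run" reads more accurately than "in a job".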
yes, that was sloppy, corrected!
</TabItem>
</Tabs>
Or you can configure it in the job definition:
It might be helpful to explain why you might use one versus the other.
For example, if you have resource constraints on something like BigQuery, you might want to configure global concurrency limits so that you don't overload a resource that is shared across runs.
The run-scoped tag concurrency limits are useful for limiting compute resources within a single run, for example when it's using something like the multiprocess executor.
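For the run-scoped case, a minimal sketch of what the run config might look like, written as a plain Python dict so it runs without Dagster installed. The `execution` / `multiprocess` / `tag_concurrency_limits` layout follows my reading of the existing limiting-concurrency docs and should be checked against them; the `"bigquery"` value and the job name in the comment are illustrative only.

```python
# Run-scoped limit: applies separately inside each run that uses this config.
# Assumption: this mirrors the multiprocess executor's run config schema.
run_scoped_config = {
    "execution": {
        "config": {
            "multiprocess": {
                # At most 4 steps in total executing within one run.
                "max_concurrent": 4,
                # At most 1 step tagged database=bigquery within one run.
                "tag_concurrency_limits": [
                    {"key": "database", "value": "bigquery", "limit": 1},
                ],
            }
        }
    }
}

# Assumed usage (requires Dagster, not executed here):
#   define_asset_job("foo", selection="*", config=run_scoped_config)

limits = run_scoped_config["execution"]["config"]["multiprocess"][
    "tag_concurrency_limits"
]
print(limits[0]["limit"])  # 1
```

Nothing in this dict stops a second run from starting its own tagged step, so a resource shared across runs still needs the global limit.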
good point, reworded / added some context on when you might want to do this
Need screenshot here

## Prevent runs from starting if another run is already occurring (advanced)
should we talk about `block_op_concurrency_limited_runs` at all? There's a section here about it in the old docs: https://docs.dagster.io/guides/limiting-concurrency-in-data-pipelines#throttling-concurrency-limited-runs
Pedram and I decided to punt that and a few other items to a "reference" page to be written
thanks @prha! updated per your feedback
I think that this guide has come a long way, and I'd love to get this merged. I've been sharing this PR with some community members, and would love to get this on docs-preview.dagster.io.
We can definitely iterate more, and will likely do another pass of everything again.
Per @cmpadden, the
Adapted / edited the original doc for the new guide format.
I had a couple of questions as I was working on it:
dagster.yaml / Dagster+ Deployment settings) examples as inline YAML, keep the folder with the dagster.yaml?
To repro, need to `export DAGSTER_HOME=$(pwd)/global_concurrency` (and for tag_concurrency, need an OSS version using Postgres / MySQL)