Add maximum shard size to config #4986

Merged
merged 1 commit into from Jun 29, 2021

Conversation

mrocklin
Member

  • Closes #xxxx
  • Tests added / passed
  • Passes black distributed / flake8 distributed / isort distributed

Member

@fjetter fjetter left a comment

Any reason why this needs to be configurable?

@@ -165,6 +165,7 @@ distributed:
min: 1s # the first non-zero delay between re-tries
max: 20s # the maximum delay between re-tries
compression: auto
shard: 64MiB
Member

nitpick but the previous value was 64MB not 64MiB :)

Member Author

Fair point. I think that we should default to powers of two in general. Any objection?

Member

@fjetter fjetter Jun 29, 2021

I think that we should default to powers of two in general

I guess it doesn't really matter but with powers of two we probably have the best chance to hit some kind of sweet mem alignment/cache size/whatever optimization so I'm all for it.

However, if you prefer powers of two, you should use MB, shouldn't you? (I understand if you didn't want to push changes anymore, just wondering if I messed something up in my mind :) )

MB == 1024 ** 2 / 2 ** 20
MiB == 1000 ** 2 / 10**6

Member Author

From the internet, the source of all truth.

Megabytes (MB) or Mebibytes (MiB)?

Though the article refers to Linux, the topic is applicable to all computers. ... According to these standards, technically a megabyte (MB) is a power of ten, while a mebibyte (MiB) is a power of two, appropriate for binary machines. A megabyte is then 1,000,000 bytes. (Dec 23, 2001)
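
For concreteness, the two readings of the default can be compared with plain arithmetic. This is a minimal sketch that follows the definition quoted above; the names `MB`, `MiB`, `old_default`, and `new_default` are illustrative only and are not taken from the codebase.

```python
# Decimal vs. binary prefixes, following the definition quoted above.
MB = 10**6    # megabyte: a power of ten
MiB = 2**20   # mebibyte: a power of two

old_default = 64 * MB    # the previous value, read as 64MB
new_default = 64 * MiB   # the value in this diff, 64MiB

difference = new_default - old_default

print(f"64MB  = {old_default:,} bytes")
print(f"64MiB = {new_default:,} bytes")
print(f"64MiB is larger by {difference:,} bytes")
```

So the switch from 64MB to 64MiB raises the default slightly rather than leaving it unchanged.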

Member

wow, I was wrong all those years. I remember a first semester class where my prof introduced the unit and I would've sworn all these years it was the other way round 🤦

Member Author

Maybe the prof was wrong too? Imagine how they feel :)

Member

If he was wrong, he'd never admit to it!

@mrocklin
Member Author

Any reason why this needs to be configurable?

We ran into problems with websocket comms. Various parts of the web infrastructure that were outside of our control didn't want to pass frames larger than a few megabytes. Folks were changing this value by hand and pushing new versions of the software around.
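
To illustrate what a maximum shard size means for a comm, here is a minimal sketch of splitting one payload into frames no larger than a given limit. It is not the implementation in distributed; the function `split_into_shards` and its parameters are hypothetical names chosen for this example.

```python
from typing import List


def split_into_shards(payload: bytes, max_shard: int) -> List[memoryview]:
    """Split payload into consecutive views of at most max_shard bytes.

    Hypothetical helper for illustration; not distributed's own code.
    """
    if max_shard <= 0:
        raise ValueError("max_shard must be a positive number of bytes")
    view = memoryview(payload)  # slicing a memoryview does not copy the data
    return [view[i:i + max_shard] for i in range(0, len(view), max_shard)]


MiB = 2**20
payload = bytes(10 * MiB)                      # a 10 MiB message of zero bytes
shards = split_into_shards(payload, 4 * MiB)   # cap each frame at 4 MiB
sizes = [len(shard) for shard in shards]
print(sizes)
```

With a limit of a few mebibytes, every frame stays under what the intermediate web infrastructure is willing to pass. Assuming the new key sits next to `compression` under `distributed.comm`, as the diff suggests, a deployment could presumably lower it with something like `dask.config.set({"distributed.comm.shard": "4MiB"})` instead of patching the source.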

@mrocklin mrocklin merged commit b9d2e3b into dask:main Jun 29, 2021
@mrocklin mrocklin deleted the shard-config branch June 29, 2021 13:58