Ability to more easily track bandwidth usage #6943

Closed
Joshfindit opened this issue Dec 9, 2020 · 5 comments
Labels
needs info Needs further information from the user

Comments

@Joshfindit

I've successfully deployed an MVP, but I've run into unexpectedly high usage on the bucket.

I see that the debug logs do list usage in the form of read[7] 32768 bytes from 16384, but what we'd really like is something that could be used without debug logging.

Ideally, we'd have stats showing:

  • What file was accessed
  • What the datetime was
  • How much of the data was read at that time
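As an illustration of the three stats listed above, here is a minimal sketch of recording them at the file-object level instead of through debug logs. `ReadRecord` and `LoggingReader` are hypothetical names invented for this example; they are not part of Dask or any storage library, and an in-memory buffer stands in for a remote object.

```python
import io
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ReadRecord:
    path: str        # what file was accessed
    when: datetime   # when the read happened (UTC)
    offset: int      # position in the file where the read started
    nbytes: int      # how many bytes were actually returned


class LoggingReader:
    """Hypothetical wrapper that records every read() on a binary file-like object."""

    def __init__(self, path, fileobj, records):
        self.path = path
        self._f = fileobj
        self._records = records

    def read(self, size=-1):
        offset = self._f.tell()
        data = self._f.read(size)
        self._records.append(
            ReadRecord(self.path, datetime.now(timezone.utc), offset, len(data))
        )
        return data

    def seek(self, pos, whence=io.SEEK_SET):
        return self._f.seek(pos, whence)


# Stand-in for a remote object: 100,000 bytes held in memory.
records = []
reader = LoggingReader("bucket/data.bin", io.BytesIO(b"x" * 100_000), records)
reader.seek(16384)
chunk = reader.read(32768)
```

After this runs, `records` holds one entry with the path, a UTC timestamp, the starting offset, and the byte count. A real deployment would need to hook in wherever the storage client actually issues requests, which this sketch does not attempt.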
@Joshfindit
Author

(It may be worth noting that we're using B2, and Backblaze currently does not offer any sort of usage reporting. You have to contact support to get the total for individual days, as opposed to the monthly total shown in the UI.)

@ncclementi added the "needs info" (Needs further information from the user) label Aug 12, 2021
@ncclementi
Member

@Joshfindit There have recently been some efforts to display more information about network bandwidth in the form of plots; see dask/distributed#5129.

I am afraid that this feature request is a bit broad and needs more information. If you are interested in describing in more detail what you would like to see, please let us know in this thread. I will leave the issue open for now, with a needs info tag.

@Joshfindit
Author

@ncclementi I checked the thread and that looks awesome already.

I apologize: I'm too far removed from the project I was referencing and no longer have access to the notes with the specifics.

Generally, my requests of this type come down to one of the following questions:

  • How can we be sure that everything is running as expected?
  • There was an unexpectedly high usage bill. What specific process was at fault, and what was it accessing/doing to cause that?
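As a rough sketch of how the second question might be approached, suppose per-read records of (file path, bytes read) were available from some source. That is an assumption: the thread does not establish that Dask or B2 exposes such records, and the sample data and `bytes_per_file` helper below are hypothetical. Summing bytes per file then ranks the heaviest consumers:

```python
from collections import Counter

# Hypothetical (path, nbytes) pairs, e.g. parsed from logs or collected by a wrapper.
reads = [
    ("bucket/a.parquet", 32768),
    ("bucket/b.parquet", 4096),
    ("bucket/a.parquet", 65536),
]


def bytes_per_file(reads):
    """Return (path, total_bytes) pairs, largest total first."""
    totals = Counter()
    for path, nbytes in reads:
        totals[path] += nbytes
    return totals.most_common()


top = bytes_per_file(reads)
```

The same grouping could be keyed on a process or worker identifier instead of a path if the records carried one.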

Feel free to close this request if those don’t lead to meaningful upgrades for Dask.

@ncclementi
Member

Thank you for your reply, @Joshfindit. I think those are valid questions, and they would probably be helpful for other people. I wonder if @jrbourbeau has any insight or updates on this.

@ian-r-rose
Collaborator

The situation has improved in the last year around visibility into worker network usage. I don't think there are many specific ideas here for how to further improve observability in a cloud context, so I'm going to close this. But if you have more thoughts about specific features we could build into dask/distributed, feel free to comment here or in a new issue.

3 participants