Talk Information
- **Draft Title:** Unlocking Network-Aware Scheduling with Topograph
- Length:
- Lightning talk/Demo (10 min)
- Full Presentation (20 min)
- Links: (slides, article, notes) https://github.com/NVIDIA/topograph
- Which dates, particularly Tue/Wed/Thu during the 3rd/4th week of a month, are you likely available? (e.g. "Most Thursdays with 4 weeks notice", "June or later", etc.)
  - Most Mondays or Thursdays
- Short Summary of your talk:
Distributed AI workloads perform best when scheduled onto nodes placed well in the network topology, which improves performance and reduces cost. Although major cloud service providers (CSPs) expose network topology in their managed Kubernetes offerings, this information is difficult to obtain in on-premises or self-managed clusters. Even in managed environments, SLURM and SLURM-on-Kubernetes cannot use it natively.
In this talk, I introduce Topograph, which automatically discovers the cluster network topology and keeps it up to date across SLURM, Kubernetes, and SLURM-on-Kubernetes.
Speaker Bio
- **Name:** Dmitry Shmulevich
- Mini-bio: (This is the introduction our emcee will give about you; one paragraph is usually best)
Dmitry is a software engineer at NVIDIA with over a decade of experience in cloud computing. He is an active member of the open-source community and maintains and contributes to several open-source projects.
- Picture for slides: (We use these pictures to create promotional content and to present your talk on our social media.)
- Would you like help with your presentation? (Feedback on notes, content?)
- Social media link(s): (twitter, website, linkedin, etc.) https://github.com/dmitsh
- Do you agree to the CNCF Code of Conduct?
- I agree