Talk Proposal: Unlocking Network-Aware Scheduling with Topograph #21

@dmitsh

Talk Information

  • **Draft Title:** Unlocking Network-Aware Scheduling with Topograph
  • Length:
    • Lightning talk/Demo (10 min)
    • Full Presentation (20 min)
  • Links: (slides, article, notes) https://github.com/NVIDIA/topograph
  • Which dates, particularly Tue/Wed/Thu during the 3rd/4th week of a month, are you likely to be available? (e.g. "Most Thursdays with 4 weeks notice", "June or later", etc.)
    • Most Mondays or Thursdays
  • Short Summary of your talk:
    Distributed AI workloads run fastest, and at lowest cost, when they are scheduled onto nodes that are well connected in the network. Although major cloud service providers (CSPs) expose network topology in their managed Kubernetes offerings, this information is difficult to obtain in on-prem or self-managed clusters, and SLURM and SLURM-on-Kubernetes cannot use it natively even in managed environments.
    In this talk, I will introduce Topograph, an open-source tool that automatically discovers and maintains real-time network topology for SLURM, Kubernetes, and SLURM-on-Kubernetes clusters.

Speaker Bio

  • **Name:** Dmitry Shmulevich
  • Mini-bio: (This is the introduction our emcee will give about you, one paragraph is usually best)
    Dmitry is a software engineer at NVIDIA with over a decade of experience in cloud computing. An active member of the open-source community, he maintains and contributes to several open-source projects.
  • Picture for slides: (We use these pictures to create promotional content and to present your talk on our socials.)
  • Would you like help with your presentation? (Feedback on notes, content?)
  • Social media link(s): (twitter, website, linkedin, etc.) https://github.com/dmitsh
  • Do you agree to the CNCF Code of Conduct?
    • I agree
