The Distributed Python workshop from the SMU O’Donnell Data Science and Research Computing Institute is designed for researchers who want to scale their workflows to large datasets and parallel hardware. Participants will explore the fundamentals of distributed computing with Python, learning how to use Dask for task parallelism and Ray for scalable machine learning and reinforcement learning applications. Through hands-on exercises, attendees will gain practical experience deploying these tools to process large datasets and make efficient use of computational resources.
Introduction to Parallel Programming:
- Understanding the basics of parallel computing.
- Advantages of parallelism in software development.
- Overview of parallel hardware and software architectures.
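As a rough illustration of the basic idea, the sketch below runs the same set of independent tasks first one at a time and then across a pool of workers, using only Python's standard library. The `fetch_record` task and its 0.1 s delay are hypothetical stand-ins for real work that waits on disk or network, not part of the workshop material.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_record(record_id):
    """Hypothetical task: wait briefly (simulated I/O), then return a result."""
    time.sleep(0.1)  # stands in for a disk or network wait
    return record_id * record_id


def run_serial(ids):
    # One task at a time: total time is roughly len(ids) * 0.1 s.
    return [fetch_record(i) for i in ids]


def run_parallel(ids, workers=4):
    # Tasks overlap across worker threads; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_record, ids))


if __name__ == "__main__":
    ids = list(range(8))

    start = time.perf_counter()
    serial = run_serial(ids)
    serial_time = time.perf_counter() - start

    start = time.perf_counter()
    parallel = run_parallel(ids)
    parallel_time = time.perf_counter() - start

    assert serial == parallel  # same answers, different wall-clock time
    print(f"serial: {serial_time:.2f} s, parallel: {parallel_time:.2f} s")
```

Threads are used here because the simulated task only waits. For CPU-bound work in Python, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface with separate processes.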
Dask:
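A minimal sketch of Dask's task parallelism, assuming the `dask` package is installed. The `load` and `total` functions are hypothetical placeholders for reading and reducing one partition of a dataset. `dask.delayed` records each call as a node in a task graph instead of running it immediately, and `compute()` then executes the graph, running independent tasks in parallel where possible.

```python
from dask import delayed


def load(part):
    """Hypothetical loader: returns three consecutive integers for a partition."""
    return list(range(part * 3, part * 3 + 3))


def total(values):
    """Reduce one partition to a single number."""
    return sum(values)


def build_graph(n_parts=4):
    # Nothing runs here: each delayed call only adds a node to the task graph.
    partials = [delayed(total)(delayed(load)(p)) for p in range(n_parts)]
    # Combine the per-partition results in one final task.
    return delayed(sum)(partials)


if __name__ == "__main__":
    graph = build_graph()
    # compute() hands the graph to a scheduler; the four partitions are independent.
    print(graph.compute())  # 0 + 1 + ... + 11 = 66
```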
Ray:
- Prerequisites: Basic understanding of Python programming.
- Environment Setup: Guidance on setting up a development environment.
- Example Projects: Hands-on examples provided to practice the concepts learned.
- Documentation: Detailed explanations and examples of the concepts covered.
- Examples: Sample code demonstrating the use of parallel constructs in real-world scenarios.
- Exercises: Problems and projects to test your understanding and skills.
This workshop is open to contributions from the community. You can suggest changes, report issues, or add new content to improve the learning experience.