Project 7: Parallelizing a Machine Learning Algorithm
Concept: This project applies parallel computing techniques to a high-demand field and shows how central parallelism is to modern data science and AI.
Project Description: Implement a parallel version of a machine learning algorithm, such as Stochastic Gradient Descent (SGD) or K-Means clustering.
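One common way to parallelize SGD is synchronous data parallelism. The minibatch is split into shards, each worker computes summed gradients on its shard using the same parameters, and the partial sums are combined (an all-reduce in a distributed setting) before a single update. The sketch below assumes a toy 1-D linear-regression model with squared loss. Python threads from `concurrent.futures` stand in for nodes; in CPython the GIL means this illustrates the structure rather than delivering real speedup. The names `shard_gradient`, `split`, and `parallel_sgd_step` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Sample = Tuple[float, float]  # one (x, y) training example

def shard_gradient(shard: List[Sample], w: float, b: float) -> Tuple[float, float, int]:
    """Summed (not averaged) squared-error gradients over one shard."""
    gw = gb = 0.0
    for x, y in shard:
        err = (w * x + b) - y  # prediction error of the toy model y ~ w*x + b
        gw += 2.0 * err * x    # d(err**2)/dw
        gb += 2.0 * err        # d(err**2)/db
    return gw, gb, len(shard)

def split(data: List[Sample], k: int) -> List[List[Sample]]:
    """Split data into k contiguous shards whose sizes differ by at most one."""
    n = len(data)
    return [data[i * n // k:(i + 1) * n // k] for i in range(k)]

def parallel_sgd_step(batch: List[Sample], w: float, b: float,
                      lr: float, num_workers: int) -> Tuple[float, float]:
    """One synchronous data-parallel SGD step on a minibatch."""
    shards = split(batch, num_workers)
    # Assumption: threads stand in for nodes; each sees the same (w, b).
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(lambda s: shard_gradient(s, w, b), shards))
    # Reduce: add the partial sums, then divide by the total sample count so the
    # result equals the sequential minibatch gradient (up to rounding).
    gw = sum(p[0] for p in partials)
    gb = sum(p[1] for p in partials)
    n = sum(p[2] for p in partials)
    return w - lr * gw / n, b - lr * gb / n

if __name__ == "__main__":
    import random
    random.seed(0)
    # Hypothetical noiseless toy data drawn from y = 3x + 1.
    data = [(x, 3.0 * x + 1.0) for x in (random.uniform(-1.0, 1.0) for _ in range(1000))]
    w, b = 0.0, 0.0
    for _ in range(500):
        w, b = parallel_sgd_step(random.sample(data, 64), w, b, lr=0.1, num_workers=4)
    print(f"w={w:.3f}, b={b:.3f}")  # expected to approach w=3, b=1
```

Summing first and dividing by the global count makes the update independent of how the minibatch is sharded. A multi-node version would replace the thread pool with a real collective, such as an MPI all-reduce of the gradient sums.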
Mid-Term Evaluation (Month 2):
Task: Implement the sequential version of the chosen algorithm and test it on a small dataset. Then design a strategy to parallelize the core computation (e.g., data parallelism, in which each node processes a different partition of the data).
Deliverable: A working sequential implementation, a design document outlining the parallelization strategy, and a plan for evaluating performance.
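For K-Means, a data-parallel design can keep the sequential and parallel versions in one code path. In each Lloyd iteration, every worker assigns its shard's points to the nearest centroid and returns only per-cluster coordinate sums and counts, which are then reduced into new centroids. Each worker therefore sends k·(d+1) numbers per iteration regardless of shard size. Running with one worker gives the sequential baseline. This is a minimal stdlib-only sketch: threads stand in for nodes, and `partial_stats`, `reduce_stats`, and `kmeans_step` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))

def partial_stats(shard: List[Point], centroids: Sequence[Point]):
    """Per-cluster coordinate sums and point counts for one shard (the 'map' step)."""
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for point in shard:
        j = nearest(point, centroids)
        counts[j] += 1
        for i, v in enumerate(point):
            sums[j][i] += v
    return sums, counts

def reduce_stats(stats, centroids: Sequence[Point]) -> List[Point]:
    """Combine all workers' sums and counts into new centroids (the 'reduce' step)."""
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for s, c in stats:
        for j in range(k):
            counts[j] += c[j]
            for i in range(d):
                sums[j][i] += s[j][i]
    # Design choice (assumption): an empty cluster keeps its previous centroid.
    return [tuple(sums[j][i] / counts[j] for i in range(d)) if counts[j] else tuple(centroids[j])
            for j in range(k)]

def kmeans_step(points: List[Point], centroids: Sequence[Point], num_workers: int = 1) -> List[Point]:
    """One Lloyd iteration; num_workers=1 is the sequential baseline."""
    n = len(points)
    shards = [points[w * n // num_workers:(w + 1) * n // num_workers] for w in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        stats = list(pool.map(lambda s: partial_stats(s, centroids), shards))
    return reduce_stats(stats, centroids)

if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    cents = [(0.0, 0.0), (10.0, 10.0)]
    for _ in range(5):
        cents = kmeans_step(pts, cents, num_workers=3)
    print(cents)
```

Because only the reduction order changes between one and many workers, comparing `kmeans_step(points, c, 1)` with `kmeans_step(points, c, p)` is a cheap correctness check to include in the evaluation plan.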
Final Evaluation (Month 4):
Task: Implement the full parallel version of the algorithm and demonstrate its performance on a larger dataset. Analyze the speedup achieved and how communication overhead (and any measures taken to reduce it, such as less frequent synchronization) affects runtime and accuracy.
Deliverable: The complete parallel implementation, a final report with benchmarks comparing accuracy and runtime against the sequential version, and a presentation.
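The speedup analysis can be reported with standard metrics. Speedup is S(p) = T(1) / T(p), efficiency is E(p) = S(p) / p, and the Karp–Flatt metric e = (1/S − 1/p) / (1 − 1/p) estimates the experimentally determined serial fraction. If e rises as p grows, overhead such as communication is increasing with worker count rather than coming from a fixed serial portion. The sketch below is a hedged helper, not a prescribed tool. `best_time` and `scaling_table` are hypothetical names, and T(1) is assumed to be the runtime of the sequential implementation.

```python
import time
from typing import Callable, Dict, List, Tuple

def best_time(fn: Callable[[], object], repeats: int = 3) -> float:
    """Wall-clock seconds of the fastest of `repeats` runs of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

def scaling_table(runtimes: Dict[int, float]) -> List[Tuple[int, float, float, float]]:
    """Rows of (p, speedup, efficiency, Karp-Flatt serial fraction).

    runtimes maps worker count p to measured runtime in seconds;
    runtimes[1] must be the sequential baseline.
    """
    t1 = runtimes[1]
    rows = []
    for p in sorted(runtimes):
        speedup = t1 / runtimes[p]
        efficiency = speedup / p
        # The Karp-Flatt metric is 0/0 at p == 1, so report NaN there.
        serial = (1 / speedup - 1 / p) / (1 - 1 / p) if p > 1 else float("nan")
        rows.append((p, speedup, efficiency, serial))
    return rows
```

Taking the best of several runs reduces noise from other processes. Reporting the mean and spread is a reasonable alternative if variance itself is of interest. The team would fill `runtimes` with its own measurements, e.g. `scaling_table({1: t1, 2: t2, 4: t4})`.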