Assignment02 #6

Closed
5 tasks done
drkostas opened this issue Mar 9, 2021 · 0 comments
drkostas commented Mar 9, 2021

  • (25 points) Clone the kmeans repository into your own area at /lustre/haven/proj/UTK0150/$USER.
  • (25 points) Write a job script that will use a single node and a single process per node (so only one process total). Ensure the job runs on a compute node, and run the non-distributed kmeans (kmeans_vectorized.py). Make a note of the output directory and commit the job script to your cloned repo.
  • (25 points) Write another job script to run the distributed kmeans script on two compute nodes using 20 processes, using the same iris data we've been looking at, and submit the job, noting the output directory. This job should finish very quickly, so requesting a walltime of 5 minutes will help you get through the queue sooner. Don't forget that you must launch your processes with mpirun inside the script...
  • (25 points) Modify the script to use the TCGA data in the /lustre/haven/proj/UTK0150/data directory (see README for a refresher on how to load the data). Run another job on ISAAC using 20 processes and 10 clusters, time how long the script takes to run, and make a note of that time. Also run with a single process on one node and verify that both jobs output identical cluster assignments and centroids: save the outputs of each job, load them once both jobs are complete, and check that they match. (Hint: success at this requires identical initialization.)
  • Submit a message here with the following information:
    • path to your code on ISAAC.
    • paths to, and a brief description of, the relevant output log directories for ISAAC jobs that succeeded (please don't make us dig through your main output directory and failed job IDs ourselves).
    • Timings for k-means on Iris and TCGA data, with single process vs twenty. Do you achieve a 20x speedup in each case?
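
The verification and timing steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `.npz` file layout, the key names `centroids` and `labels`, and every function name here are hypothetical and are not taken from the kmeans repository. Centroids are compared with a small tolerance because a distributed reduction may sum values in a different order than the single-process run, which can change the last few floating-point digits even with identical initialization.

```python
import time

import numpy as np


def save_run(path, centroids, labels):
    """Save one job's result. The .npz layout and key names are assumptions."""
    np.savez(path, centroids=np.asarray(centroids), labels=np.asarray(labels))


def load_run(path):
    """Load a result written by save_run and return (centroids, labels)."""
    with np.load(path) as data:
        return data["centroids"], data["labels"]


def outputs_match(centroids_a, labels_a, centroids_b, labels_b, atol=1e-8):
    """Return True if two runs produced the same assignments and centroids."""
    centroids_a = np.asarray(centroids_a, dtype=float)
    centroids_b = np.asarray(centroids_b, dtype=float)
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    if centroids_a.shape != centroids_b.shape or labels_a.shape != labels_b.shape:
        return False
    # Assignments are integers, so they must agree exactly.
    same_labels = np.array_equal(labels_a, labels_b)
    # Centroids are floats, so allow an absolute tolerance for rounding.
    same_centroids = np.allclose(centroids_a, centroids_b, rtol=0.0, atol=atol)
    return bool(same_labels and same_centroids)


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def speedup(serial_seconds, parallel_seconds):
    """Speedup of the parallel run relative to the single-process run."""
    return serial_seconds / parallel_seconds


def efficiency(serial_seconds, parallel_seconds, n_processes):
    """Parallel efficiency: speedup divided by the number of processes."""
    return speedup(serial_seconds, parallel_seconds) / n_processes
```

With helpers like these, each job would call `save_run` at the end, and a short follow-up script would call `load_run` on both files and pass the results to `outputs_match`. A 20x speedup on 20 processes corresponds to an efficiency of 1.0.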

Your assignment will not be graded unless you submit it here on Canvas; no exceptions.

@drkostas drkostas created this issue from a note in DSE-512 (In Progress) Mar 9, 2021
drkostas added a commit that referenced this issue Mar 9, 2021
drkostas added a commit that referenced this issue Mar 9, 2021
drkostas added a commit that referenced this issue Mar 16, 2021
drkostas added a commit that referenced this issue Mar 16, 2021
drkostas added a commit that referenced this issue Mar 16, 2021
drkostas added a commit that referenced this issue Mar 16, 2021
This reverts commit 4dbd1f9
drkostas added a commit that referenced this issue Mar 16, 2021
drkostas added a commit that referenced this issue Mar 16, 2021
drkostas added a commit that referenced this issue Mar 16, 2021
drkostas added a commit that referenced this issue Mar 16, 2021
drkostas added a commit that referenced this issue Mar 16, 2021
drkostas added a commit that referenced this issue Mar 16, 2021
drkostas added a commit that referenced this issue Mar 17, 2021
drkostas added a commit that referenced this issue Mar 18, 2021
drkostas added a commit that referenced this issue Mar 18, 2021
drkostas added a commit that referenced this issue Mar 18, 2021
Changed back to 1 node-20 processes
@drkostas drkostas self-assigned this Mar 18, 2021
DSE-512 automation moved this from In Progress to Done Mar 18, 2021