Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix: Multiple bugs from first round of feedback... #27

Merged
merged 6 commits into from Feb 18, 2021

Conversation

rilango
Copy link
Contributor

@rilango rilango commented Feb 13, 2021

  • Docker COPY Command Does Not Transfer Necessary Files Into Container
    Now git clone is used to copy everything from master branch. TODO: Further
    changes are required to align development and production deployment.

  • Incorrectly Set GPU Flag Issue Still Exists
    Now while creating list of available GPUs casting exception are managed to
    avoid this issue.

  • UMAP Bug with CPU Pipeline in UI
    This was a ripple effect of changes to Dockerfile and conda env. file.

  • Changing the Number of Clusters Does not Seem to Impact Plot
    Additional changes to event handling callbacks are make to pick the value
    from the textbox is working. Now, the changed 'Number of Clusters' is picked
    on Recluster or Reload event

  • Reclustering on Two Clusters Produces an Error
    Reclustering with no 'Number of cluster' was the underlying reason for this
    issue. Now, 'Number of cluster' is not set to null or ''. In cases were
    'Number of cluster is invalid, the default value (7) is used.

  • Cluster Color Palette Seems to Change
    Now the colors are generated only when 'Number of Clusters' is changed.

- Docker COPY Command Does Not Transfer Necessary Files Into Container
  Now git clone is used to copy everything from master branch. TODO: Further
  changes are required to align development and production deployment.

- Incorrectly Set GPU Flag Issue Still Exists
  Now while creating list of available GPUs casting exception are managed to
  avoid this issue.

- UMAP Bug with CPU Pipeline in UI
  This was a ripple effect of changes to Dockerfile and conda env. file.

- Changing the Number of Clusters Does not Seem to Impact Plot
  Additional changes to event handling callbacks are make to pick the value
  from the textbox is working. Now, the changed 'Number of Clusters' is picked
  on Recluster or Reload event

- Reclustering on Two Clusters Produces an Error
  Reclustering with no 'Number of cluster' was the underlying reason for this
  issue. Now, 'Number of cluster' is not set to null or ''. In cases were
  'Number of cluster is invalid, the default value (7) is used.

- Cluster Color Palette Seems to Change
  Now the colors are generated only when 'Number of Clusters' is changed.
@rilango rilango requested a review from mlgill February 13, 2021 22:20
Copy link
Contributor

@mlgill mlgill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have noted non-blocking issues with Dockerfile and unfortunately some bugs that block testing. Happy to finish testing when the blockers are fixed.

Dockerfile Show resolved Hide resolved
@mlgill
Copy link
Contributor

mlgill commented Feb 15, 2021

@rilango Have started testing the new commits unfortunately there are still some bugs and they are blockers for testing, so I can't continue without more fixes. Please see requested change on the PR, in addition fo the following:

One of the imports in gpukmeansumap.py is incorrect -- I've corrected it in in my version of your branch here.

Unfortunately there is also a bug after fixing the import, and I couldn't figure out how to fix it via Google:

Traceback (most recent call last):
  File "startdash.py", line 274, in <module>
    main()
  File "startdash.py", line 270, in main
    Launcher()
  File "startdash.py", line 92, in __init__
    getattr(self, args.command)()
  File "startdash.py", line 210, in analyze
    n_gpu=args.n_gpu)
  File "/workspace/nvidia/cheminformatics/utils/dask.py", line 46, in initialize_cluster
    enable_infiniband=enable_infiniband)
  File "/opt/conda/envs/cuchem/lib/python3.7/site-packages/dask_cuda/local_cuda_cluster.py", line 160, in __init__
    memory_limit, threads_per_worker, n_workers
  File "/opt/conda/envs/cuchem/lib/python3.7/site-packages/distributed/worker.py", line 3270, in parse_memory_limit
    memory_limit = int(system.MEMORY_LIMIT * min(1, nthreads / total_cores))
ZeroDivisionError: division by zero
Exception ignored in: <function Cluster.__del__ at 0x7ff37a4b1560>
Traceback (most recent call last):
  File "/opt/conda/envs/cuchem/lib/python3.7/site-packages/distributed/deploy/cluster.py", line 113, in __del__
    if self.status != Status.closed:
AttributeError: 'LocalCUDACluster' object has no attribute 'status'

Happy to finish testing when this CUDACluster bug is fixed. I should be available later this evening, if needed.

COPY launch.sh /opt/nvidia/cheminfomatics/
COPY *.py /opt/nvidia/cheminfomatics/
COPY nbs/*.ipynb /opt/nvidia/cheminfomatics/
RUN mkdir -p /opt/nvidia/ \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this was known, but the still seems to be missing most of the library files:

# ls /opt/nvidia/cheminfomatics/
Dockerfile  LICENSE  README.md  chemvisualize.py  demo.ipynb  icla.pdf  launch.sh  screenshot.jpg  startdash.py

Added some of the changes recommended during the initial feedback in Dockerfile.
Fix the import stmt issus in GPU workflow class.
@mlgill
Copy link
Contributor

mlgill commented Feb 16, 2021

@rilango I've verified the following are fixed and have approved changes so you can merge when the rest (below) are fixed:

  • UMAP Bug with CPU Pipeline in UI
  • Changing the Number of Clusters Does not Seem to Impact Plot
  • Reclustering on Two Clusters Produces an Error
  • Cluster Color Palette Seems to Change

The ones that remain are:

  • Docker COPY Command Does Not Transfer Necessary Files Into Container -- from your comments, I think it's known that the codebase in /opt/nvidia/cheminformatics is not complete enough to be functional on it's own.
  • Incorrectly Set GPU Flag Issue Still Exists -- leaving this in since my setup still has issues, per our Slack call.
  • New ChEMBL IDs aren't handled correctly -- as you mentioned
  • KMeans will crash with a small number of molecules -- as you mentioned
  • I also noticed that reclustering with the lasso selector using the CPU pipeline may not work more than once. I did ensure it was a small number of molecules, but not less than 100. Might be worth checking.

- Dockerfile Issue: Copy Dockerfile changes from feedback PDF.

- New ChEMBLE issue: This is now fixed. A filtering issue was causing empty
  dataframe passed for kmeans. Root cause was ChEMBLE ID to molregno conversion
  error

- Potential fix for start error when just one GPU is available

- Ablitity to remove all MoI and remove the marker from Figure.
Added new workflow RandomProjection to the Dropdown. This uses single GPU.

Other changes include:
  - Introduce error Dialog. At the moment error during reclustering is rendered
  - Move some code the base cluster class
Changes to benchmark unittest to use classes directly instead of mocking thru.
commandline args.

Other changes includes, further refactoring to make unittest development easier.
Copy link
Contributor

@mlgill mlgill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rilango Have finished testing this code. Issues and suggestions for how to handle are below.

Requirements to Fix for Release

• BUG: Remove hardcoded GPU device number in nvidia/cheminformatics/utils/dask.py This is noted in the code review.
• BUG: UI seems to be pulling 5k molecules, but it's not clear where this is set at. The default in startdash.py is 10k. I can't figure out how to update / change this.

Fix in Future Release
I suggest the following be fixed in a future release so that we can get this code merged.

• BUG: test_workflow.py hangs for at least an hour on my system sometimes. I haven't yet figured out the exact pattern to reproduce the issue. If there are issues with the CI, we can set this test to not run by default using available purest flags.
• BUG: there is still a bug associated with the benchmark.sh script. I'm happy to work on this after the release.
• IMPROVEMENT: UI responsiveness is still slow with small number (5k) molecules after switch to multiple GPUs. I haven't tested large number of molecules to see if this is improved there since I can't figure out how to set the number of molecules (see issue #2 above)

@@ -34,6 +34,7 @@ def initialize_cluster(use_gpu=True, n_cpu=None, n_gpu=-1):
CUDA_VISIBLE_DEVICES.append(int(device))
except ValueError as vex:
logger.warn(vex)
CUDA_VISIBLE_DEVICES = [0, 1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove hardcoded CUDA devices. Thanks. :D

@rilango rilango merged commit 8c1e3bb into NVIDIA:dev Feb 18, 2021
rilango added a commit to rilango/cheminformatics that referenced this pull request Jul 23, 2021
…sampling

WIP: MegaMolBART single molecule sampling and linear interpolation
rilango added a commit to rilango/cheminformatics that referenced this pull request Jul 18, 2022
Bugfix: Multiple bugs from first round of feedback...
rilango added a commit to rilango/cheminformatics that referenced this pull request Jul 18, 2022
…sampling

WIP: MegaMolBART single molecule sampling and linear interpolation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants