New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Adapt the grt123 model #4

Closed
reubano opened this Issue Aug 1, 2017 · 10 comments

Comments

Projects
None yet
3 participants
@reubano
Copy link
Contributor

reubano commented Aug 1, 2017

Overview

Currently, there is a just a placeholder in the algorithm that classifies nodules in scans. Nodules are areas of interest that might be cancerous. We need to adapt the Data Science Bowl (DSB) algorithms to predict P(cancer) given an iterator of nodule centroids for an image.

The top DSB algorithm (grt123) was written to run on a GPU for Python2. It would be nice to integrate this algorithm into the current structure and update it to run on Python3 (potentially on a CPU as well).

Expected Behavior

Given the grt123 model trained to perform this task, a DICOM image, and an iterator of nodule centroids, yield the P(cancer) for each nodule.

Design doc reference: Detect and select

Technical details

  • The majority of the Python3 and CPU conversion has been completed and is available in the conversion branch.

  • The forked model is available here (reads source DICOM images from S3).

  • One area that definitely needs review is the Py2/Py3 floor/true division conversions. Some calculation explicitly converted numbers to floats, and in those cases it was apparent that true division was desired. However, the remaining floor division calculations should be checked to ensure that true is not the appropriate operation.

  • If you get a UnicodeDecodeError error while trying to load the serialized Torch model, use the torch_loader function in the utils module instead.

  • When running on the CPU, it isn't necessary to perform that much work. Just enough to obtain a plausible result in a reasonable amount of time.

  • This feature should be implemented in the prediction/classify/trained_model/predict method.

Acceptance criteria

NOTE: All PRs must follow the standard PR checklist.

@reubano reubano added this to the 1-mvp milestone Aug 1, 2017

@reubano reubano changed the title Feature: Integrate the algos-grt123 branch Feature: Adapt the grt123 model Aug 4, 2017

@Serhiy-Shekhovtsov

This comment has been minimized.

Copy link
Contributor

Serhiy-Shekhovtsov commented Sep 7, 2017

I'd like to work on this one.

P.S. btw, is it a good idea to tell when you are working on something?

@Serhiy-Shekhovtsov

This comment has been minimized.

Copy link
Contributor

Serhiy-Shekhovtsov commented Sep 17, 2017

Hi @reubano! I've fixed few issues on pre-processing part and managed to have it running successfully and giving the same result as original code. These changes have been pushed now.
Now I am looking into detection part, but encountered this issue:

Torch: not enough memory: you tried to allocate 21GB

Does it really take this much memory or is it an issue with environment \ code?

@reubano

This comment has been minimized.

Copy link
Contributor

reubano commented Sep 18, 2017

@Serhiy-Shekhovtsov

Hi @reubano! I've fixed few issues on pre-processing part and managed to have it running successfully and giving the same result as original code.

That's great! Nice job!

Does it really take this much memory or is it an issue with environment \ code?

I'm afraid i don't know the answer to that one. Do you get the same error when running the base repo?

@dchansen

This comment has been minimized.

Copy link
Contributor

dchansen commented Sep 19, 2017

I have the full model adapted for Python3 in the pull request at #122

The Nodule identification part is extremely memory hungry. Be sure to enable the volatile setting on the Variables, and reduce the chunk size in the SplitComb.

@Serhiy-Shekhovtsov

This comment has been minimized.

Copy link
Contributor

Serhiy-Shekhovtsov commented Sep 19, 2017

Do you get the same error when running the base repo?

Yes.

I have the full model adapted for Python3 in the pull request at #122

Great!

@reubano

This comment has been minimized.

Copy link
Contributor

reubano commented Sep 20, 2017

@Serhiy-Shekhovtsov did @dchansen recommendations help you out?

@Serhiy-Shekhovtsov

This comment has been minimized.

Copy link
Contributor

Serhiy-Shekhovtsov commented Sep 20, 2017

I didn't check it yet.

@Serhiy-Shekhovtsov

This comment has been minimized.

Copy link
Contributor

Serhiy-Shekhovtsov commented Sep 22, 2017

@dchansen I am getting this error when trying to checkout your repo:

$ git-lfs.exe smudge -- prediction/src/algorithms/classify/assets/gtr123_model.ckpt
Error downloading object: prediction/src/algorithms/classify/assets/gtr123_model.ckpt (eb2c037dd55e2f49da95657f7a7851cbcfe2f2b516848ed03f8c5c820f3e16b4)

Smudge error: Error downloading prediction/src/algorithms/classify/assets/gtr123_model.ckpt (eb2c037dd55e2f49da95657f7a7851cbcfe2f2b516848ed03f8c5c820f3e16b4): [eb2c037dd55e2f49da95657f7a7851cbcfe2f2b516848ed03f8c5c820f3e16b4] Object does not exist on the server: [404] Object does not exist on the server

Any ideas what is it and how to solve it?

@dchansen

This comment has been minimized.

Copy link
Contributor

dchansen commented Sep 22, 2017

It was an error on my side. It's fixed in the repo now.

@Serhiy-Shekhovtsov

This comment has been minimized.

Copy link
Contributor

Serhiy-Shekhovtsov commented Sep 22, 2017

Thanks, now I can pull it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment