Continuous improvement of nodule classification models (see #2) #131

Open
isms opened this Issue Sep 20, 2017 · 9 comments


isms commented Sep 20, 2017

Overview

We want to continuously improve the accuracy and reliability of models developed for nodule classification. This is a continuation of #2 and will remain open indefinitely.

Note: Substantive contributions are currently eligible for increased point awards.

Design doc reference:
Jobs to be done > Detect and select > Prediction service

Acceptance criteria

  • trained model for classification
  • documentation for the trained model (e.g., cross-validation performance, data used) and instructions for re-training it

NOTE: All PRs must follow the standard PR checklist.

isms added this to the 1-mvp milestone Sep 20, 2017

isms added the POINT-BOUNTY label Sep 20, 2017

isms removed this from the 1-mvp milestone Oct 10, 2017

isms added this to the 2-feature-building milestone Oct 29, 2017

WGierke commented Dec 13, 2017

Maybe we should proceed in small steps, e.g. by first adding evaluation methods that cover most of the metrics introduced in #221, so that we have a more standardized way to compute them. That would also let us quickly assess the quality of the current implementations of the identification, classification and segmentation algorithms, which in turn would make it easier to focus first on the algorithm that is performing worst so far. Any thoughts, @isms @reubano @lamby?
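
For illustration, a minimal sketch of what such a shared evaluation helper could look like; the exact metric set from #221 may differ, and the function name and module are assumptions rather than existing project code:

```python
import numpy as np
from sklearn import metrics


def evaluate_classifier(y_true, y_prob, threshold=0.5):
    """Compute a standard set of binary classification metrics.

    y_true: ground-truth labels (0 or 1); y_prob: predicted probabilities.
    """
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    y_pred = (y_prob >= threshold).astype(int)
    return {
        "accuracy": metrics.accuracy_score(y_true, y_pred),
        "precision": metrics.precision_score(y_true, y_pred),
        "recall": metrics.recall_score(y_true, y_pred),
        "f1": metrics.f1_score(y_true, y_pred),
        "roc_auc": metrics.roc_auc_score(y_true, y_prob),
        "log_loss": metrics.log_loss(y_true, y_prob),
    }
```

Running the same helper against the identification, classification and segmentation outputs would give us comparable numbers to decide where to focus first.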

pjbull commented Dec 22, 2017

For folks who are interested, see the latest announcement here:
https://concepttoclinic.drivendata.org/newsfeed

There are a limited number of AWS credits available for folks to continue to make progress on these algorithms.

WGierke commented Dec 22, 2017

Thanks @pjbull!
I contacted the DrivenData team and asked for credits. If they can provide me with some, I'd like to work on this issue. Once I get their answer, I'll update this comment with the current state for transparency.

Update: I received the credits, but unfortunately I have to wait ~2 weeks for my credit card to be delivered; until then I can't complete the AWS sign-up process. So if someone else wants to start on this issue as well: feel free to do so! :)

Serhiy-Shekhovtsov commented Dec 29, 2017

@WGierke, I have created a virtual machine; support should increase the instance limit soon, and then the machine will be ready to use. I also have plenty of time during the upcoming week, so if you have some ideas, we can work together on this issue and share the points and the fun :)
If this sounds good to you, please contact me on Gitter.

Serhiy-Shekhovtsov commented Jan 2, 2018

@reubano, @pjbull, what @WGierke said here makes perfect sense to me. If scores for those metrics are a required part of improving the models, it would be nice to have a set of standard tests for them.
I am happy to take part in their development. What do you think?
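
For example, a standard test along these lines could pin down the expected behavior (the evaluate_classifier helper and its import path are hypothetical, following the sketch above):

```python
import pytest

# Hypothetical import path for the evaluation helper sketched earlier in this thread.
from prediction.src.algorithms.evaluation import evaluate_classifier


def test_perfectly_separated_predictions():
    y_true = [0, 1, 1, 0]
    y_prob = [0.1, 0.9, 0.8, 0.2]
    scores = evaluate_classifier(y_true, y_prob)
    assert scores["accuracy"] == pytest.approx(1.0)
    assert scores["roc_auc"] == pytest.approx(1.0)
    assert scores["log_loss"] < 0.5
```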

reubano commented Jan 3, 2018

@Serhiy-Shekhovtsov @WGierke sounds good to me. Feel free to create the relevant issues.

WGierke referenced this issue Jan 3, 2018: Evaluation Pipeline for Models #271 (open, 1 of 6 tasks complete)

isms modified the milestones: 2-feature-building, 3-packaging Jan 5, 2018

swarm-ai commented Jan 18, 2018

Hi @reubano, I have been working on retraining the classifier and detector models for better performance. I am planning to document the process for both the detector and classifier models and submit a pull request to the concept-to-clinic clone of the GRT code base here: https://github.com/concept-to-clinic/DSB2017

Will that work? I did not find any training code set up in the concept-to-clinic repo.

reubano commented Jan 18, 2018

@swarm-ai that repo is just for reference. Are you able to incorporate your performance enhancements into the code in this repo? The GRT model has already been included as per #4.

caseyfitz commented Jan 18, 2018

Hi @swarm-ai, improved models would be very welcome; good luck! There is currently no workflow for including training processes in the application codebase, but this is something we'd love to have.

The minimum we currently need to incorporate an improved model is:

  1. the weights, which live in the assets/ subdir for each algorithm
  2. the architecture, which lives in the src/ subdir for each algorithm (so that we can load the trained weights)

With an eye toward future development, we'd also be happy to see a PR that augments the algorithm directories with a training/ subdir (in addition to the current src/ and assets/). A sketch of how these pieces fit together is below.
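
For reference, a minimal sketch of how an architecture in src/ and weights in assets/ typically come together at load time; the module, class, and file names here are illustrative rather than the actual ones in this repo, and PyTorch is just one possible framework:

```python
import torch

# Hypothetical architecture module living in the algorithm's src/ subdir.
from src.classify_model import ClassifierNet


def load_trained_classifier(weights_path="assets/classifier_weights.ckpt"):
    """Instantiate the network defined in src/ and load trained weights from assets/."""
    model = ClassifierNet()
    state_dict = torch.load(weights_path, map_location="cpu")
    model.load_state_dict(state_dict)
    model.eval()  # inference mode for the prediction service
    return model
```

A training/ subdir could then mirror this: scripts that produce a new weights file, plus a short description of the data and cross-validation results, in line with the acceptance criteria at the top of this issue.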
