Endpoint/Models for "samples/examples" #2
The GDC portal is a good example of a friendly user interface and a good starting point for describing what a sample selector for this type of data should be able to do. For our purposes, however, I think we will need our interface to communicate with the …
I created a quick class diagram of the items that I think have been sufficiently specified, based on discussions thus far, to start implementation of the models [Samples, Genes, Mutations, Mutation Types]. I went ahead and assumed we'd install django-genes and django-organisms in this project, as that lets us use what is already there. At least django-genes will need a REST API, but it already provides an Elasticsearch index that will be useful for finding the right gene when a user types an identifier. I'll create a pull request with the XML form from draw.io that we can edit.
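For concreteness, here is a minimal sketch of how those models might relate in Django. The field names are placeholders rather than the agreed schema, and it assumes django-genes exposes a Gene model that Mutation can reference:

```python
# Illustrative sketch only: field names are placeholders, not the agreed schema.
# Assumes django-genes is installed and provides a Gene model at genes.models.Gene.
from django.db import models

from genes.models import Gene


class Sample(models.Model):
    sample_id = models.TextField(unique=True)   # e.g. a TCGA sample barcode
    disease = models.TextField(blank=True)      # disease the sample came from


class MutationType(models.Model):
    name = models.TextField(unique=True)        # e.g. "missense", "nonsense"


class Mutation(models.Model):
    # A mutation joins a sample to a gene, with a type.
    sample = models.ForeignKey(Sample, on_delete=models.CASCADE, related_name='mutations')
    gene = models.ForeignKey(Gene, on_delete=models.CASCADE, related_name='mutations')
    mutation_type = models.ForeignKey(MutationType, on_delete=models.CASCADE)

    class Meta:
        unique_together = ('sample', 'gene', 'mutation_type')
```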
@cgreene This is great! Would you be able to indicate the data type for each field and, if a field is an enumeration, what the potential values could be? I assume an auto-incrementing integer id PK on each model. I would also recommend created_at and updated_at fields on each. We may also want to consider disallowing deletes, or using soft deletes via a deleted_at field.
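One common way to get those bookkeeping fields on every model is an abstract base class. This is just a sketch of the idea; the BaseModel name and soft_delete helper are not from the actual codebase:

```python
# Sketch of the suggested bookkeeping fields as an abstract base class.
# Django already adds an auto-incrementing integer `id` primary key by default.
from django.db import models
from django.utils import timezone


class BaseModel(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)
    deleted_at = models.DateTimeField(null=True, blank=True)  # null means "not deleted"

    class Meta:
        abstract = True

    def soft_delete(self):
        # Instead of deleting the row, record when it was "deleted".
        self.deleted_at = timezone.now()
        self.save(update_fields=['deleted_at'])
```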
Will fill in what I can, but we'll probably need the cognoma/cancer-data team to chime in. This generally uses text, unless I'm absolutely convinced that an enum or a more complex approach makes sense. The cancer-data team needs to fill some of these in (like age at diagnosis: I made it an integer, but I'm not sure it actually is one in the data). Sample:
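A hedged sketch of what those Sample fields might look like under the "text unless convinced otherwise" convention; the field names are guesses pending input from cognoma/cancer-data:

```python
# Guesses only: text fields by default, with age at diagnosis as the one
# integer so far (nullable, since it may not actually be an integer in the data).
from django.db import models


class Sample(models.Model):
    sample_id = models.TextField(unique=True)
    disease = models.TextField(blank=True)
    gender = models.TextField(blank=True)
    age_diagnosed = models.IntegerField(null=True, blank=True)
```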
Here is the ultra stripped-down version requested by @aelkner at the meetup last night.
Commits referencing this issue (squash merges, including #79):

* Task Integration (#1)
* Testing CircleCI
* Testing new AWS IAM credentials/permissions
* Increased deployment timeout threshold (#64): increased the ecs_deploy timeout threshold to 180 seconds. The most recent deploy took about 120 seconds, but the previous threshold for the ecs_deploy script timeout was 90 seconds, which caused builds on CircleCI to fail with the red X - not a good look. 180 seconds plays it on the safe side without being too long.
* Create task after classifier
* Task creation working
* Expand task on GET
* Expand on classifier POST
* Starting task creation tests
* Expanding task(s)
* Fixed tests: the unique ID of a task was causing problems in the testing environment when integrating with a task-service container. This is resolved by detecting whether tests are running and generating a random unique ID in that case.
* Forgot to import Gene
* Fixed task-def creation request data, in accordance with commits made in the task-service repository
* Added endpoint for completed notebook upload to classifier (for ml-worker). When ml-worker finishes running a notebook, it needs to upload the completed notebook to core-service so that core-service can email the user a link to download it. This adds:
  - a new authentication permission designed to only allow an internal service to upload a notebook
  - a notebook_file attribute on the Classifier model and serializer
  - rudimentary file storage logic (stores files locally under /media_files/notebook/classifier_<id>.ipynb; no S3 integration yet)
  - tests for notebook uploads, which include uploading a real notebook
  - a test runner that deletes the media_files directory after testing, since any test that creates a file otherwise leaves the directory on your filesystem
* Make classifiers write-once only: for now, we will assume classifiers cannot be updated.
* Email & S3 (#2)
  - Added sending email upon notebook upload
  - Added S3 integration with django-storages, following this guide: https://www.caktusgroup.com/blog/2014/11/10/Using-Amazon-S3-to-store-your-Django-sites-static-and-media-files/
* Fixed issues with sending email
* Consolidated task-service and core-service: all of the task-service functionality is now ported over into core-service, including queueing, serialization, views, etc. All of the relevant columns that used to be stored on Task and TaskDef objects in task-service are now stored directly on the classifier in core-service. This greatly simplifies Cognoma's overall architecture and codebase.
* Converted flags to long args in circle.yml
* Quick bug fixes: forgot to remove references to the serializer after a changed import statement; the test runner will no longer error if no media files were created.
* Added an http -> https redirect in nginx
* Fixed & updated email sending: the upload request would fail if the user had no email registered, so fail_silently=True was added to bypass this failure; the email message now includes a link to the nbviewer website; simplified the MLWorkers permission logic.
* User/Classifier security enhancements and /genes/ pagination:
  - /users/ only provides access to create users; the endpoint no longer returns a list of users.
  - Access to /users/<id>/ is only given to users accessing themselves and to internal services; users and anonymous users cannot access other users.
  - /classifiers/ only provides access to create classifiers; the endpoint no longer returns a list of classifiers.
  - Accessing the /genes/ endpoint previously slowed the server down because it had to process 100 genes and their mutations; the pagination size is lowered to 10, which speeds things up significantly.
* Removed the unnecessary UniqueTaskConflict, which is no longer used after the core-service/task-service consolidation.
* Added an email message for classifier processing failure
* Hotfix: was trying to access Classifier object information directly on the serializer, which won't work.
* Commented out genes endpoints and tests; they appear unneeded at the moment.
* Commented out mutations endpoint; appears unneeded for now.
* Updated mutations test status codes
* Forgot to comment out extraneous test assertions for mutation endpoints
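To make a couple of the ideas in those commits concrete, here is a hedged sketch of an internal-service-only permission and a smaller default page size. The class name, the is_internal_service flag, and the settings shown are illustrative assumptions, not necessarily what core-service actually uses:

```python
# Illustrative sketch: a permission that only lets internal services (e.g.
# ml-worker) hit the notebook-upload endpoint, plus a smaller default page
# size so heavy endpoints like /genes/ stay responsive. Names are guesses.
from rest_framework import permissions


class IsInternalService(permissions.BasePermission):
    def has_permission(self, request, view):
        # Assumes the authentication layer marks internal-service credentials
        # with an `is_internal_service` flag on request.auth.
        return bool(getattr(request.auth, 'is_internal_service', False))


# settings.py
REST_FRAMEWORK = {
    'DEFAULT_PAGINATION_CLASS': 'rest_framework.pagination.PageNumberPagination',
    'PAGE_SIZE': 10,
}
```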
Researchers will select which samples they want to include in the analysis; from a machine learning point of view, this means choosing which examples are relevant to the researcher. These samples have various metadata. The GDC Data Portal [ https://gdc-portal.nci.nih.gov/search/s ] has a very nice interface for these metadata. Essentially, the facets on the left for "cases" are the same ones that we would expect to be relevant here.
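As a rough illustration of how those facets could map onto a samples endpoint, here is a sketch using django-filter with Django REST Framework; the model, serializer, and facet names are assumptions rather than a settled API:

```python
# Rough sketch: expose GDC-style "cases" facets as query parameters on a
# /samples/ endpoint. Model, serializer, and facet names are assumptions.
import django_filters
from django_filters.rest_framework import DjangoFilterBackend
from rest_framework import viewsets

from .models import Sample                 # hypothetical Sample model
from .serializers import SampleSerializer  # hypothetical serializer


class SampleFilter(django_filters.FilterSet):
    disease = django_filters.CharFilter(lookup_expr='iexact')
    gender = django_filters.CharFilter(lookup_expr='iexact')
    min_age = django_filters.NumberFilter(field_name='age_diagnosed', lookup_expr='gte')
    max_age = django_filters.NumberFilter(field_name='age_diagnosed', lookup_expr='lte')

    class Meta:
        model = Sample
        fields = ['disease', 'gender', 'min_age', 'max_age']


class SampleViewSet(viewsets.ReadOnlyModelViewSet):
    # e.g. GET /samples/?disease=GBM&min_age=40&max_age=70
    queryset = Sample.objects.all()
    serializer_class = SampleSerializer
    filter_backends = [DjangoFilterBackend]
    filterset_class = SampleFilter
```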