Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deepcell task #199

Merged
merged 40 commits into from
Sep 28, 2020
Merged

Deepcell task #199

merged 40 commits into from
Sep 28, 2020

Conversation

omerbt
Copy link
Contributor

@omerbt omerbt commented Aug 25, 2020

This branch includes a new method for automating the process of uploading files to DeepCell, downloading and extracting the output .tif image.

Implementation is based on relevant modules from https://github.com/vanvalenlab/kiosk-client.

According to the current implementation of the kiosk-client, the DeepCell output will be downloaded to a predetermined folder (/downloads). Ideally, the path should be a parameter.

@omerbt omerbt requested a review from ngreenwald August 25, 2020 08:07
@omerbt omerbt marked this pull request as draft August 25, 2020 14:03
Copy link
Member

@ngreenwald ngreenwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Omer, great start! The build is failing because of a formatting issue, you can just add a blank line at the end of the file and you should be set.

Could you please update the description of the PR with the indicated information?

@omerbt omerbt marked this pull request as ready for review August 25, 2020 18:44
@omerbt
Copy link
Contributor Author

omerbt commented Aug 25, 2020

Hey, thanks. I fixed the indentation so it be compliant with PEP and added a short (is it enough?) description .

Copy link
Member

@ngreenwald ngreenwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, just a quick question about the requirements.txt file, which looks like it has some unused functions in it.

Given that you're not actually in our lab, no pressure to do any more than this, it will already be super helpful.

If you're interested in working some more on this, let me know. I think some nice additions would be creating a top-level function that creates a directory for each Point, calls this function with that directory as the output, and then renames the output appropriately. This would avoid having to get the last modified file. It would also be good to add some testing to make sure this works as expected. If you have other stuff you're working on, no worries, we can handle that part. But if you're interested in fleshing this out a bit more, let me know.

Thanks!

requirements.txt Outdated
certifi
pyOpenSSL
service_identity
git+https://github.com/vanvalenlab/kiosk-client@master
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the client is pip installable: can you try adding kiosk_client to the requirements.txt file?

It doesn't look like certifi, pyOpenSSL, or service_identify are imported anywhere, is there a reason these are required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be happy to add more methods.
regarding the requirements - they are required in order to be able to upload files using twisted and for the tls certificate (even tough they are not explicitly called in the method). I'm changing the git dependency to pip installation.
Thanks!

@ngreenwald
Copy link
Member

Okay great! I opened an issue in the kiosk-client to see if there's a reason Will set it up this way. If not, I think it would be pretty easy to submit a PR allowing us to specify an output path.

Once we get an answer on that, we'll know how to move forward. I think we'll want to create a function like create_deepcell_output(). This function should handle all of the necessary data manipulation for running deepcell. It should expect a path to a folder of already formatted images, call the helper function you've already written on each of those images, and then format the output of the helper function appropriately.

Once that's set up, we can update the jupyter notebook to use this new function

@ngreenwald
Copy link
Member

Okay, so Will said the reason they never added that option was just because they didn't get to it. I think a PR to address that would be great. If you'd like to take that on, let me know, otherwise Alex or Adam can put that on their todo list.

What that means is that the high level function can expect the files to be named appropriately. This will simplify things further.

Let me know what stuff you'd like to get started on

@omerbt
Copy link
Contributor Author

omerbt commented Aug 26, 2020

Great, so I can start with that PR and then move on to create_deepcell_output()

@ngreenwald
Copy link
Member

Perfect, sounds great. If you don't have write access to kiosk-client you fork the repo and submit your PR from your branch, or ask Will to give you write access.

If you run into any issues feel free to leave a comment on the issue I opened in that repo and Will or I can respond.

@omerbt
Copy link
Contributor Author

omerbt commented Aug 26, 2020

Great thanks!

@ngreenwald ngreenwald self-requested a review September 12, 2020 19:45
@alex-l-kong
Copy link
Contributor

alex-l-kong commented Sep 25, 2020

What's the need for RTD to have access to all of the random dependencies in the project?

It's an annoying part of working with RTD and specifically, building documentation from docstrings. When snaking through the documentation, I think autodoc, which builds the documentation, first looks through the import statements and complains immediately if it can't import it properly (which can only be guaranteed by mocking it up in autodoc_mock_imports). Part of the reason I think it does this is to properly handle a documentation arg-list value or return type of, say, np.ndarray or pd.DataFrame. This is especially the case if you, for example, use something like intersphinx_mapping to directly link to NumPy or Pandas documentation, which we do.

So in Omer's case, when it sees a library such as kiosk_client, I think what's happening is that it tries to import it first so it can properly build any documentation that potentially includes an argument or return type of kiosk_client in it. If it can't import it, it fails before it tries to build potentially bad documentation with potential non-existent references to kiosk_client.

@alex-l-kong
Copy link
Contributor

@ngreenwald I'm guessing one of your concerns is that developers have to know to add to docs/conf.py, which requires them to have had some experience working with Sphinx/RTD in the past.

I've been looking into a way to somehow tell RTD to do a pip install -r requirements.txt on their end so they don't have to rely on autodoc_mock_imports, but my leads have come up dry for now. It's for another PR, for sure.

@ngreenwald
Copy link
Member

My main concern is that kiosk-client has a ton of dependencies that we don't actually care about. So mocking it directly sounds like the best approach since there isn't anything in the documentation that relies on us referencing a class or something from there.

Thanks!

Copy link
Member

@ngreenwald ngreenwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked to Will, it turns out twisted is a requirement of treq, which is a requirement for kiosk-client.

Can you just double check (by commenting out one at a time and rebuilding the environment) that each of the current requirements are actually needed? If so, we can of course add them, but I'd like to avoid adding unnecessary requirements if possible if they're getting installed via the kiosk.

Copy link
Contributor

@alex-l-kong alex-l-kong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks awesome Omer! I had some brief comments about the documentation, testing, and requirements.txt.

requirements.txt Outdated
Comment on lines 17 to 19
Twisted~=20.3.0
pyOpenSSL
service_identity
Copy link
Contributor

@alex-l-kong alex-l-kong Sep 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realize this is basically a piggyback off of Noah's comment now, sorry! I wonder if simply including kiosk-client will do the trick. Looking at setup.py of the kiosk-client repo, it looks like it's configured to install treq on a pip install, meaning that if all goes well, we might be able to remove at least twisted, if not also pyOpenSSL and service_identity (assuming treq installs at least twisted as well).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it does! I only checked the requirements file itself (of kiosk-client), but as you and @ngreenwald said, kiosk-client installs all the necessary modules. thanks

warnings.warn(f'Deep Cell output file was not found for {point}.')


def run_deepcell_task(input_dir, output_dir, host='https://deepcell.org',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, we don't need a test function for this one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't think of any testing that would actually be useful. We'd have to mock basically the entire function, and at that point I'm not sure what we're actually testing. Open to suggestions if you think there's functionality worth testing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I didn't think there was any additional testing we needed for this since its basically a driver function. Don't think we need to add a test function for this.

Default: 'multiplex'
"""

# more configuration parameters can be set. https://github.com/vanvalenlab/kiosk-client
Copy link
Contributor

@alex-l-kong alex-l-kong Sep 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to put this in the docstring description (before the Args list) so people can reference it on ReadTheDocs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated it in the last commit

@ngreenwald
Copy link
Member

Hey Omer, this looks great! I'll wait until Adam takes a look before approving in case he has any additional suggestions.

If you'd like to keep contributing to the repository, I've created an issue outlining the next steps for a subsequent PR. If you've got other stuff to work on, no worries at all, one of us can tackle it. Just let me know what works for you!

Here's the issue: #243

@alex-l-kong
Copy link
Contributor

Ditto on Noah's comment. It looks good to me, let's see what Adam has to say.

Copy link
Contributor

@ackagel ackagel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fantastic! Just one suggestion for the points argument in create_deepcell_output.

Comment on lines 18 to 19
points (list):
List of points in preprocessing pipeline
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an option to set points=None so that all .tif files in the input directory can be read in would be pretty useful. We have some tools written to help with this. If you import ark.utils.io_utils, you could use something like:

if points is None:
    tifs = io_utils.list_files(deepcell_input_dir, substrs='.tif')
    points = io_utils.extract_delimited_names(tifs, delimiter='.')

That should plug directly into the code you've already written, assuming I didn't make any typos, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added in the last commit

@omerbt
Copy link
Contributor Author

omerbt commented Sep 28, 2020

@ngreenwald thanks! sure, I'll take it.

Copy link
Contributor

@alex-l-kong alex-l-kong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Omer, this looks great! I had one comment about your docstring for create_deepcell_output.

Comment on lines 19 to 35
points (list):
List of points in preprocessing pipeline. if None, all .tif files
in deepcell_input_dir will be considered as input points.
deepcell_input_dir (str):
Location of preprocessed files (assume deepcell_input_dir contains <point>.tif
for each point in points list)
suffix (str):
Suffix for DeepCell output filename. e.g. for pointX, DeepCell output
should be <pointX>+suffix.tif. Default: '_feature_0'
deepcell_output_dir (str):
Location to save DeepCell output (as .tif)
host (str):
Hostname and port for the kiosk-frontend API server
Default: 'https://deepcell.org'
job_type (str):
Name of job workflow (multiplex, segmentation, tracking)
Default: 'multiplex'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should put your args list in the order you specified your parameters in the function (so, deepcell_input_dir, deepcell_output_dir, points, etc.).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in the last commit

@alex-l-kong alex-l-kong self-requested a review September 28, 2020 17:30
Copy link
Contributor

@alex-l-kong alex-l-kong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, I think it's good to go!

Copy link
Member

@ngreenwald ngreenwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

Copy link
Contributor

@ackagel ackagel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@ngreenwald ngreenwald merged commit 7f8eeba into master Sep 28, 2020
@ngreenwald ngreenwald deleted the deepcell-task branch September 28, 2020 18:57
alex-l-kong pushed a commit that referenced this pull request Jan 14, 2021
* Use kiosk-client modules for automated DeepCell tasks

* Use kiosk-client modules for automated DeepCell tasks

* Use kiosk-client modules for automated DeepCell tasks

* Update deepcell_service_utils.py

* Use kiosk-client modules for automated DeepCell tasks

* Use kiosk-client modules for automated DeepCell tasks

* Use kiosk-client modules for automated DeepCell tasks

* Update deepcell_service_utils.py

* Use kiosk-client modules for automated DeepCell tasks

* Use kiosk-client modules for automated DeepCell tasks

* Update deepcell_service_utils.py

* Use kiosk-client modules for automated DeepCell tasks

* additional requirements for twister certificates

* install kiosk-client using pip

* create_deepcell_output

* create_deepcell_output

* Update requirements.txt

Co-authored-by: Noah Greenwald <noahfgreenwald@gmail.com>

* create_deepcell_output

* changing back to 'kiosk-client'

* basic testing framework

* modified requirements.txt file

* fixed version issue

* input validation + unit tests

* rerun build

* docstrings formatting

* docstrings formatting

* requirements.txt

* requirements.txt

* requirements.txt

* add dependencies to conf.py

* test multiple .tif

* remove unnecessary modules

* remove unnecessary modules

* make points an optional parameter
+ testing

* Docstring parameters reorder

Co-authored-by: Noah Greenwald <noahfgreenwald@gmail.com>
y2kbugger pushed a commit that referenced this pull request Jul 29, 2021
* Use kiosk-client modules for automated DeepCell tasks

* Use kiosk-client modules for automated DeepCell tasks

* Use kiosk-client modules for automated DeepCell tasks

* Update deepcell_service_utils.py

* Use kiosk-client modules for automated DeepCell tasks

* Use kiosk-client modules for automated DeepCell tasks

* Use kiosk-client modules for automated DeepCell tasks

* Update deepcell_service_utils.py

* Use kiosk-client modules for automated DeepCell tasks

* Use kiosk-client modules for automated DeepCell tasks

* Update deepcell_service_utils.py

* Use kiosk-client modules for automated DeepCell tasks

* additional requirements for twister certificates

* install kiosk-client using pip

* create_deepcell_output

* create_deepcell_output

* Update requirements.txt

Co-authored-by: Noah Greenwald <noahfgreenwald@gmail.com>

* create_deepcell_output

* changing back to 'kiosk-client'

* basic testing framework

* modified requirements.txt file

* fixed version issue

* input validation + unit tests

* rerun build

* docstrings formatting

* docstrings formatting

* requirements.txt

* requirements.txt

* requirements.txt

* add dependencies to conf.py

* test multiple .tif

* remove unnecessary modules

* remove unnecessary modules

* make points an optional parameter
+ testing

* Docstring parameters reorder

Co-authored-by: Noah Greenwald <noahfgreenwald@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants