
Auto upload patch #369

Merged
merged 29 commits into master from auto-upload-patch on Jan 27, 2021

Conversation

@ackagel (Contributor) commented Dec 12, 2020

Purpose

Replaces kiosk-client with direct calls to the DeepCell API. This bypasses the kiosk-client upload bug described in issue #302, so smaller zip files don't need to be created; we can just upload one. Consequently, closes #302.

Implementation

Replaces kiosk-client usage in deepcell_service_utils.py with direct calls to DeepCell's API. This more closely mimics direct user interaction with deepcell.org.
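
As a rough illustration of the approach, a minimal sketch of a direct upload call; the endpoint path, form fields, and response handling below are assumptions for illustration, not necessarily what deepcell_service_utils.py actually does:

import requests

def upload_zip(zip_path, host='https://deepcell.org', timeout=300):
    """Upload a zip of input TIFs straight to the DeepCell server,
    roughly the way the deepcell.org web form would.
    """
    with open(zip_path, 'rb') as f:
        # multipart/form-data POST; '/api/upload' is a hypothetical path
        response = requests.post(
            '{}/api/upload'.format(host),
            files={'file': ('input.zip', f, 'application/zip')},
            timeout=timeout,
        )
    # surface HTTP errors (e.g. rejected uploads) instead of failing silently
    response.raise_for_status()
    return response.json()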

Remaining issues

We might want to check with the Van Valen lab on how stable their DeepCell API will be.

@ackagel (Contributor, Author) commented Jan 4, 2021

Tested a .zip upload of ~183MB. The upload itself looks fine, but it seems DeepCell can time itself out when the upload is too big (this happens on the website too). That looks something like:

Traceback (most recent call last):
  File "/usr/src/app/redis_consumer/consumers/base_consumer.py", line 202, in consume
    status = self._consume(redis_hash)
  File "/usr/src/app/redis_consumer/consumers/multiplex_consumer.py", line 124, in _consume
    image = self.predict(image, model_name, model_version)
  File "/usr/src/app/redis_consumer/consumers/base_consumer.py", line 614, in predict
    model_dtype, untile=untile)
  File "/usr/src/app/redis_consumer/consumers/base_consumer.py", line 464, in _predict_big_image
    in_tensor_dtype=model_dtype)
  File "/usr/src/app/redis_consumer/consumers/base_consumer.py", line 311, in grpc_image
    prediction = client.predict(req_data, settings.GRPC_TIMEOUT)
  File "/usr/src/app/redis_consumer/grpc_clients.py", line 185, in predict
    response = self._retry_grpc(request, request_timeout)
  File "/usr/src/app/redis_consumer/grpc_clients.py", line 134, in _retry_grpc
    raise err
  File "/usr/src/app/redis_consumer/grpc_clients.py", line 123, in _retry_grpc
    response = api_call(request, timeout=request_timeout)
  File "/usr/local/lib/python3.6/site-packages/grpc/_channel.py", line 826, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.6/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
  status = StatusCode.DEADLINE_EXCEEDED
  details = "Deadline Exceeded"
  debug_error_string = "{"created":"@1609789741.491197858","description":"Error received from peer ipv4:10.43.253.0:8500","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Deadline Exceeded","grpc_status":4}"
>

When this type of error happens, some of the files are processed and others are not, so one possible fix would be to re-upload the remaining files until all the expected files exist, or to just batch from the start with smaller zip files.

The only catch with batching is that this error doesn't seem to be directly 'size' related so much as 'difficulty-of-segmentation' related, which means there's no obvious batch size/number.

That being said, if DeepCell fixes or extends the Redis timeouts, then this isn't really an issue, since the uploading itself works fine.
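
To make the batching idea concrete, a quick sketch of splitting the FOV list into fixed-size chunks, each of which would become its own zip (batch_fovs is an illustrative name, not a function from this PR):

def batch_fovs(fovs, zip_size=100):
    """Yield successive batches of at most zip_size FOV names."""
    for i in range(0, len(fovs), zip_size):
        yield fovs[i:i + zip_size]

# e.g. 210 FOVs with zip_size=100 -> batches of 100, 100, and 10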

@ngreenwald (Member)

What do you mean it's difficulty of segmentation related?

@ackagel (Contributor, Author) commented Jan 4, 2021

I suppose that's more of a guess, but basically: on Tyler's DCIS data (51MB), less than half of the images were processed, while on the duplicated 'fov8.tif' template (183MB), more than 3/4 of the images were processed. It could be something else, but it looks like a problem of how long the servers take to process each image. We could ask Will if there's any reason not to bump up the default Redis timeout (settings.GRPC_TIMEOUT), since that would fix the problem. It looks like it's currently set to 30s.

@ngreenwald (Member)

Yes. Will you copy the error message and open an issue in the kiosk-redis-consumer repo?

@alex-l-kong (Contributor) left a comment

Not much to add on top of Noah's comments. Looks good so far!

@ackagel (Contributor, Author) commented Jan 9, 2021

@ngreenwald same pytest/testbook bug on this PR. Other than that, it's good for review.

@ngreenwald (Member) left a comment

Just one commented line left. Did the timeout issue get resolved with Redis?

@@ -86,7 +94,8 @@ def create_deepcell_output(deepcell_input_dir, deepcell_output_dir, fovs=None,

# pass the zip file to deepcell.org
print('Uploading files to DeepCell server.')
run_deepcell_task(zip_path, deepcell_output_dir, host, job_type, scale)
# run_deepcell_task(zip_path, deepcell_output_dir, host, job_type, scale)
@ngreenwald (Member):

Can remove the commented out line

@ackagel (Contributor, Author) commented Jan 11, 2021

Not sure if the timeout has been increased yet, but I'll test it today.

@ngreenwald (Member) left a comment

Looks good, just some simple tests for the new function.

If you could, also test it yourself by actually uploading to deepcell.org with real data, to make sure the right error message is raised when you supply bad data.

@@ -7,7 +7,7 @@
from ark.utils.deepcell_service_utils import create_deepcell_output


-def mocked_run_deepcell(input_dir, output_dir, host, job_type, scale):
+def mocked_run_deepcell(input_dir, output_dir, host, job_type, scale, timeout):
@ngreenwald (Member):

Can add some simple checks for _zip_run_extract, e.g. for 210 images, make sure the first zip has 100 images, the second has 100, and the third has 10. Make sure the first zip contains the first 100 images. Make sure changing zip_size actually changes the number of images included.

Can change the default to something like 5 or 2 to make the previous tests faster.
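
A sketch of those checks, assuming a hypothetical zip_input_files(input_dir, fovs, zip_size) helper that zips the inputs in batches and returns the zip paths it created (the PR's actual helper is _zip_run_extract; names here are illustrative):

import os
import pathlib
import tempfile
import zipfile

def check_batching(zip_input_files):
    with tempfile.TemporaryDirectory() as input_dir:
        fovs = ['fov{}.tif'.format(i) for i in range(210)]
        for fov in fovs:
            pathlib.Path(os.path.join(input_dir, fov)).touch()

        zips = zip_input_files(input_dir, fovs, zip_size=100)

        # 210 images with zip_size=100 -> zips of 100, 100, and 10
        counts = [len(zipfile.ZipFile(z).namelist()) for z in zips]
        assert counts == [100, 100, 10]

        # the first zip should contain exactly the first 100 images
        assert sorted(zipfile.ZipFile(zips[0]).namelist()) == sorted(fovs[:100])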

@alex-l-kong (Contributor) left a comment

Nothing else to add on top of Noah's comments.

@ackagel (Contributor, Author) commented Jan 26, 2021

Looks like I'm getting the 'correct' error messages from DeepCell for bad data.

@ngreenwald (Member) left a comment

Everything looks good. I'm fine with merging the PR with reduced coverage, since it's mostly error messages from Deepcell.org that we're catching. The one thing that isn't tested, and which I think is worth testing, is the parallel execution option.

In addition to adding a test, can you also confirm locally that uploading 3 zips of 50 FOVs each simultaneously doesn't lead to any issues with Deepcell.org?
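
For reference, a sketch of a thread-based parallel upload along the lines the commit history below describes (threads rather than processes, since the upload task isn't picklable); the run_deepcell_task signature with a timeout argument is assumed from the mocked version above:

from concurrent.futures import ThreadPoolExecutor

def upload_in_parallel(run_deepcell_task, zip_paths, output_dir, host,
                       job_type, scale, timeout):
    with ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(run_deepcell_task, zip_path, output_dir,
                            host, job_type, scale, timeout)
            for zip_path in zip_paths
        ]
        # future.result() re-raises any exception from a worker thread
        return [future.result() for future in futures]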

@ngreenwald (Member) left a comment

Nice, almost there

assert os.path.exists(os.path.join(output_dir, 'example_output_2.zip'))

os.remove(os.path.join(output_dir, 'fov1_feature_0.tif'))
os.remove(os.path.join(output_dir, 'fov2_feature_0.tif'))
@ngreenwald (Member):

This would be cleaner if we made a new temp folder rather than deleting the files we've already created. Can either make output1, output2, output3 for each part of the test, or use a context manager with a temp dir, as sketched below.
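
A sketch of the context-manager version (create_deepcell_output and input_dir as in the surrounding test):

import tempfile

with tempfile.TemporaryDirectory() as output_dir:
    create_deepcell_output(deepcell_input_dir=input_dir,
                           deepcell_output_dir=output_dir, fovs=['fov1'])
    # everything written to output_dir is removed when the block exits,
    # so no manual os.remove calls are needed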

with pytest.warns(UserWarning):
    create_deepcell_output(deepcell_input_dir=input_dir, deepcell_output_dir=output_dir,
                           fovs=['fov1'])
pathlib.Path(os.path.join(input_dir, 'fov4.tif')).touch()
@ngreenwald (Member):

Is there a specific warning that we're making sure is raised here, or is this just so that we now have 4 TIFs for the batch size checks that come next?

@ackagel (Contributor, Author):

The last pytest.warns(UserWarning) is redundantly checking the 'overwriting file' warning; it could definitely be removed if needed. The extra FOV is created for the batch-size checking later.

@ngreenwald merged commit 0c9c5ba into master on Jan 27, 2021
@ngreenwald deleted the auto-upload-patch branch on January 27, 2021 at 22:21
@ackagel mentioned this pull request on Feb 8, 2021
y2kbugger pushed a commit that referenced this pull request Jul 29, 2021
* adds alternate deepcell call

* adds docstring to new deepcell api caller

* redirects mock in deepcell service testing

* adds argument to deepcell service mocker

* documentation fix

* sanitizes progressbar updates

* upgrades tqdm to notebook version

* testing readout from redis response

* decodes url error text

* allows failures and better failure printing

* sends expire signal to redis (out of kindness to deepcell's servers)

* extends timeout on file upload

* empty commit for travis testing

* removes commented function call

* removes kiosk-client deepcell upload function

* distributes large zips for deepcell into smaller batches

* fixes error with range-int mult

* fixes zip auto-batching and connects zip_size argument

* parallelizes deepcell uploads

* changes parallelization strategy from processes to threads (due to non-picklability)

* makes parallelization optional

* converts map object to list to run processes

* removes kiosk-client requirement

* adds tests to validate zip batching, and corrects batching in 'no-overflow' case

* adds test for parallel uploads

* prevents race condition on file extraction

* replaces manual file deletion with context management in deepcell unit tests

* removes redundant warning context wrapper in deepcell upload batching test
Linked issue: Create multiple, smaller zip files to avoid timeout errors (#302)