Intake integration? #86

Closed
jsignell opened this issue Dec 11, 2018 · 4 comments

Comments

@jsignell
Member

I was just starting to work on a PIL plugin for intake to support the handling of image stacks. I was hoping to make something that accepts, at a minimum, paths, file objects, URLs, or S3 locations as input. The output, I think, would be xarray objects backed by dask arrays.

Do you think this project is sturdy enough to be built on top of in that way or is it moving too quickly?

@jsignell
Member Author

Closing this since I just found out about dask.array.image.imread and plan to use that instead.

@jakirkham
Member

Sorry to be slow here, @jsignell.

My guess is you are running into soft-matter/pims#310, though it would be great if you could confirm.

The imread implementation here is a bit more efficient when it comes to reading data from the filesystem. Dask needs to know the shape and type of each image up front, and then it needs to actually load each image later when the computation occurs. IME this was slow because Dask was loading the full image into memory just to get this metadata, which we avoid here. I would imagine this is not great when it comes to making HTTP requests either. I expect what we would really want is the ability to cache the content retrieved from those URLs to make things a bit more performant. I could be wrong about this though, and would be curious to hear your thoughts.
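
To make the two ideas above concrete (reading shape and type from the file header instead of decoding the whole image, and caching fetched bytes so a metadata pass and a later load pass share one retrieval), here is a minimal standard-library sketch. It is not how this project, PIMS, or Dask implements anything; `png_header_info`, `CachingFetcher`, and `make_gray_png` are hypothetical names, and PNG is used only because its header layout is simple.

```python
import io
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_header_info(fileobj):
    """Return (height, width, bit_depth, color_type) from the PNG IHDR chunk.

    Only the first 26 bytes are read; the pixel data is never decoded.
    """
    header = fileobj.read(26)
    if len(header) < 26 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG stream")
    # Bytes 16-25: width (4), height (4), bit depth (1), color type (1).
    width, height, bit_depth, color_type = struct.unpack(">IIBB", header[16:26])
    return height, width, bit_depth, color_type


class CachingFetcher:
    """Wrap a fetch(url) -> bytes callable so each URL is retrieved once.

    Hypothetical helper: an in-memory dict stands in for a real cache.
    """

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}

    def __call__(self, url):
        if url not in self._cache:
            self._cache[url] = self._fetch(url)
        return self._cache[url]


def make_gray_png(width, height):
    """Build a tiny all-black 8-bit grayscale PNG in memory (demo data only)."""

    def chunk(kind, data):
        crc = zlib.crc32(kind + data) & 0xFFFFFFFF
        return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)

    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is one filter byte (0) followed by `width` zero pixels.
    rows = b"".join(b"\x00" + bytes(width) for _ in range(height))
    return (PNG_SIGNATURE + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", zlib.compress(rows)) + chunk(b"IEND", b""))


if __name__ == "__main__":
    # The lambda is a stand-in for an HTTP GET; no network access happens here.
    fetch = CachingFetcher(lambda url: make_gray_png(7, 5))
    url = "http://example.invalid/img.png"
    print(png_header_info(io.BytesIO(fetch(url))))  # metadata pass
    print(len(fetch(url)))  # later load pass, served from the cache
```

For remote files, a real version would likely want a ranged request or an on-disk cache rather than holding every payload in memory; that choice is left open here.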

@jsignell
Member Author

Sorry if I was unclear. I hadn't run into any issues yet; I just wanted to check the status of this project. In the case I am looking at, performance isn't that important. The main thing we are interested in is preserving the image labels that we get from the file names. I ended up writing a new wrapper around scikit-image's skimage.io.imread so that we can pass OpenFile objects into imread. This allows reading files from lots of different sources.
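
The wrapper itself is not shown in this thread, so the following is only a sketch of the general shape such a wrapper could take, not the actual code. It assumes each input behaves like an OpenFile: it has a `.path` attribute and works as a context manager that yields a file-like object. The names `label_from_path` and `read_labeled` are hypothetical, and the reader is passed in as a parameter rather than hard-coded.

```python
import posixpath


def label_from_path(path):
    """Use the file name, without directory or extension, as the label."""
    name = posixpath.basename(path.rstrip("/"))
    return posixpath.splitext(name)[0]


def read_labeled(open_files, imread):
    """Read every file with `imread` and key the results by file-name label.

    `open_files`: iterable of OpenFile-like objects (assumption: each has a
    `.path` string and is a context manager yielding a file-like object).
    `imread`: any callable that accepts a file-like object.
    """
    labeled = {}
    for open_file in open_files:
        with open_file as f:
            labeled[label_from_path(open_file.path)] = imread(f)
    return labeled
```

With fsspec installed, something like `fsspec.open_files("s3://<bucket>/<prefix>/*.png", mode="rb")` should produce suitable inputs, and `skimage.io.imread` could be passed as the reader. Whether a given reader accepts file-like objects depends on the reader and its plugin, so treat that pairing as untested here.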

@manugarri

@jsignell I am experiencing a somewhat similar issue (reading images in bulk from an S3 bucket to perform image classification). Would you mind sharing your wrapper?

3 participants