DICOMfileClient for working with locally stored DICOM Part10 files #56
Thoughts on introducing either a shared abstract base class or protocol (via typing-extensions) to allow for better type hinting? Given they share an API, users may wish to inject an instance of |
+1 for an abstract base class. Could also make sense to use this to provide a python implementation of precomputed dicomweb results in files/buckets as discussed with @chafey. |
Yes, that's exactly the motivation for the file client. Given your preference of using a shared base class or protocol, I assume you favor adding the file client to this package. I see pros and cons for subtyping and structural subtyping (PEP 544). However, I recently got bitten a few times by inheritance in Python and would thus prefer
What should the protocol be called? What about |
Absolutely! I have experimented with this already and it should be straightforward to implement a The only part that's a bit tricky from a performance perspective is reading and indexing DICOM metadata. A |
I am, though I do see the merit in making numpy/pillow opt-in, in particular pillow.
No preference on ABC vs Protocol.
DICOMStore or some variant? |
To avoid creating a significant number of implementations, in particular because there may be interest from Azure and GCP, it may be easier to have a single version of this client and inject a "flavor" argument upon creation. A logical next step, assuming the Part 10 implementation is merged, may also be to consider creating a hybrid interface that accepts two or more clients. Use cases could include query federation and caching of fetched studies. |
Can all this be pre-calculated? If the metadata is big perhaps the client can use range requests to implement overlapped requests for improved performance. |
As an aside @hackermd, it may also be useful to start thinking about module/package organization if the number of implementations will continue to grow. I don't think it's vital now per se, but it may make things easier/cleaner in the future. |
I have considered and explored that initially, but found it messy because the web and file services/protocols are quite different (HTTP versus POSIX) and require different configuration (authentication, authorization, etc.).
In our experience, once the client objects have been created, they behave similarly independent of their type (in the spirit of structural duct-typing). So far, we have been annotating them as |
To be clear, I was only referring to the pre-computed DICOMweb results in blobs that @pieper was referring to. My hunch is that there'd be a high degree of similarity between cloud vendors, though I certainly could be mistaken.
This would be more about ease of use and less about typing. Imagine I'd like to fetch a study, preferentially from a cache if I already have it on disk. It could be implemented as follows:

    web_client = DICOMwebClient(...)
    file_client = DICOMfileClient(...)
    try:
        out = file_client.retrieve_study(...)
    except FileNotFoundError:
        out = web_client.retrieve_study(...)

Doing this repeatedly could become quite verbose, whereas a thin wrapper could make this more ergonomic. |
An earlier version of this client actually used JSON files (formatted according to the DICOM JSON model and structured according to the DICOMweb Query resource definitions), which were stored alongside the DICOM files (similar to DICOMDIR files). This approach works fine for reading, but it doesn't work well for writing ( For now, I prefer keeping the database, which contains the data pointers and indices, separate from the data and not assume the data to be organized in a specific way. The implementation of the database currently is fully abstracted via the The downside of this approach is that every user has to index the data. The upside is that we don't need to standardize the structure of the indices. That is something we could bring to the DICOM working groups, though. |
I like the idea. However, the exact data access logic may be highly application specific (e.g., try to fetch over web first or use a different client depending on the modality) and an abstraction layer may incur significant performance overhead. Therefore, I am not sure whether such wrappers should be provided by the library or rather implemented by applications. |
In favour of including it in this package. For me this seems like a case well suited to structural subtyping (protocols), unless parts of the implementation would also be shared, in which case there may be a case for inheritance. However, wouldn't use of typing.Protocol require Python >= 3.8? No strong opinions on the CPython requirement. I can certainly see the case for avoiding it, though I can't imagine too many use cases where users would be using this package and never touching pixel data. Having numpy and pillow as dependencies may have other advantages too, such as when retrieving rendered frames. |
@hackermd, the way I was envisioning it, the user would provide an ordered sequence of clients, e.g.

    file_client = DICOMfileClient(...)
    web_client = DICOMwebClient(...)
    web_client2 = DICOMwebClient(...)
    # only go to the web if it's not present locally
    caching_client = ComposableDICOMClient([file_client, web_client], sequential=True)
    # parallelize requests in a federated PACS setup
    federated_client = ComposableDICOMClient([web_client, web_client2], sequential=False)
    # and of course, they're composable themselves
    caching_federated_client = ComposableDICOMClient([file_client, federated_client], sequential=True)

If we wanted to get especially crazy, we could add logic (or the ability for the user to insert logic) on how/when the client should write to the cache.
@CPBridge, typing-extensions backports |
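For illustration only, the sequential (cache-first) case of the ordered-client idea could be sketched roughly as follows. `SequentialClient` and `_StubClient` are hypothetical names that do not exist in the package, and the sketch assumes every client signals a miss with `FileNotFoundError`, as in the try/except snippet earlier in this thread. A real web client would more likely raise an HTTP error, which would need its own handling.

```python
from typing import Any, Sequence


class SequentialClient:
    """Hypothetical wrapper that asks each client in order until one succeeds."""

    def __init__(self, clients: Sequence[Any]) -> None:
        if not clients:
            raise ValueError("at least one client is required")
        self._clients = list(clients)

    def retrieve_study(self, study_instance_uid: str) -> Any:
        last_error = None
        for client in self._clients:
            try:
                return client.retrieve_study(study_instance_uid)
            except FileNotFoundError as error:
                # Assumption: a miss is reported as FileNotFoundError.
                last_error = error
        raise FileNotFoundError(study_instance_uid) from last_error


class _StubClient:
    """Stand-in for a real client, used only to exercise the wrapper."""

    def __init__(self, studies: dict) -> None:
        self._studies = dict(studies)

    def retrieve_study(self, study_instance_uid: str) -> Any:
        try:
            return self._studies[study_instance_uid]
        except KeyError:
            raise FileNotFoundError(study_instance_uid) from None


local = _StubClient({"1.2.3": ["local instance"]})
remote = _StubClient({"1.2.3": ["remote instance"], "4.5.6": ["remote only"]})
caching_client = SequentialClient([local, remote])
```

Because the wrapper exposes `retrieve_study` itself, it can be nested inside another wrapper, which is the composability property described above. Parallel (federated) dispatch and write-back to the cache are left out of this sketch.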
@ntenenz I've now implemented the

    from dicomweb_client.api import DICOMwebClient, DICOMfileClient
    from dicomweb_client.protocol import DICOMwebProtocol

    web_client = DICOMwebClient(...)
    assert isinstance(web_client, DICOMwebProtocol)
    file_client = DICOMfileClient(...)
    assert isinstance(file_client, DICOMwebProtocol) |
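As a side note on why `isinstance` checks like the ones above can work without inheritance: a protocol has to be decorated with `runtime_checkable` for that. The sketch below uses a deliberately tiny, hypothetical protocol (`RetrievesStudies`) rather than the package's actual `DICOMwebProtocol`, and falls back to the `typing_extensions` backport on interpreters older than Python 3.8.

```python
import sys

if sys.version_info >= (3, 8):
    from typing import Protocol, runtime_checkable
else:  # older interpreters need the backport
    from typing_extensions import Protocol, runtime_checkable


@runtime_checkable
class RetrievesStudies(Protocol):
    """Hypothetical, deliberately small protocol (not the package's own)."""

    def retrieve_study(self, study_instance_uid: str) -> list:
        ...


class WebLikeClient:
    # No inheritance from the protocol: matching is purely structural.
    def retrieve_study(self, study_instance_uid: str) -> list:
        return []


class Unrelated:
    pass


is_web_like = isinstance(WebLikeClient(), RetrievesStudies)
is_unrelated = isinstance(Unrelated(), RetrievesStudies)
```

Note that the runtime check only verifies that the named members exist. It does not compare signatures or return types, so a static type checker is still needed for that.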
The following four methods of the
Access to bulkdata is a bit tricky. Currently, all elements other than Pixel Data, Float Pixel Data, and Double Float Pixel Data are included in metadata. We could create URLs (using a hash or another method), include the URL as Rendering series and instances will require more work and thought. It's currently not clear to me what the exact behavior should be. We can certainly support rendering single-frame image instances via |
If I understand correctly, it seems to me that the BulkDataURI for the file client introduces a logical headache. It seems the standard would assume that you can access that data using web network transactions without the need of a client. But if your goal is to have the file client be a drop in replacement you would need to require that users of the client always resolve these URIs via the client. That's not a terrible requirement, but it's a slightly different paradigm. |
IMO, if a method will raise a NotImplementedError, the class doesn't truly implement the protocol. If user code accepts an object whose implementation adheres to the protocol and only leverages methods found within it, it would be a reasonable expectation for that code to successfully complete in the absence of other environmental/logic errors. Perhaps consider narrowing the definition of the protocol and creating more expansive versions in the future? Note: I'm replying from my phone, so I apologize if the file client indeed implements all the methods in the protocol. |
I agree with you and think we should implement the methods before merging. However, I also think it's reasonable for the serverless implementation to raise exceptions that mimic status codes. For the We could provide an abstraction layer for HTTP errors (e.g., |
Alternatively, the methods of the |
It's fairly common to have hierarchies of Protocols which extend the functionality of each other. If one or more methods of an implementation will always raise an exception, regardless of input, it suggests (to me) that the class is likely implementing a more narrowly defined Protocol. Conversely, if you're suggesting that the exception will be raised for certain inputs based on the "server's" state, that to me feels very different. |
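To make the idea of protocol hierarchies concrete, here is a minimal sketch in which a narrow protocol is extended by a broader one. All names (`SupportsRetrieve`, `SupportsRetrieveAndRender`, `FileOnlyClient`, `FullClient`) are hypothetical and chosen for illustration; the sketch assumes Python 3.8 or later (or the `typing_extensions` backport).

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsRetrieve(Protocol):
    """Narrow protocol: retrieval only."""

    def retrieve_instance(self, sop_instance_uid: str) -> bytes:
        ...


@runtime_checkable
class SupportsRetrieveAndRender(SupportsRetrieve, Protocol):
    """Broader protocol: Protocol must be listed again to stay a protocol."""

    def retrieve_instance_rendered(self, sop_instance_uid: str) -> bytes:
        ...


class FileOnlyClient:
    # Implements only the narrow protocol, so it never has to raise
    # NotImplementedError for rendering.
    def retrieve_instance(self, sop_instance_uid: str) -> bytes:
        return b"dicom"


class FullClient:
    def retrieve_instance(self, sop_instance_uid: str) -> bytes:
        return b"dicom"

    def retrieve_instance_rendered(self, sop_instance_uid: str) -> bytes:
        return b"png"


file_only = FileOnlyClient()
full = FullClient()
```

User code that needs rendering would annotate its parameter with the broader protocol, so passing a retrieval-only client is flagged by the type checker instead of failing at call time.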
I have "implemented" the The
That's very broad and I imagine the expected behaviour would be dependent on the modality and the acceptable media types. Since the |
This PR adds the DICOMfileClient, a class that facilitates serverless query, retrieval, and storage of DICOM Data Sets stored locally in DICOM Part10 files. It exposes the same API as the DICOMwebClient.

The DICOMfileClient recursively searches for DICOM files in base_dir, reads (a subset of) the metadata of the data sets stored in the files, and indexes the metadata in a SQLite database to enable efficient queries. In addition, it reads individual frames of multi-frame images efficiently by using a Basic Offset Table (BOT) and by selectively loading only the requested frames into memory (critical for large SM images!). Last but not least, it can be pickled and used with Python multiprocessing (for example with a PyTorch DataLoader).

Considerations

The DICOMfileClient depends on the NumPy and Pillow libraries. Therefore, this PR would introduce a dependency on CPython. I don't think this is a major issue and most users will likely have NumPy and Pillow installed anyway.

If this should turn out to be an issue, we could either
DICOMfileClient into a separate package (also not ideal because the class is supposed to have the same API as the DICOMwebClient and having both in one package is advantageous in this regard)
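As a rough illustration of how a SQLite-backed index can remain usable after pickling (for example when an object is sent to worker processes), consider the sketch below. It is not the PR's implementation: `TinyIndex`, its table layout, and the example paths are assumptions made for this example. The key point is that `sqlite3.Connection` objects cannot be pickled, so the connection is dropped in `__getstate__` and reopened lazily.

```python
import os
import pickle
import sqlite3
import tempfile
from typing import List, Optional


class TinyIndex:
    """Hypothetical metadata index backed by SQLite (not the PR's schema)."""

    def __init__(self, db_path: str) -> None:
        self._db_path = db_path
        self._connection: Optional[sqlite3.Connection] = None

    @property
    def connection(self) -> sqlite3.Connection:
        # Open lazily so that every process creates its own connection.
        if self._connection is None:
            self._connection = sqlite3.connect(self._db_path)
            self._connection.execute(
                "CREATE TABLE IF NOT EXISTS instances ("
                "sop_instance_uid TEXT PRIMARY KEY, "
                "study_instance_uid TEXT NOT NULL, "
                "file_path TEXT NOT NULL)"
            )
        return self._connection

    def add(self, sop_instance_uid: str, study_instance_uid: str,
            file_path: str) -> None:
        with self.connection:  # commits on success
            self.connection.execute(
                "INSERT OR REPLACE INTO instances VALUES (?, ?, ?)",
                (sop_instance_uid, study_instance_uid, file_path),
            )

    def find_files(self, study_instance_uid: str) -> List[str]:
        rows = self.connection.execute(
            "SELECT file_path FROM instances "
            "WHERE study_instance_uid = ? ORDER BY file_path",
            (study_instance_uid,),
        ).fetchall()
        return [row[0] for row in rows]

    def __getstate__(self) -> dict:
        # Drop the unpicklable connection; it is reopened on first use.
        state = self.__dict__.copy()
        state["_connection"] = None
        return state


db_path = os.path.join(tempfile.mkdtemp(), "index.sqlite")
index = TinyIndex(db_path)
index.add("1.2.3.1", "1.2.3", "/data/a.dcm")
index.add("1.2.3.2", "1.2.3", "/data/b.dcm")
clone = pickle.loads(pickle.dumps(index))
```

The unpickled `clone` opens a fresh connection to the same database file the first time it is queried, which is the behaviour a multiprocessing worker would rely on.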