[AL-1713] Dicom #1572

Merged: 96 commits merged on Apr 6, 2022
96 commits
9367a1c
init
farizrahman4u Feb 9, 2022
6870c6e
Merge branch 'main' of https://www.github.com/activeloopai/hub into f…
farizrahman4u Feb 10, 2022
7ebb58b
partial upload
farizrahman4u Feb 10, 2022
e52d415
test fixes
farizrahman4u Feb 14, 2022
0452a21
mypy
farizrahman4u Feb 16, 2022
b382e64
merge main
farizrahman4u Feb 16, 2022
54ebb01
merge main
farizrahman4u Feb 16, 2022
fe0f102
more test
farizrahman4u Feb 16, 2022
5a21814
Update __init__.py
farizrahman4u Feb 16, 2022
3846862
refac + tests
farizrahman4u Feb 22, 2022
c41a99e
Merge branch 'fr_partial_upload' of https://www.github.com/activeloop…
farizrahman4u Feb 22, 2022
ee21646
rem debg line
farizrahman4u Feb 22, 2022
27a314c
Merge branch 'main' into fr_partial_upload
farizrahman4u Feb 22, 2022
d8eec0f
sequence htype
farizrahman4u Feb 23, 2022
0384067
Merge branch 'fr_partial_upload' into fr_sequence_htype
farizrahman4u Feb 23, 2022
1f5d7f3
fix
farizrahman4u Feb 24, 2022
b0a5c01
Merge branch 'fr_sequence_htype' of https://www.github.com/activeloop…
farizrahman4u Feb 24, 2022
e40ca57
test
farizrahman4u Mar 1, 2022
846838e
update test
farizrahman4u Mar 2, 2022
5ef3f8c
fix updates
farizrahman4u Mar 3, 2022
d280215
typo
farizrahman4u Mar 3, 2022
ffe8059
merge main
farizrahman4u Mar 3, 2022
8e96cff
fix double indexing
farizrahman4u Mar 3, 2022
2db488a
format
farizrahman4u Mar 3, 2022
29180b5
test shape bug
farizrahman4u Mar 3, 2022
95f2ae9
format
farizrahman4u Mar 3, 2022
fbe28ae
compression tests
farizrahman4u Mar 4, 2022
e4d6c45
fix test_like
farizrahman4u Mar 4, 2022
5ccec73
format
farizrahman4u Mar 6, 2022
8ef00da
hub.read + sequence tests
farizrahman4u Mar 6, 2022
93a2c08
format
farizrahman4u Mar 6, 2022
54cf493
fix text tests
farizrahman4u Mar 6, 2022
92fa5b6
more tests
farizrahman4u Mar 6, 2022
6d93192
cleanup
farizrahman4u Mar 6, 2022
31e13ef
format
farizrahman4u Mar 6, 2022
2a5ed6d
fixes
farizrahman4u Mar 7, 2022
a9aa858
Merge branch 'main' into fr_sequence_htype
farizrahman4u Mar 8, 2022
4e4ebe8
pop and commit diff fixes
farizrahman4u Mar 9, 2022
661043d
merge main
farizrahman4u Mar 9, 2022
ac37b1e
format
farizrahman4u Mar 9, 2022
4da9e50
vc test fix
farizrahman4u Mar 9, 2022
f5fa11e
format
farizrahman4u Mar 9, 2022
4e50d52
shape and err msg fix
farizrahman4u Mar 9, 2022
d90a65e
linked tensors init
farizrahman4u Mar 9, 2022
4f7ed8e
updates
farizrahman4u Mar 9, 2022
d837d75
merge seq ht
farizrahman4u Mar 9, 2022
b2e8827
callback
farizrahman4u Mar 9, 2022
c98b2c0
mypy
farizrahman4u Mar 9, 2022
73c12f7
Merge branch 'fr_sequence_htype' of https://www.github.com/activeloop…
farizrahman4u Mar 9, 2022
3d879e3
cbs
farizrahman4u Mar 9, 2022
4465617
merge main
farizrahman4u Mar 14, 2022
1cce050
updates
farizrahman4u Mar 14, 2022
71807a9
fix
farizrahman4u Mar 14, 2022
9cb4af2
idx fix
farizrahman4u Mar 14, 2022
836b27d
test
farizrahman4u Mar 14, 2022
ad07379
sequence id test
farizrahman4u Mar 14, 2022
5f8efa3
update test
farizrahman4u Mar 14, 2022
7d8da06
format
farizrahman4u Mar 14, 2022
05c02ca
meta fix
farizrahman4u Mar 14, 2022
2346eaa
no op update
farizrahman4u Mar 15, 2022
c8d59d0
Merge branch 'main' of https://www.github.com/activeloopai/hub into f…
farizrahman4u Mar 16, 2022
7153b47
sample info
farizrahman4u Mar 16, 2022
d7c6260
video meta
FayazRahman Mar 16, 2022
32013b0
Merge branch 'fr_sample_meta2' of https://github.com/activeloopai/Hub…
FayazRahman Mar 22, 2022
25b9bab
fixes
FayazRahman Mar 22, 2022
1e089d8
audio meta
FayazRahman Mar 23, 2022
d1c342f
merge main
farizrahman4u Mar 24, 2022
9dfa41e
fix
farizrahman4u Mar 24, 2022
8ae08a5
fix
farizrahman4u Mar 24, 2022
64bc740
refc
farizrahman4u Mar 24, 2022
97f6080
av
farizrahman4u Mar 24, 2022
214bc50
smol fix
FayazRahman Mar 27, 2022
53d9da1
black + mypy
FayazRahman Mar 28, 2022
8175a6d
updates
farizrahman4u Mar 28, 2022
2ab9cde
Merge branch 'fr_sample_meta2' of https://www.github.com/activeloopai…
farizrahman4u Mar 28, 2022
5efd75d
fix
farizrahman4u Mar 28, 2022
30d755d
fix
farizrahman4u Mar 28, 2022
876f460
mypy
farizrahman4u Mar 28, 2022
4c93c47
better exc
farizrahman4u Mar 28, 2022
81f3be1
req
farizrahman4u Mar 28, 2022
9e59312
dicom
farizrahman4u Mar 29, 2022
8ca6b4a
test sample info
FayazRahman Mar 29, 2022
af96e02
smol fix
farizrahman4u Mar 29, 2022
3a2bace
Merge branch 'fr_sample_meta2' of https://www.github.com/activeloopai…
farizrahman4u Mar 29, 2022
11df018
Merge branch 'fr_sample_meta2' of https://www.github.com/activeloopai…
farizrahman4u Mar 29, 2022
11d4d91
fix tests
FayazRahman Mar 29, 2022
ece30de
Merge branch 'main' into fr_dicom
farizrahman4u Mar 30, 2022
0984baa
fix tests
FayazRahman Mar 31, 2022
b6d7662
black
FayazRahman Mar 31, 2022
2c19581
fixes
farizrahman4u Apr 5, 2022
b2b9a54
Merge branch 'fr_dicom' of https://www.github.com/activeloopai/hub in…
farizrahman4u Apr 5, 2022
04317c6
fix
farizrahman4u Apr 5, 2022
a4566b9
jpg fix
farizrahman4u Apr 5, 2022
ba5860c
format
farizrahman4u Apr 6, 2022
caa8cfb
Merge branch 'main' into fr_dicom
farizrahman4u Apr 6, 2022
7c7aa5d
Update common.txt
farizrahman4u Apr 6, 2022
1 change: 1 addition & 0 deletions hub/api/read.py
@@ -46,6 +46,7 @@ def read(
     Image: "bmp", "dib", "gif", "ico", "jpeg", "jpeg2000", "pcx", "png", "ppm", "sgi", "tga", "tiff", "webp", "wmf", "xbm"
     Audio: "flac", "mp3", "wav"
     Video: "mp4", "mkv", "avi"
+    Dicom: "dcm"
 
     Args:
         path (str): Path to a supported file.
2 changes: 2 additions & 0 deletions hub/api/tests/test_api.py
@@ -862,6 +862,7 @@ def test_compressions_list():
         "apng",
         "avi",
         "bmp",
+        "dcm",
         "dib",
         "flac",
         "gif",
@@ -892,6 +893,7 @@ def test_htypes_list():
         "bbox",
         "binary_mask",
         "class_label",
+        "dicom",
         "generic",
         "image",
         "json",
37 changes: 37 additions & 0 deletions hub/api/tests/test_dicom.py
@@ -0,0 +1,37 @@
+import hub
+import pytest
+from pydicom.data import get_testdata_file
+from pydicom import dcmread
+
+
+def test_dicom_basic(memory_ds):
+    ds = memory_ds
+    path = get_testdata_file("MR_small.dcm")
+    with ds:
+        ds.create_tensor("x", htype="dicom")
+        dcm = hub.read(path)
+        assert dcm.dtype == "int16"
+        assert dcm.shape == (64, 64, 1)
+        ds.x.append(dcm)
+        ds.x.append(dcm)
+        assert ds.x.dtype == "int16"
+        arr = ds.x.numpy()
+        assert arr.dtype == "int16"
+        assert arr.shape == (2, 64, 64, 1)
+        for item in dcmread(path):
+            if not isinstance(item.value, bytes):
+                assert item.keyword in ds.x[0].sample_info
+
+
+def test_dicom_mixed_dtype(memory_ds):
+    ds = memory_ds
+    with ds:
+        ds.create_tensor("x", htype="dicom")
+        dcm = hub.read(get_testdata_file("MR_small.dcm"))
+        assert dcm.dtype == "int16"
+        ds.x.append(dcm)
+        dcm = hub.read(get_testdata_file("ExplVR_BigEnd.dcm"))
+        assert dcm.dtype == "uint8"
+        ds.x.append(dcm)
+        arr = ds.x[:, :10, :10, :1].numpy()
+        assert arr.dtype == "int16"
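`test_dicom_mixed_dtype` appends an `int16` scan followed by a `uint8` scan and expects the slice to read back as `int16`. A minimal stdlib sketch of why `int16` can hold both without loss, assuming plain unsigned and two's-complement integer ranges; `int_range` and `fits_in` are hypothetical helpers, not part of hub:

```python
def int_range(bits: int, signed: bool) -> tuple:
    """Return the (min, max) values representable by an integer of this width."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


def fits_in(src: tuple, dst: tuple) -> bool:
    """True if every value of integer type src=(bits, signed) is representable in dst."""
    src_min, src_max = int_range(*src)
    dst_min, dst_max = int_range(*dst)
    return dst_min <= src_min and src_max <= dst_max


UINT8 = (8, False)
INT16 = (16, True)

# Every uint8 value (0..255) lies inside int16 (-32768..32767), so reading the
# second sample as int16 loses nothing; the reverse direction would overflow.
print(fits_in(UINT8, INT16))  # True
print(fits_in(INT16, UINT8))  # False
```

The sketch does not show where hub performs any conversion; the chunk_engine.py change only shows that reads from a `dicom` tensor pass `cast=False` to `read_sample`.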
7 changes: 3 additions & 4 deletions hub/api/tests/test_sample_info.py
@@ -1,12 +1,11 @@
from miniaudio import mp3_get_file_info # type: ignore
from PIL import Image # type: ignore
from PIL.ExifTags import TAGS # type: ignore

import hub
import pytest
from miniaudio import mp3_get_file_info # type: ignore
import numpy as np
import pytest
import os
import sys
import hub


def get_exif_helper(path):
2 changes: 2 additions & 0 deletions hub/compression.py
@@ -67,7 +67,9 @@
     c for c in IMAGE_COMPRESSIONS if c.upper() in Image.SAVE and c.upper() in Image.OPEN
 ]
 
+
 IMAGE_COMPRESSIONS.insert(0, "apng")
+IMAGE_COMPRESSIONS.insert(2, "dcm")
 
 SUPPORTED_COMPRESSIONS = [
     *BYTE_COMPRESSIONS,
8 changes: 4 additions & 4 deletions hub/core/chunk_engine.py
@@ -978,7 +978,7 @@ def _numpy(
         length = self.num_samples
         last_shape = None
         enc = self.chunk_id_encoder
-
+        htype = self.tensor_meta.htype
         if use_data_cache and self.is_data_cachable:
             samples = self.numpy_from_data_cache(index, length, aslist)
         else:
@@ -999,9 +999,9 @@ def _numpy(
                         )[tuple(entry.value for entry in index.values[2:])]
                     else:
                         chunk = self.get_chunk_from_chunk_id(chunk_ids[0])
-                        sample = chunk.read_sample(local_sample_index)[
-                            tuple(entry.value for entry in index.values[1:])
-                        ]
+                        sample = chunk.read_sample(
+                            local_sample_index, cast=htype != "dicom"
+                        )[tuple(entry.value for entry in index.values[1:])]
                 elif len(index.values) == 1:
                     # Tiled sample, all chunks required
                     chunks = self.get_chunks_for_sample(global_sample_index)
48 changes: 45 additions & 3 deletions hub/core/compression.py
@@ -304,6 +304,8 @@ def decompress_array(
 
     if compression == "apng":
         return _decompress_apng(buffer)  # type: ignore
+    if compression == "dcm":
+        return _decompress_dicom(buffer)  # type: ignore
     try:
         if shape is not None and 0 in shape:
             return np.zeros(shape, dtype=dtype)
@@ -420,6 +422,8 @@ def verify_compressed_file(
         elif compression in ("mp4", "mkv", "avi"):
             if isinstance(file, (bytes, memoryview, str)):
                 return _read_video_shape(file), "|u1"  # type: ignore
+        elif compression == "dcm":
+            return _read_dicom_shape_and_dtype(file)
         else:
             return _fast_decompress(file)
     except Exception as e:
@@ -434,10 +438,11 @@ def verify_compressed_file(
 def get_compression(header=None, path=None):
     if path:
         # These formats are recognized by file extension for now
-        file_formats = ["mp3", "flac", "wav", "mp4", "mkv", "avi"]
+        file_formats = [".mp3", ".flac", ".wav", ".mp4", ".mkv", ".avi", ".dcm"]
+        path = str(path).lower()
         for fmt in file_formats:
-            if str(path).lower().endswith("." + fmt):
-                return fmt
+            if path.endswith(fmt):
+                return fmt[1:]
     if header:
         if not Image.OPEN:
             Image.init()
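`get_compression` now stores the extensions with their leading dot, lowercases the path once, and strips the dot from the match, so `.dcm` files are recognized by suffix alongside the audio and video formats. A standalone sketch of the same suffix check; `compression_from_path` is a hypothetical name, and returning `None` stands in for falling through to the `if header:` branch:

```python
from pathlib import Path
from typing import Optional, Union

# Extensions recognized by suffix, mirroring the list in the get_compression hunk.
FILE_FORMATS = [".mp3", ".flac", ".wav", ".mp4", ".mkv", ".avi", ".dcm"]


def compression_from_path(path: Union[str, Path]) -> Optional[str]:
    """Return the compression name for a path, or None if the suffix is unknown."""
    lowered = str(path).lower()  # lowercase once, so "SCAN.DCM" matches ".dcm"
    for fmt in FILE_FORMATS:
        if lowered.endswith(fmt):
            return fmt[1:]  # strip the leading dot: ".dcm" -> "dcm"
    return None  # unknown suffix: the caller can fall back to header sniffing
```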
@@ -606,6 +611,8 @@ def read_meta_from_compressed_file(
                 shape, typestr = _read_png_shape_and_dtype(f)
             except Exception:
                 raise CorruptedSampleError("png")
+        elif compression == "dcm":
+            shape, typestr = _read_dicom_shape_and_dtype(f)
         elif get_compression_type(compression) == AUDIO_COMPRESSION:
             try:
                 shape, typestr = _read_audio_shape(file, compression), "<f4"
@@ -696,6 +703,41 @@ def _read_jpeg_shape_from_buffer(buf: bytes) -> Tuple[int, ...]:
     return shape
 
 
+def _read_dicom_shape_and_dtype(
+    f: Union[bytes, BinaryIO]
+) -> Tuple[Tuple[int, ...], str]:
+    try:
+        from pydicom import dcmread
+        from pydicom.pixel_data_handlers.util import pixel_dtype
+    except ImportError:
+        raise ModuleNotFoundError(
+            "Pydicom not found. Install using `pip install pydicom`"
+        )
+    if not hasattr(f, "read"):
+        f = BytesIO(f)  # type: ignore
+    dcm = dcmread(f)
+    nchannels = dcm[0x0028, 0x0002].value
+    shape = (dcm.Rows, dcm.Columns, nchannels)
+    isfloat = "FloatPixelData" in dcm or "DoubleFloatPixelData" in dcm
+    dtype = pixel_dtype(dcm, isfloat).str
+    return shape, dtype
+
+
+def _decompress_dicom(f: Union[str, bytes, BinaryIO]):
+    if isinstance(f, (bytes, memoryview, bytearray)):
+        f = BytesIO(f)
+    try:
+        from pydicom import dcmread
+    except ImportError:
+        raise ModuleNotFoundError(
+            "Pydicom not found. Install using `pip install pydicom`"
+        )
+    arr = dcmread(f).pixel_array
+    if arr.ndim == 2:
+        return np.expand_dims(arr, -1)
+    return arr
+
+
 def _read_png_shape_and_dtype(f: Union[bytes, BinaryIO]) -> Tuple[Tuple[int, ...], str]:
     """Reads shape and dtype of a png file from a file like object or file contents.
     If a file like object is provided, all of its contents are NOT loaded into memory."""
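`_read_dicom_shape_and_dtype` builds the shape from Rows, Columns and Samples per Pixel (tag (0028,0002)) and delegates the dtype to pydicom's `pixel_dtype`, while `_decompress_dicom` expands 2-D pixel arrays to `(H, W, 1)` so the decoded array agrees with that three-element shape. The pure-Python sketch below approximates the header-to-typestr mapping; `dicom_shape_and_typestr` is hypothetical, assumes single-frame data and whole-byte `BitsAllocated`, and is not pydicom's implementation:

```python
from typing import Tuple


def dicom_shape_and_typestr(
    rows: int,
    columns: int,
    samples_per_pixel: int,
    bits_allocated: int,
    pixel_representation: int,
    little_endian: bool = True,
    is_float: bool = False,
) -> Tuple[Tuple[int, int, int], str]:
    """Map DICOM image-pixel attributes to an (H, W, C) shape and a NumPy typestr.

    Hypothetical helper: single-frame data and whole-byte BitsAllocated only.
    """
    if bits_allocated % 8 != 0:
        raise ValueError("bit-packed pixel data is not handled by this sketch")
    itemsize = bits_allocated // 8
    if is_float:
        kind = "f"
    else:
        # PixelRepresentation: 0 = unsigned integers, 1 = two's complement
        kind = "i" if pixel_representation == 1 else "u"
    # Single-byte types have no byte order; NumPy writes them with "|"
    order = "|" if itemsize == 1 else ("<" if little_endian else ">")
    return (rows, columns, samples_per_pixel), f"{order}{kind}{itemsize}"
```

For a 16-bit signed, single-channel, little-endian 64x64 image this returns `((64, 64, 1), "<i2")`, i.e. NumPy's `int16`, the shape and dtype `test_dicom_basic` asserts for MR_small.dcm.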
3 changes: 2 additions & 1 deletion hub/core/dataset/dataset.py
@@ -410,7 +410,7 @@ def create_tensor(
         if info_kwargs:
             tensor.info.update(info_kwargs)
         self.storage.maybe_flush()
-        if create_sample_info_tensor and htype in ("image", "audio", "video"):
+        if create_sample_info_tensor and htype in ("image", "audio", "video", "dicom"):
             self._create_sample_info_tensor(name)
         if create_shape_tensor and htype not in ("text", "json"):
             self._create_sample_shape_tensor(name, htype=htype)
@@ -426,6 +426,7 @@ def _create_sample_shape_tensor(self, tensor: str, htype: str):
             create_id_tensor=False,
             create_sample_info_tensor=False,
             create_shape_tensor=False,
+            max_chunk_size=SAMPLE_INFO_TENSOR_MAX_CHUNK_SIZE,
         )
         f = "append_len" if htype == "list" else "append_shape"
         self._link_tensors(
77 changes: 66 additions & 11 deletions hub/core/sample.py
@@ -1,3 +1,4 @@
+from ast import Bytes
 from hub.core.compression import (
     compress_array,
     decompress_array,
@@ -114,10 +115,10 @@ def buffer(self):
 
     @property
     def dtype(self):
-        if self._dtype:
-            return self._dtype
-        self._read_meta()
-        return np.dtype(self._typestr).name
+        if self._dtype is None:
+            self._read_meta()
+            self._dtype = np.dtype(self._typestr).name
+        return self._dtype
 
     @property
     def shape(self):
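In the `dtype` change, the old property checked `self._dtype` by truthiness and re-derived the name on every access; the new one tests `is None`, stores the name, and returns the cached value afterwards. A reduced, hypothetical sketch of that pattern (`LazySample` and the small lookup table are illustrative; the real code calls `np.dtype(self._typestr).name`):

```python
from typing import Optional

# Hypothetical subset of typestr -> dtype-name mappings; the PR uses
# np.dtype(typestr).name for this step.
TYPESTR_NAMES = {"<i2": "int16", "|u1": "uint8", "<f4": "float32"}


class LazySample:
    """Hypothetical stand-in for hub's Sample, reduced to the dtype logic."""

    def __init__(self, typestr: str):
        self._source_typestr = typestr
        self._typestr: Optional[str] = None
        self._dtype: Optional[str] = None
        self.meta_reads = 0  # counts calls to the (normally expensive) reader

    def _read_meta(self) -> None:
        self.meta_reads += 1
        self._typestr = self._source_typestr

    @property
    def dtype(self) -> str:
        # Compare with None instead of relying on truthiness, and store the
        # result so later accesses skip _read_meta() entirely.
        if self._dtype is None:
            self._read_meta()
            self._dtype = TYPESTR_NAMES[self._typestr]
        return self._dtype


sample = LazySample("<i2")
first, second = sample.dtype, sample.dtype
```

After two accesses, `sample.meta_reads` is still 1.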
@@ -130,6 +131,23 @@ def compression(self):
         self._read_meta()
         return self._compression
 
+    def _load_dicom(self):
+        if self._array is not None:
+            return
+        try:
+            from pydicom import dcmread
+        except ImportError:
+            raise ModuleNotFoundError(
+                "Pydicom not found. Install using `pip install pydicom`"
+            )
+        if self.path and get_path_type(self.path) == "local":
+            dcm = dcmread(self.path)
+        else:
+            dcm = dcmread(BytesIO(self.buffer))
+        self._array = dcm.pixel_array
+        self._shape = self._array.shape
+        self._typestr = self._array.__array_interface__["typestr"]
+
     def _read_meta(self, f=None):
         if self._shape is not None:
             return
@@ -152,6 +170,33 @@ def _read_meta(self, f=None):
         if store:
             self._compressed_bytes[self._compression] = f
 
+    def _get_dicom_meta(self) -> dict:
+        try:
+            from pydicom import dcmread
+            from pydicom.dataelem import RawDataElement
+        except ImportError:
+            raise ModuleNotFoundError(
+                "Pydicom not found. Install using `pip install pydicom`"
+            )
+        if self.path and get_path_type(self.path) == "local":
+            dcm = dcmread(self.path)
+        else:
+            dcm = dcmread(BytesIO(self.buffer))
+
+        meta = {
+            x.keyword: {
+                "name": x.name,
+                "tag": str(x.tag),
+                "value": x.value
+                if isinstance(x.value, (str, int, float))
+                else x.to_json_dict(None, None).get("Value", ""),  # type: ignore
+                "vr": x.VR,
+            }
+            for x in dcm
+            if not isinstance(x.value, bytes)
+        }
+        return meta
+
     def _get_video_meta(self) -> dict:
         if self.path and get_path_type(self.path) == "local":
             container, vstream = _open_video(self.path)
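`_get_dicom_meta` turns every non-binary data element into an entry keyed by its keyword, holding the element's name, tag, VR and value; `str`/`int`/`float` values are kept as-is and anything else goes through `to_json_dict(...)`'s `"Value"` field. A pure-Python sketch of that transformation, using a hypothetical namedtuple in place of pydicom's data element and `list(...)` as a stand-in for the JSON conversion (the element values are made up for illustration):

```python
from collections import namedtuple

# Hypothetical stand-in for a pydicom data element; only the attributes the
# hunk reads are modelled.
Element = namedtuple("Element", ["keyword", "name", "tag", "VR", "value"])


def build_meta(elements) -> dict:
    meta = {}
    for x in elements:
        if isinstance(x.value, bytes):
            continue  # raw binary payloads such as pixel data are skipped
        if isinstance(x.value, (str, int, float)):
            value = x.value
        else:
            value = list(x.value)  # stand-in for to_json_dict(...)["Value"]
        meta[x.keyword] = {"name": x.name, "tag": x.tag, "value": value, "vr": x.VR}
    return meta


elements = [
    Element("Modality", "Modality", "(0008, 0060)", "CS", "MR"),
    Element("Rows", "Rows", "(0028, 0010)", "US", 64),
    Element("PixelSpacing", "Pixel Spacing", "(0028, 0030)", "DS", (0.3125, 0.3125)),
    Element("PixelData", "Pixel Data", "(7FE0, 0010)", "OW", b"\x00\x01"),
]
meta = build_meta(elements)
```

Because elements with `bytes` values are dropped, `test_dicom_basic` only checks non-bytes keywords against `sample_info`.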
@@ -244,13 +289,20 @@ def uncompressed_bytes(self) -> bytes:
         """Returns uncompressed bytes."""
 
         if self._uncompressed_bytes is None:
+            if self._array is not None:
+                self._uncompressed_bytes = self._array.tobytes()
+                return self._uncompressed_bytes
             if self.path is not None:
                 compr = self._compression
                 if compr is None:
                     compr = get_compression(path=self.path)
-                if get_compression_type(compr) in (
-                    AUDIO_COMPRESSION,
-                    VIDEO_COMPRESSION,
+                if (
+                    get_compression_type(compr)
+                    in (
+                        AUDIO_COMPRESSION,
+                        VIDEO_COMPRESSION,
+                    )
+                    or compr == "dcm"
                 ):
                     self._compression = compr
                     if self._array is None:
@@ -393,12 +445,15 @@ def _getexif(self) -> dict:
     @property
     def meta(self) -> dict:
         meta: Dict[str, Union[Dict, str]] = {}
-        compression_type = get_compression_type(self.compression)
-        if compression_type == IMAGE_COMPRESSION:
+        compression = self.compression
+        compression_type = get_compression_type(compression)
+        if compression == "dcm":
+            meta.update(self._get_dicom_meta())
+        elif compression_type == IMAGE_COMPRESSION:
             meta["exif"] = self._getexif()
-        if compression_type == VIDEO_COMPRESSION:
+        elif compression_type == VIDEO_COMPRESSION:
             meta.update(self._get_video_meta())
-        if compression_type == AUDIO_COMPRESSION:
+        elif compression_type == AUDIO_COMPRESSION:
             meta.update(self._get_audio_meta())
         meta["shape"] = self.shape
         meta["format"] = self.compression
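In the `meta` change, the independent `if` checks become one `if`/`elif` chain with the DICOM check first. Because the hub/compression.py change inserts `"dcm"` into `IMAGE_COMPRESSIONS`, a DICOM sample is presumably also classified as an image, so the format-specific branch has to be tested before the generic image branch. A self-contained sketch of that ordering; the classification table and handler names are hypothetical, not hub's:

```python
from typing import Callable, Dict

# Hypothetical classification table; "dcm" is listed as an image here, as it
# is in the PR's IMAGE_COMPRESSIONS list.
COMPRESSION_TYPES = {"png": "image", "jpeg": "image", "dcm": "image",
                     "mp3": "audio", "mp4": "video"}


def sample_meta(compression: str, handlers: Dict[str, Callable[[], dict]]) -> dict:
    meta: dict = {}
    kind = COMPRESSION_TYPES.get(compression)
    # The format-specific check runs before the generic type checks;
    # otherwise "dcm" would fall into the image (EXIF) branch.
    if compression == "dcm":
        meta.update(handlers["dicom"]())
    elif kind == "image":
        meta["exif"] = handlers["exif"]()
    elif kind == "video":
        meta.update(handlers["video"]())
    elif kind == "audio":
        meta.update(handlers["audio"]())
    meta["format"] = compression
    return meta


handlers = {
    "dicom": lambda: {"Modality": "MR"},
    "exif": lambda: {},
    "video": lambda: {"fps": 30},
    "audio": lambda: {"sample_rate": 44100},
}
```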
5 changes: 1 addition & 4 deletions hub/core/tests/test_compression.py
@@ -27,13 +27,10 @@
 from PIL import Image  # type: ignore
 
 
-compressions = SUPPORTED_COMPRESSIONS[:]
-compressions.remove(None)  # type: ignore
-compressions.remove("wmf")  # driver has to be provided by user for wmf write support
-
 image_compressions = IMAGE_COMPRESSIONS[:]
 image_compressions.remove("wmf")
 image_compressions.remove("apng")
+image_compressions.remove("dcm")
 
 
 @pytest.mark.parametrize("compression", image_compressions + BYTE_COMPRESSIONS)
1 change: 1 addition & 0 deletions hub/htype.py
@@ -67,6 +67,7 @@
     },
     "list": {"dtype": "List"},
     "text": {"dtype": "str"},
+    "dicom": {"sample_compression": "dcm"},
 }
 
 HTYPE_VERIFICATIONS: Dict[str, Dict] = {
3 changes: 2 additions & 1 deletion hub/requirements/common.txt
@@ -10,4 +10,5 @@ humbug>=0.2.6
 tqdm
 numcodecs
 miniaudio~=1.44
-av>=8.1.0; python_version >= '3.7' or sys_platform != 'win32'
+av>=8.1.0; python_version >= '3.7' or sys_platform != 'win32'
+pydicom
1 change: 1 addition & 0 deletions setup.py
@@ -31,6 +31,7 @@
     "audio": ["miniaudio"],
     "gcp": ["google-cloud-storage", "google-auth", "google-auth-oauthlib"],
     "video": ["av"],
+    "dicom": ["pydicom"],
 }
 
 all_extras = {r for v in extras.values() for r in v}
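The PR lists `pydicom` both in `common.txt` and as a `dicom` extra, and each DICOM code path repeats the same `try`/`except ImportError` block before raising `ModuleNotFoundError` with an install hint. A hypothetical helper that could centralize that pattern (`require` is not part of hub) using only `importlib`:

```python
import importlib
from types import ModuleType


def require(module: str, pip_name: str) -> ModuleType:
    """Import an optional dependency, or raise with an install hint."""
    try:
        return importlib.import_module(module)
    except ImportError:
        raise ModuleNotFoundError(
            f"{pip_name} not found. Install using `pip install {pip_name}`"
        ) from None
```

A call such as `require("pydicom", "pydicom")` keeps the import lazy, so users who never touch DICOM data do not pay for it, and the error text stays consistent across call sites.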