Merge pull request #35 from HumanBrainProject/feat_shardedPrecomputed
Feat: adding support for sharded precomputed
ylep committed Jan 10, 2024
2 parents a6bce3f + 7f1d4b1 commit 0205370
Showing 21 changed files with 3,142 additions and 26 deletions.
6 changes: 2 additions & 4 deletions .github/workflows/tox.yaml
@@ -15,15 +15,13 @@ jobs:
         include:
           - runs-on: 'ubuntu-20.04'
             python-version: '3.6'
-          - runs-on: 'ubuntu-20.04'
-            python-version: '3.5'
     runs-on: ${{ matrix.runs-on }}
     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v3
         with:
           lfs: true
       - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v2
+        uses: actions/setup-python@v4
         with:
           python-version: ${{ matrix.python-version }}
       - name: PIP cache
6 changes: 6 additions & 0 deletions README.rst
@@ -74,3 +74,9 @@ This repository uses `pre-commit`_ to ensure that all committed code follows min

.. _Neuroglancer: https://github.com/google/neuroglancer
.. _pre-commit: https://pre-commit.com/


Acknowledgments
===============

`cloud-volume <https://github.com/seung-lab/cloud-volume>`_ (BSD 3-Clause licensed) for the compressed morton code and shard/minishard mask implementation.
83 changes: 82 additions & 1 deletion docs/examples.rst
@@ -15,7 +15,7 @@ two Nifti files based on the JuBrain human brain atlas, as published in version
 Note that you need to use `git-lfs <https://git-lfs.github.com/>`_ in order to
 see the contents of the NIfTI files (otherwise you can download them `from the
 repository on Github
-<https://github.com/HumanBrainProject/neuroglancer-scripts/tree/master/JuBrain>`_.
+<https://github.com/HumanBrainProject/neuroglancer-scripts/tree/master/examples>`_.)

Conversion of the grey-level template image (MNI Colin27 T1 MRI)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
@@ -152,3 +152,84 @@ BigBrain is a very large image (6572 × 7404 × 5711 voxels) reconstructed from
white_right_327680.gii \
classif/
link-mesh-fragments --no-colon-suffix mesh_labels.csv classif/

Conversion of the grey-level template image (sharded precomputed)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

.. code-block:: sh

   volume-to-precomputed \
       --generate-info \
       --sharding 1,1,0 \
       colin27T1_seg.nii.gz \
       colin27T1_seg_sharded

At this point, you need to edit ``colin27T1_seg_sharded/info_fullres.json`` to set
``"data_type": "uint8"``. This is needed because ``colin27T1_seg.nii.gz`` uses
a peculiar encoding, with slope and intercept set in the NIfTI header, even
though only integers between 0 and 255 are encoded.
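
This edit can also be scripted; a minimal sketch (assuming the paths used
above, and mirroring what the project's test suite does below):

.. code-block:: python

   import json

   with open("colin27T1_seg_sharded/info_fullres.json") as fp:
       fullres_info = json.load(fp)
   fullres_info["data_type"] = "uint8"
   with open("colin27T1_seg_sharded/info_fullres.json", "w") as fp:
       json.dump(fullres_info, fp, indent="\t")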

.. code-block:: sh

   generate-scales-info colin27T1_seg_sharded/info_fullres.json colin27T1_seg_sharded/
   volume-to-precomputed \
       --sharding 1,1,0 \
       colin27T1_seg.nii.gz \
       colin27T1_seg_sharded/
   compute-scales colin27T1_seg_sharded/

.. _Conversion of Big Brain to sharded precomputed format:

Big Brain (20um) has been converted to neuroglancer precomputed format and is
accessible at
https://neuroglancer.humanbrainproject.eu/precomputed/BigBrainRelease.2015/8bit.
Using this as the source volume, a sharded volume will be created.

.. code-block:: sh

   mkdir sharded_bigbrain/
   curl --output sharded_bigbrain/info \
       https://neuroglancer.humanbrainproject.eu/precomputed/BigBrainRelease.2015/8bit/info

At this point, edit ``sharded_bigbrain/info`` to contain the desired sharding
specification, as shown in the diff below. For a smaller-scale test run, the
20um and 40um scales can be removed.

.. code-block:: diff

     {
       "type": "image",
       "data_type": "uint8",
       "num_channels": 1,
       "scales": [
         {
           "chunk_sizes": [[64,64,64]],
           "encoding": "raw",
           "key": "20um",
           "resolution": [21166.6666666666666, 20000, 21166.6666666666666],
           "size": [6572, 7404, 5711],
   -       "voxel_offset": [0, 0, 0]
   +       "voxel_offset": [0, 0, 0],
   +       "sharding": {
   +         "@type": "neuroglancer_uint64_sharded_v1",
   +         "data_encoding": "gzip",
   +         "hash": "identity",
   +         "minishard_bits": 2,
   +         "minishard_index_encoding": "gzip",
   +         "preshift_bits": 0,
   +         "shard_bits": 2
   +       }
         },
         // ...truncated for brevity
       ]
     }
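
If you prefer to script this edit, a sketch along these lines adds the same
sharding specification (the values shown in the diff above) to every scale,
optionally dropping the 20um and 40um scales for a smaller test run:

.. code-block:: python

   import json

   sharding_spec = {
       "@type": "neuroglancer_uint64_sharded_v1",
       "data_encoding": "gzip",
       "hash": "identity",
       "minishard_bits": 2,
       "minishard_index_encoding": "gzip",
       "preshift_bits": 0,
       "shard_bits": 2,
   }

   with open("sharded_bigbrain/info") as fp:
       info = json.load(fp)
   # Optional: drop the two largest scales for a quicker test run
   info["scales"] = [scale for scale in info["scales"]
                     if scale["key"] not in ("20um", "40um")]
   for scale in info["scales"]:
       scale["sharding"] = sharding_spec
   with open("sharded_bigbrain/info", "w") as fp:
       json.dump(info, fp, indent=2)
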
Start the conversion process.

.. code-block:: sh

   convert-chunks \
       https://neuroglancer.humanbrainproject.eu/precomputed/BigBrainRelease.2015/8bit \
       ./sharded_bigbrain/
2 changes: 1 addition & 1 deletion docs/script-usage.rst
@@ -21,7 +21,7 @@ OUTSIDE_VALUE] volume_filename dest_url``.

 You may want to use :ref:`convert-chunks <convert-chunks>` in a second step, to
 further compress your dataset with JPEG or ``compressed_segmentation``
-encoding).
+encoding.


Converting image volumes
47 changes: 47 additions & 0 deletions docs/serving-data.rst
@@ -99,3 +99,50 @@ following Apache configuration (e.g. put it in a ``.htaccess`` file):
AddEncoding x-gzip .gz
AddType application/octet-stream .gz
</IfModule>

Serving sharded data
====================


Content-Encoding
----------------

Sharded data must be served without any `Content-Encoding header
<https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding>`_.
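
A quick way to verify this is to inspect the response headers. A minimal
sketch using only the Python standard library (the URL is hypothetical;
substitute one of your own shard files):

.. code-block:: python

   import urllib.request

   url = "https://example.org/sharded_bigbrain/40um/0.shard"  # hypothetical
   req = urllib.request.Request(url, method="HEAD")
   with urllib.request.urlopen(req) as resp:
       # The shard must be delivered byte-for-byte, without a
       # Content-Encoding header announcing transparent compression
       assert resp.headers.get("Content-Encoding") is None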


HTTP Range request
------------------

Sharded data must be served by a webserver that supports the `Range header
<https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range>`_.

For development use, Python's bundled SimpleHTTPServer `does not support
this <https://github.com/python/cpython/issues/86809>`_. Recommended
alternatives are:

- `http-server (NodeJS) <https://www.npmjs.com/package/http-server>`_

- `RangeHTTPServer (Python) <https://github.com/danvk/RangeHTTPServer>`_

For production use, most modern static web servers support range requests.
The following web servers were tested and work with sharded volumes:

- nginx 1.25.3

- httpd 2.4.58

- caddy 2.7.5

In addition, most object storage services also support range requests without
additional configuration.
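
To check that a server honours range requests, ask for a few bytes and expect
an HTTP ``206 Partial Content`` response. A minimal sketch (hypothetical URL):

.. code-block:: python

   import urllib.request

   url = "http://localhost:8000/sharded_bigbrain/info"  # hypothetical
   req = urllib.request.Request(url, headers={"Range": "bytes=0-15"})
   with urllib.request.urlopen(req) as resp:
       # A server without range support would answer 200 with the full body
       assert resp.status == 206
       assert len(resp.read()) == 16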


Enable Access-Control-Allow-Origin header
-----------------------------------------

The `Access-Control-Allow-Origin
<https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Access-Control-Allow-Origin>`_
header will need to be enabled if the volume is expected to be accessed
cross-origin.
5 changes: 1 addition & 4 deletions pyproject.toml
@@ -1,9 +1,6 @@
 [build-system]
-# We need support for entry_points in setup.cfg, which needs setuptools>=51.0.0
-# according to the setuptools documentation. However, in my testing it works
-# with version 50.3.2 which is the last to retain Python 3.5 compatibility.
 requires = [
-    "setuptools>=50.3.2",
+    "setuptools>=51.0.0",
     "wheel",
 ]
 build-backend = "setuptools.build_meta"
56 changes: 54 additions & 2 deletions script_tests/test_scripts.py
@@ -1,6 +1,8 @@
-# Copyright (c) CEA
-# Copyright (c) 2018 Forschungszentrum Juelich GmbH
+# Copyright (c) 2018, 2023, 2024 Forschungszentrum Juelich GmbH
+# Copyright (c) 2018, 2023 CEA
 #
 # Author: Yann Leprince <y.leprince@fz-juelich.de>
+# Author: Xiao Gui <xgui3783@gmail.com>
 #
 # This software is made available under the MIT licence, see LICENCE.txt.

@@ -97,6 +99,56 @@ def test_all_in_one_conversion(examples_dir, tmpdir):
    # with --mmap / --load-full-volume


def test_sharded_conversion(examples_dir, tmpdir):
    input_nifti = examples_dir / "JuBrain" / "colin27T1_seg.nii.gz"
    # The file may be present but be a git-lfs pointer file, so we need to open
    # it to make sure that it is the actual correct file.
    try:
        gzip.open(str(input_nifti)).read(348)
    except OSError as exc:
        pytest.skip("Cannot find a valid example file {0} for testing: {1}"
                    .format(input_nifti, exc))

    output_dir = tmpdir / "colin27T1_seg_sharded"
    assert subprocess.call([
        "volume-to-precomputed",
        "--generate-info",
        "--sharding", "1,1,0",
        str(input_nifti),
        str(output_dir)
    ], env=env) == 4  # datatype not supported by neuroglancer

    with open(output_dir / "info_fullres.json", "r") as fp:
        fullres_info = json.load(fp=fp)
    with open(output_dir / "info_fullres.json", "w") as fp:
        fullres_info["data_type"] = "uint8"
        json.dump(fullres_info, fp=fp, indent="\t")

    assert subprocess.call([
        "generate-scales-info",
        str(output_dir / "info_fullres.json"),
        str(output_dir)
    ], env=env) == 0
    assert subprocess.call([
        "volume-to-precomputed",
        "--sharding", "1,1,0",
        str(input_nifti),
        str(output_dir)
    ], env=env) == 0
    assert subprocess.call([
        "compute-scales",
        "--downscaling-method=stride",  # for test speed
        str(output_dir)
    ], env=env) == 0

    all_files = [f"{dirpath}/{filename}" for dirpath, _, filenames
                 in os.walk(output_dir)
                 for filename in filenames]

    assert len(all_files) == 7, ("Expecting 7 files, but got "
                                 f"{len(all_files)}.\n{all_files}")


def test_slice_conversion(tmpdir):
    # Prepare dummy slices
    path_to_slices = tmpdir / "slices"
3 changes: 1 addition & 2 deletions setup.cfg
@@ -13,7 +13,6 @@ classifiers =
     Intended Audience :: Science/Research
     License :: OSI Approved :: MIT License
     Programming Language :: Python :: 3
-    Programming Language :: Python :: 3.5
     Programming Language :: Python :: 3.6
     Programming Language :: Python :: 3.7
     Programming Language :: Python :: 3.8
@@ -28,7 +27,7 @@ keywords = neuroimaging
 package_dir =
     = src
 packages = find:
-python_requires = ~=3.5
+python_requires = ~=3.6
 install_requires =
     nibabel >= 2
     numpy >= 1.11.0
53 changes: 49 additions & 4 deletions src/neuroglancer_scripts/accessor.py
@@ -1,5 +1,7 @@
-# Copyright (c) 2018 Forschungszentrum Juelich GmbH
+# Copyright (c) 2018, 2023 Forschungszentrum Juelich GmbH
 #
 # Author: Yann Leprince <y.leprince@fz-juelich.de>
+# Author: Xiao Gui <xgui3783@gmail.com>
 #
 # This software is made available under the MIT licence, see LICENCE.txt.

@@ -10,6 +12,7 @@
 """
 
 import urllib.parse
+import json
 
 __all__ = [
     "get_accessor_for_url",
@@ -35,15 +38,57 @@ def get_accessor_for_url(url, accessor_options={}):
     r = urllib.parse.urlsplit(url)
     if r.scheme in ("", "file"):
         from neuroglancer_scripts import file_accessor
+        from neuroglancer_scripts import sharded_base
         flat = accessor_options.get("flat", False)
         gzip = accessor_options.get("gzip", True)
         compresslevel = accessor_options.get("compresslevel", 9)
         pathname = _convert_split_file_url_to_pathname(r)
-        return file_accessor.FileAccessor(pathname, flat=flat, gzip=gzip,
-                                          compresslevel=compresslevel)
+
+        accessor = file_accessor.FileAccessor(pathname, flat=flat, gzip=gzip,
+                                              compresslevel=compresslevel)
+        is_sharding = False
+        if accessor_options.get("sharding"):
+            is_sharding = True
+        if not is_sharding:
+            try:
+                info = json.loads(accessor.fetch_file("info"))
+                if sharded_base.ShardedAccessorBase.info_is_sharded(info):
+                    is_sharding = True
+            except (DataAccessError, json.JSONDecodeError):
+                # In the event that info does not exist
+                # Or info is malformed
+                # Fallback to default behavior
+                ...
+
+        if is_sharding:
+            from neuroglancer_scripts import sharded_file_accessor
+            return sharded_file_accessor.ShardedFileAccessor(pathname)
+
+        return accessor
+
     elif r.scheme in ("http", "https"):
         from neuroglancer_scripts import http_accessor
-        return http_accessor.HttpAccessor(url)
+        from neuroglancer_scripts import sharded_base
+        accessor = http_accessor.HttpAccessor(url)
+
+        is_sharding = False
+        if "sharding" in accessor_options:
+            is_sharding = True
+        if not is_sharding:
+            try:
+                info = json.loads(accessor.fetch_file("info"))
+                if sharded_base.ShardedAccessorBase.info_is_sharded(info):
+                    is_sharding = True
+            except (DataAccessError, json.JSONDecodeError):
+                # In the event that info does not exist
+                # Or info is malformed
+                # Fallback to default behavior
+                ...
+
+        if is_sharding:
+            from neuroglancer_scripts import sharded_http_accessor
+            return sharded_http_accessor.ShardedHttpAccessor(url)
+        return accessor
     else:
         raise URLError("Unsupported URL scheme {0} (must be file, http, or "
                        "https)".format(r.scheme))
4 changes: 4 additions & 0 deletions src/neuroglancer_scripts/dyadic_pyramid.py
@@ -148,10 +148,14 @@ def downscale_info(scale_level):


 def compute_dyadic_scales(precomputed_io, downscaler):
+    from neuroglancer_scripts import sharded_file_accessor
     for i in range(len(precomputed_io.info["scales"]) - 1):
         compute_dyadic_downscaling(
             precomputed_io.info, i, downscaler, precomputed_io, precomputed_io
         )
+    if isinstance(precomputed_io.accessor,
+                  sharded_file_accessor.ShardedFileAccessor):
+        precomputed_io.accessor.close()
 
 
 def compute_dyadic_downscaling(info, source_scale_index, downscaler,