Merge pull request #35 from HumanBrainProject/feat_shardedPrecomputed
Feat: adding support for sharded precomputed
ylep committed Jan 10, 2024
2 parents a6bce3f + 7f1d4b1 commit 0205370
Showing 21 changed files with 3,142 additions and 26 deletions.
6 changes: 2 additions & 4 deletions .github/workflows/tox.yaml
@@ -15,15 +15,13 @@ jobs:
         include:
           - runs-on: 'ubuntu-20.04'
             python-version: '3.6'
-          - runs-on: 'ubuntu-20.04'
-            python-version: '3.5'
     runs-on: ${{ matrix.runs-on }}
     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v3
         with:
           lfs: true
       - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v2
+        uses: actions/setup-python@v4
         with:
           python-version: ${{ matrix.python-version }}
       - name: PIP cache
6 changes: 6 additions & 0 deletions README.rst
@@ -74,3 +74,9 @@ This repository uses `pre-commit`_ to ensure that all committed code follows min

.. _Neuroglancer: https://github.com/google/neuroglancer
.. _pre-commit: https://pre-commit.com/


Acknowledgments
===============

`cloud-volume <https://github.com/seung-lab/cloud-volume>`_ (BSD 3-Clause licensed) for the compressed morton code and shard/minishard mask implementation.
83 changes: 82 additions & 1 deletion docs/examples.rst
@@ -15,7 +15,7 @@ two Nifti files based on the JuBrain human brain atlas, as published in version
 Note that you need to use `git-lfs <https://git-lfs.github.com/>`_ in order to
 see the contents of the NIfTI files (otherwise you can download them `from the
 repository on Github
-<https://github.com/HumanBrainProject/neuroglancer-scripts/tree/master/JuBrain>`_.
+<https://github.com/HumanBrainProject/neuroglancer-scripts/tree/master/examples>`_.)

Conversion of the grey-level template image (MNI Colin27 T1 MRI)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
@@ -152,3 +152,84 @@ BigBrain is a very large image (6572 × 7404 × 5711 voxels) reconstructed from
white_right_327680.gii \
classif/
link-mesh-fragments --no-colon-suffix mesh_labels.csv classif/

Conversion of the grey-level template image (sharded precomputed)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

.. code-block:: sh

   volume-to-precomputed \
       --generate-info \
       --sharding 1,1,0 \
       colin27T1_seg.nii.gz \
       colin27T1_seg_sharded

At this point, you need to edit ``colin27T1_seg_sharded/info_fullres.json`` to set
``"data_type": "uint8"``. This is needed because ``colin27T1_seg.nii.gz`` uses
a peculiar encoding, with slope and intercept set in the NIfTI header, even
though only integers between 0 and 255 are encoded.
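
This edit can also be scripted; a minimal sketch (assuming the paths used
above, and mirroring what the project's test suite does below):

.. code-block:: python

   import json

   with open("colin27T1_seg_sharded/info_fullres.json") as fp:
       fullres_info = json.load(fp)
   fullres_info["data_type"] = "uint8"
   with open("colin27T1_seg_sharded/info_fullres.json", "w") as fp:
       json.dump(fullres_info, fp, indent="\t")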

.. code-block:: sh

   generate-scales-info colin27T1_seg_sharded/info_fullres.json colin27T1_seg_sharded/
   volume-to-precomputed \
       --sharding 1,1,0 \
       colin27T1_seg.nii.gz \
       colin27T1_seg_sharded/
   compute-scales colin27T1_seg_sharded/

.. _Conversion of Big Brain to sharded precomputed format:

Big Brain (20um) has been converted to neuroglancer precomputed format and is
accessible at
https://neuroglancer.humanbrainproject.eu/precomputed/BigBrainRelease.2015/8bit.
Using this as the source volume, a sharded volume will be created.

.. code-block:: sh

   mkdir sharded_bigbrain/
   curl --output sharded_bigbrain/info \
       https://neuroglancer.humanbrainproject.eu/precomputed/BigBrainRelease.2015/8bit/info

At this point, edit ``sharded_bigbrain/info`` to contain the desired sharding
specification, as shown in the diff below. For a smaller-scale test run, the
20um and 40um scales can be removed.

.. code-block:: diff

     {
       "type": "image",
       "data_type": "uint8",
       "num_channels": 1,
       "scales": [
         {
           "chunk_sizes": [[64,64,64]],
           "encoding": "raw",
           "key": "20um",
           "resolution": [21166.6666666666666, 20000, 21166.6666666666666],
           "size": [6572, 7404, 5711],
   -       "voxel_offset": [0, 0, 0]
   +       "voxel_offset": [0, 0, 0],
   +       "sharding": {
   +         "@type": "neuroglancer_uint64_sharded_v1",
   +         "data_encoding": "gzip",
   +         "hash": "identity",
   +         "minishard_bits": 2,
   +         "minishard_index_encoding": "gzip",
   +         "preshift_bits": 0,
   +         "shard_bits": 2
   +       }
         },
         // ...truncated for brevity
       ]
     }
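
If you prefer to script this edit, a sketch along these lines adds the same
sharding specification (the values shown in the diff above) to every scale,
optionally dropping the 20um and 40um scales for a smaller test run:

.. code-block:: python

   import json

   sharding_spec = {
       "@type": "neuroglancer_uint64_sharded_v1",
       "data_encoding": "gzip",
       "hash": "identity",
       "minishard_bits": 2,
       "minishard_index_encoding": "gzip",
       "preshift_bits": 0,
       "shard_bits": 2,
   }

   with open("sharded_bigbrain/info") as fp:
       info = json.load(fp)
   # Optional: drop the two largest scales for a quicker test run
   info["scales"] = [scale for scale in info["scales"]
                     if scale["key"] not in ("20um", "40um")]
   for scale in info["scales"]:
       scale["sharding"] = sharding_spec
   with open("sharded_bigbrain/info", "w") as fp:
       json.dump(info, fp, indent=2)
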
Start the conversion process.

.. code-block:: sh

   convert-chunks \
       https://neuroglancer.humanbrainproject.eu/precomputed/BigBrainRelease.2015/8bit \
       ./sharded_bigbrain/
2 changes: 1 addition & 1 deletion docs/script-usage.rst
@@ -21,7 +21,7 @@ OUTSIDE_VALUE] volume_filename dest_url``.

 You may want to use :ref:`convert-chunks <convert-chunks>` in a second step, to
 further compress your dataset with JPEG or ``compressed_segmentation``
-encoding).
+encoding.


Converting image volumes
47 changes: 47 additions & 0 deletions docs/serving-data.rst
@@ -99,3 +99,50 @@ following Apache configuration (e.g. put it in a ``.htaccess`` file):
AddEncoding x-gzip .gz
AddType application/octet-stream .gz
</IfModule>

Serving sharded data
====================


Content-Encoding
----------------

Sharded data must be served without any `Content-Encoding header
<https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding>`_.
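
A quick way to verify this is to inspect the response headers. A minimal
sketch using only the Python standard library (the URL is hypothetical;
substitute one of your own shard files):

.. code-block:: python

   import urllib.request

   url = "https://example.org/sharded_bigbrain/40um/0.shard"  # hypothetical
   req = urllib.request.Request(url, method="HEAD")
   with urllib.request.urlopen(req) as resp:
       # The shard must be delivered byte-for-byte, without a
       # Content-Encoding header announcing transparent compression
       assert resp.headers.get("Content-Encoding") is None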


HTTP Range request
------------------

Sharded data must be served by a webserver that supports the `Range header
<https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range>`_.

For development use, Python's bundled SimpleHTTPServer `does not support
this <https://github.com/python/cpython/issues/86809>`_. Recommended
alternatives are:

- `http-server (NodeJS) <https://www.npmjs.com/package/http-server>`_

- `RangeHTTPServer (Python) <https://github.com/danvk/RangeHTTPServer>`_

For production use, most modern static web servers support range requests.
The following web servers were tested and work with sharded volumes:

- nginx 1.25.3

- httpd 2.4.58

- caddy 2.7.5

In addition, most object storage services also support range requests without
additional configuration.
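
To check that a server honours range requests, ask for a few bytes and expect
an HTTP ``206 Partial Content`` response. A minimal sketch (hypothetical URL):

.. code-block:: python

   import urllib.request

   url = "http://localhost:8000/sharded_bigbrain/info"  # hypothetical
   req = urllib.request.Request(url, headers={"Range": "bytes=0-15"})
   with urllib.request.urlopen(req) as resp:
       # A server without range support would answer 200 with the full body
       assert resp.status == 206
       assert len(resp.read()) == 16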


Enable Access-Control-Allow-Origin header
-----------------------------------------

The `Access-Control-Allow-Origin
<https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Access-Control-Allow-Origin>`_
header will need to be enabled if the volume is expected to be accessed
cross-origin.
5 changes: 1 addition & 4 deletions pyproject.toml
@@ -1,9 +1,6 @@
 [build-system]
-# We need support for entry_points in setup.cfg, which needs setuptools>=51.0.0
-# according to the setuptools documentation. However, in my testing it works
-# with version 50.3.2 which is the last to retain Python 3.5 compatibility.
 requires = [
-    "setuptools>=50.3.2",
+    "setuptools>=51.0.0",
     "wheel",
 ]
 build-backend = "setuptools.build_meta"
56 changes: 54 additions & 2 deletions script_tests/test_scripts.py
@@ -1,6 +1,8 @@
-# Copyright (c) CEA
-# Copyright (c) 2018 Forschungszentrum Juelich GmbH
+# Copyright (c) 2018, 2023, 2024 Forschungszentrum Juelich GmbH
+# Copyright (c) 2018, 2023 CEA
 #
 # Author: Yann Leprince <y.leprince@fz-juelich.de>
+# Author: Xiao Gui <xgui3783@gmail.com>
 #
 # This software is made available under the MIT licence, see LICENCE.txt.

@@ -97,6 +99,56 @@ def test_all_in_one_conversion(examples_dir, tmpdir):
    # with --mmap / --load-full-volume


def test_sharded_conversion(examples_dir, tmpdir):
    input_nifti = examples_dir / "JuBrain" / "colin27T1_seg.nii.gz"
    # The file may be present but be a git-lfs pointer file, so we need to open
    # it to make sure that it is the actual correct file.
    try:
        gzip.open(str(input_nifti)).read(348)
    except OSError as exc:
        pytest.skip("Cannot find a valid example file {0} for testing: {1}"
                    .format(input_nifti, exc))

    output_dir = tmpdir / "colin27T1_seg_sharded"
    assert subprocess.call([
        "volume-to-precomputed",
        "--generate-info",
        "--sharding", "1,1,0",
        str(input_nifti),
        str(output_dir)
    ], env=env) == 4  # datatype not supported by neuroglancer

    with open(output_dir / "info_fullres.json", "r") as fp:
        fullres_info = json.load(fp=fp)
    with open(output_dir / "info_fullres.json", "w") as fp:
        fullres_info["data_type"] = "uint8"
        json.dump(fullres_info, fp=fp, indent="\t")

    assert subprocess.call([
        "generate-scales-info",
        str(output_dir / "info_fullres.json"),
        str(output_dir)
    ], env=env) == 0
    assert subprocess.call([
        "volume-to-precomputed",
        "--sharding", "1,1,0",
        str(input_nifti),
        str(output_dir)
    ], env=env) == 0
    assert subprocess.call([
        "compute-scales",
        "--downscaling-method=stride",  # for test speed
        str(output_dir)
    ], env=env) == 0

    all_files = [f"{dirpath}/{filename}" for dirpath, _, filenames
                 in os.walk(output_dir)
                 for filename in filenames]

    assert len(all_files) == 7, ("Expecting 7 files, but got "
                                 f"{len(all_files)}.\n{all_files}")


def test_slice_conversion(tmpdir):
    # Prepare dummy slices
    path_to_slices = tmpdir / "slices"
3 changes: 1 addition & 2 deletions setup.cfg
@@ -13,7 +13,6 @@ classifiers =
     Intended Audience :: Science/Research
     License :: OSI Approved :: MIT License
     Programming Language :: Python :: 3
-    Programming Language :: Python :: 3.5
     Programming Language :: Python :: 3.6
     Programming Language :: Python :: 3.7
     Programming Language :: Python :: 3.8
@@ -28,7 +27,7 @@ keywords = neuroimaging
 package_dir =
     = src
 packages = find:
-python_requires = ~=3.5
+python_requires = ~=3.6
 install_requires =
     nibabel >= 2
     numpy >= 1.11.0
53 changes: 49 additions & 4 deletions src/neuroglancer_scripts/accessor.py
@@ -1,5 +1,7 @@
-# Copyright (c) 2018 Forschungszentrum Juelich GmbH
+# Copyright (c) 2018, 2023 Forschungszentrum Juelich GmbH
 #
 # Author: Yann Leprince <y.leprince@fz-juelich.de>
+# Author: Xiao Gui <xgui3783@gmail.com>
 #
 # This software is made available under the MIT licence, see LICENCE.txt.

@@ -10,6 +12,7 @@
 """
 
 import urllib.parse
+import json
 
 __all__ = [
     "get_accessor_for_url",
@@ -35,15 +38,57 @@ def get_accessor_for_url(url, accessor_options={}):
     r = urllib.parse.urlsplit(url)
     if r.scheme in ("", "file"):
         from neuroglancer_scripts import file_accessor
+        from neuroglancer_scripts import sharded_base
         flat = accessor_options.get("flat", False)
         gzip = accessor_options.get("gzip", True)
         compresslevel = accessor_options.get("compresslevel", 9)
         pathname = _convert_split_file_url_to_pathname(r)
-        return file_accessor.FileAccessor(pathname, flat=flat, gzip=gzip,
-                                          compresslevel=compresslevel)
+
+        accessor = file_accessor.FileAccessor(pathname, flat=flat, gzip=gzip,
+                                              compresslevel=compresslevel)
+        is_sharding = False
+        if accessor_options.get("sharding"):
+            is_sharding = True
+        if not is_sharding:
+            try:
+                info = json.loads(accessor.fetch_file("info"))
+                if sharded_base.ShardedAccessorBase.info_is_sharded(info):
+                    is_sharding = True
+            except (DataAccessError, json.JSONDecodeError):
+                # In the event that info does not exist
+                # Or info is malformed
+                # Fallback to default behavior
+                ...
+
+        if is_sharding:
+            from neuroglancer_scripts import sharded_file_accessor
+            return sharded_file_accessor.ShardedFileAccessor(pathname)
+
+        return accessor
+
     elif r.scheme in ("http", "https"):
         from neuroglancer_scripts import http_accessor
-        return http_accessor.HttpAccessor(url)
+        from neuroglancer_scripts import sharded_base
+        accessor = http_accessor.HttpAccessor(url)
+
+        is_sharding = False
+        if "sharding" in accessor_options:
+            is_sharding = True
+        if not is_sharding:
+            try:
+                info = json.loads(accessor.fetch_file("info"))
+                if sharded_base.ShardedAccessorBase.info_is_sharded(info):
+                    is_sharding = True
+            except (DataAccessError, json.JSONDecodeError):
+                # In the event that info does not exist
+                # Or info is malformed
+                # Fallback to default behavior
+                ...
+
+        if is_sharding:
+            from neuroglancer_scripts import sharded_http_accessor
+            return sharded_http_accessor.ShardedHttpAccessor(url)
+        return accessor
     else:
         raise URLError("Unsupported URL scheme {0} (must be file, http, or "
                        "https)".format(r.scheme))
4 changes: 4 additions & 0 deletions src/neuroglancer_scripts/dyadic_pyramid.py
@@ -148,10 +148,14 @@ def downscale_info(scale_level):


 def compute_dyadic_scales(precomputed_io, downscaler):
+    from neuroglancer_scripts import sharded_file_accessor
     for i in range(len(precomputed_io.info["scales"]) - 1):
         compute_dyadic_downscaling(
             precomputed_io.info, i, downscaler, precomputed_io, precomputed_io
         )
+    if isinstance(precomputed_io.accessor,
+                  sharded_file_accessor.ShardedFileAccessor):
+        precomputed_io.accessor.close()
 
 
 def compute_dyadic_downscaling(info, source_scale_index, downscaler,