Significant modifications to unit tests for the water-atcor-develop branch (#115)

* changed as_geo_docs() to include resampling of imagery and a simple naming convention (grid name)

* modified normalise_band_name() to account for Sentinel-2's 8A band

* extensive modifications to perform MNDWI masking and work-around for Sentinel-2's absent level-1 metadata yaml file
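
  For reference, MNDWI here is the Modified Normalised Difference Water Index, computed from the green and shortwave-infrared bands. A minimal numpy sketch of the index itself (not the project's implementation):

  ```python
  import numpy as np

  # MNDWI = (green - swir1) / (green + swir1); water pixels trend positive.
  def mndwi(green: np.ndarray, swir1: np.ndarray) -> np.ndarray:
      green = green.astype("float32")
      swir1 = swir1.astype("float32")
      total = green + swir1
      with np.errstate(divide="ignore", invalid="ignore"):
          return np.where(total != 0.0, (green - swir1) / total, np.nan)
  ```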

* mndwi h5 file added for the integration tests

* changes to water-atcor packaging implemented in eodatasets3/scripts/packagewagl.py

* changes to water-atcor packaging applied to tests/integration/test_packagewagl.py

* black formatting to eodatasets3/images.py

* black formatting to eodatasets3/scripts/packagewagl.py

* black formatting to eodatasets3/scripts/tostac.py

* black formatting to eodatasets3/wagl.py

* black formatting to tests/integration/test_packagewagl.py

* reformatted eodatasets3/images.py with latest version of black

* reformatted eodatasets3/scripts/tostac.py with latest version of black

* reformatted eodatasets3/wagl.py with latest version of black

* removed LC08_L1TP_092084_20160628_20170323_01_T1.yaml

* removed LC80920842016180LGN01.fmask.img-luigi-tmp-8404334440.aux.xm

* removed LC80920842016180LGN01.fmask.yaml

* removed LC80920842016180LGN01.gqa.yaml

* removed LC80920842016180LGN01.mndwi.h5

* removed LC80920842016180LGN01.tesp.yaml

* removed LC80920842016180LGN01.wagl.h5

* added LC08_L1TP_091086_20141106_20170417_01_T1.odc-metadata.yaml

* added LC80920842016180LGN01.fmask.img

* added LC80920842016180LGN01.fmask.img-luigi-tmp-0242778241.aux.xml

* added LC80920842016180LGN01.fmask.tmp..img.aux.xml

* added LC80910862014310LGN01.fmask.yaml

* added LC80910862014310LGN01.gqa.yaml

* added LC80910862014310LGN01.mndwi.h5

* added LC80910862014310LGN01.tesp.yaml

* added LC80910862014310LGN01.wagl.h5

* added LC80910862014310LGN01.fmask.img

* added LC80910862014310LGN01.fmask.img-luigi-tmp-1118760197.aux.xml

* added LC80910862014310LGN01.fmask.tmp..img.aux.xml

* modification: naming of grids back to "default" for the common grid, and RES_XXm for the others

* modification: packaging folder hierarchy to include an additional time folder for S2A/B

* modification: to package wagl-water-atcor products

* modification: infer_datetime_range set to True

* modification: added sh module to tests_require

* modification: naming of grids

* modification: code cleanup and include mndwi.h5 downsampling

* modification: size_bytes to current value - fudged as this isn't used in wagl-water-atcor packaging

* modification: accommodate new path names and S2A/B folder hierarchy

* modification: packaging of wagl-water-atcor lambertian product

* formatting: flake8 and black

* formatting: flake8 and black

* formatting: flake8 and black

* formatting: flake8 and black

* formatting: flake8 and black

* Mention release process in readme

And minor doc reword + correction

* Update DEA naming for non-final products

As requested by the C3 ARD people: interim datasets go in a different folder from final datasets.

This is implemented as an expansion of the existing 'DEA' naming conventions rather than as a new convention, since it doesn't modify existing (final) products.
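
A minimal sketch of the resulting behaviour, assuming the `dea:dataset_maturity` property used in the eodatasets3/model.py diff further below:

```python
# Interim datasets get the maturity appended to the deepest date folder:
# .../2016/06/28 becomes .../2016/06/28_interim. Final products are unchanged.
parts = ["ga_ls8c_ard_3", "092", "084", "2016", "06", "28"]
maturity = "interim"  # value of the "dea:dataset_maturity" property
if maturity and maturity != "final":
    parts[-1] = f"{parts[-1]}_{maturity}"
print("/".join(parts))  # ga_ls8c_ard_3/092/084/2016/06/28_interim
```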

* Don't error if h5py is unavailable when generating thumbnails
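
The guard in the eodatasets3/images.py diff below tests `h5py is not None`, which implies the usual optional-import pattern, roughly:

```python
# Treat h5py as an optional dependency: use it when available, otherwise
# fall back to plain-array writing instead of raising ImportError.
try:
    import h5py
except ImportError:
    h5py = None
```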

* Add bits'n'bobs to test with `make docker-tests`

* Allow inheriting geometry from source datasets

Useful when it's potentially very expensive to do all the vector
operations to determine the valid region from the data itself.
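
A hedged usage sketch of the new `inherit_geometry` argument (shown in the eodatasets3/assemble.py diff below; the dataset variable name is illustrative):

```python
# Reuse the already-known valid-data polygon from the source dataset
# instead of re-deriving it from (possibly striped) pixel data.
p.add_source_dataset(
    level1_dataset,  # an eodatasets3 DatasetDoc; illustrative name
    auto_inherit_properties=True,
    inherit_geometry=True,
)
```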

* Add a lookup-table thumbnail function
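
A usage sketch of the new single-band thumbnail writer (its signature appears in the assemble.py diff below; the measurement name and colours here are illustrative):

```python
# Colour a classified band via a lookup table of pixel value -> RGB:
# here 0 (not water) renders as black and 1 (water) as blue.
p.write_thumbnail_singleband(
    "water",  # illustrative measurement name
    lookup_table={0: (0, 0, 0), 1: (0, 0, 255)},
    kind="water",
)
```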

* Black formatting

* Fix bug in image renaming

* Fix another type annotation

* Resolve Jeremy's comments

* Fix silly bug

* Black, it's always black

* Fix the makefile to rebuild before tests

* Update tests, data type for singleband writer

* Tighten the public stac conversion methods, expand them

* Add WKT2 fallback for EO3 CRSes

* Only declare stac extensions that we use

* Move new stac api into a library, restore old api

The old dc_to_stac function is used externally, so we need to keep
a shim of it for backwards compatibility.
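
The shim plausibly looks something like this sketch (the module path and new function name are assumptions, not taken from this commit):

```python
# Old public entry point kept for existing callers: delegate straight
# to the relocated implementation in the new stac library module.
def dc_to_stac(dataset, *args, **kwargs):
    from eodatasets3 import stac  # assumed new location

    return stac.to_stac_item(dataset, *args, **kwargs)
```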

* Clearer properties, docs, for stac api

* More robust Stac schema downloads

- Use one session for all downloads
- Use a timeout
- Keep cached items much longer when they're stable Stac versions.
- Allow skipping of the cache
- Allow no-network-access usage
- Verbose option for users (see the sketch below)
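
A minimal sketch of that download policy using `requests` (the timeout value and function name are assumptions):

```python
import requests

# One session shared across all schema downloads, so connections are reused.
_session = requests.Session()


def fetch_schema(url: str) -> dict:
    # A timeout prevents a hung server from stalling validation forever.
    response = _session.get(url, timeout=30)
    response.raise_for_status()
    return response.json()
```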

* Use stac location for measurements in legacy eo3-to-stac

* Fix Travis' Python 3.6 dependencies.

Cattrs removed Python 3.6 support, yet pip still tries to install the new version.

* Consistent float handling in doc comparison

* Improve, clarify stac conversion docs

* Move rapidjson from test to install requires

* removed LC80920842016180LGN01.mndwi.h5

* resolved flake8 issues tests/integration/test_packagewagl.py

* added hdf5-tools to .travis.yml

* changed filesize to pass travis-ci build. This file/test is unused in water-atcor-develop

Co-authored-by: Rodrigo Garcia <rg3290@gadi-login-08.gadi.nci.org.au>
Co-authored-by: Rodrigo Garcia <rg3290@gadi-login-04.gadi.nci.org.au>
Co-authored-by: Jeremy Hooke <jez@stulk.com>
Co-authored-by: Damien Ayers <damien@omad.net>
Co-authored-by: Alex Leith <alexgleith@gmail.com>
Co-authored-by: Rodrigo Garcia <rg3290@gadi-login-06.gadi.nci.org.au>
Co-authored-by: Rodrigo Garcia <rg3290@gadi-login-09.gadi.nci.org.au>
8 people committed Nov 13, 2020
1 parent b08eade commit 9d8f984
Showing 45 changed files with 1,266 additions and 681 deletions.
3 changes: 3 additions & 0 deletions .travis.yml
@@ -11,6 +11,7 @@ addons:
   apt:
     packages:
       - gdal-bin
+      - hdf5-tools
       - gfortran
       - libatlas-base-dev
       - libatlas-dev
@@ -25,6 +26,8 @@ install:
   - travis_retry pip install --upgrade pytest pytest-cov coveralls GDAL==1.10.0 rasterio[s3] 'scipy<1.5.0' pandas==1.0.5
   # flake8 and black versions should match .pre-commit-config.yaml
   - travis_retry pip install flake8==3.8.2 black==20.8b1
+  # Cattrs removed Python 3.6 support in 1.1.0
+  - travis_retry pip install cattrs==1.0.0
   - travis_retry pip install -e .[test]
   - pip freeze
   # Either both set or none. See: https://github.com/mapbox/rasterio/issues/1494
52 changes: 52 additions & 0 deletions Dockerfile
@@ -0,0 +1,52 @@
FROM opendatacube/geobase:wheels as env_builder
ARG py_env_path=/env
ARG ENVIRONMENT=test

COPY requirements*.txt /tmp/
# RUN env-build-tool new /tmp/requirements.txt ${py_env_path}
RUN if [ "$ENVIRONMENT" = "test" ] ; then \
env-build-tool new /tmp/requirements-test.txt ${py_env_path} ; \
else \
env-build-tool new /tmp/requirements.txt ${py_env_path} ; \
fi

ENV PATH=${py_env_path}/bin:$PATH

# Copy source code and install it
RUN mkdir -p /code
WORKDIR /code
ADD . /code

RUN pip install --use-feature=2020-resolver .

# Build the production runner stage from here
FROM opendatacube/geobase:runner

ENV LC_ALL=C.UTF-8 \
DEBIAN_FRONTEND=noninteractive \
SHELL=bash

COPY --from=env_builder /env /env
ENV PATH=/env/bin:$PATH

# # Environment can be whatever is supported by setup.py
# # so, either deployment, test
# ARG ENVIRONMENT=test
# RUN echo "Environment is: $ENVIRONMENT"
#
# # Set up a nice workdir, and only copy the things we care about in
# ENV APPDIR=/code
# RUN mkdir -p $APPDIR
# WORKDIR $APPDIR
# ADD . $APPDIR
#
# # These ENVIRONMENT flags make this a bit complex, but basically, if we are in dev
# # then we want to link the source (with the -e flag) and if we're in prod, we
# # want to delete the stuff in the /code folder to keep it simple.
# RUN if [ "$ENVIRONMENT" = "deployment" ] ; then rm -rf $APPDIR ; \
# else pip install --editable .[$ENVIRONMENT] ; \
# fi

RUN python

CMD ["python"]
6 changes: 6 additions & 0 deletions Makefile
@@ -0,0 +1,6 @@

.PHONY: docker-tests

docker-tests:
	docker build -t eodatasets:test .
	docker run -it --rm --volume "${PWD}/tests":/tests eodatasets:test pytest --cov eodatasets --durations=5 /tests
20 changes: 20 additions & 0 deletions README.md
@@ -189,3 +189,23 @@ Some preparers need the ancillary dependencies: `pip install .[ancillary]`
     --with-oa / --no-oa             Include observation attributes (default:
                                     true)
     --help                          Show this message and exit.


## Creating Releases

```
git fetch origin
# Create a tag for the new version
git tag eodatasets3-<version> origin/eodatasets3
# Push it to main repository
git push origin --tags
# Create a wheel locally
python3 setup.py sdist bdist_wheel
# Upload it (Jeremy, Damien, Kirill have pypi ownership)
python3 -m twine upload dist/*
```
7 changes: 4 additions & 3 deletions docs/index.rst
@@ -66,7 +66,7 @@ the provenance, and the assembler can optionally copy any common metadata automa
     # Set our product information.
     # It's a GA product of "numerus-unus" ("the number one").
     p.producer = "ga.gov.au"
-    p.product_family = "blues"
+    p.product_family = "numerus-unus"
     p.dataset_version = "3.0.0"
 
     ...
@@ -116,10 +116,11 @@ of the current image::
     ...
 
 Note that the assembler will throw an error if the path lives outside
-the dataset (location), as they will be absolute rather than relative paths.
+the dataset (location), as this will require absolute paths.
 Relative paths are considered best-practice for Open Data Cube.
 
-You can allow absolute paths with a field on assembler construction :meth:`eodatasets3.DatasetAssembler.__init__`::
+You can allow absolute paths with a field on assembler construction
+:meth:`eodatasets3.DatasetAssembler.__init__`::
 
     with DatasetAssembler(
         dataset_location=usgs_level1,
84 changes: 72 additions & 12 deletions eodatasets3/assemble.py
@@ -163,6 +163,7 @@ def __init__(
         self._user_metadata = dict()
         self._software_versions: List[Dict] = []
         self._lineage: Dict[str, List[uuid.UUID]] = defaultdict(list)
+        self._inherited_geometry = None
 
         if naming_conventions == "default":
             self.names = ComplicatedNamingConventions(self)
@@ -211,6 +212,12 @@ def _work_path(self) -> Path:
     def properties(self) -> StacPropertyView:
         return self._props
 
+    @property
+    def measurements(self) -> Dict[str, Tuple[GridSpec, Path]]:
+        return dict(
+            (name, (grid, path)) for grid, name, path in self._measurements.iter_paths()
+        )
+
     @property
     def label(self) -> Optional[str]:
         """
@@ -317,6 +324,7 @@ def add_source_dataset(
         dataset: DatasetDoc,
         classifier: Optional[str] = None,
         auto_inherit_properties: bool = False,
+        inherit_geometry: bool = False,
     ):
         """
         Record a source dataset using its metadata document.
@@ -335,6 +343,9 @@
             are used for different purposes. Such as having a second level1 dataset
             that was used for QA (but is not this same scene).
 
+        :param inherit_geometry: Instead of re-calculating the valid bounds geometry based on the
+               data, which can be very computationally expensive e.g. Landsat 7
+               striped data, use the valid data geometry from this source dataset.
 
         See :func:`add_source_path` if you have a filepath reference instead of a document.
@@ -353,6 +364,8 @@
         self._lineage[classifier].append(dataset.id)
         if auto_inherit_properties:
             self._inherit_properties_from(dataset)
+        if inherit_geometry:
+            self._inherited_geometry = dataset.geometry
 
     def _inherit_properties_from(self, source_dataset: DatasetDoc):
         for name in self.INHERITABLE_PROPERTIES:
@@ -669,7 +682,10 @@ def done(
         if measurement_docs and sort_measurements:
             measurement_docs = dict(sorted(measurement_docs.items()))
 
-        valid_data = self._measurements.consume_and_get_valid_data()
+        if self._inherited_geometry:
+            valid_data = self._inherited_geometry
+        else:
+            valid_data = self._measurements.consume_and_get_valid_data()
         # Avoid the messiness of different empty collection types.
         # (to have a non-null geometry we'd also need non-null grids and crses)
         if valid_data.is_empty:
@@ -782,6 +798,14 @@ def done(
     def _crs_str(self, crs: CRS) -> str:
         return f"epsg:{crs.to_epsg()}" if crs.is_epsg_code else crs.to_wkt()
 
+    def _document_thumbnail(self, thumb_path, kind=None):
+        self._checksum.add_file(thumb_path)
+
+        accessory_name = "thumbnail"
+        if kind:
+            accessory_name += f":{kind}"
+        self.add_accessory_file(accessory_name, thumb_path)
+
     def write_thumbnail(
         self,
         red: str,
@@ -815,21 +839,18 @@ def write_thumbnail(
         :param static_stretch: Use a static upper/lower value to stretch by instead of dynamic stretch.
         """
         thumb_path = self.names.thumbnail_name(self._work_path, kind=kind)
-        measurements = dict(
-            (name, (grid, path)) for grid, name, path in self._measurements.iter_paths()
-        )
 
-        missing_measurements = {red, green, blue} - set(measurements)
+        missing_measurements = {red, green, blue} - set(self.measurements)
         if missing_measurements:
             raise IncompleteDatasetError(
                 ValidationMessage(
                     Level.error,
                     "missing_thumb_measurements",
                     f"Thumbnail measurements are missing: no measurements called {missing_measurements!r}. ",
-                    hint=f"Available measurements: {', '.join(measurements)}",
+                    hint=f"Available measurements: {', '.join(self.measurements)}",
                 )
             )
-        rgbs = [measurements[b] for b in (red, green, blue)]
+        rgbs = [self.measurements[b] for b in (red, green, blue)]
         unique_grids: List[GridSpec] = list(set(grid for grid, path in rgbs))
         if len(unique_grids) != 1:
             raise NotImplementedError(
@@ -846,12 +867,51 @@ def write_thumbnail(
             percentile_stretch=percentile_stretch,
             input_geobox=grid,
         )
-        self._checksum.add_file(thumb_path)
 
-        accessory_name = "thumbnail"
-        if kind:
-            accessory_name += f":{kind}"
-        self.add_accessory_file(accessory_name, thumb_path)
+        self._document_thumbnail(thumb_path, kind)
+
+    def write_thumbnail_singleband(
+        self,
+        measurement: str,
+        bit: int = None,
+        lookup_table: Dict[int, Tuple[int, int, int]] = None,
+        kind: str = None,
+    ):
+        """
+        Write a singleband thumbnail out, taking in an input measurement and
+        outputting a JPG with appropriate settings.
+
+        Options are to
+        EITHER
+        Use a bit (int) as the value to scale from black to white to
+        i.e., 0 will be BLACK and bit will be WHITE, with a linear scale between.
+        OR
+        Provide a lookuptable (dict) of int (key) [R, G, B] (value) fields
+        to make the image with.
+        """
+
+        thumb_path = self.names.thumbnail_name(self._work_path, kind=kind)
+
+        _, image_path = self.measurements.get(measurement, (None, None))
+
+        if image_path is None:
+            raise IncompleteDatasetError(
+                ValidationMessage(
+                    Level.error,
+                    "missing_thumb_measurement",
+                    f"Thumbnail measurement is missing: no measurements called {measurement!r}. ",
+                    hint=f"Available measurements: {', '.join(self.measurements)}",
+                )
+            )
+
+        FileWrite().create_thumbnail_singleband(
+            image_path,
+            thumb_path,
+            bit,
+            lookup_table,
+        )
+
+        self._document_thumbnail(thumb_path, kind)
 
     def add_accessory_file(self, name: str, path: Path):
         """
21 changes: 17 additions & 4 deletions eodatasets3/images.py
@@ -55,6 +55,8 @@ class GridSpec:
 
     @classmethod
     def from_dataset_doc(cls, ds: DatasetDoc, grid="default") -> "GridSpec":
+
+        print(list(ds.grids))
         g = ds.grids[grid]
 
         if ds.crs.startswith("epsg:"):
@@ -272,8 +274,15 @@ def as_geo_docs(self) -> Tuple[CRS, Dict[str, GridDoc], Dict[str, MeasurementDoc
                     f"\t{grid.crs.to_string()!r}\n"
                 )
 
-            # create a simple name for the each resolution groups
-            grid_name = "RES_{0}m".format(int(grid.transform.a))
+            if i == 0:
+                # as stated above, grids have been ordered from most
+                # (i=0) to fewest (i>0) measurements. The grid with
+                # the most measurements will be set as "default"
+                grid_name = "default"
+
+            else:
+                # create a simple name for the each resolution groups
+                grid_name = "RES_{0}m".format(int(grid.transform.a))
 
             grid_docs[grid_name] = GridDoc(grid.shape, grid.transform)
 
@@ -548,7 +557,7 @@ def write_from_ndarray(
         """
         with rasterio.open(unstructured_image, "w", **rio_args) as outds:
             if bands == 1:
-                if isinstance(array, h5py.Dataset):
+                if h5py is not None and isinstance(array, h5py.Dataset):
                     for tile in tiles:
                         idx = (
                             slice(tile[0][0], tile[0][1]),
@@ -558,7 +567,7 @@
                 else:
                     outds.write(array, 1)
             else:
-                if isinstance(array, h5py.Dataset):
+                if h5py is not None and isinstance(array, h5py.Dataset):
                     for tile in tiles:
                         idx = (
                             slice(tile[0][0], tile[0][1]),
@@ -687,6 +696,10 @@ def create_thumbnail_singleband(
         raise ValueError(
             "Please set either bit or lookup_table, and not both of them"
         )
+    if bit is None and lookup_table is None:
+        raise ValueError(
+            "Please set either bit or lookup_table, you haven't set either of them"
+        )
 
     with rasterio.open(in_file) as dataset:
         data = dataset.read()
20 changes: 18 additions & 2 deletions eodatasets3/model.py
@@ -225,7 +225,9 @@ def dataset_label(self) -> str:
     def destination_folder(self, base: Path):
         self._check_enough_properties_to_name()
         # DEA naming conventions folder hierarchy.
-        # Example: "ga_ls8c_ard_3/092/084/2016/06/28"
+        # Examples:
+        # For L8: "ga_ls8c_aard_3/092/084/2016/06/28"
+        # For S2A/B: "ga_s2bm_aard_2/55/KDT/2016/06/28/003241"
 
         parts = [self.product_name]
 
@@ -234,7 +236,21 @@
         if region_code:
             parts.extend(utils.subfolderise(region_code))
 
-        parts.extend(f"{self.dataset.datetime:%Y/%m/%d}".split("/"))
+        if self.dataset.platform:
+            # added to pass test_assemble.py, where self.dataset.platform = None
+            if self.dataset.platform.startswith("sentinel-2"):
+                # modified output dir so to include HHMMSS to account for
+                # multiple acquisitions per day
+                parts.extend(f"{self.dataset.datetime:%Y/%m/%d/%H%M%S}".split("/"))
+            else:
+                parts.extend(f"{self.dataset.datetime:%Y/%m/%d}".split("/"))
+        else:
+            parts.extend(f"{self.dataset.datetime:%Y/%m/%d}".split("/"))
+
+        # If it's not a final product, append the maturity to the folder.
+        maturity: str = self.dataset.properties.get("dea:dataset_maturity")
+        if maturity and maturity != "final":
+            parts[-1] = f"{parts[-1]}_{maturity}"
 
         if self.dataset_separator_field is not None:
             val = self.dataset.properties[self.dataset_separator_field]
3 changes: 2 additions & 1 deletion eodatasets3/scripts/packagewagl.py
@@ -53,7 +53,8 @@ def run(
         products = set(p.lower() for p in products)
     else:
         # products = wagl.DEFAULT_PRODUCTS
-        products = "lambertian"
+        products = ["lambertian"]
 
     with rasterio.Env():
         for granule in wagl.Granule.for_path(h5_file, level1_metadata_path=level1):
             with wagl.do(
