Ingest NLCD 2016 Layers into Geoprocessing Service #3393
These are the files in the download:
The data is in the ERDAS IMAGINE file format:
|
Notably, we should be using https://www.mrlc.gov/data/nlcd-land-cover-conus-all-years which has the 2016 edition of data for all years, since we also need to ingest the 2016 version of NLCD 2011, and potentially the 2016 version of NLCD 2006 and NLCD 2001 as well. |
As can be seen in the
$ gdal_translate NLCD_2016_Land_Cover_L48_20190424.img nlcd-2016-mrls.tif
$ gdalwarp -t_srs 'EPSG:5070' nlcd-2016-mrls.tif nlcd-2016-mrls-5070.tif
Then, to verify that this transformation did not result in any loss of data, I cropped both GeoTIFFs, the one in the original projection and the one in EPSG:5070, to the Delaware River Basin, using the DRB GeoJSON we use in the app:
$ gdalwarp -cutline drb.geojson -crop_to_cutline nlcd-2016-mrls.tif nlcd-2016-mrls-cropped.tif
$ gdalwarp -cutline drb.geojson -crop_to_cutline nlcd-2016-mrls-5070.tif nlcd-2016-mrls-5070-cropped.tif
Then, I reprojected each cropped GeoTIFF, one in the original projection and one in EPSG:5070, to EPSG:3857:
$ gdalwarp -t_srs 'EPSG:3857' nlcd-2016-mrls-cropped.tif nlcd-2016-mrls-cropped-3857.tif
$ gdalwarp -t_srs 'EPSG:3857' nlcd-2016-mrls-5070-cropped.tif nlcd-2016-mrls-5070-cropped-3857.tif
The test is that any differences between the final EPSG:3857 files would indicate data loss going from the original projection to EPSG:5070. For a raster diff, I performed a subtract operation in QGIS, and ran statistics and a histogram for the diff:
$ gdalinfo -stats -hist nlcd-2016-mrls-5070-cropped-3857-diff.tif
...
256 buckets from -47.0961 to 2.09608:
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 139490096 0 0 0 0 0 0 0 0 0 1
NoData Value=-3.4028234663852886e+38
Metadata:
STATISTICS_MAXIMUM=2
STATISTICS_MEAN=-3.4411044224625e-07
STATISTICS_MINIMUM=-47
STATISTICS_STDDEV=0.0039875769431737
STATISTICS_VALID_PERCENT=100
This revealed that the highest value of the diff was
I also hexdumped both files and ran a text diff on them:
$ hexdump nlcd-2016-mrls-cropped-3857.tif > nlcd-2016-mrls-cropped-3857.tif.hexdump
$ hexdump nlcd-2016-mrls-5070-cropped-3857.tif > nlcd-2016-mrls-5070-cropped-3857.tif.hexdump
$ diff -u nlcd-2016-mrls-cropped-3857.tif.hexdump nlcd-2016-mrls-5070-cropped-3857.tif.hexdump
This showed a total difference of 64 bytes out of 126 MB, making the files virtually identical:
--- nlcd-2016-mrls-cropped-3857.tif.hexdump 2021-06-15 10:55:46.000000000 -0400
+++ nlcd-2016-mrls-5070-cropped-3857.tif.hexdump 2021-06-15 10:57:16.000000000 -0400
@@ -3869,10 +3869,10 @@
0016bc0 6f 6c 65 3d 22 64 65 73 63 72 69 70 74 69 6f 6e
0016bd0 22 3e 4c 61 79 65 72 5f 31 3c 2f 49 74 65 6d 3e
0016be0 0a 3c 2f 47 44 41 4c 4d 65 74 61 64 61 74 61 3e
-0016bf0 0a 00 5b 8b f8 6e 5e aa 43 40 5b 8b f8 6e 5e aa
+0016bf0 0a 00 e0 89 f9 6e 5e aa 43 40 e0 89 f9 6e 5e aa
0016c00 43 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0016c10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-0016c20 00 00 f3 7c 69 6b f4 51 60 c1 fc 56 df 7a f6 10
+0016c20 00 00 85 dc 69 6b f4 51 60 c1 12 77 df 7a f6 10
0016c30 54 41 00 00 00 00 00 00 00 00 01 00 01 00 00 00
0016c40 07 00 00 04 00 00 01 00 01 00 01 04 00 00 01 00
0016c50 01 00 02 04 b1 87 19 00 00 00 01 08 b1 87 07 00
@@ -114161,7 +114161,7 @@
0a0c300 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29
*
0a0c320 29 29 29 29 29 29 29 29 29 29 29 29 29 29 2b 2b
-0a0c330 2b 2b 2b 2b 2b 2b 2b 2b 2b 29 29 29 29 29 29 29
+0a0c330 2b 2b 2b 2b 2b 2b 2b 2b 29 29 29 29 29 29 29 29
0a0c340 15 15 29 29 29 51 51 51 51 51 51 51 29 29 29 29
0a0c350 29 29 29 29 29 2b 2b 2b 29 29 29 29 29 29 29 29
0a0c360 29 29 29 2b 2b 2b 2a 2a 2b 2b 51 51 51 51 51 51
@@ -211393,7 +211393,7 @@
0ed4610 29 29 29 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b
0ed4620 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b
0ed4630 2b 2b 2b 2b 2b 2b 2b 29 29 29 29 29 29 29 29 29
-0ed4640 29 29 2b 29 29 2b 29 29 29 29 29 29 29 29 2b 2b
+0ed4640 29 29 2b 29 2b 2b 29 29 29 29 29 29 29 29 2b 2b
0ed4650 2b 29 29 29 29 29 29 2b 2b 2b 2b 2b 2b 2b 2b 2b
0ed4660 29 2b 29 2b 2b 29 2b 2b 2b 2b 2b 2b 2b 2b 29 29
0ed4670 2b 29 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 29 29
@@ -321140,7 +321140,7 @@
14d35e0 29 29 29 29 29 29 29 29 29 29 29 29 29 29 2b 2b
14d35f0 2b 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29
14d3600 29 29 29 29 2b 2b 2b 2b 2a 2b 2b 2b 2b 29 29 15
-14d3610 2b 2a 2b 2b 34 2b 2b 2b 2b 2b 2a 2b 2b 5a 2a 2a
+14d3610 2b 2a 2b 2b 34 2b 2b 2b 2b 2b 2a 2b 5a 5a 2a 2a
14d3620 2b 2b 29 29 29 29 2b 29 29 29 29 2b 2b 2a 2a 2a
14d3630 2b 29 2b 2b 29 29 29 29 29 2b 2b 29 29 29 29 29
14d3640 29 2b 2b 2b 15 15 2b 2b 34 47 47 2a 2b 2b 29 29
@@ -492760,7 +492760,7 @@
1e0b650 2b 2a 2a 2b 2b 2b 2b 2a 2a 2a 15 15 2b 2b 2a 2a
1e0b660 2a 2b 2b 2b 2b 2a 2a 2b 2a 15 2b 2b 2b 29 2b 29
1e0b670 29 2b 29 2b 2b 2b 2b 29 29 29 2b 2b 29 2a 2a 2a
-1e0b680 2a 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2a
+1e0b680 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2a
1e0b690 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b
1e0b6a0 2b 2b 2a 2a 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 29 29
1e0b6b0 15 15 2b 29 2b 2b 2b 2b 2b 2b 2b 15 2a 2a 2a 0b
With this verification, I am satisfied that using |
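The hexdump comparison above can also be done without writing intermediate text files. As a minimal sketch (the function name `count_differing_bytes` is hypothetical and was not part of the original workflow), `cmp -l` prints one line for every byte offset at which two files differ, so counting those lines gives the number of differing bytes directly. This assumes both files have the same length; if they do not, `cmp` only compares up to the end of the shorter file.

```shell
#!/bin/sh
# Hypothetical helper: count the bytes that differ between two files.
# `cmp -l` prints one line per differing byte (offset plus both byte values).
count_differing_bytes() {
    # cmp exits non-zero when the files differ, so its exit status is
    # ignored here and the printed lines are counted instead.
    # `tr` strips the padding some `wc` implementations add.
    cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}

# Example call, commented out so the sketch has no side effects:
# count_differing_bytes nlcd-2016-mrls-cropped-3857.tif nlcd-2016-mrls-5070-cropped-3857.tif
```

Compared with diffing hexdumps, this reports individual bytes rather than 16-byte rows, which makes a figure such as "64 bytes out of 126 MB" easier to confirm.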
In order to get our ETL tool working again, I had to install Scala 2.11 and Spark v2.xx. Also, I had to ensure that both |
I decided to paint the visual tiles via GDAL, since it is more reliable and does not create the artifacts with EPSG:5070 that GeoTrellis used to. This has since been fixed in more recent versions of GeoTrellis, but upgrading the ETL pipeline would be a large effort. I made a t3.medium EC2 instance in the
#!/bin/sh
set -ex
# Example Usage: ./make-tiles.sh 2016
# Unzip the relevant files for the given year only
unzip NLCD_Land_Cover_L48_20190424_full_zip.zip "NLCD_$1*"
# Reproject to EPSG:3857 (Web Mercator), the projection used on web maps
docker run -v $PWD:/data -w /data --rm -ti osgeo/gdal:alpine-normal-3.3.0 gdalwarp -t_srs 'EPSG:3857' NLCD_$1_Land_Cover_L48_20190424.img nlcd-$1-30m.img
# Convert to RGBA
docker run -v $PWD:/data -w /data --rm -ti osgeo/gdal:alpine-normal-3.3.0 gdal_translate -of vrt -expand rgba nlcd-$1-30m.img nlcd-$1-30m.vrt
# Paint tiles, from zoom levels 0-13
mkdir -p nlcd-$1-30m
docker run -v $PWD:/data -w /data --rm -ti osgeo/gdal:alpine-normal-3.3.0 gdal2tiles.py --zoom=0-13 --resampling=near --resume --xyz nlcd-$1-30m.vrt nlcd-$1-30m/
# Upload tiles to S3
aws s3 sync --exclude "*.aux.xml" --acl public-read nlcd-$1-30m/ s3://tiles.us-east-1.azavea.com/nlcd-$1-30m/
# Delete all working files
# NOTE: This needs to be sudo because the tiles created from within docker are owned by `root`
sudo rm -rf NLCD_$1_Land_Cover_L48* nlcd-$1-30m*
Note: I went with running GDAL via Docker because it was hard to find a recent enough native version for Ubuntu that had the latest version of
Then, I ran the script for each year:
$ ./make-tiles.sh 2016
$ ./make-tiles.sh 2011
$ ./make-tiles.sh 2006
$ ./make-tiles.sh 2001
I ran these scripts inside a |
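The four invocations above could also be driven by a small loop. This is only a sketch: `run_for_years` is a hypothetical helper that was not part of the original workflow, and it assumes the per-year script takes the year as its only argument, as `make-tiles.sh` does.

```shell
#!/bin/sh
# Hypothetical wrapper: run a per-year command for each NLCD year in turn,
# stopping at the first failure so a broken year is not silently skipped.
run_for_years() {
    cmd=$1
    shift
    for year in "$@"; do
        echo "=== $cmd $year ==="
        "$cmd" "$year" || return 1
    done
}

# Example call, commented out so the sketch has no side effects:
# run_for_years ./make-tiles.sh 2016 2011 2006 2001
```

Stopping on the first failure mirrors the `set -e` behavior inside the script itself, so a failed year has to be fixed before the later years are processed.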
Using the work in this branch: https://github.com/azavea/civic-apps-etl/pull/15, I created a new c4.2xlarge EC2 instance (8 CPU cores, 16 GB of RAM), installed Scala 2.11, Spark 2.4.8, jq, awscli, and docker.io on it, downloaded the NLCD full zip, and then ran the following script:
#!/bin/bash
set -ex
# Unzip the relevant files for the given year only
unzip NLCD_Land_Cover_L48_20190424_full_zip.zip "NLCD_$1*"
# Reproject to EPSG:5070 (Conus Albers), the projection we use for analysis in MMW
docker run -v $PWD:/data -w /data --rm -ti osgeo/gdal:alpine-normal-3.3.0 gdalwarp -t_srs 'EPSG:5070' NLCD_$1_Land_Cover_L48_20190424.img dev/ca-etl/local-test-data/nlcd-$1-30m-epsg5070.tif
# Run the ingest
pushd dev/ca-etl
# make mmw-create-jar # I had made the JAR on my local and copied it to the server to skip this step
make YEAR=$1 mmw-ingest-nlcd
popd
# Delete all working files
rm -rf NLCD_$1_Land_Cover* dev/ca-etl/local-test-data/* dev/ca-etl/local-test-catalogs/*
Then ran it with:
|
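Both scripts above interpolate `$1` into an unzip pattern and an `rm -rf` pattern without checking it. If the argument is missing, `NLCD_$1*` collapses to `NLCD_*`, which matches every year in the archive. A guard along the following lines could catch that early. It is a sketch only: `require_year` is a hypothetical name, and accepting any four-digit value is an assumption rather than a rule taken from the original scripts.

```shell
#!/bin/sh
# Hypothetical guard: accept only a four-digit year as the argument.
require_year() {
    case "$1" in
        [0-9][0-9][0-9][0-9])
            return 0
            ;;
        *)
            echo "usage: $0 YEAR (for example 2016)" >&2
            return 1
            ;;
    esac
}

# Intended use near the top of a script, right after `set -ex`:
# require_year "$1" || exit 1
```

The `case` pattern matches exactly four digits, so an empty string, a longer number, or a value containing glob characters is rejected before it reaches `unzip` or `rm`.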
The RDDs have successfully been ingested and are available in the datahub:
As are the tiles:
|
Download the NLCD 2016 data from the MRLC site, https://www.mrlc.gov/data/nlcd-2016-land-cover-conus, and ingest it into the MMW Geoprocessing Service. The data should be ingested in the EPSG:5070 (Conus Albers) projection. This should be a data ingest for calculations, as well as a tiled image ingest for visualization.
The tasks for this card include: