### This notebook explores and develops functionality to output our downsampled imagery as a COGeoTIFF (Cloud Optimized GeoTIFF)

In [41]:
import os
import sys
from pathlib import Path


project_root = os.path.abspath(os.path.join('..')) # project root, one level above this notebook
module_path = os.path.join(project_root, 'src', 'data') # source code directory that holds our module
if module_path not in sys.path: # check the same path that gets appended
    sys.path.append(module_path)
    #sys.path.insert(0, module_path)

import downSampler # load pipeline module

In [42]:
# other useful libraries

import json
import h5py
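from osgeo import gdal # GDAL's Python bindings live in the osgeo package; the bare "import gdal" is deprecated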

#### Let's look at our generated files

In [43]:
input_data_path = Path(os.getcwd()).parents[0] / 'data' / 'pipeline_input' # path to input data files
output_data_path = Path(os.getcwd()).parents[0] / 'data' / 'pipeline_output' # path to output data files

In [44]:
input_file_dict  = downSampler.find_files(input_data_path)
input_file_dict

Finding data files...


{'NEON_D16_ABBY_DP3_552000_5071000_reflectance.h5': 'C:\\Users\\Chris\\Documents\\SideProjects_C\\hyperspectral-project\\data\\pipeline_input\\NEON.D16.ABBY.DP3.30006.001.2019-07.basic.20210211T072435Z.RELEASE-2021\\NEON_D16_ABBY_DP3_552000_5071000_reflectance.h5',
 'NEON_D16_ABBY_DP3_552000_5072000_reflectance.h5': 'C:\\Users\\Chris\\Documents\\SideProjects_C\\hyperspectral-project\\data\\pipeline_input\\NEON.D16.ABBY.DP3.30006.001.2019-07.basic.20210211T072435Z.RELEASE-2021\\NEON_D16_ABBY_DP3_552000_5072000_reflectance.h5',
 'NEON_D16_ABBY_DP3_552000_5073000_reflectance.h5': 'C:\\Users\\Chris\\Documents\\SideProjects_C\\hyperspectral-project\\data\\pipeline_input\\NEON.D16.ABBY.DP3.30006.001.2019-07.basic.20210211T072435Z.RELEASE-2021\\NEON_D16_ABBY_DP3_552000_5073000_reflectance.h5',
 'NEON_D16_ABBY_DP3_552000_5074000_reflectance.h5': 'C:\\Users\\Chris\\Documents\\SideProjects_C\\hyperspectral-project\\data\\pipeline_input\\NEON.D16.ABBY.DP3.30006.001.2019-07.basic.20210211T072435Z.

In [45]:
output_file_dict = downSampler.find_files(output_data_path)
output_file_dict

Finding data files...


{'TEST_output_1.h5': 'C:\\Users\\Chris\\Documents\\SideProjects_C\\hyperspectral-project\\data\\pipeline_output\\TEST_output_1.h5',
 'TEST_output_Geo1.h5': 'C:\\Users\\Chris\\Documents\\SideProjects_C\\hyperspectral-project\\data\\pipeline_output\\TEST_output_Geo1.h5',
 'Wyvern_D10_RMNP_DP3_459000_4448000_2017_reflectance_downsampled_4mGSD.h5': 'C:\\Users\\Chris\\Documents\\SideProjects_C\\hyperspectral-project\\data\\pipeline_output\\hdf5_downsampled\\Wyvern_D10_RMNP_DP3_459000_4448000_2017_reflectance_downsampled_4mGSD.h5',
 'Wyvern_D10_RMNP_DP3_459000_4448000_2018_reflectance_downsampled_4mGSD.h5': 'C:\\Users\\Chris\\Documents\\SideProjects_C\\hyperspectral-project\\data\\pipeline_output\\hdf5_downsampled\\Wyvern_D10_RMNP_DP3_459000_4448000_2018_reflectance_downsampled_4mGSD.h5',
 'Wyvern_D16_ABBY_DP3_552000_5071000_reflectance_downsampled_4mGSD.h5': 'C:\\Users\\Chris\\Documents\\SideProjects_C\\hyperspectral-project\\data\\pipeline_output\\hdf5_downsampled\\Wyvern_D16_ABBY_DP3_5520

#### Check the input data structure

In [46]:
in_keys_ls = list(input_file_dict.keys())
in_keys_ls
# next(iter(input_file_dict)) alternative way
downSampler.h5dump(input_file_dict[in_keys_ls[0]])

	 - ABBY : <HDF5 group "/ABBY" (1 members)>
		 - Reflectance : <HDF5 group "/ABBY/Reflectance" (2 members)>
			 - Metadata : <HDF5 group "/ABBY/Reflectance/Metadata" (7 members)>
				 - Ancillary_Imagery : <HDF5 group "/ABBY/Reflectance/Metadata/Ancillary_Imagery" (14 members)>
					 - Aerosol_Optical_Depth : <HDF5 dataset "Aerosol_Optical_Depth": shape (1000, 1000), type "<i2">
							 - Band_Names : b'AOT (aerosol optical thickness at 550 nm)*1000'
							 - Data_Ignore_Value : -9999.0
							 - Description : b'Aerosol Optical Depth at 500 nm.'
							 - Scale_Factor : 1000.0
							 - Units : b'Aerosol Optical Depth at 500 nm.'
					 - Aspect : <HDF5 dataset "Aspect": shape (1000, 1000), type "<f4">
							 - Data_Ignore_Value : -9999.0
							 - Description : b'Aspect used as input to ATCOR'
							 - Dimension_Labels : b'-'
							 - Scale_Factor : 1.0
							 - Units : b'degrees'
					 - Cast_Shadow : <HDF5 dataset "Cast_Shadow": shape (1000, 1000), type "|u1">
							 - Data_Ign

In [47]:
out_keys_ls = list(output_file_dict.keys())
out_keys_ls
# next(iter(output_file_dict)) alternative way
downSampler.h5dump(output_file_dict[out_keys_ls[1]])

	 - Reflectance : <HDF5 group "/Reflectance" (19 members)>
		 - Band 1 Reflectance_Data : <HDF5 dataset "Band 1 Reflectance_Data": shape (1000, 1000), type "<i4">
				 - Description : Atmospherically corrected reflectance.
				 - Spatial_Resolution_X_Y : [4. 4.]
				 - data ignore value : -9999.0
				 - reflectance scale factor : 10000.0
				 - spatial extent : [ 459000.  460000. 4448000. 4449000.]
		 - Band 10 Reflectance_Data : <HDF5 dataset "Band 10 Reflectance_Data": shape (1000, 1000), type "<i4">
				 - Description : Atmospherically corrected reflectance.
				 - Spatial_Resolution_X_Y : [4. 4.]
				 - data ignore value : -9999.0
				 - reflectance scale factor : 10000.0
				 - spatial extent : [ 459000.  460000. 4448000. 4449000.]
		 - Band 11 Reflectance_Data : <HDF5 dataset "Band 11 Reflectance_Data": shape (1000, 1000), type "<i4">
				 - Description : Atmospherically corrected reflectance.
				 - Spatial_Resolution_X_Y : [4. 4.]
				 - data ignore value : -9999.0
				 - refle

### Dig into Coordinate Reference System

In [48]:
print('Filename: ', in_keys_ls[0])
print('Path: ', input_file_dict[in_keys_ls[0]])

f = h5py.File(input_file_dict[in_keys_ls[0]], 'r') # read in file

Filename:  NEON_D16_ABBY_DP3_552000_5071000_reflectance.h5
Path:  C:\Users\Chris\Documents\SideProjects_C\hyperspectral-project\data\pipeline_input\NEON.D16.ABBY.DP3.30006.001.2019-07.basic.20210211T072435Z.RELEASE-2021\NEON_D16_ABBY_DP3_552000_5071000_reflectance.h5


In [49]:
Coordinate_System = f['ABBY']['Reflectance']['Metadata']['Coordinate_System']
Coordinate_System.keys()

<KeysViewHDF5 ['Coordinate_System_String', 'EPSG Code', 'Map_Info', 'Proj4']>

In [50]:
# look at what is in the coordinate system string:
print(Coordinate_System['Coordinate_System_String'].shape)
print(Coordinate_System['Coordinate_System_String'].dtype)
Coordinate_System['Coordinate_System_String'][()]

()
object


b'PROJCS["UTM_Zone_10N",GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["False_Easting",500000.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",-123.0],PARAMETER["Scale_Factor",0.9996],PARAMETER["Latitude_Of_Origin",0.0],UNIT["Meter",1.0]]'

In [51]:
# look at what is in the EPSG Code:
print(Coordinate_System['EPSG Code'].shape)
print(Coordinate_System['EPSG Code'].dtype)
Coordinate_System['EPSG Code'][()]

()
object


b'32610'

In [52]:
# look at what is in the Map Info:
print(Coordinate_System['Map_Info'].shape)
print(Coordinate_System['Map_Info'].dtype)
Coordinate_System['Map_Info'][()]

()
object


b'UTM,  1.000,  1.000,       552000.00,       5072000.0,       1.0000000,       1.0000000,  10,  North,  WGS-84,  units=Meters, 0'
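The Map_Info string already holds what a GeoTIFF geotransform needs: the tie-point pixel, its easting and northing, and the pixel size. Below is a minimal sketch of that conversion. The function name `map_info_to_geotransform` is hypothetical, and the sketch assumes the ENVI-style convention that tie point (1, 1) is the upper-left corner of the first pixel. It ignores the rotation field, which is 0 here.

```python
def map_info_to_geotransform(map_info):
    """Turn an ENVI-style Map_Info string into a GDAL-style geotransform tuple."""
    if isinstance(map_info, bytes):
        map_info = map_info.decode()
    fields = [part.strip() for part in map_info.split(',')]
    tie_x, tie_y = float(fields[1]), float(fields[2])        # reference pixel (1-based, assumed)
    easting, northing = float(fields[3]), float(fields[4])   # map coordinates of that pixel
    size_x, size_y = float(fields[5]), float(fields[6])      # pixel size in map units
    # Shift from the tie point back to the upper-left corner of pixel (1, 1).
    origin_x = easting - (tie_x - 1.0) * size_x
    origin_y = northing + (tie_y - 1.0) * size_y
    # GDAL order: (origin x, pixel width, 0, origin y, 0, negative pixel height)
    return (origin_x, size_x, 0.0, origin_y, 0.0, -size_y)


map_info = b'UTM,  1.000,  1.000,       552000.00,       5072000.0,       1.0000000,       1.0000000,  10,  North,  WGS-84,  units=Meters, 0'
print(map_info_to_geotransform(map_info))
# (552000.0, 1.0, 0.0, 5072000.0, 0.0, -1.0)
```

The resulting tuple has the layout that `SetGeoTransform()` expects, with a negative y pixel size because rows run from north to south.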

In [53]:
# look at what is in the Proj4:
print(Coordinate_System['Proj4'].shape)
print(Coordinate_System['Proj4'].dtype)
Coordinate_System['Proj4'][()]

()
object


b'+proj=UTM +zone=10 +ellps=WGS84 +datum=WGS84 +units=m +no_defs'

It looks like we can use this Proj4 string with GDAL to convert the UTM coordinates to the longitude/latitude (WGS84) values used in the STAC spec.
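To get a feel for the lon/lat values this tile should map to, here is a rough pure-Python sketch of the UTM inverse. It is not the GDAL route: it swaps in Snyder's series formulas for the transverse Mercator inverse on the WGS84 ellipsoid, and `utm_to_lonlat` is a hypothetical helper name. In the pipeline itself, GDAL's `osr` module would be the natural tool.

```python
import math


def utm_to_lonlat(easting, northing, zone, northern=True):
    """Inverse UTM on the WGS84 ellipsoid (series approximation); returns (lon, lat) in degrees."""
    a = 6378137.0                 # WGS84 semi-major axis (m)
    f = 1.0 / 298.257223563       # WGS84 flattening
    k0 = 0.9996                   # UTM scale factor on the central meridian
    e2 = f * (2.0 - f)            # first eccentricity squared
    ep2 = e2 / (1.0 - e2)         # second eccentricity squared
    e1 = (1.0 - math.sqrt(1.0 - e2)) / (1.0 + math.sqrt(1.0 - e2))

    x = easting - 500000.0        # remove the false easting
    y = northing if northern else northing - 10000000.0
    lon0 = math.radians((zone - 1) * 6 - 180 + 3)  # central meridian of the zone

    # Footpoint latitude from the meridional arc length.
    mu = (y / k0) / (a * (1 - e2 / 4 - 3 * e2**2 / 64 - 5 * e2**3 / 256))
    phi1 = (mu
            + (3 * e1 / 2 - 27 * e1**3 / 32) * math.sin(2 * mu)
            + (21 * e1**2 / 16 - 55 * e1**4 / 32) * math.sin(4 * mu)
            + (151 * e1**3 / 96) * math.sin(6 * mu)
            + (1097 * e1**4 / 512) * math.sin(8 * mu))

    sin1, cos1, tan1 = math.sin(phi1), math.cos(phi1), math.tan(phi1)
    c1 = ep2 * cos1**2
    t1 = tan1**2
    n1 = a / math.sqrt(1 - e2 * sin1**2)
    r1 = a * (1 - e2) / (1 - e2 * sin1**2) ** 1.5
    d = x / (n1 * k0)

    lat = phi1 - (n1 * tan1 / r1) * (
        d**2 / 2
        - (5 + 3 * t1 + 10 * c1 - 4 * c1**2 - 9 * ep2) * d**4 / 24
        + (61 + 90 * t1 + 298 * c1 + 45 * t1**2 - 252 * ep2 - 3 * c1**2) * d**6 / 720)
    lon = lon0 + (
        d
        - (1 + 2 * t1 + c1) * d**3 / 6
        + (5 - 2 * c1 + 28 * t1 - 3 * c1**2 + 8 * ep2 + 24 * t1**2) * d**5 / 120) / cos1
    return math.degrees(lon), math.degrees(lat)


# Upper-left corner of the ABBY tile, taken from Map_Info (UTM zone 10 North).
lon, lat = utm_to_lonlat(552000.0, 5072000.0, zone=10)
print(round(lon, 4), round(lat, 4))
```

Converting the four tile corners this way would give the `bbox` and `geometry` for a STAC Item. Results should be checked against GDAL before being relied on.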

### Array to hdf5 raster layer image:

In [61]:
# work on the output function

def array2h5raster(refl_array, wavelength_array, FWHM_array, metadata_dict, filename_output):
    """
    Takes in a 3-D reflectance array, an array of band centre wavelengths, FWHM array,
    and an additional metadata dictionary and generates a HDF5 file with the given filename.
    """

    # scale the reflectance data up by the original reflectance factor to save disk space
    scale_fac = metadata_dict['reflectance scale factor']
    refl_array = refl_array*scale_fac

    hf = h5py.File(filename_output, 'w') # create hdf5 file
    g1 = hf.create_group('Reflectance') # create main group
    g2 = hf.create_group('Reflectance/Metadata') # group for metadata

    # datasets
    # ------------------------------------------------------------------------------
    # reflectance data
    
    for band in range(len(refl_array[0,0,:])):
        print("band",band)
        refl_dset = g1.create_dataset('Band ' + str(band+1) + ' Reflectance_Data',data=refl_array[:,:,band], dtype='i') # dataset for each band of reflectance data
        refl_dset.attrs['Description'] =  'Atmospherically corrected reflectance.'
        refl_dset.attrs['data ignore value'] = metadata_dict['data ignore value']
        refl_dset.attrs['reflectance scale factor'] = metadata_dict['reflectance scale factor']
        refl_dset.attrs['Spatial_Resolution_X_Y'] = metadata_dict['Spatial_Resolution_X_Y']
        refl_dset.attrs['spatial extent'] = metadata_dict['spatial extent']

    # wavelength data
    wav_dset = g2.create_dataset('Wavelength',data=wavelength_array) # band centre wavelength data
    wav_dset.attrs['Description'] = 'Central wavelength of the reflectance bands.'
    wav_dset.attrs['Units'] = 'nanometers'

    # FWHM data
    FWHM_dset = g2.create_dataset('FWHM',data=FWHM_array) # FWHM data
    FWHM_dset.attrs['Description'] = 'Full width half maximum of reflectance bands.'
    FWHM_dset.attrs['Units'] = 'nanometers'

    # coordinates data
    g2_1 = g2.create_group('Coordinate_System') # group for coordinate system metadata
    Proj4_dset = g2_1.create_dataset('Proj4',data=metadata_dict['Proj4']) # Proj4 string
    EPSG_dset = g2_1.create_dataset('EPSG Code',data=metadata_dict['EPSG Code']) # EPSG Code
    map_dset = g2_1.create_dataset('Map_Info',data=metadata_dict['Map_Info']) # Map_Info string; assumes metadata_dict carries a 'Map_Info' entry
    coor_dset = g2_1.create_dataset('Coordinate_System_String',data=metadata_dict['Coordinate_System_String']) # coordinate system
    map_dset.attrs['Description'] = ("List of geographic information in the following order:\n"
                                    "   - Projection name\n"
                                    "   - Reference (tie point) pixel x location (in file coordinates)\n"
                                    "   - Reference (tie point) pixel y location (in file coordinates)\n"
                                    "   - Pixel easting\n"
                                    "   - Pixel northing\n"
                                    "   - x pixel size\n"
                                    "   - y pixel size\n"
                                    "   - Projection zone (UTM only)\n"
                                    "   - North or South (UTM only)\n"
                                    "   - Datum\n"
                                    "   - Units\n"
                                    "   - Rotation Angle\n"
                                    )
    
    hf.close() # close file to save and write to disk


In [55]:
refl_array, wavelength_array, FWHM_array, metadata_dict = downSampler.h5data2array(input_file_dict[in_keys_ls[5]])

In [56]:
# parameter setup:
data_dir_path = Path(os.getcwd()).parents[0] / 'data' / 'pipeline_input' # path to input data files
output_data_path = Path(os.getcwd()).parents[0] / 'data' / 'pipeline_output' # path to output data files
desired_GSD = 4 # GSD in m
desired_band_centres = [0.505, 0.526, 0.544, 0.565, 0.586, 0.606, 0.626, 0.646, 0.665, 0.682, 0.699, 0.715, 0.730, 0.745, 0.762, 0.779, 0.787, 0.804]
print("Parameters set!")

Parameters set!


In [57]:
# perform downsampling
resamp_refl_array = downSampler.downSample_reband_array(refl_array, metadata_dict['Spatial_Resolution_X_Y'][0], desired_GSD, wavelength_array, desired_band_centres) # downsample
metadata_dict['Spatial_Resolution_X_Y'] = [float(desired_GSD), float(desired_GSD)] # adjust resolution metadata to reflect downsampling
print("Downsampling Complete...")

Downsampling Complete...


In [58]:
resamp_refl_array.shape

(1000, 1000, 18)

In [62]:
refl_arrayname_output =  Path(output_data_path / 'TEST_output_Geo1.h5')
band_width_array = downSampler.band_widths(desired_band_centres)
array2h5raster(resamp_refl_array, desired_band_centres, band_width_array, metadata_dict, refl_arrayname_output)

band 0
band 1
band 2
band 3
band 4
band 5
band 6
band 7
band 8
band 9
band 10
band 11
band 12
band 13
band 14
band 15
band 16
band 17


In [63]:
#Path("C:\Users\Chris\Documents\SideProjects_C\hyperspectral-project\data\pipeline_output\TEST_output_1.h5")
downSampler.h5dump(refl_arrayname_output)

	 - Reflectance : <HDF5 group "/Reflectance" (19 members)>
		 - Band 1 Reflectance_Data : <HDF5 dataset "Band 1 Reflectance_Data": shape (1000, 1000), type "<i4">
				 - Description : Atmospherically corrected reflectance.
				 - Spatial_Resolution_X_Y : [4. 4.]
				 - data ignore value : -9999.0
				 - reflectance scale factor : 10000.0
				 - spatial extent : [ 459000.  460000. 4448000. 4449000.]
		 - Band 10 Reflectance_Data : <HDF5 dataset "Band 10 Reflectance_Data": shape (1000, 1000), type "<i4">
				 - Description : Atmospherically corrected reflectance.
				 - Spatial_Resolution_X_Y : [4. 4.]
				 - data ignore value : -9999.0
				 - reflectance scale factor : 10000.0
				 - spatial extent : [ 459000.  460000. 4448000. 4449000.]
		 - Band 11 Reflectance_Data : <HDF5 dataset "Band 11 Reflectance_Data": shape (1000, 1000), type "<i4">
				 - Description : Atmospherically corrected reflectance.
				 - Spatial_Resolution_X_Y : [4. 4.]
				 - data ignore value : -9999.0
				 - refle
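The dump lists the band datasets in lexicographic order (Band 1, Band 10, Band 11, ...), which is not the spectral order. A small sketch of restoring numeric order when reading the bands back, assuming the `Band N Reflectance_Data` naming shown above; `band_number` is a hypothetical helper.

```python
import re


def band_number(name):
    """Extract the integer N from a dataset name such as 'Band 10 Reflectance_Data'."""
    match = re.match(r'Band (\d+) ', name)
    if match is None:
        raise ValueError(f'not a band dataset name: {name!r}')
    return int(match.group(1))


names = ['Band 1 Reflectance_Data', 'Band 10 Reflectance_Data',
         'Band 11 Reflectance_Data', 'Band 2 Reflectance_Data']
ordered = sorted(names, key=band_number)
print(ordered)
# ['Band 1 Reflectance_Data', 'Band 2 Reflectance_Data', 'Band 10 Reflectance_Data', 'Band 11 Reflectance_Data']
```

The `Reflectance` group also contains the `Metadata` group, so the keys would need filtering (for example with `name.startswith('Band ')`) before sorting.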

In [None]:
print(resamp_refl_array.shape)
resamp_refl_array[:,:,0].shape

In [None]:
len(resamp_refl_array[:,0,0])

In [None]:
range(1,len(resamp_refl_array[0,0,:]))

In [None]:
out_keys_ls
# next(iter(input_file_dict)) alternative way
downSampler.h5dump(output_file_dict[out_keys_ls[1]])

### Now write out a GeoTIFF:

In [None]:
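# Good SO answer here: https://gis.stackexchange.com/questions/164853/reading-modifying-and-writing-a-geotiff-with-gdal-in-python
rows, cols, n_bands = resamp_refl_array.shape # e.g. (1000, 1000, 18)
outFileName = str(output_data_path / 'GeoTiff_test_1.tif')

# geotransform from the tile extent [xmin, xmax, ymin, ymax]; pixel size is derived from the array shape
xmin, xmax, ymin, ymax = metadata_dict['spatial extent']
geotransform = (xmin, (xmax - xmin) / cols, 0.0, ymax, 0.0, -(ymax - ymin) / rows)

# SetProjection() expects a WKT string, so reuse the one read from the input file
wkt = metadata_dict['Coordinate_System_String']
if isinstance(wkt, bytes):
    wkt = wkt.decode()

driver = gdal.GetDriverByName("GTiff")
# Create() takes (name, xsize, ysize, bands, type): columns come before rows.
# Float32 is a judgment call: it stores the array values unchanged, including a negative ignore value.
outdata = driver.Create(outFileName, cols, rows, n_bands, gdal.GDT_Float32)
outdata.SetGeoTransform(geotransform)
outdata.SetProjection(wkt)

# GDAL band numbers start at 1, array indices at 0
for band in range(1, n_bands + 1):
    outdata.GetRasterBand(band).WriteArray(resamp_refl_array[:, :, band - 1])

outdata.FlushCache() # save to disk
outdata = None # close the dataset to finish writing and free memory

# The fourth argument of driver.Create() is the number of bands. Each band is then written with outdata.GetRasterBand(band_number), which counts from 1, not zero.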
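A plain GeoTIFF is not yet a COG. One possible route, sketched here under the assumption that GDAL 3.1 or newer is installed (the version that introduced the `COG` driver), is to translate the finished GeoTIFF into a COG. The helper names and the chosen option values are illustrative, not settings taken from this project.

```python
def cog_creation_options(compress='DEFLATE', blocksize=512):
    """Build the creation-option strings passed to GDAL's COG driver."""
    return [f'COMPRESS={compress}', f'BLOCKSIZE={blocksize}']


def geotiff_to_cog(src_path, dst_path, options=None):
    """Rewrite an existing GeoTIFF as a Cloud Optimized GeoTIFF (assumes GDAL >= 3.1)."""
    from osgeo import gdal  # imported lazily so the helper above works without GDAL
    gdal.UseExceptions()
    result = gdal.Translate(str(dst_path), str(src_path), format='COG',
                            creationOptions=options or cog_creation_options())
    result = None  # dereference to flush and close the output file


print(cog_creation_options())
# ['COMPRESS=DEFLATE', 'BLOCKSIZE=512']
```

Calling `geotiff_to_cog('GeoTiff_test_1.tif', 'GeoTiff_test_1_cog.tif')` (hypothetical paths) would write the tiled, overview-equipped copy. The result can then be opened in QGIS to confirm that the georeferencing survived.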

In [50]:
dataset = gdal.Open(r'C:\Users\Chris\Documents\SideProjects_C\hyperspectral-project\data\pipeline_output\OUTPUT.tif')
dataset

<osgeo.gdal.Dataset; proxy of <Swig Object of type 'GDALDatasetShadow *' at 0x000001FCE0EC34E0> >

In [None]:
print(dataset.GetMetadata())

In [48]:
# import os
# import numpy
# from osgeo import gdal

# file = "path+filename"
# ds = gdal.Open(file)
# band = ds.GetRasterBand(1)
# arr = band.ReadAsArray()
# [rows, cols] = arr.shape
# arr_min = arr.min()
# arr_max = arr.max()
# arr_mean = int(arr.mean())
# arr_out = numpy.where((arr < arr_mean), 10000, arr)
# driver = gdal.GetDriverByName("GTiff")
# outdata = driver.Create(outFileName, cols, rows, 1, gdal.GDT_UInt16)
# outdata.SetGeoTransform(ds.GetGeoTransform())##sets same geotransform as input
# outdata.SetProjection(ds.GetProjection())##sets same projection as input
# outdata.GetRasterBand(1).WriteArray(arr_out)
# outdata.GetRasterBand(1).SetNoDataValue(10000)##if you want these values transparent
# outdata.FlushCache() ##saves to disk!!
# outdata = None
# band=None
# ds=None

### Change Log

- Split output hdf5 image into separate bands
- Coordinates added to output HDF5 file
- Convert to GeoTiff functionality

## Questions:

- Does SkyWatch's standard implement all of the STAC standards (Item, Catalog, Collection, API)? - Looks like yes
- "WYVERN_DE_20211209T2133_OJcjZkG_metadata.json" - Metadata:
    - Where do you define the CRS used? WGS84?
    - How much of the metadata to use from NEON and how much should be from our planned satellite? - e.g. for the satellite parameters
    - Is any metadata stored with the .tiff file(s) itself or all in the STAC geojson file?
    - It looks like each layer is separate .tiff?
- "WYVERN_DE_20211209T2133_OJcjZkG_rpc.json" - Rational Polynomial Coefficients (RPCs) - do we have these? (Compact representation of camera model for photogrammetry)
- "WYVERN_DE_20211209T2133_OJcjZkG_eph.json" - Orbital state vectors - don't have these from the NEON aircraft data - do we have/would we need to simulate satellite's orbit to generate?

TODO:

- Adapt output code to output to GeoTIFF format
- Fix coordinates - Done
- Find out the specification of SkyWatch's data format: GeoTIFF? COG/COGeoTIFF (Cloud Optimized GeoTIFF)? - Done, see below:

If you already have a sample product structure, we could work with it. We would primarily like the raster image to be in COG GeoTiff so that it is easy to clip.
If you're just starting to package something, then it would be ideal if the product format could follow STAC for the metadata and COG GeoTiff for the raster images.
- https://stacspec.org/
- https://www.cogeo.org/
- RPCs should follow the NITF RPC00B
- http://everyspec.com/DoD/DoD-PUBLICATIONS/STDI-0002_v2--1-_CE_for_NITFS_2994/
- I could send you an example product if you like
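As a starting point for the STAC side, here is a minimal sketch of a STAC Item that points at one COG asset. It covers only the core Item fields. The helper name `make_stac_item` and the id, bounding box, datetime, and href in the usage lines are placeholders, not values from a real product.

```python
import json


def make_stac_item(item_id, bbox, datetime_utc, cog_href):
    """Return a minimal STAC Item dict; bbox is [west, south, east, north] in lon/lat degrees."""
    west, south, east, north = bbox
    return {
        'type': 'Feature',
        'stac_version': '1.0.0',
        'id': item_id,
        'bbox': [west, south, east, north],
        'geometry': {
            'type': 'Polygon',
            # closed ring: the first and last positions are identical
            'coordinates': [[[west, south], [east, south], [east, north],
                             [west, north], [west, south]]],
        },
        'properties': {'datetime': datetime_utc},
        'links': [],
        'assets': {
            'reflectance': {
                'href': cog_href,
                'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
                'roles': ['data'],
            },
        },
    }


# Placeholder values for illustration only.
item = make_stac_item('example-item', [-122.34, 45.79, -122.32, 45.81],
                      '2019-07-01T00:00:00Z', './example_cog.tif')
print(json.dumps(item, indent=2))
```

The sensor and band details raised in the questions above would go under `properties` or in STAC extensions, so part of that metadata can live in the JSON rather than in the TIFF itself.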




- TEST!
    - QGIS
    - ENVI tool: https://www.l3harrisgeospatial.com/Software-Technology/ENVI
    - ERDAS ER Mapper 2020 tool: https://download.hexagongeospatial.com/en/downloads/imagine/erdas-er-mapper-2020
 

# ------------------------------ WIP After This Point --------------------------------------------

In [66]:
# count = 0
# for i in range(1, len(refl_array[0,0,:])+1):
#     count += 1
#     #print(i)
# print('count',count)
