# Prepare ATL08 transects 
This notebook contains the code I used to process the ICESat-2 ATL08 data.

The data was already stored on the local computer in .h5 format. The code reads the data into a geopandas dataframe and applies filters so that only data from the area of interest is included and data that does not meet the required parameters is rejected. Additional parameters are saved for each data point.

The data is then reprojected from the WGS84 CRS (EPSG:4326) to the local coordinate system and saved to a new geopandas dataframe.

Finally, the data is exported as a GeoPackage, which can be opened in QGIS, and as a CSV file; both can be read into the next notebooks in the workflow.

In [1]:
import os
import h5py
from datetime import datetime
import pandas as pd
import geopandas as gpd
from shapely.geometry import MultiLineString, LineString, Point, Polygon

## 1. Import data from local computer to a geodataframe while applying filters

The ATL08 data is filtered as follows:
1. Segments with RH98 canopy heights below 0 m or above 50 m are rejected, as the tallest tree recorded in Estonia is 48.6 m tall (State Forest Management Center, 2015).
2. Segments whose radiometric parameter values exceed 16 photons per shot are eliminated, as the ATLAS detector can only detect 16 photons per outgoing shot.
3. Segments where the absolute height difference between the ground surface estimated by ICESat-2 and the MERIT DEM used by the ICESat-2 systems is greater than 30 m are considered noise and filtered out.
4. The signal return strength can vary due to snow (Neuenschwander et al., 2022), so transects acquired in the presence of snow are filtered out using the ATL08 snow flag. The snow flag is derived from the daily NOAA Global Multi-sensor Snow/Ice Cover map (Palm et al., 2018).
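
The four filters above, together with the bounding-box check for Estonia, can also be written as a single boolean mask over the per-segment arrays. The following is a minimal sketch and not the code used in this notebook: the function name `segment_mask` and the sample values are hypothetical, the photon-rate comparison (`< 16`) follows the comparison used in the notebook's code, and the snow test assumes that a `segment_snowcover` value of 1 marks snow-free land, as the filter in the next cell does.

```python
import numpy as np

# Bounding box of Estonia: lon_min, lat_min, lon_max, lat_max
EST_BBOX = (23.3397953631, 57.4745283067, 28.1316992531, 59.6110903998)


def segment_mask(lat, lon, h_canopy, h_dif_ref, photon_rate_sum, snow):
    """Return a boolean array that is True for segments passing every filter."""
    lon_min, lat_min, lon_max, lat_max = EST_BBOX
    in_bbox = (lat > lat_min) & (lat < lat_max) & (lon > lon_min) & (lon < lon_max)
    valid_canopy = (h_canopy > 0) & (h_canopy < 50)   # filter 1: plausible canopy height
    valid_rate = photon_rate_sum < 16                 # filter 2: detector limit
    valid_ground = np.abs(h_dif_ref) < 30             # filter 3: absolute DEM difference
    snow_free = snow == 1                             # filter 4: assumed 1 = snow-free land
    return in_bbox & valid_canopy & valid_rate & valid_ground & snow_free


# Hypothetical sample of five segments; each of the last four violates one condition.
lat = np.array([59.5, 59.5, 59.5, 59.5, 61.0])
lon = np.array([24.9, 24.9, 24.9, 24.9, 24.9])
h_canopy = np.array([12.0, 60.0, 12.0, 12.0, 12.0])
h_dif_ref = np.array([1.5, 1.5, -45.0, 1.5, 1.5])
photon_rate_sum = np.array([3.0, 3.0, 3.0, 3.0, 3.0])
snow = np.array([1, 1, 1, 2, 1])

mask = segment_mask(lat, lon, h_canopy, h_dif_ref, photon_rate_sum, snow)
print(mask)
```

With arrays read from a file, the same mask can index every per-segment array at once, for example `lat[mask]`, instead of testing one segment at a time.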

The following data is saved:
* The track ID
* The segment ID
* Timestamp together with year and month extracted from the timestamp
* Point geometry marking the middle of the ATL08 segment
* Beam number
* Beam type
* To detect scattering, the msw_flag from the ATL09 product can be used: a value larger than 0 indicates scattering, which may affect the results. Filtering on this flag is not implemented.
* Canopy height of the segment
* Number of canopy photons
* Number of top of canopy photons
* Number of terrain photons
* Canopy photon rate
* Terrain photon rate
* Solar elevation
* Scattering flag

The data structure was studied using the ATL08 ATBD (https://icesat-2.gsfc.nasa.gov/sites/default/files/page_files/ICESat2_ATL08_ATBD_r005_0.pdf) together with HDFView.
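
As a complement to HDFView, the group and dataset layout of a file can also be listed from Python with h5py. The sketch below is an illustration only: the helper `list_datasets` is hypothetical, and the small in-memory file is a stand-in that copies just two dataset paths used later in this notebook rather than a real ATL08 granule.

```python
import h5py
import numpy as np


def list_datasets(h5_file):
    """Collect (path, shape, dtype) for every dataset in an open HDF5 file."""
    found = []

    def visitor(name, obj):
        # visititems calls this for every group and dataset; keep datasets only
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape, str(obj.dtype)))

    h5_file.visititems(visitor)
    return found


# In-memory stand-in file (hypothetical contents, nothing is written to disk)
with h5py.File('demo_atl08.h5', 'w', driver='core', backing_store=False) as f:
    f.create_dataset('gt1l/land_segments/latitude',
                     data=np.array([59.54, 59.53], dtype='f4'))
    f.create_dataset('gt1l/land_segments/canopy/h_canopy',
                     data=np.array([9.19, 13.35], dtype='f4'))
    datasets = list_datasets(f)

for name, shape, dtype in datasets:
    print(name, shape, dtype)
```

For a real granule, the same helper could be called on a file opened with `h5py.File(path, mode='r')`.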

Once the suitable ATL08 transects are retrieved, they are projected to the Estonian horizontal reference system (EPSG:3301).

The data is then exported to a GeoPackage.



In [2]:
data_loc = 'Z:\\Your\\Data\\Location'

dataframe = []
tracks = ['gt1l', 'gt1r', 'gt2r', 'gt2l', 'gt3l', 'gt3r']
# Unix timestamp of the GPS epoch (1980-01-06 00:00:00 UTC)
gps_epoch = 315964800

# Estonia's bbox is 'EE': ('Estonia', (23.3397953631, 57.4745283067, 28.1316992531, 59.6110903998)), (https://gist.github.com/graydon/11198540)
lon_min, lat_min, lon_max, lat_max = 23.3397953631, 57.4745283067, 28.1316992531, 59.6110903998

for file in os.listdir(data_loc):
    if file.startswith('ATL08_') and file.endswith('.h5'):
        # os.path.join inserts the path separator that plain concatenation would drop
        FILE_NAME = os.path.join(data_loc, file)

        with h5py.File(FILE_NAME, mode='r') as f:
            # read the epoch to be able to correct the time format of data
            epoch = f['/ancillary_data/atlas_sdp_gps_epoch'][0]

            # iterate over each track in the list specified above
            for track in tracks:
                path = '/' + str(track) + '/'
                seg_path = path + 'land_segments/'

                # get beam number and type
                beam_type = str(f[path].attrs['atlas_beam_type'].decode("utf-8"))
                beam_nr = int((f[path].attrs['atlas_spot_number']).decode("utf-8"))

                # read every dataset once per track instead of once per segment
                time = f[seg_path + 'delta_time'][:]

                # values used for filtering
                # height difference to the reference DEM
                h_dif_ref = f[seg_path + 'h_dif_ref'][:]
                # 98% height of all the individual relative canopy heights for segment.
                canopy_98h = f[seg_path + 'canopy/h_canopy'][:]
                snow = f[seg_path + 'segment_snowcover'][:]
                can_pho_rate = f[seg_path + 'canopy/photon_rate_can'][:]
                ter_pho_rate = f[seg_path + 'terrain/photon_rate_te'][:]

                # values that are only saved
                seg_id = f[seg_path + 'segment_id_beg'][:]
                n_can_pho = f[seg_path + 'canopy/n_ca_photons'][:]
                n_topcan_pho = f[seg_path + 'canopy/n_toc_photons'][:]
                n_ter_pho = f[seg_path + 'terrain/n_te_photons'][:]
                solar_el = f[seg_path + 'solar_elevation'][:]
                msw_flag = f[seg_path + 'msw_flag'][:]

                # Center latitude and longitude of signal photons within each segment
                lat = f[seg_path + 'latitude'][:]
                lon = f[seg_path + 'longitude'][:]

                # go over every segment in the track
                for i in range(lat.size):
                    # calculate photon sum for the segment
                    photon_sum = float(can_pho_rate[i]) + float(ter_pho_rate[i])

                    # filter data: bounding box, canopy height, DEM difference, photon rate, snow flag
                    # abs() applies the 30 m limit to the absolute height difference, as described above
                    if (lat_min < lat[i] < lat_max and lon_min < lon[i] < lon_max
                            and 0 < canopy_98h[i] < 50 and abs(h_dif_ref[i]) < 30
                            and photon_sum < 16 and snow[i] == 1):

                        # delta_time + epoch from the file metadata + GPS epoch gives a Unix timestamp
                        unix_time = time[i] + epoch + gps_epoch
                        year = datetime.fromtimestamp(unix_time).strftime('%Y')
                        month = datetime.fromtimestamp(unix_time).strftime('%m')

                        # create point geometry
                        point = Point(lon[i], lat[i])

                        # add to the dataframe
                        dataframe.append({'track': track, 'seg': seg_id[i],
                                          'timestamp': time[i], 'year': year, 'month': month,
                                          'geometry': point, 'beam_nr': beam_nr, 'beam_t': beam_type,
                                          'can_98h': canopy_98h[i], 'n_can_pho': n_can_pho[i], 'n_topcan_pho': n_topcan_pho[i],
                                          'n_ter_pho': n_ter_pho[i], 'can_pho_rate': can_pho_rate[i], 'ter_pho_rate': ter_pho_rate[i],
                                          'solar_el': solar_el[i], 'cloud': msw_flag[i]})
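
The time handling in the cell above adds three quantities: `delta_time` of the segment, the value of `atlas_sdp_gps_epoch` read from the file, and `gps_epoch`, the Unix timestamp of the GPS epoch (1980-01-06 00:00:00 UTC). A minimal sketch of that conversion follows. The function name is hypothetical, the value 1198800018 for `atlas_sdp_gps_epoch` is an assumption used for illustration (the notebook reads it from each file), and leap seconds are not corrected, so the result can be off from true UTC by a few seconds.

```python
from datetime import datetime, timezone

# Unix timestamp of the GPS epoch, 1980-01-06 00:00:00 UTC
GPS_EPOCH_UNIX = 315964800


def delta_time_to_utc(delta_time, sdp_gps_epoch):
    """Convert an ATL08 delta_time to a timezone-aware UTC datetime (leap seconds ignored)."""
    unix_seconds = delta_time + sdp_gps_epoch + GPS_EPOCH_UNIX
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


# delta_time taken from the output table of this notebook; the epoch value is assumed
stamp = delta_time_to_utc(24718180.0, 1198800018.0)
print(stamp.strftime('%Y'), stamp.strftime('%m'))
```

Passing `tz=timezone.utc` makes the year and month independent of the time zone of the computer. `datetime.fromtimestamp` without a `tz` argument, as used in the cell above, returns local time, which can shift segments acquired close to a month boundary into the neighbouring month.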

    

In [3]:
df = pd.DataFrame(dataframe)
df.head()

Unnamed: 0,track,seg,timestamp,year,month,geometry,beam_nr,beam_t,can_98h,n_can_pho,n_topcan_pho,n_ter_pho,can_pho_rate,ter_pho_rate,solar_el,cloud
0,gt1l,670809,24718180.0,2018,10,POINT (24.893220901489258 59.54066848754883),6,weak,9.191999,24,6,176,0.447761,2.626866,-21.46711,3
1,gt1l,670864,24718180.0,2018,10,POINT (24.8912296295166 59.53083419799805),6,weak,13.351807,22,4,250,0.317073,3.04878,-21.471407,3
2,gt1l,670869,24718180.0,2018,10,POINT (24.891050338745117 59.52994155883789),6,weak,14.949738,27,8,20,0.921053,0.526316,-21.47179,3
3,gt1l,670914,24718180.0,2018,10,POINT (24.88941192626953 59.52189636230469),6,weak,21.750122,38,13,6,1.416667,0.166667,-21.475302,3
4,gt1l,670934,24718180.0,2018,10,POINT (24.888683319091797 59.5183219909668),6,weak,20.947483,36,5,11,1.242424,0.333333,-21.476864,3


In [4]:
# Create GeoDataFrame from the dataframe, using its 'geometry' column (WGS84 coordinates)
gdf = gpd.GeoDataFrame(df, geometry='geometry', crs='EPSG:4326')

In [5]:
gdf.head()

Unnamed: 0,track,seg,timestamp,year,month,geometry,beam_nr,beam_t,can_98h,n_can_pho,n_topcan_pho,n_ter_pho,can_pho_rate,ter_pho_rate,solar_el,cloud
0,gt1l,670809,24718180.0,2018,10,POINT (24.89322 59.54067),6,weak,9.191999,24,6,176,0.447761,2.626866,-21.46711,3
1,gt1l,670864,24718180.0,2018,10,POINT (24.89123 59.53083),6,weak,13.351807,22,4,250,0.317073,3.04878,-21.471407,3
2,gt1l,670869,24718180.0,2018,10,POINT (24.89105 59.52994),6,weak,14.949738,27,8,20,0.921053,0.526316,-21.47179,3
3,gt1l,670914,24718180.0,2018,10,POINT (24.88941 59.52190),6,weak,21.750122,38,13,6,1.416667,0.166667,-21.475302,3
4,gt1l,670934,24718180.0,2018,10,POINT (24.88868 59.51832),6,weak,20.947483,36,5,11,1.242424,0.333333,-21.476864,3


## Reproject to Estonian coordinate system

In [6]:
gdf_est = gdf.to_crs('epsg:3301')
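
To check what the reprojection does to a single coordinate pair, one point can be transformed on its own. The sketch below uses the first point of the table above and should give values close to those in the reprojected table; the variable names are illustrative, and the exact decimals may depend on the installed PROJ version.

```python
import geopandas as gpd
from shapely.geometry import Point

# First point of the table above, in WGS84 (longitude, latitude)
pt_wgs84 = gpd.GeoSeries([Point(24.893220901489258, 59.54066848754883)], crs='EPSG:4326')

# Reproject to the Estonian coordinate system
pt_est = pt_wgs84.to_crs('EPSG:3301')

# In the projected system, x is the easting and y the northing, both in metres
easting = pt_est.x.iloc[0]
northing = pt_est.y.iloc[0]
print(round(easting), round(northing))
```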

In [7]:
gdf_est.head()

Unnamed: 0,track,seg,timestamp,year,month,geometry,beam_nr,beam_t,can_98h,n_can_pho,n_topcan_pho,n_ter_pho,can_pho_rate,ter_pho_rate,solar_el,cloud
0,gt1l,670809,24718180.0,2018,10,POINT (550531.997 6600681.958),6,weak,9.191999,24,6,176,0.447761,2.626866,-21.46711,3
1,gt1l,670864,24718180.0,2018,10,POINT (550433.908 6599584.921),6,weak,13.351807,22,4,250,0.317073,3.04878,-21.471407,3
2,gt1l,670869,24718180.0,2018,10,POINT (550425.084 6599485.347),6,weak,14.949738,27,8,20,0.921053,0.526316,-21.47179,3
3,gt1l,670914,24718180.0,2018,10,POINT (550344.255 6598587.886),6,weak,21.750122,38,13,6,1.416667,0.166667,-21.475302,3
4,gt1l,670934,24718180.0,2018,10,POINT (550308.291 6598189.157),6,weak,20.947483,36,5,11,1.242424,0.333333,-21.476864,3


In [8]:
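# after reprojection, x and y are the easting and northing in metres (EPSG:3301);
# they are stored under the column names 'lon' and 'lat'
gdf_est['lon'] = gdf_est['geometry'].x
gdf_est['lat'] = gdf_est['geometry'].y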
gdf_est.head()

Unnamed: 0,track,seg,timestamp,year,month,geometry,beam_nr,beam_t,can_98h,n_can_pho,n_topcan_pho,n_ter_pho,can_pho_rate,ter_pho_rate,solar_el,cloud,lon,lat
0,gt1l,670809,24718180.0,2018,10,POINT (550531.997 6600681.958),6,weak,9.191999,24,6,176,0.447761,2.626866,-21.46711,3,550531.996546,6600682.0
1,gt1l,670864,24718180.0,2018,10,POINT (550433.908 6599584.921),6,weak,13.351807,22,4,250,0.317073,3.04878,-21.471407,3,550433.908225,6599585.0
2,gt1l,670869,24718180.0,2018,10,POINT (550425.084 6599485.347),6,weak,14.949738,27,8,20,0.921053,0.526316,-21.47179,3,550425.083941,6599485.0
3,gt1l,670914,24718180.0,2018,10,POINT (550344.255 6598587.886),6,weak,21.750122,38,13,6,1.416667,0.166667,-21.475302,3,550344.254889,6598588.0
4,gt1l,670934,24718180.0,2018,10,POINT (550308.291 6598189.157),6,weak,20.947483,36,5,11,1.242424,0.333333,-21.476864,3,550308.29092,6598189.0


## Export to GeoPackage

In [None]:
# export only seg_id and geometry for QGIS
export = gdf_est[['seg', 'geometry']]
export.to_file('..\\Data\\icesat_data\\icesat2_est.gpkg', driver='GPKG', layer='icesat2_transect_points')

# export all of the data
export_2 = gdf_est
export_2.to_file('..\\Data\\icesat_data\\icesat2_est.gpkg', driver='GPKG', layer='all_icesat2_data')
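
The CSV export mentioned in the introduction does not appear in the cell above. One way it could be written is sketched below with pandas. This is an assumption about how such an export might look, not the code used for this workflow: the small table is a stand-in for `gdf_est` with the geometry already split into the `lon` and `lat` columns, and the file name and temporary directory are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Stand-in for gdf_est without its geometry column (values from the table above)
table = pd.DataFrame({
    'track': ['gt1l', 'gt1l'],
    'seg': [670809, 670864],
    'can_98h': [9.191999, 13.351807],
    'lon': [550531.997, 550433.908],
    'lat': [6600681.958, 6599584.921],
})

# Hypothetical output location
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, 'icesat2_est.csv')

# index=False keeps the row index out of the file
table.to_csv(csv_path, index=False)

# Read the file back to confirm that it round-trips
restored = pd.read_csv(csv_path)
print(restored.shape)
```

With the real data, the geometry column could be dropped first, for example `gdf_est.drop(columns='geometry').to_csv(csv_path, index=False)`. Passing `index=False` avoids an extra unnamed index column when the file is read back with `pd.read_csv`.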