# Combine

Combine the subsets into a single data layer (GeoJSON, or another vector format such as FlatGeobuf).

1. Make a list of all the subset files (GeoJSON).
2. Read the first subset.
    a. Add a column holding a unique ID (the granule ID, or the name of the source file). Open questions: what does the user need in order to trace a row back to its granule? Would the user know which collection it came from?
3. Read each remaining subset in a loop.
    a. Add the same unique-ID column.
    b. Append it to the first subset (see https://geopandas.org/en/stable/docs/user_guide/mergingdata.html).
4. Save the final GeoDataFrame as a new GeoJSON file.
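
As a rough illustration of the steps above, here is a minimal sketch that uses only the standard `json` module rather than geopandas. It assumes every subset is a GeoJSON FeatureCollection on disk. The function names `combine_feature_collections` and `save_geojson`, and the `filename` property, are hypothetical choices made for illustration and are not part of any library.

```python
import json
from pathlib import Path
from typing import Iterable


def combine_feature_collections(paths: Iterable[Path]) -> dict:
    """Merge GeoJSON FeatureCollections, tagging each feature with its source file."""
    features = []
    for path in paths:  # steps 2 and 3: read each subset in turn
        with open(path) as f:
            collection = json.load(f)
        for feature in collection.get("features", []):
            # steps 2a/3a: record where the feature came from (hypothetical
            # `filename` property) so it can be traced back later
            properties = feature.get("properties") or {}
            feature["properties"] = {**properties, "filename": Path(path).name}
            features.append(feature)  # step 3b: append to the running result
    return {"type": "FeatureCollection", "features": features}


def save_geojson(collection: dict, dest: Path) -> None:
    """Step 4: write the combined collection as a new GeoJSON file."""
    with open(dest, "w") as f:
        json.dump(collection, f)
```

Step 1 then amounts to something like `sorted(Path('subsets').glob('*.geojson'))`, where `subsets` is a placeholder directory name.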

In [19]:
import os.path
from os import PathLike
from functools import reduce
from pathlib import Path
from typing import Iterable, Union

import geopandas as gpd
from profilehooks import timecall

In [20]:
def chext(ext: str, path: str) -> str:
    """Return `path` with its final extension replaced by `ext` (e.g., '.h5')."""
    return f'{os.path.splitext(path)[0]}{ext}'
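
To make the behavior of `chext` concrete, the following sketch repeats the helper and compares it with `pathlib`'s `with_suffix`, which is one possible alternative. The file names are hypothetical.

```python
import os.path
from pathlib import Path


def chext(ext: str, path: str) -> str:
    # Same helper as above: swap the final extension of `path` for `ext`.
    return f'{os.path.splitext(path)[0]}{ext}'


# Hypothetical subset file name, for illustration only.
subset_name = 'granule_001.fgb'

print(chext('.h5', subset_name))                  # granule_001.h5
print(Path(subset_name).with_suffix('.h5').name)  # granule_001.h5

# Only the last extension is replaced.
print(chext('.h5', 'granule_001.tar.gz'))         # granule_001.tar.h5
```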

In [21]:
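import pandas as pd


def combine_subsets(paths: Iterable[Union[str, os.PathLike]]) -> gpd.GeoDataFrame:
    """Read every subset file and stack them into one GeoDataFrame."""
    # Tag each row with its source file name (extension changed to .h5).
    gdfs = [
        gpd.read_file(path).assign(filename=chext('.h5', os.path.basename(path)))
        for path in paths
    ]

    # GeoDataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0;
    # pd.concat replaces it. ignore_index=True renumbers the combined rows.
    return pd.concat(gdfs, ignore_index=True)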

In [13]:
dirpath = Path('/') / 'projects' / 'my-public-bucket' / 'gedi-l4a' / 'gabon'

gabon_gdf = timecall(combine_subsets)(dirpath.glob('*.fgb'))
gabon_gdf.to_file(dirpath.parent / 'gabon.fgb', driver='FlatGeobuf')
gabon_gdf


  combine_subsets (/tmp/ipykernel_25390/1141370264.py:1):
    17.079 seconds

