# Spatial subsampling
This notebook shows how to perform spatial subsampling with `py4dgeo`.
    
**Implemented by**  
Ronald Tabernig ([@tabernig](https://github.com/tabernig), Heidelberg University)

In [None]:
import py4dgeo
import pooch

As a first step, we download the example dataset and specify the path to the point cloud that we want to subsample. The path to the output file is derived from it afterwards.

In [None]:
# Set up pooch to download data from Zenodo
p = pooch.Pooch(base_url="doi:10.5281/zenodo.18432391/", path=pooch.os_cache("py4dgeo"))
p.load_registry_from_doi()

try:
    # Download and extract the dataset
    p.fetch("trier_sim.zip", processor=pooch.Unzip(members=["trier_sim"]))

    # Define path to the extracted data
    data_path = p.path / "trier_sim.zip.unzip"
    print(f"Data path: {data_path}")

    infile = data_path / "trier_sim_epoch_0.laz"

    epoch1 = py4dgeo.read_from_las(infile)

except Exception as e:
    print(f"Failed to download or extract data: {e}")

In [None]:
outfile = str(infile).replace(".laz", "_subsampled.laz")

When subsampling, we may want to keep dimensions that are present in the original file. By default, `py4dgeo` does not load any dimensions other than `X`, `Y`, and `Z`. Accordingly, we have to define which dimensions (i.e., point attributes) we want to carry over from the original point cloud to the subsampled point cloud.

In [None]:
dims = {"return_number": "return_number", "number_of_returns": "number_of_returns"}

epoch = py4dgeo.read_from_las(infile, additional_dimensions=dims)

For the spatial subsampling, we convert the `Epoch` object into a `Vapc` object, which allows voxel-based point cloud operations. Using this `Vapc` object, we subsample the point cloud to one point per voxel. Accordingly, the `voxel_size` parameter lets us control the spatial resolution. We need to select which point to keep per voxel. We offer the following options:

* "closest_to_centroids": keeping the point closest to the centroid (R)
* "closest_to_voxel_centers": keeping the point closest to the voxel center (R)
* "centroid": keeping the centroid (S)
* "voxel_center": keeping the voxel center (S)

**R** indicates that real points from the original point cloud are kept, whereas **S** indicates that new synthetic points are created. Real points keep their attributes from the original point cloud, whereas each attribute of a synthetic point is set to the average of that attribute over all points in the voxel.
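To make the difference between the four options concrete, the following is a minimal NumPy sketch of one-point-per-voxel reduction. It is not the `Vapc` implementation: the function name `voxel_subsample` is hypothetical, the voxel grid is assumed to be aligned with the coordinate origin, and only coordinates are handled (no attributes).

```python
import numpy as np


def voxel_subsample(points, voxel_size, mode="closest_to_centroids"):
    """Reduce an (N, 3) array to one point per occupied voxel (illustrative only)."""
    points = np.asarray(points, dtype=float)

    # Integer voxel index of every point (assumption: grid aligned with the origin)
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)

    # Group points by voxel: `inverse` maps each point to its row in `voxels`
    voxels, inverse = np.unique(voxel_idx, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # the shape differs between some NumPy versions
    n_voxels = len(voxels)

    # Per-voxel centroid (mean of all points in the voxel) and geometric center
    counts = np.bincount(inverse, minlength=n_voxels)
    centroids = np.zeros((n_voxels, 3))
    np.add.at(centroids, inverse, points)
    centroids /= counts[:, None]
    centers = (voxels + 0.5) * voxel_size

    # Synthetic options (S): return newly created points
    if mode == "centroid":
        return centroids
    if mode == "voxel_center":
        return centers

    # Real options (R): pick the original point closest to the target
    if mode == "closest_to_centroids":
        targets = centroids
    elif mode == "closest_to_voxel_centers":
        targets = centers
    else:
        raise ValueError(f"Unknown mode: {mode}")

    dist = np.linalg.norm(points - targets[inverse], axis=1)
    # Sort by voxel, then by distance; the first entry of each group is the closest
    order = np.lexsort((dist, inverse))
    first = np.concatenate(([0], np.cumsum(counts)[:-1]))
    return points[order[first]]


demo = np.array(
    [[0.1, 0.1, 0.1], [0.9, 0.9, 0.9], [0.5, 0.5, 0.6], [2.5, 2.5, 2.5]]
)
for m in ("closest_to_centroids", "closest_to_voxel_centers", "centroid", "voxel_center"):
    print(m, voxel_subsample(demo, voxel_size=2, mode=m).tolist())
```

With the two R options, every returned row is one of the input points, so its attributes could be copied over unchanged. With the two S options, the returned coordinates generally do not occur in the input.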

In [None]:
# Mute vapc function trace and timeit for cleaner output
py4dgeo.enable_trace(False)
py4dgeo.enable_timeit(False)

voxel_size = 2
reduce_to_mode = "closest_to_centroids"  # other options are "closest_to_voxel_centers", "centroid", "voxel_center"
voxel_epoch = py4dgeo.Vapc(epoch, voxel_size=voxel_size)
reduced_vapc = voxel_epoch.reduce_to_feature(reduce_to_mode)

After reducing the point cloud to one point per voxel, we save the output.

In [None]:
reduced_vapc.save_as_las(outfile=outfile)

We may also wish to save the voxels as 3D boxes. The `save_as_ply` function accomplishes this by saving the occupied voxels as cubes in a triangle mesh. The edge length of these cubes equals the voxel size set before. The `features` to be stored with each voxel must be listed explicitly. In this example, we select all available features. The `mode` option defines where the center of each cube is placed. Just as for the `reduce_to_feature` method, the following options are available: "closest_to_centroids", "closest_to_voxel_centers", "centroid" and "voxel_center".
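As an illustration of what "cubes in a triangle mesh" means geometrically, the sketch below builds 8 vertices and 12 triangles per voxel with NumPy. How `save_as_ply` constructs its mesh internally is not shown in this notebook, so the helper names `voxel_cube` and `voxels_to_mesh` and the vertex ordering are assumptions made for this example only.

```python
import numpy as np

# Corner i of a cube has offset bits (x, y, z) = (i & 1, (i >> 1) & 1, (i >> 2) & 1).
# Two triangles per face, wound so that the face normals point outwards.
CUBE_TRIANGLES = np.array(
    [
        [0, 3, 1], [0, 2, 3],  # z = min
        [4, 5, 7], [4, 7, 6],  # z = max
        [0, 1, 5], [0, 5, 4],  # y = min
        [2, 7, 3], [2, 6, 7],  # y = max
        [0, 4, 6], [0, 6, 2],  # x = min
        [1, 3, 7], [1, 7, 5],  # x = max
    ]
)


def voxel_cube(center, edge_length):
    """Return (8, 3) vertices and (12, 3) triangle indices of one axis-aligned cube."""
    corners = np.array(
        [[(i >> k) & 1 for k in range(3)] for i in range(8)], dtype=float
    )
    vertices = np.asarray(center, dtype=float) + (corners - 0.5) * edge_length
    return vertices, CUBE_TRIANGLES.copy()


def voxels_to_mesh(centers, edge_length):
    """Stack one cube per center into a single vertex array and face array."""
    all_vertices, all_faces = [], []
    for n, center in enumerate(centers):
        vertices, faces = voxel_cube(center, edge_length)
        all_vertices.append(vertices)
        all_faces.append(faces + 8 * n)  # shift indices to this cube's vertices
    return np.vstack(all_vertices), np.vstack(all_faces)


vertices, faces = voxels_to_mesh([[1.0, 1.0, 1.0], [3.0, 1.0, 1.0]], edge_length=2.0)
print(vertices.shape, faces.shape)
```

In this sketch, the choice of `mode` would only change the `centers` passed in, while the edge length stays equal to the voxel size.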

In [None]:
outfile_ply = outfile.replace(".laz", ".ply")
try:
    reduced_vapc.save_as_ply(
        outfile=outfile_ply, features=reduced_vapc.out.keys(), mode=reduce_to_mode
    )
    print(f"Results saved to folder: {data_path}")
except Exception as e:
    # Report the actual error; a missing 'plyfile' package is one possible cause
    print(f"Failed to save PLY file: {e}")
    print("Check if 'plyfile' is installed.")
    print("You can try installing it by uncommenting and running the following lines of code in a new cell:")
    print("import sys")
    print("!conda install --yes --prefix {sys.prefix} conda-forge::plyfile")