# Writing a Pandas data frame with LAS data to an HDF5 file

## Populate data frame

In [2]:
%pip install laspy
import laspy
las = laspy.read("../data/test.las")

import numpy as np
import pandas as pd
df = pd.DataFrame({
    'X': np.array(las.x), 'Y': np.array(las.y), 'Z': np.array(las.z),
    'intensity': las.intensity,
    'raw_classification': las.raw_classification,
    'gps_time': las.gps_time,
})
df.head(5)

Note: you may need to restart the kernel to use updated packages.


|   | X | Y | Z | intensity | raw_classification | gps_time |
|---|---|---|---|---|---|---|
| 0 | 555000.0625 | 4887200.0 | 120.940003 | 30 | 1 | 467000.4375 |
| 1 | 555000.6875 | 4887199.5 | 117.330002 | 22 | 1 | 467000.5 |
| 2 | 555001.3125 | 4887200.0 | 115.339996 | 10 | 1 | 467000.5 |
| 3 | 555000.1875 | 4887197.0 | 123.910004 | 31 | 1 | 467000.53125 |
| 4 | 555001.9375 | 4887200.0 | 111.110001 | 8 | 1 | 467000.53125 |


## Write the data

Pandas has a built-in HDF5 export for data frames; see the [Pandas doc](http://pandas.pydata.org/pandas-docs/stable/io.html#hdf5-pytables). It relies on PyTables, which is not installed automatically with pandas and is published on PyPI as `tables`.
The `key` identifies the object within the HDF5 file, since one file can contain multiple objects. We use `'LAS'` as the key to indicate LAS data.

In [6]:
%pip install tables

df.to_hdf('../data/test.h5', mode='w', key='LAS', format='fixed', complib='blosc', complevel=9)

Note: you may need to restart the kernel to use updated packages.
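
To illustrate how the `key` separates objects within one file, here is a minimal sketch that writes two small data frames to the same HDF5 file and reads one of them back. The frames, the second key `'ground'`, and the temporary file `multi.h5` are made up for illustration and are not part of this notebook's data; the sketch assumes PyTables is installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Two small, made-up data frames standing in for different point sets.
points = pd.DataFrame({"X": np.array([1.0, 2.0]), "Z": np.array([10.0, 11.0])})
ground = pd.DataFrame({"X": np.array([3.0]), "Z": np.array([9.5])})

# Hypothetical output file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "multi.h5")

# mode='w' creates (or overwrites) the file; mode='a' appends a second object.
points.to_hdf(path, key="LAS", mode="w")
ground.to_hdf(path, key="ground", mode="a")

# List the stored keys, then read one object back by its key.
with pd.HDFStore(path, mode="r") as store:
    keys = sorted(store.keys())
restored = pd.read_hdf(path, key="ground")
```

Note that writing with `mode='w'`, as in the cell above, replaces the whole file, so any additional objects have to be added afterwards with `mode='a'`.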


The HDF5 file can be compressed, depending on the pandas version and the parameters passed to `to_hdf`. However, the data frame holds the coordinates as doubles (8 bytes each), whereas LAS stores them as 4-byte integers (plus offset and scale in the header). For this simple example, we can compare the size of the original LAS file with that of the resulting HDF5 file. Some general performance considerations are given [here](https://pandas.pydata.org/pandas-docs/stable/io.html#performance-considerations).

In [13]:
import os

print('LAS file: {} MB'.format(os.stat('../data/test.las').st_size/(1024*1024)))
print('HDF file: {} MB'.format(os.stat('../data/test.h5').st_size/(1024*1024)))


LAS file: 8.686984062194824 MB
HDF file: 6.291470527648926 MB
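
The difference in coordinate storage mentioned above can be sketched with plain NumPy. The scale, offset, and raw integer values below are hypothetical and chosen only for illustration; a real LAS file carries its own values in the header.

```python
import numpy as np

# Hypothetical header values (assumptions, not read from test.las).
scale = 0.01
offset = 555000.0

# Raw integer coordinates as they would sit in the LAS point records.
raw_x = np.array([6, 69, 131], dtype=np.int32)

# Real-world coordinate = raw integer * scale + offset (result is float64).
x = raw_x * scale + offset

# Storage cost per representation.
bytes_as_int32 = raw_x.nbytes    # 4 bytes per value
bytes_as_float64 = x.nbytes      # 8 bytes per value

print(x.dtype, bytes_as_int32, bytes_as_float64)
```

Before compression, each coordinate column in the data frame therefore takes twice the space of its LAS counterpart, which partly offsets what the compressor gains.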


## Read the data back

In [14]:
df2 = pd.read_hdf('../data/test.h5', 'LAS')
df2.head(5)

|   | X | Y | Z | intensity | raw_classification | gps_time |
|---|---|---|---|---|---|---|
| 0 | 555000.0625 | 4887200.0 | 120.940003 | 30 | 1 | 467000.4375 |
| 1 | 555000.6875 | 4887199.5 | 117.330002 | 22 | 1 | 467000.5 |
| 2 | 555001.3125 | 4887200.0 | 115.339996 | 10 | 1 | 467000.5 |
| 3 | 555000.1875 | 4887197.0 | 123.910004 | 31 | 1 | 467000.53125 |
| 4 | 555001.9375 | 4887200.0 | 111.110001 | 8 | 1 | 467000.53125 |
