# Writing a Pandas data frame with LAS data to an HDF5 file

## Populate data frame

In [19]:
from laspy.file import File
import pandas as pd
# Open the LAS file read-only (laspy.file.File is the laspy 1.x API)
inFile = File("../data/test.las", mode="r")
# Copy the point attributes of interest into data frame columns
df = pd.DataFrame({
    'X': inFile.x, 'Y': inFile.y, 'Z': inFile.z,
    'intensity': inFile.intensity,
    'raw_classification': inFile.raw_classification,
    'gps_time': inFile.gps_time,
})
df.head(5)

| | X | Y | Z | gps_time | intensity | raw_classification |
|---|---|---|---|---|---|---|
| 0 | 555000.0625 | 4887200.0 | 120.940003 | 467000.4375 | 30 | 1 |
| 1 | 555000.6875 | 4887199.5 | 117.330002 | 467000.5 | 22 | 1 |
| 2 | 555001.3125 | 4887200.0 | 115.339996 | 467000.5 | 10 | 1 |
| 3 | 555000.1875 | 4887197.0 | 123.910004 | 467000.53125 | 31 | 1 |
| 4 | 555001.9375 | 4887200.0 | 111.110001 | 467000.53125 | 8 | 1 |


## Write the data

Data frames have a built-in HDF export, DataFrame.to_hdf; see http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_hdf.html
The 'key' identifies the object in the HDF file, which matters when the file contains multiple objects. We use 'LAS' to indicate LAS data.

In [30]:
df.to_hdf('../data/test.h5', mode='w', key='LAS', format='fixed', complib='blosc', complevel=9)

The HDF5 file can be compressed, depending on the pandas version and the parameters passed to to_hdf. Note, however, that the data frame holds the coordinates as doubles, whereas LAS stores them as 4-byte integers (plus offset and scale in the header).
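
To make the difference in representation concrete, here is a minimal sketch of the scaled-integer encoding that LAS uses for coordinates. It is not laspy's implementation: the helper names `to_scaled_int32` and `from_scaled_int32`, the scale of 0.01 and the way the offset is chosen are assumptions made for illustration. A real file takes scale and offset from its header.

```python
import numpy as np

def to_scaled_int32(values, scale=0.01, offset=None):
    """Encode floats as int32 so that value ~= int * scale + offset."""
    values = np.asarray(values, dtype=np.float64)
    if offset is None:
        # Assumption: use the floored minimum as the offset
        offset = float(np.floor(values.min()))
    ints = np.round((values - offset) / scale).astype(np.int32)
    return ints, scale, offset

def from_scaled_int32(ints, scale, offset):
    """Decode the scaled integers back to float64 coordinates."""
    return ints.astype(np.float64) * scale + offset

# First X values from the data frame shown above
x = np.array([555000.0625, 555000.6875, 555001.3125])
ints, scale, offset = to_scaled_int32(x)
restored = from_scaled_int32(ints, scale, offset)
```

Each coordinate then takes 4 bytes instead of 8, at the cost of rounding to the chosen scale: `restored` differs from `x` by at most half a scale step.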

In [22]:
import os
statinfo = os.stat('../data/test.las')
statinfo.st_size

9108963

In [31]:
statinfo = os.stat('../data/test.h5')
statinfo.st_size

6505778
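
The two sizes can be put in relation with a few lines of arithmetic. This sketch simply reuses the byte counts reported above; the variable names are illustrative.

```python
las_size = 9108963  # bytes, size of test.las reported above
h5_size = 6505778   # bytes, size of test.h5 reported above

ratio = h5_size / las_size   # HDF5 size as a fraction of the LAS size
saving = 1.0 - ratio         # fraction of space saved

print(f"HDF5 / LAS size ratio: {ratio:.3f}")
print(f"Relative saving: {saving:.1%}")
```

For this file the compressed HDF5 output comes to roughly 71% of the LAS size, even though it stores the coordinates as doubles.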

## Read the data back

In [27]:
df2 = pd.read_hdf('../data/test.h5', 'LAS')
df2.head(5)

| | X | Y | Z | gps_time | intensity | raw_classification |
|---|---|---|---|---|---|---|
| 0 | 555000.0625 | 4887200.0 | 120.940003 | 467000.4375 | 30 | 1 |
| 1 | 555000.6875 | 4887199.5 | 117.330002 | 467000.5 | 22 | 1 |
| 2 | 555001.3125 | 4887200.0 | 115.339996 | 467000.5 | 10 | 1 |
| 3 | 555000.1875 | 4887197.0 | 123.910004 | 467000.53125 | 31 | 1 |
| 4 | 555001.9375 | 4887200.0 | 111.110001 | 467000.53125 | 8 | 1 |
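
Comparing the first rows by eye only covers five points. One way to check the whole frame is `pandas.testing.assert_frame_equal`. The sketch below wraps it in a hypothetical helper, `frames_match`, and exercises it on a small in-memory stand-in built from the rows printed above rather than on the HDF5 file itself.

```python
import pandas as pd

def frames_match(original, restored):
    """Return True if labels and dtypes match and the values agree
    (floats are compared within pandas' default tolerance)."""
    try:
        pd.testing.assert_frame_equal(original, restored)
    except AssertionError:
        return False
    return True

# Small stand-in for df, using the first rows printed above
sample = pd.DataFrame({
    'X': [555000.0625, 555000.6875],
    'Y': [4887200.0, 4887199.5],
    'Z': [120.940003, 117.330002],
    'intensity': [30, 22],
})

same = frames_match(sample, sample.copy())             # identical copy
changed = frames_match(sample, sample.assign(Z=0.0))   # altered Z column
```

With the variables from this notebook, the call would be `frames_match(df, df2)`.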
