# Reading a LAS file into a Pandas data frame via laspy

## Install laspy library

In [49]:
# The cells below use the laspy 1.x API (laspy.file.File), which was removed in laspy 2.0
!pip install "laspy<2"



## Open LAS file

We follow the tutorial from https://pythonhosted.org/laspy/tut_part_1.html

In [50]:
from laspy.file import File
inFile = File("../data/test.las", mode = "r")

Find out which attributes are stored with each record in the LAS file. This depends on the point data record format, which the application that created the file chooses from the formats allowed by its LAS version.

In [51]:
pointformat = inFile.point_format
for spec in pointformat:
    print(spec.name)

X
Y
Z
intensity
flag_byte
raw_classification
scan_angle_rank
user_data
pt_src_id
gps_time
red
green
blue
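This attribute list (coordinates, intensity, flag_byte, raw_classification, scan_angle_rank, user_data, pt_src_id, gps_time and RGB) appears to correspond to LAS point data record format 3. In point formats 0 to 5, `flag_byte` and `raw_classification` are packed bit fields. The sketch below decodes them with NumPy bit operations, assuming the layout described in the LAS 1.2 and 1.4 specifications for these formats. The function name `decode_bit_fields` and the sample byte values are hypothetical; in the notebook you would pass `inFile.flag_byte` and `inFile.raw_classification`. laspy 1.x should also provide already-decoded attributes such as `classification`, so this is mainly useful to see what the packed bytes contain.

```python
import numpy as np


def decode_bit_fields(flag_byte, raw_classification):
    """Split the packed bytes of LAS point formats 0-5 into separate arrays.

    Assumed layout (LAS 1.2/1.4, legacy point formats):
    flag_byte: bits 0-2 return number, bits 3-5 number of returns,
               bit 6 scan direction flag, bit 7 edge of flight line.
    raw_classification: bits 0-4 class code, bit 5 synthetic,
                        bit 6 key-point, bit 7 withheld.
    """
    flag_byte = np.asarray(flag_byte, dtype=np.uint8)
    raw_classification = np.asarray(raw_classification, dtype=np.uint8)
    return {
        "return_number": flag_byte & 0b111,
        "number_of_returns": (flag_byte >> 3) & 0b111,
        "scan_direction_flag": (flag_byte >> 6) & 1,
        "edge_of_flight_line": (flag_byte >> 7) & 1,
        "classification": raw_classification & 0b11111,
        "synthetic": (raw_classification >> 5) & 1,
        "key_point": (raw_classification >> 6) & 1,
        "withheld": (raw_classification >> 7) & 1,
    }


# Hypothetical sample bytes, not read from test.las:
# 0b01010010 -> return 2 of 2, scan direction flag set, not at flight line edge
# 0b00100010 -> class 2 (ground) with the synthetic bit set
fields = decode_bit_fields([0b01010010], [0b00100010])
for name, values in fields.items():
    print(name, values[0])
```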


## Read the data

Using pandas (https://pandas.pydata.org/), load selected attributes of the records into a data frame with named columns.

In [99]:
import pandas as pd
df = pd.DataFrame({
    'X': inFile.x,
    'Y': inFile.y,
    'Z': inFile.z,
    'intensity': inFile.intensity,
    'raw_classification': inFile.raw_classification,
})
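The coordinate columns come from laspy's lowercase `x`, `y`, `z` attributes. In laspy 1.x these return coordinates with the header scale and offset applied, while uppercase `X`, `Y`, `Z` return the raw 32-bit integers stored in each record; the LAS specification defines the real coordinate as `raw * scale + offset`. The sketch below reproduces that conversion with NumPy and pandas. The raw integers, scale and offset are made-up assumptions for illustration; for a real file they come from the data and the header (in laspy 1.x, `inFile.header.scale` and `inFile.header.offset`).

```python
import numpy as np
import pandas as pd

# Hypothetical raw integer coordinates as stored in LAS point records.
raw = {
    "X": np.array([6, 69], dtype=np.int32),
    "Y": np.array([20000, 19950], dtype=np.int32),
    "Z": np.array([12094, 11733], dtype=np.int32),
}
# Assumed header values: 1 cm resolution and a per-axis offset.
scale = {"X": 0.01, "Y": 0.01, "Z": 0.01}
offset = {"X": 555000.0, "Y": 4887000.0, "Z": 0.0}

# Apply real = raw * scale + offset per axis and collect the results
# into a data frame with the same column names as in the notebook.
df_xyz = pd.DataFrame(
    {axis: raw[axis] * scale[axis] + offset[axis] for axis in ("X", "Y", "Z")}
)
print(df_xyz)
```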

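Check how much data was loaded and show the first 5 rows of the data frame. Note that `df.size` is the total number of cells (rows × columns), not the number of records; the record count is `len(df)` or `df.shape[0]`.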

In [92]:
df.size

1607424
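`DataFrame.size` counts cells rather than records. The toy frame below, with hypothetical values, shows the three related counts side by side; for the notebook's `df`, `len(df)` gives the number of points.

```python
import numpy as np
import pandas as pd

# A small stand-in for the point table: 4 records, 3 attributes.
df_small = pd.DataFrame({
    "X": np.array([1.0, 2.0, 3.0, 4.0]),
    "Y": np.array([5.0, 6.0, 7.0, 8.0]),
    "intensity": np.array([30, 22, 10, 31], dtype=np.uint16),
})

print(df_small.size)   # 12: total cells, rows * columns
print(len(df_small))   # 4: number of records (rows)
print(df_small.shape)  # (4, 3): (rows, columns)
```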

In [54]:
df.head(5)

|   | X | Y | Z | intensity | raw_classification |
|---|---|---|---|---|---|
| 0 | 555000.0625 | 4887200.0 | 120.940003 | 30 | 1 |
| 1 | 555000.6875 | 4887199.5 | 117.330002 | 22 | 1 |
| 2 | 555001.3125 | 4887200.0 | 115.339996 | 10 | 1 |
| 3 | 555000.1875 | 4887197.0 | 123.910004 | 31 | 1 |
| 4 | 555001.9375 | 4887200.0 | 111.110001 | 8 | 1 |


Further columns can be added later, for example with `assign`:

In [63]:
df2 = df.assign(gps_time = inFile.gps_time)
df2.head(5)

|   | X | Y | Z | intensity | raw_classification | gps_time |
|---|---|---|---|---|---|---|
| 0 | 555000.0625 | 4887200.0 | 120.940003 | 30 | 1 | 467000.4375 |
| 1 | 555000.6875 | 4887199.5 | 117.330002 | 22 | 1 | 467000.5 |
| 2 | 555001.3125 | 4887200.0 | 115.339996 | 10 | 1 | 467000.5 |
| 3 | 555000.1875 | 4887197.0 | 123.910004 | 31 | 1 | 467000.53125 |
| 4 | 555001.9375 | 4887200.0 | 111.110001 | 8 | 1 | 467000.53125 |


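Be careful, though: `assign` returns a new data frame holding all existing columns plus the new one, which can mean copying the whole table. Adding the column directly with `df['gps_time'] = ...` modifies `df` in place and avoids this.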

In [67]:
df['gps_time'] = inFile.gps_time
df.head(5)

|   | X | Y | Z | intensity | raw_classification | gps_time |
|---|---|---|---|---|---|---|
| 0 | 555000.0625 | 4887200.0 | 120.940003 | 30 | 1 | 467000.4375 |
| 1 | 555000.6875 | 4887199.5 | 117.330002 | 22 | 1 | 467000.5 |
| 2 | 555001.3125 | 4887200.0 | 115.339996 | 10 | 1 | 467000.5 |
| 3 | 555000.1875 | 4887197.0 | 123.910004 | 31 | 1 | 467000.53125 |
| 4 | 555001.9375 | 4887200.0 | 111.110001 | 8 | 1 | 467000.53125 |
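The difference between `assign` and in-place column addition can be checked on a tiny frame. In the sketch below the frame name `df_demo` is hypothetical and the values are taken loosely from the first rows above; it also shows `DataFrame.insert`, which is in place as well and additionally lets you pick the column position.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({"Z": np.array([120.94, 117.33, 115.34])})
gps_time = np.array([467000.4375, 467000.5, 467000.5])

# assign() leaves df_demo untouched and returns a new frame
# containing the old columns plus the new one.
df_demo2 = df_demo.assign(gps_time=gps_time)
print("gps_time" in df_demo.columns, "gps_time" in df_demo2.columns)  # False True

# Item assignment adds the column to df_demo itself.
df_demo["gps_time"] = gps_time

# insert() is also in place and puts the new column at a chosen position.
df_demo.insert(0, "intensity", np.array([30, 22, 10], dtype=np.uint16))
print(list(df_demo.columns))  # ['intensity', 'Z', 'gps_time']
```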


Simple statistics can be computed for all attributes at once, one value per column:

In [90]:
df.min()

X                     5.543902e+05
Y                     4.886536e+06
Z                    -1.160000e+00
intensity             1.000000e+00
raw_classification    1.000000e+00
gps_time              4.659909e+05
dtype: float64

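`df.min()` returns a single Series, and a Series has one data type, so the per-column minima are converted to a common type (here `float64`). The columns of the data frame keep their own types; check the type of a specific attribute (a single column) with `dtype`: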

In [84]:
df['intensity'].dtype

dtype('uint16')
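`df.dtypes` lists the type of every column at once. The sketch below uses a small hypothetical frame with the same mix of float and unsigned integer columns to show that the columns keep their types while the Series returned by `min()` is upcast to `float64`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing float and unsigned integer columns.
df_types = pd.DataFrame({
    "Z": np.array([120.94, -1.16, 395.38]),
    "intensity": np.array([30, 1, 22], dtype=np.uint16),
    "raw_classification": np.array([1, 2, 1], dtype=np.uint8),
})

print(df_types.dtypes)       # per-column types are preserved
print(df_types.min().dtype)  # float64: one common type for the whole Series
```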

Statistics are also available for a single column, for example the maximum elevation:

In [95]:
df['Z'].max()

395.3800050094404
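Several statistics can be requested in one call with `agg`, and `describe` adds the count, standard deviation and quartiles. The sketch below runs both on a small frame built from the first five `Z` and `intensity` values shown earlier (rounded); on the notebook's `df` the same calls work unchanged.

```python
import numpy as np
import pandas as pd

df_stats = pd.DataFrame({
    "Z": np.array([120.94, 117.33, 115.34, 123.91, 111.11]),
    "intensity": np.array([30, 22, 10, 31, 8], dtype=np.uint16),
})

# One row per statistic, one column per attribute.
summary = df_stats.agg(["min", "max", "mean"])
print(summary)

# describe() adds count, std and the quartiles.
print(df_stats.describe())
```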