# Reading a LAS file into a Pandas data frame via laspy

## Install laspy library

The [laspy](https://pypi.org/project/laspy/) package implements reading and writing of the [LAS format](https://www.asprs.org/divisions-committees/lidar-division/laser-las-file-format-exchange-activities).

In [2]:
%pip install "laspy"

Note: you may need to restart the kernel to use updated packages.


## Open LAS file

We follow the tutorial at https://pythonhosted.org/laspy/tut_part_1.html and adapt it to laspy 2.0 using the migration guide at https://laspy.readthedocs.io/en/latest/migration.html#from-laspy-1-7-x-to-laspy-2-0-0

In [3]:
import laspy
las = laspy.read("../data/test.las")

Find out which attributes are stored with each record in the LAS file. This depends on the version of the LAS format and the application that created the file.

In [4]:
pointformat = las.point_format
for spec in pointformat:
    print(spec.name)

X
Y
Z
intensity
return_number
number_of_returns
scan_direction_flag
edge_of_flight_line
classification
synthetic
key_point
withheld
scan_angle_rank
user_data
point_source_id
gps_time
red
green
blue


## Read the data

Using pandas (https://pandas.pydata.org/), load selected attributes of the records into a table with named columns.

In [5]:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'X': np.array(las.x), 'Y': np.array(las.y), 'Z': np.array(las.z),
    'intensity': las.intensity, 'raw_classification': las.raw_classification,
})

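Check how much data was loaded and show the first 5 rows of the data frame. Note that `df.size` counts individual values (rows times columns), not records: with the five columns selected above, the value below corresponds to 1339520 / 5 = 267904 records, which is what `len(df)` would return.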

In [6]:
df.size

1339520
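
The difference between cells and records can be checked on a small synthetic frame. This is only a sketch: the frame `demo` and its values are made up for illustration and are not read from the LAS file.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the point table: 4 records, 5 attributes.
demo = pd.DataFrame({
    "X": np.array([1.0, 2.0, 3.0, 4.0]),
    "Y": np.array([10.0, 20.0, 30.0, 40.0]),
    "Z": np.array([0.5, 0.6, 0.7, 0.8]),
    "intensity": np.array([30, 22, 10, 31], dtype=np.uint16),
    "raw_classification": np.array([1, 1, 1, 1], dtype=np.uint8),
})

n_cells = demo.size          # rows * columns
n_records = len(demo)        # rows only
n_rows, n_cols = demo.shape  # both dimensions at once

print(n_cells, n_records, (n_rows, n_cols))
```

For counting records, `len(df)` or `df.shape[0]` is usually the more direct choice.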

In [7]:
df.head(5)

| | X | Y | Z | intensity | raw_classification |
|---|---|---|---|---|---|
| 0 | 555000.0625 | 4887200.0 | 120.940003 | 30 | 1 |
| 1 | 555000.6875 | 4887199.5 | 117.330002 | 22 | 1 |
| 2 | 555001.3125 | 4887200.0 | 115.339996 | 10 | 1 |
| 3 | 555000.1875 | 4887197.0 | 123.910004 | 31 | 1 |
| 4 | 555001.9375 | 4887200.0 | 111.110001 | 8 | 1 |


Further columns can be added later.

In [8]:
df2 = df.assign(gps_time = las.gps_time)
df2.head(5)

| | X | Y | Z | intensity | raw_classification | gps_time |
|---|---|---|---|---|---|---|
| 0 | 555000.0625 | 4887200.0 | 120.940003 | 30 | 1 | 467000.4375 |
| 1 | 555000.6875 | 4887199.5 | 117.330002 | 22 | 1 | 467000.5 |
| 2 | 555001.3125 | 4887200.0 | 115.339996 | 10 | 1 | 467000.5 |
| 3 | 555000.1875 | 4887197.0 | 123.910004 | 31 | 1 | 467000.53125 |
| 4 | 555001.9375 | 4887200.0 | 111.110001 | 8 | 1 | 467000.53125 |


Be careful, though: `assign` copies the whole data frame into a new one. Assigning the column directly to the existing data frame should avoid this.

In [9]:
df['gps_time'] = las.gps_time
df.head(5)

| | X | Y | Z | intensity | raw_classification | gps_time |
|---|---|---|---|---|---|---|
| 0 | 555000.0625 | 4887200.0 | 120.940003 | 30 | 1 | 467000.4375 |
| 1 | 555000.6875 | 4887199.5 | 117.330002 | 22 | 1 | 467000.5 |
| 2 | 555001.3125 | 4887200.0 | 115.339996 | 10 | 1 | 467000.5 |
| 3 | 555000.1875 | 4887197.0 | 123.910004 | 31 | 1 | 467000.53125 |
| 4 | 555001.9375 | 4887200.0 | 111.110001 | 8 | 1 | 467000.53125 |
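
The two ways of adding a column can be compared on a small synthetic frame. This is a minimal sketch: the names `base`, `extended` and the values are made up for illustration. `assign` returns a new data frame and leaves the original untouched, while item assignment adds the column to the existing object.

```python
import numpy as np
import pandas as pd

base = pd.DataFrame({"Z": np.array([120.94, 117.33, 115.34])})
gps_time = np.array([467000.4375, 467000.5, 467000.5])

# assign() returns a new data frame; `base` keeps its original columns.
extended = base.assign(gps_time=gps_time)
base_columns_after_assign = list(base.columns)
extended_columns = list(extended.columns)

# Item assignment adds the column to the existing data frame object.
base["gps_time"] = gps_time
base_columns_after_insert = list(base.columns)

print(base_columns_after_assign, extended_columns, base_columns_after_insert)
```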


We can query simple statistics for a single attribute (one column) or for all attributes at once (i.e. along each column).

In [10]:
df['Z'].max()

395.3800050094404

In [11]:
df.min()

X                     5.543902e+05
Y                     4.886536e+06
Z                    -1.160000e+00
intensity             1.000000e+00
raw_classification    1.000000e+00
gps_time              4.659909e+05
dtype: float64
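
Several statistics can also be requested in one call with `DataFrame.agg`. The sketch below uses a small synthetic frame `demo` with made-up values rather than the LAS data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the point table (values are made up).
demo = pd.DataFrame({
    "Z": np.array([120.94, 117.33, 115.34, 123.91]),
    "intensity": np.array([30, 22, 10, 31], dtype=np.uint16),
})

# One row per statistic, one column per attribute.
stats = demo.agg(["min", "max", "mean"])
print(stats)
```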

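The resulting series has a single, uniform type (here `float64`), so integer attributes such as intensity are displayed as floats. To see the actual data type of a specific attribute, check the corresponding column: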

In [12]:
df['intensity'].dtype

dtype('uint16')
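
The types of all columns can be listed at once with `DataFrame.dtypes`. The sketch below, on a small synthetic frame with made-up values, also illustrates why the reduction above reports `float64`: each column keeps its own type, but the combined result of `min()` is upcast to a common type.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in with one float and one unsigned integer column.
demo = pd.DataFrame({
    "Z": np.array([120.94, 117.33], dtype=np.float64),
    "intensity": np.array([30, 22], dtype=np.uint16),
})

column_types = demo.dtypes  # one dtype per column
minima = demo.min()         # one series holding the minimum of every column

print(column_types)
print(minima.dtype)
```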