# Writing a Pandas data frame with LAS data to a Parquet file

## Populate data frame

In [4]:
from laspy.file import File
import pandas as pd

# Open the LAS file read-only (laspy 1.x API).
inFile = File("../data/test.las", mode="r")
# One column per point attribute; x, y, z are the scaled coordinates.
df = pd.DataFrame({
    'X': inFile.x, 'Y': inFile.y, 'Z': inFile.z,
    'intensity': inFile.intensity,
    'raw_classification': inFile.raw_classification,
    'gps_time': inFile.gps_time})
df.head(5)

,X,Y,Z,gps_time,intensity,raw_classification
0,555000.0625,4887200.0,120.940003,467000.4375,30,1
1,555000.6875,4887199.5,117.330002,467000.5,22,1
2,555001.3125,4887200.0,115.339996,467000.5,10,1
3,555000.1875,4887197.0,123.910004,467000.53125,31,1
4,555001.9375,4887200.0,111.110001,467000.53125,8,1


Show the installed pandas version:

In [100]:
pd.__version__

u'0.19.2'

The built-in Parquet export, DataFrame.to_parquet, requires pandas 0.21.0 or newer, so it is not available in the 0.19.2 installation shown above. See the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/io.html#parquet).

In [12]:
df.to_parquet('df.parquet.gzip', compression='gzip')

AttributeError: 'DataFrame' object has no attribute 'to_parquet'
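
The failure above can be anticipated by comparing the installed version against the 0.21.0 requirement. The following is a minimal sketch using only the standard library; the helper names `version_tuple` and `has_to_parquet` are made up for illustration and are not part of pandas.

```python
from itertools import takewhile

# Minimum pandas version that provides DataFrame.to_parquet.
MIN_TO_PARQUET = (0, 21, 0)


def version_tuple(version):
    """Turn a version string such as '0.19.2' into a tuple of ints.

    Only the leading digits of each of the first three components are
    used, so a suffix such as 'rc1' is ignored.
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def has_to_parquet(version):
    """Return True if this pandas version should offer to_parquet."""
    return version_tuple(version) >= MIN_TO_PARQUET
```

In the notebook this would be called as `has_to_parquet(pd.__version__)`. A simpler alternative is `hasattr(df, 'to_parquet')`, which tests for the method directly instead of reasoning about version numbers.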

Alternatively, the Apache Arrow Python module (pyarrow) can export to Parquet. The data frame first has to be converted to a pyarrow table; whether this conversion copies the data is left open here.

In [14]:
!pip install pyarrow
import pyarrow as pa
import pyarrow.parquet as pq


In [10]:
table = pa.Table.from_pandas(df)
pq.write_table(table, '../data/test.parquet')

Alternatively, fastparquet can write the pandas data frame directly, without an intermediate conversion.

In [15]:
!pip install fastparquet


In [11]:
from fastparquet import write
# file_scheme='hive' writes a directory of part files plus metadata, not a single file.
write('../data/test_compressed.parq', df,
      compression='GZIP', file_scheme='hive')