# A Gentle Introduction to PDAL
If you arrived here by starting the Docker container, the prerequisites are already installed and you should be able to step through the remaining notebooks without issues.

We start by importing the packages needed to download our sample data and to create and execute a simple PDAL pipeline that reads the point cloud into a NumPy `ndarray`.

In [19]:
from __future__ import print_function

import os
import sys

import pdal
from six.moves.urllib.request import urlretrieve

The next block of code checks whether we have already downloaded the sample data and downloads it now if needed. We will be working with some of the ISPRS ground filtering datasets that are hosted in the `PDAL/data` repository on GitHub.

In [6]:
url = 'https://github.com/PDAL/data/raw/master/isprs/'
last_percent_reported = None
data_root = '.' # Change me to store data elsewhere

def download_progress_hook(count, blockSize, totalSize):
  """A hook to report the progress of a download. This is mostly intended for users with
  slow internet connections. Reports every 5% change in download progress.
  """
  global last_percent_reported
  percent = int(count * blockSize * 100 / totalSize)

  if last_percent_reported != percent:
    if percent % 5 == 0:
      sys.stdout.write("%s%%" % percent)
      sys.stdout.flush()
    else:
      sys.stdout.write(".")
      sys.stdout.flush()
      
    last_percent_reported = percent
        
def maybe_download(filename, expected_bytes, force=False):
  """Download a file if not present, and make sure it's the right size."""
  dest_filename = os.path.join(data_root, filename)
  if force or not os.path.exists(dest_filename):
    print('Attempting to download:', filename) 
    filename, _ = urlretrieve(url + filename, dest_filename, reporthook=download_progress_hook)
    print('\nDownload Complete!')
  statinfo = os.stat(dest_filename)
  if statinfo.st_size == expected_bytes:
    print('Found and verified', dest_filename)
  else:
    raise Exception(
      'Failed to verify ' + dest_filename + '. Can you get to it with a browser?')
  return dest_filename

pc_filename = maybe_download('samp11-utm.laz', 99563)

('Found and verified', './samp11-utm.laz')
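
The percentage arithmetic in the progress hook can be tried on its own, without a network connection. The sketch below is a hypothetical helper (`percent_complete` is not part of the notebook) that mirrors the hook's calculation and adds two guards: it clamps the result to 100, since the last block is rarely full, and it returns 0 when the reported total size is not positive, which `urlretrieve` can pass when the server does not report a size. The 8192-byte block size is an assumption chosen for illustration.

```python
def percent_complete(count, block_size, total_size):
    """Return download progress as an integer percentage in [0, 100].

    Hypothetical helper, not part of the notebook: it mirrors the hook's
    arithmetic but clamps the result, because the final block is usually
    only partly filled and count * block_size can exceed total_size.
    """
    if total_size <= 0:
        # The reporthook may receive -1 when the size is unknown.
        return 0
    return min(100, count * block_size * 100 // total_size)


# 99563 bytes is the expected size of samp11-utm.laz;
# 8192 is an assumed block size used only for this illustration.
for blocks in (0, 6, 13):
    print(blocks, percent_complete(blocks, 8192, 99563))
```

Keeping the arithmetic in a small pure function like this makes it easy to check separately from the printing logic in `download_progress_hook`.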


Next, we create our first, very simple pipeline. It specifies only the path to the input data we wish to read. In later tutorials, we will expand on the pipeline, adding filters.

In [7]:
json = u'''
{
  "pipeline":[
    "%s"
  ]
}''' % pc_filename
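
Substituting the filename with `%s` works for the path used here, but it does not escape characters such as backslashes or double quotes, so a Windows-style path could yield invalid JSON. As an alternative, here is a minimal sketch that builds the same pipeline document with the standard library's `json` module. The helper name `make_pipeline_json` is hypothetical, and the module is imported under an alias because this notebook uses `json` as a variable name.

```python
import json as jsonlib  # aliased: the notebook already uses `json` as a variable


def make_pipeline_json(filename):
    """Serialize a one-stage pipeline that only reads `filename`.

    Hypothetical helper for illustration. json.dumps takes care of
    escaping backslashes and quotes in the path.
    """
    return jsonlib.dumps({"pipeline": [filename]}, indent=2)


pipeline_json = make_pipeline_json("./samp11-utm.laz")
print(pipeline_json)
```

The resulting string has the same structure as the hand-written template and could be passed to `pdal.Pipeline` in the same way.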

Create the pipeline, check that it's valid, and execute it. We check that we have read the expected number of points and number of dimensions.

In [24]:
p = pdal.Pipeline(json)
p.validate() # check if our JSON and options were good
p.loglevel = 8 # really noisy
count = p.execute()
data = p.arrays[0]
metadata = p.metadata
log = p.log
print('Read', count, 'points with', len(data.dtype), 'dimensions')
print('Dimension names are', data.dtype.names)

Read 38010 points with 12 dimensions
Dimension names are (u'X', u'Y', u'Z', u'Intensity', u'ReturnNumber', u'NumberOfReturns', u'ScanDirectionFlag', u'EdgeOfFlightLine', u'Classification', u'ScanAngleRank', u'UserData', u'PointSourceId')
