# Removing noise
This exercise uses PDAL to remove unwanted noise from an airborne laser scanning (ALS) collection.

## Exercise
PDAL provides the outlier filter (`filters.outlier`), which applies a statistical test to the data and labels the points that fail it as noise.

Because this operation is somewhat complex, we are going to use a pipeline to define it.
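
As a rough illustration of what "a pipeline" means here, the sketch below builds the same stage list used later in this exercise as a plain Python structure and serializes it with the standard `json` module. The helper name `build_pipeline` and its arguments are hypothetical, introduced only for this example; the stage names and options are the ones from this exercise.

```python
import json

def build_pipeline(input_path, output_path="./clean.laz"):
    """Return a PDAL pipeline definition as a JSON string (hypothetical helper)."""
    stages = [
        # A bare filename: PDAL infers the reader from the file extension.
        input_path,
        # Label statistical outliers as noise.
        {"type": "filters.outlier", "method": "statistical",
         "multiplier": 3, "mean_k": 8},
        # Keep only points that are not noise and have a plausible elevation.
        {"type": "filters.range",
         "limits": "Classification![7:7],Z[-100:3000]"},
        # Write the remaining points to a compressed LAS file.
        {"type": "writers.las", "compression": "true",
         "minor_version": "2", "dataformat_id": "0",
         "filename": output_path},
    ]
    return json.dumps({"pipeline": stages}, indent=2)

pipeline_json = build_pipeline("./samp11-utm.laz")
print(pipeline_json)
```

Building the definition from a list and a dict avoids quoting mistakes that can creep in when the JSON is edited by hand, and the stages always run in list order.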

In [7]:
%matplotlib inline

import os
import sys

import matplotlib.pyplot as plt
import pdal
from urllib.request import urlretrieve  # standard library in Python 3; no need for six

If you have run the previous tutorials, you probably already have the data. If not, the following cell downloads some sample data.

In [8]:
url = 'https://github.com/PDAL/data/raw/master/isprs/'
last_percent_reported = None
data_root = '.' # Change me to store data elsewhere

def download_progress_hook(count, blockSize, totalSize):
  """A hook to report the progress of a download. This is mostly intended for users with
  slow internet connections. Reports every 5% change in download progress.
  """
  global last_percent_reported
  percent = int(count * blockSize * 100 / totalSize)

  if last_percent_reported != percent:
    if percent % 5 == 0:
      sys.stdout.write("%s%%" % percent)
      sys.stdout.flush()
    else:
      sys.stdout.write(".")
      sys.stdout.flush()
      
    last_percent_reported = percent
        
def maybe_download(filename, expected_bytes, force=False):
  """Download a file if not present, and make sure it's the right size."""
  dest_filename = os.path.join(data_root, filename)
  if force or not os.path.exists(dest_filename):
    print('Attempting to download:', filename) 
    filename, _ = urlretrieve(url + filename, dest_filename, reporthook=download_progress_hook)
    print('\nDownload Complete!')
  statinfo = os.stat(dest_filename)
  if statinfo.st_size == expected_bytes:
    print('Found and verified', dest_filename)
  else:
    raise Exception(
      'Failed to verify ' + dest_filename + '. Can you get to it with a browser?')
  return dest_filename

pc_filename = maybe_download('samp11-utm.laz', 99563)
# pc_filename = maybe_download('CSite1_orig-utm.laz', 4539968)

('Found and verified', './samp11-utm.laz')


## Pipeline breakdown

### 1. Reader
The first entry in the pipeline is the input filename (the `%s` placeholder is replaced with `pc_filename`). PDAL infers the reader from the file extension, so no explicit reader stage is needed.

### 2. filters.outlier
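The PDAL outlier filter does most of the work for this operation. With the `statistical` method, it computes each point's mean distance to its `mean_k` nearest neighbors and labels as noise any point whose mean distance exceeds the global mean by more than `multiplier` standard deviations. The `filters.range` stage that follows then drops the points labeled as noise (classification 7).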

```json
{
  "type": "filters.outlier",
  "method": "statistical",
  "multiplier": 3,
  "mean_k": 8
}
```
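
To make the two parameters concrete, here is a minimal NumPy sketch of a statistical outlier test of this kind. It is a brute-force illustration of the general technique, not PDAL's implementation: the function name `statistical_outliers`, the synthetic data, and the use of the sample standard deviation (`ddof=1`) are all assumptions made for this example.

```python
import numpy as np

def statistical_outliers(points, mean_k=8, multiplier=3.0):
    """Return a boolean mask that is True for points flagged as outliers.

    points: (n, 3) array of XYZ coordinates.
    """
    # Pairwise Euclidean distances (brute force, O(n^2) memory; demo only).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Sort each row; column 0 is the point itself (distance 0), so skip it.
    knn = np.sort(dist, axis=1)[:, 1:mean_k + 1]
    mean_dist = knn.mean(axis=1)
    # Global threshold: mean plus `multiplier` standard deviations
    # of the per-point mean neighbor distances (ddof=1 is an assumption).
    threshold = mean_dist.mean() + multiplier * mean_dist.std(ddof=1)
    return mean_dist > threshold

rng = np.random.default_rng(0)
cloud = rng.uniform(0.0, 10.0, size=(500, 3))  # dense synthetic cluster
stray = np.array([[50.0, 50.0, 200.0]])        # one obvious noise point
mask = statistical_outliers(np.vstack([cloud, stray]))
print(mask.sum(), "outlier(s); stray point flagged:", bool(mask[-1]))
```

A larger `multiplier` makes the test more tolerant, so fewer points are labeled as noise, while `mean_k` controls how many neighbors contribute to each point's mean distance. Real implementations use a spatial index rather than a full distance matrix.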

In [9]:
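# Pipeline definition; %s is replaced with the input filename.
# The string is named pipeline_json so it does not shadow the standard json module.
pipeline_json = u'''
{
  "pipeline":[
    "%s",
    {
      "type": "filters.outlier",
      "method": "statistical",
      "multiplier": 3,
      "mean_k": 8
    },
    {
      "type": "filters.range",
      "limits": "Classification![7:7],Z[-100:3000]"
    },
    {
      "type": "writers.las",
      "compression": "true",
      "minor_version": "2",
      "dataformat_id": "0",
      "filename":"./clean.laz"
    }
  ]
}''' % pc_filename

p = pdal.Pipeline(pipeline_json)
p.validate()
p.loglevel = 8  # verbose debug logging, so each stage reports what it did
count = p.execute()  # returns the number of points processed
log = p.log
print(log)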

(pypipeline filters.outlier Debug) 		Labeled 241 outliers as noise!
(pypipeline writers.las Debug) Wrote 37769 points to the LAS file
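
The `limits` string in the `filters.range` stage is what actually removes the labeled points. As a hedged illustration, the sketch below applies the equivalent selection to a small hand-made NumPy structured array: `Classification![7:7]` excludes class 7 (noise), `Z[-100:3000]` keeps elevations in that inclusive range, and the two conditions are combined with AND. The sample values are invented for this example.

```python
import numpy as np

# Hypothetical points: Z elevation and classification code.
points = np.array(
    [(120.5, 2), (98.0, 7), (5000.0, 1), (-20.0, 1)],
    dtype=[("Z", "f8"), ("Classification", "u1")],
)

not_noise = points["Classification"] != 7                    # Classification![7:7]
z_in_range = (points["Z"] >= -100) & (points["Z"] <= 3000)   # Z[-100:3000]

# Ranges on different dimensions are combined with a logical AND.
kept = points[not_noise & z_in_range]
print(kept)
```

Here the second point is dropped because it is classified as noise, and the third because its elevation falls outside the allowed range.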

