# Spatial algorithms and data structures

**`scipy.spatial`** can compute triangulations, Voronoi diagrams, and convex hulls of a set of points, by leveraging the **Qhull** library.

Moreover, it contains **KDTree** implementations for nearest-neighbor point queries, and utilities for distance computations in various metrics.
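
As a quick illustration of the Qhull-backed classes mentioned above, the following sketch builds a convex hull. The data (the unit square plus its center) are made up for this example and are not part of the original notebook.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Four corners of the unit square plus one interior point (illustrative data).
pts = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 0.5]])

hull = ConvexHull(pts)

# Indices of the input points that lie on the hull; the interior point is absent.
hull_vertices = sorted(hull.vertices.tolist())
print(hull_vertices)

# For 2-D input, `volume` is the enclosed area and `area` is the perimeter.
print(hull.volume, hull.area)
```

`Delaunay` and `Voronoi` are constructed the same way, from an array of points.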

In [6]:
from scipy import spatial

In [2]:
help(spatial)

Help on package scipy.spatial in scipy:

NAME
    scipy.spatial

DESCRIPTION
    Spatial algorithms and data structures (:mod:`scipy.spatial`)
    
    .. currentmodule:: scipy.spatial
    
    Spatial Transformations
    Contained in the `scipy.spatial.transform` submodule.
    
    Nearest-neighbor Queries
    .. autosummary::
       :toctree: generated/
    
       KDTree      -- class for efficient nearest-neighbor queries
       cKDTree     -- class for efficient nearest-neighbor queries (faster impl.)
       Rectangle
    
    Distance metrics are contained in the :mod:`scipy.spatial.distance` submodule.
    
    Delaunay Triangulation, Convex Hulls and Voronoi Diagrams
    
    .. autosummary::
       :toctree: generated/
    
       Delaunay    -- compute Delaunay triangulation of input points
       ConvexHull  -- compute a convex hull for input points
       Voronoi     -- compute a Voronoi diagram hull from input points
       SphericalVoronoi -- compute a Voronoi diagram fr

In [7]:
dir(spatial)

['ConvexHull',
 'Delaunay',
 'HalfspaceIntersection',
 'KDTree',
 'Rectangle',
 'SphericalVoronoi',
 'Voronoi',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '_distance_wrap',
 '_hausdorff',
 '_plotutils',
 '_procrustes',
 '_spherical_voronoi',
 '_voronoi',
 'absolute_import',
 'cKDTree',
 'ckdtree',
 'convex_hull_plot_2d',
 'delaunay_plot_2d',
 'distance',
 'distance_matrix',
 'division',
 'kdtree',
 'minkowski_distance',
 'minkowski_distance_p',
 'print_function',
 'procrustes',
 'qhull',
 'test',
 'transform',
 'tsearch',
 'voronoi_plot_2d']

# Distance computations

In [1]:
from scipy.spatial import distance

In [5]:
help(distance)

Help on module scipy.spatial.distance in scipy.spatial:

NAME
    scipy.spatial.distance

DESCRIPTION
    Distance computations (:mod:`scipy.spatial.distance`)
    
    .. sectionauthor:: Damian Eads
    
    Function Reference
    ------------------
    
    Distance matrix computation from a collection of raw observation vectors
    stored in a rectangular array.
    
    .. autosummary::
       :toctree: generated/
    
       pdist   -- pairwise distances between observation vectors.
       cdist   -- distances between two collections of observation vectors
       squareform -- convert distance matrix to a condensed one and vice versa
       directed_hausdorff -- directed Hausdorff distance between arrays
    
    Predicates for checking the validity of distance matrices, both
    condensed and redundant. Also contained in this module are functions
    for computing the number of observations in a distance matrix.
    
    .. autosummary::
       :toctree: generated/
    
       is

In [2]:
dir(distance)

['MetricInfo',
 '_METRICS',
 '_METRICS_NAMES',
 '_METRIC_ALIAS',
 '_TEST_METRICS',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_args_to_kwargs_xdist',
 '_asarray_validated',
 '_convert_to_bool',
 '_convert_to_double',
 '_convert_to_type',
 '_copy_array_if_base_present',
 '_correlation_cdist_wrap',
 '_correlation_pdist_wrap',
 '_distance_wrap',
 '_filter_deprecated_kwargs',
 '_hausdorff',
 '_nbool_correspond_all',
 '_nbool_correspond_ft_tf',
 '_select_weighted_metric',
 '_validate_cdist_input',
 '_validate_mahalanobis_kwargs',
 '_validate_minkowski_kwargs',
 '_validate_pdist_input',
 '_validate_seuclidean_kwargs',
 '_validate_vector',
 '_validate_weights',
 '_validate_wminkowski_kwargs',
 'absolute_import',
 'braycurtis',
 'callable',
 'canberra',
 'cdist',
 'chebyshev',
 'cityblock',
 'correlation',
 'cosine',
 'dice',
 'directed_hausdorff',
 'division',
 'euclidean',
 'hamming',
 'is_valid_dm',
 'is_va

## pdist

`pdist` computes the pairwise distances between the observation vectors (rows) of a single array `X`.

```python
distance.pdist(X, metric='euclidean', *args, **kwargs)
```

In [10]:
points = [[0,1], [0,2], [0,3]]

In [11]:
distance.pdist(points)

array([1., 2., 1.])

In [14]:
# Manhattan distance
v = [[0,0], [0,1], [1,0], [1,1]]
distance.pdist(v, metric = 'cityblock')

array([1., 1., 2., 2., 1., 1.])

## cdist

`cdist` computes the distances between every pair of vectors drawn from two collections, `XA` and `XB`.

```python
distance.cdist(XA, XB, metric='euclidean', *args, **kwargs)
```

In [16]:
u = [[0,0], [0,2]]
v = [[0,3], [0,4]]

distance.cdist(u, v)

array([[3., 4.],
       [1., 2.]])

**`scipy.spatial.distance_matrix`**

In [26]:
# equivalent to cdist with the default Euclidean metric
from scipy.spatial import distance_matrix
distance_matrix(u,v)

array([[3., 4.],
       [1., 2.]])
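
`distance_matrix` computes Minkowski distances and takes the order as its `p` argument (default 2, i.e. Euclidean), which is why it agrees with the default `cdist` call. A small check, reusing the same `u` and `v`, might look like this:

```python
import numpy as np
from scipy.spatial import distance, distance_matrix

u = [[0, 0], [0, 2]]
v = [[0, 3], [0, 4]]

# p=2 (the default) reproduces Euclidean cdist.
same_l2 = np.allclose(distance_matrix(u, v), distance.cdist(u, v))

# p=1 reproduces the Manhattan ("cityblock") metric.
same_l1 = np.allclose(distance_matrix(u, v, p=1),
                      distance.cdist(u, v, metric='cityblock'))
print(same_l2, same_l1)
```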

In [30]:
#correlation distance
distance.cdist(u, v, metric = 'correlation')

array([[           nan,            nan],
       [0.00000000e+00, 2.22044605e-16]])

In [31]:
#cosine distance
distance.cdist(u, v, metric = 'cosine')

array([[nan, nan],
       [ 0.,  0.]])

## Some distance functions

In [3]:
from scipy.spatial.distance import euclidean, cityblock, sqeuclidean

In [20]:
#euclidean distance
euclidean([0,0], [1,1], w = None)

1.4142135623730951

In [21]:
# squared Euclidean distance
sqeuclidean([0,0],[1,1], w = None)

2.0

In [22]:
#manhattan distance
cityblock([0,0], [1,1])

2

<b style = 'color:red'>First warning</b>: Whenever possible, use these routines instead of writing our own
definitions of the corresponding distance functions.
They are generally faster than a hand-written equivalent and are written to cope with
inputs that are very large or very small.


<b style = 'color:red'>Second warning</b>: These functions work well when comparing two vectors;
for the pairwise computation of many vectors, however, use
the **`pdist`** routine. It takes an m x n array representing m vectors
of dimension n and computes the distance between every pair of them, returning the m(m-1)/2 results as a condensed (flattened upper-triangular) array.
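
To make the shape of the `pdist` result concrete, here is a minimal sketch that also uses `squareform` (listed in the module help above) to expand the condensed result into a full square matrix. The variable names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# m = 4 vectors of dimension n = 2 (same data as the cityblock example).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
m = X.shape[0]

condensed = pdist(X)              # one entry per unordered pair (i < j)
square = squareform(condensed)    # full symmetric m x m matrix

print(condensed.shape)   # (m * (m - 1) // 2,)
print(square.shape)      # (m, m)
```

The square form is convenient for indexing by pair, while the condensed form avoids storing each distance twice.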

<b style = 'color:red'>Third warning</b>: To compute the distance between every pair formed by taking
one vector from each of two collections of inputs, use the **`cdist`** routine.
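
A short sketch of what `cdist` returns for two collections of different sizes. The arrays `XA` and `XB` below are made-up inputs chosen for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist, euclidean

XA = np.array([[0, 0], [0, 2], [3, 4]])   # 3 vectors
XB = np.array([[0, 3], [0, 4]])           # 2 vectors

D = cdist(XA, XB)   # shape (3, 2): one row per XA vector, one column per XB vector

# Entry D[i, j] is the distance between XA[i] and XB[j].
matches = np.isclose(D[2, 1], euclidean(XA[2], XB[1]))
print(D.shape, matches)
```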

<b style = 'color:red'>Fourth warning</b>: When we have a large number of data points and
need to solve a nearest-neighbor problem (for example, to locate the
element of the data closest to a new instance point), we seldom do it by brute
force. A standard, efficient approach is based on the idea of
k-dimensional trees. SciPy has two classes to handle these objects: **`KDTree`** and
**`cKDTree`**. The latter is a faster implementation of the former, written in compiled (Cython/C++) code,
and offers essentially the same query interface. Since SciPy 1.6 the two are functionally identical, and `cKDTree` is kept only for backward compatibility.
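
The sketch below contrasts the two approaches on synthetic data: a brute-force search over all distances and a k-d tree query. The random data, seed, and variable names are assumptions made for this example; both searches should return the same nearest neighbors.

```python
import numpy as np
from scipy.spatial import KDTree
from scipy.spatial.distance import cdist

rng = np.random.RandomState(0)          # fixed seed, illustrative data
data = rng.uniform(size=(1000, 3))      # 1000 points in 3-D
queries = rng.uniform(size=(5, 3))      # 5 new instance points

# Brute force: compute every query-to-data distance, then take the minimum.
brute_idx = cdist(queries, data).argmin(axis=1)

# k-d tree: build the index once, then query it.
tree = KDTree(data)
tree_dist, tree_idx = tree.query(queries)

print(np.array_equal(brute_idx, tree_idx))
```

The tree pays a one-time construction cost, so it tends to be worthwhile when many queries are run against the same data.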

# KDTree

Fast nearest-neighbor lookup.

```python
KDTree(data, leafsize=10)

----------------
Docstring:
kd-tree for quick nearest-neighbor lookup

This class provides an index into a set of k-dimensional points which
can be used to rapidly look up the nearest neighbors of any point.
----------------
Parameters
----------
data : (N,K) array_like
    The data points to be indexed. This array is not copied, and
    so modifying this data will result in bogus results.
leafsize : int, optional
    The number of points at which the algorithm switches over to
    brute-force.  Has to be positive.

```

In [1]:
import numpy as np
from scipy.spatial import KDTree
points = np.random.randint(0, 10, size = (10, 2))
points

array([[2, 1],
       [1, 0],
       [4, 8],
       [8, 6],
       [5, 3],
       [0, 3],
       [1, 9],
       [8, 5],
       [1, 1],
       [5, 4]])

In [6]:
#Tree construction
tree = KDTree(points)


In [7]:
dir(tree)

['_KDTree__build',
 '_KDTree__query',
 '_KDTree__query_ball_point',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'count_neighbors',
 'data',
 'innernode',
 'leafnode',
 'leafsize',
 'm',
 'maxes',
 'mins',
 'n',
 'node',
 'query',
 'query_ball_point',
 'query_ball_tree',
 'query_pairs',
 'sparse_distance_matrix',
 'tree']

Query Tree: **`KDTree.query`**

```python
tree.query(x, k=1, eps=0, p=2, distance_upper_bound=inf)
--------
x : array_like, last dimension self.m
    An array of points to query.
k : int, optional
    The number of nearest neighbors to return.
```

In [8]:
# query data
# let's find the 5 nearest neighbors of [0,0]
tree.query([0,0], 5)  # returns (array of distances, array of indices into the data)

(array([1.        , 1.41421356, 2.23606798, 3.        , 5.83095189]),
 array([1, 8, 0, 5, 4]))

In [9]:
distance, index = tree.query([0,0], 5)
#let's see 5 nearest points in the dataset to [0,0]
points[index]

array([[1, 0],
       [1, 1],
       [2, 1],
       [0, 3],
       [5, 3]])

In [10]:
# their corresponding distances to [0,0]
distance

array([1.        , 1.41421356, 2.23606798, 3.        , 5.83095189])

In [13]:
# query multiple points: returns a tuple (distances, indices) of 2-D arrays, with one row per query point
tree.query([[0,0], [1,1]], k = 5)

(array([[1.        , 1.41421356, 2.23606798, 3.        , 5.83095189],
        [0.        , 1.        , 1.        , 2.23606798, 4.47213595]]),
 array([[1, 8, 0, 5, 4],
        [8, 0, 1, 5, 4]]))
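
The remaining `query` parameters shown in the signature can be explored in the same way. The following sketch rebuilds the tree from the ten points above and tries `distance_upper_bound` and `p`; the cutoff of 2.5 is an arbitrary value chosen for illustration.

```python
import numpy as np
from scipy.spatial import KDTree

# Same ten points as in the cells above.
points = np.array([[2, 1], [1, 0], [4, 8], [8, 6], [5, 3],
                   [0, 3], [1, 9], [8, 5], [1, 1], [5, 4]])
tree = KDTree(points)

# Ask for 5 neighbors but only accept those closer than 2.5.
# Missing neighbors are reported with distance inf and index tree.n.
d, i = tree.query([0, 0], k=5, distance_upper_bound=2.5)
found = np.isfinite(d)
print(points[i[found]])

# p=1 switches the query to the Manhattan metric.
d1, i1 = tree.query([0, 0], k=1, p=1)
print(d1, i1)
```

Filtering on `np.isfinite(d)` before indexing matters here, because the placeholder index `tree.n` is out of range for `points`.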

### Some KDTree methods

**`count_neighbors`**: count the pairs of points, one from this tree and one from another KDTree, that lie within a given distance of each other  
**`query_ball_point`**: find all points within a given distance of the input point(s)  
**`query_ball_tree`** and **`query_pairs`**: find all pairs of points within a certain distance of each other (between two trees, or within a single tree, respectively)  
**`sparse_distance_matrix`**: compute a sparse matrix of the distances between the points of two KDTree instances, ignoring distances above a given maximum
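
A minimal sketch of these methods on the ten points used earlier. The second tree `other`, its two probe points, and the radii are hypothetical values picked for this example.

```python
import numpy as np
from scipy.spatial import KDTree

# Same ten points as in the query examples.
points = np.array([[2, 1], [1, 0], [4, 8], [8, 6], [5, 3],
                   [0, 3], [1, 9], [8, 5], [1, 1], [5, 4]])
tree = KDTree(points)

# A second, hypothetical tree holding two probe points.
other = KDTree(np.array([[0, 0], [5, 5]]))

# Indices of all points within distance 2.5 of [0, 0].
near_origin = sorted(tree.query_ball_point([0, 0], r=2.5))

# All index pairs (i, j), i < j, of points in `tree` at most 1.5 apart.
close_pairs = tree.query_pairs(r=1.5)

# For each point in `tree`, the indices of points in `other` within 2.5.
neighbors_in_other = tree.query_ball_tree(other, r=2.5)

# Number of (tree point, other point) pairs at most 2.5 apart.
n_pairs = tree.count_neighbors(other, r=2.5)

# Sparse matrix of tree-to-other distances, keeping only those <= 2.5.
sdm = tree.sparse_distance_matrix(other, max_distance=2.5)

print(near_origin, close_pairs, n_pairs, sdm.shape)
```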