# DATA PROXIMITY
Important Notes:
* Proximity is always measured between <b>at least 2 points</b>.
* <b>One list [dimension-1, dimension-2, ..., dimension-n]</b> represents <b>one point</b>, and its elements are that point's coordinates along each dimension of a Cartesian coordinate system.

In [1]:
import scipy.spatial.distance as ssd
# Documentation: https://docs.scipy.org/doc/scipy/reference/spatial.distance.html

## Quantitative (Numeric) Attributes Proximity

In [2]:
# 3-Dimensional Space (n=3)
alpha_point = [1, 0, 1]
beta_point = [1, 1, 0]

### 1-D Arrays (between 2 points)

In [3]:
# ssd.cosine returns the cosine distance, i.e. 1 - cosine similarity
print("Cosine Distance:", round(ssd.cosine(alpha_point, beta_point), 5))
# Minkowski uses p=2 by default, so it matches the Euclidean distance below
print("Minkowski Distance:", round(ssd.minkowski(alpha_point, beta_point), 5))
print("Manhattan/Cityblock Distance:", round(ssd.cityblock(alpha_point, beta_point), 5))
print("Chebyshev/Supremum Distance:", round(ssd.chebyshev(alpha_point, beta_point), 5))
print("Canberra Distance:", round(ssd.canberra(alpha_point, beta_point), 5))
print("Correlation Distance:", round(ssd.correlation(alpha_point, beta_point), 5))
print("Euclidean Distance:", round(ssd.euclidean(alpha_point, beta_point), 5))
print("Squared Euclidean Distance:", round(ssd.sqeuclidean(alpha_point, beta_point), 5))
print("Jensen-Shannon Distance/Metric:", round(ssd.jensenshannon(alpha_point, beta_point), 5))

Cosine Distance: 0.5
Minkowski Distance: 1.41421
Manhattan/Cityblock Distance: 2
Chebyshev/Supremum Distance: 1
Canberra Distance: 2.0
Correlation Distance: 1.5
Euclidean Distance: 1.41421
Squared Euclidean Distance: 2.0
Jensen-Shannon Distance/Metric: 0.58871
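As a cross-check on the values above, the following minimal sketch recomputes a few of the distances by hand with NumPy and compares them with the SciPy results. The `manual` dictionary and the choice of which metrics to recompute are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import scipy.spatial.distance as ssd

alpha_point = np.array([1, 0, 1])
beta_point = np.array([1, 1, 0])

# Absolute coordinate-wise differences between the two points
diff = np.abs(alpha_point - beta_point)

# Hand-written formulas (illustrative; keys match the SciPy function names)
manual = {
    "cityblock": diff.sum(),                  # sum of |a_i - b_i|
    "euclidean": np.sqrt((diff ** 2).sum()),  # square root of the sum of squares
    "chebyshev": diff.max(),                  # largest |a_i - b_i|
    "cosine": 1 - alpha_point @ beta_point
    / (np.linalg.norm(alpha_point) * np.linalg.norm(beta_point)),
}

for name, value in manual.items():
    library_value = getattr(ssd, name)(alpha_point, beta_point)
    print(f"{name}: manual={float(value):.5f}, scipy={float(library_value):.5f}")

# Minkowski generalizes Manhattan (p=1) and Euclidean (p=2)
minkowski_p1 = ssd.minkowski(alpha_point, beta_point, p=1)
minkowski_p2 = ssd.minkowski(alpha_point, beta_point, p=2)
```

With `p=1` the Minkowski distance reduces to the Manhattan distance, and with `p=2` to the Euclidean distance, which is why the default Minkowski result above equals the Euclidean one.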


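### 2-D Arrays (between the end points of 2 line segments*)
<i>*A <b>line segment</b> is a line bounded by two distinct end points. Each 2-D array below holds the two end points of one segment, and .cdist() returns the distance from every end point of the first segment to every end point of the second, not a single segment-to-segment distance.</i>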

In [4]:
# 2-Dimensional Space (n=2)
first_line = [[2, 5], [6, 0]]
second_line = [[1, 3], [3, 4]]

In [5]:
# Row i, column j of each matrix is the distance between first_line[i] and second_line[j]
print("Cosine Distance:\n", ssd.cdist(first_line, second_line, 'cosine'), "\n")
print("Minkowski Distance:\n", ssd.cdist(first_line, second_line, 'minkowski'), "\n")
print("Manhattan/Cityblock Distance:\n", ssd.cdist(first_line, second_line, 'cityblock'), "\n")
print("Chebyshev/Supremum Distance:\n", ssd.cdist(first_line, second_line, 'chebyshev'), "\n")
print("Canberra Distance:\n", ssd.cdist(first_line, second_line, 'canberra'), "\n")
print("Correlation Distance:\n", ssd.cdist(first_line, second_line, 'correlation'), "\n")
print("Euclidean Distance:\n", ssd.cdist(first_line, second_line, 'euclidean'), "\n")
print("Squared Euclidean Distance:\n", ssd.cdist(first_line, second_line, 'sqeuclidean'), "\n")
print("Jensen-Shannon Distance/Metric:\n", ssd.cdist(first_line, second_line, 'jensenshannon'), "\n")

Cosine Distance:
 [[0.00172563 0.03438424]
 [0.68377223 0.4       ]] 

Minkowski Distance:
 [[2.23606798 1.41421356]
 [5.83095189 5.        ]] 

Manhattan/Cityblock Distance:
 [[3. 2.]
 [8. 7.]] 

Chebyshev/Supremum Distance:
 [[2. 1.]
 [5. 4.]] 

Canberra Distance:
 [[0.58333333 0.31111111]
 [1.71428571 1.33333333]] 

Correlation Distance:
 [[0. 0.]
 [2. 2.]] 

Euclidean Distance:
 [[2.23606798 1.41421356]
 [5.83095189 5.        ]] 

Squared Euclidean Distance:
 [[ 5.  2.]
 [34. 25.]] 

Jensen-Shannon Distance/Metric:
 [[0.02852142 0.10567741]
 [0.61676224 0.50676971]] 
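To make the layout of these matrices concrete, here is a small sketch showing that entry `[i, j]` of a `cdist` result is the distance between point `i` of the first array and point `j` of the second. The variable names `distance_matrix` and `closest` are illustrative assumptions; only the Euclidean metric is checked.

```python
import numpy as np
import scipy.spatial.distance as ssd

first_line = [[2, 5], [6, 0]]
second_line = [[1, 3], [3, 4]]

# Shape is (len(first_line), len(second_line))
distance_matrix = ssd.cdist(first_line, second_line, 'euclidean')

# Compare every matrix entry with the corresponding point-to-point distance
for i, point_a in enumerate(first_line):
    for j, point_b in enumerate(second_line):
        pairwise = ssd.euclidean(point_a, point_b)
        assert np.isclose(distance_matrix[i, j], pairwise)
        print(f"first_line[{i}] -> second_line[{j}]: {float(pairwise):.5f}")

# Smallest end-point-to-end-point distance (not the true segment distance)
closest = distance_matrix.min()
```

Note that with only two coordinates per point, the correlation distance can only be 0 or 2 (or undefined when both coordinates are equal), because centering a 2-element vector always leaves a pair of opposite values.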



### n-D Arrays (between many points)
For m points, .pdist() returns a condensed array of size m(m-1)/2: one distance for each unordered pair of points (so at least 2 points are needed to get a non-empty result).<br>
For example,
* with two points [..],[..] (m = 2), it returns an array of size 1
* with three points [..],[..],[..] (m = 3), it returns an array of size 3
* with four points [..],[..],[..],[..] (m = 4), it returns an array of size 6
* with five points [..],[..],[..],[..],[..] (m = 5), it returns an array of size 10
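The sizes listed above can be checked with a short sketch. Here `m` denotes the number of points, so the expected length is m(m-1)/2. The random 3-dimensional points and the `lengths` dictionary are assumptions made purely for illustration.

```python
import numpy as np
import scipy.spatial.distance as ssd

rng = np.random.default_rng(seed=0)

lengths = {}
for m in range(2, 6):
    points = rng.random((m, 3))  # m random points in 3-D space
    condensed = ssd.pdist(points, 'euclidean')
    lengths[m] = len(condensed)
    # One distance per unordered pair of points
    assert len(condensed) == m * (m - 1) // 2

print(lengths)  # {2: 1, 3: 3, 4: 6, 5: 10}
```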

In [6]:
ssd.pdist([[1, 2, 5], [0, 3, 1], [1, 2, 5], [0, 3, 1], [1, 2, 5], [0, 3, 1]], 'euclidean')

array([4.24264069, 0.        , 4.24264069, 0.        , 4.24264069,
       4.24264069, 0.        , 4.24264069, 0.        , 4.24264069,
       0.        , 4.24264069, 4.24264069, 0.        , 4.24264069])
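The condensed array is compact but hard to read. As a sketch, `squareform` from the same module can expand it into a symmetric 6 x 6 matrix whose entry `[i, j]` is the distance between point `i` and point `j`. The helper `condensed_index` is a hypothetical name introduced here to show how a pair `(i, j)` with `i < j` maps to a position in the condensed array.

```python
import scipy.spatial.distance as ssd

points = [[1, 2, 5], [0, 3, 1], [1, 2, 5], [0, 3, 1], [1, 2, 5], [0, 3, 1]]

condensed = ssd.pdist(points, 'euclidean')  # 15 pairwise distances
square = ssd.squareform(condensed)          # symmetric matrix, zeros on the diagonal

print(square.shape)
print(square.round(5))


def condensed_index(i, j, m):
    """Position of pair (i, j), i < j, in the condensed array for m points."""
    return m * i + j - ((i + 2) * (i + 1)) // 2


m = len(points)
# Points 1 and 2 differ, so this entry is sqrt(18), about 4.24264
print(condensed[condensed_index(1, 2, m)], square[1, 2])
```

The square form repeats every distance twice and stores the zero diagonal, so the condensed form is preferable when memory matters and the square form when readability does.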