# High Dimensional Space
This notebook builds intuition for high-dimensional spaces by showing how properties such as the distance between points and the volume of a hypersphere change as the dimensionality of the input data increases.

### Import
Import the libraries needed to compute distances and volumes for data of a given dimensionality.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import scipy.special as sci
from scipy.spatial.distance import pdist 

### Define Data Examples
Fix the random seed for reproducibility and define the number of data examples used at every dimensionality.

In [2]:
np.random.seed(0)
n_data = 1000

### Define Dimensionality of Data
Generate three datasets of standard normal samples with 2, 100, and 1000 dimensions. Each dataset stores one example per column.

In [3]:
# Create 1000 data examples (columns) each with 2 dimensions (rows)
n_dim = 2
x_2D = np.random.normal(size=(n_dim,n_data))

# Create 1000 data examples (columns) each with 100 dimensions (rows)
n_dim = 100
x_100D = np.random.normal(size=(n_dim,n_data))

# Create 1000 data examples (columns) each with 1000 dimensions (rows)
n_dim = 1000
x_1000D = np.random.normal(size=(n_dim,n_data))

### Define Distance Function
Define a function that calculates the ratio of the largest Euclidean distance between any two points to the smallest Euclidean distance between any two points.

In [4]:
def distance_ratio(x):
  # Compute all pairwise Euclidean distances once (pdist expects one point per row)
  dists = pdist(x.T, metric='euclidean')

  # Find the smallest and largest distance between any two points
  smallest_dist = np.min(dists)
  largest_dist = np.max(dists)

  # Calculate the ratio of largest to smallest distance and return it
  dist_ratio = largest_dist / smallest_dist
  return dist_ratio
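
The transpose in `pdist(x.T, ...)` matters because `pdist` treats each row as one point, while this notebook stores one point per column. A minimal sketch on three hand-picked 2D points (the array `x_small` is an illustrative assumption, not part of the notebook's data) shows the layout of the result:

```python
import numpy as np
from scipy.spatial.distance import pdist

# Three points in 2D stored as columns (dimensions x examples), as in this notebook:
# p0 = (0, 0), p1 = (3, 4), p2 = (0, 1)
x_small = np.array([[0.0, 3.0, 0.0],
                    [0.0, 4.0, 1.0]])

# pdist treats each ROW as one observation, so transpose first
d_small = pdist(x_small.T, metric='euclidean')

# Condensed output order: d(p0,p1), d(p0,p2), d(p1,p2)
print(d_small.shape)  # one entry per pair: 3*(3-1)/2 = 3
print(d_small)
```

For `n` points the result has `n*(n-1)/2` entries, so the 1000 examples used here produce 499,500 pairwise distances per dataset.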

### Compute Distance Ratio 
Print the ratio of the largest to the smallest pairwise distance as the dimensionality of the input data increases.

In [5]:
print('Ratio of largest to smallest distance 2D: %3.3f'%(distance_ratio(x_2D)))
print('Ratio of largest to smallest distance 100D: %3.3f'%(distance_ratio(x_100D)))
print('Ratio of largest to smallest distance 1000D: %3.3f'%(distance_ratio(x_1000D)))

Ratio of largest to smallest distance 2D: 2840.258
Ratio of largest to smallest distance 100D: 2.038
Ratio of largest to smallest distance 1000D: 1.221
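
One way to see why the ratio approaches 1 is to look at the whole distribution of pairwise distances rather than only its extremes. For standard normal data, the difference between two points has variance 2 in each coordinate, so the typical distance is close to sqrt(2d) while its spread stays roughly constant. The sketch below is an illustration under assumptions that are not in the original notebook: its own random generator `rng`, a smaller sample size `n_points`, and the name `rel_spread`.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)  # assumed seed, independent of the cells above
n_points = 200                  # fewer points than the notebook, to keep this fast

rel_spread = {}
for n_dim in [2, 100, 1000]:
  x = rng.normal(size=(n_dim, n_points))
  d = pdist(x.T, metric='euclidean')
  # Standard deviation of the distances relative to their mean
  rel_spread[n_dim] = d.std() / d.mean()
  print('%dD: mean distance %.3f, sqrt(2d) %.3f, relative spread %.3f'
        % (n_dim, d.mean(), np.sqrt(2 * n_dim), rel_spread[n_dim]))
```

The relative spread shrinks as the dimension grows, which is why the largest and smallest distances end up so close to each other.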


### Define Volume of Hypersphere Function
Define a function that calculates the volume of a hypersphere given its diameter and its number of dimensions.

In [6]:
def volume_of_hypersphere(diameter, dimensions):
  pi = np.pi
  radius = diameter/2
  volume = ((radius**dimensions)*(pi**(dimensions/2)))/(sci.gamma(dimensions/2+1))

  return volume
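
For reference, the expression in the function corresponds to the standard formula for the volume of a ball of radius $r$ in $d$ dimensions, written here with $r$ equal to half the diameter (the symbols $V_d$, $r$, and $d$ are notation introduced for this note):

```latex
V_d(r) = \frac{\pi^{d/2}}{\Gamma\!\left(\frac{d}{2} + 1\right)}\, r^{d},
\qquad r = \frac{\text{diameter}}{2}

% Sanity checks against familiar cases:
% d = 1:  V_1(r) = 2r
% d = 2:  V_2(r) = \pi r^2
% d = 3:  V_3(r) = \tfrac{4}{3}\pi r^3
```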

### Compute Hypersphere Volume 
Compute the volume of a unit-diameter hypersphere for 1 to 10 dimensions.

In [7]:
diameter = 1.0
for c_dim in range(1,11):
  print("Volume of unit diameter hypersphere in %d dimensions is %3.3f"%(c_dim, volume_of_hypersphere(diameter, c_dim)))

Volume of unit diameter hypersphere in 1 dimensions is 1.000
Volume of unit diameter hypersphere in 2 dimensions is 0.785
Volume of unit diameter hypersphere in 3 dimensions is 0.524
Volume of unit diameter hypersphere in 4 dimensions is 0.308
Volume of unit diameter hypersphere in 5 dimensions is 0.164
Volume of unit diameter hypersphere in 6 dimensions is 0.081
Volume of unit diameter hypersphere in 7 dimensions is 0.037
Volume of unit diameter hypersphere in 8 dimensions is 0.016
Volume of unit diameter hypersphere in 9 dimensions is 0.006
Volume of unit diameter hypersphere in 10 dimensions is 0.002
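
The volume keeps shrinking toward zero beyond 10 dimensions, and for very large dimensions the gamma term in the direct formula can exceed the floating-point range. A possible workaround, sketched here with a hypothetical helper `log_volume_of_hypersphere` that is not part of the original notebook, is to compute the logarithm of the volume with `scipy.special.gammaln`:

```python
import numpy as np
from scipy.special import gammaln

def log_volume_of_hypersphere(diameter, dimensions):
  # Hypothetical helper: natural log of the hypersphere volume,
  # log V = d*log(r) + (d/2)*log(pi) - log(Gamma(d/2 + 1))
  radius = diameter / 2
  return (dimensions * np.log(radius)
          + (dimensions / 2) * np.log(np.pi)
          - gammaln(dimensions / 2 + 1))

# Exponentiating recovers the ordinary volume for small dimensions
print(np.exp(log_volume_of_hypersphere(1.0, 2)))

# The log-volume stays finite even for 1000 dimensions
print(log_volume_of_hypersphere(1.0, 1000))
```

Working in log space trades a directly readable volume for numerical stability, which only matters once the dimension reaches the hundreds.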


### Define 1% Radius Volume Proportion Function
Define a function that calculates the proportion of a hypersphere's total volume that lies in the outer 1% of its radius.

In [8]:
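def get_prop_of_volume_in_outer_1_percent(dimension, diameter=1.0):
  # Volume of the full hypersphere and of one with 99% of its diameter (and radius)
  volume = volume_of_hypersphere(diameter, dimension)
  volume99 = volume_of_hypersphere(0.99*diameter, dimension)
  # Fraction of the full volume that lies between the two
  proportion = (volume - volume99)/volume
  return proportion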

### Compute 1% Radius Volume Proportion
Compute the proportion of a hypersphere's volume that lies in the outer 1% of its radius for an increasing number of dimensions.

In [9]:
for c_dim in [1,2,10,20,50,100,150,200,250,300]:
  print('Proportion of volume in outer 1 percent of radius in %d dimensions =%3.3f'%(c_dim, get_prop_of_volume_in_outer_1_percent(c_dim)))

Proportion of volume in outer 1 percent of radius in 1 dimensions =0.010
Proportion of volume in outer 1 percent of radius in 2 dimensions =0.020
Proportion of volume in outer 1 percent of radius in 10 dimensions =0.096
Proportion of volume in outer 1 percent of radius in 20 dimensions =0.182
Proportion of volume in outer 1 percent of radius in 50 dimensions =0.395
Proportion of volume in outer 1 percent of radius in 100 dimensions =0.634
Proportion of volume in outer 1 percent of radius in 150 dimensions =0.779
Proportion of volume in outer 1 percent of radius in 200 dimensions =0.866
Proportion of volume in outer 1 percent of radius in 250 dimensions =0.919
Proportion of volume in outer 1 percent of radius in 300 dimensions =0.951
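
Because the volume scales as the radius raised to the power of the dimension, the constant factors cancel in the ratio and the proportion reduces to 1 - 0.99^d, independent of the diameter. The sketch below checks this closed form against the values printed above; the helper `prop_outer_shell` and its `shell_fraction` parameter are assumptions introduced for illustration.

```python
def prop_outer_shell(dimension, shell_fraction=0.01):
  # Hypothetical helper: fraction of the volume within the outer
  # shell_fraction of the radius, using V proportional to r**dimension
  return 1.0 - (1.0 - shell_fraction) ** dimension

for c_dim in [1, 2, 10, 100, 300]:
  print('%d dimensions: %3.3f' % (c_dim, prop_outer_shell(c_dim)))
```

The closed form also avoids evaluating the gamma function, so it remains usable at dimensions where the direct volume computation would break down.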
