# Curse of Dimensionality Demo

In this example, we generate uniformly random data in unit hypercubes of different dimensions and measure the time it takes to compute the distances between a random query point and every point in the dataset.

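As the dimensionality increases, the distances between points become less informative (the nearest and farthest neighbors of a point end up almost equally far away), and the amount of data required to "fill" the space at a fixed density increases exponentially. The cost of each distance computation also grows with the dimension, because every coordinate contributes to it.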
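To make the "fill the space" claim concrete, here is a minimal sketch that counts how many grid cells tile the unit hypercube. The helper `cells_to_cover` and the resolution of 10 cells per axis are assumptions chosen for illustration; the sample size of 1000 matches the timing code in this notebook.

```python
def cells_to_cover(dimension, cells_per_axis=10):
    """Number of equal-sized grid cells that tile the unit hypercube."""
    return cells_per_axis ** dimension

num_points = 1000  # same sample size as the timing demo

for d in [2, 5, 10, 50]:
    cells = cells_to_cover(d)
    # Each point occupies at most one cell, so this is an upper bound
    # on the fraction of cells that contain any data at all.
    max_occupied_fraction = min(1.0, num_points / cells)
    print(f"Dimension {d}: {cells:.3e} cells, at most {max_occupied_fraction:.3e} occupied")
```

With a fixed number of points, the bound on the occupied fraction shrinks by a factor of `cells_per_axis` for every added dimension, which is the exponential growth in required data described above.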

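When you run the timing code below, you should see the time required to calculate the distances grow with the dimensionality. For a fixed 1000 points the work is proportional to the number of coordinates, so the sample output shows roughly a 50-fold increase between 2 and 1000 dimensions. This illustrates the computational side of the curse of dimensionality; the loss of contrast between distances is a separate effect that timing alone does not reveal.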

In [2]:
import numpy as np
import time

def generate_data(dimension, num_points):
    """
    Generate random data in a hypercube of a given dimension and number of points.
    """
    data = np.random.rand(num_points, dimension)
    return data

def calculate_distance(point, data):
    """
    Calculate the Euclidean distance between a point and every point in a dataset.
    """
    distances = np.linalg.norm(data - point, axis=1)
    return distances

def main():
    # Define the number of points and dimensions to test
    num_points = 1000
    dimensions = [2, 5, 10, 50, 100, 500, 1000]

    # Generate random data in each dimension and time how long it takes to calculate the distances
    for d in dimensions:
        data = generate_data(d, num_points)
        point = np.random.rand(d)
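        # perf_counter is a monotonic, high-resolution clock suited to timing short operations
        start_time = time.perf_counter()
        distances = calculate_distance(point, data)
        end_time = time.perf_counter()
        print(f"Dimension {d}: {end_time - start_time:.5f} seconds")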
        
if __name__ == "__main__":
    main()


Dimension 2: 0.00017 seconds
Dimension 5: 0.00016 seconds
Dimension 10: 0.00023 seconds
Dimension 50: 0.00072 seconds
Dimension 100: 0.00127 seconds
Dimension 500: 0.00489 seconds
Dimension 1000: 0.00913 seconds
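
The timings above measure only the cost of computing distances. The claim that distances become less meaningful can be checked separately. The following is a minimal sketch, not part of the original experiment: the function name `relative_contrast`, the fixed seed, and the choice of dimensions are assumptions made for illustration. It reports how far the farthest point is from the query relative to the nearest one.

```python
import numpy as np

def relative_contrast(dimension, num_points=1000, seed=0):
    """
    Return (max - min) / min over the Euclidean distances from a random
    query point to uniformly random points in the unit hypercube.
    """
    rng = np.random.default_rng(seed)
    data = rng.random((num_points, dimension))
    query = rng.random(dimension)
    distances = np.linalg.norm(data - query, axis=1)
    return (distances.max() - distances.min()) / distances.min()

for d in [2, 10, 100, 1000]:
    print(f"Dimension {d}: relative contrast {relative_contrast(d):.3f}")
```

A value near zero means the nearest and farthest points are almost equally far from the query, so a nearest-neighbor ranking carries little information. For uniform data the value is expected to shrink as the dimension grows.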
