# Advanced Python for Data Science
### DS-GA 1019

### Homework Assignment 04
### Due date: 02/28/2024, 4:00PM
### Student's Name: Jiasheng Ni
### Student's e-mail: jn2294@nyu.edu

# Problem 1 (100 points)

The task is to optimize your solutions by using "line_profiler". 

Your submission "spring2024_sol04_yourid.ipynb" will contain:
- the first part is your original solution (a solution that you originally wrote); 
- the second part is your final, optimized solution after using line_profiler; 
- both of which will include the line_profiler results, and your detailed comments.


The problem is to simulate a random motion of $n$ objects over a discrete time. 

Concretely, there is:
- a unit square $[0,1]^2$, 
- $n$ points within the unit square, 
- and the time is discrete $t=0, 1, 2, \dots$. 

At time $t=0$, the positions of $n$ points are randomly and uniformly distributed within the unit square; call these positions $\{p_0, p_1, p_2,\dots, p_{n-1}\}$. 

At every time step $t \geq 0$, every point $i$, chooses to randomly move in one of four directions: left, right, up, down. The distance is also random uniform number on $[0, \delta]$, where $\delta$ is given. 

That is, at every time step $t$ and for every $i$ we generate a random move as: 
$$ p_i := p_i + r_i \cdot u_i$$
where 
$$ r_i \sim uniform[0, \delta],$$ 
and 
$u_i$ represents a random direction, i.e. a randomly chosen vector among $(-1, 0), (1, 0), (0, -1), (0, 1)$.

**Dynamics**

Now, one would like to examine and plot the diagram of the minimum distance $d_{\min}$ among these $n$ points over $T$ iterations.

The task is to complete the rest of this notebook, where definitions of the functions main_orignal and main_optimized are given below. 

 ---
 
 ### The original code description: 
 
 The solution adheres to the principle of OOP, where I defined the each point to be an object for easier management. 

 Point Object
 - Each point object has two state variables, representing x coordinate and y coordinate. 
 - The update method is used to update the state of point object across different iteration. 
 
 iteration()
 - Update the state of each point using the formula above.

 compute_min_distance()
 - This function calculates the minimum euclidean distance across all points

 initialize_points()
 - This function initialize points with randomized point coordinates.
 
 ---


In [7]:
import math
import itertools
import numpy as np

class Point:

    def __init__(self, x, y):
        self._x = x
        self._y = y

    @staticmethod
    def distance(point1, point2):
        """
        Compute the euclidean distance between two points
        """
        return math.sqrt((point1.x - point2.x) ** 2 + (point1.y - point2.y) ** 2)

    def update(self, offset, step_size):
        self.x += offset[0] * step_size
        self.y += offset[1] * step_size

    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y

    @x.setter
    def x(self, value):
        self._x = value

    @y.setter
    def y(self, value):
        self._y = value



def iteration(points, delta):
    direction_lst = [(-1,0),(1,0),(0,-1),(0,1)]
    for point in points:
        step_size = np.random.uniform(0, delta)
        direction = direction_lst[np.random.choice([0,1,2,3], size = 1, p = [0.25] * 4)[0]]
        point.update(direction, step_size)


def compute_min_distance(points):
    return min(map(lambda pair: Point.distance(pair[0], pair[1]), itertools.combinations(points, 2)))


def initialize_points(n):
    point_lst = []
    for _ in range(n):
        random_x = np.random.uniform(0, 1)
        random_y = np.random.uniform(0, 1)
        point_lst.append(Point(random_x, random_y))

    return point_lst


def main_original(n, delta, T): 
    point_lst = initialize_points(n)
    lst_of_min_distance = []
    for _ in range(T):
        lst_of_min_distance.append(compute_min_distance(point_lst))
        iteration(point_lst, delta)

    return lst_of_min_distance


print(main_original(5, 3, 50))

[0.11450411811261453, 0.2230457527956587, 0.7470379417899513, 0.5292947223970594, 0.9027443421439343, 1.3356094952213642, 0.9518496387723032, 1.9684092576741188, 1.4517113296606177, 1.3731185948673696, 2.5152577465119905, 2.6716619736108833, 3.4461453024314213, 1.1953047595135518, 1.9383215515949537, 0.7310069759153576, 0.6864921430314398, 2.1988411818196782, 2.5718657483058203, 1.4900449430615053, 2.4042882356940996, 5.585664229829599, 4.8376200862332, 2.1752626015988286, 2.7484909431714937, 1.6107687580588819, 1.5997771763936475, 2.4507384700511707, 3.0164313136636856, 3.08789173673001, 1.2564283160659169, 1.9631838845997633, 2.7761219790314815, 2.1971388858336307, 2.0913630474924525, 3.1395753643251307, 4.409813762920556, 3.275712254374627, 0.8838644083638882, 1.7143014908775094, 2.0978777274066136, 3.7786776496014114, 5.584106382126308, 5.624303332500556, 5.607642405649521, 4.124858746507339, 4.125167571497679, 5.16930575667008, 4.815054792922085, 5.596687034101521]


#### Performance Monitoring

In [8]:
# Import Performance Tuning Tools
import cProfile
import re
%load_ext line_profiler

In [10]:
cProfile.run('main_original(5, 3, 50)')

         8225 function calls in 0.006 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      500    0.001    0.000    0.001    0.000 2004844512.py:11(distance)
      250    0.000    0.000    0.000    0.000 2004844512.py:18(update)
     1250    0.000    0.000    0.000    0.000 2004844512.py:22(x)
     1250    0.000    0.000    0.000    0.000 2004844512.py:26(y)
      250    0.000    0.000    0.000    0.000 2004844512.py:30(x)
      250    0.000    0.000    0.000    0.000 2004844512.py:34(y)
       50    0.000    0.000    0.005    0.000 2004844512.py:40(iteration)
       50    0.000    0.000    0.001    0.000 2004844512.py:48(compute_min_distance)
      500    0.000    0.000    0.001    0.000 2004844512.py:49(<lambda>)
        1    0.000    0.000    0.000    0.000 2004844512.py:52(initialize_points)
        1    0.000    0.000    0.006    0.006 2004844512.py:62(main_original)
        5    0.000    0.000    0.000    0.000 20048

In [9]:
%lprun -f main_original main_original(5, 3, 50)

Timer unit: 1e-07 s

Total time: 0.0083498 s
File: C:\Users\alexm\AppData\Local\Temp\ipykernel_62380\2004844512.py
Function: main_original at line 62

Line #      Hits         Time  Per Hit   % Time  Line Contents
    62                                           def main_original(n, delta, T): 
    63                                               """ 
    64                                               n: is the number of uniformly at random generated points in the unit square 
    65                                               delta: a maximal move of a point in one of four random directions: left, right, up, or down 
    66                                               T: number of iterations
    67                                               return: 
    68                                               lst_of_min_distances: of the minimum distances among all n points over times: t=0, 1, 2, \dots, T - 1,
    69                                               it is a list of reals 

---

#### The optimized code description: 

** TO BE POPULATED **

** EXPLAIN THE SOLUTION ** 

 ---
 

In [56]:
import itertools
import numpy as np


def iteration(points, delta):
    direction_lst = np.array([[-1,0],[1,0],[0,-1],[0,1]])

    for i in range(points.shape[1]):
        points[:,i] += delta * direction_lst[np.random.choice([0,1,2,3], size = 1, p = [0.25] * 4)[0]]
    


def compute_min_distance(points):
    # Expand the points array to a square form where each slice along the third dimension
    # represents the difference between one point and all others
    diff = points[:, :, np.newaxis] - points[:, np.newaxis, :]
    # Compute the squared distances (avoid sqrt for speed and square later)
    sq_distances = np.sum(diff ** 2, axis=0)
    # Ensure the diagonal (distances to self) is ignored by setting it to infinity
    np.fill_diagonal(sq_distances, np.inf)
    # Find and return the minimum distance (square root of the minimum squared distance)
    return np.sqrt(np.min(sq_distances))


def initialize_points(n):
    """
    Initialize n points in the second axis
    """
    point_lst = np.random.uniform(low = 0, high = 1, size = (2, n))
    return point_lst


def main_original(n, delta, T): 
    point_lst = initialize_points(n)
    lst_of_min_distance = []
    for _ in range(T):
        lst_of_min_distance.append(compute_min_distance(point_lst))
        iteration(point_lst, delta)

    return lst_of_min_distance

print(main_original(5, 3, 50))

[0.16895294790084103, 0.24456844176006207, 0.5976161291425454, 0.24456844176006182, 4.250495798143483, 4.670614980291789, 3.9980736802658234, 4.164269938224258, 0.16895294790084103, 4.095177690663402, 4.700622708045667, 0.7946406607702514, 4.7006227080456675, 0.7946406607702514, 3.5842386532298764, 5.857573455531934, 8.337410601476341, 5.894250411674614, 8.85888353342021, 9.57936602400659, 4.632251283375962, 3.704068359137133, 9.166603726501679, 3.704068359137133, 9.166603726501679, 12.59726464450057, 9.166603726501679, 9.166603726501679, 3.704068359137133, 3.9260184588979263, 6.174883397799439, 6.174883397799439, 6.174883397799439, 9.048028211693227, 13.288430251779493, 13.84053153061843, 16.82249345301153, 13.816524026569263, 15.480502953063104, 15.480502953063104, 13.816524026569263, 9.111204768332444, 6.865258078484886, 6.865258078484886, 12.865254809929896, 16.144578956035875, 16.144578956035875, 14.191443378695, 17.586339841393603, 17.586339841393603]


#### Performance Monitoring

In [63]:
%lprun -f main_original main_original(5, 3, 50)

Timer unit: 1e-07 s

Total time: 0.0066054 s
File: C:\Users\alexm\AppData\Local\Temp\ipykernel_62380\1378250577.py
Function: main_original at line 36

Line #      Hits         Time  Per Hit   % Time  Line Contents
    36                                           def main_original(n, delta, T): 
    37         1        281.0    281.0      0.4      point_lst = initialize_points(n)
    38         1          2.0      2.0      0.0      lst_of_min_distance = []
    39        51        118.0      2.3      0.2      for _ in range(T):
    40        50      11682.0    233.6     17.7          lst_of_min_distance.append(compute_min_distance_optimized(point_lst))
    41        50      53970.0   1079.4     81.7          iteration(point_lst, delta)
    42                                           
    43         1          1.0      1.0      0.0      return lst_of_min_distance

In [None]:
import math
import itertools
import numpy as np

class Point:

    def __init__(self, x, y):
        self._x = x
        self._y = y

    @staticmethod
    def distance(point1, point2):
        """
        Compute the euclidean distance between two points
        """
        return math.sqrt((point1.x - point2.x) ** 2 + (point1.y - point2.y) ** 2)

    def update(self, offset, step_size):
        self.x += offset[0] * step_size
        self.y += offset[1] * step_size

    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y

    @x.setter
    def x(self, value):
        self._x = value

    @y.setter
    def y(self, value):
        self._y = value



def iteration(points, delta):
    direction_lst = [(-1,0),(1,0),(0,-1),(0,1)]
    for point in points:
        step_size = np.random.uniform(0, delta)
        direction = direction_lst[np.random.choice([0,1,2,3], size = 1, p = [0.25] * 4)[0]]
        point.update(direction, step_size)


def compute_min_distance(points):
    return min(map(lambda pair: Point.distance(pair[0], pair[1]), itertools.combinations(points, 2)))


def initialize_points(n):
    point_lst = []
    for _ in range(n):
        random_x = np.random.uniform(0, 1)
        random_y = np.random.uniform(0, 1)
        point_lst.append(Point(random_x, random_y))

    return point_lst


def main_original(n, delta, T): 
    point_lst = initialize_points(n)
    lst_of_min_distance = []
    for _ in range(T):
        lst_of_min_distance.append(compute_min_distance(point_lst))
        iteration(point_lst, delta)

    return lst_of_min_distance


print(main_original(5, 3, 50))

#### Performance Tuning

%lprun -f main_original main_original(5, 3, 50)