In [1]:
import pandas as pd
import math

from IPython.display import Image

# Data

Let's look at the raw data first

In [2]:
df = pd.read_csv('traks.csv')
df.head()

Unnamed: 0,track;time;x;y
0,1;10:32:13;1598;526
1,1;10:32:14;1524;544
2,1;10:32:15;1441;557
3,1;10:32:16;1357;392
4,1;10:32:17;1395;573


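The data is a little messy: the file is semicolon-delimited, so every row ended up in a single column. Let's split it into proper columns (note that all values stay strings after the split)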

In [3]:
columns = df.columns[0].split(";")
df = df[df.columns[0]].str.split(";", expand = True)
df.columns = columns
df.head()

Unnamed: 0,track,time,x,y
0,1,10:32:13,1598,526
1,1,10:32:14,1524,544
2,1,10:32:15,1441,557
3,1,10:32:16,1357,392
4,1,10:32:17,1395,573
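As an alternative to splitting the column by hand, the separator can be passed to `read_csv` directly. The sketch below is only an illustration: it assumes the file is plain semicolon-delimited text with the header `track;time;x;y`, as the output above suggests, and it uses an in-memory stand-in (`sample`, a hypothetical name) built from the first rows instead of the real file.

```python
import io

import pandas as pd

# Stand-in for the file: the first rows shown above, assumed to be
# semicolon-delimited with a `track;time;x;y` header.
sample = io.StringIO(
    "track;time;x;y\n"
    "1;10:32:13;1598;526\n"
    "1;10:32:14;1524;544\n"
    "1;10:32:15;1441;557\n"
)

# sep=";" splits the columns at read time, so no manual cleanup is needed.
tracks = pd.read_csv(sample, sep=";")
print(tracks.dtypes)
```

Read this way, `track`, `x` and `y` come out as integers rather than strings, so filters such as `df['track'] == '3'` used later in this notebook would have to compare against integers instead.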


# Comparison

Now we need to think about how to compare the trajectories...
The easiest strategy for comparing trajectories "a" and "b" seems to be summing up the differences between them in the following way:

    1) For each dot of trajectory "a" we find the closest dot of trajectory "b"
    2) Then we add the distance between these 2 dots to the total sum
    3) And we draw conclusions based on the sums we found
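The three steps above can be sketched in a few lines. This is a minimal illustration on made-up coordinates, not the notebook's final method; `naive_sum` is a hypothetical helper name.

```python
import math


def naive_sum(a, b):
    """Sum, over every dot of `a`, the distance to the nearest dot of `b`."""
    return sum(min(math.dist(p, q) for q in b) for p in a)


# Toy trajectories (made-up coordinates, not taken from the data set).
a = [(0, 0), (1, 0), (2, 0)]
b = [(0, 1), (2, 1)]

print(naive_sum(a, b))  # 1 + sqrt(2) + 1
print(naive_sum(b, a))  # 1 + 1
```

The two calls give different sums, which already hints that the result depends on which trajectory plays the role of "a" and on how many dots it has.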
    
But first we have to check whether all our trajectories have the same number of rows, because otherwise the difference in size will contribute to the final sum

In [4]:
[len(df.loc[df['track'] == str(i)]) for i in range(1,5)]

[35, 27, 33, 19]
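The same counts can be obtained without hard-coding the track numbers. A small sketch on a toy frame (`toy` is a made-up stand-in with string track labels, mirroring the cleaned-up `df`):

```python
import pandas as pd

# Toy frame with string track labels, mirroring the cleaned-up df above.
toy = pd.DataFrame({"track": ["1", "1", "2", "2", "2", "3"]})

# Number of rows per track, whatever track labels are present.
counts = toy.groupby("track").size()
print(counts.to_dict())
```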

### First problems

Now that we see the trajectories have quite different numbers of dots, it becomes clear that the algorithm proposed above faces a whole bunch of problems:

    1) Consecutive dots of a trajectory might be far apart, so the nearest dot of "b" can be far from a dot of "a" 
    even when the trajectories themselves are very close. The trajectories would then be judged as not similar, 
    only because the total distance (sum) is computed from dot-to-dot distances
![title](Picture1.png?1)

    2) The other problem comes from the difference in the total number of dots. Compare summing up 20 distances 
    with summing up 200: the following picture illustrates the difference in the sums. Of course, we can introduce 
    something like an average distance between trajectories, dividing the total sum by the number of dots 
    considered, or resample every trajectory to some standard number of dots (200, for example). This partially 
    solves problem 2), but problem 1) still remains
![title](Picture2.png?1)
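Both problems can be reproduced on made-up data. In the sketch below (all coordinates and the helper name `nearest_dot_sum` are illustrative assumptions), the first pair of trajectories lies on the same straight line but is sampled differently, and the second pair keeps the same 1-pixel gap while the number of dots changes.

```python
import math


def nearest_dot_sum(a, b):
    """Sum of distances from each dot of `a` to its nearest dot of `b`."""
    return sum(min(math.dist(p, q) for q in b) for p in a)


# Problem 1: same straight line, but "sparse" only has its two end dots.
dense = [(x, 0) for x in range(11)]
sparse = [(0, 0), (10, 0)]
print(nearest_dot_sum(dense, sparse))  # non-zero although the paths coincide

# Problem 2: parallel lines 1 pixel apart, sampled with 20 and with 200 dots.
for n in (20, 200):
    a = [(i, 0) for i in range(n)]
    b = [(i, 1) for i in range(n)]
    total = nearest_dot_sum(a, b)
    print(n, total, total / n)  # the sum grows with n, the average does not
```

Dividing by the number of dots removes the dependence on `n` in the second case, but it does nothing for the first case, which is why the distance to the segments between dots is needed.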

### Solution

We will try to combine 2 methods:

    1) we introduce the average distance between trajectories as follows:
        avg_dist = sum(dist(a[i], b))/len(a), where a[i] is the i-th dot of trajectory "a",
        dist(a[i], b) is the distance between a[i] and the closest point of trajectory "b",
        and as "a" we take the trajectory with the larger number of dots
    
    2) we calculate dist(a[i], b) by parametrizing each line segment of trajectory "b", finding the 
    distance from a[i] to the closest point of each segment (the simplest optimization problem), 
    and adding the minimum of these distances to the running sum.
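The segment parametrization from step 2 can be sketched as follows: a point on the segment from `s` to `e` is written as `s + t * (e - s)` with `t` in [0, 1], the unconstrained minimizer is the orthogonal projection, and clamping `t` keeps the closest point on the segment. The function names are hypothetical and the code is an illustration of the idea, not the implementation used in the cells below. Note that the `distance` function below uses the perpendicular distance to the infinite line through each pair of dots, which can be smaller than the distance to the segment itself.

```python
import math


def point_segment_distance(p, s, e):
    """Distance from point p to the segment with end points s and e."""
    px, py = p
    sx, sy = s
    ex, ey = e
    dx, dy = ex - sx, ey - sy
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment (repeated dot): plain point-to-point distance.
        return math.hypot(px - sx, py - sy)
    # Parameter of the orthogonal projection of p onto the line s + t * (e - s),
    # clamped to [0, 1] so that the closest point stays on the segment.
    t = ((px - sx) * dx + (py - sy) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (sx + t * dx), py - (sy + t * dy))


def point_polyline_distance(p, dots):
    """Smallest distance from p to any segment between consecutive dots."""
    return min(
        point_segment_distance(p, dots[i], dots[i + 1])
        for i in range(len(dots) - 1)
    )


print(point_segment_distance((5, 3), (0, 0), (10, 0)))   # projection falls inside
print(point_segment_distance((13, 4), (0, 0), (10, 0)))  # closest point is the end (10, 0)
```

For the second point the infinite-line formula would report 4, while the distance to the segment is 5, so the two approaches differ whenever the projection falls outside the segment.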


In [5]:
def distance(a_i, b_lines):
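    """
    Computes the distance between dot a_i (a[i]) and trajectory "b" represented by b_lines,
    a list of [A, B, C] coefficients of the lines Ax + By + C = 0 through consecutive dots of "b".
    The result is the smallest perpendicular distance from a_i to any of these lines.
    """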
    
    x0 = a_i[0]
    y0 = a_i[1]
    distances = []
    for line in b_lines:
        A = line[0]
        B = line[1]
        C = line[2]
        distances.append(abs(A*x0 + B*y0 + C)/math.sqrt(A**2 + B**2))
        
    return min(distances)

In [6]:
def comparison(a: pd.DataFrame, b: pd.DataFrame) -> float:
    """
    Computes average distance between 2 trajectories
    """
    # Use the trajectory with more dots as "a"
    if len(a) < len(b):
        a, b = b, a
    a_dots = [list(map(int, i)) for i in zip(a.x.tolist(), a.y.tolist())]
    b_dots = [list(map(int, i)) for i in zip(b.x.tolist(), b.y.tolist())]

    # Line through consecutive dots (x1, y1) and (x2, y2): Ax + By + C = 0, where
    # A = y1-y2, B = x2-x1, C = (x1-x2)*y1 + (y2-y1)*x1
    b_lines = []
    for i in range(len(b_dots) - 1):
        x1, y1 = b_dots[i]
        x2, y2 = b_dots[i+1]
        b_lines.append([y1-y2, x2-x1, (x1-x2)*y1 + (y2-y1)*x1])

    # "total" instead of "sum", so the built-in sum() is not shadowed
    total = 0
    for a_i in a_dots:
        total += distance(a_i, b_lines)

    avg_dist = total/len(a_dots)

    return avg_dist
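As a quick sanity check of the line equation used in `comparison`, the coefficients can be computed for one pair of made-up dots and verified by hand. `line_coefficients` is a hypothetical helper that only repeats the formula from the comments above.

```python
import math


def line_coefficients(p1, p2):
    """Coefficients (A, B, C) of the line Ax + By + C = 0 through p1 and p2."""
    x1, y1 = p1
    x2, y2 = p2
    return y1 - y2, x2 - x1, (x1 - x2) * y1 + (y2 - y1) * x1


A, B, C = line_coefficients((1, 2), (4, 6))
print(A, B, C)

# Both defining dots must satisfy the equation exactly.
print(A * 1 + B * 2 + C, A * 4 + B * 6 + C)

# Perpendicular distance from the dot (4, 2) to that line.
d = abs(A * 4 + B * 2 + C) / math.sqrt(A**2 + B**2)
print(d)
```

If two consecutive dots coincide, both `A` and `B` are zero and the distance formula divides by zero, so repeated dots would need to be dropped before building the lines.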

# Results

Now that we have a complete algorithm for comparing trajectories, let's find some of the average distances between them, as was asked in the task

### Trajectory 3 and Trajectory 1

In [7]:
print("avg_dist(tr3, tr1) =", round(comparison(df.loc[df['track'] == '3'], df.loc[df['track'] == '1'])), "pixels")

avg_dist(tr3, tr1) = 10 pixels


All we can say so far is that the average distance between the trajectories is 10 pixels; without a reference value we don't know whether that is a lot or not

### Trajectory 2 and Trajectory 1

In [8]:
print("avg_dist(tr2, tr1) =", round(comparison(df.loc[df['track'] == '2'], df.loc[df['track'] == '1'])), "pixels")

avg_dist(tr2, tr1) = 19 pixels


Almost twice as much as in the first case

### Trajectory 4 and Trajectories 1 to 3

In [9]:
for i in range(1,4):
    print("avg_dist(tr4, tr"+str(i)+") =", round(comparison(df.loc[df['track'] == '4'], df.loc[df['track'] == str(i)])), "pixels")


avg_dist(tr4, tr1) = 36 pixels
avg_dist(tr4, tr2) = 38 pixels
avg_dist(tr4, tr3) = 42 pixels


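Yes, the 4th trajectory really stands out: its average distance to each of the other three (36 to 42 pixels) is roughly two to four times the 19 and 10 pixels measured above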