<a href="https://colab.research.google.com/github/cmeneses1/GeokMedoidsCalculator/blob/main/k_Medoids_Calculator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# k-Medoids Calculator for geolocations
-----------
This calculator runs the k-medoids algorithm on locations given by latitude and longitude, using two different approaches: first, the travel time by car, in minutes; second, the geodesic distance. Both sets of results are then plotted on a map, and some summary quantities are calculated.

## 1. Installing and importing dependencies
Install this required package.

In [1]:
!pip install scikit-learn-extra



Import the required packages.

In [2]:
from geopy import distance
from sklearn_extra.cluster import KMedoids
from seaborn import color_palette
import folium
import pandas as pd
import numpy as np
import requests
import json

## 2. Importing data
Loading a test data file with three columns: `Name`, `Latitude` and `Longitude`.

In [3]:
%%file test.txt
Name	Latitude	Longitude
Is	43.28503	-6.75724
Beveraso	43.2781441	-6.7986945
San Emiliano	43.2574581	-6.8336809
Riodecoba	43.3073637	-6.7926056
Herías	43.3154383	-6.805375

Overwriting test.txt


In [4]:
name = 'test.txt'
data = pd.read_table(name)
data

Unnamed: 0,Name,Latitude,Longitude
0,Is,43.28503,-6.75724
1,Beveraso,43.278144,-6.798695
2,San Emiliano,43.257458,-6.833681
3,Riodecoba,43.307364,-6.792606
4,Herías,43.315438,-6.805375


## 3. Arrays of traveling distances and geodesic distances
Now we calculate two distance arrays: one for travel time (by car, in minutes), called `durationArray`, and another for geodesic distance (in kilometres), called `distanceArray`. The first is computed with the [OSRM API](http://project-osrm.org/docs/v5.5.1/api/?language=cURL#general-options), the second with the [`geopy.distance` module](https://geopy.readthedocs.io/en/stable/#module-geopy.distance).
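
The OSRM table service expects the coordinates in the URL as `longitude,latitude` pairs separated by semicolons. As a minimal sketch of how such a request URL can be assembled, the hypothetical helper `build_table_url` below (not part of the original notebook) uses `str.join`, which avoids having to trim a trailing `;` afterwards. The base URL is the public demo server used in this notebook.

```python
def build_table_url(longitudes, latitudes,
                    base="http://router.project-osrm.org/table/v1/driving/"):
    """Return an OSRM table-service URL for the given coordinates.

    Hypothetical helper for illustration only.
    """
    # OSRM expects longitude first, then latitude, for every location.
    pairs = [f"{lon},{lat}" for lon, lat in zip(longitudes, latitudes)]
    # Joining with ';' leaves no trailing separator to strip.
    return base + ";".join(pairs)


# First two locations of the test file (Is and Beveraso).
url = build_table_url([-6.75724, -6.7986945], [43.28503, 43.2781441])
print(url)
```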

In [5]:
# Longitude and Latitude vectors
Longitude = data.Longitude
Latitude = data.Latitude

# Number of locations and ID list
n = len(Longitude)
ids = np.array(list(zip(range(0, n), range(0, n))))

# Creating a convenient string for OSRM service
lonLatString = ''
for i in range(0, n):
    lon = Longitude[i]
    lat = Latitude[i]
    lonLatString += str(lon) + ',' + str(lat) + ';'

# Not interested in the last ';' string
lonLatString = lonLatString[0:-1]

# Call the OSRM API and stop early if the request failed
osmrString = "http://router.project-osrm.org/table/v1/driving/" + lonLatString
r = requests.get(osmrString)
r.raise_for_status()

# Extracting driving time duration array, in minutes, and making it symmetrical
durationArray = 1/60 * np.array(json.loads(r.content)['durations'])
durationArray = 1/2 * (durationArray + np.transpose(durationArray))

# Creating a matrix of geodesic distances
distanceArray = np.zeros((n,n))

# Calculating distances with the geopy function `distance`, in kilometres
for i in range(0, n):
    t1 = (Latitude[i], Longitude[i])

    for j in range(0, n):
        t2 = (Latitude[j], Longitude[j])

        if i < j:
            distanceArray[i, j] = distance.distance(t1, t2).km
            distanceArray[j, i] = distanceArray[i, j]
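
As a rough cross-check on the geodesic distances, the same matrix can be approximated with the haversine formula, which needs only the standard library. This is a sketch under a stated assumption: it treats the Earth as a sphere with the mean radius given in `EARTH_RADIUS_KM`, whereas `geopy.distance.distance` works on an ellipsoid, so the two results are expected to differ slightly. The function name `haversine_km` is illustrative and does not appear in the notebook.

```python
from math import asin, cos, radians, sin, sqrt

# Assumption: spherical Earth with the mean radius, in kilometres.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(p1, p2):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(radians, p1)
    lat2, lon2 = map(radians, p2)
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# First three locations of the test file: Is, Beveraso, San Emiliano.
points = [(43.28503, -6.75724), (43.2781441, -6.7986945), (43.2574581, -6.8336809)]

# Full symmetric matrix with a zero diagonal, like `distanceArray`.
matrix = [[haversine_km(p, q) for q in points] for p in points]
for row in matrix:
    print([round(d, 3) for d in row])
```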

## 4. kMedoids for traveling distance and plotting
Let's calculate KMedoids for `durationArray`.

In [6]:
# Choose the number of clusters
n_clustersDuration = 2

# Apply KMedoids function.
kmedoidsDuration = KMedoids(n_clusters=n_clustersDuration, metric='precomputed').fit(durationArray)
durationMedoids = kmedoidsDuration.medoid_indices_
durationLabels = kmedoidsDuration.labels_
print('Medoid indices:', durationMedoids)
print('Medoid labels:', durationLabels)

Medoid indices: [0 4]
Medoid labels: [0 0 1 1 1]
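
To make these two outputs concrete: a medoid is an actual data point chosen as the representative of its cluster, and k-medoids looks for the medoids that minimise the total distance from every point to the medoid of its cluster. The sketch below evaluates that total for a given result. The helper `assignment_cost` and the 4-point toy matrix are made up for illustration; they are not the notebook's data.

```python
def assignment_cost(distances, medoids, labels):
    """Sum of distances from each point to the medoid of its own cluster.

    `medoids[c]` is the index of the medoid of cluster c, and
    `labels[i]` is the cluster assigned to point i.
    """
    return sum(distances[i][medoids[label]] for i, label in enumerate(labels))


# Toy symmetric distance matrix: points 0 and 1 are close, as are 2 and 3.
toy = [
    [0, 1, 5, 6],
    [1, 0, 4, 5],
    [5, 4, 0, 2],
    [6, 5, 2, 0],
]

cost = assignment_cost(toy, medoids=[0, 2], labels=[0, 0, 1, 1])
print(cost)  # 0 + 1 + 0 + 2 = 3
```

The same call pattern should work with `durationArray`, `durationMedoids` and `durationLabels`, since NumPy arrays support the same indexing.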


Let's plot the result on a map.

In [7]:
# Creates the map
f = folium.Figure(width='65%')
m = folium.Map(location=[Latitude.mean(), Longitude.mean()]).add_to(f)

# Having a touch of color.
color = color_palette("husl", n_clustersDuration).as_hex()

# Representing our clustering medoids
for i, elem in enumerate(durationMedoids):
    folium.Circle(
        location=[Latitude[elem], Longitude[elem]],
        radius=300,
        color=color[i],
        fill=False,
        fill_color=color[i],
    ).add_to(m)

# Representing our clustering output
for i, elem in enumerate(durationLabels):
    folium.Circle(
        location=[Latitude[i], Longitude[i]],
        radius=100,
        popup=data.Name[i],
        color=color[elem],
        fill=True,
        fill_color=color[elem],
    ).add_to(m)

# Adjust zoom
sw = data[['Latitude', 'Longitude']].min().values.tolist()
ne = data[['Latitude', 'Longitude']].max().values.tolist()
m.fit_bounds([sw, ne]) 

m

## 5. kMedoids for geodesic distance and plotting
Let's calculate KMedoids for `distanceArray`.

In [8]:
# Choose the number of clusters
n_clustersDistance = 2

# Apply KMedoids function.
kmedoidsDistance = KMedoids(n_clusters=n_clustersDistance, metric='precomputed').fit(distanceArray)
distanceMedoids = kmedoidsDistance.medoid_indices_
distanceLabels = kmedoidsDistance.labels_
print('Medoid indices:', distanceMedoids)
print('Medoid labels:', distanceLabels)

Medoid indices: [1 3]
Medoid labels: [0 0 0 1 1]


Let's plot the result on a map.

In [9]:
# Creates the map
f = folium.Figure(width='65%')
m = folium.Map(location=[Latitude.mean(), Longitude.mean()]).add_to(f)

# Having a touch of color.
color = color_palette("husl", n_clustersDistance).as_hex()

# Representing our clustering medoids
for i, elem in enumerate(distanceMedoids):
    folium.Circle(
        location=[Latitude[elem], Longitude[elem]],
        radius=300,
        color=color[i],
        fill=False,
        fill_color=color[i],
    ).add_to(m)

# Representing our clustering output
for i, elem in enumerate(distanceLabels):
    folium.Circle(
        location=[Latitude[i], Longitude[i]],
        radius=100,
        popup=data.Name[i],
        color=color[elem],
        fill=True,
        fill_color=color[elem],
    ).add_to(m)

# Adjust zoom
sw = data[['Latitude', 'Longitude']].min().values.tolist()
ne = data[['Latitude', 'Longitude']].max().values.tolist()
m.fit_bounds([sw, ne]) 

m

## 6. Applying some math
After running the `KMedoids` algorithm and plotting the results on maps, let's create the function `sumDecomposition` for a mathematical analysis of the clusters. It computes $T$, the total sum of distances between objects, i.e.,
$$T = \dfrac{1}{2} \sum_{i=1}^n \sum_{j=1}^n dist(i, j).$$

If $C$ is the function that assigns every object to its cluster, then $T$ can be decomposed into the sum of distances within clusters, $W$, and the sum of distances between clusters, $B$, i.e.,
$$T = \dfrac{1}{2} \sum_{c=1}^k \sum_{C(i)=c} \left( \sum_{C(j) = c}dist(i, j) + \sum_{C(j)\neq c} dist(i, j) \right) = W + B,$$
where $k$ is the total number of clusters.

In [10]:
def sumDecomposition(distances, medoids, labels):
    """
    This function calculates the total sum of distances, T, the sum of distances 
    within clusters, W, and the sum of distances between clusters, B.
    Arguments:
        distances: array of distances.
        medoids: list of medoid indices.
        labels: list of clustering labels.
    """
    T = 1/2 * np.sum(distances)
    W = 0
    B = 0
    for cluster in range(0, len(medoids)):
        for i, elem in enumerate(labels):
            if elem == cluster:
                for j, elem2 in enumerate(labels):
                    if elem2 == cluster:
                        W += distances[i, j]
                    else:
                        B += distances[i, j]
    W *= 1/2
    B *= 1/2
    return T, W, B
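
The same decomposition can be cross-checked by visiting each unordered pair of objects exactly once, which removes the need for the factor $1/2$. The sketch below assumes a symmetric distance matrix with a zero diagonal (as both arrays in this notebook are); `pairwise_decomposition` and the toy matrix are illustrative and not part of the original code.

```python
from itertools import combinations


def pairwise_decomposition(distances, labels):
    """Return (T, W, B) by summing over unordered pairs i < j.

    Assumes `distances` is symmetric with zeros on the diagonal.
    """
    W = 0.0  # pairs that share a cluster
    B = 0.0  # pairs that fall in different clusters
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            W += distances[i][j]
        else:
            B += distances[i][j]
    return W + B, W, B


# Toy symmetric distance matrix with two obvious clusters: {0, 1} and {2, 3}.
toy = [
    [0, 1, 5, 6],
    [1, 0, 4, 5],
    [5, 4, 0, 2],
    [6, 5, 2, 0],
]

T, W, B = pairwise_decomposition(toy, [0, 0, 1, 1])
print(T, W, B)  # 23.0 3.0 20.0
```

Here $W = 1 + 2 = 3$ (the pairs inside each cluster) and $B = 5 + 6 + 4 + 5 = 20$ (the pairs across clusters), so $T = W + B = 23$.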

Some results:

In [11]:
Tduration, Wduration, Bduration = sumDecomposition(durationArray, durationMedoids, durationLabels)
print(f"Using traveling time distance, T = {Tduration}, W = {Wduration},\n",
 f"B = {Bduration}, W/T = {Wduration/Tduration}")

Using traveling time distance, T = 589.5591666666667, W = 99.13916666666665,
 B = 490.4200000000001, W/T = 0.16815812944982902


In [12]:
Tdistance, Wdistance, Bdistance = sumDecomposition(distanceArray, distanceMedoids, distanceLabels)
print(f"Using geodesic distance, T = {Tdistance}, W = {Wdistance},\n",
 f"B = {Bdistance}, W/T = {Wdistance/Tdistance}")

Using geodesic distance, T = 45.123849468424055, W = 15.395138140521212,
 B = 29.72871132790284, W/T = 0.34117519497741744
