<a href="https://colab.research.google.com/github/cmeneses1/GeokMedoidsCalculator/blob/main/k_Medoids_Calculator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# k-Medoids Calculator
-----------

Install required package.

In [3]:
!pip install scikit-learn-extra

Collecting scikit-learn-extra
  Downloading scikit_learn_extra-0.2.0-cp37-cp37m-manylinux2010_x86_64.whl (1.7 MB)

Import required packages.

In [4]:
from geopy import distance
from sklearn_extra.cluster import KMedoids
import folium
import pandas as pd
import numpy as np
import requests
import json

Load a test data file with three tab-separated columns: `Name`, `Latitude` and `Longitude`.

In [5]:
%%file test.txt
Name	Latitude	Longitude
Is	43.28503	-6.75724
Beveraso	43.2781441	-6.7986945
San Emiliano	43.2574581	-6.8336809
Riodecoba	43.3073637	-6.7926056
Herías	43.3154383	-6.805375

Writing test.txt


In [6]:
name = 'test.txt'
data = pd.read_table(name)
data

,Name,Latitude,Longitude
0,Is,43.28503,-6.75724
1,Beveraso,43.278144,-6.798695
2,San Emiliano,43.257458,-6.833681
3,Riodecoba,43.307364,-6.792606
4,Herías,43.315438,-6.805375


Next, we compute two dissimilarity matrices: `distancesMatrix`, which holds geodesic distances in kilometres, and `durationsMatrix`, which holds driving times in minutes obtained from the OSRM routing service.
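The travel-time matrix comes from the OSRM `table` service, which expects coordinates as `longitude,latitude` pairs separated by semicolons and returns durations in seconds. The sketch below shows one way the request URL could be assembled and a response converted to minutes. The helper names `build_table_url` and `durations_in_minutes`, and the sample response body, are illustrative assumptions and not part of the notebook.

```python
import json

OSRM_TABLE_URL = "http://router.project-osrm.org/table/v1/driving/"

def build_table_url(latitudes, longitudes, base=OSRM_TABLE_URL):
    """Join coordinates as 'lon,lat;lon,lat;...' (OSRM puts longitude first)."""
    pairs = [f"{lon},{lat}" for lat, lon in zip(latitudes, longitudes)]
    return base + ";".join(pairs)

def durations_in_minutes(response_text):
    """Parse a table response and convert its durations from seconds to minutes."""
    payload = json.loads(response_text)
    if payload.get("code") != "Ok":
        raise ValueError(f"OSRM request failed: {payload.get('code')}")
    # Entries may be null when no route exists; keep them as None.
    return [[None if s is None else s / 60 for s in row]
            for row in payload["durations"]]

url = build_table_url([43.28503, 43.2781441], [-6.75724, -6.7986945])
print(url)

# Hypothetical response body shaped like an OSRM table reply (values invented).
sample = '{"code": "Ok", "durations": [[0, 600.0], [540.0, 0]]}'
print(durations_in_minutes(sample))
```

Keeping URL construction and parsing separate from the network call makes both steps easy to check without contacting the server.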

In [53]:
# Longitude and Latitude vectors
Longitude = data.Longitude
Latitude = data.Latitude

# Number of locations and list of (i, i) index pairs
n = len(Longitude)
id = np.array(list(zip(range(0,n), range(0, n))))

# Creating a convenient string for OSRM service.
lonLatString = ''
for i in range(0, n):
    lon = Longitude[i]
    lat = Latitude[i]
    lonLatString += str(lon) + ',' + str(lat) + ';'

lonLatString = lonLatString[0:-1]

# Call the OSRM table service (public demo server)
osrmString = "http://router.project-osrm.org/table/v1/driving/" + lonLatString
r = requests.get(osrmString)
r.raise_for_status()  # stop early if the request failed

# Extract the driving-time matrix (seconds -> minutes) and make it symmetric
durationsMatrix = 1/60 * np.array(json.loads(r.content)['durations'])
durationsMatrix = 1/2 * (durationsMatrix + np.transpose(durationsMatrix))

# Creating a matrix of geodesic distances
distancesMatrix = np.zeros((n,n))

# Geodesic distances in kilometres, computed with geopy's `distance.distance`
for i in range(0, n):
    t1 = (Latitude[i], Longitude[i])

    for j in range(0, n):
        t2 = (Latitude[j], Longitude[j])

        if i < j:
            distancesMatrix[i, j] = distance.distance(t1, t2).km
            distancesMatrix[j, i] = distancesMatrix[i, j]
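Both matrices are passed to `KMedoids` with `metric='precomputed'`, so it can be worth confirming that each one is a well-formed dissimilarity matrix first. The following is a minimal sketch of such a check; `symmetrize` and `check_dissimilarity` are hypothetical helper names, and the travel times are invented for illustration.

```python
import numpy as np

def symmetrize(matrix):
    """Average a matrix with its transpose, as done above for the driving times."""
    matrix = np.asarray(matrix, dtype=float)
    return 0.5 * (matrix + matrix.T)

def check_dissimilarity(matrix, tol=1e-9):
    """Raise ValueError unless the matrix is square, symmetric, non-negative, zero-diagonal."""
    matrix = np.asarray(matrix, dtype=float)
    if matrix.ndim != 2 or matrix.shape[0] != matrix.shape[1]:
        raise ValueError("matrix must be square")
    if not np.allclose(matrix, matrix.T, atol=tol):
        raise ValueError("matrix must be symmetric")
    if (matrix < 0).any():
        raise ValueError("entries must be non-negative")
    if not np.allclose(np.diag(matrix), 0.0, atol=tol):
        raise ValueError("diagonal must be zero")
    return matrix

# Invented asymmetric travel times (minutes) between three places.
raw = [[0.0, 10.0, 20.0],
       [12.0, 0.0, 7.0],
       [18.0, 9.0, 0.0]]
sym = check_dissimilarity(symmetrize(raw))
print(sym)
```

Driving times are generally asymmetric (one-way streets, slopes), which is why the averaging step is needed before a symmetric check can pass.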

In [None]:
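def sumDecomposition(x, m, l):
    """Split the half-sum of matrix x into within- and between-cluster parts.

    x: symmetric dissimilarity matrix, m: medoid indices (one per cluster),
    l: cluster label of each point. Returns (T, W, B) with T = W + B.
    """
    W = 0  # sum over ordered pairs (i, j) in the same cluster
    B = 0  # sum over ordered pairs (i, j) in different clusters
    for cluster in range(0, len(m)):
        for i, elem in enumerate(l):
            if elem == cluster:
                for j, elem2 in enumerate(l):
                    if elem2 == cluster:
                        W += x[i, j]
                    else:
                        B += x[i, j]
    # Each unordered pair is counted twice, hence the factor 1/2
    return (1/2 * np.sum(x), 0.5 * W, 0.5 * B)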
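The triple loop above can also be expressed with a boolean mask that marks which pairs of points share a label. This is a sketch of that alternative, assuming labels run from `0` to `k - 1` as `KMedoids` produces them. The name `sum_decomposition_vectorized` and the small example matrix are made up for illustration.

```python
import numpy as np

def sum_decomposition_vectorized(x, labels):
    """Return (T, W, B): total, within-cluster and between-cluster half-sums of x."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]  # True where i and j share a cluster
    total = 0.5 * x.sum()
    within = 0.5 * x[same].sum()
    between = 0.5 * x[~same].sum()
    return total, within, between

# Invented symmetric matrix for four points in two clusters.
x = np.array([[0.0, 1.0, 4.0, 5.0],
              [1.0, 0.0, 3.0, 6.0],
              [4.0, 3.0, 0.0, 2.0],
              [5.0, 6.0, 2.0, 0.0]])
labels = [0, 0, 1, 1]
T, W, B = sum_decomposition_vectorized(x, labels)
print(T, W, B)  # 21.0 3.0 18.0
```

The mask version avoids the Python-level loops, which matters little for five locations but helps once the matrix grows.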

In [54]:
kmedoidsDuration = KMedoids(n_clusters=2, metric='precomputed').fit(durationsMatrix)
print(kmedoidsDuration.medoid_indices_)
print(kmedoidsDuration.labels_)
print(kmedoidsDuration.inertia_)

[0 4]
[0 0 1 1 1]
75.77666666666667


In [55]:
kmedoidsDistance = KMedoids(n_clusters=2, metric='precomputed').fit(distancesMatrix)
print(kmedoidsDistance.medoid_indices_)
print(kmedoidsDistance.labels_)
print(kmedoidsDistance.inertia_)

[1 3]
[0 0 0 1 1]
8.47471369811256
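The `inertia_` values printed above can be cross-checked by hand, assuming `inertia_` is the sum of the distances from each sample to its nearest medoid, which is how scikit-learn-extra describes the attribute. The sketch below recomputes labels and inertia from a precomputed matrix and a list of medoid indices. `inertia_from_medoids` and the toy matrix are hypothetical and not part of the notebook.

```python
import numpy as np

def inertia_from_medoids(x, medoid_indices):
    """Assign each point to its nearest medoid and sum those distances."""
    x = np.asarray(x, dtype=float)
    to_medoids = x[:, medoid_indices]        # column k: distance to the k-th medoid
    labels = to_medoids.argmin(axis=1)       # index of the nearest medoid per point
    inertia = to_medoids.min(axis=1).sum()   # total distance to assigned medoids
    return labels, inertia

# Invented symmetric matrix for four points; points 0 and 2 act as medoids.
x = np.array([[0.0, 1.0, 4.0, 5.0],
              [1.0, 0.0, 3.0, 6.0],
              [4.0, 3.0, 0.0, 2.0],
              [5.0, 6.0, 2.0, 0.0]])
labels, inertia = inertia_from_medoids(x, [0, 2])
print(labels, inertia)
```

Applied to `distancesMatrix` with `kmedoidsDistance.medoid_indices_`, the same function should reproduce the reported inertia if the assumption about `inertia_` holds.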


In [56]:
medoids = kmedoidsDuration.medoid_indices_
labels = kmedoidsDuration.labels_
inertia = kmedoidsDuration.inertia_

In [57]:



# sumDecomposition returns the total, within-cluster and between-cluster sums
T, W, B = sumDecomposition(durationsMatrix, medoids, labels)

In [58]:
T-W-B

-5.684341886080802e-14

In [59]:
W

99.13916666666665

In [50]:
medoids2 = kmedoidsDistance.medoid_indices_
labels2 = kmedoidsDistance.labels_
inertia2 = kmedoidsDistance.inertia_
print(inertia2)

# sumDecomposition returns the total, within-cluster and between-cluster sums
T, W, B = sumDecomposition(distancesMatrix, medoids2, labels2)

8.47471369811256


In [51]:
T-W-B

3.552713678800501e-15

In [52]:
W

15.395138140521212

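In both runs `T - W - B` is zero up to floating-point rounding, so the total sum splits into its within-cluster and between-cluster parts: `T = W + B`.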
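The residuals computed above vanish because every ordered pair of points falls into exactly one of two groups: both points in the same cluster, or the points in different clusters. Writing $x_{ij}$ for the entries of the dissimilarity matrix and $C_c$ for the set of points labelled $c$ (notation introduced here, not in the notebook), the quantities returned by `sumDecomposition` are:

```latex
\begin{aligned}
T &= \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} x_{ij}, \\
W &= \tfrac{1}{2} \sum_{c} \sum_{i \in C_c} \sum_{j \in C_c} x_{ij}, \\
B &= \tfrac{1}{2} \sum_{c} \sum_{i \in C_c} \sum_{j \notin C_c} x_{ij}, \\
T &= W + B .
\end{aligned}
```

The factor $\tfrac{1}{2}$ compensates for each unordered pair appearing twice in a symmetric matrix.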

The map examples below follow the [Folium](http://python-visualization.github.io/folium/quickstart.html) quickstart.

In [12]:
m = folium.Map(location=[45.5236, -122.6750])
m

In [13]:
m = folium.Map(location=[45.372, -121.6972], zoom_start=12, tiles="Stamen Terrain")

tooltip = "Click me!"

folium.Marker(
    [45.3288, -121.6625], popup="<i>Mt. Hood Meadows</i>", tooltip=tooltip
).add_to(m)
folium.Marker(
    [45.3311, -121.7113], popup="<b>Timberline Lodge</b>", tooltip=tooltip
).add_to(m)

m

In [14]:
m = folium.Map(location=[45.372, -121.6972], zoom_start=12, tiles="Stamen Terrain")

folium.Marker(
    location=[45.3288, -121.6625],
    popup="Mt. Hood Meadows",
    icon=folium.Icon(icon="cloud"),
).add_to(m)

folium.Marker(
    location=[45.3311, -121.7113],
    popup="Timberline Lodge",
    icon=folium.Icon(color="green"),
).add_to(m)

folium.Marker(
    location=[45.3300, -121.6823],
    popup="Some Other Location",
    icon=folium.Icon(color="red", icon="info-sign"),
).add_to(m)


m

In [15]:
m = folium.Map(location=[45.5236, -122.6750], tiles="Stamen Toner", zoom_start=13)

folium.Circle(
    radius=100,
    location=[45.5244, -122.6699],
    popup="The Waterfront",
    color="crimson",
    fill=False,
).add_to(m)

folium.CircleMarker(
    location=[45.5215, -122.6261],
    radius=50,
    popup="Laurelhurst Park",
    color="#3186cc",
    fill=True,
    fill_color="#3186cc",
).add_to(m)


m
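The marker examples above could be combined with the clustering results to show each location coloured by cluster. The following sketch only prepares the marker arguments in plain Python, so it runs without folium. `cluster_marker_specs`, its colour palette and the `is_medoid` flag are hypothetical, and the labels and medoid indices are the ones printed by the distance-based run.

```python
def cluster_marker_specs(names, latitudes, longitudes, labels, medoid_indices,
                         palette=("blue", "green", "red", "purple", "orange")):
    """Build one dict per location with the arguments a folium.Marker would need."""
    medoid_set = set(int(i) for i in medoid_indices)
    specs = []
    rows = zip(names, latitudes, longitudes, labels)
    for i, (name, lat, lon, label) in enumerate(rows):
        specs.append({
            "location": [lat, lon],  # folium expects [latitude, longitude]
            "popup": name,
            "color": palette[int(label) % len(palette)],
            "is_medoid": i in medoid_set,
        })
    return specs

# Names and coordinates from test.txt; labels and medoids from kmedoidsDistance.
names = ["Is", "Beveraso", "San Emiliano", "Riodecoba", "Herías"]
lats = [43.28503, 43.2781441, 43.2574581, 43.3073637, 43.3154383]
lons = [-6.75724, -6.7986945, -6.8336809, -6.7926056, -6.805375]
specs = cluster_marker_specs(names, lats, lons,
                             labels=[0, 0, 0, 1, 1], medoid_indices=[1, 3])
for spec in specs:
    print(spec)

# With folium installed, the specs could be drawn roughly like this
# (the map centre is an approximate midpoint of the five locations):
# m = folium.Map(location=[43.29, -6.80], zoom_start=12)
# for s in specs:
#     icon = folium.Icon(color=s["color"],
#                        icon="info-sign" if s["is_medoid"] else "cloud")
#     folium.Marker(location=s["location"], popup=s["popup"], icon=icon).add_to(m)
```

Separating the data preparation from the drawing keeps the cluster-to-colour mapping testable on its own.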