# Using k-means clustering for LIDs allocation

## Introduction
Previous research suggests that aggregating LIDs is a good strategy to reduce runoff and Combined Sewer Overflows (CSOs).
In this document we use the k-means clustering algorithm to test whether aggregation improves the performance of Pervious Pavements (PP) in reducing CSOs.

### Step 1: Cluster the subcatchments by location
We need the x-y coordinates of each subcatchment.
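
For reference, here is a minimal sketch of the input layout that the later cells appear to assume: a tab-separated table with a `NAME` column and two coordinate columns. The column names `CX` and `CY` are taken from the DataFrame built further down; the subcatchment names and coordinate values are made up for illustration and do not come from the real file.

```python
import io
import pandas as pd

# Hypothetical sample rows; the real coordinate file is not shown in this document.
sample = (
    "NAME\tCX\tCY\n"
    "S1\t100.0\t200.0\n"
    "S2\t110.0\t205.0\n"
    "S3\t900.0\t950.0\n"
)

# Same read pattern as the notebook: tab-separated, indexed by subcatchment name.
coords = pd.read_csv(io.StringIO(sample), sep="\t").set_index("NAME")
print(coords)
```

After `set_index('NAME')` only the two numeric columns remain, which is what k-means needs as input.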

In [1]:
#load required packages
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

In [76]:
#function that creates clusters with the subcatchments
def kmeans_clusters(data,num_clusters):
    #data containing coordinates
    kmeans = KMeans(n_clusters=num_clusters, random_state=0).fit(data)
    return kmeans
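
The helper can be checked on synthetic data before running it on real subcatchments. The sketch below assumes two well-separated groups of made-up points (not subcatchment data) and confirms that each group receives its own label.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_clusters(data, num_clusters):
    # Same helper as in the notebook: fit k-means and return the fitted object.
    kmeans = KMeans(n_clusters=num_clusters, random_state=0).fit(data)
    return kmeans

# Synthetic points: 20 around (0, 0) and 20 around (100, 100).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=(0, 0), scale=1.0, size=(20, 2))
group_b = rng.normal(loc=(100, 100), scale=1.0, size=(20, 2))
points = np.vstack([group_a, group_b])

model = kmeans_clusters(points, 2)
print(model.labels_)           # one integer label per point
print(model.cluster_centers_)  # array of shape (2, 2), one row per centroid
```

`labels_` is a 1-D array with one entry per input row, and `cluster_centers_` has one row per cluster.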

In [84]:
#Create clusters from PP coordinates
#Load the text file that contains the xy coordinates of all the subcatchments.
#We are interested in "Parking" and "SAimp"
#Define the number of clusters. Note that 'clusters_set' is a list, so several cluster counts can be tested in one run.
#For illustration, we only show 100 clusters

clusters_set=[100]#number of clusters
#'Parking_SAimp_coord.txt'-all Subcatchments
#CSO014_ParkingSAimp_coord.txt- CSO014 ONLY

#Read the file with pandas and reassign the index with the subcatchment name
data_original= pd.read_csv('CSO014_ParkingSAimp_coord.txt',sep='\t') #read the original file
data_original=data_original.set_index('NAME')


#Create clusters and add a column to the original dataset
#Note: each iteration overwrites the previous result, so only the last entry of clusters_set is kept
for i in clusters_set:
    KMEANS=kmeans_clusters(data_original,i) #fitted KMeans object returned by kmeans_clusters
    centers=KMEANS.cluster_centers_ #extract the cluster centers
    vector= np.expand_dims(KMEANS.labels_, axis=1) #labels_ is 1-D, so axis=1 turns it into a column vector
    data_w_clusters=np.append(data_original,vector,axis=1) #append the labels as a third column

#convert into df again
data_w_clusters= pd.DataFrame(data_w_clusters, columns=['CX','CY','Cluster'], index=data_original.index)



In [82]:
###Write subcatchments with their cluster labels to a file
with open("CSO014cluster.dat","w") as file1:
    data_w_clusters.to_string(file1)

###Write the clusters' centroids to a file
df_centers = pd.DataFrame(centers,columns=["X_CENTROID","Y_CENTROID"])
with open("CSO014clusterCENTERS.dat","w") as file2:
    df_centers.to_string(file2)
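
As a possible follow-up check, the sketch below counts how many subcatchments fall in each cluster. It is not part of the original workflow: the coordinates are random placeholders, and the labels are attached with `DataFrame.assign` instead of `np.append`, which keeps the `Cluster` column as integers.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Placeholder coordinates for 50 hypothetical subcatchments.
rng = np.random.default_rng(1)
coords = pd.DataFrame(
    rng.uniform(0, 1000, size=(50, 2)),
    columns=["CX", "CY"],
    index=[f"S{i}" for i in range(50)],
)
coords.index.name = "NAME"

# Fit k-means on the two coordinate columns.
model = KMeans(n_clusters=5, random_state=0).fit(coords)

# Attach labels as a new column; the original coordinates are left untouched.
labelled = coords.assign(Cluster=model.labels_)

# Number of subcatchments per cluster, ordered by cluster label.
sizes = labelled["Cluster"].value_counts().sort_index()
print(sizes)
```

Very uneven cluster sizes may be worth inspecting before allocating Pervious Pavements per cluster.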