<h1> Coursera Capstone Project - The Battle of Neighborhoods </h1>

<h2> Introduction: Business Problem </h2>

**Cyprus**, an island country in the Eastern Mediterranean Sea, has seen a sharp rise in demand for technology jobs in the city of **Limassol**. As a result, many people are moving from the island's capital city, **Nicosia**, to **Limassol** in search of a job.

In this project we try to find the best location for a person who wants to move from one city to a similar area in another. Specifically, this report is aimed at stakeholders interested in moving from **Nicosia**, Cyprus to **Limassol**, Cyprus.

Helping people find a place to live in **Limassol** where the urban environment is similar to where they lived in **Nicosia** benefits both the moving workers and their employers, because moving to a similar urban environment takes a smaller toll on productivity.

<h2> Data </h2>

Based on our business problem, the factor that will influence our decisions is:
* Most common venues in the locations

The **district** of Nicosia and the **district** of Limassol have diverse environments that include coastal and mountainous areas. For that reason we use the municipalities of each city's district as our locations of interest, so that the whole range of environments is captured.

The following data will be needed to extract and generate information:
* Municipalities of each District obtained from a dataset from the **Postal Service of the Republic of Cyprus**
* Number of nearby venues and their category obtained using **Foursquare API**
* Coordinates of each Municipality obtained using **Nominatim API**

<h2>Methodology - Analysis</h2>

The methodology followed and any kind of data analysis done are explained along with the code.

<h3>Municipalities of each District</h3>

A dataset from the Postal Service's <a href='https://www.cypruspost.post/en/find-postal-codes'>website</a> was downloaded and connected to this notebook via connections in Watson Studio.

The following code imports the dataset into our notebook.

In [3]:
import os, types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.

if os.environ.get('RUNTIME_ENV_LOCATION_TYPE') == 'external':
    endpoint_171f863b775c42eebc8c320ebd075039 = 'https://s3.eu-geo.objectstorage.softlayer.net'
else:
    endpoint_171f863b775c42eebc8c320ebd075039 = 'https://s3.eu-geo.objectstorage.service.networklayer.com'

client_171f863b775c42eebc8c320ebd075039 = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='<IBM_API_KEY_ID>',  # credential redacted; supply your own key
    ibm_auth_endpoint="https://iam.cloud.ibm.com/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url=endpoint_171f863b775c42eebc8c320ebd075039)

body = client_171f863b775c42eebc8c320ebd075039.get_object(Bucket='submissionsforapplieddatasciencep-donotdelete-pr-nqvdnwkhuyi9rp',Key='cps.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df = pd.read_csv(body)
df.head()

Unnamed: 0.1,Unnamed: 0,#,Ονομασία Δρόμου/Κοινότητας,Designation of Street/Community,Όριο Δρόμου ΑΠΟ - ΜΕΧΡΙ/Streetlimit FROM - TO,Ταχυδρομικός Κώδικας / Postal Code,Δήμος / Κοινότητα,Municipality / Community,Επαρχία,District,Ταχυδρομική Εξυπηρέτηση μέσω Ταχυδρομικού Γραφείου/Ταχυδρομικού Πρακτορείου,Postal Service through Post Office/Postal Agency,Description,Περιγραφή
0,0,201,14ης Σεπτεμβρίου,14 Septemvriou,,4760,Όμοδος (το),Omodos,Λεμεσός,Lemesos,Ταχυδρομικό Πρακτορείο Ομόδους - 4760,Omodos Postal Agency - 4760,,
1,1,202,15ης Αυγούστου,15 Avgoustou,,7501,Αβδελλερό,Avdellero,Λάρνακα,Larnaka,Ταχυδρομικό Πρακτορείο Αβδελλερού - 7501,Avdellero Postal Agency - 7501,,
2,2,203,15ης Ιανουαρίου,15 Ianouariou,,2412,Έγκωμη Λευκωσίας,Egkomi Lefkosias,Λευκωσία,Lefkosia,Ταχυδρομικό Γραφείο Έγκωμης Λευκωσίας - 1911,Egkomi Lefkosias Post Office - 1911,,
3,3,204,15ης Ιανουαρίου,15 Ianouariou,,6018,Λάρνακα,Larnaka,Λάρνακα,Larnaka,Επαρχιακό Ταχυδρομικό Γραφείο Λάρνακας - 6900,Larnaka District Post Office - 6900,,
4,4,205,16ης Αυγούστου,16 Avgoustou,,1040,Λευκωσία (Παλλουριώτισσα),Lefkosia (Pallouriotissa),Λευκωσία,Lefkosia,Ταχυδρομικό Γραφείο Παλλουριωτίσσης - 1906,Pallouriotissa Post Office - 1906,,


The dataset contains a lot of unnecessary data, including postal codes, street names and a duplicate Greek version of each column. We ignore the columns we do not need and keep only the ones useful to our project.

In [220]:
#dropping unnecessary columns
df = df[['Municipality / Community','District']]
df.head()

Unnamed: 0,Municipality / Community,District
0,Omodos,Lemesos
1,Avdellero,Larnaka
2,Egkomi Lefkosias,Lefkosia
3,Larnaka,Larnaka
4,Lefkosia (Pallouriotissa),Lefkosia


We rename the 'Municipality / Community' column to 'Municipality', as we are going to ignore rural communities in this project.

In [221]:
#Renaming columns
columns={'Municipality / Community' : 'Municipality'}
df.rename(columns=columns, inplace=True)
df.head(1)

Unnamed: 0,Municipality,District
0,Omodos,Lemesos


Now we create two dataframes containing each city's municipalities.

In [222]:
#creating a dataset for each city
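#.copy() gives independent dataframes, so later in-place edits do not act on a slice of df
nico_df = df.loc[df['District']=='Lefkosia'].copy()
lima_df = df.loc[df['District']=='Lemesos'].copy()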

We drop any duplicate rows.

In [None]:
#drop duplicates
nico_df.drop_duplicates(inplace=True)
nico_df.reset_index(drop=True,inplace=True)
lima_df.drop_duplicates(inplace=True)
lima_df.reset_index(drop=True,inplace=True)

We clean our dataframes from Greek municipality names and any other inconsistencies.

In [None]:
#change the greek names to english names
#(.loc with a row mask and a column label avoids chained assignment)
nico_df.loc[nico_df['District'] == 'Lefkosia', 'District'] = 'Nicosia'
lima_df.loc[lima_df['District'] == 'Lemesos', 'District'] = 'Limassol'
#for each city there is a municipality that has the same name as the city, so we have to change that too
nico_df.loc[nico_df['Municipality'] == 'Lefkosia', 'Municipality'] = 'Nicosia'
lima_df.loc[lima_df['Municipality'] == 'Lemesos', 'Municipality'] = 'Limassol'
#removing 'Lefkosia' or 'Lefkosias', dropping parentheses and trimming whitespace
nico_df['Municipality'] = (
    nico_df['Municipality']
    .str.replace(r'Lefkosias?', '', regex=True)
    .str.replace(r'[()]', '', regex=True)
    .str.strip()
)

We create a function to get the coordinates of each district's municipalities. Municipalities that are not recognised as addresses are dropped, as we cannot get their coordinates. This is not necessarily a problem, because it removes less populated and lesser known areas.

In [228]:
#a function to get the coordinates for every municipality
from geopy.geocoders import Nominatim

def get_latlong(dataset):
    geolocator = Nominatim(user_agent='ny_explorer')
    temp = {'Latitude' : [], 'Longitude' : []}
    passed = []  #municipalities that could not be geocoded
    
    for m,d in zip(dataset['Municipality'],dataset['District']):
        try:
            address = '{}, {}'.format(m,d)
            location = geolocator.geocode(address)
            temp['Latitude'].append(location.latitude)
            temp['Longitude'].append(location.longitude)
        except Exception:
            #geocode() returned None or the request failed
            passed.append(m)
            
    #drop the rows without coordinates so the remaining rows line up with temp
    for i in passed:
        ind = dataset.loc[dataset['Municipality'] == i].index[0]
        dataset.drop(ind,axis=0,inplace=True)
    dataset.reset_index(drop=True,inplace=True)
    dataset = pd.concat([dataset,pd.DataFrame(temp)],axis=1)
    
    return dataset
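
The bookkeeping in this step (keep the rows that geocode, remember the ones that do not) can be sketched without any network access. The block below is only an illustration: `geocode_rows` and the lookup table `known` are hypothetical stand-ins, not part of geopy, and the coordinates are copied from the Nicosia table shown in this notebook.

```python
from typing import Callable, Optional, Tuple

Coords = Tuple[float, float]

def geocode_rows(rows, geocode: Callable[[str], Optional[Coords]]):
    """Return (kept_rows, skipped_names); kept rows gain Latitude/Longitude."""
    kept, skipped = [], []
    for row in rows:
        address = '{}, {}'.format(row['Municipality'], row['District'])
        coords = geocode(address)
        if coords is None:
            # the address was not recognised, so the row is left out
            skipped.append(row['Municipality'])
            continue
        lat, lng = coords
        kept.append({**row, 'Latitude': lat, 'Longitude': lng})
    return kept, skipped

# hypothetical stand-in for a real geocoder: a plain lookup table
known = {'Strovolos, Nicosia': (35.132899, 33.345497)}
rows = [
    {'Municipality': 'Strovolos', 'District': 'Nicosia'},
    {'Municipality': 'Nowhere', 'District': 'Nicosia'},
]
kept, skipped = geocode_rows(rows, known.get)
print(kept)
print(skipped)
```

Building each output row together with its coordinates keeps the two from drifting out of alignment, which is the risk when coordinates and rows are collected in separate lists.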

We now use the function we created to get the coordinates of each municipality.

In [None]:
nico_df = get_latlong(nico_df)
lima_df = get_latlong(lima_df)

In [227]:
nico_df.head()

Unnamed: 0,Municipality,District,Latitude,Longitude
0,Egkomi,Nicosia,35.12952,33.292615
1,Pallouriotissa,Nicosia,35.174294,33.379755
2,Strovolos,Nicosia,35.132899,33.345497
3,Geri,Nicosia,35.10685,33.420415
4,Latsia,Nicosia,35.099565,33.381599


In [17]:
lima_df.head()

Unnamed: 0,Municipality,District,Latitude,Longitude
0,Omodos,Limassol,34.848106,32.809067
1,Limassol,Limassol,34.68529,33.033266
2,Souni,Limassol,34.734928,32.88247
3,Koilani,Limassol,34.844486,32.859852
4,Mesa Geitonia,Limassol,34.701862,33.044995


We create a function that will help us easily visualize our data on a map.

In [None]:
#! pip install folium
import folium

In [21]:
def vis_mun(dataset, lat, lng, col='blue'):
    city_map = folium.Map(location=[lat,lng], zoom_start=10)
    for m,d,lt,lg in zip(dataset['Municipality'], dataset['District'], dataset['Latitude'], dataset['Longitude']):
        folium.CircleMarker(
            [lt,lg],
            radius=5,
            popup='{},{}'.format(m,d),
            color=col,
            fill=True,
            fill_color=col,
            fill_opacity=0.7,
            parse_html=False
        ).add_to(city_map)
    return city_map

Visualizing municipality locations in **Nicosia**.

In [229]:
#visualizing municipalities in Nicosia
#.iloc[0] extracts a scalar coordinate instead of a one-element Series
latitude = nico_df.loc[nico_df['Municipality'] == 'Nicosia', 'Latitude'].iloc[0]
longitude = nico_df.loc[nico_df['Municipality'] == 'Nicosia', 'Longitude'].iloc[0]
vis_mun(nico_df, latitude, longitude)

In [230]:
#visualizing municipalities in Limassol
#.iloc[0] extracts a scalar coordinate instead of a one-element Series
latitude = lima_df.loc[lima_df['Municipality'] == 'Limassol', 'Latitude'].iloc[0]
longitude = lima_df.loc[lima_df['Municipality'] == 'Limassol', 'Longitude'].iloc[0]
vis_mun(lima_df, latitude, longitude)

We create a function that gets each municipality's nearby venues and finds out which venue categories are the most common in each municipality. We decided to use the top 6 most common venue categories in every municipality, because a higher number may be unavailable within the 2km radius of some municipalities.

In [24]:
import requests

In [112]:
#getting venues for each municipality
def get_venues(dataset,limit,num_of_cols):
    c_id = 'YOUR_FOURSQUARE_CLIENT_ID'  #credentials redacted; supply your own
    c_sec = 'YOUR_FOURSQUARE_CLIENT_SECRET'
    ver = '20180604'
    
    #create a temporary dictionary with one list per rank column
    ends = ['st' , 'nd', 'rd']
    most_com = {}
    for i in range(num_of_cols):
        try:
            most_com['{}{} Most Common'.format(i+1,ends[i])] = []
        except:
            most_com['{}th Most Common'.format(i+1)] = []

    for lat,lng in zip(dataset['Latitude'],dataset['Longitude']):
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&limit={}'.format(c_id,c_sec,lat,lng,ver,limit)
        result = requests.get(url).json()
        venues = {'Category' : []}
        
        for j in range(limit):
            try:
                cat = result['response']['groups'][0]['items'][j]['venue']['categories'][0]['name']
                venues['Category'].append(cat)
            except:
                #in case there are not enough venues close
                break
                
        venues = pd.DataFrame(venues)
        
        for j in range(num_of_cols):
            try:
                most_com[list(most_com.keys())[j]].append(venues['Category'].value_counts().index[j])
            except:
                #in case there are fewer than num_of_cols categories
                most_com[list(most_com.keys())[j]].append(0)
    
    return pd.DataFrame(most_com)    
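
The ranking step inside `get_venues` can also be sketched with the standard library alone. This is a minimal illustration, assuming a response shaped like the one the function above indexes into (`response` → `groups` → `items` → `venue` → `categories`); `top_categories`, `fake_item` and the sample data are hypothetical and no request is sent.

```python
from collections import Counter

def top_categories(response, n):
    """Return the n most frequent venue category names in an explore-style response."""
    groups = response.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    names = []
    for item in items:
        categories = item.get('venue', {}).get('categories', [])
        if categories:
            names.append(categories[0]['name'])
    ranked = [name for name, _ in Counter(names).most_common(n)]
    # pad with 0, as get_venues does, when fewer than n categories exist
    return ranked + [0] * (n - len(ranked))

def fake_item(name):
    # hypothetical helper that builds one item of the nested structure
    return {'venue': {'categories': [{'name': name}]}}

sample = {'response': {'groups': [{'items': [
    fake_item('Bakery'), fake_item('Café'), fake_item('Bakery'),
    fake_item('Gym'), fake_item('Bakery'), fake_item('Café'),
]}]}}
print(top_categories(sample, 4))
```

Using `.get` with defaults means an empty or partial response yields only padding values rather than raising an exception.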

We now use our function to create two new Dataframes containing the 6 most common venues in each municipality for both cities.

In [113]:
most_com_nico = get_venues(nico_df,100,6)
most_com_lima = get_venues(lima_df,100,6)

In [231]:
most_com_nico.head()

Unnamed: 0,1st Most Common,2nd Most Common,3rd Most Common,4th Most Common,5th Most Common,6th Most Common
0,Bakery,Coffee Shop,Café,Greek Restaurant,Gym,Supermarket
1,Café,Bar,Greek Restaurant,Coffee Shop,Historic Site,Bakery
2,Coffee Shop,Bakery,Café,Wine Bar,Greek Restaurant,Gym
3,Bakery,Coffee Shop,Greek Restaurant,Supermarket,Park,Café
4,Bakery,Coffee Shop,Supermarket,Café,Greek Restaurant,Park


In [232]:
most_com_lima.head()

Unnamed: 0,1st Most Common,2nd Most Common,3rd Most Common,4th Most Common,5th Most Common,6th Most Common
0,Greek Restaurant,Restaurant,Waterfall,Hotel,Bakery,Café
1,Greek Restaurant,Coffee Shop,Bakery,Cocktail Bar,Bar,Italian Restaurant
2,Historic Site,Greek Restaurant,Mediterranean Restaurant,Beach,Food,Supermarket
3,Greek Restaurant,Restaurant,Hotel,Waterfall,Café,Bakery
4,Bakery,Greek Restaurant,Coffee Shop,Cocktail Bar,Supermarket,Bar


To use this data we must one-hot encode each position (1st, 2nd, 3rd...) with the venues' categories. One problem is that a certain category may never appear in a certain position, e.g. Cocktail Bar may never be the most common venue category in any municipality. One-hot encoding that column with the pandas get_dummies() function alone would then produce no column for it, so the two cities could end up with different sets of columns.

To get around this problem we create a function that takes two parameters, the dataset containing the 6 most common venue categories for each municipality and the list of all unique categories in our datasets, and returns a one-hot encoded dataset in which every category is encoded for every position.

In [114]:
def one_hot(dataset,cats):
    t_dic = {}

    for i in range(len(dataset.columns)):
        t_list = dataset[dataset.columns[i]].unique()
        t_dic[dataset.columns[i]] = []
        #record every category that never appears in this position;
        #the loop must visit every column, so there is no early exit
        for j in cats:
            if j not in t_list:
                t_dic[dataset.columns[i]].append(j)
    #make all lists in t_dic the same size by adding 0s
    max_length = max([len(t_dic[i]) for i in t_dic.keys()])
    for i in t_dic.keys():
        if len(t_dic[i]) < max_length:
            for j in range(max_length-len(t_dic[i])):
                t_dic[i].append(0)
        else:
            continue
            
    t_dataset = pd.concat([pd.DataFrame(t_dic),dataset],axis=0)
    t_dataset = pd.get_dummies(t_dataset[t_dataset.columns])
    t_dataset.reset_index(drop=True,inplace=True)
    t_dataset.drop([i for i in range(max_length)],inplace=True)
    t_dataset.reset_index(drop=True,inplace=True)
    return t_dataset
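
The idea behind the function above, a fixed category list applied to every position, can be sketched in plain Python. This is an illustrative sketch rather than a replacement: `one_hot_positions` is a hypothetical name and the rows are made up.

```python
def one_hot_positions(rows, positions, categories):
    """Encode each row as one 0/1 column per (position, category) pair."""
    columns = ['{}_{}'.format(p, c) for p in positions for c in categories]
    encoded = []
    for row in rows:
        encoded.append([1 if row[p] == c else 0
                        for p in positions for c in categories])
    return columns, encoded

positions = ['1st Most Common', '2nd Most Common']
categories = sorted(['Bakery', 'Café', 'Cocktail Bar'])
rows = [
    {'1st Most Common': 'Bakery', '2nd Most Common': 'Café'},
    {'1st Most Common': 'Café', '2nd Most Common': 'Cocktail Bar'},
]
columns, encoded = one_hot_positions(rows, positions, categories)
print(columns)
print(encoded)
```

The column `1st Most Common_Cocktail Bar` exists even though no row uses it, so two datasets encoded with the same `categories` list always share the same columns in the same order. A padding value such as 0 matches no category and simply yields zeros for that position.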

We get all unique categories that appear in every position of most common venues from the municipalities of both cities and use our function to get the encoded data.

In [233]:
unique_cats = []
for i in most_com_nico.columns:
    unique_cats.extend([j for j in list(most_com_nico[i].unique()) if j not in unique_cats])
for i in most_com_lima.columns:
    unique_cats.extend([j for j in list(most_com_lima[i].unique()) if j not in unique_cats])
#0 is only a padding value; remove it if it is present
if 0 in unique_cats:
    unique_cats.remove(0)

In [130]:
one_hot_nico = one_hot(most_com_nico,unique_cats)
one_hot_lima = one_hot(most_com_lima,unique_cats)

In [131]:
one_hot_nico.head()

Unnamed: 0,1st Most Common_Athletics & Sports,1st Most Common_Auto Garage,1st Most Common_Bakery,1st Most Common_Bar,1st Most Common_Beach,1st Most Common_Bistro,1st Most Common_Border Crossing,1st Most Common_Botanical Garden,1st Most Common_Breakfast Spot,1st Most Common_Brewery,...,6th Most Common_Trail,6th Most Common_Turkish Home Cooking Restaurant,6th Most Common_Turkish Restaurant,6th Most Common_Village,6th Most Common_Vineyard,6th Most Common_Waterfall,6th Most Common_Waterfront,6th Most Common_Wine Bar,6th Most Common_Winery,6th Most Common_Zoo
0,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [132]:
one_hot_lima.head()

Unnamed: 0,1st Most Common_Athletics & Sports,1st Most Common_Auto Garage,1st Most Common_Bakery,1st Most Common_Bar,1st Most Common_Beach,1st Most Common_Bistro,1st Most Common_Border Crossing,1st Most Common_Botanical Garden,1st Most Common_Breakfast Spot,1st Most Common_Brewery,...,6th Most Common_Trail,6th Most Common_Turkish Home Cooking Restaurant,6th Most Common_Turkish Restaurant,6th Most Common_Village,6th Most Common_Vineyard,6th Most Common_Waterfall,6th Most Common_Waterfront,6th Most Common_Wine Bar,6th Most Common_Winery,6th Most Common_Zoo
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


We cluster Nicosia's municipalities using the k-means algorithm. How the generated labels and the one-hot encoded data are used is explained later in the notebook.

In [None]:
#I am going to cluster Nicosia's Municipalities using the k-means algorithm
from sklearn.cluster import KMeans
num = 4
kmeans = KMeans(init='k-means++',n_clusters=num,n_init=12)
kmeans.fit(one_hot_nico)
kmeans.labels_
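
For readers unfamiliar with k-means, its two alternating steps (assign each point to the nearest centroid, then move each centroid to the mean of its points) can be sketched in plain Python. This is a simplified illustration and not what scikit-learn runs: it seeds the centroids with the first `k` points instead of k-means++ and performs no restarts. The toy points are made up.

```python
def kmeans(points, k, iterations=10):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # assignment step: nearest centroid by squared Euclidean distance
        for i, p in enumerate(points):
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            labels[i] = distances.index(min(distances))
        # update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# toy one-hot style rows
points = [(1, 0, 0), (0, 1, 0), (1, 0, 0), (0, 1, 1)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Because the result depends on the initial centroids, the cluster numbers can change between runs of the notebook cell; passing `random_state` to `KMeans` makes a run repeatable.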

The following dataframe combines all of the data we obtained for each municipality in the district of Nicosia.

In [153]:
label_df = pd.DataFrame(kmeans.labels_)
label_df.rename(columns={label_df.columns[0] : 'Cluster No.'}, inplace=True)
nicosia_df = pd.concat([pd.concat([nico_df,most_com_nico],axis=1),label_df],axis=1)
nicosia_df.head()

Unnamed: 0,Municipality,District,Latitude,Longitude,1st Most Common,2nd Most Common,3rd Most Common,4th Most Common,5th Most Common,6th Most Common,Cluster No.
0,Egkomi,Nicosia,35.12952,33.292615,Bakery,Coffee Shop,Café,Greek Restaurant,Gym,Supermarket,1
1,Pallouriotissa,Nicosia,35.174294,33.379755,Café,Bar,Greek Restaurant,Coffee Shop,Historic Site,Bakery,2
2,Strovolos,Nicosia,35.132899,33.345497,Coffee Shop,Bakery,Café,Wine Bar,Greek Restaurant,Gym,3
3,Geri,Nicosia,35.10685,33.420415,Bakery,Coffee Shop,Greek Restaurant,Supermarket,Park,Café,1
4,Latsia,Nicosia,35.099565,33.381599,Bakery,Coffee Shop,Supermarket,Café,Greek Restaurant,Park,1


We create a function that will help us visualize the municipality clusters. 

In [154]:
def vis_mun_cluster(dataset, lat, lng):
    city_map = folium.Map(location=[lat,lng], zoom_start=10)
    cols = {0 : 'purple', 1 : 'blue', 2 : 'yellow', 3 : 'red', 4 : 'green'}
    for m,d,lt,lg,col in zip(dataset['Municipality'], dataset['District'], dataset['Latitude'], dataset['Longitude'],dataset['Cluster No.']):
        folium.CircleMarker(
            [lt,lg],
            radius=5,
            popup='{},{}'.format(m,d),
            color=cols[col],
            fill=True,
            fill_color=cols[col],
            fill_opacity=0.7,
            parse_html=False
        ).add_to(city_map)
    return city_map

We use the function we created to visualize the clusters in Nicosia district.

In [155]:
#.iloc[0] extracts a scalar coordinate instead of a one-element Series
latitude = nicosia_df.loc[nicosia_df['Municipality'] == 'Nicosia', 'Latitude'].iloc[0]
longitude = nicosia_df.loc[nicosia_df['Municipality'] == 'Nicosia', 'Longitude'].iloc[0]
vis_mun_cluster(nicosia_df, latitude, longitude)

By analysing the map above we can roughly determine what each cluster represents:
* **Purple**: tend to be on more mountainous land.
* **Yellow**: tend to follow highway roads.
* **Blue and Red**: both tend to be in more urban areas and seem to be subclusters of the same parent cluster; their main difference is that red points tend to be closer to bodies of water.

Now that the categories (Purple, Yellow, Blue, Red) have been generated by the k-means algorithm, we use the k-nearest neighbors classification algorithm to assign each municipality of the district of Limassol to one of those categories.

We use Nicosia's one-hot encoded data as the independent variables (features) and the labels generated by our k-means algorithm as the dependent variable (target) for training the model.

In [184]:
x_train = one_hot_nico.values
y_train = kmeans.labels_

We standardize our Nicosia one-hot encoded data (zero mean and unit variance per column).

In [185]:
from sklearn import preprocessing
x_train = preprocessing.StandardScaler().fit(x_train).transform(x_train.astype(float))

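We build and train our model. We set the number of neighbors k to 1, because in our case higher values resulted in a poorly fitted model. (Strictly speaking, a larger k smooths the decision boundary and tends to underfit, whereas k = 1 is the setting most prone to overfitting.)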

In [210]:
from sklearn.neighbors import KNeighborsClassifier
k = 1
kneigh = KNeighborsClassifier(n_neighbors = k).fit(x_train,y_train)
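
With k = 1 the classifier simply copies the label of the single closest training row. A minimal plain-Python sketch of that rule, using made-up vectors and a hypothetical function name `predict_1nn`:

```python
def predict_1nn(train_x, train_y, query):
    """Return the label of the training row closest to query (squared Euclidean)."""
    best_label, best_distance = None, float('inf')
    for row, label in zip(train_x, train_y):
        distance = sum((a - b) ** 2 for a, b in zip(row, query))
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label

train_x = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
train_y = [1, 2, 0]
print(predict_1nn(train_x, train_y, [0.9, 0.2, 0.0]))
print(predict_1nn(train_x, train_y, [0, 1, 0]))
```

A training row is always its own nearest neighbor, so with distinct training rows a 1-NN model reproduces its training labels exactly. Training accuracy therefore says nothing about how well the labels transfer to Limassol.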

We make predictions using the Limassol one-hot encoded data as input.

In [211]:
x_test = preprocessing.StandardScaler().fit(one_hot_lima.values).transform(one_hot_lima.values.astype(float))
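
A common alternative to fitting a second scaler on the test data is to reuse the means and standard deviations learned from the training data, so that both sets are expressed on the same scale. The sketch below shows the idea in plain Python with made-up numbers; `fit_scaler` and `transform` are hypothetical helpers that mimic standardization with the population standard deviation.

```python
from statistics import mean, pstdev

def fit_scaler(rows):
    """Learn per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # a constant column has std 0; use 1 so the division is safe
    scales = [pstdev(col) or 1.0 for col in columns]
    return means, scales

def transform(rows, means, scales):
    """Apply previously learned statistics to any set of rows."""
    return [[(x - m) / s for x, m, s in zip(row, means, scales)] for row in rows]

train = [[1.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
test = [[1.0, 1.0]]
means, scales = fit_scaler(train)       # statistics come from the training rows only
print(transform(train, means, scales))
print(transform(test, means, scales))   # test rows reuse the same statistics
```

With scikit-learn the same pattern amounts to keeping the fitted `StandardScaler` object and calling its `transform` method on the Limassol data.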

In [212]:
pred = kneigh.predict(x_test)
pred

array([2, 3, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 2, 2, 1,
       2, 2, 1, 0, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 0, 3, 2, 3,
       2, 2, 1, 2, 2, 0, 2, 2, 2, 2, 2, 2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 2,
       0, 2, 0, 2, 0, 1, 2, 3, 0, 2, 0, 0, 0, 2, 2, 0, 2, 2, 2, 2, 2, 2,
       0, 2, 2, 0, 3, 2, 3, 2, 2, 2, 2, 0, 2, 2, 2, 1, 2, 0], dtype=int32)

A dataframe containing all information we have gathered for each municipality in Limassol.

In [213]:
predicted_labels = pd.DataFrame(pred)
predicted_labels.rename(columns={predicted_labels.columns[0] : 'Cluster No.'}, inplace=True)
limassol_df = pd.concat([pd.concat([lima_df,most_com_lima],axis=1),predicted_labels],axis=1)
limassol_df.head()

Unnamed: 0,Municipality,District,Latitude,Longitude,1st Most Common,2nd Most Common,3rd Most Common,4th Most Common,5th Most Common,6th Most Common,Cluster No.
0,Omodos,Limassol,34.848106,32.809067,Greek Restaurant,Restaurant,Waterfall,Hotel,Bakery,Café,2
1,Limassol,Limassol,34.68529,33.033266,Greek Restaurant,Coffee Shop,Bakery,Cocktail Bar,Bar,Italian Restaurant,3
2,Souni,Limassol,34.734928,32.88247,Historic Site,Greek Restaurant,Mediterranean Restaurant,Beach,Food,Supermarket,2
3,Koilani,Limassol,34.844486,32.859852,Greek Restaurant,Restaurant,Hotel,Waterfall,Café,Bakery,2
4,Mesa Geitonia,Limassol,34.701862,33.044995,Bakery,Greek Restaurant,Coffee Shop,Cocktail Bar,Supermarket,Bar,2


We use our previously created function to visualize clustered municipalities in Limassol.

In [214]:
#.iloc[0] extracts a scalar coordinate instead of a one-element Series
latitude = limassol_df.loc[limassol_df['Municipality'] == 'Limassol', 'Latitude'].iloc[0]
longitude = limassol_df.loc[limassol_df['Municipality'] == 'Limassol', 'Longitude'].iloc[0]
vis_mun_cluster(limassol_df, latitude, longitude)

We observe that the patterns we described for Nicosia's clusters also hold for Limassol.

<h1>Results and Discussion</h1>

We clustered the municipalities of Nicosia using the k-means clustering algorithm, and then assigned the municipalities of Limassol to those clusters using a k-nearest neighbors classifier.

Our final results show which municipalities in Limassol are the best match for someone living in the district of Nicosia, depending on the Nicosia municipality they currently live in.

While the results are clear enough to draw conclusions, more factors could be added for better and more specific results.
These factors may include:
   * Distance between each municipality and bodies of water.
   * Distance between each municipality and each venue.
   * Type of ground.
   
Better results could also be reached with more careful handling of outliers.

<h1> Conclusion </h1>

This project's aim was to find similar environments between areas in the districts of Nicosia and Limassol. We did this by clustering the municipalities in the district of Nicosia based on the most common venues near each municipality, and then by assigning each municipality in the district of Limassol to one of the generated clusters based on the same criterion.

The final decision on the best area to move to will be made by the stakeholders, based on the specific characteristics of each recommended municipality.