## Introduction: Business Problem

In this project, we will try to find the neighborhood that is most suitable for opening a gym. Specifically, this report is aimed at stakeholders interested in opening a gym in the city of Toronto, Ontario.

First, one important constraint: a customer must be physically capable of working out in a gym. For this project we therefore assume that the gym admits only people older than 14 and younger than 55. Residents under 14 cannot attend yet, but they are future customers, so we also pay attention to that group and define it as potential customers.

We will use data science methods to shortlist the few most promising neighborhoods based on these criteria. The advantages of each area will then be described clearly so that stakeholders can choose the best possible final location.

## Data

Based on the definition of our problem, the factors that will influence our decision are the population counts of each neighborhood in the following age groups:
<ol>
    <li>Children (0-14 years);</li>
    <li>Youth (15-24 years);</li>
    <li>Working Age (25-54 years);</li>
   <li>Pre-retirement (55-64 years);</li>
   <li>Seniors (65+ years);</li>
   <li>Older Seniors (85+ years);</li>
</ol>
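
As a minimal sketch of how these age groups relate to the customer definition in the introduction, the snippet below assigns each group a role and totals one neighborhood. The role names, the `role_totals` helper and the counts are assumptions made for illustration; they are not part of the original analysis.

```python
# Hypothetical role assignment, following the age limits stated in the introduction.
ROLES = {
    'Children (0-14 years)': 'potential',
    'Youth (15-24 years)': 'target',
    'Working Age (25-54 years)': 'target',
    'Pre-retirement (55-64 years)': 'outside',
    'Seniors (65+ years)': 'outside',
    # 'Older Seniors (85+ years)' is a subset of 'Seniors (65+ years)',
    # so it is left out here to avoid counting those residents twice.
}


def role_totals(counts):
    """Sum the residents of one neighborhood per role."""
    totals = {'target': 0, 'potential': 0, 'outside': 0}
    for group, role in ROLES.items():
        totals[role] += counts.get(group, 0)
    return totals


# Made-up counts for illustration only (not taken from the census file).
example = {
    'Children (0-14 years)': 2000,
    'Youth (15-24 years)': 1500,
    'Working Age (25-54 years)': 6000,
    'Pre-retirement (55-64 years)': 1200,
    'Seniors (65+ years)': 1800,
    'Older Seniors (85+ years)': 300,
}
print(role_totals(example))
```

Note that the 85+ group overlaps the 65+ group, which matters whenever the columns are added together.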

We use the City of Toronto's official neighbourhoods, read from the open data files described below, as the units of analysis.

Before we get the data and start exploring it, let's download all the dependencies that we will need.

In [1]:
import json

import numpy as np
import pandas as pd
import requests

# The install lines below only need to run once; comment them out afterwards.
!conda install -c conda-forge lxml --yes
from lxml import etree

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values

!conda install -c conda-forge folium=0.5.0 --yes
import folium  # map rendering

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans


The neighborhood data we need is available from the City of Toronto open data website, https://open.toronto.ca

#### Read tables from csv file: "neighbourhood-profiles-2016-csv.csv".

In [2]:
# read data from csv file
df = pd.read_csv("neighbourhood-profiles-2016-csv.csv")

# set label 'Category' as an index and include only categories which are important for research
df = df.set_index(['Category'])
df = df.loc[df.index.isin(['Population','Families, households and marital status'])]

# reset an index
df = df.reset_index(level = 'Category')

# keep only the rows whose Topic is not 'Family characteristics of adults'
df = df[df['Topic'] != 'Family characteristics of adults']

In [3]:
# drop columns which are not important for research
df = df.drop(columns=['Topic', 'Data Source', 'Category', '_id'])

In [4]:
df = df.set_index(['Characteristic'])
df_transposed = df.transpose()

We cleaned the data because the original file contains many rows and columns that are not relevant to deciding which neighborhood is the most suitable for a new gym.
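
The cleaning steps above can be sketched on a toy table. The column names `_id`, `Category`, `Topic`, `Data Source` and `Characteristic` come from the cells above; the two area columns and every value are made up, so this is only an illustration of the filter, drop and transpose sequence, not the real file.

```python
import pandas as pd

# Toy stand-in for the census file: one row per characteristic,
# one column per neighborhood. All values are made up.
toy = pd.DataFrame({
    '_id': [1, 2, 3],
    'Category': ['Population', 'Population', 'Income'],
    'Topic': ['Age characteristics', 'Age characteristics', 'Income of households'],
    'Data Source': ['Census Profile'] * 3,
    'Characteristic': ['Children (0-14 years)', 'Youth (15-24 years)', 'Median income'],
    'Area A': ['1,200', '900', '50,000'],
    'Area B': ['3,400', '2,800', '60,000'],
})

# Keep only the category of interest, then drop the descriptive columns.
kept = toy[toy['Category'].isin(['Population'])]
kept = kept.drop(columns=['_id', 'Category', 'Topic', 'Data Source'])

# After transposing, each neighborhood is a row and each characteristic a column.
by_area = kept.set_index('Characteristic').transpose()
print(by_area)
```

The counts stay as strings with thousands separators at this point, which is why a later cell strips the commas before converting to float.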

### Neighborhood Geographical data

Next we need latitude and longitude coordinates for each candidate neighborhood so that it can be drawn as a circle marker on a map. Each marker represents the center of its neighborhood.

Read the geographical data from Neighbourhoods.geojson:

In [5]:
with open('Neighbourhoods.geojson') as json_data:
    neighborhood = json.load(json_data)

Notice how all the relevant data is in the *features* key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data.

In [6]:
# from geojson file read only features
neighborhood_data = neighborhood['features']

Transform the data into a *pandas* dataframe

In [7]:
# create new data frame
column_names = ['Neighborhood', 'Latitude', 'Longitude'] 
neighborhoods = pd.DataFrame(columns=column_names)

Then let's loop through the data and fill the dataframe one row at a time.

In [8]:
rows = []  # one dict per neighborhood; the dataframe is built once after the loop
for data in neighborhood_data:
    count = 0
    # keep letters only: drop punctuation, spaces and the numeric area code
    name = ''.join(e for e in data['properties']['AREA_NAME'] if e.isalnum())
    neighborhood_name = ''.join([i for i in name if not i.isdigit()])
    # re-insert a space before every capital letter except the first
    name_2 = ''
    for i in neighborhood_name:
        if i.isupper():
            count += 1
            if count > 1:
                i = ' ' + i
        name_2 = name_2 + i
    rows.append({'Neighborhood': name_2,
                 'Latitude': data['properties']['LATITUDE'],
                 'Longitude': data['properties']['LONGITUDE']})

# DataFrame.append was removed in pandas 2.0, so build the frame from the list instead
neighborhoods = pd.DataFrame(rows, columns=column_names)

Apply the same cleaning to the neighborhood names in the census table (remove special characters, then restore the spaces) so that both tables use identical names.

In [9]:
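for old_name in list(df_transposed.index):
    # keep letters and digits only (drops spaces and punctuation)
    name = ''.join(ch for ch in old_name if ch.isalnum())
    # re-insert a space before every capital letter except the first
    new_name = ''
    count = 0
    for ch in name:
        if ch.isupper():
            count += 1
            if count > 1:
                ch = ' ' + ch
        new_name = new_name + ch
    df_transposed.rename(index={old_name: new_name}, inplace=True)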
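
The name cleaning used in the two loops above can be sketched as a single helper. `clean_name` is a hypothetical function name, and the input spellings are illustrative guesses at how one neighborhood might appear in the two files.

```python
def clean_name(raw):
    """Drop punctuation, spaces and digits, then re-insert a space
    before every capital letter except the first one."""
    letters = [ch for ch in raw if ch.isalnum() and not ch.isdigit()]
    out = ''
    capitals_seen = 0
    for ch in letters:
        if ch.isupper():
            capitals_seen += 1
            if capitals_seen > 1:
                out += ' '
        out += ch
    return out


# Two illustrative spellings of the same neighborhood end up identical.
print(clean_name('Yonge-St.Clair (97)'))
print(clean_name('Yonge-St.Clair'))
```

Because both tables go through the same transformation, the exact cleaned form matters less than the fact that matching neighborhoods receive the same string.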

In [10]:
neighborhoods = neighborhoods.set_index(['Neighborhood'])

Let's create a new dataframe that includes geographical data and research data for each neighborhood.

In [11]:
result = pd.concat([neighborhoods, df_transposed], axis=1, sort=False)

In [12]:
result.head()  # check the last columns!

| Neighborhood | Latitude | Longitude | Population, 2016 | Population, 2011 | Population Change 2011-2016 | Total private dwellings | Private dwellings occupied by usual residents | Population density per square kilometre | Land area in square kilometres | Children (0-14 years) | ... | Non-census-family households | 3 or more children | Persons not in census families in private households | Private households by household type | One-census-family households | Without children in a census family | With children in a census family | Multiple-census-family households | One-person households | Two-or-more person non-census-family households |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Wychwood | 43.676919 | -79.425515 | 14349 | 13986 | 2.60% | 6185 | 5887 | 8541 | 1.68 | 1860 | ... | 2510 | 65 | 3460 | 5885 | 3285 | 1245 | 2040 | 95 | 2075 | 435 |
| Yonge Eglinton | 43.704689 | -79.40359 | 11817 | 10578 | 11.70% | 6103 | 5676 | 7162 | 1.65 | 1800 | ... | 2645 | 35 | 3120 | 5680 | 3000 | 1250 | 1750 | 30 | 2365 | 280 |
| Yonge St Clair | 43.687859 | -79.397871 | 12528 | 11652 | 7.50% | 7475 | 7012 | 10708 | 1.17 | 1210 | ... | 3865 | 10 | 4430 | 7010 | 3130 | 1830 | 1295 | 15 | 3465 | 395 |
| York University Heights | 43.765736 | -79.488883 | 27593 | 27713 | -0.40% | 11051 | 10170 | 2086 | 13.23 | 4045 | ... | 3640 | 305 | 6880 | 10170 | 6090 | 1675 | 4430 | 445 | 2665 | 975 |
| Yorkdale Glen Park | 43.714672 | -79.457108 | 14804 | 14687 | 0.80% | 5847 | 5344 | 2451 | 6.04 | 1960 | ... | 1625 | 140 | 2535 | 5345 | 3510 | 1065 | 2440 | 205 | 1355 | 275 |


From here on we keep only the coordinates, the population change, the population density and the columns that describe the age structure of each Toronto neighborhood.

In [13]:
df_new = result[['Latitude', 'Longitude', 'Population Change 2011-2016','Population density per square kilometre','Children (0-14 years)','Youth (15-24 years)','Working Age (25-54 years)','Pre-retirement (55-64 years)','Seniors (65+ years)','Older Seniors (85+ years)']].copy()

In [14]:
# drop NaN values from dataframe
df_new = df_new.dropna()
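
`pd.concat(..., axis=1)` lines rows up by index label, so a neighborhood whose name differs between the two tables ends up with missing values and is then removed by `dropna()`. The sketch below shows this on made-up data and lists the names that failed to match; the frames `geo` and `census` are hypothetical stand-ins for `neighborhoods` and `df_transposed`.

```python
import pandas as pd

# Made-up coordinates and counts; 'Area C' is spelled with a double space
# in the second table, so the two labels do not match.
geo = pd.DataFrame({'Latitude': [43.70, 43.80, 43.60],
                    'Longitude': [-79.40, -79.20, -79.50]},
                   index=['Area A', 'Area B', 'Area C'])
census = pd.DataFrame({'Children (0-14 years)': [1200, 3400, 800]},
                      index=['Area A', 'Area B', 'Area  C'])

merged = pd.concat([geo, census], axis=1, sort=False)

# Rows with any missing value are the labels that did not line up.
unmatched = merged[merged.isna().any(axis=1)].index.tolist()
print(unmatched)
print(len(merged.dropna()))
```

Printing the unmatched labels before dropping them is a cheap way to check how many neighborhoods are lost to spelling differences.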

In [15]:
#reset index of dataframe
df_new.index.name = 'Neighborhoods'
df_new = df_new.reset_index(level = 'Neighborhoods')

#### Use geopy library to get the latitude and longitude values of Toronto.

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>to_explorer</em>, as shown below.

In [16]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.
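
Geocoding depends on an external service, and geopy's `geocode` returns `None` when nothing is found. A minimal sketch of a defensive wrapper follows; `geocode_or_default` is a hypothetical helper that accepts any object with a geopy-style `geocode(address)` method, and the fallback coordinates in the usage comment are simply the values printed above.

```python
def geocode_or_default(geolocator, address, default):
    """Return (latitude, longitude) for `address`, or `default` if the lookup fails.

    `geolocator` is any object with a geopy-style `geocode(address)` method,
    which returns None when nothing is found.
    """
    try:
        location = geolocator.geocode(address)
    except Exception:
        # Network or service errors: fall back instead of stopping the notebook.
        return default
    if location is None:
        return default
    return (location.latitude, location.longitude)


# Usage with the geolocator from the cell above (needs network access):
# latitude, longitude = geocode_or_default(
#     geolocator, 'Toronto, ON', (43.6534817, -79.3839347))
```

Catching every exception is deliberately coarse here; a stricter version could catch only geopy's own error classes.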


#### Create a map of Toronto with neighborhoods superimposed on top.

In [17]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)
for lat, lng, name in zip(df_new['Latitude'], df_new['Longitude'], df_new['Neighborhoods']):
    # label each marker with its own neighborhood name, not the whole column
    label = folium.Popup(str(name), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto

## Methodology

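In this project, we will direct our efforts toward detecting areas of Toronto whose age structure matches the criteria from the introduction: many residents in the age groups that can attend a gym and many children who are potential future customers.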

In the first step we collected the required data: the geographical location and the age structure of Toronto neighborhoods. We then cleaned the data to make it suitable for the analysis.

The second step in our analysis is to create new data frames and perform three independent k-means clustering calculations.
The first calculation uses a data frame with the age structure of two age groups (Children (0-14 years), Youth (15-24 years)).
The second calculation uses a data frame with the age structure of two age groups (Working Age (25-54 years), Pre-retirement (55-64 years)).
The third calculation uses a data frame with the age structure of two age groups (Seniors (65+ years), Older Seniors (85+ years)).

In the third step, we focus on the results of the k-means clustering calculations. By examining the resulting clusters, we identify the neighborhoods that are most suitable for a new gym.

In [18]:
# number of clusters
kclusters = 4

# remove % from values in Population Change 2011-2016 column
df_new['Population Change 2011-2016'] = df_new['Population Change 2011-2016'].str.replace(r'%', '')

# remove the thousands separator from the age-structure columns and convert them to float
age_columns = ['Children (0-14 years)', 'Youth (15-24 years)', 'Working Age (25-54 years)',
               'Pre-retirement (55-64 years)', 'Seniors (65+ years)', 'Older Seniors (85+ years)']
for column in age_columns:
    df_new[column] = df_new[column].str.replace(',', '').astype(float)

# create new data frame for k-means clustering calculations
df_new_clustering = df_new.drop(columns =['Neighborhoods','Latitude','Longitude','Population density per square kilometre','Population Change 2011-2016'])

# normalize values in columns by dividing value by sum of values in column
for column in df_new_clustering.columns:
    total = df_new_clustering[column].sum()
    df_new_clustering[column] = df_new_clustering[column].div(total)
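
The normalization loop above turns each count into that neighborhood's share of the city-wide total for the column. A minimal sketch on made-up counts shows the same operation written as one vectorized division; the frame `counts` is hypothetical.

```python
import pandas as pd

# Made-up counts for three neighborhoods.
counts = pd.DataFrame({
    'Children (0-14 years)': [1000.0, 3000.0, 6000.0],
    'Youth (15-24 years)': [500.0, 1500.0, 3000.0],
})

# Equivalent to the column-by-column loop: divide every column by its own total.
shares = counts / counts.sum()
print(shares)
print(shares.sum())  # each column now adds up to 1
```

This scaling puts columns of different magnitude on a comparable footing without changing the ordering of neighborhoods within a column.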

In [19]:
# first data frame
df_1 = df_new_clustering[['Children (0-14 years)','Youth (15-24 years)']].copy()

# second data frame
df_2 = df_new_clustering[['Working Age (25-54 years)','Pre-retirement (55-64 years)']].copy()

# third data frame
df_3 = df_new_clustering[['Seniors (65+ years)', 'Older Seniors (85+ years)']].copy()

In [20]:
# run k-means clustering for each data frame
kmeans_1 = KMeans(n_clusters=kclusters, random_state=0).fit(df_1)
kmeans_2 = KMeans(n_clusters=kclusters, random_state=0).fit(df_2)
kmeans_3 = KMeans(n_clusters=kclusters, random_state=0).fit(df_3)
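
The value `kclusters = 4` is fixed by hand in this notebook. One common heuristic for checking such a choice is to compare the inertia (within-cluster sum of squared distances) for several values of k. The sketch below does this on synthetic points, not on the census data, so it only illustrates the procedure; the blob centers and noise level are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic two-column data with three well-separated groups (not the census data).
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
points = np.vstack([c + rng.normal(scale=0.3, size=(30, 2)) for c in centers])

# Inertia drops as k grows; the k where the drop levels off
# is a common heuristic choice for the number of clusters.
inertias = []
for k in range(1, 7):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(points)
    inertias.append(model.inertia_)

for k, value in zip(range(1, 7), inertias):
    print(k, round(value, 2))
```

Running the same loop on `df_1`, `df_2` and `df_3` would show whether four clusters is a reasonable setting for each of the three calculations.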

In [21]:
# insert the cluster labels of each calculation into the original data frame
df_new.insert(4, 'Cluster Labels 1', kmeans_1.labels_)
df_new.insert(5, 'Cluster Labels 2', kmeans_2.labels_)
df_new.insert(6, 'Cluster Labels 3', kmeans_3.labels_)


### Map of clusters from the calculation of first data frame Children (0-14 years) and Youth (15-24 years)

In [22]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df_new['Latitude'], df_new['Longitude'], df_new['Neighborhoods'], df_new['Cluster Labels 1']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
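
One detail of the marker colors: the cell indexes the palette with `cluster-1`, and Python treats index `-1` as the last element, so cluster 0 receives the last color rather than the first. The sketch below demonstrates this with placeholder hex strings standing in for the `rainbow` list; the specific color values are assumptions.

```python
# Placeholder hex colors standing in for the `rainbow` list built above.
palette = ['#8000ff', '#2adddd', '#d4dd80', '#ff0000']

# Mapping used in the cell above: label 0 wraps around to the last color.
shifted = [palette[cluster - 1] for cluster in range(len(palette))]
print(shifted)

# Indexing by the label itself gives the more obvious label-to-color mapping.
direct = [palette[cluster] for cluster in range(len(palette))]
print(direct)
```

Either mapping gives every cluster a distinct color, so the maps are correct; the shift only matters when reading colors off against cluster numbers.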

### Map of clusters from the calculation of second data frame Working Age (25-54 years) and Pre-retirement (55-64 years)

In [23]:
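# start from a fresh map so the markers of the previous calculation are not drawn underneath
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)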
for lat, lon, poi, cluster in zip(df_new['Latitude'], df_new['Longitude'], df_new['Neighborhoods'], df_new['Cluster Labels 2']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
    
map_clusters

### Map of clusters from the calculation of third data frame Seniors (65+ years) and Older Seniors (85+ years)

In [24]:
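# start from a fresh map so the markers of the previous calculation are not drawn underneath
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)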
for lat, lon, poi, cluster in zip(df_new['Latitude'], df_new['Longitude'], df_new['Neighborhoods'], df_new['Cluster Labels 3']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
    
map_clusters

## Analysis

### The first data frame Children (0-14 years) and Youth (15-24 years)

In [59]:
final_neighborhoods = set()

df_1 = df_new.loc[df_new['Cluster Labels 1'] == 0, df_new.columns[[0] + list(range(8, df_new.shape[1]))]]
df_2 = df_new.loc[df_new['Cluster Labels 1'] == 1, df_new.columns[[0] + list(range(8, df_new.shape[1]))]]
df_3 = df_new.loc[df_new['Cluster Labels 1'] == 2, df_new.columns[[0] + list(range(8, df_new.shape[1]))]]
df_4 = df_new.loc[df_new['Cluster Labels 1'] == 3, df_new.columns[[0] + list(range(8, df_new.shape[1]))]]

print('==============================================================')
print("The first cluster\nmean number of Children (0-14 years): {}\nmean number of Youth (15-24 years): {}\n".format(df_1['Children (0-14 years)'].mean(),df_1['Youth (15-24 years)'].mean()))
print("The second cluster\nmean number of Children (0-14 years): {}\nmean number of Youth (15-24 years): {}\n".format(df_2['Children (0-14 years)'].mean(),df_2['Youth (15-24 years)'].mean()))
print("The third cluster\nmean number of Children (0-14 years): {}\nmean number of Youth (15-24 years): {}\n".format(df_3['Children (0-14 years)'].mean(),df_3['Youth (15-24 years)'].mean()))
print("The fourth cluster\nmean number of Children (0-14 years): {}\nmean number of Youth (15-24 years): {}\n".format(df_4['Children (0-14 years)'].mean(),df_4['Youth (15-24 years)'].mean()))
print('==============================================================\n')

for children, youth, neighborhood in zip(df_3['Children (0-14 years)'],df_3['Youth (15-24 years)'],df_3['Neighborhoods'] ):
    if children > df_3['Children (0-14 years)'].mean() or youth > df_3['Youth (15-24 years)'].mean():
        final_neighborhoods.add(neighborhood)

The first cluster
mean number of Children (0-14 years): 1654.5689655172414
mean number of Youth (15-24 years): 1361.2068965517242

The second cluster
mean number of Children (0-14 years): 4353.846153846154
mean number of Youth (15-24 years): 3887.8846153846152

The third cluster
mean number of Children (0-14 years): 6896.428571428572
mean number of Youth (15-24 years): 6675.714285714285

The fourth cluster
mean number of Children (0-14 years): 2896.6666666666665
mean number of Youth (15-24 years): 2336.4583333333335
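
The per-cluster means and the "above the cluster mean in either column" filter can also be expressed with `groupby`. This is a sketch on a made-up table that borrows the column names of `df_new`; it is an alternative formulation, not the code that produced the numbers above.

```python
import pandas as pd

# Made-up table with the same column names as df_new.
toy = pd.DataFrame({
    'Neighborhoods': ['A', 'B', 'C', 'D'],
    'Cluster Labels 1': [0, 0, 2, 2],
    'Children (0-14 years)': [1000.0, 2000.0, 6000.0, 8000.0],
    'Youth (15-24 years)': [900.0, 1100.0, 5000.0, 7000.0],
})
age_cols = ['Children (0-14 years)', 'Youth (15-24 years)']

# One row of means per cluster label.
cluster_means = toy.groupby('Cluster Labels 1')[age_cols].mean()
print(cluster_means)

# Neighborhoods of cluster 2 that exceed the cluster mean in either column.
members = toy[toy['Cluster Labels 1'] == 2]
above = (members[age_cols] > cluster_means.loc[2]).any(axis=1)
selected = set(members.loc[above, 'Neighborhoods'])
print(selected)
```

The same pattern applies to the other two calculations by swapping the label column and the pair of age columns.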




### The second data frame Working Age (25-54 years) and Pre-retirement (55-64 years)

In [52]:
df_1 = df_new.loc[df_new['Cluster Labels 2'] == 0, df_new.columns[[0] + list(range(8, df_new.shape[1]))]]
df_2 = df_new.loc[df_new['Cluster Labels 2'] == 1, df_new.columns[[0] + list(range(8, df_new.shape[1]))]]
df_3 = df_new.loc[df_new['Cluster Labels 2'] == 2, df_new.columns[[0] + list(range(8, df_new.shape[1]))]]
df_4 = df_new.loc[df_new['Cluster Labels 2'] == 3, df_new.columns[[0] + list(range(8, df_new.shape[1]))]]

print('==============================================================')
print("The first cluster\nmean number of Working Age (25-54 years): {}\nmean number of Pre-retirement (55-64 years): {}\n".format(df_1['Working Age (25-54 years)'].mean(),df_1['Pre-retirement (55-64 years)'].mean()))
print("The second cluster\nmean number of Working Age (25-54 years): {}\nmean number of Pre-retirement (55-64 years): {}\n".format(df_2['Working Age (25-54 years)'].mean(),df_2['Pre-retirement (55-64 years)'].mean()))
print("The third cluster\nmean number of Working Age (25-54 years): {}\nmean number of Pre-retirement (55-64 years): {}\n".format(df_3['Working Age (25-54 years)'].mean(),df_3['Pre-retirement (55-64 years)'].mean()))
print("The fourth cluster\nmean number of Working Age (25-54 years): {}\nmean number of Pre-retirement (55-64 years): {}\n".format(df_4['Working Age (25-54 years)'].mean(),df_4['Pre-retirement (55-64 years)'].mean()))
print('==============================================================\n')


for working_age, pre_retirement, neighborhood in zip(df_3['Working Age (25-54 years)'],df_3['Pre-retirement (55-64 years)'],df_3['Neighborhoods'] ):
    if working_age > df_3['Working Age (25-54 years)'].mean() or pre_retirement > df_3['Pre-retirement (55-64 years)'].mean():
        final_neighborhoods.add(neighborhood)

for working_age, pre_retirement, neighborhood in zip(df_4['Working Age (25-54 years)'],df_4['Pre-retirement (55-64 years)'],df_4['Neighborhoods'] ):
    if working_age > df_4['Working Age (25-54 years)'].mean() or pre_retirement > df_4['Pre-retirement (55-64 years)'].mean():
        final_neighborhoods.add(neighborhood)

The first cluster
mean number of Working Age (25-54 years): 10778.636363636364
mean number of Pre-retirement (55-64 years): 3092.159090909091

The second cluster
mean number of Working Age (25-54 years): 5776.158536585366
mean number of Pre-retirement (55-64 years): 1669.4512195121952

The third cluster
mean number of Working Age (25-54 years): 45105.0
mean number of Pre-retirement (55-64 years): 4680.0

The fourth cluster
mean number of Working Age (25-54 years): 19285.833333333332
mean number of Pre-retirement (55-64 years): 4798.333333333333




### The third data frame Seniors (65+ years) and Older Seniors (85+ years)

In [53]:
df_1 = df_new.loc[df_new['Cluster Labels 3'] == 0, df_new.columns[[0] + list(range(8, df_new.shape[1]))]]
df_2 = df_new.loc[df_new['Cluster Labels 3'] == 1, df_new.columns[[0] + list(range(8, df_new.shape[1]))]]
df_3 = df_new.loc[df_new['Cluster Labels 3'] == 2, df_new.columns[[0] + list(range(8, df_new.shape[1]))]]
df_4 = df_new.loc[df_new['Cluster Labels 3'] == 3, df_new.columns[[0] + list(range(8, df_new.shape[1]))]]


print('==============================================================')
print("The first cluster\nmean number of Seniors (65+ years): {}\nmean number of Older Seniors (85+ years): {}\n".format(df_1['Seniors (65+ years)'].mean(),df_1['Older Seniors (85+ years)'].mean()))
print("The second cluster\nmean number of Seniors (65+ years): {}\nmean number of Older Seniors (85+ years): {}\n".format(df_2['Seniors (65+ years)'].mean(),df_2['Older Seniors (85+ years)'].mean()))
print("The third cluster\nmean number of Seniors (65+ years): {}\nmean number of Older Seniors (85+ years): {}\n".format(df_3['Seniors (65+ years)'].mean(),df_3['Older Seniors (85+ years)'].mean()))
print("The fourth cluster\nmean number of Seniors (65+ years): {}\nmean number of Older Seniors (85+ years): {}\n".format(df_4['Seniors (65+ years)'].mean(),df_4['Older Seniors (85+ years)'].mean()))
print('==============================================================\n')

for seniors, older_seniors, neighborhood in zip(df_1['Seniors (65+ years)'],df_1['Older Seniors (85+ years)'],df_1['Neighborhoods'] ):
    if seniors > df_1['Seniors (65+ years)'].mean() or older_seniors > df_1['Older Seniors (85+ years)'].mean():
        final_neighborhoods.add(neighborhood)

The first cluster
mean number of Seniors (65+ years): 1992.4683544303798
mean number of Older Seniors (85+ years): 267.65822784810126

The second cluster
mean number of Seniors (65+ years): 5177.941176470588
mean number of Older Seniors (85+ years): 879.1176470588235

The third cluster
mean number of Seniors (65+ years): 7508.0
mean number of Older Seniors (85+ years): 1363.0

The fourth cluster
mean number of Seniors (65+ years): 3749.0789473684213
mean number of Older Seniors (85+ years): 602.3684210526316




## Results and Discussion

In [75]:
potentially_suitable_neighborhood = set()

for index, data in df_new.iterrows():
    if data['Neighborhoods'] in final_neighborhoods:
        if data['Cluster Labels 1'] == 2 and (data['Cluster Labels 2'] == 2 or data['Cluster Labels 2'] == 3):
            if data['Cluster Labels 3'] == 0:
                print("The Most suitable neighborhood is:" + str(data['Neighborhoods']))
                potentially_suitable_neighborhood.add(str(data['Neighborhoods']))
            else:
                print("Potentially suitable neighborhood is:" + str(data['Neighborhoods']))
                potentially_suitable_neighborhood.add(str(data['Neighborhoods']))
        

Potentially suitable neighborhood is:Malvern
Potentially suitable neighborhood is:Rouge
Potentially suitable neighborhood is:Waterfront Communities The Island
Potentially suitable neighborhood is:Willowdale East
Potentially suitable neighborhood is:Woburn


In [74]:
map_final = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lon, poi in zip(df_new['Latitude'], df_new['Longitude'], df_new['Neighborhoods']):
    if poi in potentially_suitable_neighborhood:
        label = folium.Popup(str(poi), parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            fill=True,
            fill_opacity=0.7).add_to(map_final)
    
map_final

## Conclusion 

After performing the k-means clustering calculations, we had to decide which neighborhoods are most suitable for opening a new gym.

From the first data frame we took the third cluster, which has the largest young population, and chose the neighborhoods whose population of Children (0-14 years) or Youth (15-24 years) is bigger than the cluster mean of 6896.4 or 6675.7, respectively. These names form a set of candidate neighborhoods.

From the second data frame we took the third and fourth clusters and chose the neighborhoods whose population of Working Age (25-54 years) or Pre-retirement (55-64 years) is bigger than the mean of their cluster: 45105.0 and 4680.0 for the third cluster, 19285.8 and 4798.3 for the fourth. These names are added to the same set.

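From the third data frame we took the first cluster, which has the smallest senior population (cluster means of 1992.5 and 267.7, well below the 7508.0 and 1363.0 of the third cluster), and chose the neighborhoods whose population of Seniors (65+ years) or Older Seniors (85+ years) is bigger than the cluster mean. These names are added to the same set as well.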



After we formed the set of candidate neighborhoods, we applied two further conditions.

First condition:
<ol>
    <li>Cluster Label 1 must be 2;</li>
    <li>Cluster Label 2 must be 2 or 3.</li>
</ol>

Second condition:
<ol>
    <li>Cluster Label 3 must be 0.</li>
</ol>

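If both the first and the second condition are fulfilled, we define the neighborhood as The Most Suitable.

If the first condition is fulfilled and the second condition is not, we define the neighborhood as a Potentially Suitable Neighborhood. In this run no neighborhood fulfilled both conditions; five were potentially suitable: Malvern, Rouge, Waterfront Communities The Island, Willowdale East and Woburn.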
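
The two conditions can be sketched as a small function. `classify` is a hypothetical helper; it leaves out the earlier check that the neighborhood belongs to the candidate set. Cluster numbers are arbitrary labels assigned by k-means, so the constants 2, 3 and 0 only carry meaning for the particular run shown in this report.

```python
def classify(label_1, label_2, label_3):
    """Apply the two conditions to one neighborhood's three cluster labels."""
    first = label_1 == 2 and label_2 in (2, 3)
    second = label_3 == 0
    if first and second:
        return 'most suitable'
    if first:
        return 'potentially suitable'
    return None


print(classify(2, 3, 0))
print(classify(2, 2, 1))
print(classify(1, 3, 0))
```

Selecting clusters by their mean values instead of by label number would make this step robust to a different random seed.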
