# **GYRO** 
Finding the best city based on cities' data using weighted criteria. 

## A. Data
After analyzing the cities' data in the [analysis notebook](), I want to address its core problem: the assumption that all criteria are equally important. I am going to use the exact data that I aggregated in the [cleaning notebook]().

In [None]:
%pip install geopandas
%pip install ipyleaflet
%pip install -U matplotlib
!jupyter nbextension enable --py --sys-prefix ipyleaflet

In [1]:
import pandas as pd
data = pd.read_csv('MA Cities Data Oct 10.csv')
data.sample(5)

Unnamed: 0,CITY,1-Family,2-Family,3-Family,4-Family,5+ Family,OWNER,RENTER,SHAPE_AREA,distance,...,Gas,<20K,20K-39K,40K-59K,60K-74K,75K-99K,>100K,Spanish Speaker,Black,Latino or Hispanic
106,GOSNOLD,146.0,6.0,,,2.0,,32.0,34802980.0,25.799851,...,False,,,,,,,0.0,,
132,HOLYOKE,4298.0,2051.0,273.0,154.0,,4913.0,1863.0,59048520.0,105.95401,...,True,30.7,18.7,14.85,8.46,9.87,17.41,42.54,3.27,52.14
192,NEEDHAM,3211.0,,,,,3002.0,209.0,32947770.0,8.871477,...,True,6.38,6.34,6.44,5.24,7.48,68.12,2.12,3.11,3.03
65,COLRAIN,448.0,92.0,,3.0,,308.0,235.0,112314100.0,111.371409,...,False,11.38,18.02,,15.45,12.74,18.02,0.41,0.72,0.56
286,TEMPLETON,1747.0,833.0,14.0,15.0,2.0,1858.0,748.0,83865780.0,66.576892,...,False,9.16,,13.22,,14.73,,0.69,0.02,2.45


First, I need to transform the data into the same form I used to draw conclusions in the analysis notebook, so I am going to redo the operations for each criterion.

### 1. The City Closest to Boston

In [2]:
distance = data[['CITY', 'distance']].copy()
most_distance = distance.sort_values('distance').rename(columns={'distance':'Distance'})

### 2. The City with The Most Family Buildings

In [3]:
buildings = data[['CITY', '1-Family', '2-Family', '3-Family', '4-Family', '5+ Family']].copy()
buildings['1-4 Family'] = buildings.drop(columns=['5+ Family', 'CITY']).sum(axis=1)
buildings['Density'] = buildings['1-4 Family'] / data['SHAPE_AREA']
most_buildings = buildings.sort_values('1-4 Family', ascending=False)[['CITY', '1-4 Family']]
most_dense = buildings.sort_values('Density', ascending=False)[['CITY', 'Density']]

### 3. The City with The Most People Speaking a Second Language

In [4]:
spanish = data[['CITY', 'Spanish Speaker', 'Latino or Hispanic']].copy()
population = pd.read_csv('MA cities population.csv', dtype={'Population':'int32'})
spanish = spanish.merge(population, on='CITY', how='left')
spanish = spanish.fillna(0.0)
spanish['Spanish Speaker Population'] = spanish['Spanish Speaker'] / 100 * spanish['Population']
most_spanish = spanish.sort_values('Spanish Speaker Population', ascending=False)[['CITY', 'Spanish Speaker Population']]

### 4. The City with The Most Renters

In [5]:
owner_renter = data[['CITY', 'OWNER', 'RENTER']].fillna(0).astype({'OWNER':'int32', 'RENTER':'int32'}).copy()
most_renters = owner_renter.sort_values('RENTER', ascending=False)[['CITY', 'RENTER']].rename(columns={'RENTER':'Renter'})

In [6]:
mosts = [most_buildings, most_dense, most_spanish, most_renters]
input_data = most_distance.copy()
for criteria in mosts:
    input_data = input_data.merge(criteria, how='left', on='CITY')
input_data = input_data.dropna()
input_data.sample(10)

Unnamed: 0,CITY,Distance,1-4 Family,Density,Spanish Speaker Population,Renter
256,WENDELL,90.181856,303.0,4e-06,36.7882,161
318,CLARKSBURG,137.857601,419.0,1.3e-05,9.828,221
244,WELLFLEET,76.279187,2623.0,4.8e-05,7.8996,2266
1,CAMBRIDGE,1.350934,8266.0,0.000449,8562.744,11592
20,WATERTOWN,5.473547,2813.0,0.000263,2170.7156,731
322,NORTH ADAMS,139.654902,2716.0,5.1e-05,444.277,772
102,WAYLAND,17.337517,4104.0,0.0001,229.661,711
250,WARE,80.168627,2671.0,2.6e-05,244.7172,699
199,SANDWICH,46.38582,6546.0,5.8e-05,496.1574,2133
230,BROOKFIELD,69.065138,484.0,1.1e-05,28.3064,168


## B. Scoring
To quantify how 'good' a city is, I need a way to score it. Fortunately, all of the values are already numeric. One way to score the cities is to min-max normalize each criterion and then sum the normalized values. Each criterion then contributes a value between 0 and 1, no matter how different the original scales are. For criteria where lower is better, such as distance, I want the score to increase as the distance decreases, so I negate the values before normalizing them.
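
As a small illustration of the idea, here is a minimal sketch of min-max normalization on a negated column, using plain Python lists. The distances are made-up numbers rather than values from the dataset, and `min_max` is a hypothetical helper name.

```python
def min_max(values):
    """Rescale a list of numbers linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical distances (not taken from the dataset).
distances = [2.0, 10.0, 50.0, 100.0]

# Negate first so that the closest city ends up with the highest value.
closeness = min_max([-d for d in distances])
print(closeness)
```

The closest city maps to 1 and the farthest to 0, which is the direction the score needs.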

In [7]:
normalized_data = input_data.copy().drop(columns=['CITY'])
normalized_data['Distance'] = normalized_data['Distance'] * -1

In [8]:
normalized_data = (normalized_data-normalized_data.min())/(normalized_data.max()-normalized_data.min())
normalized_data['Score'] = normalized_data.sum(axis=1)
normalized_data['Score'] = (normalized_data['Score']-normalized_data['Score'].min())/(normalized_data['Score'].max()-normalized_data['Score'].min())
normalized_data['CITY'] = input_data['CITY']
normalized_data.sort_values('Score', ascending=False)

Unnamed: 0,Distance,1-4 Family,Density,Spanish Speaker Population,Renter,Score,CITY
4,0.990176,1.000000,0.377651,1.000000,1.000000,1.000000,BOSTON
0,1.000000,0.218380,1.000000,0.055981,0.084876,0.539607,SOMERVILLE
201,0.708336,0.532419,0.261582,0.237062,0.319106,0.470675,WORCESTER
89,0.909412,0.363967,0.473434,0.132679,0.147466,0.463444,LOWELL
57,0.941165,0.254384,0.416441,0.292909,0.057543,0.448656,LYNN
...,...,...,...,...,...,...,...
337,0.047756,0.004133,0.004232,0.000047,0.001756,0.012118,RICHMOND
340,0.036371,0.005200,0.005393,0.000335,0.003852,0.010565,WEST STOCKBRIDGE
342,0.014034,0.004035,0.004167,0.000534,0.003040,0.004757,EGREMONT
341,0.017224,0.002034,0.003682,0.000041,0.001077,0.004355,ALFORD


## C. Weighting
I now have a score, but the problem from the analysis notebook remains: all the criteria are still treated as equally important. This is where weight coefficients come into play. First, I will distribute the weight equally to check that we still get the same result as above.

In [9]:
weights = {
    'Distance': 0.2,
    '1-4 Family': 0.2,
    'Density': 0.2,
    'Spanish Speaker Population': 0.2,
    'Renter': 0.2
}

In [10]:
copy_data = normalized_data.copy()
for col, weight in weights.items():
    copy_data[col] = copy_data[col] * weight
normalized_data['Score'] = copy_data[list(weights.keys())].sum(axis=1)
normalized_data['Score'] = (normalized_data['Score']-normalized_data['Score'].min())/(normalized_data['Score'].max()-normalized_data['Score'].min())
normalized_data.sort_values('Score', ascending=False)

Unnamed: 0,Distance,1-4 Family,Density,Spanish Speaker Population,Renter,Score,CITY
4,0.990176,1.000000,0.377651,1.000000,1.000000,1.000000,BOSTON
0,1.000000,0.218380,1.000000,0.055981,0.084876,0.539607,SOMERVILLE
201,0.708336,0.532419,0.261582,0.237062,0.319106,0.470675,WORCESTER
89,0.909412,0.363967,0.473434,0.132679,0.147466,0.463444,LOWELL
57,0.941165,0.254384,0.416441,0.292909,0.057543,0.448656,LYNN
...,...,...,...,...,...,...,...
337,0.047756,0.004133,0.004232,0.000047,0.001756,0.012118,RICHMOND
340,0.036371,0.005200,0.005393,0.000335,0.003852,0.010565,WEST STOCKBRIDGE
342,0.014034,0.004035,0.004167,0.000534,0.003040,0.004757,EGREMONT
341,0.017224,0.002034,0.003682,0.000041,0.001077,0.004355,ALFORD


As expected, the scores are identical to the unweighted result, because equal weights leave every criterion equally important.
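
The identical table follows from a property of min-max rescaling: multiplying every raw score by the same positive constant changes neither the ranking nor the rescaled values. The sketch below checks this on made-up numbers with plain Python; `min_max` and the toy rows are illustrative assumptions, not part of the notebook.

```python
import math

def min_max(values):
    """Rescale a list of numbers linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical normalized criteria: three cities (rows), two criteria (columns).
rows = [[1.0, 0.5], [0.2, 0.9], [0.0, 0.1]]

# Plain sum of the criteria versus the same sum with an equal weight on each.
unweighted = min_max([sum(r) for r in rows])
equal_weighted = min_max([sum(0.2 * v for v in r) for r in rows])

# The two score lists agree up to floating-point rounding.
print(all(math.isclose(a, b, abs_tol=1e-9)
          for a, b in zip(unweighted, equal_weighted)))
```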

In [11]:
weights = {
    'Distance': 0.1,
    '1-4 Family': 0.4,
    'Density': 0.2,
    'Spanish Speaker Population': 0.2,
    'Renter': 0.1
}
copy_data = normalized_data.copy()
for col, weight in weights.items():
    copy_data[col] = copy_data[col] * weight
normalized_data['Score'] = copy_data[list(weights.keys())].sum(axis=1)
normalized_data['Score'] = (normalized_data['Score']-normalized_data['Score'].min())/(normalized_data['Score'].max()-normalized_data['Score'].min())
normalized_data.sort_values('Score', ascending=False)

Unnamed: 0,Distance,1-4 Family,Density,Spanish Speaker Population,Renter,Score,CITY
4,0.990176,1.000000,0.377651,1.000000,1.000000,1.000000,BOSTON
201,0.708336,0.532419,0.261582,0.237062,0.319106,0.474302,WORCESTER
270,0.393835,0.515935,0.294624,0.438693,0.168689,0.467259,SPRINGFIELD
0,1.000000,0.218380,1.000000,0.055981,0.084876,0.464678,SOMERVILLE
89,0.909412,0.363967,0.473434,0.132679,0.147466,0.425130,LOWELL
...,...,...,...,...,...,...,...
337,0.047756,0.004133,0.004232,0.000047,0.001756,0.007146,RICHMOND
340,0.036371,0.005200,0.005393,0.000335,0.003852,0.006902,WEST STOCKBRIDGE
342,0.014034,0.004035,0.004167,0.000534,0.003040,0.003483,EGREMONT
341,0.017224,0.002034,0.003682,0.000041,0.001077,0.002483,ALFORD


After I change the weights so that the 1-4 family criterion is more important than distance and renters, the new table shows a slightly different ranking: Springfield enters the top five and Somerville drops from second to fourth.
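
Since the same weighting steps are repeated for every set of weights, they could be wrapped in one function. This is a minimal sketch under that assumption; `weighted_score` is a hypothetical helper and the toy frame holds made-up values, with column names borrowed from two of the criteria above.

```python
import pandas as pd

def weighted_score(normalized, weights):
    """Return a 0-1 score per row from already-normalized criteria columns."""
    cols = list(weights)
    # Multiply each criterion column by its weight, then sum across columns.
    raw = (normalized[cols] * pd.Series(weights)).sum(axis=1)
    # Rescale so the best row scores 1 and the worst scores 0.
    return (raw - raw.min()) / (raw.max() - raw.min())

# Hypothetical normalized values, not taken from the dataset.
toy = pd.DataFrame({
    'CITY': ['A', 'B', 'C'],
    'Distance': [1.0, 0.5, 0.0],
    '1-4 Family': [0.0, 0.8, 1.0],
})
toy['Score'] = weighted_score(toy, {'Distance': 0.2, '1-4 Family': 0.8})
print(toy.sort_values('Score', ascending=False))
```

With most of the weight on `1-4 Family`, city C ranks first even though it is the farthest away.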

## D. Visualization
In the analysis notebook I used the ipyleaflet module to make interactive maps. However, the module is not sufficient for displaying an interactive choropleth map with live calculation. It was not built for this purpose, so I need to use a simpler one, Matplotlib.

In [12]:
from ipywidgets.widgets import interact, FloatSlider
from IPython.display import display
import numpy as np

In [13]:
# Setting Default Weights
Distance = FloatSlider(min=0.0, max=1.0, step=0.1, value=0.2, description='Distance')
Family = FloatSlider(min=0.0, max=1.0, step=0.1, value=0.2, description='1-4 Family')
Density = FloatSlider(min=0.0, max=1.0, step=0.1, value=0.2, description='Density')
Spanish = FloatSlider(min=0.0, max=1.0, step=0.1, value=0.2, description='Spanish Speaker Population')
Renter = FloatSlider(min=0.0, max=1.0, step=0.1, value=0.2, description='Renter')
weights = {
    'Distance': Distance,
    '1-4 Family': Family,
    'Density': Density,
    'Spanish Speaker Population': Spanish,
    'Renter': Renter
}
visited_cities = ['CAMBRIDGE', 'BOSTON', 'METHUEN', 'LAWRENCE']
viz_data = normalized_data[~normalized_data['CITY'].isin(visited_cities)].copy()[['CITY', 'Score']]

In [14]:
def update_weight(change):
    '''
        Keep the weights summing to 1 after one slider changes.
    '''
    if change.old == change.new:
        return
    criteria = change.owner.description
    # Sum of all slider values after the change; any gap from 1 must be compensated.
    total = sum(slider.value for slider in weights.values())
    offset = 1 - total
    # Take from the largest other weight when over 1, give to the smallest when under 1.
    sliders = [slider for c, slider in weights.items() if c != criteria]
    sliders.sort(key=lambda slider: slider.value, reverse=offset < 0)
    sliders[0].value += offset
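
The handler above applies the whole offset to a single slider, so a large change can ask for a value outside the slider's 0 to 1 range. A variant that spreads the offset over several weights is sketched below as a pure function on a dictionary. `rebalance` is a hypothetical name, and the example weights are made up; it is not wired to the widgets.

```python
def rebalance(weights, changed, lo=0.0, hi=1.0):
    """Return a copy of `weights` that sums to 1, leaving `changed` untouched."""
    result = dict(weights)
    offset = 1.0 - sum(result.values())
    others = [name for name in result if name != changed]
    # Shrink the largest weights first when over budget, grow the smallest otherwise.
    others.sort(key=lambda name: result[name], reverse=offset < 0)
    for name in others:
        # Apply as much of the remaining offset as the [lo, hi] range allows.
        adjusted = min(hi, max(lo, result[name] + offset))
        offset -= adjusted - result[name]
        result[name] = adjusted
        if abs(offset) < 1e-12:
            break
    return result

# Hypothetical state right after the Distance slider was dragged to 0.9.
weights = {'Distance': 0.9, '1-4 Family': 0.2, 'Density': 0.2,
           'Spanish Speaker Population': 0.2, 'Renter': 0.2}
balanced = rebalance(weights, changed='Distance')
print(balanced)
```

Here the excess is absorbed by several of the other weights in turn, and every value stays within its bounds.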

In [15]:
def handle_change(Distance, Family, Density, Spanish, Renter):
    '''
        Handle changes from a slider and update the map.
    '''
    copy_data = normalized_data[~normalized_data['CITY'].isin(visited_cities)].copy()
    for col, weight in weights.items():
        copy_data[col] = copy_data[col] * weight.value
    viz_data['Score'] = copy_data[list(weights.keys())].sum(axis=1)
    viz_data['Score'] = (viz_data['Score']-viz_data['Score'].min())/(viz_data['Score'].max()-viz_data['Score'].min())
    viz_data['Score'] = viz_data['Score'] * 100
    return viz_data.sort_values('Score', ascending=False).head(10)

In [16]:
for slider in weights.values():
    slider.observe(update_weight, names='value')
interact(handle_change, Distance=Distance, Family=Family, Density=Density, Spanish=Spanish, Renter=Renter)


In [17]:
from ipyleaflet import (Map, GeoData, basemaps, WidgetControl, GeoJSON,
 LayersControl, Choropleth, SearchControl, Marker)
from ipywidgets import Text, HTML
from branca.colormap import linear
import geopandas as gpd
import json

In [19]:
cities_shp = gpd.read_file('maps/map cities.shp')
cities_shp = cities_shp.rename(columns={'TOWN':'CITY'})
cities_shp = cities_shp[['CITY', 'geometry']].merge(viz_data, on='CITY', how='left')
cities_shp.to_file("maps/_map_buffer.geojson", driver='GeoJSON')
all_in_energy = (42.3497392, -71.1067746)
zoom = 9
m = Map(center=all_in_energy, zoom=zoom)
with open("maps/_map_buffer.geojson", 'r') as f:
    geojson_data = json.load(f)
maps_data = dict(zip(viz_data['CITY'], viz_data['Score']))
for feature in geojson_data['features']:
    properties = feature['properties']
    # Cities without a score (no data or already visited) are drawn with the lowest value.
    if properties['CITY'] not in maps_data:
        maps_data[properties['CITY']] = 0
    feature.update(id=properties['CITY'])
distance_layer = Choropleth(
    geo_data=geojson_data,
    choro_data=maps_data,
    colormap=linear.YlOrRd_04,
    border_color='black',
    style={'fillOpacity': 1})
marker = Marker(location=all_in_energy, draggable=False)
html = HTML('''Hover Over Cities''')
html.layout.margin = '0px 20px 20px 20px'
control = WidgetControl(widget=html, position='topright')
m.add_control(control)
def update_html(feature, **kwargs):
    # A score of 0 is valid (the lowest-ranked city), so only a missing score counts as no data.
    score = feature['properties']['Score']
    html.value = '''
    <h3><b>{}</b></h3>
    <h4>Score: {}</h4>
    '''.format(feature['properties']['CITY'],
               score if score is not None else 'No data or Visited')
distance_layer.on_hover(update_html)
m.add_layer(marker)
m.add_layer(distance_layer)
m
