# Opportunity Score
This score is based on eight categories:

1. Demographics: Understanding the target customer base and selecting a location with a relevant demographic profile.
2. Population Density: Selecting a location with a sufficient population density to support the customer base. 
3. Foot Traffic: Opting for a busy area with high pedestrian traffic to attract potential customers. 
4. Competition: Assessing the level of competition in the vicinity and choosing a location with a balanced market. 
5. Opportunity: A favorable circumstance for progress and success, arising from changes, trends, or market demand.
6. Affordability: Considering the costs associated with the location, including rent, utilities, and taxes, to ensure financial feasibility.
7. Crime: Prioritizing the safety of customers and staff by selecting a location in a secure neighborhood.
8. Growth Potential: Assessing the growth potential of the surrounding area in terms of population, development plans, and economic indicators.


Together, these categories form a linear objective function whose coefficients vary with the type of business the user is interested in. PuLP is used to find the optimal point of that function within the observed range of each category. Each block group is then ranked by its distance from that optimal point.

Opportunity Score = A*[Demographics] + B*[Population Density] + C*[Foot Traffic] - D*[Competition] + E*[Opportunity] + F*[Affordability] - G*[Crime] + H*[Growth Potential]
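
As a minimal sketch of the formula above, the score for a single block group is a signed, weighted sum of its sub-scores. The helper name `opportunity_score` and the `WEIGHTS` and `SIGNS` dictionaries are illustrative assumptions, not part of the notebook. The sub-score values are the standardized values of block group 60730157013 from the results table later in this notebook, with every coefficient set to 1.

```python
# Illustrative weights (all 1, matching the coefficients used later in this notebook).
WEIGHTS = {
    "demographics": 1.0,
    "population_density": 1.0,
    "foot_traffic": 1.0,
    "competition": 1.0,
    "opportunity": 1.0,
    "affordability": 1.0,
    "crime": 1.0,
    "growth_potential": 1.0,
}

# Competition and crime are subtracted; every other category is added.
SIGNS = {"competition": -1, "crime": -1}


def opportunity_score(sub_scores, weights=WEIGHTS):
    """Return the signed, weighted sum of one block group's sub-scores."""
    return sum(
        SIGNS.get(name, 1) * weights[name] * value
        for name, value in sub_scores.items()
    )


# Standardized sub-scores of block group 60730157013 (from the results table).
example = {
    "demographics": 1.067218,
    "population_density": 3.491881,
    "foot_traffic": -0.041536,
    "competition": -0.411435,
    "opportunity": -0.267071,
    "affordability": 4.0,
    "crime": -0.412021,
    "growth_potential": 0.430551,
}

print(round(opportunity_score(example), 6))
```

Changing an entry in `WEIGHTS` is how a different business type would shift the ranking under this sketch.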
           

In [1]:
import arcgis
from arcgis.gis import GIS
from arcgis.features import FeatureLayer, FeatureLayerCollection

from pulp import *

import pandas as pd

import psycopg2

from sklearn.preprocessing import StandardScaler

import sys
sys.path.append('../../')
from utils import get_config

sys.path.append('../')
from gis_resources import san_diego_county_zips,  execute_sql, create_where_clause, read_exact_food_biz_categories, read_exact_unhealthy_food_biz_categories

In [2]:
username = get_config("arcgis","username")
password = get_config("arcgis","passkey")
gis = GIS("https://ucsdonline.maps.arcgis.com/home", username=username, password=password)

## Query feature layers into dataframes

TODO: Add a note that some variables that seem like they should be included were excluded because they are subsets of existing variables.

Values Needed

Demographics: Diversity Index  ([DIVINDX_CY]) 

Population Density: Population per square mile ([POPDENS_CY])

Foot Traffic: Number of retail and other entertainment businesses per square mile ((S09_BUS + S26_BUS + S29_BUS) / sqmi)

(neg) Competition: Share of businesses in the area that are food related ((S12_BUS + S16_BUS + N13_BUS + N35_BUS) / S01_BUS)

Opportunity:  Current away from home spending rate and the projected 5 year away from home spending growth ((X1130_A / X1002_A) * (X1130FY_A / X1002FY_A))

Affordability: Housing affordability index (HAI_CY)

(neg) Crime: Total Crime Index AGS (CRMCYTOTC)

Growth Potential: Ratio of current population to projected five-year population (TOTPOP_CY / TOTPOP_FY); values below 1 indicate projected growth


Substituting these variables, the objective function becomes:


Objective Function = A*[DIVINDX_CY] + B*[POPDENS_CY] + C*[S09_BUS + S26_BUS + S29_BUS]/[SQ Mile] - D*[S12_BUS + S16_BUS + N13_BUS + N35_BUS]/[S01_BUS] + E*[X1130_A*X1130FY_A]/[X1002_A*X1002FY_A] + F*[HAI_CY] - G*[CRMCYTOTC] + H*[TOTPOP_CY]/[TOTPOP_FY]


In [3]:
needed_cols = ['DIVINDX_CY', 'POPDENS_CY', 'S09_BUS',
               'S26_BUS', 'S29_BUS', 'S12_BUS', 'S16_BUS',
               'N13_BUS', 'N35_BUS', 'S01_BUS', 'X1130_A', 
              'X1130FY_A', 'X1002_A', 'X1002FY_A', 'HAI_CY', 'CRMCYTOTC', 'TOTPOP_CY', 'TOTPOP_FY'] ##TODO fix s17

In [4]:
cs_out_fields = ['fips', 'X1130_A', 'X1130FY_A', 'X1002FY_A', 'x1002_a']

consumer_spending_layer = FeatureLayer(url="https://services1.arcgis.com/eGSDp8lpKe5izqVc/arcgis/rest/services/a2afc4/FeatureServer/0")

consumer_spending_layer_df = consumer_spending_layer.query(out_fields=cs_out_fields,
                              as_df=True,
                              return_geometry=False)

del consumer_spending_layer_df['FID']

consumer_spending_layer_df.head(3)


,fips,x1002_a,x1002fy_a,x1130_a,x1130fy_a
0,60730100101,6877.52,8470.91,2942.05,3623.67
1,60730100102,8364.19,10237.27,3457.18,4231.38
2,60730100103,7101.43,7930.31,2935.24,3277.84


In [5]:
business_variables_layer = FeatureLayer(url="https://services1.arcgis.com/eGSDp8lpKe5izqVc/arcgis/rest/services/a633a0/FeatureServer/0")

bus_out_fields = ['fips', 'S09_BUS', 
               'S26_BUS', 'S29_BUS','S12_BUS', 'S16_BUS',
               'N13_BUS', 'N37_BUS', 'N35_BUS', 'S01_BUS']

business_variables_layer_df = business_variables_layer.query(out_fields=bus_out_fields,
                              as_df=True,
                              return_geometry=False)

del business_variables_layer_df['FID']
business_variables_layer_df.head(3)


,fips,n13_bus,n35_bus,n37_bus,s01_bus,s09_bus,s12_bus,s16_bus,s26_bus,s29_bus
0,60730083111,0.0,0.0,0.0,47.0,1.0,0.0,0.0,1.0,1.0
1,60730083112,0.0,0.0,0.0,25.0,1.0,0.0,0.0,0.0,0.0
2,60730083121,0.0,16.0,12.0,101.0,17.0,0.0,12.0,3.0,6.0


In [6]:
demographics_layer_1 = FeatureLayer(url="https://services1.arcgis.com/eGSDp8lpKe5izqVc/arcgis/rest/services/aff1b6/FeatureServer/0")

dem_1_out_fields = ['fips', 'DIVINDX_CY', 'POPDENS_CY', 'TOTPOP_CY']

demographics_layer_1_df = demographics_layer_1.query(out_fields=dem_1_out_fields,
                              as_df=True,
                              return_geometry=False)

del demographics_layer_1_df['FID']

demographics_layer_1_df.head(3)


,divindx_cy,fips,popdens_cy,totpop_cy
0,46.0,60730001001,4714.0,1199.0
1,45.6,60730001002,4993.9,1692.0
2,47.5,60730002011,4614.1,902.0


In [7]:
demographics_layer_2 = FeatureLayer(url="https://services1.arcgis.com/eGSDp8lpKe5izqVc/arcgis/rest/services/ab8ff7/FeatureServer/0")

dem_2_out_fields = ['fips', 'TOTPOP_FY', 'HAI_CY']

demographics_layer_2_df = demographics_layer_2.query(out_fields=dem_2_out_fields,
                              as_df=True,
                              return_geometry=False)

del demographics_layer_2_df['FID']

demographics_layer_2_df.head(3)


,fips,hai_cy,totpop_fy
0,60730083111,49.0,1821.0
1,60730083112,49.0,1130.0
2,60730083121,48.0,611.0


Read CSV for CRMCYTOTC

In [8]:
crime_df = pd.read_csv('../resources/only_crime_enriched_san_diego_county_block_groups.csv')
crime_df = crime_df[['fips', 'crmcytotc', 'sqmi']]
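# The CSV stores FIPS as a number, which drops the leading zero; restore it so the keys match the feature layers
crime_df['fips'] = "0" + crime_df['fips'].astype(str)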
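
Prepending a literal "0" works only when every code has lost exactly one leading zero. A slightly more defensive sketch, assuming 12-digit block-group FIPS codes (2-digit state, 3-digit county, 6-digit tract, 1-digit block group), pads to a fixed width instead. The helper name `normalize_fips` is hypothetical and not part of the notebook.

```python
def normalize_fips(raw, width=12):
    """Return a block-group FIPS code as a zero-padded string.

    Accepts integers or strings; values that are already padded are unchanged.
    """
    return str(raw).strip().zfill(width)


# An integer read from a CSV has lost its leading zero.
print(normalize_fips(60730100101))
# A string that is already 12 characters long is left alone.
print(normalize_fips("060730100101"))
```

With pandas, the same idea can be applied to a whole column through `Series.astype(str).str.zfill(12)`.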

Combine all dataframes into one. 

In [9]:
# Inner-join every source on the block-group FIPS code
opportunity_competition_df = consumer_spending_layer_df.merge(business_variables_layer_df, on='fips')
opportunity_competition_df = opportunity_competition_df.merge(demographics_layer_1_df, on='fips')
opportunity_competition_df = opportunity_competition_df.merge(demographics_layer_2_df, on='fips')
opportunity_competition_df = opportunity_competition_df.merge(crime_df, on='fips')

opportunity_competition_df.head(4)

,fips,x1002_a,x1002fy_a,x1130_a,x1130fy_a,n13_bus,n35_bus,n37_bus,s01_bus,s09_bus,...,s16_bus,s26_bus,s29_bus,divindx_cy,popdens_cy,totpop_cy,hai_cy,totpop_fy,crmcytotc,sqmi
0,60730100101,6877.52,8470.91,2942.05,3623.67,0.0,0.0,0.0,11.0,2.0,...,0.0,0.0,1.0,78.7,15154.6,1690.0,81.0,1654.0,52.0,0.11
1,60730100102,8364.19,10237.27,3457.18,4231.38,2.0,3.0,3.0,25.0,8.0,...,2.0,1.0,1.0,80.8,9018.8,2185.0,53.0,2136.0,82.0,0.24
2,60730100103,7101.43,7930.31,2935.24,3277.84,1.0,3.0,3.0,29.0,13.0,...,3.0,1.0,0.0,84.8,15980.3,1565.0,201.0,1530.0,104.0,0.1
3,60730100111,10566.02,12361.14,4519.91,5287.83,1.0,1.0,1.0,16.0,2.0,...,1.0,2.0,2.0,84.2,7335.9,1485.0,79.0,1496.0,146.0,0.2


Create sub scores

In [10]:
opportunity_competition_df['demographics'] = opportunity_competition_df['divindx_cy']

opportunity_competition_df['population_density'] = opportunity_competition_df['popdens_cy']

opportunity_competition_df['foot_traffic'] = (opportunity_competition_df['s09_bus'] +  
                                             opportunity_competition_df['s26_bus'] + 
                                             opportunity_competition_df['s29_bus'])/opportunity_competition_df['sqmi']
    
opportunity_competition_df['competition'] = (opportunity_competition_df['s12_bus'] + 
                                            opportunity_competition_df['s16_bus'] + 
                                            opportunity_competition_df['n13_bus'] + 
                                            opportunity_competition_df['n35_bus'])/opportunity_competition_df['s01_bus']

opportunity_competition_df['opportunity'] = (opportunity_competition_df['x1130_a'] *
                                            opportunity_competition_df['x1130fy_a']) / (opportunity_competition_df['x1002_a'] *
                                            opportunity_competition_df['x1002fy_a'])  

opportunity_competition_df['affordability'] = opportunity_competition_df['hai_cy']

opportunity_competition_df['crime'] = opportunity_competition_df['crmcytotc']

opportunity_competition_df['growth_potential'] = opportunity_competition_df['totpop_cy'] / opportunity_competition_df['totpop_fy']


In [11]:
# Take an explicit copy so the assignment below does not trigger SettingWithCopyWarning
score_variables = opportunity_competition_df[['fips', 'demographics', 'population_density', 'foot_traffic', 'competition',
                                             'opportunity', 'affordability', 'crime',
                                             'growth_potential']].copy()

score_variables['opportunity'] = score_variables['opportunity'].astype(float)


Standardize each column to z-scores

In [12]:
scaler = StandardScaler()
score_variables_standardized = score_variables.copy().set_index('fips')
score_variables_standardized[score_variables_standardized.columns] = scaler.fit_transform(score_variables_standardized)

# Cap z-scores at +/-4 standard deviations to limit the influence of outliers
boundary = 4
score_variables_standardized = score_variables_standardized.clip(lower=-boundary, upper=boundary)
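
To illustrate what the standardize-then-cap step does to a single column, here is a small standard-library sketch on made-up data. The function name `standardize_and_clip` and the toy values are assumptions for illustration. It uses the population standard deviation, which is the convention `StandardScaler` follows.

```python
from statistics import fmean, pstdev


def standardize_and_clip(values, boundary=4.0):
    """Convert values to z-scores, then cap them to [-boundary, boundary].

    Assumes the values are not all identical (the standard deviation must be non-zero).
    """
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation (ddof=0)
    return [max(-boundary, min(boundary, (v - mean) / std)) for v in values]


# Toy column: 29 ordinary block groups and one extreme outlier.
toy_column = [0] * 29 + [100]
z_scores = standardize_and_clip(toy_column)

print(z_scores[0])   # z-score of an ordinary value
print(z_scores[-1])  # the outlier, capped at the boundary
```

Without the cap, the outlier would sit more than five standard deviations from the mean and could dominate the summed score.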


# Use PuLP to calculate the score 


In [13]:
# Set coefficients
dem_coef = 1
pop_den_coef = 1
foot_traf_coef = 1
comp_coef = 1
opp_coef = 1
aff_coef = 1
crime_coef = 1
growth_pot_coef = 1

In [14]:
# PuLP does not allow spaces in problem names, so use an underscore
problem = LpProblem("Opportunity_Problem", LpMaximize)

demographics = LpVariable("demographics", score_variables_standardized['demographics'].min(), score_variables_standardized['demographics'].max())
population_density = LpVariable("population_density", score_variables_standardized['population_density'].min(), score_variables_standardized['population_density'].max())
foot_traffic = LpVariable("foot_traffic", score_variables_standardized['foot_traffic'].min(), score_variables_standardized['foot_traffic'].max())
competition = LpVariable("competition", score_variables_standardized['competition'].min(), score_variables_standardized['competition'].max())
opportunity = LpVariable("opportunity", score_variables_standardized['opportunity'].min(), score_variables_standardized['opportunity'].max())
affordability = LpVariable("affordability", score_variables_standardized['affordability'].min(), score_variables_standardized['affordability'].max())
crime = LpVariable("crime", score_variables_standardized['crime'].min(), score_variables_standardized['crime'].max())
growth_potential = LpVariable("growth_potential", score_variables_standardized['growth_potential'].min(), score_variables_standardized['growth_potential'].max())


problem += (dem_coef*demographics) + (pop_den_coef*population_density) + (foot_traf_coef*foot_traffic) - (comp_coef*competition) + (opp_coef*opportunity) + (aff_coef*affordability) - (crime_coef*crime) + (growth_pot_coef*growth_potential)
        

problem.writeLP("OpportunityCompetition.lp")
problem.solve()

print("Status:", LpStatus[problem.status])

# Print each variable with its resolved optimum value
for v in problem.variables():
    print(v.name, "=", v.varValue)

# Print the optimized objective function value
print("Optimal objective value = ", value(problem.objective))

Welcome to the CBC MILP Solver 
Version: 2.10.3 
Build Date: Dec 15 2019 

command line - /Users/jessica.allen/PycharmProjects/capstone_again/venv/lib/python3.9/site-packages/pulp/solverdir/cbc/osx/64/cbc /var/folders/q1/22xbrl4x06zbl30hwjxzsfm80000gr/T/72b060f03d194719b78bd4cdbc41e35a-pulp.mps max timeMode elapsed branch printingOptions all solution /var/folders/q1/22xbrl4x06zbl30hwjxzsfm80000gr/T/72b060f03d194719b78bd4cdbc41e35a-pulp.sol (default strategy 1)
At line 2 NAME          MODEL
At line 3 ROWS
At line 5 COLUMNS
At line 14 RHS
At line 15 BOUNDS
At line 32 ENDATA
Problem MODEL has 0 rows, 8 columns and 0 elements
Coin0008I MODEL read with 0 errors
Option for timeMode changed from cpu to elapsed
Empty problem - 0 rows, 8 columns and 0 elements
Optimal - objective value 19.703332
Optimal objective 19.70333185 - 0 iterations time 0.002
Option for printingOptions changed from normal to all
Total time (CPU seconds):       0.00   (Wallclock seconds):       0.00

Status: Optimal
affo
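
The solver log reports 0 rows, meaning the problem has no constraints beyond the variable bounds. In that case the optimum can be read off directly: each variable sits at its upper bound when its signed coefficient is positive and at its lower bound when it is negative. The sketch below shows this with hypothetical bounds; the function name `box_lp_optimum` and the numbers are illustrative, not taken from the data.

```python
def box_lp_optimum(bounds, signed_coefs):
    """Maximize sum(c_i * x_i) subject only to lo_i <= x_i <= hi_i.

    bounds maps a variable name to (lo, hi); signed_coefs maps a name to c_i.
    """
    solution = {}
    for name, (lo, hi) in bounds.items():
        # A positive coefficient pushes the variable up, a negative one pushes it down.
        solution[name] = hi if signed_coefs[name] >= 0 else lo
    objective = sum(signed_coefs[name] * x for name, x in solution.items())
    return solution, objective


# Hypothetical standardized ranges for two categories.
toy_bounds = {"foot_traffic": (-1.0, 4.0), "crime": (-0.5, 4.0)}
# Crime is subtracted in the objective, so its signed coefficient is negative.
toy_coefs = {"foot_traffic": 1.0, "crime": -1.0}

solution, objective = box_lp_optimum(toy_bounds, toy_coefs)
print(solution)
print(objective)
```

A solver such as PuLP becomes necessary once constraints that link the variables are added; for bounds alone, this closed form gives the same optimum.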



In [15]:
score_variables_standardized['score'] = (
    (dem_coef * score_variables_standardized['demographics'])
    + (pop_den_coef * score_variables_standardized['population_density'])
    + (foot_traf_coef * score_variables_standardized['foot_traffic'])
    - (comp_coef * score_variables_standardized['competition'])
    + (opp_coef * score_variables_standardized['opportunity'])
    + (aff_coef * score_variables_standardized['affordability'])
    - (crime_coef * score_variables_standardized['crime'])
    + (growth_pot_coef * score_variables_standardized['growth_potential'])
)
score_variables_standardized['distance_from_optimal_score'] = value(problem.objective) - score_variables_standardized['score']

In [16]:
score_variables_standardized.sort_values('distance_from_optimal_score').head(10)

fips,demographics,population_density,foot_traffic,competition,opportunity,affordability,crime,growth_potential,score,distance_from_optimal_score
60730157013,1.067218,3.491881,-0.041536,-0.411435,-0.267071,4.0,-0.412021,0.430551,9.504499,10.198833
60730157051,0.396429,2.067344,0.26769,-0.107965,-0.000777,4.0,-0.719371,0.406237,7.964257,11.739074
60730083634,0.155246,4.0,0.334913,-0.980442,0.847799,0.489502,-0.538577,0.427462,7.77394,11.929392
60730009022,0.260763,2.015658,2.037898,-0.713357,1.584182,2.983194,0.600427,-1.288506,7.70612,11.997212
60730204041,1.074755,0.613797,-0.422875,-0.668843,0.315151,4.0,-0.918245,0.493919,7.661836,12.041496
60730096033,0.984312,1.506359,4.0,0.35251,0.31515,1.145736,0.907777,0.476974,7.168244,12.535088
60730053022,-1.110963,3.821388,3.113467,-0.396938,1.584181,-0.626098,0.600427,0.473971,7.052457,12.650875
60730157061,0.728055,1.67529,0.119799,-0.225413,-0.536232,4.0,-0.321624,0.515627,7.049576,12.653756
60730083632,-0.259287,4.0,2.934205,0.313893,0.847779,-1.566701,-0.809768,0.501511,6.953383,12.749949
60730083753,-0.131158,4.0,-0.113241,-0.411435,1.584195,0.073886,-0.520497,0.519784,6.865398,12.837934


In [17]:
final_df = score_variables_standardized.reset_index()
final_df = final_df[['fips', 'distance_from_optimal_score']].dropna()


scoremin = final_df['distance_from_optimal_score'].min()
scoremax = final_df['distance_from_optimal_score'].max()

# Min-max scale the distance to the range [0, 1]
final_df['score'] = (final_df['distance_from_optimal_score'] - scoremin) / (scoremax - scoremin)

# Invert so that a smaller distance from the optimum yields a higher score
final_df['score'] = 1 - final_df['score']
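
The rescaling above can be checked by hand with a small sketch. The function name `invert_min_max` is hypothetical; the three distances are the minimum, one sample row, and the maximum reported in the outputs that follow.

```python
def invert_min_max(distances):
    """Scale distances to [0, 1] and invert, so the closest point scores 1."""
    lo, hi = min(distances), max(distances)
    if hi == lo:
        raise ValueError("all distances are equal; the scale is undefined")
    return [1 - (d - lo) / (hi - lo) for d in distances]


# Minimum, one sample block group, and maximum distance from the notebook outputs.
distances = [10.198833, 15.672823, 30.381718]
scores = invert_min_max(distances)

for d, s in zip(distances, scores):
    print(d, round(s, 6))
```

The middle value corresponds to block group 60730100101, whose score in the final table is 0.728781.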

In [18]:
final_df.describe()

,distance_from_optimal_score,score
count,2053.0,2053.0
mean,19.747357,0.5269
std,2.439892,0.120889
min,10.198833,0.0
25%,18.130991,0.453312
50%,19.734753,0.527524
75%,21.23258,0.606986
max,30.381718,1.0


In [19]:
final_df.head(5)

,fips,distance_from_optimal_score,score
0,60730100101,15.672823,0.728781
1,60730100102,21.130467,0.458371
2,60730100103,16.317914,0.696818
3,60730100111,20.246783,0.502155
4,60730150012,22.281426,0.401345
