### Objective:
* Get aggregated data from the `ca_business` table in the nourish database. The businesses of interest are `Fast food restaurant` and `Convenience store`.
* Create a layer using the aggregated business count per zip code. We can do this by joining the counts with our base California zip code feature layer.

Iterations: strategies to identify food deserts:
1. Total Fast Food Restaurants and Convenience Stores per zip/sqmi/(pop/10000) - Done
    * Count - 22655
2. Total Fast Food Restaurants and Convenience Stores per zip/(pop/10000) - Done
3. Total number of HPF-serving businesses (pizza restaurants, ice cream shops, liquor stores, etc.) / total food-serving businesses.
    * Find all the HPF-serving businesses in the `ca_business` table.
        * Liquor store
        * Ice cream shop
        * Pizza restaurant
        * Pizza Takeout
        * Sports bar
        * Brewpub
        * Brewery
        * Candy store
        * Hamburger restaurant
        * Chicken wings restaurant
        * Dessert shop
        * Donut shop
        * Cocktail bar
        * Wine bar
        * Wine store
        
    * Find the total number of food-serving businesses.
        * Single filter on categories: 39171 businesses.
        * Filter on categories, then filter on names: 47535 businesses.
4. Total number of HPF-serving businesses (fast food restaurants, convenience stores, pizza restaurants, ice cream shops, liquor stores, etc.) / total nutritional-food-serving businesses.
5. Total number of HPF-serving businesses (pizza restaurants, ice cream shops, liquor stores, etc.) / total food-serving businesses, but within a radius of maybe X miles from a zip polygon [once we figure out querying features within a distance].
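
Iteration 3 above reduces to a per-zip ratio. The following is a minimal sketch of that ratio in pandas, not the notebook's actual implementation. It assumes a hypothetical frame that already holds only food-serving businesses, with a `zip` column and a list-valued `categories` column. The helper name `hpf_ratio_by_zip` and the sample rows are made up for illustration; the HPF set is copied from the list above.

```python
import pandas as pd

# HPF categories taken from the list above.
HPF_CATEGORIES = {
    "Liquor store", "Ice cream shop", "Pizza restaurant", "Pizza Takeout",
    "Sports bar", "Brewpub", "Brewery", "Candy store", "Hamburger restaurant",
    "Chicken wings restaurant", "Dessert shop", "Donut shop", "Cocktail bar",
    "Wine bar", "Wine store",
}

def hpf_ratio_by_zip(businesses: pd.DataFrame) -> pd.Series:
    """Return HPF businesses / all food-serving businesses for each zip.

    Assumes `businesses` already holds only food-serving businesses, with a
    `zip` column and a list-valued `categories` column (hypothetical schema).
    """
    is_hpf = businesses["categories"].apply(
        lambda cats: bool(HPF_CATEGORIES.intersection(cats))
    )
    # The mean of a boolean column per group is the share of True rows.
    return is_hpf.groupby(businesses["zip"]).mean().rename("hpf_ratio")

# Made-up sample rows, for illustration only.
sample = pd.DataFrame({
    "zip": ["91901", "91901", "91901", "91902"],
    "categories": [
        ["Pizza restaurant"],
        ["Grocery store"],
        ["Donut shop", "Cafe"],
        ["Grocery store"],
    ],
})
ratios = hpf_ratio_by_zip(sample)
```

A ratio has the advantage of needing no population or area data, but it depends entirely on how "food-serving business" is filtered, which is why the two counts above (39171 vs. 47535) matter.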


In [1]:
import pandas as pd
import psycopg2
import arcgis
from arcgis.gis import GIS
from arcgis.features import FeatureLayer, FeatureLayerCollection
import sys
sys.path.append('../../')
sys.path.append('../../../')
from gis_resources import san_diego_county_zips
from utils import get_config

In [2]:
username = get_config("arcgis","username")
password = get_config("arcgis","passkey")
gis = GIS("https://ucsdonline.maps.arcgis.com/home", username=username, password=password)

<configparser.ConfigParser object at 0x105aded60>
<configparser.ConfigParser object at 0x105adebe0>


In [3]:
nourish_user = get_config("nourish_db","username")
nourish_pswd = get_config("nourish_db","passkey")

<configparser.ConfigParser object at 0x12acd2760>
<configparser.ConfigParser object at 0x12af5dd60>


In [4]:
conn = psycopg2.connect(
    host="awesome-hw.sdsc.edu",
    database="nourish",
    user=nourish_user,
    password=nourish_pswd)


In [5]:
# 'Ice cream shop' = any(categories) OR
# 'Pizza restaurant' = any(categories) OR
# 'Sports bar' = any(categories) OR 
# 'Brewpub' = any(categories) OR 
# 'Brewery' = any(categories) OR 
# 'Candy store' = any(categories) OR
# 'Hamburger restaurant' = any(categories)

In [6]:
%%time
# create a cursor
cur = conn.cursor()

query_str = """select zip, count(*)
             as count, array_agg(distinct name) as restaurants
            from ca_business
            where ('Fast food restaurant' = any(categories) OR
                   'Convenience store' = any(categories) OR
                    'Ice cream shop' = any(categories) OR
                    'Pizza restaurant' = any(categories) OR
                    'Sports bar' = any(categories) OR 
                    'Brewpub' = any(categories) OR 
                    'Brewery' = any(categories) OR 
                    'Candy store' = any(categories) OR
                    'Hamburger restaurant' = any(categories)
                   )
            group by zip
            order by count desc, zip"""

cur.execute(query_str)


# Fetch every aggregated row: (zip, count, business names)
ca_business_agg_result = cur.fetchall()

# Close the cursor; the connection itself stays open
cur.close()

CPU times: user 6.28 ms, sys: 3.2 ms, total: 9.48 ms
Wall time: 467 ms
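
The query above repeats `= any(categories)` once per category. A possible alternative, sketched here under the assumption that `categories` is a PostgreSQL `text[]` column, is the array overlap operator `&&`, which is true when a row shares at least one element with a list passed as a query parameter. The helper name `build_category_query` is hypothetical; the sketch only builds the SQL and parameters and does not touch the database.

```python
def build_category_query(categories):
    """Return (sql, params) counting businesses per zip for the given categories.

    Assumption: ca_business.categories is a text[] column, so the array
    overlap operator (&&) is true when a row shares any category with the list.
    """
    sql = """
        select zip, count(*) as count, array_agg(distinct name) as restaurants
        from ca_business
        where categories && %s::text[]
        group by zip
        order by count desc, zip
    """
    # psycopg2 adapts a Python list to a PostgreSQL array parameter.
    return sql, (list(categories),)

hpf_categories = [
    "Fast food restaurant", "Convenience store", "Ice cream shop",
    "Pizza restaurant", "Sports bar", "Brewpub", "Brewery",
    "Candy store", "Hamburger restaurant",
]
sql, params = build_category_query(hpf_categories)
# With an open connection: cur.execute(sql, params); rows = cur.fetchall()
```

Keeping the category list in Python makes it easier to switch between the iterations listed at the top without editing the SQL text.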


In [7]:
dataFrame = pd.DataFrame(ca_business_agg_result,
                         columns=("zip_code", "biz_cnt", "biz_names"))
dataFrame['biz_names']=dataFrame.biz_names.apply(lambda x: ';'.join(x))
dataFrame = dataFrame.dropna(subset=['zip_code'])
#dataFrame

In [8]:
# Convert zip_code to int and then to str so it matches the string ZIP_CODE column of the base layer
dataFrame['zip_code'] = dataFrame['zip_code'].astype(int)
dataFrame['zip_code'] = dataFrame['zip_code'].astype(str)
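
The int-then-str cast above works for California zips, but it would raise on non-numeric values and would drop leading zeros for zips in other states. A more defensive variant might look like the sketch below; the helper name `normalize_zip_codes` and the sample values are hypothetical.

```python
import pandas as pd

def normalize_zip_codes(zips: pd.Series) -> pd.Series:
    """Return 5-character zip strings; values that are not numeric become <NA>."""
    numeric = pd.to_numeric(zips, errors="coerce")
    # Int64 is pandas' nullable integer dtype, so missing values survive the cast.
    as_text = numeric.astype("Int64").astype("string")
    # Left-pad with zeros so that e.g. 2134 becomes "02134".
    return as_text.str.zfill(5)

# Made-up values: a float, a string, a zip with a leading zero, and a missing value.
raw = pd.Series([91901.0, "92101", "2134", None])
clean = normalize_zip_codes(raw)
```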

In [9]:
#dataFrame

Let's get the California zip code layer and join it with our counts to create a new, updated layer.  
`TODO`: Ideally we would update the layer in place, but I do not know the process yet, so we will not touch the base layer for now.

`Base California zip code layer`: California Zip Codes 1.2

`FL URL`: https://services1.arcgis.com/eGSDp8lpKe5izqVc/arcgis/rest/services/ae9a0c/FeatureServer

In [10]:
flc = FeatureLayerCollection(gis=gis,
                             url="https://services1.arcgis.com/eGSDp8lpKe5izqVc/arcgis/rest/services/ae9a0c/FeatureServer")

In [11]:
fs = flc.layers[0].query()

In [12]:
zip_cd_base_sdf = fs.sdf
print(f"Shape: {zip_cd_base_sdf.shape}")
#zip_cd_base_sdf.head(4)


Shape: (1721, 11)


In [13]:
updated_sdf=pd.merge(zip_cd_base_sdf,dataFrame, left_on='ZIP_CODE',right_on='zip_code', how='left')
print(f"Shape: {updated_sdf.shape}")
#updated_sdf.head(4)

Shape: (1721, 14)
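
The row count staying at 1721 suggests the left join did not duplicate any base rows. One way to check this explicitly, sketched here on made-up stand-in frames rather than the real layer, is to pass `validate` and `indicator` to `pd.merge`.

```python
import pandas as pd

# Made-up stand-ins for the base zip layer and the aggregated counts.
base = pd.DataFrame({"ZIP_CODE": ["91901", "91902", "92536"]})
counts = pd.DataFrame({"zip_code": ["91901", "91902"], "biz_cnt": [11, 8]})

merged = pd.merge(
    base, counts,
    left_on="ZIP_CODE", right_on="zip_code",
    how="left",
    validate="one_to_one",  # raise if a zip appears more than once on either side
    indicator=True,         # adds a _merge column: 'both' or 'left_only'
)

# Base zips that found no matching row in the counts.
unmatched = merged.loc[merged["_merge"] == "left_only", "ZIP_CODE"].tolist()
```

Rows marked `left_only` are the zips that end up with a missing `biz_cnt` after the join.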


In [14]:
# Keep only the San Diego County zips
sd_zips = san_diego_county_zips()
updated_sdf = updated_sdf[updated_sdf['ZIP_CODE'].isin(sd_zips)]

In [15]:
# Number of zips with none of our businesses of interest
updated_sdf['biz_cnt'].isna().sum()

14

In [16]:
# Number of zips with at least one of our businesses of interest
updated_sdf['biz_cnt'].notna().sum()

95

In [17]:
# Removing some unnecessary columns
updated_sdf = updated_sdf.drop(['FID','zip_code','Shape__Are','Shape__Area','Shape__Len','Shape__Length','OBJECTID'], axis=1)
updated_sdf.head(4)

Unnamed: 0,ZIP_CODE,PO_NAME,STATE,SQMI,SHAPE,biz_cnt,biz_names
340,91901,Alpine,CA,98.71,"{'rings': [[[-12980686.298148, 3868599.6407556...",11.0,Carl's Jr.;Chevron;CVS;KFC;Little Caesars Pizz...
341,91902,Bonita,CA,12.87,"{'rings': [[[-13020538.2306131, 3851695.198877...",8.0,7-Eleven;Colima's Mexican Foods;Cotijas Taco S...
342,91905,Boulevard,CA,130.91,"{'rings': [[[-12936268.4075063, 3850327.339341...",1.0,Mountain Top Market & Gas
343,91906,Campo,CA,155.88,"{'rings': [[[-12952835.6198649, 3855658.061591...",4.0,Cameron Corners;Circle K;Sinclair;Subway


In [18]:
# Checking unique Zip codes.
updated_sdf['ZIP_CODE'].unique().size

109

In [19]:
# Get the population of each zip code from the demographics data
zip_demo_data = pd.read_csv('../../output/San_Diego_County/zip5/esri_demographics.csv')
zip_demo_data = zip_demo_data[['std_geography_id','totpop_cy']]
zip_demo_data['std_geography_id'] = zip_demo_data['std_geography_id'].astype(str)
zip_demo_data

Unnamed: 0,std_geography_id,totpop_cy
0,91910,79329.0
1,91911,88646.0
2,91913,54728.0
3,91914,17394.0
4,91915,34756.0
...,...,...
109,92152,298.0
110,92154,88330.0
111,92155,1399.0
112,92161,65.0


In [20]:
# Get the total population from our zip-code-level demographics feature enrichment.
final_merged = pd.merge(updated_sdf,zip_demo_data, left_on='ZIP_CODE',right_on='std_geography_id', how='left')
final_merged

Unnamed: 0,ZIP_CODE,PO_NAME,STATE,SQMI,SHAPE,biz_cnt,biz_names,std_geography_id,totpop_cy
0,91901,Alpine,CA,98.71,"{'rings': [[[-12980686.298148, 3868599.6407556...",11.0,Carl's Jr.;Chevron;CVS;KFC;Little Caesars Pizz...,91901,17778.0
1,91902,Bonita,CA,12.87,"{'rings': [[[-13020538.2306131, 3851695.198877...",8.0,7-Eleven;Colima's Mexican Foods;Cotijas Taco S...,91902,17713.0
2,91905,Boulevard,CA,130.91,"{'rings': [[[-12936268.4075063, 3850327.339341...",1.0,Mountain Top Market & Gas,91905,1780.0
3,91906,Campo,CA,155.88,"{'rings': [[[-12952835.6198649, 3855658.061591...",4.0,Cameron Corners;Circle K;Sinclair;Subway,91906,3860.0
4,91910,Chula Vista,CA,11.93,"{'rings': [[[-13032484.0253008, 3849772.356766...",52.0,7-Eleven;ampm;Burger King;Canada Steak Burger;...,91910,79329.0
...,...,...,...,...,...,...,...,...,...
104,92155,San Diego,CA,0.23,"{'rings': [[[-13041507.8170711, 3852542.901068...",1.0,Subway,92155,1399.0
105,92173,San Ysidro,CA,3.59,"{'rings': [[[-13027361.2248301, 3834756.011745...",27.0,7-Eleven;Burger King;Carl's Jr.;CARL'S JR. 110...,92173,29792.0
106,92182,San Diego,CA,0.28,"{'rings': [[[-13031764.7900486, 3866068.797411...",2.0,Chipotle Mexican Grill;The Habit Burger Grill,92182,112.0
107,92536,Aguanga,CA,94.93,"{'rings': [[[-13004733.3119469, 3972594.856524...",,,92536,3893.0


In [21]:
# Check a zip with minimum population
final_merged.loc[final_merged['totpop_cy'].idxmin()]

ZIP_CODE                                                        92096
PO_NAME                                                    San Marcos
STATE                                                              CA
SQMI                                                             0.35
SHAPE               {'rings': [[[-13041416.4237302, 3912832.573622...
biz_cnt                                                           NaN
biz_names                                                         NaN
std_geography_id                                                92096
totpop_cy                                                         0.0
Name: 66, dtype: object

In [22]:
# For example, if a region has a population of 1000 people and 50 fast food
# restaurants (FFR) in a total area of 25 square miles, the density is
# 50 / 25 = 2 FFR per square mile. To normalize this by population, divide 2 by
# 1000, which gives 0.002 FFR per square mile per person.
#
# To express this per 1000 people instead, divide the population by 1000 first:
# 2 / (1000 / 1000) = 2 FFR per square mile per 1000 people.
def normalize_zip(num_biz, sqmi, pop, norm_pop):
    return num_biz/sqmi/(pop/norm_pop)

def normalize_zip_wo_sqmi(num_biz, sqmi, pop, norm_pop):
    return num_biz/(pop/norm_pop)
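
As a worked check of the two formulas, here is a small standalone sketch. The function names are hypothetical variants of the ones above, with an added guard for zero population (an assumption, not something the notebook does). The Alpine inputs (11 businesses, 98.71 square miles, 17778 people) come from the merged table shown earlier.

```python
import math

def density_per_population(num_biz, pop, norm_pop=10_000):
    """Businesses per `norm_pop` residents; NaN when the population is zero."""
    if pop == 0:
        return math.nan
    return num_biz / (pop / norm_pop)

def density_per_sqmi_per_population(num_biz, sqmi, pop, norm_pop=10_000):
    """Businesses per square mile per `norm_pop` residents."""
    return density_per_population(num_biz, pop, norm_pop) / sqmi

# The example from the comment above: 50 businesses, 25 sq mi, 1000 people.
example = density_per_sqmi_per_population(50, 25, 1000, norm_pop=1000)

# Alpine (91901): 11 businesses, 98.71 sq mi, 17778 people.
alpine = density_per_population(11, 17778)
alpine_sqmi = density_per_sqmi_per_population(11, 98.71, 17778)
```

Here `alpine` is about 6.19 businesses per 10k people, which matches the `n_density` value the notebook reports for 91901.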


In [23]:
final_merged[final_merged['totpop_cy']==0.0]

Unnamed: 0,ZIP_CODE,PO_NAME,STATE,SQMI,SHAPE,biz_cnt,biz_names,std_geography_id,totpop_cy
66,92096,San Marcos,CA,0.35,"{'rings': [[[-13041416.4237302, 3912832.573622...",,,92096,0.0


In [24]:
# Remove zips with zero population: the normalization divides by population, so they would yield inf or NaN
final_merged = final_merged[final_merged['totpop_cy']>0.0]

In [25]:
## Calculate the normalized density of businesses per 10k people (without the square-mile normalization).
final_merged['n_density'] = final_merged.apply(lambda x: normalize_zip_wo_sqmi(x.biz_cnt, x.SQMI, x.totpop_cy, 10000), axis=1)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  final_merged['n_density'] = final_merged.apply(lambda x: normalize_zip_wo_sqmi(x.biz_cnt, x.SQMI, x.totpop_cy, 10000), axis=1)
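
The warning above most likely appears because `final_merged` was produced by a boolean filter in the previous cell and a column was then assigned to it. One common way to avoid it is to take an explicit copy after filtering. The sketch below uses made-up rows standing in for `final_merged`, and also shows a vectorized form of the same calculation.

```python
import pandas as pd

# Made-up rows standing in for final_merged.
frame = pd.DataFrame({
    "ZIP_CODE": ["91901", "92096", "92182"],
    "biz_cnt": [11.0, None, 2.0],
    "totpop_cy": [17778.0, 0.0, 112.0],
})

# .copy() makes the filtered result an independent frame, so adding a column
# to it is unambiguous.
populated = frame[frame["totpop_cy"] > 0.0].copy()

# Vectorized equivalent of applying normalize_zip_wo_sqmi row by row.
populated["n_density"] = populated["biz_cnt"] / (populated["totpop_cy"] / 10_000)
```

The original `frame` is left untouched, and the new column lives only on `populated`.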


In [26]:
# std_geography_id is nothing but zip code, so removing it.
final_merged = final_merged.drop(['std_geography_id'], axis=1)

In [27]:
final_merged

Unnamed: 0,ZIP_CODE,PO_NAME,STATE,SQMI,SHAPE,biz_cnt,biz_names,totpop_cy,n_density
0,91901,Alpine,CA,98.71,"{'rings': [[[-12980686.298148, 3868599.6407556...",11.0,Carl's Jr.;Chevron;CVS;KFC;Little Caesars Pizz...,17778.0,6.187423
1,91902,Bonita,CA,12.87,"{'rings': [[[-13020538.2306131, 3851695.198877...",8.0,7-Eleven;Colima's Mexican Foods;Cotijas Taco S...,17713.0,4.516457
2,91905,Boulevard,CA,130.91,"{'rings': [[[-12936268.4075063, 3850327.339341...",1.0,Mountain Top Market & Gas,1780.0,5.617978
3,91906,Campo,CA,155.88,"{'rings': [[[-12952835.6198649, 3855658.061591...",4.0,Cameron Corners;Circle K;Sinclair;Subway,3860.0,10.362694
4,91910,Chula Vista,CA,11.93,"{'rings': [[[-13032484.0253008, 3849772.356766...",52.0,7-Eleven;ampm;Burger King;Canada Steak Burger;...,79329.0,6.554980
...,...,...,...,...,...,...,...,...,...
104,92155,San Diego,CA,0.23,"{'rings': [[[-13041507.8170711, 3852542.901068...",1.0,Subway,1399.0,7.147963
105,92173,San Ysidro,CA,3.59,"{'rings': [[[-13027361.2248301, 3834756.011745...",27.0,7-Eleven;Burger King;Carl's Jr.;CARL'S JR. 110...,29792.0,9.062836
106,92182,San Diego,CA,0.28,"{'rings': [[[-13031764.7900486, 3866068.797411...",2.0,Chipotle Mexican Grill;The Habit Burger Grill,112.0,178.571429
107,92536,Aguanga,CA,94.93,"{'rings': [[[-13004733.3119469, 3972594.856524...",,,3893.0,


In [28]:
%%time
# Convert the SEDF back into a feature layer collection and publish it on AGOL
feature_layer_collection = final_merged.spatial.to_featurelayer(title="San Diego HPF Normalized Density Per 10K w/o SQMI Norm", 
                                                         gis=gis, 
                                                         folder='nourish_gis',
                                                         tags=['San Diego HPF Normalized Density'],
                                                        )
feature_layer_collection

CPU times: user 250 ms, sys: 26.1 ms, total: 276 ms
Wall time: 10.9 s
