### Objective:
* Get aggregated data from the `ca_business` table in the nourish database. The businesses of interest are `fast food restaurants` and `Convenience store`.
* Create a layer from the aggregated count of each business type by joining the counts with our base California ZIP code feature layer.

Strategy to identify food deserts:

* Total number of HPF-serving businesses (pizza restaurants, ice cream shops, liquor stores, etc.) / total food-serving businesses.
    * Find all the HPF-serving businesses in the `ca_business` table. Example categories:
        * Liquor store
        * Ice cream shop
        * Pizza restaurant
        * Pizza Takeout
        * Sports bar
        * Brewpub
        * Brewery
        * Candy store
        * Hamburger restaurant
        * Chicken wings restaurant
        * Dessert shop
        * Donut shop
        * Cocktail bar
        * Wine bar
        * Wine store
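
The ratio described above can be sketched on a toy table. This is a minimal illustration, not the notebook's actual query: the sample rows, the `HPF_CATEGORIES` set, and the column names `zip` and `category` are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical sample rows; the real data lives in the ca_business table.
businesses = pd.DataFrame({
    "zip": ["92101", "92101", "92101", "92102", "92102"],
    "category": ["Liquor store", "Grocery store", "Pizza restaurant",
                 "Donut shop", "Supermarket"],
})

# A few categories taken from the list above (subset chosen for illustration).
HPF_CATEGORIES = {"Liquor store", "Ice cream shop", "Pizza restaurant", "Donut shop"}

# Flag HPF businesses, then take the share of flagged rows per ZIP code.
businesses["is_hpf"] = businesses["category"].isin(HPF_CATEGORIES)
hpf_share = businesses.groupby("zip")["is_hpf"].mean()
print(hpf_share)
```

The mean of a boolean column is the fraction of `True` rows, so each value is (HPF businesses) / (total food businesses) for that ZIP code.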
      


In [1]:
import pandas as pd

import psycopg2

import arcgis
from arcgis.gis import GIS
from arcgis.features import FeatureLayer, FeatureLayerCollection

import sys
sys.path.append('../../')
sys.path.append('../../../')

from gis_resources import execute_sql, create_where_clause, san_diego_county_zips, read_exact_food_biz_categories, read_exact_unhealthy_food_biz_categories
from utils import get_config

In [2]:
username = get_config("arcgis","username")
password = get_config("arcgis","passkey")
gis = GIS("https://ucsdonline.maps.arcgis.com/home", username=username, password=password)

<configparser.ConfigParser object at 0x106a3c0d0>
<configparser.ConfigParser object at 0x106a3cb50>


In [3]:
nourish_user = get_config("nourish_db","username")
nourish_pswd = get_config("nourish_db","passkey")

<configparser.ConfigParser object at 0x106a3ca60>
<configparser.ConfigParser object at 0x106a3c340>


In [4]:
conn = psycopg2.connect(
    host="awesome-hw.sdsc.edu",
    database="nourish",
    user=nourish_user,
    password=nourish_pswd)


#### Finding all the food businesses in ca_business

In [5]:
# qry_where = create_where_clause(filtered_food["categories"].tolist())
qry_where = create_where_clause(read_exact_food_biz_categories())
qry = "WITH store_names AS " \
            "(SELECT DISTINCT name AS dist_names " \
             "FROM ca_business " \
        f"{qry_where}, " \
            "hpf_stores AS " \
                "(SELECT * FROM ca_business INNER JOIN store_names ON ca_business.name = store_names.dist_names) " \
        "SELECT zip, count(*) as count, array_agg(distinct name) as restaurants " \
        "FROM hpf_stores " \
        "group by zip " \
        "order by count desc, zip;"

In [6]:
foodDF = pd.DataFrame(execute_sql(conn, qry),
                      columns=("zip_code", "biz_cnt", "biz_names"))
# Join each ZIP code's array of business names into one semicolon-separated string
foodDF['biz_names'] = foodDF.biz_names.apply(lambda x: ';'.join(x))
# Drop rows with NaN in column 'zip_code'
foodDF = foodDF.dropna(subset=['zip_code'])

# Convert float to int to remove the decimal places
foodDF['zip_code'] = foodDF['zip_code'].astype(int)
# Convert to string
foodDF['zip_code'] = foodDF['zip_code'].astype(str)
foodDF

Unnamed: 0,zip_code,biz_cnt,biz_names
0,92101,932,1010 Caffe;10 Barrel Brewing San Diego;12th & ...
1,94110,717,20 Spot;23rd liquor store;23rd & Mission Produ...
2,94103,608,1601 Bar & Kitchen;18 Rabbits;3rd Street Tap R...
4,94109,499,1608 Bistro;1760;707 Sutter;721 Lounge;800 Lar...
5,94102,492,20th Century Cafe;2G Japanese Brasserie;398 Br...
...,...,...,...
1811,98887,1,Grumpy Bears Retreat
1812,98940,1,A Mi Estilo Mexican Restaurant
1813,98960,1,Reyes Market
1814,99321,1,Queen Bean Caffé Crafton Hills College
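
The index in the output above jumps from 2 to 4 because `dropna` removes the row whose ZIP code is missing but keeps the original index labels. The sketch below reproduces the cleaning steps on made-up rows (the values are assumptions, not query results) and shows the same gap.

```python
import pandas as pd

# Made-up rows standing in for the query result; one ZIP code is missing.
raw = pd.DataFrame({
    "zip_code": [92101.0, None, 94110.0],
    "biz_cnt": [932, 5, 717],
})

# Drop rows without a ZIP code; .copy() avoids chained-assignment warnings.
clean = raw.dropna(subset=["zip_code"]).copy()

# float -> int drops the ".0", int -> str gives a join-friendly key.
clean["zip_code"] = clean["zip_code"].astype(int).astype(str)

print(clean)
print(clean.index.tolist())  # index label 1 is gone
```

Calling `clean.reset_index(drop=True)` would renumber the rows if a contiguous index is preferred.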


#### Finding all the unhealthy businesses

In [7]:
qry_where = create_where_clause(read_exact_unhealthy_food_biz_categories())
qry = "WITH store_names AS " \
            "(SELECT DISTINCT name AS dist_names " \
             "FROM ca_business " \
        f"{qry_where}, " \
            "hpf_stores AS " \
                "(SELECT * FROM ca_business INNER JOIN store_names ON ca_business.name = store_names.dist_names) " \
        "SELECT zip, count(*) as count, array_agg(distinct name) as restaurants " \
        "FROM hpf_stores " \
        "WHERE name NOT IN (SELECT DISTINCT name FROM ca_business WHERE 'Supermarket' = any(categories)) " \
        "group by zip " \
        "order by count desc, zip;"

In [8]:
UHFoodDF = pd.DataFrame(execute_sql(conn, qry),
                        columns=("zip_code", "biz_cnt", "biz_names"))
# Join each ZIP code's array of business names into one semicolon-separated string
UHFoodDF['biz_names'] = UHFoodDF.biz_names.apply(lambda x: ';'.join(x))
# Drop rows with NaN in column 'zip_code'
UHFoodDF = UHFoodDF.dropna(subset=['zip_code'])

# Convert float to int to remove the decimal places
UHFoodDF['zip_code'] = UHFoodDF['zip_code'].astype(int)
# Convert to string
UHFoodDF['zip_code'] = UHFoodDF['zip_code'].astype(str)


UHFoodDF

Unnamed: 0,zip_code,biz_cnt,biz_names
0,92101,490,10 Barrel Brewing San Diego;1919;1st and Ivy M...
1,94103,267,3rd Street Tap Room;54 Mint Ristorante Italian...
2,90028,263,101 Coffee Shop;25 Degrees;33 Taps Hollywood;7...
3,94110,253,20 Spot;24th Street Bar;49'ERS Liquors & Groce...
4,94102,229,2G Japanese Brasserie;620 Jones;7-Eleven;Abe's...
...,...,...,...
1608,96124,1,Lakeside Bar
1609,96128,1,Shell
1610,96134,1,Stateline General Store and Liquor
1611,97701,1,St Helena Wine Co
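
Both queries above rely on `create_where_clause` from `gis_resources`, whose implementation is not shown in this notebook. As a hedged alternative, the sketch below passes the category list as a bound parameter instead of formatting it into the SQL string. The helper name `build_category_filter` is hypothetical, and the sketch assumes `categories` is a PostgreSQL text array, which the `'Supermarket' = any(categories)` condition above suggests.

```python
def build_category_filter(categories):
    """Return a WHERE fragment and its parameters (hypothetical helper).

    The fragment uses PostgreSQL's array-overlap operator (&&), so a row
    matches when it shares at least one category with the given list.
    """
    # De-duplicate and trim the input so the parameter list is stable.
    cleaned = sorted({c.strip() for c in categories if c and c.strip()})
    if not cleaned:
        raise ValueError("at least one category is required")
    # psycopg2 adapts a Python list to a PostgreSQL ARRAY for the %s placeholder.
    return "WHERE categories && %s::text[]", (cleaned,)


clause, params = build_category_filter(
    ["Pizza restaurant", " Liquor store ", "Pizza restaurant"]
)
print(clause)
print(params)

# Possible usage with an open psycopg2 cursor `cur` (not executed here):
# cur.execute(f"SELECT zip, count(*) FROM ca_business {clause} GROUP BY zip", params)
```

Binding the list as a parameter leaves quoting to the driver, which matters for names containing apostrophes.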


In [9]:
finalDF = foodDF.merge(UHFoodDF, on='zip_code', how='left')
finalDF = finalDF.rename(columns={'biz_cnt_x':'totfdbizct','biz_cnt_y':'uhfdbizct','biz_names_y':'uhfdbiz'})
finalDF['uhfdbizct'] = finalDF['uhfdbizct'].fillna(0)
finalDF

Unnamed: 0,zip_code,totfdbizct,biz_names_x,uhfdbizct,uhfdbiz
0,92101,932,1010 Caffe;10 Barrel Brewing San Diego;12th & ...,490.0,10 Barrel Brewing San Diego;1919;1st and Ivy M...
1,94110,717,20 Spot;23rd liquor store;23rd & Mission Produ...,253.0,20 Spot;24th Street Bar;49'ERS Liquors & Groce...
2,94103,608,1601 Bar & Kitchen;18 Rabbits;3rd Street Tap R...,267.0,3rd Street Tap Room;54 Mint Ristorante Italian...
3,94109,499,1608 Bistro;1760;707 Sutter;721 Lounge;800 Lar...,213.0,1760;721 Lounge;800 Larkin;Ace's;Amelie San Fr...
4,94102,492,20th Century Cafe;2G Japanese Brasserie;398 Br...,229.0,2G Japanese Brasserie;620 Jones;7-Eleven;Abe's...
...,...,...,...,...,...
1810,98887,1,Grumpy Bears Retreat,1.0,Grumpy Bears Retreat
1811,98940,1,A Mi Estilo Mexican Restaurant,0.0,
1812,98960,1,Reyes Market,0.0,
1813,99321,1,Queen Bean Caffé Crafton Hills College,0.0,
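
The left join keeps every ZIP code from `foodDF`. ZIP codes without unhealthy businesses get NaN before `fillna(0)`, which is why `uhfdbizct` is shown as a float (for example 490.0). The sketch below, on made-up rows, shows how the `validate` and `indicator` options of `DataFrame.merge` could be used to check the join. The sample values and the final cast to `int` are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

# Made-up stand-ins for foodDF and UHFoodDF.
food = pd.DataFrame({"zip_code": ["92101", "98940"], "biz_cnt": [932, 1]})
unhealthy = pd.DataFrame({"zip_code": ["92101"], "biz_cnt": [490]})

merged = food.merge(
    unhealthy,
    on="zip_code",
    how="left",
    validate="one_to_one",  # raises MergeError if a ZIP code repeats on either side
    indicator=True,         # adds a _merge column: "both" or "left_only"
)
merged = merged.rename(columns={"biz_cnt_x": "totfdbizct", "biz_cnt_y": "uhfdbizct"})

# Missing counts mean "no unhealthy businesses"; cast back to int after filling.
merged["uhfdbizct"] = merged["uhfdbizct"].fillna(0).astype(int)
print(merged)
```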


In [10]:
# Percentage of all food businesses that are unhealthy food businesses
def unhealthy_food_biz_pct(uhfoodcnt, totfoodcnt):
    """Return uhfoodcnt as a percentage of totfoodcnt, rounded to 2 decimal places."""
    return round(uhfoodcnt * 100 / totfoodcnt, 2)

In [11]:
finalDF['pctuhfdbiz'] = finalDF.apply(lambda x: unhealthy_food_biz_pct(x.uhfdbizct, x.totfdbizct), axis=1)
finalDF

Unnamed: 0,zip_code,totfdbizct,biz_names_x,uhfdbizct,uhfdbiz,pctuhfdbiz
0,92101,932,1010 Caffe;10 Barrel Brewing San Diego;12th & ...,490.0,10 Barrel Brewing San Diego;1919;1st and Ivy M...,52.58
1,94110,717,20 Spot;23rd liquor store;23rd & Mission Produ...,253.0,20 Spot;24th Street Bar;49'ERS Liquors & Groce...,35.29
2,94103,608,1601 Bar & Kitchen;18 Rabbits;3rd Street Tap R...,267.0,3rd Street Tap Room;54 Mint Ristorante Italian...,43.91
3,94109,499,1608 Bistro;1760;707 Sutter;721 Lounge;800 Lar...,213.0,1760;721 Lounge;800 Larkin;Ace's;Amelie San Fr...,42.69
4,94102,492,20th Century Cafe;2G Japanese Brasserie;398 Br...,229.0,2G Japanese Brasserie;620 Jones;7-Eleven;Abe's...,46.54
...,...,...,...,...,...,...
1810,98887,1,Grumpy Bears Retreat,1.0,Grumpy Bears Retreat,100.00
1811,98940,1,A Mi Estilo Mexican Restaurant,0.0,,0.00
1812,98960,1,Reyes Market,0.0,,0.00
1813,99321,1,Queen Bean Caffé Crafton Hills College,0.0,,0.00
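
The row-wise `apply` above can also be written as a single column expression. The sketch below is one possible vectorized form on made-up rows. The zero-total row and its handling are assumptions added for illustration, since every ZIP code in `finalDF` has at least one food business at this point.

```python
import numpy as np
import pandas as pd

# Made-up rows; the last one has no food businesses at all.
df = pd.DataFrame({
    "totfdbizct": [932, 1, 0],
    "uhfdbizct": [490.0, 0.0, 0.0],
})

# Replace zero totals with NaN so the division yields NaN instead of inf.
total = df["totfdbizct"].replace(0, np.nan)
df["pctuhfdbiz"] = (df["uhfdbizct"] * 100 / total).round(2)
print(df)
```

The first row gives 490 * 100 / 932, which rounds to 52.58 and matches the value computed for ZIP code 92101 above.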


Let's get the California ZIP code layer and join it with our data to create a new layer.  
`TODO`: Ideally we would update the existing layer, but I do not know the process yet, so we leave the base layer untouched for now.

`Base California ZIP code layer`: California Zip Codes 1.2

`FL URL`: https://services1.arcgis.com/eGSDp8lpKe5izqVc/arcgis/rest/services/ae9a0c/FeatureServer

In [12]:
flc = FeatureLayerCollection(gis=gis,
                             url="https://services1.arcgis.com/eGSDp8lpKe5izqVc/arcgis/rest/services/ae9a0c/FeatureServer")

In [13]:
fs = flc.layers[0].query()

In [14]:
zip_cd_base_sdf = fs.sdf
print(f"Shape: {zip_cd_base_sdf.shape}")
#zip_cd_base_sdf.head(4)


Shape: (1721, 11)


In [15]:
finalDF=pd.merge(zip_cd_base_sdf,finalDF, left_on='ZIP_CODE',right_on='zip_code', how='left')
print(f"Shape: {finalDF.shape}")
#finalDF.head(4)

Shape: (1721, 17)


In [16]:
# Optionally keep only San Diego County ZIP codes
# sd_zips = san_diego_county_zips()
# finalDF = finalDF[finalDF['ZIP_CODE'].isin(sd_zips)]

In [17]:
# Number of base-layer ZIP codes with no matching food businesses (NaN after the left join)
finalDF['uhfdbizct'].isna().sum()

95

In [18]:
# Number of base-layer ZIP codes with at least one food business
finalDF['uhfdbizct'].notna().sum()

1626

In [19]:
# Removing some unnecessary columns
finalDF = finalDF.drop(['FID','zip_code','Shape__Are','Shape__Area','Shape__Len','Shape__Length','OBJECTID','biz_names_x'], axis=1)
finalDF.head(4)

Unnamed: 0,ZIP_CODE,PO_NAME,STATE,SQMI,SHAPE,totfdbizct,uhfdbizct,uhfdbiz,pctuhfdbiz
0,12,Mt Meadows Area,CA,30.92,"{'rings': [[[-13452238.6668297, 4902283.104334...",,,,
1,16,Sequoia National Forest,CA,39.33,"{'rings': [[[-13184703.8666724, 4239963.437913...",,,,
2,17,Northeast Fresno County,CA,564.38,"{'rings': [[[-13221974.1887042, 4503848.451224...",,,,
3,18,Los Padres Ntl Forest,CA,90.83,"{'rings': [[[-13226734.2102427, 4104576.874935...",,,,


In [20]:
# Checking unique Zip codes.
finalDF['ZIP_CODE'].unique().size

1721

In [21]:
%%time
# Convert the SEDF back into a feature layer collection and publish it on AGOL
feature_layer_collection = finalDF.spatial.to_featurelayer(title="CA Unhealthy Food Businesses Percentage By Zip", 
                                                         gis=gis, 
                                                         folder='nourish_gis',
                                                         tags=['CA Unhealthy Food Businesses Percentage By Zip'],
                                                        )
feature_layer_collection

CPU times: user 3.2 s, sys: 170 ms, total: 3.37 s
Wall time: 54 s
