### Objective:
* Get aggregated data from the `ca_business` table in the nourish DB, restricted to businesses in the `Fast food restaurant` and `Convenience store` categories.
* Create a layer with the aggregated business count per ZIP code by joining the aggregates onto our base California ZIP code feature layer.

In [6]:
import pandas as pd
import geopandas
import matplotlib.pyplot as plt
import sqlalchemy as sal
import psycopg2
import arcgis
from arcgis.gis import GIS
from arcgis.features import FeatureLayer, FeatureLayerCollection
import sys
sys.path.append('../../')
from gis_resources import san_diego_county_zips
import os
from utils import get_config
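The `utils.get_config` helper is local to this project and its source is not shown. The `<configparser.ConfigParser ...>` lines it prints suggest it wraps Python's `configparser`. A hypothetical equivalent might look like the sketch below; the INI layout, the `config.ini` file name and the function body are all assumptions, and it drops the stray print of the parser object.

```python
import configparser
from pathlib import Path


def get_config(section: str, key: str, path: str = "config.ini") -> str:
    """Hypothetical stand-in for utils.get_config: return one value from an INI file."""
    config_path = Path(path)
    # ConfigParser.read() silently ignores missing files, so fail loudly instead
    if not config_path.is_file():
        raise FileNotFoundError(f"Config file not found: {config_path}")
    # interpolation=None returns values verbatim, e.g. passwords containing '%'
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(config_path)
    # Raises KeyError if the section or the key is missing
    return parser[section][key]
```

With a file holding a `[nourish_db]` section (`username`, `passkey`) and an `[arcgis]` section (`clientid`), calls such as `get_config("nourish_db", "username")` would return plain strings.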

In [58]:
gis = GIS("https://ucsdonline.maps.arcgis.com/home", client_id=get_config("arcgis","clientid"))

In [59]:
nourish_user = get_config("nourish_db","username")
nourish_pswd = get_config("nourish_db","passkey")


In [60]:
conn = psycopg2.connect(
    host="awesome-hw.sdsc.edu",
    database="nourish",
    user=nourish_user,
    password=nourish_pswd)


In [45]:
%%time
# Create a cursor
cur = conn.cursor()

# Per ZIP code: number of fast food restaurants and convenience stores, plus their
# distinct names (the "restaurants" alias also includes convenience stores)
query_str = """select zip, count(*)
             as count, array_agg(distinct name) as restaurants
            from ca_business
            where ('Fast food restaurant' = any(categories) OR
                   'Convenience store' = any(categories))
            group by zip
            order by count desc, zip"""
# Variant: keep only ZIP codes with a greater-than-average count, where the average is
# (matching businesses) / (distinct ZIP codes in all of ca_business), an integer division
# query_str = """select zip, count(*)
#              as count, array_agg(distinct name) as restaurants
#             from ca_business
#             where ('Fast food restaurant' = any(categories) OR
#                    'Convenience store' = any(categories))
#             group by zip
#             having count(*)
#              > (
#                 select count(*)/(select count(distinct zip) from ca_business)
#                 from ca_business
#                 where ('Fast food restaurant' = any(categories) OR
#                    'Convenience store' = any(categories))
#                 )
#             order by count desc, zip"""

# Execute the query
cur.execute(query_str)

# Fetch all aggregated rows as a list of (zip, count, names) tuples
ca_business_agg_result = cur.fetchall()

# Close the cursor (the connection itself stays open)
cur.close()

CPU times: user 34.4 ms, sys: 35.1 ms, total: 69.5 ms
Wall time: 453 ms
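To make the semantics of the query concrete, here is a small pandas mirror of the same aggregation, run on a hypothetical in-memory sample (the column names `zip`, `name` and `categories` follow the query; the rows are invented). It is only a way to check the logic, not a replacement for running the query in PostgreSQL. Two differences are worth knowing: pandas `groupby` drops null zips by default while SQL `GROUP BY` keeps them as a group (the notebook drops that group later anyway), and `sorted()` uses code-point order, so the name order may differ from what the database returns.

```python
import pandas as pd

# Categories used in the WHERE clause of the query
TARGET_CATEGORIES = {"Fast food restaurant", "Convenience store"}


def aggregate_by_zip(businesses: pd.DataFrame) -> pd.DataFrame:
    """Per zip: count matching businesses and collect their distinct names."""
    # A row matches if any of its categories is a target; a null array never matches
    mask = businesses["categories"].apply(
        lambda cats: bool(TARGET_CATEGORIES.intersection(cats if cats is not None else []))
    )
    matched = businesses[mask]

    rows = []
    # groupby drops null zips by default, unlike SQL GROUP BY
    for zip_code, group in matched.groupby("zip"):
        rows.append({
            "zip": zip_code,
            "count": len(group),                        # count(*)
            "restaurants": sorted(set(group["name"])),  # array_agg(distinct name)
        })
    result = pd.DataFrame(rows, columns=["zip", "count", "restaurants"])
    # order by count desc, zip
    return result.sort_values(["count", "zip"], ascending=[False, True]).reset_index(drop=True)


sample = pd.DataFrame({
    "zip": [92101, 92101, 92101, 92376, 92376, 90001],
    "name": ["7-Eleven", "7-Eleven", "Burger King", "Shell", "Corner Cafe", "Gym"],
    "categories": [["Convenience store"], ["Convenience store", "Gas station"],
                   ["Fast food restaurant"], ["Convenience store"], ["Cafe"], None],
})
out = aggregate_by_zip(sample)
```

Here `92101` gets a count of 3 (two 7-Eleven rows plus Burger King) but only two distinct names, which is exactly the `count(*)` versus `array_agg(distinct name)` distinction in the SQL.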


In [53]:
dataFrame = pd.DataFrame(ca_business_agg_result,
                         columns=["zip_code", "business_count", "business_names"])
# Flatten each array of business names into one ';'-separated string
dataFrame['business_names'] = dataFrame.business_names.apply(lambda x: ';'.join(x))
# Drop the group whose zip is NULL
dataFrame = dataFrame.dropna(subset=['zip_code'])
dataFrame

|   | zip_code | business_count | business_names |
|---|---|---|---|
| 0 | 92376.0 | 71 | 7-Eleven;ampm;Baker's Drive-Thru;Burger King;C... |
| 1 | 91761.0 | 70 | 7-Eleven;Alberto's Mexican Food;ampm;Andy's Bu... |
| 2 | 92345.0 | 69 | 7-Eleven;ampm;Arby's;Baker's Drive-Thru;Best F... |
| 3 | 92101.0 | 67 | 7-Eleven;7-Eleven - Closed;BBQ Boss;Best Damn ... |
| 4 | 92335.0 | 67 | 76;7-Eleven;ampm;Baker's Drive-Thru;Burger Kin... |
| ... | ... | ... | ... |
| 1361 | 96109.0 | 1 | Shell |
| 1362 | 96128.0 | 1 | Shell |
| 1363 | 96137.0 | 1 | Dollar General |
| 1364 | 96141.0 | 1 | Obexer's General Store |
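The commented-out "greater than average" variant of the query can also be applied client-side to an aggregated frame like `dataFrame`. A minimal sketch follows; the function name and the `n_all_zips` parameter are hypothetical. Note that the SQL divides by the number of distinct ZIP codes in the *whole* `ca_business` table (not only those with matches), so that number has to be supplied separately, and the SQL total also includes matching businesses with a null zip, which `dataFrame` has already dropped, so the threshold can come out slightly different.

```python
import pandas as pd


def above_average_zips(agg: pd.DataFrame, n_all_zips: int) -> pd.DataFrame:
    """Keep zips whose business_count exceeds total matches // n_all_zips.

    agg        -- one row per zip with a business_count column (like dataFrame)
    n_all_zips -- distinct non-null zips in the whole ca_business table, which is
                  what the commented-out SQL divides by (an input you must supply)
    """
    total = int(agg["business_count"].sum())
    # PostgreSQL bigint / bigint truncates; for non-negative counts, // gives the same result
    threshold = total // n_all_zips
    return agg[agg["business_count"] > threshold]


agg = pd.DataFrame({"zip_code": ["92376", "91761", "96109"], "business_count": [10, 3, 2]})
top = above_average_zips(agg, n_all_zips=6)  # threshold = 15 // 6 = 2
```

Filtering in pandas keeps the full result around for other uses; filtering in SQL transfers fewer rows. Either way the integer division matters: with a true mean of 2.5 the threshold is 2, so a ZIP code with exactly 3 businesses still passes.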


In [54]:
# Convert zip_code from float (e.g. 92376.0) to int and then to str,
# so it can be joined to the base layer's ZIP_CODE field
dataFrame['zip_code'] = dataFrame['zip_code'].astype(int).astype(str)
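Why do the ZIP codes come back as floats in the first place? One plausible explanation, assuming `ca_business.zip` is an integer column: the NULL-zip group arrives from psycopg2 as `None`, and pandas stores an integer column containing a missing value as `float64` with `NaN`. The hypothetical rows below reproduce that and show why the NULL row must be dropped before `astype(int)`, which fails on `NaN`.

```python
import pandas as pd

# Hypothetical rows shaped like cur.fetchall() output; one group has a NULL zip (None)
records = [
    (92376, 71, ["7-Eleven", "ampm"]),
    (None, 5, ["Shell"]),
    (96141, 1, ["Obexer's General Store"]),
]
df = pd.DataFrame(records, columns=["zip_code", "business_count", "business_names"])

# The single None forces the column to float64 (None becomes NaN),
# which matches the 92376.0-style values in the output above
raw_dtype = df["zip_code"].dtype

# Drop the NULL zip first (astype(int) would fail on NaN), then float -> int -> str
df = df.dropna(subset=["zip_code"]).assign(
    zip_code=lambda d: d["zip_code"].astype(int).astype(str)
)
```

Another option, if the schema allows it, is to cast `zip` to text in the SQL itself so pandas never sees a float.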

In [55]:
dataFrame.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1365 entries, 0 to 1365
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   zip_code        1365 non-null   object
 1   business_count  1365 non-null   int64 
 2   business_names  1365 non-null   object
dtypes: int64(1), object(2)
memory usage: 42.7+ KB


In [56]:
dataFrame

|   | zip_code | business_count | business_names |
|---|---|---|---|
| 0 | 92376 | 71 | 7-Eleven;ampm;Baker's Drive-Thru;Burger King;C... |
| 1 | 91761 | 70 | 7-Eleven;Alberto's Mexican Food;ampm;Andy's Bu... |
| 2 | 92345 | 69 | 7-Eleven;ampm;Arby's;Baker's Drive-Thru;Best F... |
| 3 | 92101 | 67 | 7-Eleven;7-Eleven - Closed;BBQ Boss;Best Damn ... |
| 4 | 92335 | 67 | 76;7-Eleven;ampm;Baker's Drive-Thru;Burger Kin... |
| ... | ... | ... | ... |
| 1361 | 96109 | 1 | Shell |
| 1362 | 96128 | 1 | Shell |
| 1363 | 96137 | 1 | Dollar General |
| 1364 | 96141 | 1 | Obexer's General Store |


Next, fetch the California ZIP code layer and join it with the aggregated counts to create a new layer.  
`TODO`: Ideally we would update the existing layer in place, but that process is not worked out yet, so the base layer is left untouched for now.

`Base California ZIP code layer`: California Zip Codes 1.2

`FL URL`: https://services1.arcgis.com/eGSDp8lpKe5izqVc/arcgis/rest/services/ae9a0c/FeatureServer

In [61]:
flc = FeatureLayerCollection(gis=gis,
                             url="https://services1.arcgis.com/eGSDp8lpKe5izqVc/arcgis/rest/services/ae9a0c/FeatureServer")

In [62]:
fs = flc.layers[0].query()

In [63]:
zip_cd_base_sdf = fs.sdf
print(f"Shape: {zip_cd_base_sdf.shape}")
zip_cd_base_sdf.head(4)


Shape: (1721, 11)


|   | FID | OBJECTID | ZIP_CODE | PO_NAME | STATE | SQMI | Shape__Are | Shape__Len | Shape__Area | Shape__Length | SHAPE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1.0 | 12 | Mt Meadows Area | CA | 30.92 | 862157443.22168 | 195388.61918 | 137603279.910156 | 78041.226964 | {"rings": [[[-13452238.6668297, 4902283.104334... |
| 1 | 2 | 2.0 | 16 | Sequoia National Forest | CA | 39.33 | 1096295677.53442 | 169790.572348 | 154003969.648438 | 63651.498113 | {"rings": [[[-13184703.8666724, 4239963.437913... |
| 2 | 3 | 3.0 | 17 | Northeast Fresno County | CA | 564.38 | 15734145627.6488 | 873109.20835 | 2313715393.57422 | 334630.782309 | {"rings": [[[-13221974.1887042, 4503848.451224... |
| 3 | 4 | 4.0 | 18 | Los Padres Ntl Forest | CA | 90.83 | 2532221635.86206 | 294311.333366 | 347407381.199219 | 109006.006242 | {"rings": [[[-13226734.2102427, 4104576.874935... |


In [64]:
# Left join: keep every ZIP code polygon; polygons with no matching businesses get NaN
updated_sdf = pd.merge(zip_cd_base_sdf, dataFrame,
                       left_on='ZIP_CODE', right_on='zip_code', how='left')
print(f"Shape: {updated_sdf.shape}")
updated_sdf.head(4)

Shape: (1721, 14)


|   | FID | OBJECTID | ZIP_CODE | PO_NAME | STATE | SQMI | Shape__Are | Shape__Len | Shape__Area | Shape__Length | SHAPE | zip_code | business_count | business_names |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1.0 | 12 | Mt Meadows Area | CA | 30.92 | 862157443.22168 | 195388.61918 | 137603279.910156 | 78041.226964 | {'rings': [[[-13452238.6668297, 4902283.104334... |  |  |  |
| 1 | 2 | 2.0 | 16 | Sequoia National Forest | CA | 39.33 | 1096295677.53442 | 169790.572348 | 154003969.648438 | 63651.498113 | {'rings': [[[-13184703.8666724, 4239963.437913... |  |  |  |
| 2 | 3 | 3.0 | 17 | Northeast Fresno County | CA | 564.38 | 15734145627.6488 | 873109.20835 | 2313715393.57422 | 334630.782309 | {'rings': [[[-13221974.1887042, 4503848.451224... |  |  |  |
| 3 | 4 | 4.0 | 18 | Los Padres Ntl Forest | CA | 90.83 | 2532221635.86206 | 294311.333366 | 347407381.199219 | 109006.006242 | {'rings': [[[-13226734.2102427, 4104576.874935... |  |  |  |


In [65]:
# Sanity check: number of ZIP code polygons with no matching businesses (NaN count)
updated_sdf['business_count'].isna().sum()

397
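The 397 above counts ZIP polygons with no matching business. A left join hides the opposite mismatch: business ZIP codes that are missing from the layer are dropped silently. Assuming each ZIP_CODE appears once in the layer, 1721 - 397 = 1324 polygons matched, which would leave 1365 - 1324 = 41 business ZIP codes dropped. An outer merge with `indicator=True` lists both kinds explicitly; here it is on hypothetical miniature tables.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables being joined
layer = pd.DataFrame({"ZIP_CODE": ["12", "92101", "92376"]})
counts = pd.DataFrame({"zip_code": ["92101", "92376", "99999"],
                       "business_count": [67, 71, 3]})

# how="outer" keeps unmatched rows from both sides; indicator=True adds a _merge
# column with the values "left_only", "right_only" or "both"
check = pd.merge(layer, counts, left_on="ZIP_CODE", right_on="zip_code",
                 how="outer", indicator=True)

# Polygons with no businesses (the NaN rows of the left join)
no_businesses = check.loc[check["_merge"] == "left_only", "ZIP_CODE"].tolist()
# Business ZIP codes that have no polygon (dropped by the left join)
not_in_layer = check.loc[check["_merge"] == "right_only", "zip_code"].tolist()
```

On the real data, the same call with `zip_cd_base_sdf[['ZIP_CODE']]` and `dataFrame` would show which of the business ZIP codes never reach the published layer.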

In [66]:
# zip_code duplicates ZIP_CODE after the join, so drop it
updated_sdf = updated_sdf.drop(columns=['zip_code'])
updated_sdf.head(4)

|   | FID | OBJECTID | ZIP_CODE | PO_NAME | STATE | SQMI | Shape__Are | Shape__Len | Shape__Area | Shape__Length | SHAPE | business_count | business_names |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1.0 | 12 | Mt Meadows Area | CA | 30.92 | 862157443.22168 | 195388.61918 | 137603279.910156 | 78041.226964 | {'rings': [[[-13452238.6668297, 4902283.104334... |  |  |
| 1 | 2 | 2.0 | 16 | Sequoia National Forest | CA | 39.33 | 1096295677.53442 | 169790.572348 | 154003969.648438 | 63651.498113 | {'rings': [[[-13184703.8666724, 4239963.437913... |  |  |
| 2 | 3 | 3.0 | 17 | Northeast Fresno County | CA | 564.38 | 15734145627.6488 | 873109.20835 | 2313715393.57422 | 334630.782309 | {'rings': [[[-13221974.1887042, 4503848.451224... |  |  |
| 3 | 4 | 4.0 | 18 | Los Padres Ntl Forest | CA | 90.83 | 2532221635.86206 | 294311.333366 | 347407381.199219 | 109006.006242 | {'rings': [[[-13226734.2102427, 4104576.874935... |  |  |
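After the left join, `business_count` is a float column with NaN for the 397 polygons that had no match, and `business_names` is NaN there too. Whether to publish those as nulls (as the notebook does) or as explicit zeros is a judgment call: zeros render cleanly on a choropleth, but they assert "no such businesses" where the source data may simply have gaps. If zeros are preferred, a minimal sketch on a hypothetical two-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical slice of the joined frame: one unmatched polygon, one matched
merged = pd.DataFrame({
    "ZIP_CODE": ["12", "92101"],
    "business_count": [np.nan, 67.0],
    "business_names": [np.nan, "7-Eleven;BBQ Boss"],
})

# Treat "no matching business" as an explicit zero and an empty name list
merged["business_count"] = merged["business_count"].fillna(0).astype(int)
merged["business_names"] = merged["business_names"].fillna("")
```

On the real data the same two assignments would be applied to `updated_sdf` before publishing; casting to `int` also stops the count from being published as a floating-point field.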


In [68]:
%%time
# Publish the spatially enabled DataFrame (SEDF) to ArcGIS Online as a new hosted feature layer
feature_layer_collection = updated_sdf.spatial.to_featurelayer(
    title="California Business Fast Food - Convenience Store Count 3.0",
    gis=gis,
    folder='nourish_gis',
    tags=['California Business Fast Food and Convenience Store Count'],
)
feature_layer_collection

CPU times: user 3.46 s, sys: 265 ms, total: 3.73 s
Wall time: 36.9 s
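The notebook publishes a brand-new hosted layer instead of updating the base layer in place. If the in-place update is attempted later, one route is the ArcGIS API for Python's `FeatureLayer.edit_features(updates=...)`, which takes a list of features whose `attributes` include the layer's object-ID field. The hypothetical helper below only builds that list from merged rows and makes no network calls. It assumes the object-ID field is `FID` (the frame also has an `OBJECTID` column, so check the layer's `objectIdField` property first) and that `business_count` and `business_names` fields have already been added to the layer's schema.

```python
import math
from typing import Any


def _is_missing(value: Any) -> bool:
    """True for None or NaN (what unmatched rows carry after the left join)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def build_updates(rows: list[dict[str, Any]], oid_field: str = "FID") -> list[dict[str, Any]]:
    """Build an applyEdits-style 'updates' list: one {"attributes": {...}} dict per feature.

    oid_field is an assumption; geometry is omitted so only attributes would change.
    """
    updates = []
    for row in rows:
        count = row.get("business_count")
        # Skip polygons with no matching businesses
        if _is_missing(count):
            continue
        names = row.get("business_names")
        updates.append({
            "attributes": {
                oid_field: int(row[oid_field]),
                "business_count": int(count),
                "business_names": "" if _is_missing(names) else names,
            }
        })
    return updates
```

Hypothetical usage, not run here: `updates = build_updates(updated_sdf.to_dict("records"))`, then `flc.layers[0].edit_features(updates=updates)`, possibly split into smaller batches for roughly 1,300 features.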
