### Motivation
* This notebook explores USA Census block group boundaries from the 2020 Census.
We will check whether we can use the existing feature layer published by Esri:  
Feature Layer: https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/USA_Census_BlockGroups/FeatureServer  
Feature Layer Details: https://esri.maps.arcgis.com/home/item.html?id=2f5e592494d243b0aa5c253e75e792a4
* From this layer, we select the block groups for San Diego County by FIPS code  
* We then enrich the DataFrame with consumer spending and other variables

In [15]:
from arcgis.gis import GIS
from arcgis.features import FeatureLayerCollection, FeatureLayer
from arcgis.geoenrichment import enrich, Country
from feature_layer_utils import get_enrichment_variables
import itertools
import sys
sys.path.append('../')
from utils import get_config

In [2]:
app_id = get_config("arcgis","clientid")
gis = GIS("https://ucsdonline.maps.arcgis.com/home", client_id=app_id)

<configparser.ConfigParser object at 0x1095ed760>
Please sign in to your GIS and paste the code that is obtained below.
If a web browser does not automatically open, please navigate to the URL below yourself instead.
Opening web browser to navigate to: https://ucsdonline.maps.arcgis.com/sharing/rest/oauth2/authorize?response_type=code&client_id=Elm5V3upnnV17Q3r&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&state=edVj3JFOZWdnbrH4rX6Oc171dfELEg&allow_verification=false


Enter code obtained on signing in using SAML:  ········




#### Selecting only San Diego County block groups from US block groups

In [3]:
fls = "https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/USA_Census_BlockGroups/FeatureServer"

In [4]:
flc = FeatureLayerCollection(fls)

In [5]:
fl_url = flc.layers[0].url

In [6]:
us_block_grp_fl = FeatureLayer(fl_url)

We select `COUNTY_FIPS='073'`, the county FIPS code for San Diego County, together with `STATE_FIPS='06'` (California).
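
FIPS codes are zero-padded strings (`'073'`, not `73`), so the filter has to quote and pad them. The sketch below builds such a where clause for one or more counties. The helper name `county_where_clause` is hypothetical, and the use of SQL `IN` assumes the layer accepts standard SQL where clauses; the field names `STATE_FIPS` and `COUNTY_FIPS` are the ones used in this notebook.

```python
def county_where_clause(state_fips, county_fips_codes):
    """Build a where clause selecting one or more counties within a state.

    FIPS codes are strings with significant leading zeros, so integer
    inputs are zero-padded (2 digits for states, 3 for counties).
    """
    state = str(state_fips).zfill(2)
    counties = ", ".join(f"'{str(c).zfill(3)}'" for c in county_fips_codes)
    return f"STATE_FIPS='{state}' AND COUNTY_FIPS IN ({counties})"


# San Diego County only, then San Diego and Imperial counties together.
print(county_where_clause("06", ["073"]))
print(county_where_clause(6, [73, 25]))
```

The resulting string could be passed as the `where` argument of `FeatureLayer.query`, in place of the hand-written f-string.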

In [7]:
san_diego_county_fip = '073'
imperial_county_fip = '025'

In [8]:
san_diego_county_block_grp_fs = us_block_grp_fl.query(where=f"STATE_FIPS='06' AND COUNTY_FIPS='{san_diego_county_fip}'")

In [9]:
san_diego_county_block_grp_fs_sdf = san_diego_county_block_grp_fs.sdf
print(f"Shape: {san_diego_county_block_grp_fs_sdf.shape}")
san_diego_county_block_grp_fs_sdf.head(4)

Shape: (2057, 14)


Unnamed: 0,OBJECTID,STATE_ABBR,STATE_FIPS,COUNTY_FIPS,STCOFIPS,TRACT_FIPS,BLOCKGROUP_FIPS,FIPS,POPULATION,POP_SQMI,SQMI,Shape__Area,Shape__Length,SHAPE
0,29002,CA,6,73,6073,100,1,60730001001,1197,4788.0,0.25,6.3e-05,0.04273,"{""rings"": [[[-117.188940909064, 32.75880776425..."
1,29003,CA,6,73,6073,100,2,60730001002,1711,5032.4,0.34,8.4e-05,0.052974,"{""rings"": [[[-117.187159908283, 32.75685976468..."
2,29004,CA,6,73,6073,201,1,60730002011,877,4385.0,0.2,4.9e-05,0.036512,"{""rings"": [[[-117.168410904985, 32.75683776569..."
3,29005,CA,6,73,6073,202,1,60730002021,1295,8633.3,0.15,3.8e-05,0.036458,"{""rings"": [[[-117.172296905513, 32.74893776444..."


#### Enrich the San Diego County feature set with variables

In [10]:
market_potential_variables = get_enrichment_variables('market_potential')
demographics_variables = get_enrichment_variables('demographics')
business_variables = get_enrichment_variables('business')

Parsing [Esri Market Potential Data] for market_potential segment!!
	Number of Variables: 1264
Parsing [Esri Demographics] for demographics segment!!
	Number of Variables: 1923
Parsing [Esri Business Data] for business segment!!
	Number of Variables: 99


The Consumer Spending variables listed in the professor's sheet do not cover all the features we need, so we pass the entire `food` data collection instead.

In [11]:
food_data_collection = 'food'

For enrichment, we can pass either the feature set or the spatially enabled DataFrame (SEDF).
We pass the SEDF, since we have already inspected it above.

In [18]:
san_diego_block_groups_enriched_df = enrich(study_areas=san_diego_county_block_grp_fs_sdf,
                                           data_collections=[food_data_collection],
                                           analysis_variables=itertools.chain(*[market_potential_variables,demographics_variables,business_variables]),
                                           gis=gis,
                                       )
print(f"Shape: {san_diego_block_groups_enriched_df.shape}")
san_diego_block_groups_enriched_df.head(4)

Shape: (2057, 4242)


Unnamed: 0,objectid,state_abbr,state_fips,county_fips,stcofips,tract_fips,blockgroup_fips,fips,totpop_cy,pop_sqmi,...,x1130fy_x,x1130fy_a,x1130fy_i,x1002fy_x,x1002fy_a,x1002fy_i,x1003fy_x,x1003fy_a,x1003fy_i,SHAPE
0,29002,CA,6,73,6073,100,1,60730001001,1197,4788.0,...,5477103.0,11293.0,225.0,12946134.0,26693.06,218.0,7469031.0,15400.06,214.0,"{""rings"": [[[-117.18894090906399, 32.758807764..."
1,29003,CA,6,73,6073,100,2,60730001002,1711,5032.4,...,8314027.0,11843.34,236.0,19996145.0,28484.54,233.0,11682118.0,16641.19,231.0,"{""rings"": [[[-117.18715990828298, 32.756859764..."
2,29004,CA,6,73,6073,201,1,60730002011,877,4385.0,...,3146681.0,7167.84,143.0,7516044.0,17120.83,140.0,4369363.0,9952.99,138.0,"{""rings"": [[[-117.168410904985, 32.75683776569..."
3,29005,CA,6,73,6073,202,1,60730002021,1295,8633.3,...,4743282.0,6854.45,137.0,11028782.0,15937.55,130.0,6285500.0,9083.09,126.0,"{""rings"": [[[-117.17229690551298, 32.748937764..."


In [20]:
san_diego_block_groups_enriched_df.to_csv('../resources/full_enriched_san_diego_county_block_groups.csv', index=False)

While converting the SEDF to a feature layer, the code threw `ValueError: Columns must be same length as key`.
On further investigation, I found two columns with the same name, `totpop_cy`.
So we drop one of them and keep the other before creating a feature layer.

In [33]:
list_cols = list(san_diego_block_groups_enriched_df.T.index.values)
dupl_cols = list(set([x for i,x in enumerate(list_cols) if list_cols.count(x) > 1]))
dupl_cols

['totpop_cy']

In [36]:
len(list_cols)

4241

In [37]:
len(dupl_cols)

1

In [35]:
import collections
print([item for item, count in collections.Counter(list_cols).items() if count > 1])

['totpop_cy']


If I used `san_diego_block_groups_enriched_df_cleaned = san_diego_block_groups_enriched_df.loc[:,~san_diego_block_groups_enriched_df.T.duplicated(keep='last')]`, I would also lose every column that has a different name but the same data as another column. So I use the workaround below to remove only the column with the duplicated name.
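
To make the difference concrete, here is a small sketch on a toy DataFrame (the data and the names `by_value` and `by_name` are made up for illustration). It contrasts de-duplicating by column values with de-duplicating by column name via `DataFrame.columns.duplicated`, which is a possible alternative to the manual drop-and-join used in this notebook, not what the cells here actually run.

```python
import pandas as pd

# Toy frame: two columns share the name "totpop_cy" (different values),
# while "a" and "b" have different names but identical values.
df = pd.DataFrame(
    [[1, 10, 10, 5], [2, 20, 20, 6]],
    columns=["totpop_cy", "a", "b", "totpop_cy"],
)

# Names that occur more than once.
dup_names = df.columns[df.columns.duplicated()].unique().tolist()

# Value-based de-duplication: drops "a" (same data as "b") but keeps
# both "totpop_cy" columns, because their values differ.
by_value = df.loc[:, ~df.T.duplicated(keep="last").to_numpy()]

# Name-based de-duplication: keeps only the last column for each name.
by_name = df.loc[:, ~df.columns.duplicated(keep="last")]

print(dup_names)
print(list(by_value.columns))
print(list(by_name.columns))
```

With `keep="last"`, the name-based mask retains the second `totpop_cy` column, which matches the choice made with `tail(1)` in the workaround.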

In [51]:
san_diego_block_groups_enriched_df['totpop_cy'].T

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,2047,2048,2049,2050,2051,2052,2053,2054,2055,2056
totpop_cy,1197.0,1711.0,877.0,1295.0,913.0,1261.0,920.0,960.0,804.0,957.0,...,1139.0,6010.0,1837.0,2004.0,2183.0,448.0,1753.0,1736.0,2115.0,1420.0
totpop_cy,1199.0,1692.0,902.0,1283.0,911.0,1251.0,911.0,996.0,841.0,954.0,...,1126.0,6013.0,1823.0,1988.0,2360.0,450.0,1802.0,1906.0,2171.0,1403.0


In [54]:
backup_totpop_cy = san_diego_block_groups_enriched_df['totpop_cy'].T.tail(1).T
print(f"Shape of backup_totpop_cy: {backup_totpop_cy.shape}")
backup_totpop_cy.head(3)

Shape of backup_totpop_cy: (2057, 1)


Unnamed: 0,totpop_cy
0,1199.0
1,1692.0
2,902.0


In [55]:
san_diego_block_groups_enriched_df_cleaned = san_diego_block_groups_enriched_df.drop('totpop_cy', axis=1)
san_diego_block_groups_enriched_df_cleaned = san_diego_block_groups_enriched_df_cleaned.join(backup_totpop_cy)
print(f"Shape of san_diego_block_groups_enriched_df_cleaned: {san_diego_block_groups_enriched_df_cleaned.shape}")
san_diego_block_groups_enriched_df_cleaned.head(3)

Shape of san_diego_block_groups_enriched_df_cleaned: (2057, 4241)


Unnamed: 0,objectid,state_abbr,state_fips,county_fips,stcofips,tract_fips,blockgroup_fips,fips,pop_sqmi,sqmi,...,x1130fy_a,x1130fy_i,x1002fy_x,x1002fy_a,x1002fy_i,x1003fy_x,x1003fy_a,x1003fy_i,SHAPE,totpop_cy
0,29002,CA,6,73,6073,100,1,60730001001,4788.0,0.25,...,11293.0,225.0,12946134.0,26693.06,218.0,7469031.0,15400.06,214.0,"{""rings"": [[[-117.18894090906399, 32.758807764...",1199.0
1,29003,CA,6,73,6073,100,2,60730001002,5032.4,0.34,...,11843.34,236.0,19996145.0,28484.54,233.0,11682118.0,16641.19,231.0,"{""rings"": [[[-117.18715990828298, 32.756859764...",1692.0
2,29004,CA,6,73,6073,201,1,60730002011,4385.0,0.2,...,7167.84,143.0,7516044.0,17120.83,140.0,4369363.0,9952.99,138.0,"{""rings"": [[[-117.168410904985, 32.75683776569...",902.0


In [57]:
san_diego_block_groups_enriched_df_cleaned.to_csv('../resources/full_enriched_san_diego_county_block_groups_cleaned.csv', index=False)

In [56]:
%%time
# Convert the SEDF back into a feature layer and publish it on AGOL
san_diego_block_groups_enriched_fl = san_diego_block_groups_enriched_df_cleaned.spatial.to_featurelayer(title="San Diego Block Groups Enriched V1", 
                                                         gis=gis, 
                                                         folder='nourish_gis',
                                                         tags=['SanDiegoCountyBlockGroups','ConsumerSpending','businesses','MarketPotential','Demographics'])


ShapefileException: Shapefile Writer reached maximum number of fields: 2046.

Publishing fails because of a limit on the number of columns in a feature layer: `Shapefile Writer reached maximum number of fields: 2046`, while the cleaned DataFrame has 4241 columns.
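
One possible way around this limit is to split the columns into groups that each stay within 2046 fields, repeating a key column and the geometry in every group. The sketch below only plans the column groups; the helper `chunk_columns` and the toy column names are hypothetical, and it assumes that the limit counts every column passed to the writer, including the geometry.

```python
def chunk_columns(columns, keep_in_every_chunk, max_fields=2046):
    """Split column names into groups of at most `max_fields` names.

    Columns in `keep_in_every_chunk` (e.g. a join key and the geometry)
    are repeated at the start of every group so each group stands alone.
    """
    keep = list(keep_in_every_chunk)
    per_chunk = max_fields - len(keep)
    if per_chunk <= 0:
        raise ValueError("max_fields must exceed the number of repeated columns")
    rest = [c for c in columns if c not in keep]
    return [keep + rest[i:i + per_chunk] for i in range(0, len(rest), per_chunk)]


# Toy stand-in for the 4241 column names of the cleaned DataFrame.
columns = ["fips", "SHAPE"] + [f"var_{i}" for i in range(4239)]
chunks = chunk_columns(columns, keep_in_every_chunk=["fips", "SHAPE"])
print([len(chunk) for chunk in chunks])  # [2046, 2046, 153]
```

Each group could then be selected from the cleaned DataFrame and published as its own feature layer, to be joined later on `fips`. Whether that is preferable to publishing only the variables actually needed is a design choice this notebook has not yet made.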