# Extracting Census Tract data from ArcGIS Online
The **functional** objective of this notebook is to discover and download Census Tract polygons for Durham County from resources hosted on ArcGIS Online. 

The **learning** objective of this notebook, however, is to outline the process of discovering, querying, and downloading feature data from ArcGIS Online. First, we use the ArcGIS Online web portal to [re]familiarize ourselves with the resources accessible there and with how to navigate the various levels of a given dataset. From there, we repeat the same steps, but from within Python using the ArcGIS Python API.

→ A good resource for learning more is here: https://developers.arcgis.com/python/guide/working-with-feature-layers-and-features/

## Part 1. Working with Data on ArcGIS Online via arcgis.com
One route for getting census data is to look for it online. Here, we navigate to https://arcgis.com and search for `Census Tracts`. When I last checked, that search returned more than 30,000 records! So we need to refine our search. If we know the owner of the dataset, we can add `owner:` to our search. We can also filter by **item type** and even filter for **authoritative** datasets.
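A refined search is just text: keywords plus `field:value` filters such as `owner:`. The sketch below uses a hypothetical helper, `build_search_query`, to assemble such a string. It only shows how the pieces combine; the result can be typed into the arcgis.com search box, or passed as `query=` to `gis.content.search()` in Part 2. Item type is filtered separately (a checkbox on the web page, or the `item_type` argument in Python).

```python
def build_search_query(keywords, owner=None):
    """Combine free-text keywords with an optional owner: filter.

    build_search_query is a hypothetical helper for illustration only; it
    assembles the same text you would type into the arcgis.com search box.
    """
    parts = [keywords.strip()]
    if owner:
        # The owner: field restricts results to items owned by that account
        parts.append(f"owner:{owner}")
    return " ".join(parts)


# Usage: the refined search used in the steps below
query = build_search_query("Census Tracts Areas", owner="esri_dm")
print(query)  # Census Tracts Areas owner:esri_dm
```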

### Searching for content via AGOL:
_First we'll search for objects in ArcGIS Online and familiarize ourselves with various attributes of our results._
1. Search [ArcGIS Online](https://arcgis.com) for <u>`Census Tracts Areas`</u> owned by <u>`esri_dm`</u>, filtering results for <u>feature layers</u> only.


2. Open the [link](https://www.arcgis.com/home/item.html?id=db3f9c8728dd44e4ad455e0c27a85eea) to the one result.
 * Note the URL for the link, particularly the *id* returned: `db3f9c8728dd44e4ad455e0c27a85eea`
 
 
3. Scroll to the bottom of the page. On the right side, find the [URL](https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/USA_Census_Tract_Areas_analysis_trim/FeatureServer) associated with the feature layer and open it in your browser. 
 * Note this page also reveals the item's ID. 
 * This page shows that the feature layer service serves just the one layer: `tracts_trim`.
 
 
4. Click the `tracts_trim` link on that page to open the feature layer's *REST endpoint*.
 * What attributes are associated with this layer? 
 * How many records can be retrieved at one time from this service? 


5. At the bottom of the page, find the link associated with the [Query](https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Census_Tract_Areas_analysis_trim/FeatureServer/0/query) interface for this layer.


6. In the query interface enter `FIPS LIKE '37063%'` as the *Where clause*. Then scroll to the bottom and click the `Query(GET)` button. 
 * How many records are returned? 
 * Modify the query to return output format as `GeoJSON` and click `Query(GET)` again. 
 
_What we just did was use AGOL to find a layer, access its REST endpoint, and use the REST API to query Census tracts for Durham County, setting the output to be a GeoJSON object. We can copy these results into a text file and convert the GeoJSON to a feature class using ArcGIS Pro's [JSON To Features](https://pro.arcgis.com/en/pro-app/tool-reference/conversion/json-to-features.htm) tool, or through Python packages like Fiona or GeoPandas (more on that later...)_
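Clicking `Query(GET)` simply sends the form's values as URL parameters to the layer's `query` endpoint. A minimal sketch of building that same request URL in Python, assuming only a few of the form's parameters (`where`, `outFields`, `returnGeometry`, `f`) and leaving the rest at their server defaults; the helper name `build_query_url` is hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Query endpoint of the tracts_trim layer (layer 0), as linked in step 5
QUERY_URL = ("https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/"
             "USA_Census_Tract_Areas_analysis_trim/FeatureServer/0/query")


def build_query_url(where, out_format="geojson", out_fields="*"):
    """Encode a subset of the parameters the HTML query form submits."""
    params = {
        "where": where,            # SQL where clause, e.g. FIPS LIKE '37063%'
        "outFields": out_fields,   # "*" requests every attribute
        "returnGeometry": "true",  # include the tract polygons
        "f": out_format,           # response format: html, json, geojson, ...
    }
    # urlencode escapes spaces, quotes and the % wildcard for us
    return QUERY_URL + "?" + urlencode(params)


url = build_query_url("FIPS LIKE '37063%'")
print(url)
# Decoding the query string recovers the original where clause
print(parse_qs(urlparse(url).query)["where"])
```

Opening `url` in a browser (or fetching it with `urllib.request.urlopen`) should return the same GeoJSON as clicking `Query(GET)`, which is exactly what the ArcGIS Python API does for us behind the scenes in Part 2.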

## Part 2. Working with content via the ArcGIS Python API
The ArcGIS Python API's [GIS module](https://developers.arcgis.com/python/guide/the-gis-module/) allows us to execute the same process as above, but from within our coding environment instead of our web browser. Here we explore how that's done, using the opportunity to better understand the structure and workings of this powerful API.

### Step 1. Importing the API's GIS module
To access the API, we need to import it. We aren't accessing any 'premium' content here, so we can authenticate "anonymously":

In [None]:
#Import the GIS object and authenticate
from arcgis import GIS
gis = GIS()

### Step 2. Use the GIS module's [Content Manager](https://developers.arcgis.com/python/api-reference/arcgis.gis.toc.html#contentmanager) to search AGOL
Instead of clicking on web links in our browser, we'll use one of the helper objects accessed via the GIS module, namely the [Content Manager](https://developers.arcgis.com/python/api-reference/arcgis.gis.toc.html#contentmanager), to execute our search. This is done by passing our search terms and our item type filters to the `gis.content.search()` command. ([link to help](https://developers.arcgis.com/python/api-reference/arcgis.gis.toc.html#arcgis.gis.ContentManager.search))

The code below searches for all items matching "Census Tracts Areas" and reports how many are returned. Here we cap the search at 1000 items; we could easily raise the cap, but you get the idea. The `outside_org=True` argument isn't strictly necessary here, since we are logged in anonymously. However, if we had signed into, say, our dukeuniv.maps.arcgis.com account when creating the GIS object, we'd need `outside_org=True` to extend our search to content not created by fellow Dukies.

In [None]:
#Use the API's content helper to search for all items matching the keywords "Census Tracts Areas"
results = gis.content.search(query='Census Tracts Areas',
                             max_items=1000,
                             outside_org=True)
#Show the list of results returned
len(results)

Now we'll amend the query to limit our results just to Census Tract *feature layers* that are *owned by "esri_dm"*. 
* Alter the code cell below, filling in the same query string we used before for the `query=` option.
* Next, specify the `item_type` to be `Feature Layer`.

In [None]:
#Use the API's content helper to search for feature layers with keywords "Census Tracts Areas" owned by "esri_dm"
results = gis.content.search(query='Census Tracts Areas owner:esri_dm',
                             item_type='Feature Layer',
                             outside_org=True)
#Show the complete list of results returned
results

Just the one item returned - same as when we searched via the Web! Phew...

→ More info and examples on searching: https://developers.arcgis.com/python/guide/accessing-and-creating-content/

---
 

### Step 3. Exploring the item(s) returned.
Just as we did with our browser-based search results, we'll drill into the item obtained through our search.
* First, we'll extract the one item as its own variable (`tractsItem`) and then examine that object in various ways...

In [None]:
#Extract the one returned item in the list to the "tractsItem" variable
tractsItem = results[0]
#Reveal the data type of this object
type(tractsItem)

In [None]:
#We can display the formatted AGOL info on that item:
tractsItem

In [None]:
#Show help documentation on the "arcgis.gis.Item" object
?tractsItem

More detailed documentation on the ArcGIS Item object is here:<br>
→ https://developers.arcgis.com/python/api-reference/arcgis.gis.toc.html#item

* Open this link and view the properties and methods associated with the object.
 * What does the `content_status` property reveal?
 * The `id` property?
 * The `download` method?

_Note that not all of these will work on this item. Some of them are for modifying the actual item hosted on AGOL, which we don't have privileges to do._


* Next, reveal the `id` associated with the item, and compare it to the one you found by searching AGOL in your browser.

In [None]:
#Reveal the id associated with this item
tractsItem.id

---
####  **TIP**: 
A feature layer's item ID is useful to know because we can use it to access the item directly, i.e., without having to search for it.

In [None]:
#Extract the Census tracts layer directly, via its ID
other_tractsItem = gis.content.get('db3f9c8728dd44e4ad455e0c27a85eea')
other_tractsItem
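The same ID also gets us back to the item outside of Python. A minimal sketch, assuming that item IDs are 32-character lowercase hexadecimal strings (as this one is) and using two standard AGOL addresses: the item's web page (the link we opened in Part 1) and the item's JSON description from the ArcGIS sharing REST API. The helper `item_urls` is hypothetical.

```python
import re

ITEM_ID = "db3f9c8728dd44e4ad455e0c27a85eea"


def item_urls(item_id):
    """Return the web page and JSON description URLs for an AGOL item ID."""
    # Assumption: AGOL item IDs look like 32 lowercase hexadecimal characters
    if not re.fullmatch(r"[0-9a-f]{32}", item_id):
        raise ValueError(f"not an AGOL item id: {item_id!r}")
    return {
        "page": f"https://www.arcgis.com/home/item.html?id={item_id}",
        "json": f"https://www.arcgis.com/sharing/rest/content/items/{item_id}?f=json",
    }


for name, link in item_urls(ITEM_ID).items():
    print(name, link)
```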

---

### Step 4. Accessing layer(s) associated with the selected item. 
In our browser-based search, we continued exploring the search result via the REST endpoint of the feature layer service. This endpoint address is revealed by the item's `url` property. From here, we could simply open the returned link in our browser to list and access the specific feature layers, and then query them to return GeoJSON objects...

In [None]:
#Show the URL of the feature server's REST endpoint
print(tractsItem.url)
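The REST addresses we clicked through in Part 1 follow a predictable pattern: the service URL, then `/<layer id>` for a layer, then `/query` for that layer's query operation. A small sketch of that pattern, assuming (as the Query link in Part 1 shows) that `tracts_trim` is layer `0`; the helpers `layer_url` and `query_url` are hypothetical.

```python
SERVICE_URL = ("https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/"
               "USA_Census_Tract_Areas_analysis_trim/FeatureServer")


def layer_url(service_url, layer_id=0):
    # Each layer in a feature service is addressed by its integer id
    return f"{service_url.rstrip('/')}/{layer_id}"


def query_url(service_url, layer_id=0):
    # The query operation hangs off the layer's own endpoint
    return layer_url(service_url, layer_id) + "/query"


print(layer_url(SERVICE_URL))
print(query_url(SERVICE_URL))
```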

What we're really after, though, are the **feature layers** associated with the feature service. We can reveal these with the `layers` property.

In [None]:
#Extract the layer(s) included with this feature service
tractLayers = tractsItem.layers
tractLayers

This returns a list with just one layer in it, just as we found in the browser-based exploration. Now we extract that layer into its own coding object and reveal the data type of this object.

In [None]:
#Pull the one layer item associated with the service to a new variable
tractsLayer = tractLayers[0]
type(tractsLayer)

We see this object is a **"FeatureLayer"**. We can explore the help on this object to see just what we can do with it. Note, however, that the *FeatureLayer* object is a subclass of the *Layer* object, so operations available on a Layer also apply to a FeatureLayer. We'd therefore want to investigate the documentation on the Layer object too. (In code speak, this is called *inheritance*: FeatureLayers inherit properties and methods from the Layer object...)

→ More info on the ArcGIS `layer` object: https://esri.github.io/arcgis-python-api/apidoc/html/arcgis.gis.toc.html#layer<br>
→ More info on the ArcGIS `FeatureLayer` object: https://esri.github.io/arcgis-python-api/apidoc/html/arcgis.features.toc.html#featurelayer
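To make the inheritance idea concrete, here is a toy example with two hypothetical classes that only borrow the names `Layer` and `FeatureLayer`. They are not the arcgis classes and do far less; the point is that the subclass gets the parent's attributes and methods for free and adds its own.

```python
class Layer:
    """Toy stand-in for a generic service layer (not the arcgis class)."""

    def __init__(self, url):
        self.url = url

    def describe(self):
        return f"Layer at {self.url}"


class FeatureLayer(Layer):
    """Toy subclass: inherits url and describe(), adds a query method."""

    def query(self, where="1=1"):
        # The toy version only reports what it would ask for
        return f"query {where!r} against {self.url}"


fl = FeatureLayer("https://example.com/FeatureServer/0")
print(fl.describe())                   # inherited from Layer
print(fl.query("FIPS LIKE '37063%'"))  # defined on FeatureLayer
print(isinstance(fl, Layer))           # True
```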

---

### Step 5. Explore properties of the FeatureLayer object
The [`properties`](https://esri.github.io/arcgis-python-api/apidoc/html/arcgis.features.toc.html#arcgis.features.FeatureLayer.properties) property of the FeatureLayer object returns a dictionary-like object of properties.
* Print the entire `properties` object to expose all the properties...

In [None]:
#Reveal all properties of the feature layer
tractsLayer.properties

* Print specific properties. What is the `service item ID` of our feature layer? Its `name`? Its `capabilities`? 

In [None]:
#Report the tractsLayer "itemID", "name", and "capabilities"
print(tractsLayer.properties.serviceItemId,
      tractsLayer.properties.name,
      tractsLayer.properties.capabilities
     )
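Why can we write `tractsLayer.properties.name` instead of `tractsLayer.properties["name"]`? The API wraps the properties dictionary so that its keys can also be read as attributes. Below is a simplified, hypothetical stand-in (`AttrDict`) for that wrapper, assuming only plain dictionaries and lists; the real class in the API does more, and the sample values are illustrative.

```python
class AttrDict(dict):
    """Hypothetical stand-in for the dictionary-like .properties object.

    Keys can be read either as d["name"] or as d.name.
    """

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails
        try:
            value = self[key]
        except KeyError:
            raise AttributeError(key) from None
        return _wrap(value)


def _wrap(value):
    # Wrap nested dictionaries (and dictionaries inside lists) so that
    # chained access such as props.fields[0].name also works
    if isinstance(value, dict):
        return AttrDict(value)
    if isinstance(value, list):
        return [_wrap(v) for v in value]
    return value


props = AttrDict({"name": "tracts_trim",
                  "fields": [{"name": "FIPS"}, {"name": "STATE"}]})
print(props.name, props["name"])
print([f.name for f in props.fields])
```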

* The `fields` property returns a list of field definitions, each itself dictionary-like. Iterate through the list and print each field's name.

In [None]:
#Iterate through all fields and report the field's name
for fld in tractsLayer.properties.fields:
    print(fld.name)
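Each field definition also carries a `type`, which is handy for picking out numeric columns to summarize later. A sketch using a hypothetical, abbreviated field list shaped like the entries in `tractsLayer.properties.fields` (only `FIPS` and `STATE` come from this notebook; `POPULATION` is made up). The type names are standard Esri field type strings. With the real layer, `fields = tractsLayer.properties.fields` should work the same way, since each entry is dictionary-like.

```python
# Hypothetical, abbreviated field list (POPULATION is illustrative only)
fields = [
    {"name": "OBJECTID", "type": "esriFieldTypeOID"},
    {"name": "FIPS", "type": "esriFieldTypeString"},
    {"name": "STATE", "type": "esriFieldTypeString"},
    {"name": "POPULATION", "type": "esriFieldTypeInteger"},
]

# Esri field types that hold numbers we might sum or average
NUMERIC_TYPES = {"esriFieldTypeSmallInteger", "esriFieldTypeInteger",
                 "esriFieldTypeSingle", "esriFieldTypeDouble"}

numeric = [f["name"] for f in fields if f["type"] in NUMERIC_TYPES]
text = [f["name"] for f in fields if f["type"] == "esriFieldTypeString"]
print("numeric:", numeric)
print("text:", text)
```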

### Step 6. Interact with our Feature Layer
Moving beyond just the properties, we can apply some of the methods associated with the feature layer. Most methods are for updating data, which we can't do, but we can list unique values and also subset records via a query. 

* List all the unique values in the STATE column in the feature layer

In [None]:
#List the unique values found in the STATE attribute. 
tractsLayer.get_unique_values('STATE')

* Query the records in the Feature Layer

In [None]:
#Subset records that are in Durham Co (FIPS 37063)
qResults = tractsLayer.query(where="FIPS LIKE '37063%'")
type(qResults)
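Why does `FIPS LIKE '37063%'` select Durham County? A census tract FIPS code is 11 digits: 2 for the state (37 = North Carolina), 3 for the county (063 = Durham), and 6 for the tract. All of a county's tracts therefore share the same 5-character prefix. A sketch of building that where clause for any county, assuming (as the LIKE query implies) that the `FIPS` field stores the code as text; both helpers are hypothetical.

```python
def county_where_clause(state_fips, county_fips, field="FIPS"):
    """Build a WHERE clause matching every tract in one county."""
    # Zero-pad so 37 and 63 become the prefix "37063"
    prefix = f"{int(state_fips):02d}{int(county_fips):03d}"
    return f"{field} LIKE '{prefix}%'"


def split_tract_fips(fips):
    """Split an 11-digit tract FIPS string into (state, county, tract)."""
    if len(fips) != 11 or not fips.isdigit():
        raise ValueError(f"expected 11 digits, got {fips!r}")
    return fips[:2], fips[2:5], fips[5:]


print(county_where_clause(37, 63))  # FIPS LIKE '37063%'
```

For example, `tractsLayer.query(where=county_where_clause(37, 63))` should reproduce the query above.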

The result of the query is yet another new object: a FeatureSet. So... consult the documentation to see what we can do with it: https://esri.github.io/arcgis-python-api/apidoc/html/arcgis.features.toc.html#featureset
* Report the geometry type of the FeatureSet

In [None]:
qResults.geometry_type

### Step 7. Save our data to a local file!
* The FeatureSet has another method called `save`, which allows us to save our queried results to disk.

In [None]:
#Save the selected features to a shapefile
qResults.save(save_location='.',out_name='DurhamTracts')
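Writing a shapefile may depend on optional libraries being installed alongside the API. If that route isn't available, the GeoJSON we got from the REST query in Part 1 can simply be written to a text file and converted later (for example with JSON To Features). A minimal standard-library sketch, assuming the GeoJSON is already a Python dictionary; the helper `save_geojson` and the one-point sample are hypothetical.

```python
import json
from pathlib import Path


def save_geojson(feature_collection, path):
    """Write a GeoJSON FeatureCollection (a plain dict) to disk as UTF-8 text."""
    if feature_collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    path = Path(path)
    path.write_text(json.dumps(feature_collection), encoding="utf-8")
    return path


# Tiny illustrative collection with one made-up point feature
sample = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"FIPS": "37063000101"},
        "geometry": {"type": "Point", "coordinates": [-78.9, 36.0]},
    }],
}
out = save_geojson(sample, "DurhamTracts_sample.geojson")
print(out.resolve())
```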

---
### Success!

We just downloaded our own copy of the Durham County tracts. In principle, we can now grab any feature layer we find on ArcGIS Online. Well, it's not always this easy, as services do limit how many records a single request can return:

In [None]:
tractsLayer.properties.maxRecordCount

We can get around that restriction by "paging" our download, i.e., downloading records in chunks no larger than `maxRecordCount` (e.g., 2000 at a time)...

For more info on this process, see ESRI's documentation on querying feature layers:<br>
https://developers.arcgis.com/python/guide/working-with-feature-layers-and-features/#Querying-feature-layers
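The paging idea itself is a simple loop: ask for one chunk, move the offset forward, and stop when a chunk comes back short. In the REST query operation the offset and chunk size correspond to the `resultOffset` and `resultRecordCount` parameters, on layers that advertise pagination support. The sketch below is generic: `fetch_page` is a hypothetical callable, and a fake in-memory table stands in for the service. It assumes `page_size` does not exceed the layer's `maxRecordCount`; otherwise the server may return short pages before the data actually runs out.

```python
def fetch_all(fetch_page, page_size):
    """Collect every record by requesting consecutive pages.

    fetch_page(offset, count) is a hypothetical callable returning a list
    of at most `count` records starting at `offset`.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    records, offset = [], 0
    while True:
        page = fetch_page(offset, page_size)
        records.extend(page)
        if len(page) < page_size:  # a short (or empty) page means we're done
            return records
        offset += len(page)


# Stand-in data source: 4,500 fake records served 2,000 at a time
fake_table = list(range(4500))
calls = []


def fake_fetch(offset, count):
    calls.append(offset)
    return fake_table[offset:offset + count]


all_records = fetch_all(fake_fetch, page_size=2000)
print(len(all_records), calls)  # 4500 [0, 2000, 4000]
```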

### Step 8. Analyzing the data here, as a dataframe
Of course, why stop there? We have our data in our coding environment. Let's analyze it!

To facilitate analyses, we can convert our FeatureSet to (1) a list of [Feature](https://developers.arcgis.com/python/api-reference/arcgis.features.toc.html) objects or (2) a [spatial dataframe](https://developers.arcgis.com/python/guide/introduction-to-the-spatially-enabled-dataframe/).

In [None]:
#Convert feature set to a list of features
features = qResults.features
len(features)

In [None]:
#Grab the first feature
feature = features[0]
type(feature)

In [None]:
feature.get_value('FIPS')

In [None]:
#Convert the feature set to a spatially enabled dataframe
sdf = qResults.sdf
sdf.head()

* Note the output has a column called "SHAPE". These values are ArcGIS API `geometry` objects. 

#### Analyzing geometry

In [None]:
#Get the value in the first row of the "SHAPE" column
feat = sdf.loc[0, 'SHAPE']
type(feat)

Documentation on the geometry `get_area` method: https://esri.github.io/arcgis-python-api/apidoc/html/arcgis.geometry.html#arcgis.geometry.Geometry.get_area

In [None]:
#Get the area, in square miles
feat.get_area(method='GEODESIC', units='SQUAREMILES')
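For contrast with the `GEODESIC` method, here is what a *planar* area computation boils down to: the shoelace formula applied to each ring of the polygon's coordinates. This is a swapped-in, simplified technique, not what `get_area` does internally. It assumes Esri JSON polygon geometry (a `"rings"` list, with exterior rings clockwise and holes counterclockwise) in a projected coordinate system, and it returns area in squared coordinate units. Projections such as Web Mercator distort area, which is one reason to prefer the geodesic method above. The helper names are hypothetical.

```python
def ring_signed_area(ring):
    """Shoelace formula; positive for counterclockwise rings (y axis up)."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around at the end
    for p, q in zip(ring, ring[1:] + ring[:1]):
        total += p[0] * q[1] - q[0] * p[1]
    return total / 2.0


def esri_polygon_area(rings):
    """Planar area of an Esri JSON polygon given its "rings" list.

    Exterior rings are clockwise (negative signed area) and holes are
    counterclockwise (positive), so negating the sum adds exteriors and
    subtracts holes.
    """
    return -sum(ring_signed_area(r) for r in rings)


# A 10 x 10 square (clockwise) with a 2 x 2 hole (counterclockwise)
square_with_hole = [
    [[0, 0], [0, 10], [10, 10], [10, 0], [0, 0]],
    [[2, 2], [4, 2], [4, 4], [2, 4], [2, 2]],
]
print(esri_polygon_area(square_with_hole))  # 96.0
```

With the notebook's data, `esri_polygon_area(feat["rings"])` should give the planar area in the geometry's own units, since the API's geometry objects are dictionary-like.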

#### Analyzing age demographics

In [None]:
#Grab the first 9 columns (positions 0-8) into a new dataframe
ageColsDF = sdf.iloc[:, :9]

In [None]:
#Summarize those columns
ageColsDF.describe()

In [None]:
#Total count within each age group
ageColsDF.sum()

In [None]:
%matplotlib inline
#Plot demographics: count within each age group
ageColsDF.sum().plot(kind='bar');