# Introduction to the Spatially Enabled DataFrame

The [`Spatially Enabled DataFrame`](https://developers.arcgis.com/python/api-reference/arcgis.features.toc.html#spatialdataframe) (SEDF) is a simple, intuitive object for manipulating geometric and attribute data.

<blockquote>
    New at version 1.5, the Spatially Enabled DataFrame is an evolution of the <code>SpatialDataFrame</code> object that you may be familiar with. While the <code>SDF</code> object is still available for use, the team has stopped active development of it and is promoting the use of this new Spatially Enabled DataFrame pattern. The SEDF provides better memory management and the ability to handle larger datasets, and it follows the pattern that Pandas advocates as the path forward.</blockquote>

The Spatially Enabled DataFrame inserts a custom namespace called `spatial` into the popular [Pandas](https://pandas.pydata.org/) [DataFrame](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe) structure to give it spatial abilities. This allows you to use intuitive, pandorable operations on both the attribute and spatial columns. Thus, the SEDF is based on data structures inherently suited to data analysis, with natural operations for filtering and inspecting subsets of values, which are fundamental to statistical and geographic manipulations.

The dataframe reads from many **sources**, including shapefiles, [Pandas](https://pandas.pydata.org/) [DataFrames](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe), feature classes, GeoJSON, and Feature Layers.

This document outlines some fundamentals of using the `Spatially Enabled DataFrame` object for working with GIS data.

In [None]:
import pandas as pd
from arcgis.features import GeoAccessor, GeoSeriesAccessor

## Accessing GIS data
GIS users need to work with both published layers on remote servers (web layers) and local data, but the ability to manipulate these datasets without permanently copying the data is lacking. The `Spatially Enabled DataFrame` solves this problem because it is an in-memory object that can read, write and manipulate geospatial data.

The SEDF integrates with Esri's [`ArcPy` site-package](http://pro.arcgis.com/en/pro-app/arcpy/get-started/what-is-arcpy-.htm) as well as the open source [`pyshp`](https://github.com/GeospatialPython/pyshp/), [`shapely`](https://github.com/Toblerity/Shapely) and [`fiona`](https://github.com/Toblerity/Fiona) packages. This means the SEDF in the ArcGIS API for Python can use either geometry engine (ArcPy or Shapely), giving you options for working with geospatial data regardless of your platform. The SEDF transforms data into the formats you desire so you can use Python functionality to analyze and visualize geographic information.

Data can be read and scripted to automate workflows and just as easily visualized on maps in [`Jupyter notebooks`](../using-the-jupyter-notebook-environment/). The SEDF can export data as feature classes or publish them directly to servers for sharing according to your needs. Let's explore some of the different options available with the versatile `Spatially Enabled DataFrame` namespaces:

### Reading Web Layers

[`Feature layers`](https://doc.arcgis.com/en/arcgis-online/share-maps/hosted-web-layers.htm) hosted on [**ArcGIS Online**](https://www.arcgis.com) or [**ArcGIS Enterprise**](http://enterprise.arcgis.com/en/) can be easily read into a Spatially Enabled DataFrame using the [`from_layer`](https://developers.arcgis.com/python/api-reference/arcgis.features.toc.html?highlight=from_layer#arcgis.features.GeoAccessor.from_layer) method. Once you read a layer into an SEDF object, you can create reports, manipulate the data, or convert it to a form that suits its intended purpose.

**Example: Retrieving an ArcGIS Online [`item`](https://developers.arcgis.com/rest/users-groups-and-items/publish-item.htm) and using the [`layers`](https://developers.arcgis.com/python/api-reference/arcgis.gis.toc.html#layer) property to inspect the first 5 records of the layer**

In [None]:
from arcgis import GIS
gis = GIS(profile="your_online_profile")
item = gis.content.get("c0a74e5332d443b299d804ee3cbd3cd3")
flayer = item.layers[0]

# create a Spatially Enabled DataFrame object
sdf = pd.DataFrame.spatial.from_layer(flayer)
sdf.head()

When you inspect the `type` of the object, you get back a standard pandas `DataFrame` object. However, this object now has an additional `SHAPE` column that allows you to perform geometric operations. In other words, this `DataFrame` is now geo-aware.

In [None]:
type(sdf)

Further, the `DataFrame` has a new `spatial` property that provides a list of geoprocessing operations that can be performed on the object. The rest of the guides in this section go into details of how to use these functionalities. So, sit tight.
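
For readers curious how a namespace such as `spatial` can appear on an ordinary `DataFrame`: pandas provides an accessor-registration hook for this kind of extension. The sketch below is a toy illustration of that mechanism, not the SEDF implementation. The accessor name `toy_geo`, its `centroid` property, and the sample data are all hypothetical.

```python
import pandas as pd


# Register a hypothetical namespace called `toy_geo` on every DataFrame.
@pd.api.extensions.register_dataframe_accessor("toy_geo")
class ToyGeoAccessor:
    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor was called on.
        self._df = pandas_obj

    @property
    def centroid(self):
        # Mean of the x and y columns, treated as a simple point centroid.
        return (self._df["x"].mean(), self._df["y"].mean())


# Made-up sample points, for illustration only.
points = pd.DataFrame(
    {"name": ["a", "b", "c"], "x": [0.0, 2.0, 4.0], "y": [0.0, 3.0, 6.0]}
)

center = points.toy_geo.centroid
print(type(points))
print(center)
```

The object remains a plain `DataFrame`; the extra behavior lives behind the accessor, which is consistent with the `type(sdf)` result shown above.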

### Reading Feature Layer Data

As seen above, the SEDF can consume a `Feature Layer` served from either ArcGIS Online or ArcGIS Enterprise orgs. Let's take a step-by-step approach to break down the notebook cell above and then extract a subset of records from the feature layer.

#### Example: Examining Feature Layer content

Use the `from_layer` method on the SEDF to instantiate a data frame from an item's `layer` and inspect the first 5 records.

In [None]:
# Retrieve an item from ArcGIS Online from a known ID value
known_item = gis.content.get("c0a74e5332d443b299d804ee3cbd3cd3")
known_item

In [None]:
# Obtain the first feature layer from the item
fl = known_item.layers[0]

# Use the `from_layer` static method in the 'spatial' namespace on the Pandas' DataFrame
sdf = pd.DataFrame.spatial.from_layer(fl)

# Return the first 5 records. 
sdf.head()

> NOTE: See Pandas DataFrame [`head() method documentation`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.head.html) for details.

You can also use SQL queries to return a subset of records by leveraging the ArcGIS API for Python's [`Feature Layer`](https://developers.arcgis.com/python/api-reference/arcgis.features.toc.html#featurelayer) object itself. When you run a [`query()`](https://developers.arcgis.com/python/api-reference/arcgis.features.toc.html#arcgis.features.FeatureLayer.query) on a `FeatureLayer`, you get back a `FeatureSet` object. Calling the `sdf` property of the `FeatureSet` returns a Spatially Enabled DataFrame object. You can then use the data frame's [`head()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.head.html) method to return the first 5 records, and column selection to return a subset of columns from the DataFrame:

#### Example: Feature Layer Query Results to a Spatially Enabled DataFrame
We'll use the amenity column to query the data frame and return a new `DataFrame` with a subset of records. We can use the built-in [`zip()`](https://docs.python.org/3/library/functions.html#zip) function to print the data frame attribute field names, and then use data frame syntax to view specific attribute fields in the output:

In [None]:
# Filter feature layer records with a SQL query.
# See https://developers.arcgis.com/rest/services-reference/query-feature-service-layer-.htm

df = fl.query(where="amenity = 'bar'").sdf

In [None]:
sdf["amenity"].unique()
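
Once the unique values are known, you may want records for several categories at once. As a minimal sketch, the helper below assembles a SQL `IN` clause for the `where` parameter, doubling single quotes as standard SQL requires. The helper name `build_in_clause` is hypothetical, and the values `'pub'` and `'biergarten'` are assumed categories that may not exist in this layer.

```python
def build_in_clause(field, values):
    """Return a SQL `field IN (...)` clause with single quotes escaped."""
    # Standard SQL escapes a single quote inside a string literal by doubling it.
    quoted = ["'{}'".format(str(v).replace("'", "''")) for v in values]
    return "{} IN ({})".format(field, ", ".join(quoted))


# Assumed category values, for illustration only.
where = build_in_clause("amenity", ["bar", "pub", "biergarten"])
print(where)

# With the feature layer from the cells above (not executed here):
# df_multi = fl.query(where=where).sdf
```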

A compact way to show the column names, four per row:

In [None]:
for a,b,c,d in zip(df.columns[::4], df.columns[1::4],df.columns[2::4], df.columns[3::4]):
    print("{:<30}{:<30}{:<30}{:<}".format(a,b,c,d))
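
One caveat with the `zip()` layout: `zip()` stops at the shortest slice, so when the number of columns is not a multiple of four, the trailing names are silently left out. A possible variant, sketched here with `itertools.zip_longest`, pads the last row instead. The function name `format_columns` and the sample column names are hypothetical.

```python
from itertools import zip_longest


def format_columns(names, per_row=4, width=30):
    """Lay out names in rows of `per_row`, padding the last row with blanks."""
    names = list(names)
    # One strided slice per output column, mirroring the zip() cell above.
    slices = [names[i::per_row] for i in range(per_row)]
    rows = []
    for row in zip_longest(*slices, fillvalue=""):
        text = "".join("{:<{w}}".format(name, w=width) for name in row)
        rows.append(text.rstrip())
    return rows


# Hypothetical column names; six names do not divide evenly by four.
example_columns = ["FID", "osm_id", "name", "amenity", "addr_street", "SHAPE"]
lines = format_columns(example_columns)
for line in lines:
    print(line)
```

With a real data frame, `format_columns(df.columns)` should work the same way, since the function only needs an iterable of names.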

In [None]:
# Return a subset of columns on just the first 5 records
df[['amenity', 'name', 'addr_housenumber','addr_street',"SHAPE"]].head()

# What can we do with this filtered dataframe ?

## Accessing local GIS data
The ArcGIS API for Python uses either [`shapely`](https://pypi.org/project/Shapely/) or [`arcpy`](https://www.esri.com/en-us/arcgis/products/arcgis-python-libraries/libraries/arcpy) as back-ends (engines) for processing geometries. The API is identical no matter which engine you use. However, at any point in time, only one engine will be used. 

[__ArcPy__](https://www.esri.com/en-us/arcgis/products/arcgis-python-libraries/libraries/arcpy) provides a useful and productive way to perform geographic data analysis, data conversion, data management, and map automation with Python. With `arcpy` as the geometry engine, you can read/write different file types, perform various geometric operations and do a lot more without needing multiple other third-party packages that perform such operations. 


By default, the ArcGIS API for Python looks for `arcpy` as the geometry engine. In the absence of `arcpy`, it looks for `shapely`. The ArcGIS API for Python integrates the [Shapely](https://pypi.org/project/Shapely/), [Fiona](https://pypi.org/project/Fiona/), and [PyShp](https://pypi.org/project/pyshp/) packages so that spatial data from other sources can be accessed through the API. This makes it easier to use the ArcGIS API for Python and work with geospatial data regardless of the platform used. However, we recommend using `arcpy` for better accuracy and support for a wider gamut of data sources. Here is a one-line overview of each of these packages:

 - [Shapely](https://pypi.org/project/Shapely/) is used for the manipulation and analysis of geometric objects. 
 - [Fiona](https://pypi.org/project/Fiona/) can read and write real-world data using multi-layered GIS formats, including Esri File Geodatabase. It is often used in combination with Shapely so that Fiona is used for creating the input and output, while Shapely does the data wrangling part. 
 - [PyShp](https://pypi.org/project/pyshp/) is used for reading and writing ESRI shapefiles.
 <div class="alert alert-info">
    <b>Note:</b> In the absence of <code>arcpy</code>, the ArcGIS API for Python looks for a <code>shapely</code> geometry engine. To allow for a seamless experience, both <a href="https://pypi.org/project/Shapely/">Shapely</a> and <a href="https://pypi.org/project/Fiona/">Fiona</a> packages must be present in your current conda environment. If these packages are not installed, you may install them using <code>conda</code> as follows:
    
<code>conda install shapely
conda install fiona</code>
</div>
    
### Example: Reading a Shapefile
> You must authenticate to `ArcGIS Online` or `ArcGIS Enterprise` to use the `from_featureclass()` method to read a shapefile with a Python interpreter that does not have access to `ArcPy`.

>  `g2 = GIS("https://www.arcgis.com", "username", "password")`

It could be that both `arcpy` and `shapely` are __not__ present in your current environment. In such a scenario, the number of spatial operations you can perform using the SEDF will be extremely limited. The cell below shows how to detect which geometry engines are available in your environment.

In [None]:
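from importlib.util import find_spec

# find_spec returns None (rather than raising) when a top-level package is missing.
has_arcpy = find_spec("arcpy") is not None
has_shapely = find_spec("shapely") is not None

# Check the combined case first so it is not shadowed by the single-engine cases.
if has_arcpy and has_shapely:
    print("Has both arcpy and shapely")
elif has_arcpy:
    print("Has arcpy")
elif has_shapely:
    print("Has shapely")
else:
    print("Does not have either arcpy or shapely")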

In [None]:
sdf = pd.DataFrame.spatial.from_featureclass(r"..\data\osmFoodDrinks.shp")
sdf.tail()
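
The backslash-separated path above is Windows-specific. As a sketch, assuming the same relative folder layout exists on your machine, `pathlib` from the standard library can build the path with the separator appropriate to the current platform. The folder and file names are taken from the cell above; passing `str(shp_path)` to `from_featureclass` is shown only as a comment because it requires the `arcgis` package and the data file.

```python
from pathlib import Path

# Build the same relative location as the cell above without hard-coding
# a path separator.
shp_path = Path("..") / "data" / "osmFoodDrinks.shp"
print(shp_path)

# Passing the path as a string (not executed here):
# sdf = pd.DataFrame.spatial.from_featureclass(str(shp_path))
```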

### Example: Reading a Featureclass from FileGDB

> You must have `fiona` installed if you use the `from_featureclass()` method to read a feature class from FileGDB with a Python interpreter that does not have access to `ArcPy`.


In [None]:
sdf = pd.DataFrame.spatial.from_featureclass(r"..\data\Berlin.gdb\osm_food_drinks")
sdf.head()

## Saving Spatially Enabled DataFrames

The SEDF can export data to various data formats for use in other applications.


### Export Options

- [Feature Layers](https://doc.arcgis.com/en/arcgis-online/share-maps/hosted-web-layers.htm)
- [Feature Collections](https://developers.arcgis.com/python/api-reference/arcgis.features.toc.html#featurelayercollection)
- [Feature Set](https://developers.arcgis.com/python/api-reference/arcgis.features.toc.html#featureset)
- [GeoJSON](http://geojson.org/)
- [Feature Class](http://desktop.arcgis.com/en/arcmap/latest/manage-data/feature-classes/a-quick-tour-of-feature-classes.htm)
- [Pickle](https://pythontips.com/2013/08/02/what-is-pickle-in-python/)
- [HDF](https://support.hdfgroup.org/HDF5/Tutor/HDF5Intro.pdf)

### Export to Feature Class

The SEDF allows for the export of whole datasets or partial datasets.  

#### Example: Export a whole dataset to a shapefile:

In [None]:
sdf.spatial.to_featureclass(location=r"..\data\copy.shp")

> Installations of the ArcGIS API for Python on `macOS` and `Linux` machines, as well as on `Windows` machines whose Python interpreter does not have access to `ArcPy`, can only write to shapefile format with the `to_featureclass` method. Writing to file geodatabases requires the `ArcPy` site-package.

#### Example: Export dataset with a subset of columns and top 5 records to a shapefile:

In [None]:
for a,b,c,d in zip(sdf.columns[::4], sdf.columns[1::4], sdf.columns[2::4], sdf.columns[3::4]):
    print("{:<30}{:<30}{:<30}{:<}".format(a,b,c,d))

In [None]:
columns = ['NAME', 'ST', 'CAPITAL', 'STFIPS', 'POP2000', 'POP2007', 'SHAPE']
sdf[columns].head().spatial.to_featureclass(location=r"..\data\Berlin.gdb")
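
Selecting columns by name raises a `KeyError` in pandas when a name is absent from the data frame, and a subset without the `SHAPE` column has no geometry to write. As a hedged sketch, the helper below keeps only the requested names that exist and always retains the geometry column. The function name `pick_export_columns` and the sample column names are hypothetical.

```python
def pick_export_columns(available, wanted, geometry_column="SHAPE"):
    """Return (selected, missing): wanted names that exist, plus the geometry column."""
    available = list(available)
    selected = [c for c in wanted if c in available and c != geometry_column]
    missing = [c for c in wanted if c not in available]
    # Keep the geometry column last so the exported subset stays spatial.
    if geometry_column in available:
        selected.append(geometry_column)
    return selected, missing


# Hypothetical column names standing in for sdf.columns.
available = ["FID", "name", "amenity", "addr_street", "SHAPE"]
selected, missing = pick_export_columns(available, ["name", "amenity", "POP2000"])
print(selected)
print(missing)

# With a real data frame (not executed here):
# cols, _ = pick_export_columns(sdf.columns, ["name", "amenity"])
# subset = sdf[cols].head()
```

Checking `missing` before exporting makes it easier to spot column names copied from a different dataset.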

#### Example: Export dataset to a featureclass in FileGDB:

In [None]:
sdf.spatial.to_featureclass(r"..\data\Berlin.gdb\copy")

### Publish as a Feature Layer

The SEDF allows for the publishing of datasets as feature layers.  

#### Example: Publishing as a feature layer:

In [None]:
lyr = sdf.spatial.to_featurelayer("BerlinOSMdevsummit")
lyr