# ESRI Shapefiles

   -- From: [Library of Congress, Digital Format](https://www.loc.gov/preservation/digital/formats/fdd/fdd000280.shtml)

The ESRI Shapefile format stores nontopological geometry and attribute information for the spatial features in a data set. 
A shapefile consists minimally of a main file, an index file, and a dBASE table.

In the main file, the geometry for a feature is stored as a shape comprising a set of vector coordinates. 
This main file is a direct access, variable-record-length file in which each record describes a shape with a list of its vertices. 
In the index file, each record contains the offset of the corresponding main file record from the beginning of the main file. 
Attributes are held in a dBASE format file. 
The dBASE table contains feature attributes with one record per feature. 
Attribute records in the dBASE file must be in the same order as records in the main file. 
Each attribute record has a one-to-one relationship with the associated shape record.
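
The byte layout described above can be sketched with Python's `struct` module. This is a minimal illustration and not a full reader: it assumes the layout published in the ESRI Shapefile Technical Description (a 100-byte header shared by the main and index files, followed in the index file by 8-byte records whose offsets and lengths are counted in 16-bit words). The helper names and the synthetic demo bytes are made up for this example.

```python
import struct

def parse_header(header: bytes) -> dict:
    """Decode the fixed 100-byte header shared by .shp and .shx files."""
    if len(header) < 100:
        raise ValueError("header must be at least 100 bytes")
    # Bytes 0-3: file code; bytes 24-27: file length in 16-bit words (both big-endian).
    (file_code,) = struct.unpack(">i", header[0:4])
    (length_words,) = struct.unpack(">i", header[24:28])
    # Bytes 28-35: version and shape type (little-endian).
    version, shape_type = struct.unpack("<ii", header[28:36])
    # Bytes 36-67: bounding box as four little-endian doubles.
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])
    if file_code != 9994:
        raise ValueError("not a shapefile header")
    return {
        "length_bytes": length_words * 2,
        "version": version,
        "shape_type": shape_type,
        "bbox": (xmin, ymin, xmax, ymax),
    }

def parse_index_records(shx: bytes) -> list:
    """Return (offset_bytes, content_length_bytes) for each .shx record."""
    records = []
    for pos in range(100, len(shx) - 7, 8):
        offset_words, length_words = struct.unpack(">ii", shx[pos:pos + 8])
        records.append((offset_words * 2, length_words * 2))
    return records

def build_demo_shx() -> bytes:
    """Build a synthetic two-record index file (illustrative values only)."""
    header = struct.pack(">i", 9994) + bytes(20)      # file code + unused fields
    header += struct.pack(">i", 58)                   # 116 bytes = 58 words
    header += struct.pack("<ii", 1000, 3)             # version, shape type 3 (PolyLine)
    header += struct.pack("<4d", -82.0, 25.0, -80.0, 30.0)  # Xmin, Ymin, Xmax, Ymax
    header += bytes(32)                               # Z and M ranges, unused here
    records = struct.pack(">ii", 50, 24) + struct.pack(">ii", 78, 40)
    return header + records

demo = build_demo_shx()
print(parse_header(demo[:100]))
print(parse_index_records(demo))
```

Each index record gives the position of a shape record in the main file, which is what makes direct access to a variable-length record possible.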

The shapefile format can support point, line, and area features. 
Area features are represented as closed loop, double-digitized polygons.
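
As a rough illustration of what "closed loop" means, the sketch below checks that a ring's first and last vertices coincide and uses the shoelace formula to find its orientation. It assumes the convention from the ESRI technical description that outer rings are stored clockwise; the function names and the sample square are invented for this example.

```python
def is_closed(ring):
    """A ring is closed when it has at least 4 vertices and ends where it starts."""
    return len(ring) >= 4 and ring[0] == ring[-1]

def signed_area(ring):
    """Shoelace formula: positive for counterclockwise rings, negative for clockwise."""
    return 0.5 * sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )

# A unit square traced up, right, down, then back to the start (clockwise).
outer = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0), (0.0, 0.0)]

print(is_closed(outer))        # True
print(signed_area(outer) < 0)  # True, so the ring is clockwise
```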


## Shapefile Specifics and Limits

A shapefile is generally composed of these three expected subordinate files:
 * .shp – this file stores the geometry of the feature (main file)
 * .shx – this file stores the index of the geometry (index file)
 * .dbf – this file stores the attribute information for the feature (dBASE Table)

Some other files may also be present depending on the generating application.
 
A key point when working with shapefiles is that a directory of them can be opened as a single collection of layers.
Each layer corresponds to one set of the three files above, all sharing the same base name.

For example, here is a Florida coastline data set. The `ogrinfo` command line interface (CLI) tool, which ships with GDAL, can read the directory and report that it is a collection of layers.
```BASH
 ogrinfo florida_coast
INFO: Open of `florida_coast'
      using driver `ESRI Shapefile' successful.
1: fl_transects_lt (Line String)
2: fl_baseline (Line String)
3: fl1855_1895 (Line String)
4: fl_transects_st (Line String)
5: fl1998_2001 (Line String)
6: fl_nourish (Line String)
7: fl1976_1979 (Line String)
8: fl1926_1953 (Line String)
9: fl_intersects (Point)
```

If we look inside:
```BASH
ls florida_coast/
fl1855_1895.avl      fl1976_1979.dbf      fl_baseline.shp.xml    fl_nourish.shp.xml
fl1855_1895.dbf      fl1976_1979.prj      fl_baseline.shx        fl_nourish.shx
fl1855_1895.prj      fl1976_1979.shp      fl_error.avl           fl_transects_lt.avl
fl1855_1895.shp      fl1976_1979.shp.xml  fl_intersects.avl      fl_transects_lt.dbf
fl1855_1895.shp.xml  fl1976_1979.shx      fl_intersects.dbf      fl_transects_lt.prj
fl1855_1895.shx      fl1998_2001.avl      fl_intersects.prj      fl_transects_lt.shp
fl1926_1953.avl      fl1998_2001.dbf      fl_intersects.shp      fl_transects_lt.shp.xml
fl1926_1953.dbf      fl1998_2001.prj      fl_intersects.shp.xml  fl_transects_lt.shx
fl1926_1953.prj      fl1998_2001.shp      fl_intersects.shx      fl_transects_st.avl
fl1926_1953.sbn      fl1998_2001.shp.xml  fl_nourish.avl         fl_transects_st.dbf
fl1926_1953.sbx      fl1998_2001.shx      fl_nourish.dbf         fl_transects_st.prj
fl1926_1953.shp      fl_baseline.avl      fl_nourish.prj         fl_transects_st.shp
fl1926_1953.shp.xml  fl_baseline.dbf      fl_nourish.sbn         fl_transects_st.shp.xml
fl1926_1953.shx      fl_baseline.prj      fl_nourish.sbx         fl_transects_st.shx
fl1976_1979.avl      fl_baseline.shp      fl_nourish.shp
```
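
The grouping of files into layers can be sketched in plain Python. This is only an illustration of the naming convention, not how GDAL discovers layers: it assumes that a layer's base name contains no dots and that a layer is "complete" when its `.shp`, `.shx`, and `.dbf` files are all present. The `find_layers` helper and the temporary demo directory are made up for this example.

```python
from pathlib import Path
import tempfile

REQUIRED = {".shp", ".shx", ".dbf"}

def find_layers(directory):
    """Map each complete layer's base name to the set of extensions found for it."""
    by_stem = {}
    for path in Path(directory).iterdir():
        if not path.is_file():
            continue
        # "fl_nourish.shp.xml" -> stem "fl_nourish", extension ".shp.xml"
        stem, _, ext = path.name.partition(".")
        by_stem.setdefault(stem, set()).add("." + ext.lower())
    # Keep only base names that have all three required files.
    return {stem: exts for stem, exts in by_stem.items() if REQUIRED <= exts}

# Demo on empty placeholder files that mimic part of the listing.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["fl_nourish.shp", "fl_nourish.shx", "fl_nourish.dbf",
                 "fl_nourish.prj", "fl_nourish.shp.xml", "fl_error.avl"]:
        (Path(tmp) / name).touch()
    layers = find_layers(tmp)

print(sorted(layers))  # ['fl_nourish']
```

Note that `fl_error.avl` has no matching `.shp`, `.shx`, or `.dbf`, so it does not count as a layer here.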


The Shapefile format dates from the 1990s but is still widely used.
However, it has some limitations that newer formats and spatial database extensions rarely share:
  * Field names cannot be longer than 10 characters
  * Date and time cannot be stored in the same field
  * NULL values are not stored in a field; when a value is NULL, a shapefile will use 0 instead
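
Before writing data to a shapefile, it can help to check for these limits up front. The sketch below is a hypothetical pre-flight check, not part of any library: it finds column names that would collide if cut to 10 characters, and splits a timestamp into separate date and time strings as one possible workaround. The column names are invented, and real writers such as GDAL may shorten names differently.

```python
from collections import defaultdict
from datetime import datetime

MAX_DBF_FIELD_NAME = 10  # dBASE field-name limit

def find_truncation_collisions(field_names):
    """Group names that become identical once cut to 10 characters."""
    truncated = defaultdict(list)
    for name in field_names:
        truncated[name[:MAX_DBF_FIELD_NAME]].append(name)
    return {short: names for short, names in truncated.items() if len(names) > 1}

def split_datetime(value):
    """Split a datetime into (date, time) strings for two separate fields."""
    return value.date().isoformat(), value.time().isoformat(timespec="seconds")

columns = ["shoreline_date_start", "shoreline_date_end", "uncertainty", "id"]
print(find_truncation_collisions(columns))
# {'shoreline_': ['shoreline_date_start', 'shoreline_date_end']}

print(split_datetime(datetime(1998, 7, 1, 14, 30, 0)))
# ('1998-07-01', '14:30:00')
```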


## Library access
Most open source libraries and software that interact with Shapefiles rely on the [GDAL](http://www.gdal.org/) library, specifically its [OGR](http://gdal.org/1.11/ogr/) component.

Software is often built on GDAL to access both raster and vector data formats.
Examples include thick-client applications such as GRASS GIS and libraries such as Fiona (Python geospatial data I/O).

### Fiona

The example below uses the Fiona library to open and walk through the layers of the Shapefile.

In [None]:
import fiona
GEODATA_FILE = '/dsa/data/geospatial/florida_coast'
numLayers = len(fiona.listlayers(GEODATA_FILE))
print("'{}' has {} layers".format(GEODATA_FILE,numLayers))

In [None]:
for i, name in enumerate(fiona.listlayers(GEODATA_FILE)):
    with fiona.open(GEODATA_FILE, layer=i) as current_layer:
        print("[{}/{}] Layer {} has {} features".format((i+1),numLayers,name,len(current_layer)))

Let's look at one of the layers to try to decompose it a little and inspect it.

We can look at layer 5 (from the 0-8 list), which above is labeled `[6/9]`.
First, examine its `type`.

In [None]:
with fiona.open(GEODATA_FILE, layer=5) as current_layer:
    print(type(current_layer))

So, we see that a layer is a Collection.  
Collections are iterable in Python, and therefore work with the `for x in collection:` syntax.

Let's see what is in our collection!

In [None]:
with fiona.open(GEODATA_FILE, layer=5) as current_layer:
    for feature in current_layer:
        print(type(feature))

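So, iterating over our collection yields one dictionary-like record per feature (a plain `dict` in older Fiona releases, a `fiona.model.Feature` from Fiona 1.9 onward).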

Let's look at the first one!

In [None]:
import json
with fiona.open(GEODATA_FILE, layer=5) as current_layer:
    for feature in current_layer:
        print(feature)
        print("-------------------------------")
        print(json.dumps(feature, indent=2))
        break # stop processing the features after this point

So, our element of the layer is a geometric feature that has the following:
 * `geometry`
 * `properties`
 * `id`
 * `type`

Note that the `geometry` has `coordinates` (a list of X,Y pairs) and a `type` of "LineString".
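
To show how these keys are typically used, here is a small sketch that works on a hand-written record with the same four keys. The `sample_feature` values and the `linestring_length` helper are made up for illustration; with real data you would pass in a feature read by Fiona, and the result would be in the units of the layer's coordinate reference system.

```python
import math

# A made-up record shaped like the features printed above.
sample_feature = {
    "type": "Feature",
    "id": "0",
    "properties": {"NAME": "example"},
    "geometry": {
        "type": "LineString",
        "coordinates": [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)],
    },
}

def linestring_length(geometry):
    """Sum the straight-line distances between consecutive X,Y vertices."""
    if geometry["type"] != "LineString":
        raise ValueError("expected a LineString geometry")
    coords = geometry["coordinates"]
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))

print(sample_feature["properties"])                    # attribute values
print(linestring_length(sample_feature["geometry"]))   # 11.0
```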

Fiona is a great low-level tool for walking through data and doing data carpentry!

However, there is a higher-level library that builds on Fiona, and therefore GDAL, to give you a well-structured representation of the data.

### GeoPandas

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
import geopandas as gpd
geo_df = gpd.read_file(GEODATA_FILE, layer=5)
geo_df.head()

In [None]:
# plot the layer's geometries
geo_df.plot(figsize=(15,15))

Read more about Fiona [here](https://github.com/Toblerity/Fiona).   
Read more about GeoPandas [here](http://geopandas.org/).
