# GeoDB Files

   -- From: [Library of Congress, Digital Format](https://www.loc.gov/preservation/digital/formats/fdd/fdd000294.shtml)

The ESRI Geodatabase (File-based), abbreviated here as GeoDB_File, is a subtype of the ESRI Geodatabase (GeoDB). 
The GeoDB is the primary data storage model for ArcGIS. 
It is a container for spatial and attribute data that lets the user store many different types of GIS data within a single structure. That structure is implemented either in an RDBMS or as a collection of files in a file system. As an implementation of the GeoDB data model, the GeoDB_File is designed to:

 * Provide a widely available, simple, and scalable geodatabase solution for all users
 * Provide a portable geodatabase that works across operating systems
 * Scale up to provide fast performance for very large datasets, e.g., those containing well over 300 million features, and scale beyond 500 gigabytes per file
 * Use an efficient data structure, optimized for performance and storage, that allows users to compress vector data to a read-only format and uses about one third less storage space than shapefiles and personal geodatabases
 * Improve performance over shapefiles for operations involving attributes, such as classification or creating overlays

## GeoDB Specifics and Limits

For example, here is a Florida Coastline file. `ogrinfo`, a command line interface (CLI) tool that ships with GDAL, can read the file and report that it is a collection of layers.
```BASH
 ogrinfo RH_SampleData.gdb
Had to open data source read-only.
INFO: Open of `RH_SampleData.gdb'
      using driver `OpenFileGDB' successful.
1: Centerline (3D Multi Line String)
2: Calibration_Point (3D Measured Point)
3: Redline (Multi Line String)
4: Route (None)
5: Centerline_Sequence (None)
6: LRSN_MilePoint (3D Measured Multi Line String)
7: AADT (None)
8: Access_Control (None)
9: Base_Thickness (None)
10: Crashes (None)
11: F_System (None)
12: Speed_Limit (None)
13: LRSE_Access_Control (3D Measured Multi Line String)
14: LRSE_Crashes (3D Measured Point)
15: LRSE_Speed_Limit (3D Measured Multi Line String)
16: LRSE_Functional_Class (3D Measured Multi Line String)
17: LRSI_MilePoint_Intersections (Point)
18: LRSN_RefMarker (3D Measured Multi Line String)
19: LRSE_Base_Thickness (3D Measured Multi Line String)
20: LRSE_AADT (3D Measured Multi Line String)
21: Redline__ATTACH (None)
```
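
The layer summary above follows a regular `index: name (geometry type)` pattern, so it can be turned into structured data. The sketch below is a minimal illustration under stated assumptions, not part of GDAL: the function name `parse_ogrinfo_layers` and the regular expression are hypothetical, and the pattern assumes layer names contain no spaces. The sample text is copied from the output shown above.

```python
import re

# Matches summary lines such as "13: LRSE_Access_Control (3D Measured Multi Line String)".
# Assumption: layer names contain no whitespace.
LAYER_LINE = re.compile(r"^\s*(\d+):\s+(\S+)(?:\s+\((.+)\))?\s*$")


def parse_ogrinfo_layers(text):
    """Return a list of (index, layer_name, geometry_type) tuples."""
    layers = []
    for line in text.splitlines():
        match = LAYER_LINE.match(line)
        if match:
            index, name, geom = match.groups()
            # ogrinfo prints "None" for tables that carry no geometry.
            geom = None if geom in (None, "None") else geom
            layers.append((int(index), name, geom))
    return layers


sample = """\
INFO: Open of `RH_SampleData.gdb'
      using driver `OpenFileGDB' successful.
1: Centerline (3D Multi Line String)
4: Route (None)
17: LRSI_MilePoint_Intersections (Point)
"""

layers = parse_ogrinfo_layers(sample)
# Keep only the layers that have a geometry column.
spatial = [name for _, name, geom in layers if geom is not None]
print(layers)
print(spatial)
```

Separating spatial layers from attribute-only tables this way makes it easier to see which layers can be plotted and which hold only tabular data.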

A file geodatabase is stored on disk as a directory. If we list its contents:
```BASH
ls RH_SampleData.gdb
a00000001.freelist                           a0000002f.gdbtablx
a00000001.gdbindexes                         a0000002f.ix_FROM_DATE.atx
a00000001.gdbtable                           a0000002f.ix_ROUTE_ID.atx
a00000001.gdbtablx                           a0000002f.ix_TO_DATE.atx
a00000001.TablesByName.atx                   a00000030.gdbindexes
a00000002.gdbtable                           a00000030.gdbtable
a00000002.gdbtablx                           a00000030.gdbtablx
  .
  .
  .
a0000002e.gdbindexes                         a00000041.ix_TO_DATE.atx
a0000002e.gdbtable                           a00000041.spx
a0000002e.gdbtablx                           a00000042.GDB_60_REL_OBJECTID.atx
a0000002e.ix_FROM_DATE.atx                   a00000042.gdbindexes
a0000002e.ix_ROUTE_ID.atx                    a00000042.gdbtable
a0000002e.ix_TO_DATE.atx                     a00000042.gdbtablx
a0000002f.gdbindexes                         gdb
a0000002f.gdbtable                           timestamps
```
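
The listing shows several files sharing each `aXXXXXXXX` stem (`.gdbtable`, `.gdbtablx`, `.gdbindexes`, `.atx`, `.spx`). As a rough sketch, the files can be grouped by stem to see what accompanies each table. The helper `group_gdb_files` is hypothetical, and reading the stem as a hexadecimal table number is an assumption based on the names in the listing, not a documented rule.

```python
from collections import defaultdict


def group_gdb_files(filenames):
    """Map each 'aXXXXXXXX' stem to the sorted list of suffixes found for it."""
    tables = defaultdict(list)
    for name in filenames:
        stem, _, suffix = name.partition(".")
        # Skip entries such as "gdb" and "timestamps" that have no table stem.
        if not suffix or not stem.startswith("a"):
            continue
        tables[stem].append(suffix)
    return {stem: sorted(suffixes) for stem, suffixes in sorted(tables.items())}


# A few names taken from the directory listing above.
listing = [
    "a00000001.gdbtable", "a00000001.gdbtablx", "a00000001.TablesByName.atx",
    "a0000002f.gdbtable", "a0000002f.gdbtablx", "a0000002f.ix_ROUTE_ID.atx",
    "a00000041.spx", "gdb", "timestamps",
]

grouped = group_gdb_files(listing)
# Assumption: the digits after the leading "a" are a hexadecimal number.
table_numbers = {stem: int(stem[1:], 16) for stem in grouped}
print(grouped)
print(table_numbers)
```

To run this against the real directory, pass `os.listdir("RH_SampleData.gdb")` in place of `listing`.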

## Library access
Most open source libraries and software that interact with GeoDB files rely on the [GDAL](http://www.gdal.org/) library, specifically its [OGR](http://gdal.org/1.11/ogr/) component.

Software is often built on GDAL to access data formats, both raster and vector.
Such software includes thick clients such as [GRASS GIS](https://grass.osgeo.org) and libraries such as Fiona (Python geospatial data I/O).

### Fiona

The example below uses the Fiona library to open and walk through the layers of the file geodatabase.

In [None]:
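import fiona

# Path to the file geodatabase (a directory ending in .gdb)
GEODATA_FILE = '/dsa/data/geospatial/RH_SampleData.gdb'
# fiona.listlayers returns the layer names in index order
numLayers = len(fiona.listlayers(GEODATA_FILE))
print("'{}' has {} layers".format(GEODATA_FILE, numLayers))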

In [None]:
# Open each layer by its 0-based index and report its feature count
for i, name in enumerate(fiona.listlayers(GEODATA_FILE)):
    with fiona.open(GEODATA_FILE, layer=i) as current_layer:
        print("[{}/{}] Layer {} has {} features".format(i + 1, numLayers, name, len(current_layer)))

Let's look at one of the layers and decompose it a little.

We can look at the layer at index 17 (indices run from 0 to 20), which is labeled `[18/21]` in the output above.
First, examine its `type`:

In [None]:
with fiona.open(GEODATA_FILE, layer=17) as current_layer:
    print(type(current_layer))

So, we see that a layer is a Collection.  
Collections are traditionally iterable in Python, and therefore suitable for the `for x in collection:` syntax.

Let's see what is in our collection!

In [None]:
with fiona.open(GEODATA_FILE, layer=17) as current_layer:
    for feature in current_layer:
        print(type(feature))
        break # stop processing the features after this point

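So, iterating over our collection yields dictionary-like feature records (plain `dict` objects in older Fiona releases, `fiona.model.Feature` objects in Fiona 1.9 and later).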

Let's look at the first one!

In [None]:
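import json

with fiona.open(GEODATA_FILE, layer=17) as current_layer:
    for feature in current_layer:
        print(feature)  # raw record as Fiona returns it
        print("-------------------------------")
        # Pretty-print the same record as indented JSON.
        # Assumption: on Fiona 1.9+ the record is a Feature object and may need
        # fiona.model.to_dict(feature) before it can be serialized.
        print(json.dumps(feature, indent=2))
        break # stop processing the features after this point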

So, each element of the layer is a geometric feature with the following keys:
 * `geometry`
 * `id`
 * `type`
 * `properties`

Note that the `geometry` has `type`="MultiLineString" and `coordinates` given as (X, Y, Z). In this case Z is elevation; however, it appears to always be zero.
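
The impression that Z is always zero can be checked rather than eyeballed. The sketch below walks the nested `coordinates` of a MultiLineString mapping and collects every Z value. The helper `z_values` is hypothetical, and the `example` geometry is made up for illustration; it is not data from `RH_SampleData.gdb`. It assumes the geometry supports mapping-style access (`geometry["type"]`, `geometry["coordinates"]`), as the records printed above suggest.

```python
def z_values(geometry):
    """Yield the Z value of every 3D vertex in a MultiLineString mapping."""
    if geometry["type"] != "MultiLineString":
        raise ValueError("expected a MultiLineString geometry")
    # A MultiLineString is a list of lines; each line is a list of vertices.
    for line in geometry["coordinates"]:
        for vertex in line:
            if len(vertex) >= 3:  # skip 2D vertices that carry no Z
                yield vertex[2]


# Hypothetical geometry for illustration only.
example = {
    "type": "MultiLineString",
    "coordinates": [
        [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
        [(1.0, 1.0, 0.0), (2.0, 3.0, 0.0)],
    ],
}

zs = list(z_values(example))
all_zero = all(z == 0 for z in zs)
print(len(zs), all_zero)
```

Applied inside the `for feature in current_layer:` loop to `feature["geometry"]`, the same check would confirm or refute the observation for the whole layer.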

Fiona is a great low-level tool for walking through data and doing data carpentry!

However, there is a higher-level library, GeoPandas, that leverages Fiona, and therefore GDAL, to give you a well-structured representation of the data.

### GeoPandas

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
import geopandas as gpd

# Read the layer at index 17 into a GeoDataFrame
geo_df = gpd.read_file(GEODATA_FILE, layer=17)
geo_df.head()

In [None]:
# Plot the layer's geometries on a 15 x 15 inch figure
geo_df.plot(figsize=(15,15))

Read more about Fiona [here](https://github.com/Toblerity/Fiona).   
Read more about GeoPandas [here](http://geopandas.org/).
