# Tutorial 12-01: File Geodatabases and Python

In this tutorial, we'll explore a file geodatabase using ArcPy.  In this case, in our work with GeoNinjas PythonAnalytics, we've been given a file geodatabase with a schema that we're not familiar with.  We'll use Python to find all the feature classes and tables in the file geodatabase and gather some descriptive information about each.

## Opening a File Geodatabase and Listing the Contents

#### 1.  Import arcpy

You'll start by importing the ArcPy package.  This will allow you to interact with the database pythonically.

In [1]:
import arcpy

#### 2.  Enter the path to the file geodatabase.

File geodatabases are composed of many files in a single folder.  Your entry point to the data included in the file geodatabase will be that folder path.  Start by saving that folder path as a variable.

In [2]:
file_gdb = "./indoors_test.gdb"

Now that you've saved the path, it's worth confirming that it's valid.  `arcpy.Exists()` returns `True` when the path points to data that ArcPy can find.

In [3]:
arcpy.Exists(file_gdb)

True
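Because a file geodatabase is a folder on disk, a quick standard-library check can catch a mistyped path before ArcPy is involved.  The following is a minimal sketch, not a replacement for `arcpy.Exists()`.  The helper name `looks_like_file_gdb` is hypothetical, and the check only confirms that the folder exists and that its name ends in `.gdb`.

```python
import os
import tempfile


def looks_like_file_gdb(path):
    """Return True if path is an existing folder whose name ends in .gdb.

    This is only a rough pre-check (an assumption of this sketch); it does
    not confirm that the folder holds a valid geodatabase.
    """
    # normpath drops any trailing slash so the suffix test is reliable
    normalized = os.path.normpath(path)
    return os.path.isdir(normalized) and normalized.lower().endswith(".gdb")


# demonstrate with a throwaway folder rather than real data
with tempfile.TemporaryDirectory() as tmp:
    demo_gdb = os.path.join(tmp, "demo.gdb")
    os.mkdir(demo_gdb)
    found = looks_like_file_gdb(demo_gdb)
    missing = looks_like_file_gdb(os.path.join(tmp, "missing.gdb"))

print(found, missing)
```

In a real session you would pass your own `file_gdb` variable and still follow up with `arcpy.Exists()`.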

#### 3.  Set the geodatabase as your workspace

Before you use some of the built-in functions that ArcPy has to list the datasets in this geodatabase, you'll need to set the database as your workspace.  

In [4]:
arcpy.env.workspace = file_gdb

#### 4.  List all the feature classes.

The first thing you might be interested in is what feature classes are in this geodatabase.  ArcPy has a handy function for this with some very simple syntax.

In [5]:
arcpy.ListFeatureClasses()

['GDB_ValidationPointErrors',
 'GDB_ValidationLineErrors',
 'GDB_ValidationPolygonErrors']

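That returned a list of feature classes, but it seems odd that the only feature classes in this geodatabase would be validation error feature classes.  If you were to open this in ArcGIS Pro, you'd see several other feature classes, but they are stored inside feature datasets.  Unless you pass a `feature_dataset` argument, `arcpy.ListFeatureClasses()` only lists the feature classes at the root of the workspace.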

#### 5.  List all the feature datasets.

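Since you suspect there might be other feature classes in feature datasets, the first step in finding them is to list all the feature datasets.  ArcPy can do that as well.  By default, `arcpy.ListDatasets()` returns every dataset type it finds; you can pass `feature_type="Feature"` to restrict the result to feature datasets.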

In [6]:
arcpy.ListDatasets()

['Indoors', 'Network', 'PrelimNetwork']

#### 6.  List all the tables

In addition to feature classes, you can also list the tables to account for any non-spatial data.

In [7]:
arcpy.ListTables()

['Areas', 'AreaRoles', 'IndoorsConfig']

#### 7.  Iterate through the database and list everything.

You can put all those functions together in a small code block to list all the feature classes and tables in the geodatabase.  You can start by iterating through each dataset and printing every feature class in each dataset.  Then you can list all the feature classes and tables that are not in feature datasets.

In [8]:
# iterate through each feature dataset
for dataset in arcpy.ListDatasets():
    
    # print the dataset name
    print(dataset)
    
    # within each dataset, list all the feature classes
    for fc in arcpy.ListFeatureClasses(feature_dataset=dataset):
        
        # print the name of each feature class
        print(f"  {fc}")
        
# print the names of the feature classes that are not in feature datasets       
for fc in arcpy.ListFeatureClasses():
    print(fc)
    
# print the name of each table
for table in arcpy.ListTables():
    print(table)

Indoors
  Levels
  TrackingZones
  Details
  Zones
  Events
  Sections
  Units
  Sites
  Facilities
  Occupants
  Reservations
Network
  Landmarks
  Transitions
  Pathways
PrelimNetwork
  PrelimTransitions
  PrelimPathways
GDB_ValidationPointErrors
GDB_ValidationLineErrors
GDB_ValidationPolygonErrors
Areas
AreaRoles
IndoorsConfig


Now you've got a pretty good idea of what's in this geodatabase.
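If you'd rather keep the inventory than print it, the same loops can fill a dictionary.  This is a sketch under assumptions: `build_inventory` is a hypothetical helper, and it takes the three listing functions as arguments so that it can be tried with stand-in data.  With ArcPy you would pass `arcpy.ListDatasets`, `arcpy.ListFeatureClasses`, and `arcpy.ListTables` after setting the workspace.

```python
def build_inventory(list_datasets, list_feature_classes, list_tables):
    """Collect dataset, feature class, and table names into one dictionary."""
    inventory = {}

    # feature classes inside each feature dataset
    # ("or []" treats a None result as an empty list)
    for dataset in list_datasets() or []:
        inventory[dataset] = list(list_feature_classes(feature_dataset=dataset) or [])

    # feature classes and tables at the root of the workspace
    inventory["(standalone feature classes)"] = list(list_feature_classes() or [])
    inventory["(tables)"] = list(list_tables() or [])
    return inventory


# stand-in data (hypothetical) so the sketch runs without ArcPy
sample_feature_classes = {
    None: ["GDB_ValidationPointErrors"],
    "Network": ["Landmarks", "Transitions", "Pathways"],
}


def fake_list_feature_classes(feature_dataset=None):
    return sample_feature_classes[feature_dataset]


inventory = build_inventory(
    lambda: ["Network"],
    fake_list_feature_classes,
    lambda: ["Areas"],
)

for key, names in inventory.items():
    print(key, names)
```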

## Walk the Geodatabase

The approach you used above is very specific to ArcPy.  ArcPy also includes `arcpy.da.Walk()`, which works much like `os.walk()` in the standard library and is closer to how other packages traverse a directory or database.  A nice feature of this approach is that it works within a single database or folder and can also iterate through multiple databases and folders.

#### 1.  Create a walk object.

Before you're off and running (pun intended) with this `Walk` object, it's worth unpacking how it works.  The syntax can be a little confusing at first because it involves nested for-loops.  You can always break down this functionality and go through it step by step though.  Start by creating a `Walk` object on your file geodatabase.

In [16]:
fgdb_walker = arcpy.da.Walk(file_gdb)
fgdb_walker

<Workspace Walker object at 0x1ba55985f80>

#### 2.  Use the next() function.

The `Walk` object behaves like a special kind of Python object called a *generator*.  A generator is an object that you can iterate over (with something like a for-loop).  Python also has a built-in function called `next()`.  When you call `next()` on a generator, it returns the next item that an iteration would produce.  One of the neat features of generators is that they keep track of which item they're on.  This will make more sense as we move through this.  Start by calling the `next()` function on your `Walk` object.

In [17]:
next(fgdb_walker)

('./indoors_test.gdb',
 ['Indoors', 'Network', 'PrelimNetwork'],
 ['GDB_ValidationObjectErrors',
  'GDB_ValidationPointErrors',
  'GDB_ValidationLineErrors',
  'GDB_ValidationPolygonErrors',
  'Areas',
  'AreaRoles',
  'IndoorsConfig'])

What we see here is a tuple with three items in it.  The items are:
 - a string representing the path of the file geodatabase
 - a list representing the three feature datasets in the file geodatabase
 - a list representing all the feature classes and tables in the file geodatabase that aren't in a feature dataset.
 
If you were working through a folder structure, it would be the same concept.  The second item in the tuple would be sub-folders or sub-directories.  The third item would be the files that are in that folder.
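For comparison, the standard library's `os.walk()` yields the same kind of three-item tuple for ordinary folders.  The sketch below builds a throwaway folder (the names `data`, `notes.txt`, and `roads.csv` are made up for illustration) and looks at the first two tuples.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # build a tiny folder tree: one sub-folder and one file in each level
    os.mkdir(os.path.join(top, "data"))
    open(os.path.join(top, "notes.txt"), "w").close()
    open(os.path.join(top, "data", "roads.csv"), "w").close()

    walker = os.walk(top)

    # first tuple: the top folder, its sub-folders, and its files
    root, directories, files = next(walker)
    is_top = (root == top)

    # second tuple: the sub-folder, which has no sub-folders of its own
    second = next(walker)

print(directories, files)
print(second[1], second[2])
```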

Now call `next()` again and see what happens.

In [18]:
next(fgdb_walker)

('C:\\Users\\dav11274\\Desktop\\github\\Top-20-Python\\Exercises\\Chapter 12 - Interacting with Databases\\indoors_test.gdb\\Indoors',
 [],
 ['Levels',
  'TrackingZones',
  'Details',
  'Zones',
  'Events',
  'Sections',
  'Units',
  'Sites',
  'Facilities',
  'Occupants',
  'Reservations'])

Now the `Walk` object has returned another tuple, but with different content.  The first item now shows the path to the file geodatabase AND the first feature dataset.  This indicates that the `Walk` object is working its way through the feature datasets (or sub-directories).  The second item in the tuple is an empty list indicating that there are no further sub-directories in this feature dataset.  The third item lists all the feature classes (or files) in this feature dataset.

Now call `next()` again and see what happens.

In [19]:
next(fgdb_walker)

('C:\\Users\\dav11274\\Desktop\\github\\Top-20-Python\\Exercises\\Chapter 12 - Interacting with Databases\\indoors_test.gdb\\Network',
 [],
 ['Landmarks', 'Transitions', 'Pathways'])

This time, we're getting the same structure but with the next feature dataset.  We saw in the first iteration that there were three feature datasets.  So we would expect to see another one the next time we call it.

In [20]:
next(fgdb_walker)

('C:\\Users\\dav11274\\Desktop\\github\\Top-20-Python\\Exercises\\Chapter 12 - Interacting with Databases\\indoors_test.gdb\\PrelimNetwork',
 [],
 ['PrelimTransitions', 'PrelimPathways'])

That was the last feature dataset to work through.  Now try running `next()` again.

In [21]:
next(fgdb_walker)

StopIteration: 

If you got a `StopIteration` exception, don't worry.  That's expected: Python raises it when an iterator has nothing left to return.  You've reached the end of what you can iterate through in this file geodatabase.
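The same behavior can be reproduced with a plain Python generator, without ArcPy.  In this sketch, `fake_walk` is a hypothetical stand-in that yields two hard-coded tuples.  It shows both the exception and the optional default argument of `next()` that avoids it.

```python
def fake_walk():
    """Hypothetical stand-in for a Walk object: yields two fixed tuples."""
    yield ("demo.gdb", ["Network"], ["Areas"])
    yield ("demo.gdb/Network", [], ["Landmarks", "Pathways"])


walker = fake_walk()

# each call to next() picks up where the previous one stopped
first = next(walker)
second = next(walker)

# a third call has nothing left to return, so Python raises StopIteration
try:
    next(walker)
    exhausted = False
except StopIteration:
    exhausted = True

# passing a default value makes next() return it instead of raising
leftover = next(walker, None)

print(first)
print(second)
print(exhausted, leftover)
```

A for-loop handles `StopIteration` for you, which is why you rarely see this exception in everyday code.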

#### 3.  Write a for-loop with the Walk object.

Now that you understand what's going on with a `Walk` object, you can write a for-loop on it.  This is a much more common pattern.  The first thing you'll do is iterate through the tuples that the `Walk` object returns.  Then you can iterate through the sub-directories or files that are returned in each tuple.

In [24]:
# unpack the tuples from the Walk object
for root, directories, files in arcpy.da.Walk(file_gdb):
    
    # iterate through individual files
    for file in files:
        print(file)

GDB_ValidationObjectErrors
GDB_ValidationPointErrors
GDB_ValidationLineErrors
GDB_ValidationPolygonErrors
Areas
AreaRoles
IndoorsConfig
Levels
TrackingZones
Details
Zones
Events
Sections
Units
Sites
Facilities
Occupants
Reservations
Landmarks
Transitions
Pathways
PrelimTransitions
PrelimPathways


This gives you every feature class and table name without having to check for sub-directories or feature datasets yourself.  What it doesn't give you is the full path to each item.  Fortunately, you can use `os.path.join()` from the standard library's `os` module to combine the root directory and file name into a full path.

In [26]:
import os

In [27]:
# unpack the tuples from the Walk object
for root, directories, files in arcpy.da.Walk(file_gdb):
    
    # iterate through individual files
    for file in files:
        
        # combine root and file to get the full path
        full_path = os.path.join(root, file)
        
        print(full_path)

./indoors_test.gdb\GDB_ValidationObjectErrors
./indoors_test.gdb\GDB_ValidationPointErrors
./indoors_test.gdb\GDB_ValidationLineErrors
./indoors_test.gdb\GDB_ValidationPolygonErrors
./indoors_test.gdb\Areas
./indoors_test.gdb\AreaRoles
./indoors_test.gdb\IndoorsConfig
C:\Users\dav11274\Desktop\github\Top-20-Python\Exercises\Chapter 12 - Interacting with Databases\indoors_test.gdb\Indoors\Levels
C:\Users\dav11274\Desktop\github\Top-20-Python\Exercises\Chapter 12 - Interacting with Databases\indoors_test.gdb\Indoors\TrackingZones
C:\Users\dav11274\Desktop\github\Top-20-Python\Exercises\Chapter 12 - Interacting with Databases\indoors_test.gdb\Indoors\Details
C:\Users\dav11274\Desktop\github\Top-20-Python\Exercises\Chapter 12 - Interacting with Databases\indoors_test.gdb\Indoors\Zones
C:\Users\dav11274\Desktop\github\Top-20-Python\Exercises\Chapter 12 - Interacting with Databases\indoors_test.gdb\Indoors\Events
C:\Users\dav11274\Desktop\github\Top-20-Python\Exercises\Chapter 12 - Interacti

Now you can use these full paths in your data processing or geoprocessing.
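To reuse the paths later, it can help to collect them in a list instead of printing them.  The sketch below assumes a hypothetical helper, `collect_full_paths`, that accepts any iterator yielding `(root, directories, files)` tuples, so it is shown here with hard-coded stand-in tuples rather than a real `arcpy.da.Walk()` call.  It also runs each path through `os.path.normpath()`, which tidies up a leading `./` and mixed separators.

```python
import os


def collect_full_paths(walker):
    """Return a list of full paths from a Walk-style iterator."""
    paths = []
    for root, directories, files in walker:
        for name in files:
            # join root and name, then normalize separators and "./" prefixes
            paths.append(os.path.normpath(os.path.join(root, name)))
    return paths


# stand-in tuples (hypothetical) shaped like the ones a Walk object returns
fake_walk = [
    ("./demo.gdb", ["Network"], ["Areas"]),
    (os.path.join("./demo.gdb", "Network"), [], ["Landmarks", "Pathways"]),
]

paths = collect_full_paths(fake_walk)
for path in paths:
    print(path)
```

With ArcPy available, the call would be `collect_full_paths(arcpy.da.Walk(file_gdb))`.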

## Describing Datasets

Now that you've identified all the contents of a file geodatabase, you can pull some descriptive information about each item.  ArcPy has a function that does just this.  It's called `Describe()`, and it returns an object whose properties describe the dataset.

#### 1.  Describe a table.

`Describe()` pulls a lot of metadata about a table or feature class.  Start by calling it on a table in the geodatabase.

In [31]:
desc = arcpy.Describe(
    os.path.join(file_gdb, "IndoorsConfig")
)
desc

| Property | Value |
| --- | --- |
| catalogPath | ./indoors_test.gdb\IndoorsConfig |
| dataType | Table |

| Field | Type |
| --- | --- |
| ObjectID | OID |
| CONFIG_KEY | String |
| CONFIG_VALUE | String |


In [32]:
# check the dataType property
desc.dataType

'Table'

#### 2.  Describe a feature class.

Now do the same with a feature class.

In [35]:
desc = arcpy.Describe(
    os.path.join(file_gdb, "Indoors", "Levels")
)
desc

| Property | Value |
| --- | --- |
| catalogPath | ./indoors_test.gdb\Indoors\Levels |
| dataType | FeatureClass |
| shapeType | Polygon |
| hasM | False |
| hasZ | True |

| Field | Type |
| --- | --- |
| ObjectID | OID |
| LEVEL_ID | String |
| NAME | String |
| NAME_SHORT | String |
| LEVEL_NUMBER | Integer |
| FACILITY_ID | String |
| AREA_GROSS | Double |
| HEIGHT_RELATIVE | Double |
| VERTICAL_ORDER | Integer |
| SHAPE | Geometry |

| Property | Value |
| --- | --- |
| name (Projected Coordinate System) | WGS_1984_Web_Mercator_Auxiliary_Sphere |
| factoryCode (WKID) | 3857 |
| linearUnitName (Linear Unit) | Meter |

| Property | Value |
| --- | --- |
| name (Geographic Coordinate System) | GCS_WGS_1984 |
| factoryCode (WKID) | 4326 |
| angularUnitName (Angular Unit) | Degree |
| datumName (Datum) | D_WGS_1984 |

| Property | Value |
| --- | --- |
| name (Vertical Coordinate System) | WGS_1984 |
| factoryCode (WKID) | 115700 |
| linearUnitName (Linear Unit) | Meter |
| direction (Direction) | 1 |
| datumName (Datum) | D_WGS_1984 |


You can see there's a lot more information about the feature class than there was about the table.  ArcPy describes the geometry type and the spatial reference in addition to the field information included in the table description.  This is all useful information when you're iterating through feature classes in a file geodatabase.
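Because tables and feature classes expose different properties, code that describes many items has to cope with properties that may be missing.  The sketch below shows one way to do that with `getattr()` and a default value.  It uses `SimpleNamespace` stand-ins instead of real `Describe` objects, the helper name `summarize` is hypothetical, and it assumes that asking a `Describe` object for a property it doesn't support raises `AttributeError`.

```python
from types import SimpleNamespace


def summarize(desc):
    """Pull a few properties from a Describe-style object into a dictionary."""
    return {
        "dataType": desc.dataType,
        # tables have no shapeType, so fall back to None when it is missing
        "shapeType": getattr(desc, "shapeType", None),
        # each field object is expected to have a name attribute
        "fields": [field.name for field in getattr(desc, "fields", [])],
    }


# stand-ins (hypothetical) that mimic the properties shown above
table_like = SimpleNamespace(
    dataType="Table",
    fields=[SimpleNamespace(name="CONFIG_KEY"), SimpleNamespace(name="CONFIG_VALUE")],
)
fc_like = SimpleNamespace(
    dataType="FeatureClass",
    shapeType="Polygon",
    fields=[SimpleNamespace(name="LEVEL_ID"), SimpleNamespace(name="SHAPE")],
)

table_summary = summarize(table_like)
fc_summary = summarize(fc_like)
print(table_summary)
print(fc_summary)
```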

## Iterate and Describe

Now you can put these concepts all together and implement some conditional logic on feature classes in a file geodatabase.  A very common geoprocessing pattern in ArcPy is to identify all the feature classes of a given geometry type.  You could, for example, identify all the point datasets and buffer them.

#### 1.  Iterate through all the contents of a file geodatabase

Use the `Walk()` object to iterate through all the contents of your file geodatabase.

In [36]:
for root, directories, files in arcpy.da.Walk(file_gdb):
    for file in files:
        full_path = os.path.join(root, file)
        print(full_path)

./indoors_test.gdb\GDB_ValidationObjectErrors
./indoors_test.gdb\GDB_ValidationPointErrors
./indoors_test.gdb\GDB_ValidationLineErrors
./indoors_test.gdb\GDB_ValidationPolygonErrors
./indoors_test.gdb\Areas
./indoors_test.gdb\AreaRoles
./indoors_test.gdb\IndoorsConfig
C:\Users\dav11274\Desktop\github\Top-20-Python\Exercises\Chapter 12 - Interacting with Databases\indoors_test.gdb\Indoors\Levels
C:\Users\dav11274\Desktop\github\Top-20-Python\Exercises\Chapter 12 - Interacting with Databases\indoors_test.gdb\Indoors\TrackingZones
C:\Users\dav11274\Desktop\github\Top-20-Python\Exercises\Chapter 12 - Interacting with Databases\indoors_test.gdb\Indoors\Details
C:\Users\dav11274\Desktop\github\Top-20-Python\Exercises\Chapter 12 - Interacting with Databases\indoors_test.gdb\Indoors\Zones
C:\Users\dav11274\Desktop\github\Top-20-Python\Exercises\Chapter 12 - Interacting with Databases\indoors_test.gdb\Indoors\Events
C:\Users\dav11274\Desktop\github\Top-20-Python\Exercises\Chapter 12 - Interacti

#### 2.  Identify point datasets.

Use the `Describe()` function to identify any datasets that have a point geometry type.

In [41]:
for root, directories, files in arcpy.da.Walk(file_gdb):
    for file in files:
        full_path = os.path.join(root, file)
        
        # describe the file
        desc = arcpy.Describe(full_path)
        
        # test to see if file is a feature class and it has a point geometry type
        if desc.dataType == 'FeatureClass' and desc.shapeType == 'Point':
            print(full_path)

./indoors_test.gdb\GDB_ValidationPointErrors
C:\Users\dav11274\Desktop\github\Top-20-Python\Exercises\Chapter 12 - Interacting with Databases\indoors_test.gdb\Indoors\Events
C:\Users\dav11274\Desktop\github\Top-20-Python\Exercises\Chapter 12 - Interacting with Databases\indoors_test.gdb\Indoors\Occupants
C:\Users\dav11274\Desktop\github\Top-20-Python\Exercises\Chapter 12 - Interacting with Databases\indoors_test.gdb\Network\Landmarks


Now that you've identified all the point datasets in this database, you can perform some geoprocessing on those datasets.  We won't be going into that in this chapter, but you could do that on your own.
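As a starting point for that kind of processing, the walk-and-describe pattern can be wrapped in a reusable function that returns the matching paths.  This is a sketch under assumptions: `find_by_shape` is a hypothetical helper, and the walker and describe function are passed in as arguments so the example can run on stand-in data without ArcPy.

```python
import os
from types import SimpleNamespace


def find_by_shape(walker, describe, shape_type):
    """Return full paths of feature classes with the given geometry type."""
    matches = []
    for root, directories, files in walker:
        for name in files:
            full_path = os.path.join(root, name)
            desc = describe(full_path)
            # check dataType first so shapeType is never read from a table
            if desc.dataType == "FeatureClass" and desc.shapeType == shape_type:
                matches.append(full_path)
    return matches


# stand-in data (hypothetical): walk tuples and a lookup of descriptions
fake_walk = [
    ("demo.gdb", ["Network"], ["Areas"]),
    (os.path.join("demo.gdb", "Network"), [], ["Landmarks", "Pathways"]),
]
catalog = {
    "Areas": SimpleNamespace(dataType="Table"),
    "Landmarks": SimpleNamespace(dataType="FeatureClass", shapeType="Point"),
    "Pathways": SimpleNamespace(dataType="FeatureClass", shapeType="Polyline"),
}


def fake_describe(path):
    return catalog[os.path.basename(path)]


point_paths = find_by_shape(fake_walk, fake_describe, "Point")
print(point_paths)
```

With ArcPy, the call would look like `find_by_shape(arcpy.da.Walk(file_gdb), arcpy.Describe, "Point")`.  `arcpy.da.Walk()` also accepts `datatype` and `type` arguments that can narrow the results for you, so it's worth checking the ArcPy documentation before writing your own filter.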