## Notebook for validating a GeMS database one rule at a time
I use this notebook for testing: you can deliberately write errors into the database and check that the tool finds them.

Each section begins by copying the source database into a scratch workspace, running the Validate Database function(s) for that rule, and printing the results. This happens in the first code cell after the rule header. For testing, I then modify the database and run the function(s) again, which most people using this notebook probably won't need to do.

Notes:
1. Add the folder paths and the path to the database in the cell below: delete the commented part of each line (starting with the `#` after the `=` sign), browse to the folder or database in the Catalog window, and drag the item to the space after the `=` sign.
2. `arcpy.management.Delete(gdb_c)` at the end of some sections is not always necessary. If the `Copy` command at the beginning of a section fails and the previous section did not delete the database at the end, try adding that call.
3. Most rule functions return a list whose first three items are used to build the headers and anchors in the HTML reports. Any items after that are the errors.
4. If you edit any of the imported scripts (the modules imported as `vd`, `gdef`, `guf`, and `alc`) while Pro is open, you need to reload them before you run a code cell again. After your edits, add a line such as `reload(vd)` to the top of the cell and try again. Reloading one module does not reload the modules it imports, so reload each module you edited.
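To make note 3 concrete, here is a minimal sketch of a helper that drops the header items from a rule-function result and strips HTML tags from what remains. `report_items` and its `n_headers` parameter are hypothetical, not part of the toolbox. The count is a parameter because some results in this notebook (for example the warnings from `rule3_3` and `rule3_10`) are printed from index 1 rather than 3.

```python
import re

# non-greedy match of anything that looks like an HTML tag
TAG = re.compile(r"<.*?>")


def report_items(results, n_headers=3):
    """Return the error items of a rule-function result with HTML tags removed.

    Hypothetical helper: assumes the first n_headers items are report
    headers/anchors, as described in note 3.
    """
    return [TAG.sub("", str(item)) for item in results[n_headers:]]


# usage with a made-up result list
example = ["Rule 2.1", "rule2_1", "<a name='rule2_1'></a>", "<b>Missing</b> GeologicMap"]
for item in report_items(example):
    print(f"  {item}")
```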

In [None]:
# change these variables
gdb = # path\to\geodatabase\or\geopackage
scratch = # path\to\writable\scratch\space
scripts_folder = # path\to\scripts\folder\of\toolbox

In [None]:
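# imports
import sys
import re
import importlib
from importlib import reload
from pathlib import Path

import arcpy  # already available in ArcGIS Pro notebooks; imported explicitly for clarity

# make the toolbox scripts importable
sys.path.append(str(scripts_folder))
import GeMS_ValidateDatabase as vd
import GeMS_Definition as gdef
import GeMS_utilityFunctions as guf
import GeMS_ALaCarte as alc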


In [None]:
# HTML is built into some function results for nice rendering in the validate reports
# but we'll just remove html tags for display in this notebook
CLEANR = re.compile("<.*?>")

def clean(raw_html):
    return re.sub(CLEANR, "", raw_html)

In [None]:
# set up some starting variables
gdb_n = Path(gdb).name
gdb_c = str(Path(scratch) / gdb_n)
is_gpkg = Path(gdb).suffix == ".gpkg"

# path to the reference GeoMaterialDict file
ref_gmd = Path(scripts_folder) / "GeoMaterialDict.csv"

### Rule 2.1 - Has required elements: nonspatial tables DataSources, DescriptionOfMapUnits, GeoMaterialDict; feature dataset GeologicMap with feature classes ContactsAndFaults and MapUnitPolys

In [None]:
# check the rule on the unmodified database
arcpy.management.Copy(gdb, gdb_c)
d = vd.guf.gdb_object_dict(gdb_c)
errors, topology_pairs, sr_warnings = vd.rule2_1(d, is_gpkg)

print("MISSING")
for err in errors[3:]: 
    print(f"  {clean(err)}")
    
print("TOPOLOGY PAIRS")
for tp in topology_pairs:
    print(f"  {tp}") 
    
print("SPATIAL REFERENCE WARNINGS")
if len(sr_warnings):
    for warn in sr_warnings: print(f"  {clean(warn)}")

In [None]:
# remove elements and check results
for n in ("DataSources", "DescriptionOfMapUnits", "GeoMaterialDict", "GeologicMap", "ContactsAndFaults", "MapUnitPolys"):
    if Path(gdb_c).exists():
        arcpy.management.Delete(gdb_c)
    arcpy.management.Copy(gdb, gdb_c)
    db_dict = vd.guf.gdb_object_dict(gdb_c)
    if n in db_dict:
        arcpy.management.Delete(db_dict[n]['catalogPath'])
        del db_dict[n]
    errors = vd.rule2_1(db_dict, is_gpkg)[0]
    print("MISSING")
    for err in errors[3:]: print(f"  {clean(err)}")
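Stripped of the arcpy plumbing, the check exercised by the loop above amounts to comparing the names in the object dictionary with the list of required elements. A minimal sketch, assuming exact names only; `missing_required` is hypothetical, and the real `rule2_1` is expected to also recognize prefixed or suffixed names. The sketch also assumes that `is_gpkg` matters because geopackages have no `GeologicMap` feature dataset.

```python
# names of the required elements listed in the Rule 2.1 header
REQUIRED_ELEMENTS = (
    "DataSources",
    "DescriptionOfMapUnits",
    "GeoMaterialDict",
    "GeologicMap",
    "ContactsAndFaults",
    "MapUnitPolys",
)


def missing_required(names, is_gpkg=False):
    """Return required element names that are absent from `names`.

    Hypothetical sketch: exact-name matching only, and the GeologicMap
    feature dataset is skipped for geopackages (assumed to have none).
    """
    present = set(names)
    return [
        name
        for name in REQUIRED_ELEMENTS
        if name not in present and not (is_gpkg and name == "GeologicMap")
    ]


print(missing_required(["DataSources", "GeologicMap", "ContactsAndFaults", "MapUnitPolys"]))
```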

The test below checks whether a "topology pair" can be found. A topology pair is any pair of similarly named `ContactsAndFaults` and `MapUnitPolys` feature classes. For example, if a feature class called `SurficialContactsAndFaults` is found, there should also be a `SurficialMapUnitPolys`. In the case of file geodatabases, the pairs need not be inside a feature dataset. There can be multiple topology pairs. For example, a single file gdb or geopackage could have a `Surficial` pair and a `Bedrock` pair, the requirement being that both feature classes have the same prefix and suffix.
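The pairing rule described above can be sketched in a few lines: split each name around `ContactsAndFaults` or `MapUnitPolys` and pair the names whose prefix and suffix match. `find_topology_pairs` is a hypothetical illustration of the rule as stated here, not the code in `GeMS_ValidateDatabase`, and it ignores feature-dataset membership.

```python
CAF, MUP = "ContactsAndFaults", "MapUnitPolys"


def find_topology_pairs(names):
    """Pair ContactsAndFaults/MapUnitPolys names that share a prefix and suffix."""
    cafs, mups = {}, {}
    for name in names:
        if CAF in name:
            prefix, _, suffix = name.partition(CAF)
            cafs[(prefix, suffix)] = name
        elif MUP in name:
            prefix, _, suffix = name.partition(MUP)
            mups[(prefix, suffix)] = name
    # a pair exists wherever both kinds of feature class share (prefix, suffix)
    return [(cafs[key], mups[key]) for key in sorted(cafs) if key in mups]


names = ["ContactsAndFaults", "MapUnitPolys", "SurficialContactsAndFaults",
         "SurficialMapUnitPolys", "BedrockContactsAndFaults", "DescriptionOfMapUnits"]
print(find_topology_pairs(names))
```

In this example `BedrockContactsAndFaults` has no partner, so it is left out; renaming `ContactsAndFaults` to `ContactsAndFaults_2`, as the next cell does, breaks the unprefixed pair for the same reason.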

In [None]:
# change the name of a required element
# if the name change only includes a suffix or prefix, the tool should still identify 
# the table as a GeMS object
arcpy.management.Copy(gdb, gdb_c)
d = guf.gdb_object_dict(gdb_c)

caf = d["ContactsAndFaults"]["catalogPath"]
new_caf = f"{caf}_2"
arcpy.management.Rename(caf, new_caf)
d = vd.guf.gdb_object_dict(gdb_c)
errors = vd.rule2_1(d, is_gpkg)[0]

print("MISSING")
for err in errors[3:]: print(f"  {clean(err)}")

arcpy.management.Delete(gdb_c)

### Rule 2.2 - Required fields within required elements are present and correctly defined

In [None]:
# check the rule on the unmodified database
arcpy.management.Copy(gdb, gdb_c)
d = vd.guf.gdb_object_dict(gdb_c)
errors, schema_extensions, fld_warnings = vd.check_fields(d, 2, [])

print("MISSING")
for err in errors[3:]: print(f"  {clean(err)}")
print("EXTENSIONS")
for ext in schema_extensions: print(f"  {clean(ext)}")
print("FIELD WARNINGS")
for warn in fld_warnings: print(f"  {clean(warn)}")

In [None]:
# change the names of some fields, delete others
change = {"MapUnit": "mapunit",
          "Type": "Type2",
          "HierarchyKey": "HKEY"}
delete = ("ExistenceConfidence", "Label")

arcpy.management.Copy(gdb, gdb_c)
d = vd.guf.gdb_object_dict(gdb_c)
for k,v in d.items():
    table = v['catalogPath']
    if "fields" in v:
        flds = [f.name for f in v["fields"]]
        # AlterField can't be used on geopackages,
        # so check for that first
        if not is_gpkg:
            for a in change:
                if a in flds:
                    arcpy.management.AlterField(table, a, change[a])
                    
        for n in delete:
            if n in flds:
                arcpy.management.DeleteField(table, n)
                
d = vd.guf.gdb_object_dict(gdb_c)
errors, schema_extensions, warnings = vd.check_fields(d, 2, [])

print("MISSING")
for err in errors[3:]: print(f"  {clean(err)}")
print("EXTENSIONS")
for ext in schema_extensions: print(f"  {clean(ext)}")
print("WARNINGS")
for warn in warnings: print(f"  {clean(warn)}")

### Rule 2.3 - GeologicMap topology: no internal gaps or overlaps in MapUnitPolys, boundaries of MapUnitPolys are covered by ContactsAndFaults

In [None]:
# make a copy
arcpy.management.Copy(gdb, gdb_c)
# check for existing Topology.gdb
t_path = Path(scratch) / "Topology.gdb"
if t_path.exists():
    arcpy.management.Delete(str(t_path))

In [None]:
# noodle around with the topology OR NOT IF YOU ARE CHECKING AN UNMODIFIED DATABASE
# change names of topo pairs, etc.
d = guf.gdb_object_dict(gdb_c)
topo_pairs = vd.rule2_1(d, is_gpkg)[1]

level2_results = vd.check_topology(d, scratch, False, topo_pairs)[0]

print("TOPOLOGY ERRORS")
for err in level2_results[3:]: print(f"  {clean(err)}")

### Rule 2.4 - All map units in MapUnitPolys have entries in DescriptionOfMapUnits table

In [None]:
# check the rule on the unmodified database
arcpy.management.Copy(gdb, gdb_c)
d = vd.guf.gdb_object_dict(gdb_c)
missing, all_map_units, fds_map_units = vd.check_map_units(d, 2, [], {})

print("MISSING")
for miss in missing[3:]: print(f"  {clean(miss)}\n")
print("ALL LEVEL 2 MAP UNITS")
print(", ".join(set(all_map_units)), "\n")
print("MAP UNITS IN DMU, EACH FEATURE DATASET")
for k,v in fds_map_units.items():
    print(k) 
    print(", ".join(v), "\n")

In [None]:
# give two MapUnitPolys features MapUnit values that are not in the DMU
mup = d["MapUnitPolys"]["catalogPath"]
with arcpy.da.UpdateCursor(mup, "MapUnit") as cursor:
    for i,row in enumerate(cursor):
        if i == 0:
            row[0] = "foo"
        if i == 1:
            row[0] = "bar"
        cursor.updateRow(row)
        
missing, all_map_units, fds_map_units = vd.check_map_units(d, 2, [], {})

print("MISSING")
for miss in missing[3:]: print(f"{clean(miss)}\n")
print("ALL LEVEL 2 MAP UNITS")
print(", ".join(set(all_map_units)), "\n")
print("MAP UNITS IN DMU, EACH FEATURE DATASET")
for k,v in fds_map_units.items():
    print(k) 
    print(", ".join(v), "\n")
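At its core, Rule 2.4 is a membership test: collect the `MapUnit` values used in `MapUnitPolys` and report any that are not in `DescriptionOfMapUnits`. A minimal sketch with a hypothetical `missing_from_reference` helper; it assumes nulls are skipped here (missing values are Rule 3.3's concern), which may not match `check_map_units`. The same pattern fits the Glossary and DataSources checks in Rules 2.6 and 2.8.

```python
def missing_from_reference(used, reference):
    """Return values used in one table that have no entry in the reference table.

    e.g. MapUnit values in MapUnitPolys that are not in DescriptionOfMapUnits.
    Nulls are ignored (an assumption of this sketch).
    """
    ref = set(reference)
    return sorted({u for u in used if u is not None and u not in ref})


# made-up values mirroring the "foo"/"bar" edit above
print(missing_from_reference(["Qal", "foo", "Tv", "bar", None, "foo"], ["Qal", "Tv", "Kg"]))
```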

### Rule 2.5 - No duplicate MapUnit values in DescriptionOfMapUnit table

In [None]:
# check the rule on the unmodified database
arcpy.management.Copy(gdb, gdb_c)
d = vd.guf.gdb_object_dict(gdb_c)
dmu = d["DescriptionOfMapUnits"]["catalogPath"]
dups = guf.get_duplicates(dmu, "MapUnit")

print("DUPLICATES")
print(", ".join(dups))

In [None]:
# copy a MapUnit value in the DMU
dmu = d["DescriptionOfMapUnits"]["catalogPath"]
with arcpy.da.UpdateCursor(dmu, "MapUnit", where_clause="MapUnit is not null" ) as cursor:
    for i, row in enumerate(cursor):
        if i == 0:
            mu = row[0]
        if i == 1:
            row[0] = mu
        cursor.updateRow(row)
        
dups = guf.get_duplicates(dmu, "MapUnit")
print("DUPLICATES")
print(", ".join(dups))
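Rules 2.5, 2.7, and 2.9 all rely on `guf.get_duplicates`. The idea can be sketched with `collections.Counter` over the values read from one field; `find_duplicates` is a hypothetical stand-in, not the toolbox function, and how the real function treats nulls or letter case is not assumed here.

```python
from collections import Counter


def find_duplicates(values):
    """Return the values that occur more than once, ignoring nulls."""
    counts = Counter(v for v in values if v is not None)
    return sorted(v for v, n in counts.items() if n > 1)


print(find_duplicates(["Qal", "Tv", "Qal", None, None]))
```

In the notebook, the input could come from a cursor, for example `find_duplicates(r[0] for r in arcpy.da.SearchCursor(dmu, "MapUnit"))` (untested here).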

### Rule 2.6 - Certain field values within required elements have entries in Glossary table

In [None]:
importlib.reload(vd)
# check the rule on the unmodified database
arcpy.management.Copy(gdb, gdb_c)
d = vd.guf.gdb_object_dict(gdb_c)
missing_glossary_terms, all_gloss_terms = vd.glossary_check(d, 2, [])

print("MISSING")
for miss in missing_glossary_terms[3:]: print(f"{clean(miss)}")
print("\n")
print("ALL GLOSSARY TERMS")
print(", ".join(all_gloss_terms))

In [None]:
# required element can be renamed but gems_equivalent is correctly assigned
# and the fields are still checked
caf = d["ContactsAndFaults"]['catalogPath']
arcpy.management.Rename(caf, f"{caf}_2")
d = guf.gdb_object_dict(gdb_c)

# investigate 'gems_equivalent', un-comment the next two lines
# for k,v in d.items():
#     print(k, v["gems_equivalent"])
missing_glossary_terms, all_gloss_terms = vd.glossary_check(d, 2, [])

print("MISSING")
for miss in missing_glossary_terms[3:]: print(f"{clean(miss)}")
print("\n")
print("ALL GLOSSARY TERMS")
print(", ".join(all_gloss_terms))

In [None]:
# change a value in a required field to one that is not in the Glossary
arcpy.management.Copy(gdb, gdb_c)
d = guf.gdb_object_dict(gdb_c)
caf = d["ContactsAndFaults"]['catalogPath']
with arcpy.da.UpdateCursor(caf, "Type") as cursor:
    for i,row in enumerate(cursor):
        if i == 0:
            row[0] = "foobar"
            cursor.updateRow(row)
missing_glossary_terms, all_gloss_terms = vd.glossary_check(d, 2, [])

print("MISSING")
for miss in missing_glossary_terms[3:]: print(f"{clean(miss)}")
print("\n")
print("ALL GLOSSARY TERMS")
print(", ".join(all_gloss_terms))

### Rule 2.7 - No duplicate Term values in Glossary table

In [None]:
# check the rule on the unmodified database
arcpy.management.Copy(gdb, gdb_c)
d = vd.guf.gdb_object_dict(gdb_c)
glo = d["Glossary"]["catalogPath"]
dups = guf.get_duplicates(glo, "Term")

print("DUPLICATES")
print(", ".join(dups))

In [None]:
# Copy one of the terms in Glossary
arcpy.management.Copy(gdb, gdb_c)
d = guf.gdb_object_dict(gdb_c)
glo = d["Glossary"]['catalogPath']
with arcpy.da.UpdateCursor(glo, "Term") as cursor:
    for i, row in enumerate(cursor):
        if i == 0:
            term = row[0]
        if i == 1:
            row[0] = term
        cursor.updateRow(row)
d = guf.gdb_object_dict(gdb_c)
dups = guf.get_duplicates(glo, "Term")

print("DUPLICATES")
print(", ".join(dups))

### Rule 2.8 - All SourceID values in required elements have entries in DataSources table

In [None]:
# check the rule on the unmodified database
importlib.reload(vd)
arcpy.management.Copy(gdb, gdb_c)
d = vd.guf.gdb_object_dict(gdb_c)
errors, all_sources = vd.sources_check(d, 2, [])

print("ERRORS")
for err in errors[3:]: print(f"  {clean(err)}\n")

In [None]:
# add a DataSourceID that is not in DataSources
caf = d["ContactsAndFaults"]["catalogPath"]
with arcpy.da.UpdateCursor(caf, "DataSourceID") as cursor:
    for i, row in enumerate(cursor):
        if i == 0:
            row[0] = "foobar"
            cursor.updateRow(row)
d = guf.gdb_object_dict(gdb_c)

errors, all_sources = vd.sources_check(d, 2, [])

print("ERRORS")
for err in errors[3:]: print(f"  {clean(err)}\n")

### Rule 2.9 - No duplicate DataSources_ID values in DataSources table

In [None]:
# check the rule on the unmodified database
arcpy.management.Copy(gdb, gdb_c)
d = vd.guf.gdb_object_dict(gdb_c)
ds = d["DataSources"]["catalogPath"]
dups = guf.get_duplicates(ds, "DataSources_ID")

print("DUPLICATES")
print(", ".join(dups))

In [None]:
# add a duplicate DataSources_ID
with arcpy.da.UpdateCursor(ds, "DataSources_ID") as cursor:
    for i, row in enumerate(cursor):
        if i == 0:
            val = row[0]
        if i == 1:
            row[0] = val
        cursor.updateRow(row)

dups = guf.get_duplicates(ds, "DataSources_ID")

print("DUPLICATES")
print(", ".join(dups))

### Rule 3.1 - Table and field definitions conform to GeMS schema

In [None]:
# check the rule on the unmodified database
importlib.reload(vd)
arcpy.management.Copy(gdb, gdb_c)
d = vd.guf.gdb_object_dict(gdb_c)
errors, schema_extensions, warnings = vd.check_fields(d, 3, [])

print("MISSING")
for err in errors[3:]: print(f"  {clean(err)}")
print("EXTENSIONS")
for ext in schema_extensions: print(f"  {clean(ext)}")
print("WARNINGS")
for warn in warnings: print(f"  {clean(warn)}")

In [None]:
# add an optional GeMS-defined feature class
if not is_gpkg:
    fd = "GeologicMap"
    sr = d["GeologicMap"]["spatialReference"].name   
    fc = "OverlayPolys"
else:
    fd = "#"
    sr = d["MapUnitPolys"]["spatialReference"].name
    fc = "OverlayPolys"

vt = arcpy.ValueTable(3)
vt.addRow(f"{fd} {sr} {fc}")
alc.process(gdb_c, vt)

In [None]:
# delete required fields from this optional feature class
d = guf.gdb_object_dict(gdb_c)
fc = "OverlayPolys"
delete_fields = ["Type", "Label"]
for f in delete_fields:
    arcpy.management.DeleteField(d[fc]['catalogPath'], f)
d = guf.gdb_object_dict(gdb_c)

errors = vd.check_fields(d, 3, [])[0]

print("MISSING")
for err in errors[3:]: print(f"  {clean(err)}")

In [None]:
# re-add a required field with the wrong type (and, for text fields, the wrong length); field nullability is not tested here
arcpy.management.Copy(gdb, gdb_c)
d = guf.gdb_object_dict(gdb_c)
if not is_gpkg:
    fd = "GeologicMap"
    sr = d["GeologicMap"]["spatialReference"].name   
    fc = "OverlayPolys"
else:
    fd = "#"
    sr = d["MapUnitPolys"]["spatialReference"].name
    fc = "OverlayPolys"
    
vt = arcpy.ValueTable(3)
vt.addRow(f"{fd} {sr} {fc}")
alc.process(gdb_c, vt)
d = guf.gdb_object_dict(gdb_c)
arcpy.management.DeleteField(d["OverlayPolys"]['catalogPath'], "Label")
# set length and f_type separately;
# field_length only applies when the type is TEXT
length = 25
f_type = "FLOAT"  # or "TEXT"
arcpy.management.AddField(d["OverlayPolys"]['catalogPath'], "Label", f_type, field_length=length)
d = guf.gdb_object_dict(gdb_c)

errors = vd.check_fields(d, 3, [])[0]

print("MISSING")
for err in errors[3:]: print(f"  {clean(err)}")
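The comparison this cell exercises can be sketched as: check the field type first, and compare lengths only when both fields are text, since `AddField` only uses `field_length` for text fields. `compare_field` and its dict shape are hypothetical, not the format used by `GeMS_Definition`. Note that `arcpy.ListFields` reports text fields as `String` and fields created as `FLOAT` as `Single`. This sketch flags any length mismatch; whether the validator treats a longer field as an error is not assumed.

```python
def compare_field(expected, actual):
    """Compare a field's actual definition with the expected one.

    Both arguments are dicts with a "type" key and, for text fields, a
    "length" key (a hypothetical shape).
    """
    problems = []
    exp_type = expected["type"].lower()
    act_type = actual["type"].lower()
    if exp_type != act_type:
        problems.append(f"type is {actual['type']}, expected {expected['type']}")
    elif exp_type == "string" and actual.get("length") != expected.get("length"):
        problems.append(f"length is {actual.get('length')}, expected {expected.get('length')}")
    return problems


print(compare_field({"type": "String", "length": 50}, {"type": "Single"}))
```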

### Rule 3.2 - All map-like feature datasets obey topology rules. No MapUnitPolys gaps or overlaps. No ContactsAndFaults overlaps, self-overlaps, or self-intersections. MapUnitPoly boundaries covered by ContactsAndFaults

In [None]:
# make a copy
arcpy.management.Copy(gdb, gdb_c)
# check for existing Topology.gdb
t_path = Path(scratch) / "Topology.gdb"
if t_path.exists():
    arcpy.management.Delete(str(t_path))

In [None]:
# noodle around with the topology OR NOT IF YOU ARE CHECKING AN UNMODIFIED DATABASE
# change names of topo pairs, etc.
d = guf.gdb_object_dict(gdb_c)
topo_pairs = vd.rule2_1(d, is_gpkg)[1]
level3_results = vd.check_topology(d, scratch, False, topo_pairs)[1]

print("TOPOLOGY ERRORS")
for err in level3_results[3:]: print(f"  {clean(err)}")

### Rule 3.3 - No missing required values

In [None]:
# check the rule on the unmodified database
arcpy.management.Copy(gdb, gdb_c)
d = vd.guf.gdb_object_dict(gdb_c)

errors, warnings = vd.rule3_3(d)

print("MISSING VALUES")
for err in errors[3:]: print(f"  {clean(err)}")
    
print("MISSING WARNINGS")
for warn in warnings[1:]: print(f"  {clean(warn)}")

In [None]:
# clear a couple of values in a NoNulls field (MapUnit in MapUnitPolys)
# and a couple in a non-critical NoNulls field, e.g., FieldID in Stations,
# where nulls probably shouldn't exist but don't break compliance
mup = d["MapUnitPolys"]["catalogPath"]
with arcpy.da.UpdateCursor(mup, "MapUnit") as cursor:
    for i, row in enumerate(cursor):
        if i == 0:
            row[0] = None
        if i == 1:
            row[0] = None
        cursor.updateRow(row)

sta = d["Stations"]["catalogPath"]
with arcpy.da.UpdateCursor(sta, "FieldID") as cursor:
    for i, row in enumerate(cursor):
        if i == 0:
            row[0] = None
        if i == 1:
            row[0] = None
        cursor.updateRow(row)
        
errors, warnings = vd.rule3_3(d)

print("MISSING VALUES")
for err in errors[3:]: print(f"  {clean(err)}")
    
print("MISSING WARNINGS")
for warn in warnings[1:]: print(f"  {clean(warn)}")
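The split between errors and warnings tested above can be sketched as follows: a null in a NoNulls field is an error unless the field is on a list of non-critical fields, in which case it is a warning. `missing_required_values` and the `warn_only_fields` list are hypothetical; which fields `rule3_3` actually treats as non-critical is not assumed beyond the `FieldID` example in the cell comment.

```python
def missing_required_values(rows, no_null_fields, warn_only_fields=()):
    """Split null values in NoNulls fields into errors and warnings.

    rows: list of dicts (field name -> value); a missing key counts as null
    no_null_fields: fields that should never be null
    warn_only_fields: the subset whose nulls are only warnings
    """
    errors, warnings = [], []
    for i, row in enumerate(rows):
        for field in no_null_fields:
            if row.get(field) is None:
                target = warnings if field in warn_only_fields else errors
                target.append(f"row {i}: {field} is null")
    return errors, warnings


rows = [{"MapUnit": None, "FieldID": "A1"}, {"MapUnit": "Qal", "FieldID": None}]
print(missing_required_values(rows, ["MapUnit", "FieldID"], {"FieldID"}))
```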

### Rule 3.4 - No missing terms in Glossary

```python
defined_term_fields_list = (
    "Type",
    "ExistenceConfidence",
    "IdentityConfidence",
    "ParagraphStyle",
    "GeoMaterialConfidence",
    "ErrorMeasure",
    "AgeUnits",
    "LocationMethod",
    "ScientificConfidence",
)
```

Values in `defined_term_fields_list` fields that are not found in the Glossary are errors. Values in other fields whose names end in `type`, `confidence`, or `method` and that are not found in the Glossary are warnings.
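The error/warning distinction can be sketched as a function of the field name alone. `glossary_severity` is hypothetical, and it assumes the suffix match is case-insensitive (so `CartoMethod`, added in a cell below, ends in `method`); the real `glossary_check` may decide differently.

```python
DEFINED_TERM_FIELDS = (
    "Type", "ExistenceConfidence", "IdentityConfidence", "ParagraphStyle",
    "GeoMaterialConfidence", "ErrorMeasure", "AgeUnits", "LocationMethod",
    "ScientificConfidence",
)
WARNING_SUFFIXES = ("type", "confidence", "method")


def glossary_severity(field_name):
    """Return "error", "warning", or None for a term missing from the Glossary."""
    if field_name in DEFINED_TERM_FIELDS:
        return "error"
    # assumption: suffixes are matched case-insensitively
    if field_name.lower().endswith(WARNING_SUFFIXES):
        return "warning"
    return None


for name in ("Type", "CartoMethod", "Label"):
    print(name, glossary_severity(name))
```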

In [None]:
importlib.reload(vd)
# check the rule on the unmodified database
arcpy.management.Copy(gdb, gdb_c)
d = vd.guf.gdb_object_dict(gdb_c)

# first get all glossary terms from a level 2 check
missing_terms, all_gloss_terms = vd.glossary_check(d, 2, [])

# and then use that in a level 3 check
missing_terms, all_gloss_terms, warnings = vd.glossary_check(d, 3, all_gloss_terms)

print("MISSING GLOSSARY TERMS")
for miss in missing_terms[3:]: print(f"  {clean(miss)}")
print("WARNINGS")
for warn in warnings: print(f"{clean(warn)}")

In [None]:
# add some terms not found in the glossary
# first, add a gems-like field. Values here should get reported as warnings
cart = d["CartographicLines"]["catalogPath"]
arcpy.management.AddField(cart, 'CartoMethod', 'TEXT')
d = vd.guf.gdb_object_dict(gdb_c)

# add terms to required field in a non-core table and a gems-like field that is not in the glossary
with arcpy.da.UpdateCursor(cart, ["Type", "CartoMethod"]) as cursor:
    for i, row in enumerate(cursor):
        if i == 0:
            row[0] = "foobar"
            row[1] = "found it on a map"
        cursor.updateRow(row)

# first get all glossary terms from a level 2 check
missing_terms, all_gloss_terms = vd.glossary_check(d, 2, [])

# and then use that in a level 3 check
missing_terms, all_gloss_terms, warnings = vd.glossary_check(d, 3, all_gloss_terms)

print("MISSING TERMS")
for miss in missing_terms[3:]: print(f"  {clean(miss)}")
print("WARNINGS")
for warn in warnings: 
    warn = warn.replace("\n", "")
    warn = " ".join(warn.split())
    print(f"  {clean(warn)}")

### Rule 3.5 - No unnecessary terms in Glossary

In [None]:
# check the rule on the unmodified database
arcpy.management.Copy(gdb, gdb_c)
d = vd.guf.gdb_object_dict(gdb_c)

# need to run glossary_check at levels 2 and 3 to get all_gloss_terms
# first get all glossary terms from a level 2 check
missing_terms, all_gloss_terms = vd.glossary_check(d, 2, [])

# and then use that in a level 3 check
missing_terms, all_gloss_terms, warnings = vd.glossary_check(d, 3, all_gloss_terms)

unused = vd.rule3_5_and_7(d, "glossary", all_gloss_terms)
print("UNUSED TERMS")
for term in unused[3:]: print(f" {clean(term)}")

In [None]:
# overwrite the first Glossary Term with a value that is not used anywhere
gloss = d["Glossary"]["catalogPath"]
with arcpy.da.UpdateCursor(gloss, "Term") as cursor:
    for i, row in enumerate(cursor):
        if i == 0:
            row[0] = "foobar"
        cursor.updateRow(row)

# need to run glossary_check at levels 2 and 3 to get all_gloss_terms
# first get all glossary terms from a level 2 check
missing_terms, all_gloss_terms = vd.glossary_check(d, 2, [])

# and then use that in a level 3 check
missing_terms, all_gloss_terms, warnings = vd.glossary_check(d, 3, all_gloss_terms)

unused = vd.rule3_5_and_7(d, "glossary", all_gloss_terms)
print("UNUSED TERMS")
for term in unused[3:]: print(f" {clean(term)}")
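Rules 3.5 and 3.7 both go through `rule3_5_and_7`, and both reduce to the same question: which defined values (Glossary `Term`s or `DataSources_ID`s) are never referenced? That is a set difference, sketched here with a hypothetical `unused_values` helper. Nulls are skipped, which is an assumption of the sketch.

```python
def unused_values(defined, used):
    """Return defined values that never appear among the used values."""
    used_set = {u for u in used if u is not None}
    return sorted({d for d in defined if d is not None} - used_set)


print(unused_values(["contact", "fault", "foobar", None], ["contact", "fault", "fault"]))
```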

### Rule 3.6 - No missing sources in DataSources

In [None]:
# check the rule on the unmodified database
arcpy.management.Copy(gdb, gdb_c)
d = vd.guf.gdb_object_dict(gdb_c)

# first run sources_check at level 2 to collect all_sources from required core elements
all_sources = []
missing_ids, all_sources = vd.sources_check(d, 2, all_sources)

# then, run at level 3 to check the rest
missing_ids, all_sources = vd.sources_check(d, 3, all_sources)

print("MISSING DATASOURCES")
for miss in missing_ids[3:]: print(f"  {clean(miss)}")

In [None]:
# give a feature in a non-core GeMS table (Stations) a DataSourceID that is not in DataSources
sta = d["Stations"]["catalogPath"]
with arcpy.da.UpdateCursor(sta, "DataSourceID") as cursor:
    for i, row in enumerate(cursor):
        if i == 0:
            row[0] = "DASfoobar"
        # delete a DataSourceID
        if i == 1:
            row[0] = None
        cursor.updateRow(row)
        
# first run sources_check at level 2 to collect all_sources from required core elements
all_sources = []
missing_ids, all_sources = vd.sources_check(d, 2, all_sources)

# then, run at level 3 to check the rest
missing_ids, all_sources = vd.sources_check(d, 3, all_sources)

print("MISSING DATASOURCES")
for miss in missing_ids[3:]: print(f"  {clean(miss)}")

### Rule 3.7 - No unnecessary sources in DataSources

In [None]:
# check the rule on the unmodified database
arcpy.management.Copy(gdb, gdb_c)
d = vd.guf.gdb_object_dict(gdb_c)

# need to run sources_check at levels 2 and 3 to get all_sources
# first get all sources from a level 2 check
all_sources = []
missing_ids, all_sources = vd.sources_check(d, 2, all_sources)

# then, run at level 3 to check the rest
missing_ids, all_sources = vd.sources_check(d, 3, all_sources)

# then check rule3_5_and_7
unused = vd.rule3_5_and_7(d, "datasources", all_sources)

print("UNUSED SOURCES")
for src in unused[3:]: print(f" {clean(src)}")

In [None]:
# overwrite the first DataSources_ID with a value that is not used anywhere
ds = d["DataSources"]["catalogPath"]
with arcpy.da.UpdateCursor(ds, "DataSources_ID") as cursor:
    for i, row in enumerate(cursor):
        if i == 0:
            row[0] = "DASfoobar"
        cursor.updateRow(row)

# need to run sources_check at levels 2 and 3 to get all_sources
# first get all sources from a level 2 check
all_sources = []
missing_ids, all_sources = vd.sources_check(d, 2, all_sources)

# then, run at level 3 to check the rest
missing_ids, all_sources = vd.sources_check(d, 3, all_sources)

# then check rule3_5_and_7
unused = vd.rule3_5_and_7(d, "datasources", all_sources)

print("UNUSED SOURCES")
for src in unused[3:]: print(f" {clean(src)}")

### Rule 3.8 - No map units without entries in DescriptionOfMapUnits

In [None]:
# check the rule on the unmodified database
arcpy.management.Copy(gdb, gdb_c)
d = vd.guf.gdb_object_dict(gdb_c)

# first, run check_map_units at level 2 which collects map units from MapUnitPolys
all_map_units = []
fds_map_units = {}
msgs, all_map_units, fds_map_units = vd.check_map_units(d, 2, all_map_units, fds_map_units)

# and then at level 3 to extend all_map_units with units from all tables with 'MapUnit'
msgs3_8, msgs3_9, all_map_units, fds_map_units, mu_warnings = vd.check_map_units(d, 3, all_map_units, fds_map_units)

print("MISSING MAPUNITS")
for mu in msgs3_8[3:]:
    mu = " ".join(mu.split())
    print(f"  {clean(mu)}")
    
print("WARNINGS")
for warn in mu_warnings: 
    warn = " ".join(warn.split())
    print(f"  {clean(warn)}")

In [None]:
# first, run check_map_units at level 2 which collects map units from MapUnitPolys
all_map_units = []
fds_map_units = {}
msgs, all_map_units, fds_map_units = vd.check_map_units(d, 2, all_map_units, fds_map_units)

# add a random map unit to a non-core element
sta = d["Stations"]["catalogPath"]
with arcpy.da.UpdateCursor(sta, ["MapUnit", "ObservedMapUnit"]) as cursor:
    for i, row in enumerate(cursor):
        if i == 0:
            row[0] = "foo"
        if i == 1:
            row[1] = "bar"
        cursor.updateRow(row)

# run again at level 3 to extend all_map_units with units from all tables with 'MapUnit'
msgs3_8, msgs3_9, all_map_units, fds_map_units, mu_warnings = vd.check_map_units(d, 3, all_map_units, fds_map_units)

print("MISSING MAPUNITS")
for mu in msgs3_8[3:]:
    mu = " ".join(mu.split())
    print(f"  {clean(mu)}")
    
print("WARNINGS")
for warn in mu_warnings: 
    warn = " ".join(warn.split())
    print(f"  {clean(warn)}")

### Rule 3.9 - No unnecessary MapUnits in DescriptionOfMapUnits

In [None]:
# check the rule on the unmodified database
arcpy.management.Copy(gdb, gdb_c)
d = vd.guf.gdb_object_dict(gdb_c)

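# re-collect map units from the unmodified copy; the variables left over
# from the Rule 3.8 cells describe the modified database
all_map_units = []
fds_map_units = {}
msgs, all_map_units, fds_map_units = vd.check_map_units(d, 2, all_map_units, fds_map_units)
msgs3_8, msgs3_9, all_map_units, fds_map_units, mu_warnings = vd.check_map_units(d, 3, all_map_units, fds_map_units)

print("UNUSED MAPUNITS")
for mu in msgs3_9[3:]:
    mu = " ".join(mu.split())
    print(f"  {clean(mu)}")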

In [None]:
# add an extra map unit to DescriptionOfMapUnits
dmu = d["DescriptionOfMapUnits"]["catalogPath"]
with arcpy.da.UpdateCursor(dmu, "MapUnit") as cursor:
    for i, row in enumerate(cursor):
        if i == 0:
            row[0] = "foobar"
        cursor.updateRow(row)
        
msgs3_8, msgs3_9, all_map_units, fds_map_units, mu_warnings = vd.check_map_units(d, 3, all_map_units, fds_map_units)

print("UNUSED MAPUNITS")
for mu in msgs3_9[3:]:
    mu = " ".join(mu.split())
    print(f"  {clean(mu)}")


### Rule 3.10 - HierarchyKey values in DescriptionOfMapUnits are unique and well formed

In [None]:
# check the rule on the unmodified database
arcpy.management.Copy(gdb, gdb_c)
d = vd.guf.gdb_object_dict(gdb_c)
hkey_errors, hkey_warnings = vd.rule3_10(d)

print("HKEY ERRORS")
for err in hkey_errors[3:]: print(f"  {clean(err)}")

print("HKEY WARNINGS")
for warn in hkey_warnings[1:]: print(f"  {clean(warn)}")

In [None]:
# take a look at the HierarchyKeys
dmu = d["DescriptionOfMapUnits"]["catalogPath"]
hkeys = [r[0] for r in arcpy.da.SearchCursor(dmu, "HierarchyKey")]
hkeys.sort()
for hkey in hkeys: print(hkey)

In [None]:
# add a weird HierarchyKey
dmu = d["DescriptionOfMapUnits"]["catalogPath"]
with arcpy.da.UpdateCursor(dmu, "HierarchyKey") as cursor:
    for i, row in enumerate(cursor):
        if i == 0:
            row[0] = "1/2"
        cursor.updateRow(row)

hkey_errors, hkey_warnings = vd.rule3_10(d)

print("HKEY ERRORS")
for err in hkey_errors[3:]: print(f"  {clean(err)}")

print("HKEY WARNINGS")
for warn in hkey_warnings[1:]: print(f"  {clean(warn)}")
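"Well formed" is not spelled out in the rule header. The assumption behind this sketch is a common convention: a HierarchyKey is a dash-delimited run of numeric fragments (`001`, `001-002`, ...) that all have the same width. `check_hierarchy_keys` is hypothetical; the actual criteria in `rule3_10`, and how it splits problems between errors and warnings, may differ.

```python
import re
from collections import Counter

# assumed fragment format: one or more ASCII digits
FRAGMENT = re.compile(r"[0-9]+")


def check_hierarchy_keys(hkeys):
    """Return a list of problems found in a list of HierarchyKey values."""
    problems = []
    # uniqueness: nulls are reported separately below
    counts = Counter(k for k in hkeys if k is not None)
    problems += [f"duplicate: {k}" for k, n in counts.items() if n > 1]

    widths = set()
    for key in hkeys:
        if key is None:
            problems.append("null HierarchyKey")
            continue
        fragments = key.split("-")
        if all(FRAGMENT.fullmatch(f) for f in fragments):
            widths.update(len(f) for f in fragments)
        else:
            problems.append(f"malformed: {key}")

    if len(widths) > 1:
        problems.append(f"inconsistent fragment widths: {sorted(widths)}")
    return problems


print(check_hierarchy_keys(["001", "1/2"]))
```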

### Rule 3.11 - All values of GeoMaterial are defined in GeoMaterialDict. GeoMaterialDict is as specified in the GeMS standard

In [None]:
# check the rule on the unmodified database
arcpy.management.Delete(gdb_c)
importlib.reload(vd)
arcpy.management.Copy(gdb, gdb_c)
d = vd.guf.gdb_object_dict(gdb_c)

errors = vd.rule3_11(d, str(ref_gmd))
print("GEOMATERIAL ERRORS")
for err in errors[3:]: print(f"  {clean(err)}")

In [None]:
# delete a geomaterial and a definition from GeoMaterialDict
# finding a null value in GeoMaterialDict causes the rule function 
# to return early so no other checks are made. Skip this cell
# to get to the other checks
gmd = d["GeoMaterialDict"]["catalogPath"]
with arcpy.da.UpdateCursor(gmd, ["GeoMaterial", "Definition"]) as cursor:
    for i, row in enumerate(cursor):
        if i == 0:
            row[0] = None
        if i == 1:
            row[1] = None
        cursor.updateRow(row)
        
errors = vd.rule3_11(d, str(ref_gmd))
print("GEOMATERIAL ERRORS")
for err in errors[3:]: print(f"  {clean(err)}")

In [None]:
# change a geomaterial and a definition in GeoMaterialDict
gmd = d["GeoMaterialDict"]["catalogPath"]
with arcpy.da.UpdateCursor(gmd, ["GeoMaterial", "Definition"]) as cursor:
    for i, row in enumerate(cursor):
        if i == 0:
            row[0] = "choss"
        if i == 1:
            row[1] = "I believe this is some kind of rock"
        cursor.updateRow(row)
        
errors = vd.rule3_11(d, str(ref_gmd))
print("GEOMATERIAL ERRORS")
for err in errors[3:]: print(f"  {clean(err)}")

In [None]:
# add a weird geomaterial to DMU
dmu = d["DescriptionOfMapUnits"]["catalogPath"]
with arcpy.da.UpdateCursor(dmu, "GeoMaterial") as cursor:
    for i, row in enumerate(cursor):
        if i == 0:
            row[0] = "choss"
        cursor.updateRow(row)
        
errors = vd.rule3_11(d, str(ref_gmd))
print("GEOMATERIAL ERRORS")
for err in errors[3:]: print(f"  {clean(err)}")
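The three checks exercised above (nulls in `GeoMaterialDict`, terms or definitions that differ from the reference, and DMU values missing from `GeoMaterialDict`) can be sketched as one hypothetical function. Unlike `rule3_11`, which the cell comments say returns early on a null, this sketch keeps going. The reference mapping would come from the `GeoMaterialDict.csv` file in the notebook, but reading it is left out, and the sample terms and definitions below are made up.

```python
def check_geomaterials(gmd_rows, reference, dmu_values):
    """Compare GeoMaterialDict rows with a reference mapping and check DMU values.

    gmd_rows: iterable of (GeoMaterial, Definition) tuples from the table
    reference: dict of GeoMaterial -> Definition
    dmu_values: GeoMaterial values used in DescriptionOfMapUnits
    """
    problems = []
    terms = set()
    for term, definition in gmd_rows:
        if term is None or definition is None:
            problems.append(f"null value in GeoMaterialDict row ({term!r}, {definition!r})")
            continue
        terms.add(term)
        if term not in reference:
            problems.append(f"{term} is not a GeMS GeoMaterial")
        elif definition != reference[term]:
            problems.append(f"definition of {term} does not match the reference")
    # DMU values must be defined in the database's own GeoMaterialDict
    for value in sorted({v for v in dmu_values if v is not None} - terms):
        problems.append(f"DMU GeoMaterial {value} is not in GeoMaterialDict")
    return problems


ref = {"Sedimentary rock": "definition A", "Igneous rock": "definition B"}
rows = [("Sedimentary rock", "definition A"), ("choss", "definition C"),
        ("Igneous rock", "I believe this is some kind of rock")]
problems = check_geomaterials(rows, ref, ["Sedimentary rock", "choss", "granite"])
for p in problems:
    print(p)
```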

### Rule 3.12 - No duplicate \_ID values

In [None]:
# check the rule on the unmodified database
arcpy.management.Copy(gdb, gdb_c)
d = vd.guf.gdb_object_dict(gdb_c)
dups = vd.rule3_12(d)

print("DUPLICATES")
for dup in dups[3:]: print(f"  {clean(dup)}")

In [None]:
# duplicate an _ID value
importlib.reload(vd)
caf = d["ContactsAndFaults"]["catalogPath"]
with arcpy.da.UpdateCursor(caf, "ContactsAndFaults_ID") as cursor:
    for i, row in enumerate(cursor):
        if i == 0:
            val = row[0]
        if i == 1:
            row[0] = val
        cursor.updateRow(row)

dups = vd.rule3_12(d)

print("DUPLICATES")
for dup in dups[3:]: print(f"  {clean(dup)}")

### Rule 3.13 - No zero-length or whitespace-only strings

In [None]:
# check the rule on the unmodified database
arcpy.management.Copy(gdb, gdb_c)
d = vd.guf.gdb_object_dict(gdb_c)

zero_length_strings, leading_trailing_spaces = vd.rule3_13(d)

print("ZERO LENGTH STRINGS")
for zero in zero_length_strings[3:]: print(f"  {clean(zero)}")
    
print("LEAD/TRAILING WHITESPACE")
for lead in leading_trailing_spaces[1:]: print(f"  {clean(lead)}")

In [None]:
# add an empty string, a whitespace-only string, and the literal string "<NULL>"
caf = d["ContactsAndFaults"]["catalogPath"]
with arcpy.da.UpdateCursor(caf, "Type") as cursor:
    for i, row in enumerate(cursor):
        if i == 0:
            row[0] = ""
        if i == 1:
            row[0] = " "
        if i == 2:
            row[0] = "<NULL>"
        cursor.updateRow(row)

zero_length_strings, leading_trailing_spaces = vd.rule3_13(d)

print("ZERO LENGTH STRINGS")
for zero in zero_length_strings[3:]: print(f"  {clean(zero)}")
    
print("LEAD/TRAILING WHITESPACE")
for lead in leading_trailing_spaces[1:]: print(f"  {clean(lead)}")
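The two categories printed above can be sketched as a classification of a single text value. `classify_string` is hypothetical, and which category the validator puts a whitespace-only value in is not assumed; here it is grouped with zero-length strings, following the rule header. The literal string `"<NULL>"` written by the cell above is ordinary non-empty text to this sketch.

```python
def classify_string(value):
    """Classify a text value for a Rule 3.13-style check (hypothetical)."""
    if not isinstance(value, str):
        return None  # nulls and non-text values are out of scope here
    stripped = value.strip()
    if stripped == "":
        return "zero-length or whitespace-only"
    if stripped != value:
        return "leading/trailing whitespace"
    return None


for v in ("", " ", " contact", "<NULL>", None):
    print(repr(v), classify_string(v))
```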

### Check for editor tracking

### List extra tables and fields

In [None]:
# make a copy
arcpy.management.Copy(gdb, gdb_c)
d = vd.guf.gdb_object_dict(gdb_c)

# collect the extra fields as logged by check_fields for each level
extra_fields = []
errors, extra_fields, fld_warnings = vd.check_fields(d, 2, extra_fields)
errors, extra_fields, fld_warnings = vd.check_fields(d, 3, extra_fields)
all_extras = vd.extra_tables(d, extra_fields)

for extra in all_extras:
    print(clean(extra))