This notebook looks at the USMIN dataset of mine symbols pulled from US Topo maps. These data give us more exact locations for the mining facilities denoted on those maps. Nothing in the dataset, however, links unambiguously to any other source of information about mines. Fundamentally, USMIN introduces the following to the GeoKB:

* A pointer to a useful classification of mining facility types, drawn from the 1997 AGI dictionary of mining terms, that we need to incorporate
* Specific, vetted locations for mining facility features
* Names and geo-location context we may be able to use to link to some already known mine concepts along with clues about new mines the GeoKB doesn't know about yet

In [58]:
import requests
import xmltodict
from wbmaker import WikibaseConnection
import geopandas as gpd
import pandas as pd
from io import BytesIO
from zipfile import ZipFile

In [2]:
geokb = WikibaseConnection("GEOKB_CLOUD")

# Mine Facility Classification

One of the key properties in the USMIN topo map symbols dataset is the facility type (Ftr_Type). In the metadata, most of these values cite a 1997 dictionary from AGI. It will be useful, in a number of ways, for the GeoKB to be aware of these classifiers. In the following code blocks, we pull the metadata and load these labels and definitions into the GeoKB.

We need to do some further work from this point:
* subclassification of things like quarries, at least
* consultation back to the AGI source material to see what else is useful for these concepts
* exploration of a broader community effort (e.g., ESIP SWEET) that this could tie into as a better foundation
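
As a starting point for the subclassification item above, a parent class can sometimes be read straight off the label: values such as "Quarry - Gypsum" carry the broader type before the " - " separator. The sketch below is only one way this might be approached. `split_parent_label` is a hypothetical helper, not part of wbmaker or the GeoKB, and labels like "Gravel/Borrow Pit - Undifferentiated" show that the convention is not uniform, so any result would need review before it becomes a claim.

```python
def split_parent_label(label):
    """Return (parent, qualifier) for labels such as 'Quarry - Gypsum'.

    Hypothetical helper: labels without the ' - ' separator return (label, None).
    """
    parent, sep, qualifier = label.partition(" - ")
    if not sep:
        return label, None
    return parent.strip(), qualifier.strip()


# A few labels that appear among the Ftr_Type domain values
labels = ["Quarry", "Quarry - Gypsum", "Quarry - Limestone", "Tailings - Mill", "Adit"]

# Map each candidate parent label to the more specific labels beneath it
children_by_parent = {}
for label in labels:
    parent, qualifier = split_parent_label(label)
    if qualifier is not None:
        children_by_parent.setdefault(parent, []).append(label)

print(children_by_parent)
```

Note that "Tailings" has no item of its own among the domain values, so a parent found this way may itself need to be created.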

In [11]:
# Get USMIN dataset item and pull metadata for use
usmin_item_url = "https://www.sciencebase.gov/catalog/item/5a1492c3e4b09fc93dcfd574?format=json"
usmin_item_json = requests.get(usmin_item_url).json()

# Find the FGDC metadata file attached to the ScienceBase item
usmin_fgdc_meta_file_item = next((f for f in usmin_item_json["files"] if f["contentType"] == "application/fgdc+xml"), None)

ftr_type = None  # stays None if the metadata file or the attribute is missing
if usmin_fgdc_meta_file_item is not None:
    usmin_fgdc_meta_url = usmin_fgdc_meta_file_item["url"]
    usmin_fgdc_meta_xml = requests.get(usmin_fgdc_meta_url)
    d_usmin_fgdc_meta_xml = xmltodict.parse(usmin_fgdc_meta_xml.text, dict_constructor=dict)

    # Pull the Ftr_Type attribute definition, including its enumerated domain values
    ftr_type = next((i for i in d_usmin_fgdc_meta_xml["metadata"]["eainfo"]["detailed"]["attr"] if i["attrlabl"] == "Ftr_Type"), None)
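
One caveat when walking the parsed metadata: xmltodict represents a repeated element as a list but a single occurrence as a plain dict, so iterating over `attrdomv` only behaves as expected when an attribute has more than one domain value. A minimal sketch of a guard follows. `as_list` is a hypothetical helper, and the dicts are hand-built stand-ins that mimic the parsed FGDC structure; their definitions and sources are placeholders, not the real metadata text.

```python
def as_list(value):
    """Normalize an xmltodict value to a list (hypothetical helper)."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


# Hand-built stand-ins for parsed <attr> elements; real values come from the FGDC XML
attr_many = {
    "attrlabl": "Ftr_Type",
    "attrdomv": [
        {"edom": {"edomv": "Adit", "edomvd": "placeholder definition", "edomvds": "placeholder source"}},
        {"edom": {"edomv": "Quarry", "edomvd": "placeholder definition", "edomvds": "placeholder source"}},
    ],
}
attr_single = {
    "attrlabl": "Example",
    "attrdomv": {"edom": {"edomv": "Only Value", "edomvd": "placeholder definition", "edomvds": "placeholder source"}},
}

# The same loop now works whether attrdomv parsed to a list or to a single dict
for attr in (attr_many, attr_single):
    values = [d["edom"]["edomv"] for d in as_list(attr.get("attrdomv"))]
    print(attr["attrlabl"], values)
```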

In [6]:
agi_ref = geokb.datatypes.Item(
    prop_nr=geokb.prop_lookup["knowledge source"],
    value=geokb.ref_lookup["Dictionary of mining, mineral, and related terms, 2nd Ed."]
)

usmin_ref = geokb.datatypes.Item(
    prop_nr=geokb.prop_lookup["knowledge source"],
    value="Q44146" # Not coming up in SPARQL query currently
)

In [14]:
for typ in ftr_type["attrdomv"]:
    if typ["edom"]["edomv"] != "Mine":
        item = geokb.wbi.item.new()
        item.labels.set('en', typ["edom"]["edomv"])
        item.descriptions.set('en', typ["edom"]["edomvd"].split(".")[0][:250])

        references = geokb.models.References()
        if typ["edom"]["edomvds"] == "American Geological Institute (1997)":
            references.add(agi_ref)
        elif typ["edom"]["edomvds"] == "USGS Authors":
            references.add(usmin_ref)

        item.claims.add(
            geokb.datatypes.Item(
                prop_nr=geokb.prop_lookup["subclass of"],
                value=geokb.class_lookup["mining facility"],
                references=references
            )
        )
        
        response = item.write(
            summary="Added classification of mining facility from USMIN source metadata"
        )
        print(typ["edom"]["edomv"], response.id)

Adit Q44148
Air Shaft Q44149
Bentonite Pit Q44150
Borrow Pit Q44151
Caliche Pit Q44152
Chert Pit Q44153
Cinder Pit Q44154
Clay Pit Q44155
Coal Mine Q44156
Diggings Q44157
Disturbed Surface Q44158
Disturbed Surface - Pit Q44159
Evaporation Pond Q44160
Glory Hole Q44161
Gravel Pit Q44162
Gravel/Borrow Pit - Undifferentiated Q44163
Hydraulic Mine Q44164
Iron Pit Q44165
Leach Pond Q44166
Lignite Pit Q44167
Marl Pit Q44168
Mill Site Q44169
Mine Dump Q44170
Mine Shaft Q44171
Open Pit Mine Q44172
Open Pit Mine or Quarry Q44173
Ore Stockpile/Storage Q44174
Placer Mine Q44175
Prospect Pit Q44176
Quarry Q44177
Quarry - Gypsum Q44178
Quarry - Limestone Q44179
Quarry - Pumice Q44180
Quarry - Rock Q44181
Salt Evaporator Q44182
Sand and Gravel Pit Q44183
Sand Pit Q44184
Scoria Pit Q44185
Settling Pond Q44186
Shale Pit Q44187
Shell Pit Q44188
Silica Mine Q44189
Slag Pile Q44190
Strip Mine Q44191
Tailings - Dredge Q44192
Tailings - Mill Q44193
Tailings - Placer Q44194
Tailings - Pond Q44195
Tailings -

# USMIN Exploration

Note: I'll end up dumping this aspect of the notebook or moving it elsewhere. I include it in a commit for reference.

The trick with the USMIN data and the GeoKB will be connecting dots on mines we already know about in the GeoKB, identifying clues about new mines, and deciding what we want to do with those clues. Since this is a good discrete example of a case where one of the two methods we are exploring for "live knowledge-banking" could apply, I'm going to leave off building a bot approach to USMIN digestion at this point. We'll look instead at the following:

* an OpenRefine pathway where we use Ftr_Name, State, and County values to reconcile existing items in the GeoKB
* an AGOL project pathway where we pull in the WFS from the USMIN record in ScienceBase and a query of mines from GeoKB and use both property intersections and geospatial proximity to reconcile and tease out potentially new records
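
To make the geospatial proximity idea in the second pathway concrete, here is a rough sketch in plain Python. It is not how an AGOL project would do it; it only illustrates the distance test. The function names, the example identifiers, and the 500 m threshold are all assumptions, and the coordinates are illustrative longitude/latitude pairs in decimal degrees.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; adequate for short distances


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby(candidate, known_points, max_m=500):
    """Return (id, distance) pairs for known points within max_m of candidate, nearest first."""
    hits = []
    for ident, (lon, lat) in known_points.items():
        d = haversine_m(candidate[0], candidate[1], lon, lat)
        if d <= max_m:
            hits.append((ident, d))
    return sorted(hits, key=lambda h: h[1])


# Hypothetical identifiers with illustrative coordinates
known = {
    "example_near": (-87.62951, 34.31126),
    "example_far": (-87.65932, 34.29053),
}
for ident, dist in nearby((-87.62939, 34.31061), known):
    print(ident, round(dist), "m")
```

A proximity hit on its own is only a clue; combining it with a name or county match, as the pathway describes, would give a stronger basis for reconciliation.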

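In the following, I download the USMIN geodatabase, build a simple name:state:county identifier for both the USMIN features and the existing GeoKB mines, and match the two on that identifier.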

In [70]:
usmin_gdb_file_item = next((f for f in usmin_item_json["files"] if "_Geodatabase.zip" in f["name"]), None)
r_usmin_gdb_file = requests.get(usmin_gdb_file_item["url"])
with ZipFile(BytesIO(r_usmin_gdb_file.content)) as zip_usmin_gdb:
    zip_usmin_gdb.extractall('data/')

In [71]:
gdf_usmin = gpd.read_file('data/USGS_TopoMineSymbols_ver9.gdb')

In [72]:
# Create a simple "identifier" field with name:state:county from USMIN
gdf_usmin["name_location"] = gdf_usmin.apply(lambda x: f"{x.Ftr_Name}:{x.State}:{x.County}" if isinstance(x.Ftr_Name, str) else None, axis=1)
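
Exact string matching on a name:state:county key is sensitive to differences in case and spacing between the two sources. If that turns out to cost matches, a normalization step along these lines might help. This is a sketch only: `normalize_key` is a hypothetical helper, and whether case folding is appropriate depends on how names are actually recorded in USMIN and the GeoKB.

```python
import re


def normalize_key(name, state, county):
    """Build a case- and whitespace-insensitive name:state:county key.

    Returns None when the feature has no usable name, mirroring the
    isinstance check applied to Ftr_Name.
    """
    if not isinstance(name, str) or not name.strip():
        return None
    parts = []
    for part in (name, state, county):
        text = "" if part is None else str(part)
        # Collapse runs of whitespace, trim, and fold case
        parts.append(re.sub(r"\s+", " ", text).strip().casefold())
    return ":".join(parts)


print(normalize_key("Hager  Mine ", "AL", "Marion"))
print(normalize_key(None, "AL", "Marion"))
```

The same function would need to be applied on both sides of the comparison for the keys to line up.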

In [74]:
# Query existing mines in GeoKB
query_wb_mines = "PREFIX%20wd%3A%20%3Chttps%3A%2F%2Fgeokb.wikibase.cloud%2Fentity%2F%3E%0APREFIX%20wdt%3A%20%3Chttps%3A%2F%2Fgeokb.wikibase.cloud%2Fprop%2Fdirect%2F%3E%0APREFIX%20p%3A%20%3Chttps%3A%2F%2Fgeokb.wikibase.cloud%2Fprop%2F%3E%0APREFIX%20pr%3A%20%3Chttps%3A%2F%2Fgeokb.wikibase.cloud%2Fprop%2Freference%2F%3E%0A%0ASELECT%20%3Fmine%20%3FmineLabel%20%3Floc_typeLabel%20%3FlocationAltLabel%0AWHERE%20%7B%0A%20%20%3Fmine%20wdt%3AP1%20wd%3AQ3646%20.%0A%20%20%3Fmine%20p%3AP1%20%3Finstance_of_statement%20.%0A%20%20%3Finstance_of_statement%20prov%3AwasDerivedFrom%20%3Fref%20.%0A%20%20%3Fref%20pr%3AP3%20wd%3AQ3624%20.%0A%20%20%3Fmine%20wdt%3AP11%20%3Flocation%20.%0A%20%20VALUES%20%3Flocation_type%20%7B%20wd%3AQ229%20wd%3AQ481%20%7D%0A%20%20%3Flocation%20wdt%3AP1%20%3Flocation_type%20.%0A%20%20%3Flocation%20wdt%3AP1%20%3Floc_type%20.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22%20.%20%7D%0A%7D"
df_wb_mines = geokb.wb_ref_data(query=query_wb_mines)
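
The percent-encoded query string above is hard to read and harder to edit. The standard library can decode it for inspection and re-encode an edited version. The sketch below works on a short fragment of the query; it assumes the `query` argument expects a percent-encoded string, as the cell above suggests, which is not confirmed here.

```python
from urllib.parse import quote, unquote

# A fragment of the encoded query used in this notebook
encoded_fragment = "%3Fmine%20wdt%3AP1%20wd%3AQ3646%20."

# Decode to readable SPARQL for inspection or editing
readable = unquote(encoded_fragment)
print(readable)

# Re-encode; safe="" also percent-encodes "/" so IRIs in PREFIX lines are handled the same way
re_encoded = quote(readable, safe="")
print(re_encoded == encoded_fragment)
```

Keeping the query as a readable multi-line string and encoding it just before the call would make later changes to the SPARQL easier to review.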

In [75]:
# Do some reorganization on the GeoKB records and build similar name:state:county property
df_wb_mines["loc_name"] = df_wb_mines.apply(lambda x: x["locationAltLabel"].split(",")[0].strip() if x["loc_typeLabel"] == "U.S. County (or equivalent)" else x["locationAltLabel"], axis=1)
df_wb_mines["mine"] = df_wb_mines.mine.apply(lambda x: x.split("/")[-1])

# Collect labels and location names per mine (current pandas requires a list of columns when selecting after groupby)
df_wb_mines_grouped = df_wb_mines.sort_values(["mine","loc_typeLabel"], ascending=False).groupby("mine", as_index=False)[["mineLabel","loc_name"]].agg(list)
df_wb_mines_grouped["name_location"] = df_wb_mines_grouped.apply(lambda x: ":".join([x["mineLabel"][0], x["loc_name"][0], x["loc_name"][1]]), axis=1)


In [76]:
# Assemble what should be reasonable matches between USMIN features and GeoKB items
geokb_usmin_probable_matches = pd.merge(
    left=gdf_usmin[gdf_usmin.name_location.isin(df_wb_mines_grouped.name_location)][["name_location","Ftr_Type","Topo_Date","geometry"]],
    right=df_wb_mines_grouped[["mine","name_location"]],
    how="left",
    on="name_location"
)
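
Because the merge is keyed on the name:state:county string alone, one GeoKB item can match several USMIN features, for example multiple adit symbols mapped for one named mine. Before adding claims it may help to group the matched features by item. The sketch below does this in plain Python on a few rows copied from this notebook's match results; the grouping structure itself is an assumption about what would be convenient downstream.

```python
from collections import defaultdict

# (name_location, Ftr_Type, Topo_Date, lon, lat, mine) rows taken from the match results
rows = [
    ("Northernmost Mines:AL:Franklin", "Adit", "1946", -87.62939, 34.31061, "Q5414"),
    ("Northernmost Mines:AL:Franklin", "Adit", "1946", -87.62951, 34.31126, "Q5414"),
    ("Hager Mine:AL:Marion", "Adit", "1946", -87.65836, 34.28951, "Q5412"),
]

# Collect every matched USMIN feature under the GeoKB item it matched
features_by_mine = defaultdict(list)
for name_location, ftr_type, topo_date, lon, lat, mine in rows:
    features_by_mine[mine].append({"type": ftr_type, "date": topo_date, "point": (lon, lat)})

for mine, features in features_by_mine.items():
    print(mine, len(features), "feature(s)")
```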

For now, I took the first one of these and added a couple of claims to the existing GeoKB item as an example to work from.

In [77]:
geokb_usmin_probable_matches

,name_location,Ftr_Type,Topo_Date,geometry,mine
0,Northernmost Mines:AL:Franklin,Adit,1946,POINT (-87.62939 34.31061),Q5414
1,Northernmost Mines:AL:Franklin,Adit,1946,POINT (-87.62951 34.31126),Q5414
2,Lowler Mines:AL:Marion,Adit,1946,POINT (-87.65932 34.29053),Q5413
3,Lowler Mines:AL:Marion,Adit,1946,POINT (-87.65651 34.29070),Q5413
4,Hager Mine:AL:Marion,Adit,1946,POINT (-87.65836 34.28951),Q5412
...,...,...,...,...,...
3520,Old Michigan Mine:TX:Culberson,Open Pit Mine,1973,POINT (-104.15893 31.66827),Q25333
3521,Whites Mine:TX:Uvalde,Open Pit Mine,1974,POINT (-100.09062 29.16093),Q25345
3522,Southwest Ledge Quarries:TX:Lampasas,Quarry,1959,POINT (-98.39506 31.12322),Q25342
3523,Gato Quarry:TX:Uvalde,Quarry,1974,POINT (-100.04606 29.15270),Q25324
