# MinMod Knowledge Graph

### Live at [https://minmod.isi.edu/](https://minmod.isi.edu/), which wraps a [SPARQL endpoint](https://minmod.isi.edu/sparql).

MinMod is a mineral data **Knowledge Graph (KG)** that integrates heterogeneous data sources, including grade-tonnage data extracted from **mine reports**, **scholarly articles**, **mine site databases**, and **structured tables**. It provides a rich, queryable graph of mineral site information, with **links** to additional knowledge bases such as [GeoKB](https://geokb.wikibase.cloud/).

### Data in this knowledge graph adheres to this [schema](https://github.com/DARPA-CRITICALMAAS/schemas/blob/main/ta2/README.md).

## Why KGs?

<img src="demo_imgs/minmod_kg.png" alt="minmod kg" width="250"/>

- KGs
  - graphs are a natural way to **encode data**
  - KGs use **semantic concepts & relationships** to create a **Semantic Network**
  - involves **spatial & temporal** information
- RDF
  - framework within the **Semantic Web** stack
  - an extension of the WWW, enabling the Web of Data (aka **"Linked Data"**)
  - Linked Open Data & **FAIR** data principles
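
To make the graph encoding concrete, the sketch below stores a few facts as subject-predicate-object triples and answers a simple pattern query in plain Python. The identifiers (`site_1`, `inv_1`) and values are made up for illustration. The predicate names echo those used in the queries later in this notebook, but this is only a toy model, not how MinMod stores its data.

```python
# A toy triple store: each fact is a (subject, predicate, object) tuple.
# All identifiers and values here are hypothetical.
triples = [
    ("site_1", "type", "MineralSite"),
    ("site_1", "mineral_inventory", "inv_1"),
    ("inv_1", "commodity", "Zinc"),
    ("inv_1", "category", "Inferred"),
]

def match(pattern, store):
    """Return triples matching a pattern; None is a wildcard, like a SPARQL variable."""
    return [
        t for t in store
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]

# Roughly what the pattern `inv_1 ?p ?o` would ask in SPARQL
inv_facts = match(("inv_1", None, None), triples)
print(inv_facts)
```

Each SPARQL triple pattern in the sections below plays the same role as one `match` call here, with shared variables joining the patterns together.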

## Constructing the KG

- Extracted data --> `JSON-LD` readers / `TTL` triples reader
- Predefined data
  - Open set of commodity entities (based on MRDS/GeoKB)
  - Finite set of deposit type entities
  - Ontology following schema to enforce class & property constraints

<img src="demo_imgs/minmod_pipeline.png" alt="minmod kg" width="600"/>

## Interacting with the KG
MinMod KG `SPARQL` Sandbox

In [None]:
import json
import requests
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm
from collections import Counter
import warnings

warnings.filterwarnings("ignore")
tqdm.pandas()

In [None]:
def run_sparql_query(query, endpoint='https://minmod.isi.edu/sparql', values=False):
    """Run a SPARQL SELECT query and return its bindings as a DataFrame.

    Returns an empty DataFrame if the request fails or yields no bindings.
    With values=True, only the '<var>.value' columns are kept.
    """
    # add prefixes
    final_query = '''
    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX : <https://minmod.isi.edu/resource/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX gkbi: <https://geokb.wikibase.cloud/entity/>
    PREFIX gkbp: <https://geokb.wikibase.cloud/wiki/Property:>
    PREFIX gkbt: <https://geokb.wikibase.cloud/prop/direct/>
    \n''' + query
    # send query
    try:
        response = requests.post(
            url=endpoint,
            data={'query': final_query},
            headers={
                "Content-Type": "application/x-www-form-urlencoded",
                "Accept": "application/sparql-results+json"  # Requesting JSON format
            },
            verify=False  # Set to False to bypass SSL verification as per the '-k' in curl
        )
        response.raise_for_status()
        qres = response.json()
    except (requests.RequestException, ValueError) as err:
        # network error, HTTP error status, or a body that is not valid JSON
        print(f'SPARQL query failed: {err}')
        return pd.DataFrame()
    # one row per binding; nested keys become '<var>.type' / '<var>.value' columns
    df = pd.json_normalize(qres.get('results', {}).get('bindings', []))
    if values:
        df = df[df.filter(like='.value').columns]
    return df

In [None]:
def run_minmod_query(query, values=False):
    return run_sparql_query(query, endpoint='https://minmod.isi.edu/sparql', values=values)

def run_geokb_query(query, values=False):
    return run_sparql_query(query, endpoint='https://geokb.wikibase.cloud/query/sparql', values=values)
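
The `values=True` option works because each binding in a SPARQL JSON result is a small nested object, which `pd.json_normalize` flattens into `<var>.type` and `<var>.value` columns. The stdlib-only sketch below mimics that flattening on a hand-written response so the column naming is easy to see. The function names `flatten_bindings` and `keep_values` and the sample URI are illustrative assumptions, not part of the notebook.

```python
# A hand-written response in the SPARQL JSON results layout (values are made up).
sample_response = {
    "head": {"vars": ["ci", "cn"]},
    "results": {"bindings": [
        {"ci": {"type": "uri", "value": "https://minmod.isi.edu/resource/Q1"},
         "cn": {"type": "literal", "value": "Zinc"}},
    ]},
}

def flatten_bindings(response):
    """Flatten each binding into {'<var>.<field>': ...}, similar to pd.json_normalize."""
    rows = []
    for binding in response["results"]["bindings"]:
        row = {}
        for var, cell in binding.items():
            for field, content in cell.items():
                row[f"{var}.{field}"] = content
        rows.append(row)
    return rows

def keep_values(rows):
    """Keep only the '<var>.value' entries, comparable to the values=True option."""
    return [{k: v for k, v in row.items() if k.endswith(".value")} for row in rows]

rows = flatten_bindings(sample_response)
value_rows = keep_values(rows)
print(value_rows)
```

This is also why later cells read columns such as `gi.value` or `tonnage.value` rather than the bare variable names.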

--------------------------------------------------------

### 0. Count total number of triples in KG

In [None]:
query = ''' SELECT (COUNT(?s) as ?count)
            WHERE {
                ?s ?p ?o .
            } '''
run_minmod_query(query)

### 1. Deposit Types

In [None]:
query = ''' SELECT ?ci ?cn
            WHERE {
                ?ci a :DepositType .
                ?ci :name ?cn .
            } '''
run_minmod_query(query)

### 2. Mineral Inventories

#### 2.1. all **inferred** ore values, from all inventories, their grades, cutoff grades, & dates

In [None]:
query = ''' SELECT ?o_inv ?comm_name ?ore ?grade ?cutoff_grade ?date
            WHERE {
                ?s :mineral_inventory ?o_inv .
                ?o_inv :category :Inferred .
                ?o_inv :commodity [ :name ?comm_name ] .
                ?o_inv :ore [ :ore_value ?ore ] .
                ?o_inv :grade [ :grade_value ?grade ] .
                ?o_inv :cutoff_grade [ :grade_value ?cutoff_grade ] .
                ?o_inv :date ?date .
            } '''
run_minmod_query(query, values=True)

#### 2.2. all mineral inventories with **inferred** values and grade >= 14

In [None]:
query = ''' SELECT ?o_inv ?comm_name ?ore ?grade
            WHERE {
                ?s :mineral_inventory ?o_inv .
                ?o_inv :category :Inferred .
                ?o_inv :commodity [ :name ?comm_name ] .
                ?o_inv :ore [ :ore_value ?ore ] .
                ?o_inv :grade [ :grade_value ?grade ] .
                FILTER (?grade >= 14)
            } '''
run_minmod_query(query, values=True)

### 3. Commodities

#### 3.1. all commodities and their `GeoKB` URIs

In [None]:
query = ''' SELECT ?ci ?cn ?gi
            WHERE {
                ?ci a :Commodity .
                ?ci :name ?cn .
                ?ci owl:sameAs ?gi .
            } '''
example_df = run_minmod_query(query)
example_df

#### 3.2. get commodity symbols from `GeoKB`

In [None]:
def get_symbol_via_sparql(geokb_uri):
    query = '''
    SELECT ?symb
    WHERE {
        <%s> gkbt:P17 ?symb .
    }''' % (geokb_uri)
    result_record = run_geokb_query(query)
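    # the query helper may return None on failure, so guard before indexing
    if result_record is not None and len(result_record) > 0: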
        return result_record.iloc[0]['symb.value']
    return ""

In [None]:
pd.set_option('display.max_rows', 500)

example_df['geoKB Symbol'] = example_df['gi.value'].progress_apply(get_symbol_via_sparql)
example_df
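
The cell above sends one request to GeoKB per commodity. One possible alternative, sketched here under the assumption that the GeoKB endpoint accepts SPARQL 1.1 `VALUES` clauses, is to build a single query covering every URI. `build_symbol_query` is a hypothetical helper, the `Q100`/`Q200` URIs are placeholders, and `gkbt:P17` is taken from the function above.

```python
def build_symbol_query(geokb_uris):
    """Build one SPARQL query that looks up gkbt:P17 for many URIs at once."""
    # Deduplicate while keeping order; skip empty strings and None.
    unique_uris = list(dict.fromkeys(u for u in geokb_uris if u))
    values_block = " ".join(f"<{u}>" for u in unique_uris)
    return (
        "SELECT ?gi ?symb\n"
        "WHERE {\n"
        f"    VALUES ?gi {{ {values_block} }}\n"
        "    ?gi gkbt:P17 ?symb .\n"
        "}"
    )

# Placeholder URIs for illustration only
demo_query = build_symbol_query([
    "https://geokb.wikibase.cloud/entity/Q100",
    "https://geokb.wikibase.cloud/entity/Q200",
    "https://geokb.wikibase.cloud/entity/Q100",
])
print(demo_query)
```

The resulting string could be passed to `run_geokb_query` and the returned `gi.value` / `symb.value` columns merged back onto `example_df`. For a very long URI list, splitting it into chunks may be needed to stay within request size limits.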

------

### 4. Mineral Sites

#### 4.1. all `MineralSite` instances

In [None]:
query = ''' SELECT ?ms ?ms_p ?ms_v
            WHERE {
                ?ms a :MineralSite .
                ?ms ?ms_p ?ms_v .
            } '''
run_minmod_query(query, values=True)

#### 4.2. all `MineralSite` URIs with Deposit Type `Sedex Type Deposits`

In [None]:
query = ''' SELECT ?ms
            WHERE {
                ?ms a :MineralSite .
                ?ms :deposit_type [ :name "Sedex Type Deposits" ] .
            } '''
run_minmod_query(query, values=True)

### 5. Grade-Tonnage Models

In [None]:
# todo: ?mineralInventory :date        "%s"#^^xsd:dateTime .
# todo: use grade-units to convert to single unit/format
# todo: use ore-units to convert to single unit/format

query_template = '''
SELECT ?mineralInventory ?tonnage  ?grade ?inventoryName ?category
WHERE {
    ?mineralInventory a            :MineralInventory .
    ?mineralInventory :id          ?inventoryName .

    ?mineralInventory :date        "%s" .
    
    ?mineralInventory :ore         ?ore .
    ?ore              :ore_value   ?tonnage .
    
    ?mineralInventory :grade       ?gradeInfo .
    ?gradeInfo        :grade_value ?grade .
    
    ?mineralInventory :commodity   ?Commodity .
    ?Commodity        :name        "%s"@en .

    ?mineralInventory :category    ?category .
}
'''
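
The template is filled in with plain `%` formatting, so a value containing a double quote or backslash would produce an invalid query. A minimal sketch of one way to guard against that is shown below. `sparql_string` is a hypothetical helper, and the cut-down template is illustrative: it takes a fully formed literal, unlike the template above, which already wraps `%s` in quotes.

```python
def sparql_string(text, lang=None):
    """Render text as a quoted SPARQL string literal, optionally language-tagged."""
    escaped = (
        text.replace("\\", "\\\\")   # backslash first, so later escapes are not doubled
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
            .replace("\t", "\\t")
    )
    literal = f'"{escaped}"'
    return f"{literal}@{lang}" if lang else literal

# A cut-down, illustrative template that takes fully formed literals
demo_template = '?Commodity :name %s .'
demo_pattern = demo_template % sparql_string('Zinc', lang='en')
print(demo_pattern)
print(sparql_string('5" core'))
```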

In [None]:
# todo: this example is f(timestamp, commodity)

query_resp_df = run_minmod_query(query_template % ('09-19-2017', 'Zinc'), values=True)
mineral_data_df = pd.DataFrame([
    {
        'mineralInventory': row['mineralInventory.value'],
        'name': row['inventoryName.value'],  # This is now the inventory ID/name
        'tonnage': float(row['tonnage.value']),
        'grade': float(row['grade.value']),
        'category': row['category.value']#.split('/')[-1]
    }
    for index, row in query_resp_df.iterrows()
])
mineral_data_df

In [None]:
tonnages = mineral_data_df['tonnage'].values
grades = mineral_data_df['grade'].values
names = mineral_data_df['name'].values

# todo: convert tonnage to million metric tons using unit-transformation
tonnages_million_metric_tons = tonnages / 1e6

plt.figure(figsize=(6, 4))
scatter = plt.scatter(tonnages_million_metric_tons, grades, marker='x', color='r')

# log scale
plt.xscale('log')

# todo: units should be evaluated and converted
plt.xlabel('Tonnage, in million metric tons')
plt.ylabel('Grade, in percent (Zn)')
# todo, commodity name comes from query
plt.title('Grade-Tonnage Model of Mineral Deposits (Zinc)')

plt.grid(True, which="both", ls="--")
for i, txt in enumerate(names):
    plt.annotate(txt, (tonnages_million_metric_tons[i], grades[i]))
plt.show()
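
The plot above divides every tonnage by `1e6`, which is only correct if all values are already in metric tonnes. The unit-conversion step mentioned in the todos might look like the sketch below. The unit labels in `TO_METRIC_TONNES` are hypothetical, since the KG's actual unit identifiers are not shown in this notebook, and `to_million_metric_tonnes` is an illustrative helper name.

```python
# Conversion factors to metric tonnes. The unit labels are hypothetical;
# the KG's actual unit identifiers may differ.
TO_METRIC_TONNES = {
    "tonnes": 1.0,
    "million tonnes": 1e6,
    "short tons": 0.90718474,
    "kilograms": 1e-3,
}

def to_million_metric_tonnes(value, unit):
    """Convert a tonnage in the given unit to million metric tonnes."""
    try:
        factor = TO_METRIC_TONNES[unit]
    except KeyError:
        # fail loudly rather than plot a value in the wrong unit
        raise ValueError(f"unknown ore unit: {unit!r}") from None
    return value * factor / 1e6

print(to_million_metric_tonnes(2_500_000, "tonnes"))
print(to_million_metric_tonnes(12.0, "million tonnes"))
```

Raising on an unknown unit is a deliberate choice in this sketch: silently assuming tonnes would place a deposit at the wrong position on the log-scaled axis.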