<a target="_blank" href="https://colab.research.google.com/github/ChuBL/How-to-Use-Mindat-API/blob/main/How_to_Use_Mindat_API.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# How to Use the OpenMindat Data API to Query and Download Datasets


## 0. Access Your Mindat API Token

[How to Get My Mindat API Key or Token?](https://www.mindat.org/a/how_to_get_my_mindat_api_key)

[Mindat API doc](https://api.mindat.org/schema/redoc/)

## 1. Dependencies (Run this section first)

These examples use files located in the same directory as the notebook (.ipynb) file. They are meant to be run in Jupyter or a similar host on your local desktop, with an active internet connection.

In [1]:
from pathlib import Path
import os
import sys
import json
import re
import pprint
import requests
# import google

You should **avoid** placing your API token explicitly in your code if you plan to share it. That includes working with a notebook in a public GitHub repo, like this one.

The solution here is to save the API key in a file accessible from your notebook environment (e.g. in the same directory) and to add that file to your `.gitignore` file so it is not pushed to the public GitHub repo.

In [2]:

api_key_file_path = "local/api_key.txt"
YOUR_API_KEY = ""
try:
    with open(api_key_file_path, 'r') as f:
        # strip the trailing newline so it does not end up in the request header
        YOUR_API_KEY = f.read().strip()
except FileNotFoundError:
    print("API key file not found. Please create a text file containing your API key and place it at " + api_key_file_path)

# stop early if no key was loaded (missing or empty file)
if not YOUR_API_KEY:
    raise Exception("Please set a valid API token before the start!")
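A slightly more flexible variant, as a sketch: check an environment variable first and fall back to the key file. The helper `load_api_key` and the variable name `MINDAT_API_KEY` are hypothetical choices for this example; the Mindat API itself does not define them.

```python
import os
from pathlib import Path

def load_api_key(key_file="local/api_key.txt", env_var="MINDAT_API_KEY"):
    """Return the Mindat API key from an environment variable or a text file.

    MINDAT_API_KEY is a hypothetical variable name used only in this sketch.
    """
    # 1. Prefer the environment variable, if it is set and non-empty
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    # 2. Otherwise read the key file, stripping the trailing newline
    path = Path(key_file)
    if path.is_file():
        key = path.read_text().strip()
        if key:
            return key
    raise RuntimeError(
        f"No API key found: set ${env_var} or put the key in {key_file}"
    )

# Usage (an alternative to the file-reading cell above):
# YOUR_API_KEY = load_api_key()
```

Keeping the key in an environment variable also works on hosted notebooks, where creating an ignored local file may be less convenient.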

### Basic access pattern

In [3]:
# API root entry point
MINDAT_API_URL = "https://api.mindat.org"

# authorization header that must be included with each request.
headers = {'Authorization': 'Token '+ YOUR_API_KEY}


In [12]:
# see https://api.mindat.org/schema/redoc/ for documentation on using the Mindat API


# Use the geomaterials_search endpoint. It runs a text search on the term in the
#  q parameter and returns a fixed set of fields; judging from the output below,
#  these are 'id', 'name', 'synid', 'ima_status' and 'ima_approved'.
# Other parameters besides 'q' appear to be ignored.

filter_dict = {
    'q': 'dunite',
}

params = filter_dict

# use the python requests package to GET results from mindat
response = requests.get(MINDAT_API_URL+"/geomaterials_search/",
                        params=params,
                        headers=headers)
# assume the query succeeds. See later examples for using response.status_code
#  to check whether the request worked.

# another handy requests function: parse the JSON response body into Python
#  objects (here, a list of dicts)
json_file = response.json()

# echo the query results
json_file

[{'id': 48408,
  'name': 'Dunite',
  'synid': 0,
  'ima_status': 0,
  'ima_approved': 0},
 {'id': 51341,
  'name': 'Metadunite',
  'synid': 0,
  'ima_status': 0,
  'ima_approved': 0}]
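The cell above assumes the request succeeds. Below is a minimal sketch of the same query with an explicit status check, written against a `requests.Session` so the authorization header is set once. The helper name `search_geomaterials` is hypothetical; `Session.headers.update()` and `Response.raise_for_status()` are standard `requests` features. The function only relies on the session's `.get()` method, so the block itself does not import `requests`.

```python
MINDAT_API_URL = "https://api.mindat.org"

def search_geomaterials(session, term, base_url=MINDAT_API_URL, timeout=30):
    """Run a text search against /geomaterials_search/ and return parsed JSON.

    `session` is expected to be a requests.Session (or any object with the
    same .get() interface) that already carries the Authorization header.
    """
    response = session.get(base_url + "/geomaterials_search/",
                           params={"q": term}, timeout=timeout)
    # raise requests.HTTPError on 4xx/5xx instead of parsing an error body
    response.raise_for_status()
    return response.json()

# Usage:
# import requests
# session = requests.Session()
# session.headers.update({"Authorization": "Token " + YOUR_API_KEY})
# for item in search_geomaterials(session, "dunite"):
#     print(item["id"], item["name"])
```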

### Get the Items with Selected Fields


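#### List of Mindat API fields

These names can be passed in the `fields` parameter of the `/geomaterials/` endpoint, as in the examples below.
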
~all
a
aboutname
aerror
alpha
alphaerror
approval_year
b
berror
beta
betaerror
c
cclass
cerror
cim
cleavage
cleavagetype
colour
commentbreak
commentcolor
commentcrystal
commentdense
commenthard
commentluster
csmetamict
csystem
dana8ed1
dana8ed2
dana8ed3
dana8ed4
dcalc
dcalcerror
description_short
diapheny
discovery_year
dmeas
dmeas2
dmeaserror
electrical
elements
entrytype
entrytype_text
epitaxidescription
fracturetype
gamma
gammaerror
groupid
guid
hardtype
hmax
hmin
id
ima_formula
ima_history
ima_notes
ima_status
impurities
industrial
ir
key_elements
longid
luminescence
lustre
lustretype
magnetism
meteoritical_code
mindat_formula
mindat_formula_note
morphology
name
nolocadd
occurrence
optical2vcalc
optical2vcalc2
optical2vcalcerror
optical2vmeasured
optical2vmeasured2
optical2vmeasurederror
opticalalpha
opticalalpha2
opticalalphaerror
opticalanisotropism
opticalbeta
opticalbeta2
opticalbetaerror
opticalbireflectance
opticalbirefringence
opticalcolour
opticalcomments
opticaldispersion
opticalepsilon
opticalepsilon2
opticalepsilonerror
opticalextinction
opticalgamma
opticalgamma2
opticalgammaerror
opticalinternal
opticaln
opticaln2
opticalnerror
opticalomega
opticalomega2
opticalomegaerror
opticalpleochorismdesc
opticalpleochroism
opticalr
opticalsign
opticaltropic
opticaltype
other
otheroccurrence
parting
polytypeof
publication_year
rock_bgs_code
rock_parent
rock_parent2
rock_root
shortcode_ima
sigelements
spacegroup
spacegroupset
specdispm
streak
strunz10ed1
strunz10ed2
strunz10ed3
strunz10ed4
synid
tenacity
thermalbehaviour
tlform
tranglide
twinning
type_specimen_store
updttime
uuid2mindat
uv
va3
varietyof
vhnerror
vhng
vhnmax
vhnmin
vhns
weighting
z
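A small sketch for building the `fields` value from a Python list. It assumes the API expects the names as one comma-separated string (check the API doc linked above), and it flags names that are not in a known set before any request is sent. `build_fields_param` and `KNOWN_FIELDS` are hypothetical names, and only a handful of the fields listed above are included; extend the set as needed.

```python
# A subset of the field names listed above; extend as needed
KNOWN_FIELDS = {"id", "longid", "guid", "name", "updttime", "entrytype",
                "mindat_formula", "ima_formula", "meteoritical_code"}

def build_fields_param(fields, known=KNOWN_FIELDS):
    """Join field names into one comma-separated string for `fields`.

    Raises ValueError for names not in `known`, to catch typos early.
    """
    unknown = [f for f in fields if f not in known]
    if unknown:
        raise ValueError(f"Unknown field name(s): {unknown}")
    # dict.fromkeys drops duplicates while keeping the original order
    return ",".join(dict.fromkeys(fields))

# build_fields_param(["id", "longid", "name", "updttime"])
# -> 'id,longid,name,updttime'
```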

### Get Mindat IDs from a list of HTML links in a Microsoft Word document

In [None]:
# repeated here for convenience

MINDAT_API_URL = "https://api.mindat.org"
headers = {'Authorization': 'Token '+ YOUR_API_KEY}


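You might need to install the `python-docx` package, which is imported as `docx` (installing the package named `docx` gets a different, older library):

pip install python-docx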

In [None]:

import re
from docx import Document
from docx.opc.constants import RELATIONSHIP_TYPE as RT

# Usage example: this file just contains a list of material types; each entry is a
#  hyperlink whose URL sits behind the label text you see in the document.
document_path = 'MeteoriteExtraterrestrial.docx'
hyperlinks = []

print("This program extracts hyperlinks detected in a Word .docx file\n")

document = Document(document_path)

# hyperlink targets are stored as relationships of the main document part
rels = document.part.rels

for rel in rels.values():
    if rel.reltype == RT.HYPERLINK:
        hyperlinks.append(rel.target_ref)

# Pull the numeric Mindat ID out of each URL. This assumes links of the form
#  https://www.mindat.org/min-<id>.html; unlike a fixed slice such as url[27:32],
#  the regex also works for IDs that are not exactly five digits long.
idlist = []
for url in hyperlinks:
    match = re.search(r'min-(\d+)', url)
    if match:
        idlist.append(match.group(1))
    else:
        print('no Mindat ID found in', url)

print(idlist)


In [None]:
# set your selected fields here, as one comma-separated string
# fields_str = 'id,longid,guid,name,entrytype,entrytype_text,description_short,rock_parent,rock_parent2,rock_root,rock_bgs_code,meteoritical_code'

fields_str = 'id,longid,name,updttime'

In [None]:
# hard-coded ID list; this replaces the list extracted from the .docx file above
idlist = ['11263','48145','49089','49091','49093','49504','49505','49506','49507','49508','49509','49510','49511','49512','49513','49514','49517','49518','49519','49520','49521','49522','49523','49524','49525','49526','49527','49528','49529','49530','49531','49532','49533','49534','49535','49536','49537','49538','49539','49540','49541','49542','49543','49544','49545','49546','49547','49548','49549','49550','49551','49552','49553','49554','49556','49557','49558','49559','49560','49561','49562','49563','49564','49565','49566','49567','49568','49569','49570','49571','49572','49573','49574','49575','49576','49577','49578','49579','49580','49581','49584','49585','49586','49587','49588','49589','49590','49591','49592','49593','49595','49596','49597','49599','49600','49603','49604','49606','49607','49608','49609','49610','49612','49614','49615','49616','49617','49618','49619','49620','49621','49622','49623','49625','49626','49627','49628','49629','49630','49631','49632','49633','49634','49635','49636','49637','49638','49639','49640','49641','49642','49643','49644','49645','49646','49647','49648','49649','49650','49651','49652','49653','49654','49655','49656','49657','49658','49659','49660','49661','49662','49663','49664','49665','49666','49667','49668','49669','49670','49671','49672','49673','49674','49675','49676','49677','49678','49679','49680','49682','49683','49684','49685','49686','49688','49689','49690','49691','49692','49693','49695','49696','49697','49698','49699','49700','49702','49703','49705','49707','49708','49709','49710','49711','49712','49713','49715','49716','49717','49718','49719','49720','49721','49722','49723','49724','49725','49726','49727','49728','49729','49730','49731','49732','49733','49734','49735','49736','49737','49738','49739','49740','49741','49742','49743','49744','49745','49746','49750','49751','49752','49753','49754','49755','49756','49757','49758','49760','49761','49762','49763','49764','49765','49767','49768','49769','49770','49772','49773','49774','49776','49777','49778','49779','49780','49781','49782','49783','49784','49785','49786','49787','49788','49789','49790','49791','49792','49793','49794','49795','49796','49797','49798','49799','49800','49802','49803','49804','49805','49806','49807','49808','49809','49810','49811','49812','49813','49814','49815','49816','49817','49818','49819','49820','49821','49822','49823','49824','49825','49826','49827','49828','49829','49831','49832','49833','49835','49836','49837','49838','49839','49840','49841','49842','49843','49844','49845','49846','49847','49848','49849','49850','49851','49852','49853','49854','49856','49857','49859','49860','49861','49862','49863','49864','49865','49866','49867','49869','49872','49873','49876','49877','49878','49879','49880','49881','49882','49883','49884','49885','49886','49887','49888','49889','49890','49891','49892','49893','49894','49895','49896','49898','49899','49901','49902','49904','49905','49906','49907','49908','49909','49910','49911','49912','49913','49914','49915','49916','49917','49918','49919','49920','49921','49922','49923','49924','49925','49926','49927','49928','49929','49930','49931','49932','49933','49934','49935','49936','49937','49938','49939','49941','49942','49944','49945','49947','49950','49951','49952','49953','49954','49955','49956','49957','49958','49959','49960','49961','50269','50270','50444','50445','50446','50447','50448','50449','50450','50451','50452','50453','50454','50455','50456','50457','50458','50459','50460','50461','50462','50463','51453','52197','52198','52199','52200','52201','52202','52203','52204','52205','52206','52207','52208','52209','52210','52211','52212','52213','52214','52215','52216','52217','52218','52219','52220','52221','52222','52223','52224','52225','52226','52227','52228','52229','52237','52238','52239','52240','52241','52360','52368','52369','52370','52371','52372','52373','52374','52393','52394','52395','52396','52720','52787','54116','54117']

Query for items by ID, sending one request per ID in the list:

In [None]:
import pandas
import time

json_array = []

# idlist = ['11263','48145']   # a short list for testing

for idstr in idlist:

    params = {
        'fields': fields_str,  # put your selected fields here
        'id__in': idstr,       # filter to the record with this ID
        'format': 'json'
    }

    response = requests.get(MINDAT_API_URL+"/geomaterials/",
                    params=params,
                    headers=headers)

    if 200 <= response.status_code <= 299:
        json_out = response.json()
        # "results" holds the matching record(s) for this ID
        json_array = json_array + json_out["results"]
    else:
        print('problem-- ', idstr, response.status_code)

    # pause between requests to go easy on the API
    time.sleep(3)


print('Done')
json_array
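The loop above sends one request per ID and waits 3 seconds between calls. If `id__in` accepts several comma-separated IDs at once (the `__in` suffix suggests it does, but treat that as an assumption and check the API doc), the list could be sent in batches instead. A sketch of the batching step; `chunk_ids` is a hypothetical helper and the batch size of 50 is arbitrary.

```python
def chunk_ids(ids, size=50):
    """Yield comma-separated strings of at most `size` IDs each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ",".join(str(i) for i in ids[start:start + size])

# Usage sketch (assumes id__in takes a comma-separated list):
# for batch in chunk_ids(idlist, size=50):
#     params = {'fields': fields_str, 'id__in': batch, 'format': 'json'}
#     response = requests.get(MINDAT_API_URL + "/geomaterials/",
#                             params=params, headers=headers)
#     # a batch may span several result pages, so follow json_out["next"]
```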


Use a filter parameter instead of an ID list: get all records that have a `meteoritical_code` value.

In [None]:
# get all records that have a meteoritical_code value
# results are paginated (cursor pagination), so the cells below follow the "next" links

import pandas
import time

MINDAT_API_URL = "https://api.mindat.org"
headers = {'Authorization': 'Token '+ YOUR_API_KEY}

#fields_str = 'id','longid','guid','name'

json_array = []

params = {
    'fields': fields_str, # put your selected fields here
    'format': 'json',
    'meteoritical_code_exists':'true'
}

response = requests.get(MINDAT_API_URL+"/geomaterials/",
                params=params,
                headers=headers)

if 200 <= response.status_code <= 299:
    json_out = response.json()
#    print (json_out)
    json_array = json_array + json_out["results"]
else:
    print ('problem')



In [None]:
json_array


In [None]:
# follow the "next" links until the last page; each "next" URL already
#  carries the query parameters, so no params are passed here
while json_out["next"] is not None:
    next_url = json_out["next"]
    response = requests.get(next_url,
                headers=headers)

    print(response.status_code)
    if 200 <= response.status_code <= 299:
        json_out = response.json()
        json_array = json_array + json_out["results"]
    else:
        # stop instead of retrying the same URL forever
        print('problem-- ', next_url)
        break

print('Done')
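The same pagination logic can be wrapped in a generator that yields every record and follows each page's `next` link until it is `None`. A sketch; `iter_results` and the `fetch` callable are hypothetical names. Passing `fetch` in keeps the loop independent of how the HTTP request is made.

```python
import time

def iter_results(first_page, fetch, delay=0.0):
    """Yield every record from a paginated response.

    first_page: parsed JSON of the first request (a dict with "results" and "next")
    fetch: callable that takes a URL and returns the parsed JSON of that page
    delay: optional pause in seconds between page requests
    """
    page = first_page
    while True:
        yield from page["results"]
        next_url = page.get("next")
        if not next_url:
            return
        if delay:
            time.sleep(delay)
        page = fetch(next_url)

# Usage with requests (raise on HTTP errors instead of looping forever):
# def fetch(url):
#     r = requests.get(url, headers=headers)
#     r.raise_for_status()
#     return r.json()
# first_page = response.json()   # response of the first /geomaterials/ request
# json_array = list(iter_results(first_page, fetch, delay=1.0))
```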

In [None]:
df_result = pandas.DataFrame(json_array)

df_result.to_csv('timestamp.csv') 


In [None]:

# json_array already holds the flattened result records (the "results" lists
#  were concatenated above), so no record_path is needed here
df_nested_list = pandas.json_normalize(json_array)

# Display the DataFrame
print(df_nested_list)

In [None]:
df_nested_list.to_csv('49089.csv') 

Extract a list of IDs and long IDs for all geomaterials.

In [None]:
fields_str = 'id,longid'

params = {
    'fields': fields_str, # put your selected fields here
    'format': 'json'
}

response = requests.get(MINDAT_API_URL+"/geomaterials/",
                params=params,
                headers=headers)

if 200 <= response.status_code <= 299:
    json_out = response.json()
#    json_array.append(json_out)
else:
    print ('problem ')

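Convert the results to a DataFrame and save them. Note that this covers only the first page of results; to collect every page, follow the `next` links as in the pagination loop above.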

In [None]:
# Create a DataFrame from results
df_nested_list = pandas.json_normalize(json_out, record_path =['results'])

In [None]:
json_out


In [None]:
df_nested_list.to_csv('id-longid.csv') 

In [None]:
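# Mindat long-ID checksum algorithm
# from Jolyon, 2023-06-18

def mindat_longid(authority, record_type, record_id):
    """Return the long ID 'authority:record_type:record_id:check_digit'."""
    out = "{}:{}:{}:".format(authority, record_type, record_id)
    # the check digit is computed over the three numbers written as one digit string
    digits = "{}{}{}".format(authority, record_type, record_id)
    t = 0
    for i in range(len(digits)):
        # weight digits at odd (0-based) positions by 3, the others by 1
        if i % 2 == 1:
            t += int(digits[i]) * 3
        else:
            t += int(digits[i])
    # check digit: the amount needed to bring t up to the next multiple of 10
    ck = t % 10
    if ck:
        ck = 10 - ck
    out += str(ck)
    return out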

In [None]:
# run checksum function
mindat_longid(1,1,49602)
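The same checksum can be used to check a long ID you already have, for example a `longid` value saved in `id-longid.csv`. A sketch that recomputes the check digit and compares it; `check_digit` and `longid_is_valid` are hypothetical helpers, and the sketch assumes long IDs have the `authority:type:id:check_digit` layout produced by `mindat_longid` above. `(10 - total % 10) % 10` is equivalent to the `if ck: ck = 10 - ck` step in that function.

```python
def check_digit(digits):
    """Weighted mod-10 check digit as in mindat_longid (weights 1, 3, 1, 3, ...)."""
    total = sum(int(d) * (3 if i % 2 == 1 else 1) for i, d in enumerate(digits))
    return (10 - total % 10) % 10

def longid_is_valid(longid):
    """Return True if an 'authority:type:id:check' string has a matching check digit."""
    parts = longid.split(":")
    if len(parts) != 4 or not all(p.isdecimal() for p in parts):
        return False
    authority, record_type, record_id, check = parts
    return check_digit(authority + record_type + record_id) == int(check)

# longid_is_valid("1:1:49602:7") -> True  (the value mindat_longid(1, 1, 49602) returns)
```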