<a target="_blank" href="https://colab.research.google.com/github/ChuBL/How-to-Use-Mindat-API/blob/main/How_to_Use_Mindat_API.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# How to Use the OpenMindat Data API to Query and Download Datasets


## 0. Access Your Mindat API Token

[How to Get My Mindat API Key or Token?](https://www.mindat.org/a/how_to_get_my_mindat_api_key)

## 1. Dependencies (Please run this section first)

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


This step mounts your Google Drive in this notebook. You can browse the mounted files in the 📁 file panel in the left column.

In [2]:
from pathlib import Path
import os
import sys
import json
import re
import pprint
import requests

In [3]:
# You can change this working path according to your preference
# If the folder doesn't show up in the left column, click the refresh button
WORKING_DIR = "/content/drive/MyDrive/MindatAPI_folder/"
Path(WORKING_DIR).mkdir(parents=True, exist_ok=True)

You should **avoid** placing your API token explicitly in your code if you plan to share it.

Instead, you can save your token in a text file, drag that file into the working directory to upload it, and read it with the following code. Alternatively, you can enter the token manually.

In [4]:
YOUR_API_KEY = ""

In [5]:
%%script false --no-raise-error
# comment out the first line to activate this code block
api_key_file_dir = "/content/drive/MyDrive/MindatAPI_folder/api_key.txt"
try:
    with open(api_key_file_dir, 'r') as f:
        # Strip whitespace so a trailing newline in the file does not end up in the request header
        YOUR_API_KEY = f.read().strip()
except FileNotFoundError:
    print("API key file not found. Please create a text file containing your api key and place it in the correct directory.")

In [6]:
%%script false --no-raise-error
# comment out the first line to activate this code block
YOUR_API_KEY = input()
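
As a further option alongside the file and manual input above, the token can be looked up from an environment variable first and from a text file second. This is only a minimal sketch: the helper `load_api_token` and the variable name `MINDAT_API_KEY` are hypothetical names chosen for illustration and are not part of the Mindat API or this notebook.

```python
import os
from pathlib import Path


def load_api_token(token_file=None, env_var="MINDAT_API_KEY"):
    """Return a token from an environment variable or a text file.

    `env_var` is a hypothetical variable name; use whatever name you set.
    Returns an empty string when no token is found.
    """
    # Prefer the environment variable so the token never appears in the notebook
    token = os.environ.get(env_var, "").strip()
    if token:
        return token
    # Fall back to a text file; strip() removes a trailing newline
    if token_file is not None and Path(token_file).is_file():
        return Path(token_file).read_text().strip()
    return ""
```

With this helper, something like `YOUR_API_KEY = load_api_token(Path(WORKING_DIR, "api_key.txt"))` would fill the token before the check in the next cell.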

In [7]:
# Stop here if no token was provided in the cells above
if not YOUR_API_KEY.strip():
    raise ValueError("Please set a valid API token before you start!")

## 2. Use Cases


### Get the Items with Filters

In [None]:
MINDAT_API_URL = "https://api.mindat.org"
headers = {'Authorization': 'Token '+ YOUR_API_KEY}

filter_file_name = "mindat_items_filter.json"
filter_file_path = Path(WORKING_DIR, filter_file_name)
filter_file_path

In [None]:
filter_dict = {'density__to': '3',
          'crystal_system': 'Triclinic',
          'color': 'red',
          'ima': 1,          # show only minerals approved by ima
          'format': 'json',
          }

params = filter_dict

# Request the first page and stop early on an HTTP error (e.g., an invalid token)
response = requests.get(MINDAT_API_URL + "/geomaterials/",
                        params=params,
                        headers=headers)
response.raise_for_status()
page = response.json()
json_data = {"results": page["results"]}

# Follow the `next` link until the API returns none for it (the last page)
while page.get("next"):
    response = requests.get(page["next"], headers=headers)
    response.raise_for_status()
    page = response.json()
    json_data["results"] += page["results"]

# Write the file only after every page has been downloaded
with open(filter_file_path, 'w') as f:
    json.dump(json_data, f, indent=4)
print("Successfully saved " + str(len(json_data["results"])) + " entries to " + str(filter_file_path))
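
Every use case in this notebook follows the same pattern: request a first page, then keep following the `next` link until it is empty. The pattern can be factored into one helper, sketched below. The names `fetch_all_results` and `requests_fetcher` are hypothetical, and the sketch assumes, as the cells here do, that each page is a JSON object with a `results` list and a `next` URL that can be requested without the original parameters.

```python
def fetch_all_results(url, params, fetch_json):
    """Collect `results` from every page, following `next` until it is empty.

    `fetch_json(url, params)` must return the decoded JSON page as a dict.
    """
    results = []
    while url:
        page = fetch_json(url, params)
        results.extend(page["results"])
        url = page.get("next")  # None on the last page ends the loop
        params = None           # follow `next` as-is, like the notebook cells
    return results


def requests_fetcher(headers):
    """Build a `fetch_json` function backed by the `requests` library."""
    import requests  # imported here so the helper above has no hard dependency

    def fetch_json(url, params=None):
        response = requests.get(url, params=params, headers=headers)
        response.raise_for_status()  # fail loudly on HTTP errors
        return response.json()

    return fetch_json
```

A call such as `fetch_all_results(MINDAT_API_URL + "/geomaterials/", filter_dict, requests_fetcher(headers))` would then return the same list that the cell above stores in `json_data["results"]`. Passing the fetch function in as an argument also makes the loop easy to try out without network access.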

### Get the IMA-Approved Mineral Items

In [None]:
MINDAT_API_URL = "https://api.mindat.org"
headers = {'Authorization': 'Token '+ YOUR_API_KEY}

ima_file_name = "mindat_items_IMA.json"
ima_file_path = Path(WORKING_DIR, ima_file_name)
ima_file_path

In [None]:
params = {
    'ima': 1,          # show only minerals approved by the IMA
    'format': 'json'
}

# Request the first page and stop early on an HTTP error (e.g., an invalid token)
response = requests.get(MINDAT_API_URL + "/geomaterials/",
                        params=params,
                        headers=headers)
response.raise_for_status()
page = response.json()
json_data = {"results": page["results"]}

# Follow the `next` link until the API returns none for it (the last page)
while page.get("next"):
    response = requests.get(page["next"], headers=headers)
    response.raise_for_status()
    page = response.json()
    json_data["results"] += page["results"]

# Write the file only after every page has been downloaded
with open(ima_file_path, 'w') as f:
    json.dump(json_data, f, indent=4)
print("Successfully saved " + str(len(json_data['results'])) + " entries to " + str(ima_file_path))

### Get the Items with Selected Fields


Examples for Mindat API fields: `id,name,updttime,mindat_formula,mindat_formula_note,ima_formula,ima_status,ima_notes,varietyof,synid,polytypeof,groupid,entrytype,entrytype_text,description_short,impurities,elements,sigelements,tlform,cim,occurrence,otheroccurrence,industrial,discovery_year,diapheny,cleavage,parting,tenacity,colour,csmetamict,opticalextinction,hmin,hardtype,hmax,vhnmin,vhnmax,vhnerror,vhng,vhns,luminescence,lustre,lustretype,aboutname,other,streak,csystem,cclass,spacegroup,a,b,c,alpha,beta,gamma,aerror,berror,cerror,alphaerror,betaerror,gammaerror,va3,z,dmeas,dmeas2,dcalc,dmeaserror,dcalcerror,cleavagetype,fracturetype,morphology,twinning,epitaxidescription,opticaltype,opticalsign,opticalalpha,opticalbeta,opticalgamma,opticalomega,opticalepsilon,opticalalpha2,opticalbeta2,opticalgamma2,opticalepsilon2,opticalomega2,opticaln,opticaln2,optical2vcalc,optical2vmeasured,optical2vcalc2,optical2vmeasured2,opticalalphaerror,opticalbetaerror,opticalgammaerror,opticalomegaerror,opticalepsilonerror,opticalnerror,optical2vcalcerror,optical2vmeasurederror,opticaldispersion,opticalpleochroism,opticalpleochorismdesc,opticalbirefringence,opticalcomments,opticalcolour,opticalinternal,opticaltropic,opticalanisotropism,opticalbireflectance,opticalr,uv,ir,magnetism,type_specimen_store,commenthard,cim,strunz10ed1,strunz10ed2,strunz10ed3,strunz10ed4,dana8ed1,dana8ed2,dana8ed3,dana8ed4,thermalbehaviour,commentluster,commentbreak,commentdense,commentcrystal,commentcolor,electrical,tranglide,nolocadd,weighting,specdispm,spacegroupset,approval_year,publication_year,ima_history,rock_parent,rock_parent2,rock_root,rock_bgs_code,meteoritical_code,key_elements,shortcode_ima,~all,*`

[Source](https://api.mindat.org/schema/redoc/#tag/items/operation/items_list)

In [None]:
MINDAT_API_URL = "https://api.mindat.org"
headers = {'Authorization': 'Token '+ YOUR_API_KEY}

select_file_name = "mindat_items_select.json"
select_file_path = Path(WORKING_DIR, select_file_name)
select_file_path

In [None]:
# set your selected fields here
fields_str = 'id,name,mindat_formula'

In [None]:
params = {
    'fields': fields_str, # put your selected fields here
    'format': 'json'
}

# Request the first page and stop early on an HTTP error (e.g., an invalid token)
response = requests.get(MINDAT_API_URL + "/items/",
                        params=params,
                        headers=headers)
response.raise_for_status()
page = response.json()
json_data = {"results": page["results"]}

# Follow the `next` link until the API returns none for it (the last page)
while page.get("next"):
    response = requests.get(page["next"], headers=headers)
    response.raise_for_status()
    page = response.json()
    json_data["results"] += page["results"]

# Write the file only after every page has been downloaded
with open(select_file_path, 'w') as f:
    json.dump(json_data, f, indent=4)
print("Successfully saved " + str(len(json_data['results'])) + " entries to " + str(select_file_path))
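
When only a few flat fields such as `id,name,mindat_formula` are selected, the records may be easier to inspect as a table. The sketch below writes a list of result dictionaries to a CSV file with the standard `csv` module. The helper name `results_to_csv` is hypothetical, and it assumes each selected field holds a simple value rather than a nested object.

```python
import csv


def results_to_csv(results, csv_path, fieldnames):
    """Write a list of result dicts to a CSV file with the given columns."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f,
            fieldnames=fieldnames,
            extrasaction="ignore",  # skip keys that are not in `fieldnames`
            restval="",             # leave missing fields empty
        )
        writer.writeheader()
        writer.writerows(results)
```

For the cell above, a call like `results_to_csv(json_data["results"], Path(WORKING_DIR, "mindat_items_select.csv"), fields_str.split(","))` would produce one column per selected field; the `.csv` file name here is only an example.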

### Get the Items with Omitted Fields

In [None]:
MINDAT_API_URL = "https://api.mindat.org"
headers = {'Authorization': 'Token '+ YOUR_API_KEY}

omit_file_name = "mindat_items_omit.json"
omit_file_path = Path(WORKING_DIR, omit_file_name)
omit_file_path

In [None]:
omit_str = 'id,name,updttime'

In [None]:
params = {
    'omit': omit_str,  # fields to leave out of each record
    'format': 'json'
}

# Request the first page and stop early on an HTTP error (e.g., an invalid token)
response = requests.get(MINDAT_API_URL + "/items/",
                        params=params,
                        headers=headers)
response.raise_for_status()
page = response.json()
json_data = {"results": page["results"]}

# Follow the `next` link until the API returns none for it (the last page)
while page.get("next"):
    response = requests.get(page["next"], headers=headers)
    response.raise_for_status()
    page = response.json()
    json_data["results"] += page["results"]

# Write the file only after every page has been downloaded
with open(omit_file_path, 'w') as f:
    json.dump(json_data, f, indent=4)
print("Successfully saved " + str(len(json_data['results'])) + " entries to " + str(omit_file_path))

### Get All the Items


❗Please note that the code in this section retrieves hundreds of **MB** of data from the Mindat server.

❗This operation may consume a significant amount of API access quota.

In [None]:
MINDAT_API_URL = "https://api.mindat.org"
headers = {'Authorization': 'Token '+ YOUR_API_KEY}

all_file_name = "mindat_items_all.json"
all_file_path = Path(WORKING_DIR, all_file_name)
all_file_path

In [None]:
params = {
    'format': 'json',
}

# Request the first page and stop early on an HTTP error (e.g., an invalid token)
response = requests.get(MINDAT_API_URL + "/items/",
                        params=params,
                        headers=headers)
response.raise_for_status()
page = response.json()
json_data = {"results": page["results"]}

# Follow the `next` link until the API returns none for it (the last page)
while page.get("next"):
    response = requests.get(page["next"], headers=headers)
    response.raise_for_status()
    page = response.json()
    json_data["results"] += page["results"]

# Write the file only after every page has been downloaded
with open(all_file_path, 'w') as f:
    json.dump(json_data, f, indent=4)
print("Successfully saved " + str(len(json_data['results'])) + " entries to " + str(all_file_path))
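
Because this download is large, holding every record in memory until the final `json.dump` can be costly, and an interruption loses everything fetched so far. One possible alternative, sketched here under the same paging assumptions as the cells above, is to write each record to a JSON Lines file (one JSON object per line) as soon as its page arrives. The names `iter_pages` and `save_results_as_jsonl` are hypothetical, and `fetch_json(url)` stands for any function that returns a decoded page as a dict.

```python
import json


def iter_pages(url, fetch_json):
    """Yield pages one at a time, following `next` until it is empty."""
    while url:
        page = fetch_json(url)
        yield page
        url = page.get("next")


def save_results_as_jsonl(pages, out_path):
    """Append each record to `out_path` as one JSON object per line.

    Returns the number of records written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for page in pages:
            for item in page["results"]:
                f.write(json.dumps(item) + "\n")
                count += 1
    return count
```

The trade-off is the file format: the output is no longer a single JSON document with a `results` key, so it has to be read back line by line with `json.loads`.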