## Retrieving data from the database - demo

* Every calculation has a unique `task_id` that is common across all collections
* Energies, structures, and other calculation info are stored in the `tasks` collection
* All SCAN calculations are tagged `production-scan-vXX` where "XX" is a number
* Volumetric data is stored in separate collections due to its large size
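
As a rough illustration of the first and last points, the sketch below mimics two collections with plain Python dictionaries. The documents are made up for illustration (only the `task_id` 5360 and the formula AgO come from the example output later in this notebook; the NaCl entry is invented). In the real database the same lookup is done with store queries.

```python
# Hypothetical, in-memory stand-ins for the `tasks` and `elfcar_fs` collections.
tasks_docs = [
    {"task_id": 5360, "formula_pretty": "AgO"},
    {"task_id": 5361, "formula_pretty": "NaCl"},  # invented entry
]
elfcar_docs = [
    {"metadata": {"task_id": 5360}, "data": "<ELFCAR payload>"},
]

# Index the volumetric documents by the shared task_id.
elfcar_by_task = {d["metadata"]["task_id"]: d for d in elfcar_docs}

# For every task, report whether volumetric data exists under the same task_id.
has_elfcar = {t["formula_pretty"]: t["task_id"] in elfcar_by_task for t in tasks_docs}
print(has_elfcar)
```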

---
## Header

#### Imports

In [1]:
import os
from pathlib import Path
import re
import numpy as np
import xlrd
from matplotlib import pyplot as plt

In [2]:
from pymatgen.ext.matproj import MPRester, MPRestError
from monty.serialization import loadfn, dumpfn
from pymatgen.core import Structure, Composition
from pymatgen.analysis.reaction_calculator import ComputedEntry, ComputedReaction
from pymatgen.util.plotting import pretty_plot, periodic_table_heatmap
from pymatgen.core import periodic_table
from pymatgen.io.vasp.outputs import Elfcar, Chgcar, Poscar

#### Settings and utility functions

In [3]:
%load_ext autoreload
%autoreload 2

---
## Main Code

### Querying the `tasks` collection for energies, structures, etc.

#### Connect to Ryan's taskdb using `maggma` `MongoStore`

In [4]:
from maggma.stores import MongoStore
ryandb_tasks = MongoStore(database="mp_rk_calculations",
                         collection_name="tasks",
                         host="mongodb03.nersc.gov",
                         port=27017,
                         username="mp_rk_calculations_ro",
                         password="4df3t3t3554544",
                         last_updated_field = "last_updated",
                         key = "metadata.task_id")
ryandb_tasks.connect()

#### Example: retrieve SCAN task IDs, energies, and structures for selected formulas

In [5]:
formulas = ["KCl",
'NaCl',
"RbCl",
"CsCl",
"FeCl3",
"GaCl3",
"RhCl3",
"AuCl3",
"InCl3",
"IrCl3",
"MgO",
"CaO",
"SrO",
"CuO",
"ZnO",
"AgO",
"CdO",
"SnO",
"HgO",
"CsO2",
"SrO2",
"BaO2",
"IrO2",
]

`MongoStore.query()` returns an iterable of documents, so we use a list comprehension to pull out the data we want. Here, `t` is a single document (t for "task document"), and each field requested, such as `formula_pretty`, is a field of the task document that you can inspect in Robo3T.

In [21]:
query = {"formula_pretty": {"$in": formulas}, "tags": {"$regex": "production-scan"}}
# Each tuple holds (formula, task_id, energy per atom, structure dict)
tasks = [(t["formula_pretty"], t["task_id"], t["output"]["energy_per_atom"], t["output"]["structure"])
         for t in ryandb_tasks.query(query)]
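
The `$regex` filter above matches any tag that contains `production-scan`. The sketch below emulates that behavior locally with Python's `re` module and compares it with an anchored pattern that accepts only tags of the form `production-scan-vXX`. The sample tags and the helper `matches_any_tag` are hypothetical, and the sketch assumes `tags` is a list of strings. MongoDB evaluates the pattern on the server, so treat this as an approximation.

```python
import re

def matches_any_tag(tags, pattern):
    """Return True if any tag matches, mimicking $regex on an array field."""
    return any(re.search(pattern, tag) for tag in tags)

loose = r"production-scan"          # substring match, as in the query above
strict = r"^production-scan-v\d+$"  # whole tag must be production-scan-v<number>

# Hypothetical tag lists for three imaginary task documents.
sample_tags = {
    "scan_ok": ["production-scan-v12"],
    "scan_test": ["test-production-scan-old"],
    "pbe": ["production-pbe-v3"],
}

loose_hits = [k for k, tags in sample_tags.items() if matches_any_tag(tags, loose)]
strict_hits = [k for k, tags in sample_tags.items() if matches_any_tag(tags, strict)]
print(loose_hits)
print(strict_hits)
```

If the tags really do follow the `production-scan-vXX` format exactly, the stricter pattern could be passed in the query as `{"$regex": r"^production-scan-v\d+$"}` to avoid picking up unrelated tags.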

In [15]:
#tasks = [t for t in ryandb_tasks.query({"formula_pretty":{"$in":formulas},"tags":{"$regex":"production-scan"}})]

In [22]:
tasks[0]

('AgO',
 5360,
 -18.98283738,
 {'@module': 'pymatgen.core.structure',
  '@class': 'Structure',
  'charge': None,
  'lattice': {'matrix': [[2.29807909, 2.20352511, -0.47979231],
    [2.29807909, -2.20352511, -0.47979231],
    [-0.75826146, -0.0, -5.49172091]],
   'a': 3.219765686355278,
   'b': 3.219765686355278,
   'c': 5.543821695820723,
   'alpha': 87.13449710011457,
   'beta': 87.13449710011457,
   'gamma': 86.37284932575909,
   'volume': 57.22209642544247},
  'sites': [{'species': [{'element': 'Ag', 'occu': 1}],
    'abc': [-0.0, 0.5, 0.5],
    'xyz': [0.769908815, -1.101762555, -2.9857566099999997],
    'label': 'Ag',
    'properties': {'magmom': 0.0}},
   {'species': [{'element': 'Ag', 'occu': 1}],
    'abc': [0.5, 0.0, -0.0],
    'xyz': [1.149039545, 1.101762555, -0.239896155],
    'label': 'Ag',
    'properties': {'magmom': 0.0}},
   {'species': [{'element': 'O', 'occu': 1}],
    'abc': [0.97842492, 0.02157508, 0.25],
    'xyz': [2.108513725, 2.1084426489394827, -1.8527225375],
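
Indexing the tuples by position (`tasks[0][1]` for the task ID, for example) is easy to get wrong. One option, sketched below, is to wrap each tuple in a `namedtuple`. The name `TaskRecord` and its field names are my own choice, not part of the database schema, and the structure dictionary is replaced with an empty placeholder.

```python
from collections import namedtuple

# Hypothetical record type; field order follows the tuples built in the query cell.
TaskRecord = namedtuple("TaskRecord", ["formula", "task_id", "energy_per_atom", "structure"])

# Values copied from the output above; the structure dict is an empty placeholder.
raw = ("AgO", 5360, -18.98283738, {})

record = TaskRecord(*raw)
print(record.formula, record.task_id, record.energy_per_atom)

# A namedtuple is still a tuple, so positional access keeps working.
assert record[1] == record.task_id
```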

### Querying the `GridFS` collections for volumetric data

* Each file (AECCAR, CHGCAR, ELFCAR) is stored in its own collection
* We will use a `maggma` `GridFSStore` to access each collection

#### Connect to the volumetric data stores

In [17]:
from maggma.stores import GridFSStore
elfcar_store = GridFSStore(database="mp_rk_calculations",
                         collection_name="elfcar_fs",
                         host="mongodb03.nersc.gov",
                         port=27017,
                         username="mp_rk_calculations_ro",
                         password="4df3t3t3554544",
                         last_updated_field = "last_updated",
                         key = "metadata.task_id")

elfcar_store.connect()

In [9]:
from maggma.stores import GridFSStore
chgcar_store = GridFSStore(database="mp_rk_calculations",
                         collection_name="chgcar_fs",
                         host="mongodb03.nersc.gov",
                         port=27017,
                         username="mp_rk_calculations_ro",
                         password="4df3t3t3554544",
                         last_updated_field = "last_updated",
                         key = "metadata.task_id")

chgcar_store.connect()

In [10]:
from maggma.stores import GridFSStore
aeccar0_store = GridFSStore(database="mp_rk_calculations",
                         collection_name="aeccar0_fs",
                         host="mongodb03.nersc.gov",
                         port=27017,
                         username="mp_rk_calculations_ro",
                         password="4df3t3t3554544",
                         last_updated_field = "last_updated",
                         key = "metadata.task_id")

aeccar0_store.connect()

In [11]:
from maggma.stores import GridFSStore
aeccar2_store = GridFSStore(database="mp_rk_calculations",
                         collection_name="aeccar2_fs",
                         host="mongodb03.nersc.gov",
                         port=27017,
                         username="mp_rk_calculations_ro",
                         password="4df3t3t3554544",
                         last_updated_field = "last_updated",
                         key = "metadata.task_id")

aeccar2_store.connect()
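
The four connection cells above differ only in `collection_name`. One possible way to cut the repetition is to build the keyword arguments in a loop, as sketched here. The sketch only assembles the arguments, so it runs without database access. The environment variable name `MP_RK_PASSWORD` is an assumption, used to avoid pasting the password into every cell.

```python
import os

# Settings shared by all volumetric stores (values taken from the cells above).
common = {
    "database": "mp_rk_calculations",
    "host": "mongodb03.nersc.gov",
    "port": 27017,
    "username": "mp_rk_calculations_ro",
    # Hypothetical environment variable; empty string if it is not set.
    "password": os.environ.get("MP_RK_PASSWORD", ""),
    "last_updated_field": "last_updated",
    "key": "metadata.task_id",
}

# One set of keyword arguments per file type; collections are named "<file>_fs".
file_types = ["elfcar", "chgcar", "aeccar0", "aeccar2"]
store_kwargs = {name: {**common, "collection_name": f"{name}_fs"} for name in file_types}

# With maggma installed and network access to the server, each store could
# then be created and connected like this:
#   from maggma.stores import GridFSStore
#   stores = {name: GridFSStore(**kw) for name, kw in store_kwargs.items()}
#   for store in stores.values():
#       store.connect()
```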

#### Example: Download POSCAR, CHGCAR, ELFCAR, AECCAR0, AECCAR2 to directories

In this example, I'll pick the `task_id` of the first task in the `tasks` list from above and write the data files to a subdirectory of the current directory named after the formula.

In [12]:
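cpd = tasks[0]  # (formula, task_id, energy_per_atom, structure dict)

# create a directory named after the formula (no error if it already exists)
target_dir = Path.cwd() / cpd[0]
target_dir.mkdir(exist_ok=True)

# save structure as POSCAR
Structure.from_dict(cpd[3]).to(fmt="poscar", filename=str(target_dir / "POSCAR"))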

# ELFCAR
elfcar = [e for e in elfcar_store.query({"metadata.task_id":cpd[1]})][0]
if elfcar.get("data_aug"):
    del elfcar["data_aug"] # bug fix line
Elfcar.from_dict(elfcar).write_file(target_dir / "ELFCAR")

# CHGCAR
chgcar = [e for e in chgcar_store.query({"metadata.task_id":cpd[1]})][0]
Chgcar.from_dict(chgcar).write_file(target_dir / "CHGCAR")

# AECCAR0
aec0 = [e for e in aeccar0_store.query({"metadata.task_id":cpd[1]})][0]
if aec0.get("data_aug"):
    del aec0["data_aug"] # bug fix line
Chgcar.from_dict(aec0).write_file(target_dir / "AECCAR0")

# AECCAR2
aec2 = [e for e in aeccar2_store.query({"metadata.task_id":cpd[1]})][0]
if aec2.get("data_aug"):
    del aec2["data_aug"] # bug fix line
Chgcar.from_dict(aec2).write_file(target_dir / "AECCAR2")
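
The `[...][0]` pattern used above raises an `IndexError` if a task has no volumetric data in a given store. A small helper, sketched below with plain Python lists standing in for query results, makes that case explicit and also removes the `data_aug` key when it is present. `first_document` is a hypothetical helper name, not a maggma function.

```python
def first_document(results, drop_keys=("data_aug",)):
    """Return the first document from an iterable of results, or None if empty.

    Keys listed in drop_keys are removed from a copy of the document.
    """
    doc = next(iter(results), None)
    if doc is None:
        return None
    doc = dict(doc)  # shallow copy so the original result is not modified
    for key in drop_keys:
        doc.pop(key, None)
    return doc

# Stand-ins for what a store query might return.
found = first_document([{"metadata": {"task_id": 5360}, "data_aug": {"total": []}}])
missing = first_document([])

print(found)    # {'metadata': {'task_id': 5360}}
print(missing)  # None
```

In the notebook this could be called as `first_document(elfcar_store.query({"metadata.task_id": cpd[1]}))`, followed by a check for `None` before calling `Elfcar.from_dict`. For the CHGCAR, where the cell above keeps `data_aug`, passing `drop_keys=()` leaves the document unchanged.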

## Appendix

---