This notebook creates a CSV file from JSON files of toy data. The toy data contains the input parameters and results of simulations.

In [1]:
import pandas as pd
import json
import os
import requests

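Information about the data stored in the JSON file. The file name encodes the composition as element-count pairs joined by underscores (here Ge-1_Se-1).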

In [2]:
filename = "Ge-1_Se-1"
element_list = {elt.split("-")[0]: int(elt.split("-")[1]) for elt in filename.split("_")}
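
As a minimal sketch, the parsing done in the cell above can be wrapped in a small helper. The function name parse_composition is hypothetical and not part of the original notebook; it assumes every token has the form Element-count.

```python
def parse_composition(name):
    """Parse a name such as 'Ge-1_Se-1' into a dict of element counts."""
    composition = {}
    for token in name.split("_"):
        # Each token looks like 'Ge-1': element symbol, dash, integer count
        symbol, count = token.split("-")
        composition[symbol] = int(count)
    return composition


print(parse_composition("Ge-1_Se-1"))  # {'Ge': 1, 'Se': 1}
```

The result is a dictionary that maps each element symbol to its number of atoms, which is the shape the later cells expect from element_list.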

In [3]:
JSON_PATH = os.path.join(os.path.dirname(os.path.dirname(os.getcwd())), "data", filename + ".json")

In [4]:
with open(JSON_PATH) as file:
    data = json.load(file)

In [5]:
raw_df = pd.DataFrame(data)
raw_df.columns

Index(['ecutrho', 'k_density', 'ecutwfc', 'n_iterations', 'time', 'converged',
       'accuracy', 'fermi', 'total_energy'],
      dtype='object')

We only want the cutoff energies (ecutwfc, ecutrho), the k-point spacing (k_density), whether the algorithm converged, the accuracy (w.r.t. a reference calculation) and the total energy.

In [6]:
rel_cols = ['ecutrho', 'k_density', 'ecutwfc', 'converged' , 'accuracy', 'total_energy']
df = raw_df[rel_cols]

We set the reference energy as the total energy of a converged run with the most demanding simulation parameters (largest ecutwfc and ecutrho, smallest k_density).
We then compute the energy difference of every run with respect to this reference energy.

In [7]:
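# Boolean mask selecting the runs that converged
converged_rows = df.loc[:,'converged'] == True
# Start from the first row, then scan the converged runs
idx_ref = 0
for idx, row in df.loc[converged_rows].iterrows():
    # A run replaces the current reference as soon as it is more demanding
    # in at least one parameter, so the result depends on the row order
    if (
        row["ecutwfc"] > df.loc[idx_ref, "ecutwfc"]
        or row["ecutrho"] > df.loc[idx_ref, "ecutrho"]
        or row["k_density"] < df.loc[idx_ref, "k_density"]
    ):
        idx_ref = idx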

ref_energy = df.loc[idx_ref, "total_energy"]
print(f"Ref energy: {ref_energy} (found at index {idx_ref})")

df = df.assign(delta_E = df.loc[:,'total_energy'] - ref_energy)

Ref energy: -256.52875705 (found at index 660)
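
Because the loop above replaces the reference whenever a single parameter is more demanding, the selected row depends on the order of the rows. A possible order-independent alternative, shown here only as a sketch on made-up toy values, is to sort the converged runs. The priority ecutwfc, then ecutrho, then k_density is an assumption, not something stated in the original notebook.

```python
import pandas as pd

# Toy stand-in for the real data; all values are made up for illustration.
toy = pd.DataFrame(
    {
        "ecutwfc": [30, 60, 60, 60],
        "ecutrho": [240, 480, 480, 480],
        "k_density": [0.4, 0.2, 0.1, 0.1],
        "converged": [True, True, True, False],
        "total_energy": [-10.0, -10.4, -10.5, -9.0],
    }
)

# Keep converged runs only, then rank them:
# largest cutoffs first, smallest k-point spacing first.
ranked = toy[toy["converged"]].sort_values(
    by=["ecutwfc", "ecutrho", "k_density"],
    ascending=[False, False, True],
)
idx_ref = ranked.index[0]
ref_energy = toy.loc[idx_ref, "total_energy"]

# Energy difference of every run with respect to the reference
toy = toy.assign(delta_E=toy["total_energy"] - ref_energy)
print(f"Ref energy: {ref_energy} (found at index {idx_ref})")
```

In this toy table the unconverged row 3 is ignored even though its parameters tie with row 2, so row 2 becomes the reference.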


In [8]:
# Load the SSSP table to get all element keys
response = requests.get("https://archive.materialscloud.org/record/file?record_id=862&filename=SSSP_1.1.2_PBE_efficiency.json&file_id=a5642f40-74af-4073-8dfd-706d2c7fccc2")
response.raise_for_status()  # fail early if the download did not succeed
sssp_table = response.json()
periodic_table_keys = list(sssp_table.keys())

For each element in the periodic table, we want to know the relative contribution to the total number of atoms in the structure, i.e. 0.0 -> not in the structure, 1.0 -> all the atoms in the structure are from this element.

In our toy example, we have data from GeSe simulations (file name Ge-1_Se-1). Hence, the two elements get a value of 0.5 each.

In [9]:
# Add a zero column for each element
df = df.assign(**{element: 0.0 for element in periodic_table_keys})

# Overwrite the columns of the elements in the structure with their atomic fraction
total_elt = sum(element_list.values())
for elt, nb_elt in element_list.items():
    df = df.assign(**{elt: nb_elt / total_elt})
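
The composition columns can be illustrated on a small table. This is only a sketch: the three-symbol list below is a hypothetical stand-in for the full list of keys loaded from the SSSP table, and the energies are made up.

```python
import pandas as pd

# Hypothetical short list of element symbols (the notebook uses the SSSP keys)
symbols = ["Ge", "Se", "Te"]
# Element counts for a 1:1 Ge/Se structure, as encoded in 'Ge-1_Se-1'
composition = {"Ge": 1, "Se": 1}

toy = pd.DataFrame({"total_energy": [-1.0, -2.0]})

# Fraction of atoms per element; elements absent from the structure get 0.0
total_atoms = sum(composition.values())
fractions = {s: composition.get(s, 0) / total_atoms for s in symbols}
toy = toy.assign(**fractions)

print(toy)
```

Every row of a given file gets the same fractions, so these columns only become informative once tables from several compositions are combined.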

In [10]:
filepath = os.path.join(os.path.dirname(os.path.dirname(os.getcwd())), "data", filename + ".csv")
df.to_csv(filepath)
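
Since to_csv is called without index=False, the row index is written as an unnamed first column. A minimal sketch of reading such a file back, using an in-memory buffer and made-up values instead of the real file:

```python
import io

import pandas as pd

# Toy table with a non-trivial index; values are made up for illustration.
toy = pd.DataFrame({"delta_E": [0.0, 0.5], "Ge": [0.5, 0.5]}, index=[10, 20])

buffer = io.StringIO()
toy.to_csv(buffer)  # the index is written as the first column
buffer.seek(0)

# index_col=0 restores the original row labels instead of creating a new column
restored = pd.read_csv(buffer, index_col=0)
print(restored)
```

Without index_col=0, the saved row labels would come back as an extra data column.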