This notebook creates a CSV file from a JSON file of toy data. The toy data contains the input parameters and results of simulations.

In [12]:
import pandas as pd
import json
import os
import requests

Information about the data stored in the JSON file: the base filename and the elements present in the simulated structure.

In [13]:
filename = "data_GeTe"
element_list = ["Ge", "Te"]

In [14]:
JSON_PATH = os.path.join(os.path.dirname(os.path.dirname(os.getcwd())), "data", filename + ".json")

In [15]:
with open(JSON_PATH) as file:
    data = json.load(file)

In [16]:
raw_df = pd.DataFrame(data)
raw_df.columns

Index(['ecutrho', 'k_density', 'ecutwfc', 'n_iterations', 'time', 'converged',
       'accuracy', 'fermi', 'total_energy'],
      dtype='object')

We keep only the energy cutoffs (ecutrho, ecutwfc), the k-point spacing (k_density), whether the calculation converged, the accuracy (with respect to a reference calculation), and the total energy.

In [17]:
rel_cols = ['ecutrho', 'k_density', 'ecutwfc', 'converged' , 'accuracy', 'total_energy']
df = raw_df[rel_cols]

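We set the reference energy to the total energy of the converged run with the most stringent simulation parameters (largest energy cutoffs, smallest k_density).
We then compute the difference between each run's total energy and this reference energy (delta_E).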
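
As an illustration only, one way to make "most stringent parameters" precise is a lexicographic ordering of the converged runs. The sketch below is not the loop used in this notebook: the priority order of the columns is an assumption, and the toy values are made up.

```python
import pandas as pd

# Hypothetical toy rows; column names follow the notebook, values are made up.
toy = pd.DataFrame({
    "ecutrho":      [200.0, 400.0, 400.0, 400.0],
    "ecutwfc":      [25.0, 50.0, 50.0, 50.0],
    "k_density":    [0.4, 0.2, 0.1, 0.1],
    "converged":    [True, True, True, False],
    "total_energy": [-239.60, -239.63, -239.64, -239.00],
})

# Keep converged runs only, then order them so the strictest run comes first:
# largest ecutrho, then largest ecutwfc, then smallest k_density.
# (This priority order is an assumption made for the sketch.)
converged = toy[toy["converged"]]
ordered = converged.sort_values(
    by=["ecutrho", "ecutwfc", "k_density"],
    ascending=[False, False, True],
)
idx_ref = ordered.index[0]
ref_energy = toy.loc[idx_ref, "total_energy"]

# Energy difference of every run with respect to the reference
delta_E = toy["total_energy"] - ref_energy
```

With an explicit ordering, the selected row does not depend on the order in which the runs appear in the file.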

In [18]:
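# Boolean mask selecting the runs that converged
converged_rows = df['converged'] == True
# Scan the converged runs; a row becomes the new reference whenever it is
# stricter than the current reference in at least one parameter
# (larger ecutrho, smaller k_density, or larger ecutwfc)
idx_ref = 0
for idx, row in df.loc[converged_rows].iterrows():
    if (row["ecutrho"] > df.loc[converged_rows, "ecutrho"][idx_ref]
    or row["k_density"] < df.loc[converged_rows, "k_density"][idx_ref]
    or row["ecutwfc"] > df.loc[converged_rows, "ecutwfc"][idx_ref]):
        idx_ref = idx

ref_energy = df.loc[converged_rows, "total_energy"][idx_ref]
print(f"Ref energy: {ref_energy} (found at index {idx_ref})")

# Energy difference of every run with respect to the reference run
df["delta_E"] = df["total_energy"] - ref_energy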

Ref energy: -239.64221973 (found at index 573)


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["delta_E"] = df["total_energy"] - ref_energy
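
The message above is pandas' warning about assigning a column to a DataFrame that was obtained by selecting columns from another one. A minimal sketch of one common way to avoid it, taking an explicit copy of the selection, is shown below. The names `raw_toy` and `subset` and all values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for raw_df; values are made up.
raw_toy = pd.DataFrame({
    "ecutrho": [200.0, 400.0],
    "time": [12.3, 45.6],
    "total_energy": [-239.60, -239.64],
})

# .copy() makes the selection an independent DataFrame,
# so adding a column to it is unambiguous and leaves raw_toy untouched.
subset = raw_toy[["ecutrho", "total_energy"]].copy()
subset["delta_E"] = subset["total_energy"] - subset["total_energy"].iloc[-1]
```

The computed values are the same with or without the copy; the copy only makes explicit which object is being modified.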


In [19]:
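# LOADING ALL ELEMENT KEYS
# Download the SSSP efficiency table; its top-level keys are the element symbols
url_table = requests.get("https://archive.materialscloud.org/record/file?record_id=862&filename=SSSP_1.1.2_PBE_efficiency.json&file_id=a5642f40-74af-4073-8dfd-706d2c7fccc2")
url_table.raise_for_status()  # fail early if the download did not succeed
text_table = url_table.text
sssp_table = json.loads(text_table)
periodic_table_keys = list(sssp_table.keys())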
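
Before building one column per element, it can help to check that every entry of `element_list` actually appears among the downloaded keys. The sketch below does this offline on a made-up JSON string. The field name `cutoff` and the numbers are placeholders, not values from the SSSP file.

```python
import json

# Hypothetical excerpt imitating a JSON table keyed by element symbol.
# The field name and numbers are placeholders, not values from the SSSP file.
toy_text = '{"Ge": {"cutoff": 1.0}, "Te": {"cutoff": 2.0}, "Si": {"cutoff": 3.0}}'

toy_table = json.loads(toy_text)
toy_keys = list(toy_table.keys())

# Collect any element of interest that is absent from the table
element_list = ["Ge", "Te"]
missing = [elt for elt in element_list if elt not in toy_keys]
```

An empty `missing` list means every element of interest has a matching key.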

For each element in the periodic table, we want to know the relative contribution to the total number of atoms in the structure, i.e. 0.0 -> not in the structure, 1.0 -> all the atoms in the structure are from this element.

In our toy example, we have data from GeTe simulations. Hence, the two elements each get a value of 0.5.
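
For structures whose elements do not occur in equal numbers, the same fractions can be derived from atom counts. The sketch below is a generalization that this notebook does not need for GeTe. The helper `composition_fractions` and the short element list are hypothetical.

```python
def composition_fractions(atom_counts, all_elements):
    """Return the fraction of atoms per element, 0.0 for absent elements."""
    total = sum(atom_counts.values())
    return {elt: atom_counts.get(elt, 0) / total for elt in all_elements}

# Hypothetical inputs: a short element list instead of the full table
all_elements = ["Ge", "Sb", "Te"]
gete = composition_fractions({"Ge": 1, "Te": 1}, all_elements)
sb2te3 = composition_fractions({"Sb": 2, "Te": 3}, all_elements)
```

For equal counts this reduces to the 1 / (number of elements) rule used in this notebook.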

In [20]:
# ADDING A ZERO COLUMN FOR EACH ELEMENT
for element in periodic_table_keys:
    df[element] = 0.0

nb_elt = len(element_list)
for elt in element_list:
    df[elt] += 1.0 / nb_elt

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[element] = 0.0


In [21]:
filepath = os.path.join(os.path.dirname(os.path.dirname(os.getcwd())), "data", filename + ".csv")
df.to_csv(filepath)
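
By default, `to_csv` also writes the row index as the first column. A minimal sketch of reading such a file back follows. The toy frame and the temporary path are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy frame; values are made up.
toy = pd.DataFrame({"ecutrho": [200.0, 400.0], "delta_E": [0.04, 0.0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy.csv")
    toy.to_csv(toy_path)  # the row index is written as the first column
    # index_col=0 restores that first column as the index on reading
    restored = pd.read_csv(toy_path, index_col=0)
```

Without `index_col=0`, the saved index would come back as an extra unnamed data column.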