This notebook creates a CSV file from a JSON file of toy data. The toy data contains the input parameters and results of GeTe simulations.

In [40]:
import pandas as pd
import numpy as np
import json
import os
import requests

In [48]:
DATA_PATH = "../data"
JSON_PATH = os.path.join(DATA_PATH, "toy_data.json")

In [49]:
with open(JSON_PATH) as file:
    data = json.load(file)

In [50]:
raw_df = pd.DataFrame(data)
raw_df.columns

Index(['ecutrho', 'k_density', 'ecutwfc', 'n_iterations', 'time', 'converged',
       'accuracy', 'fermi', 'total_energy'],
      dtype='object')

We only want the cut-off energies (`ecutrho`, `ecutwfc`), the k-point spacing, whether the calculation converged, and the accuracy (with respect to a reference calculation).

In [51]:
rel_cols = ['ecutrho', 'k_density', 'ecutwfc', 'converged', 'accuracy']
df = raw_df[rel_cols]

In [52]:
# LOADING ALL ELEMENT KEYS
# Only the keys (element symbols) of the SSSP table are used below.
url_table = requests.get("https://archive.materialscloud.org/record/file?record_id=862&filename=SSSP_1.1.2_PBE_efficiency.json&file_id=a5642f40-74af-4073-8dfd-706d2c7fccc2", timeout=30)
url_table.raise_for_status()  # stop here if the download failed
text_table = url_table.text
sssp_table = json.loads(text_table)
periodic_table_keys = list(sssp_table.keys())

For each element in the periodic table, we want to know its relative contribution to the total number of atoms in the structure: 0.0 means the element is not in the structure, and 1.0 means all atoms in the structure are of this element.

In our toy example, we have data from GeTe simulations. Hence, the two elements each get a value of 0.5.
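
For structures other than GeTe, the same fractions could be computed from the atom counts rather than hard-coded. The following is a minimal sketch and not part of the original workflow: the helper name `composition_fractions`, its arguments, and the three-element stand-in list are all assumptions made for illustration.

```python
def composition_fractions(atom_counts, elements):
    """Return {element: fraction of atoms} for every element in `elements`.

    Hypothetical helper: `atom_counts` maps element symbols to the number
    of atoms of that element in the structure, e.g. {"Ge": 1, "Te": 1}.
    """
    total = sum(atom_counts.values())
    if total <= 0:
        raise ValueError("the structure must contain at least one atom")
    unknown = set(atom_counts) - set(elements)
    if unknown:
        raise ValueError(f"elements not in the table: {sorted(unknown)}")
    # Elements absent from the structure get a fraction of 0.0.
    return {el: atom_counts.get(el, 0) / total for el in elements}


# Short stand-in for periodic_table_keys (assumption, for illustration only)
elements = ["Ge", "Sb", "Te"]
gete = composition_fractions({"Ge": 1, "Te": 1}, elements)
print(gete)
```

Each key of the returned dictionary corresponds to one element column of the table, so the values could be assigned column by column in the same way as the fixed 0.5 values used for GeTe.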

In [53]:
# ADDING A ZERO COLUMN FOR EACH ELEMENT
# df was sliced from raw_df, so take an explicit copy first; assigning new
# columns to the slice triggers pandas' SettingWithCopyWarning.
df = df.copy()
for element in periodic_table_keys:
    df[element] = 0.0

# GeTe: half of the atoms are Ge and half are Te
df['Ge'] = 0.5
df['Te'] = 0.5

In [54]:
filename = "toy_data.csv"
filepath = os.path.join(DATA_PATH, filename)
df.to_csv(filepath)