<a href="https://colab.research.google.com/github/crhysc/jarvis-tools-notebooks/blob/master/cdvae_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Inverse Design of Next-Generation Superconductors Using Data-Driven Deep Generative Models

# Tutorial: CDVAE, Crystal Diffusion Variational AutoEncoder



[Reference DOI](https://pubs.acs.org/doi/10.1021/acs.jpclett.3c01260)

Authors: Kamal Choudhary (kamal.choudhary@nist.gov), Charles "Rhys" Campbell (crc00042@mix.wvu.edu)

# (1) INTRODUCTION AND MOTIVATION


In [2]:
!pip install -q condacolab
import condacolab, os, sys
condacolab.install()
print("Done")

✨🍰✨ Everything looks OK!
Done


# (2) INSTALLATION, CONFIGURATION, AND DEPENDENCIES


## Install CDVAE

In [1]:
import os
%cd /content
if not os.path.exists('cdvae'):
  !git clone https://github.com/txie-93/cdvae.git
print("Done")

/content
Cloning into 'cdvae'...
remote: Enumerating objects: 197, done.
remote: Counting objects: 100% (60/60), done.
remote: Compressing objects: 100% (41/41), done.
remote: Total 197 (delta 24), reused 19 (delta 19), pack-reused 137 (from 1)
Receiving objects: 100% (197/197), 138.14 MiB | 23.27 MiB/s, done.
Resolving deltas: 100% (62/62), done.
Updating files: 100% (89/89), done.
Done


## Switch Colab Runtime to GPU
In the top menu next to the Colab logo, select **Runtime** -> **Change runtime type** and choose any available GPU.

If this works, create the GPU-based conda environment.

If it fails due to usage limits, create the CPU-based conda environment instead.
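
Whether a GPU runtime was actually granted can also be checked from a cell before choosing an environment file. The sketch below is a minimal check, assuming that a working GPU runtime exposes the `nvidia-smi` command on `PATH`. The helper names `gpu_available` and `pick_env_file` are illustrative and not part of CDVAE; the file names `env.yml` and `env.cpu.yml` are the ones used in the cells that follow.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if `nvidia-smi -L` runs successfully and lists a GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # no NVIDIA driver tools on PATH, assume a CPU-only runtime
    try:
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


def pick_env_file(has_gpu: bool) -> str:
    """Map the runtime type to the environment file used in this tutorial."""
    return "env.yml" if has_gpu else "env.cpu.yml"


print("Use environment file:", pick_env_file(gpu_available()))
```

The check only reports what the runtime exposes; it does not verify that the PyTorch build inside the conda environment can use the GPU.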



## Create **GPU**-based conda environment for CDVAE

#### Creating the GPU legacy env takes about 7 minutes


In [None]:
%cd /content/cdvae
%pwd
!mamba env update -n base -f env.yml -vv
!mamba env create -p /usr/local/envs/cdvae_legacy -f env.yml
!conda run -p /usr/local/envs/cdvae_legacy --live-stream\
    mamba install -c conda-forge "torchmetrics<0.8" --yes
!conda run -p /usr/local/envs/cdvae_legacy \
    mamba install mkl=2024.0 --yes
!conda run -p /usr/local/envs/cdvae_legacy \
    pip install "monty==2022.9.9"
!conda run -p /usr/local/envs/cdvae_legacy \
    mamba install -c conda-forge "pymatgen>=2022.0.8,<2023" --yes
!conda run -p /usr/local/envs/cdvae_legacy \
    pip install -e .
print("Done")

In [None]:
!conda run -p /usr/local/envs/cdvae_legacy python -c "import sys; print(sys.version)"
# Confirms that the conda environment runs Python 3.8.*

## Create **CPU**-based conda environment for CDVAE

#### Creating the CPU legacy env takes about 7 minutes


In [None]:
%cd /content/cdvae
%pwd
!mamba env update -n base -f env.yml -vv
!mamba env create -p /usr/local/envs/cdvae_legacy -f env.cpu.yml
!conda run -p /usr/local/envs/cdvae_legacy --live-stream\
    mamba install -c conda-forge "torchmetrics<0.8" --yes
!conda run -p /usr/local/envs/cdvae_legacy \
    mamba install mkl=2024.0 --yes
!conda run -p /usr/local/envs/cdvae_legacy \
    pip install "monty==2022.9.9"
!conda run -p /usr/local/envs/cdvae_legacy \
    mamba install -c conda-forge "pymatgen>=2022.0.8,<2023" --yes
!conda run -p /usr/local/envs/cdvae_legacy \
    pip install -e .
print("Done")

# (3) DATASET ETL

#### Extract and transform data


The training data was generated with this [script](https://github.com/JARVIS-Materials-Design/cdvae/blob/main/scripts/generate_data_cdvae.py), which lives in the JARVIS-Materials-Design `cdvae` repository. It compiles around 1000 structures and their superconducting critical temperatures into the format that CDVAE requires for training.

In [None]:
%cd /content/cdvae/scripts
!wget https://raw.githubusercontent.com/JARVIS-Materials-Design/cdvae/refs/heads/main/scripts/generate_data_cdvae.py
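
Later cells in this notebook read `train.csv`, `val.csv`, and `test.csv` from `/content/cdvae/data/supercon/` and access the columns `material_id` and `cif`. The sketch below shows one way such split files could be written. It is not the logic of `generate_data_cdvae.py`: the property column name `Tc`, the 80/10/10 split, the record contents, and the helper `write_splits` are all assumptions made for illustration.

```python
import csv
import random
import tempfile
from pathlib import Path


def write_splits(records, out_dir, seed=0, train_frac=0.8, val_frac=0.1):
    """Shuffle records and write train.csv, val.csv and test.csv to out_dir."""
    rows = list(records)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train_frac)
    n_val = int(len(rows) * val_frac)
    splits = {
        "train": rows[:n_train],
        "val": rows[n_train:n_train + n_val],
        "test": rows[n_train + n_val:],
    }
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, split_rows in splits.items():
        with open(out_dir / f"{name}.csv", "w", newline="") as handle:
            # "Tc" is an assumed column name for the target property
            writer = csv.DictWriter(handle, fieldnames=["material_id", "cif", "Tc"])
            writer.writeheader()
            writer.writerows(split_rows)
    return {name: len(split_rows) for name, split_rows in splits.items()}


# Placeholder records: real CIF strings and Tc values come from the JARVIS data.
demo = [{"material_id": f"JVASP-{i}", "cif": "<cif text>", "Tc": 0.0} for i in range(20)]
demo_dir = tempfile.mkdtemp()
counts = write_splits(demo, demo_dir)
print(counts)
```

A fixed seed keeps the split reproducible, which matters when reconstruction metrics are later compared against a specific `test.csv`.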

# (4) TRAIN WITHOUT PROPERTY PREDICTOR

In [None]:
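# NOTE: data=(CHANGE TO SUPERCON) is a placeholder; replace it with the name of
# the superconductor dataset config before running this cell.
!PROJECT_ROOT=/content/cdvae \
 HYDRA_JOBS=/content/cdvae/hydra_outputs \
 WABDB_DIR=/content/cdvae/wandb_outputs \
 conda run -p /usr/local/envs/cdvae_legacy --live-stream \
    python -u -m cdvae.run data=(CHANGE TO SUPERCON) expname=perov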

# (5) TRAIN WITH PROPERTY PREDICTOR

In [None]:
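# NOTE: data=(CHANGE TO SUPERCON) is a placeholder; replace it with the name of
# the superconductor dataset config before running this cell.
# model.predict_property=True enables the property predictor during training.
!PROJECT_ROOT=/content/cdvae \
 HYDRA_JOBS=/content/cdvae/hydra_outputs \
 WABDB_DIR=/content/cdvae/wandb_outputs \
 conda run -p /usr/local/envs/cdvae_legacy --live-stream \
    python -u -m cdvae.run data=(CHANGE TO SUPERCON) expname=perov model.predict_property=True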
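
The legacy training cell later in this notebook passes the Hydra override `model.predict_property=True`, which is what distinguishes training with a property predictor from training without one. The sketch below assembles both argument lists side by side. The helper `build_train_args` is hypothetical, and the `data` and `expname` values are placeholders to be replaced with your own config and experiment names.

```python
import shlex


def build_train_args(data, expname, predict_property=False):
    """Return the argument list for a `python -m cdvae.run` invocation."""
    args = ["python", "-u", "-m", "cdvae.run", f"data={data}", f"expname={expname}"]
    if predict_property:
        # Hydra override that enables the property predictor
        args.append("model.predict_property=True")
    return args


# "supercon" and the experiment names are placeholders for your own config names.
plain = build_train_args("supercon", "supercon_plain")
with_prop = build_train_args("supercon", "supercon_prop", predict_property=True)
print(shlex.join(plain))
print(shlex.join(with_prop))
```

Using distinct `expname` values for the two runs keeps their Hydra output directories from overwriting each other.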

# (6) INFERENCE

# (7) NEXT STEPS & REFERENCES

___

In [None]:
%%time
# Install required packages.
import os
import torch
# The pip commands below read $TORCH to select matching PyG wheels
os.environ['TORCH'] = torch.__version__
print(torch.__version__)

!pip install -q torch-scatter -f https://data.pyg.org/whl/torch-${TORCH}.html
!pip install -q torch-sparse -f https://data.pyg.org/whl/torch-${TORCH}.html
!pip install -q git+https://github.com/pyg-team/pytorch_geometric.git
!pip install -q pytorch-lightning wandb torchmetrics==0.6.0 pymatgen==2022.4.26
!pip install -q hydra-core jarvis-tools python-dotenv p-tqdm accelerate

In [None]:
!pip install -q pytorch-lightning==1.3.6

In [None]:
import os
os.chdir('/content')
if not os.path.exists('cdvae'):
  !git clone https://github.com/JARVIS-Materials-Design/cdvae.git

In [None]:
os.chdir('cdvae')
!pip install -e .

In [None]:
import os
# Create the output directories if they do not exist yet
os.makedirs("/content/cdvae/WABDB", exist_ok=True)
os.makedirs("/content/cdvae/HYDRA_JOBS", exist_ok=True)
os.environ["PROJECT_ROOT"]="/content/cdvae"
os.environ["WABDB"]="/content/cdvae/WABDB"
os.environ["WABDB_DIR"]="/content/cdvae/WABDB"
os.environ["HYDRA_JOBS"]="/content/cdvae/HYDRA_JOBS"
%env HYDRA_FULL_ERROR=1

In [None]:
!echo $HYDRA_FULL_ERROR

In [None]:
"""Data was generated using this [script](https://github.com/JARVIS-Materials-Design/cdvae/blob/main/scripts/generate_data_cdvae.py)."""

In [None]:
import yaml
import pprint

In [None]:
with open('/content/cdvae/conf/train/default.yaml','r') as f:
  yam = yaml.safe_load(f)
#yam.pop('early_stopping')
#yam['pl_trainer']['fast_dev_run']=True
yam['pl_trainer']['gpus']=0

In [None]:
with open('/content/cdvae/conf/train/default.yaml','w') as f:
  yaml.dump(yam,f)

In [None]:
pprint.pprint(yam)

In [None]:
with open('/content/cdvae/conf/optim/default.yaml','r') as f:
  yam = yaml.safe_load(f)

In [None]:
yam['use_lr_scheduler']=False

In [None]:
with open('/content/cdvae/conf/optim/default.yaml','w') as f:
  yaml.dump(yam,f)

In [None]:
pprint.pprint(yam)
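
The config edits above follow a load, modify, dump pattern on nested dictionaries. The sketch below isolates the modify step with a small helper that sets a value by dotted key. `set_nested` is a hypothetical helper, not part of CDVAE or PyYAML, and the starting dictionaries are made-up stand-ins for what `yaml.safe_load` returns; only the keys `pl_trainer.gpus` and `use_lr_scheduler` are taken from the cells above.

```python
def set_nested(config, dotted_key, value):
    """Set config["a"]["b"] = value for dotted_key "a.b", creating dicts as needed."""
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config


# Stand-ins for the dicts returned by yaml.safe_load; the initial values are made up.
train_cfg = {"pl_trainer": {"gpus": 1}}
optim_cfg = {"use_lr_scheduler": True}

set_nested(train_cfg, "pl_trainer.gpus", 0)       # train on CPU
set_nested(optim_cfg, "use_lr_scheduler", False)  # disable the LR scheduler
print(train_cfg, optim_cfg)
```

Because the notebook overwrites the YAML files in place, rerunning the edit cells is harmless, but reverting requires restoring the files from the repository.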

In [None]:
%%time
import os
os.environ["WANDB_ANONYMOUS"] = "must"
!python cdvae/run.py data=supercon expname=supercon_test02 model.predict_property=True

In [None]:
!pip install smact

In [None]:
"""Export conda environment"""

In [None]:
!conda env export

from datetime import date
today = date.today()
d1 = today.strftime("%Y-%m-%d")

In [None]:
temp_dir=!ls /content/cdvae/HYDRA_JOBS/singlerun/

In [None]:
os.environ['TMP_DIR']=temp_dir[0]

In [None]:
!echo $TMP_DIR
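
Taking `temp_dir[0]` selects the first entry that `ls` prints, which is not necessarily the latest run once several runs exist. The sketch below picks the most recently modified subdirectory instead. `latest_run_dir` is a hypothetical helper, and the demo runs on a throwaway directory rather than on `/content/cdvae/HYDRA_JOBS/singlerun`.

```python
import os
import tempfile
from pathlib import Path


def latest_run_dir(root):
    """Return the most recently modified subdirectory of root, or None."""
    subdirs = [p for p in Path(root).iterdir() if p.is_dir()]
    if not subdirs:
        return None
    return max(subdirs, key=lambda p: p.stat().st_mtime)


# Demo on a throwaway directory; in the notebook, root would be
# /content/cdvae/HYDRA_JOBS/singlerun.
demo_root = Path(tempfile.mkdtemp())
for offset, name in enumerate(["run_a", "run_b"]):
    run = demo_root / name
    run.mkdir()
    os.utime(run, (1_000_000 + offset, 1_000_000 + offset))  # force distinct mtimes
newest = latest_run_dir(demo_root)
print(newest.name)
```

The returned path's `.name` can be stored in `os.environ['TMP_DIR']` in the same way the cell above stores `temp_dir[0]`.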

In [None]:
"""Adjust path accordingly"""

In [None]:
%%time
!python scripts/evaluate.py --n_step_each 5 --num_batches_to_samples 5 --batch_size 5 --model_path "/content/cdvae/HYDRA_JOBS/singlerun/$TMP_DIR/supercon_test02" --tasks opt gen recon

In [None]:
!pip install matminer

In [None]:
%%time
!python scripts/compute_metrics.py --root_path  "/content/cdvae/HYDRA_JOBS/singlerun/$TMP_DIR/supercon_test02" --tasks   gen recon

In [None]:
temp_dir[0]

In [None]:
!ls /content/cdvae/HYDRA_JOBS/singlerun/$TMP_DIR/supercon_test02/

In [None]:
import torch
from jarvis.core.atoms import Atoms
from jarvis.core.atoms import pmg_to_atoms
from jarvis.core.lattice import Lattice
from jarvis.core.specie import atomic_numbers_to_symbols
from jarvis.db.jsonutils import dumpjson
from jarvis.analysis.structure.spacegroup import Spacegroup3D
from collections import Counter
from pymatgen.core.structure import Structure
import pandas as pd
opt_path = "/content/cdvae/HYDRA_JOBS/singlerun/"+temp_dir[0]+"/supercon_test02/eval_opt.pt"
gen_path = "/content/cdvae/HYDRA_JOBS/singlerun/"+temp_dir[0]+"/supercon_test02/eval_gen.pt"
recon_path = "/content/cdvae/HYDRA_JOBS/singlerun/"+temp_dir[0]+"/supercon_test02/eval_recon.pt"
csv_path = "/content/cdvae/data/supercon/test.csv"
df = pd.read_csv(csv_path)
x = torch.load(recon_path)
y = torch.load(gen_path)
z = torch.load(opt_path)
print(len(df),x["num_atoms"].shape,y["num_atoms"].shape,z["num_atoms"].shape)

In [None]:
num_atoms = x["num_atoms"]
atom_types = x["atom_types"]
frac_coords = x["frac_coords"]
lengths = x["lengths"]
angles = x["angles"]
# The cumulative sum gives the exclusive end offset of each structure
index_list = torch.cumsum(num_atoms[0], dim=0).numpy().tolist()
# Each structure starts where the previous one ends; the first starts at 0
start_list = [0] + index_list[:-1]
indice_tuples = [[start, end] for start, end in zip(start_list, index_list)]
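
The evaluation output stores the atoms of all structures back to back in flat arrays, with `num_atoms` giving the atom count per structure. A cumulative sum over those counts yields the exclusive end offset of each structure, and the start is the previous end. The sketch below illustrates this with toy values in plain Python; `slice_bounds` is a hypothetical helper.

```python
from itertools import accumulate


def slice_bounds(num_atoms):
    """Return (start, end) pairs, end exclusive, for each structure."""
    ends = list(accumulate(num_atoms))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


counts = [2, 3, 1]  # atoms per structure (toy values)
flat = ["a0", "a1", "b0", "b1", "b2", "c0"]  # flat per-atom array
bounds = slice_bounds(counts)
chunks = [flat[start:end] for start, end in bounds]
print(bounds)
print(chunks)
```

Since Python slices exclude the end index, the bounds can be used directly as `array[start:end]` without subtracting one.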

In [None]:
recon_structures = []

In [None]:
for id_needed in range(num_atoms.shape[1]):
    id_fracs = frac_coords[0].numpy()[
        indice_tuples[id_needed][0] : indice_tuples[id_needed][1]
    ]
    id_atom_types = atom_types[0].numpy()[
        indice_tuples[id_needed][0] : indice_tuples[id_needed][1]
    ]
    id_lengths = lengths[0].numpy()[id_needed]
    id_angles = angles[0].numpy()[id_needed]
    lat = Lattice.from_parameters(
        id_lengths[0],
        id_lengths[1],
        id_lengths[2],
        id_angles[0],
        id_angles[1],
        id_angles[2],
    ).matrix
    atoms = Atoms(
        lattice_mat=lat,
        elements=atomic_numbers_to_symbols(id_atom_types),
        coords=id_fracs,
        cartesian=False,
    )
    # spg_numb = Spacegroup3D(atoms).space_group_number
    # spg_numbs.append(spg_numb)

    # print()
    # print()
    # print()
    # print("jarvis\n", atoms)
    # struct = Structure(
    #    lattice=Lat.from_parameters(
    #        id_lengths[0],
    #        id_lengths[1],
    #        id_lengths[2],
    #        id_angles[0],
    #        id_angles[1],
    #        id_angles[2],
    #    ),
    #    species=id_atom_types,
    #    coords=id_fracs,
    #    coords_are_cartesian=False,
    # )
    # atoms = pmg_to_atoms(struct)
    # print("pmg\n", atoms)
    # print()
    # print()
    # print()

    # gen_structures.append(atoms.to_dict())
    recon_structures.append(atoms)

In [None]:
test_structures=[]
jids = []
for i,ii in df.iterrows():
  atoms=pmg_to_atoms(Structure.from_str(ii['cif'],fmt='cif'))
  test_structures.append(atoms)#.to_dict())
  jids.append(ii['material_id'])

In [None]:
df

In [None]:
"""Uploading to JARVIS-Leaderboard."""

In [None]:
from jarvis.io.vasp.inputs import Poscar
import json
# Write one row per structure: id, target POSCAR, reconstructed POSCAR.
# Newlines are escaped so that each record stays on a single line.
with open('AI-AtomGen-Tc_supercon-dft_3d-test-rmse.csv','w') as f:
  f.write('id,target,prediction\n')
  for i,j,k in zip(test_structures,recon_structures,jids):
    print(k,i.composition.reduced_formula,j.composition.reduced_formula)
    line = k+","+Poscar(i).to_string().replace('\n','\\n')+","+Poscar(j).to_string().replace('\n','\\n')+"\n"
    #line = k+","+json.dumps(i.to_dict())+","+json.dumps(j.to_dict())+"\n"
    f.write(line)
# Zip the file before uploading to JARVIS-Leaderboard
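
A POSCAR spans several lines, so the cell above replaces each real newline with the two-character sequence `\n` to keep one record per CSV line, and the evaluation cell further down reverses that before parsing. The sketch below shows the round trip on a toy POSCAR-like string. The helper names are hypothetical, and the approach assumes the text contains no commas and no pre-existing literal `\n` sequences.

```python
def encode_multiline(text):
    """Replace real newlines with the two-character sequence backslash + n."""
    return text.replace("\n", "\\n")


def decode_multiline(field):
    """Inverse of encode_multiline."""
    return field.replace("\\n", "\n")


# Toy POSCAR-like text, used only to demonstrate the round trip
poscar_text = (
    "System\n1.0\n3.0 0.0 0.0\n0.0 3.0 0.0\n0.0 0.0 3.0\n"
    "Nb\n1\ndirect\n0.0 0.0 0.0 Nb\n"
)
field = encode_multiline(poscar_text)
assert "\n" not in field  # safe to store on one CSV line
assert decode_multiline(field) == poscar_text
print(field)
```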

In [None]:
dfx = pd.read_csv('AI-AtomGen-Tc_supercon-dft_3d-test-rmse.csv')

In [None]:
import pandas as pd
info = {}
test_path = pd.read_csv("/content/cdvae/data/supercon/test.csv")
val_path = pd.read_csv("/content/cdvae/data/supercon/val.csv")
train_path = pd.read_csv("/content/cdvae/data/supercon/train.csv")
test={}
val={}
train={}

In [None]:
for i,ii in train_path.iterrows():
  atoms=pmg_to_atoms(Structure.from_str(ii['cif'],fmt='cif'))
  pos = Poscar(atoms).to_string().replace('\n','\\n')
  jid=ii['material_id']
  train[jid]=pos

In [None]:
for i,ii in val_path.iterrows():
  atoms=pmg_to_atoms(Structure.from_str(ii['cif'],fmt='cif'))
  pos = Poscar(atoms).to_string().replace('\n','\\n')
  jid=ii['material_id']
  val[jid]=pos

In [None]:
for i,ii in test_path.iterrows():
  atoms=pmg_to_atoms(Structure.from_str(ii['cif'],fmt='cif'))
  pos = Poscar(atoms).to_string().replace('\n','\\n')
  jid=ii['material_id']
  test[jid]=pos
info['train']=train
info['val']=val
info['test']=test

In [None]:
from jarvis.db.jsonutils import dumpjson
dumpjson(data=info,filename='dft_3d_Tc_supercon.json')

In [None]:
test_path

In [None]:
!cp scripts/compute_metrics.py scripts/eval_utils.py .

In [None]:
!ls

In [None]:
import numpy as np
from pymatgen.analysis.structure_matcher import StructureMatcher
import pandas as pd
from jarvis.io.vasp.inputs import Poscar
from tqdm import tqdm
df=pd.read_csv('AI-AtomGen-Tc_supercon-dft_3d-test-rmse.csv')

In [None]:
matcher = StructureMatcher(stol=0.5, angle_tol=10, ltol=0.3)
rms = []
for m, mm in tqdm(df.iterrows()):
    try:
        atoms_target = (
            Poscar.from_string(
                (mm["target"].replace("\\n", "\n"))
            ).atoms
        ).pymatgen_converter()
        atoms_pred = (
            Poscar.from_string(
                (mm["prediction"].replace("\\n", "\n"))
            ).atoms
        ).pymatgen_converter()
        # rms_dist = matcher.get_rms_dist(atoms_pred,atoms_target)
        rms_dist = matcher.get_rms_anonymous(atoms_pred, atoms_target)
        if rms_dist[0] is not None:
            rms.append(rms_dist[0])
    except Exception as exp:
        print("exp", exp)
        pass
rms = round(np.array(rms).mean(), 4)
print('rms', rms)
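
The loop above skips pairs whose first returned value is `None`, so the reported mean covers matched pairs only. Reporting the match rate next to the mean makes that visible, and guarding the empty case avoids averaging an empty list. The sketch below is a plain-Python illustration with toy values; `summarize_rms` is a hypothetical helper and the tuples only mimic the shape of the matcher output.

```python
def summarize_rms(results, ndigits=4):
    """Return (mean RMS over matched pairs, match rate) from (rms, extra) tuples."""
    matched = [rms for rms, _ in results if rms is not None]
    if not results or not matched:
        return None, 0.0
    mean_rms = round(sum(matched) / len(matched), ndigits)
    return mean_rms, len(matched) / len(results)


# Toy values standing in for matcher output; None marks an unmatched pair.
toy = [(0.1, "m1"), (None, None), (0.3, "m2"), (None, None)]
mean_rms, match_rate = summarize_rms(toy)
print(mean_rms, match_rate)
```

A low mean RMS combined with a low match rate indicates that only a few reconstructions were close enough to be compared at all.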

In [None]:
!conda env export

In [None]:
!pip install matminer==0.9.0

In [None]:
!rm -rf /usr/local/lib/python3.10/dist-packages/pandas*
!pip uninstall pandas -y
!pip install pandas==1.5.3

In [None]:
from p_tqdm import p_map
from compute_metrics import GenEval,get_crystal_array_list,Crystal, RecEval
crys_array_list, true_crystal_array_list = get_crystal_array_list(recon_path)
pred_crys = p_map(lambda x: Crystal(x), crys_array_list)
gt_crys = p_map(lambda x: Crystal(x), true_crystal_array_list)
rec_evaluator = RecEval(pred_crys, gt_crys)
recon_metrics = rec_evaluator.get_metrics()

In [None]:
crys_array_list, _ = get_crystal_array_list(gen_path)
gen_crys = p_map(lambda x: Crystal(x), crys_array_list)
gen_evaluator = GenEval(gen_crys, gt_crys, eval_model_name='carbon')
gen_metrics = gen_evaluator.get_metrics()
print(recon_metrics)

In [None]:
print(recon_metrics)

In [None]:
print(gen_metrics)

In [None]:
!pip install alignn

In [None]:
from alignn.pretrained import get_multiple_predictions

In [None]:
atoms

In [None]:
atoms.write_poscar('POSCAR')

In [None]:
"""Quickly predict Tc and other properties"""

In [None]:
!pretrained.py --model_name jv_supercon_tc_alignn --file_format poscar --file_path POSCAR

In [None]:
!pretrained.py --model_name jv_supercon_debye_alignn --file_format poscar --file_path POSCAR

In [None]:
!pretrained.py --model_name jv_supercon_edos_alignn --file_format poscar --file_path POSCAR

In [None]:
!pretrained.py --model_name jv_supercon_a2F_alignn --file_format poscar --file_path POSCAR

In [None]:
!pretrained.py --model_name jv_formation_energy_peratom_alignn --file_format poscar --file_path POSCAR
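
The five `pretrained.py` calls above differ only in `--model_name`, so they can be generated in a loop. The sketch below builds the command lines without executing them; the model names and flags are copied from the cells above, while the helper `pretrained_command` is hypothetical. Running the commands would additionally require `alignn` to be installed and a `POSCAR` file in the working directory.

```python
import shlex

MODEL_NAMES = [
    "jv_supercon_tc_alignn",
    "jv_supercon_debye_alignn",
    "jv_supercon_edos_alignn",
    "jv_supercon_a2F_alignn",
    "jv_formation_energy_peratom_alignn",
]


def pretrained_command(model_name, file_path="POSCAR", file_format="poscar"):
    """Build the argument list for one pretrained.py call."""
    return [
        "pretrained.py",
        "--model_name", model_name,
        "--file_format", file_format,
        "--file_path", file_path,
    ]


commands = [pretrained_command(name) for name in MODEL_NAMES]
for cmd in commands:
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to execute the prediction from Python instead of a `!` shell line.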

In [None]:
!pip freeze