<a href="https://colab.research.google.com/github/crhysc/jarvis-tools-notebooks/blob/master/cdvae_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Inverse Design of Next-Generation Superconductors Using Data-Driven Deep Generative Models

# Tutorial: CDVAE, Crystal Diffusion Variational AutoEncoder



[Reference DOI](https://pubs.acs.org/doi/10.1021/acs.jpclett.3c01260)

Authors: Charles "Rhys" Campbell (crc00042@mix.wvu.edu), Kamal Choudhary (kamal.choudhary@nist.gov)

# (1) INTRODUCTION AND MOTIVATION


# (2) INSTALLATION, CONFIGURATION, AND DEPENDENCIES


# Install Conda

In [1]:
!pip install -q condacolab
import condacolab, os, sys
condacolab.install()
print("Done")

⏬ Downloading https://github.com/jaimergp/miniforge/releases/download/24.11.2-1_colab/Miniforge3-colab-24.11.2-1_colab-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:17
🔁 Restarting kernel...
Done


# Install CDVAE

In [1]:
import os
%cd /content
if not os.path.exists('cdvae'):
  !git clone https://github.com/txie-93/cdvae.git
print("Done")

/content
Cloning into 'cdvae'...
remote: Enumerating objects: 197, done.
remote: Counting objects: 100% (60/60), done.
remote: Compressing objects: 100% (41/41), done.
remote: Total 197 (delta 24), reused 19 (delta 19), pack-reused 137 (from 1)
Receiving objects: 100% (197/197), 138.14 MiB | 18.83 MiB/s, done.
Resolving deltas: 100% (62/62), done.
Updating files: 100% (89/89), done.
Done


# Switch Colab Runtime to GPU
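In the top menu next to the Colab logo, select **Runtime** -> **Change runtime type** and choose any available GPU. Changing the runtime type starts a fresh session, so rerun the cells above afterwards.

If a GPU is assigned, create the GPU-based conda environment.

If the request fails because of usage limits, create the CPU-based conda environment instead.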
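
To decide between the two environments programmatically, the check can be sketched as below. This is a minimal sketch, assuming that a working `nvidia-smi` on the `PATH` is a good enough signal that a GPU was assigned; `gpu_visible` and `env_file` are hypothetical names introduced for illustration and are not part of CDVAE or Colab.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi` is on PATH and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        # `nvidia-smi -L` prints one line per GPU, each starting with "GPU <index>:".
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


# Assumption: env.yml is the GPU environment file and env.cpu.yml the CPU one,
# matching the two environment cells in this tutorial.
env_file = "env.yml" if gpu_visible() else "env.cpu.yml"
print(env_file)
```

The printed file name indicates which of the two environment cells to run next.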



# Create **GPU**-based conda environment for CDVAE

#### Creating the **GPU** legacy env takes 7 minutes


In [None]:
%%time
%cd /content/cdvae
!mamba env create -p /usr/local/envs/cdvae_legacy -f env.yml
!conda run -p /usr/local/envs/cdvae_legacy --live-stream \
    mamba install -c conda-forge "torchmetrics<0.8" --yes
!conda run -p /usr/local/envs/cdvae_legacy \
    mamba install mkl=2024.0 --yes
!conda run -p /usr/local/envs/cdvae_legacy \
    pip install "monty==2022.9.9"
!conda run -p /usr/local/envs/cdvae_legacy \
    mamba install -c conda-forge "pymatgen>=2022.0.8,<2023" --yes
!conda run -p /usr/local/envs/cdvae_legacy \
    pip install -e .
print("Done")

In [None]:
!conda run -p /usr/local/envs/cdvae_legacy python -c "import sys; print(sys.version)"
# Confirms that the conda environment runs Python 3.8.*

# Create **CPU**-based conda environment for CDVAE

#### Creating the **CPU** legacy env takes 7 minutes


In [None]:
%%time
%cd /content/cdvae
!mamba env create -p /usr/local/envs/cdvae_legacy -f env.cpu.yml
!conda run -p /usr/local/envs/cdvae_legacy --live-stream \
    mamba install -c conda-forge "torchmetrics<0.8" --yes
!conda run -p /usr/local/envs/cdvae_legacy \
    mamba install mkl=2024.0 --yes
!conda run -p /usr/local/envs/cdvae_legacy \
    pip install "monty==2022.9.9"
!conda run -p /usr/local/envs/cdvae_legacy \
    mamba install -c conda-forge "pymatgen>=2022.0.8,<2023" --yes
!conda run -p /usr/local/envs/cdvae_legacy \
    pip install -e .
print("Done")

/content/cdvae
Channels:
 - pytorch
 - conda-forge
 - defaults
 - pyg
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

Downloading and Extracting Packages:
pytorch-1.8.1        | 1.27 GB   | :   0% 0/1 [00:00<?, ?it/s]
cudatoolkit-11.1.1   | 929.6 MB  | :   0% 0/1 [00:00<?, ?it/s]
mkl-2024.2.2         | 118.9 MB  | :   0% 0/1 [00:00<?, ?it/s]
qt-main-5.15.15      | 50.2 MB   | :   0% 0/1 [00:00<?, ?it/s]
qt6-main-6.8.1       | 48.8 MB   | :   0% 0/1 [00:00<?, ?it/s]
vtk-base-9.3.1       | 44.2 MB   | :   0% 0/1 [00:00<?, ?it/s]

# Install Dataset ETL dependencies


In [None]:
!conda run -p /usr/local/envs/cdvae_legacy \
    pip install pandas jarvis-tools

# (3) DATASET ETL (Extract-Transform-Load)


# Download data pre-processor

The training data is generated with this [script](https://github.com/JARVIS-Materials-Design/cdvae/blob/main/scripts/generate_data_cdvae.py) from the JARVIS-Materials-Design `cdvae` repository. It compiles roughly 1,000 crystal structures and their superconducting critical temperatures into the train/val/test CSV files that CDVAE expects for training.
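
To make the target format concrete, here is a minimal sketch of how a list of records could be split and written as `train.csv`, `val.csv`, and `test.csv`. It is not the JARVIS script. The column names (`material_id`, `cif`, `Tc_supercon`), the 80/10/10 split, and the helper names are assumptions for illustration, and the records are toy placeholders rather than real structures or measured temperatures.

```python
import csv
import random
from pathlib import Path


def split_records(records, seed=0, train_frac=0.8, val_frac=0.1):
    """Shuffle records reproducibly and split them into train/val/test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


def write_splits(splits, out_dir, fieldnames):
    """Write one CSV per split (train.csv, val.csv, test.csv) into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, rows in splits.items():
        with open(out_dir / f"{name}.csv", "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)


# Toy records: placeholder CIF strings and made-up values, for illustration only.
records = [
    {"material_id": f"toy-{i}", "cif": f"<CIF text {i}>", "Tc_supercon": float(i)}
    for i in range(10)
]
splits = split_records(records)
print({name: len(rows) for name, rows in splits.items()})
```

Calling `write_splits(splits, "some_dir", ["material_id", "cif", "Tc_supercon"])` would then write the three files; the real column names should be taken from the CSVs that `generate_data_cdvae.py` produces.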

In [None]:
%cd /content/cdvae/scripts
!wget https://raw.githubusercontent.com/JARVIS-Materials-Design/cdvae/refs/heads/main/scripts/generate_data_cdvae.py

# Run data pre-processor

In [None]:
!conda run -p /usr/local/envs/cdvae_legacy --live-stream \
    python generate_data_cdvae.py
print("Done")

# Move train/test/val data to the correct spot

In [None]:
%cd /content
%mkdir -p /content/cdvae/data/supercon
%mv /content/cdvae/scripts/train.csv /content/cdvae/data/supercon/
%mv /content/cdvae/scripts/val.csv /content/cdvae/data/supercon/
%mv /content/cdvae/scripts/test.csv /content/cdvae/data/supercon/
print("Done")
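
Before training, it can help to confirm that all three split files landed in the data directory. A minimal sketch, assuming the directory and file names used in the cell above; `missing_splits` is a hypothetical helper name.

```python
from pathlib import Path


def missing_splits(data_dir, names=("train", "val", "test")):
    """Return the split CSV file names that are absent from data_dir."""
    data_dir = Path(data_dir)
    return [f"{n}.csv" for n in names if not (data_dir / f"{n}.csv").is_file()]


missing = missing_splits("/content/cdvae/data/supercon")
print("missing:", missing or "none")
```

An empty result means `train.csv`, `val.csv`, and `test.csv` are all in place.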

# Pull the supercon Hydra config YAML from JARVIS

In [None]:
%cd /content/cdvae/conf/data/
!wget https://raw.githubusercontent.com/JARVIS-Materials-Design/cdvae/refs/heads/main/conf/data/supercon.yaml

# (4) TRAIN WITHOUT PROPERTY PREDICTOR

In [None]:
!PROJECT_ROOT=/content/cdvae \
 HYDRA_JOBS=/content/cdvae/hydra_outputs \
 WABDB_DIR=/content/cdvae/wandb_outputs \
 conda run -p /usr/local/envs/cdvae_legacy --live-stream \
    python -u -m cdvae.run data=supercon expname=supercon

# (5) TRAIN WITH PROPERTY PREDICTOR

In [None]:
!PROJECT_ROOT=/content/cdvae \
 HYDRA_JOBS=/content/cdvae/hydra_outputs \
 WABDB_DIR=/content/cdvae/wandb_outputs \
 conda run -p /usr/local/envs/cdvae_legacy --live-stream \
   python -u -m cdvae.run data=supercon expname=supercon model.predict_property=True

# (6) INFERENCE

# Reconstruction

In [None]:
%cd /content/cdvae
!PROJECT_ROOT=/content/cdvae \
 HYDRA_JOBS=/content/cdvae/hydra_outputs \
 WABDB_DIR=/content/cdvae/wandb_outputs \
 conda run -p /usr/local/envs/cdvae_legacy --live-stream \
    python scripts/evaluate.py --model_path MODEL_PATH --tasks recon
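
`MODEL_PATH` in the inference commands is a placeholder for the directory that holds the trained checkpoint. A minimal sketch for locating it, assuming training wrote `*.ckpt` files somewhere under the `HYDRA_JOBS` directory set above; the exact sub-directory layout is not shown in this tutorial, so the helper searches recursively. `latest_checkpoint_dir` is a hypothetical name and is not part of CDVAE.

```python
from pathlib import Path


def latest_checkpoint_dir(hydra_jobs):
    """Return the directory of the most recently modified *.ckpt file, or None."""
    root = Path(hydra_jobs)
    if not root.is_dir():
        return None
    ckpts = list(root.rglob("*.ckpt"))
    if not ckpts:
        return None
    newest = max(ckpts, key=lambda p: p.stat().st_mtime)
    return newest.parent


# Assumption: same HYDRA_JOBS directory as in the training cells.
model_path = latest_checkpoint_dir("/content/cdvae/hydra_outputs")
print(model_path)
```

If a directory is printed, it can be substituted for `MODEL_PATH`; a result of `None` means no checkpoint was found under that directory.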

# Generation

In [None]:
!PROJECT_ROOT=/content/cdvae \
 HYDRA_JOBS=/content/cdvae/hydra_outputs \
 WABDB_DIR=/content/cdvae/wandb_outputs \
 conda run -p /usr/local/envs/cdvae_legacy --live-stream \
    python scripts/evaluate.py --model_path MODEL_PATH --tasks gen

# Optimization

In [None]:
!PROJECT_ROOT=/content/cdvae \
 HYDRA_JOBS=/content/cdvae/hydra_outputs \
 WABDB_DIR=/content/cdvae/wandb_outputs \
 conda run -p /usr/local/envs/cdvae_legacy --live-stream \
    python scripts/evaluate.py --model_path MODEL_PATH --tasks opt

# (7) NEXT STEPS & REFERENCES