Added examples for parsing DFT
BowenD-UCB committed Sep 25, 2023
1 parent 195d48c commit 2ebc57f
Showing 1 changed file with 109 additions and 12 deletions.
121 changes: 109 additions & 12 deletions examples/fine_tuning.ipynb
@@ -50,34 +50,120 @@
"id": "16eeae1e",
"metadata": {},
"source": [
"## 1. Prepare Training Data\n"
"## 0. Parse DFT outputs to CHGNet readable formats\n"
]
},
{
"cell_type": "markdown",
"id": "286c110a",
"metadata": {},
"source": [
"CHGNet is interfaced to [Pymatgen](https://pymatgen.org/), the training samples (normally coming from different DFTs like VASP),\n",
"need to be converted to [pymatgen.core.structure](https://pymatgen.org/pymatgen.core.html#module-pymatgen.core.structure).\n",
"\n",
"To convert VASP calculation to pymatgen structures and CHGNet labels, you can use the following [code](https://github.com/CederGroupHub/chgnet/blob/main/chgnet/utils/vasp_utils.py):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "208fa4aa",
"id": "72ada11a",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"from chgnet.utils import parse_vasp_dir\n",
"\n",
"# ./my_vasp_calc_dir contains vasprun.xml OSZICAR etc.\n",
"dataset_dict = parse_vasp_dir(file_root=\"./my_vasp_calc_dir\")\n",
"print(dataset_dict.keys())"
]
},
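{
"cell_type": "markdown",
"id": "3c1f2a9e",
"metadata": {},
"source": [
"As a quick sanity check (a minimal sketch, not part of the original workflow), the parsed dictionary can be inspected before saving.\n",
"Only the `\"structure\"` key is assumed here, since it is the key used by the saving examples below;\n",
"the other keys (energies, forces, magmoms, ...) depend on the `parse_vasp_dir` version and are printed by the cell above."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5d2e7b10",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: inspect the parsed dataset. Only the \"structure\" key is assumed,\n",
"# as it is the one used by the saving examples below.\n",
"n_frames = len(dataset_dict[\"structure\"])\n",
"print(f\"Parsed {n_frames} ionic steps\")\n",
"\n",
"# Each entry is a pymatgen Structure, so printing the first one shows\n",
"# the composition and lattice of that frame.\n",
"print(dataset_dict[\"structure\"][0])"
]
},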
{
"cell_type": "markdown",
"id": "b8b3a8cd",
"metadata": {},
"source": [
"After the DFT calculations are parsed, we can save the parsed structures and labels to disk,\n",
"so that they can be easily reloaded during multiple rounds of training.\n",
"The Pymatgen structures can be saved in either json, pickle, cif, or CHGNet graph.\n",
"\n",
"For super-large training dataset, like MPtrj dataset, we recommend [converting them to CHGNet graphs](https://github.com/CederGroupHub/chgnet/blob/main/examples/make_graphs.py). This will save significant memory and graph computing time.\n",
"\n",
"Below are the example codes to save the structures."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a9a74cae",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"try:\n",
" from chgnet import ROOT\n",
"# Structure to json\n",
"from chgnet.utils import write_json\n",
"\n",
" lmo = Structure.from_file(f\"{ROOT}/examples/mp-18767-LiMnO2.cif\")\n",
"except Exception:\n",
" from urllib.request import urlopen\n",
"dict_to_json = [struct.as_dict() for struct in dataset_dict[\"structure\"]]\n",
"write_json(dict_to_json, \"CHGNet_structures.json\")\n",
"\n",
" url = \"https://raw.githubusercontent.com/CederGroupHub/chgnet/main/examples/mp-18767-LiMnO2.cif\"\n",
" cif = urlopen(url).read().decode(\"utf-8\")\n",
" lmo = Structure.from_str(cif, fmt=\"cif\")"
"\n",
"# Structure to pickle\n",
"import pickle\n",
"\n",
"with open(\"CHGNet_structures.p\", \"wb\") as f:\n",
" pickle.dump(dataset_dict, f)\n",
"\n",
"\n",
"# Structure to cif\n",
"for idx, struct in enumerate(dataset_dict[\"structure\"]):\n",
" struct.to(filename=f\"{idx}.cif\")\n",
"\n",
"\n",
"# Structure to CHGNet graph\n",
"from chgnet.graph import CrystalGraphConverter\n",
"\n",
"converter = CrystalGraphConverter()\n",
"for idx, struct in enumerate(dataset_dict[\"structure\"]):\n",
" graph = converter(struct)\n",
" graph.save(fname=f\"{idx}.pt\")"
]
},
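{
"cell_type": "markdown",
"id": "7f4b2d36",
"metadata": {},
"source": [
"Below is a minimal sketch (not part of the original workflow) of reloading the saved json and pickle files in a later training session.\n",
"It only relies on the standard library and `pymatgen.core.Structure.from_dict`;\n",
"reloading the saved CHGNet graphs is handled by the `GraphData` class discussed later in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9b61a0f4",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: reload the structures saved above.\n",
"import json\n",
"import pickle\n",
"\n",
"from pymatgen.core import Structure\n",
"\n",
"# From json: each entry was saved with Structure.as_dict()\n",
"with open(\"CHGNet_structures.json\") as file:\n",
"    structures_from_json = [Structure.from_dict(d) for d in json.load(file)]\n",
"\n",
"# From pickle: the whole parsed dictionary was dumped\n",
"with open(\"CHGNet_structures.p\", \"rb\") as file:\n",
"    reloaded_dataset_dict = pickle.load(file)"
]
},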
{
"cell_type": "markdown",
"id": "61c551cd",
"metadata": {},
"source": [
"For other types of DFT calculations, please refer to their interfaces\n",
"in [pymatgen.io](https://pymatgen.org/pymatgen.io.html#module-pymatgen.io).\n",
"\n",
"see: [Quantum Espresso](https://pymatgen.org/pymatgen.io.html#module-pymatgen.io.pwscf)\n",
"\n",
"see: [CP2K](https://pymatgen.org/pymatgen.io.cp2k.html#module-pymatgen.io.cp2k)\n",
"\n",
"see: [Gaussian](https://pymatgen.org/pymatgen.io.html#module-pymatgen.io.gaussian)\n"
]
},
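{
"cell_type": "markdown",
"id": "c4d8e5a1",
"metadata": {},
"source": [
"Whichever interface you use, the end goal is the same: a list of pymatgen `Structure` objects with matching energies, forces and (optionally) stresses and magmoms, in the list format used in section 1 below.\n",
"The generic sketch below only illustrates that format; `parse_single_calc` is a hypothetical placeholder whose body should be replaced by the appropriate `pymatgen.io` parser for your DFT code."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e0a3b7c2",
"metadata": {},
"outputs": [],
"source": [
"# Generic sketch: assemble CHGNet training data from any DFT code.\n",
"# `parse_single_calc` is a hypothetical placeholder -- replace its body with\n",
"# the matching pymatgen.io parser (pwscf, cp2k, gaussian, ...) for your code.\n",
"\n",
"\n",
"def parse_single_calc(calc_dir):\n",
"    \"\"\"Hypothetical helper: return (Structure, total_energy_in_eV, forces).\"\"\"\n",
"    raise NotImplementedError(f\"parse {calc_dir} with the matching pymatgen.io module\")\n",
"\n",
"\n",
"structures, energies_per_atom, forces = [], [], []\n",
"\n",
"for calc_dir in [\"calc_0\", \"calc_1\"]:  # your own calculation folders\n",
"    structure, total_energy, force = parse_single_calc(calc_dir)\n",
"    structures.append(structure)\n",
"    energies_per_atom.append(total_energy / len(structure))\n",
"    forces.append(force)"
]
},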
{
"cell_type": "markdown",
"id": "e1611921",
"metadata": {},
"source": [
"## 1. Prepare Training Data"
]
},
{
"cell_type": "markdown",
"id": "9ec2524a",
"metadata": {},
"source": [
"We create a dummy fine-tuning dataset by using CHGNet prediction with some random noise.\n",
"Below we will create a dummy fine-tuning dataset by using CHGNet prediction with some random noise.\n",
"For your purpose of fine-tuning to a specific chemical system or AIMD data, please modify the block below\n"
]
},
@@ -88,6 +174,17 @@
"metadata": {},
"outputs": [],
"source": [
"try:\n",
" from chgnet import ROOT\n",
"\n",
" lmo = Structure.from_file(f\"{ROOT}/examples/mp-18767-LiMnO2.cif\")\n",
"except Exception:\n",
" from urllib.request import urlopen\n",
"\n",
" url = \"https://raw.githubusercontent.com/CederGroupHub/chgnet/main/examples/mp-18767-LiMnO2.cif\"\n",
" cif = urlopen(url).read().decode(\"utf-8\")\n",
" lmo = Structure.from_str(cif, fmt=\"cif\")\n",
"\n",
"structures, energies_per_atom, forces, stresses, magmoms = [], [], [], [], []\n",
"\n",
"for _ in range(100):\n",
@@ -172,7 +269,7 @@
"\n",
"The `batch_size` is defined to be 8 for small GPU-memory. If > 10 GB memory is available, we highly recommend to increase `batch_size` for better speed.\n",
"\n",
"If you have very large numbers of structures (which is typical for AIMD), putting them all in a python list can quickly run into memory issues. In this case we highly recommend you to pre-convert all the structures into graphs and save them as shown in `examples/make_graphs.py`. Then directly train CHGNet by loading the graphs from disk instead of memory using the `GraphData` class defined in `data/dataset.py`.\n"
"If you have very large numbers (>100K) of structures (which is typical for AIMD), putting them all in a python list can quickly run into memory issues. In this case we highly recommend you to pre-convert all the structures into graphs and save them as shown in `examples/make_graphs.py`. Then directly train CHGNet by loading the graphs from disk instead of memory using the `GraphData` class defined in `data/dataset.py`.\n"
]
},
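{
"cell_type": "markdown",
"id": "f2a9c6d8",
"metadata": {},
"source": [
"A hedged sketch of that graph-based workflow is shown below.\n",
"The `GraphData` argument names and the loader signature are assumptions based on `data/dataset.py` and `examples/make_graphs.py`; please check those files for the authoritative interface."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1b7d4e92",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: train from pre-saved graphs instead of in-memory structures.\n",
"# Argument names below are assumptions -- see chgnet/data/dataset.py and\n",
"# examples/make_graphs.py for the exact interface.\n",
"from chgnet.data.dataset import GraphData, get_train_val_test_loader\n",
"\n",
"graph_data = GraphData(\n",
"    graph_path=\"./graphs\",  # directory of graphs from make_graphs.py (assumed layout)\n",
"    labels=\"labels.json\",  # label file written alongside the graphs (assumed name)\n",
")\n",
"train_loader, val_loader, test_loader = get_train_val_test_loader(\n",
"    graph_data, batch_size=8, train_ratio=0.9, val_ratio=0.05\n",
")"
]
},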
{