DFTbondDependency

Repository for bond dependency paper

Dependencies:

RDKit
xyz2mol
pandas
csv
NumPy
requests
statsmodels
scikit-learn

To install the xyz2mol package, visit: https://github.com/jensengroup/xyz2mol.git

Apart from csv, which is part of the Python standard library, all the other packages can be installed with conda, pip, etc.
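
Before running the scripts, it can help to confirm that the dependencies are importable. The snippet below is a minimal sketch, not part of this repository: the import names (for example sklearn for scikit-learn, and xyz2mol being reachable on the Python path) are assumptions, and find_missing is a hypothetical helper.

```python
import importlib.util

# Import names assumed for the dependencies listed above; xyz2mol is
# distributed as a script, so it is assumed to be on the Python path.
REQUIRED_MODULES = [
    "rdkit", "xyz2mol", "pandas", "csv",
    "numpy", "requests", "statsmodels", "sklearn",
]


def find_missing(modules):
    """Return the module names that cannot be located by the import system."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies were found.")
```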

If you use this repository, please cite us.

A citation for the code can be downloaded by clicking "Cite this repository" in the right-hand side panel.
To cite the preprint, use: https://doi.org/10.26434/chemrxiv-2022-9prf3

Description of the Data.

The data are not yet publicly available. They will be made publicly available upon acceptance of our paper in a peer-reviewed journal. The preprint can be downloaded from: https://doi.org/10.26434/chemrxiv-2022-9prf3

Description of the codes.

The repository contains Python scripts for the calculations described in the paper: https://doi.org/10.26434/chemrxiv-2022-9prf3

get_data.py: This script downloads data from the DTU Data website. The public link to the website will be available upon acceptance of the paper. It downloads either all the files from the database (if the -all/--all option is given) or only the xyz files and the log files of the energy calculations.
make_molecule_bond_en_csv.py: This script creates a csv file containing energy values, lists of bonds, SMILES strings, chemical formulas, etc. from the xyz files and log files downloaded with get_data.py.
make_reaction_ids.py: This script creates a csv file with only two columns: 'reactantindex' and 'pdtindex'. The values are the indices of the molecules in the csv file made by make_molecule_bond_en_csv.py.
process_reaction_conversion_jobs.sh: This bash script creates the final csv file containing all the information related to the reactions. It takes the csv file containing the molecular data (created by make_molecule_bond_en_csv.py), the csv file with the indices of the reactants and products (created by make_reaction_ids.py), a csv file with the G4MP2 energies of the molecules (index and energy), the path to the Python script make_reactions_parallel.py, the number of nodes to be used, and the number of processors per node. It first splits the csv file containing the reactant and product indices according to the number of nodes and saves the parts as json files named Node_n.json, with n running from 1 to N when N nodes are used.
make_reactions_parallel.py: This script takes the csv file containing the indices for the reactions, the csv file containing all the molecular data, the csv file containing the G4MP2 energies, the number of processors, and the json file containing the indices of the csv file with "reactantindex","pdtindex".
submit.sh: An example submit script for running make_reactions_parallel.py on a single node with multiple processors. It is called from process_reaction_conversion_jobs.sh and is written for the Slurm scheduler.
detect_correlation.py: This script detects correlation between the variables (bonds). It takes the directory containing the csv files with all the reaction data (created by process_reaction_conversion_jobs.sh). By default, it randomly chooses 10% of the total data to detect correlation.
do_linear_regression.py: This script performs the linear regression between the bond changes and the DFT errors in the reaction energies. It takes as arguments the directory containing the reaction data files and the names of the DFT functionals.
correct_reaction_energy.py: This script calculates the reaction energy and its correction for a given DFT functional. It takes as input the log files of the reactants (option -r) and products (option -p) and the name of the DFT functional, and prints the reaction energy for that functional, the correction, and the corrected reaction energy.
CITATION.cff: This file provides citation data for this repository in BibTeX or APA format.
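
The splitting step described for process_reaction_conversion_jobs.sh can be pictured with the sketch below. It is an illustration under assumptions, not the repository's implementation: the function names are hypothetical, the chunks are assumed to be contiguous and nearly equal in size, and the json layout (a list of objects with "reactantindex" and "pdtindex" keys) is a guess. Only the Node_n.json naming comes from the description above.

```python
import json
from pathlib import Path


def split_reaction_ids(pairs, n_nodes):
    """Split (reactantindex, pdtindex) pairs into n_nodes contiguous chunks
    whose sizes differ by at most one."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    base, extra = divmod(len(pairs), n_nodes)
    chunks, start = [], 0
    for node in range(n_nodes):
        # The first `extra` nodes receive one additional reaction each.
        size = base + (1 if node < extra else 0)
        chunks.append(pairs[start:start + size])
        start += size
    return chunks


def write_node_files(chunks, out_dir):
    """Write each chunk to Node_n.json (n starts at 1); the layout is assumed."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, chunk in enumerate(chunks, start=1):
        path = out_dir / f"Node_{n}.json"
        records = [{"reactantindex": r, "pdtindex": p} for r, p in chunk]
        with open(path, "w") as handle:
            json.dump(records, handle)
        paths.append(path)
    return paths
```

With five reactions and two nodes, for instance, the first node would receive three reactions and the second two.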

How to run:

The help message for each script (except submit.sh) can be obtained by running it with the -h option.

The steps described in the paper can be followed by running the scripts in the following order:

get_data.py
make_molecule_bond_en_csv.py
make_reaction_ids.py
process_reaction_conversion_jobs.sh, make_reactions_parallel.py, submit.sh
detect_correlation.py
do_linear_regression.py
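
The idea behind the last step, together with correct_reaction_energy.py, can be sketched as follows. This is a toy illustration under stated assumptions, not the code in this repository: the function names are hypothetical, the fit has no intercept, the error is taken as the DFT reaction energy minus the reference (G4MP2) reaction energy, and numpy.linalg.lstsq stands in for whatever the scripts actually use (statsmodels and scikit-learn are listed as dependencies). The numbers are synthetic.

```python
import numpy as np


def fit_bond_coefficients(bond_changes, dft_errors):
    """Least-squares fit of DFT errors against per-reaction bond changes.

    bond_changes: array of shape (n_reactions, n_bond_types)
    dft_errors:   array of shape (n_reactions,), assumed to be E_DFT - E_ref
    """
    coef, *_ = np.linalg.lstsq(bond_changes, dft_errors, rcond=None)
    return coef


def correct_reaction_energy(dft_energy, bond_change, coef):
    """Return (corrected energy, correction) for a single reaction."""
    correction = float(np.dot(bond_change, coef))
    # The predicted error is subtracted because the error is defined as
    # E_DFT - E_ref (an assumption of this sketch).
    return dft_energy - correction, correction


if __name__ == "__main__":
    # Synthetic bond-change counts for three hypothetical bond types.
    X = np.array([[1, -2, 0], [0, 1, 1], [-1, 0, 2], [2, 1, -1]], dtype=float)
    y = X @ np.array([0.5, -0.2, 1.0])  # synthetic, noise-free errors
    coef = fit_bond_coefficients(X, y)
    corrected, correction = correct_reaction_energy(-10.0, np.array([1.0, 0.0, 1.0]), coef)
    print("coefficients:", coef)
    print("correction:", correction, "corrected energy:", corrected)
```

Because the synthetic errors are an exact linear function of the bond changes, the fit recovers the coefficients used to generate them; real data would leave a residual.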

License:

All the scripts in this repository are covered under the MIT license terms (LICENSE.txt).
