# Tutorial on training an Exchange Correlation Functional using DeepChem

### Introduction to Density Functional Theory and Exchange Correlation Functionals

Density functional theory (DFT) is a method for calculating the electronic structure of atoms, molecules, and solids. Its objective is to use the fundamental laws of quantum mechanics to understand the properties of materials quantitatively.
    
Traditional methods that approximate solutions to the Schrödinger equation for N interacting electrons moving in an external potential have serious limitations, because the many-body wave function depends on the coordinates of all N electrons. DFT instead takes the electron density n(r), a function of only three spatial coordinates, as its basic variable.
    
Thanks to Kohn-Sham theory, the many-body electronic ground state can be described using single-particle equations and an effective potential. The effective potential is made up of the ionic potential resulting from the atomic cores, the Hartree potential, which describes the electrostatic electron-electron interaction, and the exchange-correlation potential, which accounts for many-body effects.
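
As a compact reference, the single-particle picture described above can be written as follows in Hartree atomic units. The symbols $\phi_i$ (Kohn-Sham orbitals), $\varepsilon_i$ (orbital energies) and the subscripts on the potentials are notation chosen here for illustration and do not come from the DeepChem code.

$$
\left[-\tfrac{1}{2}\nabla^2 + v_{\mathrm{eff}}(\mathbf{r})\right]\phi_i(\mathbf{r}) = \varepsilon_i\,\phi_i(\mathbf{r}),
\qquad
n(\mathbf{r}) = \sum_{i\,\in\,\mathrm{occ}} |\phi_i(\mathbf{r})|^2
$$

$$
v_{\mathrm{eff}}(\mathbf{r}) = v_{\mathrm{ion}}(\mathbf{r}) + v_{\mathrm{H}}(\mathbf{r}) + v_{\mathrm{xc}}(\mathbf{r}),
\qquad
v_{\mathrm{H}}(\mathbf{r}) = \int \frac{n(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}'
$$

Because $v_{\mathrm{eff}}$ depends on the density, which in turn depends on the orbitals, these equations are solved self-consistently.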
    
The exchange-correlation (XC) energy is the difference between the exact total energy and the sum of the remaining energy terms (such as the kinetic energy). The exchange-correlation potential is obtained by taking the functional derivative of the XC energy with respect to the electron density.
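
In symbols, and with notation assumed here for illustration ($T_s$ for the kinetic energy of the non-interacting Kohn-Sham system, $E_{\mathrm{ext}}$ for the interaction with the external ionic potential, $E_{\mathrm{H}}$ for the Hartree energy), the definition above reads:

$$
E_{\mathrm{xc}}[n] = E[n] - \bigl(T_s[n] + E_{\mathrm{ext}}[n] + E_{\mathrm{H}}[n]\bigr)
$$

$$
v_{\mathrm{xc}}(\mathbf{r}) = \frac{\delta E_{\mathrm{xc}}[n]}{\delta n(\mathbf{r})}
$$

The exact form of $E_{\mathrm{xc}}[n]$ is unknown, which is why it is approximated in practice and why a learned functional is of interest.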

### Setup 

In [None]:
!pip install --pre deepchem
!pip install git+https://github.com/diffqc/dqc.git
!pip install pylibxc2

### Datasets

#### Types 

There are four types of DFT data entries, and the type determines which calculation is carried out on the data point. The types are "ae", "ie", "dm" and "dens", which stand for atomization energy, ionization energy, density matrix and density profile respectively.
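
The four codes can be kept in a small lookup table. This is only an illustrative sketch: `ENTRY_TYPES` and `describe_entry_type` are hypothetical names introduced here and are not part of DeepChem.

```python
# Entry-type codes and their meanings, as listed in the text above.
ENTRY_TYPES = {
    "ae": "atomization energy",
    "ie": "ionization energy",
    "dm": "density matrix",
    "dens": "density profile",
}


def describe_entry_type(e_type: str) -> str:
    """Return the description for an entry-type code, or raise ValueError."""
    try:
        return ENTRY_TYPES[e_type]
    except KeyError:
        raise ValueError(
            f"unknown entry type {e_type!r}; expected one of {sorted(ENTRY_TYPES)}"
        ) from None


print(describe_entry_type("ie"))
```

Rejecting unknown codes early makes a typo in a data file easier to spot than a failure deep inside a calculation.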

#### Calculating ground truth values 

For the AE and IE entry types, we use pre-calculated values from the NIST databases. For the DM and Dens entry types, we calculate the true values with CCSD (coupled-cluster singles and doubles), a non-DFT method, using the PySCF library.
In this example, we will calculate the CCSD density matrix for hydrogen fluoride.

In [None]:
# INSTALL PYSCF 
!pip install pyscf
import pyscf

In [None]:
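# imports needed by this cell
import numpy as np
import torch
from pyscf import gto, scf, cc

# build the PySCF molecule: hydrogen fluoride, coordinates in Bohr
mol = gto.M(atom='H 0.86625 0 0; F -0.86625 0 0', basis='6-311++G(3df,3pd)', unit="Bohr")
# unrestricted Hartree-Fock reference, followed by unrestricted CCSD
mf = scf.UHF(mol).run()
mcc = cc.UCCSD(mf)
mcc.kernel()

# obtain the one-particle density matrices (alpha, beta) in the MO basis,
# transform each to the AO basis, and sum them to get the total density matrix
modm = mcc.make_rdm1()
aodm0 = np.dot(mf.mo_coeff[0], np.dot(modm[0], mf.mo_coeff[0].T))
aodm1 = np.dot(mf.mo_coeff[1], np.dot(modm[1], mf.mo_coeff[1].T))
aodm = aodm0 + aodm1

# save the value in a .npy file
dm = torch.as_tensor(aodm, dtype=torch.double)
outfile = "output.npy"
np.save(outfile, dm.numpy())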
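
A quick sanity check on an AO-basis density matrix is that the trace of its product with the overlap matrix equals the number of electrons. The sketch below uses a toy two-function basis so that it runs without PySCF. `electron_count` is a hypothetical helper introduced here. For the cell above, the overlap matrix could be obtained with `mol.intor("int1e_ovlp")` and the result compared with `mol.nelectron`.

```python
import numpy as np


def electron_count(dm_ao: np.ndarray, overlap: np.ndarray) -> float:
    """Number of electrons represented by an AO-basis density matrix: Tr(D S)."""
    return float(np.trace(dm_ao @ overlap))


# Toy example: two orthonormal basis functions (S is the identity) and
# a single doubly occupied orbital with equal weight on both functions.
coeff = np.array([[1.0], [1.0]]) / np.sqrt(2.0)  # occupied MO coefficients
dm_toy = 2.0 * coeff @ coeff.T                   # closed-shell density matrix
overlap_toy = np.eye(2)

print(electron_count(dm_toy, overlap_toy))
```

A count that differs noticeably from the expected number of electrons usually points to a mistake in the basis transformation.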


#### An example of the data entry format

    e_type: 'ie'
    true_val: '0.53411947056'
    systems: [{'moldesc': 'N 0 0 0',
               'basis': '6-311++G(3df,3pd)',
               'spin': '3'},
              {'moldesc': 'N 0 0 0',
               'basis': '6-311++G(3df,3pd)',
               'charge': 1,
               'spin': '2'}]
       
These entries are stored in .yaml files. For 'dens' and 'dm' entries, the true value is replaced by the name of the .npy file containing the CCSD values.
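
To make the two kinds of `true_val` concrete, here is a minimal sketch of how an entry could be resolved into a number or an array. `load_true_value` and its `base_dir` parameter are hypothetical and written for illustration. This is not how `DFTYamlLoader` is implemented.

```python
import os
import tempfile

import numpy as np


def load_true_value(entry: dict, base_dir: str = "."):
    """Hypothetical helper: a float for 'ae'/'ie', an array for 'dm'/'dens'."""
    e_type = entry["e_type"]
    true_val = entry["true_val"]
    if e_type in ("ae", "ie"):
        # Energies are stored directly in the entry.
        return float(true_val)
    if e_type in ("dm", "dens"):
        # Here true_val is the name of a .npy file holding the CCSD values.
        return np.load(os.path.join(base_dir, true_val))
    raise ValueError(f"unknown entry type: {e_type!r}")


# An energy entry, using the value from the example above.
ie_value = load_true_value({"e_type": "ie", "true_val": "0.53411947056"})

# A density-matrix entry pointing at a placeholder .npy file.
with tempfile.TemporaryDirectory() as tmp:
    np.save(os.path.join(tmp, "output.npy"), np.eye(2))
    dm_value = load_true_value({"e_type": "dm", "true_val": "output.npy"}, base_dir=tmp)

print(ie_value, dm_value.shape)
```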

### Featurizing and loading the dataset 

After building the data file, we load and featurize it using a data loader implemented in DeepChem.

In [2]:
from deepchem.data.data_loader import DFTYamlLoader
# path to the YAML file, relative to the working directory; adjust it to where your file lives
inputs = '../../deepchem/models/tests/assets/test_dftxcdata.yaml'
data = DFTYamlLoader()
dataset = data.create_dataset(inputs)
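
The relative path used above resolves only when the working directory sits two levels below the root of a DeepChem checkout. Otherwise loading fails with a `FileNotFoundError`. A small guard can report the resolved location before the loader is called. `require_file` is a hypothetical helper, shown here on a temporary file so that the sketch runs anywhere.

```python
import tempfile
from pathlib import Path


def require_file(path: str) -> Path:
    """Return the resolved path, or raise FileNotFoundError with a hint."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(
            f"{resolved} does not exist; check the working directory or pass an absolute path"
        )
    return resolved


# Demonstration with a temporary YAML file standing in for the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.yaml"
    demo.write_text("e_type: ie\n")
    found = require_file(str(demo))
    print(found.name)
```

In the notebook, the checked path would then be passed on as `data.create_dataset(str(require_file(inputs)))`.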



### Training the XC Model

The dataset can now be used to train our own exchange-correlation functional.

In [None]:
from deepchem.models.dft.dftxc import XCModel
import deepchem as dc
import tempfile

# create a temporary directory to save the model
model_dir = tempfile.mkdtemp()

# initialise the model
model = XCModel("lda_x",
                batch_size=1,
                log_frequency=1,
                mode="classification",
                n_tasks=2,
                model_dir=model_dir)

# fit the model and keep the training loss returned by fit()
loss = model.fit(dataset, nb_epoch=1, checkpoint_interval=1)

#### Prediction and evaluation using the XC model

Predictions can be run on many different molecules, but for simplicity this tutorial reuses the training dataset.

In [None]:
predict = model.predict(dataset)

# Evaluate 
metric = dc.metrics.Metric(dc.metrics.mae_score)
scores = model.evaluate(dataset, [metric])
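
The metric used above is the mean absolute error (MAE): the average of the absolute differences between true and predicted values. The NumPy sketch below shows that computation on made-up numbers. The values are placeholders and are not results from the XC model.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred) -> float:
    """Average absolute difference between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Placeholder values, chosen only to make the arithmetic easy to follow:
# the absolute errors are 0.25, 0.0 and 0.5, so their mean is 0.25.
y_true = [0.50, 1.00, 2.00]
y_pred = [0.75, 1.00, 1.50]
print(mean_absolute_error(y_true, y_pred))
```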

#### Notes 

- The "dm" (density matrix) entry type cannot currently be used with `model.evaluate`.
- To run predictions on this entry type, use a dataset that contains only "dm" entries.
- When initializing the XCModel, the user may build and pass a different PyTorch model to train the XC functional, instead of using the default one.
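
As an illustration of the last note, a custom network could be an ordinary `torch.nn.Module`. The sketch below is hypothetical: the class name `TinyXCNet`, the two input features, the layer sizes and the single output are all assumptions made here. The interface that XCModel actually expects (input features, output shape, constructor argument) is not described in this tutorial and should be checked against the DeepChem documentation.

```python
import torch
from torch import nn


class TinyXCNet(nn.Module):
    """Hypothetical network mapping per-point density features to one value."""

    def __init__(self, n_features: int = 2, n_hidden: int = 16):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_features, n_hidden),
            nn.Softplus(),  # smooth activation, so the output is differentiable
            nn.Linear(n_hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)


# Double precision, matching the dtype of the density matrix saved earlier.
net = TinyXCNet().to(torch.double)
features = torch.rand(5, 2, dtype=torch.double)  # 5 points, 2 assumed features
out = net(features)
print(out.shape)
```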


## References 

- Kasim, Muhammad F., and Sam M. Vinko. "Learning the exchange-correlation functional from nature with fully differentiable density functional theory." Physical Review Letters 127.12 (2021): 126403.
  Most of the code in this tutorial is derived from this reference. The implementation accompanying the paper is available at https://github.com/mfkasim1/xcnn
- Encyclopedia of Condensed Matter Physics, 2005.
- Kohn, W., and L. J. Sham. "Self-consistent equations including exchange and correlation effects." Physical Review 140.4A (1965): A1133.
