# DeepDonor: Computational discovery of donor materials with high power conversion efficiency for organic solar cells

DeepDonor is a deep learning method for discovering high-performance donor materials for organic solar cells. It predicts the power conversion efficiency (PCE) of candidate donors using the quantum deep field (QDF) model, which has excellent extrapolation performance.

## Dataset preparation

To evaluate the models across different PCE intervals, the data were split by stratified sampling with scikit-learn. The data in each dataset were divided into 18 intervals based on their PCE values, and within each interval the training, validation and test sets were split randomly and independently at a ratio of 8:1:1.
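The per-interval split described above can be sketched as follows. This is a minimal illustration, not the code in `data_preprocess`: it uses plain NumPy instead of scikit-learn, assumes equal-width PCE intervals, and the function name `stratified_split` and the synthetic PCE values are hypothetical.

```python
import numpy as np


def stratified_split(pce, n_bins=18, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split sample indices 8:1:1 inside each PCE interval (hypothetical helper)."""
    pce = np.asarray(pce, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: equal-width intervals between the smallest and largest PCE.
    edges = np.linspace(pce.min(), pce.max(), n_bins + 1)
    # Using only the interior edges yields interval labels 0 .. n_bins - 1.
    labels = np.digitize(pce, edges[1:-1])
    train, val, test = [], [], []
    for b in range(n_bins):
        # Shuffle the members of this interval, then cut them 8:1:1.
        idx = rng.permutation(np.flatnonzero(labels == b))
        n_train = int(round(ratios[0] * len(idx)))
        n_val = int(round(ratios[1] * len(idx)))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])
    return (np.array(train, dtype=int),
            np.array(val, dtype=int),
            np.array(test, dtype=int))


# Synthetic PCE values, for illustration only.
pce_values = np.random.default_rng(1).uniform(0.0, 18.0, size=1000)
train_idx, val_idx, test_idx = stratified_split(pce_values)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting inside each interval keeps the PCE distribution of the three sets similar, so rare high-PCE molecules are not concentrated in a single set.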

In [1]:
import dataset
from data_preprocess import dataset_process

In [2]:
dataset_process.generate_dataset('dataset/SM/SM.csv', 'train', 'val', 'test')

X_train: 143    C1(C=CC=C2)=C2C3=N/C1=N\C4=C5C(C=CC=C5)=C6N4[S...
25     CCCCCCCCC(CCCCCCCC)N1C2=C(C=C(C3=C(CCCCCC)C=C(...
888    CCCCC(CC)CC1(CC(CC)CCCC)C2=C(SC(C3=CC=C(C4=C(F...
700    N#C/C(C#N)=C/C1=CC2=C(C(S3)=C(C4=C3C5=C(C6=C(N...
478    CCCCCCC(S1)=CC=C1C2=CC=C(S2)C(S3)=CC=C3C(N(CC(...
                             ...                        
896    FC1=C(C2=NSN=C2C(C3=CC=C(C4=CC5=C(C(C=C(C(CC(C...
576    CCCCCCC(S1)=CC=C1C2=C3C(C=C(C4=CC=C(C5=CC(CCCC...
335    CCCCCCC1=C(SC(/C=C(C#N)\C#N)=C1CCCCCC)/C=C/C(S...
652    C1(C2=C(C3=CC=CC=C31)C=CC4=C52)=CC=C5C6=CC=C7C...
32     CC12C3(C)SC(C4=CC=C(/C=C/C5=CC=C(N(C6=CC=C(/C=...
Name: SMILES, Length: 882, dtype: object
y_train: 143    0.0
25     0.0
888    8.0
700    4.0
478    2.0
      ... 
896    8.0
576    3.0
335    1.0
652    4.0
32     0.0
Name: PCE, Length: 882, dtype: float64
X_train: 123    O=C1C2=C(C3=NC=C(C4=CC(SC(C=CC=C5)=C5N6CC(CCCC...
655    O=C1N(CC(CC)CCCC)C(C2=CC=C(C(S3)=CC(C3=C4OCC(C...
244    CCCCCCC1=C(/C=C\C2=CC=C(

## Generating 3D coordinates

The simplified molecular-input line-entry system (SMILES) string of each molecule was processed with RDKit to obtain its 3D conformers. The experimental-torsion basic knowledge distance geometry (ETKDG) method (25) was applied to generate conformers by distance geometry and to correct them using torsion-angle preferences. The Merck molecular force field (MMFF) method (26) was then used to further optimize the conformer of each molecule. After conformer optimization, all molecules in the SM and P datasets were represented as 3D coordinates. The atoms and their 3D coordinates served as the input to QDF.
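The conformer workflow described above can be sketched with RDKit as follows. This is an illustration under assumptions, not the code in `data_preprocess.coordinate`: the helper name `smiles_to_xyz`, the explicit addition of hydrogens, the fixed random seed, and the thiophene example are all hypothetical choices.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def smiles_to_xyz(smiles, seed=42):
    """Return [(element, x, y, z), ...] for one ETKDG + MMFF conformer (hypothetical helper)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    # Assumption: explicit hydrogens are added before embedding.
    mol = Chem.AddHs(mol)
    # ETKDG: distance geometry refined with experimental torsion preferences.
    params = AllChem.ETKDG()
    params.randomSeed = seed  # fixed seed for reproducible coordinates
    if AllChem.EmbedMolecule(mol, params) == -1:
        raise RuntimeError(f"Conformer embedding failed for: {smiles}")
    # MMFF: force-field optimization of the embedded conformer.
    AllChem.MMFFOptimizeMolecule(mol)
    positions = mol.GetConformer().GetPositions()
    return [(atom.GetSymbol(), *positions[atom.GetIdx()])
            for atom in mol.GetAtoms()]


# Thiophene as a small example molecule.
atoms = smiles_to_xyz("c1ccsc1")
for symbol, x, y, z in atoms:
    print(f"{symbol} {x:.4f} {y:.4f} {z:.4f}")
```

For large donor molecules the embedding step can fail or the MMFF optimization may not converge within the default number of iterations, so checking the return values of both calls is worthwhile.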

In [5]:
from data_preprocess import coordinate

In [7]:
coordinate.generae_coordinate('train','test','val')



## Training

First, the QDF-SM model was trained on the small-molecule donor dataset. Then, the QDF-SM model was fine-tuned on the polymer donor dataset by transfer learning to obtain the QDF-P model.

Running these calculations on a supercomputer is recommended!
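The two-stage scheme above (pretrain on small molecules, then fine-tune on polymers) can be illustrated with a toy PyTorch example. This is only a sketch of the transfer-learning idea: the small multilayer perceptron stands in for QDF, the random tensors stand in for the SM and P datasets, and the epoch counts and learning rates are assumptions, not the settings used in `SM.sh` or `DeepP.sh`.

```python
import copy

import torch
from torch import nn


def train(model, x, y, epochs, lr):
    """Full-batch training loop; returns the final mean squared error."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        return loss_fn(model(x), y).item()


torch.manual_seed(0)

# Toy stand-ins: a large "SM" set and a small, shifted "P" set.
w = torch.randn(8, 1)
x_sm, x_p = torch.randn(256, 8), torch.randn(64, 8)
y_sm = x_sm @ w
y_p = x_p @ w + 0.5

# Stage 1: train the stand-in model on the "SM" data.
model_sm = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
loss_sm = train(model_sm, x_sm, y_sm, epochs=200, lr=1e-2)

# Stage 2: start from the pretrained weights and fine-tune on the "P" data
# with a smaller learning rate.
model_p = copy.deepcopy(model_sm)
with torch.no_grad():
    loss_p_before = nn.functional.mse_loss(model_p(x_p), y_p).item()
loss_p_after = train(model_p, x_p, y_p, epochs=100, lr=1e-3)

print(f"SM loss: {loss_sm:.4f}")
print(f"P loss before fine-tuning: {loss_p_before:.4f}")
print(f"P loss after fine-tuning: {loss_p_after:.4f}")
```

Starting from the pretrained weights lets the smaller polymer dataset adjust an already useful model instead of training one from scratch.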

In [9]:
cd model

C:\Users\BM109X32G-10GPU-02\Documents\DeepDonor\model


bash preprocess.sh

bash SM.sh

bash DeepP.sh

## Predicting

The trained model can be used to predict the PCE of new donor materials.

The 3D coordinate generation and preprocessing steps are the same as in the training process.

In [None]:
bash Predict.sh

## Acknowledgement

Jinyu Sun 

E-mail: jinyusun@csu.edu.cn