# DeepDAP: Deep learning-assisted to accelerate the discovery of donor/acceptor pairs for high-performance organic solar cells

DeepDAP is a deep learning-based framework for discovering new donor/acceptor (D/A) pairs for organic solar cells. The framework consists of a data collection section, a power conversion efficiency (PCE) prediction section, and a molecular discovery section. Specifically, a large D/A pair dataset was built by collecting experimental data from the literature. Then, a novel RoBERTa-based dual-encoder model (DeRoBERTa) was developed for PCE prediction, using the SMILES of the donor and the acceptor as input. Two pretrained ChemBERTa2 encoders were loaded as the initial parameters of the dual encoder. The model was trained, validated, and tested on the experimental dataset.

## How to use the model to predict the PCE from the SMILES of donors and acceptors

### 1. Download the trained model 

In [19]:
pip install wget



#### Download the prediction model

In [1]:
import wget
url = r"https://github.com/JinYSun/DeepDAP/releases/download/v1.0.0/test.ckpt"

In [5]:
wget.download(url,"DeepDAP/OSC/test.ckpt")

100% [........................................................................] 81596523 / 81596523

'DeepDAP/OSC/test (1).ckpt'
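The returned path `test (1).ckpt` suggests that a `test.ckpt` already existed at the destination, so the new download was saved under a different name. The sketch below is one possible way to avoid such duplicates: it skips the download when the target file is already present. The helper name `download_if_missing` and its `fetch` parameter are illustrative and not part of DeepDAP; it uses the standard-library `urllib.request.urlretrieve` instead of `wget`.

```python
import os
import urllib.request


def download_if_missing(url, dest, fetch=urllib.request.urlretrieve):
    """Download url to dest unless dest already exists; return the path used.

    Hypothetical helper, not part of DeepDAP. `fetch` is any callable that
    takes (url, dest) and writes the file; it defaults to urlretrieve.
    """
    if os.path.exists(dest):
        # Reuse the existing file instead of creating a renamed copy.
        return dest
    # Create the target directory if needed ("." when dest has no directory part).
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    fetch(url, dest)
    return dest
```

With the `url` defined above, `download_if_missing(url, "DeepDAP/OSC/test.ckpt")` would fetch the checkpoint only on the first call. Delete the existing file first if a fresh copy is needed.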

### It is recommended to retrain the model and run the calculations on a supercomputer!

### 2. Predict for large-scale screening

Put the dataset containing the SMILES of the donors and acceptors in `dataset/OSC/test.csv`.
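Before starting a long screening run, it may help to check that the CSV file is complete. The sketch below reads a CSV with the standard `csv` module and reports the header, the number of data rows, and the rows that have an empty cell or a different number of cells than the header. The function name `inspect_csv` is hypothetical, and the column layout that `screen.smiles_aas_test` expects is not stated here, so the check deliberately makes no assumption about column names.

```python
import csv


def inspect_csv(path):
    """Return (header, n_rows, incomplete) for a CSV file.

    `incomplete` lists the 1-based data-row numbers that contain an empty
    cell or whose cell count differs from the header. Hypothetical helper,
    not part of DeepDAP.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no lines
        n_rows = 0
        incomplete = []
        for n_rows, row in enumerate(reader, start=1):
            if len(row) != len(header) or any(not cell.strip() for cell in row):
                incomplete.append(n_rows)
    return header, n_rows, incomplete
```

Calling `inspect_csv("dataset/OSC/test.csv")` and comparing the returned header with the layout the screening code expects is a quick sanity check; rows listed in `incomplete` are worth fixing or removing first.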

In [1]:
from DeepDAP import screen

Some weights of the model checkpoint at DeepChem/ChemBERTa-10M-MLM were not used when initializing RobertaModel: ['lm_head.decoder.weight', 'lm_head.decoder.bias', 'lm_head.layer_norm.bias', 'lm_head.bias', 'lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.dense.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaModel were not initialized from the model checkpoint at DeepChem/ChemBERTa-10M-MLM and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be 

In [2]:
x = screen.smiles_aas_test(r"DeepDAP\dataset\OSC\test.csv")


100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:19<00:00,  4.83s/it]


### 3. Predict the PCE by using D/A pairs

In [3]:
from DeepDAP import run

In [5]:
a = run.smiles_adp_test('CCCCC(CC)CC1=C(F)C=C(C2=C3C=C(C4=CC=C(C5=C6C(=O)C7=C(CC(CC)CCCC)SC(CC(CC)CCCC)=C7C(=O)C6=C(C6=CC=C(C)S6)S5)S4)SC3=C(C3=CC(F)=C(CC(CC)CCCC)S3)C3=C2SC(C)=C3)S1', 'CCCCC(CC)CC1=CC=C(C2=C3C=C(C)SC3=C(C3=CC=C(CC(CC)CCCC)S3)C3=C2SC(C2=CC4=C(C5=CC(Cl)=C(CC(CC)CCCC)S5)C5=C(C=C(C)S5)C(C5=CC(Cl)=C(CC(CC)CCCC)S5)=C4S2)=C3)S1')
print(a)

7.416102886199951
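To compare a handful of candidate pairs, the single-pair call can be wrapped in a small ranking loop. The sketch below is a minimal illustration, assuming the prediction function takes two SMILES strings in the same order as the call above and returns one number, as the printed output suggests. `rank_pairs` is a hypothetical helper, not part of DeepDAP; the predictor is passed in as an argument so the helper can be tried with any callable.

```python
def rank_pairs(pairs, predict):
    """Score each (first_smiles, second_smiles) pair and sort by score, highest first.

    `predict` is any callable taking the two SMILES strings and returning a
    number, for example run.smiles_adp_test (assumed to return a single value).
    """
    scored = [
        (first, second, float(predict(first, second)))
        for first, second in pairs
    ]
    # Highest predicted PCE first.
    return sorted(scored, key=lambda item: item[2], reverse=True)
```

With the model loaded, `rank_pairs(pairs, run.smiles_adp_test)` would return the pairs ordered by predicted PCE. For large candidate lists, the CSV-based screening in step 2 is probably the better fit.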


## Acknowledgement

Jinyu Sun 

E-mail: jinyusun@csu.edu.cn