# BiBERTa: Deep learning to accelerate the discovery of donor/acceptor pairs for high-performance organic solar cells

BiBERTa is a deep learning-based framework for discovering new donor/acceptor (D/A) pairs. It consists of three parts: data collection, PCE prediction, and molecular discovery. A large D/A pair dataset was built by collecting experimental data from the literature. A novel RoBERTa-based dual-encoder model (BiBERTa) was then developed to predict the power conversion efficiency (PCE), taking the SMILES of the donor and the acceptor as input. Two pretrained ChemBERTa2 encoders provide the initial parameters of the dual encoder. The model was trained, validated, and tested on the experimental dataset.
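The dual-encoder idea can be illustrated with a minimal sketch. It is not the released BiBERTa architecture: the way the two encoder outputs are combined (concatenating the first-token embeddings and feeding a small regression head), the class name `DualEncoderRegressor`, the helper `tiny_encoder`, and all layer sizes are assumptions made for illustration. The encoders here are tiny and randomly initialized so the example runs without downloading any weights.

```python
import torch
from torch import nn
from transformers import RobertaConfig, RobertaModel


class DualEncoderRegressor(nn.Module):
    """Hypothetical dual encoder: one RoBERTa encoder per molecule, one PCE head."""

    def __init__(self, donor_encoder, acceptor_encoder):
        super().__init__()
        self.donor_encoder = donor_encoder
        self.acceptor_encoder = acceptor_encoder
        hidden = (donor_encoder.config.hidden_size
                  + acceptor_encoder.config.hidden_size)
        # Assumed head: concatenated embeddings -> scalar PCE estimate.
        self.head = nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, donor_ids, donor_mask, acceptor_ids, acceptor_mask):
        # Use the embedding of the first token (<s>) as the molecule representation.
        d = self.donor_encoder(
            input_ids=donor_ids, attention_mask=donor_mask
        ).last_hidden_state[:, 0]
        a = self.acceptor_encoder(
            input_ids=acceptor_ids, attention_mask=acceptor_mask
        ).last_hidden_state[:, 0]
        return self.head(torch.cat([d, a], dim=-1)).squeeze(-1)


def tiny_encoder():
    """Small randomly initialized RoBERTa encoder, for illustration only."""
    config = RobertaConfig(
        vocab_size=100,
        hidden_size=32,
        num_hidden_layers=1,
        num_attention_heads=2,
        intermediate_size=64,
        max_position_embeddings=64,
    )
    return RobertaModel(config, add_pooling_layer=False)


torch.manual_seed(0)
model = DualEncoderRegressor(tiny_encoder(), tiny_encoder()).eval()
ids = torch.randint(2, 100, (2, 8))   # two fake token sequences of length 8
mask = torch.ones_like(ids)
with torch.no_grad():
    pce = model(ids, mask, ids, mask)  # one predicted value per D/A pair
```

To start from pretrained weights instead, the tiny encoders could be replaced with `RobertaModel.from_pretrained(...)` on the ChemBERTa checkpoints that the prediction script downloads; which checkpoint serves the donor and which the acceptor is not stated here.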

## Here, we show how to use the model to predict the PCE of NFA-based D/A pairs.

### 1. Download the pretrained and finetuned model 

In [19]:
pip install wget


#### Download the prediction model

In [5]:
import wget
url = r"https://github.com/JinYSun/biberta/releases/download/v1.0.0/test.ckpt"

In [8]:
wget.download(url,"OSC/test.ckpt")

100% [........................................................................] 81596523 / 81596523

'OSC/test.ckpt'
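The download writes to `OSC/test.ckpt`, so the `OSC` directory most likely has to exist beforehand. The sketch below creates it and checks the downloaded file against the total size reported in the progress output (81596523 bytes). The helper names `prepare_target` and `is_complete` are hypothetical and not part of this repository.

```python
from pathlib import Path

# Total size in bytes, taken from the download progress output above.
EXPECTED_SIZE = 81596523


def prepare_target(path):
    """Create the parent directory of `path` if needed and return it as a Path."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


def is_complete(path, expected_size=EXPECTED_SIZE):
    """Return True if `path` is a file with exactly `expected_size` bytes."""
    target = Path(path)
    return target.is_file() and target.stat().st_size == expected_size
```

A possible usage is to call `target = prepare_target("OSC/test.ckpt")` first and run `wget.download(url, str(target))` only when `is_complete(target)` is false, which also avoids downloading the checkpoint again on later runs.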

### 2. Prediction on a large-scale dataset

In [3]:
import screen

screen.smiles_aas_test(r'dataset/OSC/test.csv')

Some weights of RobertaModel were not initialized from the model checkpoint at DeepChem/ChemBERTa-10M-MLM and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Some weights of RobertaModel were not initialized from the model checkpoint at DeepChem/ChemBERTa-10M-MTR and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[{'acceptor': 'CCCCCCCCC1(CCCCCCCC)c2cc3c(cc2-c2cc4c(cc21)-c1sc(/C=C2\\C(=O)c5ccccc5C2=C(C#N)C#N)cc1C4(CCCCCCCC)CCCCCCCC)C(CCCCCCCC)(CCCCCCCC)c1cc(/C=C2\\C(=O)c4ccccc4C2=C(C#N)C#N)sc1-3',
  'donor': 'CCCCC(CC)Cc1sc(-c2c3cc(-c4ccc(-c5sc(-c6ccc(C)s6)c6c5C(=O)c5c(CC(CC)CCCC)sc(CC(CC)CCCC)c5C6=O)s4)sc3c(-c3cc(F)c(CC(CC)CCCC)s3)c3cc(C)sc23)cc1F',
  'predict': 10.976292610168457},
 {'acceptor': 'CCCCCCc1ccc(C2(c3ccc(CCCCCC)cc3)c3cc4c(cc3-c3sc5cc(/C=C6\\C(=O)c7ccccc7C6=C(C#N)C#N)sc5c32)C(c2ccc(CCCCCC)cc2)(c2ccc(CCCCCC)cc2)c2c-4sc3cc(/C=C4\\C(=O)c5ccccc5C4=C(C#N)C#N)sc23)cc1',
  'donor': 'CCCCCCCCOc1cccc(-c2nc3c(-c4ccc(C)s4)c(F)c(F)c(-c4ccc(-c5cc6c(-c7cc(F)c(CC(CC)CCCC)s7)c7sc(C)cc7c(-c7cc(F)c(CC(CC)CCCC)s7)c6s5)s4)c3nc2-c2cccc(OCCCCCCCC)c2)c1',
  'predict': 8.411056518554688},
 {'acceptor': 'CCCCCCc1ccc(C2(c3ccc(CCCCCC)cc3)c3c(sc4cc(/C=C5\\C(=O)c6cc(F)c(F)cc6C5=C(C#N)C#N)sc34)-c3sc4c5c(sc4c32)-c2sc3cc(/C=C4\\C(=O)c6cc(F)c(F)cc6C4=C(C#N)C#N)sc3c2C5(c2ccc(CCCCCC)cc2)c2ccc(CCCCCC)cc2)cc1',
  'do
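The output above is a list of dictionaries with the keys `acceptor`, `donor`, and `predict`. Assuming `screen.smiles_aas_test` returns this list (the notebook displays it as the cell result), the candidates can be ranked by predicted PCE and saved for later inspection. The helpers `rank_pairs` and `save_pairs` below are hypothetical and only use the standard library.

```python
import csv


def rank_pairs(results, top_k=None):
    """Sort prediction records by predicted PCE, highest first."""
    ranked = sorted(results, key=lambda r: r["predict"], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


def save_pairs(results, path):
    """Write prediction records to a CSV file with one row per D/A pair."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["donor", "acceptor", "predict"])
        writer.writeheader()
        writer.writerows(results)
```

For example, `save_pairs(rank_pairs(results, top_k=100), "top_pairs.csv")` would keep the 100 pairs with the highest predicted PCE, where `results` is the list returned by the prediction call and the file name is arbitrary.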

Users can also use the API to predict the PCE locally, as follows. Note that the Hugging Face Space (web server) must be in the Running state, not Building, before prediction; otherwise the read operation will time out.

### 3. Prediction for a D/A pair from SMILES

In [5]:
import run
a = run.smiles_adp_test(
    'CCCCC(CC)CC1=C(F)C=C(C2=C3C=C(C4=CC=C(C5=C6C(=O)C7=C(CC(CC)CCCC)SC(CC(CC)CCCC)=C7C(=O)C6=C(C6=CC=C(C)S6)S5)S4)SC3=C(C3=CC(F)=C(CC(CC)CCCC)S3)C3=C2SC(C)=C3)S1',
    'CCCCC(CC)CC1=CC=C(C2=C3C=C(C)SC3=C(C3=CC=C(CC(CC)CCCC)S3)C3=C2SC(C2=CC4=C(C5=CC(Cl)=C(CC(CC)CCCC)S5)C5=C(C=C(C)S5)C(C5=CC(Cl)=C(CC(CC)CCCC)S5)=C4S2)=C3)S1',
)
print(a)

7.416102886199951
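Because the read operation can time out while the Hugging Face Space is still building, it may help to wrap the prediction call in a simple retry loop. This is a generic sketch: `call_with_retry` is a hypothetical helper, and which exception the client raises on a timeout is not documented here, so the default catches any `Exception`.

```python
import time


def call_with_retry(fn, *args, retries=3, delay=5.0, exceptions=(Exception,)):
    """Call fn(*args), retrying up to `retries` times with a fixed delay."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return fn(*args)
        except exceptions as err:
            last_error = err
            # Wait before the next attempt, but not after the final one.
            if attempt < retries - 1:
                time.sleep(delay)
    raise last_error
```

With this helper, the call above could be written as `a = call_with_retry(run.smiles_adp_test, smiles_1, smiles_2)`, where `smiles_1` and `smiles_2` stand for the two SMILES strings in the same order as in the original call.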


## Contact

Jinyu Sun 

E-mail: jinyusun@csu.edu.cn