# DeepAcceptor: Deep learning-based design and screening of non-fullerene acceptor materials for organic solar cells

Developing affordable, high-performance organic photovoltaic materials is a time-consuming and costly process. Reliable computational methods for predicting the power conversion efficiency (PCE) are crucial for triaging unpromising molecules in large-scale databases and accelerating material discovery. In this study, a deep learning-based framework (DeepAcceptor) was built to design and discover high-efficiency small-molecule acceptor materials. Specifically, an experimental dataset was constructed by collecting data from publications. Then, a BERT-based model was customized to predict PCEs by taking full advantage of the atom, bond, and connection information in the molecular structures of acceptors; this customized architecture is termed abcBERT. The molecular graph was used as the input, and computational molecules and experimental molecules were used to pre-train and fine-tune the model, respectively. DeepAcceptor is a promising method for predicting the PCE and speeding up the discovery of high-performance acceptor materials.

## Here, we show how to use the model to predict the PCE of non-fullerene acceptors (NFAs).

### 1. Download the pretrained and finetuned model 

In [19]:
pip install wget


#### download the prediction model

In [20]:
import wget
url = r"https://github.com/JinYSun/DeepAcceptor/releases/download/v1.0.0/data.h5"

In [21]:
wget.download(url,"regression_weights/data.h5")

100% [########################################################################]           16M / 16M

'regression_weights/data.h5'

#### download the pretrained model

In [22]:
url1 = r"https://github.com/JinYSun/DeepAcceptor/releases/download/v1.0.0/bert_weightsMedium_80.h5"
url2 = r"https://github.com/JinYSun/DeepAcceptor/releases/download/v1.0.0/bert_weights_encoderMedium_80.h5"
wget.download(url1,"medium_weights/bert_weightsMedium_80.h5")
wget.download(url2,"medium_weights/bert_weights_encoderMedium_80.h5")

100% [########################################################################]           16M / 16M

'medium_weights/bert_weights_encoderMedium_80.h5'
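The download calls above write into `regression_weights/` and `medium_weights/`, so those folders need to exist first. The following is a minimal sketch of one way to guard against a missing folder and to avoid repeating a download. The helper names `prepare_destination` and `download_if_missing` are hypothetical and not part of DeepAcceptor; the downloader is passed in as a parameter so that `wget.download` can be supplied.

```python
from pathlib import Path


def prepare_destination(path):
    """Create the parent folder of `path` if needed and return the path as a string."""
    dest = Path(path)
    dest.parent.mkdir(parents=True, exist_ok=True)
    return str(dest)


def download_if_missing(url, path, downloader):
    """Download `url` to `path` unless the file already exists.

    `downloader` is any callable taking (url, destination),
    for example wget.download from the cells above.
    """
    dest = prepare_destination(path)
    if not Path(dest).exists():
        downloader(url, dest)
    return dest
```

With the `wget` package imported as above, a call might look like `download_if_missing(url, "regression_weights/data.h5", wget.download)`.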

## Note: Retraining the model is highly recommended to ensure the accuracy of the model!

### 2. Dataset preparation

The atom, bond, and connection information is calculated with RDKit. The training, test, and validation datasets are preprocessed by running utils.py.
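To illustrate what connection information means for a molecular graph, here is a minimal, dependency-free sketch that builds an adjacency matrix from a hand-written bond list. It is not the code in utils.py, which derives these quantities with RDKit; the function name `build_adjacency` and the ethanol example are assumptions chosen for illustration.

```python
def build_adjacency(num_atoms, bonds):
    """Return a symmetric 0/1 adjacency matrix (list of lists) for an undirected graph."""
    adj = [[0] * num_atoms for _ in range(num_atoms)]
    for i, j in bonds:
        adj[i][j] = 1
        adj[j][i] = 1
    return adj


# Ethanol (SMILES "CCO") written out by hand: heavy atoms and the bonds between them.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

adj = build_adjacency(len(atoms), bonds)
print(atoms)  # ['C', 'C', 'O']
print(adj)    # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

Each row of the matrix lists the neighbours of one atom, so the middle carbon (row 1) is connected to both the other carbon and the oxygen.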

The preprocessing of the test dataset is shown below.

In [3]:
import utils

import os
from collections import OrderedDict

import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import rdchem

from compound_constants import DAY_LIGHT_FG_SMARTS_LIST


from utils import mol_to_geognn_graph_data_MMFF3d

In [6]:
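import pandas as pd

# Read the test set: the first column holds the SMILES strings, 'PCE' the target values.
f = pd.read_csv(r"data/reg/test.csv")
records = []
pce = f['PCE']
for ind, smile in enumerate(f.iloc[:, 0]):
    # Compute the atom and adjacency data for this molecule.
    atom, adj = mol_to_geognn_graph_data_MMFF3d(smile)
    adj_path = 'data/reg/test/adj' + str(ind) + '.npy'
    np.save(adj_path, np.array(adj))
    records.append([atom, adj_path, pce[ind]])
# Save the atom data, the adjacency file paths, and the PCE values as one table.
r = pd.DataFrame(records)
r.to_csv('data/reg/test/test.csv')
print('Done!')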

Done!
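The preprocessing loop assumes that the first column of the input file holds SMILES strings and that a `PCE` column with numeric values exists. As a rough sketch, a file could be checked against that layout before preprocessing with the standard library alone. The function name `check_dataset`, the `SMILES` header, and the sample rows are made-up placeholders, not data from the DeepAcceptor dataset.

```python
import csv
import io


def check_dataset(handle):
    """Return a list of problems found in a CSV with SMILES in column 0 and a 'PCE' column."""
    reader = csv.reader(handle)
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    if "PCE" not in header:
        return ["missing 'PCE' column"]
    pce_idx = header.index("PCE")
    problems = []
    for row_num, row in enumerate(reader, start=1):
        if not row or not row[0].strip():
            problems.append(f"row {row_num}: empty SMILES")
            continue
        try:
            float(row[pce_idx])
        except (ValueError, IndexError):
            problems.append(f"row {row_num}: PCE is not a number")
    return problems


# Placeholder rows for illustration only: one valid, one without SMILES, one with a bad PCE.
sample = io.StringIO("SMILES,PCE\nCCO,1.0\n,2.0\nc1ccccc1,abc\n")
print(check_dataset(sample))
```

For a real file, the same function could be called as `check_dataset(open("data/reg/test.csv", newline=""))`; an empty list means no problems were found.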


### It is recommended to retrain the model and run the calculations on a supercomputer!

### 3. Predict

In [2]:
import predict
from predict import *


In [6]:
result = []
for seed in [24]:
    print(seed)
    # Predict the PCE values for the test set preprocessed in step 2.
    prediction_val = main('reg/test/test')
    result.append(prediction_val)

print(result)

24
data
[array([10.609959 ,  7.0304646,  5.202743 ,  7.549271 ], dtype=float32)]
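Once predictions are available, they can be compared with the measured PCE values. The sketch below computes the mean absolute error, the root-mean-square error, and the coefficient of determination in plain Python. The function name `regression_metrics` is hypothetical, and the numbers in the usage lines are toy values for illustration, not results from DeepAcceptor.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 for two equally long sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 is undefined when all measured values are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Toy values only: three "measured" numbers and three "predicted" numbers.
metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(metrics)
```

In practice, `y_true` would be the `PCE` column of the test file and `y_pred` the array returned by `main`, converted to a list.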


## Acknowledgement

Jinyu Sun 

E-mail: jinyusun@csu.edu.cn