# DeepAcceptor: Deep learning-based design and screening of non-fullerene acceptor materials for organic solar cells

Developing affordable, high-performance organic photovoltaic materials is time-consuming and costly. Reliable computational methods for predicting the power conversion efficiency (PCE) are crucial for triaging unpromising molecules in large-scale databases and accelerating the material discovery process. In this study, a deep learning-based framework (DeepAcceptor) was built to design and discover highly efficient small-molecule acceptor materials. Specifically, an experimental dataset was constructed by collecting data from publications. Then, a BERT-based model was customized to predict PCEs by taking full advantage of the atom, bond, and connection information in the molecular structures of acceptors; this customized architecture is termed abcBERT. The molecular graph was used as the input, and the computational molecules and experimental molecules were used to pre-train and fine-tune the model, respectively.
DeepAcceptor is a promising method to predict the PCE and speed up the discovery of high-performance acceptor materials.

This notebook walks through the whole process on a toy dataset.
It is meant to test that the code works.
All parameters are set to small values to show how abcBERT works.

## Dataset preparation

### Download the pre-trained and fine-tuned models

In [4]:
pip install wget

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Note: you may need to restart the kernel to use updated packages.


In [5]:
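import os
import wget
url = r"https://github.com/JinYSun/DeepAcceptor/releases/download/v1.0.0/data.h5"
os.makedirs("regression_weights", exist_ok=True)  # create the target folder if it is missing
wget.download(url, "regression_weights/data.h5")  # weights stored under regression_weights/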

'regression_weights/data (1).h5'
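
The returned path `regression_weights/data (1).h5` indicates that a file named `data.h5` was already present, so `wget` saved the new copy under a different name. The following is a minimal sketch of one way to avoid such duplicates, assuming it is acceptable to reuse an existing file. It relies on the standard-library `urllib.request.urlretrieve` instead of `wget`; the helper name `download_if_missing` and its `fetch` parameter are hypothetical and not part of DeepAcceptor.

```python
import os
import urllib.request


def download_if_missing(url, dest, fetch=urllib.request.urlretrieve):
    """Download `url` to `dest` unless `dest` already exists; return `dest`.

    `fetch` is any callable taking (url, path). It is a parameter only so
    that the helper can be exercised without network access (assumption).
    """
    if os.path.exists(dest):
        # Reuse the existing file instead of creating "data (1).h5".
        return dest
    parent = os.path.dirname(dest)
    if parent:
        os.makedirs(parent, exist_ok=True)
    tmp = dest + ".part"
    fetch(url, tmp)
    # Rename only after the download finished, so an interrupted
    # download never leaves a file under the final name.
    os.replace(tmp, dest)
    return dest
```

With this helper, calling `download_if_missing(url, "regression_weights/data.h5")` a second time returns the existing path without downloading again.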

In [6]:
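import os
# BERT weights and the matching encoder weights from the v1.0.0 release
url1 = r"https://github.com/JinYSun/DeepAcceptor/releases/download/v1.0.0/bert_weightsMedium_80.h5"
url2 = r"https://github.com/JinYSun/DeepAcceptor/releases/download/v1.0.0/bert_weights_encoderMedium_80.h5"
os.makedirs("medium_weights", exist_ok=True)  # create the target folder if it is missing
wget.download(url1, "medium_weights/bert_weightsMedium_80.h5")
wget.download(url2, "medium_weights/bert_weights_encoderMedium_80.h5")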
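
A download can end with a truncated file or an HTML error page saved under the expected name, which only surfaces later when the weights are loaded. The sketch below is one way to run a quick sanity check first. It assumes the `.h5` files are plain HDF5 files whose signature sits at byte offset 0 (no user block); the function names `looks_like_hdf5` and `invalid_weight_files` are hypothetical and not part of DeepAcceptor.

```python
import os

# The 8-byte signature that starts an HDF5 file (assuming no user block).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file at `path` starts with the HDF5 signature."""
    with open(path, "rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


def invalid_weight_files(paths):
    """Return the paths that are missing or do not look like HDF5 files."""
    return [p for p in paths
            if not os.path.isfile(p) or not looks_like_hdf5(p)]
```

For example, `invalid_weight_files(["regression_weights/data.h5", "medium_weights/bert_weightsMedium_80.h5", "medium_weights/bert_weights_encoderMedium_80.h5"])` should return an empty list when all three downloads succeeded. This check only inspects the first bytes and does not prove the weights are complete.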


 14% [..........                                                              ]  2523136 / 17095616 14% [..........                                                              ]  2531328 / 17095616 14% [..........                                                              ]  2539520 / 17095616 14% [..........                                                              ]  2547712 / 17095616 14% [..........                                                              ]  2555904 / 17095616 14% [..........                                                              ]  2564096 / 17095616 15% [..........                                                              ]  2572288 / 17095616 15% [..........                                                              ]  2580480 / 17095616 15% [..........                                                              ]  2588672 / 17095616 15% [..........                                                              ]  2596864 / 17095616

 21% [...............                                                         ]  3686400 / 17095616 21% [...............                                                         ]  3694592 / 17095616 21% [...............                                                         ]  3702784 / 17095616 21% [...............                                                         ]  3710976 / 17095616 21% [...............                                                         ]  3719168 / 17095616 21% [...............                                                         ]  3727360 / 17095616 21% [...............                                                         ]  3735552 / 17095616 21% [...............                                                         ]  3743744 / 17095616 21% [...............                                                         ]  3751936 / 17095616 21% [...............                                                         ]  3760128 / 17095616

 35% [.........................                                               ]  6127616 / 17095616 35% [.........................                                               ]  6135808 / 17095616 35% [.........................                                               ]  6144000 / 17095616 35% [.........................                                               ]  6152192 / 17095616 36% [.........................                                               ]  6160384 / 17095616 36% [.........................                                               ]  6168576 / 17095616 36% [..........................                                              ]  6176768 / 17095616 36% [..........................                                              ]  6184960 / 17095616 36% [..........................                                              ]  6193152 / 17095616 36% [..........................                                              ]  6201344 / 17095616

 42% [..............................                                          ]  7274496 / 17095616 42% [..............................                                          ]  7282688 / 17095616 42% [..............................                                          ]  7290880 / 17095616 42% [..............................                                          ]  7299072 / 17095616 42% [..............................                                          ]  7307264 / 17095616 42% [..............................                                          ]  7315456 / 17095616 42% [..............................                                          ]  7323648 / 17095616 42% [..............................                                          ]  7331840 / 17095616 42% [..............................                                          ]  7340032 / 17095616 42% [..............................                                          ]  7348224 / 17095616

 48% [..................................                                      ]  8257536 / 17095616 48% [..................................                                      ]  8265728 / 17095616 48% [..................................                                      ]  8273920 / 17095616 48% [..................................                                      ]  8282112 / 17095616 48% [..................................                                      ]  8290304 / 17095616 48% [..................................                                      ]  8298496 / 17095616 48% [..................................                                      ]  8306688 / 17095616 48% [...................................                                     ]  8314880 / 17095616 48% [...................................                                     ]  8323072 / 17095616 48% [...................................                                     ]  8331264 / 17095616

 53% [......................................                                  ]  9224192 / 17095616 54% [......................................                                  ]  9232384 / 17095616 54% [......................................                                  ]  9240576 / 17095616 54% [......................................                                  ]  9248768 / 17095616 54% [......................................                                  ]  9256960 / 17095616 54% [.......................................                                 ]  9265152 / 17095616 54% [.......................................                                 ]  9273344 / 17095616 54% [.......................................                                 ]  9281536 / 17095616 54% [.......................................                                 ]  9289728 / 17095616 54% [.......................................                                 ]  9297920 / 17095616

 59% [..........................................                              ] 10125312 / 17095616 59% [..........................................                              ] 10133504 / 17095616 59% [..........................................                              ] 10141696 / 17095616 59% [..........................................                              ] 10149888 / 17095616 59% [..........................................                              ] 10158080 / 17095616 59% [..........................................                              ] 10166272 / 17095616 59% [..........................................                              ] 10174464 / 17095616 59% [..........................................                              ] 10182656 / 17095616 59% [..........................................                              ] 10190848 / 17095616 59% [..........................................                              ] 10199040 / 17095616

 64% [..............................................                          ] 11108352 / 17095616 65% [..............................................                          ] 11116544 / 17095616 65% [..............................................                          ] 11124736 / 17095616 65% [..............................................                          ] 11132928 / 17095616 65% [..............................................                          ] 11141120 / 17095616 65% [..............................................                          ] 11149312 / 17095616 65% [..............................................                          ] 11157504 / 17095616 65% [...............................................                         ] 11165696 / 17095616 65% [...............................................                         ] 11173888 / 17095616 65% [...............................................                         ] 11182080 / 17095616

 72% [...................................................                     ] 12337152 / 17095616 72% [...................................................                     ] 12345344 / 17095616 72% [....................................................                    ] 12353536 / 17095616 72% [....................................................                    ] 12361728 / 17095616 72% [....................................................                    ] 12369920 / 17095616 72% [....................................................                    ] 12378112 / 17095616 72% [....................................................                    ] 12386304 / 17095616 72% [....................................................                    ] 12394496 / 17095616 72% [....................................................                    ] 12402688 / 17095616 72% [....................................................                    ] 12410880 / 17095616

 78% [........................................................                ] 13484032 / 17095616 78% [........................................................                ] 13492224 / 17095616 78% [........................................................                ] 13500416 / 17095616 79% [........................................................                ] 13508608 / 17095616 79% [........................................................                ] 13516800 / 17095616 79% [........................................................                ] 13524992 / 17095616 79% [........................................................                ] 13533184 / 17095616 79% [.........................................................               ] 13541376 / 17095616 79% [.........................................................               ] 13549568 / 17095616 79% [.........................................................               ] 13557760 / 17095616

 86% [.............................................................           ] 14712832 / 17095616 86% [.............................................................           ] 14721024 / 17095616 86% [..............................................................          ] 14729216 / 17095616 86% [..............................................................          ] 14737408 / 17095616 86% [..............................................................          ] 14745600 / 17095616 86% [..............................................................          ] 14753792 / 17095616 86% [..............................................................          ] 14761984 / 17095616 86% [..............................................................          ] 14770176 / 17095616 86% [..............................................................          ] 14778368 / 17095616 86% [..............................................................          ] 14786560 / 17095616

 92% [..................................................................      ] 15843328 / 17095616 92% [..................................................................      ] 15851520 / 17095616 92% [..................................................................      ] 15859712 / 17095616 92% [..................................................................      ] 15867904 / 17095616 92% [..................................................................      ] 15876096 / 17095616 92% [..................................................................      ] 15884288 / 17095616 92% [..................................................................      ] 15892480 / 17095616 93% [..................................................................      ] 15900672 / 17095616 93% [...................................................................     ] 15908864 / 17095616 93% [...................................................................     ] 15917056 / 17095616

 99% [....................................................................... ] 17072128 / 17095616 99% [....................................................................... ] 17080320 / 17095616 99% [....................................................................... ] 17088512 / 17095616100% [........................................................................] 17095616 / 17095616

'medium_weights/bert_weights_encoderMedium_80 (1).h5'

The atom types and bond information were calculated using RDKit. The training, test, and validation datasets are preprocessed by running utils.py.

In [2]:
import utils

import os
from collections import OrderedDict

import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import rdchem

from compound_constants import DAY_LIGHT_FG_SMARTS_LIST


from utils import mol_to_geognn_graph_data_MMFF3d

In [3]:
import pandas as pd
from tqdm import tqdm

# Featurize the training set: one adjacency-matrix file per molecule plus an index CSV.
f = pd.read_csv(r"data/reg/train.csv")
rows = []
pce = f['PCE']
for ind, smile in enumerate(f.iloc[:, 1]):  # column 1 is read as the SMILES string
    atom, adj = mol_to_geognn_graph_data_MMFF3d(smile)
    np.save('data/reg/train/adj' + str(ind) + '.npy', np.array(adj))
    rows.append([atom, 'data/reg/train/adj' + str(ind) + '.npy', pce[ind]])
r = pd.DataFrame(rows)
r.to_csv('data/reg/train/train.csv')
print('done')
    

done


In [4]:
# Featurize the test set in the same way as the training set.
f = pd.read_csv(r"data/reg/test.csv")
rows = []
pce = f['PCE']
for ind, smile in enumerate(f.iloc[:, 1]):  # column 1 is read as the SMILES string
    atom, adj = mol_to_geognn_graph_data_MMFF3d(smile)
    np.save('data/reg/test/adj' + str(ind) + '.npy', np.array(adj))
    rows.append([atom, 'data/reg/test/adj' + str(ind) + '.npy', pce[ind]])
r = pd.DataFrame(rows)
r.to_csv('data/reg/test/test.csv')
print('done')

done


In [6]:
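# Featurize the molecules in data/chem1.txt (no PCE column is read here).
f = pd.read_table('data/chem1.txt')
rows = []
for ind, smile in enumerate(f.iloc[:, 0]):  # column 0 is read as the SMILES string
    print(ind)
    atom, adj = mol_to_geognn_graph_data_MMFF3d(smile)
    np.save('data/adj/' + str(ind) + '.npy', np.array(adj))
    rows.append([atom, 'data/adj/' + str(ind) + '.npy'])
# Write the index once, after all molecules have been processed.
r = pd.DataFrame(rows)
r.to_csv('data/adj/re.csv')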

0
1
2
3
4
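
The preprocessing cells above repeat one pattern: featurize each SMILES string, save its adjacency matrix, and record an index row. As a minimal sketch, the loop could be factored into a helper. The name `build_index` and its `featurize` and `save_matrix` arguments are hypothetical and not part of the repository; in practice one would presumably pass `mol_to_geognn_graph_data_MMFF3d` and a small wrapper around `np.save`. Skipping molecules whose featurization raises an exception is an assumption of this sketch, not behaviour of the original cells.

```python
def build_index(smiles_list, featurize, save_matrix, out_dir, targets=None):
    """Featurize molecules and return (index rows, indices that failed).

    featurize(smiles) -> (atoms, adjacency) and save_matrix(path, adjacency)
    are injected so this sketch does not depend on RDKit or NumPy.
    """
    rows, failed = [], []
    for ind, smiles in enumerate(smiles_list):
        try:
            atoms, adj = featurize(smiles)
        except Exception:
            # Assumption: a molecule that cannot be featurized is skipped.
            failed.append(ind)
            continue
        path = out_dir + "/adj" + str(ind) + ".npy"
        save_matrix(path, adj)
        row = [atoms, path]
        if targets is not None:
            row.append(targets[ind])  # e.g. the PCE value for labeled data
        rows.append(row)
    return rows, failed
```

The returned `rows` have the same layout as the lists written to `train.csv` and `test.csv` above, so they could be passed to `pd.DataFrame` unchanged.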


## Pre-Training

First, each SMILES string was converted into a molecular graph using RDKit. Then, a supernode was added and connected to all atoms in the molecule. The model was pretrained with a masked-atom prediction task, analogous to the masked language model (MLM) task in NLP. As shown in Figure 1, the pretrained model consists of an embedding layer, transformer encoder layers, and classification layers, and is used to predict the masked atoms. The computational molecules were represented as embeddings, including word token embeddings and positional embeddings, which were then fed into the transformer encoder layers. Specifically, 15% of the atoms in a molecule were randomly selected; each selected atom had an 80% probability of being replaced by [MASK], a 10% probability of being replaced by another atom, and a 10% probability of being left unchanged. In the pretraining stage, classification linear layers were added on top of the transformer encoder layers to predict the types of the masked atoms, with the original molecules serving as the ground truth.
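
The supernode and the 15% / 80-10-10 masking scheme described above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in `pretrain.py`: the token name `<global>`, the function names, the list-of-lists adjacency format, and the rule of always masking at least one atom are all hypothetical.

```python
import random

MASK = "[MASK]"
SUPERNODE = "<global>"  # hypothetical token name for the supernode


def add_supernode(atoms, adjacency):
    """Prepend a supernode that is connected to every atom."""
    n = len(atoms)
    new_atoms = [SUPERNODE] + list(atoms)
    # Row and column 0 belong to the supernode: linked to all atoms, not to itself.
    new_adj = [[0] + [1] * n]
    for row in adjacency:
        new_adj.append([1] + list(row))
    return new_atoms, new_adj


def mask_atoms(atoms, vocab, rng, select_frac=0.15):
    """Return (corrupted atoms, labels); labels are None for unselected positions."""
    corrupted = list(atoms)
    labels = [None] * len(atoms)
    candidates = [i for i, a in enumerate(atoms) if a != SUPERNODE]
    if not candidates:
        return corrupted, labels
    # Assumption: select 15% of the atoms, but always at least one.
    k = max(1, round(select_frac * len(candidates)))
    for i in rng.sample(candidates, k):
        labels[i] = atoms[i]  # the original atom type is the training target
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK  # 80%: replace with [MASK]
        elif roll < 0.9:
            # 10%: replace with a different atom type from the vocabulary
            corrupted[i] = rng.choice([a for a in vocab if a != atoms[i]])
        # remaining 10%: leave the atom unchanged
    return corrupted, labels


rng = random.Random(0)  # fixed seed so the sketch is repeatable
atoms, adj = add_supernode(["C", "C", "O"], [[0, 1, 0], [1, 0, 1], [0, 1, 0]])
corrupted, labels = mask_atoms(atoms, ["C", "N", "O", "S"], rng)
print(corrupted, labels)
```

Only positions with a non-`None` label would contribute to the classification loss; the supernode is never masked in this sketch.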

### It is recommended to run pre-training on a supercomputer!

In [7]:
import pretrain

In [8]:
pretrain.main()

Epoch 1 Batch 0 Loss 0.7081
Accuracy: 0.0000
Test Accuracy: 0.0000
medium_weights/bert_weightsMedium_1.h5
Epoch 1 Loss 0.7081
Time taken for 1 epoch: 0.8134782314300537 secs

Accuracy: 0.0000
Saving checkpoint
Epoch 2 Batch 0 Loss nan
Accuracy: 0.0000
Test Accuracy: 0.0000
medium_weights/bert_weightsMedium_2.h5
Epoch 2 Loss nan
Time taken for 1 epoch: 0.3538072109222412 secs

Accuracy: 0.0000
Saving checkpoint


The pretrained weights can then be used to fine-tune the model.

## Train

The pre-trained model is fine-tuned on the experimental dataset so that it can predict the PCE of new NFA materials.

In [9]:
import regression

In [10]:
    result =[]
    r2_list = []
    for seed in [24]:
        print(seed)
        r2 ,prediction_val,prediction_test= regression.main(seed)
        result.append(prediction_val)
        r2_list.append(r2)
    print(r2_list)

24
data
load_wieghts
best r2: 0.1220
best r2: 0.1220
stopping_monitor: 1
The model has been trained
[0.122]
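
The log above reports `best r2`. The notebook does not define this metric; assuming it is the usual coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$, a minimal reference computation looks like the sketch below. The function name `r2_score` is chosen for illustration and is not taken from `regression.py`.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1 - ss_res / ss_tot
```

Under this definition, a value of 1 means perfect predictions, 0 means no better than predicting the mean PCE, and negative values mean worse than the mean. A value such as 0.122 is then only slightly better than the mean baseline, which is plausible for a toy run with deliberately small settings.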


## Predict

Prediction on a large-scale dataset

In [11]:
import predict
from predict import *

In [13]:
import sys
np.set_printoptions(threshold=sys.maxsize)
prediction_val= main()

data
finish!  Results can be found in abcBERT/results.csv


Prediction for a single molecule

In [15]:
import predictbysmiles

from predictbysmiles import *


In [18]:
prediction_val = predictbysmiles.main ('CCCCCCCCC1=CC=C(C2(C3=CC=C(CCCCCCCC)C=C3)C3=CC4=C(C=C3C3=C2C2=C(C=C(C5=CC=C(/C=C6/C(=O)C7=C(C=CC=C7)C6=C(C#N)C#N)C6=NSN=C56)S2)S3)C(C2=CC=C(CCCCCCCC)C=C2)(C2=CC=C(CCCCCCCC)C=C2)C2=C4SC3=C2SC(C2=CC=C(/C=C4\C(=O)C5=C(C=CC=C5)C4=C(C#N)C#N)C4=NSN=C24)=C3)C=C1')

[10.348401]


## Acknowledgement

Jinyu Sun 

E-mail: jinyusun@csu.edu.cn