# DeepDonor: Computational discovery of donor materials with high power conversion efficiency for organic solar cells

DeepDonor is a deep learning method for discovering high-performance donor materials for organic solar cells. It predicts their power conversion efficiency (PCE) with the quantum deep field (QDF) model, which has excellent extrapolation performance.

## Dataset preparation

To evaluate the models across different PCE intervals, stratified sampling was performed with scikit-learn. The data in each dataset were divided into 18 intervals based on their PCE values, and within each interval the samples were split randomly and independently into training, validation and test sets at a ratio of 8:1:1.
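The per-interval split described above can be sketched as follows. This is a minimal illustration, not the code in `data_preprocess`: it uses only the Python standard library instead of scikit-learn, assumes the 18 intervals are of equal width (the text does not say how the interval edges were chosen), and runs on synthetic PCE values. The function name `stratified_split` is hypothetical.

```python
import random


def stratified_split(pce_values, n_bins=18, ratios=(0.8, 0.1, 0.1), seed=0):
    """Bin samples by PCE, then split each bin independently into train/val/test."""
    rng = random.Random(seed)
    lo, hi = min(pce_values), max(pce_values)
    width = (hi - lo) / n_bins or 1.0  # guard against all-identical values

    # Assumption: equal-width PCE intervals.
    bins = {}
    for idx, pce in enumerate(pce_values):
        b = min(int((pce - lo) / width), n_bins - 1)  # maximum falls in the last bin
        bins.setdefault(b, []).append(idx)

    train, val, test = [], [], []
    for b in sorted(bins):
        members = bins[b]
        rng.shuffle(members)  # random split inside this interval only
        n_train = round(ratios[0] * len(members))
        n_val = round(ratios[1] * len(members))
        train += members[:n_train]
        val += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]
    return train, val, test


# Synthetic PCE values for illustration only (not taken from the SM or P dataset).
demo_rng = random.Random(42)
pce = [demo_rng.uniform(0.0, 18.0) for _ in range(1000)]
train_idx, val_idx, test_idx = stratified_split(pce)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting inside each interval keeps the PCE distribution of the three subsets close to that of the full dataset, so rare high-PCE samples are not left out of the validation or test set by chance. With scikit-learn, `train_test_split` with its `stratify` argument set to the interval labels offers a comparable facility.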

In [1]:
import dataset
from data_preprocess import dataset_process

In [2]:
dataset_process.generate_dataset('dataset/SM/SM.csv', 'train', 'val', 'test')

TRAIN_INDEX: [143  25 888 700 478 482  98 279 814 106 969 626 284 177 366 214 392 820
 380 127 619 225 521 736 351 472 175 169 450 167 683 559 879 477 871 826
 594 567 605 305 271 280 577 331 453 418 452 530 174 297  88 910  93 227
 738  48 562 721 301  11 302 383 391 727 237 759 100 884 754 632 753 492
 215 132 344 273 777 841 101 706 877 467 314 725 346 571 949 737 531 715
 977 839 133 145 315 752 917 845 623 964 370 573 544 788 337 248  61 892
 328 834 150 308 289 139 543 724 744 440 961 205 776 270 373  16 902 518
 958 485 890 194 644 436 540 419 558 252 120 528 209 196 642 296 677 498
 671 937  97 258 934 417 862 309 648 508  45 299 109 750 653 203 743 974
 911 153 461 613 940 123 350 163 402 595 735 779 549 828  81 621 815 412
 728 802 204 372 499 663 539 333 487  51 771 609 295 179 813  24 973 230
 479  71 832  14 449 849 926 405  65 416 894 484 878 180 513  13 908 375
 323 283 491 382 957 144 324 590 646  22  38 806 446 389 235 137 489 378
 210 186  50 608 680 312 597 574 394 6

## Generate 3D coordinates

The simplified molecular-input line-entry system (SMILES) string of each molecule was processed with RDKit to obtain its 3D conformers. The experimental-torsion basic knowledge distance geometry (ETKDG) method (25) was applied to generate conformers by distance geometry and to correct them using torsion-angle preferences. The Merck molecular force field (MMFF) method (26) was then used to further optimize the conformer of each molecule. After conformer optimization, all molecules in the SM and P datasets were represented as 3D coordinates. The atoms and their 3D coordinates served as the input to QDF.
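As a rough sketch of the ETKDG and MMFF steps described above, the snippet below embeds a single conformer with RDKit and relaxes it with MMFF. It is an assumption-laden illustration rather than the code in `data_preprocess.coordinate`: the function names `smiles_to_atoms` and `format_atoms`, the fixed random seed, and the `symbol x y z` text layout are all hypothetical, and the exact file format expected by QDF is not shown in this document.

```python
def smiles_to_atoms(smiles, seed=42):
    """Embed one conformer with ETKDG, relax it with MMFF, return (symbol, x, y, z) rows."""
    # RDKit is imported lazily so that format_atoms below stays usable without it.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse SMILES: {smiles!r}")
    mol = Chem.AddHs(mol)  # explicit hydrogens are needed for a sensible 3D geometry

    params = AllChem.ETKDG()
    params.randomSeed = seed  # assumption: fixed seed for reproducible conformers
    if AllChem.EmbedMolecule(mol, params) < 0:  # -1 signals a failed embedding
        raise RuntimeError(f"ETKDG embedding failed for {smiles!r}")

    # MMFF lacks parameters for some atom types, so check before optimizing.
    if AllChem.MMFFHasAllMoleculeParams(mol):
        AllChem.MMFFOptimizeMolecule(mol)

    conf = mol.GetConformer()
    rows = []
    for atom in mol.GetAtoms():
        pos = conf.GetAtomPosition(atom.GetIdx())
        rows.append((atom.GetSymbol(), pos.x, pos.y, pos.z))
    return rows


def format_atoms(rows):
    """Render rows as 'symbol x y z' lines, one atom per line (hypothetical layout)."""
    return "\n".join(f"{s} {x:.6f} {y:.6f} {z:.6f}" for s, x, y, z in rows)
```

For example, `print(format_atoms(smiles_to_atoms("CCO")))` would list the atoms of ethanol, hydrogens included, with their optimized coordinates. Raising an error when embedding fails makes it straightforward to log and skip problematic molecules in a batch run.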

In [5]:
from data_preprocess import coordinate

In [7]:
coordinate.generae_coordinate('train','test','val')

Output (abridged): the function prints the index of each molecule as it is processed, from 0 to 273 in this run. Two additional lines, `erro48` and `erro232`, are printed after indices 48 and 232; they appear to flag molecules for which coordinate generation failed.

## Training

First, the QDF-SM model was trained on the small molecule donor dataset. Then, the QDF-SM model was fine-tuned on the polymer donor dataset by transfer learning to obtain the QDF-P model.

It is recommended to run the training on a supercomputer.
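The two-stage idea (pretrain on one dataset, then continue training from those weights on another) can be illustrated with a deliberately tiny stand-in. The sketch below is not QDF and not the training code in `model/`: it fits a one-variable linear model by gradient descent on synthetic data, and every name and number in it is an assumption chosen for illustration.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b on the given points."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(w, b, xs, ys, steps, lr=0.1):
    """Plain gradient descent on the MSE, starting from the given (w, b)."""
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


# Synthetic stand-ins: a larger "source" set and a small, shifted "target" set.
src_x = [i / 10 for i in range(11)]
src_y = [2 * x + 1.0 for x in src_x]
tgt_x = [0.0, 0.25, 0.5, 0.75, 1.0]
tgt_y = [2 * x + 1.5 for x in tgt_x]

# Stage 1: pretrain on the source data from a blank start.
w_src, b_src = train(0.0, 0.0, src_x, src_y, steps=2000)

# Stage 2: fine-tune on the target data, starting from the pretrained weights.
w_ft, b_ft = train(w_src, b_src, tgt_x, tgt_y, steps=20)

# Baseline: the same small training budget on the target data, but from scratch.
w_scratch, b_scratch = train(0.0, 0.0, tgt_x, tgt_y, steps=20)

loss_finetuned = mse(w_ft, b_ft, tgt_x, tgt_y)
loss_scratch = mse(w_scratch, b_scratch, tgt_x, tgt_y)
print(f"fine-tuned: {loss_finetuned:.4f}  from scratch: {loss_scratch:.4f}")
```

In this toy setting the fine-tuned model reaches a lower target loss than the from-scratch baseline for the same number of steps, because it starts close to the target solution. That is the motivation for initializing a model on a second dataset from weights learned on a related one.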

In [9]:
cd model

C:\Users\BM109X32G-10GPU-02\Documents\DeepDonor\model


bash preprocess.sh

bash SM.sh

bash DeepP.sh

## Predicting

The trained model can be used to predict the PCE of new donor materials.

The 3D coordinate generation and preprocessing steps are the same as in the training process.

In [None]:
bash Predict.sh

## Acknowledgement

Jinyu Sun 

E-mail: jinyusun@csu.edu.cn