
# <a id='toc1_'></a>[ ](#toc0_)
|
$\LARGE \textbf{\textcolor{red}{PCE: Designing Organic Donor and Acceptor Molecules for Photovoltaic Devices}}$
|
---

 ###  **MVOTO KONGO Patrick Sorrel**, sorrel.mvoto@facsciences-uy1.cm

 
* Department of Physics, Faculty of Science, University of Yaounde I, Ph.D. Candidate
* Atomic, Molecular Physics and Biophysics
  
  
26 January 2025

---

<table width="100%"><tr style="background-color:white;">
    <td style="text-align:left;padding:0px;width:142px">
        <a href="https://qworld.net" target="_blank">
            <img src="images/QC.jpg">
            <img src="images/ML.jpeg"></a></td>
    <td width="*">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</td>
    <!-- ############################################# -->
    <td style="padding:0px;width:40px;">
        <a href="https://github.com/GitNindjapatrick/" target="_blank">
        <img align="right" src="images/github.png" width="40px"></a> </td>
    <td style="padding:0px;width:5px;"></td>
    <td style="padding:0px;width:40px;">
        <a href="https://linkedin.com/in/patrick-sorrel-mvoto-kongo-641a41273" target="_blank">
        <img align="right" src="images/LinkedIn.png"></a></td>
    <td style="padding:0px;width:5px;"></td>
    <!-- ############################################# -->
    <td style="padding:0px;width:40px;">
        <a href="https://discord.MVOTO.net"
           target="_blank">
        <img align="right" src="images/discord.jpeg"></a></td>
</tr></table>


[Assessing the Synthetic Compatibility of GDB-9 Molecules with PCBM and PCDTBT for Organic Solar Cells: A Computational Approach](First_test_result.ipynb) &nbsp;|&nbsp;
[Computational Screening of GDB-9 Molecules for Organic Semiconductors](MVOTO_EXTRACTION_GDB.ipynb)&nbsp;|&nbsp;
[PCE: Designing Organic Donor and Acceptor Molecules for Photovoltaic Devices](PCE_GDB9.ipynb) &nbsp;|&nbsp;
[gdb9: Openbabel xTB-crest and Pyscf Quantum Chemistry Calculations Assessment](MVOTO_PROPRIETES.ipynb)&nbsp;|&nbsp;




## 1. Introduction
Organic photovoltaic (OPV) devices rely on efficient donor-acceptor pairs for charge separation. This study aims to identify suitable organic donor molecules for use with PCBM and acceptor molecules for use with PCDTBT using the GDB-9 database and the Scharber model.

### Objectives:
1. **Donor Molecule Selection**: Identify organic donor molecules compatible with PCBM.
2. **Acceptor Molecule Selection**: Identify organic acceptor molecules compatible with PCDTBT.

---

## 2. Methodology

### 2.1 Selection Criteria
- **HOMO-LUMO gap**: Suitable for efficient charge transfer.
- **Energy Level Alignment**: Favorable alignment with PCBM or PCDTBT.
- **Stability**: Ionization potential (IP) and electron affinity (EA) within acceptable ranges.

### 2.2 Computational Tools
- **Software**: PySCF for DFT calculations.
- **Functional**: B3LYP with 6-31G(2df,p) basis set.
- **Analysis**:
  - HOMO-LUMO gap calculation.
  - Energy level alignment with PCBM and PCDTBT.
  - Binding energy calculations.

### 2.3 Scharber Model
The Scharber model is used to predict the power conversion efficiency (PCE) of OPV devices based on the donor-acceptor pair's electronic properties:
\[ 
\text{PCE} = 100 \times \frac{J_{\text{sc}} \times V_{\text{oc}} \times \text{FF}}{P_{\text{in}}} 
\]
where \( J_{\text{sc}} \) is the short-circuit current density, \( V_{\text{oc}} \) is the open-circuit voltage, FF is the fill factor, and \( P_{\text{in}} \) is the incident light power density.

---

## 3. Results and Discussion

### 3.1 Donor Molecule Selection
- **Energy Level Alignment**: The HOMO of the donor molecule is aligned with the LUMO of PCBM.
- **Charge Transfer Efficiency**: Calculated using the charge transfer integral (J).

### 3.2 Acceptor Molecule Selection
- **Energy Level Alignment**: The LUMO of the acceptor molecule is aligned with the HOMO of PCDTBT.
- **Stability and Interaction**: Binding energies and non-covalent interactions are evaluated.

### 3.3 PCE Prediction Using Scharber Model
- **Short-Circuit Current (\( J_{\text{sc}} \))**: Estimated based on charge transfer efficiency.
- **Open-Circuit Voltage (\( V_{\text{oc}} \))**: Calculated from the difference between the donor's HOMO and the acceptor's LUMO, minus an empirical loss of 0.3 V.
- **Fill Factor (FF)**: Assumed or calculated based on device characteristics.
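
The voltage and alignment rules in this list can be sketched in a few lines. This is a minimal sketch, assuming the reference values that appear in the notebook's calculation cell (a PCBM LUMO of -4.3 eV, a 0.3 V loss, and a minimum LUMO offset of 0.3 eV); the function names `voc_donor_pcbm` and `has_lumo_offset` are hypothetical and chosen only for illustration.

```python
PCBM_LUMO_EV = -4.3        # acceptor LUMO used in the calculation cell
VOC_LOSS_V = 0.3           # empirical voltage loss
MIN_LUMO_OFFSET_EV = 0.3   # minimum donor/acceptor LUMO offset


def voc_donor_pcbm(homo_donor_ev):
    """Open-circuit voltage (V) for a donor paired with PCBM, floored at zero."""
    return max(abs(homo_donor_ev) - abs(PCBM_LUMO_EV) - VOC_LOSS_V, 0.0)


def has_lumo_offset(lumo_donor_ev):
    """True when the donor LUMO lies at least 0.3 eV above the PCBM LUMO."""
    return (lumo_donor_ev - PCBM_LUMO_EV) >= MIN_LUMO_OFFSET_EV


# HOMO and LUMO values copied from one row of the donor results (mol_id 977)
print(voc_donor_pcbm(-5.744323))
print(has_lumo_offset(-3.779661))
```

A molecule that fails the offset test is assigned zero current, and therefore zero PCE, regardless of its voltage.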

---
PCE values for optimized donor-acceptor pairs.

---


## References
1. **Assessing the Synthetic Compatibility of GDB-9 Molecules**: [DOI: 10.1021/acs.jpcb.9b03234](https://doi.org/10.1021/acs.jpcb.9b03234)
2. **Scharber Model for PCE Prediction**: [DOI: 10.1002/adma.200390093](https://doi.org/10.1002/adma.200390093)
3. **GDB-9 Database**: [DOI: 10.1039/C7SC02266K](https://doi.org/10.1039/C7SC02266K)



In [1]:
import pandas as pd

# Load the pickled dataset of molecular properties
df = pd.read_pickle("dataset_opv_properties.pkl")
df


Unnamed: 0,mol_id,SMILES,formula,mass,inchi,HOMO(eV),LUMO(eV),GAP(eV),dipole_moment,mulliken_charges
0,000000001,[O-]C(=O)CC(C[N+](C)(C)C)OC(=O)C,C9H17NO4,203.235580,InChI=1S/C9H17NO4/c1-7(11)14-8(5-9(12)13)6-10(...,-4.609609,-0.261229,4.348379,11.419443,"[-0.542286, 0.622923, -0.486172, -0.478169, 0...."
1,000000003,OC1C=CC=C(C1O)C(=O)O,C7H8O4,156.136020,InChI=1S/C7H8O4/c8-5-3-1-2-4(6(5)9)7(10)11/h1-...,-6.721212,-2.133373,4.587840,5.099241,"[-0.122757, -0.14337, 0.076825, 0.022155, 0.03..."
2,000000004,CC(CN)O,C3H9NO,75.109660,"InChI=1S/C3H9NO/c1-3(5)2-4/h3,5H,2,4H2,1H3",-6.508963,1.844932,8.353895,2.773043,"[-0.463549, 0.167547, -0.158924, -0.741233, -0..."
3,000000005,NCC(=O)COP(=O)(O)O,C3H8NO5P,169.073081,"InChI=1S/C3H8NO5P/c4-1-3(5)2-9-10(6,7)8/h1-2,4...",-7.028701,-1.371454,5.657247,6.730409,"[-0.258052, 0.470183, -0.474284, -0.102044, -0..."
4,000000006,[O-][N+](=O)c1ccc(c(c1)[N+](=O)[O-])Cl,C6H3ClN2O4,202.552020,InChI=1S/C6H3ClN2O4/c7-5-2-1-4(8(10)11)3-6(5)9...,-8.051849,-3.137473,4.914376,3.716931,"[-0.125953, -0.148386, -0.088975, 0.265526, -0..."
...,...,...,...,...,...,...,...,...,...,...
17453,000024994,COC(CC(OC)OC)C,C7H16O3,148.200140,"InChI=1S/C7H16O3/c1-6(8-2)5-7(9-3)10-4/h6-7H,5...",-6.530732,2.231334,8.762066,1.212562,"[-0.467136, 0.146682, -0.337628, 0.390323, -0...."
17454,000024995,N[C@H](C(=O)O)CC(Cl)Cl,C4H7Cl2NO2,172.009880,"InChI=1S/C4H7Cl2NO2/c5-3(6)1-2(7)4(8)9/h2-3H,1...",-7.360680,-0.220412,7.140267,5.064914,"[-0.268207, -0.100945, 0.200411, 0.627104, -0...."
17455,000024997,CN(CCOC1c2ccccc2CCc2c1cccc2)C,C19H23NO,281.392020,InChI=1S/C19H23NO/c1-20(2)13-14-21-19-17-9-5-3...,-5.942966,0.002721,5.945688,1.450867,"[-0.298012, -0.384351, -0.304429, -0.143074, -..."
17456,000024998,CC(=O)OC(CCl)Cl,C4H6Cl2O2,156.995240,"InChI=1S/C4H6Cl2O2/c1-3(7)8-4(6)2-5/h4H,2H2,1H3",-7.923955,-0.568718,7.355237,1.843535,"[-0.549803, 0.634713, -0.440806, -0.441348, 0...."



 ###  <a id='toc1_4_'></a>[Short-circuit current density (${J_{sc}}$)](#toc0_)


 <div  class="alert alert-info">
Depends on the number of photons absorbed by the material, and therefore on the thickness of the active layer and on its absorption spectrum. It represents the maximum electric current that an organic solar cell can generate under short-circuit conditions.

</div> 

\begin{equation}
\begin{split}
J_{sc} = A\,e^{-E_{gap}^{2}/B}
\end{split}
\end{equation}
A and B are fitting parameters already fixed in Tartarus: A = 433.11633173034136 and B = 2.3353220382662894. $E_{gap}$ is the energy gap of the donor, i.e. the difference between its LUMO and HOMO energies. Its maximum value in the CEP work is estimated at $3.8\ \text{eV}$, and the normal range of values is $[0.8856, 3.2627]\ \text{eV}$.
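
The Gaussian expression for $J_{sc}$ can be evaluated directly with NumPy. The sketch below uses the A and B values quoted in this section and assumes the same upper bound on $J_{sc}$ that the calculation cell of this notebook applies; the name `jsc_from_gap` is hypothetical.

```python
import numpy as np

A = 433.11633173034136
B = 2.3353220382662894
JSC_CAP = 415.22529811760637  # upper bound applied in the calculation cell


def jsc_from_gap(gap_ev):
    """Gaussian estimate of Jsc from the energy gap (eV), capped at JSC_CAP."""
    gap_ev = np.asarray(gap_ev, dtype=float)
    return np.minimum(A * np.exp(-gap_ev**2 / B), JSC_CAP)


# A zero gap hits the cap; wide gaps give a vanishing current
gaps = np.array([0.0, 1.0, 1.964662, 4.348379])
print(jsc_from_gap(gaps))
```

Because the exponent is quadratic in the gap, the estimated current falls off very quickly: molecules with gaps above roughly 4 eV contribute almost nothing.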

###  <a id='toc1_4_'></a>[Fill factor (FF)](#toc0_)


 <div  class="alert alert-info">
 
* Reflects the charge-transport capability of the device and the quality of the donor-acceptor interface; it is estimated at 65 %.

* Is proportional to the external quantum efficiency when the absorbed photon energy exceeds ${E_{gap}}$.
</div> 
        

\begin{equation}
\begin{split}
FF = \frac{V_{oc}}{V_{oc} + a\,k_{B}\,T}
\end{split}
\end{equation}

* ${a}$: the absorption coefficient
* ${k_{B}}$: the Boltzmann constant
* ${T}$: the external temperature
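
The expression above can be tried numerically. This is only a sketch under stated assumptions: $k_B T$ is expressed in eV so that it is comparable to $V_{oc}$ in volts, and `a = 12.0` is a placeholder chosen for illustration, not a value taken from this notebook. The function name `fill_factor` is hypothetical. Note that the calculation cell further down uses a different empirical expression based on the normalized voltage.

```python
K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def fill_factor(voc_v, a, temperature_k=300.0):
    """FF = Voc / (Voc + a * k_B * T), with k_B * T expressed in eV."""
    return voc_v / (voc_v + a * K_B_EV_PER_K * temperature_k)


# a = 12.0 is an illustrative placeholder (assumption)
for voc in (0.5, 1.0, 1.5):
    print(voc, round(fill_factor(voc, a=12.0), 3))
```

With this form, FF increases monotonically with $V_{oc}$ and stays below 1 for any positive `a`.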

###  <a id='toc1_4_'></a>[Power conversion efficiency (PCE)](#toc0_)


 <div  class="alert alert-info">
Ratio of the electrical power produced by the cell to the incident light power. Its current value is estimated at 12 %; the target is 20 %.

</div> 
        

\begin{equation}
\begin{split}
PCE = 100\,\frac{V_{oc} \cdot FF \cdot J_{sc}}{P_{in}}
\end{split}
\end{equation}
* ${P_{in}}$: the incident light power
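
As a quick arithmetic check of the PCE expression, the sketch below plugs in the $V_{oc}$, FF and $J_{sc}$ values reported for one donor candidate in this notebook's results (mol_id 977) together with the `Pin` value defined in the next cell. The helper name `pce_percent` is hypothetical.

```python
def pce_percent(voc_v, ff, jsc, p_in):
    """PCE (%) = 100 * Voc * FF * Jsc / P_in."""
    return 100.0 * voc_v * ff * jsc / p_in


P_IN = 900.1393292842149  # value of Pin set in this notebook

# Voc, FF and Jsc copied from the donor results table (mol_id 977)
pce = pce_percent(voc_v=1.144323, ff=0.821455, jsc=82.944656, p_in=P_IN)
print(round(pce, 3))
```

The result agrees with the `pce_pcbm(%)` value listed for that row, which confirms that the tabulated PCE uses the computed fill factor rather than a fixed one.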



In [2]:

import numpy as np
# Define parameters for Scharber model
A = 433.11633173034136
B = 2.3353220382662894
Pin = 900.1393292842149

In [3]:
def gaussian(x, A, B):
    return A * np.exp(-x**2 / B)

In [4]:
# Imports repeated so that this cell can run on its own
import pandas as pd
import numpy as np

# Gaussian fit used by the Scharber model to estimate Jsc from the energy gap
def gaussian(x, A, B):
    return A * np.exp(-x**2 / B)

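# Fill factor from the normalized open-circuit voltage v = Voc / (n * kT/q),
# using the empirical expression FF = (v - ln(v + 0.72)) / (v + 1)
def calculate_ff(voc, n=2, T=300):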
    q = 1.602e-19  # Elementary charge in C
    k = 1.381e-23  # Boltzmann constant in J/K
    vt = k * T / q  # Thermal voltage
    v_oc_norm = voc / (n * vt)
    with np.errstate(divide='ignore', invalid='ignore'):
        ff = (v_oc_norm - np.log(v_oc_norm + 0.72)) / (v_oc_norm + 1)
    ff = np.where(np.isnan(ff), 0.0, ff)  # Handling NaNs
    return np.clip(ff, 0, 1)  # Ensure FF stays in [0,1]

# Updated PCE calculation function
def calculate_voc_pce_jsc_ff(df1, Pin, A, B):
    for i in df1.index:
        # PCBM Case
        voc_1 = max((abs(df1.loc[i, "HOMO(eV)"]) - 4.3) - 0.3, 0.0)
        lumo_offset_1 = df1.loc[i, "LUMO(eV)"] + 4.3

        if lumo_offset_1 < 0.3:
            jsc_1, pce_1, ff_1 = 0.0, 0.0, 0.0
        else:
            jsc_1 = min(gaussian(df1.loc[i, "GAP(eV)"], A, B), 415.22529811760637)
            ff_1 = calculate_ff(voc_1)
            pce_1 = 100 * voc_1 * ff_1 * jsc_1 / Pin

        # PCDTBT Case
        voc_2 = max((5.5 - abs(df1.loc[i, "LUMO(eV)"])) - 0.3, 0.0)
        lumo_offset_2 = -3.6 - df1.loc[i, "LUMO(eV)"]

        if lumo_offset_2 < 0.3:
            jsc_2, pce_2, ff_2 = 0.0, 0.0, 0.0
        else:
            jsc_2 = min(gaussian(df1.loc[i, "GAP(eV)"], A, B), 415.22529811760637)
            ff_2 = calculate_ff(voc_2)
            pce_2 = 100 * voc_2 * ff_2 * jsc_2 / Pin

        # Store values
        df1.loc[i, "voc_pcbm(V)"] = voc_1
        df1.loc[i, "jsc_pcbm(A.m-2)"] = jsc_1
        df1.loc[i, "ff_pcbm"] = ff_1
        df1.loc[i, "pce_pcbm(%)"] = pce_1

        df1.loc[i, "voc_pcdtbt(V)"] = voc_2
        df1.loc[i, "jsc_pcdtbt(A.m-2)"] = jsc_2
        df1.loc[i, "ff_pcdtbt"] = ff_2
        df1.loc[i, "pce_pcdtbt(%)"] = pce_2

    return df1
df2 = calculate_voc_pce_jsc_ff(df, Pin, A, B)

# Earlier variant of the same calculation with the fill factor fixed at 0.65

def calculate_voc_pce_jsc(df1, Pin, A, B):
    """
    This function calculates VOC, PCE, and Jsc for each row in the DataFrame and adds them as new columns.
    The fill factor is fixed at 0.65.

    Args:
        df1 (pandas.DataFrame): Molecule data with 'HOMO(eV)', 'LUMO(eV)' and 'GAP(eV)' columns.
        Pin (float): The incident light power density.
        A (float): Gaussian function parameter A.
        B (float): Gaussian function parameter B.

    Returns:
        pandas.DataFrame: The modified DataFrame with VOC, PCE, and Jsc columns.
        
    """
    for i in df1.index:
        # Scharber model objective 1: Optimization of donor for phenyl-C61-butyric acid methyl ester (PCBM) acceptors
        
        voc_1 = (abs(df1.loc[i,"HOMO(eV)"]) - abs(-4.3)) - 0.3
        if voc_1 < 0.0:
            voc_1 = 0.0
        LUMO_DFT_offset_1 =  df1.loc[i, "LUMO(eV)"]  + 4.3
        if LUMO_DFT_offset_1 < 0.3:
            pce_1 = 0.0
            jsc_1 = 0.0
        else:
            jsc_1 = gaussian(df1.loc[i, "GAP(eV)"] , A, B)
            if jsc_1 > 415.22529811760637:
                jsc_1 = 415.22529811760637
            pce_1 = 100 * voc_1 * 0.65 * jsc_1 / Pin
        

        # Scharber model objective 2: Optimization of acceptor for poly[N-9'-heptadecanyl-2,7-carbazole-alt-5,5-(4',7'-di-2-thienyl-2',1',3'-benzothiadiazole)] (PCDTBT) donor
        voc_2 = (abs(-5.5) - abs(df1.loc[i, "LUMO(eV)"])) - 0.3
        if voc_2 < 0.0:
            voc_2 = 0.0
        LUMO_DFT_offset_2 = -3.6 - df1.loc[i, "LUMO(eV)"]
        if LUMO_DFT_offset_2 < 0.3:
            pce_2 = 0.0
            jsc_2 = 0.0
        else:
            jsc_2 = gaussian(df1.loc[i, "GAP(eV)"], A, B)
            if jsc_2 > 415.22529811760637:
                jsc_2 = 415.22529811760637
            pce_2 = 100 * voc_2 * 0.65 * jsc_2 / Pin



        # Add separate VOC, PCE, and Jsc for each objective
        df1.loc[i, "voc_pcbm(V)"] = voc_1
        df1.loc[i, "jsc_pcbm(A.m-2)"] = jsc_1
        df1.loc[i, "pce_pcbm(%)"] = pce_1

        df1.loc[i, "voc_pcdtbt(V)"] = voc_2
        df1.loc[i, "jsc_pcdtbt(A.m-2)"] = jsc_2
        df1.loc[i, "pce_pcdtbt(%)"] = pce_2

    return df1

# Run the fixed-FF variant on a copy so that the FF-based results stored in df2 are not overwritten
df2_fixed_ff = calculate_voc_pce_jsc(df.copy(), Pin, A, B)
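
The row-by-row loop above can also be written with column-wise pandas and NumPy operations, which is usually much faster on a table of this size. The following is a sketch of the PCBM (donor) objective only, assuming the same constants as the loop (PCBM LUMO of -4.3 eV, 0.3 thresholds, fixed FF of 0.65, and the same cap on Jsc); the function name `scharber_pcbm_vectorized` is hypothetical, and the two demo rows are copied from this notebook's tables.

```python
import numpy as np
import pandas as pd


def scharber_pcbm_vectorized(df, pin, a, b, ff=0.65,
                             lumo_acceptor=-4.3, jsc_cap=415.22529811760637):
    """Column-wise Voc, Jsc and PCE for donors paired with PCBM (fixed FF)."""
    voc = (df["HOMO(eV)"].abs() - abs(lumo_acceptor) - 0.3).clip(lower=0.0)
    offset_ok = (df["LUMO(eV)"] - lumo_acceptor) >= 0.3
    jsc = np.minimum(a * np.exp(-df["GAP(eV)"] ** 2 / b), jsc_cap)
    jsc = jsc.where(offset_ok, 0.0)  # zero current when the LUMO offset is too small
    pce = 100.0 * voc * ff * jsc / pin
    return pd.DataFrame({"voc_pcbm(V)": voc, "jsc_pcbm(A.m-2)": jsc, "pce_pcbm(%)": pce})


demo = pd.DataFrame({
    "HOMO(eV)": [-5.744323, -4.609609],
    "LUMO(eV)": [-3.779661, -0.261229],
    "GAP(eV)": [1.964662, 4.348379],
})
out = scharber_pcbm_vectorized(demo, pin=900.1393292842149,
                               a=433.11633173034136, b=2.3353220382662894)
print(out)
```

Returning a new DataFrame instead of writing into the input keeps the original table unchanged, so different fill-factor assumptions can be compared side by side.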

In [5]:
df2 

Unnamed: 0,mol_id,SMILES,formula,mass,inchi,HOMO(eV),LUMO(eV),GAP(eV),dipole_moment,mulliken_charges,voc_pcbm(V),jsc_pcbm(A.m-2),ff_pcbm,pce_pcbm(%),voc_pcdtbt(V),jsc_pcdtbt(A.m-2),ff_pcdtbt,pce_pcdtbt(%)
0,000000001,[O-]C(=O)CC(C[N+](C)(C)C)OC(=O)C,C9H17NO4,203.235580,InChI=1S/C9H17NO4/c1-7(11)14-8(5-9(12)13)6-10(...,-4.609609,-0.261229,4.348379,11.419443,"[-0.542286, 0.622923, -0.486172, -0.478169, 0....",0.009609,1.319023e-01,0.240130,3.381045e-05,4.938771,0.0,0.0,0.0
1,000000003,OC1C=CC=C(C1O)C(=O)O,C7H8O4,156.136020,InChI=1S/C7H8O4/c8-5-3-1-2-4(6(5)9)7(10)11/h1-...,-6.721212,-2.133373,4.587840,5.099241,"[-0.122757, -0.14337, 0.076825, 0.022155, 0.03...",2.121212,5.276012e-02,0.887381,1.103292e-02,3.066627,0.0,0.0,0.0
2,000000004,CC(CN)O,C3H9NO,75.109660,"InChI=1S/C3H9NO/c1-3(5)2-4/h3,5H,2,4H2,1H3",-6.508963,1.844932,8.353895,2.773043,"[-0.463549, 0.167547, -0.158924, -0.741233, -0...",1.908963,4.553771e-11,0.877920,8.478405e-12,3.355068,0.0,0.0,0.0
3,000000005,NCC(=O)COP(=O)(O)O,C3H8NO5P,169.073081,"InChI=1S/C3H8NO5P/c4-1-3(5)2-9-10(6,7)8/h1-2,4...",-7.028701,-1.371454,5.657247,6.730409,"[-0.258052, 0.470183, -0.474284, -0.102044, -0...",2.428701,4.839625e-04,0.898565,1.173344e-04,3.828546,0.0,0.0,0.0
4,000000006,[O-][N+](=O)c1ccc(c(c1)[N+](=O)[O-])Cl,C6H3ClN2O4,202.552020,InChI=1S/C6H3ClN2O4/c7-5-2-1-4(8(10)11)3-6(5)9...,-8.051849,-3.137473,4.914376,3.716931,"[-0.125953, -0.148386, -0.088975, 0.265526, -0...",3.451849,1.397273e-02,0.923063,4.946007e-03,2.062527,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17453,000024994,COC(CC(OC)OC)C,C7H16O3,148.200140,"InChI=1S/C7H16O3/c1-6(8-2)5-7(9-3)10-4/h6-7H,5...",-6.530732,2.231334,8.762066,1.212562,"[-0.467136, 0.146682, -0.337628, 0.390323, -0....",1.930732,2.286425e-12,0.878971,4.310661e-13,2.968666,0.0,0.0,0.0
17454,000024995,N[C@H](C(=O)O)CC(Cl)Cl,C4H7Cl2NO2,172.009880,"InChI=1S/C4H7Cl2NO2/c5-3(6)1-2(7)4(8)9/h2-3H,1...",-7.360680,-0.220412,7.140267,5.064914,"[-0.268207, -0.100945, 0.200411, 0.627104, -0....",2.760680,1.429997e-07,0.908216,3.983183e-08,4.979588,0.0,0.0,0.0
17455,000024997,CN(CCOC1c2ccccc2CCc2c1cccc2)C,C19H23NO,281.392020,InChI=1S/C19H23NO/c1-20(2)13-14-21-19-17-9-5-3...,-5.942966,0.002721,5.945688,1.450867,"[-0.298012, -0.384351, -0.304429, -0.143074, -...",1.342966,1.154577e-04,0.841122,1.448896e-05,5.197279,0.0,0.0,0.0
17456,000024998,CC(=O)OC(CCl)Cl,C4H6Cl2O2,156.995240,"InChI=1S/C4H6Cl2O2/c1-3(7)8-4(6)2-5/h4H,2H2,1H3",-7.923955,-0.568718,7.355237,1.843535,"[-0.549803, 0.634713, -0.440806, -0.441348, 0....",3.323955,3.765664e-08,0.920721,1.280309e-08,4.631282,0.0,0.0,0.0


In [6]:
df_don =  df2.loc[df2["pce_pcbm(%)"] > 4.0, :]

df_don 


Unnamed: 0,mol_id,SMILES,formula,mass,inchi,HOMO(eV),LUMO(eV),GAP(eV),dipole_moment,mulliken_charges,voc_pcbm(V),jsc_pcbm(A.m-2),ff_pcbm,pce_pcbm(%),voc_pcdtbt(V),jsc_pcdtbt(A.m-2),ff_pcdtbt,pce_pcdtbt(%)
721,977,O=O,O2,31.9988,InChI=1S/O2/c1-2,-5.744323,-3.779661,1.964662,2e-06,"[2e-05, -2e-05]",1.144323,82.944656,0.821455,8.661863,1.420339,0.0,0.0,0.0
1236,1712,O=C1N=c2c(=C1CNc1ccc(cc1)S(=O)(=O)Nc1ccccn1)c1...,C21H15N5O3S2,449.5055,InChI=1S/C21H15N5O3S2/c27-21-15(19-16(25-21)8-...,-5.866775,-3.8232,2.043575,2.558893,"[-0.099377, -0.158062, 0.049844, -0.514708, 0....",1.266775,72.437998,0.834169,8.503739,1.3768,0.0,0.0,0.0
5568,7801,Oc1ccc(cc1)O.O=C1C=CC(=O)C=C1,C12H10O4,218.2054,InChI=1S/C6H6O2.C6H4O2/c2*7-5-1-2-6(8)4-3-5/h1...,-5.151115,-3.684422,1.466694,0.764199,"[-0.167442, -0.18339, 0.44273, -0.486933, -0.1...",0.551115,172.406105,0.705587,7.447933,1.515578,0.0,0.0,0.0
7354,10142,OC1=C(O)C(=C2C(=O)C(=C(C(=O)C2=O)C)O)C(=O)C(=C...,C14H10O8,306.2244,InChI=1S/C14H10O8/c1-3-7(15)11(19)5(12(20)8(3)...,-5.676295,-3.338837,2.337458,5.249848,"[-0.575146, 0.120215, 0.21312, 0.385793, -0.40...",1.076295,41.737938,0.813407,4.059385,1.861163,0.0,0.0,0.0
16329,22986,[O-]C(=O)CC[C@H](C(=O)[O-])NC(=O)c1ccc(cc1)NCc...,C19H17N7Na2O6,485.36112,InChI=1S/C19H19N7O6.2Na/c20-19-25-15-14(17(30)...,-5.031385,-3.344279,1.687106,16.287625,"[-0.111819, -0.158835, 0.254452, -0.182902, -0...",0.431385,128.020583,0.656981,4.030772,1.855721,0.0,0.0,0.0
17081,24180,COc1ccc(cc1O)S(=O)(=O)[O-].COc1ccc(cc1O)S(=O)(...,C14H14CaO10S2,446.46296,InChI=1S/2C7H8O5S.Ca/c2*1-12-7-3-2-5(4-6(7)8)1...,-5.118462,-3.621835,1.496626,22.292557,"[-0.277554, -0.645663, 0.327814, 0.248951, -0....",0.518462,165.980616,0.693906,6.633836,1.578165,0.0,0.0,0.0
17437,24964,[O-]S(=S)(=O)[O-].[Ca+2],CaO3S2,152.2062,"InChI=1S/Ca.H2O3S2/c;1-5(2,3)4/h;(H2,1,2,3,4)/...",-5.382412,-3.265366,2.117046,14.89816,"[-0.699291, 1.140335, -0.699319, -0.329317, -0...",0.782412,63.55052,0.766666,4.234977,1.934634,0.0,0.0,0.0



In [7]:
from pathlib import Path
from rdkit.Chem import RDConfig
from rdkit import Chem
import os, sys
sys.path.append(os.path.join(RDConfig.RDContribDir, 'SA_Score'))
import sascorer

In [8]:
for i in df2.index:
    mol_rdkit = Chem.MolFromSmiles(df2.loc[i, 'SMILES'])

    # Skip SMILES that RDKit cannot parse; their SA columns stay NaN
    if mol_rdkit is not None:
        # Add explicit hydrogens
        mol = Chem.AddHs(mol_rdkit)
        charge = Chem.rdmolops.GetFormalCharge(mol)
        atom_number = mol.GetNumAtoms()
        # Synthetic accessibility score (1 = easy to synthesize, 10 = hard)
        sas = sascorer.calculateScore(mol)
        df2.at[i, 'sas1(%)'] = sas
        # SA-penalized objectives: PCE minus SA score
        df2.at[i, 'pce_pcbm_sas(%)'] = df2.at[i, 'pce_pcbm(%)'] - sas
        df2.at[i, 'pce_pcdtbt_sas(%)'] = df2.at[i, 'pce_pcdtbt(%)'] - sas

        # Next step (not implemented here): generate the initial 3D conformation
        # of the molecule and optimize it with GFN-xTB

[15:04:05] Explicit valence for atom # 1 Cl, 7, is greater than permitted
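
Once the SA score is stored, the penalized objective can be computed column-wise and used to rank candidates. The sketch below needs only pandas; the three rows are copied from the `df2` output of this notebook, and the choice of `nlargest` for ranking is an illustrative assumption rather than a step taken from the original analysis.

```python
import pandas as pd

# Three rows copied from the df2 output (PCE with PCBM and SA score)
demo = pd.DataFrame({
    "mol_id": ["000000001", "000000003", "000000006"],
    "pce_pcbm(%)": [3.381045e-05, 1.103292e-02, 4.946007e-03],
    "sas1(%)": [6.700900, 7.926554, 4.683290],
})

# Same penalty as in the loop: PCE minus SA score, computed for all rows at once
demo["pce_pcbm_sas(%)"] = demo["pce_pcbm(%)"] - demo["sas1(%)"]

# Highest penalized score first
ranked = demo.nlargest(2, "pce_pcbm_sas(%)")
print(ranked[["mol_id", "pce_pcbm_sas(%)"]])
```

Since the SA score is on a scale of about 1 to 10 while most PCE values here are far below 1 %, the penalized score is dominated by the SA term for these rows.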


In [9]:
df2

Unnamed: 0,mol_id,SMILES,formula,mass,inchi,HOMO(eV),LUMO(eV),GAP(eV),dipole_moment,mulliken_charges,...,jsc_pcbm(A.m-2),ff_pcbm,pce_pcbm(%),voc_pcdtbt(V),jsc_pcdtbt(A.m-2),ff_pcdtbt,pce_pcdtbt(%),sas1(%),pce_pcbm_sas(%),pce_pcdtbt_sas(%)
0,000000001,[O-]C(=O)CC(C[N+](C)(C)C)OC(=O)C,C9H17NO4,203.235580,InChI=1S/C9H17NO4/c1-7(11)14-8(5-9(12)13)6-10(...,-4.609609,-0.261229,4.348379,11.419443,"[-0.542286, 0.622923, -0.486172, -0.478169, 0....",...,1.319023e-01,0.240130,3.381045e-05,4.938771,0.0,0.0,0.0,6.700900,-6.700866,-6.700900
1,000000003,OC1C=CC=C(C1O)C(=O)O,C7H8O4,156.136020,InChI=1S/C7H8O4/c8-5-3-1-2-4(6(5)9)7(10)11/h1-...,-6.721212,-2.133373,4.587840,5.099241,"[-0.122757, -0.14337, 0.076825, 0.022155, 0.03...",...,5.276012e-02,0.887381,1.103292e-02,3.066627,0.0,0.0,0.0,7.926554,-7.915521,-7.926554
2,000000004,CC(CN)O,C3H9NO,75.109660,"InChI=1S/C3H9NO/c1-3(5)2-4/h3,5H,2,4H2,1H3",-6.508963,1.844932,8.353895,2.773043,"[-0.463549, 0.167547, -0.158924, -0.741233, -0...",...,4.553771e-11,0.877920,8.478405e-12,3.355068,0.0,0.0,0.0,7.508943,-7.508943,-7.508943
3,000000005,NCC(=O)COP(=O)(O)O,C3H8NO5P,169.073081,"InChI=1S/C3H8NO5P/c4-1-3(5)2-9-10(6,7)8/h1-2,4...",-7.028701,-1.371454,5.657247,6.730409,"[-0.258052, 0.470183, -0.474284, -0.102044, -0...",...,4.839625e-04,0.898565,1.173344e-04,3.828546,0.0,0.0,0.0,7.009484,-7.009367,-7.009484
4,000000006,[O-][N+](=O)c1ccc(c(c1)[N+](=O)[O-])Cl,C6H3ClN2O4,202.552020,InChI=1S/C6H3ClN2O4/c7-5-2-1-4(8(10)11)3-6(5)9...,-8.051849,-3.137473,4.914376,3.716931,"[-0.125953, -0.148386, -0.088975, 0.265526, -0...",...,1.397273e-02,0.923063,4.946007e-03,2.062527,0.0,0.0,0.0,4.683290,-4.678344,-4.683290
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17453,000024994,COC(CC(OC)OC)C,C7H16O3,148.200140,"InChI=1S/C7H16O3/c1-6(8-2)5-7(9-3)10-4/h6-7H,5...",-6.530732,2.231334,8.762066,1.212562,"[-0.467136, 0.146682, -0.337628, 0.390323, -0....",...,2.286425e-12,0.878971,4.310661e-13,2.968666,0.0,0.0,0.0,7.091972,-7.091972,-7.091972
17454,000024995,N[C@H](C(=O)O)CC(Cl)Cl,C4H7Cl2NO2,172.009880,"InChI=1S/C4H7Cl2NO2/c5-3(6)1-2(7)4(8)9/h2-3H,1...",-7.360680,-0.220412,7.140267,5.064914,"[-0.268207, -0.100945, 0.200411, 0.627104, -0....",...,1.429997e-07,0.908216,3.983183e-08,4.979588,0.0,0.0,0.0,7.285893,-7.285893,-7.285893
17455,000024997,CN(CCOC1c2ccccc2CCc2c1cccc2)C,C19H23NO,281.392020,InChI=1S/C19H23NO/c1-20(2)13-14-21-19-17-9-5-3...,-5.942966,0.002721,5.945688,1.450867,"[-0.298012, -0.384351, -0.304429, -0.143074, -...",...,1.154577e-04,0.841122,1.448896e-05,5.197279,0.0,0.0,0.0,6.450492,-6.450478,-6.450492
17456,000024998,CC(=O)OC(CCl)Cl,C4H6Cl2O2,156.995240,"InChI=1S/C4H6Cl2O2/c1-3(7)8-4(6)2-5/h4H,2H2,1H3",-7.923955,-0.568718,7.355237,1.843535,"[-0.549803, 0.634713, -0.440806, -0.441348, 0....",...,3.765664e-08,0.920721,1.280309e-08,4.631282,0.0,0.0,0.0,6.815336,-6.815336,-6.815336


In [16]:
df_acc = df2.loc[df2["pce_pcbm(%)"] > 0.0, :]
df_acc.shape

(17334, 21)

In [11]:
df2.to_csv('PCE_paper_GDB9.csv', index=False)   # Save without the index column

In [12]:
df2.to_pickle("PCE_paper_GDB9.pkl")