# <a id='toc1_'></a>[PCE: Coupling RDKit, xTB, PySCF and the Scharber Algorithm (Tartarus Model)](#toc0_)

1. **MVOTO KONGO Patrick Sorrel**, sorrel.mvoto@facsciences-uy1.cm
    * Department of Physics, Faculty of Science, University of Yaounde I
    * Master's student at the Laboratory of Atomic and Molecular Physics and Biophysics

May 2024

### <a id='toc1_'></a>[Using Pandas to extract information from the hce.csv file and create a DataFrame](#toc0_)
<!-- ![MolecularDimension.png](attachment:MolecularDimension.png) -->
![MolecularDimension.png](./Graphics/Pandas.jpg)

In [1]:
import pandas as pd

# Read the CSV file
df1 = pd.read_csv("hce.csv")

columns = ["smiles", "pce_1", "pce_2", "pce_pcbm_sas", "pce_pcdtbt_sas", "sas"]

# Keep rows with pce_1 > 10.79
df_acc = df1[df1["pce_1"] > 10.79][columns]

# Keep rows with pce_2 > 33.8
df_don = df1[df1["pce_2"] > 33.8][columns]

# Concatenate DataFrames and reset index
my1_df = pd.concat([df_acc, df_don], ignore_index=True)

# Label each molecule 'smiles0', 'smiles1', ... (numbering starts at 0)
my1_df['smiles_key'] = [f'smiles{n}' for n in range(len(my1_df))]

# Reorder columns
my1_df = my1_df[['smiles_key', 'smiles', 'pce_1', 'pce_2',"pce_pcbm_sas", "pce_pcdtbt_sas","sas"]]


my1_df

Unnamed: 0,smiles_key,smiles,pce_1,pce_2,pce_pcbm_sas,pce_pcdtbt_sas,sas
0,smiles0,c1ncc(s1)-c1sc(-c2cnc(s2)-c2scc3cc[se]c23)c2nc...,10.802524,15.454414,6.7958,11.447689,4.006724
1,smiles1,c1ncc(s1)-c1ccc([se]1)-c1cnc(-c2cccc3cocc23)c2...,10.799833,15.222096,6.970612,11.392876,3.82922
2,smiles2,c1ncc(s1)-c1ncc(-c2sc(-c3ccccc3)c3[se]ccc23)c2...,10.796661,15.942718,7.075333,12.221391,3.721327
3,smiles3,c1c-c2cc3cnc4ccc5ccccc5c4c3cc2-nc1,0.0,33.912133,-2.203358,31.708776,2.203358
4,smiles4,c1sc(-c2cc3c(o2)c2c4c[nH]cc4c4ccccc4c2c2=CCC=c...,0.0,33.889634,-4.088903,29.800731,4.088903
5,smiles5,c1sc(-c2cc3cc4sc5ccc6c[nH]cc6c5c4cc3c3ccccc23)...,0.0,33.961095,-3.268084,30.693011,3.268084


####  <a id='toc1_3_'></a>[Using `RDKit` to generate 3D xyz coordinates](#toc0_)

<center><img src="./Graphics/Rdkit.png" width="400"></center>

#### <a id='toc1_1_'></a>[Molecular Representation and Descriptors](#toc0_)
1. **0D descriptors**. Composition descriptors; they provide no information about molecular structure or atom connectivity.

2. **1D descriptors**. Related to sets of substructures such as functional groups.

3. **2D descriptors**. Features that describe molecular topology, based on the graph representation of molecules.

4. **3D descriptors**. Geometric features that describe the spatial coordinates of the atoms in a molecule.

5. **4D descriptors**. Also called **grid-based descriptors**, they introduce a fourth dimension that characterizes the interactions between molecules and the active sites of a receptor, or the multiple conformational states of molecules.


<!-- ![MolecularDimension.png](attachment:MolecularDimension.png) -->
![MolecularDimension.png](./Graphics/MolecularDimension.png)
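As a small illustration of the simplest class above, a 0D descriptor can be computed from the elemental composition alone. The sketch below is only an illustration: the rounded atomic masses and the helper name `molecular_weight` are assumptions introduced here, not part of the notebook's workflow (which uses RDKit's `Descriptors.MolWt`).

```python
# Rounded standard atomic masses in g/mol (assumed values for illustration)
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def molecular_weight(composition):
    """Sum atomic masses over an element -> count mapping (a 0D descriptor)."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


# Benzene and its isomer fulvene share the composition C6H6,
# so a 0D descriptor cannot tell them apart.
benzene = {"C": 6, "H": 6}
print(round(molecular_weight(benzene), 3))
```

Because only atom counts enter the sum, any two isomers receive the same value, which is exactly the limitation stated for 0D descriptors.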

In [2]:
from rdkit import Chem
from rdkit.Chem import Draw
#from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import rdDepictor
from rdkit.Chem import AllChem
from pathlib import Path
#IPythonConsole.ipython_3d = True
import py3Dmol

import rdkit
print(rdkit.__version__)

2024.03.2



####  <a id='toc1_3_'></a>[Rendering 3D molecules with py3Dmol](#toc0_)

In [3]:
def view3D(mol, output_file, fmt='mol'):
    """3D visualization of a mol/xyz structure with py3Dmol.

    `mol` is an RDKit molecule when fmt='mol' and an XYZ string when fmt='xyz'.
    `output_file` is currently unused (the PNG export below is commented out).
    """
    view = py3Dmol.view(width=400, height=400)

    if fmt == 'mol':
        view.addModel(Chem.MolToMolBlock(mol), fmt)
    elif fmt == 'xyz':
        view.addModel(mol, fmt)

    view.setStyle({'stick': {}, 'sphere': {'scale': .30}})
    view.zoomTo()
    view.show()
    # view.png(output_file)

####  <a id='toc1_3_'></a>[Creating a DataFrame to store the SMILES descriptors](#toc0_)

In [4]:
import os

from rdkit.Chem import Descriptors


properties = ['smiles_key','mol_rdkit' ,'SMILE', 'Poids_Moléculaire', 'LogP', 'TPSA', 'QED']
data = []

for i in range(len(my1_df)):
    mol_rdkit = Chem.MolFromSmiles(my1_df.loc[i, 'smiles'])
    smi_key=my1_df.loc[i, 'smiles_key']
    smi=my1_df.loc[i, 'smiles']
    if mol_rdkit is not None:
        # Add explicit hydrogens
        mol_rdkit = Chem.AddHs(mol_rdkit)

        # Generate the initial 3D conformation of the molecule
        AllChem.EmbedMolecule(mol_rdkit)

        # Optimize the 3D conformation with the MMFF force field
        AllChem.MMFFOptimizeMolecule(mol_rdkit, maxIters=200, mmffVariant="MMFF94s")

        # Canonicalize the orientation of the conformation
        Chem.rdMolTransforms.CanonicalizeMol(mol_rdkit, normalizeCovar=True, ignoreHs=False)

        # Compute the descriptors
        molwt = Descriptors.MolWt(mol_rdkit)
        logp = Descriptors.MolLogP(mol_rdkit)
        tpsa = Descriptors.TPSA(mol_rdkit)
        qed = Descriptors.qed(mol_rdkit)

        # Append the properties to the data list
        data.append([smi_key,mol_rdkit, smi,molwt, logp, tpsa, qed,])

        # Create the sub-directory "my_data260524/{smi_key}"
        working_dir = Path(f'./my_data260524/{smi_key}')
        working_dir.mkdir(parents=True, exist_ok=True)

        # Render the 3D structure of the molecule after optimization
        path_3d_rdkit = working_dir / f'{smi_key}_3d_rdkit.png'
        view3D(mol_rdkit, path_3d_rdkit)

        # Write the results to a file
        with open(working_dir / 'results.txt', 'w') as f:
            f.write(f'smiles_key: {smi_key}\n')
            f.write(f'SMILE: {smi}\n')
            f.write(f'Poids moléculaire: {molwt}\n')
            f.write(f'LogP: {logp}\n')
            f.write(f'TPSA: {tpsa}\n')
            f.write(f'QED: {qed}\n')

        # Generation of the mol_rdkit.xyz file and GFN-xTB optimization follow in a later cell

df = pd.DataFrame(data, columns=properties)

[12:39:55] UFFTYPER: Unrecognized atom type: Se2+2 (19)


[12:39:55] UFFTYPER: Unrecognized atom type: Se2+2 (9)


[12:39:55] UFFTYPER: Unrecognized atom type: Se2+2 (19)


[12:39:55] UFFTYPER: Unrecognized atom type: Se2+2 (27)


In [5]:
df

Unnamed: 0,smiles_key,mol_rdkit,SMILE,Poids_Moléculaire,LogP,TPSA,QED
0,smiles0,<rdkit.Chem.rdchem.Mol object at 0x714468fd5900>,c1ncc(s1)-c1sc(-c2cnc(s2)-c2scc3cc[se]c23)c2nc...,487.518,4.9806,51.56,0.29283
1,smiles1,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ba0>,c1ncc(s1)-c1ccc([se]1)-c1cnc(-c2cccc3cocc23)c2...,465.421,5.0558,64.7,0.33001
2,smiles2,<rdkit.Chem.rdchem.Mol object at 0x714468fd5a50>,c1ncc(s1)-c1ncc(-c2sc(-c3ccccc3)c3[se]ccc23)c2...,481.489,4.9191,51.56,0.299271
3,smiles3,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ac0>,c1c-c2cc3cnc4ccc5ccccc5c4c3cc2-nc1,280.33,5.0894,25.78,0.291519
4,smiles4,<rdkit.Chem.rdchem.Mol object at 0x714468fd5c10>,c1sc(-c2cc3c(o2)c2c4c[nH]cc4c4ccccc4c2c2=CCC=c...,480.45,6.8075,28.93,0.22254
5,smiles5,<rdkit.Chem.rdchem.Mol object at 0x714468fd5dd0>,c1sc(-c2cc3cc4sc5ccc6c[nH]cc6c5c4cc3c3ccccc23)...,461.636,9.7854,15.79,0.234125


####  <a id='toc1_3_'></a>[Using CREST and xTB for conformer search](#toc0_)
 <center> <img src = "./Graphics/crest.png" width = "600">
 <img src = "./Graphics/xtb.jpeg" width = "600"> </center> 
 


####  <a id='toc1_3_'></a>[CREST and xTB workflow for conformer search](#toc0_)
 <center> <img src = "./Graphics/crestfeatures2.png" width = "600">
 <img src = "./Graphics/workflow.jpeg" width = "600"> </center> 

In [6]:
import xtb

In [7]:
!xtb --version

      -----------------------------------------------------------      
     |                           x T B                           |     
     |                         S. Grimme                         |     
     |          Mulliken Center for Theoretical Chemistry        |     
     |                    University of Bonn                     |     
      -----------------------------------------------------------      

   * xtb version 6.3.3 (71d3805) compiled by 'conda@b85dec0bf610' on 2021-01-07

normal termination of xtb


In [8]:
!crest --version


       |                                            |
       |                 C R E S T                  |
       |                                            |
       |  Conformer-Rotamer Ensemble Sampling Tool  |
       |          based on the GFN methods          |
       |             P.Pracht, S.Grimme             |
       |          Universitaet Bonn, MCTC           |
       Version 2.12,   Thu 19. Mai 16:32:32 CEST 2022
  Using the xTB program. Compatible with xTB version 6.4.0

   Cite work conducted with this code as

   • P.Pracht, F.Bohle, S.Grimme, PCCP, 2020, 22, 7169-7192.
   • S.Grimme, JCTC, 2019, 15, 2847-2862.

   and for works involving QCG as

   • S.Spicher, C.Plett, P.Pracht, A.Hansen, S.Grimme,
     JCTC, 2022, 18 (5), 3174-3189.

   with help from:
   C.Bannwarth, F.Bohle, S.Ehlert, S.Grimme,
   C.Plett, P.Pracht, S.Spicher

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   M

In [9]:
def find_homo_lumo(mf_pyscf, au2ev):
    """Return the HOMO and LUMO energies and the HOMO-LUMO gap, all in eV.

    Args:
        mf_pyscf (pyscf object): converged PySCF mean-field object of the molecule to be evaluated.
        au2ev (float): Hartree-to-eV conversion factor.
    """
    # Index of HOMO and LUMO
    lumo_idx = mf_pyscf.mo_occ.tolist().index(0.)
    homo_idx = lumo_idx - 1

    # Calculate the gap Homo-LUMO
    E_HOMO = mf_pyscf.mo_energy[homo_idx]*au2ev
    E_LUMO = mf_pyscf.mo_energy[lumo_idx]*au2ev
    E_g = abs(E_HOMO - E_LUMO)

    return E_HOMO, E_LUMO, E_g
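
The index logic of `find_homo_lumo` can be checked without running PySCF. The following sketch mirrors it on plain Python lists; the helper name `frontier_orbitals` and the orbital energies are made up for illustration, and a closed-shell occupation pattern (2, ..., 2, 0, ..., 0) is assumed.

```python
HARTREE2EV = 27.211386245988  # CODATA 2018 Hartree-to-eV factor


def frontier_orbitals(mo_occ, mo_energy, au2ev=HARTREE2EV):
    """Return (E_HOMO, E_LUMO, gap) in eV from occupations and energies in Hartree."""
    lumo_idx = list(mo_occ).index(0.0)   # first empty orbital
    homo_idx = lumo_idx - 1              # last occupied orbital
    e_homo = mo_energy[homo_idx] * au2ev
    e_lumo = mo_energy[lumo_idx] * au2ev
    return e_homo, e_lumo, abs(e_homo - e_lumo)


# Toy closed-shell system: three doubly occupied orbitals, two virtual ones
mo_occ = [2.0, 2.0, 2.0, 0.0, 0.0]
mo_energy = [-1.10, -0.60, -0.25, 0.05, 0.40]   # made-up values in Hartree
e_homo, e_lumo, gap = frontier_orbitals(mo_occ, mo_energy)
print(f"HOMO {e_homo:.3f} eV, LUMO {e_lumo:.3f} eV, gap {gap:.3f} eV")
```

Searching for the first zero occupation only works when the orbitals are sorted by energy and no occupied orbital follows an empty one, which holds for the restricted calculations used in this notebook.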

In [10]:
import subprocess as sp
from pathlib import Path

import pandas as pd
from rdkit import Chem
from pyscf import dft, gto, scf
from pyscf.data import nist
import dftd4.pyscf as disp

# Hartree-to-eV conversion factor
au2ev = nist.HARTREE2EV


def xtb_calculations(df):
    columns = ["HOMO-LUMO GAP (eV)", "HOMO Energy (eV)", "LUMO Energy (eV)"]
    dtb_mf = []
    dtb_dft = []
    dtb_md4 = []
    for i in range(len(df)):
        # RDKit molecule (with 3D coordinates) and key of the current row
        mol_rdkit = df.loc[i, "mol_rdkit"]
        smi_key = df.loc[i, "smiles_key"]

        # Per-molecule output directory and xtb file paths
        working_dir = Path(f'./my_data260524/{smi_key}')
        working_dir.mkdir(parents=True, exist_ok=True)
        path_xtb_xyz = working_dir / f'{smi_key}_opt.xyz'
        path_xtb_log = working_dir / f'{smi_key}_opt.log'

        if not path_xtb_xyz.exists():
            print(f'{path_xtb_xyz} does not exist and will be created for {smi_key}')

        # Write the RDKit geometry as an XYZ file
        with open("mol_rdkit.xyz", "w") as f:
            f.write(Chem.MolToXYZBlock(mol_rdkit))

        # Geometry pre-optimization with GFN2-xTB
        # (each command-line token must be its own list element)
        sp.run(["xtb", "mol_rdkit.xyz", "--gfn2", "--opt", "vtight", "--parallel", "1"],
               stdout=sp.PIPE, stderr=sp.PIPE, text=True)

        # Best-conformer search with CREST
        sp.run(['crest', 'xtbopt.xyz', '-gff', '-mquick', '--noreftopo'],
               stdout=sp.DEVNULL, stderr=sp.DEVNULL)

        # xtb geometry optimization of the best CREST conformer
        xtb_process = sp.run(["xtb", "crest_best.xyz", "--gfn2", "--opt", "vtight", "--parallel", "1"],
                             stdout=sp.PIPE, stderr=sp.PIPE, text=True)
        with open(path_xtb_log, "w") as fl:
            fl.write(xtb_process.stdout)

        # Store the xtbopt.xyz file in path_xtb_xyz
        sp.run(['cp', 'xtbopt.xyz', str(path_xtb_xyz)], stdout=sp.PIPE, stderr=sp.DEVNULL)
        lines = path_xtb_xyz.read_text().strip().split('\n')

        # Blank the comment line for display; keep only the atom lines for PySCF
        mol_xtb_xyz = '\n'.join(lines[:1] + [''] + lines[2:])
        view3D(mol_xtb_xyz, f'{smi_key} xTB Molecule', fmt='xyz')
        mol_xyz = '\n'.join(lines[2:])

        mol_pyscf = gto.Mole(
            atom=mol_xyz,
            charge=0,
            spin=0,
            basis="def2-SVP",
            symmetry=True,
            unit='Angstrom')
        mol_pyscf.build()

        # Hartree-Fock, B3LYP, and B3LYP with the D4 dispersion correction.
        # D4 is an additive energy term, so the orbital energies of md4
        # match those of plain B3LYP (see the identical tables below).
        myhf = scf.RHF(mol_pyscf).run()
        mdf = dft.RKS(mol_pyscf, xc="B3LYP").run()
        md4 = disp.energy(mdf).run()

        hf_homo, hf_lumo, hf_gap = find_homo_lumo(myhf, au2ev)
        dft_homo, dft_lumo, dft_gap = find_homo_lumo(mdf, au2ev)
        d4_homo, d4_lumo, d4_gap = find_homo_lumo(md4, au2ev)

        # Collect the results of each level of theory
        dtb_mf.append([hf_gap, hf_homo, hf_lumo])
        dtb_dft.append([dft_gap, dft_homo, dft_lumo])
        dtb_md4.append([d4_gap, d4_homo, d4_lumo])

    df_xtb_mf = pd.DataFrame(dtb_mf, columns=columns)
    df_xtb_dft = pd.DataFrame(dtb_dft, columns=columns)
    df_xtb_md4 = pd.DataFrame(dtb_md4, columns=columns)
    return df_xtb_mf, df_xtb_dft, df_xtb_md4

In [11]:
df_xtb_mf,df_xtb_dft,df_xtb_md4=xtb_calculations(df)

converged SCF energy = -4892.44492658079




converged SCF energy = -4902.64275100115
converged SCF energy = -4902.75892843126


converged SCF energy = -4249.26768636978
converged SCF energy = -4258.98943474622
converged SCF energy = -4259.10072874179


converged SCF energy = -4571.89263411448
converged SCF energy = -4581.92443214219
converged SCF energy = -4582.04361272475


converged SCF energy = -872.628645787846
converged SCF energy = -878.251519515379
converged SCF energy = -878.339042896933


converged SCF energy = -3956.49705593678
converged SCF energy = -3966.37404322738
converged SCF energy = -3966.50979834868


converged SCF energy = -2314.86389980634
converged SCF energy = -2324.21939729525
converged SCF energy = -2324.36187830757
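
The `xtb` and `crest` runs inside `xtb_calculations` are launched through `subprocess.run` with a list of arguments. When a list is used, every element reaches the program as exactly one argument, so an option and its value must be separate elements. The sketch below only builds and prints such a list; the helper name `build_xtb_opt_command` is hypothetical and the flags are simply those used in this notebook.

```python
import shlex


def build_xtb_opt_command(xyz_file, n_threads=1):
    """Return the argument list for a GFN2-xTB 'vtight' optimization (hypothetical helper)."""
    return ["xtb", str(xyz_file), "--gfn2", "--opt", "vtight",
            "--parallel", str(n_threads)]


cmd = build_xtb_opt_command("mol_rdkit.xyz")
print(shlex.join(cmd))

# A fused element such as "--parallel 1" is passed on as ONE argument
# containing a space; shlex.join makes this visible by quoting it.
fused = ["xtb", "mol_rdkit.xyz", "--parallel 1"]
print(shlex.join(fused))
```

Printing the command with `shlex.join` before running it is a cheap way to spot arguments that were accidentally merged.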


In [12]:
df_xtb_mf

Unnamed: 0,HOMO-LUMO GAP (eV),HOMO Energy (eV),LUMO Energy (eV)
0,6.475203,-6.62781,-0.152607
1,6.592576,-6.886531,-0.293955
2,6.417847,-6.783371,-0.365524
3,8.227819,-7.330745,0.897073
4,7.244829,-6.021671,1.223158
5,7.98845,-6.568378,1.420072


In [13]:
df_xtb_dft

Unnamed: 0,HOMO-LUMO GAP (eV),HOMO Energy (eV),LUMO Energy (eV)
0,2.082559,-5.165246,-3.082687
1,2.144677,-5.311571,-3.166894
2,2.074562,-5.291223,-3.216661
3,3.388855,-5.796285,-2.40743
4,2.695154,-4.564165,-1.86901
5,3.25324,-4.986631,-1.733392


In [14]:
df_xtb_md4

Unnamed: 0,HOMO-LUMO GAP (eV),HOMO Energy (eV),LUMO Energy (eV)
0,2.082559,-5.165246,-3.082687
1,2.144677,-5.311571,-3.166894
2,2.074562,-5.291223,-3.216661
3,3.388855,-5.796285,-2.40743
4,2.695154,-4.564165,-1.86901
5,3.25324,-4.986631,-1.733392


In [15]:
my_mf=pd.concat([df, df_xtb_mf], axis=1)

In [16]:
my_dft=pd.concat([df, df_xtb_dft], axis=1)

In [17]:
my_mda=pd.concat([df, df_xtb_md4], axis=1)

In [18]:
import pandas as pd

def calibrate_data(my_df):
    """Add linearly calibrated HOMO, LUMO and gap columns (eV) to my_df in place."""
    # Linear calibration of the computed frontier-orbital energies
    homo_cal = my_df["HOMO Energy (eV)"] * 0.90417 - 0.56441
    lumo_cal = my_df["LUMO Energy (eV)"] * 0.89573 - 1.08427

    # Update the DataFrame with the calibrated values
    my_df["Gap_calibrated"] = (homo_cal - lumo_cal).abs()
    my_df["homo_calibrated"] = homo_cal
    # The LUMO level is stored as a positive magnitude
    my_df["lumo_calibrated"] = lumo_cal.abs()
    return my_df


In [19]:
my_df1=calibrate_data(my_mf)
my_df1

Unnamed: 0,smiles_key,mol_rdkit,SMILE,Poids_Moléculaire,LogP,TPSA,QED,HOMO-LUMO GAP (eV),HOMO Energy (eV),LUMO Energy (eV),Gap_calibrated,homo_calibrated,lumo_calibrated
0,smiles0,<rdkit.Chem.rdchem.Mol object at 0x714468fd5900>,c1ncc(s1)-c1sc(-c2cnc(s2)-c2scc3cc[se]c23)c2nc...,487.518,4.9806,51.56,0.29283,6.475203,-6.62781,-0.152607,5.336112,-6.557077,1.220965
1,smiles1,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ba0>,c1ncc(s1)-c1ccc([se]1)-c1cnc(-c2cccc3cocc23)c2...,465.421,5.0558,64.7,0.33001,6.592576,-6.886531,-0.293955,5.44343,-6.791005,1.347574
2,smiles2,<rdkit.Chem.rdchem.Mol object at 0x714468fd5a50>,c1ncc(s1)-c1ncc(-c2sc(-c3ccccc3)c3[se]ccc23)c2...,481.489,4.9191,51.56,0.299271,6.417847,-6.783371,-0.365524,5.286049,-6.69773,1.411681
3,smiles3,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ac0>,c1c-c2cc3cnc4ccc5ccccc5c4c3cc2-nc1,280.33,5.0894,25.78,0.291519,8.227819,-7.330745,0.897073,6.911915,-7.19265,0.280735
4,smiles4,<rdkit.Chem.rdchem.Mol object at 0x714468fd5c10>,c1sc(-c2cc3c(o2)c2c4c[nH]cc4c4ccccc4c2c2=CCC=c...,480.45,6.8075,28.93,0.22254,7.244829,-6.021671,1.223158,6.020374,-6.009025,0.011349
5,smiles5,<rdkit.Chem.rdchem.Mol object at 0x714468fd5dd0>,c1sc(-c2cc3cc4sc5ccc6c[nH]cc6c5c4cc3c3ccccc23)...,461.636,9.7854,15.79,0.234125,7.98845,-6.568378,1.420072,6.691072,-6.503341,0.187731
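
The calibration is a pair of linear maps, so one row of the table above can be verified by hand. The sketch below repeats the arithmetic for `smiles0` in plain Python; the function name `calibrate` is introduced here for illustration, and the coefficients are the ones used in `calibrate_data`.

```python
def calibrate(homo_ev, lumo_ev):
    """Apply the notebook's linear calibration to raw HOMO/LUMO energies (eV)."""
    homo_cal = homo_ev * 0.90417 - 0.56441
    lumo_cal = lumo_ev * 0.89573 - 1.08427
    gap_cal = abs(homo_cal - lumo_cal)
    # The LUMO is returned as a positive magnitude, as in calibrate_data
    return homo_cal, abs(lumo_cal), gap_cal


# Raw Hartree-Fock values of smiles0 from the table above
homo_cal, lumo_cal, gap_cal = calibrate(-6.62781, -0.152607)
print(f"{homo_cal:.6f} {lumo_cal:.6f} {gap_cal:.6f}")
```

The three numbers should agree with the `homo_calibrated`, `lumo_calibrated` and `Gap_calibrated` entries of the first row to within rounding of the tabulated inputs.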


In [20]:
my_dft1=calibrate_data(my_dft)
my_dft1

Unnamed: 0,smiles_key,mol_rdkit,SMILE,Poids_Moléculaire,LogP,TPSA,QED,HOMO-LUMO GAP (eV),HOMO Energy (eV),LUMO Energy (eV),Gap_calibrated,homo_calibrated,lumo_calibrated
0,smiles0,<rdkit.Chem.rdchem.Mol object at 0x714468fd5900>,c1ncc(s1)-c1sc(-c2cnc(s2)-c2scc3cc[se]c23)c2nc...,487.518,4.9806,51.56,0.29283,2.082559,-5.165246,-3.082687,1.389145,-5.23467,3.845525
1,smiles1,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ba0>,c1ncc(s1)-c1ccc([se]1)-c1cnc(-c2cccc3cocc23)c2...,465.421,5.0558,64.7,0.33001,2.144677,-5.311571,-3.166894,1.446022,-5.366973,3.920952
2,smiles2,<rdkit.Chem.rdchem.Mol object at 0x714468fd5a50>,c1ncc(s1)-c1ncc(-c2sc(-c3ccccc3)c3[se]ccc23)c2...,481.489,4.9191,51.56,0.299271,2.074562,-5.291223,-3.216661,1.383046,-5.348575,3.965529
3,smiles3,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ac0>,c1c-c2cc3cnc4ccc5ccccc5c4c3cc2-nc1,280.33,5.0894,25.78,0.291519,3.388855,-5.796285,-2.40743,2.56456,-5.805237,3.240677
4,smiles4,<rdkit.Chem.rdchem.Mol object at 0x714468fd5c10>,c1sc(-c2cc3c(o2)c2c4c[nH]cc4c4ccccc4c2c2=CCC=c...,480.45,6.8075,28.93,0.22254,2.695154,-4.564165,-1.86901,1.932792,-4.691191,2.758399
5,smiles5,<rdkit.Chem.rdchem.Mol object at 0x714468fd5dd0>,c1sc(-c2cc3cc4sc5ccc6c[nH]cc6c5c4cc3c3ccccc23)...,461.636,9.7854,15.79,0.234125,3.25324,-4.986631,-1.733392,2.436251,-5.073172,2.636921


In [21]:
my_mda1=calibrate_data(my_mda)
my_mda1

Unnamed: 0,smiles_key,mol_rdkit,SMILE,Poids_Moléculaire,LogP,TPSA,QED,HOMO-LUMO GAP (eV),HOMO Energy (eV),LUMO Energy (eV),Gap_calibrated,homo_calibrated,lumo_calibrated
0,smiles0,<rdkit.Chem.rdchem.Mol object at 0x714468fd5900>,c1ncc(s1)-c1sc(-c2cnc(s2)-c2scc3cc[se]c23)c2nc...,487.518,4.9806,51.56,0.29283,2.082559,-5.165246,-3.082687,1.389145,-5.23467,3.845525
1,smiles1,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ba0>,c1ncc(s1)-c1ccc([se]1)-c1cnc(-c2cccc3cocc23)c2...,465.421,5.0558,64.7,0.33001,2.144677,-5.311571,-3.166894,1.446022,-5.366973,3.920952
2,smiles2,<rdkit.Chem.rdchem.Mol object at 0x714468fd5a50>,c1ncc(s1)-c1ncc(-c2sc(-c3ccccc3)c3[se]ccc23)c2...,481.489,4.9191,51.56,0.299271,2.074562,-5.291223,-3.216661,1.383046,-5.348575,3.965529
3,smiles3,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ac0>,c1c-c2cc3cnc4ccc5ccccc5c4c3cc2-nc1,280.33,5.0894,25.78,0.291519,3.388855,-5.796285,-2.40743,2.56456,-5.805237,3.240677
4,smiles4,<rdkit.Chem.rdchem.Mol object at 0x714468fd5c10>,c1sc(-c2cc3c(o2)c2c4c[nH]cc4c4ccccc4c2c2=CCC=c...,480.45,6.8075,28.93,0.22254,2.695154,-4.564165,-1.86901,1.932792,-4.691191,2.758399
5,smiles5,<rdkit.Chem.rdchem.Mol object at 0x714468fd5dd0>,c1sc(-c2cc3cc4sc5ccc6c[nH]cc6c5c4cc3c3ccccc23)...,461.636,9.7854,15.79,0.234125,3.25324,-4.986631,-1.733392,2.436251,-5.073172,2.636921


In [22]:
import numpy as np

def gaussian(x, A, B):
    """Gaussian A * exp(-x**2 / B), used below to estimate Jsc from the gap."""
    return A * np.exp(-x ** 2 / B)

In [23]:
import numpy as np
# Define parameters for Scharber model
A = 433.11633173034136
B = 2.3353220382662894
Pin = 900.139329284
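
For reference, the quantities that `calculate_voc_pce_jsc` evaluates with these parameters can be written compactly. This is a reading of the notebook's code rather than an independent derivation: the symbol names are chosen here, superscripts (1) and (2) denote the PCBM and PCDTBT objectives, and interpreting the constant 0.65 as a fill factor is an assumption based on the usual form of the Scharber model.

$$
\begin{aligned}
V_{oc}^{(1)} &= \max\left(\lvert E_{\mathrm{HOMO}}^{\mathrm{cal}}\rvert - 4.3 - 0.3,\ 0\right) \\
V_{oc}^{(2)} &= \max\left(5.5 - \lvert E_{\mathrm{LUMO}}^{\mathrm{cal}}\rvert - 0.3,\ 0\right) \\
J_{sc} &= \min\left(A\, e^{-\left(E_{g}^{\mathrm{cal}}\right)^{2} / B},\ 415.225\right) \\
\mathrm{PCE} &= 100 \cdot \frac{V_{oc} \cdot 0.65 \cdot J_{sc}}{P_{in}}
\end{aligned}
$$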

In [24]:
import pandas as pd

def calculate_voc_pce_jsc(my_df, Pin, A, B):
    """
    This function calculates VOC, PCE, and Jsc for each row in the DataFrame and adds them as new columns.

    Args:
        my_df (pandas.DataFrame): The DataFrame containing molecule data with 'homo_calibrated', 'lumo_calibrated' and 'Gap_calibrated' columns.
        Pin (float): The incident light power density.
        A (float): Gaussian function parameter A.
        B (float): Gaussian function parameter B.

    Returns:
        pandas.DataFrame: The modified DataFrame with VOC, PCE, and Jsc columns.
    """
    jsc_max = 415.22529811760637  # upper bound applied to Jsc

    for i in my_df.index:
        homo_cal = my_df.loc[i, "homo_calibrated"]
        lumo_cal = my_df.loc[i, "lumo_calibrated"]
        gap_cal = my_df.loc[i, "Gap_calibrated"]

        # Scharber model objective 1: Optimization of donor for phenyl-C61-butyric acid methyl ester (PCBM) acceptors
        voc_1 = max((abs(homo_cal) - abs(-4.3)) - 0.3, 0.0)
        # calibrate_data stores lumo_calibrated as a positive magnitude,
        # so this offset is at least 4.3 eV here and the cutoff does not trigger
        lumo_offset_1 = lumo_cal + 4.3
        if lumo_offset_1 < 0.3:
            jsc_1 = 0.0
            pce_1 = 0.0
        else:
            jsc_1 = min(gaussian(gap_cal, A, B), jsc_max)
            pce_1 = 100 * voc_1 * 0.65 * jsc_1 / Pin

        # Scharber model objective 2: Optimization of acceptor for poly[N-9'-heptadecanyl-2,7-carbazole-alt-5,5-(4',7'-di-2-thienyl-2',1',3'-benzothiadiazole)] (PCDTBT) donor
        voc_2 = max((abs(-5.5) - abs(lumo_cal)) - 0.3, 0.0)
        lumo_offset_2 = 3.6 + lumo_cal
        if lumo_offset_2 < 0.3:
            jsc_2 = 0.0
            pce_2 = 0.0
        else:
            jsc_2 = min(gaussian(gap_cal, A, B), jsc_max)
            pce_2 = 100 * voc_2 * 0.65 * jsc_2 / Pin

        # Add separate VOC, PCE, and Jsc for each objective
        my_df.loc[i, "voc_pcbm"] = voc_1
        my_df.loc[i, "jsc_pcbm"] = jsc_1
        my_df.loc[i, "pce_pcbm"] = pce_1

        my_df.loc[i, "voc_pcdtbt"] = voc_2
        my_df.loc[i, "jsc_pcdtbt"] = jsc_2
        my_df.loc[i, "pce_pcdtbt"] = pce_2

    return my_df

In [25]:
my_df2 = calculate_voc_pce_jsc(my_df1, Pin, A, B)
my_df2

Unnamed: 0,smiles_key,mol_rdkit,SMILE,Poids_Moléculaire,LogP,TPSA,QED,HOMO-LUMO GAP (eV),HOMO Energy (eV),LUMO Energy (eV),Gap_calibrated,homo_calibrated,lumo_calibrated,voc_pcbm,jsc_pcbm,pce_pcbm,voc_pcdtbt,jsc_pcdtbt,pce_pcdtbt
0,smiles0,<rdkit.Chem.rdchem.Mol object at 0x714468fd5900>,c1ncc(s1)-c1sc(-c2cnc(s2)-c2scc3cc[se]c23)c2nc...,487.518,4.9806,51.56,0.29283,6.475203,-6.62781,-0.152607,5.336112,-6.557077,1.220965,1.957077,0.002194537,0.0003101376,3.979035,0.002194537,0.0006305569
1,smiles1,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ba0>,c1ncc(s1)-c1ccc([se]1)-c1cnc(-c2cccc3cocc23)c2...,465.421,5.0558,64.7,0.33001,6.592576,-6.886531,-0.293955,5.44343,-6.791005,1.347574,2.191005,0.001337235,0.0002115703,3.852426,0.001337235,0.0003720023
2,smiles2,<rdkit.Chem.rdchem.Mol object at 0x714468fd5a50>,c1ncc(s1)-c1ncc(-c2sc(-c3ccccc3)c3[se]ccc23)c2...,481.489,4.9191,51.56,0.299271,6.417847,-6.783371,-0.365524,5.286049,-6.69773,1.411681,2.09773,0.002755724,0.0004174351,3.788319,0.002755724,0.0007538517
3,smiles3,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ac0>,c1c-c2cc3cnc4ccc5ccccc5c4c3cc2-nc1,280.33,5.0894,25.78,0.291519,8.227819,-7.330745,0.897073,6.911915,-7.19265,0.280735,2.59265,5.650367e-07,1.05785e-07,4.919265,5.650367e-07,2.007153e-07
4,smiles4,<rdkit.Chem.rdchem.Mol object at 0x714468fd5c10>,c1sc(-c2cc3c(o2)c2c4c[nH]cc4c4ccccc4c2c2=CCC=c...,480.45,6.8075,28.93,0.22254,7.244829,-6.021671,1.223158,6.020374,-6.009025,0.011349,1.409025,7.874508e-05,8.012086e-06,5.188651,7.874508e-05,2.950404e-05
5,smiles5,<rdkit.Chem.rdchem.Mol object at 0x714468fd5dd0>,c1sc(-c2cc3cc4sc5ccc6c[nH]cc6c5c4cc3c3ccccc23)...,461.636,9.7854,15.79,0.234125,7.98845,-6.568378,1.420072,6.691072,-6.503341,0.187731,1.903341,2.045258e-06,2.811047e-07,5.012269,2.045258e-06,7.402629e-07


In [26]:
my_dft2 = calculate_voc_pce_jsc(my_dft1, Pin, A, B)
my_dft2

Unnamed: 0,smiles_key,mol_rdkit,SMILE,Poids_Moléculaire,LogP,TPSA,QED,HOMO-LUMO GAP (eV),HOMO Energy (eV),LUMO Energy (eV),Gap_calibrated,homo_calibrated,lumo_calibrated,voc_pcbm,jsc_pcbm,pce_pcbm,voc_pcdtbt,jsc_pcdtbt,pce_pcdtbt
0,smiles0,<rdkit.Chem.rdchem.Mol object at 0x714468fd5900>,c1ncc(s1)-c1sc(-c2cnc(s2)-c2scc3cc[se]c23)c2nc...,487.518,4.9806,51.56,0.29283,2.082559,-5.165246,-3.082687,1.389145,-5.23467,3.845525,0.63467,189.556227,8.687403,1.354475,189.556227,18.540124
1,smiles1,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ba0>,c1ncc(s1)-c1ccc([se]1)-c1cnc(-c2cccc3cocc23)c2...,465.421,5.0558,64.7,0.33001,2.144677,-5.311571,-3.166894,1.446022,-5.366973,3.920952,0.766973,176.909063,9.797921,1.279048,176.909063,16.339571
2,smiles2,<rdkit.Chem.rdchem.Mol object at 0x714468fd5a50>,c1ncc(s1)-c1ncc(-c2sc(-c3ccccc3)c3[se]ccc23)c2...,481.489,4.9191,51.56,0.299271,2.074562,-5.291223,-3.216661,1.383046,-5.348575,3.965529,0.748575,190.933725,10.320999,1.234471,190.933725,17.020291
3,smiles3,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ac0>,c1c-c2cc3cnc4ccc5ccccc5c4c3cc2-nc1,280.33,5.0894,25.78,0.291519,3.388855,-5.796285,-2.40743,2.56456,-5.805237,3.240677,1.205237,25.912019,2.255159,1.959323,25.912019,3.666156
4,smiles4,<rdkit.Chem.rdchem.Mol object at 0x714468fd5c10>,c1sc(-c2cc3c(o2)c2c4c[nH]cc4c4ccccc4c2c2=CCC=c...,480.45,6.8075,28.93,0.22254,2.695154,-4.564165,-1.86901,1.932792,-4.691191,2.758399,0.091191,87.475771,0.576026,2.441601,87.475771,15.422904
5,smiles5,<rdkit.Chem.rdchem.Mol object at 0x714468fd5dd0>,c1sc(-c2cc3cc4sc5ccc6c[nH]cc6c5c4cc3c3ccccc23)...,461.636,9.7854,15.79,0.234125,3.25324,-4.986631,-1.733392,2.436251,-5.073172,2.636921,0.473172,34.105662,1.165332,2.563079,34.105662,6.312365
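
One row of the table above can be reproduced with a few lines of standard-library Python, which makes the chain from calibrated levels to PCE easier to follow. This is a minimal sketch of the PCBM objective only; the function name `scharber_pcbm` is hypothetical, and the constants are copied from the notebook's parameter cell.

```python
import math

A = 433.11633173034136
B = 2.3353220382662894
PIN = 900.139329284
JSC_MAX = 415.22529811760637


def scharber_pcbm(homo_cal, gap_cal):
    """Return (Voc, Jsc, PCE) for a donor paired with a PCBM acceptor."""
    voc = max(abs(homo_cal) - 4.3 - 0.3, 0.0)
    jsc = min(A * math.exp(-gap_cal ** 2 / B), JSC_MAX)
    pce = 100 * voc * 0.65 * jsc / PIN
    return voc, jsc, pce


# Calibrated B3LYP values of smiles0 from the table above
voc, jsc, pce = scharber_pcbm(homo_cal=-5.23467, gap_cal=1.389145)
print(f"Voc = {voc:.5f}, Jsc = {jsc:.3f}, PCE = {pce:.3f}")
```

The result can be compared with the `voc_pcbm`, `jsc_pcbm` and `pce_pcbm` entries of the first row; small differences in the last digits come from the rounded inputs.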


In [27]:
my_mda2 = calculate_voc_pce_jsc(my_mda1, Pin, A, B)
my_mda2

Unnamed: 0,smiles_key,mol_rdkit,SMILE,Poids_Moléculaire,LogP,TPSA,QED,HOMO-LUMO GAP (eV),HOMO Energy (eV),LUMO Energy (eV),Gap_calibrated,homo_calibrated,lumo_calibrated,voc_pcbm,jsc_pcbm,pce_pcbm,voc_pcdtbt,jsc_pcdtbt,pce_pcdtbt
0,smiles0,<rdkit.Chem.rdchem.Mol object at 0x714468fd5900>,c1ncc(s1)-c1sc(-c2cnc(s2)-c2scc3cc[se]c23)c2nc...,487.518,4.9806,51.56,0.29283,2.082559,-5.165246,-3.082687,1.389145,-5.23467,3.845525,0.63467,189.556227,8.687403,1.354475,189.556227,18.540124
1,smiles1,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ba0>,c1ncc(s1)-c1ccc([se]1)-c1cnc(-c2cccc3cocc23)c2...,465.421,5.0558,64.7,0.33001,2.144677,-5.311571,-3.166894,1.446022,-5.366973,3.920952,0.766973,176.909063,9.797921,1.279048,176.909063,16.339571
2,smiles2,<rdkit.Chem.rdchem.Mol object at 0x714468fd5a50>,c1ncc(s1)-c1ncc(-c2sc(-c3ccccc3)c3[se]ccc23)c2...,481.489,4.9191,51.56,0.299271,2.074562,-5.291223,-3.216661,1.383046,-5.348575,3.965529,0.748575,190.933725,10.320999,1.234471,190.933725,17.020291
3,smiles3,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ac0>,c1c-c2cc3cnc4ccc5ccccc5c4c3cc2-nc1,280.33,5.0894,25.78,0.291519,3.388855,-5.796285,-2.40743,2.56456,-5.805237,3.240677,1.205237,25.912019,2.255159,1.959323,25.912019,3.666156
4,smiles4,<rdkit.Chem.rdchem.Mol object at 0x714468fd5c10>,c1sc(-c2cc3c(o2)c2c4c[nH]cc4c4ccccc4c2c2=CCC=c...,480.45,6.8075,28.93,0.22254,2.695154,-4.564165,-1.86901,1.932792,-4.691191,2.758399,0.091191,87.475771,0.576026,2.441601,87.475771,15.422904
5,smiles5,<rdkit.Chem.rdchem.Mol object at 0x714468fd5dd0>,c1sc(-c2cc3cc4sc5ccc6c[nH]cc6c5c4cc3c3ccccc23)...,461.636,9.7854,15.79,0.234125,3.25324,-4.986631,-1.733392,2.436251,-5.073172,2.636921,0.473172,34.105662,1.165332,2.563079,34.105662,6.312365


In [28]:
# Absolute deviation from the reference PCE values in hce.csv
my_df2['dif_pce1'] = (my_df2['pce_pcbm'] - my1_df['pce_1']).abs()
my_df2['dif_pce2'] = (my1_df['pce_2'] - my_df2['pce_pcdtbt']).abs()
my_df2

Unnamed: 0,smiles_key,mol_rdkit,SMILE,Poids_Moléculaire,LogP,TPSA,QED,HOMO-LUMO GAP (eV),HOMO Energy (eV),LUMO Energy (eV),...,homo_calibrated,lumo_calibrated,voc_pcbm,jsc_pcbm,pce_pcbm,voc_pcdtbt,jsc_pcdtbt,pce_pcdtbt,dif_pce1,dif_pce2
0,smiles0,<rdkit.Chem.rdchem.Mol object at 0x714468fd5900>,c1ncc(s1)-c1sc(-c2cnc(s2)-c2scc3cc[se]c23)c2nc...,487.518,4.9806,51.56,0.29283,6.475203,-6.62781,-0.152607,...,-6.557077,1.220965,1.957077,0.002194537,0.0003101376,3.979035,0.002194537,0.0006305569,10.80221,15.453783
1,smiles1,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ba0>,c1ncc(s1)-c1ccc([se]1)-c1cnc(-c2cccc3cocc23)c2...,465.421,5.0558,64.7,0.33001,6.592576,-6.886531,-0.293955,...,-6.791005,1.347574,2.191005,0.001337235,0.0002115703,3.852426,0.001337235,0.0003720023,10.79962,15.221724
2,smiles2,<rdkit.Chem.rdchem.Mol object at 0x714468fd5a50>,c1ncc(s1)-c1ncc(-c2sc(-c3ccccc3)c3[se]ccc23)c2...,481.489,4.9191,51.56,0.299271,6.417847,-6.783371,-0.365524,...,-6.69773,1.411681,2.09773,0.002755724,0.0004174351,3.788319,0.002755724,0.0007538517,10.79624,15.941964
3,smiles3,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ac0>,c1c-c2cc3cnc4ccc5ccccc5c4c3cc2-nc1,280.33,5.0894,25.78,0.291519,8.227819,-7.330745,0.897073,...,-7.19265,0.280735,2.59265,5.650367e-07,1.05785e-07,4.919265,5.650367e-07,2.007153e-07,1.05785e-07,33.912133
4,smiles4,<rdkit.Chem.rdchem.Mol object at 0x714468fd5c10>,c1sc(-c2cc3c(o2)c2c4c[nH]cc4c4ccccc4c2c2=CCC=c...,480.45,6.8075,28.93,0.22254,7.244829,-6.021671,1.223158,...,-6.009025,0.011349,1.409025,7.874508e-05,8.012086e-06,5.188651,7.874508e-05,2.950404e-05,8.012086e-06,33.889604
5,smiles5,<rdkit.Chem.rdchem.Mol object at 0x714468fd5dd0>,c1sc(-c2cc3cc4sc5ccc6c[nH]cc6c5c4cc3c3ccccc23)...,461.636,9.7854,15.79,0.234125,7.98845,-6.568378,1.420072,...,-6.503341,0.187731,1.903341,2.045258e-06,2.811047e-07,5.012269,2.045258e-06,7.402629e-07,2.811047e-07,33.961094


In [29]:
my_dft2['dif_pce1'] = (my_dft2['pce_pcbm'] - my1_df['pce_1']).abs()
my_dft2['dif_pce2'] = (my1_df['pce_2'] - my_dft2['pce_pcdtbt']).abs()

In [30]:
my_mda2['dif_pce1'] = (my_mda2['pce_pcbm'] - my1_df['pce_1']).abs()
my_mda2['dif_pce2'] = (my1_df['pce_2'] - my_mda2['pce_pcdtbt']).abs()

In [31]:
my_dft2

Unnamed: 0,smiles_key,mol_rdkit,SMILE,Poids_Moléculaire,LogP,TPSA,QED,HOMO-LUMO GAP (eV),HOMO Energy (eV),LUMO Energy (eV),...,homo_calibrated,lumo_calibrated,voc_pcbm,jsc_pcbm,pce_pcbm,voc_pcdtbt,jsc_pcdtbt,pce_pcdtbt,dif_pce1,dif_pce2
0,smiles0,<rdkit.Chem.rdchem.Mol object at 0x714468fd5900>,c1ncc(s1)-c1sc(-c2cnc(s2)-c2scc3cc[se]c23)c2nc...,487.518,4.9806,51.56,0.29283,2.082559,-5.165246,-3.082687,...,-5.23467,3.845525,0.63467,189.556227,8.687403,1.354475,189.556227,18.540124,2.115121,3.08571
1,smiles1,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ba0>,c1ncc(s1)-c1ccc([se]1)-c1cnc(-c2cccc3cocc23)c2...,465.421,5.0558,64.7,0.33001,2.144677,-5.311571,-3.166894,...,-5.366973,3.920952,0.766973,176.909063,9.797921,1.279048,176.909063,16.339571,1.001912,1.117475
2,smiles2,<rdkit.Chem.rdchem.Mol object at 0x714468fd5a50>,c1ncc(s1)-c1ncc(-c2sc(-c3ccccc3)c3[se]ccc23)c2...,481.489,4.9191,51.56,0.299271,2.074562,-5.291223,-3.216661,...,-5.348575,3.965529,0.748575,190.933725,10.320999,1.234471,190.933725,17.020291,0.475662,1.077573
3,smiles3,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ac0>,c1c-c2cc3cnc4ccc5ccccc5c4c3cc2-nc1,280.33,5.0894,25.78,0.291519,3.388855,-5.796285,-2.40743,...,-5.805237,3.240677,1.205237,25.912019,2.255159,1.959323,25.912019,3.666156,2.255159,30.245977
4,smiles4,<rdkit.Chem.rdchem.Mol object at 0x714468fd5c10>,c1sc(-c2cc3c(o2)c2c4c[nH]cc4c4ccccc4c2c2=CCC=c...,480.45,6.8075,28.93,0.22254,2.695154,-4.564165,-1.86901,...,-4.691191,2.758399,0.091191,87.475771,0.576026,2.441601,87.475771,15.422904,0.576026,18.46673
5,smiles5,<rdkit.Chem.rdchem.Mol object at 0x714468fd5dd0>,c1sc(-c2cc3cc4sc5ccc6c[nH]cc6c5c4cc3c3ccccc23)...,461.636,9.7854,15.79,0.234125,3.25324,-4.986631,-1.733392,...,-5.073172,2.636921,0.473172,34.105662,1.165332,2.563079,34.105662,6.312365,1.165332,27.64873


In [32]:
my_mda2

Unnamed: 0,smiles_key,mol_rdkit,SMILE,Poids_Moléculaire,LogP,TPSA,QED,HOMO-LUMO GAP (eV),HOMO Energy (eV),LUMO Energy (eV),...,homo_calibrated,lumo_calibrated,voc_pcbm,jsc_pcbm,pce_pcbm,voc_pcdtbt,jsc_pcdtbt,pce_pcdtbt,dif_pce1,dif_pce2
0,smiles0,<rdkit.Chem.rdchem.Mol object at 0x714468fd5900>,c1ncc(s1)-c1sc(-c2cnc(s2)-c2scc3cc[se]c23)c2nc...,487.518,4.9806,51.56,0.29283,2.082559,-5.165246,-3.082687,...,-5.23467,3.845525,0.63467,189.556227,8.687403,1.354475,189.556227,18.540124,2.115121,3.08571
1,smiles1,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ba0>,c1ncc(s1)-c1ccc([se]1)-c1cnc(-c2cccc3cocc23)c2...,465.421,5.0558,64.7,0.33001,2.144677,-5.311571,-3.166894,...,-5.366973,3.920952,0.766973,176.909063,9.797921,1.279048,176.909063,16.339571,1.001912,1.117475
2,smiles2,<rdkit.Chem.rdchem.Mol object at 0x714468fd5a50>,c1ncc(s1)-c1ncc(-c2sc(-c3ccccc3)c3[se]ccc23)c2...,481.489,4.9191,51.56,0.299271,2.074562,-5.291223,-3.216661,...,-5.348575,3.965529,0.748575,190.933725,10.320999,1.234471,190.933725,17.020291,0.475662,1.077573
3,smiles3,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ac0>,c1c-c2cc3cnc4ccc5ccccc5c4c3cc2-nc1,280.33,5.0894,25.78,0.291519,3.388855,-5.796285,-2.40743,...,-5.805237,3.240677,1.205237,25.912019,2.255159,1.959323,25.912019,3.666156,2.255159,30.245977
4,smiles4,<rdkit.Chem.rdchem.Mol object at 0x714468fd5c10>,c1sc(-c2cc3c(o2)c2c4c[nH]cc4c4ccccc4c2c2=CCC=c...,480.45,6.8075,28.93,0.22254,2.695154,-4.564165,-1.86901,...,-4.691191,2.758399,0.091191,87.475771,0.576026,2.441601,87.475771,15.422904,0.576026,18.46673
5,smiles5,<rdkit.Chem.rdchem.Mol object at 0x714468fd5dd0>,c1sc(-c2cc3cc4sc5ccc6c[nH]cc6c5c4cc3c3ccccc23)...,461.636,9.7854,15.79,0.234125,3.25324,-4.986631,-1.733392,...,-5.073172,2.636921,0.473172,34.105662,1.165332,2.563079,34.105662,6.312365,1.165332,27.64873


In [33]:
from pathlib import Path
import os, sys
from rdkit.Chem import RDConfig
# sascorer lives in RDKit's Contrib directory, so add it to the import path first
sys.path.append(os.path.join(RDConfig.RDContribDir, 'SA_Score'))
import sascorer

In [34]:
for i in range(len(my_df2)):
    mol_rdkit = Chem.MolFromSmiles(my_df2.loc[i, 'SMILE'])

    # Skip SMILES strings that RDKit cannot parse
    if mol_rdkit is not None:
        # Add explicit hydrogens
        mol = Chem.AddHs(mol_rdkit)
        # Synthetic accessibility score, computed on the molecule with explicit hydrogens
        sas = sascorer.calculateScore(mol)
        my_df2.at[i, 'sas1'] = sas
        # SAS-penalised PCE for the PCBM and PCDTBT references
        my_df2.at[i, 'pce_pcbm_sas'] = my_df2.at[i, 'pce_pcbm'] - sas
        my_df2.at[i, 'pce_pcdtbt_sas'] = my_df2.at[i, 'pce_pcdtbt'] - sas
        # Absolute deviation from the pce_1 reference in hce.csv
        my_df2.at[i, 'dif_pce1_sas'] = abs(my_df2.at[i, 'pce_pcbm_sas'] - my1_df.at[i, 'pce_1'])
        # Note: 'dif_pce2_sas' holds |sas - sas1|, the value shown in the output
        # below; it does not hold |pce_2 - pce_pcdtbt_sas|
        my_df2.at[i, 'dif_pce2_sas'] = abs(my1_df.at[i, 'sas'] - my_df2.at[i, 'sas1'])

my_df2



Unnamed: 0,smiles_key,mol_rdkit,SMILE,Poids_Moléculaire,LogP,TPSA,QED,HOMO-LUMO GAP (eV),HOMO Energy (eV),LUMO Energy (eV),...,voc_pcdtbt,jsc_pcdtbt,pce_pcdtbt,dif_pce1,dif_pce2,sas1,pce_pcbm_sas,pce_pcdtbt_sas,dif_pce1_sas,dif_pce2_sas
0,smiles0,<rdkit.Chem.rdchem.Mol object at 0x714468fd5900>,c1ncc(s1)-c1sc(-c2cnc(s2)-c2scc3cc[se]c23)c2nc...,487.518,4.9806,51.56,0.29283,6.475203,-6.62781,-0.152607,...,3.979035,0.002194537,0.0006305569,10.80221,15.453783,6.387723,-6.387413,-6.387092,17.189937,2.380999
1,smiles1,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ba0>,c1ncc(s1)-c1ccc([se]1)-c1cnc(-c2cccc3cocc23)c2...,465.421,5.0558,64.7,0.33001,6.592576,-6.886531,-0.293955,...,3.852426,0.001337235,0.0003720023,10.79962,15.221724,6.700902,-6.70069,-6.70053,17.500523,2.871681
2,smiles2,<rdkit.Chem.rdchem.Mol object at 0x714468fd5a50>,c1ncc(s1)-c1ncc(-c2sc(-c3ccccc3)c3[se]ccc23)c2...,481.489,4.9191,51.56,0.299271,6.417847,-6.783371,-0.365524,...,3.788319,0.002755724,0.0007538517,10.79624,15.941964,6.683025,-6.682608,-6.682272,17.479269,2.961698
3,smiles3,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ac0>,c1c-c2cc3cnc4ccc5ccccc5c4c3cc2-nc1,280.33,5.0894,25.78,0.291519,8.227819,-7.330745,0.897073,...,4.919265,5.650367e-07,2.007153e-07,1.05785e-07,33.912133,7.120532,-7.120532,-7.120532,7.120532,4.917174
4,smiles4,<rdkit.Chem.rdchem.Mol object at 0x714468fd5c10>,c1sc(-c2cc3c(o2)c2c4c[nH]cc4c4ccccc4c2c2=CCC=c...,480.45,6.8075,28.93,0.22254,7.244829,-6.021671,1.223158,...,5.188651,7.874508e-05,2.950404e-05,8.012086e-06,33.889604,7.394805,-7.394797,-7.394776,7.394797,3.305903
5,smiles5,<rdkit.Chem.rdchem.Mol object at 0x714468fd5dd0>,c1sc(-c2cc3cc4sc5ccc6c[nH]cc6c5c4cc3c3ccccc23)...,461.636,9.7854,15.79,0.234125,7.98845,-6.568378,1.420072,...,5.012269,2.045258e-06,7.402629e-07,2.811047e-07,33.961094,7.297881,-7.297881,-7.29788,7.297881,4.029797


In [35]:
for i in range(len(my_dft2)):
    mol_rdkit = Chem.MolFromSmiles(my_dft2.loc[i, 'SMILE'])

    # Skip SMILES strings that RDKit cannot parse
    if mol_rdkit is not None:
        # Add explicit hydrogens
        mol = Chem.AddHs(mol_rdkit)
        # Synthetic accessibility score, computed on the molecule with explicit hydrogens
        sas1 = sascorer.calculateScore(mol)
        my_dft2.at[i, 'sas1'] = sas1
        # SAS-penalised PCE for the PCBM and PCDTBT references
        my_dft2.at[i, 'pce_pcbm_sas'] = my_dft2.at[i, 'pce_pcbm'] - sas1
        my_dft2.at[i, 'pce_pcdtbt_sas'] = my_dft2.at[i, 'pce_pcdtbt'] - sas1
        # Absolute deviation from the pce_1 reference in hce.csv
        my_dft2.at[i, 'dif_pce1_sas'] = abs(my_dft2.at[i, 'pce_pcbm_sas'] - my1_df.at[i, 'pce_1'])
        # Note: 'dif_pce2_sas' holds |sas - sas1|, the value shown in the output
        # below; it does not hold |pce_2 - pce_pcdtbt_sas|
        my_dft2.at[i, 'dif_pce2_sas'] = abs(my1_df.at[i, 'sas'] - my_dft2.at[i, 'sas1'])

my_dft2



Unnamed: 0,smiles_key,mol_rdkit,SMILE,Poids_Moléculaire,LogP,TPSA,QED,HOMO-LUMO GAP (eV),HOMO Energy (eV),LUMO Energy (eV),...,voc_pcdtbt,jsc_pcdtbt,pce_pcdtbt,dif_pce1,dif_pce2,sas1,pce_pcbm_sas,pce_pcdtbt_sas,dif_pce1_sas,dif_pce2_sas
0,smiles0,<rdkit.Chem.rdchem.Mol object at 0x714468fd5900>,c1ncc(s1)-c1sc(-c2cnc(s2)-c2scc3cc[se]c23)c2nc...,487.518,4.9806,51.56,0.29283,2.082559,-5.165246,-3.082687,...,1.354475,189.556227,18.540124,2.115121,3.08571,6.387723,2.29968,12.152401,8.502844,2.380999
1,smiles1,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ba0>,c1ncc(s1)-c1ccc([se]1)-c1cnc(-c2cccc3cocc23)c2...,465.421,5.0558,64.7,0.33001,2.144677,-5.311571,-3.166894,...,1.279048,176.909063,16.339571,1.001912,1.117475,6.700902,3.097019,9.63867,7.702813,2.871681
2,smiles2,<rdkit.Chem.rdchem.Mol object at 0x714468fd5a50>,c1ncc(s1)-c1ncc(-c2sc(-c3ccccc3)c3[se]ccc23)c2...,481.489,4.9191,51.56,0.299271,2.074562,-5.291223,-3.216661,...,1.234471,190.933725,17.020291,0.475662,1.077573,6.683025,3.637973,10.337266,7.158687,2.961698
3,smiles3,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ac0>,c1c-c2cc3cnc4ccc5ccccc5c4c3cc2-nc1,280.33,5.0894,25.78,0.291519,3.388855,-5.796285,-2.40743,...,1.959323,25.912019,3.666156,2.255159,30.245977,7.120532,-4.865373,-3.454376,4.865373,4.917174
4,smiles4,<rdkit.Chem.rdchem.Mol object at 0x714468fd5c10>,c1sc(-c2cc3c(o2)c2c4c[nH]cc4c4ccccc4c2c2=CCC=c...,480.45,6.8075,28.93,0.22254,2.695154,-4.564165,-1.86901,...,2.441601,87.475771,15.422904,0.576026,18.46673,7.394805,-6.818779,8.028098,6.818779,3.305903
5,smiles5,<rdkit.Chem.rdchem.Mol object at 0x714468fd5dd0>,c1sc(-c2cc3cc4sc5ccc6c[nH]cc6c5c4cc3c3ccccc23)...,461.636,9.7854,15.79,0.234125,3.25324,-4.986631,-1.733392,...,2.563079,34.105662,6.312365,1.165332,27.64873,7.297881,-6.132549,-0.985516,6.132549,4.029797


In [36]:
for i in range(len(my_mda2)):
    mol_rdkit = Chem.MolFromSmiles(my_mda2.loc[i, 'SMILE'])

    # Skip SMILES strings that RDKit cannot parse
    if mol_rdkit is not None:
        # Add explicit hydrogens
        mol = Chem.AddHs(mol_rdkit)
        # Synthetic accessibility score, computed on the molecule with explicit hydrogens
        sas1 = sascorer.calculateScore(mol)
        my_mda2.at[i, 'sas1'] = sas1
        # SAS-penalised PCE for the PCBM and PCDTBT references
        my_mda2.at[i, 'pce_pcbm_sas'] = my_mda2.at[i, 'pce_pcbm'] - sas1
        my_mda2.at[i, 'pce_pcdtbt_sas'] = my_mda2.at[i, 'pce_pcdtbt'] - sas1
        # Absolute deviation from the pce_1 reference in hce.csv
        my_mda2.at[i, 'dif_pce1_sas'] = abs(my_mda2.at[i, 'pce_pcbm_sas'] - my1_df.at[i, 'pce_1'])
        # Note: 'dif_pce2_sas' holds |sas - sas1|, the value shown in the output
        # below; it does not hold |pce_2 - pce_pcdtbt_sas|
        my_mda2.at[i, 'dif_pce2_sas'] = abs(my1_df.at[i, 'sas'] - my_mda2.at[i, 'sas1'])

my_mda2



Unnamed: 0,smiles_key,mol_rdkit,SMILE,Poids_Moléculaire,LogP,TPSA,QED,HOMO-LUMO GAP (eV),HOMO Energy (eV),LUMO Energy (eV),...,voc_pcdtbt,jsc_pcdtbt,pce_pcdtbt,dif_pce1,dif_pce2,sas1,pce_pcbm_sas,pce_pcdtbt_sas,dif_pce1_sas,dif_pce2_sas
0,smiles0,<rdkit.Chem.rdchem.Mol object at 0x714468fd5900>,c1ncc(s1)-c1sc(-c2cnc(s2)-c2scc3cc[se]c23)c2nc...,487.518,4.9806,51.56,0.29283,2.082559,-5.165246,-3.082687,...,1.354475,189.556227,18.540124,2.115121,3.08571,6.387723,2.29968,12.152401,8.502844,2.380999
1,smiles1,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ba0>,c1ncc(s1)-c1ccc([se]1)-c1cnc(-c2cccc3cocc23)c2...,465.421,5.0558,64.7,0.33001,2.144677,-5.311571,-3.166894,...,1.279048,176.909063,16.339571,1.001912,1.117475,6.700902,3.097019,9.63867,7.702813,2.871681
2,smiles2,<rdkit.Chem.rdchem.Mol object at 0x714468fd5a50>,c1ncc(s1)-c1ncc(-c2sc(-c3ccccc3)c3[se]ccc23)c2...,481.489,4.9191,51.56,0.299271,2.074562,-5.291223,-3.216661,...,1.234471,190.933725,17.020291,0.475662,1.077573,6.683025,3.637973,10.337266,7.158687,2.961698
3,smiles3,<rdkit.Chem.rdchem.Mol object at 0x714468fd5ac0>,c1c-c2cc3cnc4ccc5ccccc5c4c3cc2-nc1,280.33,5.0894,25.78,0.291519,3.388855,-5.796285,-2.40743,...,1.959323,25.912019,3.666156,2.255159,30.245977,7.120532,-4.865373,-3.454376,4.865373,4.917174
4,smiles4,<rdkit.Chem.rdchem.Mol object at 0x714468fd5c10>,c1sc(-c2cc3c(o2)c2c4c[nH]cc4c4ccccc4c2c2=CCC=c...,480.45,6.8075,28.93,0.22254,2.695154,-4.564165,-1.86901,...,2.441601,87.475771,15.422904,0.576026,18.46673,7.394805,-6.818779,8.028098,6.818779,3.305903
5,smiles5,<rdkit.Chem.rdchem.Mol object at 0x714468fd5dd0>,c1sc(-c2cc3cc4sc5ccc6c[nH]cc6c5c4cc3c3ccccc23)...,461.636,9.7854,15.79,0.234125,3.25324,-4.986631,-1.733392,...,2.563079,34.105662,6.312365,1.165332,27.64873,7.297881,-6.132549,-0.985516,6.132549,4.029797
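
Cells 34 to 36 repeat the same SAS post-processing for three tables. A minimal sketch of how it could be factored into one function is given below. The names `add_sas_columns`, `score_fn` and `toy_score` are hypothetical, as are the demo tables. Unlike the cells above, this sketch keeps the PCE-2 deviation in `dif_pce2_sas` and stores the SAS deviation in a separate column, `dif_sas` (an assumed name), so the two quantities do not share a column.

```python
import pandas as pd


def add_sas_columns(df, reference, score_fn, smiles_col="SMILE"):
    """Return a copy of `df` with SAS-penalised PCE columns added.

    `score_fn` maps a SMILES string to a score, or to None when the
    molecule cannot be scored; such rows end up as NaN.
    """
    out = df.copy()
    scores = [score_fn(smiles) for smiles in out[smiles_col]]
    out["sas1"] = pd.Series(scores, index=out.index, dtype="float64")
    # SAS-penalised PCE for the two reference materials
    out["pce_pcbm_sas"] = out["pce_pcbm"] - out["sas1"]
    out["pce_pcdtbt_sas"] = out["pce_pcdtbt"] - out["sas1"]
    # Absolute deviations from the reference table, aligned on index labels
    out["dif_pce1_sas"] = (out["pce_pcbm_sas"] - reference["pce_1"]).abs()
    out["dif_pce2_sas"] = (reference["pce_2"] - out["pce_pcdtbt_sas"]).abs()
    out["dif_sas"] = (reference["sas"] - out["sas1"]).abs()
    return out


def toy_score(smiles):
    """Hypothetical stand-in for an SA scorer; returns None for unknown input."""
    return 6.0 if smiles == "c1ccccc1" else None


computed = pd.DataFrame({
    "SMILE": ["c1ccccc1", "not-a-smiles"],
    "pce_pcbm": [8.0, 2.0],
    "pce_pcdtbt": [18.0, 4.0],
})
reference = pd.DataFrame({"pce_1": [10.0, 0.0], "pce_2": [15.0, 34.0], "sas": [4.0, 2.0]})

result = add_sas_columns(computed, reference, toy_score)
print(result)
```

In the notebook, `score_fn` could wrap `Chem.MolFromSmiles` and `sascorer.calculateScore`, returning None when parsing fails. Passing the scorer as an argument keeps the table logic testable without RDKit.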


In [37]:
# Save the three result tables to CSV without the index column
my_df2.to_csv('my_hf.csv', index=False)
my_dft2.to_csv('my_dft.csv', index=False)
my_mda2.to_csv('my_mda.csv', index=False)
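
The `mol_rdkit` column holds in-memory objects, which `to_csv` writes only as their text representation (for example `<rdkit.Chem.rdchem.Mol object at 0x...>`), so they cannot be recovered from the file. A minimal sketch of dropping such a column before saving and reading the file back is shown below. The `FakeMol` class, the demo table and the file name `my_dft_demo.csv` are hypothetical stand-ins. The molecules can be rebuilt later from the `SMILE` column.

```python
import os
import tempfile

import pandas as pd


class FakeMol:
    """Hypothetical stand-in for an RDKit Mol object."""


df = pd.DataFrame({
    "smiles_key": ["smiles0", "smiles1"],
    "mol_rdkit": [FakeMol(), FakeMol()],
    "SMILE": ["c1ccccc1", "c1ccncc1"],
    "pce_pcbm": [8.687403, 9.797921],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_dft_demo.csv")
    # Objects would be saved only as their repr, so drop that column first
    df.drop(columns=["mol_rdkit"]).to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(reloaded)
```

Because `drop` returns a new table, the original DataFrame keeps its `mol_rdkit` column for further use in the session.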