
When calculating descriptors from SMILES, an error occurred. #9

Closed
kexul opened this issue Oct 18, 2019 · 9 comments

Comments

@kexul

kexul commented Oct 18, 2019

The error log is: PaDEL-Descriptor encountered an error: PaDEL-Descriptor timed out during subprocess call

By the way, thanks for your great work!

@tjkessler
Member

Hi @kexul,

Sorry for the late reply!

Could you provide additional information about the command you are executing? A code snippet will help me diagnose the issue further.

@kexul
Author

kexul commented Nov 1, 2019

I processed a large CSV file containing about 700,000 SMILES strings; you can try it yourself with all_smi.zip. The code I used was the same as your example:

from padelpy import from_smiles

with open('all_smi.csv', 'rt') as f:
    smi = f.readline().strip()  # read the first SMILES string from the file
    descriptors = from_smiles(smi)  # pass the variable, not the literal 'smi'

@tjkessler
Member

@kexul,

PaDEL-Descriptor sometimes times out when calculating descriptors for smaller compounds (I'm not entirely sure why; one would think their calculations would be very quick):

descriptors = from_smiles('CCC')

PaDELPy's "from_smiles" function tries three times to calculate the descriptors for a given compound, and if a RuntimeError is encountered all three times, the error you saw is thrown:

PaDEL-Descriptor encountered an error: PaDEL-Descriptor timed out during subprocess call

By default, if the generation process exceeds 12 seconds, this is seen as a failure. You can try increasing the timeout:

# increase timeout to 30 seconds
descriptors = from_smiles('CCC', timeout=30)

If this doesn't help, I recommend you catch the exception and perform an action to account for it:

try:
    descriptors = from_smiles('CCC')
except RuntimeError:
    # Do something, e.g. record the failure and move on
    descriptors = None
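
For a large input file, the catch-and-continue idea could be sketched roughly as follows. The helper name `compute_with_skips` and the injected `calc` callable are hypothetical, introduced only for illustration; in real use `calc` would wrap padelpy's `from_smiles`, which is assumed here to raise RuntimeError on failure as described in this thread.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def compute_with_skips(
    smiles_iter: Iterable[str],
    calc: Callable[[str], dict],
) -> Tuple[Dict[str, dict], List[str]]:
    """Run calc on each SMILES string; collect failures instead of aborting.

    `calc` is a stand-in for a descriptor function that raises
    RuntimeError when the calculation fails or times out.
    """
    results: Dict[str, dict] = {}
    failed: List[str] = []
    for smi in smiles_iter:
        smi = smi.strip()
        if not smi:
            continue  # ignore blank lines
        try:
            results[smi] = calc(smi)
        except RuntimeError:
            failed.append(smi)  # remember it so it can be retried later
    return results, failed
```

With padelpy installed, this might be called as `compute_with_skips(open('all_smi.csv'), lambda s: from_smiles(s, timeout=30))`, after which `failed` lists the compounds worth retrying with a longer timeout.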

Let me know if increasing the timeout value helps! If it does, it may justify increasing the default value for the "from_smiles" and "from_mdl" functions.

Best,
Travis

@tjkessler
Member

I'm going to go ahead and close this issue due to inactivity. @kexul - keep me updated as to whether any of the methods I outlined above work for you!

@katasanirohith

katasanirohith commented Jan 5, 2021

Hey, I have tried increasing the timeout to 30, but I am still facing the same error:

raise RuntimeError('PaDEL-Descriptor encountered an error: {}'.format(
RuntimeError: PaDEL-Descriptor encountered an error: PaDEL-Descriptor timed out during subprocess call

Edit:
I have 200 SMILES in a file, I am reading it and doing the following:

for index, i in enumerate(reader): 
    print(index) 
    descriptors = from_smiles(i[0], timeout=60) 

After 5 molecules, it throws the above timeout error.

System specifications:
8 vCPUs
56GB RAM

I think I have found the answer:
The molecules are very long, so they take more time to process. My suggestion is to increase the time limit to a larger value, as the average time per molecule for me was greater than 30 seconds.
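
One way to pick a timeout is to measure how long successful calls actually take on a small sample first. The sketch below is only an illustration: `time_calls` and `suggest_timeout` are hypothetical helpers, `calc` stands in for a call such as `from_smiles` with a generous timeout, and the safety factor of 2 is an arbitrary assumption rather than a recommendation from the library.

```python
import time
from typing import Callable, Iterable, List


def time_calls(items: Iterable[str], calc: Callable[[str], object]) -> List[float]:
    """Return the wall-clock seconds each successful calc(item) call took."""
    durations: List[float] = []
    for item in items:
        start = time.perf_counter()
        try:
            calc(item)
        except RuntimeError:
            continue  # failed calls (e.g. timeouts) are not counted
        durations.append(time.perf_counter() - start)
    return durations


def suggest_timeout(durations: List[float], safety_factor: float = 2.0) -> float:
    """Heuristic (an assumption): a multiple of the slowest observed call."""
    if not durations:
        raise ValueError("no successful calls were timed")
    return max(durations) * safety_factor
```

If the slowest molecule in the sample took 35 seconds, `suggest_timeout` with the default factor would propose 70 seconds.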

@RajaramWalavalkar

Hello,
I obtained 1875 descriptors from the padelpy package using SMILES as input. I am from a non-chemistry background, so could you please tell me what exactly those descriptors signify?
Some column headings, like nAtoms, nAromatic atoms, and nAromatic bonds, are easy to understand, but I do not understand what column names such as ATS, AATS, ATSC, MATS, and GATS signify.
Can you help me with this?

@tjkessler
Member

@RajaramWalavalkar,

Each descriptor gives a numerical representation of some physical, chemical, or electromechanical aspect of a given compound. For example, "nN" is the number of nitrogen atoms present in the compound, "nC" is the number of carbon atoms present, etc.

Some of the descriptors are somewhat ambiguous - the ATS descriptors are a measurement of autocorrelation between neighboring atoms with respect to a certain weighting, such as mass and charge. More detailed descriptions for each descriptor can be found in a spreadsheet at http://www.yapcwsoft.com/dd/padeldescriptor/ by clicking the "1875" link towards the top of the page.
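
To give a feel for what "autocorrelation with respect to a weighting" means, here is a toy sketch of a Broto-Moreau style sum over atom pairs a fixed number of bonds apart. It is an illustration only: the function names are hypothetical, the molecule is a hand-written adjacency list, and PaDEL's actual ATS values use its own atom properties and scaling, so the numbers will not match its output.

```python
from collections import deque
from typing import Dict, List


def topological_distances(adj: Dict[int, List[int]], start: int) -> Dict[int, int]:
    """Breadth-first search: number of bonds from start to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def autocorrelation(adj: Dict[int, List[int]], weights: Dict[int, float], lag: int) -> float:
    """Sum w_i * w_j over unordered atom pairs exactly `lag` bonds apart."""
    total = 0.0
    for i in adj:
        for j, d in topological_distances(adj, i).items():
            if i < j and d == lag:
                total += weights[i] * weights[j]
    return total


# Toy example: a three-atom chain 0-1-2 (the carbon skeleton of 'CCC')
chain = {0: [1], 1: [0, 2], 2: [1]}
unit_weights = {0: 1.0, 1: 1.0, 2: 1.0}
lag1 = autocorrelation(chain, unit_weights, 1)  # pairs (0, 1) and (1, 2)
lag2 = autocorrelation(chain, unit_weights, 2)  # pair (0, 2)
```

Swapping the unit weights for atomic masses or partial charges gives the mass-weighted or charge-weighted variants of the same idea.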

Best,
Travis

@rishabhiiitd071

rishabhiiitd071 commented Jan 17, 2022

Hey, I have tried increasing the timeout to 30, but I am still facing the same error:

raise RuntimeError('PaDEL-Descriptor encountered an error: {}'.format(
RuntimeError: PaDEL-Descriptor encountered an error: PaDEL-Descriptor timed out during subprocess call

Edit: I have 200 SMILES in a file, I am reading it and doing the following:

for index, i in enumerate(reader): 
    print(index) 
    descriptors = from_smiles(i[0], timeout=60) 

After 5 molecules, it throws the above timeout error.

System specifications: 8 vCPUs 56GB RAM

I think I have found the answer: the molecules are very long, so they take more time to process. My suggestion is to increase the time limit to a larger value, as the average time per molecule for me was greater than 30 seconds.

Could you please specify how far one should increase the timeout? I am getting the same error for 172 SMILES, and the timeout used was 60.

@Luizerko

Luizerko commented Sep 19, 2022

Hello, guys. I had the same problem and I was not able to fully solve it, but here are my two attempts.

First, I increased the number of chunks, which shortens the list of compounds passed in each call. I used comp_subset_len = 10 compounds per request. Although making more requests cost some efficiency, my script was able to calculate descriptors for more compounds.
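
The chunking step can be sketched in plain Python as below. The helper name `chunked` is hypothetical; each yielded list would then be passed to `from_smiles` in its own call.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

For example, `chunked(all_smiles, 10)` splits a list of 25 strings into chunks of 10, 10, and 5.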

Second, I used the VERY BAD strategy of try/except inside a while True loop. Let i be the chunk number and 60 be the initial timeout value.

timeout = 60  # initial timeout value
while True:
    try:
        descriptors_dict = from_smiles(
            <list_of_SMILES>[i*comp_subset_len:(i+1)*comp_subset_len],
            timeout=timeout)
        break
    except RuntimeError:  # raised by from_smiles when the call times out
        timeout = timeout*2
        print('Doubling timeout')

That strategy did not work, even after the timeout reached 240, so I decided to simply skip some SMILES. It was not worth spending so much time on a few molecules. Maybe the code will work if one is patient enough.
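
A bounded variant of the doubling loop, which gives up and lets the caller skip the chunk once a cap is reached, might look like this sketch. The function name, the `max_timeout` cap, and the injected `calc` callable are all hypothetical; `calc` stands in for `from_smiles` and is assumed to accept a `timeout` keyword and raise RuntimeError on a timeout.

```python
from typing import Callable, List, Optional


def call_with_growing_timeout(
    calc: Callable[..., list],
    smiles: List[str],
    initial_timeout: int = 60,
    max_timeout: int = 240,
) -> Optional[list]:
    """Retry calc with a doubled timeout until it succeeds or the cap is passed.

    Returns None when every attempt fails, so the caller can skip the chunk.
    """
    timeout = initial_timeout
    while timeout <= max_timeout:
        try:
            return calc(smiles, timeout=timeout)
        except RuntimeError:
            timeout *= 2  # give the next attempt twice as long
    return None
```

Unlike a bare `while True` with a catch-all `except`, this version always terminates and does not swallow a keyboard interrupt.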

One other suggestion: it would be very nice of the developers if they could share a link to a CSV with some descriptors. I would guess that they have probably tested the library for a bunch of compounds and that they have some files with a lot of descriptors. If this is not the case or sharing such a file will not be possible, forget about it.

Last but not least, thank you very much for the library :)
