Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integer overflow when parsing atom count in molecular formula #733

Closed
oatkachenko opened this issue Jun 3, 2021 · 10 comments
Closed

Integer overflow when parsing atom count in molecular formula #733

oatkachenko opened this issue Jun 3, 2021 · 10 comments

Comments

@oatkachenko
Copy link

If molecular formula has huge amount of atoms then parser calculates it improperly.

Molecular formula: H4294967296
AR: 0 atoms
ER: 1 atom

Molecular formula: H4294967298
AR: 2 atoms
ER: 1 atom

@oatkachenko oatkachenko changed the title Integer overflow when parsing atom count in molecular furmula Integer overflow when parsing atom count in molecular formula Jun 3, 2021
@johnmay
Copy link
Member

johnmay commented Jun 3, 2021

That is more atoms than the largest known molecule, I think it's fine.

C169719H270466N45688O52238S911
https://en.wikipedia.org/wiki/Titin

Even if we use a long instead of an integer you are likely to Run out of RAM as the parse actually constructs the atoms.

@oatkachenko
Copy link
Author

I pushed commit before I saw your answer :) Anyway thank you for explanation.

@johnmay
Copy link
Member

johnmay commented Jun 3, 2021

Almost there, but why not just return out of the loop:

while (isDigit(c = next())) {
   res = (10 * res) + (c - '0');
   if (res < 0) return -1;
}

You do the check after the loop but you can overflow back to 0

@oatkachenko
Copy link
Author

In this case, if overflow will happen in the middle of a number - iterator will stay there and we can't get the other part of the string.

@johnmay
Copy link
Member

johnmay commented Jun 3, 2021

If it overflows it should be an error non? "Unexpected input"

@johnmay
Copy link
Member

johnmay commented Jun 3, 2021

So the rest of the string doesn't matter, if someone wants to handle formulas with more than 2 billion hydrogens I think they can write their own parser for that :-)

@oatkachenko
Copy link
Author

If it overflows it should be an error non? "Unexpected input"

I think it should

while (isDigit(c = next())) {
   res = (10 * res) + (c - '0');
   if (res < 0) return -1;
}

btw, it is possible to overflow from positive to positive e.g. 4294967298 will be 429496729 and then 2

So the rest of the string doesn't matter, if someone wants to handle formulas with more than 2 billion hydrogens I think they can write their own parser for that :-)

Absolutely agree:)

@oatkachenko
Copy link
Author

oatkachenko commented Jun 3, 2021

What are the advantages to return 1 if amount of atom is negative value?

MolecularFormulaManipulator#parseIsotope

count = iter.nextUInt();
        if (count < 0)
            count = 1;

Maybe we could return false?

@johnmay
Copy link
Member

johnmay commented Jun 3, 2021

I'd have to remember what the old code did but yes count < 0 should probably just be an error

@johnmay
Copy link
Member

johnmay commented Jan 30, 2022

Fixed via #808

@johnmay johnmay closed this as completed Jan 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants