-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merging formulas to normalize them despite sources. Added Notes column #68
Conversation
@JamesJeffryes The output of the travis checks is massive, is there a place where we can find a summary of the issues blocking the merge? |
Generally you have to look at the bottom of the raw log. The build is failing because of this error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are confident that the formulas are correct, I am OK merging this PR despite of the failing build. Changing the formulas won't make any reaction that was balanced unbalenced in practice though they could be examined.
Take a look at cpd00021, cpd00030, cpd00058, cpd00149, and cpd00244. The formulas include the charge. |
Yes, with hindsight, it was a mistake to commit both a massive change (i.e. the addition of a single column) and spot-changes (i.e. the update of different formulas). Is Travis monitoring whether reaction balances are changing ? We should probably clear up why Travis had these 19 reactions listed as "balanced" in the first place. |
In any case, yes, I'm very confident in the formula-parsing-re-merging I did, here are some examples: Updating cpd17447: (SO4. Ca)2. H2O --> H2Ca2O9S2 I was also going to follow this up with the new set of reaction balances too, so we can see which reactions would have a different status, so lets merge, and then review. Finally, this goes back to what I was saying about using pretty-printed JSON objects, it helps see the diffs between individual columns. |
I suspect they were "balanced" because one or more of their compounds had
an unparseable formula. Here's part of the diff of the raw log of your PR
(<) vs the master branch (>) that might help
< Number of reactions with unbalanced equations: 7475
---
Number of reactions with unbalanced equations: 7456
22208,22209c22893,22894
< Reactant atoms:[('H', 4.0), ('O', 7.0), ('P', 2.0)]
< Product atoms:[('H', 4.0), ('O', 12.0), ('P', 4.0)]
---
Reactant atoms:[('H', 3.0), ('O', 4.0), ('P', 1.0)]
Product atoms:[('H', 2.0), ('O', 6.0), ('P', 2.0)]
22309c22994
< Product atoms:[('C', 10.0), ('H', 14.0), ('N', 2.0), ('O', 6.0)]
---
Product atoms:[('C', 5.0), ('H', 7.0), ('N', 1.0), ('O', 3.0)]
22673c23358
< Product atoms:[('C', 44.0), ('H', 68.0), ('N', 4.0), ('O', 34.0)]
---
Product atoms:[('C', 22.0), ('H', 34.0), ('N', 2.0), ('O', 17.0)]
23000,23001c23685,23686
< Reactant atoms:[('C', 34.0), ('H', 49.0), ('N', 7.0), ('O', 47.0), ('P',
2.0), ('S', 5.0)]
< Product atoms:[('C', 62.0), ('H', 91.0), ('N', 9.0), ('O', 82.0), ('P',
2.0), ('S', 10.0)]
---
Reactant atoms:[('C', 22.0), ('H', 30.0), ('N', 6.0), ('O', 30.0), ('P',
2.0), ('S', 3.0)]
Product atoms:[('C', 36.0), ('H', 51.0), ('N', 7.0), ('O', 46.0), ('P',
2.0), ('S', 5.0)]
23023,23030d23707
< Line 05964: rxn06114
< Reactant atoms:[('C', 38.0), ('H', 53.0), ('N', 7.0), ('O', 35.0), ('P',
2.0), ('S', 1.0)]
< Product atoms:[('C', 38.0), ('H', 53.0), ('N', 7.0), ('O', 38.0), ('P',
2.0), ('S', 2.0)]
<
< Line 05965: rxn06115
< Reactant atoms:[('C', 38.0), ('H', 53.0), ('N', 7.0), ('O', 35.0), ('P',
2.0), ('S', 1.0)]
< Product atoms:[('C', 38.0), ('H', 53.0), ('N', 7.0), ('O', 38.0), ('P',
2.0), ('S', 2.0)]
<
23124c23801
< Reactant atoms:[('C', 52.0), ('H', 74.0), ('O', 49.0)]
---
Reactant atoms:[('C', 26.0), ('H', 38.0), ('O', 25.0)]
23160c23837
< Reactant atoms:[('C', 28.0), ('H', 42.0), ('N', 2.0), ('O', 24.0)]
---
Reactant atoms:[('C', 14.0), ('H', 21.0), ('N', 1.0), ('O', 12.0)]
23243,23254d23919
< Line 06068: rxn06225
< Reactant atoms:[('C', 25.0), ('H', 44.0), ('N', 2.0), ('O', 31.0), ('P',
4.0)]
< Product atoms:[('C', 31.0), ('H', 54.0), ('N', 2.0), ('O', 36.0), ('P',
4.0)]
<
< Line 06069: rxn06226
< Reactant atoms:[('C', 27.0), ('H', 47.0), ('N', 3.0), ('O', 31.0), ('P',
4.0)]
< Product atoms:[('C', 35.0), ('H', 60.0), ('N', 4.0), ('O', 36.0), ('P',
4.0)]
<
< Line 06070: rxn06227
< Reactant atoms:[('C', 23.0), ('H', 42.0), ('N', 6.0), ('O', 29.0), ('P',
5.0)]
< Product atoms:[('C', 26.0), ('H', 47.0), ('N', 7.0), ('O', 30.0), ('P',
5.0)]
<
23413c24078
< Product atoms:[('C', 64.0), ('H', 96.0), ('N', 12.0), ('O', 36.0), ('R',
4.0)]
---
Product atoms:[('C', 32.0), ('H', 48.0), ('N', 6.0), ('O', 18.0), ('R',
2.0)]
23448c24113
< Reactant atoms:[('C', 4.0), ('H', 8.0), ('O', 4.0)]
---
Reactant atoms:[('C', 2.0), ('H', 4.0), ('O', 3.0)]
23452c24117
< Reactant atoms:[('C', 4.0), ('H', 10.0), ('N', 1.0), ('O', 2.0), ('R',
1.0)]
---
Reactant atoms:[('C', 2.0), ('H', 6.0), ('N', 1.0), ('O', 1.0), ('R',
1.0)]
23644c24309
< Reactant atoms:[('C', 12.0), ('H', 16.0), ('O', 12.0)]
---
Reactant atoms:[('C', 6.0), ('H', 8.0), ('O', 6.0)]
23789c24454
< Product atoms:[('C', 10.0), ('H', 17.0), ('O', 7.0), ('P', 2.0)]
---
Product atoms:[('C', 5.0), ('H', 9.0), ('O', 7.0), ('P', 2.0)]
23860,23861c24525,24526
< Reactant atoms:[('C', 78.0), ('H', 119.0), ('N', 11.0), ('O', 97.0),
('P', 2.0), ('S', 11.0)]
< Product atoms:[('C', 34.0), ('H', 49.0), ('N', 7.0), ('O', 56.0), ('P',
2.0), ('S', 8.0)]
---
Reactant atoms:[('C', 44.0), ('H', 65.0), ('N', 8.0), ('O', 55.0), ('P',
2.0), ('S', 6.0)]
Product atoms:[('C', 22.0), ('H', 30.0), ('N', 6.0), ('O', 33.0), ('P',
2.0), ('S', 4.0)]
23884,23885c24549,24550
< Reactant atoms:[('C', 28.0), ('H', 42.0), ('N', 2.0), ('O', 22.0)]
< Product atoms:[('C', 56.0), ('H', 84.0), ('N', 4.0), ('O', 62.0), ('S',
6.0)]
---
Reactant atoms:[('C', 14.0), ('H', 21.0), ('N', 1.0), ('O', 11.0)]
Product atoms:[('C', 28.0), ('H', 42.0), ('N', 2.0), ('O', 31.0), ('S',
3.0)]
24011,24014d24675
< Line 06489: rxn06657
< Reactant atoms:[('C', 57.0), ('H', 87.0), ('N', 6.0), ('O', 41.0), ('S',
1.0)]
< Product atoms:[('C', 58.0), ('H', 89.0), ('N', 6.0), ('O', 41.0), ('S',
1.0)]
<
24039,24042d24699
< Line 06511: rxn06679
< Reactant atoms:[('C', 51.0), ('H', 83.0), ('N', 6.0), ('O', 35.0), ('S',
1.0)]
< Product atoms:[('C', 52.0), ('H', 85.0), ('N', 6.0), ('O', 35.0), ('S',
1.0)]
<
24437c25094
< Product atoms:[('C', 61.0), ('H', 84.0), ('N', 2.0), ('O', 60.0), ('P',
2.0)]
---
Product atoms:[('C', 35.0), ('H', 48.0), ('N', 2.0), ('O', 36.0), ('P',
2.0)]
24477c25134
< Product atoms:[('C', 37.0), ('H', 54.0), ('N', 4.0), ('O', 36.0), ('P',
2.0)]
---
Product atoms:[('C', 23.0), ('H', 33.0), ('N', 3.0), ('O', 24.0), ('P',
2.0)]
24641c25298
< Product atoms:[('C', 16.0), ('H', 28.0), ('O', 4.0)]
---
Product atoms:[('C', 8.0), ('H', 14.0), ('O', 2.0)]
24645c25302
< Product atoms:[('C', 8.0), ('H', 12.0), ('O', 4.0)]
---
Product atoms:[('C', 4.0), ('H', 6.0), ('O', 2.0)]
25192,25193c25849,25850
< Reactant atoms:[('C', 66.0), ('H', 96.0), ('N', 9.0), ('O', 75.0), ('P',
2.0), ('S', 7.0)]
< Product atoms:[('C', 38.0), ('H', 53.0), ('N', 7.0), ('O', 38.0), ('P',
2.0), ('S', 2.0)]
---
Reactant atoms:[('C', 38.0), ('H', 54.0), ('N', 7.0), ('O', 44.0), ('P',
2.0), ('S', 4.0)]
Product atoms:[('C', 24.0), ('H', 32.0), ('N', 6.0), ('O', 24.0), ('P',
2.0), ('S', 1.0)]
26195,26198d26851
< Line 09470: rxn09995
< Reactant atoms:[('C', 10.0), ('H', 18.0), ('O', 9.0)]
< Product atoms:[('C', 5.0), ('H', 10.0), ('O', 5.0)]
<
27116c27769
< Reactant atoms:[('C', 10.0), ('H', 18.0), ('O', 9.0)]
---
Reactant atoms:[('C', 5.0), ('H', 10.0), ('O', 5.0)]
27868c28521
< Reactant atoms:[('C', 12.0), ('H', 16.0), ('O', 12.0)]
---
Reactant atoms:[('C', 6.0), ('H', 8.0), ('O', 6.0)]
29524c30177
< Reactant atoms:[('C', 52.0), ('H', 74.0), ('O', 49.0)]
---
Reactant atoms:[('C', 26.0), ('H', 38.0), ('O', 25.0)]
29881c30534,30538
< Product atoms:[('C', 32.0), ('H', 42.0), ('N', 10.0), ('O', 34.0), ('P',
4.0)]
---
Product atoms:[('C', 26.0), ('H', 34.0), ('N', 10.0), ('O', 28.0), ('P',
4.0)]
Line 12641: rxn13807
Reactant atoms:[('C', 6.0), ('H', 8.0), ('O', 6.0)]
Product atoms:[('C', 12.0), ('H', 16.0), ('O', 12.0)]
30252,30253c30909,30910
< Reactant atoms:[('C', 12.0), ('H', 16.0), ('O', 14.0)]
< Product atoms:[('C', 12.0), ('H', 16.0), ('O', 12.0)]
---
Reactant atoms:[('C', 6.0), ('H', 8.0), ('O', 7.0)]
Product atoms:[('C', 6.0), ('H', 8.0), ('O', 6.0)]
30341c30998
< Product atoms:[('H', 8.0), ('O', 24.0), ('P', 8.0)]
---
Product atoms:[('H', 4.0), ('O', 12.0), ('P', 4.0)]
30375,30378d31031
< Line 13339: rxn15088
< Reactant atoms:[('C', 12.0), ('H', 22.0), ('O', 11.0)]
< Product atoms:[('C', 6.0), ('H', 12.0), ('O', 6.0)]
<
30400c31053
< Reactant atoms:[('C', 12.0), ('H', 22.0), ('O', 11.0)]
---
Reactant atoms:[('C', 6.0), ('H', 12.0), ('O', 6.0)]
30404c31057
< Reactant atoms:[('C', 12.0), ('H', 22.0), ('O', 11.0)]
---
Reactant atoms:[('C', 6.0), ('H', 12.0), ('O', 6.0)]
30408c31061
< Reactant atoms:[('C', 12.0), ('H', 22.0), ('O', 11.0)]
---
Reactant atoms:[('C', 6.0), ('H', 12.0), ('O', 6.0)]
30476c31129
< Reactant atoms:[('C', 52.0), ('H', 74.0), ('O', 49.0)]
---
Reactant atoms:[('C', 26.0), ('H', 38.0), ('O', 25.0)]
30535,30538d31187
< Line 13657: rxn15434
< Reactant atoms:[('C', 5.0), ('H', 10.0), ('O', 7.0), ('P', 2.0)]
< Product atoms:[('C', 10.0), ('H', 18.0), ('O', 7.0), ('P', 2.0)]
<
30601c31250
< Product atoms:[('C', 46.0), ('H', 66.0), ('N', 6.0), ('O', 48.0), ('P',
4.0)]
---
Product atoms:[('C', 32.0), ('H', 45.0), ('N', 5.0), ('O', 36.0), ('P',
4.0)]
30627,30634d31275
< Line 13777: rxn15556
< Reactant atoms:[('C', 12.0), ('H', 21.0), ('O', 14.0), ('P', 1.0)]
< Product atoms:[('C', 6.0), ('H', 11.0), ('O', 9.0), ('P', 1.0)]
<
< Line 13778: rxn15557
< Reactant atoms:[('C', 15.0), ('H', 22.0), ('N', 2.0), ('O', 17.0), ('P',
2.0)]
< Product atoms:[('C', 21.0), ('H', 32.0), ('N', 2.0), ('O', 22.0), ('P',
2.0)]
<
30639,30646d31279
< Line 13780: rxn15559
< Reactant atoms:[('C', 22.0), ('H', 33.0), ('N', 5.0), ('O', 21.0), ('P',
2.0)]
< Product atoms:[('C', 16.0), ('H', 23.0), ('N', 5.0), ('O', 16.0), ('P',
2.0)]
<
< Line 13782: rxn15561
< Reactant atoms:[('C', 15.0), ('H', 22.0), ('N', 2.0), ('O', 17.0), ('P',
2.0)]
< Product atoms:[('C', 21.0), ('H', 32.0), ('N', 2.0), ('O', 22.0), ('P',
2.0)]
<
30669c31302
< Product atoms:[('C', 30.0), ('H', 52.0), ('O', 26.0)]
---
Product atoms:[('C', 18.0), ('H', 32.0), ('O', 16.0)]
30683,30686d31315
< Line 13796: rxn15575
< Reactant atoms:[('C', 12.0), ('H', 22.0), ('O', 11.0)]
< Product atoms:[('C', 6.0), ('H', 12.0), ('O', 6.0)]
<
30703,30706d31331
< Line 13802: rxn15581
< Reactant atoms:[('C', 14.0), ('H', 20.0), ('N', 2.0), ('O', 16.0), ('P',
2.0)]
< Product atoms:[('C', 19.0), ('H', 28.0), ('N', 2.0), ('O', 20.0), ('P',
2.0)]
<
30721c31346
< Product atoms:[('C', 24.0), ('H', 40.0), ('O', 20.0)]
---
Product atoms:[('C', 12.0), ('H', 20.0), ('O', 10.0)]
30731,30734d31355
< Line 13817: rxn15597
< Reactant atoms:[('C', 24.0), ('H', 42.0), ('O', 21.0)]
< Product atoms:[('C', 12.0), ('H', 22.0), ('O', 11.0)]
<
30743,30750d31363
< Line 13820: rxn15600
< Reactant atoms:[('C', 5.0), ('H', 10.0), ('O', 5.0)]
< Product atoms:[('C', 10.0), ('H', 18.0), ('O', 9.0)]
<
< Line 13821: rxn15601
< Reactant atoms:[('C', 5.0), ('H', 10.0), ('O', 5.0)]
< Product atoms:[('C', 10.0), ('H', 18.0), ('O', 9.0)]
<
32656c33269
< Reactant atoms:[('C', 25.0), ('H', 44.0), ('N', 2.0), ('O', 31.0), ('P',
4.0)]
---
Reactant atoms:[('C', 20.0), ('H', 33.0), ('N', 2.0), ('O', 24.0), ('P',
3.0)]
33489c34102
< Product atoms:[('C', 28.0), ('H', 42.0), ('N', 2.0), ('O', 24.0)]
---
Product atoms:[('C', 14.0), ('H', 21.0), ('N', 1.0), ('O', 12.0)]
33645c34258
< Product atoms:[('C', 24.0), ('H', 40.0), ('O', 22.0)]
---
Product atoms:[('C', 12.0), ('H', 20.0), ('O', 11.0)]
44452c45065
< Reactant atoms:[('C', 48.0), ('H', 82.0), ('O', 41.0)]
---
Reactant atoms:[('C', 24.0), ('H', 42.0), ('O', 21.0)]
46252c46865
< Reactant atoms:[('C', 12.0), ('H', 20.0), ('O', 10.0)]
---
Reactant atoms:[('C', 6.0), ('H', 10.0), ('O', 5.0)]
46467,46470d47079
< Line 26977: rxn31251
< Reactant atoms:[('C', 24.0), ('H', 28.0), ('N', 8.0), ('O', 9.0)]
< Product atoms:[('C', 29.0), ('H', 35.0), ('N', 9.0), ('O', 12.0)]
…On Mon, Sep 18, 2017 at 4:11 PM, samseaver ***@***.***> wrote:
Yes, with hindsight, it was a mistake to commit both a massive change
(i.e. the addition of a single column) and spot-changes (i.e. the update of
different formulas).
Is Travis monitoring whether reaction balances are *changing* ? We should
probably clear up why Travis had these 19 reactions listed as "balanced" in
the first place.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#68 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AIGfOTpmLVDIV6Sk3VXtFwzlH4X-FBIlks5sjtxrgaJpZM4PZKFC>
.
|
basically any rxn id that appears above is a new "unbalanced reaction". I would love switch everything over to pretty print json! I'd have to rewrite the tests but it would be more legible than TSV @mmundy42 do you have a lot code outside this repo that uses the current tsv schema? |
I spot-checked one of these: rxn06114 This happens: The stored formula had a multiplier. However, in my current reaction balancing code (not yet committed), the status of rxn06114 is still "OK". |
Ah-ha, I just found a filter I had in my reaction balancing code, so now I can see that I'm getting the same newly reported imbalances. I'm recording them, and what I will do is I will follow up on them individually to clean up those imbalances and re-commit. |
Yes, I have some code that uses the tsv schema but it would be easy to switch to a JSON schema. One thing to consider is how to handle fields that have no values in tsv files. Would all "records" in the JSON file have a key for every field? Also, it would be good to make sure that fields are consistently defined. For example, in a biomasses file, all of the numeric values should be floats. I'd suggest defining a json-schema for each file and using the jsonschema Python package to validate the files. |
Love the idea of schema validation as part of the tests. I would say |
Sweet! Updating my code to specifically assume most multipliers should default to '1' results in passed checks on reaction balancing. |
I've been using a "null" text string for the TSV schema, I'm ambivalent as to whether we should maintain using a "null" string or a true null value. |
Great! This is how the checks are supposed to work. |
This is a bit of a double-whammy, all formulas updated so that there's no extra characters, parentheses, multipliers, etc. so to normalize across the board, and make it easier when balancing reactions.
In addition, if formulas had multipliers, either explicit or implicit (i.e. '(formula)[xn*]') we take note in the Notes column, as "PO" for polymers.