incorrect separators on water synonyms #29
Comments
Hi Andrés, |
Hi Andrés, Sincerely, |
What do you think of adding line = line.replace(';','\t') before this line (chemicals/chemicals/identifiers.py, Line 370 in c5b1014)?
Could that solve the problem temporarily? Also, I noticed (from a quick look, nothing exhaustive) that the synonyms separated by ';' are always at the end of the list. Edit: the split |
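A minimal Python sketch of the workaround suggested above, for illustration only. It is not the code in identifiers.py: the function name `split_record`, the `n_fixed` parameter, and the sample record are all hypothetical, and the number of fixed leading columns is an assumption. Instead of replacing ';' across the whole line, it splits on tabs first and then splits only the trailing synonym fields on ';', because identifier fields such as InChI strings can legitimately contain ';'.

```python
def split_record(line, n_fixed=7):
    """Split one tab-separated record into fixed fields and synonyms.

    n_fixed is an ASSUMED count of leading identifier columns; adjust it
    to the real file layout. Only the trailing fields are split on ';'.
    """
    fields = line.rstrip('\n').split('\t')
    fixed, raw_synonyms = fields[:n_fixed], fields[n_fixed:]

    synonyms = []
    for field in raw_synonyms:
        # A field may hold several names joined by ';', possibly with a
        # trailing ';' that would otherwise yield an empty name.
        for name in field.split(';'):
            name = name.strip()
            if name:
                synonyms.append(name)
    return fixed, synonyms


# Hypothetical record: 7 placeholder identifier fields (one of them
# containing ';'), then synonyms that are partly ';'-separated.
sample_line = (
    "f1\tf2\tf3\tf4\tf5\tx;y\tf7\twater\t"
    "caustic soda liquid;aquafina;distilled water;"
    "hydrogen oxide (h2o);ultrexii ultrapure;\n"
)
fixed_fields, synonym_list = split_record(sample_line)
```

With this approach the search string from the report would be stored as five separate synonyms, while the placeholder field "x;y" is left intact.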
Hi Andrés, The hard part is that the online data has changed so much that I can't even use a diff program to see what changed. Because of that, it's hard to replace the current file with the new one. Do you want to look at it? Sincerely, |
Hi Caleb, Given the old and new versions, I could write a manual diff to see what changed. I'm going to start with this and let you know what I find. |
For a preliminary parsing:
Old
julia> CC.load_db!(:inorganic_old2)
[ Info: :inorganic_old2 arrow file not generated, processing...
syms_i = 6326 # number of synonyms
syms_unique = 6325 # unique elements (one element is repeated that I have yet to find)
(Arrow.Table with 153 rows, 9 columns, and schema:
.....
New
julia> CC.load_db!(:inorganic_new)
[ Info: :inorganic_new database file not found, downloading from https://github.com/CalebBell/chemicals/files/6912649/Inorganic.db.csv
[ Info: :inorganic_new database file downloaded.
[ Info: :inorganic_new arrow file not generated, processing...
syms_i = 9461
syms_unique = 9438
(Arrow.Table with 164 rows, 9 columns, and schema:
Comparing the differences by InChI:
InChI contained in the old database, not present in the new database
InChI contained in the new database, not present in the old database
|
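The counts above differ between total and unique synonyms (6326 vs 6325 in the old file, 9461 vs 9438 in the new one), so some names are repeated. A rough Python sketch of how the repeated entries could be located follows; the function name `find_duplicates` and the sample list are made up for illustration and do not come from either database.

```python
from collections import Counter


def find_duplicates(synonyms):
    """Return a dict of every name that occurs more than once, with its count."""
    counts = Counter(synonyms)
    return {name: n for name, n in counts.items() if n > 1}


# Made-up sample, not taken from the real files.
sample = ["water", "aquafina", "distilled water", "aquafina"]
duplicates = find_duplicates(sample)
```

The difference `len(sample) - len(set(sample))` corresponds to the gap between syms_i and syms_unique reported above.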
Doing the same thing with the formulas:
julia> setdiff(set_new,set_old)
Set{String} with 21 elements:
"Cl3Ru"
"O3Yb2"
"H2O" # water is in the new inorganics database
"AlLaO"
"I2Sm"
"B2Zr"
"H3NaO4P"
"HLi"
"Al6O13Si2"
"Cl2S2"
"As2H12O3"
"CW2"
"C32H16CuN8"
"OPr"
"ClH2Tl"
"H5NO5S"
"C10O10Re2"
"BLiO"
"H2NaO4S"
"BrH2Tl"
"CdF2"
julia> setdiff(set_old,set_new)
Set{String} with 11 elements:
"HNa2O4P"
"ClTl"
"H4Si"
"H4Na2O12S3"
"As2O3"
"BrCsO3"
"BrTl"
"H2NaO4P"
"F6H8N2Si"
"F6Na2Si"
"D2Se" |
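The same comparison can be sketched in Python, the language of chemicals itself. This assumes the formulas have already been read into two collections; the function name `formula_diff` is hypothetical, and the sample lists are a small illustrative subset ("NaCl" is invented as a shared entry), not the full databases.

```python
def formula_diff(old_formulas, new_formulas):
    """Return (added, removed): formulas only in new, and only in old."""
    old_set, new_set = set(old_formulas), set(new_formulas)
    added = sorted(new_set - old_set)      # like setdiff(set_new, set_old)
    removed = sorted(old_set - new_set)    # like setdiff(set_old, set_new)
    return added, removed


# Illustrative subset only.
old_sample = ["H4Si", "ClTl", "NaCl"]
new_sample = ["H2O", "ClH2Tl", "NaCl"]
added, removed = formula_diff(old_sample, new_sample)
```

Sorting the results makes the two lists stable between runs, which helps when comparing them by eye.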
What is the search string
caustic soda liquid;aquafina;distilled water;hydrogen oxide (h2o);ultrexii ultrapure;
Which chemical in the database do you believe should be found?
It's water, but the separators here are wrong.