Error when parsing Silva taxonomies #129

antgonza · 2019-11-22T04:00:50Z

This is more an FYI as this was previously reported in QIIME2: https://forum.qiime2.org/t/qiime-taxa-filter-table-error/3947/10

If you try to use Silva based taxonomies you get this error:

There was an issue with viewing the artifact merged_seqs-taxonomy-silva.qza as QIIME 2 Metadata:

  CategoricalMetadataColumn does not support values with leading or trailing whitespace characters. Column 'Taxon' has the following value: 'D_0__Bacteria;D_1__Gemmatimonadetes;D_2__Gemmatimonadetes;D_3__Gemmatimonadales;D_4__Gemmatimonadaceae;D_5__Gemmatimonas;D_6__uncultured bacterium '

fedarko · 2020-06-15T23:36:56Z

@antgonza I'm working on testing that this error is resolved right now -- do you happen to still have the data that caused this lying around?

I think this addresses biocore#129, at least to the extent that we can handle this without messing with QIIME 2. But I'm not closing it until I can verify (if poss.) that the original error is rectified.

see biocore#129

fedarko · 2020-06-16T18:53:03Z

Ok, updates on this: I can replicate the error @antgonza reported by, in QIIME 2 2019.7, running qiime metadata tabulate on the taxonomy file he was using (thanks for sharing it with me!). In QIIME 2 2019.10, the error is fixed, and the taxonomy file can be viewed as metadata without the error. I confirmed also that generating an Empress visualization from @antgonza's files works without an error in QIIME 2 2019.10.*

This is because the issue causing this problem was fixed in the 2019.10 version of QIIME 2, so I've just updated the README accordingly to request that users use at least that version. Just to be super extra careful, I've also added tests that verify that leading/trailing whitespace (including testing the exact D_5__Gemmatimonas annotation above) are handled properly.

So I think we can close this issue when the feature metadata branch (to which the above changes have been made) is merged in.

* (I had to generate the visualization on barnacle, since my computer killed the Empress python code... when I tried to view the resulting QZV on chrome, my browser wouldn't load it after ~5 minutes so I just gave up. However, the visualization was at least generated successfully. This is more motivation for improving the tool's performance as discussed in #74, #151, etc.)

antgonza · 2020-06-16T19:04:37Z

Thank you @fedarko!

BTW the shared files are from a meta-analysis from 3 host types (rats, mice, human) and only 5 qiita studies ... highlighting the importance of fixing/improving those other issues!

antgonza · 2020-06-16T19:06:31Z

Thinking about this, @fedarko, @ElDeveloper should those performance issues be part of the alpha release?

ElDeveloper · 2020-06-16T20:56:29Z

@esayyari is working on a PR to make loading times faster. We should be able to hear more updates from him soon.

@ElDeveloper

* ENH: Add initial stab at circular layout Needs further inspection and testing, and of course the JS needs to be modified to draw "vertical lines" as for the rect. layout (but now with curves, etc.) but this seems like a decent start * DOC: add details re circ layout alg and TODOs * ENH: draw circ layout arcs (not beziers yet), etc Lotta ugly hacky code in this commit. Will get things working then from there backtrack to make things pretty and well tested before the PR. This is looking actually pretty nice. Will probably need to fix the "shape" of the circular layout though LOL * MNT: don't generate arc info for root (circlayout) * BUG: still store pos data for root * MNT: on 2nd thought draw "arc" for c.layout root consistency will be useful, and we can adjust later as needed also improved js docs for # lines needed * DOC: document c0/c1 stuff for circ layout better * BUG: Actually draw tips for circ layout turns out that the reason things in this layout looked 'incomplete', off, etc. was that xc0 and yc0 info was only being stored for internal nodes, giving the dubious impression that penultimate nodes were tips. With this, remaining circular layout TODOs are 1) drawing fancy curved arcs for internal nodes 2) adjust the UI to get the thick line stuff to work properly with the circular layout 3) clean up code, esp messy portions (like maybe store xc0/xc1 info in the layoutToCoordSuffix object somehow, by making it interpret an array of two suffixes as start and endpoints) 3) add tests * MNT: Don't store root arc info; smooth arcs; docs Improved documentation on *why* attempting to draw an arc for the root is kinda useless. Hopefully shouldn't have to update that again ._. I split up the line segment for the circular layout into two, one connecting an internal node's endpoint with its arc start and another connecting the node endpoint with its arc endpoint. This makes the layout look A LOT better; it's still essentially an approximation, though. What we need to do is hijack the WebGL code to draw bezier curves (starting and ending at the arc start/endpoints, and passing through the internal node endpoint) instead. Uh, hopefully that isn't too difficult? LOL I guess we'll see. * DOC: improve circ layout coordinate docs Addressing some ambiguities, and fixing an error w prev commit docs * DOC: add note re: internal node arcs + cornercases * BUG/MNT: Proper thick lines in c.layout; code chgs I moved the general "corner-computing" code from within Empress.thickenSameSampleLines() to its own function within VectorOps. This allows a lot more code reuse, but I think we could make the code even more compact by allowing _addTriangleCoords() to accept {tL: ..., ...} objects directly. * MNT: Make _addTriangleCoords accept corners obj ... Instead of accepting individual corner coordinates. This makes life a bit easier, or at least makes the code a bit cleaner (since now we can funnel the output from VectorOps.computeBoxCorners() straight into Empress._addTriangleCoords()). Of course it's worth noting that making the circular layout draw the arcs with curved lines/beziers (and not just as the disjointed line segments) will probably mean we'll have to abandon this particular function for that particular use case. for now, tho, this looks pretty nice * for reference, update qzv * ENH: Match up feature metadata w/ tree nodes Progress on #130. Next up is testing this (... and un-breaking the tests in general, my bad). * TST: Un-break rect layout tests The tests were broken due to me renaming lowestchildyr, etc. to lowest_child_yr. Fixed. * TST: Un-break matching tests due to #130 work All that was really needed was just unpacking a third return value (feature metadata). I added a few tests that this value was None, but we'll test this in more depth shortly. * STY: clean JS line-thickening code Some repeated declarations ("var corners", "var x1", etc.) were making jshint angry, so I addressed them (for corners, by declaring it up front at the start of the function in question; for the x1/y1/ etc. stuff, by not even assigning them explicitly to variables and just by passing them directly to the function I was calling). * STY: Minor flake8 fixes The build should be working now 💯 * TST: Add fm for testing; don't name internal nodes the "don't bother naming internal nodes" change comes from the fact that now that step is done after matching, anyway, so there's no point in keeping this line in the tests * TST: add basic feature metadata matching test #130 * TST: test feature metadata mismatch error :) #130 * TST: test partial feature dropping in fm matching done with this part, gonna move on to taxonomy handling / etc. next * ENH: Add taxonomy-splitting func signature/docs (still need to accidentally implement it lol 💯) * ENH: Draft taxonomy splitting code skeleton the actual meat of this function isn't done yet, but this should at least handle most (but not all, Error condition number 3 is still unimplemented) of the funky corner cases. * STY: flake8 line spacing * ENH: validate ; ct. and do tax splitting #130 * TST: add basic tax splitting infrastructure tests * MNT: tax splitting->new module; allow uneven ; cts Lots of (untested) changes here. Need to test this stuff... * BUG/TST: Fix "Level" col bug, and add tests for it tldr code was lowercasing all col names and checking to see if any started with "Level", which obvs wouldn't be the case since "Level" would get turned to "level". Fixed problem. * MNT: Put Level cols at start of fm * TST: add partial tax splitting "good" case test at least we know it works know lol * TST: Finish "good" tax-splitting test * TST: test funky tax splitting corner case * TST/MNT: warn on no-; case and add test * ENH: add UI skeleton for f. metadata coloring #130 * ENH/DOC: add fancy notes for coloring tabs :D:D:D * ENH: Chuck feat. md over to JS; populate selectors So, now we have access to feature metadata in the Empress JS object, and the selectors in the feature metadata coloring tab have the f.m. fields populated there. * MNT: Make feat. md. data transfer simpler has advantage of preserving column order ;) * DOC: Add inline docs to layout + anim. panels * DOC: update QZV for reference * ENH: Add initial, hacky f.m. coloring prototype! :D:D:D:D * STY: fix prettier issues * BUG: Make code work, and fix UI, if no f.m. passed UI could use some extra work -- maybe set the initial "hidden" on the fm button directly from jinja2, if possible. but this at least works? * BUG: remove ref to missing UI ele in JS this makes unchecking the feature metadata checkbox work properly -- what was happening was the fHideChk thing was getting referenced, which (silently from the user perspective ...) crashed the "close" function, which prevented the tree from being reset and redrawn. js, amirite * STY: fix flake8 spacing pbm * DOC: Describe use of f.m. in CLI docs * ENH: color internal nodes with uniform child f.md * Update qzv * circular layout * Add QZV including BOTH circ layout AND fm coloring :D :D :D :D :D :D * Show feature metadata for tips in select-node menu Helps a lot for #130 * ENH: improve node hover menu UI * Consistent table styling, better frozen header col satisfied for now * Show a table of feature metadata in the node hover menu; improve the existing styling in the menu (#166) * Show feature metadata for tips in select-node menu Helps a lot for #130 * ENH: improve node hover menu UI * Consistent table styling, better frozen header col satisfied for now * STY: prettier fixes * TST: add docstrings for f.m. matching tests * ENH: fix minor grammar issue in node menu * MNT: Clean code for name_internal_nodes() - unlabled -> unlabeled - use str.format() rather than % for formatting node names (seems to be preferred in modern python code) * BUG: fix one last "unlabled" * TST / BUG: Explicitly disallow nonunique tip names I thought we were already checking for this, but turns out we weren't. * ENH/TST: warn when internal nodes share name Per advice from @ElDeveloper worth noting that the current check is bugged: it doesn't check the root node's name. We should fix that. * BUG/TST: Also consider root node name for warnings * TST/BUG: raise error if tip name in int node names Another annoying corner case it's good to detect and handle. Note that we purposefully ignore "None"-named nodes, since those will be replaced with unique IDs later. * MNT: Rename name_internal_nodes() more accurately Because: 1) it wasn't just renaming internal nodes 2) it also assigns missing branch lengths (although this is NOT tested as far as I can tell) * MNT: fill_missing_node_data -> _node_names SO: turns out that the validation code i added a while back re: nodes with missing branch lengths was unnecessary, since name_internal_nodes() already handled that. Well, at this point the simplest thing to do is just remove that functionality from name_internal_nodes() (now named fill_missing_node_names()), since the validation should already preclude missing branch lengths. (If this is annoying, we can set things so that missing branch lengths are replaced with 0 or 1 or something, but I personally don't think that's a great idea.) * STY: add extra blank line for flake8 * BUG: Handle EmpressNode* corner case should be a temporary solution, but I don't anticipate this being a problem for the majority of use cases. * MNT: explicitly ignore None-named tips in matching I don't believe this was doing anything before, but it could have ostensibly been a problem if there was a feature named None (like, the object None) in the table. (And I don't *think* that will ever happen since we're getting these tables from QIIME 2, which uses biom-format, which treats IDs as strings.) * TST: test corner-case feature metadata matching Some funky things re: 1) root of the tree being allowed to have fm and 2) duplicate internal nodes being allowed to have fm * STY: minor fixes to _plot.py - Use a safer variable name than "file", which is a reserved keyword in some versions of python: https://stackoverflow.com/a/24942363/10730311 (I doubt that this would cause problems since I'm pretty sure Empress only supports python 3, but we might as well be safe.) - Use more reasonable-looking indentation for a jinja2 call * fix typo in biom table js * MNT: remove logging stmt I left in accidentally * ENH: Add UI skeleton for two fm coloring options * fix a grammar issue * MNT: Refactor to store fmd as tip/int node md This will make implementing the different JS "coloring methods" a lot easier. Need to update the tests. * TST: Verify that taxonomy whtspcs handled right I think this addresses #129, at least to the extent that we can handle this without messing with QIIME 2. But I'm not closing it until I can verify (if poss.) that the original error is rectified. * TST: test specific SILVA annotation from #129 * fix typos * TST: start fixing old match tests re: t/i fm split * TST: fix more matching tests; imprv matching docs ... since i forgot about the 'moving tax splitting to match_inputs()' change i made earlier today :P * TST: Finish un-breaking old matching tests Still need to, like, add on new tests that explicitly verify that the splitting stuff works, but these changes essentially do that already * DOC: bump min q2 version to 2019.10 see #129 * TST: add clarifying comment re: #129 test * Finish up merging stuff * add back index.html gen stuff i accidentally deleted during merging but we need this ._. * temporarily rm circ layout (it'll be added back when #180 is merged in) * MNT: Abstract pretty sample table generation ... And use the new function for both leaf *and* internal node selection menus. The new function is a static method, so it should be pretty simple to test (knock on wood). Need to actually test this though!!!! This addresses one of the TODOs on #169. * Remove #tree-container CSS block: close #168 This was getting on my nerves, so I just knocked it out now ._. * DOC/MNT: clear tableEle HTML in s.tbl.gen; docs * TST: Test tip/int node md cols, and details Crucially, this includes the "just internal nodes" case :) * TST: Un-break core tests ... By adding the tip/int metadata stuff to DICT_A. We should definitely add tests that make sure that this info is properly transmitted at some point, tho. * TST: test feature metadata and _to_dict() * BUG: be consistent w/ "EmpressNode" btwn python/js We might just wanna not special-case this altogether tbh * STY: fix js/py style issues - unbreak travis :D * BUG: enable selecting the root node #169 * update qzv * Fix typo * ENH: Show int node fmd, simplify fm table code Also fixed a few typos ("uniqe" -> "unique") and bolded the warning about duplicate node IDs. The code for this is p messy (although it does demonstrably work :D) so I gotta fix this... * MNT: Abstract fmd table gen code to sep func ... so we can use it for both internal nodes and tips. TODO, need to add docs + tests for this ... * DOC: Add docs for makeFeatureMetadataTable() * DOC: make plugin setup fm usage desc more faithful * BUG: Remove cursor: pointer on text/number inputs I think this should make it clearer to the user that you can type into these fields... * ENH: Add int. node coloring method; refactoring F. metadata coloring method is now configurable!!! only took me forever ._. * STY: js style fixes * STY: fix variable re-defining pbm * ENH: On changing fm method, change checkbox desc. The wording could probably use some work, but this does the job. * MNT: tidy up side panel code a bit * BUG: Use linethickening for fm; readd desc updates (The reason I held off on the line-thickening stuff is it's all "thicken same sample lines" and stuff, but at this point we're hijacking all of the sample coloring stuff to color by whatever.) * DOC: update fm TODO comment * MNT: Abstract out a LOT of side-panel-handler code ... into shared functions between the feature / sample coloring code. This should have the side effect of making testing easier. I think? In any case, the documentation seems a bit nicer :) * DOC: update _updateColoring re funky method param at least justifies why it's kinda wack * BUG: prevent sample/ftr opts simultaneously shown getting that commit title down to 50 characters was a struggle, but you get the idea. This involved adding back some functions to side panel handler, but these are a lot more clear and concise so I don't mind them as much. or maybe i'm just biased because i wrote them lol * ENH: clean node selection menu; improve descs #169 Another thing ticked off for #169. Something i thought of just now - would be good to say "Name" instead of "ID" for nodes maybe? skbio treenode treats these as Names not IDs, and according to wikipedia (https://en.wikipedia.org/wiki/Newick_format) newick seems to mostly consider these as "names" also. * STY: fix JS style; redo qzv * MNT: Rename _validate_data: makes behavior clearer Addresses a comment from @kwcantrell * DOC: Clean up _to_dict fm usage Things should be a lot clearer now. Addresses points brought up in @kwcantrell's review * Clarify more comments in _to_dict * BUG: Reset fm coloring "update" btn in selcallback Addresses a comment from @ElDeveloper * STY: Rename Empress ctx var name for closure This is somewhat more consistent. Addresses an @ElDeveloper comment. * STY: minor prettier thing * MNT: Rename hover-* stuff to menu-* Addresses comment from @kwcantrell * Update empress/support_files/js/vector-ops.js Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu> * Only use {{ feature_metadata_columns }} once addresses @ElDeveloper comment * Add + update copyright headers to all python files * Update empress/taxonomy_utils.py Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu> * Simplify max ; ct computation Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu> * Avoid unnecessary set() conversions Addresses suggestion from @ElDeveloper * Use @ElDeveloper suggestion for tax splitting code * Update qzv * Revert "Use @ElDeveloper suggestion for tax splitting code" This reverts commit ce8e26c. * ... update qzv from old logic * unbreak circ layout js test * Un-break python test Thanks @esayyari for debugging this! Co-authored-by: kcantrel <kcantrel@ucsd.edu> Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu>

ElDeveloper added bug needed before initial release labels Mar 31, 2020

ElDeveloper assigned fedarko Jun 15, 2020

ElDeveloper modified the milestones: Alpha Release, Pre alpha release Jun 15, 2020

fedarko added a commit to fedarko/empress that referenced this issue Jun 16, 2020

TST: test specific SILVA annotation from biocore#129

f7c5f6c

fedarko added a commit to fedarko/empress that referenced this issue Jun 16, 2020

DOC: bump min q2 version to 2019.10

b1d51ff

see biocore#129

fedarko added a commit to fedarko/empress that referenced this issue Jun 16, 2020

TST: add clarifying comment re: biocore#129 test

8a9b4fb

antgonza closed this as completed Jun 16, 2020

fedarko mentioned this issue Jun 18, 2020

Add feature metadata support, and some UI / code fixes #183

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error when parsing Silva taxonomies #129

Error when parsing Silva taxonomies #129

antgonza commented Nov 22, 2019

fedarko commented Jun 15, 2020

fedarko commented Jun 16, 2020

antgonza commented Jun 16, 2020

antgonza commented Jun 16, 2020

ElDeveloper commented Jun 16, 2020

Error when parsing Silva taxonomies #129

Error when parsing Silva taxonomies #129

Comments

antgonza commented Nov 22, 2019

fedarko commented Jun 15, 2020

fedarko commented Jun 16, 2020

antgonza commented Jun 16, 2020

antgonza commented Jun 16, 2020

ElDeveloper commented Jun 16, 2020