prepare - work around anndata bug #1260

bkmartinjr · 2020-03-20T19:25:47Z

Workaround for issue scverse/anndata#344.

Changes to launch are just lint (black autoformat). See change in prepare.py.

codecov-io · 2020-03-20T19:31:17Z

Codecov Report

Merging #1260 into master will increase coverage by 0.59%.
The diff coverage is 87.50%.

@@            Coverage Diff             @@
##           master    #1260      +/-   ##
==========================================
+ Coverage   61.81%   62.40%   +0.59%     
==========================================
  Files          66       66              
  Lines        4897     4918      +21     
  Branches      374      374              
==========================================
+ Hits         3027     3069      +42     
+ Misses       1778     1757      -21     
  Partials       92       92

Flag	Coverage Δ
#backend	`52.07% <87.50%> (+1.15%)`	⬆️
#frontend	`75.00% <ø> (ø)`
#javascript	`75.00% <ø> (ø)`
#python	`52.07% <87.50%> (+1.15%)`	⬆️
#smokeTestAnnotations	`100.00% <ø> (ø)`
#unitTest	`62.40% <87.50%> (+0.59%)`	⬆️

Impacted Files	Coverage Δ
server/cli/launch.py	`0.00% <ø> (ø)`
server/common/app_config.py	`58.08% <ø> (ø)`
server/cli/prepare.py	`28.16% <86.95%> (+28.16%)`	⬆️
server/data_cxg/cxg_adaptor.py	`36.84% <100.00%> (ø)`
server/common/utils.py	`89.79% <0.00%> (+2.04%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update db7a485...060de64.

mweiden · 2020-03-22T06:41:20Z

@bkmartinjr posted a solution here. While I do think it is an improvement on the current implementation, perhaps we should consider using a different join character in the function, since there are gene names that follow the format [a-zA-Z]+-[0-9]+. Keeping - as the join character might make the data harder to interpret, since the number in the suffix would sometimes be part of the gene name and sometimes indicate a duplicate.

I don't know if gene names follow a specific format, but perhaps it's possible to choose a character not in that format. Perhaps something like . or #?

This might be a @sidneymbell question.

bkmartinjr · 2020-03-22T15:52:10Z

> I don't know if gene names follow a specific format, but perhaps it's possible to choose a character not in that format. Perhaps something like . or #?

How would this resolve the underlying algorithmic problem? I elected to use the Scanpy/Anndata standard rather than inventing another.

What I think we need to do is change the algorithm so that it actually guarantees uniqueness.
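To make the algorithmic point concrete, here is a minimal sketch of the difference between suffixing duplicates blindly and checking each candidate before using it. This is not the code from this PR or from anndata; `naive_make_unique` and `checked_make_unique` are hypothetical names used only for illustration.

```python
from collections import Counter


def naive_make_unique(names, join="-"):
    """Append join + counter to repeated names without checking for collisions."""
    seen = Counter()
    out = []
    for name in names:
        if seen[name]:
            # A repeat: suffix it, even if the result is already a name.
            out.append(f"{name}{join}{seen[name]}")
        else:
            out.append(name)
        seen[name] += 1
    return out


def checked_make_unique(names, join="-"):
    """Suffix repeated names, skipping any candidate that is already taken."""
    used = set(names)   # every original name is reserved up front
    first_seen = set()  # names whose first occurrence has been emitted
    next_suffix = {}    # per-name counter to resume from
    out = []
    for name in names:
        if name not in first_seen:
            first_seen.add(name)
            out.append(name)
            continue
        k = next_suffix.get(name, 1)
        candidate = f"{name}{join}{k}"
        while candidate in used:
            k += 1
            candidate = f"{name}{join}{k}"
        next_suffix[name] = k + 1
        used.add(candidate)
        out.append(candidate)
    return out


names = ["GENE", "GENE-1", "GENE"]
naive = naive_make_unique(names)
checked = checked_make_unique(names)
print(naive, len(set(naive)) == len(naive))
print(checked, len(set(checked)) == len(checked))
```

In the checked version, every emitted name is either the first occurrence of an original name or a candidate that was absent from `used` when it was chosen, so the output cannot contain duplicates regardless of the join character.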

mweiden · 2020-03-22T17:37:17Z

@bkmartinjr

> How would this resolve the underlying algorithmic problem?

It wouldn't solve the algorithmic problem. Actually, I'd still push for my PR to go through in the anndata repo. It would simply make the names in the indices we produce more easily interpretable.

> I elected to use the Scanpy/Anndata standard rather than inventing another.

Technically, if we can convince ourselves that a char like . or # doesn't occur in standard gene names, you could use the current Scanpy/Anndata release and just pass that char to the join parameter in .var_names_make_unique and .obs_names_make_unique, and all names would be unique after one call to the given function.
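The premise here is that the join character appears in none of the existing names. A rough sketch of checking that premise for a given dataset follows; `pick_join_char` is a hypothetical helper, not part of Scanpy or Anndata, and the candidate list is an assumption.

```python
def pick_join_char(names, candidates=(".", "#", "-")):
    """Return the first candidate absent from every name, or None if all occur."""
    for char in candidates:
        if not any(char in name for name in names):
            return char
    return None


# "-" occurs in a gene name here, but "." does not.
print(pick_join_char(["MT-CO1", "GENE", "GENE"]))

# Every candidate already occurs in some name, so no safe choice exists.
print(pick_join_char(["a.b", "c#d", "e-f"]))
```

If a character is returned, it could be passed as the join argument to .var_names_make_unique. A None result means the premise does not hold for that dataset.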

bkmartinjr · 2020-03-22T17:58:22Z

> doesn't occur in standard gene names

There is no guarantee that the names will conform to this standard. And we really can't assume that either. They can be anything.

Temporarily copy code from scverse/anndata#345 until the issue is resolved and released.

bkmartinjr requested a review from mweiden March 20, 2020 19:25

bkmartinjr added the hosted label Mar 20, 2020

mweiden approved these changes Mar 22, 2020

bkmartinjr and others added 6 commits March 22, 2020 12:18

work around anndata bug 344

52369ab

fix accidental cut and paste error

5a8a9dd

Use modified make_index_unique function

59e64ba

Temporarily copy code from scverse/anndata#345 until the issue is resolved and released.

Add notes and test for make_index_unique

c72cfbb

Lint fix

80c77ae

Format python

060de64

mweiden force-pushed the bkmartinjr/prepare-fix branch from a5f7465 to 060de64 March 22, 2020 19:18

mweiden merged commit d99b84b into master Mar 22, 2020

mweiden deleted the bkmartinjr/prepare-fix branch March 22, 2020 19:28
