Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prepare - work around anndata bug #1260

Merged
merged 6 commits into from
Mar 22, 2020
Merged

prepare - work around anndata bug #1260

merged 6 commits into from
Mar 22, 2020

Conversation

bkmartinjr
Copy link
Contributor

Work around for the issue scverse/anndata#344

Changes to launch are just lint (black autoformat). See change in prepare.py.

@bkmartinjr bkmartinjr requested a review from mweiden March 20, 2020 19:25
@codecov-io
Copy link

codecov-io commented Mar 20, 2020

Codecov Report

Merging #1260 into master will increase coverage by 0.59%.
The diff coverage is 87.50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1260      +/-   ##
==========================================
+ Coverage   61.81%   62.40%   +0.59%     
==========================================
  Files          66       66              
  Lines        4897     4918      +21     
  Branches      374      374              
==========================================
+ Hits         3027     3069      +42     
+ Misses       1778     1757      -21     
  Partials       92       92              
Flag Coverage Δ
#backend 52.07% <87.50%> (+1.15%) ⬆️
#frontend 75.00% <ø> (ø)
#javascript 75.00% <ø> (ø)
#python 52.07% <87.50%> (+1.15%) ⬆️
#smokeTestAnnotations 100.00% <ø> (ø)
#unitTest 62.40% <87.50%> (+0.59%) ⬆️
Impacted Files Coverage Δ
server/cli/launch.py 0.00% <ø> (ø)
server/common/app_config.py 58.08% <ø> (ø)
server/cli/prepare.py 28.16% <86.95%> (+28.16%) ⬆️
server/data_cxg/cxg_adaptor.py 36.84% <100.00%> (ø)
server/common/utils.py 89.79% <0.00%> (+2.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update db7a485...060de64. Read the comment docs.

@mweiden
Copy link
Contributor

mweiden commented Mar 22, 2020

@bkmartinjr posted a solution here. While I do think it is an improvement to the current implementation, perhaps we should consider using different join character in the function since there are gene names that follow the format [a-zA-Z]+-[0-9]+. This might make the data harder to interpret since the number in the suffix would sometimes be part of the gene name and sometimes indicate a duplicate.

I don't know if gene names follow a specific format, but perhaps it's possible to chose a character not in that format. Perhaps something like . or #?

This might be a @sidneymbell question.

@bkmartinjr
Copy link
Contributor Author

bkmartinjr commented Mar 22, 2020

I don't know if gene names follow a specific format, but perhaps it's possible to chose a character not in that format. Perhaps something like . or #?

How would this resolve the underlying algorithmic problem? I elected to use the Scanpy/Anndata standard rather than inventing another.

What I think we need to do is change the algorithm so it is actually guaranteeing uniqueness.

@mweiden
Copy link
Contributor

mweiden commented Mar 22, 2020

@bkmartinjr

How would this resolve the underlying algorithmic problem?

It wouldn't solve the algorithmic problem. Actually, I'd still push for my PR to go through in the anndata repo. It would simply make the names in the indices we produce more easily interpretable.

I elected to use the Scanpy/Anndata standard rather than inventing another.

Technically, if we can convince ourselves that a char like . or # doesn't occur in standard gene names, you could use the current Scanpy/Anndata release and just pass that char to the join parameter in .var_names_make_unique and .obs_names_make_unique and all names would be unique with one call to the given function.

@bkmartinjr
Copy link
Contributor Author

doesn't occur in standard gene names

There is no guarantee that the names will conform to this standard. And we really can't assume that either. They can be anything.

@mweiden mweiden merged commit d99b84b into master Mar 22, 2020
@mweiden mweiden deleted the bkmartinjr/prepare-fix branch March 22, 2020 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants