server refactor #1140

bmccandless · 2020-02-03T20:14:24Z

This PR contains a refactoring to make adding new features easier.

The new features include supporting the tiledb format, and the multi dataset application.

The refactoring includes

Simplifying the directory structure and files.
a class structure to handle annotations (currently one type: AnnotationsLocalFile).
a class to handle application configuration
a class structure to handle matrix data (currently AnndataAdaptor and CxgAdaptor). CxgAdaptor uses tiledb.
Algorithms that were previously dependent on the scanpy anndata object are now generalized to work with an abstract interface.

The multi dataset option is not fully supported yet, and so the option to use it is hidden.
Use "cli launch --dataroot ..."
To access this feature.

All combinations of app single dataset/ app multi dataset and AnndataAdaptor/CxgAdaptor work with all the features, such as annotations, ontologies, diffexp.

This update does not change any behavior. The simply moves code around into a new directory structure. Later patches will add new capabilities.

* refactor reducer to no longer support hover state and hold many labels * refactor to generate centroidCoordinates for all values of a category * create hash for function and memoize export * create button to display all labels for a category * clear state * create label for each thing * calculate on each value * change to in place modification of map * switch to for loop with iterator instead of forEach * use map from centroidLabel instead of creating copy * adapt for map * utilize tarrays * begin documentation * disable centroids if in zoom mode * clean up * persist uncalc coordinates * document * clean up and document * cleanup and document * fix * fix undefined labels and document changes * fix first element skip * fix conditional recalc * rename centroidLabel -> centroidLabels * break out dilation on hover to new reducer * numerous styling changes for readability * change centroid icon * remove colorAccessor from parameters * recalc centroids on world change * make label toggle undoable * remove unused import * highlight labels on hover * remove special characters from svg id * lighten backdrop * only generate new centroids if they pre-exist * fix issue with spaces in catagorical value name * add label buttons to menubar * change reducer to use colorAccessor and have single toggle * fix check to see if svg should be rendered * move svg overlays onto a single svg layer * dilate on label hover * remove logs * allow centroid to update along side regl renders * allow actions to pass through svg if in zoom mode * remove artifact from circle * remove comment * remove disabling of centroid button * fix conditional map to screen * make styling label conditions stricter * prettier * refactor onto master * refactor computePointFlags() to use pointDilation store * notify when viewport changes * move svg attributes out of lasso setup and prevent rerenders/writes * begin playing with transform matrix * first solution for camera interaction * create transform using nested groups * semi-working method using nested groups with transforms * inversely scale text * properly do final transform * cleanup dead / test code * reinstate original functionality * breakout centroid labels labels into separate component * default toggle on for testing * separate lasso and centroid layers * remove unnecessary attributes, working hover * dilation on label hover * fix dilation on scatterplot * add dilation on label hover * break overlay into separate component * make overlay agnostic to children * move label mouse actions to centroidlabels component, add overlay state * remove lasso on switch to camera * disallow user selection * fix reducer * fix subset with continuous color error * reset labels on color by continuous * revert centroids on by default * refactor for nested restructuring * remove update checking * remove unused method * readd deleted hover delay * remove old centroid setup * remove centroid from undoable * cleanup dead code * remove dead code * rollback unnecessary changes * begin adding annotation functionality * add annotation functionality * add reset and undo functionality * change centroids on layout change * don't create label for unassigned * add comment pointing out POI for performance * touch up matrix transform comment * add comment explaining coordinate space and children's assumed space * remove dead code * switch to pure component * connect centroidLabels to redux * clean up camera check and null result * tool tip change * rename centroid toggle and the like * fix the misalignment of buttons, also make blueprint use consistent * fix comment spelling mistakes * introduce variable for cleaner logic expressions and state assignment * alter tooltip text to back color by interaction * remove manual iterator manipulation for forEach() * remove debounce * nit fix * tooltip wording fix * lint

This update does not change any behavior. The simply moves code around into a new directory structure. Later patches will add new capabilities.

Separate annotations from the scanpy_engine. This introduces a new class called AnnotationsLocalFile, which stores annotations on the local filesystem in csv format. When we support different ways to store annotations, those should now be easier to add in. Also, separating out the annotation handling from scanpy_engine will make it easier to add in new data formats like tiledb. There is still plenty of cleanup that can be done regarding the interactions of the rest, adata format, and annotations.

This is a first commit with tiledb. Large portions where adapted from the cxg-tiledb prototype. The app_single now works with both h5ad or tiledb. Parts of the server still needs to be cleanup/refactored, such as diffexp, configuration, annotations, and mabye others.

Added common code for: diffexp layout_to_fbs_matrix data_frame_to_fbs_matrix Added tiledb to the test_api unit test

As part of this merge, I reorganized the ontology code into the Annotations class.

introduce a new class to hold configuration data "AppConfig". Use this to store configuration data and generate results for the config endpoint.

introduce a new class to hold configuration data "AppConfig". Use this to store configuration data and generate results for the config endpoint. follow on to the previous commit, with additional refactoring with the data_engines, and annotations.

You can continue to launch the server that serves a single dataset as normal: cellxgene launch <dataset> ... However, now you can launch a server that can server multiple datasets like this: cellxgene launch-multi <base> Where base can be a directory name, and whatever else the data engines support. For example, the tiledb engine will support s3:// as well. This serve can simultaneously server many different datasets. Point your brower at http://host:port/<dataset> There are still details on packaging and launching the multi server on a cloud environment that need to be worked out.

The lock files are necessary

There seems to be an incompatibility with numpy during the travis build on github for python 3.7. I think this should fix it.

bkmartinjr

Lets use ".cxg" rather than ".tdb" to indicate a cellxgene object collection.

server/data_common/utils.py

server/data_common/driver.py

server/data_tiledb/tiledb_engine.py

server/common/rest.py

server/data_common/driver.py

server/data_tiledb/tiledb_engine.py

server/app_multi/app.py

server/common/errors.py

server/data_scanpy/scanpy_adaptor.py

server/data_common/matrix_loader.py

bkmartinjr

Approval with a couple of notes/suggestions:

there are some naming & small nits I would like to see cleaned up before merge (and I suggest some further smoke testing).
I still don't think we have the Annotation interface quite right, but lets work on that in the future when we better understand the requirements of alternative Annotations back-ends.

Overall, the cleanup and improvement in adapter abstraction is a major win. Really nice work.

And a few other minor renaming issues

sidneymbell

This is awesome work! Much easier to navigate the codebase, great job.

In my testing, I found several bugs with launch:

--max-category-items is not working
--data-root is in the error message for a missing file, but is not defined anywhere
--disable-diffexp is not working
running --var-names works as expected, but running --obs-names foo --experimental-annotations does not yield a CSV with indices from column foo as expected
Unpredictable handling of values when loading annotations from file. In some cases, all missing values were transformed into null instead of showing up as unassigned. In others, I got this mysterious behavior:

LMK if you have questions on these.

I'll test prepare later today or tomorrow.

sidneymbell · 2020-02-18T21:14:10Z

Also, please go through the docs for launch and contributing to ensure there are no changes that need to be made. Thanks.

bkmartinjr · 2020-02-18T21:57:27Z

Also, please go through the docs for launch and contributing to ensure there are no changes that need to be made. Thanks.

our goal for this PR is no changes from end-user viewpoint. So in theory, the docs are right if there are differences.

sidneymbell · 2020-02-18T22:52:07Z

When trying to run cellxgene prepare, I get the following error:
[1] 21324 segmentation fault cellxgene prepare pbmc3k_raw.h5ad

This occurs with every combination of parameters I've tried thus far (including with no parameters).

Using the vanilla pbmc3k dataset from scanpy (will slack you a link).

bmccandless · 2020-02-18T23:13:19Z

Hi Sidney, I'm not seeing the segfault. I get a "Success!" when I run it. Perhaps make sure everything is up to date if you haven't already. pip install -r server/requirements-prepare.txt If that doesn't help, then I can stop by and take a look on your laptop if you have some time.

…

On Tue, Feb 18, 2020 at 2:52 PM Sidney Bell ***@***.***> wrote: When trying to run cellxgene prepare, I get the following error: [1] 21324 segmentation fault cellxgene prepare pbmc3k_raw.h5ad This occurs with every combination of parameters I've tried thus far (including with no parameters). Using the vanilla pbmc3k dataset from scanpy (will slack you a link). — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#1140?email_source=notifications&email_token=AB325PXITCLEFVPXSXLWZMDRDRRBPA5CNFSM4KPLLMIKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEMFVE7I#issuecomment-587944573>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB325PTJAKF5TUNUNK2B7ATRDRRBPANCNFSM4KPLLMIA> .

-- Brian McCandless

This enables the load of h5ad files from s3; and list the contents of an s3 bucket for generating the demo index page when dataroot is used.

- Fixed disable-diffexp and max-category-items in the config. - updated error message to remove reference to "dataroot" - fixed user supplied obs_names and annotations

bmccandless · 2020-02-18T23:39:39Z

This is awesome work! Much easier to navigate the codebase, great job.

In my testing, I found several bugs with launch:

--max-category-items is not working

--data-root is in the error message for a missing file, but is not defined anywhere

--disable-diffexp is not working

running --var-names works as expected, but running --obs-names foo --experimental-annotations does not yield a CSV with indices from column foo as expected

Unpredictable handling of values when loading annotations from file. In some cases, all missing values were transformed into null instead of showing up as unassigned. In others, I got this mysterious behavior:

LMK if you have questions on these.

I'll test prepare later today or tomorrow.

These should all be fixed now. The 5th issue may have been due to the bug identified in the 4th issue. If you read in an annotations file that has incorrect indexes, then you will get the "blank" entry like you observed.

…into refactor

The e2e testing revealed a bug with the schema endpoint in the server refactor.

…into refactor

mweiden · 2020-02-20T00:15:04Z

Congrats @bmccandless! 🎉

sidneymbell · 2020-02-21T19:27:54Z

@bmccandless -- Did you do a full test of prepare, with all the parameters? If so, great. Otherwise, this should not go in a release until I do so. I wasn't notified you planned to merge.

bmccandless · 2020-02-21T19:44:50Z

Hi @sidneymbell bell, no I haven't don't a full test of prepare yet, although I did smoke test it. Agreed, we need to full test before next release. The server refactor was holding up other changes for the server, so we (Bruce and I) decided to merge it in with the idea of doing further testing afterwards. Getting it in early has the advantage that it will get more soak time before the next release.

sidneymbell · 2020-02-21T19:52:29Z

@bmccandless -- thank you for informing me. I can now allocate some time to test prepare again. Please note that this blocks release 0.15 until such testing is completed! (cc @bkmartinjr)

Brian McCandless and others added 15 commits January 17, 2020 16:43

refactor server, part 1 (#1116)

3a84ddc

This update does not change any behavior. The simply moves code around into a new directory structure. Later patches will add new capabilities.

pin anndata to 0.6.22post1 (#1122)

422b6d1

refactor server, part 1 (#1116)

e5eecad

This update does not change any behavior. The simply moves code around into a new directory structure. Later patches will add new capabilities.

remove backend native prototype (#1104)

c4d0d1e

server refactor, code reuse between scanpy and tiledb (#1116)

35d826b

Added common code for: diffexp layout_to_fbs_matrix data_frame_to_fbs_matrix Added tiledb to the test_api unit test

Merge branch 'master' into refactor

209749d

As part of this merge, I reorganized the ontology code into the Annotations class.

refactor server, simplify config and schema

3b192ff

introduce a new class to hold configuration data "AppConfig". Use this to store configuration data and generate results for the config endpoint.

add missing tiledb files for the example dataset (#1106)

df7eb5c

The lock files are necessary

server refactor, fix annotations for tiledb engine (#1116)

fbbe16a

server refactor, bump numpy version (#1116)

15619db

There seems to be an incompatibility with numpy during the travis build on github for python 3.7. I think this should fix it.

bkmartinjr self-requested a review February 3, 2020 20:56