Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server refactor #1140

Merged
merged 34 commits into from Feb 19, 2020
Merged

server refactor #1140

merged 34 commits into from Feb 19, 2020

Conversation

bmccandless
Copy link
Contributor

@bmccandless bmccandless commented Feb 3, 2020

This PR contains a refactoring to make adding new features easier.

The new features include supporting the tiledb format, and the multi dataset application.

The refactoring includes

  • Simplifying the directory structure and files.
  • a class structure to handle annotations (currently one type: AnnotationsLocalFile).
  • a class to handle application configuration
  • a class structure to handle matrix data (currently AnndataAdaptor and CxgAdaptor). CxgAdaptor uses tiledb.
  • Algorithms that were previously dependent on the scanpy anndata object are now generalized to work with an abstract interface.

The multi dataset option is not fully supported yet, and so the option to use it is hidden.
Use "cli launch --dataroot ..."
To access this feature.

All combinations of app single dataset/ app multi dataset and AnndataAdaptor/CxgAdaptor work with all the features, such as annotations, ontologies, diffexp.

Brian McCandless and others added 15 commits January 17, 2020 16:43
This update does not change any behavior.
The simply moves code around into a new directory structure.
Later patches will add new capabilities.
* refactor reducer to no longer support hover state and hold many labels

* refactor to generate centroidCoordinates for all values of a category

* create hash for function and memoize export

* create button to display all labels for a category

* clear state

* create label for each thing

* calculate on each value

* change to in place modification of map

* switch to for loop with iterator instead of forEach

* use map from centroidLabel instead of creating copy

* adapt for map

* utilize tarrays

* begin documentation

* disable centroids if in zoom mode

* clean up

* persist uncalc coordinates

* document

* clean up and document

* cleanup and document

* fix

* fix undefined labels and document changes

* fix first element skip

* fix conditional recalc

* rename centroidLabel -> centroidLabels

* break out dilation on hover to new reducer

* numerous styling changes for readability

* change centroid icon

* remove colorAccessor from parameters

* recalc centroids on world change

* make label toggle undoable

* remove unused import

* highlight labels on hover

* remove special characters from svg id

* lighten backdrop

* only generate new centroids if they pre-exist

* fix issue with spaces in catagorical value name

* add label buttons to menubar

* change reducer to use colorAccessor and have single toggle

* fix check to see if svg should be rendered

* move svg overlays onto a single svg layer

* dilate on label hover

* remove logs

* allow centroid to update along side regl renders

* allow actions to pass through svg if in zoom mode

* remove artifact from circle

* remove comment

* remove disabling of centroid button

* fix conditional map to screen

* make styling label conditions stricter

* prettier

* refactor onto master

* refactor computePointFlags() to use pointDilation store

* notify when viewport changes

* move svg attributes out of lasso setup and prevent rerenders/writes

* begin playing with transform matrix

* first solution for camera interaction

* create transform using nested groups

* semi-working method using nested groups with transforms

* inversely scale text

* properly do final transform

* cleanup dead / test code

* reinstate original functionality

* breakout centroid labels labels into separate component

* default toggle on for testing

* separate lasso and centroid layers

* remove unnecessary attributes, working hover

* dilation on label hover

* fix dilation on scatterplot

* add dilation on label hover

* break overlay into separate component

* make overlay agnostic to children

* move label mouse actions to centroidlabels component, add overlay state

* remove lasso on switch to camera

* disallow user selection

* fix reducer

* fix subset with continuous color error

* reset labels on color by continuous

* revert centroids on by default

* refactor for nested restructuring

* remove update checking

* remove unused method

* readd deleted hover delay

* remove old centroid setup

* remove centroid from undoable

* cleanup dead code

* remove dead code

* rollback unnecessary changes

* begin adding annotation functionality

* add annotation functionality

* add reset and undo functionality

* change centroids on layout change

* don't create label for unassigned

* add comment pointing out POI for performance

* touch up matrix transform comment

* add comment explaining coordinate space and children's assumed space

* remove dead code

* switch to pure component

* connect centroidLabels to redux

* clean up camera check and null result

* tool tip change

* rename centroid toggle and the like

* fix the misalignment of buttons, also make blueprint use consistent

* fix comment spelling mistakes

* introduce variable for cleaner logic expressions and state assignment

* alter tooltip text to back color by interaction

* remove manual iterator manipulation for forEach()

* remove debounce

* nit fix

* tooltip wording fix

* lint
This update does not change any behavior.
The simply moves code around into a new directory structure.
Later patches will add new capabilities.
Separate annotations from the scanpy_engine.

This introduces a new class called AnnotationsLocalFile, which stores
annotations on the local filesystem in csv format.  When we support different
ways to store annotations, those should now be easier to add in.  Also,
separating out the annotation handling from scanpy_engine will make it easier
to add in new data formats like tiledb.

There is still plenty of cleanup that can be done regarding the interactions of
the rest, adata format, and annotations.
This is a first commit with tiledb.
Large portions where adapted from the cxg-tiledb prototype.
The app_single now works with both h5ad or tiledb.
Parts of the server still needs to be cleanup/refactored, such as
diffexp, configuration, annotations, and mabye others.
Added common code for:
  diffexp
  layout_to_fbs_matrix
  data_frame_to_fbs_matrix

Added tiledb to the test_api unit test
As part of this merge, I reorganized the ontology code into
the Annotations class.
introduce a new class to hold configuration data "AppConfig".
Use this to store configuration data and generate results for
the config endpoint.
introduce a new class to hold configuration data "AppConfig".
Use this to store configuration data and generate results for
the config endpoint.

follow on to the previous commit, with additional refactoring
with the data_engines, and annotations.
You can continue to launch the server that serves a single dataset as normal:
  cellxgene launch <dataset> ...

However, now you can launch a server that can server multiple datasets like
this:
  cellxgene launch-multi <base>

Where base can be a directory name, and whatever else the
data engines support. For example, the tiledb engine will support s3://
as well.  This serve can simultaneously server many different datasets.
Point your brower at http://host:port/<dataset>

There are still details on packaging and launching the multi server on a cloud
environment that need to be worked out.
There seems to be an incompatibility with numpy during the travis build on
github for python 3.7.  I think this should fix it.
@bkmartinjr bkmartinjr self-requested a review February 3, 2020 20:56
Copy link
Contributor

@bkmartinjr bkmartinjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets use ".cxg" rather than ".tdb" to indicate a cellxgene object collection.

server/common/rest.py Outdated Show resolved Hide resolved
server/common/rest.py Outdated Show resolved Hide resolved
server/app_multi/app.py Outdated Show resolved Hide resolved
server/common/errors.py Outdated Show resolved Hide resolved
Copy link
Contributor

@bkmartinjr bkmartinjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approval with a couple of notes/suggestions:

  • there are some naming & small nits I would like to see cleaned up before merge (and I suggest some further smoke testing).
  • I still don't think we have the Annotation interface quite right, but lets work on that in the future when we better understand the requirements of alternative Annotations back-ends.

Overall, the cleanup and improvement in adapter abstraction is a major win. Really nice work.

@sidneymbell sidneymbell self-requested a review February 18, 2020 20:27
Copy link
Contributor

@sidneymbell sidneymbell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome work! Much easier to navigate the codebase, great job.

In my testing, I found several bugs with launch:

  • --max-category-items is not working
  • --data-root is in the error message for a missing file, but is not defined anywhere
  • --disable-diffexp is not working
  • running --var-names works as expected, but running --obs-names foo --experimental-annotations does not yield a CSV with indices from column foo as expected
  • Unpredictable handling of values when loading annotations from file. In some cases, all missing values were transformed into null instead of showing up as unassigned. In others, I got this mysterious behavior:
    image

LMK if you have questions on these.

I'll test prepare later today or tomorrow.

@sidneymbell
Copy link
Contributor

Also, please go through the docs for launch and contributing to ensure there are no changes that need to be made. Thanks.

@bkmartinjr
Copy link
Contributor

Also, please go through the docs for launch and contributing to ensure there are no changes that need to be made. Thanks.

our goal for this PR is no changes from end-user viewpoint. So in theory, the docs are right if there are differences.

@sidneymbell
Copy link
Contributor

When trying to run cellxgene prepare, I get the following error:
[1] 21324 segmentation fault cellxgene prepare pbmc3k_raw.h5ad

This occurs with every combination of parameters I've tried thus far (including with no parameters).

Using the vanilla pbmc3k dataset from scanpy (will slack you a link).

@bmccandless
Copy link
Contributor Author

bmccandless commented Feb 18, 2020 via email

Brian McCandless added 2 commits February 18, 2020 15:33
This enables the load of h5ad files from s3; and list the contents of an s3
bucket for generating the demo index page when dataroot is used.
 - Fixed disable-diffexp and max-category-items in the config.
 - updated error message to remove reference to "dataroot"
 - fixed user supplied obs_names and annotations
@bmccandless
Copy link
Contributor Author

This is awesome work! Much easier to navigate the codebase, great job.

In my testing, I found several bugs with launch:

  • --max-category-items is not working
  • --data-root is in the error message for a missing file, but is not defined anywhere
  • --disable-diffexp is not working
  • running --var-names works as expected, but running --obs-names foo --experimental-annotations does not yield a CSV with indices from column foo as expected
  • Unpredictable handling of values when loading annotations from file. In some cases, all missing values were transformed into null instead of showing up as unassigned. In others, I got this mysterious behavior:
    image

LMK if you have questions on these.

I'll test prepare later today or tomorrow.

These should all be fixed now. The 5th issue may have been due to the bug identified in the 4th issue. If you read in an annotations file that has incorrect indexes, then you will get the "blank" entry like you observed.

Brian McCandless added 2 commits February 18, 2020 15:59
The e2e testing revealed a bug with the schema endpoint in
the server refactor.
@bmccandless bmccandless merged commit 907cc63 into master Feb 19, 2020
@mweiden mweiden deleted the bmccandless/server-refactor branch February 20, 2020 00:14
@mweiden
Copy link
Contributor

mweiden commented Feb 20, 2020

Congrats @bmccandless! 🎉

@sidneymbell
Copy link
Contributor

@bmccandless -- Did you do a full test of prepare, with all the parameters? If so, great. Otherwise, this should not go in a release until I do so. I wasn't notified you planned to merge.

@bmccandless
Copy link
Contributor Author

bmccandless commented Feb 21, 2020

Hi @sidneymbell bell, no I haven't don't a full test of prepare yet, although I did smoke test it. Agreed, we need to full test before next release. The server refactor was holding up other changes for the server, so we (Bruce and I) decided to merge it in with the idea of doing further testing afterwards. Getting it in early has the advantage that it will get more soak time before the next release.

@sidneymbell
Copy link
Contributor

@bmccandless -- thank you for informing me. I can now allocate some time to test prepare again. Please note that this blocks release 0.15 until such testing is completed! (cc @bkmartinjr)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants