CahanLab/xcell

XCell

Interactive web application for exploring and analyzing scRNA-seq and spatial transcriptomics data. Load an h5ad, 10x Genomics h5, Seurat .rds file, 10x CellRanger matrix folder, or prefixed 10x file trio from GEO, visualize cells on a scatter plot, run Scanpy analysis pipelines, and explore results, all from your browser.

Installation

XCell uses pixi to manage its environment. A single pixi install provisions the exact Python and Node versions plus every dependency: no manual venv, no Node-version juggling, no version troubleshooting.

If you've never installed software from GitHub before, follow every step below in order. Anything in a code block is meant to be pasted into a terminal:

  • macOS: open the Terminal app (⌘+Space, type "Terminal", press Enter).
  • Linux: open your terminal emulator (GNOME Terminal, Konsole, etc.).
  • Windows: open PowerShell (Start menu → type "PowerShell" → Enter).

1. Install Git (once per machine)

Git is the tool that downloads the source code from GitHub.

  • macOS: run git --version. If Git isn't installed, macOS will prompt you to install the Command Line Tools; click Install and wait for it to finish.
  • Linux (Debian/Ubuntu): sudo apt-get install git
  • Linux (Fedora): sudo dnf install git
  • Windows: download and run the installer from https://git-scm.com/download/win, accepting the defaults.

Verify with:

git --version

Prefer not to use Git? You can also click the green Code button at https://github.com/cahanlab/xcell, choose Download ZIP, then unzip it anywhere on your machine. Skip ahead to step 3.

2. Download XCell from GitHub

Pick a folder where you'd like XCell to live (your home directory is fine) and clone the repository into it:

cd ~                                              # or wherever you want the xcell/ folder created
git clone https://github.com/cahanlab/xcell.git
cd xcell

This creates an xcell/ directory containing the source code. The final cd xcell puts your terminal inside that directory; every command from here on must be run from there.

3. Install pixi (once per machine)

pixi is what installs Python, Node, and every project dependency in one shot.

curl -fsSL https://pixi.sh/install.sh | bash      # macOS / Linux
# Windows (PowerShell):  iwr -useb https://pixi.sh/install.ps1 | iex

pixi is a single self-contained binary. It neither requires nor conflicts with an existing conda installation. Close and reopen your terminal after the install so pixi is on PATH, then cd xcell again. Verify with:

pixi --version

4. Set up the project

From inside the xcell/ directory:

pixi install      # creates ./.pixi/ with Python, Node, and all dependencies

This reads pixi.lock, so every platform gets identical, reproducible versions. The first run downloads several hundred MB and can take a few minutes; that's normal. You only do this once (or after pulling updates).

5. Launch

XCell runs as two processes: a Python backend and a JavaScript frontend. You'll need two terminal windows, both cd'd into the xcell/ directory.

In the first terminal:

pixi run backend  # FastAPI on http://localhost:8000

In the second terminal:

pixi run dev      # Vite dev server on http://localhost:5173 (installs frontend deps on first run)

Wait until the second terminal prints something like Local: http://localhost:5173/, then open http://localhost:5173 in your browser. Leave both terminals running while you use XCell; press Ctrl+C in each one to stop the servers when done.

6. (Optional) Load your own data

A bundled toy dataset (toy_spatial.h5ad) loads automatically if no data path is specified. To load your own data, set the XCELL_DATA_PATH environment variable when starting the backend:

XCELL_DATA_PATH=/path/to/your/data.h5ad pixi run backend  # also supports .h5 and .rds
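The inline VAR=value form above is POSIX-shell syntax and will not work as written in PowerShell. One cross-platform option is a small Python launcher that sets the variable and then starts the backend. This is only a sketch: the helper names backend_env and launch_backend are made up for illustration, and it assumes pixi is on PATH and that the script is run from inside the xcell/ directory.

```python
import os
import subprocess
from pathlib import Path


def backend_env(data_path, base_env=None):
    """Return a copy of the environment with XCELL_DATA_PATH pointing at data_path."""
    env = dict(os.environ if base_env is None else base_env)
    env["XCELL_DATA_PATH"] = str(Path(data_path).expanduser())
    return env


def launch_backend(data_path):
    """Run `pixi run backend` with XCELL_DATA_PATH set (blocks until the server stops)."""
    return subprocess.run(["pixi", "run", "backend"], env=backend_env(data_path), check=False)


# Example (hypothetical path, replace with your own file):
# launch_backend("~/data/my_sample.h5ad")
```

Press Ctrl+C in the terminal to stop the server, the same as when launching it directly.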

Updating to the latest version

From inside the xcell/ directory:

git pull          # fetch the latest code
pixi install      # refresh dependencies if they changed

Then restart the two pixi run commands.

Loading .rds files is optional and needs R with the Seurat and SeuratDisk packages installed separately; SeuratDisk is not available as a conda package.

Not using pixi? XCell still installs the classic way (pip install -e backend in a Python 3.10+ venv, npm install in frontend/ on Node 18+). pixi just removes the version-matching guesswork.

Getting Started with Toy Data

The included test_data/toy_spatial.h5ad dataset is a small spatial transcriptomics dataset for exploring XCell's features. Here's a step-by-step walkthrough:

1. Explore the Embedding

  • The center panel shows cells as points at their embedding coordinates (spatial, UMAP, PCA, …). The tab is labeled Embedding; if multiple embeddings are available, switch via the in-plot Embedding dropdown.
  • Pan by clicking and dragging
  • Zoom with scroll wheel
  • Zoom/pan are preserved across in-place data changes (cell delete, filter, normalize, etc.). The camera only re-centers when you explicitly switch embeddings.

2. Color by Metadata

  • Open Cell Manager (left panel)
  • Select a metadata column to color cells by that annotation

3. Select Cells

  • Click the Select button in the toolbar (use the dropdown arrow to choose between Lasso and Polygon tools)
    • Lasso: click and drag to draw a freehand selection
    • Polygon: click to add vertices, double-click to close and select cells inside
  • Hold Shift while selecting to add to the existing selection
  • Checkboxes in the Cell Manager also select/deselect cells by category
  • Rename a category label by double-clicking the label in the expanded category list. Press Enter to commit (or Escape to cancel). Works on any categorical metadata: Leiden clusters, Contourize results, user annotations.
  • Merge two or more labels by clicking the ⋯ menu in a column header and choosing Merge labels…. Pick the labels to merge, type a new name (or reuse an existing one to fold them in), then click Merge.
  • Selected cells can be masked or deleted

Adjusting the Embedding (optional)

The Adjust toolbar dropdown has three sections:

  • Rotate: enter Rotate mode, then drag inside the plot to rotate around the data centroid. A live angle badge and a faint orange ring at the pivot show what's happening. Hold Shift to snap to 15° increments. The bottom-of-viewport toolbar gives ±90° quick buttons and a precise degree input (Enter to apply).
  • Quilt: lasso a cell subset, then drag to translate it (or Shift+drag to rotate it). This is meant for stitching together adjacent tissue sections. Arrow keys nudge the selection (Shift+arrow for a 10× larger step). Press Ctrl/Cmd+Z (or click "Undo") to revert the last quilt transform.
  • Flip: one-shot actions. Flip Horizontal mirrors the embedding left↔right (about the y-axis), Flip Vertical mirrors top↔bottom (about the x-axis). If you're in Quilt mode with cells selected, the flip applies only to those cells.

All adjustments persist on the backend and are saved on h5ad export.
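For readers who want the geometry spelled out, the sketch below shows one way rotation about the centroid, 15° snapping, and a horizontal flip could be computed. It is illustrative only, not XCell's code: the function names are invented, rotation is taken as counter-clockwise in data coordinates, and the flip mirrors about the vertical line through the centroid (an assumption; the text above says "about the y-axis", so the actual pivot may differ).

```python
import math


def centroid(points):
    """Mean x and mean y of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def snap_angle(degrees, step=15.0):
    """Round an angle to the nearest multiple of `step` degrees."""
    return round(degrees / step) * step


def rotate(points, degrees):
    """Rotate points counter-clockwise about their centroid."""
    cx, cy = centroid(points)
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [
        (cx + (x - cx) * cos_t - (y - cy) * sin_t,
         cy + (x - cx) * sin_t + (y - cy) * cos_t)
        for x, y in points
    ]


def flip_horizontal(points):
    """Mirror left-right about the vertical line through the centroid (assumed pivot)."""
    cx, _ = centroid(points)
    return [(2 * cx - x, y) for x, y in points]
```

Passing only the selected cells to these functions mimics the Quilt-mode behavior, where a transform affects just the lassoed subset.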

4. Run Preprocessing

  • Open the Scanpy modal (top toolbar)
  • Go to Preprocessing and run in order:
    1. Normalize Total: normalize counts per cell
    2. Log1p: log-transform the data
    3. Highly Variable Genes: identify informative genes
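The first two steps are simple arithmetic, sketched below in plain Python on a list-of-lists count matrix (rows are cells). The steps presumably map to Scanpy's sc.pp.normalize_total, sc.pp.log1p, and sc.pp.highly_variable_genes, but that mapping is an assumption about XCell's backend. The fixed target_sum of 10,000 is also an illustrative choice; by default Scanpy scales to the median of per-cell totals instead.

```python
import math


def normalize_total(counts, target_sum=10_000.0):
    """Scale each cell (row) so that its counts sum to target_sum."""
    normalized = []
    for row in counts:
        total = sum(row)
        scale = target_sum / total if total > 0 else 0.0  # leave empty cells at zero
        normalized.append([value * scale for value in row])
    return normalized


def log1p(matrix):
    """Apply log(1 + x) element-wise."""
    return [[math.log1p(value) for value in row] for row in matrix]


counts = [[1, 3], [10, 10]]
logged = log1p(normalize_total(counts, target_sum=4))
```

Order matters: highly variable gene selection in the modal comes after these two steps, so it operates on normalized, log-transformed values.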

5. Run Cell Analysis

  • In the Scanpy modal, go to Cell Analysis and run in order:

    1. PCA: reduce dimensionality
    2. PCA Loadings (optional): scan the top-loading genes on each side of every PC (hover a gene to see its exact loading). If you spot PCs dominated by technical signal (cell cycle, mitochondrial genes, etc.), check them and click Create PC subset to persist a derived embedding (e.g. X_pca_noPC2_5).
    3. Neighbors: build cell neighborhood graph (requires PCA). If you created derived subsets in step 2, pick one from the PC source dropdown; UMAP and Leiden inherit the choice automatically through the neighbors graph.
    4. UMAP: compute 2D embedding (requires Neighbors)
    5. Leiden: cluster cells (requires Neighbors)

    Re-running PCA clears all derived PC subsets (with a toast) since their column indices refer to the previous eigenvectors.
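A derived PC subset is conceptually just the PCA matrix with some columns removed. The sketch below illustrates that idea. The function name is hypothetical, PCs are assumed to be numbered from 1, and the key format is inferred from the single example name X_pca_noPC2_5 rather than from XCell's source.

```python
def pc_subset(pca_rows, drop_pcs):
    """Drop 1-based principal components from each cell's PCA coordinates.

    Returns (key, rows), where key follows the assumed "X_pca_noPC<a>_<b>" pattern.
    """
    drop = sorted(set(drop_pcs))
    n_pcs = len(pca_rows[0]) if pca_rows else 0
    out_of_range = [pc for pc in drop if not 1 <= pc <= n_pcs]
    if out_of_range:
        raise ValueError(f"PCs out of range: {out_of_range}")
    keep = [i for i in range(n_pcs) if (i + 1) not in drop]
    key = "X_pca_noPC" + "_".join(str(pc) for pc in drop)
    return key, [[row[i] for i in keep] for row in pca_rows]


key, subset = pc_subset([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], drop_pcs=[5, 2])
```

Because the subset stores column positions from one specific PCA run, it stops being meaningful once PCA is recomputed, which is why re-running PCA clears derived subsets.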

6. View Clustering Results

  • In Cell Manager, select the leiden column to color by cluster
  • Switch the embedding to X_umap to see the UMAP layout

7. Color by Gene Expression

  • Open Gene Manager (right panel)
  • If the dataset has alternative gene identifier columns (e.g., gene symbols alongside Ensembl IDs), use the Gene IDs dropdown at the top of the panel to switch
  • Search or browse genes
  • Click a gene to color cells by its expression

Gene Mask

To scope the Gene Panel to a relevant gene universe, click the ⋯ button in the Genes panel header and choose Gene mask…. The modal lists all boolean columns in your dataset's .var (for example, highly_variable after running Highly Variable Genes, or spatially_variable after spatial autocorrelation). For each column, choose:

  • Off: ignore this column
  • Keep: include genes where this column is True
  • Hide: exclude genes where this column is True

When you have multiple Keep columns, choose whether to match ANY (union) or ALL (intersection). Hide columns always combine as a union.

The mask applies to the gene browse list, gene search, expanded gene set rows, and gene set score aggregation used for display coloring. It does not apply to analysis operations (Diff Exp, Marker Genes, Gene PCA, etc.); those have their own gene subset dropdowns. The mask is per-dataset and session-only; reloading the page clears it.
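The Keep/Hide rules can be read as boolean logic over .var columns. The sketch below is one interpretation, not XCell's implementation: the function name is invented, and it assumes that with no Keep columns every gene passes and that Hide takes precedence over Keep.

```python
def apply_gene_mask(var_columns, modes, keep_logic="any"):
    """Return one visibility flag per gene.

    var_columns: {column_name: [bool, ...]} with one entry per gene
    modes:       {column_name: "off" | "keep" | "hide"}
    keep_logic:  "any" (union of Keep columns) or "all" (intersection)
    """
    n_genes = len(next(iter(var_columns.values()))) if var_columns else 0
    keep_cols = [c for c, m in modes.items() if m == "keep"]
    hide_cols = [c for c, m in modes.items() if m == "hide"]
    combine = any if keep_logic == "any" else all
    visible = []
    for g in range(n_genes):
        kept = combine(var_columns[c][g] for c in keep_cols) if keep_cols else True
        hidden = any(var_columns[c][g] for c in hide_cols)  # Hide columns always union
        visible.append(kept and not hidden)
    return visible


var_columns = {
    "highly_variable": [True, True, False, False],
    "spatially_variable": [True, False, True, False],
    "mito": [False, True, False, False],  # hypothetical boolean column
}
modes = {"highly_variable": "keep", "spatially_variable": "keep", "mito": "hide"}
```

With these inputs, ANY keeps the first and third genes, while ALL keeps only the first.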

8. Gene Sets

  • Create gene sets manually in Gene Manager
  • Import gene lists from files

Curating gene sets into folders

The Manual category at the top of the Gene Panel is the home for gene sets you create by hand. Click + 📁 to create a named folder (e.g. "Fig 3 markers"). Inside a folder, click + to add a new empty set, or drag an existing top-level set onto the folder row to move it in. Drag a set back onto the thin strip above the first folder to move it out. Drag sets within the same container to reorder them.

Each gene set and folder row has a ⋯ button with secondary actions. On a gene set row, that's where you find Pin and Cluster genes. On a manual folder row, that's where you find Pin and Export (JSON/GMT/CSV).

Use the Pin/Unpin option in the ⋯ menu on any set or folder to float it to the top of its container. Pinning works in every category, including auto-generated ones, and survives moving a set between folders.

The Export ▸ option in the ⋯ menu on any manual folder lets you export just that folder's gene sets to JSON, GMT, or CSV. The filename defaults to the sanitized folder name. JSON round-trips via the existing Import modal.
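For reference, GMT is a tab-separated text format with one gene set per line: set name, a description field, then the genes. The sketch below writes that layout and derives a filename from a folder name. The sanitization rule and the "na" description placeholder are assumptions for illustration; XCell's exact output may differ.

```python
import re


def sanitize_filename(name):
    """Replace runs of characters outside letters, digits, '.', '_' and '-' with '_'."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("_")
    return cleaned or "gene_sets"


def to_gmt(gene_sets, description="na"):
    """Render {set_name: [genes]} as GMT text (name, description, genes; tab-separated)."""
    lines = ["\t".join([name, description, *genes]) for name, genes in gene_sets.items()]
    return "\n".join(lines) + "\n" if lines else ""


filename = sanitize_filename("Fig 3 markers") + ".gmt"
text = to_gmt({"Epithelium": ["EPCAM", "KRT8"], "Stroma": ["COL1A1"]})
```

The gene names here are placeholders; any list of identifiers from your dataset works the same way.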

Use the 👁 button on a category header to hide a whole category from view (useful when an analysis has filled Gene Clusters or Differential Expression with results you're done with). An "N hidden ▸" footer appears at the bottom of the Gene Panel; click it and then Unhide to bring a category back.

Tip: double-click any gene set name or manual folder name to rename it inline.

Sub-clustering a gene set

Any gene set with at least 4 genes can be sub-clustered by expression pattern. Click the ⋯ button on a gene set row and choose Cluster genes…. Pick a method (Hierarchical or K-means), a number of clusters K (default 3), and a cell context ("All cells", "Current selection" if you've lasso-picked some cells, or "Annotation category" to restrict to specific categorical values in a .obs column). Clicking Run creates a new folder in Gene Clusters named after the source set, containing one gene set per cluster. Re-running with different K or a different cell context appends another folder so you can compare runs side by side.

Selecting cells by expression threshold

You can select cells based on a gene's expression or a gene set score without needing to eyeball the scatter plot:

  1. In the Gene Panel, click the ⋯ menu on any gene row or gene set row and choose Select cells….
  2. The modal opens and the scatter plot switches to expression coloring for that source. An interactive histogram of the values is shown.
  3. Pick a threshold mode (Above, Below, or Between) and drag the red cutoff line(s). The match counter updates live.
  4. Choose an action:
    • Update selection replaces, adds to, or intersects with your current lasso selection.
    • Label cells creates a new annotation column with high/low labels for the cells in the chosen context (current selection or all cells). On success, click Open Diff Exp ▸ to immediately run differential expression between the two groups.

Typical workflow for "find DEGs by expression state in a region": lasso a region → ⋯ → Select cells… on a gene → drag the threshold → Label cells → Open Diff Exp.
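The threshold modes and the three Update selection behaviors reduce to a filter followed by a set operation. The following is a rough sketch with invented function names. Whether the cutoffs are inclusive or exclusive in XCell is not stated, so the comparisons here are assumptions.

```python
def select_cells(values, mode, cutoff, upper=None):
    """Return indices of cells whose value passes the threshold.

    mode: "above" (value > cutoff), "below" (value < cutoff),
          or "between" (cutoff <= value <= upper).
    """
    if mode == "above":
        passes = lambda v: v > cutoff
    elif mode == "below":
        passes = lambda v: v < cutoff
    elif mode == "between":
        if upper is None:
            raise ValueError("'between' needs an upper cutoff")
        passes = lambda v: cutoff <= v <= upper
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [i for i, v in enumerate(values) if passes(v)]


def update_selection(current, matched, how="replace"):
    """Combine the matched cells with the current selection."""
    current, matched = set(current), set(matched)
    if how == "replace":
        return matched
    if how == "add":
        return current | matched
    if how == "intersect":
        return current & matched
    raise ValueError(f"unknown action: {how}")


expression = [0.0, 1.5, 3.0, 0.2]  # one value per cell
high = select_cells(expression, "above", 1.0)
```

Intersecting with a lasso selection corresponds to the "by expression state in a region" workflow: only cells that are both inside the region and past the cutoff remain selected.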

9. Compare Cell Groups

  • Open the Analyze modal (top toolbar) → Cell Analysis → Compare Cells
  • Select an .obs column (e.g., leiden) from the dropdown
  • Check 2 or more groups to compare:
    • 2 checked → pairwise differential expression
    • 3+ checked → one-vs-rest marker gene analysis
  • Set Top N genes and click Run
  • You can also use lasso selection: select cells → Set as Group 1 / Set as Group 2 → click Compare in the comparison bar
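The rule for choosing between the two analyses can be written as a small dispatch on the number of checked groups. In this sketch the function name is hypothetical, and "rest" is taken to mean the other checked groups. That is an assumption; XCell may instead compare each group against all remaining cells.

```python
def plan_comparisons(checked_groups):
    """Return (mode, comparisons), where each comparison is (group, reference_groups)."""
    groups = list(dict.fromkeys(checked_groups))  # de-duplicate, keep order
    if len(groups) < 2:
        raise ValueError("check at least two groups")
    if len(groups) == 2:
        return "pairwise", [(groups[0], [groups[1]])]
    return "one_vs_rest", [(g, [o for o in groups if o != g]) for g in groups]


mode, comparisons = plan_comparisons(["0", "1", "2"])
```

With two groups checked the result is a single contrast; with three or more, each group gets its own marker list.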

10. Trajectory Analysis

  • Draw lines on the scatter plot
  • Click the gear icon on a shape in the Shapes panel to open Line Tools
  • Under Gene Association, configure:
    • Test against: position along line or distance from line
    • Gene subset: filter to highly variable genes or other boolean columns
    • Spline knots: number of interior knots for the B-spline model (default 5; higher = more flexible fit)
    • FDR: significance threshold (default 0.05)
    • Max genes/direction (or /module when clustering is on): cap on genes returned
    • Cluster genes into modules (default off): when checked, significant genes are grouped by expression profile shape (increasing, decreasing, peak, trough, complex); when unchecked, only positive/negative lists are returned
  • Click Find Associated Genes to run the analysis
  • In the results modal, use the Filters bar to refine results interactively: adjust min R², min amplitude, max FDR, or toggle pattern types (increasing, decreasing, peak, trough, complex)
  • Click Add to Gene Sets in the results modal to save the genes. Each run creates its own folder in the Line Association category of the Gene Panel (one set per module if clustering is on, or a single combined Associated genes set if clustering is off)
  • Click Download CSV in the results modal to export stats (gene, f_stat, pval, fdr, r_squared, amplitude, direction) for every gene tested, giving a ranked list suitable for GSEA or other external analyses
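The two "Test against" options correspond to two numbers that can be computed for each cell relative to a drawn line. The sketch below computes both for a straight two-point segment. It is a simplified illustration with an invented function name: drawn lines in XCell may have more than two vertices, and clamping positions to the segment ends is an assumption.

```python
import math


def position_and_distance(point, start, end):
    """Project a cell onto a segment.

    Returns (position, distance): position is the fraction along the segment
    (0 at start, 1 at end), distance is how far the cell lies from the segment.
    """
    px, py = point
    ax, ay = start
    bx, by = end
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        raise ValueError("start and end must differ")
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp to the segment
    nearest_x, nearest_y = ax + t * dx, ay + t * dy
    return t, math.hypot(px - nearest_x, py - nearest_y)


position, distance = position_and_distance((1.0, 1.0), (0.0, 0.0), (2.0, 0.0))
```

Gene association then models expression as a smooth function of one of these two values, which is where the spline knots setting comes in.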

Multi-section / replicate analysis

  • Draw a line on each tissue section representing the same biological axis
  • For each line, select cells (via lasso or clicking a category value in the Cells panel) and click + to associate them with the line
  • Check the lines to include using the checkboxes that appear on lines with projected cells
  • Click Find Associated Genes in the action bar
  • In the multi-line modal, toggle direction per line if needed (arrow button) and set analysis parameters
  • Results pool cells across all lines for a single, higher-powered analysis

Combine neighbor graphs for spatially-aware clustering

  • After computing both Neighbors (Cell Analysis) and Spatial Neighbors (Spatial Analysis), open Analyze → Cell Analysis → Combine Neighbors
  • Select two or more graphs and set their weights (default: equal weights; weights are normalized to sum to 1)
  • Click Combine graphs. The combined graph becomes the default connectivities slot
  • Run Leiden (or UMAP) afterward and clustering/embedding will reflect both graphs, encouraging spatially neighboring cells to cluster together when the spatial graph is weighted in
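To make the weighting concrete, here is a minimal sketch that treats each graph as a dictionary of edge connectivities and combines them as a weighted sum after normalizing the weights. The weighted-sum rule and the function name are assumptions for illustration; the text above only states that weights are normalized to sum to 1.

```python
def combine_graphs(graphs, weights=None):
    """Combine edge-weight dictionaries {(i, j): connectivity} into one graph."""
    if len(graphs) < 2:
        raise ValueError("select two or more graphs")
    if weights is None:
        weights = [1.0] * len(graphs)  # default: equal weights
    if len(weights) != len(graphs):
        raise ValueError("one weight per graph is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    combined = {}
    for graph, weight in zip(graphs, weights):
        share = weight / total  # normalized so the shares sum to 1
        for edge, value in graph.items():
            combined[edge] = combined.get(edge, 0.0) + share * value
    return combined


expression_graph = {(0, 1): 1.0, (1, 2): 0.5}
spatial_graph = {(0, 1): 0.5, (0, 2): 1.0}
combined = combine_graphs([expression_graph, spatial_graph], weights=[3, 1])
```

Edges present only in the spatial graph, such as (0, 2) here, still enter the result, which is how physically adjacent cells get pulled toward the same cluster.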

11. Run Gene Analysis

  • In the Scanpy modal, go to Gene Analysis:
    1. Build Gene Graph: compute gene-gene similarity
    2. Cluster Genes: group genes by expression pattern

12. Spatial Contouring

  • Select genes in the Gene Panel (click individual genes or use a gene set)
  • Open the Scanpy modal, go to Spatial Analysis > Contourize
  • Adjust smoothing sigma, contour levels, and grid resolution as needed
  • Click Run. A new categorical column appears in the Cell Panel
  • Color cells by the contour column to visualize spatial expression zones

Combining Spatial Sections

To compare the same tissue across timepoints (or any cross-sample analysis), you can load 2+ spatial-transcriptomics h5ads into one dataset:

  • Click File → Combine spatial sections… in the toolbar
  • In the load modal, switch the mode toggle to Combine sections (already set when you arrive via the menu)
  • Click .h5ad files in the browser to add them to the list. Each file gets an editable label (defaults to the filename stem)
  • Adjust the gap (% of mean section width) and the slot to load into
  • Click Combine N sections. Sections are placed left-to-right along the spatial x-axis with the configured gap; a new sample categorical .obs column tags each cell with its source file label
  • The combined dataset behaves like any other: color by sample to see the layout, run Compare Cells across timepoints, etc.

Notes:

  • Genes = intersection of the input files' var indices. Use Gene IDs swap in the Gene Panel beforehand if your files use different identifier columns.
  • v1 supports .h5ad only. For .rds / 10x files, load them once via single-file Load and export as h5ad first.
  • Per-file UMAPs/PCAs are dropped; re-run PCA/UMAP via the Scanpy modal on the combined data.
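The layout and gene-intersection rules can be sketched in a few lines. The code below is illustrative only: function and parameter names are made up, the 10% default gap is arbitrary, and each section is assumed to be shifted so its left edge sits at the running cursor.

```python
def place_sections(sections, gap_percent=10.0):
    """Lay sections out left-to-right along x.

    sections: list of (label, [(x, y), ...]).
    Returns a list of (label, x, y) with shifted x coordinates; the label plays
    the role of the per-cell sample column.
    """
    widths = [max(x for x, _ in pts) - min(x for x, _ in pts) for _, pts in sections]
    gap = (sum(widths) / len(widths)) * gap_percent / 100.0  # % of mean section width
    placed, cursor = [], 0.0
    for (label, pts), width in zip(sections, widths):
        shift = cursor - min(x for x, _ in pts)
        placed.extend((label, x + shift, y) for x, y in pts)
        cursor += width + gap
    return placed


def shared_genes(var_indices):
    """Genes present in every file, in the order of the first file."""
    common = set(var_indices[0]).intersection(*var_indices[1:])
    return [gene for gene in var_indices[0] if gene in common]


sections = [("day0", [(0, 0), (10, 5)]), ("day7", [(100, 0), (120, 5)])]
layout = place_sections(sections, gap_percent=20.0)
```

Here the mean width is 15, so a 20% gap is 3 units and the second section starts at x = 13.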

13. Load a Second Dataset

  • Click Load in the toolbar. The modal shows a sidebar with quick-access locations (Home, Desktop, Documents, Downloads) and recently loaded files, plus breadcrumb path navigation for clicking any ancestor directory
  • Choose Secondary from the "Load into" dropdown
  • Browse or enter the path to a second h5ad, h5, rds file, 10x matrix folder, or prefixed 10x file trio and click Load
  • A dataset switcher dropdown appears in the header. Switch between Primary and Secondary to compare datasets
  • Click the Split button to view both datasets side by side
  • Click on either plot to make it the active dataset; the Cell and Gene panels update accordingly
  • Each plot has its own embedding selector, legend, and independent pan/zoom

14. Export Results

  • Click Export in the toolbar to download annotations and results

Customizing default parameters

XCell ships with hardcoded defaults for every form in the Scanpy modal, the Line Association dialog, and the Display Settings panel (e.g. filter_cells → min genes = 25, point size = 3). To change these without touching code, drop a YAML (or JSON) file at ~/.xcell/config.yaml, or set XCELL_CONFIG_PATH to point somewhere else. A sample is included at docs/config.example.yaml.

The shape is a nested mapping matching the form namespace. Only include keys you want to override; everything else falls back to the built-in default:

scanpy:
  filter_cells:
    min_genes: 15       # was 25
  neighbors:
    n_neighbors: 20     # was 15

line_association:
  fdr_threshold: 0.1    # was 0.05
  cluster_genes: true   # was false

display:
  point_size: 4               # was 3
  point_opacity: 0.7          # was 0.85
  background_color: '#000000' # was '#1a1a2e'
  color_scale: magma          # was viridis
  clip_percentile: 0.5        # was 1.0
  gene_set_aggregation: median # was mean

A backend restart is required to pick up edits. Verify what was loaded by hitting GET /api/config/defaults; unknown keys are silently ignored. Display defaults are applied to every dataset slot at startup and re-applied on each fresh dataset load. You can still tweak any value in the Display Settings panel for the current session.
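The override behavior described above (partial overrides, fallback to built-in defaults, unknown keys ignored) amounts to a recursive merge driven by the defaults. The sketch below shows one way to express it; it is not XCell's loader, and the function name is hypothetical. The default values used are the ones quoted in this section.

```python
def merge_defaults(defaults, overrides):
    """Overlay user overrides on built-in defaults, ignoring unknown keys."""
    merged = {}
    for key, default in defaults.items():  # iterate defaults so unknown keys never enter
        override = overrides.get(key) if isinstance(overrides, dict) else None
        if isinstance(default, dict):
            merged[key] = merge_defaults(default, override if isinstance(override, dict) else {})
        elif key in overrides:
            merged[key] = override
        else:
            merged[key] = default
    return merged


defaults = {
    "scanpy": {"filter_cells": {"min_genes": 25}, "neighbors": {"n_neighbors": 15}},
    "display": {"point_size": 3},
}
user_config = {
    "scanpy": {"filter_cells": {"min_genes": 15}},
    "display": {"point_size": 4, "not_a_real_key": 1},  # unknown key is dropped
}
effective = merge_defaults(defaults, user_config)
```

Comparing a result like effective against the response from GET /api/config/defaults is a quick way to confirm that a misspelled key was ignored rather than applied.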

Session persistence

Most changes you make in a session survive on the backend process: deleted cells, transformed embeddings, computed PCA / neighbors / UMAP / Leiden, drawn lines, and (as of this version) your gene sets (categories, folders, individual sets). If the browser tab accidentally reloads, the gene panel is rehydrated from the server. Restarting the backend still clears everything; persist important sets via the Gene Panel export controls before shutting down.

Features

  • Interactive scatter plot: deck.gl-powered visualization with pan, zoom, lasso selection
  • Cell Manager: browse/color by metadata, mask/delete cells
  • Gene Manager: search genes, create gene sets, import gene lists
  • Scanpy integration: run preprocessing, cell analysis (PCA, Neighbors, UMAP, Leiden), gene analysis, spatial analysis (contourize), and differential expression directly in the browser. Long-running operations (gene neighbors, spatial neighbors, spatial autocorrelation, contourize, line gene association) can be cancelled mid-run without corrupting session data.
  • Trajectory analysis: draw lines and associate genes with spatial trajectories
  • Quilt mode: lasso and rearrange tissue pieces: drag to translate, shift+drag to rotate, flip to reflect selected cell subsets
  • Display settings: adjust point size, opacity, colormaps, bivariate coloring, and an optional coordinate grid behind the plot (with data-coordinate tick labels along the bottom/left axes for visual reference and troubleshooting)
  • Highlight overlay: stack one or more colored layers on top of the active coloring without replacing it. Each layer is either a gene-set expression threshold (above / below / between, with a draggable histogram cutoff) or a frozen cell-set mask (current selection or category value). Useful for marking e.g. epithelium in green while keeping bivariate coloring on the rest.
  • Figure builder: compose multi-panel publication figures from a cell selection (or the full dataset). Each panel renders the same cells colored independently (single gene, gene set, bivariate two-gene-set, or metadata column), with its own color scale and title. Per-figure point size, opacity, background, and optional N×N grid overlay are shared so panels stay visually consistent. Per-panel "show highlight layers" toggle blends the dataset's current Highlight overlays into the panel. Shared pan/zoom keeps panels aligned. Export to PNG at 1× to 4× DPI from the new Figure tab.
  • Multi-dataset support: load two datasets (h5ad, h5, rds, 10x matrix folders, or prefixed 10x file trios from GEO), switch between them, or view side by side in split mode
  • Export: download annotations and analysis results

Project Structure

xcell/
├── backend/
│   ├── xcell/
│   │   ├── main.py          # FastAPI app entry point
│   │   ├── adaptor.py       # DataAdaptor class (wraps AnnData)
│   │   ├── diffexp.py       # Differential expression
│   │   ├── data/
│   │   │   └── toy_spatial.h5ad  # Bundled toy dataset
│   │   └── api/
│   │       └── routes.py    # REST API endpoints
│   └── pyproject.toml       # Python dependencies
├── frontend/
│   ├── src/
│   │   ├── App.tsx           # Main app component
│   │   ├── store.ts          # Zustand state management
│   │   ├── main.tsx          # Entry point
│   │   ├── components/
│   │   │   ├── ScatterPlot.tsx        # deck.gl scatter plot
│   │   │   ├── CellPanel.tsx          # Cell metadata manager
│   │   │   ├── GenePanel.tsx          # Gene browser / gene sets
│   │   │   ├── ScanpyModal.tsx        # Scanpy analysis pipeline UI
│   │   │   ├── DiffExpModal.tsx       # Differential expression
│   │   │   ├── LineAssociationModal.tsx # Trajectory analysis
│   │   │   ├── DisplaySettings.tsx    # Visualization settings
│   │   │   ├── ShapeManager.tsx       # Shape/selection tools
│   │   │   └── ImportModal.tsx        # Gene list import
│   │   └── hooks/
│   │       └── useData.ts    # Data fetching hooks
│   ├── package.json          # Node dependencies
│   └── vite.config.ts        # Vite configuration
├── README.md
test_data/
├── toy_spatial.h5ad          # Toy dataset for testing
└── generate_toy.py           # Script to regenerate toy data

Architecture

  • Backend: FastAPI + AnnData + Scanpy, serving data and running analysis via REST API
  • Frontend: React + TypeScript + Vite + deck.gl + Zustand for state management
  • Data flow: h5ad file → DataAdaptor → REST API → React hooks → deck.gl visualization
  • API docs: Available at http://localhost:8000/docs when the backend is running
