Geometry-preserving random sampling

# brianhie/geosketch

# Geometric sketching

## Overview

`geosketch` is a Python package that implements the geometric sketching algorithm described by Brian Hie, Hyunghoon Cho, Benjamin DeMeo, Bryan Bryson, and Bonnie Berger in "Geometric sketching compactly summarizes the single-cell transcriptomic landscape", Cell Systems (2019). This repository contains an example implementation of the algorithm as well as scripts necessary for reproducing the experiments in the paper.

## Installation

Install `geosketch` from PyPI:

```
pip install geosketch
```

## API example usage

Parameter documentation for the geometric sketching `gs()` function is in the source code at the top of `geosketch/sketch.py`.

For an example of usage of `geosketch` in R using the `reticulate` library, see `example.R`. WARNING: The indices returned by `geosketch` are 0-indexed, but R uses 1-indexing, so the `one_indexed` parameter should be set to `TRUE` when called from R.

Here is example usage of `geosketch` in Python. First, put your data set into a matrix:

```
X = [ sparse or dense matrix, samples in rows, features in columns ]
```

Then, compute the top PCs:

```
# Compute PCs.
from fbpca import pca
U, s, Vt = pca(X, k=100) # E.g., 100 PCs.
X_dimred = U[:, :100] * s[:100]
```
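To make the `U * s` step concrete, here is a minimal sketch that uses an exact SVD from NumPy in place of `fbpca` (a swapped-in technique, with made-up random data). It assumes the data are mean-centered first, which `fbpca.pca` does by default, and checks that scaling the left singular vectors by the singular values gives the same coordinates as projecting the centered data onto the top principal axes.

```python
import numpy as np

# Hypothetical toy data: 200 samples, 30 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
k = 10  # Number of PCs to keep in this toy example.

# Center the features, then take an exact (non-randomized) SVD.
X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

# PC coordinates, computed two equivalent ways.
X_dimred = U[:, :k] * s[:k]          # Scale left singular vectors.
X_projected = X_centered @ Vt[:k].T  # Project onto the top-k axes.

print(np.allclose(X_dimred, X_projected))
```

For large data sets a randomized method such as `fbpca` is usually much faster than an exact SVD; the exact version is used here only because it is easy to check.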

Now, you are ready to sketch!

```
# Sketch.
from geosketch import gs
N = 20000 # Number of samples to obtain from the data set.
sketch_index = gs(X_dimred, N, replace=False)

X_sketch = X_dimred[sketch_index]
```
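Because the sketch is a set of row indices, the same indices can be reused to subset anything aligned with the rows of the data, such as per-sample labels. The following is a sketch under stated assumptions: the data and the `labels` array are made up, and `sketch_index` is a randomly drawn stand-in for the output of `gs()`, assumed here to be a list of distinct integer row indices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for real data: 1000 samples, 100 PCs,
# plus one label per sample.
X_dimred = rng.normal(size=(1000, 100))
labels = np.array(["type_a", "type_b"])[rng.integers(0, 2, size=1000)]

# Stand-in for `gs(X_dimred, N, replace=False)`; this is NOT a
# geometric sketch, only a placeholder with the same assumed shape.
N = 50
sketch_index = sorted(rng.choice(1000, size=N, replace=False).tolist())

# Apply the same indices to the data and to the row-aligned labels.
X_sketch = X_dimred[sketch_index]
labels_sketch = labels[sketch_index]

# Compare label composition in the sketch against the full data set.
print(dict(zip(*np.unique(labels, return_counts=True))))
print(dict(zip(*np.unique(labels_sketch, return_counts=True))))
```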

## Examples

Download and unpack the example data:

```
wget http://cb.csail.mit.edu/cb/geosketch/data.tar.gz
tar xvf data.tar.gz
```

### Visualizing sketches of a mouse brain data set

We can visualize a large data set of cells from different regions of the mouse brain collected by Saunders et al. (2018).

To visualize the sketches obtained by geometric sketching and other baseline algorithms, download the data using the commands above and then run:

``````python bin/mouse_brain_visualize.py
``````

This writes PNG files to the top-level directory, each visualizing the sketch produced by one of the algorithms, including geometric sketching.

## Algorithm implementation

For those interested, the algorithm implementation is available in the file `geosketch/sketch.py`.
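For rough intuition only, the general idea of geometry-preserving sampling can be illustrated with a toy example: bin the points into equal-sized boxes, then draw samples evenly across the occupied boxes so that dense regions do not dominate the sample. The function `grid_sketch` and its `box_size` parameter are hypothetical names made up for this illustration; this fixed-grid version is a simplification and is not the code in `geosketch/sketch.py`.

```python
import random
from collections import defaultdict

def grid_sketch(points, n_samples, box_size, seed=0):
    """Toy illustration: sample evenly across occupied grid boxes."""
    if n_samples > len(points):
        raise ValueError("n_samples cannot exceed the number of points")
    rng = random.Random(seed)

    # Assign each point index to the grid box containing the point.
    boxes = defaultdict(list)
    for i, point in enumerate(points):
        key = tuple(int(c // box_size) for c in point)
        boxes[key].append(i)

    # Shuffle within each box so pops below are random draws.
    members = list(boxes.values())
    for box in members:
        rng.shuffle(box)

    # Take one point per non-empty box per pass until enough are chosen.
    chosen = []
    while len(chosen) < n_samples:
        rng.shuffle(members)
        for box in members:
            if box and len(chosen) < n_samples:
                chosen.append(box.pop())
    return sorted(chosen)

# 1000 points crowded into one box, plus 10 isolated points.
demo_rng = random.Random(1)
dense = [(demo_rng.random(), demo_rng.random()) for _ in range(1000)]
sparse = [(i + 5.5, 5.5) for i in range(10)]
points = dense + sparse

# 11 occupied boxes and 11 samples: each box contributes one point.
idx = grid_sketch(points, 11, box_size=1.0)
n_sparse = sum(i >= 1000 for i in idx)
print(n_sparse)
```

In this toy data, uniform random sampling of 11 points would mostly return points from the dense cluster, whereas the box-based draw keeps every isolated point.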

## Questions

For questions, please use the GitHub Discussions forum. For bugs or other problems, please file an issue.

## Releases 6

geosketch v1.2 Latest
Jan 26, 2021
