Releases: Munfred/wormcells-data

packer2019taylor2020bendavid2021_scvi_model_v0.11.0

03 Jun 18:47
3e6b937

Model for scvi-tools v0.11.0 integrating three C. elegans datasets.

More information at https://wormbase.github.io/single-cell/

| Short Name | Total cells | Method | h5ad | Summary | Article/preprint | Original Data | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Taylor 2020 | 100,955 | 10x v2/v3 | Download at Caltech Data | L4 larvae neurons selected via flow cytometry | Molecular topography of an entire nervous system. | GSE136049 | CeNGEN website; Shiny R app to explore the data |
| Ben-David 2021 | 55,508 | 10x v2 | Download at Caltech Data | L2 larvae | Whole-organism mapping of the genetics of gene expression at cellular resolution, biorxiv 2020. | PRJNA658829 | Gene count matrix was kindly provided by the authors on request |
| Packer 2019 | 89,701 | 10x v2 | Download at Caltech Data | Several timepoints of embryo development | A lineage-resolved molecular atlas of C. elegans embryogenesis at single-cell resolution, Science 2019. | GSE126954 | VisCello app for data exploration |

Taylor et al: Cengen 2020 data release 100955 cells

08 Jan 03:34
a67a0c9

Data from the CeNGEN 2020 preprint: https://www.biorxiv.org/content/10.1101/2020.12.15.422897v1

These are the counts as output by cellranger, without the SoupX modifications. All 100955 barcodes that were labeled as cells were retained, including both neurons and non-neuronal cells.


AnnData object with n_obs × n_vars = 100955 × 46911
    obs: 'dropbox_id', 'counts', 'experiment_code', 'cell_type', 'tissue'

print(adata.obs.head())
                           dropbox_id  counts experiment_code    cell_type  \
1806-ST-1-AAACCTGAGAGACGAA  1806-ST-1      65           Pan-1  Unannotated   
1806-ST-1-AAACCTGAGGTAAACT  1806-ST-1     367           Pan-1          AVF   
1806-ST-1-AAACCTGAGGTAGCCA  1806-ST-1    1792           Pan-1          AVH   
1806-ST-1-AAACCTGAGTAACCCT  1806-ST-1    1229           Pan-1          RIA   
1806-ST-1-AAACCTGAGTACGCGA  1806-ST-1    1401           Pan-1          AUA   

                                 tissue  
1806-ST-1-AAACCTGAGAGACGAA  Unannotated  
1806-ST-1-AAACCTGAGGTAAACT       Neuron  
1806-ST-1-AAACCTGAGGTAGCCA       Neuron  
1806-ST-1-AAACCTGAGTAACCCT       Neuron  
1806-ST-1-AAACCTGAGTACGCGA       Neuron  


print(adata.var.head())
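
Since neuronal and non-neuronal barcodes are kept together, the `tissue` column in obs can serve as a filter. The following is a minimal sketch, assuming pandas and using the five obs rows printed above as stand-in data; with the real file, `adata.obs` would take the place of the hypothetical `obs` table.

```python
import pandas as pd

# Stand-in for adata.obs: the five rows shown in the release notes.
obs = pd.DataFrame(
    {
        "counts": [65, 367, 1792, 1229, 1401],
        "cell_type": ["Unannotated", "AVF", "AVH", "RIA", "AUA"],
        "tissue": ["Unannotated", "Neuron", "Neuron", "Neuron", "Neuron"],
    },
    index=[
        "1806-ST-1-AAACCTGAGAGACGAA",
        "1806-ST-1-AAACCTGAGGTAAACT",
        "1806-ST-1-AAACCTGAGGTAGCCA",
        "1806-ST-1-AAACCTGAGTAACCCT",
        "1806-ST-1-AAACCTGAGTACGCGA",
    ],
)

# Boolean mask that is True for barcodes annotated as neurons.
is_neuron = obs["tissue"] == "Neuron"
neurons = obs[is_neuron]

print(neurons["cell_type"].tolist())
```

With the AnnData object loaded, the same kind of mask subsets the whole object, for example `adata[adata.obs['tissue'] == 'Neuron']`.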

bendavid2020

15 Sep 01:56

bendavid2020.h5ad

AnnData object with n_obs × n_vars = 55508 × 20138
    obs: 'experiment', 'neuronal_subtype', 'barcode', 'study', 'cell_class', 'cell_type'
    var: 'wbps_gene_id', 'chromosome_name', 'start_position', 'end_position', 'strand', 'external_gene_id', 'external_transcript_id', 'wormbase_locus', 'wormbase_gseq', 'gene_short_name', 'gene_name'

obs entries look like so:

                        experiment neuronal_subtype                cell_class  \
barcode                                                                         
F4_1_TGTAACGGTTAGCTAC-1       F4_1              nan                 Intestine   
F4_1_GGCAGTCCAGCCTATA-1       F4_1              nan                 Intestine   
F4_1_AAGTACCGTCATCCCT-1       F4_1              nan             Somatic Gonad   
F4_1_AAGATAGTCCCTCTAG-1       F4_1              nan                 Intestine   
F4_1_ACCAAACCAGCTGTAT-1       F4_1              nan  Pharynx and Arcade Cells   

                                        cell_type  
barcode                                            
F4_1_TGTAACGGTTAGCTAC-1                 Intestine  
F4_1_GGCAGTCCAGCCTATA-1                 Intestine  
F4_1_AAGTACCGTCATCCCT-1             Somatic Gonad  
F4_1_AAGATAGTCCCTCTAG-1                 Intestine  
F4_1_ACCAAACCAGCTGTAT-1  Pharynx and Arcade Cells
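
In the rows above, each barcode appears to be the experiment name joined to the 10x barcode by an underscore. A minimal sketch of splitting the two parts follows; the helper name `split_barcode` is hypothetical, and the rule (split on the last underscore) is an assumption based only on the rows shown.

```python
from typing import Tuple


def split_barcode(barcode: str) -> Tuple[str, str]:
    """Split 'F4_1_TGTAACGGTTAGCTAC-1' into ('F4_1', 'TGTAACGGTTAGCTAC-1').

    Assumes the experiment prefix is everything before the last underscore.
    """
    experiment, sequence = barcode.rsplit("_", 1)
    return experiment, sequence


experiment, sequence = split_barcode("F4_1_TGTAACGGTTAGCTAC-1")
print(experiment, sequence)
```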

var entries look like so:

                  wbps_gene_id chromosome_name  start_position  end_position  \
gene_id                                                                        
WBGene00010957  WBGene00010957           MtDNA             113           549   
WBGene00010958  WBGene00010958           MtDNA             549           783   
WBGene00010959  WBGene00010959           MtDNA            1763          2635   
WBGene00010960  WBGene00010960           MtDNA            2634          3235   
WBGene00010961  WBGene00010961           MtDNA            3389          4269   

                strand external_gene_id external_transcript_id wormbase_locus  \
gene_id                                                                         
WBGene00010957       1           nduo-6               MTCE.3.1         nduo-6   
WBGene00010958       1           ndfl-4               MTCE.4.1         ndfl-4   
WBGene00010959       1           nduo-1              MTCE.11.1         nduo-1   
WBGene00010960       1            atp-6              MTCE.12.1          atp-6   
WBGene00010961       1           nduo-2              MTCE.16.1         nduo-2   

               wormbase_gseq gene_short_name gene_name  
gene_id                                                 
WBGene00010957        MTCE.3          nduo-6    nduo-6  
WBGene00010958        MTCE.4          ndfl-4    ndfl-4  
WBGene00010959       MTCE.11          nduo-1    nduo-1  
WBGene00010960       MTCE.12           atp-6     atp-6  
WBGene00010961       MTCE.16          nduo-2    nduo-2 

packer2019.h5ad

09 Mar 04:35
fc107a0

Packer 2019 C. elegans 10xv2 data

Original article:
A lineage-resolved molecular atlas of C. elegans embryogenesis at single-cell resolution
https://science.sciencemag.org/content/365/6459/eaax1971.long

Data on GEO:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE126954

The data is annotated with the following obs (one example shown):

index                                   AAACCTGAGACAATAC-300.1.1
cell                                    AAACCTGAGACAATAC-300.1.1
n.umi                                                       1630
time.point                                           300_minutes
batch                                      Waterston_300_minutes
Size_Factor                                              1.02319
cell.type                                       Body_wall_muscle
cell.subtype                                      BWM_head_row_1
plot.cell.type                                    BWM_head_row_1
raw.embryo.time                                              360
embryo.time                                                  380
embryo.time.bin                                          330-390
raw.embryo.time.bin                                      330-390
lineage                                                 MSxpappp
passed_initial_QC_or_later_whitelisted                      True
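
The time bin fields are stored as text such as `330-390`. A minimal sketch of turning such a label into numeric bounds is shown below, assuming bins follow a `start-end` pattern (presumably in minutes, as suggested by `time.point`); the helper name `parse_time_bin` is hypothetical, and labels in any other form are returned as None rather than guessed at.

```python
from typing import Optional, Tuple


def parse_time_bin(label: str) -> Optional[Tuple[int, int]]:
    """Parse a bin label such as '330-390' into (330, 390).

    Returns None for labels that do not follow the 'start-end' pattern,
    for example a missing value stored as the string 'nan'.
    """
    parts = label.split("-")
    if len(parts) != 2 or not all(p.isdigit() for p in parts):
        return None
    return int(parts[0]), int(parts[1])


# Values taken from the example cell above.
start, end = parse_time_bin("330-390")
embryo_time = 380
print(start <= embryo_time <= end)
```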

eyal.h5ad

05 Sep 02:22
fc107a0

Data from the preprint "Whole-organism mapping of the genetics of gene expression at cellular resolution" by
Eyal Ben-David, James Boocock, Longhua Guo, Stefan Zdraljevic, Joshua S. Bloom, and Leonid Kruglyak

https://doi.org/10.1101/2020.08.23.263798
https://www.biorxiv.org/content/10.1101/2020.08.23.263798v1

AnnData object with n_obs × n_vars = 55508 × 20138
    obs: 'Batch', 'Size_Factor', 'cell_type', 'neuronal_subtype', 'total', 'barcode', 'doublet'
    var: 'wbps_gene_id', 'chromosome_name', 'start_position', 'end_position', 'strand', 'external_gene_id', 'external_transcript_id', 'wormbase_locus', 'wormbase_gseq', 'gene_short_name', 'gene_name'
adata.obs['Batch'].value_counts()
F4_5    11633
F4_4    11464
F4_2    11424
F4_1    11336
F4_3     9651
adata.obs['cell_type'].value_counts()
Hypodermis                    13219
Body Wall Muscle               9630
Intestine                      4859
Pharynx and Arcade Cells       2842
Seam Cells                     2717
Glia                           2034
Germline                       1373
Coelomocytes                   1151
Somatic Gonad                   951
VA                              872
DD_VD                           705
Pharyngeal Gland Cells          697
VB                              622
Excretory Gland                 586
Unknown                         528
AVK                             498
GLR                             486
Vulval Precursor Cells          471
DA                              465
Sex Myoblast                    463
Excretory Cells                 460
SIA_SIB                         333
XXX                             313
RMH                             267
Sphincter and Anal Muscles      261
ALM_PLM_AVM_PVM                 254
PVP                             254
Unknown_glut_2                  246
AIB                             240
RIC                             233
RIF                             224
AIZ                             203
AVL                             203
AVJ                             193
Unknown_touch                   181
ALN_PLN_SDQ                     179
M2                              178
RIA                             172
AQR_PQR_URX                     171
CAN                             171
AVH                             165
BDU                             163
IL2_DV                          162
AVF                             161
RME                             160
DVB                             159
PVQ                             159
Unknown_ACh_3                   156
AIN                             152
AFD                             148
AIA                             148
PVD                             145
AIM                             143
I5                              137
PDE                             137
PHB                             136
OLL_URY                         135
I2_I3                           131
ADA                             130
PHA                             130
MC                              123
RMD                             108
DVC                             107
ADE                             105
AVD_PVC                         104
Unknown_3                       102
RIM                             102
RMG                              98
AVG                              97
Unknown_2                        95
PVT                              95
ASI_ASJ                          92
RIG                              91
Unknown_1                        90
CEM                              85
URA                              85
MI                               83
AUA                              71
AWA                              70
Unknown_4                        70
LUA                              69
ADL                              69
IL1                              61
I1                               60
ALA                              57
PVR                              52
ASK                              45
AWC                              42
BAG                              40
AWB                              40
ASER                             35
OLQ                              34
M1                               31
ADF                              31
DVA                              24
ASH                              23
IL2_LR                           22
FLP                              21
ASG                              17
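
As a quick consistency check, the per-batch counts listed above should add up to the number of cells in the AnnData object. A minimal sketch using only the numbers printed in this release note:

```python
# Batch sizes copied from adata.obs['Batch'].value_counts() above.
batch_counts = {
    "F4_5": 11633,
    "F4_4": 11464,
    "F4_2": 11424,
    "F4_1": 11336,
    "F4_3": 9651,
}
n_obs = 55508  # from "n_obs × n_vars = 55508 × 20138"

total = sum(batch_counts.values())
print(total == n_obs)

# Share of all cells contributed by each batch, as a percentage.
for batch, n in sorted(batch_counts.items()):
    print(f"{batch}: {100 * n / n_obs:.1f}%")
```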

First 5 entries of adata.obs and adata.var:

print(adata.obs.head())
  Batch  Size_Factor                 cell_type neuronal_subtype   total  \
0  F4_1   102.532200                 Intestine              nan  104863   
1  F4_1    60.046012                 Intestine              nan   61411   
2  F4_1    60.384321             Somatic Gonad              nan   61757   
3  F4_1    51.692898                 Intestine              nan   52868   
4  F4_1    59.001750  Pharynx and Arcade Cells              nan   60343   

                   barcode  doublet  
0  F4_1_TGTAACGGTTAGCTAC-1    False  
1  F4_1_GGCAGTCCAGCCTATA-1    False  
2  F4_1_AAGTACCGTCATCCCT-1    False  
3  F4_1_AAGATAGTCCCTCTAG-1    False  
4  F4_1_ACCAAACCAGCTGTAT-1    False  

print(adata.var.head())
                  wbps_gene_id chromosome_name  start_position  end_position  \
wormbase_gene                                                                  
WBGene00010957  WBGene00010957           MtDNA             113           549   
WBGene00010958  WBGene00010958           MtDNA             549           783   
WBGene00010959  WBGene00010959           MtDNA            1763          2635   
WBGene00010960  WBGene00010960           MtDNA            2634          3235   
WBGene00010961  WBGene00010961           MtDNA            3389          4269   

                strand external_gene_id external_transcript_id wormbase_locus  \
wormbase_gene                                                                   
WBGene00010957       1           nduo-6               MTCE.3.1         nduo-6   
WBGene00010958       1           ndfl-4               MTCE.4.1         ndfl-4   
WBGene00010959       1           nduo-1              MTCE.11.1         nduo-1   
WBGene00010960       1            atp-6              MTCE.12.1          atp-6   
WBGene00010961       1           nduo-2              MTCE.16.1         nduo-2   

               wormbase_gseq gene_short_name gene_name  
wormbase_gene                                           
WBGene00010957        MTCE.3          nduo-6    nduo-6  
WBGene00010958        MTCE.4          ndfl-4    ndfl-4  
WBGene00010959       MTCE.11          nduo-1    nduo-1  
WBGene00010960       MTCE.12           atp-6     atp-6  
WBGene00010961       MTCE.16          nduo-2    nduo-2  

Packer 2019, Taylor 2019, Cao 2017 data wrangle 2020-03-30

03 Apr 22:53
fc107a0

VAE trained on the full data with scVI v0.6.1 (also works with v0.6.3).
New data wrangle: the Packer labels now use cell_plot_type as cell_type.

The wormcells-data-2020-03-30.h5ad anndata file is provided with the following entries:

AnnData object with n_obs × n_vars = 191138 × 22761 
    obs: 'barcode', 'cell_subtype', 'cell_type', 'embryo_time', 'embryo_time_bin', 'experiment', 'lineage', 'numi', 'passed_qc', 'raw_embryo_time', 'raw_embryo_time_bin', 'size_factor', 'study', 'time_point', 'tissue_type'
    var: 'gene_name', 'gene_description'

The first and last entries of the data for each study can be printed with this snippet

import anndata
import pandas as pd

# Load the combined dataset (read_h5ad is the explicit reader for .h5ad files)
adata = anndata.read_h5ad('wormcells-data-2020-03-30.h5ad')

# Collect the first and last obs row of each study, transposed into columns
columns = []
for study in ['cao', 'packer', 'taylor']:
    study_obs = adata.obs[adata.obs['study'] == study]
    columns += [study_obs.head(1).T, study_obs.tail(1).T]

pd.concat(columns, axis=1)

The output looks as below. Note that the display is transposed for convenience: the entries in the first column below are the anndata obs column names.

	0-cao	35986-cao	0-packer	89700-packer	0-taylor	65449-taylor
barcode	A01_A02_AACTACCGAC	B02_B42_TTCTACGCCA	AAACCTGAGACAATAC-300.1.1	TGGGCGTTCAGGCCCA-b02	acr2_AAACCCAAGATCGCTT-1	u3_TTTGTCATCTTCGGTC-1
cell_subtype	nan	nan	BWM_head_row_1	nan	nan	nan
cell_type	hyp_4_to_7_bin_3_around_L2_molt	Intestine_far_posterior	BWM_head_row_1	nan	Unknown_NT	VB
embryo_time	NaN	NaN	380	265	NaN	NaN
embryo_time_bin	nan	nan	330-390	210-270	nan	nan
experiment	L2_experiment_1	L2_experiment_2	Waterston_300_minutes	Murray_b02	acr-2	unc-3
lineage	nan	nan	MSxpappp	nan	nan	nan
numi	NaN	NaN	1630	1132	NaN	NaN
passed_qc	nan	nan	True	True	nan	nan
raw_embryo_time	NaN	NaN	360	260	NaN	NaN
raw_embryo_time_bin	nan	nan	330-390	210-270	nan	nan
size_factor	NaN	NaN	1.02319	0.70682	NaN	NaN
study	cao	cao	packer	packer	taylor	taylor
time_point	nan	nan	300_minutes	mixed	nan	nan
tissue_type	nan	nan	Body_wall_muscle	nan	Neuron	Neuron
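
The NaN entries above are what one would expect when obs tables with different columns are stacked: a field that a study does not provide is filled with a missing value. The following is a minimal sketch with pandas and small stand-in tables built from values shown above; whether the release file was assembled exactly this way is an assumption.

```python
import pandas as pd

# Stand-in obs tables: Cao has no lineage or UMI fields, Packer does.
cao = pd.DataFrame(
    {
        "barcode": ["A01_A02_AACTACCGAC"],
        "experiment": ["L2_experiment_1"],
        "study": ["cao"],
    }
)
packer = pd.DataFrame(
    {
        "barcode": ["AAACCTGAGACAATAC-300.1.1"],
        "experiment": ["Waterston_300_minutes"],
        "lineage": ["MSxpappp"],
        "numi": [1630],
        "study": ["packer"],
    }
)

# An outer join keeps the union of columns and fills the gaps with NaN.
combined = pd.concat([cao, packer], join="outer", ignore_index=True, sort=False)
print(combined.T)
```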

In the variables, gene annotations include WormBase short gene descriptions. For example, the first 5 entries look like:

               gene_id	gene_name	gene_description
0	WBGene00000001	aap-1           Exhibits protein kinase binding activity. Involved in dauer larval development; determination of adult lifespan; and insulin receptor signaling pathway. Localizes to the phosphatidylinositol 3-kinase complex. Human ortholog(s) of this gene implicated in several diseases, including astroblastoma; carcinoma (multiple); endometrial cancer (multiple); primary immunodeficiency disease (multiple); and type 2 diabetes mellitus. Is expressed in intestine and neurons. Orthologous to several human genes including PIK3R3 (phosphoinositide-3-kinase regulatory subunit 3).
1	WBGene00000002	aat-1		Contributes to L-amino acid transmembrane transporter activity. Involved in amino acid transmembrane transport. Localizes to the amino acid transport complex. Is expressed in several structures, including excretory system; gonadal sheath cell; nervous system; pharynx; and rectal gland cell. Orthologous to several human genes including SLC7A8 (solute carrier family 7 member 8).
2	WBGene00000003	aat-2		Predicted to have L-amino acid transmembrane transporter activity. Predicted to be involved in amino acid transmembrane transport. Predicted to localize to the integral component of membrane. Human ortholog(s) of this gene implicated in lysinuric protein intolerance. Orthologous to several human genes including SLC7A7 (solute carrier family 7 member 7).
3	WBGene00000004	aat-3		Contributes to L-amino acid transmembrane transporter activity. Involved in amino acid transmembrane transport. Localizes to the amino acid transport complex. Orthologous to human SLC7A5 (solute carrier family 7 member 5) and SLC7A8 (solute carrier family 7 member 8).
4	WBGene00000005	aat-4		Predicted to have L-amino acid transmembrane transporter activity. Predicted to be involved in amino acid transmembrane transport. Predicted to localize to the integral component of membrane. Human ortholog(s) of this gene implicated in lysinuric protein intolerance. Orthologous to human SLC7A6 (solute carrier family 7 member 6) and SLC7A7 (solute carrier family 7 member 7).

taylor2019

09 Dec 20:44
eeaddf2

H5AD file with data from Taylor et al., bioRxiv 2019:
"Expression profiling of the mature C. elegans nervous system by single-cell RNA-Sequencing"

https://www.biorxiv.org/content/10.1101/737577v2

https://doi.org/10.1101/737577

The cell annotations have the following structure (2 entries included as example)

barcode	acr2_AAACCCAAGATCGCTT-1	acr2_AAACCCAAGTCATAGA-1
barcode	acr2_AAACCCAAGATCGCTT-1	acr2_AAACCCAAGTCATAGA-1
experiment	acr-2	acr-2
tissue	Neuron	Neuron
neuron_type	Unknown_NT	VB

AnnData has the following entries

AnnData object with n_obs × n_vars = 65450 × 21393 
    obs: 'barcode', 'experiment', 'tissue', 'neuron_type'
    var: 'gene_id', 'gene_symbol'

Packer and Taylor data

06 Feb 07:19
eeaddf2

Concatenated C. elegans data from Packer 2019 (89k cells) and Taylor 2019 (65k cells) together with pre-trained scVI model.

Packer 2019
A lineage-resolved molecular atlas of C. elegans embryogenesis at single-cell resolution
https://science.sciencemag.org/content/365/6459/eaax1971.long

Taylor 2019
"Expression profiling of the mature C. elegans nervous system by single-cell RNA-Sequencing"

https://www.biorxiv.org/content/10.1101/737577v2

Concatenated anndata with Cao 2017, Packer 2019 and Taylor 2019 data

11 Feb 04:31
eeaddf2
AnnData object with n_obs × n_vars = 191138 × 22761 
    obs: 'barcode', 'cell_type', 'embryo_time', 'embryo_time_bin', 'experiment', 'lineage', 'numi', 'passed_qc', 'plot_cell_type', 'raw_embryo_time', 'raw_embryo_time_bin', 'size_factor', 'study', 'time_point', 'tissue_type'
    var: 'gene_id', 'gene_name', 'gene_description'

Cells from each study

packer    89701
taylor    65450
cao       35987

Cells in each experiment (batches)

L2_experiment_1                  35480
Waterston_400_minutes            25875
Waterston_300_minutes            17168
eat-4                            12743
Murray_b01                       12129
acr-2                            11719
Waterston_500_minutes_batch_2    11589
Waterston_500_minutes_batch_1    10532
Murray_r17                        9363
Pan                               9216
unc-3                             6165
tph-1_ceh-10                      4810
ift-20                            4056
cho-1_1                           3849
cho-1_2                           3471
unc-47_2                          3123
Murray_b02                        3045
ceh-34                            2648
nmr-1                             2389
unc-47_1                          1261
L2_experiment_2                    507
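
The per-experiment counts can be rolled up to the per-study counts. The sketch below assumes a naming rule inferred from the example rows in these notes (names starting with `L2_experiment` belong to Cao, names starting with `Waterston` or `Murray` belong to Packer, and everything else belongs to Taylor); the helper `study_of` is hypothetical.

```python
# Cells per experiment, copied from the list above.
experiment_counts = {
    "L2_experiment_1": 35480, "Waterston_400_minutes": 25875,
    "Waterston_300_minutes": 17168, "eat-4": 12743, "Murray_b01": 12129,
    "acr-2": 11719, "Waterston_500_minutes_batch_2": 11589,
    "Waterston_500_minutes_batch_1": 10532, "Murray_r17": 9363, "Pan": 9216,
    "unc-3": 6165, "tph-1_ceh-10": 4810, "ift-20": 4056, "cho-1_1": 3849,
    "cho-1_2": 3471, "unc-47_2": 3123, "Murray_b02": 3045, "ceh-34": 2648,
    "nmr-1": 2389, "unc-47_1": 1261, "L2_experiment_2": 507,
}


def study_of(experiment: str) -> str:
    """Map an experiment name to its study (assumed naming rule)."""
    if experiment.startswith("L2_experiment"):
        return "cao"
    if experiment.startswith(("Waterston", "Murray")):
        return "packer"
    return "taylor"


# Sum the experiment counts within each study.
study_totals = {}
for experiment, n in experiment_counts.items():
    study = study_of(experiment)
    study_totals[study] = study_totals.get(study, 0) + n

print(study_totals)
print(sum(study_totals.values()))
```

Under this rule the totals agree with the per-study counts and with the 191138 cells reported for the concatenated object.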

cao2017

10 Feb 23:43
eeaddf2

h5ad file with data for 36k C. elegans cells from two sci-RNA-seq experiments published in Cao et al., Science 2017 (https://doi.org/10.1126/science.aam8940)

Breakdown of cells per experiment:

L2_experiment_1    35480
L2_experiment_2      507

Head of cell annotation as example:

              barcode       experiment                        cell_type
0  A01_A02_AACTACCGAC  L2_experiment_1  hyp_4_to_7_bin_3_around_L2_molt
1  A01_A02_AACTACGGCT  L2_experiment_1                              ASI
2  A01_A02_AACTATTATA  L2_experiment_1                           mu_sph
3  A01_A02_AAGACGGCCA  L2_experiment_1                         Germline
4  A01_A02_AAGTTGCCAT  L2_experiment_1                         Germline

Gene annotations include WormBase short gene descriptions, for example:

               gene_id	gene_name	gene_description
0	WBGene00000001	aap-1           Exhibits protein kinase binding activity. Involved in dauer larval development; determination of adult lifespan; and insulin receptor signaling pathway. Localizes to the phosphatidylinositol 3-kinase complex. Human ortholog(s) of this gene implicated in several diseases, including astroblastoma; carcinoma (multiple); endometrial cancer (multiple); primary immunodeficiency disease (multiple); and type 2 diabetes mellitus. Is expressed in intestine and neurons. Orthologous to several human genes including PIK3R3 (phosphoinositide-3-kinase regulatory subunit 3).
1	WBGene00000002	aat-1		Contributes to L-amino acid transmembrane transporter activity. Involved in amino acid transmembrane transport. Localizes to the amino acid transport complex. Is expressed in several structures, including excretory system; gonadal sheath cell; nervous system; pharynx; and rectal gland cell. Orthologous to several human genes including SLC7A8 (solute carrier family 7 member 8).
2	WBGene00000003	aat-2		Predicted to have L-amino acid transmembrane transporter activity. Predicted to be involved in amino acid transmembrane transport. Predicted to localize to the integral component of membrane. Human ortholog(s) of this gene implicated in lysinuric protein intolerance. Orthologous to several human genes including SLC7A7 (solute carrier family 7 member 7).
3	WBGene00000004	aat-3		Contributes to L-amino acid transmembrane transporter activity. Involved in amino acid transmembrane transport. Localizes to the amino acid transport complex. Orthologous to human SLC7A5 (solute carrier family 7 member 5) and SLC7A8 (solute carrier family 7 member 8).
4	WBGene00000005	aat-4		Predicted to have L-amino acid transmembrane transporter activity. Predicted to be involved in amino acid transmembrane transport. Predicted to localize to the integral component of membrane. Human ortholog(s) of this gene implicated in lysinuric protein intolerance. Orthologous to human SLC7A6 (solute carrier family 7 member 6) and SLC7A7 (solute carrier family 7 member 7).