This repository contains the Matlab, R and Python code used to analyze data and generate figures in Cadwell et al., Cell type composition and circuit organization of clonally related excitatory neurons in the juvenile mouse neocortex, eLife (2020).
Copyright 2020 C. R. Cadwell
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
For quantitative analysis of clones and connectivity data, we used the Matlab implementation of DataJoint, which utilizes a relational database model for organizing, populating, and querying data. The DataJoint schemas are archived at atlab/commons and the critical tables for our analyses are described below.
Analysis of gene expression data was performed in R Bioconductor using custom software and previously developed packages including
Modeling of the cortical circuit was done in Python using Jupyter Notebook.
For efficiency, the data were stored at intermediate stages of analysis as .mat, .txt, or .csv files.
Files related to Figure 1
Reconstruction of clones across slices (analyses used for Figure 1)
Tile scan Z-stacks of entire coronal sections were first maximally projected using the commercial acquisition software for the microscope. The positions of labeled cells were annotated using the following Matlab-based custom software:
Segmentation.m: This code selects one maximally projected coronal section at a time and has the user manually outline the contours of the cortex and mark the positions of cortical neurons, which are presented in small patches of the cortical area. The positions of annotated cortical neurons are saved to a separate file.
showImages.m: This code shows all annotated coronal sections for an entire mouse brain, including the outlines of the cortex and the positions of the neurons identified above. The user can scroll through the slices to see how individual clones appear on adjacent sections. These images are aligned manually across slices to visualize the reconstructed clones shown in Figure 1B,C and Figure 1-supplement 1A,B.
Quantification of clones at P10 and E12.5
CountCells.m: While active, this code counts the annotated neurons within an area that the user selects while viewing an annotated coronal section.
CloneQuantificaiton.mat: Saved variables used to generate Figure 1D-F and Figure 1-supplement 1C,D.
Figure1D-J S1CD.m: Code to generate Figure 1D-F,I,J and Figure 1-supplement 1C,D.
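The counting step performed by CountCells.m can be illustrated with a short Python sketch. This is not the repository's Matlab code: it assumes the user-selected area is a polygon and that annotated neurons are stored as (x, y) positions, and the names `point_in_polygon`, `count_cells_in_region`, `roi` and `cells` are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices in order (assumed format).
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_cells_in_region(cells, polygon):
    """Count annotated cell positions that fall inside the selected polygon."""
    return sum(point_in_polygon(x, y, polygon) for x, y in cells)


# Hypothetical example: a square region and five annotated positions.
roi = [(0, 0), (10, 0), (10, 10), (0, 10)]
cells = [(2, 3), (5, 5), (12, 4), (9, 9), (-1, 5)]
n_inside = count_cells_in_region(cells, roi)
print(n_inside)
```

Points lying exactly on the boundary are not treated specially here, which is usually acceptable for manually drawn regions.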
Files related to Figure 2
Quality control of single-cell RNA-seq data, visualization using t-SNE, and generalized linear models to predict layer or region from gene expression data
Figure2C-H S1 S2.rtf: Code to run in R Bioconductor to generate panels for Figure 2C-H, Figure 2-supplement 1 and Figure 2-supplement 2.
annotations.txt: Input data needed to run the analysis and generate figure panels using the above R script.
- All other files in this folder are final or intermediate outputs of the above R script.
Files related to Figure 3
t-SNE projection of Patch-seq data onto reference atlas and transcriptomic cluster assignment of each cell
rnaseqTools.py: Useful functions for RNA-seq analysis, copied from other repositories by dkobak.
microcolumns.ipynb: Python notebook for t-SNE projection and transcriptomic cluster assignment.
Contains the raw data (normalized logcounts, counts) and metadata for each of the 206 samples included in our Patch-seq dataset. Metadata includes the following pieces of information about each cell:
genes: Gene names for the data included in
label: Indicates whether the cell was labeled by a fluorescent indicator (positive) or not (
layer: Layer position of the cell.
region: Brain region, if known (V1 = primary visual cortex, SS1 = primary somatosensory cortex).
sample: Unique sample ID.
slice: Slice number (numbering restarted for each animal).
subject: Unique animal ID.
Contains our t-SNE projection data for the reference dataset from Tasic et al., 2018.
allentsne: Contains the x and y t-SNE coordinates for each cell in the reference atlas. The third column is the cluster ID.
allentsneNames: Names of cell clusters for each cell in the reference atlas.
allentsneColor: RGB values for each cell in the reference atlas.
Contains the t-SNE projection data for mapping our Patch-seq dataset onto the reference atlas.
cProj: Contains the x and y t-SNE coordinates for each cell in our Patch-seq dataset. The third column is a measure of the uncertainty of the mapping; larger values indicate greater uncertainty (see the Methods section of the paper for how this is computed).
Shows the best matching transcriptomic cluster in the reference atlas for each cell in our Patch-seq dataset. Cluster names and cluster IDs are the same as those used in Tasic et al., 2018.
class: Name of the best-matched transcriptomic cluster for each Patch-seq cell.
classID: Cluster ID of the best-matched transcriptomic cluster for each Patch-seq cell.
Script for generating figure panels in Figure 3, Figure 3-supplement 1, and Figure 3-supplement 2.
Files related to Figures 4 and 5 and Table 1
Analysis of layer-specific connectivity rates (Figures 4 and 5)
Sort.m: Code for sorting connectivity data into layer-specific groups, with a 3x3 matrix representing the layer combinations.
Groups.mat: Connections sorted into layer-specific groups.
allCounts.mat: Summary of the number of connections, with a 3x3 matrix of layer combinations for each of the following categories:
biConnR: Related pairs with bidirectional connections.
biConnU: Unrelated pairs with bidirectional connections.
biUnconnR: Related pairs without bidirectional connections.
biUnconnU: Unrelated pairs without bidirectional connections.
connR: Related pairs with connection.
connU: Unrelated pairs with connection.
unconnR: Related pairs without connection.
unconnU: Unrelated pairs without connection.
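To illustrate how count matrices of this kind can be turned into layer-specific connectivity rates, here is a minimal Python sketch. It is not the repository's Matlab code. The layer labels, the convention that rows are presynaptic and columns postsynaptic, the input tuple format, and the helper names `tally` and `rate_matrix` are all assumptions made for the example; only the category names connR, unconnR, connU and unconnU come from the list above.

```python
LAYERS = ["L2/3", "L4", "L5"]  # assumed layer labels, not taken from the repository


def empty_matrix():
    """3x3 matrix of zeros: rows = presynaptic layer, columns = postsynaptic layer (assumed)."""
    return [[0, 0, 0] for _ in range(3)]


def tally(pairs):
    """Sort tested connections into 3x3 count matrices.

    pairs: iterable of (pre_layer, post_layer, related, connected) tuples
    (hypothetical input format).
    """
    counts = {name: empty_matrix() for name in ("connR", "unconnR", "connU", "unconnU")}
    for pre, post, related, connected in pairs:
        i, j = LAYERS.index(pre), LAYERS.index(post)
        key = ("conn" if connected else "unconn") + ("R" if related else "U")
        counts[key][i][j] += 1
    return counts


def rate_matrix(conn, unconn):
    """Element-wise conn / (conn + unconn); None where no connection was tested."""
    return [
        [(c / (c + u) if (c + u) else None) for c, u in zip(conn_row, unconn_row)]
        for conn_row, unconn_row in zip(conn, unconn)
    ]


# Made-up data for illustration only.
pairs = [
    ("L2/3", "L2/3", True, True),
    ("L2/3", "L2/3", True, False),
    ("L2/3", "L2/3", True, False),
    ("L2/3", "L2/3", True, False),
    ("L4", "L2/3", False, True),
    ("L4", "L2/3", False, False),
]
counts = tally(pairs)
rates_related = rate_matrix(counts["connR"], counts["unconnR"])
rates_unrelated = rate_matrix(counts["connU"], counts["unconnU"])
print(rates_related[0][0], rates_unrelated[1][0])
```

Keeping connected and unconnected counts separate, as allCounts.mat does, preserves the number of tested connections per layer combination, which is needed for any later significance test.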
Simple model of connectivity (Figures 4G and 5E)
docker-compose installed. All files are in the folder
docker-compose up. This will start a Jupyter notebook server and a MySQL server in Docker containers.
- Open your browser and go to
- Open the notebook Main.ipynb in the browser and execute all cells (SHIFT-ENTER).
- This will generate the file
Analysis of connectivity using distance-matched controls (Figure 5 - supplement 2)
Resample.m: Code for generating resampled data.
ResampledData.mat: Resampled data generated using
TwoSided.m: Code to generate two-sided p-values for resampled data.
TwoSided.mat: Two-sided p-values generated using
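As a rough illustration of what a two-sided p-value from resampled data means, the Python sketch below compares an observed value against a set of resampled values. The exact definition used in TwoSided.m is not described here, so the rule below (the fraction of resampled values at least as far from the resampled mean as the observed value) is an assumption, and `two_sided_p_value` and the example numbers are hypothetical.

```python
def two_sided_p_value(observed, resampled):
    """Fraction of resampled values at least as far from the resampled mean
    as the observed value (one common convention, assumed here)."""
    center = sum(resampled) / len(resampled)
    observed_dev = abs(observed - center)
    n_extreme = sum(abs(value - center) >= observed_dev for value in resampled)
    return n_extreme / len(resampled)


# Made-up example: number of connections found in each of 10 resampled control sets.
resampled_counts = [1, 3, 5, 5, 6, 6, 7, 8, 9, 10]
observed_count = 10
p = two_sided_p_value(observed_count, resampled_counts)
print(p)
```

In practice many more resamples would be used, since the smallest p-value this estimate can resolve is one divided by the number of resamples.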
Power analysis (Figure 5 - supplement 1)
PowerAnalysis.rtf: Code used for power analysis in R.
Prlare output of PowerAnalysis.rtf.
Figures4DEF5CDFS2S3.m: Code to generate Figure 4D-G, Figure 5C-F, Figure 5-supplement 2, Figure 5-supplement 3, and the generalized linear model shown in Table 1.
Figure4G5E.m: Code used for plotting panels 4G and 5E.
Figure5S1: Code to generate Figure 5 - supplement 1.
Custom-written functions used by multiple files above:
Computes the Chi-squared test statistic and p-value.
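For readers unfamiliar with the test, a chi-squared statistic and p-value for a 2x2 table can be computed as in the Python sketch below. This is not the repository's custom function: it assumes a 2x2 contingency table (for example related versus unrelated pairs by connected versus unconnected), applies no continuity correction, and the name `chi_squared_2x2` and the counts are hypothetical.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (statistic, p_value)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    # Shortcut form of sum((observed - expected)^2 / expected) for 2x2 tables.
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts: rows = related / unrelated, columns = connected / unconnected.
chi2, p = chi_squared_2x2(10, 90, 20, 80)
print(round(chi2, 3), round(p, 3))
```

The closed-form p-value relies on the table having exactly one degree of freedom; larger tables would need a general chi-squared survival function.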
DataJoint database structure
Detailed table definitions can be found at atlab/commons