Renthal-Lab/painseq-multiome

DRG Multiomic & Enhancer Activity Pipeline

This repository contains code for single-cell multiomic processing of mouse and human dorsal root ganglion (DRG) data, DRG-based gene regulatory network (GRN) inference via SCENIC+, and training of Predicting Accessibility In Nociceptors-net (PAIN-net).


📂 Repository Structure

1. PAIN-net (Predicting Accessibility In Nociceptors-net)

PAIN-net is a 1D Residual Neural Network (ResNet) designed to model the relationship between genomic sequence and cell-type-specific activity.

  • Multi-Task Learning: The model predicts Differential Expression (DE) $\log_2$ fold changes as the primary task and ATAC-seq signal intensity as an auxiliary task.

  • Architecture: Features a stem Conv1D layer followed by five dilated residual blocks to capture long-range DNA interactions across 800 bp input sequences.

  • Weighted Loss Strategy: Implements inverse-frequency weighting plus explicit boosting (e.g., for C-PEP, Mrgprd, and Calca+Sstr2) to handle class imbalances.

  • Custom Loss: Combines Mean Squared Error (MSE) and negative Pearson correlation to optimize for both magnitude and trend.
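
The weighting and loss ideas above can be illustrated with a small, framework-free sketch. It is not the repository's training code: the `total / (n_classes * count)` normalization, the boost factor, the `alpha` mixing weight, and the `1 - r` form of the Pearson term are all assumptions chosen for illustration, and the class labels are toy values.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, boosts=None):
    """Per-class weights proportional to 1 / class frequency.

    Uses total / (n_classes * count) as the base weight (an assumed
    normalization), then multiplies by an optional per-class boost.
    """
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    boosts = boosts or {}
    return {
        cls: total / (n_classes * n) * boosts.get(cls, 1.0)
        for cls, n in counts.items()
    }


def pearson(x, y, eps=1e-8):
    """Pearson correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    # eps guards against division by zero for constant inputs
    return cov / (math.sqrt(var_x * var_y) + eps)


def combined_loss(y_true, y_pred, alpha=1.0):
    """MSE (magnitude) plus alpha * (1 - Pearson r) (trend)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return mse + alpha * (1.0 - pearson(y_true, y_pred))


# Toy example: a rare class "B" gets a larger weight, plus a hypothetical boost.
weights = inverse_frequency_weights(["A"] * 6 + ["B"] * 2, boosts={"B": 1.5})
```

The MSE term penalizes errors in magnitude, while the correlation term penalizes predictions whose ranking across examples disagrees with the targets, so a prediction that is anti-correlated with the target is penalized by both terms.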

2. multiome/ (Species-Specific Processing)

This directory contains Level 1 processing pipelines for reconstructing and integrating single-cell data.

🐭 Mouse (Naive mDRG)

Processes 18 multiome libraries focusing on the RNA modality.

  • Workflow: Performs Seurat v5 integration using CCA (Canonical Correlation Analysis) to correct batch effects.

👀 Human (hDRG)

Reprocesses raw fragment files for 32 human DRG libraries to generate a new Signac object.

  • Workflow: Performs Seurat v5 integration using CCA (Canonical Correlation Analysis) to correct batch effects.

3. SCENIC+ (Regulatory Network Inference)

A stepwise workflow to generate region sets and topic models from Signac/Seurat objects for GRN inference.

| Step | Script | Purpose |
|------|--------|---------|
| 1 | Step1_writepycisTopicInputs.R | Exports the peak matrix, cells, and metadata from Signac for pycisTopic compatibility. |
| 2 | Step2_writeSignacDAPsBEDs.R | Exports cluster-specific differentially accessible peaks (DAPs) as BED files. |
| 3 | Step3_create_custom_cisTarget_database.sh | Builds a custom cisTarget motif database restricted to consensus peaks. |
| 4 | Step4_Run_pycisTopic.py | Runs MALLET topic modeling, binarizes topics (Otsu/Top-3k), and exports BEDs. |
| 5 | Step5_RunSCENICPlus.sh | Initializes and runs the SCENIC+ Snakemake pipeline. |

🚀 Recommended Execution Plan

  1. Parallel Group 1 (Export): Run Step1_writepycisTopicInputs.R and Step2_writeSignacDAPsBEDs.R to generate initial matrices and BED files.
  2. Parallel Group 2 (Modeling): Build the custom motif database (Step 3) and run pycisTopic modeling (Step 4).
  3. Final (Inference): Edit config.yaml to include species-specific resources (e.g., mm10 for mouse, hg38 for human) and run the SCENIC+ Snakemake pipeline.
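
The grouping in the plan above can be expressed as a small dependency graph. The sketch below only derives the execution order; it does not launch any script. The dependency edges are an assumption read off the plan's grouping (Steps 3 and 4 wait on both exports, Step 5 waits on Steps 3 and 4), not something the scripts themselves declare.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Map each step to the steps it waits on.
# Assumed edges, inferred from the three groups in the plan.
DEPENDENCIES = {
    "Step1_writepycisTopicInputs.R": set(),
    "Step2_writeSignacDAPsBEDs.R": set(),
    "Step3_create_custom_cisTarget_database.sh": {
        "Step1_writepycisTopicInputs.R",
        "Step2_writeSignacDAPsBEDs.R",
    },
    "Step4_Run_pycisTopic.py": {
        "Step1_writepycisTopicInputs.R",
        "Step2_writeSignacDAPsBEDs.R",
    },
    "Step5_RunSCENICPlus.sh": {
        "Step3_create_custom_cisTarget_database.sh",
        "Step4_Run_pycisTopic.py",
    },
}


def execution_groups(dependencies):
    """Return lists of steps; steps within one list can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    groups = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose inputs are done
        groups.append(ready)
        sorter.done(*ready)
    return groups


groups = execution_groups(DEPENDENCIES)
for number, group in enumerate(groups, start=1):
    print(f"Group {number}: {', '.join(group)}")
```

On an HPC scheduler, each group would typically map to a set of jobs submitted together, with the next group held until the previous one finishes.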

⚠️ Tips & Pitfalls

  • Genome Consistency: Ensure peaks, blacklist files, reference FASTAs, and chromsizes match your species (mm10 vs hg38).
  • Barcode Reconciliation: In Step 4, ensure the ___cisTopic suffix is removed from cell names to match RNA AnnData barcodes for multiomic integration.
  • Compute Intensity: Steps 3 and 4 are CPU-heavy and should be run as batch jobs on an HPC system.
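
As a minimal sketch of the barcode reconciliation tip, the helper below strips the `___cisTopic` suffix and reports which cleaned cell names are present in the RNA barcodes. The function names and the example barcodes are hypothetical; in practice the inputs would come from the cisTopic object's cell names and the RNA AnnData `obs_names`.

```python
def strip_suffix(cell_name, suffix="___cisTopic"):
    """Remove the suffix if present; leave other names untouched."""
    if cell_name.endswith(suffix):
        return cell_name[: -len(suffix)]
    return cell_name


def reconcile_barcodes(atac_cell_names, rna_barcodes, suffix="___cisTopic"):
    """Return (shared, missing) cleaned ATAC names relative to RNA barcodes."""
    rna = set(rna_barcodes)
    cleaned = [strip_suffix(name, suffix) for name in atac_cell_names]
    shared = [name for name in cleaned if name in rna]
    missing = [name for name in cleaned if name not in rna]
    return shared, missing


# Hypothetical barcodes for illustration only.
atac_names = ["AAACGAAAGT-1___cisTopic", "TTTGGCCAGA-1___cisTopic", "CCCTTTAAAG-1"]
rna_names = ["AAACGAAAGT-1", "CCCTTTAAAG-1"]
shared, missing = reconcile_barcodes(atac_names, rna_names)
```

A non-empty `missing` list is worth inspecting before integration, since it usually points to a naming mismatch (for example a sample prefix) rather than cells that are truly absent from the RNA modality.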
