# scRNA-seq and scATAC-seq integration analysis
2026.02.20 @carushi 

## Preprocess
* Download scRNA-seq and scATAC-seq metadata from GEO
    * https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE273197
* scRNA-seq data
    * GSE273197_scrna_X.mtx.gz (normalized)
    * GSE273197_scrna_obs.csv.gz
    * GSE273197_scrna_var.csv.gz
    * GSE273197_scrna_X_raw.mtx.gz (raw count)
    * GSE273197_scrna_obs_raw.csv.gz
    * GSE273197_scrna_var_raw.csv.gz
* scATAC-seq data
    * GSE273197_scatac_snap_obj_clust.rds.gz
* Integrated data
    * GSE273197_scrna_atac_integrated.h5Seurat
* Place the files under data/geo_metadata
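
Before moving on, it can help to confirm that every file in the list above actually landed in `data/geo_metadata`. The following is a minimal sketch, not part of the original pipeline: the helper name `missing_files` is hypothetical, and it assumes a file also counts as present once it has been decompressed (same name without `.gz`).

```python
from pathlib import Path

# File names copied from the list above.
EXPECTED_FILES = [
    "GSE273197_scrna_X.mtx.gz",
    "GSE273197_scrna_obs.csv.gz",
    "GSE273197_scrna_var.csv.gz",
    "GSE273197_scrna_X_raw.mtx.gz",
    "GSE273197_scrna_obs_raw.csv.gz",
    "GSE273197_scrna_var_raw.csv.gz",
    "GSE273197_scatac_snap_obj_clust.rds.gz",
    "GSE273197_scrna_atac_integrated.h5Seurat",
]


def missing_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected names found neither as-is nor with '.gz' stripped."""
    data_dir = Path(data_dir)
    missing = []
    for name in expected:
        candidates = [data_dir / name]
        if name.endswith(".gz"):
            # Accept the already-decompressed file as well (assumption).
            candidates.append(data_dir / name[:-3])
        if not any(p.is_file() for p in candidates):
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_files("data/geo_metadata"):
        print("missing:", name)
```

An empty output means all listed files were found.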

## Preprocess 2
* Get whitelist data provided by 10x Genomics
    * scRNA-seq: 3M-february-2018.txt.gz
    * scATAC-seq: 737K-cratac-v1.txt
    * See https://kb.10xgenomics.com/s/article/115004506263-What-is-a-barcode-inclusion-list-formerly-barcode-whitelist
* Download CellMarker 2.0 annotation file
    * Open Cell_marker_All.xlsx in Excel
    * Save it as a txt file (all_cell_marker.txt)
    * nkf -Lu all_cell_marker.txt > all_cell_marker_lu.txt (converts line endings to LF for UNIX)
    * mv all_cell_marker_lu.txt ../data/gene/
    * Run parse_cell_marker.R
* Download Mat (gene set) files
    * Download from xxxxxx and place them under /data/gene/
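
If `nkf` is not available, the newline conversion step for the CellMarker text file can be approximated in Python. This is a minimal sketch that only normalizes CRLF and CR line endings to LF; it does not reproduce any character-encoding conversion `nkf` may perform, and it assumes the file uses an ASCII-compatible encoding (not UTF-16). The function name `convert_newlines_to_lf` is hypothetical.

```python
from pathlib import Path


def convert_newlines_to_lf(src, dst):
    """Rewrite src to dst with CRLF and bare CR replaced by LF.

    Works on raw bytes, so the text encoding is left untouched.
    Returns the number of LF characters in the output.
    """
    data = Path(src).read_bytes()
    # Replace CRLF first so it does not turn into two newlines.
    data = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    Path(dst).write_bytes(data)
    return data.count(b"\n")


# Example usage (paths as in the steps above):
# convert_newlines_to_lf("all_cell_marker.txt", "all_cell_marker_lu.txt")
```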

In [None]:
# $'\t' is a literal tab (bash/zsh); cut already splits on tabs by default
grep '"CD[0-9]' 16-90.merged.gtf | grep $'\tgene\t' | cut -f 9- | cut -d" " -f6 | sed 's/"//g; s/;//g' > cd_gene.txt
grep chrM 16-90.merged.gtf | grep $'\ttranscript' | cut -f9 | cut -d" " -f2 | sed 's/"//g; s/;//g' > chrM_gene.txt
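
The shell pipeline above picks a gene name out of the GTF attribute column by token position, which depends on the attribute order in the file. As a hedged alternative, the same kind of extraction can be sketched in Python by parsing the attributes into key/value pairs. The attribute key `gene_name` is an assumption about `16-90.merged.gtf`, and the function names are hypothetical. Unlike the `grep`, the pattern here is matched against the attribute value only, not the whole line.

```python
import re

# Matches quoted GTF attributes such as: gene_name "CD4";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(attr_field):
    """Turn 'key "value"; key2 "value2";' into a dict."""
    return dict(ATTR_RE.findall(attr_field))


def gene_names(gtf_lines, feature="gene", key="gene_name", pattern=r"^CD[0-9]"):
    """Collect attribute values for rows of the given feature type.

    key is an assumed attribute name; adjust it to the GTF in use.
    """
    wanted = re.compile(pattern)
    names = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue
        name = parse_attributes(cols[8]).get(key)
        if name is not None and wanted.search(name):
            names.append(name)
    return names


# Example usage:
# with open("16-90.merged.gtf") as fh:
#     cd_genes = gene_names(fh)
```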


## scRNA-seq analysis (original)
* Run STARsolo for alignment

In [None]:
# bash ../scripts/alignment/mapping/alignment_solo.sh
# bash ../scripts/alignment/mapping/alignment_solo_refseq.sh # RefSeq genes
# bash ../scripts/alignment/mapping/alignment_solo_quad.sh   # use 16-90 personal genome annotation
bash ../scripts/alignment/mapping/alignment_solo_cd4.sh    # switch the CD4 annotation from Ensembl to RefSeq; this output is used for downstream analysis

## scRNA-seq analysis by Scanpy
* Start from here when working from the GEO files downloaded in Preprocess
* Decompress the files under ../data/geo_metadata/ with gunzip
* Run scRNA_scanpy.ipynb (please follow Data setup 2)
* scarmadillo_filt_scobj.pyn and scarmadillo_filt_modf_scobj.pyn will be created

## Data conversion
* Convert scRNA-seq data into csv and create seurat_scrna_integrated.rds

In [None]:
python ../scripts/alignment/mapping/write_scanpy_obj.py scarmadillo_filtered_modf_scobj.pyn scarmadillo_filtered
Rscript ../scripts/alignment/mapping/read_scRNAseq.R

## scATAC-seq analysis (original)
### Run STARsolo alignment and SnapTools


In [None]:
bash ../scripts/alignment/mapping/alignment_scatac_by_star.sh
# bash ../scripts/alignment/mapping/alignment_scatac_by_star_relaxed.sh

In [None]:
bash ../scripts/alignment/snaptools_preprocess.sh
# bash ../scripts/alignment/snaptools_preprocess_pub.sh
# bash ../scripts/alignment/snaptools_preprocess_relaxed.sh

## scATAC-seq analysis by SnapATAC
### Set up conda environments
* genome: install MACS2, SnapATAC, and HOMER so that their executables are under ~/miniconda/env/genome/bin
* This path is defined in multi_sc_base.R and can be changed there
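
A quick way to confirm the environment is set up is to check that the expected executables exist in the `bin` directory. The sketch below is an illustration only: the executable names (`macs2`, `snaptools`, `findMotifsGenome.pl`) are assumptions about what the scripts call, and `missing_tools` is a hypothetical helper.

```python
import os
from pathlib import Path

# Assumed executable names; adjust to whatever multi_sc_base.R and the
# shell scripts actually invoke.
TOOLS = ["macs2", "snaptools", "findMotifsGenome.pl"]


def missing_tools(bin_dir, tools=TOOLS):
    """Return the tools that are not present as executables in bin_dir."""
    bin_dir = Path(bin_dir).expanduser()
    return [t for t in tools if not os.access(bin_dir / t, os.X_OK)]


if __name__ == "__main__":
    for tool in missing_tools("~/miniconda/env/genome/bin"):
        print("not found:", tool)
```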

In [None]:
Rscript ../scripts/sc_clustering/snap_basic_stats.r make_list file_list.txt
Rscript ../scripts/sc_clustering/snap_basic_stats.r plot file_list.txt 500 2.35

### Relaxed setting
No longer used; the commands below are kept commented out for reference.

In [None]:
# Rscript ../scripts/sc_clustering/snap_basic_stats.r make_list file_list.txt
# Rscript ../scripts/sc_clustering/snap_basic_stats.r plot file_list.txt 1500 2.35

In [None]:
### Filter out unreliable cells
Rscript ../scripts/sc_clustering/snap_atac_clustering.R snap_obj_filtered.rds

In [None]:
Rscript ../scripts/sc_clustering/snap_obj_analysis.R snap_obj_clust.rds

## Seurat CCA integration and cell type annotation


In [None]:
### Clustering and annotation
Rscript ../scripts/sc_clustering/coembedding_for_cell_typing.R

In [None]:
Rscript ../scripts/sc_clustering/seurat_obj_analysis.R final_pseudo_bulk_atac.rds
Rscript ../scripts/sc_clustering/seurat_obj_analysis.R final_pseudo_bulk_rna.rds
Rscript ../scripts/sc_clustering/seurat_obj_analysis.R final_coembed.rds

## Extra

In [None]:
Rscript extract_barcode.R
bash run_cellsnp.sh