sc-eQTL

This repository contains the scripts used to parse the single-cell RNA-sequencing (scRNA-seq) data and to perform the downstream analyses, such as variance partitioning and eQTL mapping, described in the manuscript [Refining the resolution of the yeast genotype-phenotype map using single-cell RNA-sequencing](BioRxiv link). The supplementary files "Matrix_gene_expression_barcodes_1_to_9000.csv" and "Matrix_gene_expression_barcodes_9001_to_18233.csv" contain the expression profiles of the single cells, and the gene names are listed in "Table_expressed_genes.csv". The file "Table_single_cell_barcodes_mapping_to_reference_panel_strains_0_based_index.csv" assigns each single cell to a reference panel strain. In that table, the column best_match gives the 0-based index of the reference panel strain closest to each single cell. The column significant_best_match gives the same index, but holds a missing value (NA) when the relatedness between the single cell and its closest strain is not statistically significant (N'Guessan et al., 2023).
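As an illustration of how the two assignment columns differ, here is a minimal Python sketch that reads such a table and keeps only the cells with a significant match. The column names best_match and significant_best_match come from the description above; the barcode column name, the toy barcodes and the toy indices are assumptions made for the example, so check them against the real file header.

```python
import csv
import io


def read_strain_assignments(handle, barcode_column="barcode"):
    """Map each barcode to (best_match, significant_best_match or None).

    `barcode_column` is a hypothetical column name; adjust it to the real header.
    """
    assignments = {}
    for row in csv.DictReader(handle):
        best = int(float(row["best_match"]))
        raw = row["significant_best_match"].strip()
        # NA marks a cell whose relatedness to its closest strain is not significant.
        significant = None if raw in ("", "NA") else int(float(raw))
        assignments[row[barcode_column]] = (best, significant)
    return assignments


# Toy table with made-up barcodes and indices, for illustration only.
toy_table = io.StringIO(
    "barcode,best_match,significant_best_match\n"
    "AAACCTGAGAAACCAT-1,12,12\n"
    "AAACCTGAGAAACCGC-1,407,NA\n"
)
assignments = read_strain_assignments(toy_table)

# Keep only the cells whose closest strain is statistically supported.
confident = {bc: sig for bc, (best, sig) in assignments.items() if sig is not None}
print(confident)
```

To use it on the real table, pass an open file handle instead of the in-memory toy table.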

Pipeline

I. Expression count

To obtain the expression profile of each single cell, filter out noisy single cells (empty droplets, cells with a low number of reads, etc.) and obtain the read mapping needed for the allele counts, run the following scripts in the specified order:

submit_cellCOUNT_BY_ref.sh: In this script file, replace the argument values starting with "$" with the appropriate values described at https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/count#cr-count, where $BY-reference is the path to the reference transcriptome (e.g. https://github.com/arnaud00013/sc-eQTL/tree/main/BY_reference). Replace #HEADER with the allocated resources if you are running the script on a server.
submit_cellCOUNT_RM_ref.sh: In this script file, replace the argument values starting with "$" with the appropriate values described at https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/count#cr-count, where $RM-reference is the path to the reference transcriptome (e.g. https://github.com/arnaud00013/sc-eQTL/tree/main/RM_reference). Replace #HEADER with the allocated resources if you are running the script on a server.
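The two scripts above run the same counting step once per reference. The following Python sketch only assembles the kind of cellranger count command line that the linked 10x documentation describes; it does not reproduce the contents of the submit scripts, which are not shown here. The run IDs, paths and sample name are placeholders, and the exact set of required options can depend on your Cell Ranger version.

```python
import shlex


def build_cellranger_count_command(run_id, transcriptome, fastqs, sample,
                                   cores=8, mem_gb=64):
    """Return a cellranger count command as a list of arguments (not executed)."""
    return [
        "cellranger", "count",
        f"--id={run_id}",
        f"--transcriptome={transcriptome}",
        f"--fastqs={fastqs}",
        f"--sample={sample}",
        f"--localcores={cores}",
        f"--localmem={mem_gb}",
    ]


# One run per parental reference; every path below is a placeholder.
commands = {}
for reference in ("BY_reference", "RM_reference"):
    commands[reference] = build_cellranger_count_command(
        run_id=f"count_{reference}",
        transcriptome=f"/path/to/sc-eQTL/{reference}",
        fastqs="/path/to/fastqs",
        sample="my_sample",
    )
    print(shlex.join(commands[reference]))
```

Printing the command before submitting it makes it easy to check that every "$" placeholder has been replaced.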

II. Creating single-cell imputed and corrected genotypes

To create single-cell imputed and corrected genotypes from raw single-cell read counts, run the following scripts in the specified order:

submit_Estimate_perrs_HMM.sh: Make sure to run the script in the directory sc-eQTL/II_scRNA-seq_genotyping/!!! Replace the following arguments in the script with the corresponding values: $workspace_path (path to the directory containing the file that lists the barcodes and the data/ sub-directory, which contains the BAM file), $bam_filename, $list_of_barcodes_filename, $number_of_minimum_mismatch_within_the_same_read_for_index_swapping, $minimum_coverage_per_site and $number_of_cpus. Replace #HEADER with the allocated resources if you are running the script on a server.
submit_scRNAseq_genotyping_pipeline.sh: Make sure to run the script in the directory sc-eQTL/II_scRNA-seq_genotyping/!!! Replace the following arguments in the script with the corresponding values: $Nb_cells (the number of barcodes or cells) and $Nb_cpus (the number of CPUs allocated for this task). Replace #HEADER with the allocated resources if you are running the script on a server.
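To give an idea of what HMM-based imputation and correction of genotypes from allele counts can look like, here is a generic two-state Viterbi sketch in Python. It is not the pipeline's implementation: the state coding (0 for BY, 1 for RM), the fixed error rate, the fixed switch probability between adjacent sites and the toy counts are all assumptions chosen for illustration.

```python
import math


def viterbi_genotypes(allele_counts, error_rate=0.01, switch_prob=0.01):
    """Most likely BY (0) / RM (1) state per site from (BY reads, RM reads) pairs.

    Sites without reads, i.e. (0, 0), are imputed from their neighbours.
    """
    if not allele_counts:
        return []
    log_stay, log_switch = math.log(1 - switch_prob), math.log(switch_prob)
    log_ok, log_err = math.log(1 - error_rate), math.log(error_rate)

    def emission(state, counts):
        by_reads, rm_reads = counts
        matching, mismatching = (by_reads, rm_reads) if state == 0 else (rm_reads, by_reads)
        return matching * log_ok + mismatching * log_err

    scores = [math.log(0.5) + emission(s, allele_counts[0]) for s in (0, 1)]
    backpointers = []
    for counts in allele_counts[1:]:
        new_scores, pointers = [], []
        for state in (0, 1):
            candidates = [
                scores[prev] + (log_stay if prev == state else log_switch)
                for prev in (0, 1)
            ]
            best_prev = 0 if candidates[0] >= candidates[1] else 1
            pointers.append(best_prev)
            new_scores.append(candidates[best_prev] + emission(state, counts))
        scores = new_scores
        backpointers.append(pointers)

    # Trace the best path back from the best final state.
    state = 0 if scores[0] >= scores[1] else 1
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy counts along one chromosome: a stray RM read at site 3, no reads at sites 1 and 6.
toy_counts = [(3, 0), (0, 0), (2, 0), (0, 1), (4, 0), (0, 3), (0, 0), (0, 2)]
genotype = viterbi_genotypes(toy_counts)
print(genotype)
```

In this toy example the isolated RM read is treated as an error and the uncovered sites take the state of their neighbours, which is the general behaviour one expects from this kind of model.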

III. Genotype analysis

To determine the closest reference panel strain to each single cell and to estimate the number of breakpoints per genotype, run the following scripts in the specified order:

submit_get_uncorrected_gen_dist_mtx.sh: Make sure to run the script in the directory sc-eQTL/III_Genotype_analysis/!!! Replace the following arguments in the script with the corresponding values: $workspace (the path to the directory containing the data/ sub-directory), $cellranger_outs_folder (the path to the cellranger output directory generated in step I), $Nb_cpus (the number of CPUs allocated for this task) and $Number_of_subsampes_for_lineage_assignment (the recommended value is 500). The reference panel genotypes can be downloaded at https://datadryad.org/stash/dataset/doi:10.5061/dryad.1rn8pk0vd. You can then search for "Import reference panel genotypes" in the Python script generate_dist_cell_to_lineage_uncorrected_gen.py and edit the code so that it imports the reference panel genotypes with the correct file names. Replace #HEADER with the allocated resources if you are running the script on a server.
submit_get_corrected_gen_dist_mtx.sh: Make sure to run the script in the directory sc-eQTL/III_Genotype_analysis/!!! Replace the following arguments in the script with the corresponding values: $workspace (the path to the directory containing the data/ sub-directory), $cellranger_outs_folder (the path to the cellranger output directory generated in step I), $Nb_cpus (the number of CPUs allocated for this task) and $Number_of_subsampes_for_lineage_assignment (the recommended value is 500). The reference panel genotypes can be downloaded at https://datadryad.org/stash/dataset/doi:10.5061/dryad.1rn8pk0vd. You can then search for "Import reference panel genotypes" in the Python script generate_dist_cell_to_lineage_corrected_gen.py and edit the code so that it imports the reference panel genotypes with the correct file names. Replace #HEADER with the allocated resources if you are running the script on a server.
submit_Count_nb_breakpoints.sh: Make sure to run the script in the directory sc-eQTL/III_Genotype_analysis/!!! Replace the following arguments in the script with the corresponding values: $workspace (the path to the directory containing the data/ sub-directory), $cellranger_outs_folder (the path to the cellranger output directory generated in step I) and $Nb_cpus (the number of CPUs allocated for this task). Replace #HEADER with the allocated resources if you are running the script on a server.
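The two quantities computed in this step can be illustrated with a small Python sketch. It assumes genotypes coded as 0 (BY) or 1 (RM) with None for uncalled sites, uses a simple mismatch fraction as the distance, and handles a single chromosome. The repository scripts may use a different distance, and the subsampling controlled by $Number_of_subsampes_for_lineage_assignment is not reproduced here.

```python
def genotype_distance(cell, strain):
    """Fraction of mismatching sites among the sites called in both genotypes."""
    compared = [(c, s) for c, s in zip(cell, strain) if c is not None and s is not None]
    if not compared:
        return float("inf")
    return sum(c != s for c, s in compared) / len(compared)


def closest_strain_index(cell, reference_panel):
    """0-based index of the reference panel strain closest to the cell."""
    distances = [genotype_distance(cell, strain) for strain in reference_panel]
    return min(range(len(distances)), key=distances.__getitem__)


def count_breakpoints(genotype):
    """Number of parental switches between consecutive called sites of one chromosome."""
    called = [g for g in genotype if g is not None]
    return sum(a != b for a, b in zip(called, called[1:]))


# Toy reference panel of three strains and one cell, for illustration only.
toy_panel = [
    [0, 0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
]
toy_cell = [0, None, 1, 1, 1, 1]

best_match = closest_strain_index(toy_cell, toy_panel)
nb_breakpoints = count_breakpoints(toy_cell)
print(best_match, nb_breakpoints)
```

For a whole genome, breakpoints would be counted per chromosome and summed, so that the boundary between two chromosomes is not mistaken for a recombination event.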

IV. Variance partitioning

To evaluate the association between the genotype, gene expression and the phenotype, run the following scripts in the specified order:

submit_Varpart_Pheno_vs_Genotype_and_Expression.sh: Make sure to run the script in the directory sc-eQTL/IV_variance_partitioning/!!! Replace the following arguments in the script with the corresponding values: $workspace (the path to the directory containing the data/ sub-directory), $cellranger_outs_folder (the path to the cellranger output directory generated in step I) and $Nb_cpus (the number of CPUs allocated for this task). Replace #HEADER with the allocated resources if you are running the script on a server.
submit_generate_pca_expression_partitions.sh: Make sure to run the script in the directory sc-eQTL/IV_variance_partitioning/!!! Replace the following arguments in the script with the corresponding values: $nb_expression_pcs_partitions (this script groups the expression principal components into partitions, and this argument sets the number of partitions), $workspace (the path to the directory containing the data/ sub-directory), $cellranger_outs_folder (the path to the cellranger output directory generated in step I) and $Nb_cpus (the number of CPUs allocated for this task). Replace #HEADER with the allocated resources if you are running the script on a server.
submit_Varpart_Pheno_vs_Genotype_and_Expression.sh: This script computes the correlation between a single partition of expression PCs and the corresponding genotypes. Make sure to run the script in the directory sc-eQTL/IV_variance_partitioning/!!! Replace the following arguments in the script with the corresponding values: $the_ind_partition (the 0-based index of the partition), $nb_expression_pcs_partitions (the total number of expression PC partitions), $nb_expression_PCs (the number of expression principal components explaining 99% of the expression variance), $workspace (the path to the directory containing the data/ sub-directory), $cellranger_outs_folder (the path to the cellranger output directory generated in step I) and $Nb_cpus (the number of CPUs allocated for this task). Replace #HEADER with the allocated resources if you are running the script on a server.
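The relationship between $nb_expression_PCs, $nb_expression_pcs_partitions and $the_ind_partition can be sketched in Python as follows. The contiguous, near-equal split and the toy variance ratios are assumptions made for illustration; the repository scripts may group the principal components differently.

```python
def nb_pcs_for_variance(explained_variance_ratios, threshold=0.99):
    """Smallest number of leading PCs whose cumulative variance reaches the threshold."""
    cumulative = 0.0
    for nb_pcs, ratio in enumerate(explained_variance_ratios, start=1):
        cumulative += ratio
        if cumulative >= threshold:
            return nb_pcs
    return len(explained_variance_ratios)


def partition_bounds(the_ind_partition, nb_partitions, nb_pcs):
    """Half-open (start, stop) range of PC indices for one 0-based partition."""
    if not 0 <= the_ind_partition < nb_partitions:
        raise ValueError("partition index must be in [0, nb_partitions)")
    base, extra = divmod(nb_pcs, nb_partitions)
    # The first `extra` partitions each receive one additional PC.
    start = the_ind_partition * base + min(the_ind_partition, extra)
    stop = start + base + (1 if the_ind_partition < extra else 0)
    return start, stop


# Toy explained-variance ratios, for illustration only.
toy_ratios = [0.6, 0.3, 0.08, 0.015, 0.005]
nb_expression_pcs = nb_pcs_for_variance(toy_ratios)

# Split 10 hypothetical PCs into 3 partitions and list the index range of each.
bounds = [partition_bounds(i, 3, 10) for i in range(3)]
print(nb_expression_pcs, bounds)
```

Each partition index can then be submitted as a separate job, which is why the partition index is 0-based and strictly smaller than the number of partitions.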
