4. eQTL mapping

Notes

Docker

The Docker Engine should be running. You should run docker run -v $PWD:$PWD -w $PWD -it dgarrimar/eqtlmapping to work within the the dgarrimar/eqtlmapping container. It will be pulled from Docker Hub if you don't have it locally yet. You can review the Docker slides and hands-on if you have any doubt.

Your working directory

You will work within this directory. You can simply clone my GitHub repo and go to it (note you may also have it already within your Ubuntu image from the previous sessions). The directory contains a bin folder with the scripts you will need for your pipeline (remember to give them execution permissions!), an input folder for the unprocessed and processed input files, a tmp folder for intermediate files and a result folder for the plots and other result files.

Your pipeline

Use the nano text editor to create a bash script named run.sh, where you are going to store all the commands needed in your pipeline. Use # to add comments in the script explaining what each command does and to answer the corresponding questions. Remember that you can use the command echo to print messages in the screen.

## Author: Your Name
## Date: DD/MM/YYYY
## eQTL Hands-On

## Task 1

echo "Running command1 to do X"

command1            # This command does X 
                    # Answer to Q1: we need to set the option --opt 
echo -e "\tdone!"

# [...]

Once done, make sure that you can run bash run.sh to reproduce exactly all your commands.

1. cis eQTL mapping

1.1. Input data and software

For eQTL mapping we need gene expression and genotype data from a set of individuals. We may also have covariate information (gender, age, inferred covariates, etc.). In terms of computation, it requires performing thousands of linear regressions (as well as hypothesis testing for the β's and a multiple testing correcction) in a very fast manner.

Task 1: Download the genotype VCFand the corresponding .tbi index:

wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf{.gz,.gz.tbi} --directory-prefix input/unprocessed/1000g

Now have a look to the input data: gene expression (RPKM) of all annotated human genes obtained from the GEUVADIS project and genotype data from the 1000G project (chr22), both with their corresponding metadata; and also to QTLtools, one of the standard methods for eQTL mapping and the one that we will use during this hands-on.

1.2. Input data preprocessing

We need to modify the input files to select genes and variants of interest, perform data transformations (e.g. gene expression normalization) and adapt them to the format required by QTLtools.

First, let's prepare the genotype file!

Task 2: Subset the input VCF file, keeping only biallelic SNPs and indels with MAF ≥ 0.05 and those sample IDs for which we have both expression and genotype data (for duplicated IDs keep only the first). To do that, we will use bcftools. Besides, discard those variants with less than 10 individuals per genotype group.

 
# Get GEUVADIS samples from the metadata
cut -f1 input/unprocessed/geuvadis/geuvadis.metadata.txt | sed '1d' | sort | uniq > tmp/geuvadis.samples.txt 

# Subset the VCF (common samples, biallelic SNPs and indels, MAF >= 0.05, no duplicates)
bcftools view -v snps,indels -m 2 -M 2 -q 0.05:minor -S tmp/geuvadis.samples.txt -Ob input/unprocessed/1000g/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz | bcftools norm -d all -Oz -o tmp/genotypes.chr22.vcf.gz

# Subset the VCF so that there are at least 10 individuals per genotype group and compress it (for indexing we require 'bgzip' compression)
filter.genotype.py -t 10 -g <(zcat tmp/genotypes.chr22.vcf.gz) | bgzip > input/processed/genotypes.chr22.vcf.gz

# Index the VCF
tabix -p vcf input/processed/genotypes.chr22.vcf.gz

Q1: What do the bcftools options employed mean? Q2: How many variants do you get in input/processed/genotypes.chr22.vcf.gz? Q3: How many samples do you have before and after subsetting? Hint: The genotype file contains information from many samples, but only a subset of them has gene expression data in the GEUVADIS project. Note that the GEUVADIS metadata contains duplicated sample IDs, as gene expression in GEUVADIS is calculated from paired-end RNA-Seq data. You can use bcftools to print the sample IDs and count them.

Now let's parse the GEUVADIS gene expression file to include the fields required by QTLtools: chr, TSS position, length and strand, for each gene. Where can we get this information?

Task 3: Use the GENCODE human gene annotation to obtain the information above. Q1: Which version of GENCODE is GEUVADIS using? Q2: To which genome assembly does this annotation correspond? Q3: How many protein coding genes are annotated in the last version (v29)? Hint: Check GEUVADIS paper. To run the scripts in bin without the need of bash script.sh or ./bin/script.sh add the bin folder to the PATH environment variable. Q4: Which command do you use to do this?

# Set the variable 'release' to the version of GENCODE used in GEUVADIS (e.g. `release=99`) and download the corresponding GENCODE annotation
release=
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_$release/gencode.v$release.annotation.gtf.gz
mv gencode.v$release.annotation.gtf.gz input/unprocessed/gencode/gencode.annotation.gtf.gz

# Obtain a BED file from the GTF, selecting just the 'protein coding' and 'lincRNA' genes
zcat input/unprocessed/gencode/gencode.annotation.gtf.gz | grep "gene_type \"protein_coding\"\|gene_type \"lincRNA\"" | gtf2bed.sh > tmp/gencode.annotation.bed

# The BED file should look like this:
# chr1    34553   36081   ENSG00000237613.2       .       -
# chr1    69090   70008   ENSG00000186092.4       .       +
# chr1    89550   91105   ENSG00000239945.1       .       -

Note that we implicitly have the information that we wanted. Q5: But how to get the TSS positions and the gene lengths from it? Q6: to which BED coordinates would correspond the GTF coordinates chr1 10 20? Why? Q7: Why do we need to use tmpfile below? Hint Let's consider the TSS as the first base of the (first transcript of the) gene (5' -> 3'). Note BED positions are 0-indexed and GTF positions are 1-indexed (some extra help here).

# Compute gene lengths 
awk 'BEGIN{OFS="\t"}{print $1,$2,$3,$4,$3-$2,$6}' tmp/gencode.annotation.bed > tmpfile; mv tmpfile tmp/gencode.annotation.bed

# Compute TSS positions. Note that for genes in the '+' strand, the TSS is the start position, and for genes in the '-' strand it is the end position!
awk 'BEGIN{OFS="\t"}{if($6=="+"){print $1,$2,$2+1,$4,$5,$6}else{print $1,$3-1,$3,$4,$5,$6}}' tmp/gencode.annotation.bed > tmpfile; mv tmpfile tmp/gencode.annotation.bed

# Remove 'chr' from chromosome names (-i option to modify the file 'in place')
sed -i "s/^chr//" tmp/gencode.annotation.bed

Almost there! But we still need to add the expression value on each sample to match the format required by QTLtools. Besides, do you remember the chromosome names in the VCF? They should match the ones in the expression file!

Task 4: Obtain the gene expression file in the format required by QTLtools. Hint Remember that the gene expression file contains data for all genes in the annotation, while the BED file that you just created only contains PC and lincRNA.

# Join the bed file with the expression file
# (Both files should be row-ordered by gene ID. Column order and header are lost in the output file)
join -1 4 -2 1 -t $'\t' <(sort -k4,4 tmp/gencode.annotation.bed) <(zcat input/unprocessed/geuvadis/geuvadis.gene_expr_rpkm.tsv.gz | sort -k1,1) > tmp/joint.tsv

# Subset chr22 (same as the VCF file)
awk '$2==22' tmp/joint.tsv > tmp/joint.chr22.tsv

# Recover the column order, sort rows by chr and start position (WARNING: this command may not work within the docker container for WSL users)
paste <(awk 'BEGIN{OFS="\t"}{print $2,$3,$4,$1,$5,$6}' tmp/joint.chr22.tsv) <(cut -f1-6 --complement tmp/joint.chr22.tsv) | sort -k1,1V -k2,2n > tmp/joint.chr22.bed

# Recover the header
cat <(zcat input/unprocessed/geuvadis/geuvadis.gene_expr_rpkm.tsv.gz | head -1 | sed "s/TargetID/#chr\tstart\tend\tgene\tlength\tstrand/") tmp/joint.chr22.bed > tmp/genes.chr22.rpkm.bed

The last step in gene expression pre-processing is gene expression normalization. We will also remove lowly expressed genes accross all the samples. Q1: Of all genes considered, which have lower expression levels, protein-coding or lincRNA? Q2: Why do we need gene expression to be normal? Q3: How would you check that quantile normalization worked? Q4: and that gene expression of a gene follows a normal distribution?

# Run gene expression normalization (quantile normalization + gene expression to normal distribution)
# Filter out genes with less than 0.1 RPKM in 50% of the samples
normalize.R -i tmp/genes.chr22.rpkm.bed -o tmp/genes.chr22.norm.bed

# Compress and index the final gene expression file
bgzip tmp/genes.chr22.norm.bed
tabix -p bed tmp/genes.chr22.norm.bed.gz
mv tmp/genes.chr22.norm.bed.gz* input/processed

Task 5: Check that normalization worked! Hint: Plot the distribution of the values in rows (genes) and columns (samples). Compare the plots before and after normalization. Q1: What can you see?

# Before:
# That one is yours ;)

# After:
check.norm.R -i input/processed/genes.chr22.norm.bed.gz -o result/plots/check.norm.pdf

1.3. Covariates

In eQTL mapping and other linear regression analyses, it is common to remove the effect of variables (either numerical or categorical) that are not of interest or that may be potential confounding factors, including them as covariates in the model. Usual covariates are gender, age, batch effects...

Task 6: Review the genotype and gene expression metadata to identify potential covariates for your analysis. Q1: Which ones would you select? Hint: Maybe you find useful this small oneliner:

head -1 <file> | awk 'BEGIN{FS=OFS="\t"}{for (i=1;i<=NF;i++) {print i, $i}}'
# 1 <field1>
# 2 <field2>
# [...]

Principal Component Analysis (PCA) is a useful technique to visualize high-dimensional data (such as our gene expression or genotype matrices), and can help us to identify outlier samples or groups of samples with similar patterns of expression or genotypes.

Task 7: Let's use the pca tool in QTLtools to obtain PCs of our expression and genotype data. Q1: What do the parameters employed mean? Q2: Which information do the output files contain? Hint: Review QTLtools documentation.

# Expression PCA
QTLtools pca --bed input/processed/genes.chr22.norm.bed.gz --scale --center --out result/expression 

# Genotypes PCA
QTLtools pca --vcf input/processed/genotypes.chr22.vcf.gz --scale --center --maf 0.05 --distance 50000 --out result/genotypes

Now let's plot the first two PCs in each case:

# Note the input should coincide with the output of QTLtools pca on each case
pcaPlot.R -i result/expression -o result/plots/expression.pca.pdf
pcaPlot.R -i result/genotypes -o result/plots/genotypes.pca.pdf

Q3: What can you observe in the plots?

You can color the samples by the different values of a field in the corresponding metadata. For example, let's plot the first two genotype PCs and color the points by the super_pop field. Try other combinations! Q4: With this information, which covariates seem more relevant to explain the variability in the data?

pcaPlot.R -i result/genotypes --metadata input/unprocessed/1000g/1000g.phase3_metadata.txt --color super_pop --out result/plots/genotypes.pca.super_pop.pdf

We can also try to determine which fraction of the total variance in the expression data is explained by each potential covariate:

# Generate a common metadata with info about the population, gender and laboratory.
join -j 1 -t $'\t' <(sort -k1,1 input/unprocessed/1000g/1000g.phase3_metadata.txt) <(cut -f1,20 input/unprocessed/geuvadis/geuvadis.metadata.txt | sort -k1,1 | uniq) > tmp/metadata.txt

# Set names for the new metadata
sed -i '1s/^/sampleID\tpop\tsuper_pop\tgender\tlab\n/' tmp/metadata.txt

# Build a linear model and plot the contribution of each factor in the metadata to the total variance
var_partition.R -i input/processed/genes.chr22.norm.bed.gz -m tmp/metadata.txt --formula "~ (1|gender) + (1|pop) + (1|lab)" -o result/plots/vp.pdf

Q5: Which are the factors that explain more variance?

It may also happen that the metadata does not give us much information, or even that we do not have metadata at all! In these cases, it is usual to infer hidden covariates ussing techniques such as PEER or PCA itself.

Task 8: Use PEER to infer hidden covariates from the expression matrix. Q1: How much variance do they explain? On average is it more or less than the explained by the known factors?

# Compute 10 PEER factors
peer.R -i input/processed/genes.chr22.norm.bed.gz -p 10 -o tmp/peer.tsv

# Check how much variance do the first 5 PEER explain in comparison with the known factors
var_partition.R -i input/processed/genes.chr22.norm.bed.gz -m <(paste tmp/peer.tsv tmp/metadata.txt) -f "~ (1|pop) + (1|lab) + PEER1 + PEER2 + PEER3 + PEER4 + PEER5" -o result/plots/vp.peer.pdf

Finally generate a covariate file in the format required by QTLtools.

# 'Rscript -e' is just a trick to run an R script without opening an interactive R session in the console. ;)
join -j 1 -t $'\t' tmp/metadata.txt tmp/peer.tsv  | Rscript -e 'write.table(t(read.table(file("stdin", open = "r", blocking = T), h = F)), file = "input/processed/covariates.tsv", quote = F, sep = "\t", col.names = F, row.names = F)'

# Compress it
gzip input/processed/covariates.tsv

1.4. cis eQTL mapping (nominal pass)

Now we are ready to map expression QTLs! Let's run a nominal pass. The command below performs, for each phenotype (i.e. gene expression), linear regressions between it and the genotypes of the variants in a window of 1Mb around the TSS.

QTLtools cis --vcf input/processed/genotypes.chr22.vcf.gz --bed input/processed/genes.chr22.norm.bed.gz --cov input/processed/covariates.tsv.gz --nominal 0.01 --out result/nominals.txt

Task 9: Run a nominal pass reporting all p-values in (0,1] and check the output file nominals.txt. Have a look at QTLtools documentation for a detailed description of the options and the output format. Which information contains each field in the ouptut file? Q1: Are there pairs genotype-phenotype with exactly the same p-value and effect size (β)? How is this possible?

Have a look at the p-value distribution. Q2: What do you observe?

pvdist.R -i result/nominals.txt --col 12 -o result/plots/pvdist.pdf

Calculate LD between a pair of variants with exactly the same p-value and effect size using PLINK. Q3: Which SNPs did you select? What do you observe?

plink --ld <snp_id1> <snp_id2> --vcf input/processed/genotypes.chr22.vcf.gz --out tmp/ld2

1.5. cis eQTL mapping (permutation pass)

Let's run now a permutation pass. The command below performs 1000 permutations of the phenotypes and carries out linear regressions on the permuted data analogously to the nominal pass. The distribution of permuted p-values for each phenotype would be used later on to correct nominal p-values.

QTLtools cis --vcf input/processed/genotypes.chr22.vcf.gz --bed input/processed/genes.chr22.norm.bed.gz --cov input/processed/covariates.tsv.gz --permute 1000 --out result/permutations.txt

Task 10: Run a permutation pass and check the output file permutation.txt. Have a look at QTLtools documentation for a detailed description of the output format. Hint: This can be done faster using a 4 threads:

for j in $(seq 1 16); do
  echo "cis --vcf input/processed/genotypes.chr22.vcf.gz --bed input/processed/genes.chr22.norm.bed.gz --cov input/processed/covariates.tsv.gz --permute 1000 --chunk $j 16 --out result/permutations_$j.txt"
done | xargs -P4 -n14 QTLtools
cat result/permutations_*.txt > result/permutations.txt; rm result/permutations_*.txt

Check that the beta approximation p-values look like the permutation p-values in R. First open R by typing (guess what!) R:

R

R version 3.4.4 (2018-03-15) -- "Someone to Lean On"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>

and then run:

p <- read.table("result/permutations.txt")                                                      # Read input file
pdf("result/plots/pv-correlation.pdf",  paper = 'a4r', width = 9, height = 6)                   # Open PDF device
plot(p[, 18], p[, 19], xlab = "pv (perm)", ylab = "pv (beta)")                                  # Plot p-values
abline(0, 1, col = "red")                                                                       # Add red line 1=1
plot(-log10(p[, 18]), -log10(p[, 19]), xlab = "-log10 pv (perm)", ylab = "-log10 pv (beta)")    # Repeat in -log10 space to check the behaviour of the small p-values.
abline(0, 1, col = "red")
dev.off()                                                                                       # Close device
quit("no")                                                                                      # Exit R

1.6. Multiple testing correction

In eQTL mapping i) multiple genetic variants are tested per per phenotype, and ii) multiple phenotypes are tested genome-wide. We will use phenotype permutations to correct for the former, and FDR to correct for the latter.

Task 11: Use the script mtc.R to perform multiple testing correction through different methods: i) Bonferroni on all tests, ii) FDR on all tests, iii) two-level: permutation + FDR. Set α to 0.05 (default). Q1: How many significant eQTLs do we find in each case in comparison with the nominal pass? Hint: Run Rscript mtc.R --help. Note the output file now has a header, and \t as column separator.

mtc.R -n <nominal pass output> -p <permutation pass output> --method <method> --alpha <alpha> --out <output file>

Store i) and ii) results in folder tmp. Store iii) results in results/eqtls.tsv.

Task 12: Now have a look at your results! Use eQTLviewer.R to make a plot for the top 10 eQTLs

eQTLviewer.R -i <(head -n 10 result/eqtls.tsv) -g input/processed/genotypes.chr22.vcf.gz -e input/processed/genes.chr22.norm.bed.gz -o result/plots/eQTLs_head.pdf --verbose

2. eQTL functional analysis

Once we have a set of significant eQTLs, our goal is to study their functional characteristics.

Task 13: Use the Ensembl Regulation data to assess the enrichment of eQTLs in annotated functional features.

Download and parse the Ensembl Regulation dataset:

# Download from ftp server
rsync -av rsync://ftp.ensembl.org/ensembl/pub/grch37/release-86/regulation/homo_sapiens/AnnotatedFeatures.gff.gz input/unprocessed/ensembl

# Get chr, start, end and feature name in BED format
zcat input/unprocessed/ensembl/AnnotatedFeatures.gff.gz | awk 'BEGIN{FS=OFS="\t"}{print $1, $4-1, $5, $9}' | sed -r 's/Name=([^;]+);.*/\1/' | grep -v '^GL' | sort -V > tmp/ERD.bed

# Merge overlapping features of the same type 
# e.g. chr1 100 200 feat1            chr1 100 300 feat1
#      chr1 150 300 feat1     =>     chr1 100 250 feat2
#      chr1 100 250 feat2
for feat in $(cut -f4 tmp/ERD.bed | sort | uniq); do 
  bedtools merge -i <(grep -Fw $feat tmp/ERD.bed) -c 4 -o distinct
done > input/processed/ERD.collapsed.bed

# Remove 'chr' from chromosome names (-i option to modify the file 'in place')
sed -i "s/^chr//" input/processed/ERD.collapsed.bed

Perform the enrichment of top eQTLs:

for feat in $(cut -f4 input/processed/ERD.collapsed.bed | sort | uniq); do 
  QTLtools fenrich --qtl <(sed '1d' result/eqtls.tsv | awk '{print $9, $10-1, $10, $8, $1, "."}') --tss tmp/gencode.annotation.bed  --bed <(grep -Fw $feat input/processed/ERD.collapsed.bed) --out tmp/enrich.txt > /dev/null; echo "$(cat tmp/enrich.txt) $feat" 
done | grep -Fwv inf | grep -Fwv nan > result/enrichments.txt

plot.enrich.R -i result/enrichments.txt -o result/plots/enrich.pdf

Q1: Which are the top enriched features? Which kind of factors are they? Q2: What does an odds ratio lower than one mean?

Task 14: Now use the Variant Effect Predictor (VEP) to assess the impact of the eQTL variants. Q1: Which kind of consequences have they, according to the VEP? In which proportion? Q2: How many eQTLs are high impact variants? Which consequences are related to those high impact variants? Q3: Out of all high impact variants, how many of them are falling in acceptor splice sites of protein coding genes? Hint: Use the command below to get the variants that you will upload to the VEP.

sed '1d' result/eqtls.tsv | cut -f8 | sort | uniq > tmp/eqtls_snps.tsv

Task 15: And what about eGenes? Perform a GO enrichment to learn more about their function.

# Generate a list of sGenes
cut -f1 result/eqtls.tsv | sed '1d' | sed 's/\..\+//' | sort | uniq > tmp/egenes.txt

# We will use as background all the genes (PC and lincRNA) in chr22
awk '{if($1==22) print $4}' tmp/gencode.annotation.bed | sed 's/\..\+//' | sort | uniq > tmp/bg.txt

Use GOrilla to perform the enrichment. Select the options Two unranked lists of genes (target and background lists) and upload egenes.txt and bg.txt. Select ontology All and check Show output also in REVIGO. Q1: In which biological processes are your eGenes enriched? Which molecular functions and components correspond to those processes? Hint: When you have many GO terms, REVIGO is a very useful tool to visualize them.

3. eQTL sharing

Map eQTLs for different populations and compute eQTL sharing (π₁, jaccard index). Note: We will skip this analysis.

4. eQTL and GWAS co-localization

In input/unprocessed/gwas you will find the file gwas.catalog.hg19.bed, which contains a parsed version of the GWAS Catalog. We will use the Regulatory Trait Concordance (RTC) method to co-localize our eQTLs with the GWAS variants in the Catalog.

Task 16: Perform a co-localization analysis.

# Generate input files for QTLtools rtc
grep -Fwf <(cut -f1 result/eqtls.tsv ) result/permutations.txt > tmp/rtc_input
cut -f4,7 input/unprocessed/gwas/gwas.catalog.hg19.bed > tmp/gwas_trait

# Download the file 'hotspots_b37_hg19.bed' from QTLtools website
wget http://jungle.unige.ch/QTLtools_examples/hotspots_b37_hg19.bed --directory-prefix tmp

# Remove 'chr' from chromosome names (-i option to modify the file 'in place')
sed -i 's/^chr//' tmp/hotspots_b37_hg19.bed

# Run RTC
QTLtools rtc --vcf input/processed/genotypes.chr22.vcf.gz --bed input/processed/genes.chr22.norm.bed.gz --cov input/processed/covariates.tsv.gz --hotspot tmp/hotspots_b37_hg19.bed --gwas-cis tmp/gwas_trait tmp/rtc_input --out result/rtc.txt

Explore the output file. Review QTLtools documentation to know which information contains each column. Q1: How many pairs of variants have a RTC value above 0.9? Q2: For each pair, we have a GWAS hit and an eQTL. Can you find one example so that the gene to which the eQTL is associated is relevant for the trait/disease to which the GWAS variant is associated. Explore the literature and the biological databases that you know to gather more information. Q3: Which consequences, according to the variant effect predictor, do these co-localized eQTL variants have?

5. eQTL fine-mapping

CAVIAR (CAusal Variants Identification in Associated Regions) uses LD structure to model the observed marginal test statistics for each eGene as following a multivariate normal distribution (MVN). Applying this model, CAVIAR can define a credible set containing all causal variants with probability ρ.

Prepare input files for CAVIAR:

# Generate the ID/Z-scores input file. Select your favourite gene (e.g. gene=ENS00000000000.0).
# Set k (number of variants) to 50
gene=
compZscore.R --gene $gene --nominal result/nominals.txt -k 50 --output tmp/$gene.rs_z

# Generate the LD matrix 
plink --r square --snps $(cut -f1 tmp/$gene.rs_z) --vcf input/processed/genotypes.chr22.vcf.gz --out tmp/$gene

Task 17: Obtain a credible set of causal variants with probability ρ=0.95.

CAVIAR -z tmp/$gene.rs_z -l tmp/$gene.ld -o result/$gene

Explore the output files. Q1: How many variants are there in the credible (ρ=0.95) set? For each of these variants, which is the probability to be causal? Q2: Which are the p-values and effect sizes of these variants? How are they in comparison to the p-values and effect sizes of other variants tested for the same gene? Q3: Which consequences, according to the variant effect predictor, do these variants have?

Task 18: Generate a LocusZoom plot. Use as 'SNP' any of the colocalized or fine-mapped variants. Hint: You can generate the file that you should upload to LocusZoom as follows (remember the separator in the output file is (space), not \t):

# Define the gene corresponding to the co-localized or fine-mapped variants of interest
gene= 
cat <(echo "MarkerName P.value") <(grep $gene result/nominals.txt | cut -d " " -f8,12) > tmp/metal.$gene

Task 19: Make an empty folder called eQTL_HandsOn in your home directory. Copy to it your run.sh script, and the result folder with its content. Add, commit and push these changes to a GitHub repository with the same name. Hint: You may have to compress large files using gzip.

Contact

Diego Garrido Martín
diego.garrido@crg.eu
Computational Biology of RNA Processing
Centre for Genomic Regulation (CRG)
Doctor Aiguader, 88 - 08003 BARCELONA

Provide feedback

Saved searches

Use saved searches to filter your results more quickly