**Script for mutational signature analysis with [deconstructSigs](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0893-4)**

deconstructSigs aims to determine the contribution of known mutational processes to a tumor sample. 
By using deconstructSigs, one can:

1. Determine the weights of each mutational signature contributing to an individual tumor sample

2. Plot the reconstructed mutational profile (using the calculated weights) and compare to the original input sample

In [None]:
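# 1. Install the packages required to run deconstructSigs
# Installation uses BiocManager, the current Bioconductor installer
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("deconstructSigs")
BiocManager::install("BSgenome.Hsapiens.UCSC.hg19")
BiocManager::install("GenomeInfoDb")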

In [24]:
# 2. Once installed, the packages can be loaded.
library("deconstructSigs")
library("BSgenome.Hsapiens.UCSC.hg19")
library("GenomeInfoDb")

The most basic input to the deconstructSigs package is a data frame containing
the mutational data for a set of tumor samples.
This data frame must contain the genomic position and base change for each mutation,
as well as a sample identifier. This analysis uses the output of ANNOVAR, arranged in five columns:

1. Sample
2. Chromosome
3. Start
4. Ref
5. Alt

The file used in this example can be downloaded from
[AllPatientsMutSig.csv](https://github.com/Martinez-Gregorio-Hector/workflow_to_analysis_WES/tree/master/data/MutationalSignature)
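
Tables of this kind can also contain insertions, deletions, and multi-base changes, whereas the trinucleotide contexts used later describe single-base substitutions. The sketch below, in base R only, checks that the five expected columns are present and keeps the rows where both Ref and Alt are a single base. The toy values and the names `muts`, `required`, and `snvs` are illustrative assumptions, and the filter is an optional pre-check rather than a step the package requires.

```r
# Toy mutation table using the column names from this notebook (values are illustrative)
muts <- data.frame(
  Sample = c("S1", "S1", "S1", "S2"),
  Chr    = c("chr1", "chr1", "chr2", "chr1"),
  Start  = c(664652L, 13657L, 1007432L, 672081L),
  Ref    = c("G", "AG", "GC", "C"),
  Alt    = c("A", "-", "AT", "T"),
  stringsAsFactors = FALSE
)

# Stop early if any expected column is missing
required <- c("Sample", "Chr", "Start", "Ref", "Alt")
missing.cols <- setdiff(required, colnames(muts))
if (length(missing.cols) > 0) {
  stop("Missing columns: ", paste(missing.cols, collapse = ", "))
}

# Keep only single-base substitutions (Ref and Alt are each one of A, C, G, T)
valid.bases <- c("A", "C", "G", "T")
is.snv <- muts$Ref %in% valid.bases & muts$Alt %in% valid.bases
snvs <- muts[is.snv, ]

# Number of single-base substitutions retained per sample
table(snvs$Sample)
```

Counting the retained rows per sample is a quick way to spot samples with very few substitutions before signature weights are estimated.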

In [25]:
# setwd("/data/Lab13/Hec_prov/") - Set your working directory
# Load your file, either CSV or TXT
AllPatients <- read.csv("AllPatientsMutSig.csv", sep = ",", header = TRUE)
head(AllPatients)


Sample,Chr,Start,Ref,Alt
<fct>,<fct>,<int>,<fct>,<fct>
TNBC038,chr1,13657,AG,-
TNBC038,chr1,664652,G,A
TNBC038,chr1,672081,C,T
TNBC038,chr1,977157,CGGCCAGTGCCAGGGTCGAGGTGGGCGGCTCCCCCGGGGGAGGGCTG,-
TNBC038,chr1,1007432,GC,AT
TNBC038,chr1,1242424,GAG,-


Using the function mut.to.sigs.input, the mutational data for a set of tumors is converted to a data frame with n rows and 96 columns, where n is the number of samples.
Each column gives the number of times a mutation is found within one trinucleotide context.
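
The 96 columns come from 6 substitution classes (written from the pyrimidine base, C or T) combined with 4 possible 5' bases and 4 possible 3' bases. As a minimal sketch in base R, the labels can be enumerated as follows; the variable names are illustrative, and the ordering is chosen to match the column names shown in this notebook's output.

```r
# The four bases and the six pyrimidine-centred substitution classes
bases <- c("A", "C", "G", "T")
subs  <- c("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")

# expand.grid varies its first argument fastest:
# 3' base changes first, then 5' base, then substitution class
grid <- expand.grid(three = bases, five = bases, sub = subs,
                    stringsAsFactors = FALSE)

# Build labels of the form 5'base[ref>alt]3'base, e.g. "A[C>A]A"
contexts <- paste0(grid$five, "[", grid$sub, "]", grid$three)

length(contexts)
head(contexts, 5)
```

Comparing such a vector with `colnames(sigs.input)` is one way to confirm that a hand-built input data frame has the expected columns.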

**mut.to.sigs.input()**

In [26]:
# Convert to deconstructSigs input
sigs.input <- mut.to.sigs.input(mut.ref = AllPatients, 
                                sample.id = "Sample", 
                                chr = "Chr", 
                                pos = "Start", 
                                ref = "Ref", 
                                alt = "Alt")
head(sigs.input)

,A[C>A]A,A[C>A]C,A[C>A]G,A[C>A]T,C[C>A]A,C[C>A]C,C[C>A]G,C[C>A]T,G[C>A]A,G[C>A]C,⋯,C[T>G]G,C[T>G]T,G[T>G]A,G[T>G]C,G[T>G]G,G[T>G]T,T[T>G]A,T[T>G]C,T[T>G]G,T[T>G]T
,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
TNBC038,15,5,4,10,2,2,7,10,7,12,⋯,8,3,2,3,12,4,4,2,12,11
TNBC037,10,1,1,6,4,4,3,5,8,11,⋯,6,3,2,3,4,2,3,2,2,6
TNBC035,1,1,0,2,1,0,0,2,0,0,⋯,1,0,0,0,0,0,2,0,1,1
TNBC034,3,2,1,2,2,3,4,1,3,1,⋯,1,2,2,1,0,0,2,2,0,3
TNBC033,9,2,2,10,6,4,6,6,9,14,⋯,6,3,2,1,6,1,3,1,5,7


The output from mut.to.sigs.input can then be used as input to whichSignatures. Alternatively, a user can build their own input data frame from mutation counts calculated for each trinucleotide context per sample.

The function whichSignatures takes two inputs, the tumor data (tumor.ref) and a set of reference signatures (signatures.ref), and uses an iterative approach to determine the weight to assign to each signature in order to best reconstruct the mutational profile of the input tumor sample.
An additional parameter, tri.counts.method, dictates how any further normalization is done. Its default value is 'default', which applies no further normalization.

If tri.counts.method is set to 'exome', the input data frame is normalized by the number of times each trinucleotide context is observed in the exome.
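
To illustrate the idea behind this kind of normalization, the base R sketch below divides mutation counts by how often each context occurs and then rescales the result to fractions. All numbers are invented toy values and `tri.occurrences` is a hypothetical name; this is a conceptual illustration, not the package's internal code, which may differ in its details.

```r
# Toy mutation counts for three contexts in one sample (hypothetical values)
counts <- c("A[C>A]A" = 10, "A[C>A]C" = 5, "A[C>A]G" = 5)

# Hypothetical number of times each underlying trinucleotide occurs in the exome
tri.occurrences <- c("A[C>A]A" = 200, "A[C>A]C" = 100, "A[C>A]G" = 50)

# Without normalization: fraction of mutations falling in each context
raw.fraction <- counts / sum(counts)

# With normalization: mutations per available trinucleotide, rescaled to sum to 1
rate <- counts / tri.occurrences
norm.fraction <- rate / sum(rate)

round(rbind(raw.fraction, norm.fraction), 2)
```

In this toy case the first context holds half of the raw mutations, but after accounting for how common it is, the rarer third context carries the largest normalized fraction.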

**whichSignatures()**

In [22]:
rownames(sigs.input)

In [23]:
colnames(sigs.input)

In [None]:
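# Loop over every sample and store each whichSignatures result in a named list
sig.results <- list()
for (s in rownames(sigs.input)) {
  sig.results[[s]] <- whichSignatures(tumor.ref = sigs.input,
                                      signatures.ref = signatures.cosmic,
                                      sample.id = s,
                                      contexts.needed = TRUE,
                                      tri.counts.method = 'exome')
}

# Keep the TNBC037 result under the original variable name for the export step
AllPatients <- sig.results[["TNBC037"]]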

In [35]:
# The results can also be saved as either CSV or TXT
write.csv(AllPatients, file = "PruebaTNBC037_v.0.1.csv") 
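
When results for several samples are kept in a named list, the per-sample weights can be stacked into one table before saving. The sketch below assumes each result carries a one-row `weights` data frame, as whichSignatures output does. To stay self-contained it builds mock results with invented values instead of calling the package; `mock.result`, `sig.results`, `all.weights`, and the output file name are hypothetical.

```r
# Mock stand-in for a whichSignatures() result; only the `weights` element is mimicked
mock.result <- function(sample, w) {
  weights <- as.data.frame(t(w))   # one row, one column per signature
  rownames(weights) <- sample
  list(weights = weights)
}

# Invented weights for two samples and two signatures
sig.results <- list(
  TNBC037 = mock.result("TNBC037", c(Signature.1 = 0.6, Signature.3 = 0.3)),
  TNBC038 = mock.result("TNBC038", c(Signature.1 = 0.2, Signature.3 = 0.7))
)

# Stack the one-row weight tables: one row per sample, one column per signature
all.weights <- do.call(rbind, lapply(sig.results, function(res) res$weights))
all.weights

# Save the combined table (written to a temporary directory in this sketch)
out.file <- file.path(tempdir(), "all_sample_weights.csv")
write.csv(all.weights, file = out.file)
```

A single table with samples as rows is usually easier to compare across patients than one file per sample.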