*Ewing et al. (2020) Structural variants at the BRCA1/2 loci are a common source of homologous repair deficiency in high grade serous ovarian carcinoma.*

# Notebook 6 - Investigating samples with low HRDetect scores by category of BRCA1/2 mutation

This notebook investigates whether the bimodal distribution of HRDetect scores in samples with BRCA1/2 short variants or deletions can be explained by other features such as deletion length.

## Load summary data and classify samples by mutational category

In [2]:
sampleInfo<-read.table("~/Desktop/BRCA1_BRCA2_SVs_paper/Manuscript/Intermediate_data/SampleInformation_full.txt",sep="\t",header=T,stringsAsFactors=F)
rownames(sampleInfo)<-sampleInfo[,1]

sampleInfo[sampleInfo$BRCAstatus=="None" & sampleInfo$BRCA1_pro_meth==1,"BRCAstatus"]<-"BRCA1 promoter methylation"
sampleInfo[sampleInfo$BRCAstatus=="BRCA1 promoter methylation","BRCAstatus_compound"]<-"BRCA1 promoter methylation"

In [3]:
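# Exclude every sample with BRCA1 promoter methylation; these are categorised separately from sampleInfo
dat<-sampleInfo[sampleInfo$BRCA1_pro_meth!=1,]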


## Define mutational categories

In [4]:
#Germline SNV
    hrd_germSNV<-dat[(dat$BRCAstatus=="Germline SNV") & dat$BRCAstatus_SV=='SV absent' ,c("Sample","HRDetect")]
    df_germ<-data.frame(MutCat="Germline SNV only",HRD=hrd_germSNV,Col="SNV")

#Somatic SNV
    hrd_somSNV<-dat[(dat$BRCAstatus=="Somatic SNV") & dat$BRCAstatus_SV=='SV absent',c("Sample","HRDetect")]
    df_som<-data.frame(MutCat="Somatic SNV only",HRD=hrd_somSNV,Col="SNV")

#Single BRCA1 deletion
    hrd_singledel_brca1<-dat[(dat$BRCAstatus_SV=="Single deletion" & dat$BRCAstatus=="Deletion overlapping exon (LOF)" &
        dat$BRCA1status=="Deletion overlapping exon (LOF)"),c("Sample","HRDetect")]
    df_brca1del<-data.frame(MutCat="Single deletion at BRCA1",HRD=hrd_singledel_brca1,Col="Deletion")

#Single BRCA2 deletion
    hrd_singledel_brca2<-dat[(dat$BRCAstatus_SV=="Single deletion" & dat$BRCAstatus=="Deletion overlapping exon (LOF)" &
        dat$BRCA2status=="Deletion overlapping exon (LOF)"),c("Sample","HRDetect")]
    df_brca2del<-data.frame(MutCat="Single deletion at BRCA2",HRD=hrd_singledel_brca2,Col="Deletion")

#Double deletion
    hrd_doubledel<-dat[(dat$BRCAstatus_SV=="Double deletion" & dat$BRCAstatus=="Deletion overlapping exon (LOF)"),c("Sample","HRDetect")]
    df_doubledel<-data.frame(MutCat="Double deletion",HRD=hrd_doubledel,Col="Deletion")

#SNV+deletion (same gene)
    hrd_same<-dat[((dat$BRCAstatus_compound=="SNV + deletion (same gene)")), c("Sample","HRDetect")]
    df_snvdelsame<-data.frame(MutCat="Compound same",HRD=hrd_same,Col="SNV + deletion")

#SNV+deletions (both)
    hrd_both<-dat[((dat$BRCAstatus_compound=="SNV + deletions (both genes)")), c("Sample","HRDetect")]
    df_snvdelboth<-data.frame(MutCat="Compound both",HRD=hrd_both,Col="SNV + deletion")

#BRCA1 inversion
    hrd_brca1inv<-dat[(dat$BRCA1status=="Inversion spanning gene (INV_SPAN)" & dat$BRCA2status=="None") ,c("Sample","HRDetect")]
    df_brca1inv<-data.frame(MutCat="BRCA1 inversion",HRD=hrd_brca1inv,Col="Non-deletion SV")

#BRCA2 duplication
    hrd_brca2dups<-dat[(dat$BRCA2status=="Duplication spanning gene (COPY_GAIN)" & dat$BRCA1status=="None") ,c("Sample","HRDetect")]
    df_brca2dup<-data.frame(MutCat="BRCA2 duplication",HRD=hrd_brca2dups,Col="Non-deletion SV")

#Methylation (taken from sampleInfo because dat excludes methylated samples)
    hrd_methyl<-sampleInfo[sampleInfo$BRCAstatus == "BRCA1 promoter methylation",c("Sample","HRDetect")]
    df_methyl<-data.frame(MutCat="Promoter methylation",HRD=hrd_methyl,Col="Methylation")

#Combine
hrd_df<-rbind(df_germ,df_som,df_brca1del,df_brca2del,df_doubledel,df_snvdelsame,df_snvdelboth,df_brca1inv,df_brca2dup,df_methyl)

colnames(hrd_df)<-c("MutCat","Sample","HRDetect","Col")
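Each category above is built with `data.frame(MutCat=..., HRD=<subset>, Col=...)`, which fails when a subset has no rows because a length-one label cannot be matched to zero rows. The following is a minimal sketch of a helper that tolerates empty categories. `make_cat`, `dat_toy` and all values in it are hypothetical and introduced only for illustration; they are not part of the original analysis.

```r
# Hypothetical helper (not in the original notebook): build one category table
# from a subset of samples, returning zero rows when the subset is empty.
make_cat <- function(sub, mutcat, col) {
  data.frame(
    MutCat   = rep(mutcat, nrow(sub)),
    Sample   = sub$Sample,
    HRDetect = sub$HRDetect,
    Col      = rep(col, nrow(sub)),
    stringsAsFactors = FALSE
  )
}

# Toy input with made-up values, standing in for dat
dat_toy <- data.frame(
  Sample        = c("S1", "S2", "S3"),
  HRDetect      = c(0.95, 0.20, 0.99),
  BRCAstatus    = c("Germline SNV", "Germline SNV", "Somatic SNV"),
  BRCAstatus_SV = c("SV absent", "SV absent", "SV absent"),
  stringsAsFactors = FALSE
)

germ  <- dat_toy[dat_toy$BRCAstatus == "Germline SNV", c("Sample", "HRDetect")]
empty <- dat_toy[dat_toy$BRCAstatus == "Double deletion", c("Sample", "HRDetect")]

hrd_toy <- rbind(
  make_cat(germ,  "Germline SNV only", "SNV"),
  make_cat(empty, "Double deletion",   "Deletion")  # contributes no rows
)
print(hrd_toy)
```

Because the helper names its columns directly, the later `colnames(hrd_df)<-...` step would not be needed with this approach.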


## Investigating SNVs with low HRDetect scores

In [10]:
lowSNVs<-hrd_df[hrd_df$HRDetect<0.7 & hrd_df$Col=="SNV",]
print(sampleInfo[lowSNVs[,"Sample"],])

             Sample Cohort   BRCAstatus BRCA1status  BRCA2status BRCAstatus_SV
DO29980     DO29980     DO Germline SNV        None Germline SNV     SV absent
SHGSOC043 SHGSOC043     SH Germline SNV        None Germline SNV     SV absent
SHGSOC051 SHGSOC051     SH Germline SNV        None Germline SNV     SV absent
SHGSOC059 SHGSOC059     SH Germline SNV        None Germline SNV     SV absent
SHGSOC011 SHGSOC011     SH  Somatic SNV Somatic SNV         None     SV absent
          BRCA1status_SV BRCA2status_SV BRCA1status_compound
DO29980        SV absent      SV absent                 None
SHGSOC043      SV absent      SV absent                 None
SHGSOC051      SV absent      SV absent                 None
SHGSOC059      SV absent      SV absent                 None
SHGSOC011      SV absent      SV absent             Excluded
          BRCA2status_compound BRCAstatus_compound Double_del BRCA1_LOH
DO29980               Excluded            Excluded       <NA>         1
SHGSOC043       

SHGSOC043 shows no evidence of LOH. SHGSOC011, which carries a somatic variant, has an HRDetect score very close to the 0.7 threshold, so it may be a borderline case. For the other three samples there is no evidence that the tumour has a lower proportion of WT-supporting reads than the normal, which raises the question of whether the variant allele, rather than the WT allele, has been lost.

The variants are two germline frameshifts (SHGSOC043, SHGSOC051), two germline missense variants in BRCA2 (DO29980, SHGSOC059) and one somatic frameshift in BRCA1 (SHGSOC011).
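The comparison of WT-supporting reads between tumour and normal can be illustrated with a small sketch. The read counts below are hypothetical and are not taken from the study data; the one-sided Fisher's exact test is one possible way to frame the comparison, not necessarily the method used for the observations above.

```r
# Hypothetical read counts at a single variant site (made up for illustration,
# NOT taken from the study data)
reads <- matrix(
  c(30, 28,    # normal: WT reads, variant reads
    12, 40),   # tumour: WT reads, variant reads
  nrow = 2, byrow = TRUE,
  dimnames = list(Tissue = c("normal", "tumour"), Allele = c("WT", "variant"))
)

# Fraction of reads supporting the WT allele in each tissue
wt_frac <- reads[, "WT"] / rowSums(reads)
print(round(wt_frac, 3))

# One-sided Fisher's exact test: an odds ratio > 1 means WT reads are enriched
# in the normal relative to the tumour, as expected if the tumour lost the WT allele
ft <- fisher.test(reads, alternative = "greater")
print(ft$p.value)
```

A tumour WT fraction that is not lower than the normal one, as described for three of the samples above, would give an odds ratio near or below 1 in this framing. Tumour purity and local copy number also shift these fractions, so a fuller analysis would need to account for them.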

## Deletions at BRCA1

In [21]:
Dels<-hrd_df[hrd_df$MutCat=="Single deletion at BRCA1",]
df<-sampleInfo[Dels[,"Sample"],]
df$HRD<-ifelse(df$HRDetect>0.7,"HRD","HRP")
df$whichBRCA<-ifelse(df$BRCA1status=="Deletion overlapping exon (LOF)","BRCA1","BRCA2")


In [22]:
vars_toconsider<-c("Facets_WGD_score","AvPloidy","Mutational_load","SV_load","CNV_load","BRCA1_VST")
cat<-c("WGD")

In [23]:
for (i in vars_toconsider){
    print(i)
    print(wilcox.test(df[,i] ~ df$HRD))
}

[1] "Facets_WGD_score"

	Wilcoxon rank sum test

data:  df[, i] by df$HRD
W = 18, p-value = 0.1212
alternative hypothesis: true location shift is not equal to 0

[1] "AvPloidy"

	Wilcoxon rank sum test

data:  df[, i] by df$HRD
W = 17, p-value = 0.1818
alternative hypothesis: true location shift is not equal to 0

[1] "Mutational_load"

	Wilcoxon rank sum test

data:  df[, i] by df$HRD
W = 11, p-value = 0.9091
alternative hypothesis: true location shift is not equal to 0

[1] "SV_load"


Warning message: “cannot compute exact p-value with ties”


	Wilcoxon rank sum test with continuity correction

data:  df[, i] by df$HRD
W = 10, p-value = 1
alternative hypothesis: true location shift is not equal to 0

[1] "CNV_load"


Warning message: “cannot compute exact p-value with ties”


	Wilcoxon rank sum test with continuity correction

data:  df[, i] by df$HRD
W = 18, p-value = 0.1065
alternative hypothesis: true location shift is not equal to 0

[1] "BRCA1_VST"

	Wilcoxon rank sum test

data:  df[, i] by df$HRD
W = 6, p-value = 1
alternative hypothesis: true location shift is not equal to 0
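The loop above prints each test separately, which makes the results hard to compare at a glance. A minimal sketch for collecting the same statistics into one table follows. The `toy` data frame is simulated, and `exact = FALSE` is an assumption made here so that the normal approximation is used for every variable (which also avoids the ties warning); it is not the setting used in the notebook.

```r
set.seed(1)

# Simulated stand-in for df (values are random, not study data)
toy <- data.frame(
  HRD      = rep(c("HRD", "HRP"), times = c(6, 4)),
  AvPloidy = rnorm(10, mean = 3, sd = 0.5),
  SV_load  = rpois(10, lambda = 200),
  stringsAsFactors = FALSE
)
vars <- c("AvPloidy", "SV_load")

# Run one Wilcoxon rank sum test per variable and keep W and the p-value
res <- do.call(rbind, lapply(vars, function(v) {
  wt <- wilcox.test(reformulate("HRD", response = v), data = toy, exact = FALSE)
  data.frame(variable = v, W = unname(wt$statistic), p_value = wt$p.value,
             stringsAsFactors = FALSE)
}))
print(res)
```

Using `reformulate()` with a `data` argument also labels each test with the variable name instead of `df[, i]`.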



In [24]:
df_brca1<-df[df$whichBRCA=="BRCA1",]
wilcox.test(df_brca1$BRCA1_VST ~ df_brca1$HRD)


	Wilcoxon rank sum test

data:  df_brca1$BRCA1_VST by df_brca1$HRD
W = 6, p-value = 1
alternative hypothesis: true location shift is not equal to 0


## Deletions at BRCA2

In [28]:
Dels<-hrd_df[hrd_df$MutCat=="Single deletion at BRCA2",]
df<-sampleInfo[Dels[,"Sample"],]
df$HRD<-ifelse(df$HRDetect>0.7,"HRD","HRP")
df$whichBRCA<-ifelse(df$BRCA1status=="Deletion overlapping exon (LOF)","BRCA1","BRCA2")

In [29]:
vars_toconsider<-c("Facets_WGD_score","AvPloidy","Mutational_load","SV_load","CNV_load","BRCA2_VST")
cat<-c("WGD")

In [31]:
for (i in vars_toconsider){
    print(i)
    print(wilcox.test(df[,i] ~ df$HRD))
}

[1] "Facets_WGD_score"

	Wilcoxon rank sum test

data:  df[, i] by df$HRD
W = 9, p-value = 0.9048
alternative hypothesis: true location shift is not equal to 0

[1] "AvPloidy"

	Wilcoxon rank sum test

data:  df[, i] by df$HRD
W = 9, p-value = 0.9048
alternative hypothesis: true location shift is not equal to 0

[1] "Mutational_load"

	Wilcoxon rank sum test

data:  df[, i] by df$HRD
W = 20, p-value = 0.01587
alternative hypothesis: true location shift is not equal to 0

[1] "SV_load"

	Wilcoxon rank sum test

data:  df[, i] by df$HRD
W = 16, p-value = 0.1905
alternative hypothesis: true location shift is not equal to 0

[1] "CNV_load"

	Wilcoxon rank sum test

data:  df[, i] by df$HRD
W = 9, p-value = 0.9048
alternative hypothesis: true location shift is not equal to 0

[1] "BRCA2_VST"

	Wilcoxon rank sum test

data:  df[, i] by df$HRD
W = 5, p-value = 0.8
alternative hypothesis: true location shift is not equal to 0



In [32]:
df_brca2<-df[df$whichBRCA=="BRCA2",]
wilcox.test(df_brca2$BRCA2_VST ~ df_brca2$HRD)


	Wilcoxon rank sum test

data:  df_brca2$BRCA2_VST by df_brca2$HRD
W = 5, p-value = 0.8
alternative hypothesis: true location shift is not equal to 0
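Six variables were tested for the BRCA2 deletions, and only `Mutational_load` has a nominal p-value below 0.05. As a sketch of how these results could be viewed after accounting for multiple testing, the p-values printed above can be passed to `p.adjust`. The choice of the Benjamini-Hochberg method is an assumption; the notebook does not specify a correction.

```r
# p-values copied from the six Wilcoxon tests printed above (BRCA2 deletions)
p_raw <- c(
  Facets_WGD_score = 0.9048,
  AvPloidy         = 0.9048,
  Mutational_load  = 0.01587,
  SV_load          = 0.1905,
  CNV_load         = 0.9048,
  BRCA2_VST        = 0.8
)

# Benjamini-Hochberg adjustment (the choice of method is an assumption)
p_bh <- p.adjust(p_raw, method = "BH")
print(round(p_bh, 4))
```

With these inputs the adjusted value for `Mutational_load` is 0.01587 × 6 ≈ 0.095, so the difference would not pass a 0.05 threshold after correction. Given the small group sizes, it is probably best treated as a lead to follow up rather than a firm result.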


## Non-deletion SVs

In [40]:
Nondels<-hrd_df[hrd_df$MutCat=="BRCA1 inversion"|hrd_df$MutCat=="BRCA2 duplication",]
df<-sampleInfo[Nondels[,"Sample"],]
df$HRD<-ifelse(df$HRDetect>0.7,"HRD","HRP")
df$whichBRCA<-ifelse(df$BRCA1status=="None","BRCA2",ifelse(df$BRCA2status=="None","BRCA1","both"))


In [43]:
table(df$HRD,df$BRCAstatus)

     
      Duplication spanning gene (COPY_GAIN) Inversion spanning gene (INV_SPAN)
  HRD                                     1                                  3
  HRP                                     5                                  3

In general, HRD samples tend to be those with inversions at BRCA1 (3 of 4), while samples with duplications at BRCA2 are mostly HRP (5 of 6). However, inversions at BRCA1 are split evenly between HRD and HRP (3 each).
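Whether the counts in the table above support an association between SV type and HRD status can be checked with Fisher's exact test. This is a sketch that uses only the four counts printed above; the choice of test is an assumption and is not part of the original notebook.

```r
# Counts copied from the table above
tab <- matrix(
  c(1, 3,
    5, 3),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    HRD = c("HRD", "HRP"),
    SV  = c("BRCA2 duplication", "BRCA1 inversion")
  )
)

# Two-sided Fisher's exact test on the 2x2 table
ft <- fisher.test(tab)
print(ft$p.value)
```

For these counts the two-sided p-value is about 0.55, so with only 12 samples the table on its own does not establish a difference between the two SV types.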