# Compare the results of hierarchical clustering between HM27 and HM450

This note compares the hierarchical clustering results of the HM27 and HM450 arrays by computing the Rand index between the cluster memberships of the probes that are common to both arrays.

Given a set $S$ of $n$ elements and two partitions of $S$, $X = \{X_1, \dots, X_r\}$ and $Y = \{Y_1, \dots, Y_s\}$, the Rand index is defined as follows:

$$R = \frac{a + b}{\binom{n}{2}}$$

where $a$ is the number of pairs of elements of $S$ that are in the same subset in $X$ and in the same subset in $Y$, $b$ is the number of pairs of elements of $S$ that are in different subsets in $X$ and in different subsets in $Y$, and $n$ is the total number of elements in $S$.

In other words, the Rand index is the fraction of all $\binom{n}{2}$ pairs of elements of $S$ on which the two partitions agree. It ranges from 0 to 1, where 1 means the two partitions are identical.
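
To make the definition concrete, here is a minimal base-R sketch that counts agreeing pairs directly. The function name `rand_index_pairs` is made up for this note and is not the `rand.index` used below; it builds $n \times n$ matrices, so it is only meant for small inputs.

```r
# Hypothetical helper: Rand index by brute-force pair counting.
rand_index_pairs <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  stopifnot(length(x) == length(y), length(x) >= 2)
  same_x <- outer(x, x, "==")  # TRUE where two elements share a subset in X
  same_y <- outer(y, y, "==")  # TRUE where two elements share a subset in Y
  pairs <- upper.tri(same_x)   # visit each unordered pair exactly once
  # A pair agrees if it is same/same (counts toward a) or
  # different/different (counts toward b).
  mean(same_x[pairs] == same_y[pairs])
}

x <- c(1, 1, 2, 2)
y <- c(1, 1, 1, 2)
rand_index_pairs(x, y)  # a = 1, b = 2, choose(4, 2) = 6, so R = 0.5
```

Only the grouping matters, not the label values: relabelling the clusters of either partition leaves the result unchanged.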

In [57]:
source('helperFunctions.R')

In [5]:
load('../data/180413/HM27/resultsMcentersubset/clusterInfo.RData')
HM27.cluster <- clustering$cluster # the probes are aggregated based on the gene id
length(HM27.cluster) 

In [6]:
load('../data/180413/HM450/vbsrMcenterScaleData/clusterInfo.RData')
HM450.cluster <- clustering$cluster
length(HM450.cluster)

In [7]:
# find the probes shared by both arrays and subset each membership vector to them
intersect.probes <- intersect(names(HM27.cluster), names(HM450.cluster))
HM27.intersect <- HM27.cluster[intersect.probes]
HM450.intersect <- HM450.cluster[intersect.probes]
length(HM27.intersect)
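
Indexing a named vector by a character vector returns the elements in the order of the names supplied, so both subsets end up in the same probe order. A toy illustration with made-up probe names and cluster labels:

```r
# Made-up memberships; the names play the role of probe IDs.
a <- c(p1 = 1, p2 = 2, p3 = 1, p4 = 3)
b <- c(p4 = 2, p9 = 1, p2 = 2, p1 = 1)

common <- intersect(names(a), names(b))  # order follows the first argument
a_common <- a[common]
b_common <- b[common]

# Same names in the same order, so the labels can be compared element-wise.
stopifnot(identical(names(a_common), names(b_common)))
rbind(a_common, b_common)
```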

In [8]:
length(unique(HM27.intersect))
length(unique(HM450.intersect))

In [9]:
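# make sure both membership vectors list the probes in the same order
# (they already do, since both were indexed by intersect.probes; kept as a safeguard)
HM450.intersectOrdered <- HM450.intersect[names(HM27.intersect)]
randIdx <- rand.index(HM450.intersectOrdered, HM27.intersect)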

In [10]:
print(randIdx)  

[1] 0.878243


The clustering of the common probes agrees well between HM27 and HM450 (Rand index ≈ 0.88).
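
One caveat worth checking: the plain Rand index is not corrected for chance, and when there are many clusters most pairs fall in different clusters in both partitions, which pushes the index up. The adjusted Rand index (Hubert and Arabie) subtracts the agreement expected by chance. The sketch below is a base-R illustration under the hypothetical name `adjusted_rand_sketch`; it is not part of `helperFunctions.R` and was not run on the HM27/HM450 data, so no value is reported here.

```r
# Hypothetical helper: adjusted Rand index from the contingency table.
adjusted_rand_sketch <- function(x, y) {
  stopifnot(length(x) == length(y))
  tab <- table(as.vector(x), as.vector(y))     # contingency table of labels
  pairs_cells <- sum(choose(tab, 2))           # pairs together in both partitions
  pairs_rows  <- sum(choose(rowSums(tab), 2))  # pairs together in x
  pairs_cols  <- sum(choose(colSums(tab), 2))  # pairs together in y
  expected <- pairs_rows * pairs_cols / choose(length(x), 2)
  maximum  <- (pairs_rows + pairs_cols) / 2
  # NaN if both partitions are trivial (maximum equals expected).
  (pairs_cells - expected) / (maximum - expected)
}

adjusted_rand_sketch(c(1, 1, 2, 2), c(1, 1, 1, 2))  # chance-level agreement
adjusted_rand_sketch(c(1, 1, 2, 2), c(2, 2, 1, 1))  # same grouping, relabelled
```

With real data the call would take the two aligned membership vectors, for example `adjusted_rand_sketch(HM450.intersectOrdered, HM27.intersect)`.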

## Extract the probe and gene id for each methylation cluster

In [11]:
load('../data/processed/HM27.RData')

In [12]:
# keep the first two columns: probe ID and gene symbol
geneid <- data.HM27[, seq(2)]
head(geneid)

|  | ID | Gene.Symbol |
|---|---|---|
| 1 | cg00000292 | ATP2A1 |
| 2 | cg00002426 | SLMAP |
| 3 | cg00003994 | MEOX2 |
| 4 | cg00005847 | HOXD3 |
| 7 | cg00008493 | COX8C;KIAA1409 |
| 8 | cg00008713 | IMPA2 |


In [17]:
i <- 1
cluster1 <- ExtractClusterProbes(HM27.cluster, geneid, i)
head(cluster1)

[1] "The number of probes is 33 and the number of genes is 33"


| id | gene |
|---|---|
| cg00128877 | MKNK1 |
| cg00381076 | C18orf54 |
| cg00560119 | STXBP1 |
| cg00840516 | HYAL2 |
| cg02564523 | ORAI2 |
| cg02668984 | PDK2 |
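
`ExtractClusterProbes` is defined in `helperFunctions.R`, which is not shown here. Judging only from its inputs and printed output, it plausibly does something like the following. This is a guess for illustration under the hypothetical name `extract_cluster_sketch`; the probe IDs and gene symbols in the toy data come from the `head(geneid)` output, while the cluster labels are invented.

```r
# Hypothetical stand-in for ExtractClusterProbes (not the author's code).
extract_cluster_sketch <- function(cluster, geneid, i) {
  probes <- names(cluster)[cluster == i]       # probe IDs assigned to cluster i
  rows <- geneid[match(probes, geneid$ID), ]   # look up each probe's gene symbol
  out <- data.frame(id = probes,
                    gene = as.character(rows$Gene.Symbol),
                    stringsAsFactors = FALSE)
  message(sprintf("The number of probes is %d and the number of genes is %d",
                  nrow(out), length(unique(out$gene))))
  out
}

toy_geneid <- data.frame(ID = c("cg00000292", "cg00002426", "cg00003994"),
                         Gene.Symbol = c("ATP2A1", "SLMAP", "MEOX2"),
                         stringsAsFactors = FALSE)
toy_cluster <- c(cg00000292 = 1, cg00003994 = 2, cg00002426 = 1)  # invented labels
res <- extract_cluster_sketch(toy_cluster, toy_geneid, 1)
res
```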


In [37]:
merged1 <- MergeHGNCDescription(cluster1)
head(merged1)

“Column `gene`/`hgnc_symbol` joining factor and character vector, coercing into character vector”

| id | gene | description |
|---|---|---|
| cg00128877 | MKNK1 | MAP kinase interacting serine/threonine kinase 1 [Source:HGNC Symbol;Acc:HGNC:7110] |
| cg00381076 | C18orf54 | chromosome 18 open reading frame 54 [Source:HGNC Symbol;Acc:HGNC:13796] |
| cg00560119 | STXBP1 | syntaxin binding protein 1 [Source:HGNC Symbol;Acc:HGNC:11444] |
| cg00840516 | HYAL2 | hyaluronoglucosaminidase 2 [Source:HGNC Symbol;Acc:HGNC:5321] |
| cg02564523 | ORAI2 | ORAI calcium release-activated calcium modulator 2 [Source:HGNC Symbol;Acc:HGNC:21667] |
| cg02668984 | PDK2 | pyruvate dehydrogenase kinase 2 [Source:HGNC Symbol;Acc:HGNC:8810] |
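
The warning above comes from joining a factor column (`gene`) with a character column (`hgnc_symbol`); the join still works because both sides are coerced to character. Below is a hedged base-R sketch of the same kind of lookup that converts to character up front, so no coercion is needed. `annotate_genes_sketch`, the tiny `annotation` table, and the probe `cg99999999` are hypothetical stand-ins; how `MergeHGNCDescription` actually obtains the descriptions is not shown in this note.

```r
# Hypothetical stand-in for the description lookup (not the author's code).
annotate_genes_sketch <- function(cluster_df, annotation) {
  cluster_df$gene <- as.character(cluster_df$gene)       # factor -> character
  idx <- match(cluster_df$gene, annotation$hgnc_symbol)  # NA when a symbol is unknown
  cluster_df$description <- annotation$description[idx]
  cluster_df
}

cluster_df <- data.frame(id = c("cg00128877", "cg00560119", "cg99999999"),
                         gene = factor(c("MKNK1", "STXBP1", "NOT_A_GENE")))
annotation <- data.frame(
  hgnc_symbol = c("STXBP1", "MKNK1"),
  description = c("syntaxin binding protein 1",
                  "MAP kinase interacting serine/threonine kinase 1"),
  stringsAsFactors = FALSE)
annotated <- annotate_genes_sketch(cluster_df, annotation)
annotated
```

Using `match` keeps the rows in their original order and leaves `NA` for symbols without an annotation.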


In [40]:
WriteCSV(merged1, path='../data/180413/HM27/resultsMcentersubset/cluster1.csv')

## Putting this all together

In [42]:
WriteClusterInfo(HM27.cluster, geneid, '../data/180413/HM27/resultsMcentersubset/')

[1] "The number of probes for cluster 1 is 33 and the number of genes is 33"

“Column `gene`/`hgnc_symbol` joining factor and character vector, coercing into character vector”

[1] "Wrote csv to ../data/180413/HM27/resultsMcentersubset//cluster1.csv"
[1] "The number of probes for cluster 2 is 26 and the number of genes is 26"
[1] "Wrote csv to ../data/180413/HM27/resultsMcentersubset//cluster2.csv"
[1] "The number of probes for cluster 3 is 78 and the number of genes is 78"
[1] "Wrote csv to ../data/180413/HM27/resultsMcentersubset//cluster3.csv"
[1] "The number of probes for cluster 4 is 29 and the number of genes is 29"
[1] "Wrote csv to ../data/180413/HM27/resultsMcentersubset//cluster4.csv"
[1] "The number of probes for cluster 5 is 15 and the number of genes is 15"
[1] "Wrote csv to ../data/180413/HM27/resultsMcentersubset//cluster5.csv"
[1] "The number of probes for cluster 6 is 12 and the number of genes is 12"
[1] "Wrote csv to ../data/180413/HM27/resultsMcentersubset//cluster6.csv"
[1] "The number of probes for cluster 7 is 26 and the number of genes is 26"
[1] "Wrote csv to ../data/180413/HM27/resultsMcentersubset//cluster7.csv"
[1] "The number of probes for cluster 8 is 36 and the number of genes is 36"
[1] "Wrote csv to ../data/180413/HM27/resultsMcentersubset//cluster8.csv"
[1] "The number of probes for cluster 9 is 27 and the number of genes is 27"
[1] "Wrote csv to ../data/180413/HM27/resultsMcentersubset//cluster9.csv"
[1] "The number of probes for cluster 10 is 31 and the number of genes is 31"
[1] "Wrote csv to ../data/180413/HM27/resultsMcentersubset//cluster10.csv"
[1] "The number of probes for cluster 11 is 83 and the number of genes is 83"
[1] "Wrote csv to ../data/180413/HM27/resultsMcentersubset//cluster11.csv"
[1] "The number of probes for cluster 12 is 13 and the number of genes is 13"
[1] "Wrote csv to ../data/180413/HM27/resultsMcentersubset//cluster12.csv"
[1] "The number of probes for cluster 13 is 22 and the number of genes is 22"
[1] "Wrote csv to ../data/180413/HM27/resultsMcentersubset//cluster13.csv"
[1] "The number of probes for cluster 14 is 22 and the number of genes is 22"
[1] "Wrote csv to ../data/180413/HM27/resultsMcentersubset//cluster14.csv"
[1] "The number of probes for cluster 15 is 36 and the number of genes is 36"
[1] "Wrote csv to ../data/180413/HM27/resultsMcentersubset//cluster15.csv"
[1] "The number of probes for cluster 16 is 53 and the number of genes is 53"
[1] "Wrote csv to ../data/180413/HM27/resultsMcentersubset//cluster16.csv"
[1] "The number of probes for cluster 17 is 25 and the number of genes is 25"
[1] "Wrote csv to ../data/180413/HM27/resultsMcentersubset//cluster17.csv"
[1] "The number of probes for cluster 18 is 3 and the number of genes is 3"
[1] "Wrote csv to ../data/180413/HM27/resultsMcentersubset//cluster18.csv"
[1] "The number of probes for cluster 19 is 9 and the number of genes is 9"
[1] "Wrote csv to ../data/180413/HM27/resultsMcentersubset//cluster19.csv"
[1] "The number of probes for cluster 20 is 11 and the number of genes is 11"
[1] "Wrote csv to ../data/180413/HM27/resultsMcentersubset//cluster20.csv"
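
The per-cluster steps can be wrapped in a loop over the cluster labels. The sketch below is an assumption about the general shape of such a wrapper, not the actual `WriteClusterInfo`; the name `write_cluster_csvs_sketch` is hypothetical, the memberships are made up, and the files go to a temporary directory. Stripping trailing slashes from the directory before calling `file.path` avoids the doubled `//` seen in the paths printed above, which is usually harmless but untidy.

```r
# Hypothetical wrapper: write one CSV of member probes per cluster.
write_cluster_csvs_sketch <- function(cluster, out_dir) {
  out_dir <- sub("/+$", "", out_dir)           # drop trailing slashes
  written <- character(0)
  for (i in sort(unique(cluster))) {
    members <- data.frame(id = names(cluster)[cluster == i],
                          stringsAsFactors = FALSE)
    path <- file.path(out_dir, paste0("cluster", i, ".csv"))
    write.csv(members, path, row.names = FALSE)
    written <- c(written, path)
  }
  written
}

toy_cluster <- c(p1 = 1, p2 = 2, p3 = 1)       # made-up memberships
out_dir <- file.path(tempdir(), "cluster_sketch")
dir.create(out_dir, showWarnings = FALSE)
paths <- write_cluster_csvs_sketch(toy_cluster, paste0(out_dir, "/"))
basename(paths)
```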


## Write the same information for the predictors

In [50]:
load('../data/180413/HM27/vbsrresultsMcentersubset/cluster1.RData')

In [54]:
MergeHGNCDescription(coef)

“Column `gene`/`hgnc_symbol` joining factor and character vector, coercing into character vector”

| gene | coef | description |
|---|---|---|
| ADPRH | -0.3957416 | ADP-ribosylarginine hydrolase [Source:HGNC Symbol;Acc:HGNC:269] |
| C7orf46 | -0.1575272 |  |
| CYTH2 | 0.4603937 | cytohesin 2 [Source:HGNC Symbol;Acc:HGNC:9502] |
| KLF11 | -0.1665144 | Kruppel like factor 11 [Source:HGNC Symbol;Acc:HGNC:11811] |
| SH3PXD2A | -0.2671733 | SH3 and PX domains 2A [Source:HGNC Symbol;Acc:HGNC:23664] |


In [53]:
base.name

In [58]:
WritePredictorInfo('../data/180413/HM27/vbsrresultsMcentersubset/', 20)

“Column `gene`/`hgnc_symbol` joining factor and character vector, coercing into character vector”

[1] "Wrote csv to ../data/180413/HM27/vbsrresultsMcentersubset//cluster1Predictor.csv"
[1] "Wrote csv to ../data/180413/HM27/vbsrresultsMcentersubset//cluster2Predictor.csv"
[1] "Wrote csv to ../data/180413/HM27/vbsrresultsMcentersubset//cluster3Predictor.csv"
[1] "Wrote csv to ../data/180413/HM27/vbsrresultsMcentersubset//cluster4Predictor.csv"
[1] "Wrote csv to ../data/180413/HM27/vbsrresultsMcentersubset//cluster5Predictor.csv"
[1] "Wrote csv to ../data/180413/HM27/vbsrresultsMcentersubset//cluster6Predictor.csv"
[1] "Wrote csv to ../data/180413/HM27/vbsrresultsMcentersubset//cluster7Predictor.csv"
[1] "Wrote csv to ../data/180413/HM27/vbsrresultsMcentersubset//cluster8Predictor.csv"
[1] "Wrote csv to ../data/180413/HM27/vbsrresultsMcentersubset//cluster9Predictor.csv"
[1] "Wrote csv to ../data/180413/HM27/vbsrresultsMcentersubset//cluster10Predictor.csv"
[1] "Wrote csv to ../data/180413/HM27/vbsrresultsMcentersubset//cluster11Predictor.csv"
[1] "Wrote csv to ../data/180413/HM27/vbsrresultsMcentersubset//cluster12Predictor.csv"
[1] "Wrote csv to ../data/180413/HM27/vbsrresultsMcentersubset//cluster13Predictor.csv"
[1] "Wrote csv to ../data/180413/HM27/vbsrresultsMcentersubset//cluster14Predictor.csv"
[1] "Wrote csv to ../data/180413/HM27/vbsrresultsMcentersubset//cluster15Predictor.csv"
[1] "No predictor selected for cluster 16"
[1] "Wrote csv to ../data/180413/HM27/vbsrresultsMcentersubset//cluster17Predictor.csv"
[1] "Wrote csv to ../data/180413/HM27/vbsrresultsMcentersubset//cluster18Predictor.csv"
[1] "Wrote csv to ../data/180413/HM27/vbsrresultsMcentersubset//cluster19Predictor.csv"
[1] "Wrote csv to ../data/180413/HM27/vbsrresultsMcentersubset//cluster20Predictor.csv"
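
Cluster 16 shows that a cluster can end up with no selected predictor, so a writer of this kind has to skip empty results. A minimal sketch of that control flow follows, assuming each cluster's coefficients are available as a data frame; `write_predictors_sketch` and the empty second entry are hypothetical, and the real `WritePredictorInfo` may work differently.

```r
# Hypothetical writer: one CSV per cluster, skipping clusters without predictors.
write_predictors_sketch <- function(coef_list, out_dir) {
  written <- character(0)
  for (i in seq_along(coef_list)) {
    coef_df <- coef_list[[i]]
    if (is.null(coef_df) || nrow(coef_df) == 0) {
      message(sprintf("No predictor selected for cluster %d", i))
      next                                     # nothing to write for this cluster
    }
    path <- file.path(out_dir, sprintf("cluster%dPredictor.csv", i))
    write.csv(coef_df, path, row.names = FALSE)
    written <- c(written, path)
  }
  written
}

coef_list <- list(
  data.frame(gene = c("ADPRH", "CYTH2"), coef = c(-0.3957416, 0.4603937)),
  data.frame(gene = character(0), coef = numeric(0))  # made-up empty result
)
written <- write_predictors_sketch(coef_list, tempdir())
basename(written)
```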
