# TCGA-COAD Batch Effects
```
pi:ababaian
files: ~/Crown/data2/tcga_batch/
start: 2019 08 08
complete : 2019 08 22
```
## Introduction

Batch effects affect RNA-seq library preparation. In the lab we have determined that even different lot numbers of the RT enzyme (SSIII) can alter the VAF at macp.1248. To test whether the 'hypo-modification' is simply a batch effect, the VAF of each RNA-seq library is broken down into distinct batches according to when the samples were prepared/sequenced.

The null hypothesis is that there is no hypo-modification and the observed effect is simply a batch effect in TCGA-COAD; in that case the per-batch variance of the normal samples will equal the per-batch variance of the CRC samples.


### TCGA Data Collection Overview
(Taken from https://bioinformatics.mdanderson.org/public-software/tcga-batch-effects/)


"Below is a quick overview of the various steps involved in the TCGA Data Collection Process, each of which could potentially introduce systematic errors. Relevant terms are highlighted in bold and explanatory links have been provided wherever possible.

- Biospecimens consisting of Tumor and Normal tissue samples and clinical metadata are collected from patients donors at various Tissue Source Sites (TSS). Each TSS is identified by its unique TSS ID.
- These biospecimens are transported to TCGA Biospecimen Core Resources (BCR) laboratories, which ensure these specimens meet the TCGA biospecimen criteria . Specimens of sufficient quality are cataloged, processed and stored for analysis. Any patient identifying information is removed during the process.
- Specimens are grouped into batches consisting of a fixed set of patient cases of the same disease study. Each batch is uniquely determined by the first shipment of a group of analytes from the Biospecimen Core Resource.
- Analytes are transported to sequencing centers on various ShipDates and processed into usable data on plates each with its own unique PlateID.
- This data are then sent to TCGA Genome Characterization Centers and Genome Sequencing Centers (CGCC and GSC) for interpretation.
- TCGA Genome Characterization Centers analyze many of the genetic changes involved in cancer including how the genome is rearranged or how gene expression changes in tumors compared to normal cells.
- High-throughput TCGA Genome Sequencing Centers identify the changes in DNA sequence associated with specific types of cancer.
- The information that is generated by the TCGA Research Network is centrally managed at the TCGA Data Coordinating Center and made available on the TCGA Data Portal by release date.
"


## Materials and Methods

TCGA batch information was downloaded from: [BatchEffectsViewer App](https://bioinformatics.mdanderson.org/BatchEffectsViewer)

- The TCGA-COAD disease and the RNA-seq Counts workflow were selected.
- The file `GDC.2018.11.20.1200..current._COAD_RNASeq.counts.zip` was downloaded (press "Download Archive").

![TCGA-COAD Batch Effects](../../data2/tcga_batch/plot/BatchEffectViewer_COAD.png)


- The zip archive contains the `BatchData.tsv` file, which lists the batch assignment of each sample.
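
Joining the batch table to the per-library VAF values can be sketched as below. This is a minimal, self-contained illustration rather than the code in `crc_batch.Rmd`: the column names `Sample`, `BatchId`, `PlateId` and `ShipDate`, the full-length barcodes and every value in the toy tables are assumptions. For the real file, `read.delim("BatchData.tsv", colClasses = "character")` would replace the inline text.

```r
# Toy stand-in for BatchData.tsv (column names and values are assumptions)
batch.lines <- c(
  "Sample\tBatchId\tPlateId\tShipDate",
  "TCGA-XX-0001-01A-01R-0001-07\t1.1.0\t0001\t2010-01-01",
  "TCGA-XX-0001-11A-01R-0001-07\t1.1.0\t0001\t2010-01-01",
  "TCGA-XX-0002-01A-01R-0002-07\t2.1.0\t0002\t2010-02-01"
)

# Read everything as character so IDs such as "0001" keep their leading zeros
batch.data <- read.delim(text = batch.lines, colClasses = "character")

# Truncate the barcode to its first 16 characters to match the lib.name format
batch.data$lib.name <- substr(batch.data$Sample, 1, 16)

# Toy VAF table keyed by lib.name (values are invented)
vaf <- data.frame(
  lib.name = c("TCGA-XX-0001-01A", "TCGA-XX-0001-11A", "TCGA-XX-0002-01A"),
  VAF      = c(0.30, 0.45, 0.25),
  stringsAsFactors = FALSE
)

# Left join: keep every library, attach batch annotations where available
MACP.toy <- merge(vaf,
                  batch.data[, c("lib.name", "BatchId", "PlateId", "ShipDate")],
                  by = "lib.name", all.x = TRUE)
```

Libraries without a matching batch row would come through with `NA` annotations, which makes missing batch assignments easy to spot before plotting.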

## Results / Discussion

Ran `crc_batch.Rmd` script

### VAF by Batch Effect

Separate the VAF values by batch to see differences in the mean.


There are clear batch effects in the data, as expected. Overall, hypo-macp is observed across the batches (relative to intra-batch controls), which rules out the possibility that it is simply a batch-specific effect. This does raise a cautionary flag: the pan-TCGA data need to be normalized by batch for reliable measurement, especially in the large cancer cohorts without normal controls (e.g. DLBCL).

![VAF by BatchId](../../data2/tcga_batch/plot/macpVAF_batch.png)

![VAF by date](../../data2/tcga_batch/plot/macpVAF_date.png)

![VAF by plate](../../data2/tcga_batch/plot/macpVAF_plate.png)
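
The comparison against intra-batch controls can be sketched as follows. This is an illustration on invented toy values, not the analysis in `crc_batch.Rmd`. The only external fact used is the TCGA barcode convention that characters 14-15 hold the sample type code (01-09 tumor, 10-19 normal); the `type` column and the batch labels `A` and `B` are hypothetical.

```r
# Toy data only: values are invented to make the sketch runnable
MACP.toy <- data.frame(
  lib.name = c("TCGA-XX-0001-01A", "TCGA-XX-0002-01A", "TCGA-XX-0001-11A",
               "TCGA-XX-0003-01A", "TCGA-XX-0004-01A", "TCGA-XX-0003-11A"),
  batch    = c("A", "A", "A", "B", "B", "B"),
  VAF      = c(0.30, 0.34, 0.42, 0.40, 0.44, 0.52),
  stringsAsFactors = FALSE
)

# Characters 14-15 of the barcode give the sample type code:
# 01-09 are tumor samples, 10-19 are normal samples
type.code <- as.integer(substr(MACP.toy$lib.name, 14, 15))
MACP.toy$type <- ifelse(type.code < 10, "tumor", "normal")

# Mean VAF per batch (rows) and sample type (columns)
batch.means <- tapply(MACP.toy$VAF, list(MACP.toy$batch, MACP.toy$type), mean)

# Shift of tumor VAF relative to the normal controls of the same batch
batch.shift <- batch.means[, "tumor"] - batch.means[, "normal"]
```

In this toy example the two batches differ in absolute VAF by 0.10, yet the tumor samples sit 0.10 below the normals within each batch, which is the pattern described above.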


### VAF and CMS reanalyzed

Break down the CMS classification per batch, and compare VAF values between CMS groups.


The batches `116.77.0`, `76.78.0`, `138.78.0`, and `89.77.0` each have a large number of samples, are internally consistent, and include normal controls within the batch, so they were used to test whether there is a correlation between CMS group and hypo-macp.

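Batch `116.77.0` shows a significant difference between CMS groups by one-way ANOVA (p = 0.0143, unadjusted), but the Tukey follow-up finds no single pairwise comparison significant; the closest is CMS3 vs. CMS2 (adjusted p = 0.053). The other three batches listed above do not replicate this finding, although they may trend towards a decrease in CMS2. Among the remaining batches in the full output below, only `33.79.0` reaches p < 0.05 (p = 0.046, unadjusted), again with no significant pairwise comparison.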

![VAF/CMS 1](../../data2/tcga_batch/plot/macpVAF.cms.batch_116.77.0.png) 
![VAF/CMS 2](../../data2/tcga_batch/plot/macpVAF.cms.batch_76.78.0.png)
![VAF/CMS 3](../../data2/tcga_batch/plot/macpVAF.cms.batch_138.78.0.png)
![VAF/CMS 4](../../data2/tcga_batch/plot/macpVAF.cms.batch_89.77.0.png)
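
The per-batch test can be sketched as a loop that fits `aov(VAF ~ cms)` within each batch and follows up with `TukeyHSD`, matching the `Fit:` lines in the output below. This is a minimal sketch on random toy data, not the original script; the skip rule for batches with too few groups or samples is an assumption about how such batches might be handled.

```r
set.seed(1)

# Toy data: random VAF values for two hypothetical batches
MACP.toy <- data.frame(
  batch = rep(c("A", "B"), each = 12),
  cms   = rep(c("CMS1", "CMS2", "CMS3"), times = 8),
  VAF   = runif(24, min = 0.2, max = 0.5),
  stringsAsFactors = FALSE
)

results <- list()
for (b in unique(MACP.toy$batch)) {
  MACP.batch <- MACP.toy[MACP.toy$batch == b, ]
  MACP.batch$cms <- factor(MACP.batch$cms)

  # Skip batches without at least two groups and some residual degrees of freedom
  n.groups <- nlevels(MACP.batch$cms)
  if (n.groups < 2 || nrow(MACP.batch) <= n.groups) next

  # One-way ANOVA of VAF by CMS group, then Tukey pairwise comparisons
  fit <- aov(VAF ~ cms, data = MACP.batch)
  results[[b]] <- list(
    p.anova = summary(fit)[[1]][["Pr(>F)"]][1],
    tukey   = TukeyHSD(fit)
  )
}
```

Storing the ANOVA p-value alongside the Tukey table makes it straightforward to collect one p-value per batch afterwards.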


#### All Stat Tests


```
[1] "Batch: 76.78.0"
[1] ""
            Df Sum Sq Mean Sq F value Pr(>F)
cms          4 0.0881 0.02202   0.818  0.526
Residuals   24 0.6461 0.02692               
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = VAF ~ cms, data = MACP.batch)

$cms
                   diff        lwr       upr     p adj
CMS2-CMS1   0.068785980 -0.2494036 0.3869755 0.9674370
CMS3-CMS1  -0.087372505 -0.4820381 0.3072931 0.9645496
CMS4-CMS1   0.006826659 -0.3154164 0.3290698 0.9999963
NOLBL-CMS1 -0.068913107 -0.4380889 0.3002627 0.9808993
CMS3-CMS2  -0.156158485 -0.4743480 0.1620311 0.6055123
CMS4-CMS2  -0.061959321 -0.2840500 0.1601314 0.9212515
NOLBL-CMS2 -0.137699087 -0.4236615 0.1482633 0.6221757
CMS4-CMS3   0.094199164 -0.2280439 0.4164423 0.9081439
NOLBL-CMS3  0.018459398 -0.3507164 0.3876352 0.9998858
NOLBL-CMS4 -0.075739766 -0.3662058 0.2147262 0.9372712

[1] "Batch: 29.82.0"
[1] ""
[1] "Batch: 28.85.0"
[1] ""
            Df Sum Sq  Mean Sq F value Pr(>F)
cms          4 0.0122 0.003061    0.16  0.956
Residuals   23 0.4393 0.019100               
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = VAF ~ cms, data = MACP.batch)

$cms
                   diff        lwr       upr     p adj
CMS2-CMS1   0.093092963 -0.3288315 0.5150174 0.9644745
CMS3-CMS1   0.074944411 -0.3967815 0.5466703 0.9893656
CMS4-CMS1   0.099843578 -0.3476749 0.5473620 0.9630428
NOLBL-CMS1  0.117432076 -0.3393145 0.5741787 0.9394179
CMS3-CMS2  -0.018148552 -0.2765234 0.2402263 0.9995529
CMS4-CMS2   0.006750615 -0.2042116 0.2177128 0.9999804
NOLBL-CMS2  0.024339113 -0.2055516 0.2542299 0.9977602
CMS4-CMS3   0.024899166 -0.2734465 0.3232448 0.9991176
NOLBL-CMS3  0.042487665 -0.2695297 0.3545050 0.9940785
NOLBL-CMS4  0.017588499 -0.2564595 0.2916365 0.9996872

[1] "Batch: 116.77.0"
[1] ""
            Df  Sum Sq  Mean Sq F value Pr(>F)  
cms          4 0.09716 0.024289   3.522 0.0143 *
Residuals   43 0.29659 0.006897                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = VAF ~ cms, data = MACP.batch)

$cms
                  diff           lwr        upr     p adj
CMS2-CMS1  -0.10621445 -0.2273518618 0.01492296 0.1104153
CMS3-CMS1  -0.01438454 -0.1419086440 0.11313956 0.9976072
CMS4-CMS1  -0.04530513 -0.1697259910 0.07911573 0.8369462
NOLBL-CMS1  0.03145403 -0.1412144225 0.20412248 0.9850197
CMS3-CMS2   0.09182991 -0.0007761007 0.18443592 0.0529221
CMS4-CMS2   0.06090932 -0.0273744662 0.14919311 0.3005356
NOLBL-CMS2  0.13766848 -0.0110859164 0.28642288 0.0816604
CMS4-CMS3  -0.03092059 -0.1277820977 0.06594092 0.8919668
NOLBL-CMS3  0.04583857 -0.1081614148 0.19983856 0.9140904
NOLBL-CMS4  0.07675916 -0.0746810917 0.22819941 0.6040186

[1] "Batch: 41.78.0"
[1] ""
            Df Sum Sq Mean Sq F value Pr(>F)
cms          4 0.0775 0.01938   0.941  0.449
Residuals   43 0.8854 0.02059               
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = VAF ~ cms, data = MACP.batch)

$cms
                    diff        lwr        upr     p adj
CMS2-CMS1   0.0294009311 -0.1266027 0.18540457 0.9829979
CMS3-CMS1  -0.0316366816 -0.2953310 0.23205759 0.9969585
CMS4-CMS1  -0.0314182987 -0.2178783 0.15504171 0.9888202
NOLBL-CMS1 -0.0820907842 -0.2622283 0.09804670 0.6942069
CMS3-CMS2  -0.0610376128 -0.3180550 0.19597977 0.9605463
CMS4-CMS2  -0.0608192298 -0.2377107 0.11607227 0.8632758
NOLBL-CMS2 -0.1114917153 -0.2817056 0.05872221 0.3513788
CMS4-CMS3   0.0002183829 -0.2763465 0.27678327 1.0000000
NOLBL-CMS3 -0.0504541026 -0.3227964 0.22188817 0.9840482
NOLBL-CMS4 -0.0506724855 -0.2491743 0.14782935 0.9491098

[1] "Batch: 123.74.0"
[1] ""
            Df  Sum Sq  Mean Sq F value Pr(>F)
cms          4 0.01309 0.003273   0.418  0.794
Residuals   25 0.19563 0.007825               
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = VAF ~ cms, data = MACP.batch)

$cms
                    diff        lwr       upr     p adj
CMS2-CMS1  -0.0009391152 -0.1318641 0.1299858 1.0000000
CMS3-CMS1   0.0755654040 -0.1327349 0.2838657 0.8221582
CMS4-CMS1   0.0189240978 -0.1155332 0.1533814 0.9934816
NOLBL-CMS1  0.0360572754 -0.1267786 0.1988931 0.9649794
CMS3-CMS2   0.0765045192 -0.1265875 0.2795965 0.8016395
CMS4-CMS2   0.0198632129 -0.1063750 0.1461014 0.9900446
NOLBL-CMS2  0.0369963905 -0.1191217 0.1931144 0.9554819
CMS4-CMS3  -0.0566413062 -0.2620282 0.1487456 0.9251111
NOLBL-CMS3 -0.0395081286 -0.2644982 0.1854819 0.9849767
NOLBL-CMS4  0.0171331776 -0.1419588 0.1762252 0.9976797

[1] "Batch: 138.78.0"
[1] ""
            Df Sum Sq Mean Sq F value Pr(>F)
cms          4 0.0588 0.01469   1.238   0.31
Residuals   41 0.4866 0.01187               
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = VAF ~ cms, data = MACP.batch)

$cms
                   diff         lwr       upr     p adj
CMS2-CMS1   0.056247518 -0.10037390 0.2128689 0.8425680
CMS3-CMS1   0.031966062 -0.13415604 0.1980882 0.9814615
CMS4-CMS1   0.099121436 -0.03931365 0.2375565 0.2645385
NOLBL-CMS1  0.051120559 -0.13085709 0.2330982 0.9286782
CMS3-CMS2  -0.024281456 -0.18090288 0.1323400 0.9917657
CMS4-CMS2   0.042873918 -0.08400393 0.1697518 0.8695604
NOLBL-CMS2 -0.005126959 -0.17847510 0.1682212 0.9999880
CMS4-CMS3   0.067155373 -0.07127971 0.2055905 0.6411561
NOLBL-CMS3  0.019154497 -0.16282315 0.2011321 0.9981548
NOLBL-CMS4 -0.048000877 -0.20511098 0.1091092 0.9056272

[1] "Batch: 89.77.0"
[1] ""
            Df Sum Sq  Mean Sq F value Pr(>F)
cms          4 0.0174 0.004341   0.376  0.824
Residuals   36 0.4156 0.011546               
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = VAF ~ cms, data = MACP.batch)

$cms
                    diff         lwr       upr     p adj
CMS2-CMS1  -0.0257630866 -0.15801913 0.1064930 0.9800560
CMS3-CMS1   0.0873980523 -0.23978533 0.4145814 0.9384336
CMS4-CMS1   0.0006891637 -0.14563169 0.1470100 1.0000000
NOLBL-CMS1  0.0102465139 -0.16560927 0.1861023 0.9998155
CMS3-CMS2   0.1131611389 -0.20425337 0.4305756 0.8428648
CMS4-CMS2   0.0264522504 -0.09648186 0.1493864 0.9713103
NOLBL-CMS2  0.0360096006 -0.12092433 0.1929435 0.9638501
CMS4-CMS3  -0.0867088886 -0.41023648 0.2368187 0.9377292
NOLBL-CMS3 -0.0771515384 -0.41506508 0.2607620 0.9644876
NOLBL-CMS4  0.0095573502 -0.15939942 0.1785141 0.9998359

[1] "Batch: 157.71.0"
[1] ""
            Df Sum Sq Mean Sq F value Pr(>F)
cms          3 0.0327 0.01090   0.989  0.415
Residuals   23 0.2536 0.01102               
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = VAF ~ cms, data = MACP.batch)

$cms
                 diff         lwr        upr     p adj
CMS2-CMS1 -0.09796660 -0.25961741 0.06368422 0.3579995
CMS3-CMS1 -0.02870456 -0.23415916 0.17675004 0.9798770
CMS4-CMS1 -0.05486808 -0.20233118 0.09259502 0.7340745
CMS3-CMS2  0.06926204 -0.13124112 0.26976520 0.7752485
CMS4-CMS2  0.04309852 -0.09738383 0.18358087 0.8305333
CMS4-CMS3 -0.02616352 -0.21541441 0.16308736 0.9804780

[1] "Batch: 300.56.0"
[1] ""
            Df  Sum Sq  Mean Sq F value Pr(>F)
cms          2 0.01249 0.006244   0.713  0.516
Residuals    9 0.07880 0.008755               
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = VAF ~ cms, data = MACP.batch)

$cms
                 diff        lwr       upr     p adj
CMS3-CMS2 0.009787615 -0.1897402 0.2093155 0.9897239
CMS4-CMS2 0.069197368 -0.1060499 0.2444447 0.5362928
CMS4-CMS3 0.059409753 -0.1313754 0.2501949 0.6717371

[1] "Batch: 36.83.0"
[1] ""
            Df   Sum Sq   Mean Sq F value Pr(>F)
cms          4 0.003489 0.0008722   1.172  0.347
Residuals   25 0.018604 0.0007442               
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = VAF ~ cms, data = MACP.batch)

$cms
                   diff         lwr        upr     p adj
CMS2-CMS1  -0.021741256 -0.07271185 0.02922934 0.7213337
CMS3-CMS1  -0.008680727 -0.07409526 0.05673381 0.9947963
CMS4-CMS1  -0.019300196 -0.07595084 0.03735045 0.8526033
NOLBL-CMS1  0.007264335 -0.05392536 0.06845403 0.9966126
CMS3-CMS2   0.013060529 -0.03791006 0.06403112 0.9416213
CMS4-CMS2   0.002441060 -0.03665160 0.04153372 0.9997282
NOLBL-CMS2  0.029005591 -0.01641598 0.07442716 0.3557424
CMS4-CMS3  -0.010619469 -0.06727012 0.04603118 0.9808642
NOLBL-CMS3  0.015945062 -0.04524463 0.07713475 0.9381634
NOLBL-CMS4  0.026564531 -0.02515020 0.07827926 0.5666790

[1] "Batch: 45.80.0"
[1] ""
            Df    Sum Sq   Mean Sq
cms          3 0.0005848 0.0001949
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = VAF ~ cms, data = MACP.batch)

$cms
                    diff lwr upr p adj
CMS2-CMS1  -0.0049986887 NaN NaN   NaN
CMS4-CMS1   0.0238863287 NaN NaN   NaN
NOLBL-CMS1 -0.0057144199 NaN NaN   NaN
CMS4-CMS2   0.0288850174 NaN NaN   NaN
NOLBL-CMS2 -0.0007157312 NaN NaN   NaN
NOLBL-CMS4 -0.0296007486 NaN NaN   NaN

[1] "Batch: 33.79.0"
[1] ""
            Df  Sum Sq  Mean Sq F value Pr(>F)  
cms          4 0.05204 0.013009   3.358  0.046 *
Residuals   12 0.04649 0.003874                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = VAF ~ cms, data = MACP.batch)

$cms
                  diff         lwr       upr     p adj
CMS2-CMS1   0.08723902 -0.12486141 0.2993394 0.6900263
CMS3-CMS1   0.21667519 -0.02631637 0.4596667 0.0892179
CMS4-CMS1   0.07679835 -0.14053991 0.2941366 0.7901872
NOLBL-CMS1  0.17530668 -0.06768488 0.4182982 0.2103806
CMS3-CMS2   0.12943617 -0.02963914 0.2885115 0.1335874
CMS4-CMS2  -0.01044067 -0.12661285 0.1057315 0.9983080
NOLBL-CMS2  0.08806766 -0.07100766 0.2471430 0.4347741
CMS4-CMS3  -0.13987684 -0.30587168 0.0261180 0.1151845
NOLBL-CMS3 -0.04136851 -0.23977029 0.1570333 0.9603948
NOLBL-CMS4  0.09850833 -0.06748651 0.2645032 0.3714838

[1] "Batch: 30.80.0"
[1] ""
            Df  Sum Sq  Mean Sq F value Pr(>F)
cms          4 0.03411 0.008526   0.393  0.809
Residuals   10 0.21685 0.021685               
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = VAF ~ cms, data = MACP.batch)

$cms
                   diff        lwr       upr     p adj
CMS2-CMS1  -0.018563874 -0.3120289 0.2749012 0.9995035
CMS3-CMS1   0.040846958 -0.3646333 0.4463272 0.9969405
CMS4-CMS1   0.172359905 -0.3585383 0.7032581 0.8182348
NOLBL-CMS1 -0.011682141 -0.5425804 0.5192161 0.9999925
CMS3-CMS2   0.059410832 -0.3362973 0.4551190 0.9861281
CMS4-CMS2   0.190923779 -0.3325489 0.7143965 0.7515401
NOLBL-CMS2  0.006881733 -0.5165910 0.5303544 0.9999990
CMS4-CMS3   0.131512947 -0.4620493 0.7250752 0.9446944
NOLBL-CMS3 -0.052529099 -0.6460913 0.5410332 0.9981482
NOLBL-CMS4 -0.184042046 -0.8694287 0.5013446 0.8965756

[1] "Batch: 66.81.0"
[1] ""
            Df  Sum Sq Mean Sq F value Pr(>F)
cms          3 0.08189 0.02730   2.277  0.157
Residuals    8 0.09590 0.01199               
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = VAF ~ cms, data = MACP.batch)

$cms
                   diff         lwr       upr     p adj
CMS2-CMS1   0.036650601 -0.21941045 0.2927117 0.9660708
CMS3-CMS1   0.004713347 -0.28157161 0.2909983 0.9999427
NOLBL-CMS1  0.310862899 -0.09400517 0.7157310 0.1423333
CMS3-CMS2  -0.031937254 -0.28799830 0.2241238 0.9769807
NOLBL-CMS2  0.274212298 -0.10987928 0.6583039 0.1804487
NOLBL-CMS3  0.306149552 -0.09871852 0.7110176 0.1498735

[1] "Batch: 132.71.0"
[1] ""
            Df  Sum Sq  Mean Sq F value Pr(>F)
cms          3 0.02485 0.008283   1.097  0.405
Residuals    8 0.06042 0.007553               
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = VAF ~ cms, data = MACP.batch)

$cms
                  diff        lwr       upr     p adj
CMS2-CMS1   0.01194094 -0.2886698 0.3125517 0.9992005
CMS3-CMS1   0.05233410 -0.2588276 0.3634958 0.9470580
NOLBL-CMS1 -0.12296258 -0.5165545 0.2706293 0.7536010
CMS3-CMS2   0.04039316 -0.1392561 0.2200425 0.8864457
NOLBL-CMS2 -0.13490352 -0.4355143 0.1657073 0.5130104
NOLBL-CMS3 -0.17529668 -0.4864584 0.1358650 0.3379948

[1] "Batch: 172.69.0"
[1] ""
[1] "Batch: 154.69.0"
[1] ""
            Df  Sum Sq  Mean Sq F value Pr(>F)
cms          2 0.00623 0.003113   0.292  0.761
Residuals    4 0.04263 0.010658               
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = VAF ~ cms, data = MACP.batch)

$cms
                  diff        lwr       upr     p adj
CMS3-CMS2  0.002207149 -0.3164429 0.3208572 0.9996641
NOLBL-CMS2 0.086654112 -0.3639851 0.5372934 0.7841145
NOLBL-CMS3 0.084446963 -0.3269285 0.4958224 0.7596589
```
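
Because one ANOVA is run per batch, the unadjusted p-values above are subject to multiple testing. As a sketch, the 15 ANOVA p-values reported in the output (batches `29.82.0`, `172.69.0` and `45.80.0` produced none) can be passed to `p.adjust`; the choice of the Benjamini-Hochberg method here is an assumption, not something the original analysis specifies.

```r
# Unadjusted one-way ANOVA p-values copied from the output above
p.anova <- c(
  "76.78.0"  = 0.526, "28.85.0"  = 0.956, "116.77.0" = 0.0143,
  "41.78.0"  = 0.449, "123.74.0" = 0.794, "138.78.0" = 0.31,
  "89.77.0"  = 0.824, "157.71.0" = 0.415, "300.56.0" = 0.516,
  "36.83.0"  = 0.347, "33.79.0"  = 0.046, "30.80.0"  = 0.809,
  "66.81.0"  = 0.157, "132.71.0" = 0.405, "154.69.0" = 0.761
)

# Benjamini-Hochberg adjustment across all batches tested
p.bh <- p.adjust(p.anova, method = "BH")

# Smallest adjusted p-values first
head(sort(p.bh), 3)
```

With 15 tests, the smallest adjusted value is 0.0143 × 15 = 0.2145 for batch `116.77.0`, so no batch remains below 0.05 after adjustment.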

### Batch + CMS Normalized Comparison

Within a single batch and CMS group, create a comparison set of hypo-modified vs. normo-modified CRC samples.

```
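# Batch selection: one batch and one CMS group at a time
batch <- '116.77.0'
cms   <- 'CMS2'

# Subset to that batch/CMS pair and sort by ascending VAF
MACP.select <- MACP[ which(MACP$cms == cms & MACP$batch == batch), ]
MACP.select <- MACP.select[ order(MACP.select$VAF), ]
# Columns shown: lib.name, pt, batch, VAF, hypo.macp
MACP.select[, c(2,4,10,14,15)]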

====================================================================

             lib.name           pt    batch       VAF hypo.macp
2487 TCGA-D5-6532-01A TCGA-D5-6532 116.77.0 0.1143430     .hypo *
2256 TCGA-AA-3712-01A TCGA-AA-3712 116.77.0 0.2150752     .hypo *
2556 TCGA-G4-6298-01A TCGA-G4-6298 116.77.0 0.2157952     .hypo *
2380 TCGA-AU-3779-01A TCGA-AU-3779 116.77.0 0.2779931     .hypo *
2568 TCGA-G4-6317-01A TCGA-G4-6317 116.77.0 0.3044773    .normo *
2554 TCGA-G4-6295-01A TCGA-G4-6295 116.77.0 0.3149887    .normo
2493 TCGA-D5-6538-01A TCGA-D5-6538 116.77.0 0.3346310    .normo
2562 TCGA-G4-6307-01A TCGA-G4-6307 116.77.0 0.3408818    .normo
2253 TCGA-AA-3697-01A TCGA-AA-3697 116.77.0 0.3434516    .normo
2231 TCGA-AA-3660-01A TCGA-AA-3660 116.77.0 0.3670422    .normo
2492 TCGA-D5-6537-01A TCGA-D5-6537 116.77.0 0.3807558    .normo
2143 TCGA-A6-5667-01A TCGA-A6-5667 116.77.0 0.3960426    .normo *
2233 TCGA-AA-3662-01A TCGA-AA-3662 116.77.0 0.4062256    .normo * 
2552 TCGA-G4-6293-01A TCGA-G4-6293 116.77.0 0.4151551    .normo *
2567 TCGA-G4-6315-01A TCGA-G4-6315 116.77.0 0.4326472    .normo *
2488 TCGA-D5-6533-01A TCGA-D5-6533 116.77.0 0.4345105    .normo *
```



```
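# Batch selection: same batch, CMS4 group
batch <- '116.77.0'
cms   <- 'CMS4'

# Subset to that batch/CMS pair and sort by ascending VAF
MACP.select <- MACP[ which(MACP$cms == cms & MACP$batch == batch), ]
MACP.select <- MACP.select[ order(MACP.select$VAF), ]
# Columns shown: lib.name, pt, batch, VAF, hypo.macp
MACP.select[, c(2,4,10,14,15)]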
====================================================================

             lib.name           pt    batch       VAF hypo.macp
2566 TCGA-G4-6314-01A TCGA-G4-6314 116.77.0 0.2554967     .hypo *
2451 TCGA-CM-5344-01A TCGA-CM-5344 116.77.0 0.2678113     .hypo *
2538 TCGA-F4-6463-01A TCGA-F4-6463 116.77.0 0.3092360    .normo *
2453 TCGA-CM-5349-01A TCGA-CM-5349 116.77.0 0.3221711    .normo *
2565 TCGA-G4-6311-01A TCGA-G4-6311 116.77.0 0.3324151    .normo *
2085 TCGA-A6-2675-01A TCGA-A6-2675 116.77.0 0.3413247    .normo
2555 TCGA-G4-6297-01A TCGA-G4-6297 116.77.0 0.3836968    .normo
2452 TCGA-CM-5348-01A TCGA-CM-5348 116.77.0 0.4278800    .normo
2491 TCGA-D5-6536-01A TCGA-D5-6536 116.77.0 0.4371242    .normo *
2558 TCGA-G4-6302-01A TCGA-G4-6302 116.77.0 0.4777553    .normo *
2385 TCGA-AY-6196-01A TCGA-AY-6196 116.77.0 0.5022196    .normo *
2496 TCGA-D5-6541-01A TCGA-D5-6541 116.77.0 0.5104515    .normo *
2564 TCGA-G4-6310-01A TCGA-G4-6310 116.77.0 0.5256267    .normo *
```
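
The two selections above repeat the same subset-and-sort steps, so they could be wrapped in a small helper. The function name `select_batch_cms` is hypothetical, and the toy data frame uses invented values with placeholder `hypo.macp` labels; selecting columns by name instead of by position is a design choice of this sketch that keeps it working if the column order of `MACP` changes.

```r
# Hypothetical helper: subset one batch/CMS pair and sort by ascending VAF
select_batch_cms <- function(df, batch, cms,
                             cols = c("lib.name", "pt", "batch", "VAF", "hypo.macp")) {
  keep <- which(df$cms == cms & df$batch == batch)
  sel  <- df[keep, , drop = FALSE]
  sel  <- sel[order(sel$VAF), , drop = FALSE]
  # Return only the requested columns that are actually present
  sel[, intersect(cols, names(sel)), drop = FALSE]
}

# Toy data only: values and labels are invented
MACP.toy <- data.frame(
  lib.name  = c("TCGA-XX-0001-01A", "TCGA-XX-0002-01A",
                "TCGA-XX-0003-01A", "TCGA-XX-0004-01A"),
  pt        = c("TCGA-XX-0001", "TCGA-XX-0002", "TCGA-XX-0003", "TCGA-XX-0004"),
  batch     = c("A", "A", "A", "B"),
  cms       = c("CMS2", "CMS2", "CMS4", "CMS2"),
  VAF       = c(0.41, 0.22, 0.35, 0.30),
  hypo.macp = c(".normo", ".hypo", ".normo", ".normo"),
  stringsAsFactors = FALSE
)

sel <- select_batch_cms(MACP.toy, batch = "A", cms = "CMS2")
```

On the real table, a call such as `select_batch_cms(MACP, '116.77.0', 'CMS4')` would be expected to reproduce the second listing above, assuming `MACP` carries the column names shown in its header.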