# General file editing

## Purpose 
The following commands are general file-handling steps that are not specific to any one program. The associated scripts are in the [General](https://github.com/brittanyhowell/sv_discovery/blob/master/general) folder of [SV_discovery](https://github.com/brittanyhowell/sv_discovery).

## April 12

### Make a bam index for all 4760 chr20 only bams
Commit: [moved file](https://github.com/brittanyhowell/sv_discovery/blob/master/general/indexBAM.sh)
```
bsub -J "BDArray[1-4760]" -o /nfs/team151/bh10/scripts/breakdancer_bh10/output/indexBAM-%I-%J.out -e /nfs/team151/bh10/scripts/breakdancer_bh10/output/indexBAM-%I-%J.err "/nfs/team151/bh10/scripts/breakdancer_bh10/indexBAM.sh /nfs/team151/bh10/scripts/genomestrip_bh10/fileLists/bamList.txt"
```
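
Each element of an LSF job array sees its own index in the `LSB_JOBINDEX` environment variable, which is how a script can pick one BAM per element from the list passed as its argument. A minimal sketch of that lookup, assuming the list holds one path per line; the helper `pick_input` and the demo list are hypothetical and not taken from `indexBAM.sh`, which may do this differently:

```shell
#!/usr/bin/env bash
set -eu

# Print the line of a file list that matches a 1-based array index.
# Hypothetical helper, not the repository's implementation.
pick_input() {
    local list="$1" index="$2"
    sed -n "${index}p" "$list"
}

# Demo with a throwaway list so the sketch runs without LSF.
demo_list="$(mktemp)"
printf '%s\n' sampleA.bam sampleB.bam sampleC.bam > "$demo_list"

# Under bsub -J "name[1-N]", LSF sets LSB_JOBINDEX; default to 2 for the demo.
index="${LSB_JOBINDEX:-2}"
picked="$(pick_input "$demo_list" "$index")"
echo "array element ${index} would process: ${picked}"
rm -f "$demo_list"
```

The array range in `-J "BDArray[1-4760]"` should match the number of lines in the list, otherwise some elements get an empty path or some files are never processed.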


## April 18

### Filter bams based on standard chromosomes, run on WG bams
Commit: [added reporting echoes](https://github.com/brittanyhowell/sv_discovery/blob/master/general/extract_std_chr.sh)
```
bsub -n3 -J  "filter[1-226]" -o /nfs/team151/bh10/scripts/bh10_general/output/filter/filter_std_chr%I-%J.out -e /nfs/team151/bh10/scripts/bh10_general/output/filter/filter_std_chr%I-%J.err -R"select[mem>2000] rusage[mem=2000] span[hosts=1]" -M2000 /nfs/team151/bh10/scripts/bh10_general/extract_std_chr.sh

Job <9600455> is submitted to default queue <normal>.
```


## April 20

### Index 225 WG Bams
Commit: [rename file, change DIRs](https://github.com/brittanyhowell/sv_discovery/blob/master/general/indexSAM.sh)

```
bsub -J "BDArray[1-225]" -o /nfs/team151/bh10/scripts/bh10_general/output/indexBAM-%J-%I.out -e /nfs/team151/bh10/scripts/bh10_general/output/indexBAM-%J-%I.err "/nfs/team151/bh10/scripts/bh10_general/indexBAM.sh /lustre/scratch115/projects/interval_wgs/WGbams_std/WG_225_std.list"

Job <9667951> is submitted to default queue <normal>.
```


## April 26 

### Convert 3724 WG CRAMs to BAMs
Commit: [Changed for 3724 WG CRAMs](https://github.com/brittanyhowell/sv_discovery/blob/master/general/CRAMtoBAM.sh)

```
bsub -J "cramArray[1-3724]" -o /nfs/team151/bh10/scripts/bh10_general/output/CRAMtoBAM/CRAM-to-BAM-%J-%I.out -e /nfs/team151/bh10/scripts/bh10_general/output/CRAMtoBAM/CRAM-to-BAM-%J-%I.err /nfs/team151/bh10/scripts/bh10_general/CRAMtoBAM.sh

Job <218379> is submitted to default queue <normal>.
```
#### Failed: wrong list of files

### Use correct file list!
Commit: [correct file list this time I promise](https://github.com/brittanyhowell/sv_discovery/blob/master/general/CRAMtoBAM.sh)

```
bsub -J "cramArray[1-3724]" -o /nfs/team151/bh10/scripts/bh10_general/output/CRAMtoBAM/CRAM-to-BAM-%J-%I.out -e /nfs/team151/bh10/scripts/bh10_general/output/CRAMtoBAM/CRAM-to-BAM-%J-%I.err /nfs/team151/bh10/scripts/bh10_general/CRAMtoBAM.sh

Job <218702> is submitted to default queue <normal>.
```

#### Output: (.err)
```
cat *218702*.err 
```

### Index WG BAMs 

Commit: [change for 3724 WG bams](https://github.com/brittanyhowell/sv_discovery/blob/master/general/indexBAM.sh)
```
bsub -J "BDArray[1-3724]" -o /nfs/team151/bh10/scripts/bh10_general/output/indexBAM-%J-%I.out -e /nfs/team151/bh10/scripts/bh10_general/output/indexBAM-%J-%I.err "/nfs/team151/bh10/scripts/bh10_general/indexBAM.sh /nfs/team151/bh10/scripts/bh10_general/fileLists/WG_bams.list"

Job <256158> is submitted to default queue <normal>.
```


#### Output: .out file
```
/nfs/team151/bh10/scripts/bh10_general/output - 

ls *.out | wc -l:  3724
cat *.out | grep -c "Successfully completed.": 3724

```
Ah wait. Just because the job "successfully completed" doesn't mean it successfully did what it was meant to do.

#### Output: Any .err file
```
line 34: /lustre/scratch115/projects/interval_wgs/WGbams_std/EGAN00001345051.bam.bai: No such file or directory
```
Issue: I forgot to update the code on the FARM before it ran, so it still operated on the previous DIRs.
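
The "Successfully completed." count only says that each job exited with status 0. A more direct check is to confirm that every BAM in the list has a non-empty index beside it. A minimal sketch, assuming the index is written as `<bam>.bai` (the naming seen in the error above); `missing_indexes` is a hypothetical helper and the demo files are placeholders:

```shell
#!/usr/bin/env bash
set -eu

# List every BAM from a file list whose .bai index is missing or empty.
missing_indexes() {
    local list="$1" bam
    while IFS= read -r bam; do
        [ -n "$bam" ] || continue           # skip blank lines
        [ -s "${bam}.bai" ] || echo "$bam"  # -s: exists and is non-empty
    done < "$list"
}

# Demo in a scratch directory: one indexed BAM, one without an index.
demo_dir="$(mktemp -d)"
touch "$demo_dir/a.bam" "$demo_dir/b.bam"
echo "index" > "$demo_dir/a.bam.bai"
printf '%s\n' "$demo_dir/a.bam" "$demo_dir/b.bam" > "$demo_dir/bams.list"

missing="$(missing_indexes "$demo_dir/bams.list")"
echo "BAMs without an index: ${missing:-none}"
rm -rf "$demo_dir"
```

An empty result from `missing_indexes` against the real list would be stronger evidence than the job report count alone.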


## April 27

### Index WG BAMs
No changes to the command. The previous run failed because I forgot to update the code on the FARM, so it was looking for the old DIRs.

```
bsub -J "BDArray[1-3724]" -o /nfs/team151/bh10/scripts/bh10_general/output/indexBAM-%J-%I.out -e /nfs/team151/bh10/scripts/bh10_general/output/indexBAM-%J-%I.err "/nfs/team151/bh10/scripts/bh10_general/indexBAM.sh /nfs/team151/bh10/scripts/bh10_general/fileLists/WG_bams.list"

Job <257181> is submitted to default queue <normal>.
```
#### Output: 
```
cat indexBAM-257181-*.err - No printing.
cat *.out | grep -c "Successfully completed." : 3724. Nice. 
```
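
With thousands of array elements, `cat` on every `.err` file shows whether anything was printed but not which element printed it. A small sketch that lists only the non-empty `.err` files for one job ID; `nonempty_err_files` is a hypothetical helper and the demo files are made up:

```shell
#!/usr/bin/env bash
set -eu

# Print the .err files for one job ID that contain any output.
nonempty_err_files() {
    local dir="$1" job_id="$2"
    # -size +0 matches files that are larger than zero bytes.
    find "$dir" -type f -name "*-${job_id}-*.err" -size +0 | sort
}

# Demo: element 1 was quiet, element 2 wrote an error message.
demo_dir="$(mktemp -d)"
: > "$demo_dir/indexBAM-257181-1.err"
echo "No such file or directory" > "$demo_dir/indexBAM-257181-2.err"

noisy="$(nonempty_err_files "$demo_dir" 257181)"
echo "non-empty .err files: ${noisy:-none}"
rm -rf "$demo_dir"
```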


## May 16

### Intersect with GENCODE protein coding genes
Since I have thousands of samples, I am going to concatenate the per-chromosome files into one long list of protein coding genes, use it, and then delete it.

```
zcat chr*.gz | awk '$5=="protein_coding"' > all_chr_protein_coding.txt
```
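
The filter keeps rows whose fifth whitespace-separated column is exactly `protein_coding`. A small sketch of the same awk test on made-up rows; the column layout used here (chromosome, start, end, gene, biotype) is an assumption for illustration and not necessarily the layout of the `chr*.gz` files:

```shell
#!/usr/bin/env bash
set -eu

# Made-up rows: chromosome, start, end, gene name, biotype.
rows="$(printf '%s\n' \
    'chr1 100 200 GENE_A protein_coding' \
    'chr1 300 400 GENE_B lincRNA' \
    'chr2 500 600 GENE_C protein_coding')"

# Same test as the one-liner: print a row only if column 5 matches exactly.
kept="$(printf '%s\n' "$rows" | awk '$5=="protein_coding"')"
printf '%s\n' "$kept"
```

If any field can contain spaces, passing `-F'\t'` to awk keeps the columns aligned in a tab-separated file.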



### Intersect

#### intersect-genes.go
Commit: [changed output file format to make merging more logical](https://github.com/brittanyhowell/sv_discovery/blob/master/general/intersect_genes.go)

#### run-intersect-genes.sh
Commit: [set up for array](https://github.com/brittanyhowell/sv_discovery/blob/master/general/run-intersect-genes.sh)
```
bsub -J "int[1-3643]" -o /nfs/team151/bh10/scripts/bh10_general/output/intersectBD/intersect-%J-%I.out -e /nfs/team151/bh10/scripts/bh10_general/output/intersectBD/intersect-%J-%I.err "/nfs/team151/bh10/scripts/bh10_general/run-intersect-genes.sh" 

Job <593957> is submitted to default queue <normal>.
```

### intersect again
I made the output more verbose so I could investigate a few options and make sure it was working

#### intersect-genes.go
Commit: [more verbose output](https://github.com/brittanyhowell/sv_discovery/blob/master/general/intersect_genes.go)

#### run-intersect-genes.sh
Commit: [change outDIR](https://github.com/brittanyhowell/sv_discovery/blob/master/general/run-intersect-genes.sh)
```
bsub -J "int[1-3643]" -o /nfs/team151/bh10/scripts/bh10_general/output/intersectBD/intersect-%J-%I.out -e /nfs/team151/bh10/scripts/bh10_general/output/intersectBD/intersect-%J-%I.err "/nfs/team151/bh10/scripts/bh10_general/run-intersect-genes.sh" 

Job <594192> is submitted to default queue <normal>.
```
### Intersect, once more
Honestly, there were many more attempts in the middle of these ones, but this one works.

#### intersect-genes.go
Commit: [The issue with sex chromosomes](https://github.com/brittanyhowell/sv_discovery/blob/master/general/intersect_genes.go)

#### run-intersect-genes.sh
Commit: [The issue with sex chromosomes](https://github.com/brittanyhowell/sv_discovery/blob/master/general/run-intersect-genes.sh)

```
bsub -n3  -J "int[1-3642]" -o /nfs/team151/bh10/scripts/bh10_general/output/intersectBD/intersect-%J-%I.out -e /nfs/team151/bh10/scripts/bh10_general/output/intersectBD/intersect-%J-%I.err  -R"select[mem>2000] rusage[mem=2000] span[hosts=1]" -M2000 "/nfs/team151/bh10/scripts/bh10_general/run-intersect-genes.sh" 

Job <598780> is submitted to default queue <normal>.
```


### Intersect again!

No need to merge chr 23 with 24, it's both a waste of time *AND* inaccurate.

#### intersect-genes.go
Commit: [The issue with sex chromosomes](https://github.com/brittanyhowell/sv_discovery/blob/master/general/intersect_genes.go)

#### run-intersect-genes_BD.sh
Commit: [remove chr merge](https://github.com/brittanyhowell/sv_discovery/blob/master/general/run-intersect-genes_BD.sh)


```
bsub -n3  -J "int[1-3642]" -o /nfs/team151/bh10/scripts/bh10_general/output/intersectBD/intersect-%J-%I.out -e /nfs/team151/bh10/scripts/bh10_general/output/intersectBD/intersect-%J-%I.err  -R"select[mem>2000] rusage[mem=2000] span[hosts=1]" -M2000 "/nfs/team151/bh10/scripts/bh10_general/run-intersect-genes.sh" 
Job <600494> is submitted to default queue <normal>.
```


## May 17 

### Merge files 

Merges all intersects into one super power table to rule all tables.

I regret my commit. Why. 
#### run_merge_intersectGenes.sh
Commit: [hi i run a thing](https://github.com/brittanyhowell/sv_discovery/blob/master/general/run_merge_intersectGenes.sh)
```
bsub  -o /nfs/team151/bh10/scripts/bh10_general/output/merge-%I.out -e /nfs/team151/bh10/scripts/bh10_general/output/merge-%I.err -R"select[mem>1000] rusage[mem=1000]" -M1000 /nfs/team151/bh10/scripts/bh10_general/mergeTab.sh

Job <711848> is submitted to default queue <normal>.
```

## May 18

Resubmit with more memory: 

#### run_merge_intersectGenes.sh
Commit: [hi i run a thing](https://github.com/brittanyhowell/sv_discovery/blob/master/general/run_merge_intersectGenes.sh)

```
bsub  -o /nfs/team151/bh10/scripts/bh10_general/output/merge-%J.out -e /nfs/team151/bh10/scripts/bh10_general/output/merge-%J.err -R"select[mem>4000] rusage[mem=4000]" -M4000 /nfs/team151/bh10/scripts/bh10_general/mergeTab.sh

Job <725988> is submitted to default queue <normal>.
```
Failed after merging 400. I have over 3000. I may not have enough memory. 
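
One possible way to keep memory bounded is to merge in batches: split the list of per-sample tables into chunks, merge each chunk into an intermediate table, then merge the intermediates. The sketch below only shows the chunking step. The file names, the batch size of 200, and the placeholder where the merge would happen are all assumptions, and this is not what `mergeTab.sh` does:

```shell
#!/usr/bin/env bash
set -eu

work="$(mktemp -d)"
# Stand-in for the real list of per-sample intersect tables (hypothetical names).
seq 1 1000 | sed 's/^/sample_/; s/$/.intersect.txt/' > "$work/tables.list"

# Split the list into chunks of at most 200 paths: batch_aa, batch_ab, ...
split -l 200 "$work/tables.list" "$work/batch_"

n_batches=0
for batch in "$work"/batch_*; do
    n_batches=$((n_batches + 1))
    # Placeholder: a real run would merge the tables named in "$batch"
    # into one intermediate table here, then merge the intermediates.
    echo "batch ${n_batches}: $(wc -l < "$batch" | tr -d ' ') tables"
done
rm -rf "$work"
```

Each batch could also be submitted as its own array element, so that no single job has to hold all 3000+ tables.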

### Hey, let's try this one.

I'll look up a new way to merge the tables. For now, throw more memory at it. 
```
bsub  -o /nfs/team151/bh10/scripts/bh10_general/output/merge-%J.out -e /nfs/team151/bh10/scripts/bh10_general/output/merge-%J.err -R"select[mem>20000] rusage[mem=20000]" -M20000 /nfs/team151/bh10/scripts/bh10_general/mergeTab.sh

Job <726617> is submitted to default queue <normal>.
```

Failed after merging 2018 - it ran out of time.
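
Running out of memory and running out of time are both limits enforced by LSF rather than errors in the script. When LSF kills a job for exceeding a limit, it writes a termination reason such as `TERM_MEMLIMIT` or `TERM_RUNLIMIT` into the job report, so reports can be sorted by cause before deciding whether to raise `-M` or move to a queue with a longer run limit. A minimal sketch, assuming the reports are the `-o` files; `termination_reason` is a hypothetical helper and the demo reports are abbreviated and made up:

```shell
#!/usr/bin/env bash
set -eu

# Classify one LSF job report by how the job ended.
termination_reason() {
    local report="$1"
    if grep -q 'TERM_MEMLIMIT' "$report"; then
        echo "memory"
    elif grep -q 'TERM_RUNLIMIT' "$report"; then
        echo "runtime"
    elif grep -q 'Successfully completed\.' "$report"; then
        echo "ok"
    else
        echo "unknown"
    fi
}

# Demo with made-up report files so the sketch runs off the farm.
demo_dir="$(mktemp -d)"
echo "TERM_MEMLIMIT: job killed after reaching LSF memory usage limit." > "$demo_dir/merge-1.out"
echo "TERM_RUNLIMIT: job killed after reaching LSF run time limit." > "$demo_dir/merge-2.out"
echo "Successfully completed." > "$demo_dir/merge-3.out"

summary=""
for report in "$demo_dir"/merge-*.out; do
    reason="$(termination_reason "$report")"
    echo "$(basename "$report"): $reason"
    summary="${summary}${reason} "
done
rm -rf "$demo_dir"
```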


## May 21
### Run intersect.go on non genotyped discovery data

#### run-intersect-genes_GS_no_geno.sh
Commit: [we run this](https://github.com/brittanyhowell/sv_discovery/blob/master/general/run-intersect-genes_GS_no_geno.sh)

#### intersect_genes.go
Commit: [The issue with sex chromosomes](https://github.com/brittanyhowell/sv_discovery/blob/master/general/intersect_genes.go)
```
bsub  -o /nfs/team151/bh10/scripts/bh10_general/output/intersectGS/intersect-%J-%I.out -e /nfs/team151/bh10/scripts/bh10_general/output/intersectGS/intersect-%J-%I.err  -R"select[mem>1000] rusage[mem=1000]" -M1000  "/nfs/team151/bh10/scripts/bh10_general/run-intersect-genes_GS.sh" 

Job <911058> is submitted to default queue <normal>.
```


### Run intersect.go on genotyped CNV data

#### run-intersect-genes_GS_geno.sh
Commit: [we run this](https://github.com/brittanyhowell/sv_discovery/blob/master/general/run-intersect-genes_GS_geno.sh)

#### intersect_genes.go
Commit: [The issue with sex chromosomes](https://github.com/brittanyhowell/sv_discovery/blob/master/general/intersect_genes.go)

```
bsub -J "int[1-226]"  -o /nfs/team151/bh10/scripts/bh10_general/output/intersectGS/geno/intersect-%J-%I.out -e /nfs/team151/bh10/scripts/bh10_general/output/intersectGS/geno/intersect-%J-%I.err  -R"select[mem>1000] rusage[mem=1000]" -M1000  "/nfs/team151/bh10/scripts/bh10_general/run-intersect-genes_GS.sh" 

Job <923014> is submitted to default queue <normal>.
```

```
grep "Successfully completed." *.out | wc -l : 226
```



## May 22

### run-filter_repeats.sh

Run the repeat filter over the discovery set.

#### run-filter_repeats.sh
Commit: [arguments added - suited to run on disc. GS set](https://github.com/brittanyhowell/sv_discovery/blob/master/general/run-filter_repeats.sh)

#### filter-repeats.R
Commit: [arguments added - suited to run on disc. GS set](https://github.com/brittanyhowell/sv_discovery/blob/master/general/filter-repeats.R)

```
 bsub  -o /nfs/team151/bh10/scripts/bh10_general/output/repeatFilter/repFilt-%J-%I.out -e /nfs/team151/bh10/scripts/bh10_general/output/repeatFilter/repFilt-%J-%I.err  -R"select[mem>1000] rusage[mem=1000]" -M1000  "/nfs/team151/bh10/scripts/bh10_general/run-filter_repeats.sh" 

Job <933669> is submitted to default queue <normal>.
```

### Run again, with more memory

Upgraded from 1GB to 3GB
```
 bsub  -o /nfs/team151/bh10/scripts/bh10_general/output/repeatFilter/repFilt-%J-%I.out -e /nfs/team151/bh10/scripts/bh10_general/output/repeatFilter/repFilt-%J-%I.err  -R"select[mem>3000] rusage[mem=3000]" -M3000  "/nfs/team151/bh10/scripts/bh10_general/run-filter_repeats.sh" 

Job <935170> is submitted to default queue <normal>.

```


## May 22

### Merge small - GS CNV
Run merge on the 226 GS samples - this should work, and is useful for the association study. If only for now.

#### run_merge_intersectGenes.sh
Commit: [run merge on 226 samples, rather than thousands. Might work](https://github.com/brittanyhowell/sv_discovery/blob/master/general/run_merge_intersectGenes.sh)

#### merge_intersectGenes.R
Commit: [run merge on 226 samples, rather than thousands. Might work](https://github.com/brittanyhowell/sv_discovery/blob/master/general/merge_intersectGenes.R)

```
bsub  -o /nfs/team151/bh10/scripts/bh10_general/output/merge-%J.out -e /nfs/team151/bh10/scripts/bh10_general/output/merge-%J.err -R"select[mem>10000] rusage[mem=10000]" -M10000 /nfs/team151/bh10/scripts/bh10_general/mergeTab.sh

Job <949105> is submitted to default queue <normal>.
```

### Extract 226 BD samples that match the 226 GS ones

#### extract_226.sh
Commit: [extract the 226 GS files from the pool of BD samples](https://github.com/brittanyhowell/sv_discovery/blob/master/general/extract_226.sh)
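
The matching step amounts to keeping only the BD files whose names contain one of the GS sample IDs. A minimal sketch of that idea using `grep -F -f`; the sample IDs, file names, and list files below are made up, and `extract_226.sh` may work differently:

```shell
#!/usr/bin/env bash
set -eu

work="$(mktemp -d)"
# Hypothetical inputs: GS sample IDs, and the full list of BD output files.
printf '%s\n' SAMPLE_01 SAMPLE_03 > "$work/gs_ids.txt"
printf '%s\n' /bd/SAMPLE_01.out /bd/SAMPLE_02.out /bd/SAMPLE_03.out > "$work/bd_files.list"

# -F treats each ID as a fixed string, -f reads the IDs from a file.
matched="$(grep -F -f "$work/gs_ids.txt" "$work/bd_files.list")"
printf '%s\n' "$matched"
rm -rf "$work"
```

Fixed-string matching is substring matching, so this is only safe when no ID is a prefix of another (for example, fixed-width IDs); counting the matched lines against the expected 226 is a cheap sanity check.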


### Merge small - BD
Run merge on the 226 BD samples - this should work, the last one did.

#### run_merge_intersectGenes.sh
Commit: [run merge on 226 samples, rather than thousands. Might work](https://github.com/brittanyhowell/sv_discovery/blob/master/general/run_merge_intersectGenes.sh)

#### merge_intersectGenes.R
Commit: [changed for BD merge](https://github.com/brittanyhowell/sv_discovery/blob/master/general/merge_intersectGenes.R)

```
bsub  -o /nfs/team151/bh10/scripts/bh10_general/output/merge-%J.out -e /nfs/team151/bh10/scripts/bh10_general/output/merge-%J.err -R"select[mem>10000] rusage[mem=10000]" -M10000 /nfs/team151/bh10/scripts/bh10_general/mergeTab.sh

Job <951389> is submitted to default queue <normal>.
```

Commit: [why. Why did I forget the D](https://github.com/brittanyhowell/sv_discovery/blob/master/general/merge_intersectGenes.R)
```
bsub  -o /nfs/team151/bh10/scripts/bh10_general/output/merge-%J.out -e /nfs/team151/bh10/scripts/bh10_general/output/merge-%J.err -R"select[mem>10000] rusage[mem=10000]" -M10000 /nfs/team151/bh10/scripts/bh10_general/mergeTab.sh

Job <953274> is submitted to default queue <normal>.
```


### Association - CNV

#### extract_for_association.R
Commit: [CNV association run 1](https://github.com/brittanyhowell/sv_discovery/blob/master/general/extract_for_association.R)

#### association.R
Commit: [CNV association run 1](https://github.com/brittanyhowell/sv_discovery/blob/master/general/association.R)

#### run_CNV_association.sh
Commit: [CNV association run 1](https://github.com/brittanyhowell/sv_discovery/blob/master/general/run_CNV_association.sh)
```
bsub  -o /nfs/team151/bh10/scripts/bh10_general/output/associate_CNV-%J.out -e /nfs/team151/bh10/scripts/bh10_general/output/associate_CNV-%J.err -R"select[mem>1000] rusage[mem=1000]" -M1000 /nfs/team151/bh10/scripts/bh10_general/run-association_CNV.sh

Job <973832> is submitted to default queue <normal>.

```

### Forgot about the trailing args..


#### extract_for_association.R
Commit: [added comments and trailing args arguments](https://github.com/brittanyhowell/sv_discovery/blob/master/general/extract_for_association.R)

#### association.R
Commit: [added comments and trailing args arguments](https://github.com/brittanyhowell/sv_discovery/blob/master/general/association.R)

#### run_CNV_association.sh
Commit: [added comments and trailing args arguments](https://github.com/brittanyhowell/sv_discovery/blob/master/general/run_CNV_association.sh)



```
 bsub  -o /nfs/team151/bh10/scripts/bh10_general/output/associate_CNV-%J.out -e /nfs/team151/bh10/scripts/bh10_general/output/associate_CNV-%J.err -R"select[mem>1000] rusage[mem=1000]" -M1000 /nfs/team151/bh10/scripts/bh10_general/run-association_CNV.sh

Job <974347> is submitted to default queue <normal>.
```

#### run with more memory - it ran out

```
bsub  -o /nfs/team151/bh10/scripts/bh10_general/output/associate_CNV-%J.out -e /nfs/team151/bh10/scripts/bh10_general/output/associate_CNV-%J.err -R"select[mem>4000] rusage[mem=4000]" -M4000 /nfs/team151/bh10/scripts/bh10_general/run-association_CNV.sh

Job <1000339> is submitted to default queue <normal>.
```

## May 23
### Extract 226 samples of BD for the intersect
