# Convert bed files into bedgraphs

I convert the fragment bed files into bedgraph files so that I can view the fragment coverage as a histogram in IGV and compare it with the results.

## Reference: Bedtools genomecov

https://bedtools.readthedocs.io/en/latest/content/tools/genomecov.html

```
$ cat A.bed
chr1  10  20
chr1  20  30
chr2  0   500

$ cat my.genome
chr1  1000
chr2  500

$ bedtools genomecov -i A.bed -g my.genome
chr1   0  980  1000  0.98
chr1   1  20   1000  0.02
chr2   1  500  500   1
genome 0  980  1500  0.653333
genome 1  520  1500  0.346667
```
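
To make the default report above easier to read, here is a minimal Python sketch that recomputes it for the same toy `A.bed` and `my.genome`. It is not how bedtools is implemented: the names `A_BED`, `MY_GENOME` and `coverage_histogram` are made up for illustration, and the per-base list only makes sense for toy-sized chromosomes. Each row is (chromosome, depth, bases at that depth, chromosome size, fraction of the chromosome).

```python
from collections import Counter

# Toy inputs copied from the bedtools example above (BED is 0-based, half-open).
A_BED = [("chr1", 10, 20), ("chr1", 20, 30), ("chr2", 0, 500)]
MY_GENOME = {"chr1": 1000, "chr2": 500}


def coverage_histogram(intervals, genome):
    """Return (chrom, depth, n_bases, chrom_size, fraction) rows plus genome-wide rows."""
    # One counter per base: fine for a toy genome, far too slow for hg38.
    per_base = {chrom: [0] * size for chrom, size in genome.items()}
    for chrom, start, end in intervals:
        for pos in range(start, end):
            per_base[chrom][pos] += 1

    rows, genome_hist = [], Counter()
    for chrom, size in genome.items():
        hist = Counter(per_base[chrom])  # depth -> number of bases at that depth
        genome_hist.update(hist)
        rows += [(chrom, d, hist[d], size, hist[d] / size) for d in sorted(hist)]

    total = sum(genome.values())
    rows += [("genome", d, genome_hist[d], total, genome_hist[d] / total)
             for d in sorted(genome_hist)]
    return rows


for row in coverage_histogram(A_BED, MY_GENOME):
    print(*row, sep="\t")
```

The two adjacent chr1 intervals never overlap, which is why chr1 has 20 bases at depth 1 and none at depth 2.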

## Reference: Genome

**Reference directory**

In [1]:
ls /data/reddylab/gjohnson/reference_data

Anshul_Hg19UltraHighSignalArtifactRegions.bed
Duke_Hg19SignalRepeatArtifactRegions.bed
generate_hg19_filter_out_bed.sh
generate_hg38_filter_out_bed.sh
generate_masked_ranges.py
GRCh38_EBV.chrom.sizes.tsv
hg19.1.bt2
hg19.2.bt2
hg19.3.bt2
hg19.4.bt2
hg19_chrom_sizes.txt
hg19_filter_out.bed
hg19_gaps.bed
hg19.genome.bed
hg19_genome_file_chr_notation.txt
hg19_genome_file.txt
hg19.rev.1.bt2
hg19.rev.2.bt2
hg38.1.bt2
hg38.2.bt2
hg38.3.bt2
hg38.4.bt2
hg38_all.genome.file
hg38_and_extra_contigs.genome.file
hg38_blacklist.bed
hg38_centromeres.bed
hg38_filter_out.bed
hg38_gap.bed
hg38.genome_autosomes.bed
hg38.genome.bed
hg38.genome.file
hg38.rev.1.bt2
hg38.rev.2.bt2
human_g1k_v37.fasta
human_g1k_v37.fasta.fai
make_hg38_filter_out.interactive
trimmomatic_48index.fa
trimmomatic_MHPS.fa
trimmomatic_UMI.fa
wgEncodeHg19ConsensusSignalArtifactRegions.bed


**According to Graham's script, this is the genome file used during the alignment**

In [2]:
%%bash
wc -l /data/reddylab/gjohnson/reference_data/hg38.genome.file
cat   /data/reddylab/gjohnson/reference_data/hg38.genome.file

24 /data/reddylab/gjohnson/reference_data/hg38.genome.file
chr1	248956422
chr10	133797422
chr11	135086622
chr12	133275309
chr13	114364328
chr14	107043718
chr15	101991189
chr16	90338345
chr17	83257441
chr18	80373285
chr19	58617616
chr2	242193529
chr20	64444167
chr21	46709983
chr22	50818468
chr3	198295559
chr4	190214555
chr5	181538259
chr6	170805979
chr7	159345973
chr8	145138636
chr9	138394717
chrX	156040895
chrY	57227415
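
Since `genomecov` takes the chromosome sizes from this file, it may be worth checking that every bed interval sits on a listed chromosome and within its length before converting. The sketch below is one way to do that. `read_genome_sizes` and `out_of_bounds` are hypothetical helper names, and `GENOME_FILE_TEXT` holds only three of the 24 rows above to keep the example short.

```python
# Three rows copied from hg38.genome.file above (tab-separated: name, length).
GENOME_FILE_TEXT = "chr1\t248956422\nchr17\t83257441\nchrY\t57227415\n"


def read_genome_sizes(text):
    """Parse 'chrom<TAB>size' lines into a dict of chromosome lengths."""
    sizes = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        chrom, size = line.split()[:2]
        sizes[chrom] = int(size)
    return sizes


def out_of_bounds(bed_rows, sizes):
    """Return rows on an unlisted chromosome or extending outside [0, chrom size]."""
    return [
        (chrom, start, end)
        for chrom, start, end in bed_rows
        if chrom not in sizes or start < 0 or end > sizes[chrom]
    ]


sizes = read_genome_sizes(GENOME_FILE_TEXT)
print(out_of_bounds([("chr17", 201123, 201925), ("chrM", 0, 100)], sizes))
```

For the real files, the text would come from reading `hg38.genome.file` and the rows from the per-chromosome bed file.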


## Test: Convert a bed file to bedgraph

In [46]:
%%bash
### set environment
module load bedtools2
FD_WRK=/data/reddylab/Kuei/out/CombEffect_STARR
FD_REF=/data/reddylab/gjohnson/reference_data
FN_GEN=hg38.genome.file

FD_OUT=${FD_WRK}/data/Input1
FN_DAT=chr17.bed
FN_OUT=chr17.bedgraph

bedtools genomecov -i ${FD_OUT}/${FN_DAT} -g ${FD_REF}/${FN_GEN} -bg > ${FD_OUT}/${FN_OUT}

In [48]:
%%bash
FD_WRK=/data/reddylab/Kuei/out/CombEffect_STARR
FD_OUT=${FD_WRK}/data/Input1
FN_DAT=chr17.bed
FN_OUT=chr17.bedgraph

head ${FD_OUT}/${FN_OUT}

chr17	201123	201160	1
chr17	201160	201202	2
chr17	201202	201259	3
chr17	201259	201366	4
chr17	201366	201925	5
chr17	201925	201968	4
chr17	201968	202052	3
chr17	202052	202074	2
chr17	202074	202130	1
chr17	206174	206269	1
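
The depth steps above (1, 2, 3, 4, 5, 4, 3, 2, 1) are what a sweep over fragment starts and ends would give. Below is a minimal Python sketch of that idea, not the bedtools implementation. The five fragments are copied from the head of the pooled input `chr17.bed` printed in the log further down; that the same fragments sit at the start of `Input1/chr17.bed` is my assumption, supported only by the fact that they reproduce the first nine rows of the output above. `bed_to_bedgraph` is a made-up name.

```python
from collections import defaultdict

# Fragments (0-based, half-open) assumed to underlie the first rows of the output above.
FRAGMENTS = [
    ("chr17", 201123, 201925),
    ("chr17", 201160, 201968),
    ("chr17", 201202, 202074),
    ("chr17", 201259, 202052),
    ("chr17", 201366, 202130),
]


def bed_to_bedgraph(intervals):
    """Return (chrom, start, end, depth) runs of constant non-zero depth."""
    # Net change in depth at each coordinate: +1 where a fragment starts, -1 where it ends.
    events = defaultdict(lambda: defaultdict(int))
    for chrom, start, end in intervals:
        events[chrom][start] += 1
        events[chrom][end] -= 1

    runs = []
    for chrom in sorted(events):  # plain string sort; the real tool may order chromosomes differently
        depth, prev = 0, None
        for pos in sorted(events[chrom]):
            delta = events[chrom][pos]
            if delta == 0:
                continue  # a fragment ends exactly where another starts: depth unchanged
            if depth > 0:
                runs.append((chrom, prev, pos, depth))
            depth += delta
            prev = pos
    return runs


for run in bed_to_bedgraph(FRAGMENTS):
    print(*run, sep="\t")
```

Regions with depth 0 are skipped, which is consistent with the gap between 202130 and 206174 in the output above.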


## Convert all Input bed files into bedgraph files

**Test loop**

In [3]:
%%bash
FD_ALIGN=/data/reddylab/gjohnson/whole_genome_STARRseq/wgss3/alignment_and_processing/alignments
FD_BEDS=($(ls -d ${FD_ALIGN}/Input*/))
for FD_BED in "${FD_BEDS[@]}"; do
    echo  ${FD_BED}
    echo "$(basename -- $FD_BED)"
done

/data/reddylab/gjohnson/whole_genome_STARRseq/wgss3/alignment_and_processing/alignments/Input1/
Input1
/data/reddylab/gjohnson/whole_genome_STARRseq/wgss3/alignment_and_processing/alignments/Input2/
Input2
/data/reddylab/gjohnson/whole_genome_STARRseq/wgss3/alignment_and_processing/alignments/Input3/
Input3
/data/reddylab/gjohnson/whole_genome_STARRseq/wgss3/alignment_and_processing/alignments/Input4/
Input4
/data/reddylab/gjohnson/whole_genome_STARRseq/wgss3/alignment_and_processing/alignments/Input5/
Input5


In [6]:
%%bash
FD_WRK=/data/reddylab/Kuei/out/CombEffect_STARR
FD_BEDS=($(ls -d ${FD_WRK}/data/Input?/))
for FD_BED in "${FD_BEDS[@]}"; do
    echo  ${FD_BED}
    echo "$(basename -- $FD_BED)"
done

/data/reddylab/Kuei/out/CombEffect_STARR/data/Input1/
Input1
/data/reddylab/Kuei/out/CombEffect_STARR/data/Input2/
Input2
/data/reddylab/Kuei/out/CombEffect_STARR/data/Input3/
Input3
/data/reddylab/Kuei/out/CombEffect_STARR/data/Input4/
Input4
/data/reddylab/Kuei/out/CombEffect_STARR/data/Input5/
Input5


In [5]:
%%bash
FD_WRK=/data/reddylab/Kuei/out/CombEffect_STARR
FD_BEDS=($(ls -d ${FD_WRK}/data/Input*/))
for FD_BED in "${FD_BEDS[@]}"; do
    echo  ${FD_BED}
    echo "$(basename -- $FD_BED)"
done

/data/reddylab/Kuei/out/CombEffect_STARR/data/Input/
Input
/data/reddylab/Kuei/out/CombEffect_STARR/data/Input1/
Input1
/data/reddylab/Kuei/out/CombEffect_STARR/data/Input2/
Input2
/data/reddylab/Kuei/out/CombEffect_STARR/data/Input3/
Input3
/data/reddylab/Kuei/out/CombEffect_STARR/data/Input4/
Input4
/data/reddylab/Kuei/out/CombEffect_STARR/data/Input5/
Input5
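
The difference between the last two loops matters for the job arrays: `Input?` needs exactly one character after `Input`, so it skips the pooled `Input/` folder, while `Input*` also matches it. The sbatch calls in this notebook hard-code `--array=0-5` and `--array=0-4`, which has to agree with the number of matched folders. The sketch below derives the range from the match count instead. It uses Python's `fnmatch` as a stand-in for shell globbing (close enough for these simple patterns), and `matching` and `slurm_array_spec` are hypothetical helpers.

```python
from fnmatch import fnmatchcase

# Folder names as printed by the test loops in this notebook.
INPUT_DIRS = ["Input", "Input1", "Input2", "Input3", "Input4", "Input5"]
DMSO_DIRS = ["TFX2_DMSO", "TFX3_DMSO", "TFX4_DMSO", "TFX5_DMSO", "TFX_DMSO"]


def matching(pattern, names):
    """Names matched by a glob pattern: '?' is exactly one character, '*' is any run, even empty."""
    return [name for name in names if fnmatchcase(name, pattern)]


def slurm_array_spec(pattern, names):
    """Zero-based index range covering every matched folder, e.g. '0-5' for six matches."""
    n = len(matching(pattern, names))
    if n == 0:
        raise ValueError(f"no folder matches {pattern!r}")
    return f"0-{n - 1}"


print(matching("Input?", INPUT_DIRS))
print(slurm_array_spec("Input*", INPUT_DIRS))
print(slurm_array_spec("TFX*_DMSO", DMSO_DIRS))
```

If a folder is added or removed later, a computed range keeps the array in step with the glob, whereas a hard-coded one would silently skip a sample or start a task with an empty folder name.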


**Bed to Bedgraph**

In [11]:
%%bash
### set environment
module load bedtools2
module load perl
module load gcc
source /data/reddylab/software/miniconda2/bin/activate alex_dev
export PATH=/data/reddylab/software/homer/bin/:$PATH

### set log file directory
FD_LOG=/gpfs/fs1/data/reddylab/Kuei/out/CombEffect_STARR/log

### run script using sbatch
sbatch -pnew,all \
    --array=0-5 \
    --mem 8G \
    -o ${FD_LOG}/prep_bed2bedgraph_input_chr17.%a.txt \
    <<'EOF'
#!/bin/bash
### set directories & global variables
FD_WRK=/data/reddylab/Kuei/out/CombEffect_STARR
FD_REF=/data/reddylab/gjohnson/reference_data
FN_GEN=hg38.genome.file
CHROM="chr17"

### set input & output directory
FD_BEDS=($(ls -d ${FD_WRK}/data/Input*/))
FD_BED=${FD_BEDS[${SLURM_ARRAY_TASK_ID}]}
FN_BED=${CHROM}.bed
FD_BGH=${FD_WRK}/bedgraph/$(basename ${FD_BED})
FN_BGH=${CHROM}.bedgraph

### print start message
echo "Slurm Array Index: " ${SLURM_ARRAY_TASK_ID}
echo "Input  file:       " ${FD_BED}/${FN_BED}
echo "Output file:       " ${FD_BGH}/${FN_BGH}
echo
echo "Show the first few lines of the input file"
head ${FD_BED}/${FN_BED}

### init: create folder
mkdir -p ${FD_BGH}

### convert bed file to bedgraph
bedtools genomecov -i ${FD_BED}/${FN_BED} -g ${FD_REF}/${FN_GEN} -bg > ${FD_BGH}/${FN_BGH}

### print end message
echo
echo "Show the first few lines of the output file"
head ${FD_BGH}/${FN_BGH}

EOF

Submitted batch job 25435792


**Check results**

In [12]:
%%bash
FD_LOG=/gpfs/fs1/data/reddylab/Kuei/out/CombEffect_STARR/log
cat ${FD_LOG}/prep_bed2bedgraph_input_chr17.0.txt

Slurm Array Index:  0
Input  file:        /data/reddylab/Kuei/out/CombEffect_STARR/data/Input//chr17.bed
Output file:        /data/reddylab/Kuei/out/CombEffect_STARR/bedgraph/Input/chr17.bedgraph

Show the first few lines of the input file
chr17	201123	201925
chr17	201160	201968
chr17	201202	202074
chr17	201259	202052
chr17	201366	202130
chr17	206174	207307
chr17	206269	207286
chr17	206283	207194
chr17	206324	207359
chr17	206326	207274

Show the first few lines of the output file
chr17	159510	160362	1
chr17	182115	183108	1
chr17	197679	198597	1
chr17	201123	201133	1
chr17	201133	201145	2
chr17	201145	201155	3
chr17	201155	201160	6
chr17	201160	201174	7
chr17	201174	201190	8
chr17	201190	201202	9
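
Looking at `head` only shows that the job ran. A stronger check, sketched below, is that the area under the bedgraph (sum of run length times depth) equals the total length of all fragments, because every base of every fragment adds one unit of depth. This assumes plain `-bg` with no scaling and no fragment running past the chromosome end. It only holds on complete files, not on the truncated heads printed above, so the fixture here is the five-fragment example from the earlier `Input1` test (my assumption that these fragments produce those rows). The function names are made up.

```python
# Five fragments from the head of the input above and the nine bedgraph rows
# they are assumed to produce (taken from the earlier Input1 test output).
FRAGMENTS = [
    ("chr17", 201123, 201925),
    ("chr17", 201160, 201968),
    ("chr17", 201202, 202074),
    ("chr17", 201259, 202052),
    ("chr17", 201366, 202130),
]
BEDGRAPH = [
    ("chr17", 201123, 201160, 1),
    ("chr17", 201160, 201202, 2),
    ("chr17", 201202, 201259, 3),
    ("chr17", 201259, 201366, 4),
    ("chr17", 201366, 201925, 5),
    ("chr17", 201925, 201968, 4),
    ("chr17", 201968, 202052, 3),
    ("chr17", 202052, 202074, 2),
    ("chr17", 202074, 202130, 1),
]


def total_fragment_bases(bed_rows):
    """Sum of fragment lengths (end - start) over all bed rows."""
    return sum(end - start for _chrom, start, end in bed_rows)


def bedgraph_area(bedgraph_rows):
    """Sum of run length times depth over all bedgraph rows."""
    return sum((end - start) * depth for _chrom, start, end, depth in bedgraph_rows)


print(total_fragment_bases(FRAGMENTS), bedgraph_area(BEDGRAPH))
```

On the real data the rows would be read from the full `chr17.bed` and `chr17.bedgraph` of the same sample.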


## Output (DMSO)

**Test loop**

In [13]:
%%bash
FD_WRK=/data/reddylab/Kuei/out/CombEffect_STARR
FD_BEDS=($(ls -d ${FD_WRK}/data/TFX?_DMSO/))
for FD_BED in "${FD_BEDS[@]}"; do
    echo  ${FD_BED}
    echo "$(basename -- $FD_BED)"
done

/data/reddylab/Kuei/out/CombEffect_STARR/data/TFX2_DMSO/
TFX2_DMSO
/data/reddylab/Kuei/out/CombEffect_STARR/data/TFX3_DMSO/
TFX3_DMSO
/data/reddylab/Kuei/out/CombEffect_STARR/data/TFX4_DMSO/
TFX4_DMSO
/data/reddylab/Kuei/out/CombEffect_STARR/data/TFX5_DMSO/
TFX5_DMSO


In [14]:
%%bash
FD_WRK=/data/reddylab/Kuei/out/CombEffect_STARR
FD_BEDS=($(ls -d ${FD_WRK}/data/TFX*_DMSO/))
for FD_BED in "${FD_BEDS[@]}"; do
    echo  ${FD_BED}
    echo "$(basename -- $FD_BED)"
done

/data/reddylab/Kuei/out/CombEffect_STARR/data/TFX2_DMSO/
TFX2_DMSO
/data/reddylab/Kuei/out/CombEffect_STARR/data/TFX3_DMSO/
TFX3_DMSO
/data/reddylab/Kuei/out/CombEffect_STARR/data/TFX4_DMSO/
TFX4_DMSO
/data/reddylab/Kuei/out/CombEffect_STARR/data/TFX5_DMSO/
TFX5_DMSO
/data/reddylab/Kuei/out/CombEffect_STARR/data/TFX_DMSO/
TFX_DMSO


**Bed to Bedgraph**

In [15]:
%%bash
### set environment
module load bedtools2
module load perl
module load gcc
source /data/reddylab/software/miniconda2/bin/activate alex_dev
export PATH=/data/reddylab/software/homer/bin/:$PATH

### set log file directory
FD_LOG=/gpfs/fs1/data/reddylab/Kuei/out/CombEffect_STARR/log

### run script using sbatch
sbatch -pnew,all \
    --array=0-4 \
    --mem 8G \
    -o ${FD_LOG}/prep_bed2bedgraph_output_dmso_chr17.%a.txt \
    <<'EOF'
#!/bin/bash
### set directories & global variables
FD_WRK=/data/reddylab/Kuei/out/CombEffect_STARR
FD_REF=/data/reddylab/gjohnson/reference_data
FN_GEN=hg38.genome.file
CHROM="chr17"

### set input & output directory
FD_BEDS=($(ls -d ${FD_WRK}/data/TFX*_DMSO/))
FD_BED=${FD_BEDS[${SLURM_ARRAY_TASK_ID}]}
FN_BED=${CHROM}.bed
FD_BGH=${FD_WRK}/bedgraph/$(basename ${FD_BED})
FN_BGH=${CHROM}.bedgraph

### print start message
echo "Slurm Array Index: " ${SLURM_ARRAY_TASK_ID}
echo "Input  file:       " ${FD_BED}/${FN_BED}
echo "Output file:       " ${FD_BGH}/${FN_BGH}
echo
echo "Show the first few lines of the input file"
head ${FD_BED}/${FN_BED}

### init: create folder
mkdir -p ${FD_BGH}

### convert bed file to bedgraph
bedtools genomecov -i ${FD_BED}/${FN_BED} -g ${FD_REF}/${FN_GEN} -bg > ${FD_BGH}/${FN_BGH}

### print end message
echo
echo "Show the first few lines of the output file"
head ${FD_BGH}/${FN_BGH}

EOF

Submitted batch job 25435798


**Check results**

In [16]:
%%bash
FD_LOG=/gpfs/fs1/data/reddylab/Kuei/out/CombEffect_STARR/log
cat ${FD_LOG}/prep_bed2bedgraph_output_dmso_chr17.0.txt

Slurm Array Index:  0
Input  file:        /data/reddylab/Kuei/out/CombEffect_STARR/data/TFX2_DMSO//chr17.bed
Output file:        /data/reddylab/Kuei/out/CombEffect_STARR/bedgraph/TFX2_DMSO/chr17.bedgraph

Show the first few lines of the input file
chr17	87067	87989
chr17	158043	159067
chr17	158043	159066
chr17	159137	160020
chr17	170572	172531
chr17	172392	173515
chr17	172393	173515
chr17	172396	173514
chr17	197679	198597
chr17	197681	198597

Show the first few lines of the output file
chr17	87067	87989	1
chr17	158043	159066	2
chr17	159066	159067	1
chr17	159137	160020	1
chr17	170572	172392	1
chr17	172392	172393	2
chr17	172393	172396	3
chr17	172396	172531	4
chr17	172531	173514	3
chr17	173514	173515	2


## Output (Dex)

**Test loop**

In [17]:
%%bash
FD_WRK=/data/reddylab/Kuei/out/CombEffect_STARR
FD_BEDS=($(ls -d ${FD_WRK}/data/TFX?_Dex/))
for FD_BED in "${FD_BEDS[@]}"; do
    echo  ${FD_BED}
    echo "$(basename -- $FD_BED)"
done

/data/reddylab/Kuei/out/CombEffect_STARR/data/TFX2_Dex/
TFX2_Dex
/data/reddylab/Kuei/out/CombEffect_STARR/data/TFX3_Dex/
TFX3_Dex
/data/reddylab/Kuei/out/CombEffect_STARR/data/TFX4_Dex/
TFX4_Dex
/data/reddylab/Kuei/out/CombEffect_STARR/data/TFX5_Dex/
TFX5_Dex


In [18]:
%%bash
FD_WRK=/data/reddylab/Kuei/out/CombEffect_STARR
FD_BEDS=($(ls -d ${FD_WRK}/data/TFX*_Dex/))
for FD_BED in "${FD_BEDS[@]}"; do
    echo  ${FD_BED}
    echo "$(basename -- $FD_BED)"
done

/data/reddylab/Kuei/out/CombEffect_STARR/data/TFX2_Dex/
TFX2_Dex
/data/reddylab/Kuei/out/CombEffect_STARR/data/TFX3_Dex/
TFX3_Dex
/data/reddylab/Kuei/out/CombEffect_STARR/data/TFX4_Dex/
TFX4_Dex
/data/reddylab/Kuei/out/CombEffect_STARR/data/TFX5_Dex/
TFX5_Dex
/data/reddylab/Kuei/out/CombEffect_STARR/data/TFX_Dex/
TFX_Dex


**Bed to Bedgraph**

In [19]:
%%bash
### set environment
module load bedtools2
module load perl
module load gcc
source /data/reddylab/software/miniconda2/bin/activate alex_dev
export PATH=/data/reddylab/software/homer/bin/:$PATH

### set log file directory
FD_LOG=/gpfs/fs1/data/reddylab/Kuei/out/CombEffect_STARR/log

### run script using sbatch
sbatch -pnew,all \
    --array=0-4 \
    --mem 8G \
    -o ${FD_LOG}/prep_bed2bedgraph_output_dex_chr17.%a.txt \
    <<'EOF'
#!/bin/bash
### set directories & global variables
FD_WRK=/data/reddylab/Kuei/out/CombEffect_STARR
FD_REF=/data/reddylab/gjohnson/reference_data
FN_GEN=hg38.genome.file
CHROM="chr17"

### set input & output directory
FD_BEDS=($(ls -d ${FD_WRK}/data/TFX*_Dex/))
FD_BED=${FD_BEDS[${SLURM_ARRAY_TASK_ID}]}
FN_BED=${CHROM}.bed
FD_BGH=${FD_WRK}/bedgraph/$(basename ${FD_BED})
FN_BGH=${CHROM}.bedgraph

### print start message
echo "Slurm Array Index: " ${SLURM_ARRAY_TASK_ID}
echo "Input  file:       " ${FD_BED}/${FN_BED}
echo "Output file:       " ${FD_BGH}/${FN_BGH}
echo
echo "Show the first few lines of the input file"
head ${FD_BED}/${FN_BED}

### init: create folder
mkdir -p ${FD_BGH}

### convert bed file to bedgraph
bedtools genomecov -i ${FD_BED}/${FN_BED} -g ${FD_REF}/${FN_GEN} -bg > ${FD_BGH}/${FN_BGH}

### print end message
echo
echo "Show the first few lines of the output file"
head ${FD_BGH}/${FN_BGH}

EOF

Submitted batch job 25435803


**Check results**

In [20]:
%%bash
FD_LOG=/gpfs/fs1/data/reddylab/Kuei/out/CombEffect_STARR/log
cat ${FD_LOG}/prep_bed2bedgraph_output_dex_chr17.0.txt

Slurm Array Index:  0
Input  file:        /data/reddylab/Kuei/out/CombEffect_STARR/data/TFX2_Dex//chr17.bed
Output file:        /data/reddylab/Kuei/out/CombEffect_STARR/bedgraph/TFX2_Dex/chr17.bedgraph

Show the first few lines of the input file
chr17	83638	84547
chr17	92503	93508
chr17	152590	153715
chr17	159027	160041
chr17	173500	174429
chr17	174388	175345
chr17	174388	175346
chr17	197582	198583
chr17	201248	202059
chr17	201249	202059

Show the first few lines of the output file
chr17	83638	84547	1
chr17	92503	93508	1
chr17	152590	153715	1
chr17	159027	160041	1
chr17	173500	174388	1
chr17	174388	174429	3
chr17	174429	175345	2
chr17	175345	175346	1
chr17	197582	198583	1
chr17	201248	201249	1


In [21]:
ls -1 /data/reddylab/Kuei/out/CombEffect_STARR/bedgraph

Input/
Input1/
Input2/
Input3/
Input4/
Input5/
TFX2_Dex/
TFX2_DMSO/
TFX3_Dex/
TFX3_DMSO/
TFX4_Dex/
TFX4_DMSO/
TFX5_Dex/
TFX5_DMSO/
TFX_Dex/
TFX_DMSO/

In [25]:
ls -1 /data/reddylab/Kuei/out/CombEffect_STARR/bedgraph/*/*bedgraph

/data/reddylab/Kuei/out/CombEffect_STARR/bedgraph/Input1/chr17.bedgraph
/data/reddylab/Kuei/out/CombEffect_STARR/bedgraph/Input2/chr17.bedgraph
/data/reddylab/Kuei/out/CombEffect_STARR/bedgraph/Input3/chr17.bedgraph
/data/reddylab/Kuei/out/CombEffect_STARR/bedgraph/Input4/chr17.bedgraph
/data/reddylab/Kuei/out/CombEffect_STARR/bedgraph/Input5/chr17.bedgraph
/data/reddylab/Kuei/out/CombEffect_STARR/bedgraph/Input/chr17.bedgraph
/data/reddylab/Kuei/out/CombEffect_STARR/bedgraph/TFX2_Dex/chr17.bedgraph
/data/reddylab/Kuei/out/CombEffect_STARR/bedgraph/TFX2_DMSO/chr17.bedgraph
/data/reddylab/Kuei/out/CombEffect_STARR/bedgraph/TFX3_Dex/chr17.bedgraph
/data/reddylab/Kuei/out/CombEffect_STARR/bedgraph/TFX3_DMSO/chr17.bedgraph
/data/reddylab/Kuei/out/CombEffect_STARR/bedgraph/TFX4_Dex/chr17.bedgraph
/data/reddylab/Kuei/out/CombEffect_STARR/bedgraph/TFX4_DMSO/chr17.bedgraph
/data/reddylab/Kuei/out/CombEffect_STARR/bedgraph/TFX5_Dex/chr17.bedgraph
/data/reddylab/Kuei/out/CombEffect_STARR/bedgrap