# Meta analysis

Most of the analyses here require bedtools. For an example, see the bedtools protocol on [measuring dataset similarity](https://github.com/arq5x/bedtools-protocols/blob/master/bedtools.md#bp6--measuring-dataset-similarity).

**Processed ATAC-seq reads**

ATAC-seq read files are in `/gpfs/commons/groups/sanjana_lab/cdai/TFscreen/atac/bams_v3`. 

The finalized BAM/BED files are filtered to include only **properly paired reads with MAPQ > 20**. If you use BAM files, use `ATAC*.PE.mapq.bam`; alternatively, use the BED files `ATAC*.PE.mapq.bed`.

The BED files were converted from the BAM files with bedtools. Note that the two mates of each pair are written as separate lines, with `/1` or `/2` appended to the read name (column 4). This means that if you sum up all the reads, you get twice the number of fragments.<br>
`bedtools bamtobed -i $bam > $out`
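Since each mate is its own BED line, fragments can be counted by stripping the mate suffix from the read name before counting distinct names. A minimal awk sketch, assuming `bedtools bamtobed` output with `/1`/`/2`-suffixed names in column 4; `count_reads_and_fragments` is a hypothetical helper, and the demo rows are the first ten lines of `ATAC1.PE.mapq.chr.bed` shown later in this notebook:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print "<reads>\t<fragments>" for a bamtobed BED file.
# Assumes column 4 is the read name with a /1 or /2 mate suffix.
count_reads_and_fragments() {
  awk 'BEGIN { OFS = "\t" }
       {
         reads++
         name = $4
         sub(/\/[12]$/, "", name)   # drop the mate suffix -> fragment id
         frags[name] = 1
       }
       END {
         n = 0
         for (f in frags) n++
         print reads + 0, n
       }' "$1"
}

# Demo: first ten lines of ATAC1.PE.mapq.chr.bed.
bed=$(mktemp)
trap 'rm -f "$bed"' EXIT
cat > "$bed" <<'EOF'
chr1	10426	10463	NB501157:251:HG7FNBGX9:2:23202:10351:14853/2	3	+
chr1	10436	10473	NB501157:251:HG7FNBGX9:2:23202:10351:14853/1	37	-
chr1	12996	13033	NB501157:251:HG7FNBGX9:3:11510:22068:4357/2	15	+
chr1	13294	13331	NB501157:251:HG7FNBGX9:3:11510:22068:4357/1	23	-
chr1	17139	17176	NB501157:251:HG7FNBGX9:1:11202:26265:16487/1	29	+
chr1	17272	17309	NB501157:251:HG7FNBGX9:3:13412:25792:11431/1	29	+
chr1	17407	17444	NB501157:251:HG7FNBGX9:3:22412:11444:2335/1	29	+
chr1	17471	17508	NB501157:251:HG7FNBGX9:3:22412:11444:2335/2	37	-
chr1	17472	17509	NB501157:251:HG7FNBGX9:1:11202:26265:16487/2	37	-
chr1	17475	17512	NB501157:251:HG7FNBGX9:2:23304:12670:6495/2	37	+
EOF

count_reads_and_fragments "$bed"   # 10 reads, 6 fragments (tab-separated)
```

If one record per fragment is preferred, `bedtools bamtobed -bedpe` writes one BEDPE line per pair, but it expects the BAM to be grouped or sorted by read name.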

**Annotations**

Annotations are in `/gpfs/commons/groups/sanjana_lab/cdai/TFscreen/atac/annotations`. `Gencode_hg38_v31_proteincoding_gene_features.bed` contains annotations for all protein-coding genes, including these features:
- exon
- UTR
- intron
- promoter (TSS - 2kb, TSS + 0.5kb)
- enhancer (TSS - 10kb, TSS - 2kb). Treat this as an intergenic region instead, or remove it.
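The promoter and enhancer windows are defined relative to the TSS. Assuming "TSS - 2kb" means 2kb *upstream* (so the window is mirrored for minus-strand genes), one way such a promoter interval could be derived from a TSS is sketched below. It illustrates the definition only and is not the script that produced the annotation file; the helper `tss_to_promoter` and its input layout (chrom, 0-based TSS, strand, name) are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: TSS records -> promoter intervals (half-open BED).
# Input columns (assumed): chrom, 0-based TSS position, strand, name.
# Promoter = 2kb upstream to 0.5kb downstream of the TSS, strand-aware.
tss_to_promoter() {
  awk -v up=2000 -v down=500 'BEGIN { OFS = "\t" }
       {
         if ($3 == "-") { start = $2 - down; end = $2 + up }   # upstream lies to the right
         else           { start = $2 - up;   end = $2 + down }
         if (start < 0) start = 0                              # clip at chromosome start
         print $1, start, end, $4
       }'
}

printf 'chr1\t10000\t+\tA\nchr1\t10000\t-\tB\nchr1\t1000\t+\tC\n' | tss_to_promoter
# -> chr1 8000 10500 A / chr1 9500 12000 B / chr1 0 1500 C (tab-separated)
```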

#### Make 10bp windows

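`bin_annotations.sh` splits each interval of a given BED file into 10bp windows:

```bash
# -i srcwinnum names each window "<source name>_<window number>";
# tr then splits that name into two columns (and would also split any
# source name that itself contains "_").
bedtools makewindows \
    -b HitTF.promoter_up2k_dn2k.bed \
    -w 10 \
    -i srcwinnum \
    | sort -k1,1 -k2,2 -V \
    | tr "_" "\t" \
    > HitTF.promoter_up2k_dn2k_10bwindow.bed
```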
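To make the window layout concrete, here is a small awk stand-in for `makewindows -w W -i srcwinnum | tr "_" "\t"`: each interval is cut into W-bp windows numbered from 1, with the name and window number in separate columns. It is a sketch, not a replacement for bedtools; `make_windows` is a hypothetical name, and keeping a shorter final window is our assumption about `makewindows`' default behavior. Run on the first 25bp of the SKI interval with W = 5, it reproduces the five rows of `hitTF_promoter_up2k_dn1kb_5bp_windows.bed` shown below.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical awk stand-in for
#   bedtools makewindows -b in.bed -w W -i srcwinnum | tr "_" "\t"
# Reads BED intervals (chrom, start, end, name) on stdin and prints W-bp
# windows as: chrom, start, end, name, window number (from 1).
make_windows() {
  awk -v w="$1" 'BEGIN { OFS = "\t" }
       {
         n = 0
         for (s = $2 + 0; s < $3 + 0; s += w) {
           e = (s + w < $3 + 0) ? s + w : $3 + 0   # last window may be shorter
           print $1, s, e, $4, ++n
         }
       }'
}

# First 25bp of the SKI interval, cut into 5bp windows.
printf 'chr1\t2226319\t2226344\tSKI\n' | make_windows 5
```

Emitting the name and number as separate fields avoids the `tr` step, which would also split any gene name containing an underscore.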

The following binned annotations are created: 
- `HitTF.promoter_up2k_dn2k_10bwindow.bed`
- `NonHitTF.promoter_up2k_dn2k_10bwindow.bed`

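#### Summarize read coverage over each binned window

`sum_binned_coverage.sh` maps the coverage values from a bedGraph-style file (`$bedg`) onto the overlapping 10bp windows, then summarizes them per window as the mean of column 4. Windows with no overlapping coverage are reported as 0 (`-null 0`).

```bash
# -g: genome file whose chromosome order the sorted inputs must follow.
bedtools map \
    -a NonHitTF.promoter_up2k_dn2k_10bwindow.bed \
    -b "$bedg" \
    -c 4 \
    -o mean \
    -null 0 \
    -g /c/groups/sanjana_lab/cdai/ref_genome/hg38_chrom_size.txt \
    > "$out"
```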
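What `bedtools map -c 4 -o mean -null 0` computes per window can be spelled out with a brute-force awk sketch: for each window, the plain (not overlap-length-weighted) mean of column 4 over all bedGraph records overlapping it by at least 1bp, or 0 when nothing overlaps. `map_mean` is a hypothetical helper and is O(windows × records), so it is only for checking small examples.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: map_mean WINDOWS.bed COVERAGE.bedgraph
# Appends to each window the unweighted mean of bedGraph column 4 over all
# records overlapping it (half-open coordinates), or 0 if none overlap.
map_mean() {
  awk 'BEGIN { OFS = "\t"; m = 0 }
       FILENAME == ARGV[1] { c[++m] = $1; s[m] = $2 + 0; e[m] = $3 + 0; v[m] = $4; next }
       {
         sum = 0; k = 0
         for (i = 1; i <= m; i++)
           if (c[i] == $1 && s[i] < $3 + 0 && e[i] > $2 + 0) { sum += v[i]; k++ }
         print $0, (k ? sum / k : 0)
       }' "$2" "$1"
}

win=$(mktemp); bg=$(mktemp)
trap 'rm -f "$win" "$bg"' EXIT
printf 'chr1\t0\t10\tw1\nchr1\t10\t20\tw2\nchr1\t20\t30\tw3\n' > "$win"
printf 'chr1\t0\t15\t2\nchr1\t15\t18\t4\n' > "$bg"
map_mean "$win" "$bg"
# w1 -> 2 (one record), w2 -> 3 (mean of 2 and 4), w3 -> 0 (no overlap)
```

bedtools itself relies on both inputs being sorted rather than comparing every window with every record.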

The following summarized files are created:
- `ATAC{1..12}.HitTF.promoter.10bpwindow.coverage`
- `ATAC{1..12}.NonHitTF.promoter.10bpwindow.coverage`
- `ATAC{1..12}.AllHitTF.promoter.10bpwindow.coverage`
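A quick way to confirm that every sample and annotation set produced its summary file is to enumerate the expected names from the pattern above. A minimal sketch; `missing_coverage_files` is hypothetical, and it assumes the files sit together in one directory.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print each expected per-sample coverage file
# (ATAC1..ATAC12 x HitTF / NonHitTF / AllHitTF) that is missing from DIR.
missing_coverage_files() {
  local dir=$1 i ann f
  for i in {1..12}; do
    for ann in HitTF NonHitTF AllHitTF; do
      f="ATAC${i}.${ann}.promoter.10bpwindow.coverage"
      [ -e "$dir/$f" ] || echo "$f"
    done
  done
}

# Demo in a scratch directory holding just one of the 36 expected files.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
touch "$demo/ATAC1.HitTF.promoter.10bpwindow.coverage"
missing_coverage_files "$demo" | awk 'END { print NR }'   # 35
```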

In [24]:
cd /gpfs/commons/groups/sanjana_lab/cdai/TFscreen/atac/meta_analysis

In [29]:
ls *.sh

[0m[38;5;34mbin_annotations.sh[0m      [38;5;34mcomputeMatrix_range.sh[0m
[38;5;34mcomputeMatrix_point.sh[0m  [38;5;34msum_binned_coverage.sh[0m


In [16]:
head -5 hitTF_promoter_up2k_dn1kb_5bp_windows.bed

chr1	2226319	2226324	SKI	1
chr1	2226324	2226329	SKI	2
chr1	2226329	2226334	SKI	3
chr1	2226334	2226339	SKI	4
chr1	2226339	2226344	SKI	5


#### Break up each 3kb interval around each TSS (2kb upstream, 1kb downstream) into 5bp sub-windows using `makewindows`

In [18]:
head ../../../bams_v3/ATAC1.PE.mapq.chr.bed

chr1	10426	10463	NB501157:251:HG7FNBGX9:2:23202:10351:14853/2	3	+
chr1	10436	10473	NB501157:251:HG7FNBGX9:2:23202:10351:14853/1	37	-
chr1	12996	13033	NB501157:251:HG7FNBGX9:3:11510:22068:4357/2	15	+
chr1	13294	13331	NB501157:251:HG7FNBGX9:3:11510:22068:4357/1	23	-
chr1	17139	17176	NB501157:251:HG7FNBGX9:1:11202:26265:16487/1	29	+
chr1	17272	17309	NB501157:251:HG7FNBGX9:3:13412:25792:11431/1	29	+
chr1	17407	17444	NB501157:251:HG7FNBGX9:3:22412:11444:2335/1	29	+
chr1	17471	17508	NB501157:251:HG7FNBGX9:3:22412:11444:2335/2	37	-
chr1	17472	17509	NB501157:251:HG7FNBGX9:1:11202:26265:16487/2	37	-
chr1	17475	17512	NB501157:251:HG7FNBGX9:2:23304:12670:6495/2	37	+


In [6]:
bedtools makewindows -b hitTF_promoter_up2k_dn1kb.bed \
    -w 5 -i srcwinnum | sort -k1,1 -k2,2n | tr "_" "\t" > hitTF_promoter_up2k_dn1kb_5bp_windows.bed
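Each 3kb interval split into 5bp windows should give 3000 / 5 = 600 windows per gene. A hypothetical QC helper, `check_window_counts`, reports any name whose window count differs; plausible causes include an interval clipped at a chromosome end or a gene name listed for more than one TSS (both are assumptions about what could go wrong, not observations from this data). It expects the five-column output of the command above.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical QC helper: check_window_counts EXPECTED WINDOWS.bed
# Counts windows per name (column 4) and prints "<name>\t<count>" for every
# name whose count differs from EXPECTED (600 for 3kb split into 5bp).
check_window_counts() {
  awk -v want="$1" 'BEGIN { OFS = "\t" }
       { n[$4]++ }
       END { for (g in n) if (n[g] != want) print g, n[g] }' "$2" | sort
}

# Demo: gene A has the expected 600 windows, gene B only 3.
win=$(mktemp)
trap 'rm -f "$win"' EXIT
awk -v OFS='\t' 'BEGIN {
       for (i = 1; i <= 600; i++) print "chr1", (i - 1) * 5, i * 5, "A", i
       for (i = 1; i <= 3; i++)   print "chr2", (i - 1) * 5, i * 5, "B", i
     }' > "$win"
check_window_counts 600 "$win"   # prints B and 3, tab-separated
```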

#### Map ATAC-seq reads onto the TF promoter windows

In [23]:
bedtools map \
    -a hitTF_promoter_up2k_dn1kb_5bp_windows.bed \
    -b ../../../bams_v3/ATAC1.PE.mapq.chr.bed \
    -c 4 \
    -o count_distinct \
    -null 0 \
    -g /c/groups/sanjana_lab/cdai/ref_genome/hg38_chrom_size.txt \
 > hitTF_promoter_up2k_dn1kb_5bp_windows.coverage.bed
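Note that `-c 4 -o count_distinct` counts distinct values of column 4, i.e. distinct read names. Because the two mates of a pair carry different names (`.../2` vs `.../1`), a fragment whose mates both overlap a window is counted twice. The brute-force awk sketch below contrasts the two counts; `count_reads_fragments_per_window` is a hypothetical helper, and the reads are the first ten lines of `ATAC1.PE.mapq.chr.bed` shown above.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: count_reads_fragments_per_window WINDOWS.bed READS.bed
# For each window, appends two counts over overlapping reads:
#   distinct read names (what -c 4 -o count_distinct reports) and
#   distinct fragments (read name with the /1 or /2 suffix removed).
# Brute force O(windows x reads): for small examples only.
count_reads_fragments_per_window() {
  awk 'BEGIN { OFS = "\t"; m = 0 }
       FILENAME == ARGV[1] { c[++m] = $1; s[m] = $2 + 0; e[m] = $3 + 0; id[m] = $4; next }
       {
         split("", rd); split("", fr); nr = 0; nf = 0
         for (i = 1; i <= m; i++) {
           if (c[i] != $1 || s[i] >= $3 + 0 || e[i] <= $2 + 0) continue
           if (!(id[i] in rd)) { rd[id[i]] = 1; nr++ }
           f = id[i]; sub(/\/[12]$/, "", f)
           if (!(f in fr)) { fr[f] = 1; nf++ }
         }
         print $0, nr, nf
       }' "$2" "$1"
}

reads=$(mktemp); win=$(mktemp)
trap 'rm -f "$reads" "$win"' EXIT
cat > "$reads" <<'EOF'
chr1	10426	10463	NB501157:251:HG7FNBGX9:2:23202:10351:14853/2	3	+
chr1	10436	10473	NB501157:251:HG7FNBGX9:2:23202:10351:14853/1	37	-
chr1	12996	13033	NB501157:251:HG7FNBGX9:3:11510:22068:4357/2	15	+
chr1	13294	13331	NB501157:251:HG7FNBGX9:3:11510:22068:4357/1	23	-
chr1	17139	17176	NB501157:251:HG7FNBGX9:1:11202:26265:16487/1	29	+
chr1	17272	17309	NB501157:251:HG7FNBGX9:3:13412:25792:11431/1	29	+
chr1	17407	17444	NB501157:251:HG7FNBGX9:3:22412:11444:2335/1	29	+
chr1	17471	17508	NB501157:251:HG7FNBGX9:3:22412:11444:2335/2	37	-
chr1	17472	17509	NB501157:251:HG7FNBGX9:1:11202:26265:16487/2	37	-
chr1	17475	17512	NB501157:251:HG7FNBGX9:2:23304:12670:6495/2	37	+
EOF
printf 'chr1\t10420\t10440\tw1\nchr1\t17400\t17480\tw2\nchr1\t20000\t20010\tw3\n' > "$win"
count_reads_fragments_per_window "$win" "$reads"
# w1: 2 reads, 1 fragment; w2: 4 reads, 3 fragments; w3: 0 and 0
```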
    

In [21]:
pwd

/gpfs/commons/groups/sanjana_lab/cdai/TFscreen/atac/macs2/v7/annotate_peaks_w_features


## Count reads from ATAC BAM files in known intervals

Count script: `/c/groups/sanjana_lab/cdai/TFscreen/atac/bams_v3/calc_TF_promoters_counts.sh`

Count results: `/c/groups/sanjana_lab/cdai/TFscreen/atac/bams_v3/ATAC*.PE.mapq.bam.counts`

In [26]:
cd /c/groups/sanjana_lab/cdai/TFscreen/atac/bams_v3

In [27]:
ls *.counts *.sh

ATAC10.PE.mapq.bam.counts  ATAC8.PE.mapq.bam.counts
ATAC11.PE.mapq.bam.counts  ATAC9.PE.mapq.bam.counts
ATAC12.PE.mapq.bam.counts  bam2bed.sh
ATAC1.PE.mapq.bam.counts   bamCoverage.sh
ATAC2.PE.mapq.bam.counts   bed_filter_chr.sh
ATAC3.PE.mapq.bam.counts   bin_annotations.sh
ATAC4.PE.mapq.bam.counts   calc_TF_promoters_counts.sh
ATAC5.PE.mapq.bam.counts   filter_bam.sh
ATAC6.PE.mapq.bam.counts   subset_bam.sh
ATAC7.PE.mapq.bam.counts   toy.sh


**Total number of reads for each sample**

In [29]:
samtools view ATAC10.PE.mapq.bam | cut -f 1 | head

NB501157:251:HG7FNBGX9:2:22206:3402:19725
NB501157:251:HG7FNBGX9:2:22206:3402:19725
NB501157:251:HG7FNBGX9:2:13205:20539:6230
NB501157:251:HG7FNBGX9:2:13205:20539:6230
NB501157:251:HG7FNBGX9:3:23603:8138:14842
NB501157:251:HG7FNBGX9:2:13303:22247:3500
NB501157:251:HG7FNBGX9:1:21206:8453:13144
NB501157:251:HG7FNBGX9:2:13303:22247:3500
NB501157:251:HG7FNBGX9:1:21206:8453:13144
NB501157:251:HG7FNBGX9:3:23603:8138:14842
cut: write error: Broken pipe


In [52]:
pwd

/c/groups/sanjana_lab/cdai/TFscreen/atac/bams_v3


In [58]:
samtools view ATAC10.PE.mapq.bam | head -20 | cut -f1 

NB501157:251:HG7FNBGX9:2:22206:3402:19725
NB501157:251:HG7FNBGX9:2:22206:3402:19725
NB501157:251:HG7FNBGX9:2:13205:20539:6230
NB501157:251:HG7FNBGX9:2:13205:20539:6230
NB501157:251:HG7FNBGX9:3:23603:8138:14842
NB501157:251:HG7FNBGX9:2:13303:22247:3500
NB501157:251:HG7FNBGX9:1:21206:8453:13144
NB501157:251:HG7FNBGX9:2:13303:22247:3500
NB501157:251:HG7FNBGX9:1:21206:8453:13144
NB501157:251:HG7FNBGX9:3:23603:8138:14842
NB501157:251:HG7FNBGX9:3:11610:20233:3851
NB501157:251:HG7FNBGX9:3:11610:20233:3851
NB501157:251:HG7FNBGX9:1:22309:3851:16319
NB501157:251:HG7FNBGX9:1:22309:3851:16319
NB501157:251:HG7FNBGX9:4:21601:15960:14386
NB501157:251:HG7FNBGX9:1:13305:13044:14470
NB501157:251:HG7FNBGX9:2:12307:4385:14030
NB501157:251:HG7FNBGX9:4:21602:24195:4185
NB501157:251:HG7FNBGX9:1:13305:13044:14470
NB501157:251:HG7FNBGX9:2:12307:4385:14030
samtools view: writing to standard output failed: Broken pipe
samtools view: error closing standard output: -1
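Each properly paired fragment appears as two records sharing one read name, as the repeated names above show, so the number of distinct names is the number of fragments. (The `Broken pipe` messages are harmless: `head` exits after printing its lines, and the upstream command can no longer write.) A minimal sketch, assuming one read name per line on stdin; `count_names` is hypothetical. For full BAM files, `samtools view -c` counts records and `samtools view -c -f 64` counts only first mates, which avoids holding every name in memory; both counts include secondary or supplementary records if the BAM has any.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read names on stdin (e.g. `samtools view x.bam | cut -f 1`)
# -> "<records>\t<distinct names>", i.e. reads and fragments for paired data.
count_names() {
  awk 'BEGIN { OFS = "\t" }
       { seen[$1] = 1 }
       END { n = 0; for (q in seen) n++; print NR, n }'
}

# Demo: the first ten names from ATAC10.PE.mapq.bam shown above.
names='NB501157:251:HG7FNBGX9:2:22206:3402:19725
NB501157:251:HG7FNBGX9:2:22206:3402:19725
NB501157:251:HG7FNBGX9:2:13205:20539:6230
NB501157:251:HG7FNBGX9:2:13205:20539:6230
NB501157:251:HG7FNBGX9:3:23603:8138:14842
NB501157:251:HG7FNBGX9:2:13303:22247:3500
NB501157:251:HG7FNBGX9:1:21206:8453:13144
NB501157:251:HG7FNBGX9:2:13303:22247:3500
NB501157:251:HG7FNBGX9:1:21206:8453:13144
NB501157:251:HG7FNBGX9:3:23603:8138:14842'
printf '%s\n' "$names" | count_names   # 10 records, 5 fragments (tab-separated)
```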
