Here I'm using a custom [scallop](https://github.com/Kingsford-Group/scallop) + [gtfmerge](https://github.com/Kingsford-Group/rnaseqtools#gtfmerge) pipeline to create a merged GTF annotation of hg19 and the TINAT transcripts introduced in this 

Alex
> I was curious myself so I decided to align our RNA-seq data to their chimeric transcripts... it looks promising, but let me know if there's anything in my analysis I need to change. Still working on seeing if we can identify alternative TSS without CAGE-seq. 
I'm guessing we can look for novel transcripts using [StringTie](https://ccb.jhu.edu/software/stringtie/) or cufflinks?

Ray
> I'm asking because in [this paper](https://www.nature.com/articles/ng.3889) they treated lung cancer cells with decitabine and performed CAGE-seq to map and identify cryptic transcription start sites. They found cryptic TSSs that generated spliced variants, half of which were in-frame isoforms (some of which resulted in chimeric transcripts, others in original or truncated ones). This got me thinking about our project and whether we could see these chimeric/fusion transcripts (which are skewed toward the 5' end because they are TSS-induced and were detected by CAGE-seq) in HL60s.


Luke
> Hi Ray, Alex and Abe
This is an interesting paper which claims that genetic alterations to DNA methylation core genes drive dysregulated hematopoietic development due to CpG changes that regulate transcription factor binding (figures 4 and 5). How many TFs are methylation-sensitive has been controversial in my understanding, but it is an interesting idea with lots of data within myelomonocytic cells. https://www.nature.com/articles/s41588-020-0595-4
I am not suggesting a specific plan but this is an interesting idea.

___
last update    

    Thu Oct 21 01:06:19 UTC 2021



In [2]:
cat ../scallop-genome/tinat/tinat-gtf-list.txt

hg19/hg19.knownGene.gtf
tinat/DAC+SB_TINATs.gtf
tinat/DAC_TINATs.gtf
tinat/SB939_TINATs.gtf
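
`gtfmerge union` takes this list file and reads one GTF path per line, so a path that does not resolve from the working directory is worth catching before the merge. A minimal sketch, assuming the entries are meant to be resolved relative to some base directory (`missing_paths` and `base_dir` are hypothetical names, not part of rnaseqtools):

```python
from pathlib import Path


def missing_paths(list_lines, base_dir="."):
    """Return the entries of a GTF list that are not existing files under base_dir."""
    base = Path(base_dir)
    # Skip blank lines; every other line is treated as one path.
    entries = [line.strip() for line in list_lines if line.strip()]
    return [entry for entry in entries if not (base / entry).is_file()]


# Usage (base directory is an assumption about where the list is resolved from):
# with open("../scallop-genome/tinat/tinat-gtf-list.txt") as handle:
#     print(missing_paths(handle, base_dir="../scallop-genome"))
```

An empty list means every listed GTF was found.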



In [5]:
# cat scallop/tinat/tinat-gtf-list.txt
wc -l ../scallop-genome/tinat/DAC+SB_TINATs.gtf
wc -l ../scallop-genome/tinat/DAC_TINATs.gtf
wc -l ../scallop-genome/tinat/SB939_TINATs.gtf

12172 ../scallop-genome/tinat/DAC+SB_TINATs.gtf
1403 ../scallop-genome/tinat/DAC_TINATs.gtf
2059 ../scallop-genome/tinat/SB939_TINATs.gtf
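
Note that `wc -l` counts GTF rows (every feature line, plus any comment lines), not transcripts. A minimal sketch for counting distinct transcripts instead, assuming these GTFs carry the standard `transcript_id "..."` attribute in column 9 (`count_transcripts` is a hypothetical helper name):

```python
import re

# Standard GTF attribute form: transcript_id "XYZ";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def count_transcripts(gtf_lines):
    """Count distinct transcript_id values in an iterable of GTF lines."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return len(ids)


# Usage (path taken from the cell above):
# with open("../scallop-genome/tinat/DAC_TINATs.gtf") as handle:
#     print(count_transcripts(handle))
```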


In [3]:
cat ../scallop-genome/tinat.sh

mkdir -p scallop/tinat/compare

PATH=$PATH:/rumi/shams/abe/Workflows/rnaseqtools-1.0.3/bin/

conda activate alignment

# 1. merge
gtfmerge union scallop/tinat/tinat-gtf-list.txt scallop/tinat/hg19.tinat.gtf -t 18
# 2. compare 
gffcompare -o scallop/tinat/gffall -r /rumi/shams/genomes/hg19/hg19_genes.gtf scallop/tinat/hg19.tinat.gtf 
# 3. subset
gtfcuff puniq scallop/tinat/gffall.hg19.tinat.gtf.tmap scallop/tinat/hg19.tinat.gtf /rumi/shams/genomes/hg19/hg19_genes.gtf scallop/tinat/unique.gtf
# 4. gtf2fasta
gffread scallop/tinat/unique.gtf -g /rumi/shams/genomes/hg19/hg19.fa -w scallop/tinat/unique.fa
# 5.concatenate with hg19 fasta
# mkdir -p scallop/hg19
# wget https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/hg19.knownGene.gtf.gz
# mv hg19.knownGene.gtf.gz scallop/hg19/hg19.knownGene.gtf.gz
# gunzip scallop/hg19/hg19.knownGene.gtf.gz
gffread scallop/hg19/hg19.knownGene.gtf -g /rumi/shams/genomes/hg19/hg19.fa -w scallop/hg19/hg19.knownGene.fa
cat scallop/hg19/hg19.knownGen
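
Step 5 combines two transcript FASTA files, so it may be worth checking that no record name appears twice in the combined file before it is indexed. A minimal sketch, assuming plain-text FASTA in which the record name is the first whitespace-delimited token after `>` (`duplicate_fasta_names` is a hypothetical helper, and the path of the combined file is not shown in the truncated script above):

```python
from collections import Counter


def duplicate_fasta_names(fasta_lines):
    """Return the sorted record names that occur more than once in FASTA lines."""
    names = Counter(
        line[1:].split()[0]
        for line in fasta_lines
        # Header lines start with '>' and must carry a non-empty name.
        if line.startswith(">") and line[1:].strip()
    )
    return sorted(name for name, count in names.items() if count > 1)


# Usage (replace the placeholder with the combined FASTA written by step 5):
# with open(combined_fasta_path) as handle:
#     print(duplicate_fasta_names(handle))
```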

In [29]:
%%bash 
mkdir -p tinat/quants

for f in fastq/*; do
    f=$(basename "$f")
    o=${f%.fastq.gz}   # sample name: file name without the .fastq.gz suffix
    echo "$o"
    date               # start time
    salmon quant \
        -i ../scallop-genome/tinat/salmon.index -l A \
        -r "fastq/$f" -o "tinat/quants/$o" -p 10
    date               # end time
done

hl60_120h_t_1
Tue Aug 16 15:50:33 PDT 2022
Tue Aug 16 15:51:37 PDT 2022
hl60_120h_t_2
Tue Aug 16 15:51:37 PDT 2022
Tue Aug 16 15:52:36 PDT 2022
hl60_120h_u_1
Tue Aug 16 15:52:36 PDT 2022
Tue Aug 16 15:53:25 PDT 2022
hl60_120h_u_2
Tue Aug 16 15:53:25 PDT 2022
Tue Aug 16 15:54:23 PDT 2022
hl60_6h_t_1
Tue Aug 16 15:54:23 PDT 2022
Tue Aug 16 15:55:01 PDT 2022
hl60_6h_t_2
Tue Aug 16 15:55:01 PDT 2022
Tue Aug 16 15:56:13 PDT 2022
hl60_6h_u_1
Tue Aug 16 15:56:13 PDT 2022
Tue Aug 16 15:56:48 PDT 2022
hl60_6h_u_2
Tue Aug 16 15:56:48 PDT 2022
Tue Aug 16 15:57:33 PDT 2022
hl60_72h_t_1
Tue Aug 16 15:57:33 PDT 2022
Tue Aug 16 15:58:27 PDT 2022
hl60_72h_t_2
Tue Aug 16 15:58:27 PDT 2022
Tue Aug 16 15:59:29 PDT 2022
hl60_72h_u_1
Tue Aug 16 15:59:29 PDT 2022
Tue Aug 16 16:00:41 PDT 2022
hl60_72h_u_2
Tue Aug 16 16:00:41 PDT 2022
Tue Aug 16 16:02:08 PDT 2022
kg1_t_1
Tue Aug 16 16:02:08 PDT 2022
Tue Aug 16 16:03:18 PDT 2022
kg1_t_2
Tue Aug 16 16:03:18 PDT 2022
Tue Aug 16 16:03:37 PDT 2022
kg1_t_3
Tue Aug 

Version Info: Could not resolve upgrade information in the alotted time.
Check for upgrades manually at https://combine-lab.github.io/salmon
### salmon (mapping-based) v1.2.1
### [ program ] => salmon 
### [ command ] => quant 
### [ index ] => { ../scallop-genome/tinat/salmon.index }
### [ libType ] => { A }
### [ unmatedReads ] => { fastq/hl60_120h_t_1.fastq.gz }
### [ output ] => { tinat/quants/hl60_120h_t_1 }
### [ threads ] => { 10 }
Logs will be written to tinat/quants/hl60_120h_t_1/logs
[2022-08-16 15:50:35.074] [jointLog] [info] setting maxHashResizeThreads to 10
[2022-08-16 15:50:35.074] [jointLog] [info] Fragment incompatibility prior below threshold.  Incompatible fragments will be ignored.
[2022-08-16 15:50:35.074] [jointLog] [info] Usage of --validateMappings implies use of minScoreFraction. Since not explicitly specified, it is being set to 0.65
[2022-08-16 15:50:35.074] [jointLog] [info] Usage of --validateMappings implies a default consensus slack of 0.2. Setting consen
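
Each `salmon quant` run writes a tab-separated `quant.sf` table in its output directory, with the columns `Name`, `Length`, `EffectiveLength`, `TPM` and `NumReads`. A minimal sketch for summing TPM over a chosen set of transcript names in one sample; which names count as TINAT-derived is an assumption left to the caller, and `total_tpm` is a hypothetical helper:

```python
import csv


def total_tpm(quant_lines, names):
    """Sum the TPM column of a salmon quant.sf over the given transcript names."""
    reader = csv.DictReader(quant_lines, delimiter="\t")
    return sum(float(row["TPM"]) for row in reader if row["Name"] in names)


# Usage (sample directory from the loop above; tinat_names is a set built by the caller):
# with open("tinat/quants/hl60_120h_t_1/quant.sf", newline="") as handle:
#     print(total_tpm(handle, tinat_names))
```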

In [48]:
!date

Thu Oct 21 01:06:19 UTC 2021
