# HCT116 Methionine Depletion - RNAseq
```
pi:ababaian
files: ~/Crown/data2/met/
start: 2018 01 05
complete : 2018 01 10
```
## Introduction

One of the ideas I've been exploring regarding the mechanism underlying the hypo-modification of 1248.macpPsi is that the modification **substrate** (SAM/SAH) may be limiting. Methionine feeds into this cycle, so as cells become hyper-proliferative they may create localized niches that are low in [Methionine], and new rRNA then does not get sufficient incorporation of macpPsi.

This is partially supported by the FBS/growth experiments in HCT116, in which faster growing / overconfluent cells may have lower modification levels (although this experiment needs to be repeated).

I've bought (but not tested) cycloleucine as a MAT enzyme inhibitor to deplete intracellular [SAM], although this is a risky experiment (in the sense that it's next to impossible to interpret either way, and thus hasn't been done yet).

![SAM / SAH Cycle](../../data2/met/plot/SAM_cycle.png)

I recently came across a pre-print in which RNAseq and ChIPseq were performed under high and low levels of Methionine in culture, measuring the effect on the breadth of H3K4me3 deposition (which also receives its Me from the methyl-donor SAM).

I've downloaded and aligned these RNA-seq files to hgr1 to test the hypothesis (I have not looked at the output at all yet).

### Hypothesis

- Cell culture depletion of [Methionine] will correlate with lower levels of intra-cellular [SAM / SAH] and thus lead to the hypo-modification of macp-Psi at 18S.1248.


## Materials and Methods

- [GEO: GSE103602](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE103602)
- hgr1 genome

### Download and initialize data

In [3]:
cd ~/Crown/data2/met
ls -alh
ls -alh fq/*

total 12M
drwxrwxr-x  3 artem artem 4.0K Jan  7 13:27 .
drwxrwxr-x 19 artem artem 4.0K Jan  7 13:29 ..
-rw-rw-r--  1 artem artem  11M Jan  5 13:20 243196.full.pdf
drwxrwxr-x  2 artem artem 4.0K Jan  7 13:19 fq
-rw-r--r--  1 artem artem 1.1M Aug 29 17:25 hgr1.fa
-rw-rw-r--  1 artem artem 4.0K Jan  7 13:22 met_align_v0.sh
-rw-rw-r--  1 artem artem  240 Jan  7 13:26 met_data.txt
-rwxrwxr-x  1 artem artem  41K Jan  7 13:27 met_data.xlsx
-rw-rw-r--  1 artem artem  196 Jan  5 16:39 rnaseq_prefetch.sh
-rw-rw-r--  1 artem artem 6.2K Jan  5 13:35 SraRunTable.txt
-rw-rw-r-- 1 artem artem 2.4G Jan  5 22:49 fq/SRR6014279.fastq.gz
-rw-rw-r-- 1 artem artem 2.8G Jan  5 21:22 fq/SRR6014280.fastq.gz
-rw-rw-r-- 1 artem artem 2.5G Jan  5 22:05 fq/SRR6014281.fastq.gz
-rw-rw-r-- 1 artem artem 2.7G Jan  5 16:37 fq/SRR6014282.fastq.gz


In [5]:
## SRA Data
cat SraRunTable.txt

Assay_Type	AvgSpotLen	BioSample	Experiment	Instrument	LibrarySelection	LibrarySource	LoadDate	MBases	MBytes	Organism	Run	SRA_Sample	Sample_Name	source_name	strain	tissue	BioProject	Center_Name	Consent	InsertSize	LibraryLayout	Platform	ReleaseDate	SRA_Study
ChIP-Seq	76	SAMN07614967	SRX3167264	NextSeq 500	ChIP	GENOMIC	2017-09-07	5090	2116	Homo sapiens	SRR6014267	SRS2497426	GSM2775131	HCT116		HCT116	PRJNA402050	GEO	public	0	SINGLE	ILLUMINA	2018-01-02	SRP117054
ChIP-Seq	76	SAMN07614966	SRX3167265	NextSeq 500	ChIP	GENOMIC	2017-09-11	2092	1006	Homo sapiens	SRR6014268	SRS2497427	GSM2775132	HCT116		HCT116	PRJNA402050	GEO	public	0	SINGLE	ILLUMINA	2018-01-02	SRP117054
ChIP-Seq	76	SAMN07614965	SRX3167266	NextSeq 500	ChIP	GENOMIC	2017-09-07	3439	1446	Homo sapiens	SRR6014269	SRS2497430	GSM2775133	HCT116		HCT116	PRJNA402050	GEO	public	0	SINGLE	ILLUMINA	2018-01-02	SRP117054
ChIP-Seq	76	SAMN07614964	SRX3167267	NextSeq 500	ChIP	GENOMIC	2017-09-11	2344	1139	Homo sapiens	SRR6014270	SRS2497428	GSM2775

In [None]:
## Ran Previously

#prefetch SRR6014282
#prefetch SRR6014280
#prefetch SRR6014281
#prefetch SRR6014279

#fastq-dump --gzip SRR6014282
#fastq-dump --gzip SRR6014280
#fastq-dump --gzip SRR6014281
#fastq-dump --gzip SRR6014279

In [4]:
cat met_data.txt

hct_Hmet_1	SRR6014279	SRX3167276	hct116	SRR6014279.fastq.gz
hct_Hmet_2	SRR6014280	SRX3167277	hct116	SRR6014280.fastq.gz
hct_Lmet_1	SRR6014281	SRX3167278	hct116	SRR6014281.fastq.gz
hct_Lmet_2	SRR6014282	SRX3167279	hct116	SRR6014282.fastq.gz


### hgr1 alignment

Copied over `crc4_align_v0.sh` script to `met_align_v0.sh`


In [11]:
#!/bin/bash
# crc_align_hgr1.fa
# rDNA alignment pipeline
# for CRC HCT116 met data on glitch to hgr1
# 180107
# glitch

# Control Panel -------------------------------

# Project Dir
  BASE='/home/artem/Crown/data2/met'
  cd $BASE
  
  LIB_LIST='met_data.txt' # list of crc data fastq files

# Sequencing Data
  CRC_DIR='/home/artem/Crown/data2/met/fq'
  
# CPU
  THREADS='2'
  
# Initialize start-up sequence ----------------
# Make working directory
  mkdir -p align

#Resources
  #aws s3 cp s3://crownproject/resources/hgr1.fa ./
  samtools faidx hgr1.fa
  bowtie2-build hgr1.fa hgr1

# ---------------------------------------------
# SCRIPT LOOP ---------------------------------
# ---------------------------------------------
# For each line in input LIB_LIST; run the pipeline

cat $LIB_LIST | while read LINE
do
    #Initialize Run
    echo "Start Iteration:"
    echo "  $LINE"
    echo ''
    
    LIBRARY=$(echo $LINE | cut -f1 -d' ' -) # Library Name
    RGSM=$(echo $LINE | cut -f2 -d' ' -)    # Sample / Patient Identifier
    RGID=$(echo $LINE | cut -f3 -d' ' -)    # Read Group ID
    RGLB=$(echo $LINE | cut -f3 -d' ' -)    # Library Name. Accession Number
    RGPL='ILLUMINA'                   # Sequencing Platform.
    RGPO=$(echo $LINE | cut -f4 -d' '  -)    # Patient Population

    FASTQ1=$(echo $LINE | cut -f5 -d' ' -)  # Filename Read 1
    #FASTQ2=$(echo $LINE | cut -f6 -d' ' -)  # Filename Read 2
    
    FQ1="$CRC_DIR/$FASTQ1"            # Fastq1 Filepath
    #FQ2="$CRC_DIR/$FASTQ2"            # Fastq2 Filepath
    
    
    echo Read File: $FQ1
    # Extract Sequencing Run Info
    RGPU=$RGID
    
    # Bowtie2: align to genome
    # (reads are streamed from gzip on stdin, hence '-U -')
    gzip -dc $FQ1 |
    bowtie2 --very-sensitive-local -p $THREADS --rg-id $RGID \
      --rg LB:$RGLB --rg SM:$RGSM \
      --rg PL:$RGPL --rg PU:$RGPU \
      -x hgr1 -U - |\
      samtools view -bS - > aligned_unsorted.bam
      
    # Calculate library flagstats
    samtools flagstat aligned_unsorted.bam > aligned_unsorted.flagstat

    # Read Subset ------------------------------
    # Extract mapped reads, and their unmapped pairs

      # Extract Header
      samtools view -H aligned_unsorted.bam > align.header.tmp

      # Unmapped reads with mapped pairs
      # Extract Mapped Reads
      # and their unmapped pairs
      samtools view -b -F 4 aligned_unsorted.bam > align.F4.bam #mapped
      samtools view -b -f 4 -F 8 aligned_unsorted.bam > align.f4F8.bam #unmapped pairs

      # Extract just the 45S unit
      #aws s3 cp s3://crownproject/resources/rDNA_45s.bed ./
      #samtools view -b -L rDNA_45s.bed align.F4.bam > align.F4.45s.bam

      # What are the mapped readnames
      samtools view align.F4.bam | cut -f1 - > read.names.tmp

      # Extract mapped reads
      samtools view align.F4.bam | grep -Ff read.names.tmp - > align.F4.tmp.sam


      # Extract cases of read pairs mapped on edge of region of interest
      # -------|======= R O I ======| ----------
      # read:                  ====---====
      samtools view align.F4.bam | grep -Ff read.names.tmp - > align.F4.tmp.sam

      # Complete mapped reads list
      #cut -f1 align.F4.tmp.sam > read.names.45s.long.tmp

      # Extract unmapped reads with a mapped pair
      samtools view align.f4F8.bam | grep -Ff read.names.tmp - > align.f4F8.tmp.sam

      # Re-compile bam file
      cat align.header.tmp align.F4.tmp.sam align.f4F8.tmp.sam | samtools view -bS - > align.hgr1.tmp.bam
        samtools sort align.hgr1.tmp.bam -o align.hgr1.bam
        samtools index align.hgr1.bam
        samtools flagstat align.hgr1.bam > align.hgr1.flagstat

      # Clean up 
      rm *tmp* align.F4.bam align.f4F8.bam

    # Remove the total (unsorted) BAM files; only the hgr1 subset is kept
      rm -f aligned_unsorted.bam
      rm -f aligned_unsorted.bam.bai $LIBRARY.bam.bai
      rm -f aligned_unsorted.flagstat $LIBRARY.flagstat

    # Rename the hgr Bam files
      mv align.hgr1.bam $LIBRARY.hgr1.bam
      mv align.hgr1.bam.bai $LIBRARY.hgr1.bam.bai
      mv align.hgr1.flagstat $LIBRARY.hgr1.flagstat

done

# Primary VCF ----------------------------
# N/A

# Script complete

Settings:
  Output files: "hgr1.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  hgr1.fa
Building a SMALL index
Reading reference sizes
  Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:00
bmax according to bmaxDivN setting: 4072
Using parameters --bmax 3054 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 3054 --dcv 1024
Constructing suff
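
The loop in the script above re-parses every field with `echo $LINE | cut`, which only works because the unquoted `$LINE` collapses the tabs in `met_data.txt` into single spaces. A minimal sketch of an alternative, assuming the same five tab-separated columns, is to let `read` split the fields directly. The function name `read_library_list` is hypothetical and not part of `met_align_v0.sh`.

```shell
#!/bin/sh
# Sketch: read the tab-separated library list one record at a time.
# Variable names mirror met_align_v0.sh; read_library_list is a made-up name.
read_library_list() {
    tab=$(printf '\t')
    while IFS="$tab" read -r library rgsm rgid rgpo fastq1; do
        # Skip blank lines, such as the trailing ones in met_data.txt
        [ -n "$library" ] || continue
        printf 'LIBRARY=%s RGSM=%s RGID=%s RGPO=%s FASTQ1=%s\n' \
            "$library" "$rgsm" "$rgid" "$rgpo" "$fastq1"
    done
}

# Usage with the first record of met_data.txt supplied inline
printf 'hct_Hmet_1\tSRR6014279\tSRX3167276\thct116\tSRR6014279.fastq.gz\n' |
    read_library_list
```

In the real script the `printf` would be replaced by the alignment steps, and the list would be fed in with `read_library_list < met_data.txt`.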

## Results

First off, there's a difference in size between the two sets of files: the Low Methionine hgr1 BAMs (132-144M) are larger than the High Methionine ones (23-50M), so they hold more reads.

In [1]:
cd ~/Crown/data2/met/align/
ls -alh *.bam

cat *.flagstat

-rw-rw-r-- 1 artem artem  23M Jan  7 18:01 hct_Hmet_1.hgr1.bam
-rw-rw-r-- 1 artem artem  50M Jan  7 18:39 hct_Hmet_2.hgr1.bam
-rw-rw-r-- 1 artem artem 144M Jan  7 19:16 hct_Lmet_1.hgr1.bam
-rw-rw-r-- 1 artem artem 132M Jan  7 19:59 hct_Lmet_2.hgr1.bam
604249 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
133592 + 0 mapped (22.11% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
1323864 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
250247 + 0 mapped (18.90% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 
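
The concatenated flagstat reports above are hard to compare by eye. A small sketch that reduces each report to total reads, mapped reads and percent mapped is given here, assuming the flagstat text layout shown above; `flagstat_summary` is a hypothetical helper name. Per the script, these reports describe the `*.hgr1.bam` subsets, so the percentage should not be read as a whole-library rRNA fraction.

```shell
#!/bin/sh
# Sketch: summarise samtools flagstat text as total, mapped, percent mapped.
# Matching on the fourth field skips lines such as "with itself and mate mapped".
flagstat_summary() {
    awk '
        $4 == "in" && $5 == "total" { total = $1 }
        $4 == "mapped" && total > 0 {
            printf "%d\t%d\t%.2f\n", total, $1, 100 * $1 / total
        }
    ' "$@"
}

# Usage with the two count lines of the first report shown above
flagstat_summary <<'EOF'
604249 + 0 in total (QC-passed reads + QC-failed reads)
133592 + 0 mapped (22.11% : N/A)
EOF
```

Calling `flagstat_summary *.flagstat` in the `align/` directory would print one line per report, in the same order as `cat *.flagstat`.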

### 18S.1248U Modification

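There is an obvious difference between [High] and [Low] Methionine with respect to 1248U modification: T is called in 55% and 60% of reads in the two High Methionine libraries versus 85% and 83% in the two Low Methionine libraries. This fails to disprove the hypothesis.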

![Methionine 1248U](../../data2/met/plot/20180110_hct116_MET_1248U.png)


**High Methionine: 100 uM**
```
chr13:1,004,908
Total count: 438
A      : 24  (5%,     12+,   12- )
C      : 154  (35%,     87+,   67- )
G      : 21  (5%,     10+,   11- )
T      : 239  (55%,     130+,   109- )
N      : 0
---------------
DEL: 21
INS: 0

chr13:1,004,908
Total count: 569
A      : 19  (3%,     11+,   8- )
C      : 183  (32%,     96+,   87- )
G      : 24  (4%,     15+,   9- )
T      : 343  (60%,     154+,   189- )
N      : 0
---------------
DEL: 22
INS: 0
```

**Low Methionine: 3 uM**
```
chr13:1,004,908
Total count: 1194
A      : 15  (1%,     9+,   6- )
C      : 156  (13%,     82+,   74- )
G      : 13  (1%,     6+,   7- )
T      : 1009  (85%,     489+,   520- )
N      : 1  (0%,     1+,   0- )
---------------
DEL: 13
INS: 0

chr13:1,004,908
Total count: 1229
A      : 23  (2%,     9+,   14- )
C      : 158  (13%,     84+,   74- )
G      : 25  (2%,     18+,   7- )
T      : 1023  (83%,     499+,   524- )
N      : 0
---------------
DEL: 22
INS: 0
```
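
To put a single number on each count block above, the fraction of non-T calls can be computed from the total and the T count. This is only a sketch: it assumes the pasted count layout shown above and that non-T calls at this position are the read-out of interest. `non_t_fraction` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: for each pasted count block print total, T count and percent non-T.
# Takes the last field of the "Total count:" line, so a leading tag is tolerated.
non_t_fraction() {
    awk '
        /Total count:/ { total = $NF }
        $1 == "T" && $2 == ":" && total > 0 {
            printf "%d\t%d\t%.2f\n", total, $3, 100 * (total - $3) / total
        }
    ' "$@"
}

# Usage with the first High and the first Low Methionine block from above
non_t_fraction <<'EOF'
Total count: 438
T      : 239  (55%,     130+,   109- )
Total count: 1194
T      : 1009  (85%,     489+,   520- )
EOF
```

For these two blocks this gives roughly 45% non-T at High Methionine and 15% at Low Methionine, the complement of the T percentages in the count blocks.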


## Discussion / Conclusions

This is pretty exciting and exactly as predicted, but it does not establish causation: it is not shown that [Met] levels specifically control this phenomenon, and it could still be associated with proliferation. Performing the rescue, that is adding supplemental MET to a system with high hypo-modification levels (K562 or other?), would show whether this factor is sufficient to replenish macp modification in cells.

The difference in the total amount of rRNA is also not perfectly controlled, but there is no evidence that it is a processing defect (all zero reads for 18S-E). A decrease in global mRNA levels would be consistent with a higher fraction of rRNA reads in a polyA library and is the most likely explanation. Repeating this with an internal spike-in control like ERCC will be valuable, as global mRNA levels and rRNA levels per cell count are of interest here.

### Essential Amino Acids

Is this true for all essential amino acids? Methionine feeds directly into this pathway, which is why it's interesting. Find more data-sets with depletion of other essential amino acids: Histidine, Isoleucine, Leucine, Lysine, Phenylalanine, Threonine, Tryptophan and/or Valine.

Can this be repeated in yeast? That would accelerate the research substantially.



QED


## Replicate 2 -- continue

Repeat MET depletion: GSE72131 --> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4635069/pdf/nihms720081.pdf


Prefetch:
```
prefetch SRR2170643
prefetch SRR2170644
prefetch SRR2170645
prefetch SRR2170646

fastq-dump --gzip SRR2170643
fastq-dump --gzip SRR2170644
fastq-dump --gzip SRR2170645
fastq-dump --gzip SRR2170646
```

Note: Looks like these RNAseq files are identical to the ones I ran earlier; not going to repeat.
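
If the identity of the two data-sets ever needs to be confirmed rather than eyeballed, one option is to checksum only the sequence lines of each FASTQ, since read names differ between SRA accessions. This is a sketch under assumptions: standard 4-line FASTQ records, reads in the same order in both files, and hypothetical helper names.

```shell
#!/bin/sh
# Sketch: checksum only the sequence lines (line 2 of every 4-line record)
# of a gzipped FASTQ, so differing read names do not affect the comparison.
fastq_seq_cksum() {
    gzip -dc "$1" | awk 'NR % 4 == 2' | cksum
}

# Succeeds (exit 0) when both files carry the same sequences in the same order.
same_fastq_sequences() {
    [ "$(fastq_seq_cksum "$1")" = "$(fastq_seq_cksum "$2")" ]
}

# Example call (placeholder file names):
#   same_fastq_sequences old.fastq.gz new.fastq.gz && echo "same sequences"
```

Both files are fully decompressed, so on multi-GB libraries this takes a while. Which accession pairs with which would still have to be worked out from the run tables.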