# ChIPAID project

##### TODO

- Finish the literature review on how to perform ChIP-Seq analysis
- Deduplicate BAM files?
- Launch peak calling
- Perform IDR & other controls
- DE analysis

## 0) Introduction

The context is B cells, with a focus on somatic hypermutation (SHM) and the AID enzyme. ChIP-Seq is performed to look at AID recruitment on Ig genes and on AID off-target genes. This study compares WT mice with MAR-KO (MAR) mice. MAR mice are expected to show lower AID recruitment on IgH, IgK, IgL and AID off-target genes.

One issue is that the B cells of interest are rare, and AID activity is rare too. We may therefore not have enough data, compared to classical ChIP-Seq protocols.

Samples: 3 biological replicates for each of WT and MAR, each with an immunoprecipitated (IP) sample and an input (IN) control, for a total of 12 samples.

## 1) Data

### 1.1) Reads

Raw reads were retrieved from Ophélie Martin's external drive. They were sequenced @FTS (@Lionel & @Romain) on an Illumina platform (75 bp single-end).

##### FASTQ samples were barcoded, separated and cleaned, but not merged
So we need to merge them per sample ourselves.

##### Get unique IDs (should be 12)

In [3]:
### CODE ###
path='/run/user/1000/gvfs/smb-share:server=ubox.ad.unilim.fr,share=biscem/BioInformatique/Responsables/Commun/CHIPAID';
cd "$path/Data/Reads";
# Sample ID = file name up to the first "_"
for f in *.gz;
do echo "${f%%_*}";
done | sort -u > samples.txt;
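
The "should be 12" expectation can be checked automatically rather than by eye. A minimal sketch, assuming file names follow the `<ID>_<rest>.gz` pattern implied by the loop above; `list_sample_ids` and `check_sample_count` are hypothetical helper names, not part of the original notebook.

```shell
# Hypothetical helper: print the unique sample IDs (text before the first "_")
# of every *.gz file found in directory $1.
list_sample_ids() {
    local dir="$1"
    local f
    for f in "$dir"/*.gz; do
        [ -e "$f" ] || continue      # the glob matched nothing
        f="${f##*/}"                 # strip the directory part
        echo "${f%%_*}"              # keep what precedes the first "_"
    done | sort -u
}

# Hypothetical helper: succeed only if directory $1 holds exactly $2 sample IDs.
check_sample_count() {
    local dir="$1"
    local expected="$2"
    local n
    n=$(list_sample_ids "$dir" | wc -l | tr -d ' ')
    if [ "$n" -ne "$expected" ]; then
        echo "Expected $expected sample IDs, found $n" >&2
        return 1
    fi
    echo "OK: $n sample IDs"
}
```

From `Data/Reads`, `check_sample_count . 12` would then fail loudly if a sample is missing or misnamed.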

##### Then concatenate the 48 files into 12 FASTQ files

In [None]:
### CODE ###
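# Concatenated gzip files form a valid gzip stream, so no decompression is needed.
# The "_" in the pattern keeps it from matching the merged file itself on a re-run.
while read -r sample;
do cat "${sample}"_*.gz > "${sample}.fastq.gz";
done < samples.txt;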
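
Since the merge is a plain concatenation, it can be verified by comparing line counts before and after. A sketch under the same `<ID>_<rest>.gz` naming assumption; `check_merge` is a hypothetical helper, not something from the original notebook.

```shell
# Hypothetical helper: compare the number of lines in the per-sample parts
# ($1_*.gz) with the number of lines in the merged file ($1.fastq.gz).
check_merge() {
    local sample="$1"
    local parts merged
    parts=$(cat "${sample}"_*.gz | gzip -dc | wc -l | tr -d ' ')
    merged=$(gzip -dc "${sample}.fastq.gz" | wc -l | tr -d ' ')
    if [ "$parts" -ne "$merged" ]; then
        echo "$sample: $parts lines in parts, $merged in merged file" >&2
        return 1
    fi
    # A FASTQ record spans 4 lines
    echo "$sample: $((merged / 4)) reads"
}
```

It would be called once per line of `samples.txt`, before the parts are archived or deleted.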

##### Lastly, use "easier" names for samples

In [None]:
mv MARsKO01.fastq.gz MAR_IN_1.fastq.gz;
mv MARsKO02.fastq.gz MAR_IN_2.fastq.gz;
mv MARsKO03.fastq.gz MAR_IN_3.fastq.gz;
mv MARsKO1.fastq.gz MAR_IP_1.fastq.gz;
mv MARsKO2.fastq.gz MAR_IP_2.fastq.gz;
mv MARsKO3.fastq.gz MAR_IP_3.fastq.gz;
mv WT01.fastq.gz WT_IN_1.fastq.gz;
mv WT02.fastq.gz WT_IN_2.fastq.gz;
mv WT03.fastq.gz WT_IN_3.fastq.gz;
mv WT1.fastq.gz WT_IP_1.fastq.gz;
mv WT2.fastq.gz WT_IP_2.fastq.gz;
mv WT3.fastq.gz WT_IP_3.fastq.gz;
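
The twelve renames above follow one rule: `MARsKO` becomes `MAR`, a leading `0` in the number marks an input (IN) sample, its absence an IP sample, and the last digit is the replicate. A sketch of that rule as a function, useful for a dry run before moving anything; `new_name` and `print_renames` are hypothetical names.

```shell
# Hypothetical helper: map an original sample ID (e.g. MARsKO01, WT3)
# to the simplified name used in the rest of the notebook.
new_name() {
    local id="$1"
    local geno kind rep
    case "$id" in
        MARsKO*) geno='MAR'; id="${id#MARsKO}" ;;
        WT*)     geno='WT';  id="${id#WT}" ;;
        *)       echo "Unknown sample ID: $1" >&2; return 1 ;;
    esac
    case "$id" in
        0[1-3]) kind='IN'; rep="${id#0}" ;;   # leading 0 = input
        [1-3])  kind='IP'; rep="$id" ;;       # no leading 0 = IP
        *)      echo "Unknown sample ID: $1" >&2; return 1 ;;
    esac
    echo "${geno}_${kind}_${rep}"
}

# Hypothetical helper: print (not run) one mv command per ID listed in file $1.
print_renames() {
    local sample target
    while read -r sample; do
        target=$(new_name "$sample") || return 1
        echo "mv ${sample}.fastq.gz ${target}.fastq.gz"
    done < "$1"
}
```

`print_renames samples.txt` should then reproduce the list of `mv` commands above, which makes a typo in the mapping easy to spot.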

### 1.2) Mus musculus

The reference genome is the same as the one used in the PTCB project: the latest GRC release (GRCm38.p6), minus the entries that are not in the GENCODE M16 annotation (to avoid failures in downstream analysis tools).
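
The filtering itself was done in the PTCB project and is not shown here. As a sanity check, the sketch below lists FASTA sequences that have no entry in a GTF. It assumes that sequence names are the first word of each FASTA header and match column 1 of the GTF; `fasta_not_in_gtf` is a hypothetical helper.

```shell
# Hypothetical helper: list sequence names present in FASTA $1
# but absent from column 1 of GTF $2.
fasta_not_in_gtf() {
    local fasta="$1"
    local gtf="$2"
    local tmp
    tmp=$(mktemp -d) || return 1
    # FASTA names: header lines, without ">" and anything after the first blank
    grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*$//' | sort -u > "$tmp/fa"
    # GTF names: first column of non-comment lines
    grep -v '^#' "$gtf" | cut -f1 | sort -u > "$tmp/gtf"
    # Keep the names that appear only in the FASTA list
    comm -23 "$tmp/fa" "$tmp/gtf"
    rm -rf "$tmp"
}
```

An empty output means every sequence of the reference is covered by the annotation.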

## 2) Quality control

##### FastQC v0.11.2 @genotoul

In [None]:
### CODE ###
#!/bin/bash
#$ -q workq
#$ -M erwan.scaon@unilim.fr
#$ -m bea
#$ -l mem=8G
#$ -l h_vmem=10G
#$ -N fastqc_chipaid
#$ -o /home/escaon/work/CHIPAID/Verbose/fastqc.o
#$ -e /home/escaon/work/CHIPAID/Verbose/fastqc.e

##############################
### Paths, tools & modules ###
##############################
work='/home/escaon/work/CHIPAID';

##############
### FASTQC ###
##############
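cd "$work/Data";
# NB: reports are written to $work, while the MultiQC script reads $work/QC/FastQC:
# move the reports there (or point --outdir at it) before running MultiQC
for f in *.fastq.gz;
do echo "$f"; fastqc --outdir "$work" -f fastq "$f";
done;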
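
Before launching MultiQC, it helps to confirm that every FASTQ produced a report. A sketch relying on FastQC's output naming (`<name>_fastqc.zip` for an input `<name>.fastq.gz`); `missing_fastqc_reports` is a hypothetical helper and the two directories are passed as arguments.

```shell
# Hypothetical helper: print every FASTQ of directory $1 that has no
# (non-empty) FastQC archive in directory $2; return 1 if any is missing.
missing_fastqc_reports() {
    local reads_dir="$1"
    local out_dir="$2"
    local f base
    local rc=0
    for f in "$reads_dir"/*.fastq.gz; do
        [ -e "$f" ] || continue
        base="${f##*/}"               # strip the directory part
        base="${base%.fastq.gz}"      # strip the extension
        if [ ! -s "$out_dir/${base}_fastqc.zip" ]; then
            echo "$f"
            rc=1
        fi
    done
    return "$rc"
}
```

With the paths of the script above, this would be `missing_fastqc_reports "$work/Data" "$work"`.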

##### MultiQC 1.4 @genotoul

In [None]:
### CODE ###
#!/bin/bash
#$ -q workq
#$ -M erwan.scaon@unilim.fr
#$ -m bea
#$ -l mem=8G
#$ -l h_vmem=10G
#$ -N multiqc_chipaid
#$ -o /home/escaon/work/CHIPAID/Verbose/multiqc.o
#$ -e /home/escaon/work/CHIPAID/Verbose/multiqc.e

##############################
### Paths, tools & modules ###
##############################
work='/home/escaon/work/CHIPAID';
fastqc_dir='/home/escaon/work/CHIPAID/QC/FastQC';

###############
### MULTIQC ###
###############
multiqc $fastqc_dir \
        -n chipaid \
        -o $work \
        -m fastqc \
        -f;

## 3) Align reads vs reference genome

### 3.1) Build index

Since reads are shorter here than in the PTCB project, we rebuild the genome index accordingly: the sjdbOverhang parameter goes from 150 to 74 (read length - 1, for 75 bp reads).
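
The value 74 follows the STAR manual's recommendation of max(read length) - 1. Because the reads were cleaned and may no longer all be 75 bp long, the sketch below derives the value from a FASTQ instead of hard-coding it; `sjdb_overhang` is a hypothetical helper.

```shell
# Hypothetical helper: print max(read length) - 1 for gzipped FASTQ $1.
# Only the first $2 reads are scanned (default: 100000).
sjdb_overhang() {
    local fastq="$1"
    local n="${2:-100000}"
    gzip -dc "$fastq" | awk -v n="$n" '
        NR % 4 == 2 { if (length($0) > max) max = length($0) }  # sequence lines
        NR >= 4 * n { exit }                                     # stop after n reads
        END { print max - 1 }'
}
```

For instance, `sjdb_overhang WT_IN_2.fastq.gz` could be run on one sample before building the index.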

##### STAR 2.5.3a @CALI

In [4]:
### CODE ###
#!/bin/bash
#SBATCH --partition=normal
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=6144
#SBATCH --time=1-23:59:59
#SBATCH --mail-user=erwan.scaon@unilim.fr
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
#SBATCH --error=/home/scaonp01/scratch/PTCB/Verbose/star_idx_custom_chipaid.e
#SBATCH --output=/home/scaonp01/scratch/PTCB/Verbose/star_idx_custom_chipaid.o
#SBATCH --job-name=star_idx_custom

##############################
### Paths, tools & modules ###
##############################
nt=8;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK;
star='/home/scaonp01/Software/STAR_2.5.3a/bin/Linux_x86_64';
ptcb='/home/scaonp01/PTCB/Data/Mus_musculus';
scratch='/home/scaonp01/scratch/PTCB';

############
### STAR ###
############
mkdir -p $ptcb/GRCm38.p6_custom_chipaid_STAR;
cd $ptcb/GRCm38.p6_custom_chipaid_STAR;
$star/STAR --runThreadN $nt \
           --runMode genomeGenerate \
           --genomeDir $ptcb/GRCm38.p6_custom_chipaid_STAR \
           --genomeFastaFiles $ptcb/GRCm38.p6_custom_STAR/GCA_000001635.8_GRCm38.p6_genomic_renamed_subsampled.fna \
           --sjdbGTFfile $ptcb/GRCm38.p6_custom_STAR/gencode.vM16.chr_patch_hapl_scaff.annotation.gtf \
           --sjdbOverhang 74;

### 3.2) Alignments

##### STAR 2.5.3a @CALI

Initial script: star_WT_IN_2.sh

- Nb_1: We use a gapped aligner, following the Biostar Handbook's advice.
- Nb_2: Default parameters.
- Nb_3: We keep unaligned reads in a separate FASTQ, just in case.

In [None]:
### CODE ###
#!/bin/bash
#SBATCH --partition=normal
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=6144
#SBATCH --time=1-23:59:59
#SBATCH --mail-user=erwan.scaon@unilim.fr
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
#SBATCH --error=/home/scaonp01/scratch/CHIPAID/Verbose/star_WT_IN_2.e
#SBATCH --output=/home/scaonp01/scratch/CHIPAID/Verbose/star_WT_IN_2.o
#SBATCH --job-name=star_WT_IN_2

##############################
### Paths, tools & modules ###
##############################
nt=8;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK;
star='/home/scaonp01/Software/STAR_2.5.3a/bin/Linux_x86_64/STAR';
samtools='/home/scaonp01/Software/Samtools_1.6/Install/bin/samtools';
save='/home/scaonp01/CHIPAID';
work='/home/scaonp01/scratch/CHIPAID';

############
### STAR ###
############
cd $save/Data/Reads;
for id in 'WT_IN_2';
do $star --genomeDir $save/Data/GRCm38.p6_custom_chipaid_STAR \
         --readFilesIn $id.fastq.gz \
         --outFileNamePrefix $work/$id"_vs_GRCm38.p6_custom_" \
         --runThreadN $nt \
         --readFilesCommand zcat \
         --outSAMtype BAM SortedByCoordinate \
         --outReadsUnmapped Fastx;
done;

######################
### Compress Stuff ###
######################
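cd "$work";
# $id still holds the sample name set by the loop above.
# STAR writes unmapped reads to *Unmapped.out.mate1: add a FASTQ extension, then compress.
mv "${id}_vs_GRCm38.p6_custom_Unmapped.out.mate1" "${id}_vs_GRCm38.p6_custom_Unmapped.out.mate1.fastq";
gzip "${id}_vs_GRCm38.p6_custom_Unmapped.out.mate1.fastq";
"$samtools" index "${id}_vs_GRCm38.p6_custom_Aligned.sortedByCoord.out.bam";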

Duplicate the script for the 11 other samples

In [None]:
### CODE ###
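# WT_IN_2 is not in the list: its script is the template
for id in 'MAR_IN_1' 'MAR_IN_2' 'MAR_IN_3' 'MAR_IP_1' 'MAR_IP_2' 'MAR_IP_3' 'WT_IN_1' 'WT_IN_3' 'WT_IP_1' 'WT_IP_2' 'WT_IP_3';
do cp star_WT_IN_2.sh "star_$id.sh";
   # Replace every occurrence of the template ID (log paths, job name, loop variable)
   sed -i "s/WT_IN_2/$id/g" "star_$id.sh";
   sbatch "star_$id.sh";
done;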
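
Before submitting 11 jobs, the effect of the substitution can be previewed. The sketch below prints the lines that would change for one sample, without writing or submitting anything; `preview_star_script` is a hypothetical helper.

```shell
# Hypothetical helper: show, as a diff, what replacing template ID $2
# with sample ID $3 would change in template script $1.
preview_star_script() {
    local template="$1"
    local template_id="$2"
    local id="$3"
    # diff exits with 1 when the inputs differ, which is the expected case here;
    # only a real error (exit status 2 or more) makes the function fail
    sed "s/${template_id}/${id}/g" "$template" | diff "$template" - || [ "$?" -eq 1 ]
}
```

For example, `preview_star_script star_WT_IN_2.sh WT_IN_2 MAR_IP_1` should only list the log paths, the job name and the `for id in` line.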

### 3.3) Alignment QC

##### MultiQC 1.4 @local

In [5]:
### CODE ###
path='/run/user/1000/gvfs/smb-share:server=ubox.ad.unilim.fr,share=biscem/BioInformatique/Responsables/Commun/CHIPAID';
cd $path/Output/1_STAR/QC;
multiqc ./ -n star_vs_custom -o ./ -m star -f > ./multiqc.verbose 2>&1;
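
Given the concern about low input material, a quick per-sample table of uniquely mapped reads can complement the MultiQC report. A sketch that reads STAR's `Log.final.out` files, assuming they sit in the QC directory and keep the `<sample>_vs_GRCm38.p6_custom_` prefix set in the alignment script; `star_unique_rates` is a hypothetical helper.

```shell
# Hypothetical helper: print "<sample><TAB><uniquely mapped %>" for every
# STAR *_Log.final.out file found in directory $1.
star_unique_rates() {
    local dir="$1"
    local f sample rate
    for f in "$dir"/*_Log.final.out; do
        [ -e "$f" ] || continue
        sample="${f##*/}"             # strip the directory part
        sample="${sample%%_vs_*}"     # keep what precedes "_vs_"
        # The line of interest has the form "Uniquely mapped reads % |<TAB><value>%"
        rate=$(awk -F'|' '/Uniquely mapped reads %/ { gsub(/[ \t]/, "", $2); print $2 }' "$f")
        printf '%s\t%s\n' "$sample" "$rate"
    done
}
```

`star_unique_rates .` from the QC directory would give one line per sample, easy to compare between IP and IN.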