# DNA Oxford Nanopore Processing and Analysis

* **Project:** African-ancestry intronic *GBA1* branch point variant
* **Language:** Bash 
* **Last updated:** 20-DEC-2023

## Notebook Overview
- Process raw ONT DNA sequencing data from basecalling to mapping

**Note**: This notebook shows processing of the CRISPR-edited lines only. Other ONT DNA-seq data were processed the same way.

### CHANGELOG
20-DEC-2023: Notebook final draft

---

**CRISPR NAMING KEY**  \
    CT_37 --> ND01137_TT \
    CT_89 --> ND22789_GG \
    MT_37 --> ND01137_GG_Mock \
    MT_89 --> ND22789_TT_Mock \
    PT_37 --> ND01137_GT \
    PT_89 --> ND22789_GT \
    WT_89 --> ND22789_TT_OG 
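
The key above can also be kept as a small lookup function so that short IDs resolve to full line names inside scripts. This is a minimal sketch: the function name `crispr_full_name` is hypothetical and not part of the original pipeline; the mappings are copied from the key.

```bash
# Hypothetical helper (not part of the original pipeline):
# print the full line name for a short CRISPR ID, or fail on an unknown ID.
crispr_full_name() {
    case "$1" in
        CT_37) echo "ND01137_TT" ;;
        CT_89) echo "ND22789_GG" ;;
        MT_37) echo "ND01137_GG_Mock" ;;
        MT_89) echo "ND22789_TT_Mock" ;;
        PT_37) echo "ND01137_GT" ;;
        PT_89) echo "ND22789_GT" ;;
        WT_89) echo "ND22789_TT_OG" ;;
        *)
            # Unknown IDs go to stderr with a non-zero exit status
            echo "unknown sample ID: $1" >&2
            return 1
            ;;
    esac
}
```

For example, `crispr_full_name PT_89` prints `ND22789_GT`.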

In [3]:
MAIN=./GBA1_CRISPR/DNA/

In [1]:
cat $MAIN/sample_names_DNA.txt
# Note: some samples have multiple flow cells

MT_37   PAM72720
MT_89   PAM73235
MT_89   PAM80411
CT_37   PAM73573
CT_89   PAM31684
PT_37   PAM73647
PT_37   PAM74549
PT_89   PAM72819
PT_89   PAM73226
WT_89   PAQ45921
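
Since some samples appear on more than one line, it can be useful to list which ones have multiple flow cells before running the per-sample steps. A minimal sketch, assuming the sample sheet is whitespace-separated with the sample ID in column 1; the function name `multi_flowcell_samples` is hypothetical.

```bash
# Hypothetical helper (not part of the original pipeline):
# print each sample ID that has more than one flow cell, with its flow cell count.
multi_flowcell_samples() {
    # n[$1] counts the lines (flow cells) seen for each sample ID
    awk '{ n[$1]++ } END { for (s in n) if (n[s] > 1) print s, n[s] }' "$1" | sort
}
```

Run as `multi_flowcell_samples "$MAIN/sample_names_DNA.txt"`. For the sample sheet above it reports `MT_89 2`, `PT_37 2`, and `PT_89 2`.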


## 1a. Basecalling

In [None]:
cat sample_names_DNA.txt | while read -r first second ; do
    # One GPU job per line of the sample sheet: input fast5 directory, then output directory
    sbatch --partition=gpu --cpus-per-task=10 --mem=50g --gres=gpu:a100:2,lscratch:200 --time=5-0 \
        --wrap="bash guppy_basecaller_R9_DNA.sh $MAIN/CRISPR_${first}_DNA/fast5/ $MAIN/CRISPR_${first}_DNA/out_GUP/"
done
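
The basecalling loop assumes every sample has a `fast5/` directory under `$MAIN/CRISPR_<sample>_DNA/`. A minimal pre-flight sketch that checks this layout before any GPU jobs are submitted; the function name `check_fast5_dirs` is hypothetical and not part of the original pipeline.

```bash
# Hypothetical pre-flight check (not part of the original pipeline).
# Usage: check_fast5_dirs <sample sheet> <main directory>
# Prints one line per missing fast5 directory and returns 1 if any are missing.
check_fast5_dirs() {
    local sheet=$1 main=$2 missing=0 sample dir
    # Column 1 is the sample ID; sort -u collapses samples with several flow cells
    for sample in $(awk '{ print $1 }' "$sheet" | sort -u); do
        dir="$main/CRISPR_${sample}_DNA/fast5"
        if [ ! -d "$dir" ]; then
            echo "missing: $dir"
            missing=1
        fi
    done
    return "$missing"
}
```

Run as `check_fast5_dirs sample_names_DNA.txt "$MAIN"` and submit the jobs only if it returns 0.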

## 1b. Cleaning post basecalling

In [None]:
cat sample_names_DNA.txt | while read -r first second ; do
    # Arguments: directory of passed reads, then the output prefix (sample_flowcell)
    sbatch --mem=80g --cpus-per-task=5 --time=1-0 2_ONT_basecalling_clean_up.sh \
        "$MAIN/CRISPR_${first}_DNA/out_GUP/pass/" \
        "${first}_${second}"
done

In [None]:
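cat sample_names_DNA.txt | while read -r first second ; do
    # Skip this line if the basecalling output directory is missing
    cd "$MAIN/CRISPR_${first}_DNA/out_GUP/" || continue
    # -p avoids errors when a directory already exists (samples with several flow cells)
    mkdir -p log_files ../other_reports
    mv *log log_files
    mv sequencing_summary.txt ../other_reports/
    mv sequencing_telemetry.js ../other_reports/
    mv log_files ../other_reports/
    # Move the cleaned per-flow-cell files up to the sample directory
    mv ./pass/"${first}_${second}".* ../
    cd "$MAIN"
done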

In [None]:
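cat sample_names_DNA.txt | while read -r first second ; do
    # Guard the cd: if it fails, the rm commands below would run in the wrong directory
    # (out_GUP is already deleted on the second pass for samples with two flow cells)
    cd "$MAIN/CRISPR_${first}_DNA/out_GUP/" || continue
    mv ./pass/pycoQC* ../other_reports/
    mv ./pass/stats.pass.tsv ../other_reports/
    rm ./pass/*.fastq
    rm -r ./pass/
    rm -r ./fail/
    cd ../
    # Report what is left before deleting the basecalling output
    du -sh ./out_GUP/
    rm -r ./out_GUP/
    cd "$MAIN"
done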

## 2. Mapping

In [None]:
cat sample_names_DNA.txt | while read -r first second ; do
    # -p: samples with several flow cells appear more than once in the sample sheet
    mkdir -p "$MAIN/CRISPR_${first}_DNA/mapped/"
done

In [None]:
cat sample_names_DNA.txt | while read -r first second ; do
    # Arguments: sample directory, input BAM, output directory, output prefix
    sbatch --mem=80g --cpus-per-task=5 --time=2-0 --mail-type=END 3_ONT_DNA_meth_mapping.sh \
        "$MAIN/CRISPR_${first}_DNA/" \
        "${first}_${second}.bam" \
        "$MAIN/CRISPR_${first}_DNA/mapped/" \
        "${first}_${second}_hg38"
done

In [None]:
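# Merge mapped BAMs per sample; for samples with a single flow cell this is just a rename for consistency
# sort -u submits one job per sample rather than one per flow cell
awk '{ print $1 }' sample_names_DNA.txt | sort -u | while read -r first ; do
    sbatch --mem=50g --cpus-per-task=5 --time=2-0 merge.sh "$first"
done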

## BAMs are now ready to view in IGV