In [1]:
# make sure dependencies have been generated (they should have been
# if this notebook is run in batch mode by make)
(cd ../; make init)

make[1]: Entering directory `/mnt/expressions/mp/archaic-y'
make[1]: Nothing to be done for `init'.
make[1]: Leaving directory `/mnt/expressions/mp/archaic-y'


# Processing of the Denisova 8 shotgun data

Two folders are relevant here:

* `/home/susanna_sawyer/Neandertal/Denisova_Molar_paper`
* `/mnt/expressions/susanna/Denisova_Molar_2_L9133/`

The README file describing the processing is here: `/mnt/expressions/susanna/Denisova_Molar_2_L9133/read_me_HiSeq_L9133`

It describes so many processing steps that re-processing the runs from scratch (as was done for the El Sidron capture data) would not make sense. It is better to use the published data, which are here:

`/home/susanna_sawyer/Neandertal/Denisova_Molar_paper/final_data`

These are the same BAM files (with the same names) as the ones in:

`/mnt/expressions/susanna/Denisova_Molar_2_L9133/separate_damage`

In [2]:
ls /home/susanna_sawyer/Neandertal/Denisova_Molar_paper/final_data/*.bam

/home/susanna_sawyer/Neandertal/Denisova_Molar_paper/final_data/Den8_l35q37.bam
/home/susanna_sawyer/Neandertal/Denisova_Molar_paper/final_data/Den8_l35q37_deam.bam


Let's use both files as they are, since they were already filtered (length >= 35, mapping quality >= 37), and save ourselves a lot of work.
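The filter thresholds are only encoded in the file names (`l35q37`). A quick way to confirm them is to scan the records. The sketch below uses a hypothetical helper, `count_filter_violations` (our name, not part of any tool), that reads SAM text on stdin and counts records shorter than a minimum length or below a minimum mapping quality. It assumes both thresholds are inclusive.

```bash
# Hypothetical helper: count SAM records (stdin) that fail a minimum
# sequence length ($1) or a minimum mapping quality ($2).
# SAM column 5 is MAPQ and column 10 is SEQ; header lines are skipped.
# A SEQ of "*" has length 1, so such records are counted as violations.
count_filter_violations() {
    local min_len=$1 min_mq=$2
    awk -F'\t' -v min_len="$min_len" -v min_mq="$min_mq" '
        /^@/ { next }
        $5 < min_mq || length($10) < min_len { bad++ }
        END { print bad + 0 }
    '
}

# Example (not run here); prints 0 if the file name is accurate:
#   samtools view /home/susanna_sawyer/Neandertal/Denisova_Molar_paper/final_data/Den8_l35q37.bam \
#       | count_filter_violations 35 37
```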

In [3]:
bam_dir="../bam"



In [4]:
targets_bed="../input/target_regions_map35-99.bed"



## Extract only reads that overlap the target regions of the capture array

### All reads

In [5]:
bedtools intersect \
    -a /home/susanna_sawyer/Neandertal/Denisova_Molar_paper/final_data/Den8_l35q37.bam \
    -b $targets_bed \
    > $bam_dir/den8_ontarget.bam



In [6]:
samtools index $bam_dir/den8_ontarget.bam
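`bedtools intersect` was run above without `-u` ("write the original A entry once if any overlaps are found in B"). We have not checked whether the bedtools version used here can write a BAM alignment more than once when it overlaps several target regions, so a cheap sanity check is to look for identical SAM lines in the output. A minimal sketch with a hypothetical helper name:

```bash
# Hypothetical helper: count distinct lines (stdin) that occur more than once.
# Applied to `samtools view` output, a non-zero result means some alignment
# was written multiple times.
count_duplicate_records() {
    sort | uniq -d | wc -l | tr -d ' '
}

# Example (not run here); expected to print 0:
#   samtools view "$bam_dir/den8_ontarget.bam" | count_duplicate_records
```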



### Deaminated reads only

In [7]:
bedtools intersect \
    -a /home/susanna_sawyer/Neandertal/Denisova_Molar_paper/final_data/Den8_l35q37_deam.bam \
    -b $targets_bed \
    > $bam_dir/deam_den8_ontarget.bam



In [8]:
samtools index $bam_dir/deam_den8_ontarget.bam
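If the deaminated-only file is a subset of the full file (an assumption based on the file names, not something verified here), every read name in `deam_den8_ontarget.bam` should also appear in `den8_ontarget.bam`. The hypothetical helper below counts the names from one list that are missing from another; it uses `FILENAME` rather than the `NR == FNR` idiom so that an empty reference list is handled correctly.

```bash
# Hypothetical helper: count distinct lines of file $1 that do not occur
# in file $2 (one read name per line in each file).
count_names_missing() {
    awk 'FILENAME == ARGV[1] { present[$0] = 1; next }
         !($0 in present) && !($0 in counted) { counted[$0] = 1; n++ }
         END { print n + 0 }' "$2" "$1"
}

# Example (bash process substitution; not run here); expected to print 0:
#   count_names_missing \
#       <(samtools view "$bam_dir/deam_den8_ontarget.bam" | cut -f1) \
#       <(samtools view "$bam_dir/den8_ontarget.bam" | cut -f1)
```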



# Processing of the Denisova 4 shotgun data

Finding the final Denisova 4 data takes a bit more work. The deaminated-only fragments were published in the ENA archive here: http://www.ebi.ac.uk/ena/data/view/PRJEB10828

The BAM file for Denisova 4 downloaded from the archive is 2877362 bytes in size.

Checking this directory `/mnt/expressions/susanna/Denisova_Molar_1_L9351/separate_damage` shows:

In [9]:
ls -l /mnt/expressions/susanna/Denisova_Molar_1_L9351/separate_damage/*.bam

-rw-r--r-- 1 public staff  2857946 Oct 22  2012 /mnt/expressions/susanna/Denisova_Molar_1_L9351/separate_damage/DenMol1_uniq_l35q37_deam.bam
-rw-r--r-- 1 public staff  2877362 Oct 22  2012 /mnt/expressions/susanna/Denisova_Molar_1_L9351/separate_damage/DenMol1_uniq_l35q37_deam_clipped.bam
-rw-r--r-- 1 public staff 74248791 Oct 22  2012 /mnt/expressions/susanna/Denisova_Molar_1_L9351/separate_damage/DenMol1_uniq_l35q37_notDeam.bam
-rw-r--r-- 1 public staff 77094232 Oct 22  2012 /mnt/expressions/susanna/Denisova_Molar_1_L9351/separate_damage/DenMol1_uniq_repaired_l35q37.bam


Based on its size, the file `DenMol1_uniq_l35q37_deam_clipped.bam` therefore seems to be the one that was uploaded to the archive as deaminated-only fragments. This agrees with the README file `/mnt/expressions/susanna/Denisova_Molar_1_L9351/read_me_HiSeq_L9351`, which describes how the damaged reads were separated.
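The identification above rests on matching byte counts. The sketch below (hypothetical helper names) shows how that lookup can be scripted with `find -size Nc`, and how a byte-for-byte comparison with `cmp` would turn a size match into a confirmation, assuming the file downloaded from the archive is available locally. The download path in the example is a placeholder.

```bash
# Hypothetical helper: list BAM files directly inside directory $1 whose
# size is exactly $2 bytes (the "c" suffix makes find count bytes).
find_bam_by_size() {
    find "$1" -maxdepth 1 -name '*.bam' -size "${2}c"
}

# Hypothetical helper: succeed only if files $1 and $2 are byte-identical.
same_file_content() {
    cmp -s "$1" "$2"
}

# Example (not run here; path/to/downloaded_Den4.bam is a placeholder):
#   find_bam_by_size /mnt/expressions/susanna/Denisova_Molar_1_L9351/separate_damage 2877362
#   same_file_content path/to/downloaded_Den4.bam \
#       /mnt/expressions/susanna/Denisova_Molar_1_L9351/separate_damage/DenMol1_uniq_l35q37_deam_clipped.bam \
#       && echo identical
```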

This README also describes how `DenMol1_uniq_l35q37_notDeam.bam` was generated; this is the file with all fragments.

Therefore, for further analysis, these two files will be used:
* `/mnt/expressions/susanna/Denisova_Molar_1_L9351/separate_damage/DenMol1_uniq_l35q37_deam_clipped.bam`
* `/mnt/expressions/susanna/Denisova_Molar_1_L9351/separate_damage/DenMol1_uniq_l35q37_notDeam.bam`

**Importantly**, the README in `/mnt/expressions/susanna/Denisova_Molar_1_L9351/separate_damage/read_me_separate_damage` also describes how the base qualities of the terminal bases were decreased for the deaminated-only fragments. This is where the "clipped" suffix comes from.
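The effect of the reduced terminal base qualities can be seen by comparing the mean quality of the first and last base of each record with the overall mean. This is a sketch with a hypothetical helper that assumes Phred+33 encoding. SAM stores qualities in reference orientation, so "first" and "last" mix 5' and 3' read ends across strands; that does not matter if both ends were downweighted, but otherwise the strand (FLAG 0x10) would need to be taken into account.

```bash
# Hypothetical helper: mean Phred quality of the first base, the last base,
# and all bases of SAM records read from stdin (column 11 = QUAL, Phred+33).
# Header lines and records without qualities ("*") are skipped.
terminal_base_quality() {
    awk -F'\t' '
        BEGIN { for (i = 33; i <= 126; i++) phred[sprintf("%c", i)] = i - 33 }
        /^@/ || $11 == "*" { next }
        {
            q = $11; n = length(q)
            first += phred[substr(q, 1, 1)]
            last  += phred[substr(q, n, 1)]
            for (j = 1; j <= n; j++) all += phred[substr(q, j, 1)]
            reads++; bases += n
        }
        END {
            if (reads > 0)
                printf "first=%.1f last=%.1f overall=%.1f\n", first / reads, last / reads, all / bases
        }'
}

# Example (not run here):
#   samtools view /mnt/expressions/susanna/Denisova_Molar_1_L9351/separate_damage/DenMol1_uniq_l35q37_deam_clipped.bam \
#       | terminal_base_quality
```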

## Extract only reads that overlap the target regions of the capture array

### All reads

In [10]:
bedtools intersect \
    -a /mnt/expressions/susanna/Denisova_Molar_1_L9351/separate_damage/DenMol1_uniq_l35q37_notDeam.bam \
    -b $targets_bed \
    > $bam_dir/den4_ontarget.bam



In [11]:
samtools index $bam_dir/den4_ontarget.bam



### Deaminated reads only

In [12]:
bedtools intersect \
    -a /mnt/expressions/susanna/Denisova_Molar_1_L9351/separate_damage/DenMol1_uniq_l35q37_deam_clipped.bam \
    -b $targets_bed \
    > $bam_dir/deam_den4_ontarget.bam



In [13]:
samtools index $bam_dir/deam_den4_ontarget.bam
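The four pairs of cells above all follow the same intersect-then-index pattern. A small helper could capture it; this is a sketch under the assumption that every input BAM is coordinate-sorted (`bedtools intersect` keeps the order of the `-a` input, and `samtools index` requires sorted input). Unlike the cells above, it passes `-u` so that each alignment is reported at most once. The function name is hypothetical.

```bash
# Hypothetical helper: keep only alignments from BAM $1 that overlap the
# regions in BED $2, write them to BAM $3, then index the result.
# -u reports each alignment once even if it overlaps several regions.
extract_on_target() {
    local in_bam=$1 bed=$2 out_bam=$3
    bedtools intersect -u -a "$in_bam" -b "$bed" > "$out_bam" &&
        samtools index "$out_bam"
}

# Example (not run here), same inputs and output as cells [10] and [11]:
#   extract_on_target \
#       /mnt/expressions/susanna/Denisova_Molar_1_L9351/separate_damage/DenMol1_uniq_l35q37_notDeam.bam \
#       "$targets_bed" "$bam_dir/den4_ontarget.bam"
```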

