# DIA Analysis Part 1: PECAN

In this notebook, I'll walk through how I prepared Pacific oyster (*Crassostrea gigas*) [proteomic data](https://yaaminiv.github.io/Mass-Spec-Start/) collected for the [DNR project](https://yaaminiv.github.io/DNRprojectintroduction/) for analysis with [PECAN](https://bitbucket.org/maccosslab/pecan/overview). PECAN is the first step in the [Data-Independent Acquisition](https://github.com/sr320/LabDocs/wiki/DIA-data-Analyses) (DIA) analysis pipeline. DIA is a bottom-up proteomics method in which MS/MS spectra are acquired independently of the MS survey spectra: rather than selecting individual precursor ions for fragmentation, the instrument fragments all precursors within successive m/z isolation windows.

PECAN correlates the acquired peptide spectra with a database of known sequences and creates a library of the proteins and peptides detected in the experiment. It requires several inputs, each of which must be prepared before running PECAN from the command line.

The first step is to install PECAN, [MSConvert](http://proteowizard.sourceforge.net/tools.shtml), and the [Protein Digestion Simulator](https://omics.pnl.gov/software/protein-digestion-simulator) on the same Windows machine.

I then obtained the .raw files from the mass spectrometer. They can be found [here](https://owl.fish.washington.edu/spartina/January_2017_DNR_Raw_Data).

### 1. MSConvert

Output files from a mass spectrometer are in the .raw format, but PECAN requires mzML files. MSConvert (part of ProteoWizard, with a GUI front end) converts .raw files to mzML; with the appropriate settings, it centroids the peaks (peak picking) and writes the binary data with 64-bit precision and zlib compression. Using the settings outlined in the [DIA Wiki](https://github.com/sr320/LabDocs/wiki/DIA-Data-Analyses), Steven ran MSConvert on my .raw files, since I didn't have access to a Windows computer with the program.
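To confirm that converted files carry these settings, one option is to look for the corresponding PSI-MS controlled-vocabulary terms inside the mzML. The sketch below assumes the standard accessions MS:1000127 (centroid spectrum), MS:1000523 (64-bit float), and MS:1000574 (zlib compression). `mzml_settings` is a hypothetical helper, and it only checks that each term appears somewhere in the file, not on every spectrum.

```python
import io
import xml.etree.ElementTree as ET

# PSI-MS controlled-vocabulary accessions for the conversion settings above
EXPECTED = {
    "MS:1000127": "centroid spectrum",
    "MS:1000523": "64-bit float",
    "MS:1000574": "zlib compression",
}


def mzml_settings(source):
    """Report which EXPECTED accessions occur anywhere in an mzML document.

    `source` can be a file path, a binary file-like object, or raw bytes.
    Returns {accession: True/False}.
    """
    if isinstance(source, bytes):
        source = io.BytesIO(source)  # allow in-memory XML, e.g. for testing
    found = {acc: False for acc in EXPECTED}
    for _event, elem in ET.iterparse(source, events=("end",)):
        # mzML tags are namespaced, so match on the local tag name only
        if elem.tag.endswith("cvParam"):
            accession = elem.get("accession")
            if accession in found:
                found[accession] = True
        elem.clear()  # discard parsed content to keep memory use down
    return found
```

For example, `mzml_settings("sample.mzML")` (a placeholder filename) would be expected to return `True` for all three accessions if the wiki settings were applied; `False` for MS:1000127 would suggest the spectra were left in profile mode.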

Converted files can be found [here](http://owl.fish.washington.edu/halfshell/index.php?dir=working-directory%2F17-02-15%2Forf%2F).

### 2. Protein Digestion Simulator

This step requires a list of all peptides I'm interested in identifying in my sample and a list of quality control (QC) peptides. I'm interested in all possible peptides in my sample, so I'll use a *C. gigas* proteome.
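To make the digestion step concrete, here is a minimal sketch of an in-silico tryptic digest using the classic trypsin rule (cleave after K or R, except when the next residue is P). This is a simplified stand-in for illustration, not the Protein Digestion Simulator's implementation: it allows no missed cleavages and ignores the simulator's other options, and `tryptic_digest` is a hypothetical name.

```python
import re

# Classic trypsin rule: cut after K or R unless the next residue is P.
# Lookbehind/lookahead make this a zero-width split point (Python 3.7+).
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(sequence, min_length=1):
    """Return peptides from a fully cleaved (no missed cleavages) digest."""
    peptides = TRYPSIN_SITE.split(sequence)
    # A C-terminal K/R produces a trailing empty string; drop it and any
    # peptide shorter than min_length
    return [p for p in peptides if len(p) >= min_length]
```

Applied to the Pierce PRTC sequence shown later in this notebook, this rule splits it into 15 peptides, from `SSAAPPPPPR` to `LSSEAPALFQFDLK`; `DIPVPKPK` stays intact because its internal K is followed by P.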

I need to ensure my proteome file is tab-delimited and contains protein names and sequences.
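A quick way to check that requirement, sketched under the assumption that "tab-delimited" means one protein per line with exactly two columns (name, then sequence). `find_bad_rows` is a hypothetical helper that flags any other line; it accepts any run of uppercase letters as a sequence, so characters such as `*` would be flagged.

```python
import re

# Hypothetical format rule: "name<TAB>sequence", sequence = uppercase letters
SEQUENCE = re.compile(r"[A-Z]+")


def find_bad_rows(lines):
    """Return 1-based line numbers that are not 'name<TAB>sequence'."""
    bad = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 2 or not fields[0] or not SEQUENCE.fullmatch(fields[1]):
            bad.append(number)
    return bad
```

Passing an open file handle (e.g. `find_bad_rows(open(path))`) returns an empty list for a well-formed file; a FASTA file, by contrast, would have every line flagged.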

```
!curl http://owl.fish.washington.edu/halfshell/bu-git-repos/nb-2017/C_gigas/data/Cg_Gigaton_proteins.fa \
> /Users/yaamini/Documents/project-oyster-oa/data/PECAN-inputs/Cg_Gigaton_proteins.fa
```

```
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 20.7M  100 20.7M    0     0  19.0M      0  0:00:01  0:00:01 --:--:-- 19.8M
```


```
!head /Users/yaamini/Documents/project-oyster-oa/data/PECAN-inputs/Cg_Gigaton_proteins.fa
```

```
>CHOYP_043R.1.5|m.16874
TPSGPTPSGPTPSVTPTPSGSTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSVTPT
PSGPTPSVTPTPSVPTPSGPTPSVTPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTP
SGPTPSVTPTPSVTPTPSGPTPSVTSTPSAPTPSGPTPSGPTPSVTPTPSGPTPSGPTPS
VTPTPSGPTPSVTPTPSG
>CHOYP_043R.5.5|m.64252
SRPTPSVTPTPSGPTPSVTPTPSVSTPSGPTPSVTPTPSGPSPSVTPTPSGPSPSGPTPS
ATPTPSGPTPSGTTPSGSTPSATITTISTPSTTVCSYVDIGPEQAIDVSLRSPSEDPDAP
IENILQTNSVYKPKKEPTYDENVVVKIISQDTPTILRVSFTVNRADTVGLEYLTDYKQKI
ITQNNETVEFVFAAGIITDNFTINIRSDSAEQPEISNLKIRACYKPVIGQPSTTTPNPSI
```


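The `!head` output confirms that the proteome file contains protein names and sequences. Note that the file is in FASTA format (each name on a `>` header line, followed by its sequence wrapped across several lines) rather than tab-delimited.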
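If a tab-delimited copy of the proteome is needed, FASTA records can be collapsed into one `name<TAB>sequence` row each. A minimal sketch, assuming the name is everything after `>` on the header line and that wrapped sequence lines should be joined; `fasta_to_tabular` is a hypothetical helper, not part of any of the tools above.

```python
def fasta_to_tabular(lines):
    """Yield 'name<TAB>sequence' rows from FASTA lines.

    Wrapped sequence lines are joined; blank lines and any sequence lines
    that appear before the first header are ignored.
    """
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield f"{name}\t{''.join(chunks)}"
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:  # emit the final record
        yield f"{name}\t{''.join(chunks)}"
```

Writing each yielded row plus a newline to a new file gives the same two-column layout as `Pierce_PRTC.tabular`.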

Next, I'll append the list of QC peptides to the list of peptides I'm interested in. To do this, I'll use [Galaxy](https://usegalaxy.org).

```
!curl https://owl.fish.washington.edu/generosa/Generosa_DNR/Pierce_PRTC.tabular -k \
> /Users/yaamini/Documents/project-oyster-oa/data/PECAN-inputs/Pierce_PRTC.tabular
```

```
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   230  100   230    0     0     64      0  0:00:03  0:00:03 --:--:--    64
```


```
!head /Users/yaamini/Documents/project-oyster-oa/data/PECAN-inputs/Pierce_PRTC.tabular
```

```
P00000 Pierce Peptide Retention Time Calibration Mixture	SSAAPPPPPRGISNEGQNASIKHVLTSIGEKDIPVPKPKIGDYAGIKTASEFDSAIAQDKSAAGAFGPELSRELGQSGVDTYLQTKGLILVGGYGTRGILFVGSGVSGGEEGARSFANQPLEVVYSKLTILEELRNGFILDGFPRELASGLSFPVGFKLSSEAPALFQFDLK
```


![galaxy parameters]()
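The concatenation itself was done in Galaxy. As an illustration only, a local equivalent might look like the sketch below, assuming both inputs are two-column name/sequence text files; `combine_tabular` and `append_files` are hypothetical helpers. Rebuilding the output line by line avoids one pitfall of plain concatenation: if the first file lacks a final newline, its last row would otherwise be glued onto the first QC row.

```python
from pathlib import Path


def combine_tabular(*texts):
    """Concatenate tab-delimited texts row by row and return the combined text."""
    rows = []
    for text in texts:
        # splitlines() tolerates a missing final newline; blank lines are dropped
        rows.extend(line for line in text.splitlines() if line.strip())
    return "\n".join(rows) + "\n"


def append_files(paths, out_path):
    """Write the rows of every file in `paths`, in order, to `out_path`."""
    texts = [Path(p).read_text() for p in paths]
    Path(out_path).write_text(combine_tabular(*texts))
```

For example, `append_files(["proteome.tabular", "Pierce_PRTC.tabular"], "proteome_plus_QC.tabular")`, where the first and last filenames are placeholders.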