# Data-Independent Mass Spectrometry: From RAW files to Analyses

In this notebook, I'll outline the workflow needed to take my *Crassostrea gigas* Data-Independent Acquisition (DIA) mass spectrometry data from the `RAW` files produced by the mass spectrometer to a `.csv` usable in R Studio. DIA allows us to identify all peptides in each oyster sample without any prior knowledge of what we may find.

*Note: This pipeline relies heavily on Windows-based programs*

## Step 1: Collect Materials

**Software Dependencies**:

- [Skyline daily](https://skyline.ms/project/home/software/Skyline/daily/register-form/begin.view?)
  - This version of Skyline updates as the software is modified. You need permission to download it. The version of Skyline daily used in this notebook is Skyline-daily (64-bit) 3.7.1.11446 (as of 2017-10-11).
- [MSConvert]()
  - Special version of MSConvert modified by Austin in Genome Sciences on 2017-04-18.
- [R](https://www.r-project.org) and [R Studio](https://www.rstudio.com)
  - R version 3.4.0 (2017-04-21) -- "You Stupid Darkness"
  - R Studio Version 1.0.143
  
**Specific files (in order used)**:

- [config file for MSConvert](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_MSConvert_20170412/config_fix_20170413.txt)
  - This file contains all of the information MSConvert needs to demultiplex RAW files. It includes an argument specific to the isolation scheme used by the mass spectrometer to collect data.
- [.blib file from `pecanpie`](http://owl.fish.washington.edu/spartina/DNR_Skyline_20170524/2017-05-23-oyster-desearleinated.blib)
  - A .blib file is a library of all peptides you want to identify in your samples. For this experiment, the .blib contains the entire *C. gigas* proteome. This is the same .blib file used in my DIA analyses. This file was created by Emma Timmins-Schiffman and her team in `brecan`, a version of `pecanpie` produced by Brian Searle in Genome Sciences. Jarrett Egertson wrote a script to get rid of incompatibilities generated by `brecan`. The final compatible document is found above. For more information regarding .blib creation, see this [lab notebook entry](https://yaaminiv.github.io/Skyline-Attempt-3/).
- [Undigested **FASTA** background proteome, including QC protein sequence](http://owl.fish.washington.edu/spartina/DNR_Skyline_20170427/Combined-gigas-QC.fasta)
- [RAW files](http://owl.fish.washington.edu/spartina/January_2017_DNR_Raw_Data/Oyster_raw_files/)
- [Key for samples](http://owl.fish.washington.edu/spartina/January_2017_DNR_Raw_Data/2017_January_23.csv)

## Step 2: Demultiplex RAW data in MSConvert

DIA mass spectrometry uses overlapping mass-charge (m/z) windows to capture all peptides in a sample. To discern which peptides and transitions were found in each window, RAW files are demultiplexed. This allows us to parse out peptide abundance data from each individual window. Demultiplexing requires use of the MSConvert command line interface.
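
To illustrate why overlapping windows help, here is a minimal sketch of how two offset sets of isolation windows bound narrower m/z regions. The window edges are hypothetical and are not this experiment's isolation scheme, and `demultiplexed_regions` is an illustrative helper only. MSConvert performs the actual demultiplexing using the isolation scheme in the config file.

```python
def demultiplexed_regions(cycle_a, cycle_b):
    """Return the narrower m/z regions bounded by the edges of two offset window sets."""
    edges = sorted({edge for window in cycle_a + cycle_b for edge in window})
    return list(zip(edges[:-1], edges[1:]))


# Hypothetical 20 m/z windows; the second cycle is offset by 10 m/z
cycle_a = [(400, 420), (420, 440)]
cycle_b = [(410, 430), (430, 450)]

regions = demultiplexed_regions(cycle_a, cycle_b)
for low, high in regions:
    print(f"{low}-{high} m/z")
```

Each 20 m/z window overlaps its neighbors in the other cycle by half, so the combined edges mark out regions 10 m/z wide.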

### Step 2a: Install the MSConvert CLI

Unzip the MSConvert download from Step 1 into your working directory. This should be the directory that contains the RAW files to convert.

### Step 2b: Convert RAW files

Use the following code to demultiplex all RAW files and convert them to .mzML files:

`msconvert.exe -c config_fix_20170413.txt *.raw`
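
If you would rather list the files explicitly than rely on the `*.raw` wildcard, the same command can be assembled in Python. This is only a sketch: `build_msconvert_command` and the sample file names are hypothetical, and it uses nothing beyond the `-c` config argument shown above.

```python
def build_msconvert_command(raw_files, config="config_fix_20170413.txt"):
    """Assemble the msconvert call shown above for an explicit list of RAW files."""
    if not raw_files:
        raise ValueError("No RAW files were provided")
    return ["msconvert.exe", "-c", config, *raw_files]


# Hypothetical file names standing in for the real RAW files
command = build_msconvert_command(["sample_01.raw", "sample_02.raw"])
print(" ".join(command))

# To run the conversion from the working directory (Windows):
# import subprocess
# subprocess.run(command, check=True)
```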

## Step 3: Create a new Skyline Document

### Step 3a: Import the Spectral Library

1. Open a new Skyline file ("Blank Document")
2. Under Settings >> Peptide Settings >> Library, click "Edit List." The *C. gigas* .blib should already be on the list (2017-05-23-oyster-desearlinated).
3. If the .blib is not already on the list, click "Add"
 - Name the library and select the .blib file
 - After clicking "OK," select the correct library from the list
4. Under "Pick peptides matching," select "Library"

<img width="271" alt="screen shot 2017-09-13 at 2 01 34 am" src="https://user-images.githubusercontent.com/22335838/31026852-c1339b8a-a4fc-11e7-8a22-37a549a91524.png">

### Step 3b: Add Background Proteome

1. Settings >> Peptide Settings >> Digestion
2. Select "Gigas-4-27" under "Background proteome"
3. If "Gigas-4-27" is not an option
  - Select "Add" under "Background proteome"
  - Name background
  - Click "Create" under "Proteome file" to choose where to save the background
  - Click "Add file" under "FASTA files", and select your background proteome fasta file
  - After the FASTA loads, Skyline may prompt you about deleting repeated protein sequences. Select "OK"
  
    ![unnamed-1](https://user-images.githubusercontent.com/22335838/31412862-93c9d204-adcb-11e7-9fd4-68eca9957ee7.png)  

4. Select "Trypsin [KR | P]" under "Enzyme"

<img width="270" alt="screen shot 2017-09-13 at 2 01 25 am" src="https://user-images.githubusercontent.com/22335838/31027061-b5cd9c0e-a4fd-11e7-94a0-291e42fb8169.png">

### Step 3c: Adjust Peptide Settings

1. Under the Prediction tab, make sure "none" is selected for retention time predictor

    <img width="271" alt="screen shot 2017-09-13 at 2 01 30 am" src="https://user-images.githubusercontent.com/22335838/31027116-e58bdb36-a4fd-11e7-94bf-c3551aee8381.png">

2. Under the Filter tab, click "Auto-select all matching peptides" and ensure 2 and 25 are the Min and Max lengths. Under "Exclude N-Terminal AAs" enter zero.

    <img width="268" alt="screen shot 2017-09-13 at 2 01 32 am" src="https://user-images.githubusercontent.com/22335838/31027164-11eca5d4-a4fe-11e7-8064-30a50f6f150b.png">

3. Under the Modification tab, select "Carbamidomethyl (C)" under structural modifications, "heavy" Isotope label type, and "light" Internal standard type.

    <img width="271" alt="screen shot 2017-09-13 at 2 01 36 am" src="https://user-images.githubusercontent.com/22335838/31027224-55f0238c-a4fe-11e7-8756-6a8139026253.png">

4. Use the following settings under the Quantification tab

    <img width="273" alt="screen shot 2017-09-13 at 2 01 37 am" src="https://user-images.githubusercontent.com/22335838/31027278-82438f64-a4fe-11e7-9683-a57c386f2243.png">
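
To see what the "Trypsin [KR | P]" rule from Step 3b and the length filter above mean together, here is a rough in-silico digest. It cleaves after K or R unless the next residue is P, then keeps peptides between 2 and 25 residues long. It assumes no missed cleavages, the sequences are made up, and `tryptic_peptides` is a hypothetical helper, not something Skyline exposes.

```python
def tryptic_peptides(sequence, min_len=2, max_len=25):
    """Cleave after K or R (not before P) and keep peptides within the length limits."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal peptide that does not end in K or R
        peptides.append(sequence[start:])
    return [p for p in peptides if min_len <= len(p) <= max_len]


# Made-up sequence: the K followed by P is not cleaved
print(tryptic_peptides("AAKPLLRGGK"))
```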

### Step 3d: Populate Analyte Tree

1. Under File >> Import, select "Transition List"
2. Select the transition list (.csv) outlined in Step 1. This will populate the analyte tree only with targeted proteins and quality control (PRTC) peptides.

Skyline will keep the proteins, peptides and transitions that match what it finds in the library provided in Step 3a.

### Step 3e: Adjust Transition Settings

1. **Settings >> Transition Settings >> Prediction**
 - Precursor mass: Monoisotopic
 - Product ion mass: Monoisotopic
 - Collision energy: Thermo TSQ Vantage
 - Declustering potential: None
 - Optimization library: None
 - Compensation voltage: None
 - DO NOT select "Use optimization values when present"
 
 <img width="273" alt="screen shot 2017-09-13 at 2 01 52 am" src="https://user-images.githubusercontent.com/22335838/31027664-d9614fce-a4ff-11e7-9e83-48d95b8d34a0.png">

2. **Settings >> Transition Settings >> Filter >> Peptides**
 - Precursor charges: 2, 3
 - Ion charges: 1, 2
 - Ion types: y
 - Product ion selection
   - From: ion 2
   - To: last ion
   - DO NOT select any Special ions
 - DO NOT specify any Precursor m/z exclusion window
 - DO select "Auto-select all matching transitions"
 
 <img width="271" alt="screen shot 2017-09-13 at 2 01 53 am" src="https://user-images.githubusercontent.com/22335838/31027761-4823f6e6-a500-11e7-9504-3a6fba2c6894.png">
 
3. **Transition Settings >> Library**
 - Ion match tolerance: 0.5 m/z
 - DO NOT select "If a library spectrum is available, pick its most intense ions"
 
 <img width="275" alt="screen shot 2017-09-13 at 2 01 55 am" src="https://user-images.githubusercontent.com/22335838/31027790-708c8a76-a500-11e7-954f-b39b5c70ac12.png">
 
4. **Transition Settings >> Instrument**
 - Min m/z: 100
 - Max m/z: 2000
 - DO NOT select "Dynamic min product m/z"
 - Method match tolerance m/z: 0.055 m/z
 - DO NOT specify any other settings on this tab
 
 <img width="272" alt="screen shot 2017-09-13 at 2 01 57 am" src="https://user-images.githubusercontent.com/22335838/31027851-a62f48ee-a500-11e7-8f99-bb735a401e82.png">
 
5. **Transition Settings >> Full-Scan**
   - Isotope peaks included: Count
   - Precursor mass analyzer: Orbitrap
   - Peaks: 3
   - Resolving power: 60,000
   - At: 400 m/z
   - Isotope labeling enrichment: Default
   - Acquisition method: Targeted
   - Product mass analyzer: Centroided
   - Isolation scheme: None
   - Mass Accuracy: 20 ppm
   - DO NOT select "Use high-sensitivity extraction"
   - Select "Use only scans within 2 minutes of MS/MS IDs"
 
 <img width="271" alt="screen shot 2017-09-13 at 2 01 58 am" src="https://user-images.githubusercontent.com/22335838/31028121-a3242c90-a501-11e7-9443-d0896e6d7dd6.png">
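
A few of the numbers above are easier to interpret with some arithmetic. The sketch below computes precursor m/z for charges 2 and 3, checks it against the 100 to 2000 m/z instrument range, and converts the 20 ppm mass accuracy into an m/z window. The neutral mass is hypothetical, the function names are my own, and I am assuming the ppm tolerance is applied symmetrically around the target m/z.

```python
PROTON_MASS = 1.007276  # Da


def precursor_mz(neutral_mass, charge):
    """m/z of a peptide with the given neutral monoisotopic mass and charge."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def within_instrument_range(mz, min_mz=100.0, max_mz=2000.0):
    """True if the m/z falls inside the instrument limits set above."""
    return min_mz <= mz <= max_mz


def ppm_window(mz, ppm=20.0):
    """Lower and upper m/z bounds for a symmetric ppm tolerance."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


neutral_mass = 1500.0  # hypothetical peptide mass in Da
for charge in (2, 3):
    mz = precursor_mz(neutral_mass, charge)
    low, high = ppm_window(mz)
    print(charge, round(mz, 4), within_instrument_range(mz), round(high - low, 4))
```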

## Step 5: Clean Data

In this step, files without any data will be removed and peaks in Skyline will be verified against predicted retention times.

### Step 5a: Check that all QC peptides are chosen correctly

### Step 5b: Spot check peptides

### Step 5c: Export data

To proceed with downstream analyses, data must be exported from Skyline as a .csv.

Under File >> Export >> Report, use the settings shown below to export Skyline data as a .csv, but do not include the "Total Ion Current Area" option.

![30132381-03f87a94-9305-11e7-8dfa-812e738abbd0](https://user-images.githubusercontent.com/22335838/30983237-ad0f3056-a43e-11e7-8d51-d99207d262e1.png)

Exported data can be found [here](http://owl.fish.washington.edu/spartina/DNR_SRM_20170728/Analyses/2017-09-12-Gigas-SRM-ReplicatesOnly-PostDilutionCurve-NoPivot-RevisedSettings-Report.csv).
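
Before moving into R, it can be useful to confirm the export looks as expected. Here is a small sketch that reads a report and converts peak areas to numbers. The column names and rows are hypothetical stand-ins for the real report, `read_report` and `parse_area` are illustrative helpers, and I am assuming missing areas appear as `#N/A` in the exported file.

```python
import csv
import io


def parse_area(value):
    """Convert a peak-area cell to float, or None when the value is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


def read_report(handle, area_column="Area"):
    """Read a report and replace the area column with parsed numbers."""
    rows = []
    for row in csv.DictReader(handle):
        row[area_column] = parse_area(row[area_column])
        rows.append(row)
    return rows


# Hypothetical two-row report standing in for the exported file
sample = io.StringIO(
    "Peptide Sequence,Replicate Name,Area\n"
    "AAAK,rep1,1200.5\n"
    "GGGR,rep1,#N/A\n"
)
rows = read_report(sample)
print(rows)
```

For the real file, pass `open("report.csv", newline="")` in place of `sample`, with the path and `area_column` adjusted to match your export.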

### Step 5d: Normalize peak areas