# Compute p-values from mass spectrometry data

In [1]:
# Import basic modules
import glob
import numpy as np
import pandas as pd
import os

import regseq.utils

First, we load the file names of all the protein group files. These files contain the normalized heavy-to-light (H/L) ratios for all identified proteins.
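To illustrate what these tables contain, the sketch below extracts (protein name, normalized H/L ratio) pairs from a tab-separated protein-groups table using only the standard library. It assumes a MaxQuant-style header with columns named `Protein names` and `Ratio H/L normalized`. The function `read_hl_ratios` is hypothetical and is not part of `regseq`.

```python
import csv
import io
import math

def read_hl_ratios(handle, ratio_col="Ratio H/L normalized", name_col="Protein names"):
    """Return (name, ratio) pairs from a tab-separated protein-groups table.

    Assumes a MaxQuant-style header (column names are an assumption);
    rows whose ratio is empty, non-numeric, or NaN are skipped.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    pairs = []
    for row in reader:
        raw = (row.get(ratio_col) or "").strip()
        try:
            ratio = float(raw)
        except ValueError:
            continue  # empty or non-numeric ratio
        if math.isnan(ratio):
            continue  # protein without a quantified ratio
        pairs.append((row.get(name_col) or "", ratio))
    return pairs

# Tiny in-memory table standing in for a real proteinGroups*.txt file.
example = io.StringIO(
    "Protein names\tRatio H/L normalized\n"
    "Protein A\t1.25\n"
    "Protein B\t\n"
    "Protein C\t0.8\n"
)
pairs = read_hl_ratios(example)
print(pairs)  # [('Protein A', 1.25), ('Protein C', 0.8)]
```

For a real file, pass an open handle, for example `with open(path) as f: read_hl_ratios(f)`, and adjust the column names if your export differs.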

In [2]:
allnames = glob.glob('../data/massspec/proteinGroups*.txt')
# Exclude a file that raises an error when processed
allnames.remove("../data/massspec/proteinGroups_Oct5v2.txt")
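`list.remove` raises a `ValueError` if the excluded file is not present, for example on a machine that never downloaded it. A minimal alternative, sketched below, filters by basename instead. `list_protein_group_files` is a hypothetical helper; sorting is added only so the file order is reproducible, since the order of `glob` results depends on the file system.

```python
import glob
import os
import tempfile

def list_protein_group_files(pattern, exclude=()):
    """Return sorted paths matching `pattern`, skipping excluded basenames.

    Unlike list.remove, this does not raise if an excluded file is absent.
    """
    excluded = set(exclude)
    return sorted(
        path for path in glob.glob(pattern)
        if os.path.basename(path) not in excluded
    )

# Demonstration on a throwaway directory instead of ../data/massspec.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["proteinGroups_a.txt", "proteinGroups_Oct5v2.txt", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()  # create empty files
    kept = list_protein_group_files(
        os.path.join(tmp, "proteinGroups*.txt"),
        exclude=["proteinGroups_Oct5v2.txt"],
    )
    kept_names = [os.path.basename(p) for p in kept]

print(kept_names)  # ['proteinGroups_a.txt']
```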

We can also load the protein group files that have been filtered to contain only the targets we will summarize. In this run the `filtered` directory is empty, so the list below is empty.

In [3]:
all_filtered = glob.glob('../data/massspec/filtered/*')
all_filtered

[]

We will format an output DataFrame that contains the mean value and variance for the most highly enriched protein and for all background proteins.

In [4]:
# Create a DataFrame to hold the p-values
out_pval = pd.DataFrame(columns=['pval'])

# Display long text columns in full, and floats with thousands separators and nine decimal places
pd.set_option('display.max_colwidth', 999)
pd.set_option('display.float_format', '{:10,.9f}'.format)
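The mean-and-variance comparison described above can be sketched as follows. This is an assumption about the general idea, not the method inside `regseq.utils`: it treats the log2 H/L ratios of background proteins as roughly normal, estimates their mean and sample variance, and converts the target protein's deviation into a one-sided upper-tail p-value. `enrichment_p_value` is a hypothetical name.

```python
import math
import statistics

def enrichment_p_value(target_ratio, background_ratios):
    """One-sided p-value that `target_ratio` lies above the background.

    Assumption: log2 H/L ratios of background proteins are roughly normal.
    Returns (mean, variance, z, p) on the log2 scale.
    """
    logs = [math.log2(r) for r in background_ratios if r > 0]
    mean = statistics.mean(logs)
    var = statistics.variance(logs)  # sample variance (n - 1 denominator)
    z = (math.log2(target_ratio) - mean) / math.sqrt(var)
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the standard normal
    return mean, var, z, p

# Toy background whose log2 values are -1, 0 and 1.
background = [0.5, 1.0, 2.0]
mean, var, z, p = enrichment_p_value(8.0, background)
print(mean, var, z, p)
```

Real background sets contain many more proteins; with only a handful of values the variance estimate, and hence the p-value, is unreliable.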

We loop through all enriched proteins displayed in the figures of the Reg-Seq paper. The following function computes their p-values and stores the results in `test_pval.txt`.
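The name `cox_mann_p_values` suggests an outlier statistic along the lines of the "Significance A" test that Cox and Mann described for MaxQuant, which uses percentiles instead of the mean and standard deviation so that the spread can differ above and below the center. The sketch below is our assumption of that idea, not the code in `regseq.utils`: the median of the log ratios is the center, the 15.87th and 84.13th percentiles (one standard deviation on either side for a normal distribution) set the downward and upward spreads, and each ratio's distance from the center becomes a one-sided normal tail probability. `percentile` and `significance_a` are hypothetical helpers.

```python
import math

def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def significance_a(log_ratios):
    """Outlier p-values using asymmetric, percentile-based spreads.

    Assumed scheme: center r0 = median; upward spread = r(84.13%) - r0;
    downward spread = r0 - r(15.87%). Each p-value is one-sided, in the
    direction in which the ratio deviates from the center.
    """
    s = sorted(log_ratios)
    r_minus = percentile(s, 15.87)
    r0 = percentile(s, 50.0)
    r_plus = percentile(s, 84.13)
    up, down = r_plus - r0, r0 - r_minus
    if up <= 0 or down <= 0:
        raise ValueError("percentile spread is zero; need more varied ratios")
    pvals = []
    for r in log_ratios:
        z = (r - r0) / up if r >= r0 else (r0 - r) / down
        pvals.append(0.5 * math.erfc(z / math.sqrt(2)))
    return pvals

log_ratios = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
pvals = significance_a(log_ratios)
print([round(v, 4) for v in pvals])
```

Because the spreads come from percentiles rather than the variance, a few extreme ratios (the enriched proteins themselves) have limited influence on the background estimate.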

In [5]:
regseq.utils.cox_mann_p_values(allnames)

In [6]:
with open('test_pval.txt') as f:
    for line in f:
        print(line.strip())

Gene: nan, Ratio H/L normalized: 0.04156737421458918
Gene: Uncharacterized protein YciY, Ratio H/L normalized: 4.0905076191844943e-25
Gene: nan, Ratio H/L normalized: 7.697686989943431e-08
Gene: nan, Ratio H/L normalized: 0.5445888606520145
Gene: nan, Ratio H/L normalized: 1.481930862706421
Gene: nan, Ratio H/L normalized: 4.2487795175108825e-05
Gene: nan, Ratio H/L normalized: 1.7054642629426723e-05
Gene: nan, Ratio H/L normalized: 1.703257594770442e-05
Gene: nan, Ratio H/L normalized: 0.0009230952646035802
Gene: nan, Ratio H/L normalized: 21.581386214107415
Gene: nan, Ratio H/L normalized: 4.669167169207635e-06
Gene: nan, Ratio H/L normalized: 2.9399170892494926
Gene: nan, Ratio H/L normalized: 2.2455639864951904e-65
Gene: nan, Ratio H/L normalized: 9.88962448267779e-10
Gene: nan, Ratio H/L normalized: 0.21097364249287381
Gene: nan, Ratio H/L normalized: 3.6512191316700407e-06
Gene: Putative protein YmiB, Ratio H/L normalized: 2.0915428294698634e-08
Gene: nan, Ratio H/L normalized: 1
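To work with the saved results instead of only printing them, the lines can be parsed back into (name, value) pairs. The sketch below assumes every line follows the `Gene: <name>, Ratio H/L normalized: <value>` layout shown above and maps the `nan` placeholder to `None`. `parse_pval_lines` is a hypothetical helper.

```python
import re

# Assumed line layout, taken from the printed output above.
LINE_RE = re.compile(r"^Gene: (?P<gene>.*), Ratio H/L normalized: (?P<value>\S+)$")

def parse_pval_lines(lines):
    """Parse 'Gene: <name>, Ratio H/L normalized: <value>' lines.

    A gene name of 'nan' (missing annotation) becomes None;
    lines that do not match the layout are skipped.
    """
    records = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        gene = m.group("gene")
        records.append((None if gene == "nan" else gene, float(m.group("value"))))
    return records

sample = [
    "Gene: nan, Ratio H/L normalized: 0.04156737421458918",
    "Gene: Putative protein YmiB, Ratio H/L normalized: 2.0915428294698634e-08",
    "not a result line",
]
records = parse_pval_lines(sample)
print(records)
```

With the real file, `with open('test_pval.txt') as f: records = parse_pval_lines(f)` returns one tuple per result line, which can then be sorted or loaded into a DataFrame.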

Finally, here are the versions of the packages used in this notebook. To display them, we use the IPython magic extension `watermark`, which can be found [here](https://github.com/rasbt/watermark).

## Computing Environment

In [7]:
%load_ext watermark
%watermark -v -p jupyterlab,pandas,numpy,regseq

CPython 3.6.9
IPython 7.13.0

jupyterlab not installed
pandas 1.0.3
numpy 1.18.1
regseq 0.0.2
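If `watermark` is not available, a rough equivalent can be built from the standard library. This sketch uses `importlib.metadata`, which requires Python 3.8 or later (the CPython 3.6.9 environment above would need the `importlib_metadata` backport instead). `package_versions` is a hypothetical helper that mimics watermark's "not installed" message for missing packages.

```python
import platform
from importlib import metadata

def package_versions(names):
    """Map each distribution name to its installed version, or 'not installed'."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions

# Report the interpreter, then each package of interest.
print(platform.python_implementation(), platform.python_version())
for name, version in package_versions(["jupyterlab", "pandas", "numpy", "regseq"]).items():
    print(name, version)
```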
