Merge pull request #37 from WangHong007/master
Fix bugs and add quantile normalization that handles NA values
ypriverol committed Dec 6, 2023
2 parents 281c4de + fbdb09b commit 15b5407
Showing 7 changed files with 822 additions and 1,088 deletions.
10 changes: 6 additions & 4 deletions README.md
@@ -65,7 +65,7 @@ E.g. http://ftp.pride.ebi.ac.uk/pub/databases/pride/resources/proteomes/absolute
python peptide_normalization.py --msstats PXD003947.sdrf_openms_design_msstats_in.csv --sdrf PXD003947.sdrf.tsv --remove_ids data/contaminants_ids.tsv --remove_decoy_contaminants --remove_low_frequency_peptides --output PXD003947-peptides-norm.csv
```

-The command provides an additional `flag` for skip_normalization, pnormalization, compress, log2, violin, verbose.
+The command provides an additional `flag` for skip_normalization, pnormalization, compress, log2, violin, verbose. If you use feature parquet as input, you can pass the `--sdrf`.

```asciidoc
Usage: peptide_normalization.py [OPTIONS]
@@ -74,6 +74,8 @@ Options:
-m, --msstats TEXT MsStats file import generated by quantms
-p, --parquet TEXT Parquet file import generated by quantmsio
-s, --sdrf TEXT SDRF file import generated by quantms
+--stream Stream processing normalization
+--chunksize The number of rows of MSstats or parquet read using pandas streaming
--min_aa INTEGER Minimum number of amino acids to filter
peptides
--min_unique INTEGER Minimum number of unique peptides to filter
@@ -88,9 +90,9 @@ Options:
properties for normalization
--skip_normalization Skip normalization step
--nmethod TEXT Normalization method used to normalize
-intensities for all samples (options: qnorm)
+intensities for all samples (options: msstats, quantile, qnorm)
--pnormalization Normalize the peptide intensities using
-different methods (options: qnorm)
+different methods (options: quantile, qnorm)
--compress Read the input peptides file in compress
gzip file
--log2 Transform to log2 the peptide intensity
@@ -129,7 +131,7 @@ The first step is to remove contaminants and decoys. The script `peptide_normali

A peptidoform is a combination of a `PeptideSequence(Modifications) + Charge + BioReplicate + Fraction`. In the current version of the file, each row corresponds to one peptidoform.

-The current version of the tool uses the package [qnorm](https://pypi.org/project/qnorm/) to normalize the intensities for each peptidoform. **qnorm** implements a quantile normalization method.
+The current version of the tool uses the package [qnorm](https://pypi.org/project/qnorm/) to normalize the intensities for each peptidoform. **qnorm** implements a quantile normalization method. However, the current version of qnorm cannot handle NA values, which introduces additional NA values into the data. We suggest that users use the default method 'quantile' instead for now.
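
The NA issue described in the added README sentence can be illustrated with a small, self-contained sketch of NA-aware quantile normalization. This is not the code from this commit: the function name `quantile_normalize_with_na` and the interpolation strategy are assumptions made for illustration only. Each column (sample) is ranked using only its observed values, the reference distribution is the mean of the per-column sorted values stretched onto a common quantile grid, and cells that were NA stay NA without creating new ones.

```python
import numpy as np
import pandas as pd


def quantile_normalize_with_na(df: pd.DataFrame) -> pd.DataFrame:
    """Quantile-normalize columns (samples) while keeping NaN cells as NaN.

    Hypothetical helper for illustration; not the ibaqpy implementation.
    """
    values = df.to_numpy(dtype=float)
    n_rows, n_cols = values.shape
    grid = np.linspace(0.0, 1.0, n_rows)  # common quantile grid

    # 1. Stretch each column's sorted observed values onto the common grid.
    stretched = np.full((n_rows, n_cols), np.nan)
    for j in range(n_cols):
        observed = np.sort(values[~np.isnan(values[:, j]), j])
        if observed.size == 1:
            stretched[:, j] = observed[0]
        elif observed.size > 1:
            source_grid = np.linspace(0.0, 1.0, observed.size)
            stretched[:, j] = np.interp(grid, source_grid, observed)

    # 2. Reference distribution: mean over the columns that contain data.
    has_data = ~np.isnan(stretched).all(axis=0)
    if not has_data.any():
        return df.astype(float)
    reference = stretched[:, has_data].mean(axis=1)

    # 3. Map every observed value to the reference through its rank.
    result = np.full((n_rows, n_cols), np.nan)
    for j in range(n_cols):
        mask = ~np.isnan(values[:, j])
        n_obs = int(mask.sum())
        if n_obs == 0:
            continue
        # Ties share their average rank.
        ranks = pd.Series(values[mask, j]).rank(method="average").to_numpy()
        if n_obs > 1:
            position = (ranks - 1.0) / (n_obs - 1.0)
        else:
            position = np.full(1, 0.5)  # a single value maps to the median
        result[mask, j] = np.interp(position, grid, reference)

    return pd.DataFrame(result, index=df.index, columns=df.columns)
```

A call such as `quantile_normalize_with_na(intensities)`, where `intensities` is a peptidoform-by-sample table, returns a frame of the same shape whose NA pattern matches the input.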

#### 3. Peptidoform to Peptide Summarization

3 changes: 1 addition & 2 deletions bin/compute_tpa.py
@@ -9,9 +9,8 @@
from matplotlib.backends.backend_pdf import PdfPages
from pyopenms import *

-from bin.compute_ibaq import print_help_msg
from ibaq.ibaqpy_commons import (CONDITION, NORM_INTENSITY, PROTEIN_NAME, SAMPLE_ID,
-plot_box_plot, plot_distributions,
+plot_box_plot, plot_distributions, print_help_msg,
remove_contaminants_decoys, get_accession)

