Merge pull request #37 from WangHong007/master

Fix bugs and add quantile normalization that handles NA values
bigbio · Dec 6, 2023 · 15b5407
2 parents 281c4de + fbdb09b
commit 15b5407

Showing 7 changed files with 822 additions and 1,088 deletions.
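
The headline change in this pull request is a quantile normalization that tolerates NA values; the README diff below recommends the `quantile` method because the current qnorm release cannot handle NAs. The following is only a minimal sketch of one way to do NA-aware quantile normalization, not the code merged here. The function names (`_interp`, `quantile_normalize_with_na`) and the list-of-columns layout are assumptions made for illustration, and ties are ranked in input order rather than averaged.

```python
import math


def _interp(sorted_vals, q):
    """Linearly interpolate a sorted list at quantile position q in [0, 1]."""
    if len(sorted_vals) == 1:
        return sorted_vals[0]
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def quantile_normalize_with_na(columns):
    """Quantile-normalize columns (one list per sample), leaving NaN in place.

    Hypothetical helper: each column contributes only its observed values
    to the shared target distribution, so NaN never propagates.
    """
    n_rows = len(columns[0])
    grid = [i / (n_rows - 1) for i in range(n_rows)] if n_rows > 1 else [0.0]

    # Sorted observed (non-NaN) values per sample.
    observed = [sorted(v for v in col if not math.isnan(v)) for col in columns]
    non_empty = [vals for vals in observed if vals]
    if not non_empty:
        return [list(col) for col in columns]

    # Target distribution: mean of every sample's quantile function on a common grid.
    target = [
        sum(_interp(vals, q) for vals in non_empty) / len(non_empty) for q in grid
    ]

    result = []
    for col in columns:
        out = [math.nan] * len(col)
        # Indices of observed values, ordered by value (stable for ties).
        idx = sorted(
            (i for i, v in enumerate(col) if not math.isnan(v)),
            key=lambda i: col[i],
        )
        m = len(idx)
        for rank, i in enumerate(idx):
            q = rank / (m - 1) if m > 1 else 0.5
            out[i] = _interp(target, q)
        result.append(out)
    return result


# Two samples; the second has one missing intensity.
sample_a = [1.0, 2.0, 3.0]
sample_b = [2.0, math.nan, 6.0]
normalized = quantile_normalize_with_na([sample_a, sample_b])
```

After normalization both samples span the same range, and the missing entry in the second sample is still missing rather than turning its neighbours into NA.
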
diff --git a/README.md b/README.md
@@ -65,7 +65,7 @@ E.g. http://ftp.pride.ebi.ac.uk/pub/databases/pride/resources/proteomes/absolute
 python peptide_normalization.py --msstats PXD003947.sdrf_openms_design_msstats_in.csv --sdrf PXD003947.sdrf.tsv --remove_ids data/contaminants_ids.tsv --remove_decoy_contaminants --remove_low_frequency_peptides --output PXD003947-peptides-norm.csv
 ``` 
 
-The command provides an additional `flag` for skip_normalization, pnormalization, compress, log2, violin, verbose.
+The command provides an additional `flag` for skip_normalization, pnormalization, compress, log2, violin, verbose. If you use feature parquet as input, you can pass the `--sdrf`.
 
 ```asciidoc
 Usage: peptide_normalization.py [OPTIONS]
@@ -74,6 +74,8 @@ Options:
   -m, --msstats TEXT              MsStats file import generated by quantms
   -p, --parquet TEXT              Parquet file import generated by quantmsio
   -s, --sdrf TEXT                 SDRF file import generated by quantms
+  --stream                        Stream processing normalization
+  --chunksize                     The number of rows of MSstats or parquet read using pandas streaming
   --min_aa INTEGER                Minimum number of amino acids to filter
                                   peptides
   --min_unique INTEGER            Minimum number of unique peptides to filter
@@ -88,9 +90,9 @@ Options:
                                   properties for normalization
   --skip_normalization            Skip normalization step
   --nmethod TEXT                  Normalization method used to normalize
-                                  intensities for all samples (options: qnorm)
+                                  intensities for all samples (options: msstats, quantile, qnorm)
   --pnormalization                Normalize the peptide intensities using
-                                  different methods (options: qnorm)
+                                  different methods (options: quantile, qnorm)
   --compress                      Read the input peptides file in compress
                                   gzip file
   --log2                          Transform to log2 the peptide intensity
@@ -129,7 +131,7 @@ The first step is to remove contaminants and decoys. The script `peptide_normali
 
 A peptidoform is a combination of a `PeptideSequence(Modifications) + Charge + BioReplicate + Fraction`. In the current version of the file, each row correspond to one peptidoform. 
 
-The current version of the tool uses the parackage [qnorm](https://pypi.org/project/qnorm/) to normalize the intensities for each peptidofrom. **qnorm** implements a quantile normalization method. 
+The current version of the tool uses the parackage [qnorm](https://pypi.org/project/qnorm/) to normalize the intensities for each peptidofrom. **qnorm** implements a quantile normalization method. However, the current version of qnorm can not handle NA values which will lead to cause more NA values in data. We suggest users to use default method 'quantile' instead for now.
 
 #### 3. Peptidoform to Peptide Summarization
 

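The README diff above adds `--stream` and `--chunksize`, described as reading MSstats or parquet rows with pandas streaming. As a rough illustration of the chunked-reading idea only, the sketch below uses the standard library instead of pandas (where `read_csv` accepts a `chunksize` argument for the same purpose). The column names `sample` and `intensity` and both function names are hypothetical and are not taken from this repository.

```python
import csv
import io
from itertools import islice


def iter_chunks(handle, chunksize):
    """Yield lists of at most `chunksize` rows from a CSV file handle."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunksize))
        if not chunk:
            return
        yield chunk


def per_sample_mean(handle, chunksize):
    """Accumulate per-sample mean intensity without loading the whole file."""
    totals = {}
    counts = {}
    for chunk in iter_chunks(handle, chunksize):
        for row in chunk:
            sample = row["sample"]
            totals[sample] = totals.get(sample, 0.0) + float(row["intensity"])
            counts[sample] = counts.get(sample, 0) + 1
    return {sample: totals[sample] / counts[sample] for sample in totals}


# In-memory stand-in for a large input file.
demo = io.StringIO("sample,intensity\nS1,10\nS2,4\nS1,20\nS2,8\nS1,30\n")
means = per_sample_mean(demo, chunksize=2)
```

Only running totals are kept between chunks, so memory use depends on the chunk size and the number of samples rather than on the number of rows.
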
diff --git a/bin/compute_tpa.py b/bin/compute_tpa.py
@@ -9,9 +9,8 @@
 from matplotlib.backends.backend_pdf import PdfPages
 from pyopenms import *
 
-from bin.compute_ibaq import print_help_msg
 from ibaq.ibaqpy_commons import (CONDITION, NORM_INTENSITY, PROTEIN_NAME, SAMPLE_ID,
-                                 plot_box_plot, plot_distributions,
+                                 plot_box_plot, plot_distributions, print_help_msg,
                                  remove_contaminants_decoys, get_accession)