Implement additional methods for feature normalization #41

ypriverol · 2024-01-08T23:06:21Z

@WangHong007 it would be good to implement additional methods for feature normalization besides quantile. Quantile normalization is quite a strong method: it removes most of the variability across peptides. It would be good to enable other normalization methods (e.g., median) that have less impact on the data. Here is a good research paper about peptide normalization.
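To make the difference concrete, here is a minimal sketch of the two approaches on a toy feature-by-sample table. The function names `median_normalize` and `quantile_normalize` and the toy data are assumptions for illustration only; this is not ibaqpy's implementation, and the quantile variant assumes a complete table with no missing values.

```python
import numpy as np
import pandas as pd


def median_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Rescale each sample (column) so that all sample medians match.

    Only one scaling factor per sample is applied, so intensity ratios
    within a sample are preserved.
    """
    medians = df.median(axis=0, skipna=True)
    return df * (medians.mean() / medians)


def quantile_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Force every sample (column) onto the same reference distribution.

    Assumes no missing values. The reference is the mean of the sorted
    columns; each value is replaced by the reference value of its rank.
    """
    reference = np.sort(df.to_numpy(), axis=0).mean(axis=1)
    ranks = df.rank(axis=0, method="first").to_numpy().astype(int) - 1
    return pd.DataFrame(reference[ranks], index=df.index, columns=df.columns)


# Toy intensities: rows are features, columns are samples (hypothetical data).
intensities = pd.DataFrame(
    {
        "S1": [10.0, 20.0, 30.0, 40.0],
        "S2": [20.0, 40.0, 60.0, 200.0],
        "S3": [5.0, 10.0, 15.0, 20.0],
    },
    index=["pep1", "pep2", "pep3", "pep4"],
)

med = median_normalize(intensities)
qn = quantile_normalize(intensities)

# Within-sample fold change between pep4 and pep1 in S2, before and after.
ratio_raw = intensities.loc["pep4", "S2"] / intensities.loc["pep1", "S2"]
ratio_med = med.loc["pep4", "S2"] / med.loc["pep1", "S2"]
ratio_qn = qn.loc["pep4", "S2"] / qn.loc["pep1", "S2"]
print(ratio_raw, ratio_med, ratio_qn)
```

In this sketch the median-scaled table keeps the tenfold difference inside S2, while the quantile-normalized table gives every sample the same set of values, which is the loss of variability described above.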

WangHong007 · 2024-01-17T11:39:34Z

We tried several normalization methods: msstats, qnorm (fast quantile), and now, additionally, MedScale. Here are some points we need to focus on:

For now, normalization is applied to peptidoforms (features), and peptide-level normalization is optional.
In peptidoform normalization, should we run remove_low_frequency_peptides first, followed by peptidoform selection and aggregation, and normalize last? When comparing several normalization methods, the global standard deviation increased after these steps.
We call dropna many times in this process, but building the pivot table still produces many null values, which affect the result of the aggregate function.
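The last point can be reproduced on a toy table. The sketch below uses hypothetical column names (`peptidoform`, `sample`, `intensity`), not ibaqpy's constants: even after `dropna` on the long-format data, pivoting to a wide table reintroduces nulls wherever a feature was not measured in a sample.

```python
import numpy as np
import pandas as pd

# Long-format toy data (hypothetical): one row per measurement.
long_df = pd.DataFrame(
    {
        "peptidoform": ["PEPA", "PEPA", "PEPB", "PEPC", "PEPC"],
        "sample": ["S1", "S2", "S1", "S2", "S2"],
        "intensity": [100.0, 120.0, 80.0, 60.0, np.nan],
    }
)

# After dropna, the long table contains no nulls at all.
long_df = long_df.dropna(subset=["intensity"])

# Pivoting creates one cell per (peptidoform, sample) pair, so pairs that
# were never measured become NaN in the wide table.
wide = long_df.pivot_table(
    index="peptidoform", columns="sample", values="intensity", aggfunc="mean"
)

n_missing = int(wide.isna().sum().sum())
# Column aggregates skip NaN by default, so each sample's statistic may be
# computed from a different subset of features.
values_per_sample = wide.notna().sum()
# Restricting to features seen in every sample is one (lossy) alternative.
complete = wide.dropna()

print(wide)
print(n_missing, values_per_sample.to_dict(), list(complete.index))
```

Whether to aggregate over all available values or only over features present in every sample is a design choice; the sketch only shows where the nulls come from.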

What are the effects of fraction, biological replicate, condition, and run on the sample? For example, biological replicates can map one to one to samples, so keeping them in the index increases the number of impossible combinations, because other index values should never appear with that biological replicate. That is how pandas pivot_table works. Here is the code before normalization (except for msstats normalization):

ibaqpy/bin/peptide_normalization.py

Lines 339 to 356 in 15b5407

    
    normalize_df = pd.pivot_table(
        dataset,
        index=[
            PEPTIDE_SEQUENCE,
            PEPTIDE_CANONICAL,
            PEPTIDE_CHARGE,
            FRACTION,
            RUN,
            BIOREPLICATE,
            PROTEIN_NAME,
            STUDY_ID,
            CONDITION,
        ],
        columns=class_field,
        values=field,
        aggfunc={field: np.mean},
        observed=True,
    )
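A small sketch of the effect described above, assuming that each run belongs to exactly one sample. The column names and data are hypothetical stand-ins, not the ibaqpy constants: when a sample-specific key such as the run stays in the index while the sample is spread over the columns, every row can have a value in only one column.

```python
import pandas as pd

# Toy long-format data (hypothetical): each run maps to exactly one sample.
long_df = pd.DataFrame(
    {
        "peptidoform": ["PEPA"] * 3 + ["PEPB"] * 3,
        "run": ["run1", "run2", "run3"] * 2,
        "sample": ["S1", "S2", "S3"] * 2,
        "intensity": [10.0, 12.0, 11.0, 20.0, 22.0, 21.0],
    }
)

# Run kept in the index: each (peptidoform, run) row only has a value in
# the column of the single sample that the run belongs to.
with_run = long_df.pivot_table(
    index=["peptidoform", "run"],
    columns="sample",
    values="intensity",
    aggfunc="mean",
)

# Run left out of the index: one row per peptidoform, fully populated.
without_run = long_df.pivot_table(
    index="peptidoform",
    columns="sample",
    values="intensity",
    aggfunc="mean",
)

nan_with_run = int(with_run.isna().sum().sum())
nan_without_run = int(without_run.isna().sum().sum())
print(with_run.shape, nan_with_run)
print(without_run.shape, nan_without_run)
```

In this toy case two thirds of the cells are null when the run is part of the index, and none are when it is not. Which keys can safely leave the index depends on the experimental design, for example whether fractions must be kept separate.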

ypriverol assigned WangHong007 Jan 8, 2024

ypriverol added the enhancement (New feature or request) and good first issue (Good for newcomers) labels Jan 8, 2024

ypriverol mentioned this issue Jan 8, 2024

Research plan for ibaqpy library #38

Open

6 tasks
