v0.8.0
Change Log
For Version 0.8.0
- Linted the package with flake8
- Increased code coverage
- Added another optional extras install, [chem], including glyles, requests, and pubchempy
glycan_data
- Changed `lib` to be a dict of type glycoletters:index, as it’s faster to index a dict vs. a long list; also adapted all functions using `lib` to reflect this change
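The reasoning behind this change can be illustrated with a minimal sketch. The glycoletters and the names `lib_list` and `lib_dict` are hypothetical stand-ins chosen for illustration; they are not glycowork's actual library or internals.

```python
# Hypothetical, tiny library of glycoletters (a real library is far larger).
lib_list = ["Fuc", "Gal", "GlcNAc", "Man", "a1-3", "b1-4"]

# Dict of type glycoletter:index, built once from the list.
lib_dict = {glycoletter: idx for idx, glycoletter in enumerate(lib_list)}

# list.index scans the list element by element (linear time),
# while a dict lookup hashes the key (constant time on average).
idx_from_list = lib_list.index("Man")
idx_from_dict = lib_dict["Man"]

print(idx_from_list, idx_from_dict)
```

Both lookups return the same index; the difference only becomes noticeable when the library is long and lookups are repeated many times.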
loader
- Added `replace_every_second` helper function
- Updated `linkages` list
- Changed `linkages` and `Hex` etc. to be sets instead of lists
motif
processing
- Added `variance_stabilization` for variance stabilization normalization, both globally and group-specific
- Added `in_lib` helper function to check whether all glycoletters of glycan are in lib
- Deprecated `small_motif_find`
- `cohen_d` now also returns the variance of the effect size and supports paired samples as well (calculating Cohen’s dz in this case)
- Added `mahalanobis_distance` to calculate Mahalanobis distance as an effect size for multivariate comparisons
- Added `mahalanobis_variance` to estimate variance of Mahalanobis distance via bootstrapping
- Added `MissForest` for random forest based data imputation
- Cleaned up `canonicalize_iupac` and made it slightly faster
- Added `variance_based_filtering`
- Added `impute_and_normalize` and underlying helper functions
- Fixed numpy random seed for reproducibility
- Sped-up `presence_to_matrix`
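To make the distinction between Cohen's d and Cohen's dz concrete, here is a minimal standard-library sketch. The function names `cohen_d_unpaired` and `cohen_dz_paired` and the sample values are hypothetical, and the sketch computes only the effect sizes, not their variances; it is not the signature or implementation of glycowork's `cohen_d`.

```python
from math import sqrt
from statistics import mean, stdev, variance


def cohen_d_unpaired(x, y):
    """Cohen's d: mean difference divided by the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


def cohen_dz_paired(x, y):
    """Cohen's dz: mean of the paired differences divided by their standard deviation."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / stdev(diffs)


# Made-up abundances for four samples measured before and after a treatment.
before = [1.0, 2.0, 3.0, 4.0]
after = [2.0, 3.5, 4.0, 6.0]

d = cohen_d_unpaired(after, before)
dz = cohen_dz_paired(after, before)
print(round(d, 3), round(dz, 3))
```

Because the paired differences vary much less than the raw values in this toy data, dz comes out larger than d.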
tokenization
- Deprecated `mz_to_composition`
- `mz_to_composition2` is now the new `mz_to_composition`
- Adapted `mz_to_structures`, `compositions_to_structures`, and `match_composition_relaxed` to work with this change
annotate
- Added `create_correlation_network` to identify clusters of highly correlated glycans/motifs
- Added `count_unique_subgraphs_of_size_k` as a helper function within `get_k_saccharides`
- Refactored `get_k_saccharides` to be faster and more complete (and be, effectively, a replacement of `motif_matrix`)
- `annotate_dataset` now uses `get_k_saccharides` for mono- and disaccharides, instead of `motif_matrix`
- Deprecated `motif_matrix`
- `annotate_dataset` now also creates relevant ?-containing motifs if ‘terminal’ in feature_set, even if they don’t explicitly occur in the glycan strings
- Big speed-up for `annotate_dataset` if known=True, as we now cache the precalculated motif graphs
- Added `quantify_motifs` as a wrapper around `annotate_dataset` to adequately distribute relative abundances across extracted motifs
- Deprecated `estimate_lower_bound` as speed-ups make it no longer necessary
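The caching idea mentioned above can be sketched generically with `functools.lru_cache`. The function `motif_to_graph` below is a hypothetical placeholder for an expensive conversion step; how glycowork actually caches its motif graphs is not described in this change log.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def motif_to_graph(motif: str) -> tuple:
    """Hypothetical stand-in for an expensive motif-to-graph conversion."""
    # Placeholder "graph": the motif string split into its tokens.
    return tuple(motif.replace(")", "(").split("("))


# Repeated requests for the same motif reuse the cached result.
for _ in range(3):
    graph = motif_to_graph("Gal(b1-4)GlcNAc")

info = motif_to_graph.cache_info()
print(graph, info.hits, info.misses)
```

Only the first call does the work; the two subsequent calls are served from the cache.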
analysis
- Renamed `make_heatmap` to `get_heatmap`
- Renamed `make_volcano` to `get_volcano`
- Deprecated `replace_zero_with_random_gaussian` (this is now handled by `MissForest` in .processing within `impute_and_normalize`)
- Added `hotellings_t2` for multivariate comparisons
- Changed multiple-testing correction method from Holm-Sidak to Benjamini-Hochberg
- Added `variance_stabilization` in `get_differential_expression`
- Added the option to analyze highly correlated sets of glycans/motifs (via `create_correlation_network`) within `get_differential_expression`
- Implemented `hotellings_t2` and the Mahalanobis distance (as effect size) for cases where sets are analyzed within `get_differential_expression`
- `get_heatmap` and `get_differential_expression` now scale abundances by the actual counts of motifs per glycan, not just absence/presence
- Added `get_meta_analysis` to estimate combined effect sizes from the results of multiple studies (both fixed-effects and random-effects models can be estimated)
- Added `variance_based_filtering` in `get_differential_expression`
- Effect size variances can now also be retrieved within `get_differential_expression` via the effect_size_variance keyword argument
- `get_differential_expression` can now also handle paired samples when paired=True
- `get_differential_expression` now also tests the homogeneity of variances using Levene’s test in all settings (also multiple-testing controlled)
- Added `get_glycanova` to use ANOVA-based analyses on glycomics datasets (uses basically all the improvements of `get_differential_expression`, including analysis on the motif level)
- Added `get_pca` to plot glycomics data (also has the motif interface)
- Added `get_pval_distribution` to plot the distribution of p-values
- Added `get_ma` to plot a Bland-Altman plot
- Added `get_glycan_change_over_time` to detect significant changes in time-course data via OLS fitting
- Added `get_time_series` as a wrapper around `get_glycan_change_over_time` to do time series analyses, with all the motif & normalization functionality
- Added `get_coverage` to visualize glycan expression across samples (ordered by average intensity) in a coverage plot
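For readers unfamiliar with the Benjamini-Hochberg procedure that replaces Holm-Sidak in this release, the following is a minimal standard-library sketch of the adjustment. The function name `benjamini_hochberg` and the p-values are illustrative assumptions; the change log does not say which implementation glycowork relies on.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values from four hypothetical tests.
raw_pvals = [0.01, 0.04, 0.03, 0.005]
adj_pvals = benjamini_hochberg(raw_pvals)
print(adj_pvals)
```

Each p-value is scaled by the number of tests divided by its rank, and the running minimum keeps a smaller raw p-value from ending up with a larger adjusted value than a bigger one.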
draw
- Added import warning if draw dependencies are not installed
- Removed `pycairo` from dependencies
- Modified `annotate_figure` to be compatible with .svg files from older Matplotlib versions
- Changed “output” to “filepath” in `GlycoDraw`
- If there are “?” in the provided filepath for `GlycoDraw`, they will now be automatically replaced with “_” to avoid saving errors
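The filepath clean-up described in the last item amounts to a simple character replacement. The helper `sanitize_filepath` below is a hypothetical illustration of that behavior, not a function exposed by glycowork.

```python
def sanitize_filepath(filepath: str) -> str:
    """Replace every '?' with '_' (hypothetical helper mirroring the described behavior)."""
    return filepath.replace("?", "_")


# A glycan with an undefined linkage makes for a problematic file name.
safe_path = sanitize_filepath("Gal(b1-?)GlcNAc.svg")
print(safe_path)  # prints Gal(b1-_)GlcNAc.svg
```

This matters because “?” is not allowed in file names on some operating systems, so a glycan string with an uncertain linkage cannot always be used directly as a file name.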
graph
- Sped-up `glycan_to_graph`/`glycan_to_nxGraph` (and all downstream functions, which are a lot)
- Also improved the runtime of downstream functions, such as `subgraph_isomorphism`, independent of these advances
- `subgraph_isomorphism` now also accepts precalculated motif graphs as inputs (in addition to the already supported precalculated glycan graphs)
ml
- Rephrased import warnings to reflect optional install strategy for extra dependencies
model_training
- Sped-up `train_ml_model`
network
biosynthesis
- `create_neighbors` no longer uses the libr keyword