v1.2.0
Change Log
For Version 1.2.0
- Added glycoworkGUI.py to build the .exe-based GUI for important glycowork endpoint functions: GlycoDraw, plot_glycans_excel, and get_differential_expression
- Removed python-louvain as a required dependency for glycowork
glycan_data
loader
- Switched from pkg_resources to importlib for loading tabular data into the package
stats
- Fixed an issue in TST_grouped_benjamini_hochberg that caused errors if nothing was significantly different in the entire dataset or in any group
- test_inter_vs_intra_grouping is now robust to non-paired data and data with differing sample sizes per condition
- Added replace_outliers_with_IQR_bounds to support outlier treatment in motif.analysis
- Added sequence_richness, shannon_diversity_index, and simpson_diversity_index to calculate diversity indices of glycomics data
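
As a rough illustration of what the three diversity indices measure, here is a minimal standalone sketch using textbook definitions. The function names below are hypothetical and are not the glycowork functions; in particular, the Gini-Simpson form (1 minus the sum of squared proportions) is an assumption, and the library may use a different convention.

```python
import math
from collections.abc import Sequence


def richness(abundances: Sequence[float]) -> int:
    """Count the sequences that are present (non-zero abundance)."""
    return sum(1 for a in abundances if a > 0)


def shannon(abundances: Sequence[float]) -> float:
    """Shannon index H = -sum(p_i * ln(p_i)) over non-zero proportions."""
    total = sum(abundances)
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)


def gini_simpson(abundances: Sequence[float]) -> float:
    """Gini-Simpson index 1 - sum(p_i^2); an assumed convention."""
    total = sum(abundances)
    return 1.0 - sum((a / total) ** 2 for a in abundances)


# Toy relative abundances of four glycans in one sample (made-up numbers)
sample = [25.0, 25.0, 25.0, 25.0]
print(richness(sample), shannon(sample), gini_simpson(sample))
```

A perfectly even sample maximizes both indices for a given richness; a sample dominated by one sequence pushes both toward zero.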
motif
processing
- WURCS handling for universal input now encompasses more monosaccharides
- GlycoCT handling for universal input is now robust to substituents that are declared without immediately following their monosaccharide in the GlycoCT string
- Added equal_repeats to check whether two repeating units of a polysaccharide are the same, just shifted
- Modified glycan nomenclature detection in canonicalize_iupac to be less prone to over-identifying Oxford nomenclature when the input is just numbers etc.
- Added “ß” to the typo detection in canonicalize_iupac and “(-)” as a variation of linkage uncertainty detection
- Made canonicalize_iupac robust to the variation of using {} instead of () for linkages
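
The idea behind comparing two repeating units that are "the same, just shifted" can be sketched as a cyclic-rotation check. This is only an illustration under simplifying assumptions: the helper name is_cyclic_shift is hypothetical, the repeat is assumed to be linear and already split into monosaccharide-plus-linkage tokens, and it is not how equal_repeats is implemented.

```python
def is_cyclic_shift(unit_a: list[str], unit_b: list[str]) -> bool:
    """Return True if unit_b is a rotation of unit_a (same tokens, shifted)."""
    if len(unit_a) != len(unit_b):
        return False
    if not unit_a:
        return True  # two empty units are trivially equal
    # Every rotation of unit_a appears as a window of unit_a + unit_a
    doubled = unit_a + unit_a
    n = len(unit_a)
    return any(doubled[i:i + n] == unit_b for i in range(n))


# Two token lists describing the same repeat, started at different residues
first = ["GlcNAc(b1-3)", "Gal(b1-4)"]
second = ["Gal(b1-4)", "GlcNAc(b1-3)"]
print(is_cyclic_shift(first, second))
```

Branched repeating units would need a graph-based comparison instead of this token-level check.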
graph
- Removed the required usage of lib in glycan_to_nxGraph, compare_glycans, subgraph_isomorphism, and all downstream functions (lib only remains for stemification and deep learning model training/inference)
- The keyword argument “wildcards_ptm” now also works as intended when providing pre-calculated graphs as input to compare_glycans or subgraph_isomorphism
- Fixed a rare issue in which subgraph_isomorphism, when “count = False”, would sometimes erroneously output “False” because of a greedy approach to evaluating potential matches
tokenization
- Added get_unique_topologies to retrieve all base topologies for a given composition that have been observed for a given taxonomic subset
- Added the “obfuscate_ptm” keyword argument to map_to_basic, to allow for mapping Gal6S to Hex6S rather than the default HexOS, if that is required/advantageous
- Added support for mapping phosphorylated glycans in map_to_basic
draw
- Fixed an issue where cross-ring fragments were not correctly rendered in GlycoDraw
- plot_glycans_excel can now also be used with filepaths to .xlsx files (in addition to .csv files)
- plot_glycans_excel now also supports compact glycan drawing with the “compact” keyword argument
- Improved drawing resolution in plot_glycans_excel
- GlycoDraw will now more strongly make use of nomenclature canonicalization in case of IUPAC dialects (still not 100%; if you suspect you use a dialect of IUPAC, pass your sequences through canonicalize_iupac first)
- If no filepath is specified, GlycoDraw will now also display drawn glycan structures in a non-Jupyter environment (as the classic matplotlib pop-up). Note that this functionality requires the cairosvg dependency (head to https://bojarlab.github.io/glycowork/examples.html#glycodraw-code-snippets if you’re unsure about that)
analysis
- Functions able to use .csv paths as input can now also deal with .xlsx paths as input
- The new “annotate_volcano” keyword argument now allows for the direct insertion of SNFG images within plots from get_volcano without having to subsequently run draw.annotate_figure
- get_pvals_motifs, get_differential_expression, get_glycanova, get_time_series, and get_jtk now use glycan_data.stats.replace_outliers_with_IQR_bounds to auto-smooth outliers
- Moved hotellings_t2 to glycan_data.stats
- All functions compatible with motif-level analysis now accept the “custom_motifs” keyword argument to be passed to annotate_dataset or quantify_motifs if “custom” is included in “feature_set”
- Changed the “mode” keyword argument in get_heatmap to “motifs” as a Boolean argument, like in all other motif.analysis functions
- Added a call to clean_up_heatmap to get_jtk to avoid redundant motifs
- Added get_biodiversity to compare two groups of glycomics datasets with regard to the sequence diversity that is present (similar to comparable analyses for microbiome data)
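
To make the outlier smoothing mentioned above more concrete, the following is a minimal sketch of clipping values to IQR-based bounds. It is an illustration only: the helper names are hypothetical, and both the 1.5 multiplier and the linear-interpolation quantile are assumptions that may differ from what replace_outliers_with_IQR_bounds does.

```python
def quantile(sorted_vals: list[float], q: float) -> float:
    """Quantile of pre-sorted values using linear interpolation."""
    pos = (len(sorted_vals) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac


def clip_to_iqr_bounds(values: list[float], k: float = 1.5) -> list[float]:
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the nearest bound."""
    if not values:
        return []
    ordered = sorted(values)
    q1, q3 = quantile(ordered, 0.25), quantile(ordered, 0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Values inside the bounds are returned unchanged, in the original order
    return [min(max(v, lower), upper) for v in values]


# Made-up abundances of one glycan across five samples, with one extreme value
print(clip_to_iqr_bounds([1, 2, 3, 4, 100]))
```

Replacing an outlier with the bound, rather than dropping it, keeps the sample count per group unchanged for downstream tests.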
regex
- Added filter_dealbreakers to allow for the exclusion of identified matches if they have illegal components beyond the identified match (e.g., the forbidden Fuc in "Fuc-([Gal|GalNAc])?-Gal-([!Fuc]){,1}-GlcNAc"). Before this, the sequence context except the Fuc was extracted and returned.
- Fixed an edge case in filter_matches_by_location in which internal locations sometimes had to handle triple-nested lists, which led to errors
- get_match can now also use glycan graphs, such as those derived from glycan_to_nxGraph, as input
- Added get_match_batch to process a whole list of glycans at once, with some performance improvements via first pre-compiling the pattern
- Fixed an edge case in get_match in which pattern components consisting of a single monosaccharide with a specified linkage (e.g., “Fuca3”) could sometimes erroneously output no matches
- Added motif_to_regex to convert glycan motifs (e.g., in IUPAC-condensed) into a regular expression suitable for get_match. Limited to simple queries for now.
annotate
- get_terminal_structures now has a “size” keyword argument with which users can control the size of the extracted terminal motifs
- get_k_saccharides now has a “terminal” keyword argument with which users can filter to only count motifs at non-reducing ends
- annotate_dataset and functions using it can now add the “terminal2” and “terminal3” options in “feature_set” to also annotate & analyze terminal motifs of size 2 (e.g., Neu5Ac(a2-3)Gal(b1-4)) or size 3 (e.g., Neu5Ac(a2-3)Gal(b1-4)GlcNAc)
network
biosynthesis
- Added the possibility of providing abundances to construct_network that are then stored as node attributes in the network
- Added add_high_man_removal as a post-processing step in construct_network to allow for the addition of reactions removing mannoses from high-Man N-glycans occurring during maturation
- Added estimate_weights and get_edge_weight_by_abundance to estimate reaction capacities from abundances + estimate missing abundances
- Added get_maximum_flow, get_max_flow_path, and get_reaction_flow to calculate maximum flow paths between network root and endpoints as well as aggregate the flow by reaction type
- Added get_differential_biosynthesis as a wrapper function to compare two groups of glycomes/networks with regard to their biosynthesis (differential flow paths or differential reaction flows)
- Fixed an issue in construct_network in which sometimes nodes with outgoing but no incoming connections were not detected as unconnected nodes, leading to incomplete networks
- Added the rescue_glycans decorator to construct_network, to allow for auto-fixing nomenclature variations
- Improved performance of construct_network by reducing wasteful computation
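
For readers unfamiliar with maximum flow, the concept behind the new flow functions can be sketched with a generic Edmonds-Karp implementation on a toy directed network. This is a textbook algorithm shown for illustration, not the implementation used by get_maximum_flow; the node names and capacities are made up, standing in for glycans and estimated reaction capacities.

```python
from collections import defaultdict, deque


def max_flow(capacity: dict, source: str, sink: str) -> float:
    """Edmonds-Karp: repeatedly push flow along shortest augmenting paths."""
    if source == sink:
        raise ValueError("source and sink must differ")
    # Residual capacities, including zero-capacity reverse edges
    residual = defaultdict(lambda: defaultdict(float))
    for u, targets in capacity.items():
        for v, cap in targets.items():
            residual[u][v] += cap
            residual[v][u] += 0.0
    total = 0.0
    while True:
        # Breadth-first search for a path with spare capacity
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path left
        # Find the bottleneck capacity along the path
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Update residual capacities along the path
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Toy network: root -> intermediates A, B -> endpoint (hypothetical capacities)
toy = {"root": {"A": 3, "B": 2}, "A": {"end": 2, "B": 1}, "B": {"end": 3}}
print(max_flow(toy, "root", "end"))
```

In a biosynthetic network, the root would correspond to the starting glycan and each endpoint to a terminal structure, with edge capacities derived from abundances.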
evolution
- Switched get_communities from using python-louvain to the Louvain implementation in networkx