v0.7.0
Change Log
For Version 0.7.0
- Removed support for Python 3.7; because some of the re-worked functions use the walrus operator, Python 3.8+ is now required to use glycowork
- Added optional installs for specialized glycowork usage (‘all’, ‘ml’, and ‘draw’; for now), which install the additional dependencies for these usages; more details in the docs
glycan_data
- Updated datasets, models, and lib to be bigger and better; removed many sequence duplicates that differed only in the written order of their branches
loader
- Added the `multireplace` helper function, to map a dictionary of changes onto a string
- Made `build_custom_df` faster
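The idea behind a dictionary-driven replacement helper like `multireplace` can be sketched as follows. This is a minimal stand-alone illustration, not glycowork's implementation; the name `multireplace_sketch` is hypothetical, and applying the substitutions one after another in dictionary order is an assumption.

```python
def multireplace_sketch(string: str, changes: dict) -> str:
    """Apply each old -> new substitution from `changes` to `string`, in insertion order.

    Hypothetical helper for illustration only; not glycowork's own code.
    """
    for old, new in changes.items():
        string = string.replace(old, new)
    return string


# Example: normalize a spelled-out anomeric state in a glycan string
print(multireplace_sketch("Gal(beta1-4)GlcNAc", {"beta": "b", "alpha": "a"}))  # prints Gal(b1-4)GlcNAc
```

Because the substitutions run sequentially, a later key can match text produced by an earlier replacement, so the order of the dictionary matters.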
motif
draw
- Added `draw` as a new submodule of `.motif`
- Added `GlycoDraw` to draw glycans in SNFG style and save them as .svg/.pdf
- Added `annotate_figure` to replace glycan text with glycan images in .svg figures (heatmaps, volcano plots, etc.)
- Added `text_to_glycan`, which replaces glycan strings in figures with glycan images
- Added `scale_in_range` to normalize a list of numbers within a range
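Normalizing a list of numbers within a range is commonly done with a min-max rescaling. The sketch below shows that technique under stated assumptions: the name `scale_in_range_sketch` and its `(values, a, b)` signature are hypothetical, and mapping a constant list to the midpoint of the range is a design choice of this sketch, not necessarily what glycowork's `scale_in_range` does.

```python
from typing import List, Sequence


def scale_in_range_sketch(values: Sequence[float], a: float, b: float) -> List[float]:
    """Linearly rescale `values` so that min(values) -> a and max(values) -> b.

    Hypothetical min-max sketch; glycowork's scale_in_range may differ in signature and details.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal: avoid division by zero and map everything to the midpoint
        return [(a + b) / 2] * len(values)
    return [a + (v - lo) * (b - a) / (hi - lo) for v in values]


print(scale_in_range_sketch([2, 4, 6], 0, 1))  # prints [0.0, 0.5, 1.0]
```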
tokenization
- Sped up `glycan_to_composition` by 1000x (by avoiding explicit stemification and instead only stemifying the building blocks); this also speeds up all functions that use `glycan_to_composition`
- Sped up `composition_to_mass` (independently of the above)
- `glycan_to_composition` (and downstream functions) can now handle more post-biosynthetic modifications: Ac, PCho, PEtN
- Renamed `calculate_theoretical_mass` to `glycan_to_mass`
- Sped up `mz_to_composition2` by (i) filtering out duplicate compositions and (ii) selecting compositions from a chosen taxonomic kingdom
- Reprioritized `mz_to_composition2` to search first for native compositions, then for compositions plus adducts, and only then for doubly-charged compositions
- `canonicalize_iupac` now also handles floating substituents and many more typos, inconsistencies, and IUPAC dialects (such as CFG-coded glycans), including improvements made by Kathryn Klarich
- Moved `canonicalize_iupac` into `motif.processing`
- Expanded `get_core` (and downstream functions) with HexA, HexNAc, and dHex
- Expanded `map_to_basic` to (some) post-biosynthetic modifications
- `mz_to_structures` no longer outright fails if no m/z value can be matched
- Deprecated `structures_to_motifs`; `annotate_dataset` can do the same
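Converting a composition to a mass essentially amounts to summing residue masses and adding one water for the free reducing end. The sketch below illustrates only that arithmetic; the name `composition_to_mass_sketch` is hypothetical, it covers just four residue types with neutral, underivatized monoisotopic masses, and glycowork's `composition_to_mass` supports more residues and options than shown here.

```python
# Monoisotopic residue masses (free monosaccharide minus one water), in Da
RESIDUE_MASS = {
    "Hex": 162.05282,     # C6H10O5
    "HexNAc": 203.07937,  # C8H13NO5
    "dHex": 146.05791,    # C6H10O4
    "Neu5Ac": 291.09542,  # C11H17NO8
}
WATER = 18.01056  # H2O, added once for the free reducing end


def composition_to_mass_sketch(composition: dict) -> float:
    """Neutral monoisotopic mass of an underivatized glycan composition.

    Hypothetical sketch, not glycowork's implementation.
    """
    unknown = set(composition) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"No mass available for: {sorted(unknown)}")
    residues = sum(RESIDUE_MASS[name] * count for name, count in composition.items())
    return round(residues + WATER, 4)


# A free hexose comes out at roughly 180.0634 Da
print(composition_to_mass_sketch({"Hex": 1}))
```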
processing
- Fixed a bug in `small_motif_find` when processing glycans with floating substituents
- Deprecated `seed_wildcard`
- `choose_correct_isoform` has been updated to keep up with the improved `find_isomorphs`
- Added a more informative error message to `IUPAC_to_SMILES`
- `get_lib` is now slightly faster
graph
- Sped up `compare_glycans` with string inputs, by avoiding graph operations when the two glycans do not have the same composition
- Added support for enabling modification wildcards in `compare_glycans` and `subgraph_isomorphism` (for instance, matching GalOS and Gal6S) by setting `wildcards_ptm = True`
- Sped up `glycan_to_nxGraph_int` by optimizing node label/attribute assignments
- Refactored `graph_to_string` to be much more robust, streamlined, and faster. Its new integration with `canonicalize_iupac` may also result in improved strings upon back-translation (e.g., branch order canonicalization)
- `ensure_graph` now has `**kwargs` that get passed to `glycan_to_nxGraph`
- `get_possible_topologies` now also supports internal additions, with the keyword argument ‘exhaustive’
- `possible_topology_check` now supports wildcard matching via `**kwargs` passed on to `compare_glycans`
- Made changes to make glycowork compatible with NetworkX 3.0
- Moved `bracket_removal` to `motif.processing`
- Fixed a small inconsistency in handling floating substituents in `glycan_to_nxGraph_int` that could have caused issues with custom libs
- `override_reducing_end` is no longer needed in `glycan_to_nxGraph` to delineate linkage-ending glycans (e.g., Fuc(a1-2)); this is now auto-inferred within `glycan_to_nxGraph`
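The composition check that lets `compare_glycans` skip graph operations can be sketched as a cheap pre-filter in front of an expensive comparison. Everything below is an assumption for illustration: `rough_composition` is a simplified, hypothetical tokenizer that only strips parenthesized linkages and branch brackets, and `full_compare` stands in for the real graph-based comparison.

```python
import re
from collections import Counter
from typing import Callable


def rough_composition(glycan: str) -> Counter:
    """Count building-block tokens in an IUPAC-condensed string (simplified, hypothetical)."""
    without_linkages = re.sub(r"\([^)]*\)", " ", glycan)  # drop linkages such as (b1-4)
    tokens = re.split(r"[\[\]\s]+", without_linkages)      # split on branch brackets/whitespace
    return Counter(t for t in tokens if t)


def compare_with_shortcut(g1: str, g2: str, full_compare: Callable[[str, str], bool]) -> bool:
    """Only run the expensive `full_compare` if both glycans share the same composition."""
    if rough_composition(g1) != rough_composition(g2):
        return False  # different building blocks: these cannot be the same glycan
    return full_compare(g1, g2)


# The stand-in comparison is never called here, because the compositions differ
print(compare_with_shortcut("Gal(b1-4)GlcNAc", "Man(a1-3)GlcNAc", lambda a, b: a == b))  # prints False
```

The pre-filter is only safe in one direction: differing compositions prove the glycans differ, while equal compositions still require the full comparison.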
annotate
- Deprecated `convert_to_counts_glycoletter` and `glycoletter_count_matrix`; `motif_matrix` can do both
- Refactored `motif_matrix` to be substantially faster and more condensed in its output (this also speeds up `annotate_dataset` with the ‘exhaustive’ option in the feature_set argument)
- Expanded `motif_matrix` to implicitly test for subsumption enrichment (e.g., previously we only explicitly looked for “Gal(b1-?)GlcNAc”; now we also count “Gal(b1-4)GlcNAc” towards the former)
- `annotate_glycan` is now dual-compatible with string and NetworkX graph input
- Expanded feature_set in `annotate_dataset` by the option ‘terminal’, which calls `get_terminal_structures`
- This usage of `get_terminal_structures` in `annotate_dataset` now also performs the same implicit test for subsumption enrichment as described for `motif_matrix` above
- `annotate_dataset` now creates its own lib, based on the motif list and the provided glycans
- Expanded `find_isomorphs` to also be able to re-shuffle (some) branched branches
- Moved `find_isomorphs` into `motif.processing`
- Linkages alone are no longer considered by `motif_matrix`/`annotate_dataset`
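Subsumption counting, where a specific motif such as “Gal(b1-4)GlcNAc” also counts towards the wildcard motif “Gal(b1-?)GlcNAc”, can be illustrated at the string level. This is a hypothetical sketch: `subsumed_by` is not a glycowork function, it assumes each “?” stands for exactly one character, and glycowork's `motif_matrix` operates on graphs rather than on strings.

```python
import re


def subsumed_by(specific: str, motif: str) -> bool:
    """Return True if `specific` matches `motif`, treating each '?' in the motif as a wildcard.

    Hypothetical string-level sketch; each '?' matches exactly one character.
    """
    pattern = "".join("." if ch == "?" else re.escape(ch) for ch in motif)
    return re.fullmatch(pattern, specific) is not None


observed = ["Gal(b1-4)GlcNAc", "Gal(b1-3)GlcNAc", "Gal(b1-?)GlcNAc", "Fuc(a1-2)Gal"]
# The wildcard motif is counted for its own occurrence and for both specific linkages
print(sum(subsumed_by(s, "Gal(b1-?)GlcNAc") for s in observed))  # prints 3
```

The relation is one-directional: a specific linkage counts towards the wildcard motif, but a “?” in the observed glycan does not count towards a fully specified motif.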
analysis
- All functions with the feature_set keyword argument can now also use the ‘terminal’ keyword to analyze non-reducing end motifs exclusively
- Added `get_differential_expression` to compare glycomics data, including data cleaning and imputation
- `get_pvals_motifs` and `make_heatmap` no longer have the lib keyword argument, as `annotate_dataset` will generate a suitable lib internally
- Fixed relative abundance summation in motif mode for `make_heatmap`
- Added the `clean_up_heatmap` helper function to remove redundant (i.e., identical) rows in heatmaps, prioritizing named motifs and longer motifs that contain the redundant shorter motifs
- Added `make_volcano`, to generate a volcano plot from differential expression calculated internally with the `get_differential_expression` function
- Moved `cohen_d` into `motif.processing`
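The row de-duplication described for `clean_up_heatmap` can be sketched with plain dictionaries: group rows by their values, then keep one representative per group. The function name `drop_redundant_rows` is hypothetical, a dict stands in for a heatmap DataFrame, and the tie-breaking rule (named motifs first, then the longest motif string) is this sketch's reading of the prioritization, not glycowork's exact logic.

```python
from typing import Dict, List, Sequence


def drop_redundant_rows(rows: Dict[str, Sequence[float]], named_motifs: set) -> Dict[str, List[float]]:
    """Keep one row per set of identical values (hypothetical sketch).

    Assumed tie-breaking: prefer named motifs, then the longest motif string.
    """
    groups: Dict[tuple, List[str]] = {}
    for name, values in rows.items():
        groups.setdefault(tuple(values), []).append(name)  # rows with identical values share a key
    kept = {}
    for values, names in groups.items():
        best = max(names, key=lambda n: (n in named_motifs, len(n)))
        kept[best] = list(values)
    return kept


rows = {"Gal(b1-4)GlcNAc": [1.0, 2.0], "LacNAc": [1.0, 2.0], "Fuc": [0.5, 0.1]}
print(drop_redundant_rows(rows, {"LacNAc"}))  # prints {'LacNAc': [1.0, 2.0], 'Fuc': [0.5, 0.1]}
```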
ml
model_training
- `train_ml_model` no longer has the lib keyword argument, as `annotate_dataset` will generate a suitable lib internally
network
biosynthesis
- Refactored the `construct_network` pipeline to be faster and more memory-efficient
- `reducing_end` has been deprecated and is now handled internally
- Added `infer_roots` to auto-infer `permitted_roots` (which also no longer needs to be specified in `construct_network`)
- Implemented a distance limit, to prevent combinatorial explosion when outlier glycans are present
- Deprecated `subgraph_to_string` and `make_network_from_edges`
- Deprecated `fill_with_virtuals` and `make_network_directed`
- Minor speed-up of `process_ptm`, by pre-calculating stem_lib once instead of for every glycan in the network