v0.6.0
Change Log
For Version 0.6.0
- Updated nbdev1 to nbdev2
- Updated documentation notebooks
- Expanded documentation examples for (i) networks and (ii) deep learning models
glycan_data
- Updated v7_sugarbase and associated files + models
- Improved Cellosaurus ID prefixes
- Added glycan composition as a new column to sugarbase
- Exchanged ‘z’ with ‘?’ as a linkage uncertainty indicator
- Added a protein column to glycan_binding, giving the name of the protein whose sequence is in the target column
loader
- Added “Ins” and “Galf” to Hex list
- Added stringify_dict utils function to convert a dictionary into a string
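As an illustration of the kind of conversion stringify_dict performs, here is a minimal sketch. The name stringify_dict_sketch and the output format (sorted keys, each followed directly by its value) are assumptions made for this example; the library's actual format may differ.

```python
def stringify_dict_sketch(d):
    """Join sorted key/value pairs into one string (hypothetical format)."""
    # Sorting makes the result independent of insertion order,
    # so equal dictionaries always map to the same string.
    return "".join(f"{key}{value}" for key, value in sorted(d.items()))


composition = {"HexNAc": 2, "Hex": 3}
print(stringify_dict_sketch(composition))  # Hex3HexNAc2
```

A deterministic string like this can serve as a dictionary key or a column value when compositions need to be compared or grouped.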
motif
- Changed functions to use “?” as a linkage uncertainty indicator rather than “z”
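For glycan strings written in the old convention, a small migration helper might look like the sketch below. It assumes that the old strings marked uncertainty with a literal “z” inside the linkage parentheses; update_linkage_uncertainty is a hypothetical name, not a library function.

```python
import re


def update_linkage_uncertainty(glycan):
    """Replace 'z' with '?' inside linkage parentheses only."""
    # Restricting the substitution to "(...)" leaves monosaccharide
    # names outside the parentheses untouched.
    return re.sub(
        r"\(([^()]*)\)",
        lambda m: "(" + m.group(1).replace("z", "?") + ")",
        glycan,
    )


print(update_linkage_uncertainty("Gal(b1-z)GlcNAc"))  # Gal(b1-?)GlcNAc
```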
processing
- Added enforce_class to check whether a glycan belongs to the desired glycan class
- Added IUPAC_to_SMILES to convert glycans from IUPAC-condensed into SMILES via GlyLES
graph
- glycan_to_nxGraph can now use glycan strings with floating substituents, such as “{Neu5Ac(a2-3)}Gal(b1-4)GlcNAc(b1-6)[Gal(b1-3)]GalNAc”
- Added get_possible_topologies and possible_topology_check to probe whether glycans match, or could match, a glycan with floating substituents
- Added ensure_graph to allow functions to be dual-compatible with string & graph inputs
- generate_graph_features, largest_subgraph, get_possible_topologies, and possible_topology_check are now dual-compatible with string & graph inputs
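The dual-compatibility pattern behind ensure_graph can be sketched as follows. This is an illustration only: ensure_graph_sketch, toy_parser, and count_nodes are hypothetical names, and the toy parser is a stand-in for a real string-to-graph converter such as glycan_to_nxGraph.

```python
def toy_parser(glycan):
    # Stand-in converter: split "Gal(b1-4)GlcNAc" into its
    # monosaccharide and linkage tokens instead of building a real graph.
    return {"nodes": glycan.replace(")", "(").split("(")}


def ensure_graph_sketch(glycan, parser=toy_parser):
    """Parse string inputs; pass already-converted inputs through unchanged."""
    return parser(glycan) if isinstance(glycan, str) else glycan


def count_nodes(glycan):
    # Works for both input types because the conversion happens up front.
    graph = ensure_graph_sketch(glycan)
    return len(graph["nodes"])


print(count_nodes("Gal(b1-4)GlcNAc"))              # string input
print(count_nodes(toy_parser("Gal(b1-4)GlcNAc")))  # pre-converted input
```

Converting once at the top of a function means callers that already hold a graph do not pay for parsing the string again.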
tokenization
- Refactored match_composition_relaxed into a much smaller and slightly faster function that uses glycan_to_composition for matching
- Deprecated match_composition accordingly
- mz_to_composition is now up to 100x faster, thanks to much better defaults and assumptions
- Added support for free oligosaccharides to mz_to_composition
- Added mz_to_composition2 as an alternative way of composition matching; it scales better and is “more physiological”, as it is constrained by the class-specific compositions that exist within sugarbase
- glycan_to_composition can now also handle post-biosynthetic modifications such as sulfation
- Added composition_to_mass
- Improved linkage uncertainty handling in canonicalize_iupac
- canonicalize_iupac can now handle sulfation and phosphorylation
- Updated stemify_glycan & structure_to_basic to correctly handle glycans of length 1
- Updated stemify_glycan to terminate its while loop when it would otherwise loop infinitely
- Updated glycan_to_composition to support floating substituents
- get_core now also handles “Ins” correctly
- calculate_theoretical_mass can now also handle methylation correctly
- Improved the reducing end calculation for modified glycans in calculate_theoretical_mass
- Added a speed-up option to calculate_theoretical_mass & glycan_to_composition for non-exotic glycans
- Refactored calculate_theoretical_mass to use composition_to_mass
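The idea of computing a mass from a composition can be illustrated with a minimal sketch. It assumes an underivatized, free glycan and uses standard monoisotopic residue masses; composition_to_mass_sketch and its small mass table are hypothetical and much simpler than the library's function, which may support more building blocks and options.

```python
# Monoisotopic residue masses in Da (monosaccharide minus one water).
RESIDUE_MASS = {
    "Hex": 162.0528,
    "HexNAc": 203.0794,
    "dHex": 146.0579,
    "Neu5Ac": 291.0954,
}
WATER = 18.0106  # added once for the free reducing end


def composition_to_mass_sketch(composition):
    """Sum residue masses weighted by their counts, plus one water."""
    residues = sum(RESIDUE_MASS[mono] * n for mono, n in composition.items())
    return residues + WATER


# Hex3HexNAc2 (e.g. the N-glycan core) comes out at about 910.33 Da.
print(round(composition_to_mass_sketch({"Hex": 3, "HexNAc": 2}), 2))
```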
annotate
- Added get_terminal_structures to extract the monosaccharide + linkage from all non-reducing ends of a glycan
- Improved runtime and completeness of get_k_saccharides
- get_terminal_structures & get_k_saccharides are now also dual-compatible with string & graph inputs
- Added get_molecular_properties to obtain chemical features of glycans via SMILES
- ‘chemical’ is a new option in feature_set of annotate_dataset, using get_molecular_properties
- Small style fix in motif_matrix to avoid a warning
- link_find (and downstream annotation findings) now also support floating substituents
analysis
- Added cohen_d to calculate the effect size between two comparison groups
- ‘chemical’ is a new option in feature_set of get_pvals_motifs and make_heatmap, using get_molecular_properties
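For reference, Cohen's d is the difference between two group means divided by their pooled standard deviation. The sketch below uses the sample-variance pooling convention; cohen_d_sketch is a hypothetical name, and the library's cohen_d may use a different convention or signature.

```python
from math import sqrt
from statistics import mean, variance


def cohen_d_sketch(x, y):
    """Effect size between two groups, using the pooled sample variance."""
    nx, ny = len(x), len(y)
    # Weight each group's sample variance by its degrees of freedom.
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


# Means 4 and 3, both sample variances 4, so d = 1 / 2.
print(cohen_d_sketch([2, 4, 6], [1, 3, 5]))  # 0.5
```

The sign follows the order of the arguments, so swapping the two groups flips the sign of the result.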
ml
model_training
- Added the option to use GSAM instead of SAM for the optimizer by specifying alpha in training_setup
models
- Streamlined the SweetNet architecture (credit to David Alexander) used in SweetNet and LectinOracle, for faster training and clearer code
network
biosynthesis
- Added a dictionary of pre-calculated glycan graphs to construct_network and underlying functions, for a ~2x speed-up and better scaling
- Various other performance improvements to network construction functions further increase speed
- Improved pruning of virtual root nodes in construct_network
- Modified export_network to allow for custom node attribute extraction
- Generalized find_diamonds to allow for extraction of diamonds, hexagons, etc. via the custom parameter nb_intermediates (default: 2, for diamonds)
- Generalized choose_path to compute path probabilities for non-diamond-shaped motifs
evolution
- Small fix in calculate_distance_matrix