v0.6.0

@Bribak Bribak released this 09 Dec 13:06
· 929 commits to master since this release
1975edc

Change Log

For Version 0.6.0

  • Updated nbdev1 to nbdev2
  • Updated documentation notebooks
  • Expanded documentation examples for (i) networks and (ii) deep learning models

glycan_data

  • Updated v7_sugarbase and associated files + models
  • Improved Cellosaurus ID prefixes
  • Added glycan composition as a new column to sugarbase
  • Exchanged ‘z’ with ‘?’ as a linkage uncertainty indicator
  • Added protein column to glycan_binding, indicating the protein name whose sequence is in the target column

loader

  • Added “Ins” and “Galf” to Hex list
  • Added stringify_dict utils function to convert a dictionary into a string
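
As a rough illustration of what a dictionary-to-string helper of this kind can look like, here is a minimal sketch. The name `stringify_dict_sketch`, the key sorting, and the `key + value` output format are assumptions made for this example; the actual `stringify_dict` in the loader module may format its output differently.

```python
def stringify_dict_sketch(d):
    """Concatenate key/value pairs of a dictionary into one string.

    Hypothetical stand-in, not the library function: keys are sorted so that
    equal dictionaries always yield the same string.
    """
    return "".join(f"{key}{value}" for key, value in sorted(d.items()))


# A composition-like dictionary becomes a compact, hashable label.
label = stringify_dict_sketch({"HexNAc": 2, "Hex": 3})
print(label)
```

A deterministic string like this is convenient as a dictionary key or a column value when compositions need to be compared or grouped.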

motif

  • Changed functions to use “?” as a linkage uncertainty indicator rather than “z”

processing

  • Added enforce_class to check whether a glycan is from the desired glycan class
  • Added IUPAC_to_SMILES to convert glycans from IUPAC-condensed into SMILES via GlyLES

graph

  • glycan_to_nxGraph can now use glycan strings with floating substituents, such as “{Neu5Ac(a2-3)}Gal(b1-4)GlcNAc(b1-6)[Gal(b1-3)]GalNAc”
  • added get_possible_topologies and possible_topology_check to probe whether glycans (could) match a glycan with floating substituents
  • added ensure_graph to allow functions to be dual-compatible for string & graph inputs
  • generate_graph_features, largest_subgraph, get_possible_topologies, and possible_topology_check are now dual-compatible with string & graph inputs
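
The dual-compatibility pattern described above can be sketched as follows. Everything in this block is a toy stand-in: the dict-based graph, `toy_string_to_graph`, `ensure_graph_sketch`, and `count_nodes` are hypothetical names for illustration, and the real `ensure_graph` and `glycan_to_nxGraph` may take different arguments and return NetworkX graphs instead.

```python
def toy_string_to_graph(glycan):
    """Hypothetical converter for unbranched glycan strings only.

    Splits 'Gal(b1-4)GlcNAc' into alternating monosaccharide and linkage
    nodes and connects neighbours in order.
    """
    nodes = glycan.replace(")", "(").split("(")
    edges = [(i, i + 1) for i in range(len(nodes) - 1)]
    return {"nodes": nodes, "edges": edges}


def ensure_graph_sketch(glycan):
    """Return a graph whether the caller passed a string or a graph."""
    return toy_string_to_graph(glycan) if isinstance(glycan, str) else glycan


def count_nodes(glycan):
    """Example of a function that accepts either input type."""
    graph = ensure_graph_sketch(glycan)
    return len(graph["nodes"])


as_string = "Gal(b1-4)GlcNAc"
as_graph = toy_string_to_graph(as_string)
print(count_nodes(as_string), count_nodes(as_graph))
```

Normalizing the input once at the top of a function keeps the rest of its body graph-only, and callers who already hold a graph avoid a repeated conversion.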

tokenization

  • Refactored match_composition_relaxed to be slightly faster and much smaller; it now uses glycan_to_composition for matching
  • Deprecated match_composition accordingly
  • mz_to_composition is now up to 100x faster, based on much better defaults / assumptions
  • added support for free oligosaccharides to mz_to_composition
  • added mz_to_composition2 as an alternative way of composition matching; better scaling and “more physiological” as it’s constrained by class-specific existing compositions within sugarbase
  • glycan_to_composition can now also handle post-biosynthetic modifications such as sulfation
  • added composition_to_mass
  • Improved linkage uncertainty handling in canonicalize_iupac
  • canonicalize_iupac can now handle sulfation and phosphorylation
  • updated stemify_glycan & structure_to_basic to correctly handle glycans of length 1
  • updated stemify_glycan to terminate its while loop when it would otherwise run indefinitely
  • updated glycan_to_composition to support floating substituents
  • get_core now also handles “Ins” correctly
  • calculate_theoretical_mass can now also handle methylation modifications correctly
  • improved reducing end calculation for modified glycans in calculate_theoretical_mass
  • added speed-up option to calculate_theoretical_mass & glycan_to_composition for non-exotic glycans
  • refactored calculate_theoretical_mass to use composition_to_mass
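
To illustrate the idea of deriving a mass from a composition, here is a minimal sketch for a free, underivatized glycan. The function name, its signature, and the small mass table are assumptions for this example; the library's `composition_to_mass` may use other mass tables and options (for instance average masses or derivatization).

```python
# Monoisotopic residue masses in Da (monosaccharide minus one water).
# Illustrative subset only; not taken from the library.
RESIDUE_MASS = {
    "Hex": 162.0528,
    "HexNAc": 203.0794,
    "dHex": 146.0579,
    "Neu5Ac": 291.0954,
}
WATER = 18.0106  # added once for the free reducing end


def composition_to_mass_sketch(composition):
    """Sum residue masses weighted by their counts, plus one water."""
    return sum(RESIDUE_MASS[mono] * n for mono, n in composition.items()) + WATER


# Hex3HexNAc2, the composition of the N-glycan core
mass = composition_to_mass_sketch({"Hex": 3, "HexNAc": 2})
print(round(mass, 4))  # roughly 910.33 Da
```

Routing a structure-based mass calculation through a composition first, as the refactoring above describes, means a single mass table serves both structures and bare compositions.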

annotate

  • added get_terminal_structures to extract monosaccharide+linkage from all non-reducing ends of a glycan
  • improved runtime and completeness for get_k_saccharides
  • get_terminal_structures & get_k_saccharides are now both dual-compatible with string & graph inputs
  • added get_molecular_properties to obtain chemical features of glycans via SMILES
  • ‘chemical’ is a new option in feature_set of annotate_dataset, using get_molecular_properties
  • small style fix in motif_matrix to avoid warning
  • link_find (and downstream annotation findings) now also support floating substituents

analysis

  • added cohen_d to calculate effect size between two comparison groups
  • ‘chemical’ is a new option in feature_set of get_pvals_motifs and make_heatmap, using get_molecular_properties
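
For readers unfamiliar with the effect size, the textbook pooled-standard-deviation form of Cohen's d can be sketched as below. The name `cohen_d_sketch` and its signature are assumptions for this example; the library's `cohen_d` may differ in its arguments and in details such as paired designs.

```python
import math
from statistics import mean, variance


def cohen_d_sketch(x, y):
    """Cohen's d for two independent groups, using the pooled sample variance."""
    nx, ny = len(x), len(y)
    # statistics.variance is the sample variance (n - 1 in the denominator)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


# Two small groups with equal spread and means one unit apart
d = cohen_d_sketch([2, 4, 6], [1, 3, 5])
print(d)  # 0.5
```

Unlike a p-value, this quantity does not shrink or grow with sample size alone, which makes it a useful companion when comparing motif abundances between groups.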

ml

model_training

  • added the option to use GSAM instead of SAM for the optimizer by specifying alpha in training_setup

models

  • streamlined SweetNet architecture (credit to David Alexander) used in SweetNet and LectinOracle, giving faster training and clearer code

network

biosynthesis

  • added a dictionary of pre-calculated glycan graphs to construct_network and underlying functions, giving a ~2x speed-up and better scaling
  • various other performance improvements to network construction functions further increase speed
  • improved pruning of virtual root nodes in construct_network
  • modified export_network to allow for custom node attribute extraction
  • generalized find_diamonds to allow for extraction of diamonds, hexagons, etc. via a custom parameter nb_intermediates (default: 2, for diamonds)
  • generalized choose_path to compute path probabilities for non-diamond shape motifs

evolution

  • small fix in calculate_distance_matrix