v0.5.0
Change Log
For Version 0.5.0
- added more in-line documentation to all functions/modules
glycan_data
- df_species is now being generated internally from df_glycan and is no longer a separate file
- added build_custom_df to generate df_species, df_tissue, and df_disease from sugarbase/df_glycan
- retiring ‘bond’: the default notation for full linkage uncertainty is now z1-z / z2-z; replace z with ? for full compatibility with IUPAC-condensed
- The ethanolamine modification (previously Etn) is now EtN for consistency with the style of other modifications
- tissue associations now have either Uberon IDs (tissues etc.) or Cellosaurus IDs (cell lines)
- disease associations now have a Disease Ontology ID
- tissue and disease associations now also have a species designation (in tissue_species and disease_species, respectively)
- the internal lib is now a .pkl file instead of being calculated each time the package is loaded
- shifted glycan_representations_species.pkl into .motif, where it will be loaded upon calling .motif.analysis.plot_embeddings
- shifted df_glysum into .alignment, where it will be loaded upon calling .alignment.glysum.pairwiseAlign
- note that we may increasingly deviate from the provided GlyTouCan IDs as we strive to remove unnecessary uncertainty (e.g., specifying the core Fuc as alpha, regardless of whether it has been denoted as alpha in the official GlyTouCan entry)
- updated positional information in motif_list to account for new graph generation output
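To illustrate the z-nomenclature note above, here is a minimal sketch, not glycowork's own code, that turns the z placeholders inside linkages into the ? marks used by IUPAC-condensed. It assumes linkages are the only round-bracketed parts of a sequence (branches use square brackets); the function name z_to_question_mark is hypothetical.

```python
import re


def z_to_question_mark(glycan: str) -> str:
    """Replace z with ? inside every (...) linkage of an IUPAC-condensed string.

    Hypothetical helper: assumes linkages are the only round-bracketed parts
    of the sequence, so monosaccharide names are never touched.
    """
    return re.sub(r"\(([^()]*)\)",
                  lambda m: "(" + m.group(1).replace("z", "?") + ")",
                  glycan)


print(z_to_question_mark("Fuc(a1-2)Gal(z1-z)GlcNAc"))  # → Fuc(a1-2)Gal(?1-?)GlcNAc
```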
loader
- deprecated load_file
motif
tokenization
- added mz_to_composition to match m/z values from glycomics with possible monosaccharide compositions
- added mz_to_structures wrapper to directly go from m/z values to matching glycan sequences
- changed some required arguments to optional arguments in compositions_to_structures and mz_to_structures (the default is now human glycans with no additional relative intensities)
- fixed an issue in compositions_to_structures in which an error was returned if none of the provided compositions had any structure matches
- updated stemify_glycan to the z-nomenclature for linkage uncertainty
- compositions_to_structures now allows for input of custom Hex, HexNAc, and dHex lists
- condense_composition_matching is updated to the z-linkage uncertainty nomenclature
- sped up composition matching by only considering glycans with the correct number of monosaccharides
- added canonicalize_iupac to allow for conversion of other IUPAC “flavors” into the version of IUPAC-condensed nomenclature optimized for glycowork
- added structure_to_basic, glycan_to_composition, and calculate_theoretical_mass utility functions to convert glycan sequences into topologies, compositions, and their theoretical mass, respectively
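The idea behind matching an observed mass to possible monosaccharide compositions can be sketched as a brute-force search over residue counts. This is a simplified illustration, not the implementation of mz_to_composition: it assumes the input has already been converted to a neutral monoisotopic mass of an underivatized glycan with a free reducing end, considers only Hex, HexNAc, dHex and NeuAc, and the function name, tolerance and count limit are hypothetical.

```python
from itertools import product

# Standard monoisotopic residue masses (monosaccharide minus water), in Da
RESIDUE_MASS = {
    "Hex": 162.052824,
    "HexNAc": 203.079373,
    "dHex": 146.057909,
    "NeuAc": 291.095417,
}
WATER = 18.010565  # added once for the free reducing end


def mass_to_compositions(mass, tol=0.01, max_count=8):
    """Return every composition whose neutral mass lies within tol Da of mass.

    Hypothetical brute-force helper: enumerates all count combinations from
    0 to max_count for each residue type.
    """
    names = list(RESIDUE_MASS)
    hits = []
    for counts in product(range(max_count + 1), repeat=len(names)):
        theoretical = WATER + sum(n * RESIDUE_MASS[k] for k, n in zip(names, counts))
        if abs(theoretical - mass) <= tol:
            hits.append({k: n for k, n in zip(names, counts) if n})
    return hits


print(mass_to_compositions(910.3278))  # → [{'Hex': 3, 'HexNAc': 2}]
```

The N-glycan core Hex3HexNAc2 has a neutral mass of about 910.328 Da. The nearby Hex1dHex5 (about 910.353 Da) is excluded only because of the 0.01 Da tolerance, which shows why the tolerance has to match the instrument's accuracy.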
processing
- added choose_correct_isoform to distinguish glycan branch isomers
- deprecated process_glycans and motif_find
- refactored get_lib to use min_process_glycans
- condensed small_motif_find
- moved check_nomenclature into .motif.tokenization and integrated canonicalize_iupac into it
analysis
- updated characterize_monosaccharide to work with seaborn 0.11.2+
graph
- overhauled graph generation (glycan_to_graph, glycan_to_nxGraph, graph_to_string) to be more robust, modular, and simpler / easier to maintain
- combined fast_compare_glycans and compare_glycans into compare_glycans (which internally detects whether strings or precomputed graphs were provided)
- compare_glycans (and its dependencies) is also 2-3x faster now
- subgraph_isomorphism should also be 2-3x as fast as before
- updated graph_to_string to the z-nomenclature for linkage uncertainty
- fixed a bug in the counting mode of subgraph_isomorphism, in which the graph was modified in-place if precomputed graphs were provided and the function was called multiple times
- glycan_to_nxGraph received a new optional argument to enable generating graphs of glycans ending in a linkage, but note that this output will not work with all downstream functions
- correspondingly subgraph_isomorphism can now use motifs ending in a linkage as input
- wildcard matching for compare_glycans etc. now uses the string labels instead of the regular lib index labels to define the wildcards
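The counting-mode bug fixed in subgraph_isomorphism is a general pitfall when a function consumes a caller-supplied, precomputed object. The sketch below shows the defensive-copy pattern on a toy graph stored as a dict of node id → (label, parent id); count_motif and this graph layout are hypothetical and only stand in for the real NetworkX-based code.

```python
import copy


def count_motif(graph, motif):
    """Count non-overlapping occurrences of a (child label, parent label) pair.

    Matched nodes are deleted so they cannot be reused by a later match. The
    deletion happens on a deep copy, so a precomputed graph passed in by the
    caller is never modified and repeated calls give the same answer.
    """
    work = copy.deepcopy(graph)  # without this copy, the caller's graph would shrink on every call
    child_label, parent_label = motif
    count = 0
    for node in sorted(graph):  # iterate over the untouched original ids
        if node not in work:
            continue  # already consumed by an earlier match
        label, parent = work[node]
        if label == child_label and parent in work and work[parent][0] == parent_label:
            del work[node], work[parent]
            count += 1
    return count


precomputed = {0: ("Gal", 1), 1: ("GlcNAc", 2), 2: ("Gal", 3), 3: ("GlcNAc", None)}
print(count_motif(precomputed, ("Gal", "GlcNAc")))  # 2
print(count_motif(precomputed, ("Gal", "GlcNAc")))  # still 2, since the input was not mutated
```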
query
- dramatically sped up get_insight by first checking for string identity before doing graph isomorphisms
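The get_insight speed-up follows a common pattern: try the cheap exact comparison first and fall back to the expensive one only when needed. A minimal sketch, assuming a hypothetical lookup_glycan helper in which is_isomorphic stands in for a graph-isomorphism check:

```python
from typing import Callable, Iterable, Optional


def lookup_glycan(query: str, database: Iterable[str],
                  is_isomorphic: Callable[[str, str], bool]) -> Optional[str]:
    """Find the database entry matching query.

    Hypothetical helper: is_isomorphic stands in for an expensive graph
    comparison and is only called when no entry is string-identical.
    """
    entries = list(database)
    if query in entries:  # cheap path: exact string identity
        return query
    for entry in entries:  # expensive path: structural comparison
        if is_isomorphic(query, entry):
            return entry
    return None
```

For a glycan already stored in its canonical string form, the isomorphism check is never called, which is where most of the savings come from.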
annotate
- fixed scipy import for compatibility with scipy 1.8.0
- improved get_k_saccharides to be (i) compatible with the new graph generation approach and (ii) a lot more robust and exhaustive
ml
- modified GPU utilization to allow CPU usage of all functions (in theory)
models
- the trained model file for LectinOracle_flex is now contained within the package instead of being loaded externally
- deprecated functions for loading external LectinOracle_flex model
processing
- refactored dataset_to_graphs to directly import from NetworkX graphs
train_test_split
- renamed taxonomic_multilabel to prepare_multilabel, as it now also works for preparing training datasets for tissue and disease associations
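Preparing a multilabel dataset boils down to turning (glycan, label) associations into multi-hot vectors, whether the labels are species, tissues, or diseases. The sketch below only illustrates that encoding with the standard library; to_multilabel is a hypothetical name and does not reproduce prepare_multilabel's signature or return types.

```python
from collections import defaultdict


def to_multilabel(pairs):
    """Turn (glycan, label) pairs into glycans, a sorted label vocabulary, and multi-hot vectors.

    Hypothetical sketch: glycans keep their first-seen order.
    """
    labels_per_glycan = defaultdict(set)
    for glycan, label in pairs:
        labels_per_glycan[glycan].add(label)
    vocab = sorted({label for _, label in pairs})
    glycans = list(labels_per_glycan)
    vectors = [[int(lab in labels_per_glycan[g]) for lab in vocab] for g in glycans]
    return glycans, vocab, vectors


pairs = [("Fuc(a1-2)Gal(b1-4)Glc", "milk"),
         ("Fuc(a1-2)Gal(b1-4)Glc", "colon"),
         ("Gal(b1-4)Glc", "milk")]
print(to_multilabel(pairs))
```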
model_training
- SAM will now only be loaded by training_setup in case of multiclass or multilabel classification (for performance reasons)
network
- functions working with biosynthetic networks can now use dictionaries of pre-computed networks as inputs, with the default being the pre-computed milk glycan biosynthetic networks stored within glycowork
biosynthesis
- added trace_diamonds to automatically extract diamond-shaped motifs from networks and leverage evolutionary information to return likelihoods for real paths
- replaced infuse_network with highlight_network, which allows you to highlight motifs, species-specific glycans, abundances, and degree of conservation in a network
- added prune_network to cut away virtual paths ranging from unlikely to impossible (depending on the threshold)
- added evoprune_network as a wrapper for trace_diamonds, highlight_network, prune_network
- fixed an issue in which choose_path returned an error if a path did not occur in any other species; now it returns an empty dictionary
- fixed an issue in propagate_virtuals that prevented proper deorphanization for O-glycans
- fixed a suffix issue in PTM detection for non-milk networks
- made get_virtual_nodes and construct_network more robust toward unusual branch ordering
- improved construct_network to prune virtual leaf nodes with degree > 1
- functions requiring a filepath now require a species : network dictionary as function input
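A diamond in a biosynthetic network is a pair of alternative routes from one glycan to the same product via two different intermediates (A→B→D and A→C→D). The sketch below only finds such diamonds in a list of directed edges; it does not compute the evolutionary likelihoods that trace_diamonds returns, and find_diamonds is a hypothetical name.

```python
from itertools import combinations


def find_diamonds(edges):
    """Return (source, mid1, mid2, sink) tuples for every A->B->D / A->C->D diamond.

    edges: iterable of directed (u, v) pairs with sortable node names.
    """
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    diamonds = []
    for a in sorted(succ):
        for b, c in combinations(sorted(succ[a]), 2):
            for d in sorted(succ.get(b, set()) & succ.get(c, set())):
                diamonds.append((a, b, c, d))
    return diamonds


lactose = "Gal(b1-4)Glc"
fl2 = "Fuc(a1-2)Gal(b1-4)Glc"              # 2'-fucosyllactose
fl3 = "Gal(b1-4)[Fuc(a1-3)]Glc"            # 3-fucosyllactose
ldft = "Fuc(a1-2)Gal(b1-4)[Fuc(a1-3)]Glc"  # lactodifucotetraose
edges = [(lactose, fl2), (lactose, fl3), (fl2, ldft), (fl3, ldft)]
print(find_diamonds(edges))
```

In this milk-glycan example, lactodifucotetraose can be reached by adding either fucose first, which yields a single diamond; deciding which of the two routes is real is the part that trace_diamonds addresses with evolutionary information.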
evolution
- added check_conservation to assess the evolutionary conservation of glycans and glycan motifs via biosynthetic networks
- added get_communities to use Louvain community detection algorithm, e.g., in biosynthetic networks
- refactored the distance matrix calculation into a separate function, calculate_distance_matrix
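Factoring the distance matrix out into its own function means any pairwise metric can be plugged in. A minimal sketch of such a helper, assuming (hypothetically) that each species' network is summarized as a set of glycans and compared with the Jaccard distance; neither the function names nor the metric are taken from calculate_distance_matrix.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """1 - |a ∩ b| / |a ∪ b|; two empty sets are treated as identical."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def distance_matrix(items: dict, metric=jaccard_distance):
    """Symmetric distance matrix for a {name: value} dict.

    Hypothetical helper: computes each unordered pair once and mirrors it.
    Returns the sorted names and the matrix as a list of lists.
    """
    names = sorted(items)
    idx = {n: i for i, n in enumerate(names)}
    mat = [[0.0] * len(names) for _ in names]
    for x, y in combinations(names, 2):
        d = metric(items[x], items[y])
        mat[idx[x]][idx[y]] = mat[idx[y]][idx[x]] = d
    return names, mat
```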
alignment
- retired alignment until significant improvements can be made