
v0.3.0


@Bribak released this 10 Dec 07:43

Change Log

For Version 0.3.0

ml
models

  • added LectinOracle as option for prep_model & modified prep_model to allow for loading trained models

model_training

  • train_ml_model now allows for additional (optional) input features
  • changed default optimizer from Adam to AdamW
  • changed default learning rate scheduler from cosine-decay to ReduceLROnPlateau
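PyTorch provides both new defaults as torch.optim.AdamW and torch.optim.lr_scheduler.ReduceLROnPlateau. To show how the plateau rule differs from a fixed cosine-decay schedule, the sketch below re-implements only its core logic in plain Python so it runs without PyTorch. It is a simplified illustration: the class name PlateauScheduler is hypothetical, PyTorch's threshold and cooldown options are omitted, and the factor and patience values used here are illustrative, not glycowork's settings.

```python
class PlateauScheduler:
    """Simplified ReduceLROnPlateau logic (mode='min'), without threshold or cooldown."""

    def __init__(self, lr: float, factor: float = 0.1, patience: int = 10, min_lr: float = 0.0):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.num_bad_epochs = 0

    def step(self, val_loss: float) -> float:
        # An epoch is "bad" if the validation loss did not improve on the best so far.
        if val_loss < self.best:
            self.best = val_loss
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1
        # Only after more than `patience` bad epochs is the learning rate reduced.
        if self.num_bad_epochs > self.patience:
            self.lr = max(self.lr * self.factor, self.min_lr)
            self.num_bad_epochs = 0
        return self.lr


sched = PlateauScheduler(lr=0.001, factor=0.5, patience=2)
for loss in [1.0, 0.8, 0.8, 0.8, 0.8, 0.7]:
    print(sched.step(loss))
```

Unlike cosine decay, which lowers the learning rate on a fixed timetable, this rule reacts to the validation loss: the rate stays at 0.001 until the loss has stalled for more than two epochs, then halves.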

processing

  • split_data_to_train now allows for additional (optional) input features
  • label_type is now also an optional argument for split_data_to_train and all lower-level functions
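Optional extra input features only work if they are split with exactly the same shuffle as the glycans and labels. The pure-Python sketch below illustrates that idea. split_with_optional_features is a hypothetical helper, not glycowork's split_data_to_train, and its signature, defaults, and dict-based return format are assumptions made for illustration.

```python
import random
from typing import Any, Dict, List, Optional, Sequence, Tuple


def split_with_optional_features(
    glycans: Sequence[str],
    labels: Sequence[Any],
    extra_features: Optional[Sequence[Any]] = None,
    test_size: float = 0.2,
    seed: int = 42,
) -> Tuple[Dict[str, List[Any]], Dict[str, List[Any]]]:
    """Hypothetical helper: split glycans, labels, and optional per-glycan
    features with one shared shuffle so every row stays aligned."""
    n = len(glycans)
    if len(labels) != n or (extra_features is not None and len(extra_features) != n):
        raise ValueError("all inputs must have one entry per glycan")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility
    n_test = round(n * test_size)
    splits = {"test": indices[:n_test], "train": indices[n_test:]}

    out = {}
    for name, idx in splits.items():
        part = {"glycans": [glycans[i] for i in idx], "labels": [labels[i] for i in idx]}
        if extra_features is not None:
            # The same indices are reused, so features stay paired with their glycan.
            part["features"] = [extra_features[i] for i in idx]
        out[name] = part
    return out["train"], out["test"]
```

When extra_features is omitted, the output simply has no "features" entry, mirroring the changelog's description of the features as optional.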

model_training

  • modified train_model to allow for LectinOracle training

representation/inference

  • renamed the “representation” module to “inference”
  • added get_lectin_preds, which uses LectinOracle to infer the glycan-binding specificity of lectins
  • added get_esm1b_representation to retrieve ESM1b representations of new lectins for use as LectinOracle input

motif
query

  • added tissue expression and disease association to get_insight
  • glytoucan_to_glycan is now more robust to missing GlyTouCan IDs

tokenization

  • added condense_composition_matching to find the minimum number of glycans needed to characterize matching compositions
  • added the wrapper function compositions_to_structures, which takes a list of compositions, finds possible matches, condenses them into the minimum number of structures, and matches them with values such as supplied relative intensities
  • added structures_to_motifs function to convert datasets of relative intensities of glycan structures to relative intensities of the corresponding glycan motifs
  • changed default mode of match_composition_relaxed to “exact”
  • modified match_composition_relaxed to allow filtering of possible matches by reducing-end monosaccharide (e.g., to retrieve only O-linked glycans)
  • fixed an issue in match_composition_relaxed that prevented additional monosaccharide types from being added to the composition
  • moved motif_matrix and dependencies over to motif.annotate
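The composition-to-motif workflow described above can be pictured with a small, self-contained sketch on IUPAC-condensed strings. All function names here are hypothetical and this is not glycowork's implementation. match_composition_exact compares monosaccharide counts exactly, in the spirit of the new "exact" default, and can require a given reducing-end (rightmost) monosaccharide. structures_to_motif_intensities assumes that a motif's intensity is the sum of the intensities of the structures containing it, using a hand-written motif table as a stand-in for real motif annotation. The glycans, intensities, and motif labels are toy values.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional


def monosaccharides(glycan: str) -> List[str]:
    """Return the monosaccharide tokens of a simple IUPAC-condensed glycan string."""
    # Drop branch brackets, then split on linkage parentheses such as "(b1-3)".
    unbranched = glycan.replace("[", "").replace("]", "")
    return [tok for tok in re.split(r"\([^)]*\)", unbranched) if tok]


def match_composition_exact(composition: Dict[str, int], candidates: Iterable[str],
                            reducing_end: Optional[str] = None) -> List[str]:
    """Keep candidates whose monosaccharide counts equal `composition` exactly,
    optionally requiring a specific reducing-end (rightmost) monosaccharide."""
    target = Counter({k: v for k, v in composition.items() if v > 0})
    matches = []
    for glycan in candidates:
        tokens = monosaccharides(glycan)
        if Counter(tokens) != target:
            continue
        if reducing_end is not None and tokens[-1] != reducing_end:
            continue
        matches.append(glycan)
    return matches


def structures_to_motif_intensities(structure_intensities: Dict[str, float],
                                    motif_presence: Dict[str, List[str]]) -> Dict[str, float]:
    """Sum the relative intensity of every structure that contains each motif."""
    totals: Dict[str, float] = defaultdict(float)
    for structure, intensity in structure_intensities.items():
        for motif in motif_presence.get(structure, []):
            totals[motif] += intensity
    return dict(totals)


candidates = ["Gal(b1-3)GalNAc", "GalNAc(b1-4)Gal",
              "Neu5Ac(a2-3)Gal(b1-3)[Neu5Ac(a2-6)]GalNAc"]
print(match_composition_exact({"Gal": 1, "GalNAc": 1}, candidates))
# ['Gal(b1-3)GalNAc', 'GalNAc(b1-4)Gal']
print(match_composition_exact({"Gal": 1, "GalNAc": 1}, candidates, reducing_end="GalNAc"))
# ['Gal(b1-3)GalNAc']

intensities = {"Gal(b1-3)GalNAc": 40.0,
               "Neu5Ac(a2-3)Gal(b1-3)GalNAc": 35.0,
               "Neu5Ac(a2-3)Gal(b1-3)[Neu5Ac(a2-6)]GalNAc": 25.0}
motifs = {"Gal(b1-3)GalNAc": ["core1"],
          "Neu5Ac(a2-3)Gal(b1-3)GalNAc": ["core1", "sialylated"],
          "Neu5Ac(a2-3)Gal(b1-3)[Neu5Ac(a2-6)]GalNAc": ["core1", "sialylated"]}
print(structures_to_motif_intensities(intensities, motifs))
# {'core1': 100.0, 'sialylated': 60.0}
```

The reducing-end filter relies on the IUPAC-condensed convention that the rightmost residue is the reducing end, which is why requiring "GalNAc" keeps the core 1 disaccharide and drops the isomeric "GalNAc(b1-4)Gal".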

glycan_data

  • replaced glyco_targets_species_seq_all_V4 (~23,000 species-specific glycans) and v4_sugarbase (~47,000 unique glycans) with glyco_targets_species_seq_all_V5 (~31,500 species-specific glycans) and v5_sugarbase (~50,500 unique glycans)
  • added directed disease associations (currently 533 associations) and tissue expression (currently 2,815 associations) for glycans in v5_sugarbase
  • changed the nomenclature of glycolipids (most now end in “1Cer” at the reducing end, for instance “Glc1Cer”) and of free oligosaccharides (which now end in “-ol” at the reducing end, for instance “Glc-ol”)
  • made Lewis motifs in motif_list more general
  • updated the glycan ML models, representations, and substitution matrix accordingly
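Under the new reducing-end nomenclature, a glycan string's type can be read off its suffix. The sketch below illustrates this with hypothetical helpers (classify_reducing_end, reducing_end_monosaccharide) that are not part of glycowork. The example strings follow the convention as stated here, but exact database entries may differ. Because only most glycolipids receive “1Cer”, a suffix check like this will not catch every glycolipid.

```python
def classify_reducing_end(glycan: str) -> str:
    """Hypothetical helper: label a glycan string by its reducing-end suffix."""
    if glycan.endswith("1Cer"):
        return "glycolipid"
    if glycan.endswith("-ol"):
        return "free oligosaccharide"
    return "other"


def reducing_end_monosaccharide(glycan: str) -> str:
    """Hypothetical helper: strip a known suffix, then return the rightmost residue."""
    for suffix in ("1Cer", "-ol"):
        if glycan.endswith(suffix):
            glycan = glycan[: -len(suffix)]
            break
    # The reducing-end residue follows the last linkage parenthesis, if any.
    return glycan.rsplit(")", 1)[-1]


for g in ["Gal(b1-4)Glc1Cer", "Gal(b1-4)Glc-ol", "Gal(b1-3)GalNAc"]:
    print(g, classify_reducing_end(g), reducing_end_monosaccharide(g))
```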