This repository has been archived by the owner on Sep 24, 2019. It is now read-only.

Dataset Objects

Natalie Catlett edited this page Sep 18, 2015 · 15 revisions

This page describes the dataset objects defined in datasets.py. Some classes inherit from multiple parent classes (e.g., HGNCData is a NamespaceDataSet, an OrthologyData, and a HistoryDataSet).
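
The multiple-inheritance arrangement can be illustrated with stub classes. This is a minimal sketch, not the code in datasets.py: the stub bodies, the constructor arguments, and the order in which HGNCData lists its parents are assumptions, and it uses the OrthologyDataSet name from the class list. Python's C3 method resolution order places every parent ahead of the shared DataSet base, so DataSet.__init__ runs once and one object exposes the methods of all three parents.

```python
# Stub parents. ASSUMPTION: bodies and signatures are simplified placeholders,
# not the definitions in datasets.py.
class DataSet:
    def __init__(self, dictionary=None, prefix="unnamed-data-object"):
        self.dictionary = dictionary if dictionary is not None else {}
        self.prefix = prefix


class NamespaceDataSet(DataSet):
    def __init__(self, dictionary=None, prefix="unnamed-data-object", name=None):
        # super() follows the MRO of the actual instance; for HGNCData it skips
        # the parents that define no __init__ and reaches DataSet.__init__.
        super().__init__(dictionary, prefix)
        self.name = name if name is not None else prefix


class OrthologyDataSet(DataSet):
    def get_orthologs(self, term_id):
        return set()  # placeholder: no orthology data in this stub


class HistoryDataSet(DataSet):
    def get_id_update(self, term_id):
        return None  # placeholder: no replacement information in this stub


# Hypothetical parent order; one object gains the methods of all three parents.
class HGNCData(NamespaceDataSet, OrthologyDataSet, HistoryDataSet):
    pass


hgnc = HGNCData({}, prefix="hgnc", name="HGNC")
print([cls.__name__ for cls in HGNCData.__mro__])
# ['HGNCData', 'NamespaceDataSet', 'OrthologyDataSet', 'HistoryDataSet', 'DataSet', 'object']
print(isinstance(hgnc, HistoryDataSet), hgnc.name, hgnc.prefix)
# True HGNC hgnc
```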

classes:

  • DataSet
    • contains any data relevant to the BEL Namespace and Annotation resource generator pipeline
    • attributes:
      • prefix - prefix for data set
      • dictionary - dictionary containing the data, built in the parsed module
    • methods:
      • get_values - returns all non-obsolete values used as keys in the data dictionary
      • __str__ - returns an identifying string for the data object
  • NamespaceDataSet
    • contains data for BEL Namespaces and Annotations including ids, terms, synonyms, and equivalences
    • parent class - DataSet
    • attributes (in addition to parent class attributes):
      • name - name for namespace (can be same as prefix)
      • ids - flag to produce .belns file with ids, default=False
      • labels - flag to produce .belns file with preferred labels, default=True
      • domain - list containing the domain(s) of terms in the namespace (e.g., "chemical", "gene and gene product"), default=['other']
      • scheme_type - list containing 'ns' (namespace) and/or 'anno' (annotation) to indicate if data is used to build namespace and/or annotation files, default=['ns']
    • methods (in addition to parent class methods):
      • get_label(term_id) - returns the value to be used as the preferred label for the given term_id
      • get_name(term_id) - returns the term name to use as a title (or None)
      • get_xrefs(term_id) - returns equivalences from the term_id to other namespaces, when this data set is the source of the equivalence information. Returned as a set of terms expressed as PREFIX:ID
      • get_species(term_id) - returns the species associated with a term_id as an NCBI taxonomy ID, or None if not applicable
      • get_encoding(term_id) - returns the encoding (allowed BEL functions) for the term_id
      • get_concept_type(term_id) - for annotation concept schemes, returns the set of AnnotationConcept types associated with the term_id
      • get_alt_symbols(term_id) - returns set of synonym symbols associated with the term_id
      • get_alt_names(term_id) - returns set of name synonyms associated with the term_id
      • get_alt_ids(term_id) - returns set of alternative ids associated with the term_id
      • write_ns_values(dir) - writes .belns file(s) to the specified dir (uses write_data)
      • write_data(data, dir, name) - writes a single .belns file
  • HistoryDataSet
    • contains information about obsolete (withdrawn and/or replaced) ids
    • used for change_log and rdf output, not .belns and .beleq file generation
    • parent class - DataSet
    • methods:
      • get_id_update(term_id) - returns the updated value for the given term_id, "withdrawn", or None (if there is no replacement information)
      • get_obsolete_ids - returns a dictionary mapping all obsolete ids to their current values
  • OrthologyDataSet
    • contains orthology relationship data
    • parent class - DataSet
    • methods:
      • get_orthologs(term_id) - returns set of orthologs associated with term_id
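
The class descriptions above can be summarized in a minimal Python sketch. It is not the code in datasets.py: the record layout (each term_id mapped to a dict with hypothetical `status`, `symbol`, `synonyms`, `xrefs`, `replaced_by`, and `orthologs` fields), the `OBSOLETE` status set, the `__str__` format, and the toy data (including the `OTHERNS` and `ORTHO` prefixes) are assumptions made for illustration, and only a few NamespaceDataSet methods are shown.

```python
# ASSUMPTION: each term_id maps to a dict with hypothetical fields
# ("status", "symbol", "synonyms", "xrefs", "replaced_by", "orthologs").
OBSOLETE = {"withdrawn", "replaced"}


class DataSet:
    def __init__(self, dictionary=None, prefix="unnamed-data-object"):
        self.dictionary = dictionary if dictionary is not None else {}
        self.prefix = prefix

    def get_values(self):
        """Yield each term_id key whose record is not obsolete."""
        for term_id, record in self.dictionary.items():
            if record.get("status") not in OBSOLETE:
                yield term_id

    def __str__(self):
        return f"{type(self).__name__}({self.prefix})"


class NamespaceDataSet(DataSet):
    def __init__(self, dictionary=None, prefix="namespace-prefix", name=None,
                 ids=False, labels=True, domain=None, scheme_type=None):
        super().__init__(dictionary, prefix)
        self.name = name if name is not None else prefix
        self.ids = ids
        self.labels = labels
        self.domain = domain if domain is not None else ["other"]
        self.scheme_type = scheme_type if scheme_type is not None else ["ns"]

    def get_label(self, term_id):
        """Preferred label: here, the hypothetical 'symbol' field (or None)."""
        return self.dictionary.get(term_id, {}).get("symbol")

    def get_alt_symbols(self, term_id):
        return set(self.dictionary.get(term_id, {}).get("synonyms", []))

    def get_xrefs(self, term_id):
        """Equivalences to other namespaces as a set of 'PREFIX:ID' strings."""
        pairs = self.dictionary.get(term_id, {}).get("xrefs", [])
        return {f"{prefix}:{xref_id}" for prefix, xref_id in pairs}


class HistoryDataSet(DataSet):
    def get_id_update(self, term_id):
        """Replacement id, 'withdrawn', or None if no replacement info."""
        record = self.dictionary.get(term_id, {})
        if record.get("status") == "withdrawn":
            return "withdrawn"
        return record.get("replaced_by")

    def get_obsolete_ids(self):
        """Map every obsolete id to its current value."""
        return {term_id: self.get_id_update(term_id)
                for term_id, record in self.dictionary.items()
                if record.get("status") in OBSOLETE}


class OrthologyDataSet(DataSet):
    def get_orthologs(self, term_id):
        return set(self.dictionary.get(term_id, {}).get("orthologs", []))


# Toy data, made up for illustration.
toy = {
    "1": {"status": "approved", "symbol": "GENE1", "synonyms": ["G1"],
          "xrefs": [("OTHERNS", "101")], "orthologs": ["ORTHO:1"]},
    "2": {"status": "withdrawn"},
    "3": {"status": "replaced", "replaced_by": "1"},
}

ns = NamespaceDataSet(toy, prefix="toy")
history = HistoryDataSet(toy, prefix="toy")
orthology = OrthologyDataSet(toy, prefix="toy")

print(list(ns.get_values()))         # ['1']
print(ns.get_xrefs("1"))             # {'OTHERNS:101'}
print(history.get_obsolete_ids())    # {'2': 'withdrawn', '3': '1'}
print(orthology.get_orthologs("1"))  # {'ORTHO:1'}
print(str(ns))                       # NamespaceDataSet(toy)
```

Keeping the obsolete-status check in one shared set means get_values and get_obsolete_ids always agree on which ids count as obsolete.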