
[DevDoc] Notes on the API implementation

Marco Brandizi edited this page Sep 15, 2023 · 6 revisions

Some rough, outdated, to-be-reviewed notes of mine (MB) regarding the way the KnetMiner API is implemented. You can get a decent dev-level intro of our code from here, especially if you open the mentioned components.

KnetminerServer

/{ds}/{mode}, handle(), GET case

  • Request hub, gets DS, "mode" (ie, name of API call) and general params
  • And then dispatches to handleRaw()

/{ds}/{mode}, handle(), POST case

  • TODO

handleRaw ()

  • Invokes DS.$method, having got the method from mode
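The "method from mode" lookup can be pictured with plain Java reflection. This is only a minimal sketch: ToyDataSource, its two methods and the single String parameter are made up for illustration, and the real handleRaw() may resolve and call its target differently.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

final class ModeDispatchSketch {

    /** Hypothetical data source: each API call is a public method named after the mode. */
    public static final class ToyDataSource {
        public String countHits(String keyword) {
            return "countHits:" + keyword;
        }

        public String genome(String keyword) {
            return "genome:" + keyword;
        }
    }

    /** Finds the public method named like the mode and invokes it on the data source. */
    static String handleRaw(Object ds, String mode, String keyword) {
        try {
            Method method = ds.getClass().getMethod(mode, String.class);
            return (String) method.invoke(ds, keyword);
        } catch (NoSuchMethodException ex) {
            // No such API call: report it as a bad request parameter
            throw new IllegalArgumentException("Unknown API mode: " + mode, ex);
        } catch (IllegalAccessException | InvocationTargetException ex) {
            throw new IllegalStateException("API call failed: " + mode, ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(handleRaw(new ToyDataSource(), "countHits", "drought"));
    }
}
```

An unknown mode ends up in the NoSuchMethodException branch, which is where a real server would presumably turn the error into an HTTP error response.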

/synonyms

  • Searches synonyms, using UIService.renderSynonymTable()
    • Uses searchService.searchTopConceptsByName() to get relevant concepts
      • Uses luceneMgr.searchTopConceptsByIdxField()
      • Prepares a table where, for each keyword, there is an entry conceptName, conceptType, conceptId

/countHits

  • @param keyword
  • DS.countHits()
  • new SemanticMotifSearchMgr( keyword ), assuming keyword && ! geneList
    • luceneConcepts:Map<Concept -> Score>: SearchService.searchGeneRelatedConcepts ()
      • Splits keyword into a list, gets the 'not query'
      • notList = this.searchTopConceptsByName() if necessary
      • Populates hit2score (Concept->Score) with a series of Lucene searches, involving keyword (search string) and notList
    • countLinkedGenes()
      • Uses luceneConcepts and SM.concepts2Genes to count SM-linked concepts (luceneDocumentsLinked) and matched unique genes (numConnectedGenes)
  • Puts SMSearchMgr counts into the response

/countLoci

  • DataService.getLociGeneCount() to count the loci in the request's QTL
  • Used in the genome regions input

/genome and _keyword()

  • @param keyword, list, listMode, qtl
  • DS.genome(), prepares GenomeResponse, calls DS._keyword()
    • Extracts the userGenes, using KGUtils.filterGenesByAccessionKeywords()
      • This turns the list into genes, using 1) searches over accessions and names and 2) a filter on taxId

        • Probably not to be filtered with user taxId (check it's valid and configured)
      • Adds qtl to userGenes, using genome regions, via KGUtils.fetchQTLs ( ONDEXGraph graph, List<String> taxIds, List<String> qtlsStr )

        • QTL.fromStringList ( qtlsStr ) to build QTL region structures
          • Then double loop over all regions and all genes in the graph
      • smSearchMgr = new SemanticMotifSearchMgr ( searchString, genes )

        • As said above, searches concepts based on keywords and scores them
      • candidateGenesMap = smSearchMgr.getSortedGeneCandidates() (a Map<Concept->Score>). This is based on SemanticMotifsSearchResult.getScoredGenes ( Lucene-scored concepts ), which works like this:

        • From lucene-hit concepts, compute gene2HitConcepts, ie, a subfilter over gene->concepts map (coming from sem motifs)
        • use gene2HitConcepts to compute knet scores for each gene => scoredGeneCandidates: Map<Gene -> KnetScore>
        • return gene -> score result, ranked by score and with a filter over (unlikely) duplicated genes
      • Then, this is (possibly) filtered using user genes + QTL genes

      • Finally, we have genesMap and genes

      • Next is the chromosome view

        • What to do with the multi-species case?
      • Next is exportService.exportGeneTable()

      • Next is exportService.exportEvidenceTable()
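The QTL step in the list above (KGUtils.fetchQTLs() and its double loop over all regions and all genes) can be pictured with the sketch below. The Region and Gene classes, their fields and genesInRegions() are hypothetical stand-ins for QTL and ONDEXConcept, and the parsing done by QTL.fromStringList() is left out, since the region string format isn't described in these notes.

```java
import java.util.ArrayList;
import java.util.List;

final class QtlFilterSketch {

    /** Hypothetical region: a chromosome plus an inclusive [start, end] interval. */
    static final class Region {
        final String chromosome;
        final int start;
        final int end;

        Region(String chromosome, int start, int end) {
            this.chromosome = chromosome;
            this.start = start;
            this.end = end;
        }
    }

    /** Hypothetical gene: only the attributes the filter needs. */
    static final class Gene {
        final String accession;
        final String taxId;
        final String chromosome;
        final int begin;

        Gene(String accession, String taxId, String chromosome, int begin) {
            this.accession = accession;
            this.taxId = taxId;
            this.chromosome = chromosome;
            this.begin = begin;
        }
    }

    /** Double loop over regions and genes, keeping genes of the given taxIds that start inside a region. */
    static List<Gene> genesInRegions(List<Gene> genes, List<String> taxIds, List<Region> regions) {
        List<Gene> result = new ArrayList<>();
        for (Region region : regions) {
            for (Gene gene : genes) {
                boolean taxIdOk = taxIds.contains(gene.taxId);
                boolean inRegion = region.chromosome.equals(gene.chromosome)
                    && gene.begin >= region.start && gene.begin <= region.end;
                // Overlapping regions must not add the same gene twice
                if (taxIdOk && inRegion && !result.contains(gene)) {
                    result.add(gene);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Gene> genes = List.of(new Gene("G1", "4565", "1A", 150), new Gene("G2", "4565", "1A", 900));
        List<Gene> found = genesInRegions(genes, List.of("4565"), List.of(new Region("1A", 100, 200)));
        found.forEach(gene -> System.out.println(gene.accession));
    }
}
```

The cost is regions × genes, which is what the "double loop" remark hints at. An index of genes by chromosome and position would avoid scanning the whole graph for every region.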

/network

  • Does the same gene filtering as _keyword()
  • ondexServiceProvider.getSemanticMotifService ().findSemanticMotifs( keyword, seed (genes) )
    • Map<ONDEXConcept, Float> luceneResults = searchService.searchGeneRelatedConcepts ( keyword, seed, false )
    • Then, semanticMotifDataService.getGraphTraverser () with the seed genes => Map<ONDEXConcept, List<EvidencePathNode>> results
    • Splits the search string into actual keywords (SearchUtils.getSearchWords())
      • get a colour map for them (UIUtils.createHilightColorMap())
      • Uses the found paths to create the network view graph
      • highlights paths and node labels based on the search keywords

/dataset-info

  • General info on the current dataset
  • Served by DatasetInfo DatasetInfoService.datasetInfo()
  • Mostly based on the dataset section in the config YAML

/dataset-info/network-stats

  • Gets per-type topological information. Used by the 'Release notes' button
  • Served by DatasetInfoService.networkStats()
  • Based on the JSON file produced by KnetMinerInitializer.exportGraphStats()
  • which mostly gets its data from the Semantic Motif summary data

/dataset-info/knetspace-url

  • Served by DatasetInfoService.knetSpaceURL()
  • Using a dedicated config variable

[REMOVED] /evidencePath

  • @param keyword, used to extract an evidenceOndexId
  • list: usual gene list (except QTL)
  • Similar to /network, see #631
  • No longer used, removed

[REMOVED] /latestNetworkStats

  • Replaced by /dataset-info/network-stats, see #657
  • Fetches stats on the whole dataset,
    • which were computed by ExportService.exportGraphStats()
    • which was invoked by OSP.initData()

[REMOVED] /geneCount

  • Searches genes based on user input (uses KGUtils.filterGenesByAccessionKeywords() as above)
  • Adds genes in QTL regions, as above
  • Finds sem motifs and builds the subgraph
  • exports the subgraph to JSON
  • puts counts into the response
  • WTH?!?!?!?
  • No longer used, removed

[REMOVED] /{ds}/genepage

  • Prepares data to perform a network view request
  • Then forwards to genepage.jsp (via MVC)
  • which will know how to invoke /network
  • We moved it to the client, where it belongs

[REMOVED] /{ds}/evidencepage

  • Works similarly to genepage above

[REMOVED] /ksHost

  • Replaced by /dataset-info/knetspace-url.
  • returns the KnetSpace host, set in the config.

[REMOVED] /dataSource

  • Replaced by /dataset-info.

  • Some general info. Very rubbish format, it puts JSON into a string, instead of the usual fields in the response class. The taxIds overwrite each other:

     summaryJSON.put("dbVersion", dataService.getDatasetVersion () );
     summaryJSON.put("sourceOrganization", dataService.getDatasetOrganization ());
     dataService.getTaxIds ().forEach( taxID -> {
     		summaryJSON.put("speciesTaxid", taxID);
     });
     summaryJSON.put("speciesName", dataService.getSpecies());
    
     // TODO: in future, this might come from OXL metadata (the graph descriptor)
     SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm");  
     var timestampStr = formatter.format ( oxlFile.lastModified () );
     summaryJSON.put("dbDateCreated", timestampStr);
    
     summaryJSON.put("provider", dataService.getDatasetProvider () );
     String jsonString = summaryJSON.toString();
     // Removing the pesky double quotes
     jsonString = jsonString.substring(1, jsonString.length() - 1);
     log.info("response.dataSource= " + jsonString);
     response.dataSource = jsonString;
  • It's used by save-knet.js, for exportAsJson(). This is very messy

  • It's also used in showNetworkStats.js::fetchStats(), but only dbVersion is fetched from the API output
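To make the taxId remark concrete: putting several values under the same key keeps only the last one. The sketch below uses a plain LinkedHashMap as a stand-in for the JSON object of the removed code, and the speciesTaxids key in the second method is a made-up name for one possible alternative, not what /dataset-info actually returns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TaxIdOverwriteSketch {

    /** Mirrors the removed loop: each put() under the same key replaces the previous value. */
    static Map<String, Object> summaryWithOverwrite(List<String> taxIds) {
        Map<String, Object> summary = new LinkedHashMap<>();
        taxIds.forEach(taxId -> summary.put("speciesTaxid", taxId));
        return summary;
    }

    /** One possible alternative (hypothetical key name): keep all the IDs in a list. */
    static Map<String, Object> summaryWithList(List<String> taxIds) {
        Map<String, Object> summary = new LinkedHashMap<>();
        summary.put("speciesTaxids", new ArrayList<>(taxIds));
        return summary;
    }

    public static void main(String[] args) {
        List<String> taxIds = List.of("4565", "3702");
        System.out.println(summaryWithOverwrite(taxIds));
        System.out.println(summaryWithList(taxIds));
    }
}
```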

Code details

SemanticMotifSearchMgr

  • Map<ONDEXConcept, Float> scoredConcepts: the keyword-related concepts, got from Lucene

    • Based on SearchService.searchGeneRelatedConcepts() (see below)
  • SemanticMotifsSearchResult searchResult

    • Uses SearchService.getScoredGenes ( scoredConcepts, this.taxId ) (see below)

countLinkedGenes()

  • Counts concepts in scoredConcepts, just using its size
  • Counts the genes linked to scoredConcepts
    • For each concept:
      • Get genes in concept2Genes.get ( concept )
      • Filter by taxId
      • Eventually, count
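A minimal sketch of the gene counting described above, using integer IDs in place of ONDEXConcept. The gene2TaxId map is a hypothetical way to look up a gene's taxId, and treating a null taxId as "no filter" is an assumption of the sketch.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class CountLinkedGenesSketch {

    /** Counts the unique genes linked to at least one scored concept, optionally filtered by taxId. */
    static int countLinkedGenes(
        Map<Integer, Float> scoredConcepts,
        Map<Integer, Set<Integer>> concept2Genes,
        Map<Integer, String> gene2TaxId, // hypothetical taxId lookup
        String taxId // null means no filter (assumption)
    ) {
        Set<Integer> linkedGenes = new HashSet<>();
        for (Integer conceptId : scoredConcepts.keySet()) {
            for (Integer geneId : concept2Genes.getOrDefault(conceptId, Set.of())) {
                if (taxId == null || taxId.equals(gene2TaxId.get(geneId))) {
                    linkedGenes.add(geneId); // the set removes genes shared by several concepts
                }
            }
        }
        return linkedGenes.size();
    }

    public static void main(String[] args) {
        Map<Integer, Float> scoredConcepts = Map.of(1, 2.5f, 2, 1.0f);
        Map<Integer, Set<Integer>> concept2Genes = Map.of(1, Set.of(10, 11), 2, Set.of(11, 12));
        Map<Integer, String> gene2TaxId = Map.of(10, "4565", 11, "4565", 12, "3702");
        System.out.println("concepts: " + scoredConcepts.size());
        System.out.println("genes: " + countLinkedGenes(scoredConcepts, concept2Genes, gene2TaxId, "4565"));
    }
}
```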

SearchService

searchGeneRelatedConcepts()

Case there is only a gene list:

(gene list is normalised)

for each gene in gene list: add genes2Concepts ( gene ) to the result, with score = 1
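The gene-list-only case is simple enough to sketch in a few lines. Integer IDs replace ONDEXConcept here, and conceptsForGeneList() is a made-up name, not the actual method.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class GeneListConceptsSketch {

    /** Every concept linked to a gene in the list enters the result with a constant score of 1. */
    static Map<Integer, Float> conceptsForGeneList(List<Integer> geneIds, Map<Integer, Set<Integer>> genes2Concepts) {
        Map<Integer, Float> result = new HashMap<>();
        for (Integer geneId : geneIds) {
            for (Integer conceptId : genes2Concepts.getOrDefault(geneId, Set.of())) {
                result.put(conceptId, 1.0f);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> genes2Concepts = Map.of(10, Set.of(1, 2), 11, Set.of(2, 3));
        System.out.println(conceptsForGeneList(List.of(10, 11), genes2Concepts));
    }
}
```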

Case with keyword

  • get the notQuery expression from keywords
  • Search concepts via Lucene, using keywords

SemanticMotifsSearchResult getScoredGenes ( Map<ONDEXConcept, Float> scoredConcepts, taxId )

Map<Integer, Set<Integer>> gene2HitConcepts

  • For each concept in scoredConcepts:
    • add concept2Genes.get ( concept ) to result
      • possibly, filter by taxId
  • Then, group by gene

Map<ONDEXConcept, Double> scoredGeneCandidates

  • for each gene in gene2HitConcepts:

    • for concept in gene2HitConcepts.get ( gene )
      • luceneScore = scoredEvidenceConcepts.get ( concept )
      • igf = log ( genesCount / concepts2Gene.get ( concept ).size () )
      • invGraphDist = 1 / genes2PathLens.get ( gene, concept )
      • knetScore = the three above combined
      • Sum of knetScore for each concept is knetScore ( gene )
  • scoredGeneCandidates are sorted

  • The final SemanticMotifsSearchResult result contains:

    • geneId2RelatedConceptIds = gene2HitConcepts
    • gene2Score = sorted scoredGeneCandidates
  • genesCount is the total number of genes in the traverser seed that belong to one of the configured species. In Neo4j: does it need to be stored?

  • concepts2Gene.get ( concept ).size (), needs to be stored in Neo4j?

  • genes2PathLens.get ( gene, concept ) in Neo4j, is in the gene/concept link
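The scoring loop above can be sketched as follows. Beware of the main assumption: these notes only say the three factors are "combined", and the sketch takes that to mean their product, which needs checking against the actual code. Integer IDs replace ONDEXConcept, and the nested pathLens map is a hypothetical stand-in for genes2PathLens.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class KnetScoreSketch {

    /** Score contribution of one evidence concept for one gene. ASSUMPTION: the three factors are multiplied. */
    static double evidenceScore(double luceneScore, int genesCount, int conceptGenesCount, int pathLength) {
        double igf = Math.log((double) genesCount / conceptGenesCount); // inverse gene frequency
        double invGraphDist = 1.0 / pathLength;
        return luceneScore * igf * invGraphDist;
    }

    /** Sums the per-concept scores for each gene and returns the genes ranked by descending score. */
    static Map<Integer, Double> scoreGenes(
        Map<Integer, Set<Integer>> gene2HitConcepts,
        Map<Integer, Double> luceneScores,
        Map<Integer, Integer> conceptGenesCounts,
        Map<Integer, Map<Integer, Integer>> pathLens, // gene -> concept -> path length (hypothetical layout)
        int genesCount
    ) {
        Map<Integer, Double> scores = new HashMap<>();
        gene2HitConcepts.forEach((geneId, conceptIds) -> {
            double knetScore = 0;
            for (Integer conceptId : conceptIds) {
                knetScore += evidenceScore(
                    luceneScores.get(conceptId), genesCount,
                    conceptGenesCounts.get(conceptId), pathLens.get(geneId).get(conceptId));
            }
            scores.put(geneId, knetScore);
        });

        // A LinkedHashMap keeps the ranking order
        Map<Integer, Double> ranked = new LinkedHashMap<>();
        scores.entrySet().stream()
            .sorted(Map.Entry.<Integer, Double>comparingByValue().reversed())
            .forEachOrdered(entry -> ranked.put(entry.getKey(), entry.getValue()));
        return ranked;
    }

    public static void main(String[] args) {
        Map<Integer, Double> ranked = scoreGenes(
            Map.of(100, Set.of(1, 2), 200, Set.of(2)),
            Map.of(1, 2.0, 2, 1.0),
            Map.of(1, 10, 2, 50),
            Map.of(100, Map.of(1, 1, 2, 2), 200, Map.of(2, 1)),
            100);
        ranked.forEach((geneId, score) -> System.out.println(geneId + " -> " + score));
    }
}
```

Under this reading, a concept linked to few genes (high igf) and close to the gene in the graph (short path) weighs more than a ubiquitous or distant one with the same Lucene score.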

exportGeneTable()

Params:

* List<ONDEXConcept> candidateGenes
* Set<ONDEXConcept> userGenes
* List<String> userQtlsStr
* String listMode
* SemanticMotifsSearchResult searchResult
  • Best name function in ondex
  • The gene's evidences are got from searchResult.getGeneId2RelatedConceptIds()
  • The gene score is got from searchResult.getGene2Score ()
  • The graph distances are got from genes2PathLengths (SemMotif summaries)
    • In Neo4j, gene/concept links

exportEvidenceTable()

Params:

* String keywords // To be removed, not used
* Map<ONDEXConcept, Float> foundConcepts
* Set<ONDEXConcept> userGenes
* List<String> userQtlsStr
* boolean doSortResult
  • score is summed up for each evidence concept using foundConcepts
  • For each concept, conceptGenes are fetched from concepts2Genes
    • startGenesSize = conceptGenes.size()
  • For each gene in conceptGenes:
    • matchedInGeneList++ if the gene is in userGenes
  • At the end:
    • notMatchedInGeneList = userGenes.size - matchedInGeneList
    • matchedNotInGeneList = startGenesSize - matchedInGeneList
    • notMatchedNotInGeneList = genes2Concepts.size - matchedNotInGeneList - matchedInGeneList - notMatchedInGeneList
    • These are used for Fisher test, from which pvalue is computed
  • At the end:
    • returns the found concept
    • returns the concept score as Lucene score
    • returns pvalue as computed above (ie, Fisher test)
    • returns startGenesSize (the no of SM genes associated to the concept)
    • returns the matching user genes
    • Sorts by pvalue, score and others
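The four counts above form a 2x2 contingency table, which a Fisher exact test turns into a pvalue. Here is a minimal sketch of that computation. These notes don't say which tail (or which Fisher implementation) exportEvidenceTable() uses, so the right-tail pvalue below is an assumption, as are all the method names.

```java
final class FisherTableSketch {

    /** Returns {matchedInGeneList, matchedNotInGeneList, notMatchedInGeneList, notMatchedNotInGeneList}. */
    static int[] contingency(int matchedInGeneList, int userGenesSize, int startGenesSize, int totalGenes) {
        int notMatchedInGeneList = userGenesSize - matchedInGeneList;
        int matchedNotInGeneList = startGenesSize - matchedInGeneList;
        int notMatchedNotInGeneList =
            totalGenes - matchedNotInGeneList - matchedInGeneList - notMatchedInGeneList;
        return new int[] {
            matchedInGeneList, matchedNotInGeneList, notMatchedInGeneList, notMatchedNotInGeneList
        };
    }

    /** log(n!), computed as a sum of logs to avoid overflow. */
    static double logFactorial(int n) {
        double result = 0;
        for (int i = 2; i <= n; i++) {
            result += Math.log(i);
        }
        return result;
    }

    /** Hypergeometric probability of the table [[a, b], [c, d]], given its row and column totals. */
    static double tableProbability(int a, int b, int c, int d) {
        double logP = logFactorial(a + b) + logFactorial(c + d) + logFactorial(a + c) + logFactorial(b + d)
            - logFactorial(a) - logFactorial(b) - logFactorial(c) - logFactorial(d)
            - logFactorial(a + b + c + d);
        return Math.exp(logP);
    }

    /** Right-tail pvalue (ASSUMPTION on the tail): probability of a table with at least 'a' matches. */
    static double rightTailPValue(int a, int b, int c, int d) {
        double pvalue = 0;
        // Moving one gene at a time into the 'a' cell keeps all the totals fixed
        while (b >= 0 && c >= 0) {
            pvalue += tableProbability(a, b, c, d);
            a++; b--; c--; d++;
        }
        return Math.min(1.0, pvalue);
    }

    public static void main(String[] args) {
        int[] cells = contingency(3, 4, 4, 8);
        System.out.println(rightTailPValue(cells[0], cells[1], cells[2], cells[3]));
    }
}
```

In the sketch, the first table row is "genes linked to the concept" and the first column is "genes in the user list", so a small pvalue means the concept is over-represented among the user genes.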