Duplicate CURIEs in NodeNorm Redis #373

@gaurav

Something is weird with drug_chemical_conflate: when I run this query on NodeNorm Redis, I get 146 CURIEs:

This is because of duplicates, however: each input CURIE returns 49 identifiers, of which only 32 are unique:

  UMLS:C0106127 (drug_chemical_conflate=True): 49 total, 32 unique
  CHEMBL.COMPOUND:CHEMBL221542 (drug_chemical_conflate=True): 49 total, 32 unique
  PUBCHEM.COMPOUND:222284 (drug_chemical_conflate=True): 49 total, 32 unique
      1. CHEBI:27693 [x2 DUPLICATE]
      2. CHEBI:9170 [x2 DUPLICATE]
      3. CHEBI:26692 [x2 DUPLICATE]
      4. UNII:S347WMO6M4 [x2 DUPLICATE]
      5. PUBCHEM.COMPOUND:222284 [x2 DUPLICATE]
      6. CHEMBL.COMPOUND:CHEMBL2108117 [x2 DUPLICATE]
      7. CHEMBL.COMPOUND:CHEMBL221542 [x2 DUPLICATE]
      8. DRUGBANK:DB14038 [x2 DUPLICATE]
      9. CAS:76772-70-8 [x2 DUPLICATE]
     10. CAS:83-46-5 [x2 DUPLICATE]
     11. DrugCentral:2451 [x2 DUPLICATE]
     12. GTOPDB:13860 [x2 DUPLICATE]
     13. HMDB:HMDB0000852 [x2 DUPLICATE]
     14. KEGG.COMPOUND:C01753 [x2 DUPLICATE]
     15. INCHIKEY:KZJWDPNRJALLNS-VJSFXXLFSA-N [x2 DUPLICATE]
     16. UMLS:C0106127 [x2 DUPLICATE]
     17. RXCUI:47070 [x2 DUPLICATE]
     18. RXCUI:485876
     19. RXCUI:333051
     20. UMLS:C1129418
     21. RXCUI:830781
     22. UMLS:C2586976
     23. RXCUI:2104169
     24. UMLS:C4735522
     25. RXCUI:2104170
     26. UMLS:C4735523
     27. RXCUI:2104171
     28. UMLS:C4735524
     29. RXCUI:2104172
     30. UMLS:C4735525
     31. RXCUI:2104173
     32. UMLS:C4735526
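
A report like the one above can be produced with a short helper. This is only a minimal sketch: it assumes the identifiers returned for one input CURIE have already been collected into a plain Python list, and the function names `summarize_duplicates` and `dedupe_preserving_order` are hypothetical, not part of NodeNorm.

```python
from collections import Counter


def summarize_duplicates(curies):
    """Return report lines: a totals header, then each unique CURIE in
    first-seen order, flagged when it occurs more than once."""
    counts = Counter(curies)  # Counter keeps first-seen order (Python 3.7+)
    lines = [f"{len(curies)} total, {len(counts)} unique"]
    for i, (curie, n) in enumerate(counts.items(), start=1):
        suffix = f" [x{n} DUPLICATE]" if n > 1 else ""
        lines.append(f"{i:>2}. {curie}{suffix}")
    return lines


def dedupe_preserving_order(curies):
    """Drop repeated CURIEs while keeping the first occurrence of each."""
    return list(dict.fromkeys(curies))


if __name__ == "__main__":
    # Tiny illustrative sample, not the full response from the issue.
    sample = ["CHEBI:27693", "CHEBI:9170", "CHEBI:27693", "RXCUI:485876"]
    print("\n".join(summarize_duplicates(sample)))
    print(dedupe_preserving_order(sample))
```

Running the same helper against the Redis and ES responses for each input CURIE would make the 49 versus 32 discrepancy easy to check for other cliques.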

NodeNorm ES correctly returns 32 non-duplicated CURIEs:

With drug_chemical_conflate turned off, both services return 17 identifiers each:
