Drug names used as IDs in website causing ambiguous links #600

Open
yarong-lifemap opened this issue Feb 13, 2022 · 1 comment

Comments

@yarong-lifemap

There are 3658 name-based duplicate entries in the DRUGS data: the number of distinct drugs by name is 10,776, compared with 14,449 entries. Examples include:
[DAROTROPIUM BROMIDE, 3 times]
[TISAGENLECLEUCEL, 3 times]
[CYCLOPHOSPHAMIDE, 3 times]
[RAUWOLFIA SERPENTINA, 4 times]
[TRASTUZUMAB DERUXTECAN, 3 times]
[INDOCYANINE GREEN, 3 times]
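A quick way to reproduce counts like these is to group entries by a normalized name. The following is a minimal plain-Ruby sketch, not the query used for the numbers above: the `Drug` struct, the strip-and-upcase normalization, and the sample rows (four copied from the console output later in this thread, one hypothetical) are all assumptions. It lists each name that occurs more than once and computes surplus entries as total entries minus distinct names.

```ruby
# Hypothetical stand-in for a drug record (only the fields used here).
Drug = Struct.new(:name, :concept_id)

# Assumed normalization: trim whitespace and compare case-insensitively.
def normalize(name)
  name.strip.upcase
end

# Names that occur more than once, mapped to their number of entries.
def name_duplicates(drugs)
  drugs.group_by { |d| normalize(d.name) }
       .select { |_name, group| group.size > 1 }
       .transform_values(&:size)
end

# Entries beyond the first for each name: total entries minus distinct names.
def surplus_entries(drugs)
  drugs.size - drugs.map { |d| normalize(d.name) }.uniq.size
end

drugs = [
  Drug.new("RAUWOLFIA SERPENTINA", "chembl:CHEMBL3559672"),
  Drug.new("RAUWOLFIA SERPENTINA", "chembl:CHEMBL123325"),
  Drug.new("TISAGENLECLEUCEL", "chembl:CHEMBL3301574"),
  Drug.new("TISAGENLECLEUCEL", "wikidata:Q30314624"),
  Drug.new("EXAMPLE DRUG", "example:0001") # hypothetical single entry
]

name_duplicates(drugs).each { |name, n| puts "#{name}: #{n} entries" }
puts "surplus entries: #{surplus_entries(drugs)}"
# RAUWOLFIA SERPENTINA: 2 entries
# TISAGENLECLEUCEL: 2 entries
# surplus entries: 2
```

The result depends on the normalization chosen; an exact match and a case-insensitive match (as with `ILIKE` in the console queries) can give different counts.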

As a result, it's not possible to navigate to the correct drug entry.
For example, searching for "U-50488 METHANE SULFONATE" shows two results, both linking to the same page (which, I can only assume, reflects only one of the entries).

@susannasiebert
Contributor

I was able to confirm some of these examples. There are definitely name duplicates in DGIdb, but they have different concept_ids. However, I wasn't able to confirm the counts posted by @yarong-lifemap.

RAUWOLFIA SERPENTINA: same name in ChEMBL, but two different ChEMBL IDs:

DataModel::Drug.where("name ILIKE 'RAUWOLFIA SERPENTINA'")
  DataModel::Drug Load (17.0ms)  SELECT "drugs".* FROM "drugs" WHERE (name ILIKE 'RAUWOLFIA SERPENTINA') LIMIT $1  [["LIMIT", 11]]
=> #<ActiveRecord::Relation [
#<DataModel::Drug id: "778b51ed-e52c-4866-9b6f-ef89f752cb6f", name: "RAUWOLFIA SERPENTINA", approved: true, immunotherapy: false, anti_neoplastic: false, concept_id: "chembl:CHEMBL3559672">, 
#<DataModel::Drug id: "34d5840e-65ab-47d1-8cf9-685cc757ef3f", name: "RAUWOLFIA SERPENTINA", approved: false, immunotherapy: false, anti_neoplastic: false, concept_id: "chembl:CHEMBL123325">
]>

TISAGENLECLEUCEL: same name, but some claims were matched to ChEMBL and the others to Wikidata:

DataModel::Drug.where("name ILIKE 'TISAGENLECLEUCEL'")
  DataModel::Drug Load (15.1ms)  SELECT "drugs".* FROM "drugs" WHERE (name ILIKE 'TISAGENLECLEUCEL') LIMIT $1  [["LIMIT", 11]]
=> #<ActiveRecord::Relation [
#<DataModel::Drug id: "41f836e5-6ffe-42a3-848f-41c900ef7261", name: "TISAGENLECLEUCEL", approved: false, immunotherapy: false, anti_neoplastic: false, concept_id: "chembl:CHEMBL3301574">, 
#<DataModel::Drug id: "652415b4-2932-4b83-8fd2-8788a47767a4", name: "TISAGENLECLEUCEL", approved: false, immunotherapy: false, anti_neoplastic: false, concept_id: "wikidata:Q30314624">
]>
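The two examples above are different kinds of duplicate: RAUWOLFIA SERPENTINA maps to two IDs in the same namespace (ChEMBL), while TISAGENLECLEUCEL maps to IDs in different namespaces (ChEMBL and Wikidata). A hypothetical helper like the one below could triage name duplicates by splitting each concept_id at its `namespace:` prefix. It is a plain-Ruby sketch over the records shown above; the `Drug` struct, `classify_duplicate`, and the category labels are illustrative assumptions, not DGIdb code or terminology.

```ruby
# Hypothetical stand-in for a drug record (only the fields used here).
Drug = Struct.new(:name, :concept_id)

# Classify a group of same-named drugs by their concept_ids.
# Assumes the "namespace:local_id" format seen in the console output.
def classify_duplicate(group)
  concept_ids = group.map(&:concept_id).uniq
  return :same_concept if concept_ids.size == 1 # true duplicate rows

  namespaces = concept_ids.map { |cid| cid.split(":", 2).first }.uniq
  namespaces.size == 1 ? :same_namespace_different_ids : :cross_namespace
end

drugs = [
  Drug.new("RAUWOLFIA SERPENTINA", "chembl:CHEMBL3559672"),
  Drug.new("RAUWOLFIA SERPENTINA", "chembl:CHEMBL123325"),
  Drug.new("TISAGENLECLEUCEL", "chembl:CHEMBL3301574"),
  Drug.new("TISAGENLECLEUCEL", "wikidata:Q30314624")
]

drugs.group_by(&:name).each do |name, group|
  next if group.size < 2 # not a name duplicate
  puts "#{name}: #{classify_duplicate(group)}"
end
# RAUWOLFIA SERPENTINA: same_namespace_different_ids
# TISAGENLECLEUCEL: cross_namespace
```

Cross-namespace groups look like candidates for the normalizer to merge; same-namespace groups may be genuinely distinct ChEMBL records that share a name, so they probably need review rather than automatic merging.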

Here are the corresponding claims:

concept_id: chembl:CHEMBL3559672
#<DataModel::DrugClaim id: "87b98691-d0ef-4cfe-8540-10440251f91c", name: "chembl:CHEMBL3559672", nomenclature: "ChEMBL ID", source_id: "3b7b0229-f0fc-4e3d-9c0c-43feb363f837", primary_name: "RAUWOLFIA SERPENTINA", drug_id: "778b51ed-e52c-4866-9b6f-ef89f752cb6f">, 
#<DataModel::DrugClaim id: "023fc148-e765-49d8-b590-7228f35dfc19", name: "CHEMBL3559672", nomenclature: "ChEMBL ID", source_id: "719ae115-b913-44d0-9af8-a2a97b470e95", primary_name: "RAUWOLFIA SERPENTINA", drug_id: "778b51ed-e52c-4866-9b6f-ef89f752cb6f">

concept_id: chembl:CHEMBL123325
#<DataModel::DrugClaim id: "4bbd08d0-9f05-4c1f-a50a-cde83733fcbc", name: "chembl:CHEMBL123325", nomenclature: "ChEMBL ID", source_id: "3b7b0229-f0fc-4e3d-9c0c-43feb363f837", primary_name: "RAUWOLFIA SERPENTINA", drug_id: "34d5840e-65ab-47d1-8cf9-685cc757ef3f">, 
#<DataModel::DrugClaim id: "6582d2ea-1951-4dc7-9438-bab84a60822e", name: "RAUWOLFIA SERPENTINA", nomenclature: "DTC Drug Name", source_id: "2c35aa92-b32f-4aa4-ae4c-434a6db98335", primary_name: "RAUWOLFIA SERPENTINA", drug_id: "34d5840e-65ab-47d1-8cf9-685cc757ef3f">

concept_id: chembl:CHEMBL3301574
#<DataModel::DrugClaim id: "735ebb65-2843-40ba-9efb-a2c18ea5b38b", name: "chembl:CHEMBL3301574", nomenclature: "ChEMBL ID", source_id: "3b7b0229-f0fc-4e3d-9c0c-43feb363f837", primary_name: "TISAGENLECLEUCEL", drug_id: "41f836e5-6ffe-42a3-848f-41c900ef7261">,
#<DataModel::DrugClaim id: "18ca715a-2fb8-4c4b-a1f5-193cc2753538", name: "CART-19", nomenclature: "TTD Drug Name", source_id: "e553262a-d6f1-453e-aeb4-f123e3021edd", primary_name: "CART-19", drug_id: "41f836e5-6ffe-42a3-848f-41c900ef7261">, 
#<DataModel::DrugClaim id: "0bbaca37-0b40-463a-ac1e-6a33d82ca688", name: "Tisagenlecleucel", nomenclature: "TTD Drug Name", source_id: "e553262a-d6f1-453e-aeb4-f123e3021edd", primary_name: "Tisagenlecleucel", drug_id: "41f836e5-6ffe-42a3-848f-41c900ef7261"> 

concept_id: wikidata:Q30314624
#<DataModel::DrugClaim id: "26a93304-6347-43ff-95ae-8566888c77c3", name: "wikidata:Q30314624", nomenclature: "Wikidata ID", source_id: "46b7e803-120c-49eb-8973-cd646624be74", primary_name: "Tisagenlecleucel", drug_id: "652415b4-2932-4b83-8fd2-8788a47767a4">, 
#<DataModel::DrugClaim id: "8dde431a-2b18-4b07-a1d2-056192c2b472", name: "CTL019", nomenclature: "TTD Drug Name", source_id: "e553262a-d6f1-453e-aeb4-f123e3021edd", primary_name: "CTL019", drug_id: "652415b4-2932-4b83-8fd2-8788a47767a4">
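These claims overlap across drugs: the TTD claim "Tisagenlecleucel" sits under the ChEMBL-based drug, while the Wikidata-based drug's claims also carry the primary name "Tisagenlecleucel". A simple cross-check is to collect the case-normalized `name` and `primary_name` values per drug_id and report drug pairs that share one. The sketch below is a hypothetical illustration in plain Ruby (the `Claim` struct stands in for `DataModel::DrugClaim`, and drug_ids are shortened to their first segment); it is not how DGIdb or the therapy normalizer groups claims, and a shared name only flags a pair for review.

```ruby
require "set"

# Hypothetical stand-in for DataModel::DrugClaim (only the fields used here).
Claim = Struct.new(:name, :primary_name, :drug_id)

# Map each drug_id to the set of case-normalized names its claims carry.
def names_by_drug(claims)
  claims.each_with_object(Hash.new { |h, k| h[k] = Set.new }) do |c, acc|
    acc[c.drug_id] << c.name.upcase << c.primary_name.upcase
  end
end

# [drug_id_a, drug_id_b, shared_names] for every pair of drugs whose claims share a name.
def overlapping_drugs(claims)
  names_by_drug(claims).to_a.combination(2).filter_map do |(id_a, names_a), (id_b, names_b)|
    shared = names_a & names_b
    [id_a, id_b, shared.to_a.sort] unless shared.empty?
  end
end

# TISAGENLECLEUCEL claims from above, drug_ids shortened for readability.
claims = [
  Claim.new("chembl:CHEMBL3301574", "TISAGENLECLEUCEL", "41f836e5"),
  Claim.new("CART-19", "CART-19", "41f836e5"),
  Claim.new("Tisagenlecleucel", "Tisagenlecleucel", "41f836e5"),
  Claim.new("wikidata:Q30314624", "Tisagenlecleucel", "652415b4"),
  Claim.new("CTL019", "CTL019", "652415b4")
]

overlapping_drugs(claims).each { |a, b, shared| puts "#{a} <-> #{b}: #{shared.join(', ')}" }
# 41f836e5 <-> 652415b4: TISAGENLECLEUCEL
```

Comparing all pairs is quadratic in the number of drugs; for the full data set, an inverted index from name to drug_ids would scale better.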

We should check that, with the upcoming V5 therapy normalizer improvements, these claims are all grouped into the same drug concept (where appropriate). We should also revisit linking out to drugs by name; linking out by concept_id or by the internal DGIdb drug_id might be better.
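The difference between name-based and ID-based links can be illustrated with a small comparison. In the plain-Ruby sketch below, the `/drugs/...` path patterns and the `link_by_*` helpers are hypothetical (they are not DGIdb's actual routes); the two records are the RAUWOLFIA SERPENTINA entries from the console output above. Keys are percent-encoded with `ERB::Util.url_encode` from Ruby's standard library.

```ruby
require "erb"

# Hypothetical stand-in for a drug record (only the fields used here).
Drug = Struct.new(:id, :name, :concept_id)

# Hypothetical URL builders; the /drugs/... paths are illustrative only.
def link_by_name(drug)
  "/drugs/#{ERB::Util.url_encode(drug.name)}"
end

def link_by_concept(drug)
  "/drugs/#{ERB::Util.url_encode(drug.concept_id)}"
end

def link_by_id(drug)
  "/drugs/#{drug.id}"
end

drugs = [
  Drug.new("778b51ed-e52c-4866-9b6f-ef89f752cb6f", "RAUWOLFIA SERPENTINA", "chembl:CHEMBL3559672"),
  Drug.new("34d5840e-65ab-47d1-8cf9-685cc757ef3f", "RAUWOLFIA SERPENTINA", "chembl:CHEMBL123325")
]

%i[link_by_name link_by_concept link_by_id].each do |builder|
  links = drugs.map { |d| send(builder, d) }
  puts "#{builder}: #{links.uniq.size} distinct link(s) for #{drugs.size} drugs"
end
# link_by_name: 1 distinct link(s) for 2 drugs
# link_by_concept: 2 distinct link(s) for 2 drugs
# link_by_id: 2 distinct link(s) for 2 drugs
```

A concept_id link stays unambiguous only while each concept_id maps to a single drug record. The internal drug_id (a UUID here) should be unique per record, but it is opaque to users, and whether it stays stable across data releases is an assumption worth checking.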
