'Indications' appearing as adverse drug reactions in AEOLUS data #6

Open
mbrush opened this issue Apr 6, 2017 · 13 comments

@mbrush

mbrush commented Apr 6, 2017

Wanted to follow up with a question we discussed offline about AEOLUS data, which describe adverse drug reactions (ADRs) using MedDRA terms to codify the adverse outcomes. Specifically, I noted that the mydrug AEOLUS data include 'outcomes' that seem to be primary indications for a drug (i.e. what it is used to treat), rather than adverse drug reactions.

For example, imatinib is used to treat various leukemias, yet the outcomes in its mydrug results include things like "Leukaemia", "Chronic lymphocytic leukaemia", "Acute leukaemia".

In your parsing of AEOLUS data, was there any metadata that would tell you which MedDRA terms represent indications for a particular drug, so these could be filtered from the results, or at least tagged as being indications rather than adverse reactions to the drug? This would be a significant improvement to the dataset you generate.

@stuppie
Collaborator

stuppie commented Apr 6, 2017

Hi @mbrush
Yes, this data is in the original FDA AERS dump, but as far as I can tell it is not in the AEOLUS data. It is in the "INDI16Q4" files, but those of course don't contain IDs.
It's probably possible to match up the data between AERS and AEOLUS to structure them, or to look up the MedDRA IDs from the indication name.
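As a rough illustration of the name-based lookup, here is a minimal Python sketch. It assumes a name-to-code table is available from somewhere; the three entries below use MedDRA codes quoted elsewhere in this thread, and the `normalize` and `lookup` helpers are hypothetical names, not part of any existing parser.

```python
def normalize(name):
    """Case-fold and collapse whitespace so 'MULTIPLE MYELOMA'-style variants match."""
    return " ".join(name.casefold().split())

# Hypothetical lookup table; a real one would be built from a MedDRA release.
MEDDRA_BY_NAME = {
    normalize(name): code
    for name, code in [
        ("Chronic myeloid leukaemia", "10009013"),
        ("Gastrointestinal stromal tumour", "10051066"),
        ("Insomnia", "10022437"),
    ]
}

def lookup(indi_pt):
    """Return the MedDRA code for an indication name, or None if it is unknown."""
    return MEDDRA_BY_NAME.get(normalize(indi_pt))

print(lookup("CHRONIC MYELOID LEUKAEMIA"))
print(lookup("Gun shot wound"))
```

Exact-match lookup after normalization will miss spelling variants and synonyms, so unmatched names would still need manual review.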

For fun, here are all the indications for imatinib (From the latest quarterly file 2016q4):
Gastric cancer, Gastrointestinal stromal tumour, Leukaemia, Carcinoid syndrome, Gastrointestinal carcinoma, Type 2 diabetes mellitus, Small intestine carcinoma, Adenoid cystic carcinoma, Chronic myeloid leukaemia, Chemotherapy, Neoplasm malignant, Eosinophilia, Benign soft tissue neoplasm, Myelofibrosis, Soft tissue sarcoma, Immunomodulatory therapy, Chronic myelomonocytic leukaemia, Pulmonary fibrosis, Myeloid leukaemia, Acute lymphocytic leukaemia, Product used for unknown indication, Laryngeal squamous cell carcinoma, Acute myeloid leukaemia, Graft versus host disease, Gene mutation, Chordoma

I'll look into this. Is the plan still to ingest the data into your Scigraph instance through mygene? Or through dipper? Which would be easier?

Additionally, there are still "indications" in the results that are neither adverse drug reactions nor original indications... Things like:
Foetal exposure during pregnancy
Streptococcus test
Gun shot wound
Foetal exposure timing unspecified
which don't really sound like drug reactions to me, but may be hard to clean up...

@stuppie
Collaborator

stuppie commented Apr 6, 2017

Ok, I think I figured out how to extract this from aeolus. For my future self's reference:

SQL query to extract indications (and count) for imatinib

SELECT indi_pt, count(indi_pt) FROM aeolus.standard_case_indication
 LEFT JOIN aeolus.standard_drug_outcome_drilldown ON aeolus.standard_drug_outcome_drilldown.primaryid=aeolus.standard_case_indication.primaryid
 LEFT JOIN aeolus.concept ON aeolus.standard_drug_outcome_drilldown.drug_concept_id=aeolus.concept.concept_id
WHERE drug_concept_id = 1304107
group by indi_pt;

You get 549 unique indications (as opposed to the 5213 outcomes).
The top 20 or so look reasonable, but then you get things like, Nausea, Anxiety, Depression. This is weird... (unless I did something wrong...) ?

Here are the top 100, along with the count:
indi_pt, count(indi_pt)
Chronic myeloid leukaemia, 16831
Product used for unknown indication, 7085
Gastrointestinal stromal tumour, 5621
Acute lymphocytic leukaemia, 3281
Hypertension, 1185
Pulmonary arterial hypertension, 762
Prophylaxis, 599
Blast crisis in myelogenous leukaemia, 512
Pain, 508
Gastric cancer, 460
Chromosome analysis abnormal, 396
Systemic mastocytosis, 382
Soft tissue cancer, 316
Osteoporosis, 313
Pulmonary hypertension, 265
Diabetes mellitus, 259
Metastases to bone, 247
Neoplasm malignant, 245
Mastocytosis, 240
Off label use, 229
Acute myeloid leukaemia, 225
Leukaemia, 221
Prophylaxis against graft versus host disease, 207
Gastrointestinal neoplasm, 203
Gastrooesophageal reflux disease, 191
Nausea, 187
Anxiety, 186
Depression, 183
Hypothyroidism, 178
Plasma cell myeloma, 176
Constipation, 172
Fistulogram, 172
Atrial fibrillation, 168
Insomnia, 158
Oedema, 151
Colorectal cancer, 141
Chronic graft versus host disease, 134
Hypercholesterolaemia, 134
Diarrhoea, 128
Blood pressure, 125
Type 2 diabetes mellitus, 124
Coronary artery disease, 122
Immunosuppression, 118
MULTIPLE MYELOMA, 114
Anaemia, 113
Antifungal prophylaxis, 111
Soft tissue neoplasm, 110
Breast cancer metastatic, 108
Gastritis, 106
Graft versus host disease, 104
Colitis, 102
Myeloid leukaemia, 99
Vomiting, 98
Lymphocytic leukaemia, 97
Gastrointestinal carcinoma, 96
Metastases to spine, 96
Angina pectoris, 95
Hyperlipidaemia, 93
Muscle spasms, 92
Chronic myeloid leukaemia (in remission), 91
Dyspepsia, 90
Scleroderma, 88
Angiogram, 86
Venogram, 86
Chronic lymphocytic leukaemia, 84
Eosinophilia, 84
Prophylaxis against gastrointestinal ulcer, 84
Gastritis prophylaxis, 83
Gout, 81
Hodgkin's disease, 80
Hyperuricaemia, 80
Hypereosinophilic syndrome, 79
Chronic myeloid leukaemia transformation, 78
Prostate cancer, 77
Infection prophylaxis, 76
Prostatic specific antigen decreased, 74
Premedication, 73
Abdominal pain, 71
Benign prostatic hyperplasia, 71
Pyrexia, 71
Vitamin D deficiency, 71
Ewing's sarcoma, 70
Thyroid disorder, 69
Bipolar disorder, 68
Pruritus, 65
Pneumonitis, 64
Pneumonia, 63
Malignant melanoma, 62
Asthma, 61
Gastric ulcer, 61
Renal transplant, 61
Crohn's disease, 60
Breakthrough pain, 59
Dyspnoea, 59
Graft versus host disease in skin, 59
Neoplasm, 58
Polymyalgia rheumatica, 55
Nuclear magnetic resonance imaging, 52
HIV infection, 51
Hepatic function abnormal, 50

@mbrush
Author

mbrush commented Apr 7, 2017

Perhaps you are getting all symptoms/indications for which these patients were treated, not just indications specific for imatinib?

It looks like all of the items on your list of "indications" above are appearing as 'outcomes' in the mydrug AEOLUS data as well (at least based on my spot checking several of them). So clearly the list above is conflating things besides imatinib's primary indications - either other indications found in imatinib-treated patients, and/or ADRs/side effects they experienced following treatment.
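One way to check this hypothesis is to compare a join on primaryid alone with a join that also matches the indication to the drug's sequence number within the case. The sketch below uses Python's sqlite3 with toy rows. The table and column names follow the AEOLUS tables queried elsewhere in this thread, but the case data and the second drug's concept ID (999) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE standard_case_drug (primaryid INTEGER, drug_seq INTEGER, standard_concept_id INTEGER);
CREATE TABLE standard_case_indication (primaryid INTEGER, indi_drug_seq INTEGER, indi_pt TEXT);
-- Toy case 1: imatinib (1304107) given for CML, plus a hypothetical second drug (999) given for nausea.
INSERT INTO standard_case_drug VALUES (1, 1, 1304107), (1, 2, 999);
INSERT INTO standard_case_indication VALUES
    (1, 1, 'Chronic myeloid leukaemia'),
    (1, 2, 'Nausea');
""")

IMATINIB = 1304107

def indications(match_drug_seq):
    """List indications for imatinib, optionally requiring the drug sequence to match."""
    sql = """
        SELECT DISTINCT i.indi_pt
        FROM standard_case_indication AS i
        JOIN standard_case_drug AS d ON d.primaryid = i.primaryid
    """
    if match_drug_seq:
        sql += " AND d.drug_seq = i.indi_drug_seq"
    sql += " WHERE d.standard_concept_id = ? ORDER BY i.indi_pt"
    return [row[0] for row in conn.execute(sql, (IMATINIB,))]

case_level = indications(match_drug_seq=False)  # every indication reported in the case
drug_level = indications(match_drug_seq=True)   # only indications tied to imatinib itself
print(case_level)
print(drug_level)
```

In this toy case the case-level join attributes 'Nausea' to imatinib, while the sequence-matched join does not, which is the kind of conflation suspected above.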

So, at this point, do you think we should make a ticket in the AEOLUS repo to ask if they have any thoughts on this issue? Or do you want to continue to play with the data to see if you can find a solution?

At the end of the day, we just need to be sure that the 'outcomes' returned by the API for a drug like imatinib hold only true ADRs/side effects of the drug, and not things like primary indications for imatinib, or other/unrelated symptoms of patients who were given imatinib. Alternatively, we could return indications as well if these can be determined from the AEOLUS data, but they need to be annotated as such in the mydrug data (as is done for the mydrug SIDER data, which includes both but indicates which are 'indications' and which are 'side effects').

@stuppie
Collaborator

stuppie commented Apr 7, 2017

Yes, I'll make a ticket

@mbrush
Author

mbrush commented May 2, 2017

Hi Greg. Was the advice the AEOLUS folks provided w.r.t. how to filter indications from the outcomes results helpful? Are there other barriers to updating the mydrug data to filter out some/all of the indications?

@stuppie
Collaborator

stuppie commented May 2, 2017

Hey @mbrush. What do you suggest as the best format for the data? For reference, an example of the current format is below.

{
  "_id": "KTUFNOKKBVMGRW-UHFFFAOYSA-N",
  "aeolus": {
    "pt": "IMATINIB",
    "inchikey": "KTUFNOKKBVMGRW-UHFFFAOYSA-N",
    "outcomes": [
      {
        "case_count": 7315,
        "id": "35809059",
        "ror_95_ci": [14.59499, 15.318320000000002],
        "ror": 14.952279999999998,
        "name": "Death",
        "code": "10011906",
        "prr": 13.74818,
        "vocab": "MedDRA",
        "prr_95_ci": [13.447189999999999, 14.05591]
      },
      ...
    ],
    "drug_vocab": "RxNorm",
    "unii": "BKJ8M8G5HI",
    "drug_name": "imatinib",
    "no_of_outcomes": 5213,
    "drug_id": "1304107",
    "drug_code": "282388",
    "rxcui": "282388"
  }
}

The indications I get (for Imatinib, for example) look like this:

{'indication_concept_code': '10009013',
  'indication_concept_id': '35104378',
  'indication_count': 4426,
  'indication_name': 'Chronic myeloid leukaemia',
  'indication_vocabulary': 'MedDRA'}

Potential issues come in when we have things like this:
For alprazolam:
An outcome:

{
  "case_count": 3436,
  "id": "36718555",
  "ror_95_ci": [1.9292599999999998, 2.06419],
  "ror": 1.9955900000000002,
  "name": "Insomnia",
  "code": "10022437",
  "prr": 1.98669,
  "vocab": "MedDRA",
  "prr_95_ci": [1.92124, 2.05437]
}

An indication:

{'indication_concept_code': '10022437',
  'indication_concept_id': '36718555',
  'indication_count': 54,
  'indication_name': 'Insomnia',
  'indication_vocabulary': 'MedDRA'},

Should Insomnia not be marked as an outcome because it's also an indication?

Also, looking at comparing the counts between indications and outcomes:
For Imatinib: There are 171 indications, from 7045 cases, totaling 7082 counts. And 5213 outcomes, from 9678 cases, totaling 27595 counts. The 7045 cases are a subset of the 9678 cases. So not every case has an indication reported.

I'm thinking a given drug's doc should just have all outcomes and all indications as separate fields and the user of the data is responsible for reconciling them.

@mbrush
Author

mbrush commented May 12, 2017

Hi @stuppie. I think for Biothings your suggestion is best:

a given drug's doc should just have all outcomes and all indications as separate fields and the user of the data is responsible for reconciling them.

As for why the AEOLUS data is this way - would you say that it is because 'outcomes' are considered more broadly than just adverse events/reactions, to include any symptoms experienced by the patient after treatment? So in cases when the indication that the drug was given for did not resolve, this indication is reported as an outcome as well. This isn't entirely consistent with the language used in the AEOLUS paper - which describes this as an 'adverse event' dataset - but it's the only explanation I can think of for why the data is this way. Unless you have other ideas here?

That said, when we pull this data into Monarch, we will want to use it as a source of true 'adverse event' data. So we would likely process the data to remove indications from the list of outcomes for each drug, so we can treat these as true adverse events in our data set. A complication here is that, based on my understanding of how ror and prr are calculated, these scores may be different if we remove the indications. So re-calculating these may be part of our transformation to generate a derived Monarch data set.
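To illustrate why the scores would shift, here is a minimal Python sketch using the standard textbook definitions of PRR and ROR on a 2x2 contingency table, with the usual log-scale 95% confidence intervals. Whether AEOLUS computes its published values in exactly this way is an assumption, the `disproportionality` function is a hypothetical helper, and the counts are made up.

```python
import math

def disproportionality(a, b, c, d):
    """Compute PRR and ROR with 95% CIs from a 2x2 table.

    a: reports with the drug and the outcome
    b: reports with the drug and any other outcome
    c: reports with other drugs and the outcome
    d: reports with other drugs and any other outcome
    """
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    se_ln_prr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    se_ln_ror = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    def ci(estimate, se):
        return (estimate * math.exp(-1.96 * se), estimate * math.exp(1.96 * se))

    return {
        "prr": prr,
        "prr_95_ci": ci(prr, se_ln_prr),
        "ror": ror,
        "ror_95_ci": ci(ror, se_ln_ror),
    }

# Made-up counts: the same drug/outcome pair before and after dropping
# 50 of the drug's other reports (e.g. those whose outcome was an indication).
before = disproportionality(a=20, b=80, c=100, d=800)
after = disproportionality(a=20, b=30, c=100, d=800)
print(before["prr"], before["ror"])
print(after["prr"], after["ror"])
```

Because the drug's other reports sit in the denominator, removing indication-like outcomes changes both scores for every remaining outcome of that drug, not only for the removed ones.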

A final thought on how you bring in the data: would it be in scope for BioThings to add a flag to outcomes that are also indications to indicate this fact for consumers of the data? e.g. an 'is_indicator' boolean attribute? Thinking this might help consumers with any confusion, and to more easily process the data to remove indications from outcomes if they so desire? Just a thought.
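A minimal sketch of what such a flag could look like, in Python. It assumes the doc carries an `indications` list next to `outcomes` (that field name is hypothetical), and that an outcome's `id` and an indication's `indication_concept_id` refer to the same concept, as in the Insomnia example earlier in this thread. The `Death` entry is included only for contrast.

```python
def flag_indications(aeolus, flag="is_indicator"):
    """Return copies of the outcomes, each marked if it is also a reported indication."""
    indication_ids = {
        ind["indication_concept_id"] for ind in aeolus.get("indications", [])
    }
    return [
        {**outcome, flag: outcome["id"] in indication_ids}
        for outcome in aeolus.get("outcomes", [])
    ]

# Trimmed-down doc; "indications" is an assumed field name.
aeolus = {
    "outcomes": [
        {"id": "36718555", "name": "Insomnia", "code": "10022437"},
        {"id": "35809059", "name": "Death", "code": "10011906"},
    ],
    "indications": [
        {"indication_concept_id": "36718555", "indication_name": "Insomnia"},
    ],
}

flagged = flag_indications(aeolus)
for outcome in flagged:
    print(outcome["name"], outcome["is_indicator"])
```

Consumers who want adverse events only could then drop the flagged outcomes, keeping in mind that ror and prr would no longer reflect the filtered set.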

@stuppie
Collaborator

stuppie commented May 12, 2017

I do not know. I don't see anything about it in the paper. Maybe this is a good question to ask @ntatonetti

Also, I didn't know this before, but the Tatonetti lab has an API serving this data:
http://nsideseb-env.us-east-1.elasticbeanstalk.com
http://nsideseb-env.us-east-1.elasticbeanstalk.com/api/v1/query?service=aeolus&meta=drugReactionCounts&q=0

@mellybelly

mellybelly commented May 26, 2017

so just perusing this ticket, may not have grokked it all
Ideally i would like to have indications related to the drugs separate from the AEs, but I also don't want to lose the connection between them, e.g. we want to know what the specific indication was for a drug when an AE happens.

Eg there are actually multiple relations here
drug <-> indication
drug <-> AE (NOT indication)
drug <-> indication in which <-> AE

@mbrush
Author

mbrush commented May 26, 2017

Good point @mellybelly - this data set would have much more predictive and analytical value if we could stratify drug-outcome associations according to the indication for which the drug was initially given. We would be able to answer questions like "how often is a particular adverse event observed for drug X when given for indication Y, vs indication Z"?

I'd guess it would be possible to generate this more nuanced dataset, given the granularity at which the FAERS data was collected (at a per patient level). But this kind of wrangling and analysis could be a fair bit of work, and may be something that Nick's team who generated the AEOLUS data set originally would be best suited to perform.

My take is that in the short term, it should be straightforward to at least do as @stuppie proposes and have all outcomes and all indications as separate fields. This would let us retrieve (or calculate) all simple drug-indication and drug-adverse event pairs. For the third relationship - which includes the indication as a qualifying context in which the AE occurred - we should determine if this is feasible, who would be best to do it, and how much effort it would take - then weigh that against the value of this enhancement for our work. Thoughts @stuppie, others?

(Disclaimer that I haven’t yet explored the Tatonetti lab APIs - so not sure if it would make any of these tasks easier.)

@stuppie
Collaborator

stuppie commented May 31, 2017

My take is that in the short term, it should be straightforward to at least do as @stuppie proposes and have all outcomes and all indications as separate fields.

I've added the indications to mydrug as separate fields, it will be live in a few days.

drug <-> indication in which <-> AE

Yes, this is doable to query. I don't think we'd want to report all of this through mydrug, mostly because of the "all by all"-ness/large number of possible combinations.

Here are the top ten most common drug <-> indication <-> AE relations:
count, concept_code, concept_name, indication_code, indication_name, outcome_code, outcome_name
21447, 84108, rosiglitazone, 10012601, Diabetes mellitus, 10028596, Myocardial infarction
9590, 84108, rosiglitazone, 10012601, Diabetes mellitus, 10008190, Cerebrovascular accident
8471, 214555, Etanercept, 10039073, Rheumatoid arthritis, 10022086, Injection site pain
7433, 214555, Etanercept, 10039073, Rheumatoid arthritis, 10003239, Arthralgia
7397, 84108, rosiglitazone, 10012601, Diabetes mellitus, 10007559, Cardiac failure congestive
6764, 214555, Etanercept, 10039073, Rheumatoid arthritis, 10022061, Injection site erythema
6107, 354770, natalizumab, 10028245, Multiple sclerosis, 10016256, Fatigue
5934, 1373478, dimethyl fumarate, 10028245, Multiple sclerosis, 10016825, Flushing
5596, 6373, Levonorgestrel, 10010808, Contraception, 10012578, Device expulsion
5482, 6373, Levonorgestrel, 10036251, Post coital contraception, 10027339, Menstruation irregular

And for imatinib specifically:
count, concept_code, concept_name, indication_code, indication_name, outcome_code, outcome_name
1902, 282388, imatinib, 10009013, Chronic myeloid leukaemia, 10011906, Death
597, 282388, imatinib, 10051066, Gastrointestinal stromal tumour, 10011906, Death
308, 282388, imatinib, 10009013, Chronic myeloid leukaemia, 10009013, Chronic myeloid leukaemia
291, 282388, imatinib, 10051066, Gastrointestinal stromal tumour, 10051066, Gastrointestinal stromal tumour
230, 282388, imatinib, 10051066, Gastrointestinal stromal tumour, 10051398, Malignant neoplasm progression
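Two of the imatinib rows above pair an indication with itself as the outcome. A minimal Python sketch for separating those rows from the rest, using only the five rows listed here (the `Relation` tuple is a hypothetical name for illustration):

```python
from collections import namedtuple

Relation = namedtuple(
    "Relation", "count indication_code indication_name outcome_code outcome_name"
)

rows = [
    Relation(1902, "10009013", "Chronic myeloid leukaemia", "10011906", "Death"),
    Relation(597, "10051066", "Gastrointestinal stromal tumour", "10011906", "Death"),
    Relation(308, "10009013", "Chronic myeloid leukaemia", "10009013", "Chronic myeloid leukaemia"),
    Relation(291, "10051066", "Gastrointestinal stromal tumour", "10051066", "Gastrointestinal stromal tumour"),
    Relation(230, "10051066", "Gastrointestinal stromal tumour", "10051398", "Malignant neoplasm progression"),
]

# Rows where the reported outcome is the same MedDRA term as the indication.
same_term = [r for r in rows if r.indication_code == r.outcome_code]
other = [r for r in rows if r.indication_code != r.outcome_code]

same_total = sum(r.count for r in same_term)
other_total = sum(r.count for r in other)
print(same_total, other_total)
```

A code-equality check only catches exact matches. Outcomes such as 'Malignant neoplasm progression' that plausibly stem from the underlying disease would need a different rule.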

And the query, for my future self
SELECT COUNT(*) AS count,
       concept.concept_code,
       concept.concept_name,
       indication_concept.concept_code AS 'indication_code',
       indication_concept.concept_name AS 'indication_name',
       outcome_concept.concept_code AS 'outcome_code',
       outcome_concept.concept_name AS 'outcome_name'
FROM standard_case_indication
LEFT JOIN standard_case_drug
       ON standard_case_drug.primaryid = standard_case_indication.primaryid
      AND standard_case_indication.indi_drug_seq = standard_case_drug.drug_seq
LEFT JOIN concept
       ON standard_case_drug.standard_concept_id = concept.concept_id
LEFT JOIN concept AS indication_concept
       ON standard_case_indication.indication_concept_id = indication_concept.concept_id
LEFT JOIN standard_case_outcome
       ON standard_case_outcome.primaryid = standard_case_indication.primaryid
LEFT JOIN concept AS outcome_concept
       ON standard_case_outcome.outcome_concept_id = outcome_concept.concept_id
WHERE standard_case_drug.role_cod = ('PS')
  AND concept.concept_id = 1304107
GROUP BY concept_code, concept_name, indication_code, indication_name, outcome_code, outcome_name
ORDER BY count DESC

To extract actual value out of these, it would probably be necessary to do some indication and outcome ontology normalization/mapping to higher-level terms (What is the proper term for this? For example, treating "Cerebrovascular accident" and "Cerebrovascular accident due to right carotid artery occlusion" the same), and then recalculate the summary statistics.
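A minimal Python sketch of the roll-up step, assuming a child-to-parent term map is available (for MedDRA this would presumably come from its term hierarchy). The `PARENT` map, the `roll_up` helper, and all counts below are made up for illustration.

```python
from collections import Counter

# Hypothetical child -> parent mapping; a real one would be derived from the vocabulary.
PARENT = {
    "Cerebrovascular accident due to right carotid artery occlusion": "Cerebrovascular accident",
}

def roll_up(counts, parent=PARENT):
    """Sum counts under each term's parent; terms without a parent keep their own name."""
    rolled = Counter()
    for term, n in counts.items():
        rolled[parent.get(term, term)] += n
    return dict(rolled)

# Toy counts, not taken from AEOLUS.
toy_counts = {
    "Cerebrovascular accident": 10,
    "Cerebrovascular accident due to right carotid artery occlusion": 3,
    "Fatigue": 5,
}

rolled = roll_up(toy_counts)
print(rolled)
```

The summary statistics would then be recalculated from the rolled-up counts rather than carried over from the original terms.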

@stuppie
Collaborator

stuppie commented May 31, 2017

21447, 84108, rosiglitazone, 10012601, Diabetes mellitus, 10028596, Myocardial infarction

"following a meta-analysis published in the New England Journal of Medicine in 2007 that linked the drug's use to an increased risk of heart attack,[1] sales plummeted to just $9.5-million in 2012"
https://en.wikipedia.org/wiki/Rosiglitazone

@mbrush
Author

mbrush commented May 31, 2017

Thanks for the update @stuppie, and for getting the indications into mydrug (is it called mychem now?). And for the nice analysis of how we might proceed with the more complex drug <-> indication in which <-> AE associations. Another thing to consider if we want to make this a true drug-adverse event resource, besides removing primary indications, is how to handle outcomes we suspect to be complications / symptoms caused by the primary disease rather than the drug. e.g. 'Death' is likely caused by CML in your examples above, not imatinib.

Anyway, I am happy with the dataset with added indications for now, and holding off on the more nuanced dataset unless you are keen to tackle it. Sound good? If any use cases or demonstrators require this additional detail we can pick up the thread here.
