Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Figure out an automated way to find first available date of a medication product #30

Open
jrlegrand opened this issue Apr 13, 2021 · 45 comments · Fixed by #56 · May be fixed by #60
Open

Figure out an automated way to find first available date of a medication product #30

jrlegrand opened this issue Apr 13, 2021 · 45 comments · Fixed by #56 · May be fixed by #60

Comments

@jrlegrand
Copy link
Member

Problem Statement

If we don't put any date limits on our modules, patients in 1901 are going to be getting modern medications.

Criteria for Success

Some automated way to query FDA to link first available date of drug application to a product RXCUI or NDC.

Additional Information

See example here: https://www.accessdata.fda.gov/scripts/cder/daf/index.cfm?event=overview.process&ApplNo=075743
Original approval date = 07/01/2002

A lot of modules list this in the remarks: https://github.com/synthetichealth/synthea/blob/master/src/main/resources/modules/medications/moderate_opioid_pain_reliever.json

    "Availability dates for different drugs came from the FDA: ",
    "http://www.accessdata.fda.gov/scripts/cder/daf/",
    "Dosage information came from FDA reference sheets available at: ",
    "https://www.drugs.com/dosage/",
    "The following medications are prescribed in this module. Most opioids were invented ",
    "during or prior to WWII, but did not become widely available until after the war:",
    "1. Meperidine (Demerol)                      available after 1945      ",
    "2. Hydrocodone/Acetaminophen (Vicondin)      available after 1943      ",
    "3. Oxycodone/Acetaminophen (Percocet)        available after 1920 (est)"
@kristentaytok
Copy link
Contributor

kristentaytok commented Apr 18, 2021

Could the "Active Start Date" in RxNav be an easier (and possibly more accurate) substitution here? I say possibly more accurate because some drugs take longer than others to go from FDA approval to market entry.

https://mor.nlm.nih.gov/RxNav/search?searchBy=String&searchTerm=airduo --> click on "Status", Active Date = 6/1/17

https://www.asthmaeducators.org/resources/Pictures/AirDuo%20Launch%20Press%20Release.pdf - market launch was ~April 2017?
https://www.accessdata.fda.gov/drugsatfda_docs/nda/2017/208799Orig1s000TOC.cfm - FDA approval was Jan 2017

@jrlegrand
Copy link
Member Author

See this Synthea GitHub issue conversation: synthetichealth/synthea#236

@jrlegrand
Copy link
Member Author

@kristentaytok - do you know where that "Active Start Date" metadata is stored within RxNorm?

@jrlegrand
Copy link
Member Author

This looked promising. See the data structure diagram under the accordion dropdown: https://www.fda.gov/drugs/drug-approvals-and-databases/drugsfda-data-files

@kristentaytok
Copy link
Contributor

Not sure if this helps; I haven't worked much with RxNorm/Nav/Class, but happy to look further into it.

https://rxnav.nlm.nih.gov/api-RxNorm.getRxcuiHistoryStatus.html

@jrlegrand
Copy link
Member Author

@jrlegrand
Copy link
Member Author

@kristentaytok - you beat me to it!

@jrlegrand
Copy link
Member Author

Tylenol says it was active start date of 2005 ... so I don't think this field means what we think it means...

@kristentaytok
Copy link
Contributor

Sad

Hm, FDA only has active NDCs from what I can tell--but it gets updated daily: https://www.fda.gov/drugs/drug-approvals-and-databases/national-drug-code-directory?elqTrackId=6ecc57b6b7ec4fa7bdcb28cd7a59da1b&elq=795c4bbff787408d9131556bb0434617&elqaid=4397&elqat=1&elqCampaignId=3471

the NDC database files that come from the above link are broken down to product (first 9 digits of NDC) and package (11 digit NDCs). the product file has more details like drug names. but "startmarketingdate" goes back to 1900

@jrlegrand
Copy link
Member Author

jrlegrand commented May 3, 2021

@hodgesr2 - can you start looking into what is possible for this?

Basically, I would want to know the very earliest date (or even just a year) a medication product was available in a computer understandable and easily updatable format. Ideally, this would be for more than just active NDCs.

@kristentaytok has started doing some analysis so please reach out to her if you have questions about what she's commented above.

@hodgesr2
Copy link
Member

hodgesr2 commented May 3, 2021

yeah I'll take a look at it and see what I can find.

@hodgesr2
Copy link
Member

hodgesr2 commented May 3, 2021

Also sent a email to the FDA to see if they have some info on any data that may help with is.

@hodgesr2
Copy link
Member

hodgesr2 commented May 3, 2021

I found this... but can't seem to find a CSV with all the information. They are all separate CSVs for each drug entity. But it has an action date which I think translates to approval of the medication on the market.

https://www.accessdata.fda.gov/scripts/cder/daf/

@hodgesr2
Copy link
Member

hodgesr2 commented May 3, 2021

Orange book has some decent data I think.

image

Doesn't have NDC, but has ingredient and approval date. However, the approval dates don't go back beyond 1982

LINK to data: https://www.fda.gov/drugs/drug-approvals-and-databases/approved-drug-products-therapeutic-equivalence-evaluations-orange-book

Orange Book Data files compressed

@jrlegrand jrlegrand moved this from To do to In progress in medication-diversification May 3, 2021
@hodgesr2
Copy link
Member

hodgesr2 commented May 3, 2021

Think I found what may work.

NDC database file. has first two sections of NDC and Start Market Date. Looks accurate from a spot check I did on some drugs.

image

LINK To data: https://www.fda.gov/drugs/drug-approvals-and-databases/national-drug-code-directory

Download NDC database file

@hodgesr2
Copy link
Member

hodgesr2 commented May 3, 2021

Let me know if that is what you were looking for

@jrlegrand
Copy link
Member Author

I think @kristentaytok mentioned this above but noted that it's only active NDCs... but maybe this is as close as we are going to get.

In my lisinopril example way up at the top, I do see that the earliest start marketing date is represented as 07/01/2002 in the FDA spreadsheet.

We need join a product-level RXCUI (SCD/GPCK/SBD/BPC) to the NDCPACKAGECODE in the package.xls file and then join the product.xls file on PRODUCTNDC and select the earliest date.

@jrlegrand
Copy link
Member Author

@hodgesr2 - you can use this file as an intermediary table to do some testing: https://drive.google.com/file/d/1Nc-Ri2EpDQzebiPZsjXAvBDtodw_BGLc/view?usp=sharing

Let me know if you have questions. We should end up with a file with three columns: MEDICATION_PRODUCT_RXCUI, MEDICATION_PRODUCT_TTY, and EARLIEST_START_MARKETING_DATE.

@jrlegrand
Copy link
Member Author

@kristentaytok - the other question is how we actually implement this. I'm guessing it would be crazy complicated to add this to the CSV distributions... so my other thought is that we do a check AFTER the patient gets randomly assigned a product - check if the year is greater than X, if so - continue on - if not, go back to the ingredient distribution check. We would also want an initial date check to see if we are at least past the earliest date that at least ONE of the meds were available - otherwise, we may get stuck in an infinite loop and break Synthea lol.

@jrlegrand
Copy link
Member Author

jrlegrand commented May 4, 2021

OR I guess how hard would it be to add to the CSV (if we can have Date as an attribute)... like if it was before a date a product was available then the distribution would always be 0 for that product. This is also wonky b/c we are trying to apply 2018 distributions to all the years prior to 2018 - back to 1901...

@kristentaytok
Copy link
Contributor

I think the easiest implementation on the MEPS-CSV creation front is to create distributions for each calendar year. So basically any year after the earliest_start_marketing_date to present is given a row, and similar to products that aren't being used in a certain state/age group/gender group, those that have NULLs in a given year will be filled with a zero. So we're just adding a year column to the CSV files that gets aggregated to calculate a unique distribution for each year.

Not sure about how to change the Synthea JSON part of the implementation though. But would this approach for the CSVs help?

@kristentaytok
Copy link
Contributor

I think @kristentaytok mentioned this above but noted that it's only active NDCs... but maybe this is as close as we are going to get.

In my lisinopril example way up at the top, I do see that the earliest start marketing date is represented as 07/01/2002 in the FDA spreadsheet.

We need join a product-level RXCUI (SCD/GPCK/SBD/BPC) to the NDCPACKAGECODE in the package.xls file and then join the product.xls file on PRODUCTNDC and select the earliest date.

I might be missing something here, but i was thinking we just need to use the "PRODUCTNDC" file and join on the first 9 digits of the NDC from the RXCUI_NDC table with the PRODUCTNDC (no need to use the PACKAGE file/NDCPACKAGECODE).

@jrlegrand
Copy link
Member Author

@kristentaytok yeah I think you are right that we wouldn't need the product file - just the package file. And I think the CSV idea you had is what we should do for the implementation if possible. Maybe I'll look into seeing if it's possible to use date in a table transition.

@jrlegrand
Copy link
Member Author

One thought though is that if we want to get to the month or even day level with start dates for med products, that distribution strategy would get even more unwieldy. But year is how I've seen other modules do it so I think it's ok.

@kristentaytok
Copy link
Contributor

Yea I was thinking about the month/day stuff as well, but agree it would get way too big and not worth the extra detail.

It'll be pretty easy to implement the column in the CSV, but my question is for the MDT do we want to download and store daily versions of this file in our db or just connect to the site/link and have it run on the day the user runs the MDT? The disadvantage to the latter is the user may not get the same results if drugs/codes get removed from the file between runs.

@jrlegrand
Copy link
Member Author

For MDT database loading, I think we would treat the FDA NDC file(s) like we do the RxNorm files. We run them once when the user initializes the MDT app, and then give the user the option to periodically refresh all the database tables, or pick specific ones to update (i.e. RxNorm or FDA or MEPS). I'm mostly concerned about not prescribing modern drugs in really ancient years - not about missing cutting edge therapies... unless I'm misunderstanding your concern.

For MDT JSON, we also will probably still want to know the earliest possible year from all the products in the submodule so we can bypass all the logic if the year is earlier than that because we know the patient will not get a match.

@hodgesr2
Copy link
Member

hodgesr2 commented May 5, 2021

I liked the idea of doing a check. To prevent an infinite loop, could we create a random number generator for the year date between a certain range? Like, if check is false, then go to random generated number for the year within a range and recheck? I wonder how much of an extra computational load it would put on it to create the dataset like this.

I'm going to be in Aruba rest of this week and half of next week. I will work on that intermediary table over the weekend probably in between the beach and eating @ brazilian steakhouse and german beer.

@kristentaytok
Copy link
Contributor

@jrlegrand - makes sense given our main goal is to prevent inappropriate med orders from earlier years. My concern was more around reproducibility (and is probably more of an edge case than a major issue): if we ran the NDC file today and the earliest_start_marketing_date for drug X was 1950, for example, and the NDC became obsolete tomorrow and fell off the list such that the next earliest_start_marketing_date for drug X became 1990, then the user running the MDT tomorrow or later will get very different results for drug X vs. what we get today. Hope this makes it more clear?

But again, since the priority is more to address the bigger issue that we're properly prioritizing (preventing med orders from getting created before a drug was available), then this smaller issue may just be more of a caveat that the end-user can consider addressing by downloading and saving the daily FDA files over time and taking the earliest_start_marketing_date across the files.

@jrlegrand
Copy link
Member Author

@kristentaytok - I hear you now. Yeah I feel like this might be more of an edge case but not really sure. Still probably the best data source we have for now.

@hodgesr2
Copy link
Member

hodgesr2 commented May 9, 2021

@jrlegrand do you want the intermediate file to have a date data type for market date as well? Currently it is in integer format with year first.

@jrlegrand
Copy link
Member Author

Yea I think so

@jrlegrand
Copy link
Member Author

At very least we just need to pull year out of that integer

@hodgesr2
Copy link
Member

Ok. Cleaned_CSV.csv is loaded into google drive. Let me know if that is what you want. The NDCs may load without the leading zeros in excel. It's annoying. May have to format the column and do a special type of format of "000000000" to get the leading zeroes on. There are 3 date columns: the original, the year, and a date format column.

@hodgesr2
Copy link
Member

@hodgesr2
Copy link
Member

There are also NULLs for the TTY and RxCUI columns. I didn't know if you wanted those removed or not but I left them in there in case you needed to validate anything.

@Bridg109
Copy link
Contributor

Bridg109 commented May 11, 2021 via email

@hodgesr2
Copy link
Member

Excel and Python (Anaconda).

@jrlegrand
Copy link
Member Author

@hodgesr2 - we actually need NDCPACKAGECODE (the PACKAGE NDC) instead of the PRODUCTNDC unless we have a really reliable way of getting from the MEPS package NDCs to product NDCs. Also, could you please pull in the medication_product_name field for human readability? Did you try to join to the RxNorm rows based on PACKAGE or PRODUCT NDC?

@hodgesr2
Copy link
Member

k. I'll take a look at it again Friday when I get back. I joined on PRODUCTNDC. I'll try and redo it with NDCPACKAGECODE

@Bridg109
Copy link
Contributor

Bridg109 commented May 11, 2021 via email

@hodgesr2
Copy link
Member

do you want it in a ipynb file that jupyter uses (my preference) or prefer a py file?

@Bridg109
Copy link
Contributor

Bridg109 commented May 11, 2021 via email

@jrlegrand
Copy link
Member Author

@hodgesr2 - we should also pull in ENDMARKETINGDATE

medication-diversification automation moved this from In progress to Done May 14, 2021
kristentaytok added a commit that referenced this issue May 17, 2021
Taking Eugene's restructure branch (from PR #59), this updates the following:
1. Added cleaned version of Rob's code to database.by --> creates an ingredient_rxcui_year table & a product_rxcui_year table, for their respective distributions in generate_module
2. added 'year' column to the generate_module dataframes/CSV files.
3. fixed default_probability typo in utils.py (if idx == 1 --> changed to if idx == 0)
@kristentaytok kristentaytok linked a pull request May 17, 2021 that will close this issue
@jrlegrand
Copy link
Member Author

Reopening because we never really finished this feature.

@jrlegrand jrlegrand reopened this Jul 13, 2021
medication-diversification automation moved this from Done to In progress Jul 13, 2021
@jrlegrand
Copy link
Member Author

Synthea has an attribute called time (I think) that can be used in a transition table. It's the number of seconds since some year.

@jrlegrand jrlegrand changed the title Bonus points: Figure out an automated way to find first available date of a medication product Figure out an automated way to find first available date of a medication product Aug 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
4 participants