Year Over Year Increases. #42
…hamarcological drug classes
Great stuff!
I have a few suggestions/ideas/comments:
- We might want to hold off merging this until the new repo structure is in place so that all the bits here can go into their proper place.
- In Part_D_with_uses.ipynb, where does the NDC data come from? It's being loaded from disk in the notebook: could that be made a query to wherever the data came from (e.g. data.world)?
- More curiosity than anything: how many drugs did you manage to identify with their NDC pharma classes? All of the Part D ones or a subset? I'd be curious how big that subset is ...
- I think our policy is not to have any data in the repo (it makes them large and unwieldy), so it might be best to remove the things in ./data/ and move them to data.world.
- The idea of using machine learning for the clustering of names is cool, even if it doesn't seem to work. What I'd do is just print out the top 100 words + occurrences to the screen, and manually look at them. Then add things like "mg" and so on to the stop words. But you might be right: based on what I saw in the earlier DataFrames in your notebooks, the terms may be too specialized to cluster well.
- On a similar note, in the folder ./cms/ I was playing around with another set of definitions for drug usage. They might be useful here, too?
- I'm not entirely sure how to read the rainbow coloured plots in exploration_of_plan_b_yr_to_yr_increases.ipynb. Maybe having another sentence explaining what each colour represents might be useful?
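The word-count check suggested above (print the top words, then grow the stop-word list by hand) could be sketched roughly as follows. This is only an illustration: `top_words`, `STOP_WORDS`, and the sample strings are hypothetical and are not taken from the notebooks in this PR.

```python
import re
from collections import Counter

# Hypothetical starting stop words; extend this set after inspecting the output.
STOP_WORDS = {"mg", "ml", "tablet", "and", "the", "of"}


def top_words(descriptions, n=100, stop_words=STOP_WORDS):
    """Return the n most common lowercase alphabetic tokens, skipping stop words."""
    counts = Counter()
    for text in descriptions:
        # Keep alphabetic runs only, so "10mg" contributes the token "mg".
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in stop_words:
                counts[token] += 1
    return counts.most_common(n)


# Made-up example strings, not real Part D rows.
sample = [
    "Atorvastatin 10 mg tablet",
    "Atorvastatin 20 mg tablet",
    "Lisinopril 10mg tablet",
]

for word, count in top_words(sample):
    print(f"{word:20s} {count}")
```

Running this on the real drug-name column and reading the printed list by eye should show which unit or packaging words still need to go into the stop-word set.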
Hi @dhuppenkothen ,
Thanks for your feedback.
We might want to hold off merging this until the new repo structure is in place so that all the bits here can go into their proper place.
I think that makes a lot of sense.
In Part_D_with_uses.ipynb, where does the NDC data come from? It's being loaded from disk in the notebook: could that be made a query to wherever the data came from (e.g. data.world)?
I didn’t have access to data.world at the time, but I’ve uploaded it now, so the new code will download it from there.
More curiosity than anything: how many drugs did you manage to identify with their NDC pharma classes? All of the Part D ones or a subset? I'd be curious how big that subset is …
About 2500 drugs are still left to be matched to their NDC classes, but those account for less than 20% of the spending. I think we could match most of them if I clean the data better or improve the matching algorithm. Another problem is that there are a lot of NDC pharma classes, and it would be better to put the drugs into larger therapeutic use groups. Having the NDC classes around might still help with that, and might help us improve the matching algorithm: if we don’t directly know which therapeutic use group a drug fits in, but we know which pharma class it fits in, and all other drugs in that class are used to treat high blood pressure, then it probably has the same use.
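A rough sketch of that last idea, assuming two plain dictionaries (drug to pharma class, and drug to known therapeutic use). The function name `infer_uses` and all of the data below are made up for illustration; this is not the matching code in the notebooks.

```python
from collections import defaultdict


def infer_uses(drug_class, known_use):
    """Guess a use for unlabeled drugs when all labeled drugs in their class agree."""
    # Collect the set of known uses seen in each pharma class.
    uses_by_class = defaultdict(set)
    for drug, cls in drug_class.items():
        if drug in known_use:
            uses_by_class[cls].add(known_use[drug])

    inferred = {}
    for drug, cls in drug_class.items():
        # Only infer when the class has exactly one known use (no disagreement).
        if drug not in known_use and len(uses_by_class[cls]) == 1:
            inferred[drug] = next(iter(uses_by_class[cls]))
    return inferred


# Hypothetical data: class_1 is consistent, class_2 has mixed uses.
drug_class = {
    "drug_a": "class_1", "drug_b": "class_1", "drug_c": "class_1",
    "drug_d": "class_2", "drug_e": "class_2", "drug_f": "class_2",
}
known_use = {
    "drug_a": "high blood pressure",
    "drug_b": "high blood pressure",
    "drug_d": "pain",
    "drug_e": "fever",
}

print(infer_uses(drug_class, known_use))
```

Requiring unanimity within a class is deliberately conservative: a drug in a class with mixed or no known uses is left unlabeled rather than guessed.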
I think our policy is not to have any data in the repo (it makes them large and unwieldy), so it might be best to remove the things in ./data/ and move them to data.world
The idea of using machine learning for the clustering of names is cool, even if it doesn't seem to work. What I'd do is just print out the top 100 words + occurrences to the screen, and manually look at them. Then add things like "mg" and so on to the stop words. But you might be right: based on what I saw in the earlier DataFrames in your notebooks, the terms may be too specialized to cluster well.
On a similar note, in the folder ./cms/ I was playing around with another set of definitions for drug usage. They might be useful here, too?
Yes, I think we should incorporate as many definitions as possible (since I doubt any will be complete), and then get a graph relating drugs to various uses...
I'm not entirely sure how to read the rainbow coloured plots in exploration_of_plan_b_yr_to_yr_increases.ipynb. Maybe having another sentence explaining what each colour represents might be useful?
Probably I should use a stacked area plot instead: the colors are just different years…
Here's a slightly better view (except for the legend, and it's still too cluttered):
![download](https://cloud.githubusercontent.com/assets/19326411/22811966/ea52e4c4-eef6-11e6-9da0-8650fbc5569b.png)
'Unknown' refers to drugs which have not been matched to pharma classes, while 'Other' refers to drugs which have been matched (but were aggregated to reduce clutter).
- david
@davidlibland What about possibly just a bar graph, if we're concentrating on just the top few categories? EDIT: Derp, it's a longitudinal comparison. In that case, how about a line graph? Areas tend to confuse me.
…nd fixed import of data (to get all of it rather than first 100 entries
@davidlibland Looks like you've still got a UPDATE: Let's also get rid of the
Hi @mattgawarecki,
Looks like at points I was actually commenting on things you were already working on fixing -- apologies for that :-) Anyway though, I think your updates to match the new structure are just about everything we need to get it merged in. I'll do one last run-through tonight -- though if anybody reading this wants to beat me to it, you're more than welcome -- and we should have it merged in soon.
Added some Python notebooks to do some preliminary analysis of year over year increases. Also incorporated the FDA's NDC data to associate drugs with their pharmacological classes, and to aggregate spending and use-counts across those classes. The steepest year over year increases are visualized both for individual drugs and across drug classes.
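For readers who want the gist without opening the notebooks, the aggregation described above might look roughly like the sketch below. It assumes records of the form (drug class, year, spending); the function names and the numbers are hypothetical and are not the notebooks' actual code or data.

```python
from collections import defaultdict


def yoy_increases(records):
    """Map (drug_class, year) to the fractional spending increase over the prior year."""
    # Aggregate spending per class and year.
    totals = defaultdict(float)
    for cls, year, spending in records:
        totals[(cls, year)] += spending

    increases = {}
    for (cls, year), total in totals.items():
        prev = totals.get((cls, year - 1))
        if prev:  # skip years with a missing or zero baseline
            increases[(cls, year)] = (total - prev) / prev
    return increases


def steepest(increases, n=5):
    """Return the n largest increases as ((drug_class, year), increase) pairs."""
    return sorted(increases.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Made-up numbers for illustration only.
records = [
    ("class_1", 2014, 100.0),
    ("class_1", 2015, 150.0),
    ("class_2", 2014, 200.0),
    ("class_2", 2014, 50.0),
    ("class_2", 2015, 275.0),
]

for (cls, year), inc in steepest(yoy_increases(records)):
    print(f"{cls} {year}: {inc:+.0%}")
```

The same function works per drug rather than per class by passing drug names in the first field.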