Year Over Year Increases. #42

Merged 22 commits on Feb 15, 2017

davidlibland
Contributor

Added some Python notebooks that do a preliminary analysis of year-over-year increases. Also incorporated the FDA's NDC data to associate drugs with their pharmacological classes, and to aggregate spending and use counts across those classes. The steepest year-over-year increases are visualized both for individual drugs and across drug classes.
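
For readers skimming the thread, here is a minimal sketch of the kind of calculation described above: summing spending per drug or per class, computing the fractional change from one year to the next, and ranking the steepest increases. It is not the notebooks' actual code. The record layout, the names `records`, `totals_by_key`, `yoy_increases`, and `steepest`, and all of the numbers are made up for illustration, and plain Python is used only to keep the sketch dependency-free.

```python
from collections import defaultdict

# Hypothetical records: (drug_name, pharm_class, year, total_spending).
# These values are invented for illustration, not taken from the Part D data.
records = [
    ("drug_a", "class_x", 2013, 100.0),
    ("drug_a", "class_x", 2014, 150.0),
    ("drug_b", "class_x", 2013, 200.0),
    ("drug_b", "class_x", 2014, 210.0),
    ("drug_c", "class_y", 2013, 50.0),
    ("drug_c", "class_y", 2014, 100.0),
]


def totals_by_key(rows, key_index):
    """Sum spending per (key, year); key_index 0 groups by drug, 1 by class."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row[key_index], row[2])] += row[3]
    return totals


def yoy_increases(totals):
    """Return {(key, year): fractional change versus the previous year}."""
    changes = {}
    for (key, year), value in totals.items():
        previous = totals.get((key, year - 1))
        if previous:  # skip keys with a missing or zero baseline year
            changes[(key, year)] = (value - previous) / previous
    return changes


def steepest(changes, n=10):
    """Return the n largest year-over-year changes, biggest first."""
    return sorted(changes.items(), key=lambda item: item[1], reverse=True)[:n]


drug_changes = yoy_increases(totals_by_key(records, 0))
class_changes = yoy_increases(totals_by_key(records, 1))
```

Calling `steepest(drug_changes, 2)` on this toy data ranks `drug_c` first and `drug_a` second. Aggregating before computing the change, rather than averaging per-drug changes, keeps large drugs from being outweighed by small ones within a class; whether the notebooks make the same choice is not stated in this thread.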

Contributor

@dhuppenkothen dhuppenkothen left a comment

Great stuff!
I have a few suggestions/ideas/comments:

  • We might want to hold off merging this until the new repo structure is in place so that all the bits here can go into their proper place.
  • In Part_D_with_uses.ipynb, where does the NDC data come from? It's being loaded from disk in the notebook: could that be made a query to wherever the data came from (e.g. data.world)?
  • More curiosity than anything: how many drugs did you manage to identify with their NDC pharma classes? All of the Part D ones or a subset? I'd be curious how big that subset is ...
  • I think our policy is not to have any data in the repo (it makes repos large and unwieldy), so it might be best to remove the things in ./data/ and move them to data.world
  • The idea of using machine learning for the clustering of names is cool, even if it doesn't seem to work. What I'd do is just print out the top 100 words + occurrences to the screen, and manually look at them. Then add things like "mg" and so on to the stop words. But you might be right: based on what I saw in the earlier DataFrames in your notebooks, the terms may be too specialized to cluster well.
  • On a similar note, in the folder ./cms/ I was playing around with another set of definitions for drug usage. They might be useful here, too?
  • I'm not entirely sure how to read the rainbow coloured plots in exploration_of_plan_b_yr_to_yr_increases.ipynb. Maybe having another sentence explaining what each colour represents might be useful?
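
The word-count suggestion above could be sketched roughly as follows: tokenize the drug descriptions, count terms, print the most common ones, and then rerun with terms like "mg" added to the stop words. This is only an illustration. The `descriptions` list, the `stop_words` set, and the helper `top_terms` are hypothetical and do not come from the notebooks.

```python
import re
from collections import Counter

# Hypothetical drug descriptions, invented for illustration only.
descriptions = [
    "Atorvastatin 10 mg tablet",
    "Atorvastatin 20 mg tablet",
    "Insulin glargine 100 units/ml",
]

# Terms judged uninformative after looking at the first printout (assumed).
stop_words = {"mg", "ml", "tablet", "units"}


def top_terms(texts, stop_words=frozenset(), n=100):
    """Return the n most common alphabetic tokens, excluding stop words."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep runs of letters, so dosages like "10" drop out.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(token for token in tokens if token not in stop_words)
    return counts.most_common(n)


# First pass: inspect raw counts by eye.
for term, count in top_terms(descriptions):
    print(term, count)

# Second pass: the same counts with the stop words removed.
filtered = top_terms(descriptions, stop_words)
```

Inspecting the first printout by hand is the point of the exercise: the stop-word set is expected to grow over a few iterations as more dosage and packaging terms turn up.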

@davidlibland
Contributor Author

davidlibland commented Feb 10, 2017 via email

@mattgawarecki
Contributor

mattgawarecki commented Feb 10, 2017

@davidlibland What about possibly just a bar graph, if we're concentrating on just the top few categories?

EDIT: Derp, it's a longitudinal comparison. In that case, how about a line graph? Areas tend to confuse me.

@mattgawarecki
Contributor

mattgawarecki commented Feb 14, 2017

@davidlibland Looks like you've still got a drugs_w_lrg_yr-yr_increases directory in the repository root. Can you make sure everything's moved out, then delete that directory from your repo, commit, and push again?

UPDATE: Let's also get rid of the cms directory since that data's coming from data.world.

@davidlibland
Contributor Author

Hi @mattgawarecki,
Thanks, I just moved the files to python/notebooks/drugs_w_lrg_yr-yr_increases/
The part_d_with_uses notebook pulls its data from data.world, but it still writes to a local database (ignored in .gitignore). It just tries to correlate the drugs with their uses (according to the FDA NDC database), but the result is incomplete, so I don't want to put the output on data.world.

@mattgawarecki
Contributor

Looks like at points I was actually commenting on things you were already working on fixing -- apologies for that :-)

Anyway though, I think your updates to match the new structure are just about everything we need to get it merged in. I'll do one last run-through tonight -- though if anybody reading this wants to beat me to it, you're more than welcome -- and we should have it merged in soon.

4 participants