Investigate + add data from drugbank.ca #61

Open
jenniferthompson opened this issue Apr 27, 2017 · 15 comments

Comments

@jenniferthompson
Contributor

@cduvallet made us aware of a site (drugbank.ca) that looks like it has very promising data! We need someone to

  1. Contact them to make sure it's OK if we download and process the data and include it in our data.world repo, of course giving proper citations (their TOS look promising)
  2. Determine exactly what data is available and what would be most helpful in our context
  3. Download, tidy and add that data to our data.world repo, along with a data dictionary
@cduvallet
Contributor

Thanks for making this issue @jenniferthompson!

  • re: 1. I'll email them today!
  • re: 2. It would be great to have help from someone who's been more involved in the data analysis projects happening here to work with me to figure out what the most interesting/useful part of the data will be. This can probably wait until after we've started poking around and seeing what's available in DrugBank.
  • re: 3. Happy to take the lead on this, looks like it's easy to download and will probably be relatively straightforward to parse.

@cduvallet
Contributor

They got back to me pretty quickly, and had some questions about data.world that I'm not sure I know the answer to:

Looks like an interesting project, thanks for reaching out!

I checked out your site and noticed a couple of issues:

  1. Data.world looks like a commercial project that requires people have accounts to download data. It doesn't look like they have a good way to post the licenses for datasets? Maybe I am not understanding what data.world is.

  2. I don't see a clear indication of the license for the datasets available through your website, or clear citations to the datasets there?

Your use case looks like a non-commercial use case, so that should be fine, but when our data is shared it has to be shared with both a citation and the license we share our data under.

We also have 2 datasets that are public domain and you can do whatever you want with them, on this page: https://www.drugbank.ca/releases/latest#open-data

They include DrugBank identifiers, names, and synonyms to permit easy linking and integration into any type of project.

Is there any way we can include their license and citation on data.world? I'm pretty sure it will be more characters than are allowed in the "description" on data.world, and I'm not sure where else dataset metadata can be put on data.world (which is pretty surprising...)

Alternatively, should we just stick with the public domain data?

@mattgawarecki
Contributor

mattgawarecki commented Apr 27, 2017 via email

@forzavitale

hi all-- first time jumping in here! at the NYC hackathon rn, seems like this issue is pretty recent and would like to start munging something.... guidance?

@cduvallet
Contributor

@forzavitale I pinged the DrugBank people again to ask if we could just include the license and citation info in the header of the file, since we can't assign it to the file directly via data.world. They haven't gotten back to me about that though. That said, in my opinion it should be fine so you can probably start working on the data. Let's just make sure to check back in with them before we post the data to data.world.

Alternatively, you can poke around the public domain data and see if that's enough to get us what we want!

@jenniferthompson
Contributor Author

I think that's a good plan @cduvallet - and at the speed the data.world folks move (read: blazing fast), it's entirely plausible that we might be able to assign a file-specific license by the time we're ready to post it.

@cduvallet
Contributor

cduvallet commented May 10, 2017

Update, just heard back from the DrugBank people and they said that including the info in the header of the file is fine. Full speed ahead!

@forzavitale can you update us on your progress from the hackathon (if you ended up working on this)?
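
For anyone picking this up, here is a minimal sketch of the "license and citation in the file header" approach, assuming a CSV file whose header lines are prefixed with `#`. The function names, the column names, and the placeholder citation text are all hypothetical; the real citation and license wording has to come from DrugBank.

```python
import csv
import io


def write_with_header(rows, fieldnames, header_lines, out):
    """Write comment-style header lines, then the data as CSV."""
    for line in header_lines:
        out.write(f"# {line}\n")
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def read_skipping_header(src):
    """Read the CSV back, ignoring lines that start with '#'.

    Assumption: no data line starts with '#'.
    """
    data_lines = (line for line in src if not line.startswith("#"))
    return list(csv.DictReader(data_lines))


# Placeholder values only; not real DrugBank records or license text.
buf = io.StringIO()
write_with_header(
    rows=[{"drugbank_id": "EXAMPLE_ID", "name": "example drug"}],
    fieldnames=["drugbank_id", "name"],
    header_lines=["Source: DrugBank (citation goes here)",
                  "License: (license text goes here)"],
    out=buf,
)
buf.seek(0)
records = read_skipping_header(buf)
```

One trade-off: most CSV readers need to be told to skip the `#` lines (for example via a `comment` option), so the data dictionary should mention the header.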

@jenniferthompson
Contributor Author

Fantastic! Thanks so much for following up, @cduvallet! 🎉

@darwinyfu

Is this still a project that needs help? I see the label but comments are fairly old.

Been lurking on D4D for a while but interested in working on something.

@darya-akimova
Contributor

Hello! The project has been dormant for a while (hence the old comments), and I'm one of the people trying to get it going again. Any issue with the label status-under-review can be ignored for now: it either can't be tackled yet or may need to be trimmed/reformatted. This is one of the older issues that I thought would be good to try to get through, because the drugbank.ca materials seem very useful for our current goal of matching drugs to therapeutic uses.

@darya-akimova
Contributor

In PR #83 @proof-by-accident investigated how many of the Medicare drugs can be found in the drugbank.ca data. The results seem similar to matching attempts from other sources: a good number of drugs can be matched easily on the first pass, but about twice as many were not matched, and matching those properly will probably require a non-trivial amount of research.
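
To illustrate what a first-pass match of this kind might look like, here is a rough sketch. It is not the code from PR #83; it assumes DrugBank rows of (identifier, name, synonyms) and a plain list of Medicare drug names, and every function name and sample value is hypothetical.

```python
import re


def normalize(name):
    """Lowercase and collapse punctuation/whitespace so trivial variants agree."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def build_lookup(drugbank_rows):
    """Map each normalized name or synonym to its DrugBank identifier."""
    lookup = {}
    for drug_id, name, synonyms in drugbank_rows:
        for label in [name, *synonyms]:
            # Keep the first identifier seen if two drugs share a label.
            lookup.setdefault(normalize(label), drug_id)
    return lookup


def match_names(medicare_names, lookup):
    """Split names into exact (normalized) matches and leftovers."""
    matched, unmatched = {}, []
    for raw in medicare_names:
        drug_id = lookup.get(normalize(raw))
        if drug_id is None:
            unmatched.append(raw)
        else:
            matched[raw] = drug_id
    return matched, unmatched


# Hypothetical sample data, for illustration only.
rows = [("ID1", "Acetaminophen", ["Paracetamol"])]
lookup = build_lookup(rows)
matched, unmatched = match_names(
    ["ACETAMINOPHEN", "paracetamol ", "Unknown-Drug 5mg"], lookup
)
```

Exact matching on normalized names only catches the easy cases; the `unmatched` list is where the manual research (dose suffixes, combination products, brand vs. generic names) would start.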

@acutrell

acutrell commented Apr 3, 2018

I don't have the coding ability to do this, but I am knowledgeable about the domain as an informatics pharmacist and willing to offer some help from that aspect. Pretty sure the answer to this problem is the Structured Product Labeling (SPL). It is a document markup standard approved by Health Level Seven (HL7) and adopted by FDA as a mechanism for exchanging product and facility information.

Different datasets use different drug identifiers (brand name, generic name, NDA, NDC, etc.), so it is hard to find the same drug across datasets. OpenFDA harmonizes drug identifiers, and fields for various pharmacological uses are part of the dataset. Take a look: https://open.fda.gov/drug/label/reference/

@darya-akimova
Contributor

Oh this seems great! The OpenFDA might be just what we need because you're right, we have been running into the issue where not everything is in one dataset and the names can be inconsistent between datasets. Thanks for this suggestion.
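
A rough sketch of how the harmonized fields could be looked up, assuming the openFDA drug label endpoint at `https://api.fda.gov/drug/label.json` and harmonized fields nested under an `openfda` key in each result (as I understand the reference page linked above). The helper names and the sample record are hypothetical, and nothing here makes a network call.

```python
from urllib.parse import urlencode

# Assumed endpoint; check the openFDA reference before relying on it.
OPENFDA_LABEL_ENDPOINT = "https://api.fda.gov/drug/label.json"


def label_query_url(field, value, limit=1):
    """Build a search URL for one harmonized openfda.* field."""
    query = {"search": f'openfda.{field}:"{value}"', "limit": limit}
    return f"{OPENFDA_LABEL_ENDPOINT}?{urlencode(query)}"


def harmonized_fields(
    label_record,
    fields=("brand_name", "generic_name", "product_ndc", "pharm_class_epc"),
):
    """Pull selected harmonized identifiers out of one label record.

    Missing fields come back as empty lists so callers can iterate safely.
    """
    openfda = label_record.get("openfda", {})
    return {field: openfda.get(field, []) for field in fields}


url = label_query_url("generic_name", "aspirin", limit=5)

# Hand-built record for illustration; not a real API response.
sample = {"openfda": {"brand_name": ["EXAMPLE"], "generic_name": ["EXAMPLE DRUG"]}}
ids = harmonized_fields(sample)
```

Fetching `url` and passing each entry of the response's result list to `harmonized_fields` would give brand name, generic name, and NDC side by side, which is the cross-dataset link we have been missing.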

@veena-v-g

Can I help?

@TBusen

TBusen commented Oct 24, 2018

Is this still active? Can I start this or is this throw away work?
