Missing URLs of NHANES Data.File and Doc.File #22

Closed
rsgoncalves opened this issue Jul 28, 2023 · 4 comments

Comments

@rsgoncalves
Collaborator

While working on NHANES issue #27 (ccb-hms/nhanes-database#27 (comment)) we realized that the URLs to the NHANES “Data File” and “Doc File” are not returned by the nhanesA::nhanesSearchTableNames() function.

The links are on the CDC website, e.g.:
https://wwwn.cdc.gov/nchs/nhanes/search/datapage.aspx?Component=Demographics&CycleBeginYear=2017
but nhanesA returns a data frame with only strings such as “DEMO_J Data [XPT - 3.3 MB]”

Perhaps nhanesA should be modified to include these URLs as additional columns in the returned data frame.

@rgentlem
Collaborator

The problem is that the URL that nhanesSearchTableNames uses is
https://wwwn.cdc.gov/Nchs/Nhanes/search/DataPage.aspx
which does not have any of that information. It really just gives you the names (sort of), the size (sort of), and when the data were uploaded.
So something like this:
xx = nhanesSearchTableNames('BMX', details=TRUE)
then
urls = paste0(nhanesA:::nhanesURL, xx$Years, "/", sapply(strsplit(xx$Doc.File, " "), function(x) x[[1]]), ".XPT")
seems to do the trick...

I can ask Chris Endres to perhaps modify the nhanesSearchTableNames function to add this in. In the shorter term, you could just try this and see if it is what you need.
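
For anyone who wants to try the workaround above without touching nhanesA internals, here is a minimal, self-contained sketch of the same idea: take the first token of the file label as the table name and paste it onto a base URL and cycle. The helper name build_nhanes_urls, the base URL default (a stand-in for nhanesA:::nhanesURL), the ".htm" extension for doc files, the "DEMO_J Doc" label, and the toy data frame are all assumptions for illustration, not actual nhanesA output.

```r
# Hypothetical helper, not part of nhanesA.
# base_url is an assumed stand-in for the internal nhanesA:::nhanesURL value.
build_nhanes_urls <- function(years, file_label, ext,
                              base_url = "https://wwwn.cdc.gov/Nchs/Nhanes/") {
  # Keep the first space-delimited token, e.g. "DEMO_J" from "DEMO_J Doc"
  table_name <- vapply(strsplit(file_label, " ", fixed = TRUE),
                       function(x) x[[1]], character(1))
  paste0(base_url, years, "/", table_name, ext)
}

# Toy row shaped like the strings quoted in this thread (not real query output)
xx <- data.frame(
  Years     = "2017-2018",
  Doc.File  = "DEMO_J Doc",                 # assumed label format
  Data.File = "DEMO_J Data [XPT - 3.3 MB]",
  stringsAsFactors = FALSE
)

data_urls <- build_nhanes_urls(xx$Years, xx$Data.File, ".XPT")
doc_urls  <- build_nhanes_urls(xx$Years, xx$Doc.File, ".htm")  # ".htm" is an assumption
```

This only reconstructs URLs from naming conventions, so it will be wrong for any table whose file path does not follow the cycle/table-name pattern.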

@cjendres1
Collaborator

nhanesSearchTableNames was intended to return table names so that they can easily be passed to the nhanes function.

If we also want URLs, I could do something like:

dataURL <- "https://wwwn.cdc.gov/Nchs/Nhanes/search/DataPage.aspx"
.checkHtml(dataURL) %>% html_elements(xpath=xpath) %>% html_nodes("a") %>% html_attr('href')

Is that what you had in mind? I can prepend https://wwwn.cdc.gov to construct the full path.
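
As a rough illustration of the href-extraction idea above, the sketch below runs the same rvest steps against an inline HTML snippet instead of the live CDC page, then prepends the host. The table markup, the "GridView1" id, and the XPath are made up for the example (the `xpath` value and `.checkHtml` helper used in the snippet above are not shown in this thread), so the real page may need a different selector.

```r
library(rvest)

# Inline stand-in for the CDC data page; the real markup may differ.
page <- minimal_html('<table id="GridView1">
  <tr>
    <td><a href="/Nchs/Nhanes/2017-2018/DEMO_J.htm">DEMO_J Doc</a></td>
    <td><a href="/Nchs/Nhanes/2017-2018/DEMO_J.XPT">DEMO_J Data [XPT - 3.3 MB]</a></td>
  </tr>
</table>')

# Hypothetical XPath selecting every link inside the assumed table
links <- html_elements(page, xpath = '//table[@id="GridView1"]//a')

# Relative hrefs as they appear in the markup
hrefs <- html_attr(links, "href")

# Prepend the host to build absolute URLs
full_urls <- paste0("https://wwwn.cdc.gov", hrefs)
```

Plain paste0 assumes every href is root-relative (starts with "/"); if the page mixes absolute and relative links, something like xml2::url_absolute() would be a safer choice.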

@cjendres1
Collaborator

Here's the nhanesA issue tracker on GitHub if you want to report an issue:
https://github.com/cjendres1/nhanes/issues

@rsgoncalves
Collaborator Author

> If we also want URLs, I could do something like:
>
> dataURL <- "https://wwwn.cdc.gov/Nchs/Nhanes/search/DataPage.aspx"
> .checkHtml(dataURL) %>% html_elements(xpath=xpath) %>% html_nodes("a") %>% html_attr('href')
>
> Is that what you had in mind? I can prepend https://wwwn.cdc.gov to construct the full path.

Yes, this is exactly what we had in mind, both for the Data URL and, if possible, the Doc URL.

I'll open a new issue in the nhanesA tracker.


4 participants