Changed xpath to reflect NHANES website upgrade
cjendres1 committed Jul 11, 2016
1 parent 946c9fa commit 3907244
Showing 4 changed files with 43 additions and 21 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,6 +1,6 @@
Package: nhanesA
-Version: 0.6.4
-Date: 2016-05-22
+Version: 0.6.4.1
+Date: 2016-07-10
Title: NHANES Data Retrieval
Author: Christopher Endres
Maintainer: Christopher Endres <cjendres1@gmail.com>
9 changes: 6 additions & 3 deletions R/nhanes.R
@@ -1,7 +1,8 @@
#nhanesA - retrieve data from the CDC NHANES repository
nhanesURL <- 'http://wwwn.cdc.gov/Nchs/Nhanes/'
+#varURL <- 'http://wwwn.cdc.gov/Nchs/Nhanes/search/variablelist.aspx'
varURL <- 'http://wwwn.cdc.gov/nchs/nhanes/search/variablelist.aspx'
-dataURL <- 'http://wwwn.cdc.gov/nchs/nhanes/search/DataPage.aspx'
+dataURL <- 'http://wwwn.cdc.gov/Nchs/Nhanes/search/DataPage.aspx'

# Create a list of nhanes groups
# Include convenient aliases
@@ -101,7 +102,8 @@ anomalytables2005 <- c('CHLMD_DR', 'SSUECD_R', 'HSV_DR')
# Internal function to determine if a number is even
.is.even <- function(x) {x %% 2 == 0}

-xpath <- '//*[@id="ContentPlaceHolder1_GridView1"]'
+#xpath <- '//*[@id="ContentPlaceHolder1_GridView1"]'
+xpath <- '//*[@id="GridView1"]'

#------------------------------------------------------------------------------
#' Returns a list of table names for the specified survey group.
@@ -594,7 +596,8 @@ nhanesSearchVarName <- function(varname=NULL, ystart=NULL, ystop=NULL, includerd
warning("Multiple variable names entered. Only the first will be matched.")
}

-xpt <- str_c('//*[@id="ContentPlaceHolder1_GridView1"]/*[td[1]="', varname, '"]', sep='')
+# xpt <- str_c('//*[@id="ContentPlaceHolder1_GridView1"]/*[td[1]="', varname, '"]', sep='')
+xpt <- str_c('//*[@id="GridView1"]/tbody/*[td[1]="', varname, '"]', sep='')
tabletree <- varURL %>% read_html() %>% xml_nodes(xpath=xpt)
ttlist <- lapply(lapply(tabletree, xml_children), xml_text)
# Convert the list to a data frame
5 changes: 4 additions & 1 deletion vignettes/Introducing_nhanesA.Rmd
@@ -44,6 +44,9 @@ nhanesTableVars('EXAM', 'BMX_D')
```

We see that there are 27 columns in table BMX_D. The first column (SEQN) is the respondent sequence number and is included in every NHANES table. Effectively, SEQN is a subject identifier that is used to join information across tables.
+
+### Import NHANES Tables
+
We now import BMX\_D along with the demographics table DEMO\_D.
```{r}
bmx_d <- nhanes('BMX_D')
@@ -100,7 +103,7 @@ names(q2007tables) <- q2007names

### Import Dual X-Ray Absorptiometry Data
Dual X-Ray Absorptiometry (DXA) Data were acquired from 1999-2006. The tables are considerably larger than most
-NHANES data tables and are available via ftp server only. More information may be found at http://www.cdc.gov/nchs/nhanes/dxx/dxa.htm. By default the DXA data are imported into the R environment, however, because the tables are quite large it may be desirable to save the data to a local file then import to R as needed. When nhanesTranslate is applied to DXA data, only the 2005-2006 translation tables are used as those are the only DXA codes that are currently available in html format. Note that in a recent upgrade to the NHANES website it appears that the DXA tables may have been moved or deleted.
+NHANES data tables and are available via ftp server only. More information may be found at http://www.cdc.gov/nchs/nhanes/dxx/dxa.htm. By default the DXA data are imported into the R environment, however, because the tables are quite large it may be desirable to save the data to a local file then import to R as needed. When nhanesTranslate is applied to DXA data, only the 2005-2006 translation tables are used as those are the only DXA codes that are currently available in html format. **Note that in a recent upgrade to the NHANES website it appears that the DXA tables may have been moved or deleted.**
```{r, eval=FALSE}
#Import into R
dxx_b <- nhanesDXA(2001)
46 changes: 31 additions & 15 deletions vignettes/Introducing_nhanesA.html
@@ -12,7 +12,7 @@

<meta name="author" content="Christopher Endres" />

-<meta name="date" content="2016-05-22" />
+<meta name="date" content="2016-07-10" />

<title>Introducing nhanesA</title>

@@ -70,7 +70,7 @@

<h1 class="title toc-ignore">Introducing nhanesA</h1>
<h4 class="author"><em>Christopher Endres</em></h4>
-<h4 class="date"><em>2016-05-22</em></h4>
+<h4 class="date"><em>2016-07-10</em></h4>



@@ -136,7 +136,11 @@ <h3>List Variables in an NHANES Table</h3>
## 25 BMITRI Triceps Skinfold Comment
## 26 BMXSUB Subscapular Skinfold (mm)
## 27 BMISUB Subscapular Skinfold Comment</code></pre>
-<p>We see that there are 27 columns in table BMX_D. The first column (SEQN) is the respondent sequence number and is included in every NHANES table. Effectively, SEQN is a subject identifier that is used to join information across tables. We now import BMX_D along with the demographics table DEMO_D.</p>
+<p>We see that there are 27 columns in table BMX_D. The first column (SEQN) is the respondent sequence number and is included in every NHANES table. Effectively, SEQN is a subject identifier that is used to join information across tables.</p>
+</div>
+<div id="import-nhanes-tables" class="section level3">
+<h3>Import NHANES Tables</h3>
+<p>We now import BMX_D along with the demographics table DEMO_D.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">bmx_d &lt;-<span class="st"> </span><span class="kw">nhanes</span>(<span class="st">'BMX_D'</span>)</code></pre></div>
<pre><code>## Processing SAS dataset BMX_D ..</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">demo_d &lt;-<span class="st"> </span><span class="kw">nhanes</span>(<span class="st">'DEMO_D'</span>)</code></pre></div>
@@ -265,18 +269,30 @@ <h3>Searching for tables by name pattern</h3>
<span class="kw">nhanesSearchTableNames</span>(<span class="st">'BMX'</span>)</code></pre></div>
<pre><code>## [1] &quot;BMX&quot; &quot;BMX_B&quot; &quot;BMX_C&quot; &quot;BMX_D&quot; &quot;BMX_E&quot; &quot;BMX_F&quot; &quot;BMX_G&quot; &quot;BMX_H&quot;</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">nhanesSearchTableNames</span>(<span class="st">'HPVS'</span>, <span class="dt">includerdc=</span><span class="ot">TRUE</span>, <span class="dt">nchar=</span><span class="dv">42</span>, <span class="dt">details=</span><span class="ot">TRUE</span>)</code></pre></div>
-<pre><code>## Years Data.File.Name Doc.File
-## 1 2005-2006 Human Papillomavirus (HPV) - 6, 11, 16 &amp; 1 HPVSER_D Doc
-## 2 2007-2008 Human Papillomavirus (HPV) - 6, 11, 16 &amp; 1 HPVSER_E Doc
-## 3 2009-2010 Human Papillomavirus (HPV) - 6, 11, 16 &amp; 1 HPVSER_F Doc
-## 4 2005-2006 Human Papillomavirus (HPV) - 6, 11, 16 &amp; 1 HPVS_D_R Doc
-## 5 2005-2006 Human Papillomavirus (HPV) - Multiplexed 6 HPVSRM_D Doc
-## 6 2005-2006 Human Papillomavirus (HPV) DNA - Vaginal S HPVSWR_D Doc
-## 7 2007-2008 Human Papillomavirus (HPV) DNA - Vaginal S HPVSWR_E Doc
-## 8 2009-2010 Human Papillomavirus (HPV) DNA - Vaginal S HPVSWR_F Doc
-## 9 2009-2010 Human Papillomavirus (HPV) DNA - Vaginal S HPVS_F_R Doc
-## 10 2011-2012 Human Papillomavirus (HPV) DNA - Vaginal S HPVS_G_R Doc
-## 11 2011-2012 Human Papillomavirus (HPV) DNA - Vaginal S HPVSWR_G Doc
+<pre><code>## X1 X2 Years
+## 1 Search The CDC submit 2005-2006
+## 2 Search The CDC submit 2007-2008
+## 3 Search The CDC submit 2009-2010
+## 4 Search The CDC submit 2005-2006
+## 5 Search The CDC submit 2005-2006
+## 6 Search The CDC submit 2005-2006
+## 7 Search The CDC submit 2007-2008
+## 8 Search The CDC submit 2009-2010
+## 9 Search The CDC submit 2009-2010
+## 10 Search The CDC submit 2011-2012
+## 11 Search The CDC submit 2011-2012
+## Data.File.Name Doc.File
+## 1 Human Papillomavirus (HPV) - 6, 11, 16 &amp; 1 HPVSER_D Doc
+## 2 Human Papillomavirus (HPV) - 6, 11, 16 &amp; 1 HPVSER_E Doc
+## 3 Human Papillomavirus (HPV) - 6, 11, 16 &amp; 1 HPVSER_F Doc
+## 4 Human Papillomavirus (HPV) - 6, 11, 16 &amp; 1 HPVS_D_R Doc
+## 5 Human Papillomavirus (HPV) - Multiplexed 6 HPVSRM_D Doc
+## 6 Human Papillomavirus (HPV) DNA - Vaginal S HPVSWR_D Doc
+## 7 Human Papillomavirus (HPV) DNA - Vaginal S HPVSWR_E Doc
+## 8 Human Papillomavirus (HPV) DNA - Vaginal S HPVSWR_F Doc
+## 9 Human Papillomavirus (HPV) DNA - Vaginal S HPVS_F_R Doc
+## 10 Human Papillomavirus (HPV) DNA - Vaginal S HPVS_G_R Doc
+## 11 Human Papillomavirus (HPV) DNA - Vaginal S HPVSWR_G Doc
## Data.File Date.Published
## 1 HPVSER_D Data [XPT, 151.6 KB] July, 2013
## 2 HPVSER_E Data [XPT, 155.7 KB] November, 2013
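The XPath change in R/nhanes.R can be tried offline with a small sketch. The HTML fragment below is a hypothetical stand-in for the variable list page, not a copy of the real NHANES markup, and it assumes the results table carries `id="GridView1"` with an explicit `<tbody>`, as the new expression implies. The sketch uses `xml2::xml_find_all()` and base `paste0()` in place of the package's `xml_nodes()` and `str_c()` so that it depends only on xml2.

```r
library(xml2)

# Hypothetical page fragment (an assumption for illustration only).
html <- '<html><body>
<table id="GridView1">
  <tbody>
    <tr><th>Variable Name</th><th>Variable Description</th><th>Data File Name</th></tr>
    <tr><td>BMXWT</td><td>Weight (kg)</td><td>BMX_D</td></tr>
    <tr><td>BMXHT</td><td>Standing Height (cm)</td><td>BMX_D</td></tr>
  </tbody>
</table>
</body></html>'

varname <- 'BMXWT'

# Same expression shape as the updated xpt: select rows under tbody
# whose first td cell equals the requested variable name.
xpt <- paste0('//*[@id="GridView1"]/tbody/*[td[1]="', varname, '"]')

rows <- xml_find_all(read_html(html), xpt)

# As in nhanesSearchVarName: collect the cell text of each matching row.
cells <- lapply(lapply(rows, xml_children), xml_text)
print(cells)
```

The header row is skipped because it contains `th` cells rather than `td` cells. If the live page changes its table id again, the same sketch with a saved copy of the page is a quick way to check a candidate expression before editing the package.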
