# Demographics Recoding Examples

In [1]:
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


### Simulate some data

In [2]:
# Simulated student records; sample() is unseeded, so the counts change on every run
df <- data.frame(
  PIDM = 1:50,
  gender = sample(c('M', 'W', 'N'), 50, replace = TRUE),
  raceethn = sample(
    c('Non-Resident Alien', 'Hispanic', 'American Indian or Alaskan Native',
      'Asian', 'Black or African American', 'White', 'More Than One Race', 'Unknown'),
    50, replace = TRUE
  )
)

head(df)

,PIDM,gender,raceethn
,<int>,<chr>,<chr>
1,1,M,Black or African American
2,2,W,American Indian or Alaskan Native
3,3,M,American Indian or Alaskan Native
4,4,W,More Than One Race
5,5,M,Black or African American
6,6,W,American Indian or Alaskan Native
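
The simulation above calls `sample()` without a seed, so the rows and counts printed in this notebook will differ on a re-run. A minimal sketch of making the draws repeatable with `set.seed()` follows; the seed value `2024` and the names `genders_a` and `genders_b` are arbitrary choices for illustration, not part of the original notebook.

```r
# Fix the random number generator state so sample() returns the same draws.
set.seed(2024)  # arbitrary seed, assumed here only for illustration
genders_a <- sample(c("M", "W", "N"), 50, replace = TRUE)

# Resetting the same seed replays the same sequence of draws.
set.seed(2024)
genders_b <- sample(c("M", "W", "N"), 50, replace = TRUE)

# Both vectors hold exactly the same 50 values.
identical(genders_a, genders_b)
```

Placing a single `set.seed()` call before the `data.frame()` call would be enough to make the whole simulated data set reproducible.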


In [3]:
table(df$gender, useNA = 'ifany')


 M  N  W 
18 16 16 

In [4]:
table(df$raceethn, useNA = 'ifany')


American Indian or Alaskan Native                             Asian 
                                6                                 8 
        Black or African American                          Hispanic 
                                6                                 5 
               More Than One Race                Non-Resident Alien 
                                8                                 7 
                          Unknown                             White 
                                8                                 2 

### Example 1: Simple substitution

In [5]:
# Recode "W" to "F"; .default keeps every other value unchanged
df %>%
  mutate(
    gender = case_match(
      gender,
      "W" ~ "F",
      .default = gender
    )
  ) %>%
  count(gender)

gender,n
<chr>,<int>
F,16
M,18
N,16
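
The same `case_match()` pattern can also map several old values to one new value by putting a vector on the left-hand side of `~`. The sketch below uses a small hypothetical tibble (`toy`) rather than the notebook's `df`, and the grouping label `"Other/Unknown"` is an assumption made up for illustration, not a recommended reporting category.

```r
library(dplyr)

# Hypothetical toy data, not the simulated df from this notebook
toy <- tibble(raceethn = c("Asian", "Unknown", "White", "Non-Resident Alien"))

recoded <- toy %>%
  mutate(
    race_group = case_match(
      raceethn,
      # A vector on the left-hand side matches any of its values
      c("Unknown", "Non-Resident Alien") ~ "Other/Unknown",
      # Everything else keeps its original label
      .default = raceethn
    )
  )

recoded
```

Without `.default`, unmatched values would become `NA`, which is why both examples pass the original column back in.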


### Example 2: Create levels for Race/Ethnicity even when they don't exist in the data

In [6]:
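# Convert to a factor first: levels() returns NULL for a character vector,
# so without this step every value in FedRace would become NA.
df$raceethn <- factor(df$raceethn)
# Append "Pac Isl" as a level even though no row has that value
df$FedRace <- factor(df$raceethn, levels = c(levels(df$raceethn), "Pac Isl"))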

table(df$FedRace)


American Indian or Alaskan Native                             Asian 
                                6                                 8 
        Black or African American                          Hispanic 
                                6                                 5 
               More Than One Race                Non-Resident Alien 
                                8                                 7 
                          Unknown                             White 
                                8                                 2 
                          Pac Isl 
                                0 
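
For a tidyverse-style version of the same idea, one option is `forcats::fct_expand()` to append the level and `dplyr::count()` with `.drop = FALSE` to keep the zero-count row. This is a sketch on a small hypothetical tibble (`toy`), not the notebook's `df`; it swaps in `fct_expand()` for the base `factor(..., levels = ...)` call used above.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data, not the simulated df from this notebook
toy <- tibble(raceethn = c("Asian", "White", "Asian"))

toy <- toy %>%
  # fct_expand() appends new levels after the existing ones
  mutate(FedRace = fct_expand(factor(raceethn), "Pac Isl"))

# .drop = FALSE keeps factor levels that have no rows
counts <- toy %>%
  count(FedRace, .drop = FALSE)

counts
```

By default `count()` drops empty factor levels, so the `.drop = FALSE` argument is what makes the unused `Pac Isl` level appear with `n = 0`, matching what `table()` shows.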