
**Author:** Alpha Traore, Sr Data Scientist, Biostat Consultants Inc.       
**📁 Project:** CDISC Dataset Preparation     
**📁 Domain:** DM (Demographics)    
**🎯 Purpose:** Mapping Raw Clinical Data to SDTM DM Domain Using R   
  
---

**📌 Overview**

This notebook demonstrates the process of creating the **Demographics (DM) domain** as defined by the CDISC SDTM model. The objective is to use R to program a compliant DM dataset, suitable for clinical trial data submissions and regulatory review.

**Step 1: Load Raw Data Inputs**

In [None]:
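# Packages used below: dplyr (%>%, mutate, joins), stringr (word, str_c, str_sub), haven (write_xpt)
library(dplyr); library(stringr); library(haven)
load("dm_rawdata.RData")
ls()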
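
Because `load()` overwrites any workspace objects that share a name with the file's contents, it can help to load into a dedicated environment first and confirm that the expected raw data frames are present. The sketch below is self-contained: it writes a small stand-in `.RData` file to a temporary location. The file contents and the `expected` list are illustrative assumptions, not the actual contents of `dm_rawdata.RData`.

```r
# Build a tiny stand-in .RData file so the example runs on its own.
# (Hypothetical contents; the real file holds the study's raw data frames.)
demog    <- data.frame(study = "CSG001", pt = "1001")
enrlment <- data.frame(study = "CSG001", pt = "1001")
tmp_file <- tempfile(fileext = ".RData")
save(demog, enrlment, file = tmp_file)
rm(demog, enrlment)

# Load into a separate environment instead of the global workspace.
# load() returns the names of the objects it restored.
raw_env <- new.env()
loaded  <- load(tmp_file, envir = raw_env)

# Compare what was loaded against what the later steps expect.
expected        <- c("demog", "enrlment", "eos")
missing_objects <- setdiff(expected, loaded)
missing_objects
```

A non-empty `missing_objects` signals that a later step would fail with an "object not found" error.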

**Step 2: Check the Raw Demographics Data**

In [None]:
# load("dm_rawdata.RData", nv <- new.env())
# ls(nv)

In [None]:
demog

study,pt,sex,ethnic,race,race2,race3,race4,racesp,age_raw,age_rawu,brthdt_raw,country
<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
CSG001,1001,Male,Hispanic or Latino,White,,,,,35,Years,,USA
CSG001,1002,Female,Not Hispanic or Latino,Asian,American Indian or Alaska Native,,,,40,Years,,USA
CSG001,1003,Male,Hispanic or Latino,Other,,,,BRAZILIAN,40,Years,,USA
CSG001,1004,Male,Hispanic or Latino,White,,,,,38,Years,,USA
CSG001,1005,Male,Not Hispanic or Latino,American Indian or Alaska Native,,,,,64,Years,,USA
CSG001,1006,Female,Not Hispanic or Latino,Native Hawaiian or Other Pacific Islander,,,,,75,Years,,USA
CSG001,1007,Male,Not Hispanic or Latino,Unknown,,,,,32,Years,,USA
CSG001,1008,Female,Not Hispanic or Latino,Not Reported,,,,,83,Years,,USA


**Step 3: Generate the identifier variables and the other variables that map directly from the raw data, without significant derivation**

In [None]:
dm01 <- demog %>%
 rename(race0 = race) %>%
 mutate(
 domain = "DM",
 studyid = study,
 subjid = pt,
 siteid = substr(pt, 1, 2),
 usubjid = paste(study, pt, sep = "-"),
 country = country,
 ethnic = toupper(ethnic),
 non_missing_count = rowSums(across(c(race0, race2, race3, race4), ~ !is.na(.) & . != "")),

 race = ifelse(non_missing_count > 1, "MULTIPLE", toupper(coalesce(race0, race2, race3, race4))),
 racesp = racesp,
 race1 = ifelse(non_missing_count > 1,toupper(race0),""),
 race2 = ifelse(non_missing_count > 1,toupper(race2),""),
 race3 = ifelse(non_missing_count > 1,toupper(race3),""),
 race4 = ifelse(non_missing_count > 1,toupper(race4),""),

 age = ifelse(!is.na(age_raw), as.integer(age_raw), NA_integer_),
 ageu = toupper(age_rawu),
 sex = ifelse(sex == "Female", "F", ifelse(sex == "Male", "M", sex))
 )
 select(dm01, c('domain','studyid','subjid', 'siteid','usubjid','country', 'ethnic','race','age','ageu','sex'))

domain,studyid,subjid,siteid,usubjid,country,ethnic,race,age,ageu,sex
<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<chr>,<chr>
DM,CSG001,1001,10,CSG001-1001,USA,HISPANIC OR LATINO,WHITE,35,YEARS,M
DM,CSG001,1002,10,CSG001-1002,USA,NOT HISPANIC OR LATINO,MULTIPLE,40,YEARS,F
DM,CSG001,1003,10,CSG001-1003,USA,HISPANIC OR LATINO,OTHER,40,YEARS,M
DM,CSG001,1004,10,CSG001-1004,USA,HISPANIC OR LATINO,WHITE,38,YEARS,M
DM,CSG001,1005,10,CSG001-1005,USA,NOT HISPANIC OR LATINO,AMERICAN INDIAN OR ALASKA NATIVE,64,YEARS,M
DM,CSG001,1006,10,CSG001-1006,USA,NOT HISPANIC OR LATINO,NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER,75,YEARS,F
DM,CSG001,1007,10,CSG001-1007,USA,NOT HISPANIC OR LATINO,UNKNOWN,32,YEARS,M
DM,CSG001,1008,10,CSG001-1008,USA,NOT HISPANIC OR LATINO,NOT REPORTED,83,YEARS,F
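
The race logic above counts how many of the raw race columns are filled and collapses the subject to `MULTIPLE` when more than one is. The following base R sketch isolates that rule on toy data. It also treats empty strings as missing when picking the single reported race, which is an assumption about how blanks may arrive in raw data rather than something the notebook's data requires.

```r
# Toy input mirroring two of the raw race columns (values are illustrative).
raw <- data.frame(
  race  = c("White", "Asian", "", NA),
  race2 = c(NA, "American Indian or Alaska Native", "Other", NA),
  stringsAsFactors = FALSE
)
race_cols <- c("race", "race2")

# Count filled cells per row, treating both NA and "" as missing.
is_filled <- !is.na(raw[race_cols]) & raw[race_cols] != ""
n_filled  <- rowSums(is_filled)

# First filled value per row (upper-cased), or NA when nothing is filled.
first_filled <- apply(as.matrix(raw[race_cols]), 1, function(x) {
  x <- x[!is.na(x) & x != ""]
  if (length(x) == 0) NA_character_ else toupper(x[1])
})

# More than one race reported collapses to MULTIPLE.
race_sdtm <- ifelse(n_filled > 1, "MULTIPLE", first_filled)
unname(race_sdtm)
```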


**Step 4: Check the new variables**

In [None]:
select(dm01, c('domain','studyid','subjid', 'siteid','usubjid','country', 'ethnic','race','age','ageu','sex'))

domain,studyid,subjid,siteid,usubjid,country,ethnic,race,age,ageu,sex
<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<chr>,<chr>
DM,CSG001,1001,10,CSG001-1001,USA,HISPANIC OR LATINO,WHITE,35,YEARS,M
DM,CSG001,1002,10,CSG001-1002,USA,NOT HISPANIC OR LATINO,MULTIPLE,40,YEARS,F
DM,CSG001,1003,10,CSG001-1003,USA,HISPANIC OR LATINO,OTHER,40,YEARS,M
DM,CSG001,1004,10,CSG001-1004,USA,HISPANIC OR LATINO,WHITE,38,YEARS,M
DM,CSG001,1005,10,CSG001-1005,USA,NOT HISPANIC OR LATINO,AMERICAN INDIAN OR ALASKA NATIVE,64,YEARS,M
DM,CSG001,1006,10,CSG001-1006,USA,NOT HISPANIC OR LATINO,NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER,75,YEARS,F
DM,CSG001,1007,10,CSG001-1007,USA,NOT HISPANIC OR LATINO,UNKNOWN,32,YEARS,M
DM,CSG001,1008,10,CSG001-1008,USA,NOT HISPANIC OR LATINO,NOT REPORTED,83,YEARS,F


In [None]:
enrlment

study,pt,folder,icdt_raw,icvers,prtvers,enrldt_raw,randdt_raw,randno
<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
CSG001,1001,SCR,1/JAN/2010,1,1,,,
CSG001,1002,SCR,1/JAN/2010,1,1,4/JAN/2010,,
CSG001,1003,SCR,1/JAN/2010,1,1,3/JAN/2010,3/JAN/2010,514876.0
CSG001,1004,SCR,1/JAN/2010,1,1,4/JAN/2010,5/JAN/2010,101415.0
CSG001,1005,SCR,15/JAN/2010,1,1,1/FEB/2010,5/FEB/2010,306185.0
CSG001,1006,SCR,18/FEB/2010,1,1,1/MAR/2010,1/MAR/2010,987435.0
CSG001,1007,SCR,4/APR/2010,2,2,14/APR/2010,14/APR/2010,98745.0
CSG001,1008,SCR,20/JUN/2010,2,3,26/JUN/2010,27/JUN/2010,123098.0


**Derive disposition related variables**

In [None]:
rficdtc <- enrlment %>%
 mutate(
 rficdtc = ifelse(!is.na(icdt_raw), format(as.Date(icdt_raw, format = "%d/%b/%Y"),"%Y-%m-%d"), NA),
 enrldtc = ifelse(!is.na(enrldt_raw), format(as.Date(enrldt_raw, format = "%d/%b/%Y"),"%Y-%m-%d"), NA),
 randdtc = ifelse(!is.na(randdt_raw), format(as.Date(randdt_raw, format = "%d/%b/%Y"),"%Y-%m-%d"), NA)
 ) %>%
 select(study, pt, rficdtc, enrldtc, randdtc)

rfendtc <- eos %>%
 filter(eoscat == "End of Study") %>%
 mutate(rfendtc = ifelse(!is.na(eostdt_raw), format(as.Date(eostdt_raw, format = "%d/%b/%Y"),"%Y-%m-%d"), NA)) %>%
 select(study, pt, rfendtc)

dthdtc <- eos %>%
 filter(eoscat == "End of Study" & eoterm == "Death") %>%
 mutate(dthdtc = ifelse(!is.na(eostdt_raw), format(as.Date(eostdt_raw, format = "%d/%b/%Y"),"%Y-%m-%d"), NA), dthfl = "Y") %>%
 select(study, pt, dthdtc, dthfl)
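
The same `as.Date(..., format = "%d/%b/%Y")` conversion is repeated for each date above, and `%b` depends on the session locale for month names. As a hedged alternative, the sketch below defines a hypothetical helper, `to_iso_date()`, that matches the month against R's built-in English `month.abb` constant instead. It does not validate day ranges (for example, it would accept day 31 in February).

```r
# Hypothetical helper: convert "D/MON/YYYY" text to ISO 8601 "YYYY-MM-DD"
# without relying on the session locale for month names.
to_iso_date <- function(x) {
  parts <- strsplit(ifelse(is.na(x), "", x), "/", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) != 3) return(NA_character_)
    day   <- suppressWarnings(as.integer(p[1]))
    month <- match(toupper(p[2]), toupper(month.abb))
    year  <- suppressWarnings(as.integer(p[3]))
    # Any unparseable component yields a missing result.
    if (is.na(day) || is.na(month) || is.na(year)) return(NA_character_)
    sprintf("%04d-%02d-%02d", year, month, day)
  }, character(1))
}

iso_dates <- to_iso_date(c("1/JAN/2010", "15/JAN/2010", NA, ""))
iso_dates
```

With such a helper, each `ifelse(!is.na(...), format(as.Date(...)), NA)` expression could shrink to a single call, since missing inputs already map to `NA`.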


In [None]:
rficdtc

study,pt,rficdtc,enrldtc,randdtc
<chr>,<chr>,<chr>,<chr>,<chr>
CSG001,1001,2010-01-01,,
CSG001,1002,2010-01-01,2010-01-04,
CSG001,1003,2010-01-01,2010-01-03,2010-01-03
CSG001,1004,2010-01-01,2010-01-04,2010-01-05
CSG001,1005,2010-01-15,2010-02-01,2010-02-05
CSG001,1006,2010-02-18,2010-03-01,2010-03-01
CSG001,1007,2010-04-04,2010-04-14,2010-04-14
CSG001,1008,2010-06-20,2010-06-26,2010-06-27


In [None]:
rfendtc

study,pt,rfendtc
<chr>,<chr>,<chr>
CSG001,1002,2010-01-05
CSG001,1003,2010-01-05
CSG001,1004,2010-02-28
CSG001,1006,2010-03-25
CSG001,1007,2010-06-12
CSG001,1008,2010-08-18


In [None]:
dthdtc

study,pt,dthdtc,dthfl
<chr>,<chr>,<chr>,<chr>
CSG001,1003,2010-01-05,Y


**Get Exposure Related Variables**

In [None]:
exp01 <- ipadmin %>%
 filter(as.integer(ipqty_raw) > 0) %>%
 mutate(
 ipstdtc = as.Date(ipstdt_raw, format = "%d/%b/%Y"),
 ipsttm = format(as.POSIXct(ipsttm_raw, format = "%H:%M", tz = ""),"%H:%M"),
 tempdtc = paste(ipstdtc, ipsttm, sep = "T")
 ) %>%
 select(study, pt, tempdtc, ipboxid)

# Earliest treatment date
rfxstdtc <- exp01 %>%
 arrange(study,pt,tempdtc) %>%
 group_by(study, pt) %>%
 slice(1) %>%
 mutate(rfxstdtc = tempdtc)

# Latest treatment date
rfxendtc <- exp01 %>%
 arrange(study,pt,tempdtc) %>%
 group_by(study, pt) %>%
 slice(n()) %>%
 mutate(rfxendtc = tempdtc)
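
Sorting `tempdtc` as text works here because every value shares the same `YYYY-MM-DDTHH:MM` layout, so alphabetical order matches chronological order. The base R sketch below shows the same first-dose and last-dose idea with `tapply()`, using a few of the values shown in this notebook in shuffled order. One caveat worth checking in real data: if a time were missing, `paste()` would produce a literal `NA` inside the string.

```r
# A few dosing records, deliberately out of order.
exposure <- data.frame(
  pt      = c("1004", "1004", "1004", "1005", "1005"),
  tempdtc = c("2010-01-12T08:35", "2010-01-05T08:35", "2010-01-25T08:45",
              "2010-02-12T08:30", "2010-02-05T08:46"),
  stringsAsFactors = FALSE
)

# Character min/max are safe only because all values have the same layout;
# mixed precision (date-only vs. date-time) would need extra care.
first_dose <- tapply(exposure$tempdtc, exposure$pt, min)
last_dose  <- tapply(exposure$tempdtc, exposure$pt, max)

dose_window <- data.frame(
  pt       = names(first_dose),
  rfxstdtc = as.vector(first_dose),
  rfxendtc = as.vector(last_dose),
  stringsAsFactors = FALSE
)
dose_window
```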

In [None]:
exp01

study,pt,tempdtc,ipboxid
<chr>,<chr>,<chr>,<chr>
CSG001,1004,2010-01-05T08:35,13434371
CSG001,1004,2010-01-12T08:35,52970539
CSG001,1004,2010-01-18T09:30,52120567
CSG001,1004,2010-01-25T08:45,59305202
CSG001,1005,2010-02-05T08:46,13787377
CSG001,1005,2010-02-12T08:30,65580239
CSG001,1006,2010-03-02T08:30,39024101
CSG001,1006,2010-03-10T08:30,65845489
CSG001,1007,2010-04-15T08:23,66223983
CSG001,1007,2010-04-22T09:00,71763169


In [None]:
rfxstdtc

study,pt,tempdtc,ipboxid,rfxstdtc
<chr>,<chr>,<chr>,<chr>,<chr>
CSG001,1004,2010-01-05T08:35,13434371,2010-01-05T08:35
CSG001,1005,2010-02-05T08:46,13787377,2010-02-05T08:46
CSG001,1006,2010-03-02T08:30,39024101,2010-03-02T08:30
CSG001,1007,2010-04-15T08:23,66223983,2010-04-15T08:23
CSG001,1008,2010-06-27T08:45,68891589,2010-06-27T08:45


In [None]:
rfxendtc

study,pt,tempdtc,ipboxid,rfxendtc
<chr>,<chr>,<chr>,<chr>,<chr>
CSG001,1004,2010-01-25T08:45,59305202,2010-01-25T08:45
CSG001,1005,2010-02-12T08:30,65580239,2010-02-12T08:30
CSG001,1006,2010-03-10T08:30,65845489,2010-03-10T08:30
CSG001,1007,2010-05-06T08:12,68706162,2010-05-06T08:12
CSG001,1008,2010-07-11T09:20,3199027,2010-07-11T09:20


**Derive Planned and Actual Arm related variables**

In [None]:
randno <- enrlment %>%
 filter(!is.na(randno) & randno!="") %>%
 select(study, pt, randno)

rand01 <- rand %>%
 mutate(
 armcd = tx_cd,
 arm = ifelse(armcd == "ACTIVE", "Active", ifelse(armcd == "PBO", "Placebo", NA_character_))
 ) %>%
 select(armcd, arm, randno=rand_id)

armcd <- randno %>%
 left_join(rand01, by = "randno")
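
A `left_join()` on `randno` keeps unmatched subjects with `NA` arms and repeats a subject's row if the randomization number appears more than once in the lookup, in both cases without an error. A minimal sketch of three pre-join checks follows, in base R with `merge()` standing in for the join. The data frames `subjects` and `lookup` are illustrative stand-ins for `randno` and `rand01`.

```r
# Illustrative stand-ins for the subject list and the randomization lookup.
subjects <- data.frame(
  pt     = c("1003", "1004", "1005"),
  randno = c("514876", "101415", "306185"),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  randno = c("514876", "101415", "306185", "987435"),
  armcd  = c("PBO", "ACTIVE", "ACTIVE", "PBO"),
  stringsAsFactors = FALSE
)

# 1. The key should be unique in the lookup, or rows will multiply.
dup_keys <- lookup$randno[duplicated(lookup$randno)]

# 2. Every randomized subject should find a match, or the arm will be NA.
unmatched <- setdiff(subjects$randno, lookup$randno)

# 3. Both keys should have the same type (character vs. numeric keys do not match).
same_type <- identical(class(subjects$randno), class(lookup$randno))

joined <- merge(subjects, lookup, by = "randno", all.x = TRUE)
nrow(joined) == nrow(subjects)
```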

In [None]:
randno

study,pt,randno
<chr>,<chr>,<chr>
CSG001,1003,514876
CSG001,1004,101415
CSG001,1005,306185
CSG001,1006,987435
CSG001,1007,98745
CSG001,1008,123098


In [None]:
rand01

armcd,arm,randno
<chr>,<chr>,<chr>
PBO,Placebo,514876
ACTIVE,Active,101415
ACTIVE,Active,306185
PBO,Placebo,987435
PBO,Placebo,98745
ACTIVE,Active,123098


In [None]:
armcd

study,pt,randno,armcd,arm
<chr>,<chr>,<chr>,<chr>,<chr>
CSG001,1003,514876,PBO,Placebo
CSG001,1004,101415,ACTIVE,Active
CSG001,1005,306185,ACTIVE,Active
CSG001,1006,987435,PBO,Placebo
CSG001,1007,98745,PBO,Placebo
CSG001,1008,123098,ACTIVE,Active


**Derive actual arm related variables**

In [None]:
actarmcd01 <- rfxstdtc

# Create 'box01' data frame
box01 <- box %>%
 mutate(
 ipboxid = kitid,
 actarmcd = case_when(
 content == "ACTIVE" ~ "ACTIVE",
 content == "PBO" ~ "PBO",
 TRUE ~ NA_character_
 ),
 actarm = case_when(
 content == "ACTIVE" ~ "Active",
 content == "PBO" ~ "Placebo",
 TRUE ~ NA_character_
 )
 )

# Merge 'actarmcd01' and 'box01' data frames by 'ipboxid'
actarmcd <- left_join(actarmcd01, box01, by = "ipboxid") %>%
 filter(!is.na(actarmcd)) %>%
 select(study, pt, actarmcd, actarm)

In [None]:
actarmcd

study,pt,actarmcd,actarm
<chr>,<chr>,<chr>,<chr>
CSG001,1004,ACTIVE,Active
CSG001,1005,PBO,Placebo
CSG001,1006,PBO,Placebo
CSG001,1007,PBO,Placebo
CSG001,1008,ACTIVE,Active
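
Since the planned arm comes from the randomization list and the actual arm from the first kit dispensed, the two can disagree. A short sketch of how such subjects could be listed for review, using three of the subjects shown in this notebook (the data frame name `arms` is illustrative):

```r
# Planned (armcd) and actual (actarmcd) codes for three subjects.
arms <- data.frame(
  pt       = c("1004", "1005", "1006"),
  armcd    = c("ACTIVE", "ACTIVE", "PBO"),
  actarmcd = c("ACTIVE", "PBO", "PBO"),
  stringsAsFactors = FALSE
)

# Subjects whose first dispensed kit does not match the randomized arm.
mismatch <- arms[arms$armcd != arms$actarmcd, ]
mismatch
```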


**Derive RFPENDTC (Date/Time of End of Participation)**

In [None]:
# Combine the raw date variables into 'alldates01' data frame
alldates01 <- bind_rows(
 adverse %>% select(study, pt, date = aestdt_raw),
 adverse %>% select(study, pt, date = aeendt_raw),
 adverse %>% select(study, pt, date = hadmtdt_raw),
 adverse %>% select(study, pt, date = hdsdt_raw),
 conmeds %>% select(study, pt, date = cmstdt_raw),
 conmeds %>% select(study, pt, date = cmendt_raw),
 ecg %>% select(study, pt, date = egdt_raw),
 enrlment %>% select(study, pt, date = icdt_raw),
 enrlment %>% select(study, pt, date = enrldt_raw),
 enrlment %>% select(study, pt, date = randdt_raw),
 eos %>% select(study, pt, date = eostdt_raw),
 eoip %>% select(study, pt, date = eostdt_raw),
 eq5d3l %>% select(study, pt, date = dt_raw),
 hosp %>% select(study, pt, date = stdt_raw),
 hosp %>% select(study, pt, date = endt_raw),
 ipadmin %>% select(study, pt, date = ipstdt_raw),
 lab_chem %>% select(study, pt, date = lbdt_raw),
 lab_hema %>% select(study, pt, date = lbdt_raw),
 physmeas %>% select(study, pt, date = pmdt_raw),
 surg %>% select(study, pt, date = surgdt_raw),
 vitals %>% select(study, pt, date = vsdt_raw)
)

In [None]:
alldates01

study,pt,date
<chr>,<chr>,<chr>
CSG001,1001,01/JAN/2010
CSG001,1003,05/JAN/2010
CSG001,1004,01/JAN/2010
CSG001,1004,03/JAN/2010
CSG001,1004,08/JAN/2010
CSG001,1004,10/JAN/2010
CSG001,1005,18/FEB/2010
CSG001,1006,UN/MAR/2010
CSG001,1007,9/MAY/2010
CSG001,1001,01/JAN/2010


In [None]:
# Process the date variables to create ISO 8601 dates in the 'alldates02' data frame
alldates02 <- alldates01 %>%
 mutate(
     dayn = suppressWarnings(as.numeric(word(date, 1, sep='/'))),
     daync = sprintf("%02d", dayn),
     day =suppressWarnings(as.numeric(word(date, 1, sep='/'))),

     monthc = toupper(word(date, 2, sep='/')),
     month = case_when(
         monthc == "JAN" ~ "01",
         monthc == "FEB" ~ "02",
         monthc == "MAR" ~ "03",
         monthc == "APR" ~ "04",
         monthc == "MAY" ~ "05",
         monthc == "JUN" ~ "06",
         monthc == "JUL" ~ "07",
         monthc == "AUG" ~ "08",
         monthc == "SEP" ~ "09",
         monthc == "OCT" ~ "10",
         monthc == "NOV" ~ "11",
         monthc == "DEC" ~ "12",
         TRUE ~ "-"
     ),

 year = word(date,3,sep='/'),
 year = if_else(toupper(year) == "UNK", "-", year),

 # datec = str_c(year, month, day, sep = "-"),
 datec = str_c(year, month, daync, sep = "-"),

 datec = ifelse(str_sub(datec, -5) == "-----", str_sub(datec, end = -6), datec),
 datec = ifelse(str_sub(datec, -4) == "----", str_sub(datec, end = -5), datec),
 datec = ifelse(str_sub(datec, -2) == "--", str_sub(datec, end = -3), datec)

 )


# Drop rows with any NA; this also discards partial dates such as "UN/MAR/2010" (dayn is NA)
alldates03 <- na.omit(alldates02)


# Pick the latest non-missing date for each subject
rfpendtc <- alldates03 %>%
 filter(!is.na(datec) & datec != "") %>%
 arrange(study, pt, datec) %>%
 group_by(study, pt) %>%
 slice(n()) %>%
 ungroup() %>%
 select(study, pt, rfpendtc = datec)
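
In the chain above, a date with an unknown day such as `UN/MAR/2010` ends up with a missing `dayn` and is removed by `na.omit()`. If partial dates should instead be kept in truncated ISO 8601 form (for example `2010-03`), one possible approach is sketched below. The helper `to_iso_partial()` is hypothetical, and truncating at the first unknown component is a simplifying assumption.

```r
# Hypothetical helper: build an ISO 8601 date, truncating at the first
# unknown component ("2010-03" for an unknown day, "2010" for an unknown month).
to_iso_partial <- function(x) {
  parts <- strsplit(ifelse(is.na(x), "", x), "/", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) != 3) return(NA_character_)
    day   <- suppressWarnings(as.integer(p[1]))
    month <- match(toupper(p[2]), toupper(month.abb))
    year  <- suppressWarnings(as.integer(p[3]))
    if (is.na(year))  return(NA_character_)
    if (is.na(month)) return(sprintf("%04d", year))
    if (is.na(day))   return(sprintf("%04d-%02d", year, month))
    sprintf("%04d-%02d-%02d", year, month, day)
  }, character(1))
}

partials <- to_iso_partial(c("9/MAY/2010", "UN/MAR/2010", "UN/UNK/2010", "UN/UNK/UNK"))
partials
```

Note that a truncated value sorts before any complete date in the same month, so picking the "latest" date from a mix of partial and complete values needs a rule agreed for the study.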

In [None]:
alldates03

study,pt,date,dayn,daync,day,monthc,month,year,datec
<chr>,<chr>,<chr>,<dbl>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>
CSG001,1001,01/JAN/2010,1,01,1,JAN,01,2010,2010-01-01
CSG001,1003,05/JAN/2010,5,05,5,JAN,01,2010,2010-01-05
CSG001,1004,01/JAN/2010,1,01,1,JAN,01,2010,2010-01-01
CSG001,1004,03/JAN/2010,3,03,3,JAN,01,2010,2010-01-03
CSG001,1004,08/JAN/2010,8,08,8,JAN,01,2010,2010-01-08
CSG001,1004,10/JAN/2010,10,10,10,JAN,01,2010,2010-01-10
CSG001,1005,18/FEB/2010,18,18,18,FEB,02,2010,2010-02-18
CSG001,1007,9/MAY/2010,9,09,9,MAY,05,2010,2010-05-09
CSG001,1001,01/JAN/2010,1,01,1,JAN,01,2010,2010-01-01
CSG001,1003,05/JAN/2010,5,05,5,JAN,01,2010,2010-01-05


In [None]:
rfpendtc

study,pt,rfpendtc
<chr>,<chr>,<chr>
CSG001,1001,2010-01-01
CSG001,1002,2010-01-05
CSG001,1003,2010-01-05
CSG001,1004,2010-02-28
CSG001,1005,2020-02-20
CSG001,1006,2010-03-25
CSG001,1007,2010-06-12
CSG001,1008,2010-08-18


**Merge all datasets together**

In [None]:
dm02 <- dm01 %>%
 left_join(rficdtc, by = c("study", "pt")) %>%
 left_join(dthdtc, by = c("study", "pt")) %>%
 left_join(rfendtc, by = c("study", "pt")) %>%
 left_join(rfxstdtc, by = c("study", "pt")) %>%
 left_join(rfxendtc, by = c("study", "pt")) %>%
 left_join(armcd, by = c("study", "pt")) %>%
 left_join(actarmcd, by = c("study", "pt")) %>%
 left_join(rfpendtc, by = c("study", "pt"))
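
DM must hold exactly one record per subject, so each data frame joined above needs at most one row per `study` and `pt`. The sketch below shows, on toy data, how a lookup with repeated keys inflates the result and how the affected subjects can be listed afterwards. It uses base R `merge()` in place of `left_join()`, and all object names are illustrative.

```r
# Two subjects, and a lookup that (wrongly) has two rows for subject 1002.
dm_like <- data.frame(study = "CSG001", pt = c("1001", "1002"),
                      stringsAsFactors = FALSE)
doses   <- data.frame(study = "CSG001", pt = c("1002", "1002"),
                      dose = c(1, 2), stringsAsFactors = FALSE)

merged <- merge(dm_like, doses, by = c("study", "pt"), all.x = TRUE)

# Subjects that now appear more than once.
key          <- paste(merged$study, merged$pt, sep = "-")
dup_subjects <- unique(key[duplicated(key)])
dup_subjects
```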

**Derive additional variables which are dependent on other derived variables**

In [None]:
dm03 <- dm02 %>%
 mutate(
 rfstdtc = substr(rfxstdtc, 1, 10),
 rfstdtc = ifelse(is.na(rfstdtc) & !is.na(randdtc), randdtc, rfstdtc),
 rfstdtc = ifelse(is.na(rfstdtc) & !is.na(rficdtc), rficdtc, rfstdtc),

 armcd = case_when(
 is.na(enrldtc) ~ "SCRNFAIL",
 is.na(randdtc) ~ "NOTASSGN",
 TRUE ~ armcd
 ),

 arm = case_when(
 armcd =="SCRNFAIL" ~ "Screen Failure",
 armcd == "NOTASSGN" ~ "Not Assigned",
 TRUE ~ arm),

 actarmcd = case_when(
 is.na(enrldtc) ~ "SCRNFAIL",
 is.na(randdtc) ~ "NOTASSGN",
 is.na(rfxstdtc) ~ "NOTTRT",
 TRUE ~ actarmcd
 ),

 actarm = case_when(
 actarmcd =="SCRNFAIL" ~ "Screen Failure",
 actarmcd == "NOTASSGN" ~ "Not Assigned",
 actarmcd == "NOTTRT" ~ "Not Treated",
 TRUE ~ actarm),

 ) %>%
 rename_with(toupper)

# Keep only the required variables, in the required order
varlist <- c(
 'STUDYID', 'DOMAIN', 'USUBJID', 'SUBJID', 'RFSTDTC', 'RFENDTC', 'RFXSTDTC', 'RFXENDTC',
 'RFICDTC', 'RFPENDTC', 'DTHDTC', 'DTHFL', 'SITEID', 'AGE', 'AGEU', 'SEX', 'RACE', 'ETHNIC',
 'ARMCD', 'ARM', 'ACTARMCD', 'ACTARM', 'COUNTRY'
)

dm <- dm03 %>%
 select(all_of(varlist))

output <- dm
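
Before saving, a few structural checks on the final dataset can catch common slips: a variable listed twice, a listed variable that is absent, a repeated `USUBJID`, or a `DTHFL` value other than `Y` or missing. The sketch below runs these on a toy data frame. The names `dm_check` and `wanted` are illustrative, and `wanted` deliberately repeats one entry to show the first check firing.

```r
# Toy stand-in for the final dataset (values are illustrative).
dm_check <- data.frame(
  STUDYID = "CSG001",
  DOMAIN  = "DM",
  USUBJID = c("CSG001-1001", "CSG001-1002", "CSG001-1003"),
  DTHFL   = c(NA, NA, "Y"),
  stringsAsFactors = FALSE
)
# Variable list with one entry repeated on purpose.
wanted <- c("STUDYID", "DOMAIN", "USUBJID", "DTHFL", "USUBJID")

checks <- list(
  repeated_in_list  = unique(wanted[duplicated(wanted)]),
  missing_in_data   = setdiff(wanted, names(dm_check)),
  duplicate_usubjid = dm_check$USUBJID[duplicated(dm_check$USUBJID)],
  bad_dthfl         = setdiff(dm_check$DTHFL, c("Y", NA))
)
checks
```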


In [None]:
View(dm)

STUDYID,DOMAIN,USUBJID,SUBJID,RFSTDTC,RFENDTC,RFXSTDTC,RFXENDTC,RFICDTC,RFPENDTC,⋯,AGE,AGEU,SEX,RACE,ETHNIC,ARMCD,ARM,ACTARMCD,ACTARM,COUNTRY
<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,⋯,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
CSG001,DM,CSG001-1001,1001,2010-01-01,,,,2010-01-01,2010-01-01,⋯,35,YEARS,M,WHITE,HISPANIC OR LATINO,SCRNFAIL,Screen Failure,SCRNFAIL,Screen Failure,USA
CSG001,DM,CSG001-1002,1002,2010-01-01,2010-01-05,,,2010-01-01,2010-01-05,⋯,40,YEARS,F,MULTIPLE,NOT HISPANIC OR LATINO,NOTASSGN,Not Assigned,NOTASSGN,Not Assigned,USA
CSG001,DM,CSG001-1003,1003,2010-01-03,2010-01-05,,,2010-01-01,2010-01-05,⋯,40,YEARS,M,OTHER,HISPANIC OR LATINO,PBO,Placebo,NOTTRT,Not Treated,USA
CSG001,DM,CSG001-1004,1004,2010-01-05,2010-02-28,2010-01-05T08:35,2010-01-25T08:45,2010-01-01,2010-02-28,⋯,38,YEARS,M,WHITE,HISPANIC OR LATINO,ACTIVE,Active,ACTIVE,Active,USA
CSG001,DM,CSG001-1005,1005,2010-02-05,,2010-02-05T08:46,2010-02-12T08:30,2010-01-15,2020-02-20,⋯,64,YEARS,M,AMERICAN INDIAN OR ALASKA NATIVE,NOT HISPANIC OR LATINO,ACTIVE,Active,PBO,Placebo,USA
CSG001,DM,CSG001-1006,1006,2010-03-02,2010-03-25,2010-03-02T08:30,2010-03-10T08:30,2010-02-18,2010-03-25,⋯,75,YEARS,F,NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER,NOT HISPANIC OR LATINO,PBO,Placebo,PBO,Placebo,USA
CSG001,DM,CSG001-1007,1007,2010-04-15,2010-06-12,2010-04-15T08:23,2010-05-06T08:12,2010-04-04,2010-06-12,⋯,32,YEARS,M,UNKNOWN,NOT HISPANIC OR LATINO,PBO,Placebo,PBO,Placebo,USA
CSG001,DM,CSG001-1008,1008,2010-06-27,2010-08-18,2010-06-27T08:45,2010-07-11T09:20,2010-06-20,2010-08-18,⋯,83,YEARS,F,NOT REPORTED,NOT HISPANIC OR LATINO,ACTIVE,Active,ACTIVE,Active,USA


In [None]:
save(dm, file = "C:/Users/Waraba/Desktop/Ckinical Trial Training Materiels/SDTM_withR/SDTM_EXAMPLE/SDTM_FINAL_DATA/dm.RData")

In [None]:
write.csv(dm, "C:/Users/Waraba/Desktop/Ckinical Trial Training Materiels/SDTM_withR/SDTM_EXAMPLE/SDTM_FINAL_DATA/dm.csv", row.names = FALSE)

In [None]:
write_xpt(dm, "dm.xpt")
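
The SAS transport version 5 format limits variable names to 8 characters and character values to 200 bytes. A hedged pre-export check in base R is sketched below. The function `check_xpt_v5()` is hypothetical and covers only these two limits, not labels or other submission rules.

```r
# Hypothetical check for two XPT v5 limits: names of at most 8 upper-case
# characters, and character values of at most 200 bytes.
check_xpt_v5 <- function(df) {
  nm        <- names(df)
  bad_names <- nm[!grepl("^[A-Z_][A-Z0-9_]{0,7}$", nm)]
  is_chr    <- vapply(df, is.character, logical(1))
  max_bytes <- vapply(df[is_chr], function(col) {
    n <- nchar(col[!is.na(col)], type = "bytes")
    if (length(n) == 0) 0L else max(n)
  }, integer(1))
  list(bad_names = bad_names,
       too_long  = names(max_bytes)[max_bytes > 200])
}

# Toy data frame with one name that breaks the limit (illustrative).
toy <- data.frame(USUBJID = "CSG001-1001", rfpendtc_long = "2010-01-01",
                  stringsAsFactors = FALSE)
result <- check_xpt_v5(toy)
result
```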

In [None]:
colnames(dm)

## 📬 Contacts

Your comments and questions are valued and encouraged. Please feel free to contact the author:

**Alpha Traore**  
Sr Data Scientist  
Biostat Consultants Inc.  
312 Ridgewood Pl  
Fort Thomas, KY 41075  
📧 alpha.s.traore@wmich.edu  
🔗 [LinkedIn Profile](https://www.linkedin.com/in/alphatraore)