Skip to content

This SAS macro displays the structure of a SAS dataset. All variables will be parsed regarding type, real length and display format (including no. of decimal places)

Notifications You must be signed in to change notification settings

clindocu/sas-macros-r-str2

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 

Repository files navigation

sas-macros-r-str2

%str2()

The SAS macro %str2() displays the structure of a SAS dataset (in the SAS log).

The macro is similar to R function utils:::str(object) that compactly displays the structure of an R object.

All variables in the SAS dataset will be parsed regarding type, real length and display format (including no. of decimal places)

Example

%include "gather2.sas";

%str2(ADSL);

NOTE: the %gather2() macro will be invoked within the macro %str2() and must be available.

Result (SAS Log) - Example

'dataframe':   180 obs. of   24 variables:
 $ STUDYID : chr ALEA01 ALEA01 ALEA01 ALEA01 ALEA01 ALEA01 ALEA01 ALEA01 ALEA01...
           : SAS Format: $7. (real length: 6)
           : Label: "Study Identifier"
 $ USUBJID : chr ALEA01-01-001 ALEA01-01-002 ALEA01-01-003 ALEA01-01-004 ALEA01...
           : SAS Format: $14. (real length: 13)
           : Label: "Unique Subject Identifier"
 $ SUBJID  : chr 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 01...
           : SAS Format: $3. (real length: 3)
           : Label: "Subject Identifier for the Study"
 $ SITEID  : chr 01 01 01 01 01 01 01 01 01 01 01 02 02 02 02 02 02 02 02 02 02...
           : SAS Format: $2. (real length: 2)
           : Label: "Study Site Identifier"
 $ AGE     : int 35 85 63 46 42 79 49 51 37 55 44 88 60 60 48 74 37 28 49 67 59...
           : SAS Format: 8. ("integer:", length: 2)
           : Label: "Age"
 $ AGEU    : chr YEARS YEARS YEARS YEARS YEARS YEARS YEARS YEARS YEARS YEARS YE...
           : SAS Format: $5. (real length: 5)
           : Label: "Age Units"
 $ AGEGR1  : chr <65 >=65 <65 <65 <65 >=65 <65 <65 <65 <65 <65 >=65 <65 <65 <65...
           : SAS Format: $4. (real length: 4)
           : Label: "Pooled Age Group 1"
 $ AGEGR1N : int 1 2 1 1 1 2 1 1 1 1 1 2 1 1 1 2 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1...
           : SAS Format: 8. ("integer:", length: 1)
           : Label: "Pooled Age Group 1 (N)"
 $ SEX     : Factor w/ 2 levels "Female", "Male" 2 1 2 NA 2 1 2 1 1 2 2 2 1...
           : SAS Format: $SEX.
           : Label: "Sex"
 $ SEXN    : Factor w/ 2 levels "Female", "Male" 2 1 2 NA 2 1 2 1 1 2 2 2 1...
           : SAS Format: SEXN.
           : Label: "Sex (N)"
 $ RACE    : Factor w/ 4 levels "White", "Black or African... 1 1 1 1 1 1 1 1 1 1 1 1 1...
           : SAS Format: $RACE.
           : Label: "Race"
 $ RACEN   : Factor w/ 4 levels "White", "Black or African... 1 1 1 1 1 1 1 1 1 1 1 1 1...
           : SAS Format: RACEN.
           : Label: "Race (N)"
 $ WEIGHTBL: num 50.8 63.9 63.4 83.5 116.3 58.3 87.2 54.1 66.3 54.8 74.3 77.8 5...
           : SAS Format: BEST12. ("float:", display format: 5.1)
           : Label: "Weight at Baseline (kg)"
 $ L10WTBL : num 1.705864 1.805501 1.802089 1.921686 2.06558 1.765669 1.940516...
           : SAS Format: BEST8. ("float:", display format: 8.6)
           : Label: "Log10(Weight (kg))"
 $ HEIGHT  : int 166 166 172 173 184 162 170 160 177 172 169 175 159 172 159 17...
           : SAS Format: 8. ("integer:", length: 3)
           : Label: "Height (cm)"
 $ BMI     : num 18.435186529 23.189142111 21.430502975 27.899361823 34.3513705...
           : SAS Format: BEST12. ("float:", display format: 12.9)
           : Label: "Body Mass Index (kg/m2)"

There are following features:

  1. Numeric variables will be parsed to identify the "real" data type, i.e. integer or float. The display format will be identified, incl. no. of decimal places - in addition to the assigned numeric SAS format.

  2. Character variables: the "real" length based on the actual values will be identified (NOTE: also leading blanks will be counted).

  3. The number of missing/blank values will be identified

  4. Data definition tables (with format decodes) will be generated

Data Definition Tables: DDT_FINAL, DDT_FINAL_SHORT, DDT_STR

NOTE: character variables will not be converted to factors, equivalent to R option stringsAsFactors = FALSE

Limitations

  • Character variables in database may not be longer than <= 400 characters (performance issue; can be changed, see below)
  • Some variables/formats will not be parsed regarding number of decimal places (e.g., DOLLAR., COMMA., COMMAX., NUMX., PERCENT.)
  • Number of decimal places: also zeros at the end of a number will be counted (e.g., the value 1.00 will be identified as a float number with 2 decimal places)
  • Format code, format level and decode will be truncated after 4.000 characters ($4000.); can be adjusted (performance issue)
  • Does not work correct in case of truncated formats, e.g., when $SEX. "F"=Female" is formatted as $SEX4. and the values will be truncated to "Fema" (4 characters)
  • SAS version 9.4 above required to execute %gather2()
  • Input dataset must not contain variables: _ColFormat _ColTyp _ColTypVar _UserDef _IsRealNum _SASFormat _IsMissing
  • Low performance: the whole dataset will be transposed from wide to long and all variable values will be parsed

NOTE: If there are character variables in the SAS dataset with a length > 400 char. then change twice within the Macro:

  • ValFormat=$400. to ValFormat=$[YourMaxlength].

Usage

%include "gather2.sas";

%str2(DbIn, DDTOUT=Y); * with output of data definition table in SAS output window;

%str2(DbIn);
DbIn:           Input dataset
DDTOUT:         If Y: Output of a Data Definition Table in the SAS output window: 
                With "real" data type, i.e. 'integer' or 'float', number of decimal places, 
                "real" length of character variables, no. of missing values, format decodes

SAS output files: DDT_FINAL, DDT_FINAL_SHORT, DDT_STR

%gather2()

A gather Macro with following features:

The SAS macro %gather2() transposes a dataset from fat to skinny (normalization) or from a wide to a long dataset.

The macro is similar to R function tidyr:::gather().

The macro identifies the "real" SAS format, generates a flag for missing values (removes "."), generates a flag for numeric formats (BEST. or w.d.) and outputs variable labels.

Usage

%gather2(DbIn, Key, Value, Exclude, DbOut, ValFormat=, WithFormats=N, WithLabels=N, SASFormats=N);

NOTE: Option WithFormats executes the SAS function fmtinfo() that requires SAS version 9.4 above!

DbIn:        Input dataset
Key:         Name of key variable in output (should not be a column name!)
Value:       Name of variable with values in output (should not be a column name!)
             NOTE: character variables: leading blanks will not be removed!
             NOTE: Missing "." will be removed (set to blank)
Exclude:     Is the -exclude variable (ID variable) - must be a column name
             (or more column names/primary keys, or empty)
DbOut:       Output dataset
ValFormat:   if Y, Character format of variable with values (default character $200.)
             not possibe for (numeric) dates (dates as character only)
             Numeric Format, e.g. 8. only possible if all values in dataset are numeric;
WithFormats: if Y, Output of the associated SAS Formats (e.g. char, num, time, date, datetime) in 
             Variables: _ColFormat, _ColFor2, _ColTyp, _ColTypVar (N, C)
             Please Note: SAS function fmtinfo() requires SAS 9.4 above!
WithLabels:  if Y, Output of the Variable Labels 
             Variable _ColLab
             (additional feature)
SASFormats:  if Y, Output of (real) SAS Formats, Identification of missing Values 
             Variables: _UserDef, _IsRealNum (i.e. BEST. or w.d Format), _SASFormat, _IsMissing 

Limitations

  • Input dataset must not contain Variables: _ColFormat _ColTyp _ColTypVar _UserDef _IsRealNum _SASFormat _IsMissing
  • Does not identify "special" missing values (e.g. .Z) as missing
  • SAS v. 9.4 above required

Example

%gather2(VS_WIDE, VSTESTCD, VSORRES, SUBJID VISITNUM, VS_LONG, ValFormat=$10., WithLabels=Y, SASFormats=Y);

About

This SAS macro displays the structure of a SAS dataset. All variables will be parsed regarding type, real length and display format (including no. of decimal places)

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages