Documentation

Chris Swenson edited this page Nov 14, 2020 · 3 revisions

Here are some general notes about my reasoning for some of the processes within the macros. Feel free to comment below!

CheckLog

The CheckLog macro is documented separately because it is much more detailed than the other general macros.

CompCon

This macro compares the contents or formats of two data sets to see where they differ. The code uses the CONTENTS, SORT, and COMPARE procedures. It is very similar to the SortComp macro.
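As a rough illustration of how the three procedures can fit together (this is a sketch, not the macro's actual code), the example below compares the variable attributes of two small data sets. The data set names WORK.ONE and WORK.TWO, their variables, and the attributes kept from the CONTENTS output are all assumptions made for the example.

```sas
/* Two hypothetical data sets whose variable attributes differ */
data work.one;
    length id 8 name $10;
    format id 8.;
    id = 1;
    name = 'a';
run;

data work.two;
    length id 8 name $20 extra 8;
    format id best12.;
    id = 1;
    name = 'a';
    extra = 0;
run;

/* Write each data set's variable attributes to a table */
proc contents data=work.one noprint
    out=work.contents_one(keep=name type length format formatl);
run;

proc contents data=work.two noprint
    out=work.contents_two(keep=name type length format formatl);
run;

/* Sort by variable name so the two attribute tables line up */
proc sort data=work.contents_one;
    by name;
run;

proc sort data=work.contents_two;
    by name;
run;

/* Report where the attributes differ, matching rows by variable name */
proc compare base=work.contents_one compare=work.contents_two listall;
    id name;
run;
```

Here the COMPARE output should flag the different length of NAME, the different format of ID, and the variable EXTRA that exists only in the second data set.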

DupCheck

This macro does not use the SORT procedure, for two reasons. First, the SORT procedure can take a long time on large data sets. Instead, I used the SQL procedure to count the records within each group defined by the BY variables the user specifies. Any group with more than one record contains duplicates.

Second, the SORT procedure can output duplicates into a separate data set using the DUPOUT= option, but the first record of each group still flows into the OUT= data set. This does not help when the user is trying to understand the reason for the duplication, because the records of a duplicated group end up in two different data sets. Instead, I identified the groups that are distinct and those that are not, and split both from the main data set.
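The difference between the two approaches can be sketched as follows. This is a minimal illustration under stated assumptions, not the DupCheck code itself: the data set WORK.HAVE, the BY variable ID, and the output table names are all hypothetical.

```sas
/* Hypothetical sample data: ID 2 appears twice */
data work.have;
    input id value $;
    datalines;
1 a
2 b
2 c
3 d
;
run;

/* SORT approach: one record of ID 2 stays in OUT=, the other goes to DUPOUT= */
proc sort data=work.have out=work.sorted nodupkey dupout=work.sort_dups;
    by id;
run;

proc sql;
    /* Count the records in each BY group */
    create table work.counts as
    select id, count(*) as n
    from work.have
    group by id;

    /* Records from groups with exactly one record */
    create table work.single as
    select h.*
    from work.have as h
         inner join work.counts as c
         on h.id = c.id
    where c.n = 1;

    /* Every record from groups with more than one record */
    create table work.dups as
    select h.*
    from work.have as h
         inner join work.counts as c
         on h.id = c.id
    where c.n > 1;
quit;
```

With the SQL approach, both records for ID 2 land together in WORK.DUPS, which makes it easier to see why they are duplicated.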

The SORT= argument is optional because large data sets may take a long time to sort. When specified, it adds sorting to the SQL statements that output the single and duplicate records.

The EXPECT= option is available so the user can split observations into records with and without duplicates without generating a warning message.

The DupVar macro is used instead of copying the code in case changes need to be made to that process.

DupVar

This macro uses processing similar to the DupCheck macro, counting records with SQL instead of sorting.

Macro Variable Manipulation

The following macro programs create macro variables for use later in the code, except for DelVars, which deletes macro variables.

  • ColumnVars
  • DelVars
  • IntoList
  • ObsMac
  • SetVars
  • TableVars
  • VarMac
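For readers unfamiliar with the general technique, here is a minimal sketch of creating and then deleting a macro variable in open code. It does not reproduce any of the macros listed above. The macro variable name NAME_LIST is hypothetical, and the example assumes the SASHELP.CLASS sample data set is available.

```sas
/* Build a space-delimited list of values in one macro variable */
proc sql noprint;
    select distinct name
    into :name_list separated by ' '
    from sashelp.class;
quit;

/* Show the value in the log */
%put name_list=&name_list;

/* Delete the macro variable once it is no longer needed */
%symdel name_list;
```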

Documentation Pages

OpenTable

I created this macro because I was tired of looking up the table I had just made, or of hunting for a table lost among thousands in a very large library (e.g., 10,000+ tables).

The macro can also read what the user has copied, so it can open the most recently copied data set as well.

When both the copied data set and the last data set exist, the macro asks the user which one to open. This works better than trying to guess which one the user meant when no argument is specified.

RandomSet

Mode 2 of this macro also does not use the SORT procedure, for the same reasons that the DupCheck macro does not. Instead, I randomly chose which observations to select and then read only those observations. This is much faster for large data sets.
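One way to read only randomly chosen observations, without sorting, is direct access with the POINT= option of the SET statement. The sketch below is an assumption about how such a step could look, not the RandomSet code. The sample size of 5, the seed, and the use of SASHELP.CLASS are all illustrative, and this simple version samples with replacement, so the same observation can be picked more than once.

```sas
data work.sample;
    /* Fixed seed (an arbitrary choice) so the sample is repeatable */
    call streaminit(12345);

    do i = 1 to 5;
        /* TOTAL is filled by NOBS= at compile time, so it is usable here */
        pick = ceil(rand('uniform') * total);

        /* Read only the chosen observation instead of the whole data set */
        set sashelp.class point=pick nobs=total;
        output;
    end;

    /* STOP is required because POINT= never reaches end of file */
    stop;
    drop i;
run;
```

Because the step jumps straight to the selected observation numbers, its cost depends on the sample size rather than on the size of the source data set.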

SearchCode

This macro is based on the CheckLog macro.

The reason for this macro is simple: I was not able to search for SAS code using the Windows search capability because a) Windows search did not have the SAS file types associated, and b) network files aren't always indexed.

SortComp

This macro compares two data sets using the SORT and COMPARE procedures. The macro additionally outputs the outer joins of the two data sets when they differ.
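The general flow can be sketched like this. It is an illustration under stated assumptions rather than the SortComp code: the data sets WORK.BASE and WORK.COMP, the key variable ID, and the output table names are hypothetical, and the outer joins are shown as "unmatched records from each side", which is only one possible reading of the output described above.

```sas
/* Two hypothetical data sets that overlap only partly */
data work.base;
    input id value $;
    datalines;
1 a
2 b
3 c
;
run;

data work.comp;
    input id value $;
    datalines;
2 b
3 x
4 d
;
run;

/* Sort both data sets by the same key */
proc sort data=work.base out=work.base_sorted;
    by id;
run;

proc sort data=work.comp out=work.comp_sorted;
    by id;
run;

/* Compare the sorted data sets, matching records on the key */
proc compare base=work.base_sorted compare=work.comp_sorted;
    id id;
run;

proc sql;
    /* Records in BASE with no matching key in COMP */
    create table work.base_only as
    select b.*
    from work.base_sorted as b
         left join work.comp_sorted as c
         on b.id = c.id
    where c.id is missing;

    /* Records in COMP with no matching key in BASE */
    create table work.comp_only as
    select c.*
    from work.comp_sorted as c
         left join work.base_sorted as b
         on c.id = b.id
    where b.id is missing;
quit;
```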