# Chapter 10 - Creating and Managing Variables

## Table of Contents:

1. [Objectives](#objectives)
2. [Creating and Modifying Variables](#createmodvars)
3. [Assigning Values Conditionally](#assigncond)
4. [Specifying Lengths for Variables](#lengthvars)
5. [Subsetting Data](#subsetdata)
6. [Assigning Permanent Labels and Formats](#permlabelsformats)
7. [Assigning Values Conditionally Using SELECT Groups](#selectgroups)
8. [Grouping Statements Using DO Groups](#dogroups)

## Objectives: <a class="anchor" id="objectives"></a>

* create variables that accumulate variable values
* initialize retained variables
* assign values to variables conditionally
* specify an alternative action when a condition is false
* specify lengths for variables
* delete unwanted observations
* select variables
* assign permanent labels and formats

## Creating and Modifying Variables <a class="anchor" id="createmodvars"></a>

Variables are created or modified in the DATA step with assignment statements. Existing variables can be combined into new ones using arithmetic or logical operators, and all expressions follow the usual order of operations (PEMDAS). The sum statement (variable + expression;) is a shorthand for accumulating values: it needs no separate initialization, because the accumulator starts at 0 rather than missing and is retained across iterations. As an example of a plain assignment, NEW_VAR is the difference between VAR1 and VAR2:

    new_var = var1 - var2;
    
SAS adds NEW_VAR to the program data vector (PDV) when the DATA step is compiled, and recomputes its value on each iteration as each observation is processed.
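
As a minimal sketch of both statement types, the step below reads SASHELP.CLASS (the sample dataset used elsewhere in this chapter). The output dataset name CLASS_SUM and the variables WEIGHT_PER_INCH and RUNNING_WEIGHT are made up for illustration, and the sketch assumes that the "shorthand" mentioned above refers to the sum statement.

```sas
data class_sum;                         /* hypothetical output dataset name */
    set sashelp.class;

    /* Assignment statement: combine two existing variables */
    weight_per_inch = weight / height;

    /* Sum statement: running_weight starts at 0 and is retained,
       so it accumulates WEIGHT across observations */
    running_weight + weight;
run;

proc print data = class_sum (obs = 5);
    var name height weight weight_per_inch running_weight;
run;
```

RUNNING_WEIGHT needs no RETAIN statement and no initial assignment; the sum statement handles both.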

By default, variables created in the DATA step are reset to missing in the PDV at the start of each iteration. To carry a value over from the previous iteration instead, use the RETAIN statement. The RETAIN statement can also assign an initial value. For example, maybe you want a timing variable to start at 5 minutes instead of 0 minutes:

    retain time 5;

The RETAIN statement is a compile-time-only statement, and it creates the variables it names if they don't already exist. If no initial value is supplied, a retained variable is missing on the first iteration (\_N\_ = 1) and afterwards keeps whatever value it was last assigned.
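
One way to see RETAIN at work is a running maximum. This is only a sketch: the dataset name CLASS_MAX and the variable MAX_HEIGHT are hypothetical, and the initial value of 0 assumes that every height in SASHELP.CLASS is positive.

```sas
data class_max;                         /* hypothetical output dataset name */
    set sashelp.class;

    /* max_height starts at 0 and keeps its value between iterations */
    retain max_height 0;

    /* Update the running maximum only when a taller student is read */
    if height > max_height then max_height = height;
run;

proc print data = class_max;
    var name height max_height;
run;
```

Without the RETAIN statement, MAX_HEIGHT would be reset to missing at the start of every iteration and would simply copy HEIGHT.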

## Assigning Values Conditionally <a class="anchor" id="assigncond"></a>

One way of making conditional statements in SAS is through the IF/THEN statement:

    if <expression1> then <outcome1>;
    else if <expression2> then <outcome2>;
    else <outcome3>;

Logical operators and comparison operators can be used to create expressions.

Some general rules about IF statements:

* an IF/ELSE IF/ELSE chain forms one cluster, which is evaluated independently of any other cluster of IF statements
* within a cluster, only the branch for the first expression that returns TRUE executes; the remaining branches of that cluster are skipped

For example:

    * IF-cluster 1;
    if x > 10 then put 'TRUE';
    else if x < 10 and x > 5 then put 'MAYBE';
    else put 'FALSE';
    
    * IF-cluster 2;
    if x >= 7 and x < 9 then put 'MOST LIKELY';
    
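Both IF-cluster 1 and IF-cluster 2 are evaluated on every iteration. However, exactly one of the branches in IF-cluster 1 executes. For x = 8, for instance, cluster 1 writes MAYBE and cluster 2 writes MOST LIKELY to the log.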
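
To try the two clusters on a few values, the fragment can be wrapped in a DATA step that reads X from in-stream data. This is an illustrative sketch; the test values 4, 8, 10 and 12 are chosen here and are not part of the original example.

```sas
data _null_;                            /* no output dataset; results go to the log */
    input x;

    /* IF-cluster 1: exactly one branch runs per observation */
    if x > 10 then put x= 'TRUE';
    else if x < 10 and x > 5 then put x= 'MAYBE';
    else put x= 'FALSE';

    /* IF-cluster 2: evaluated independently of cluster 1 */
    if x >= 7 and x < 9 then put x= 'MOST LIKELY';

    datalines;
4
8
10
12
;
run;
```

Only x = 8 satisfies the condition in cluster 2, so it is the only value that produces two log lines. Note that x = 10 falls through to the ELSE branch, because neither x > 10 nor x < 10 is true.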

## Specifying Lengths for Variables <a class="anchor" id="lengthvars"></a>

SAS fixes the length of a new variable at compile time, from the first statement in which the variable appears. To assign a specific length instead, use the LENGTH statement, placed before any other reference to the variable:

    data <output dataset>;
        set <source dataset>;
        length <variable> <$ if char variable> <length>;
        ...;
    run;
    
For example:

    data class;
        /* Declare the length before sex_enum is first assigned */
        length sex_enum $ 7;
        set sashelp.class;
        if sex = 'M' then sex_enum = 'Male';
        else if sex = 'F' then sex_enum = 'Female';
        else sex_enum = ' ';
        label sex_enum = 'Sex';
    run;

    proc print data = class (obs = 5) label;
        var name sex_enum;
    run;

| Obs | Name | Sex |
|---|---|---|
| 1 | Alfred | Male |
| 2 | Alice | Female |
| 3 | Barbara | Female |
| 4 | Carol | Female |
| 5 | Henry | Male |
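
To see why the LENGTH statement matters in this example, here is a sketch of the same step without it. The dataset name CLASS_TRUNC is hypothetical.

```sas
data class_trunc;                       /* hypothetical output dataset name */
    set sashelp.class;

    /* No LENGTH statement: the first assignment in the code ('Male')
       fixes the length of sex_enum at 4 characters */
    if sex = 'M' then sex_enum = 'Male';
    else if sex = 'F' then sex_enum = 'Female';   /* stored as 'Fema' */
run;

proc print data = class_trunc (obs = 5);
    var name sex sex_enum;
run;
```

The length comes from the order of the assignments in the code, not from the order of the observations, so 'Female' is truncated even if the first observation read is a female student.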


For character variables created by an assignment, if no length is specified, the variable takes the length of the first value assigned to it in the code. For example:

    data test;
        charvar = "Charvar's length is 22";
    run;

    proc contents data = test;
    run;

| Attribute | Value | Attribute | Value |
|---|---|---|---|
| Data Set Name | WORK.TEST | Observations | 1 |
| Member Type | DATA | Variables | 1 |
| Engine | V9 | Indexes | 0 |
| Created | 06/14/2017 15:54:35 | Observation Length | 22 |
| Last Modified | 06/14/2017 15:54:35 | Deleted Observations | 0 |
| Protection | | Compressed | NO |
| Data Set Type | | Sorted | NO |
| Label | | | |
| Data Representation | SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64 | | |
| Encoding | utf-8 Unicode (UTF-8) | | |

Engine/Host Dependent Information

| Attribute | Value |
|---|---|
| Data Set Page Size | 65536 |
| Number of Data Set Pages | 1 |
| First Data Page | 1 |
| Max Obs per Page | 2960 |
| Obs in First Data Page | 1 |
| Number of Data Set Repairs | 0 |
| Filename | /tmp/SAS_work32CD000026E4_localhost.localdomain/test.sas7bdat |
| Release Created | 9.0401M4 |
| Host Created | Linux |
| Inode Number | 148036 |

Alphabetic List of Variables and Attributes

| # | Variable | Type | Len |
|---|---|---|---|
| 1 | charvar | Char | 22 |


## Subsetting Data <a class="anchor" id="subsetdata"></a>

To conditionally delete observations, use the DELETE statement as the action of an IF/THEN statement:

    data class;
        set sashelp.class;
        if age < 14 then delete;
    run;

    proc print data = class label;
    run;

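| Obs | Name | Sex | Age | Height | Weight |
|---|---|---|---|---|---|
| 1 | Alfred | M | 14 | 69.0 | 112.5 |
| 2 | Carol | F | 14 | 62.8 | 102.5 |
| 3 | Henry | M | 14 | 63.5 | 102.5 |
| 4 | Janet | F | 15 | 62.5 | 112.5 |
| 5 | Judy | F | 14 | 64.3 | 90.0 |
| 6 | Mary | F | 15 | 66.5 | 112.0 |
| 7 | Philip | M | 16 | 72.0 | 150.0 |
| 8 | Ronald | M | 15 | 67.0 | 133.0 |
| 9 | William | M | 15 | 66.5 | 112.0 |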


To control which variables appear in the output dataset, use either the KEEP or the DROP statement. The KEEP statement lists the variables to write to the output dataset; all other variables are discarded. The DROP statement lists the variables to exclude; all other variables are written. KEEP usage is shown below, and DROP takes a variable list in the same way:

    data <output dataset>;
        set <sourcedataset>;
        keep <variables to keep>;
    run;

If the number of variables to keep is significantly fewer than the number of extraneous variables, use the KEEP statement. If the number of variables to discard is small, use the DROP statement.
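
As a concrete sketch, the two steps below produce datasets with the same two variables, NAME and AGE, from SASHELP.CLASS. The output dataset names CLASS_KEEP and CLASS_DROP are made up for illustration.

```sas
/* KEEP: name the two variables to retain */
data class_keep;                        /* hypothetical output dataset name */
    set sashelp.class;
    keep name age;
run;

/* DROP: name the three variables to discard instead */
data class_drop;                        /* hypothetical output dataset name */
    set sashelp.class;
    drop sex height weight;
run;
```

Here KEEP is the shorter statement, which matches the rule of thumb above.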

## Assigning Permanent Labels and Formats <a class="anchor" id="permlabelsformats"></a>

To assign a permanent label or format to a dataset variable, follow steps similar to those outlined in Chapter 04. Place the LABEL or FORMAT statement inside the DATA step; attributes assigned there are stored with the output dataset:

    data <output dataset>;
        set <sourcedataset>;
        label var1 = 'label1' var2 = 'label2' ... varn = 'labeln';
        format var1 <$ if char format> format1. ...;
    run;
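
A filled-in version of this template might look like the sketch below. The dataset name CLASS_FMT and the label text are hypothetical; 6.1 is the standard numeric w.d format with a width of 6 and one decimal place.

```sas
data class_fmt;                         /* hypothetical output dataset name */
    set sashelp.class;

    /* Labels and formats assigned in a DATA step are stored in the dataset */
    label height = 'Student height'
          weight = 'Student weight';
    format height weight 6.1;
run;

/* The Label and Format columns of the variable list show the stored attributes */
proc contents data = class_fmt;
run;
```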

## Assigning Values Conditionally Using SELECT Groups <a class="anchor" id="selectgroups"></a>

A SELECT group compares a single expression against a series of values and executes the statement attached to the first WHEN case that matches. Its syntax is as follows:

    select(expression);
        when(case1) <output1>;
        when(case2) <output2>;
        ...
        when(casen) <outputn>;
        otherwise <output>;
    end;
    
For example:

    select (a);
       when (1) x=x*10;
       when (3,4,5) x=x*100;
       otherwise x=x;
    end;

Just as with the IF/ELSE IF/ELSE statements, only the statement for the first case that returns TRUE executes, and then the SELECT group exits. If no WHEN case matches, the OTHERWISE statement executes.
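
A complete DATA step using a SELECT group could look like the following sketch. The dataset name CLASS_GRP, the variable AGE_GROUP, and the age cut-offs are all chosen here for illustration.

```sas
data class_grp;                         /* hypothetical output dataset name */
    set sashelp.class;
    length age_group $ 8;               /* long enough for every value below */

    select (age);
        when (11, 12) age_group = 'Younger';
        when (13, 14) age_group = 'Middle';
        otherwise     age_group = 'Older';
    end;
run;

proc print data = class_grp;
    var name age age_group;
run;
```

Including an OTHERWISE statement is a sensible default: if no WHEN case matches and OTHERWISE is absent, SAS issues an error and stops the DATA step.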

## Grouping Statements Using DO Groups <a class="anchor" id="dogroups"></a>

A DO group bundles several statements so that they execute as a unit. With a loop specification it acts like a FOR loop, repeating the contents of the loop until a certain condition is met. Its syntax is like this:

    do <condition>;
        <SAS statements>
        ...
    end;

If no condition is supplied, SAS runs the contents of the DO group once. This is useful for attaching several action statements to an IF/THEN or SELECT statement:

    if <expression> then
        do;
            <outcome1>;
            <outcome2>;
            ...
        end;
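
Filled in with real statements, the pattern might look like this sketch. The dataset name CLASS_FLAG, the variables SENIOR and NOTE, and the age threshold are hypothetical.

```sas
data class_flag;                        /* hypothetical output dataset name */
    set sashelp.class;
    length note $ 8;

    /* Each DO group lets one THEN or ELSE clause run two statements */
    if age >= 15 then
        do;
            senior = 1;
            note = 'Age 15+';
        end;
    else
        do;
            senior = 0;
            note = 'Under 15';
        end;
run;
```

Without the DO groups, only the first statement after THEN would be conditional and the second would run for every observation.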
        
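There are 3 kinds of DO loops:

* iterative DO loop - executes statements once for each value of an index variable
* DO UNTIL loop - executes statements until a condition is true, checking the condition **after** each iteration of the loop, so the body always runs at least once
* DO WHILE loop - executes statements while a condition is true, checking the condition **before** each iteration of the loop, so the body may not run at all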
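
The difference between the three forms can be sketched in a single DATA step. All names here (LOOPS, TOTAL, N_UNTIL, N_WHILE) are made up for illustration, and the loop conditions are deliberately chosen so that the DO UNTIL body runs once and the DO WHILE body never runs.

```sas
data loops;                             /* hypothetical output dataset name */
    /* Iterative DO: i takes the values 1 through 5 */
    total = 0;
    do i = 1 to 5;
        total = total + i;
    end;

    /* DO UNTIL: the condition is already true, but it is checked
       only after the body has run, so the body executes once */
    n_until = 0;
    do until (n_until >= 0);
        n_until = n_until + 1;
    end;

    /* DO WHILE: the condition is false on entry and is checked
       first, so the body never executes */
    n_while = 0;
    do while (n_while < 0);
        n_while = n_while + 1;
    end;

    put total= n_until= n_while=;       /* write the results to the log */
run;
```

After the step runs, TOTAL holds the sum of 1 through 5, N_UNTIL has been incremented once, and N_WHILE is still 0.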