# Chapter 13 - Transforming Data with SAS Functions

## Table of Contents:

1. [Objectives](#Objectives)
2. [Understanding SAS Functions](#Understanding-SAS-Functions)
3. [Converting Data with Functions](#Converting-Data-with-Functions)
4. [Manipulating SAS Date Values with Functions](#Manipulating-SAS-Date-Values-with-Functions)
5. [Modifying Character values with Functions](#Modifying-Character-values-with-Functions)
6. [Modifying Numeric Values with Functions](#Modifying-Numeric-Values-with-Functions)
7. [Nesting SAS Functions](#Nesting-SAS-Functions)

### Objectives

* convert char data to num data
* convert num data to char data
* create SAS date values
* extract month, year, and interval from a SAS date value
* perform calculations with date and datetime values and time intervals
* extract, edit, and search the values of char variables
* replace/remove all occurrences of a particular word within a char string

[(back to top)](#Table-of-Contents:)

### Understanding SAS Functions

SAS functions are prewritten routines that take zero or more arguments and return a single value. The general form of a SAS function call is as follows:

    output = function(arg1, arg2, ...);
    
If the input is a variable list or an array, add the `OF` operator before the argument, like so:

    output = function(of array{*});
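
As a minimal sketch of the `OF` operator, the step below sums three variables twice, once through a numbered variable list and once through an array reference. The array `q` and its values are made up for illustration.

```sas
data _null_;
    /* hypothetical scores, for illustration only */
    array q{3} q1-q3 (8 9 10);
    total_list  = sum(of q1-q3);   /* OF with a numbered variable list */
    total_array = sum(of q{*});    /* OF with every element of the array */
    put total_list= total_array=;  /* total_list=27 total_array=27 */
run;
```

Without `OF`, `sum(q1-q3)` would be read as a single argument, the difference `q1 - q3`, which is why the operator matters here.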

If the output is a character string, the default length of the string depends on the function. To set the length yourself, declare the variable in a `LENGTH` statement before its first use in the DATA step.

[(back to top)](#Table-of-Contents:)

### Converting Data with Functions

To convert data between char and num types, use the `INPUT` and `PUT` functions. `INPUT` changes character data to numeric data, whereas `PUT` changes numeric data to character data. In certain instances SAS converts between types automatically, but it is best not to rely on automatic conversion. As good practice, code conversions explicitly.

The `INPUT` syntax is as follows:

    input(source, informat.);
    
PUT is similar:

    put(source, format.);
    
An example of the `PUT` function in action is the following:

    data class;
        length nameage $ 15;
        set sashelp.class;
        nameage = trim(name)||'/'||put(age, 2.);
        keep nameage;
    run;

    proc print data = class (obs = 5);
    run;

Output:

    Obs    nameage
    1      Alfred/14
    2      Alice/13
    3      Barbara/13
    4      Carol/14
    5      Henry/14
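
For the opposite direction, here is a minimal sketch of `INPUT` reading character values into numeric variables. The variable names and values are hypothetical; `COMMA6.` and `DATE9.` are standard SAS informats.

```sas
data _null_;
    /* hypothetical character values, for illustration only */
    amount_c = '1,234';
    date_c   = '15MAR2021';
    amount_n = input(amount_c, comma6.);   /* the informat strips the comma */
    date_n   = input(date_c, date9.);      /* the result is a SAS date value */
    put amount_n= date_n= date9.;          /* amount_n=1234 date_n=15MAR2021 */
run;
```

The informat tells SAS how to read the text, so it should describe the layout of the source string rather than the desired appearance of the result.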


If a num variable is automatically converted to a char variable, SAS writes the num value using the `BEST12.` format. The resulting char value is right-aligned and padded with leading blanks to a length of 12. For example:

    val1n = 56565656           * 8 digits;
    ... code that automatically converts num variable val1n to char variable val1c ...
    val1c = ____56565656       * 12 characters (4 leading blanks);

To remove those blanks, `VAL1C` needs to be left-aligned, for example with the `LEFT` function.
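
A small sketch of the difference, reusing the value from the example above. The brackets are only there to make the blanks visible, and the variable names are illustrative.

```sas
data _null_;
    val1n = 56565656;
    /* automatic conversion in a concatenation: BEST12., right-aligned */
    auto     = '[' || val1n || ']';
    /* explicit conversion, then left-aligned: the blanks move to the end */
    aligned  = '[' || left(put(val1n, 12.)) || ']';
    /* STRIP removes both leading and trailing blanks */
    stripped = '[' || strip(put(val1n, 12.)) || ']';
    put auto= / aligned= / stripped=;
run;
```

The first assignment also writes a note about numeric-to-character conversion to the log, which is one practical reason to prefer the explicit `PUT` call.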

[(back to top)](#Table-of-Contents:)

### Manipulating SAS Date Values with Functions

SAS stores date values as integers counting days from January 1, 1960 (day 0), and time values as the number of seconds since midnight (0 = 12:00 AM). SAS can use these numeric values directly in calculations, so date arithmetic is ordinary arithmetic. To display a date or time value in human-readable form, apply a suitable format with the `FORMAT` statement.

The SAS documentation lists all of the date functions, and most of them are not covered here. Important date functions are:
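
As an illustration of the storage scheme, the sketch below builds a few values and prints them both raw and formatted. It uses date and time literals and the `MDY` function, none of which are introduced above; the formats are applied inline in the `PUT` statement rather than through a `FORMAT` statement.

```sas
data _null_;
    d0 = '01JAN1960'd;        /* date literal: day zero */
    d1 = mdy(1, 31, 1960);    /* MDY(month, day, year) builds a date value */
    t1 = '00:01:00't;         /* time literal: one minute after midnight */
    put d0= d1= t1=;          /* raw values: d0=0 d1=30 t1=60 */
    put d1= date9.;           /* d1=31JAN1960 */
    put t1= time8.;           /* the same 60 seconds shown as a clock time */
run;
```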

<table>
    <tr>
        <td>**function name**</td>
        <td>**function syntax**</td>
        <td>**result**</td>
    </tr>
    <tr>
        <td>day</td>
        <td>`day=day(date);`</td>
        <td>day of month (1-31)</td>
    </tr>
    <tr>
        <td>quarter</td>
        <td>`quarter=qtr(date);`</td>
        <td>quarter (1-4)</td>
    </tr>
    <tr>
        <td>weekday</td>
        <td>`wkday=weekday(date);`</td>
        <td>day of week (1-7), 1 = Sunday and 7 = Saturday</td>
    </tr>
    <tr>
        <td>month</td>
        <td>`month=month(date);`</td>
        <td>month (1-12)</td>
    </tr>
    <tr>
        <td>year</td>
        <td>`yr=year(date);`</td>
        <td>year (4 digits)</td>
    </tr>
    <tr>
        <td>today's date</td>
        <td>`now=today();`</td>
        <td>today's date as a SAS date</td>
    </tr>
    <tr>
        <td>current time</td>
        <td>`curtime=time();`</td>
        <td>current time as a SAS time</td>
    </tr>
    <tr>
        <td>time period between dates</td>
        <td>`x=intck('interval',d1, d2);`</td>
        <td>number of intervals between two dates, where interval is one of day, weekday, week, month, qtr, or year</td>
    </tr>
</table>

The `INTCK` function counts the interval boundaries crossed between two dates. Each interval starts at a fixed point (for example, the first day of a month or year), and partial intervals are not counted. Therefore, between December 31, 2000 and January 1, 2001:
* 0 weeks have passed
* 1 month has passed
* 1 year has passed
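
The counts above can be reproduced with a short step; the variable names are illustrative.

```sas
data _null_;
    d1 = '31DEC2000'd;
    d2 = '01JAN2001'd;
    days   = intck('day',   d1, d2);
    weeks  = intck('week',  d1, d2);
    months = intck('month', d1, d2);
    years  = intck('year',  d1, d2);
    put days= weeks= months= years=;   /* days=1 weeks=0 months=1 years=1 */
run;
```

Only one day separates the two dates, yet a month boundary and a year boundary are both crossed. No week boundary is crossed because weeks start on Sunday by default and December 31, 2000 was a Sunday.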

[(back to top)](#Table-of-Contents:)

### Modifying Character values with Functions

The `SCAN` function allows you to select a substring separated by delimiters from a char variable:

    x = scan(arg, n, <delimiters>);
    
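where the `N` argument represents the nth delimited value in the character variable. If the target variable has no length yet, `SCAN` assigns one automatically: 200 in older releases, while more recent SAS 9.4 documentation describes the length of the first argument instead. To control the length yourself, use the `LENGTH` statement.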
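
A minimal sketch of `SCAN` with explicit delimiters and a `LENGTH` statement; the name value is hypothetical.

```sas
data _null_;
    length last first $ 20;            /* set lengths before first use */
    fullname = 'Smith, Jane Q.';       /* hypothetical value */
    last  = scan(fullname, 1, ',');    /* first value, comma as delimiter */
    first = scan(fullname, 2, ', ');   /* second value, comma or blank as delimiter */
    put last= first=;                  /* last=Smith first=Jane */
run;
```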

The `SUBSTR` function is similar, but instead of returning a delimited value it extracts characters by position, starting at the `POSITION` argument:

    x = substr(argument, position, <n>);

`N` is the number of characters to return; if it is omitted, `SUBSTR` returns the rest of the string. `SUBSTR` can also be placed on the left-hand side of the equals sign to overwrite the characters at that position with a new value:

    substr(argument, position, <n>) = 'new_str';

If `SUBSTR` is used with a numeric variable as the argument, SAS automatically converts the number to a character string using the `BEST12.` format. If the number has fewer than 12 digits, the resulting string is padded with leading blanks to a length of 12.
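
Both uses of `SUBSTR` in one short sketch; the phone number is a made-up value.

```sas
data _null_;
    phone = '919-555-0100';        /* hypothetical value */
    area  = substr(phone, 1, 3);   /* right side: 3 characters from position 1 */
    substr(phone, 1, 3) = '984';   /* left side: overwrite those 3 characters */
    put area= phone=;              /* area=919 phone=984-555-0100 */
run;
```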
    
The `TRIM` function removes trailing blanks for char variables:

    x = trim(argument);
    
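When the result is assigned to a new variable, that variable takes the same length as the original argument, so the stored value is padded with blanks again. `TRIM` is therefore most useful inside an expression such as a concatenation.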
    
The `CATX` function concatenates char strings, removes leading and trailing blanks, and inserts delimiters:

    x = catx(delimiter, arg1, arg2, ...);
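
To compare plain concatenation, `TRIM`, and `CATX`, here is a small sketch with hypothetical city and state values.

```sas
data _null_;
    length city state $ 12;
    city  = 'Raleigh';
    state = 'NC';
    joined_raw  = city || ', ' || state;         /* keeps the trailing blanks of city */
    joined_trim = trim(city) || ', ' || state;   /* TRIM drops them inside the expression */
    joined_catx = catx(', ', city, state);       /* CATX trims and inserts the delimiter */
    put joined_raw= / joined_trim= / joined_catx=;
run;
```

The first result has a run of blanks before the comma, while the other two both read `Raleigh, NC`.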
    
The `INDEX` function searches for a substring in a specified string and returns the position of the substring's first character:

    x = index(source, substring);
    
Similarly, the `FIND` function searches for a substring in a specified string and returns the position of its first occurrence, or 0 if the substring is not found. Unlike `INDEX`, it accepts optional modifiers and a start position:

    x = find(string, substring, <modifiers>, <start position>);
    
For help on modifiers, look at the SAS documentation.
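
A short sketch of both search functions on a made-up string. The `'i'` modifier requests a case-insensitive search.

```sas
data _null_;
    text = 'Data step and PROC step';      /* hypothetical value */
    pos_index = index(text, 'step');       /* 6: first occurrence */
    pos_find  = find(text, 'proc', 'i');   /* 15: 'i' ignores case */
    pos_from  = find(text, 'step', 10);    /* 20: search starts at position 10 */
    pos_none  = find(text, 'macro');       /* 0: not found */
    put pos_index= pos_find= pos_from= pos_none=;
run;
```

Because 0 is false and any nonzero value is true in SAS, the result can be used directly in a condition such as `if find(text, 'step') then ...`.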

`UPCASE` and `LOWCASE` convert a char string to uppercase and lowercase, respectively:

    x = UPCASE(string);
    x = LOWCASE(string);
    
`TRANWRD` replaces or removes all occurrences of a substring within a string:

    x = tranwrd(source, target, replacement);
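
A minimal sketch of `TRANWRD` and `UPCASE` on a hypothetical title string.

```sas
data _null_;
    length title $ 40;
    title = 'Miss Jane Doe and Miss Ann Lee';   /* hypothetical value */
    replaced = tranwrd(title, 'Miss', 'Ms.');   /* every occurrence is replaced */
    upper    = upcase(title);
    put replaced= / upper=;
    /* replaced=Ms. Jane Doe and Ms. Ann Lee */
    /* upper=MISS JANE DOE AND MISS ANN LEE  */
run;
```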

[(back to top)](#Table-of-Contents:)

### Modifying Numeric Values with Functions

A particularly useful numeric function is `ROUND`, which rounds a value to the nearest multiple of a specified unit:

    x = round(argument, round-off-unit);

An example of rounding values to the nearest hundredth:

    data class;
        set sashelp.classfit;
        predict = round(predict, .01);
        keep name predict;
    run;

    proc print data = class (obs = 10);
    run;

Output:

    Obs    Name       predict
    1      Joyce      56.99
    2      Louise     76.49
    3      Alice      77.27
    4      James      80.39
    5      Thomas     81.17
    6      John       87.02
    7      Jane       90.14
    8      Janet      100.66
    9      Jeffrey    100.66
    10     Carol      101.83


[(back to top)](#Table-of-Contents:)

### Nesting SAS Functions

Functions can be nested as arguments within each other. For example:

    MiddleInitial=substr(scan(name,3),1,1); 

As long as the output of each nested function is a valid argument for the function that encloses it, any number of functions can be chained this way.
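
To see the nesting step by step, the sketch below evaluates the inner `SCAN` on its own first. The name value is hypothetical and assumes a "last, first middle" layout, so the third word is the middle name.

```sas
data _null_;
    name = 'CICHOCK, ELIZABETH MARIE';             /* hypothetical value */
    MiddleName    = scan(name, 3);                 /* inner call: third word */
    MiddleInitial = substr(scan(name, 3), 1, 1);   /* outer call: its first character */
    put MiddleName= MiddleInitial=;                /* MiddleName=MARIE MiddleInitial=M */
run;
```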

[(back to top)](#Table-of-Contents:)