# Lecture 3-2: SAS Help and Documentation, Assignment Statement and Numeric Functions

## July 11th, Tuesday




# SAS Help and Documentation

- Almost any information you could possibly need can be found in **SAS Help and Documentation** -- it just sometimes takes some patience to find it.
- Two ways to open the SAS Help and Documentation:
  - Help > SAS Help and Documentation (or the dictionary icon)
  - <http://support.sas.com/documentation/94/index.html> (or simply search the web for “SAS documentation” to find this page)
- We can find all categories of informats in the SAS documentation

# Local SAS Documentation

## Find informats in local SAS documentation:

- Type the keywords 'definition of informats' in 'Quick Search' and press [Enter]
- Choose “Definitions of informats”
- In the “Contents” tab, click “Dictionary of informats”
- Click informats by categories

# Online SAS Documentation

## Find informats in online SAS documentation:

- Search 'definition of informats'
- Choose 'Definitions of informats'
- In the 'Contents' tab, click 'Dictionary of informats'
- Click informats by categories

### Let’s find out which informats should be used for data values such as 07/09/15 and 2,036.
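
One plausible answer, sketched under the assumption that the two values are separated by a blank: the MMDDYY8. informat reads 07/09/15 as a SAS date, and COMMA5. reads 2,036 as the number 2036. The dataset and variable names (`informat_demo`, `visit_date`, `amount`) are made up for illustration.

```sas
DATA informat_demo;
    * the colon modifier applies the informat while still reading up to the next blank;
    input visit_date : mmddyy8. amount : comma5.;
    * display the date in a readable form instead of as a day count;
    format visit_date date9.;
    DATALINES;
07/09/15 2,036
;
RUN;

PROC PRINT data = informat_demo;
RUN;
```

With the default YEARCUTOFF setting, the two-digit year 15 should be read as 2015, so the row is expected to print as 09JUL2015 and 2036.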


# SAS System Options

- **SAS System options** are parameters that affect how SAS works, how SAS formats your output, how SAS uses memory, how SAS handles errors, and so on

- You can place an OPTIONS statement (OPTION is accepted as an alias) anywhere in a SAS program to change the settings from that point onward

In [1]:
OPTION nocenter;

DATA TestData;
input Subject Gender $ Height Weight;
DATALINES;
1024 M 68.5 155
1167 F 61.2 99 
1168 F 63.0 115
1201 M 70.0 205
1302 M . 170
;
PROC PRINT data = TestData;
RUN;

Obs,Subject,Gender,Height,Weight
1,1024,M,68.5,155
2,1167,F,61.2,99
3,1168,F,63.0,115
4,1201,M,70.0,205
5,1302,M,.,170


In [2]:
OPTION center;

DATA TestData;
input Subject Gender $ Height Weight;
DATALINES;
1024 M 68.5 155
1167 F 61.2 99 
1168 F 63.0 115
1201 M 70.0 205
1302 M . 170
;
PROC PRINT data = TestData;
RUN;

Obs,Subject,Gender,Height,Weight
1,1024,M,68.5,155
2,1167,F,61.2,99
3,1168,F,63.0,115
4,1201,M,70.0,205
5,1302,M,.,170


# Some Common Options

- **CENTER/NOCENTER**: CENTER, which is the default, centers the text in your output

- **DATE/NODATE**: DATE, which is the default, prints the date and time at which the SAS session started on your output, while NODATE does not

- **NUMBER/NONUMBER**: NUMBER, which is the default, prints a cumulative page number on each page of your output

- **ls = n**: LINESIZE, where n can be any number between 64 and 256, specifies the width of the print line for your procedure output and log

- **ps = n**: PAGESIZE, where n can be any number between 15 and 32767, specifies how many lines each page of output contains

- **PAGENO = n**: where n is the page number at which you want SAS to start numbering your output pages. If you don't specify PAGENO = n, your output is numbered sequentially throughout your SAS session, starting with page 1
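
Several of these options can be combined in one statement. The following is a minimal sketch; the dataset name `options_demo` and the chosen values (80 columns, 60 lines) are arbitrary assumptions, not recommendations from the lecture.

```sas
* one OPTIONS statement can set several options at once;
OPTIONS nocenter nodate number pageno=1 ls=80 ps=60;

DATA options_demo;
    input Subject Gender $ Height Weight;
    DATALINES;
1024 M 68.5 155
1167 F 61.2 99
;
RUN;

* output produced from here on uses the settings above;
PROC PRINT data = options_demo;
RUN;
```

The settings stay in effect until another OPTIONS statement changes them or the SAS session ends.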

# Assignment Statement

- Assign a value to a variable or modify the value of a variable

- syntax: variable = expression;

- If the variable does not exist, it will be created automatically

# Example Dataset

In [3]:
DATA grades;
	input name $ 1-15 e1 e2 e3 e4 p1 f1;
	DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data = grades;
	var name e1 e2 e3 e4 p1 f1;
RUN;

Obs,name,e1,e2,e3,e4,p1,f1
1,Alexander Smith,78,82,86,69,97,80
2,John Simon,88,72,86,.,100,85
3,Patricia Jones,98,92,92,99,99,93
4,Jack Benedict,54,63,71,49,82,69
5,Rene Porter,100,62,88,74,98,92


### VAR statement: selects which variables PROC PRINT displays, in the order listed

# Assignment Statement Basics

In [4]:
DATA grades;
	input name $ 1-15 e1 e2 e3 e4 p1 f1;
	* add up the four exam scores for each student and store the sum in examtotal;
	examtotal = e1 + e2 + e3 + e4;
	DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data = grades;
	var name e1 e2 e3 e4 examtotal;
RUN;

Obs,name,e1,e2,e3,e4,examtotal
1,Alexander Smith,78,82,86,69,315
2,John Simon,88,72,86,.,.
3,Patricia Jones,98,92,92,99,381
4,Jack Benedict,54,63,71,49,237
5,Rene Porter,100,62,88,74,324


In [5]:
DATA grades;
	input name $ 1-15 e1 e2 e3 e4 p1 f1;
	* add 8 to each student's second exam score (e2);
	e2 = e2 + 8;
	DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data = grades;
	var name e1 e2 e3 e4 p1 f1;
RUN;


Obs,name,e1,e2,e3,e4,p1,f1
1,Alexander Smith,78,90,86,69,97,80
2,John Simon,88,80,86,.,100,85
3,Patricia Jones,98,100,92,99,99,93
4,Jack Benedict,54,71,71,49,82,69
5,Rene Porter,100,70,88,74,98,92


# Basic Arithmetic Calculations

- addition: a = b + c;	add b and c
- subtraction: a = b - c;	subtract c from b
- multiplication: a = b * c;	multiply b and c
- division: a = b / c;	divide b by c
- exponentiation: a = b \*\* c;	raise b to the power of c
- negative prefix: a = -b;	take the negative of b
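- These operators can be combined, and SAS evaluates them in the usual mathematical order: exponentiation first, then multiplication and division, then addition and subtraction; use parentheses to override the order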
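
The order of evaluation can be checked with a few small expressions. This is a sketch that writes to the log only; the variable names a through d are placeholders.

```sas
DATA _null_;
    a = 2 + 3 * 4;      * multiplication before addition: 14;
    b = (2 + 3) * 4;    * parentheses are evaluated first: 20;
    c = 2 * 3 ** 2;     * exponentiation before multiplication: 18;
    d = 10 - 4 - 3;     * operators of equal priority run left to right: 3;
    * write the four results to the log;
    put a= b= c= d=;
RUN;
```

The log should show `a=14 b=20 c=18 d=3`.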

In [6]:
DATA grades;
	input name $ 1-15 e1 e2 e3 e4 p1 f1;
	final = 0.6*(e1+e2+e3+e4)/4 + 0.2*p1 + 0.2*f1;
	DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data = grades;
	var name e1 e2 e3 e4 p1 f1 final;
RUN;

Obs,name,e1,e2,e3,e4,p1,f1,final
1,Alexander Smith,78,82,86,69,97,80,82.65
2,John Simon,88,72,86,.,100,85,.
3,Patricia Jones,98,92,92,99,99,93,95.55
4,Jack Benedict,54,63,71,49,82,69,65.75
5,Rene Porter,100,62,88,74,98,92,86.60


# Numeric Functions

- A SAS function is a pre-programmed routine that returns a value computed from one or more arguments
- The standard form of any SAS function is:
  functionname (argument1, argument2, ...);

In [7]:
DATA grades;
	input name $ 1-15 e1 e2 e3 e4 p1 f1;
	* calculate the average by definition;
	avg1 = (e1+e2+e3+e4)/4;   
	* calculate the average using the mean function;
	avg2 = mean(e1,e2,e3,e4); 
	DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data = grades;
	var name e1 e2 e3 e4 avg1 avg2;
RUN;


Obs,name,e1,e2,e3,e4,avg1,avg2
1,Alexander Smith,78,82,86,69,78.75,78.75
2,John Simon,88,72,86,.,.,82.0
3,Patricia Jones,98,92,92,99,95.25,95.25
4,Jack Benedict,54,63,71,49,59.25,59.25
5,Rene Porter,100,62,88,74,81.00,81.0


# Are avg1 and avg2 the same?

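- The MEAN function omits missing values and calculates the mean of the rest, while the arithmetic in avg1 returns a missing value as soon as any operand is missing
- Missing values are handled differently in different situations: arithmetic operators propagate them, whereas summary functions such as MEAN and SUM skip them. No single rule covers every case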
- You would be wise to always **check your calculations out on a few representative observations** to make sure that your SAS program is doing exactly as you intended
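
A quick way to run such a check is to compute the same quantity with an operator and with a function on one observation that contains a missing value. The sketch below reuses John Simon's exam scores; the dataset and variable names are illustrative.

```sas
DATA missing_demo;
    input e1 e2 e3 e4;
    total_op = e1 + e2 + e3 + e4;        * missing, because e4 is missing;
    total_fn = sum(e1, e2, e3, e4);      * 246, the missing value is skipped;
    n_scores = n(e1, e2, e3, e4);        * 3 non-missing scores;
    n_miss   = nmiss(e1, e2, e3, e4);    * 1 missing score;
    DATALINES;
88 72 86 .
;
RUN;

PROC PRINT data = missing_demo;
RUN;
```

Comparing `total_op` with `total_fn`, and looking at `n_miss`, makes it obvious when a result depends on how missing values were treated.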


# Numeric Functions

- You can refer to SAS Help and Documentation (under "SAS Functions and CALL Routines by Category") for a full accounting of the built-in numeric functions that are available in SAS

- Common Functions
    - INT: the integer portion of a numeric value	a = int(x);
    - ABS: the absolute value of the argument	a = abs(x);
    - SQRT: the square root of the argument	a = sqrt(x);
    - MIN: the minimum value of the arguments	a = min(x, y, z);
    - MAX: the maximum value of the arguments	a = max(x, y, z);
    - SUM: the sum of the arguments	a = sum(x, y, z);
    - MEAN: the mean of the arguments	a = mean(x, y, z);
    - ROUND: round the argument to the specified unit	a = round(x, 1);
    - LOG: the log (base e) of the argument	a = log(x);
    - LAG: the value the argument had the previous time LAG was executed (usually the previous observation)	a = lag(x);
    - DIF: the difference between the current value of the argument and its LAG value	a = dif(x);
    - N: the number of non-missing values among the arguments	a = n(x, y, z);
    - NMISS: the number of missing values among the arguments	a = nmiss(x, y, z);
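
LAG and DIF behave differently from the other functions in the list because they look back across observations. A minimal sketch with three made-up values:

```sas
DATA lag_demo;
    input x;
    prev   = lag(x);    * value of x from the previous call, missing on the first row;
    change = dif(x);    * x minus lag(x), also missing on the first row;
    DATALINES;
10
15
12
;
RUN;

PROC PRINT data = lag_demo;
RUN;
```

The three rows should show `prev` as ., 10, 15 and `change` as ., 5, -3.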


In [8]:
DATA grades;
	input name $ 1-15 phone e1 e2 e3 e4 p1 f1;
	areacode = int(phone/10000000);
	DATALINES;
Alexander Smith 8145551212  78 82 86 69  97 80
John Simon      8145562314  88 72 86  . 100 85
Patricia Jones  7175559999  98 92 92 99  99 93
Jack Benedict   5705551111  54 63 71 49  82 69
Rene Porter     8145542323 100 62 88 74  98 92
;
RUN;

PROC PRINT data = grades;
	var name phone areacode;
RUN;

Obs,name,phone,areacode
1,Alexander Smith,8145551212,814
2,John Simon,8145562314,814
3,Patricia Jones,7175559999,717
4,Jack Benedict,5705551111,570
5,Rene Porter,8145542323,814


In [9]:
DATA grades;
	input name $ 1-15 e1 e2 e3 e4 p1 f1;
	*calculate the average using the mean function and then round it;
	avg = round(mean(e1,e2,e3,e4),0.1);
	DATALINES;
Alexander Smith   78 82 86 69  97 80
John Simon        88 72 86  . 100 85
Patricia Jones    98 92 92 99  99 93
Jack Benedict     54 63 71 49  82 69
Rene Porter      100 62 88 74  98 92
;
RUN;

PROC PRINT data = grades;
	var name e1 e2 e3 e4 avg;
RUN;

Obs,name,e1,e2,e3,e4,avg
1,Alexander Smith,78,82,86,69,78.8
2,John Simon,88,72,86,.,82.0
3,Patricia Jones,98,92,92,99,95.3
4,Jack Benedict,54,63,71,49,59.3
5,Rene Porter,100,62,88,74,81.0


# Converting Data

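- If you apply an operation that requires numeric values to a character variable, SAS first tries to convert the character values to numeric automatically and writes a note about the conversion to the log
- However, this lazy approach doesn't always work: a value that is not a plain number, such as 1,210 with its comma, cannot be converted and produces a missing value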


In [10]:
DATA grades;
	input name $ 1-15 e1 $ e2 $ e3 $ e4 $ standtest $;
	avg = round(mean(e1,e2,e3,e4),1); 
	std = standtest/4;
	DATALINES;
Alexander Smith   78 82 86 69   1,210
John Simon        88 72 86  .     990
Patricia Jones    98 92 92 99   1,010
Jack Benedict     54 63 71 49     875
Rene Porter      100 62 88 74   1,180
;
RUN;

PROC PRINT data = grades;
RUN;

Obs,name,e1,e2,e3,e4,standtest,avg,std
1,Alexander Smith,78,82,86,69,"1,210",79,.
2,John Simon,88,72,86,,990,82,247.50
3,Patricia Jones,98,92,92,99,"1,010",95,.
4,Jack Benedict,54,63,71,49,875,59,218.75
5,Rene Porter,100,62,88,74,"1,180",81,.


# Converting Data: Input function

- input(source, informat.);
- It reads the character value source with the specified informat and returns the converted value (a numeric value when the informat is numeric, as with comma5.)

In [11]:
DATA grades;
	input name $ 1-15 e1 $ e2 $ e3 $ e4 $ standtest $;
	std = input(standtest,comma5.)/4;
	DATALINES;
Alexander Smith   78 82 86 69   1,210
John Simon        88 72 86  .     990
Patricia Jones    98 92 92 99   1,010
Jack Benedict     54 63 71 49     875
Rene Porter      100 62 88 74   1,180
;
RUN;

PROC PRINT data = grades;
   var name standtest std;
RUN;

Obs,name,standtest,std
1,Alexander Smith,"1,210",302.5
2,John Simon,990,247.5
3,Patricia Jones,"1,010",252.5
4,Jack Benedict,875,218.75
5,Rene Porter,"1,180",295.0
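
To see why the choice of informat matters, the sketch below converts the same character value twice, once with a plain numeric informat and once with COMMA5. The dataset step and variable names are illustrative; the `??` modifier is used only to keep the failed conversion from cluttering the log.

```sas
DATA _null_;
    raw = '1,210';
    * the ?? modifier suppresses the invalid-data messages when a conversion fails;
    plain      = input(raw, ?? 5.);      * the w. informat cannot read the comma: missing;
    with_comma = input(raw, comma5.);    * COMMA5. strips the comma: 1210;
    put raw= plain= with_comma=;
RUN;
```

The log should show `raw=1,210 plain=. with_comma=1210`.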
