# Lecture 1: Introduction and Read Data into SAS

## June 29th, Thursday




# What is SAS

### SAS (Statistical Analysis System) is a software suite developed for statistical analysis
### SAS was first developed at North Carolina State University in 1966
### SAS has more than 3 million users and is widely used in public health, business intelligence, etc.

# Access to SAS

- Most computers in campus libraries and labs have SAS installed
- With any computer, use PSU WebApps: <https://webapps.psu.edu> (not recommended)
  - It can be very slow and unstable
  - It only accesses remote directories
- Install SAS on your own computer
  - For Windows and Linux users, you can purchase SAS and install it on your computer. (For Windows, you can purchase it via <http://software.psu.edu/>)
  - For Mac users, you have to use a virtual machine
  - There is also a free SAS University Edition, but you need virtualization software: <http://www.sas.com/en_us/software/university-edition/download-software.html>

# Open SAS

## Open SAS: Start > All Programs > Spreadsheets and Statistics > SAS 

- Editor: enter, edit, and submit SAS programs
- Log: displays messages about your SAS session and any submitted SAS code. More importantly, this is where you check for errors and warnings
- Output: displays the listing output
- Results: displays a list of all generated output.
  This window also helps you navigate and manage output from submitted SAS programs.
- Explorer: view and manage SAS files, create new SAS libraries and files, and open any SAS file



# Read data into SAS

```sas
/* This is my first SAS code! */
DATA TestData;
input Subject Gender $ Height Weight;
DATALINES;
1024 M 68.5 155
1167 F 61.2 99
1168 F 63.0 115
1201 M 70.0 205
1302 M . 170
;
RUN;

PROC PRINT data = TestData;
RUN;
```


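| Obs | Subject | Gender | Height | Weight |
|-----|---------|--------|--------|--------|
| 1   | 1024    | M      | 68.5   | 155    |
| 2   | 1167    | F      | 61.2   | 99     |
| 3   | 1168    | F      | 63.0   | 115    |
| 4   | 1201    | M      | 70.0   | 205    |
| 5   | 1302    | M      | .      | 170    |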


# Two Main Steps of SAS Program

## DATA Step

This step creates SAS data sets: it reads data from external sources, manipulates the data, and combines it with other data sets.

## PROC Step

This step performs operations on SAS data sets, such as statistical analysis, producing reports, and generating graphs or charts.
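As a small illustration of the two kinds of steps, the sketch below uses a DATA step that both reads in-stream data and manipulates it (computing a new variable), followed by a PROC step that analyzes the result. It assumes Height is recorded in inches, which the lecture does not state; the data set name Heights and the variable HeightCm are hypothetical, and PROC MEANS is just one example of a statistical procedure.

```sas
/* DATA step: read data and create a new variable from an existing one */
DATA Heights;
   input Subject Height;
   HeightCm = Height * 2.54;   /* assumes Height is in inches; a missing Height gives a missing HeightCm */
   DATALINES;
1024 68.5
1167 61.2
1302 .
;
RUN;

/* PROC step: summary statistics (N, mean, std dev, min, max) for two variables */
PROC MEANS data = Heights;
   VAR Height HeightCm;
RUN;
```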

## A SAS program consists of many statements!

# Line by Line Analysis

## Comment: /\* This is my first SAS code! \*/

Text enclosed in `/* ... */` is treated as a comment and is not executed by SAS. We can also start a statement with `*` to turn the whole statement into a comment; note that such a comment ends at the next semicolon, not at the end of the line. Comments also make your code more readable and manageable.
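A short sketch with both comment styles (the data set name CommentDemo is hypothetical). It also shows why the semicolon matters for `*` comments: everything up to the next semicolon is ignored, so the variable y is never created.

```sas
/* A block comment can span several lines
   and may even contain semicolons; like this; */

* A statement comment starts with an asterisk and ends at the next semicolon;

DATA CommentDemo;
   x = 1;   /* a block comment can follow code on the same line */
   * y = 2 is commented out, so y is never created;
RUN;

PROC PRINT data = CommentDemo;
RUN;
```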


## DATA Step: DATA TestData;

Every DATA step begins with a DATA statement.
The word after DATA is the name of the data set.
A valid SAS data set name must be 1 to 32 characters long and must begin with a letter (uppercase or lowercase) or an underscore (\_); after that, it can contain any combination of letters, numbers, and underscores.
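For example (all names here are hypothetical), `_Heights2017` and `Student_Heights` are valid data set names, while `2017Heights` is not, because it begins with a digit. The invalid statement is left commented out so that the sketch runs.

```sas
/* Valid: starts with an underscore, followed by letters and digits */
DATA _Heights2017;
   input Subject Height;
   DATALINES;
1024 68.5
;
RUN;

/* Valid: letters and an underscore, well under 32 characters */
DATA Student_Heights;
   input Subject Height;
   DATALINES;
1167 61.2
;
RUN;

/* Invalid: begins with a digit, so SAS would not accept it */
/* DATA 2017Heights; */
```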



## INPUT Statement: INPUT Subject Gender `$` Height Weight;

It tells SAS how to read the data: how many variables there are, their names, and their types.
The `$` sign after the variable Gender indicates that Gender is a character variable (F or M), not a numeric one.
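To see why the `$` matters, here is a sketch (the data set name NoDollar is hypothetical) that reads similar data without it. SAS then treats Gender as numeric, cannot read M or F as numbers, and sets Gender to missing (.); the Log should show notes about invalid data.

```sas
/* Gender is declared numeric by mistake (no $ after it) */
DATA NoDollar;
   input Subject Gender Height Weight;
   DATALINES;
1024 M 68.5 155
1167 F 61.2 99
;
RUN;

PROC PRINT data = NoDollar;   /* the Gender column shows . in every row */
RUN;
```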


## DATALINES;

It indicates that the data lines follow immediately. The line containing only a semicolon (;) after the data marks the end of the data lines.

## PROC PRINT data=TestData;

This is a PROC step that prints the data set TestData. The RUN statement that follows it tells SAS to execute the step.


# More details about SAS codes

- SAS code is not case-sensitive: you can write keywords and names in uppercase, lowercase, or a mix of both. (Character data values are case-sensitive, however: `M` and `m` are different values.)
- A semicolon (;) must end each statement
- With this input style, a missing value must be entered as a period (.) so that SAS knows a value is missing there; if you simply leave it blank, SAS reads the next value into that variable instead
- When SAS reads a character variable this way, it assigns a default length of 8 characters, and longer values are truncated.
  If a character variable's values are longer than 8 characters, you can specify a longer length in the INPUT statement.
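A minimal sketch of the last point, with hypothetical data and names: a plain `$` would store `Christopher` as `Christop` (8 characters). Writing `:$20.` after the variable name keeps the same input style but gives the variable a length of 20. (A LENGTH statement placed before INPUT is another common way to do this.)

```sas
DATA LongNames;
   /* :$20. reads the next blank-delimited value as character data with length 20 */
   input Subject Name :$20. Height;
   DATALINES;
1024 Christopher 68.5
1167 Ann 61.2
;
RUN;

PROC PRINT data = LongNames;
RUN;
```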

# Create a SAS Library

### SAS library is simply a collection of SAS files that are stored in the same folder or directory on your computer

### Explorer > Libraries > Work > Testdata

The Work library is the default library for new datasets.
However, datasets stored in this library will be removed at the end of each SAS session.
In other words, it is a temporary library.

## Define a library by using LIBNAME statement

`LIBNAME libref 'libpath';`

`libref` is the name you use to refer to the library, which is the folder given between the quotation marks (the folder must already exist). A libref can be at most 8 characters long.

```sas
LIBNAME STAT480 './STAT480';
```

### We have created a library STAT480 that refers to a directory on the computer (which is permanent), so now we are ready to create a permanent SAS data set

# Refer to Datasets in a SAS Library

- SAS refers to the data set by a two-level name: libraryname.filename
  - libraryname is the name of the library you want to refer to, which is defined by LIBNAME statement
  - filename is the name of the SAS dataset file

- Recall that Work is the name for the temporary library in SAS. The temporary data set can also be referred to as Work.filename

- If a data set is referred to without a library name (a one-level name), SAS assumes the library is Work
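A small sketch (the data set name Demo is hypothetical): the DATA step below uses a one-level name, so the data set is stored in Work, and the two PROC PRINT steps then refer to the same data set.

```sas
/* One-level name: stored as Work.Demo */
DATA Demo;
   input Subject Height;
   DATALINES;
1024 68.5
1167 61.2
;
RUN;

/* Both steps print the same data set */
PROC PRINT data = Work.Demo;
RUN;

PROC PRINT data = Demo;
RUN;
```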

# Create Permanent SAS Datasets

- Simply use a two-level name with a library name other than Work in the DATA step

  `DATA STAT480.TestData;`

- Take a look at the Explorer window: you should find both the Work and STAT480 libraries in the 'Libraries' folder, with a data set in each library

- In the folder 'STAT480', a permanent xxx.sas7bdat file is created

- Although the folder is permanent, the library reference name (STAT480) is not. So every time you start a new SAS session, you need to use a LIBNAME statement to assign the library reference name to the folder again
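For example, in a later SAS session, assuming the folder `./STAT480` still contains the TestData data set created earlier, a sketch of the only setup needed before using it again is:

```sas
/* New session: the libref from last time is gone, so assign it again */
LIBNAME STAT480 './STAT480';

/* The permanent data set is still in the folder; no DATA step is needed */
PROC PRINT data = STAT480.TestData;
RUN;
```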



```sas
DATA STAT480.TestData;
input Subject Gender $ Height Weight;
DATALINES;
1024 M 68.5 155
1167 F 61.2 99
1168 F 63.0 115
1201 M 70.0 205
1302 M . 170
;
RUN;

PROC PRINT data = STAT480.TestData;
RUN;
```

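| Obs | Subject | Gender | Height | Weight |
|-----|---------|--------|--------|--------|
| 1   | 1024    | M      | 68.5   | 155    |
| 2   | 1167    | F      | 61.2   | 99     |
| 3   | 1168    | F      | 63.0   | 115    |
| 4   | 1201    | M      | 70.0   | 205    |
| 5   | 1302    | M      | .      | 170    |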


# Read a .txt file into SAS

- In the previous example, the source data is embedded in the program; this is called in-stream data. We can also read external files, e.g., .txt and .csv files, into SAS

- Use an INFILE statement within the DATA step to indicate the location of the data file:

  `infile 'location-of-file';`

```sas
DATA TestData1;
infile './STAT480/TestData.txt';
input Subject Gender $ Height Weight;
RUN;

PROC PRINT data = TestData1;
RUN;
```

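| Obs | Subject | Gender | Height | Weight |
|-----|---------|--------|--------|--------|
| 1   | 1024    | M      | 68.5   | 155    |
| 2   | 1167    | F      | 61.2   | 99     |
| 3   | 1168    | F      | 63.0   | 115    |
| 4   | 1201    | M      | 70.0   | 205    |
| 5   | 1302    | M      | .      | 170    |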


# Infile statement

- The INFILE statement must precede the INPUT statement

- It simply replaces the DATALINES statement and the in-stream data that appeared in the previous example

# FILENAME statement

- With a FILENAME statement, we can use a fileref (file reference) to point to a file.

- Just as we use a LIBNAME statement to assign a libref to a library, we use a FILENAME statement to assign a fileref to a file.

- Filerefs behave like librefs: they are temporary names that point to a location (here, a data file) only for the current SAS session.

```sas
FILENAME test './STAT480/TestData.txt';
DATA TestData1;
infile test;
input Subject Gender $ Height Weight;
RUN;

PROC PRINT data = TestData1;
RUN;
```

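| Obs | Subject | Gender | Height | Weight |
|-----|---------|--------|--------|--------|
| 1   | 1024    | M      | 68.5   | 155    |
| 2   | 1167    | F      | 61.2   | 99     |
| 3   | 1168    | F      | 63.0   | 115    |
| 4   | 1201    | M      | 70.0   | 205    |
| 5   | 1302    | M      | .      | 170    |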
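The same approach extends to the .csv files mentioned earlier. Here is a minimal, self-contained sketch: it first writes a tiny CSV file (a header plus two rows taken from the data above) to a temporary file using the TEMP option of FILENAME, then reads it back. The INFILE option DSD makes the comma the delimiter and treats two consecutive commas as a missing value; FIRSTOBS=2 skips the header line. The fileref csvdata and the data set name TestCsv are hypothetical; to read a real file, put its path in quotes in place of csvdata.

```sas
/* Write a small CSV file to a temporary location */
FILENAME csvdata TEMP;

DATA _NULL_;
   FILE csvdata;
   PUT 'Subject,Gender,Height,Weight';
   PUT '1024,M,68.5,155';
   PUT '1302,M,,170';          /* empty field = missing Height */
RUN;

/* Read it back: DSD handles commas and empty fields; FIRSTOBS=2 skips the header */
DATA TestCsv;
   INFILE csvdata DSD FIRSTOBS=2;
   INPUT Subject Gender $ Height Weight;
RUN;

PROC PRINT data = TestCsv;
RUN;
```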


# Tips on Writing Programs 

- Writing programs should be done in small steps. If you start small, build on what works, and always check your results along the way, you will increase your programming efficiency.

- Documentation is one of your best friends! <http://support.sas.com/documentation>

- Sometimes programs that do not produce errors are still incorrect. This is why it is vital to check your results as you go even when there are no errors. 

- If you do get errors, don’t worry. Most programs don’t work the first time: you may forget a semicolon or misspell a word. If you build your programs piece by piece, they are much easier to correct when something goes wrong. Let the error messages in the Log window help you correct the mistakes.
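One way to check your results as you go, sketched here with a hypothetical data set CheckMe: PROC CONTENTS reports each variable's name, type (numeric or character), and length, which quickly reveals problems such as a missing `$`, and PROC PRINT then shows the values themselves.

```sas
DATA CheckMe;
   input Subject Gender $ Height Weight;
   DATALINES;
1024 M 68.5 155
1302 M . 170
;
RUN;

/* Structure: number of observations, variable names, types, and lengths */
PROC CONTENTS data = CheckMe;
RUN;

/* Values: confirm the data were read as expected */
PROC PRINT data = CheckMe;
RUN;
```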


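```sas
/* Note: only STAT480 has been assigned with LIBNAME, not STAT481,
   so the PROC PRINT step below fails. Read the error in the Log window,
   then correct the library name to STAT480. */
DATA STAT480.TestData;
input Subject Gender $ Height Weight;
DATALINES;
1024 M 68.5 155
1167 F 61.2 99
1168 F 63.0 115
1201 M 70.0 205
1302 M . 170
;
RUN;

PROC PRINT data = STAT481.TestData;
RUN;
```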