# Introduction to SAS Programming  
**Market Research & Analysis Class — Lecture Notes**

This notebook covers the basics of SAS programming, focusing on:
- Libraries (temporary vs permanent)
- Creating datasets
- Viewing and exploring data
- Running statistical procedures
- Merging datasets
- Best practices & troubleshooting

## 1. SAS Overview
SAS (Statistical Analysis System) is a software suite for storing, managing, and analyzing data.  
We will use SAS throughout this course to process survey data, calculate statistics, and create reports.

## 2. SAS Interface
SAS has several main windows:

- **Editor (Program window):** where we write code  
- **Log:** shows messages, warnings, errors — always check after running code  
- **Results:** displays tables/outputs generated by procedures  
- **Explorer:** shows libraries (folders) and datasets

## 3. Libraries: Temporary vs Permanent
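- **WORK library:** temporary; its datasets are deleted when the SAS session ends  
- **Permanent library:** assigned by you; its datasets are stored on disk and remain available between sessions

To keep data between sessions, assign a permanent library with a `LIBNAME` statement that points to an existing folder.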

In [None]:
/* Assign the libref mylib to an existing folder
   (by default, LIBNAME does not create the folder) */
LIBNAME mylib 'C:\Users\YourName\SASData';

/* Temporary dataset (saved in WORK library) */
DATA work.students;
    INPUT name $ age weight height;
    DATALINES;
John 21 70 175
Mary 22 65 162
;
RUN;

/* Permanent dataset */
DATA mylib.students;
    SET work.students;
RUN;
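
Before relying on a permanent library, it can help to confirm that the libref was actually assigned. The sketch below assumes the `mylib` libref from the cell above; the `LIBREF` function returns 0 when a libref is assigned, and `PROC DATASETS` lists the members of the library.

```sas
/* Write the LIBREF return code to the log: 0 means mylib is assigned */
%PUT NOTE: LIBREF return code for mylib = %SYSFUNC(LIBREF(mylib));

/* List the datasets stored in the library */
PROC DATASETS LIBRARY=mylib;
QUIT;
```

If the return code is not 0, check the folder path in the `LIBNAME` statement and the messages in the Log.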

## 4. Viewing Data and Metadata
Use `PROC CONTENTS` to inspect a dataset's structure (variables, types, number of observations) and `PROC PRINT` to list its rows.

In [None]:
/* View dataset structure: variables, types, number of obs */
PROC CONTENTS DATA=mylib.students;
RUN;

/* View dataset observations */
PROC PRINT DATA=mylib.students;
RUN;

/* Print only selected rows */
PROC PRINT DATA=mylib.students;
    WHERE age > 21;
RUN;
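
For larger datasets, printing every row and column is rarely useful. As a minimal sketch, the `OBS=` dataset option limits how many observations are read and the `VAR` statement selects the columns to show; the limit of 5 is only an illustrative choice.

```sas
/* Print at most the first 5 observations, showing only two variables */
PROC PRINT DATA=mylib.students(OBS=5);
    VAR name age;
RUN;
```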

## 5. Frequency and Summary Statistics
Use `PROC FREQ` for counts and `PROC MEANS` for the mean, minimum, maximum, and other summary statistics.

In [None]:
/* Frequency table */
PROC FREQ DATA=mylib.students;
    TABLES age;
RUN;

/* Summary statistics (default output: N, mean, std dev, min, max) */
PROC MEANS DATA=mylib.students;
    VAR weight;
RUN;

/* Grouped summary statistics by age */
PROC MEANS DATA=mylib.students;
    CLASS age;
    VAR weight;
RUN;
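
`PROC MEANS` also accepts statistic keywords on the `PROC` statement when the default set is not what you need. The sketch below requests specific statistics for two variables; the choice of statistics and the `MAXDEC=1` rounding are illustrative, not a course requirement.

```sas
/* Request specific statistics, rounded to one decimal place */
PROC MEANS DATA=mylib.students N MEAN MEDIAN MIN MAX MAXDEC=1;
    VAR weight height;
RUN;
```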

## 6. Creating New Variables
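We can add new variables to a dataset using a DATA step. Reading and writing the same dataset name, as in the cell that follows, overwrites the dataset in place.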

In [None]:
DATA mylib.students;
    SET mylib.students;
    BMI = (weight / (height*height)) * 10000;  /* Body Mass Index; assumes weight in kg and height in cm */
RUN;
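
New variables can also be derived with conditional logic. The sketch below assumes weight in kg and height in cm, writes to a hypothetical `work.students_bmi` dataset so the original is untouched, and uses illustrative cutoffs of 18.5 and 25. Because SAS treats a missing numeric value as smaller than any number, missing values are handled first.

```sas
DATA work.students_bmi;
    SET mylib.students;
    LENGTH bmi_group $ 12;   /* set the length before the first assignment */
    bmi = (weight / (height * height)) * 10000;

    /* Check for missing first: a missing value would otherwise satisfy bmi < 18.5 */
    IF MISSING(bmi) THEN bmi_group = ' ';
    ELSE IF bmi < 18.5 THEN bmi_group = 'Underweight';
    ELSE IF bmi < 25 THEN bmi_group = 'Normal';
    ELSE bmi_group = 'Overweight';
RUN;
```

Declaring `bmi_group` with `LENGTH` avoids truncation, since otherwise the variable's length is taken from the first value assigned to it.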

## 7. Merging Datasets
Merge two datasets using a common key (BY variable).

In [None]:
/* Both datasets must be sorted by the BY variable before merging */
/* mylib.survey is assumed to exist already; it is not created in these notes */
PROC SORT DATA=mylib.students; BY name; RUN;
PROC SORT DATA=mylib.survey; BY name; RUN;

DATA mylib.merged;
    MERGE mylib.students mylib.survey;
    BY name;
RUN;
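
By default, `MERGE` keeps every `BY` value found in either dataset, filling unmatched variables with missing values. The sketch below shows one way to keep only matching rows with the `IN=` dataset option. The `work.survey` data, its `satisfaction` variable, and the output dataset names are hypothetical and exist only for illustration; `work.students` is the dataset created in section 3.

```sas
/* Hypothetical survey data: one row per respondent */
DATA work.survey;
    INPUT name $ satisfaction;
    DATALINES;
John 4
Mary 5
Alex 3
;
RUN;

/* Sort copies of both datasets by the key, leaving the originals unchanged */
PROC SORT DATA=work.students OUT=work.students_sorted; BY name; RUN;
PROC SORT DATA=work.survey OUT=work.survey_sorted; BY name; RUN;

DATA work.matched;
    MERGE work.students_sorted(IN=in_students)
          work.survey_sorted(IN=in_survey);
    BY name;
    /* Keep only names present in both datasets */
    IF in_students AND in_survey;
RUN;
```

Here Alex appears only in the survey data, so that row is dropped from `work.matched`. Changing the condition to `IF in_students;` would keep every student whether or not they answered the survey.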

## 8. Keeping or Dropping Variables
Choose which variables to keep or drop in a dataset.

In [None]:
DATA mylib.students_small;
    SET mylib.students(KEEP=name weight);
RUN;

DATA mylib.students_noage;
    SET mylib.students(DROP=age);
RUN;
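
Where `KEEP=` is placed matters. On the input dataset it limits which variables are read; on the output dataset it limits which variables are stored. As a sketch, with a hypothetical output dataset name, the step below reads `height` and `weight` to compute a BMI but stores only the name and the result.

```sas
DATA work.students_bmi_only(KEEP=name bmi);      /* variables written to the output */
    SET mylib.students(KEEP=name weight height); /* variables read from the input */
    bmi = (weight / (height * height)) * 10000;
RUN;
```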

## 9. Troubleshooting & Common Errors
| Problem | Cause | Solution |
|--------|--------|---------|
| Dataset disappears after session | Saved in WORK library | Use permanent library (LIBNAME) |
| "Dataset not found" | Misspelled name or wrong library | Check LIBNAME and dataset name |
| Code not running | Missing semicolon | End every statement with `;` |
| Wrong results in WHERE filter | Character comparisons are case-sensitive | Match the exact case of the text, or compare with `UPCASE()` |
| Can't overwrite dataset | Table is open in Explorer | Close the open table, then rerun |
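
As an illustration of the case-sensitivity row in the table, the following sketch makes a `WHERE` filter case-insensitive by converting the variable to uppercase before comparing. It assumes the `mylib.students` dataset from the earlier sections.

```sas
/* Matches 'John', 'john', 'JOHN', and so on */
PROC PRINT DATA=mylib.students;
    WHERE UPCASE(name) = 'JOHN';
RUN;
```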

## 10. Best Practices
- **Always check the Log window** after running code
- Save both **datasets** and **programs (.sas files)**
- Use permanent libraries for important work
- Close open tables before overwriting
- Name datasets and libraries clearly

## 11. Workflow Summary
1. **Write code** in the Editor  
2. **Submit** (Run)  
3. **Check Log** for errors/warnings  
4. **View results** in Results/Explorer  
5. **Save** datasets and programs if needed