# Lecture04 - SAS Programming, Lecture Notes

**Goal**: Learn the core SAS concepts, rules, and procedures used in data management and analysis.

**What you will learn**:
1. SAS windows and libraries
2. Syntax rules and naming rules
3. DATA step vs PROC step
4. Common procedures, like CONTENTS, PRINT, FREQ, MEANS, SUMMARY
5. Sorting, merging, and conditional logic
6. Worked examples: Weight Club and Cake Competition

> Tip: When you run SAS programs, always check the **Log**. The Log shows errors, warnings, and notes.

## 1. SAS Windows

- **Explorer**, manage libraries and datasets
- **Editor**, write and edit programs
- **Log**, messages about your run
- **Results**, a navigation tree that indexes the output of each procedure
- **Output**, the tables and reports that procedures print

> Practice, after you run code, switch to the **Log** and scan for errors (shown in red) and warnings.
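
To get a feel for what lands in the Log, here is a minimal sketch that writes one custom message and triggers one automatic note. The variable names `x`, `y`, and `z` are made up for illustration; `data _null_` runs the step without creating a dataset.

```sas
data _null_;
   x = 5;
   y = .;
   /* PUTLOG writes a line to the Log; x= and y= print name=value pairs */
   putlog 'NOTE: values are ' x= y=;
   /* Arithmetic on a missing value makes SAS add its own NOTE to the Log */
   z = x + y;
run;
```

After running it, look for the custom line and for the note about missing values being generated.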

## 2. SAS Libraries

- **WORK** is temporary; its datasets disappear when the SAS session ends
- A **permanent library** is assigned with `LIBNAME` and points to a folder, so its datasets persist between sessions

```sas
libname mylib 'C:\Users\Chiaki\SASdata';
```

If SAS says the library is not assigned, check that the folder path exists.
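
A quick way to run that check from code is sketched below. The macro variable `path` is a name chosen for this example. `LIBREF` returns 0 when the libref is assigned. `FILEEXIST` is assumed here to report folders as well as files, which is the usual behavior but worth confirming on your system.

```sas
/* Hypothetical macro variable holding the folder used above */
%let path = C:\Users\Chiaki\SASdata;

/* 1 means the folder was found, 0 means it was not */
%put NOTE: folder found = %sysfunc(fileexist(&path));

/* Double quotes so that &path is resolved */
libname mylib "&path";

/* 0 means the libref is assigned, anything else means it is not */
%put NOTE: libref status = %sysfunc(libref(mylib));
```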

## 3. Syntax Rules and Naming Rules

- Every statement ends with a semicolon `;`
- Use quotes for character values and folder paths
- Names must start with a letter or an underscore; letters, numbers, and underscores can follow. Use `_` in place of spaces
- Avoid special characters
- SAS is not case sensitive for names, but it is case sensitive for character **values** in filters
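
The last rule is easy to see in a small sketch. The dataset `demo_names` and its two rows are invented for this illustration.

```sas
data Demo_Names;
   length team $6;
   team = 'Red'; output;
   team = 'red'; output;
run;

/* Dataset and variable names ignore case: DEMO_NAMES and TEAM both resolve */
proc print data=DEMO_NAMES;
   where TEAM = 'red';          /* matches only the lowercase value */
run;

/* UPCASE on the variable gives a case-insensitive comparison */
proc print data=demo_names;
   where upcase(team) = 'RED';  /* matches both rows */
run;
```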

## 4. DATA step vs PROC step

- **DATA** creates or transforms datasets
- **PROC** analyzes or reports on data with built-in tools

### Example, DATA step
```sas
data weightclub;
   input idno 1-4 name $ 6-24 team $ strtwght endwght;
   loss = strtwght - endwght;
   datalines;
1023 David Shaw         red    189 165
1049 Amelia Serrano     yellow 145 124
1219 Alan Nance         red    210 192
1246 Ravi Sinha         yellow 194 177
1078 Ashley McKnight    red    127 118
1221 Jim Brown          yellow 220   .
1095 Susan Stewart      blue   135 127
1157 Rose Collins       green  155 141
1331 Jason Schock       blue   187 172
1067 Kanoko Nagasaka    green  135 122
1251 Richard Rose       blue   181 166
1333 Li-Hwa Lee         green  141 129
1192 Charlene Armstrong yellow 152 139
1352 Bette Long         green  156 137
1262 Yao Chen           blue   196 180
1087 Kim Blackburn      red    148 135
1124 Adrienne Fink      green  156   .
;
run;
```

### Example, PROC step
```sas
proc print data=weightclub;
   where team='red';
run;
```

## 5. Useful Procedures

### 5.1 PROC CONTENTS, dataset structure
```sas
proc contents data=weightclub;
run;
```
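
Two common variations, shown as a sketch that assumes `weightclub` already exists in WORK: `VARNUM` lists variables in their stored order instead of alphabetically, and `_ALL_` with `NODS` lists the datasets in a library without the per-dataset detail.

```sas
/* Variables in creation order */
proc contents data=weightclub varnum;
run;

/* Directory of everything in WORK, without details for each dataset */
proc contents data=work._all_ nods;
run;
```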

### 5.2 PROC PRINT, list rows with filters
```sas
proc print data=weightclub;
   where team='red';
run;
```
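
Filters can be combined, and `VAR` limits the columns that are printed. The sketch below assumes `weightclub` from Section 4; the cutoff of 15 is an arbitrary value picked for illustration.

```sas
proc print data=weightclub noobs;        /* NOOBS hides the Obs column */
   where team in ('red', 'blue') and loss > 15;
   var name team loss;
run;
```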

### 5.3 PROC FREQ, frequency and cross tables
```sas
proc freq data=weightclub;
   tables team;
run;
```
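
A cross table needs two categorical variables, and `weightclub` has only `team`. The sketch below therefore derives a second one first. The dataset `weightclub_flag` and the variable `weighed_out` are names invented for this example.

```sas
data weightclub_flag;
   set weightclub;
   length weighed_out $3;
   /* Flag whether an ending weight was recorded */
   if missing(endwght) then weighed_out = 'no';
   else weighed_out = 'yes';
run;

proc freq data=weightclub_flag;
   /* team in rows, weighed_out in columns; show counts only */
   tables team * weighed_out / norow nocol nopercent;
run;
```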

### 5.4 PROC MEANS, descriptive stats
```sas
proc means data=weightclub;
   var loss;
run;
```
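
By default `PROC MEANS` reports N, mean, standard deviation, minimum, and maximum. As a sketch, statistics can be requested by name and split by group with `CLASS`, again assuming `weightclub` from Section 4:

```sas
proc means data=weightclub n nmiss mean min max maxdec=1;
   class team;   /* one row of statistics per team, no sorting needed */
   var loss;
run;
```

`NMISS` is useful here because members without an ending weight have a missing `loss`.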

### 5.5 PROC SUMMARY, like MEANS but more control of outputs
```sas
/* Optional here: CLASS does not require sorted input (BY would) */
proc sort data=weightclub; by team; run;

proc summary data=weightclub nway;
   class team;
   var loss;
   output out=team_stats mean=avg_loss;
run;
```
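
To see what `NWAY` does, it may help to run the same step without it and print the result. This is a sketch; `team_stats_all` and `n_loss` are names chosen for the example.

```sas
proc summary data=weightclub;
   class team;
   var loss;
   output out=team_stats_all mean=avg_loss n=n_loss;
run;

proc print data=team_stats_all noobs;
run;
```

The output dataset gains two automatic variables. `_TYPE_` is 0 for the overall row and 1 for the per-team rows, and `NWAY` keeps only the latter. `_FREQ_` counts all rows in the group, while `n_loss` counts only rows with a nonmissing `loss`.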

## 6. Creating and Modifying Variables

- Create variables with arithmetic or conditions
- Row-wise calculations happen in the DATA step, which processes one observation at a time; column-wise summaries come from procedures such as `PROC MEANS`

```sas
data weightclub2;
   set weightclub;
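   /* Arithmetic: percentage of starting weight lost (pct_loss is an illustrative name) */
   if not missing(loss) then pct_loss = 100 * loss / strtwght;
   /* Labels relative to the team average need avg_loss from team_stats; */
   /* that comparison is done after the merge in Section 7 */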
run;
```

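Functions that take an `OF` variable list work across many columns within one row. These assignment statements belong inside a DATA step: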
```sas
max_val = max(of x1-x10);
min_val = min(of x1-x10);
avg_val = mean(of x1-x10);
```
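
A runnable version of the same idea, as a sketch with made-up numbers. The dataset `scores` and the extra variable `n_vals` are not part of the lecture data.

```sas
data scores;
   input x1-x10;
   max_val = max(of x1-x10);
   min_val = min(of x1-x10);
   avg_val = mean(of x1-x10);
   n_vals  = n(of x1-x10);     /* number of nonmissing values in the row */
   datalines;
3 7 1 9 4 6 2 8 5 10
2 . 4 6 8 1 3 . 5 7
;
run;

proc print data=scores noobs;
   var max_val min_val avg_val n_vals;
run;
```

These functions skip missing values, so the second row is summarized over its eight nonmissing entries.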

## 7. Sorting and Merging

```sas
proc sort data=weightclub; by team; run;
proc sort data=team_stats; by team; run;

data weightclub_joined;
   merge weightclub(in=a) team_stats(in=b);
   by team;
   if a; /* keep only original rows */
   /* Label performance vs team */
   length result $30;
   if not missing(loss) and not missing(avg_loss) then do;
      if      loss >  avg_loss then result = 'Better than team';
      else if loss =  avg_loss then result = 'On average';
      else                          result = 'Keep training';
   end;
run;
```

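> Warning, if both datasets contain a variable with the same name (other than the `BY` variable), the value from the dataset listed last in `MERGE` overwrites the earlier one. Use the `RENAME=` dataset option to avoid an accidental overwrite.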
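
A minimal sketch of that fix, using two tiny invented datasets (`members` and `targets`) that both carry a variable called `score`:

```sas
data members;
   input team $ score;
   datalines;
blue 10
red 20
;
run;

data targets;
   input team $ score;
   datalines;
blue 15
red 18
;
run;

/* Both inputs are already in team order, so no PROC SORT is needed here */
data combined;
   merge members
         targets(rename=(score=target_score));  /* keep both values */
   by team;
run;
```

Without the `RENAME=` option, `combined` would hold only the `score` values from `targets`.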

## 8. Worked Example, Cake Competition

**Dataset**: name, age, presentation score, taste score, flavor, layers

```sas
libname mylib 'C:\Users\Chiaki\SASdata';

data mylib.cake;
   length name $30 flavor $12;
   input name $ age present_score taste_score flavor $ layers;
   datalines;
Orlando 38 93 84 vanilla 1
Ava     29 88 91 chocolate 2
Mina    44 90 86 chocolate 1
Ken     41 85 79 vanilla 2
Rita    36 92 95 strawberry 2
;
run;
```

### 8.1 Average age of participants
```sas
proc means data=mylib.cake;
   var age;
run;
```

### 8.2 Count cakes by flavor
```sas
proc freq data=mylib.cake;
   tables flavor;
run;
```

### 8.3 Highest taste and presentation scores per flavor, merged back
```sas
proc sort data=mylib.cake; by flavor; run;

proc means data=mylib.cake nway noprint;
   class flavor;
   var taste_score present_score;
   output out=flavor_stats max(taste_score)=max_taste max(present_score)=max_present;
run;

data flavor_stats_clean;
   set flavor_stats;
   keep flavor max_taste max_present;
run;

data mylib.cake_with_max;
   merge mylib.cake flavor_stats_clean;
   by flavor;
run;
```

Now `mylib.cake_with_max` carries the maximum taste and presentation scores for its flavor on every row, so each cake can be compared with the best in its group.
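
One way to make that comparison is sketched below, assuming `mylib.cake_with_max` was created as above. The dataset `cake_flagged` and the flags `top_taste` and `top_present` are names invented for this example.

```sas
data cake_flagged;
   set mylib.cake_with_max;
   /* A comparison in parentheses evaluates to 1 (true) or 0 (false) */
   top_taste   = (taste_score   = max_taste);
   top_present = (present_score = max_present);
run;

proc print data=cake_flagged noobs;
   var name flavor taste_score max_taste top_taste;
run;
```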

## 9. Common Pitfalls and Checks

- Always end statements with `;`
- Check the **Log** after each run
- Sort before using `BY` in procedures and merges (`CLASS` does not need sorted data)
- Use permanent libraries for datasets you want to keep
- Rename variables on merge if names collide
- Character value filters are case sensitive

**Quick checklist**:
1. Library assigned successfully, with no errors in the Log
2. Paths exist
3. PROC with `BY` has prior `PROC SORT`
4. Output datasets written to the intended library

## 10. Mini Exercises

1. Use `PROC CONTENTS` to list all variables in `weightclub`
2. Use `PROC FREQ` to show the number of members by `team`
3. Compute the mean of `loss` by team with `PROC SUMMARY` and write to `team_stats`
4. Merge `team_stats` back to `weightclub` and label performance with `result`
5. In `mylib.cake`, compute the highest `present_score` by `flavor` and merge back

> Bonus, adapt the code to your own folders by changing the `libname` path.