# 4 - Modifying & Combining Datasets

## The SET Statement

The following program:
* 1 & 2 - reads the raw data for the south and north entrances into SAS data sets and prints each one to check that it was read correctly
* 3 - stacks the two SAS data sets with a SET statement. The same DATA step creates a new variable, amountpaid, which records how much each customer paid based on age. Because lot exists only in the north entrance data, it is missing (.) for the south entrance observations

In [3]:
data southentrance;
    infile "South.dat";
    input entrance $ passnumber partysize age;
proc print data = southentrance;
    title 'south entrance data';
run;
data northentrance;
    infile "North.dat";
    input entrance $ passnumber partysize age lot;
proc print data = northentrance;
    title 'north entrance data';
run;
* create a data set, both, combining northentrance and southentrance;
* create a variable, amountpaid, based on value of variable Age;
data both;
    set southentrance northentrance;
    if age = . then amountpaid = .;
        else if age < 3 then amountpaid = 0;
        else if age < 65 then amountpaid = 35;
        else amountpaid = 27;
proc print data = both;
    title 'both entrances';
run;

Obs,entrance,passnumber,partysize,age
1,S,43,3,27
2,S,44,3,24
3,S,45,3,2

Obs,entrance,passnumber,partysize,age,lot
1,N,21,5,41,1
2,N,87,4,33,3
3,N,65,2,67,1
4,N,66,2,7,1

Obs,entrance,passnumber,partysize,age,lot,amountpaid
1,S,43,3,27,.,35
2,S,44,3,24,.,35
3,S,45,3,2,.,0
4,N,21,5,41,1,35
5,N,87,4,33,3,35
6,N,65,2,67,1,27
7,N,66,2,7,1,35
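
The order of the IF/ELSE chain in the DATA step above matters: SAS treats a missing numeric value as smaller than any number, so a missing age would satisfy `age < 3` and be charged 0 if the `age = .` test did not come first. A minimal sketch of that behavior follows. The data set name `fees` and the four ages are hypothetical test values chosen to hit each branch, not rows from South.dat or North.dat.

```sas
* hypothetical ages, including one missing value, to exercise every branch;
data fees;
    input age;
    * test for missing first, because a missing age also satisfies age < 3;
    if age = . then amountpaid = .;
        else if age < 3 then amountpaid = 0;
        else if age < 65 then amountpaid = 35;
        else amountpaid = 27;
    datalines;
.
2
27
67
;
run;
proc print data = fees;
    title 'amount paid by age';
run;
```

With the missing-value test first, the four rows should come out with amountpaid values of ., 0, 35, and 27.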


### Interleaving Data
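The following program:
* 1 & 2 - reads both data sets and sorts the north entrance data by passnumber (the south entrance data are already in passnumber order)
* 3 - interleaves the two data sets with SET and BY statements, so the combined observations stay in passnumber order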

In [4]:
data southentrance;
    infile "South.dat";
    input entrance $ passnumber partysize age;
proc print data = southentrance;
    title 'south entrance data';
run;
data northentrance;
    infile "North.dat";
    input entrance $ passnumber partysize age lot;
proc sort data = northentrance;
    by passnumber;    
proc print data = northentrance;
    title 'north entrance data';
run;
* create a data set, interleave, combining southentrance and northentrance in passnumber order;
data interleave;
    set southentrance northentrance;
    by passnumber;
proc print data = interleave;
    title 'both entrances, by pass number';
run;

Obs,entrance,passnumber,partysize,age
1,S,43,3,27
2,S,44,3,24
3,S,45,3,2

Obs,entrance,passnumber,partysize,age,lot
1,N,21,5,41,1
2,N,65,2,67,1
3,N,66,2,7,1
4,N,87,4,33,3

Obs,entrance,passnumber,partysize,age,lot
1,N,21,5,41,1
2,S,43,3,27,.
3,S,44,3,24,.
4,S,45,3,2,.
5,N,65,2,67,1
6,N,66,2,7,1
7,N,87,4,33,3
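
Interleaving only works when every data set named in the SET statement is already sorted by the BY variable. The sketch below sorts both inputs defensively before combining them, so it does not depend on the south data arriving in order. It assumes the values shown in the printed output above, typed in with DATALINES. The `_inline` data set names are made up for this example.

```sas
* assumption - values are copied from the printed output above;
data south_inline;
    input entrance $ passnumber partysize age;
    datalines;
S 43 3 27
S 44 3 24
S 45 3 2
;
run;
data north_inline;
    input entrance $ passnumber partysize age lot;
    datalines;
N 21 5 41 1
N 87 4 33 3
N 65 2 67 1
N 66 2 7 1
;
run;
* sort both inputs so the BY statement in the next step is valid;
proc sort data = south_inline;
    by passnumber;
run;
proc sort data = north_inline;
    by passnumber;
run;
* interleave the two sorted data sets by passnumber;
data interleave_inline;
    set south_inline north_inline;
    by passnumber;
run;
proc print data = interleave_inline;
    title 'inline data, interleaved by pass number';
run;
```

Sorting a data set that is already in order costs little, and it avoids the error SAS raises when a BY group is out of sequence.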


## Merging Data Sets
### One-to-One Match Merge

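The following program:
* 1 & 2 - reads the chocolate descriptions (already sorted by codenum) and the sales data (unsorted), then sorts the sales data. Both data sets must be sorted by the BY variable before they are merged
* 3 - merges the two data sets by codenum. A chocolate with a description but no sales record (M315 in the output below) is kept, with PiecesSold missing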

In [9]:
data descriptions;
    infile "Chocolate.dat" TRUNCOVER;
    input CodeNum $ 1-4 Name $ 6-14 Description $ 15-60;
run;
data sales;
    infile "Chocsales.dat";
    input CodeNum $ 1-4 PiecesSold 6-7;
proc sort data = sales;
    by codenum;
run;
* merge datasets by codenum;
data chocolates;
    merge sales descriptions;
    by codenum;
proc print data = chocolates;
    title "today's chocolate sales";
run;

Obs,CodeNum,PiecesSold,Name,Description
1,A206,12,Mokka,Coffee buttercream in dark chocolate
2,A536,21,Walnoot,Walnut halves in bed of dark chocolate
3,B713,29,Frambozen,Raspberry marzipan covered in milk chocolate
4,C865,15,Vanille,Vanilla-flavored rolled in ground hazelnuts
5,K014,1,Kroon,Milk chocolate with a mint cream center
6,K086,9,Koning,Hazelnut paste in dark chocolate
7,M315,.,Pyramide,White with dark chocolate trimming
8,S163,34,Orbais,Chocolate cream in dark chocolate
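
Row 7 of the output above (M315) has a missing PiecesSold because a match merge keeps every BY value that appears in either data set. The following sketch reproduces that on three chocolates taken from the printed output. The `_inline` data set names are hypothetical, the Description column is left out for brevity, and the order of the sales rows is an assumption, since Chocsales.dat is not shown.

```sas
* descriptions, already in CodeNum order, copied from the output above;
data descriptions_inline;
    input CodeNum $ Name $;
    datalines;
A206 Mokka
M315 Pyramide
S163 Orbais
;
run;
* sales rows in an assumed unsorted order, with no row for M315;
data sales_inline;
    input CodeNum $ PiecesSold;
    datalines;
S163 34
A206 12
;
run;
* the BY variable must be sorted in both data sets before merging;
proc sort data = sales_inline;
    by CodeNum;
run;
data chocolates_inline;
    merge sales_inline descriptions_inline;
    by CodeNum;
run;
proc print data = chocolates_inline;
    title 'merge with one unmatched description';
run;
```

M315 should appear in the result with its name filled in and PiecesSold set to missing, matching the behavior in the full output.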


### One-to-Many Match Merge

In [11]:
data regular;
    infile "Shoe.dat";
    input style $ 1-15 exercisetype $ regularprice;
proc sort data = regular;
    by exercisetype;
run;
data discount;
    infile "Disc.dat";
    input exercisetype $ adjustment;
run;
data prices;
    merge regular discount;
    by exercisetype;
    newprice = round(regularprice - (regularprice*adjustment),.01);
proc print data = prices;
    title 'price list for may';
run;

Obs,style,exercisetype,regularprice,adjustment,newprice
1,Zip Sneak,c-train,92.99,0.25,69.74
2,Max Flight,running,142.99,0.3,100.09
3,Zoom Airborne,running,112.99,0.3,79.09
4,Zip Fit Leather,walking,83.99,0.2,67.19
5,Light Step,walking,73.99,0.2,59.19
6,Max Step Woven,walking,75.99,0.2,60.79
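
In a one-to-many match merge, each observation in the "one" data set (here, one discount per exercise type) is repeated for every matching observation in the "many" data set (the shoes). A small sketch with four shoes from the printed output above shows the single running discount applied to both running shoes. The `_inline` names and the comma-delimited layout are assumptions made for this example. Shoe.dat itself is read with column input.

```sas
* shoe prices copied from the output above, already sorted by exercisetype;
data regular_inline;
    infile datalines dlm = ',';
    input style :$15. exercisetype $ regularprice;
    datalines;
Zip Sneak,c-train,92.99
Max Flight,running,142.99
Zoom Airborne,running,112.99
Zip Fit Leather,walking,83.99
;
run;
* one discount rate per exercise type, also sorted by exercisetype;
data discount_inline;
    input exercisetype $ adjustment;
    datalines;
c-train .25
running .30
walking .20
;
run;
* each discount row is matched to every shoe with the same exercisetype;
data prices_inline;
    merge regular_inline discount_inline;
    by exercisetype;
    newprice = round(regularprice - (regularprice*adjustment), .01);
run;
proc print data = prices_inline;
    title 'one-to-many merge on inline data';
run;
```

The new prices should agree with the corresponding rows of the full output: 69.74, 100.09, 79.09, and 67.19.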
