# GWU STAT 4197/6197
## Week 2 - DATA Step: Reading Data and Creating Reports
### Reading Raw Data into SAS Data Sets (SAS Code Examples - Part 1)

* Column Input
* Formatted Input
* List Input 
* Named Input

In [None]:
*Ex1_Column_Input.sas;
options nocenter nonumber nodate;
* No INFILE statement; 
data work.HAVE1;
 input id $ 1-3 name $ 5-16 
       score1 18-19 score2 21-22;
datalines;
001 Tim Dyson    74 87 
002 Sam Larson   96 82 
003 Jane Miller  91 88 
004 Bikas Das    90 87 
; 
title 'Column input style, no INFILE statement';
proc print data=work.HAVE1 noobs; 
run;




In [None]:
*Ex1_Column_Input.sas;
* FIRSTOBS= option on the INFILE statement (skips the column ruler line); 
options nocenter nonumber nodate;
data work.HAVE2;
 infile datalines firstobs=2;
 input id $ 1-3 name $ 5-16 
       score1 18-19 score2 21-22;
datalines;
1234567890123456789012
001 Tim Dyson    74 87 
002 Sam Larson   96 82 
003 Jane Miller  91 88 
004 Bikas Das    90 87 
; 
title 'Column input style, FIRSTOBS= option on INFILE statement';
proc print data=work.HAVE2 noobs; 
run;
title;

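The PAD option on the INFILE statement pads each record 
with blanks up to the logical record length (LRECL=). As a result,
the DATA step correctly assigns a missing value to the data field
(some_value) at the end of the "shorter" record instead of
reading ahead into the next record.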
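Because in-stream DATALINES records are already padded with blanks, PAD is easiest to see with an external file. The minimal sketch below assumes a temporary fileref named `shortrec` and a data set named `pad_demo` (both hypothetical names); it writes one full record and one short record, then reads them back with LRECL= and PAD.

```sas
/* Hypothetical demo file: the second record is shorter than the first */
filename shortrec temp;

data _null_;
  file shortrec;
  put '001 Tim Dyson    74 87';
  put '002 Sam Larson   96';      /* ends at column 19: no SCORE2 */
run;

/* PAD fills each record with blanks up to LRECL=25, so the empty */
/* columns 21-22 give SCORE2 a missing value on the second record. */
data pad_demo;
  infile shortrec lrecl=25 pad;
  input id $ 1-3 name $ 5-16 score1 18-19 score2 21-22;
run;

proc print data=pad_demo noobs;
run;
```

If you remove PAD, you can expect the default FLOWOVER behavior to make SAS look for columns 21-22 on the next record; because there is no next record here, the second observation is not written.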

In [None]:
*Ex2_column_Input_PAD_Option.sas;

*** To clean up the "work" library used;
proc datasets lib=work nolist kill; quit;

data HAVE;
*** LRECL= and PAD option on the INFILE statement;
 infile 'C:\SASCourse\Week2\short_records.txt'
         Lrecl=25 PAD;
 input id 1-3 name $ 5-16 
       score 18-19 @21 some_value 5.2;
proc print data=HAVE noobs; run;


The TRUNCOVER option on the INFILE statement “causes the 
DATA step to assign the raw data value to the variable even  
if the value is shorter than expected by the INPUT statement. 
If, when the DATA step encounters the end of an input record,
there are variables without values, the variables are assigned 
missing values for that observation.” SAS Documentation.
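To see how TRUNCOVER differs from MISSOVER when the last field is only partly present, the sketch below writes a temporary file (hypothetical fileref `shortval`) whose second record ends one column early, so its two-column SCORE2 field holds a single digit. The data set names `trunc_demo` and `miss_demo` are also hypothetical.

```sas
filename shortval temp;

data _null_;
  file shortval;
  put '001 Tim Dyson    74 87';
  put '002 Sam Larson   96 8';    /* SCORE2 field is 1 column, not 2 */
run;

/* TRUNCOVER keeps the partial value '8' */
data trunc_demo;
  infile shortval truncover;
  input id $ 1-3 name $ 5-16 score1 18-19 @21 score2 2.;
run;

/* MISSOVER sets a field that runs past the end of the record to missing */
data miss_demo;
  infile shortval missover;
  input id $ 1-3 name $ 5-16 score1 18-19 @21 score2 2.;
run;

title 'TRUNCOVER'; proc print data=trunc_demo noobs; run;
title 'MISSOVER';  proc print data=miss_demo noobs;  run;
title;
```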

In [None]:
*Ex3_column_Input_TRUCOVER_Option.sas;

*** To clean up the "work" library;
proc datasets lib=work nolist kill; quit;

data HAVE;
*** TRUNCOVER option on the INFILE statement;
 infile 'C:\SASCourse\Week2\short_values.txt' TRUNCOVER;
 input id 1-3 name $ 5-16 
       score 18-19 @21 some_value 5.2;
proc print data=HAVE noobs;
run;

In [None]:
*Ex3_column_Input_TRUCOVER_Option.sas;
*Additional Examples;
DATA test2;
  INFILE "C:\SASCourse\Week2\test_data.txt" firstobs=2 truncover;
  INPUT lastn $1-10 Firstn $ 11-20
   Empid $21-30 Jobcode $31-40;
   put _ALL_;
RUN;
title "Option TRUNCOVER"; 
proc print data=test2; run;

In [None]:
*Ex4_Formatted_Input.sas;
OPTIONS nocenter nonumber nodate;
/*Formatted input - 
  Column pointer control that moves the pointer COLUMN n */

data have1;  
infile datalines firstobs=3;   /*Read from the 3rd record - Why? */
input software $5.             /* $w.informat left-justifies the value*/
      @1 x_software $char5.     /* $CHARw. informat preserves leading blanks */
      @7 book_titles 3.         /*SAS moves the pointer to column 7*/
      @11 date_searched mmddyy10.; /*Informat specified*/ 
format date_searched mmddyy10.;    /*Format specified*/
datalines; 
http://r4stats.com/articles/popularity/
12345678901234567890
  SAS 576 06/01/2015
 SPSS 339 07/01/2015
    R 240 08/01/2015
Stata  62 09/01/2015
;                     
title 'Formatted Input';
proc print data=Have1 noobs ; run;
proc contents data=Have1 varnum; 
ods select position;
run;

In [None]:
*Ex4_Formatted_Input.sas;
OPTIONS nocenter nonumber nodate;
/*Formatted input - 
  Column pointer control that moves the pointer n POSITIONS */

data Have2;  
infile datalines firstobs=3;   /*Read from the 3rd record - Why? */
input software $5.             /* $w.informat left-justifies the value*/
      @1 x_software $char5.     /* $CHARw. informat preserves leading blanks */
      +1 book_titles 3.         /* +1 moves the pointer forward 1 column (to column 7) */
      +1 date_searched mmddyy10.; /*informat specified */ 
format date_searched mmddyy10.;    /*Format specified */
datalines; 
http://r4stats.com/articles/popularity/
12345678901234567890
  SAS 576 06/01/2015
 SPSS 339 07/01/2015
    R 240 08/01/2015
Stata  62 09/01/2015
;   
title 'Formatted Input';                                                                                           
proc print data=Have2 noobs ; run;


In [None]:
*Ex5_formatted_column_input.sas;
* Use of absolute pointer control;
data work.apc;
infile datalines firstobs=2;
input id $1.        @3 x1 5. 
      @9 x2 dollar7.   @9 a_x2 comma7.
      @17 x3 dollarx7. @17 a_x3 commax7. 
      @25 x4 6.        @32 x5 percent7.;
format x2 dollar7.  a_x2 comma7. 
       x3 dollarx7. a_x3 commax7. 
       x5 percent7.;
datalines;
1234567890123456789012345678901234
A 12909 $12,909 $12.909 12.909 12%
;
title 'Use of absolute pointer control';
proc print data=work.apc noobs; run;

In [None]:
*Ex5_formatted_column_input.sas;
data work.rpc;
infile datalines firstobs=2;
input id $1.           +1 x1 5. 
      +1 x2 dollar7.   +(-7) a_x2 comma7.
      +1 x3 dollarx7.  +(-7) a_x3 commax7. 
      +1 x4 6.         +1 x5 percent7.;
format x2 dollar7.  a_x2 comma7. 
       x3 dollarx7. a_x3 commax7. 
       x5 percent7.;

datalines;
1234567890123456789012345678901234
A 12909 $12,909 $12.909 12.909 12%
;
title 'Use of relative pointer control';
proc print data=work.rpc noobs; run;

In [None]:
*Ex6_Formated_Input_Dates.sas (Part 1);
DATA work.Have1;
INPUT            
            @1  date1 date11. 
            +1  date2 ddmmyy6.
            +1  date3 mmddyy10. 
            +1  date4 yymmdd8.
            +1  date5 ddmmyy10.
            +1  date6 mmddyy8.
            @1  c_date $11.;    /*re-read the very first field 
                              as a character variable */

 * Convert the character date variable into a numeric variable
   using the INPUT function;

    n_date = input(c_date,anydtdte11.);

 FORMAT date1 date2 date3 date4 date5 date6 mmddyy10.
       n_date date9. ; 
DATALINES;
14/JAN/2015 140115 01-14-2015 15 01 14 14.01.2015 01/14/15
;
title "Using the permanent formats assigned in the DATA step";
proc print data=work.have1; 
run;

title "Using the temporary formats in the PROC step";
proc print data=have1;
Format date1 date9. 
      date2 WORDDATE. 
      date3 WORDDATX. 
      date4 WEEKDATE. 
      date5 MONYY.  
      date6 DOWNAME.
      n_date mmddyy10.;
run;




Reading Dates Using the ANYDTDTE Informat

You can use the ANYDTDTE informat to read dates
 in different structures, including: 
 
* DATE, DATETIME, TIME, DDMMYY, 
* MMDDYY, and YYMMDD 
* JULIAN, MONYY, and YYQ 

You can also use the following INFORMATs to extract parts of dates:

* ANYDTDTE. Extracts the date portion 
* ANYDTDTM. Extracts the datetime portion 
* ANYDTTME. Extracts the time portion

Adapted from Venky Chakraborty's PharmaSUG2010 paper
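As a small sketch of the three "any" informats listed above, the step below applies each of them to the same datetime string with the INPUT function. The data set name `anydt_demo` and the variable names are hypothetical; a width of 20 is used because the default widths of these informats are too narrow for the full string.

```sas
data anydt_demo;
  length txt $ 20;
  txt = '27Aug2018 3:30:32';
  d  = input(txt, anydtdte20.);   /* date portion only  */
  dt = input(txt, anydtdtm20.);   /* full datetime value */
  t  = input(txt, anydttme20.);   /* time portion only  */
  format d date9. dt datetime20. t time8.;
  put d= dt= t=;
run;
```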


In [None]:
*Ex6_Formated_Input_Dates.sas (Part 2);
title ' ';
data work.date_data;
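input @1 mix_dates anydtdte21.;   /* default width is 9; widen it to read the longer values */
format mix_dates date9.;          /* ANYDTDTE is an informat only; display with DATE9. */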
datalines;
27Aug2018
08/27/2018
27Aug2018 3:30:32.8
180827
08272018
SEP2018
18Q4
;
proc print data=date_data;
run;


In [None]:
*Ex6_Formated_Input_Dates.sas (Part 3);
* Use of Colon Modifier;
data Modified_List_input_date;
   infile datalines DLM = ',';
   input date :mmddyy. copay_amount;
   month_name=put(date,monname3.);
   format date mmddyy10.;
datalines;
10/05/2004,25
11/5/2004,25
;
proc print data=Modified_List_input_date noobs;
run;

In [None]:
*Ex6_Formated_Input_Dates.sas (Part 4);
data Modified_List_input_x;
   input ID $ Date_Time :DATETIME.   In_out $ ;
        date = datepart(Date_Time);
        time = timepart(Date_Time);
 FORMAT Date_Time DATETIME. date date9. time time10.2;
DATALINES;
E1 18FEB15:07:35 In 
E2 20MAR15:09:15 In 
E3 28FEB05:19:05 Out 
E4 01MAR05:17:28 Out 
;
run;
proc print data=Modified_List_input_x noobs ;
run;

In [None]:
*Ex6_Formated_Input_Dates.sas (Part 5);
* Date Constant, and DHMS and DATEPART Functions;
data _null_;
  d='13JAN2016'd;
  put d date.;
  dt=dhms(d,0,0,0);
  put dt datetime.;
  d =datepart(dt);
  put d date.;
run;

data temp1;
   INFILE datalines DLM=',';
   INPUT state_name  : $ 22. dayOfweek : $ 10. 
         Monthday : $15. year ;
         date_entry=strip(dayofweek)||', '||
                    strip(monthday)||', '||
                    strip(year);
   DATALINES;
     Delaware, Friday,  December 7, 1787
     Pennsylvania, Wednesday,  December 12, 1787
     New Jersey, Tuesday,  December 18, 1787
     South Carolina, Friday,  May 23, 1788
;
proc print data=temp1 noobs; run;


In [None]:
*Ex6_Formated_Input_Dates.sas (Part 6);
data temp2;
   INFILE datalines DLM=',';
   INPUT state_name  : $ 22. dayOfweek : $ 10. 
         Monthday : $15. year ;
         date_entry=catx(',', dayofweek, monthday,year);
   DATALINES;
     Delaware, Friday,  December 7, 1787
     Pennsylvania, Wednesday,  December 12, 1787
     New Jersey, Tuesday,  December 18, 1787
     South Carolina, Friday,  May 23, 1788
;
proc print data=temp2 noobs; run;

Scenario: Read dates that fall in the 18th century 
using the YEARCUTOFF= system option, which specifies the first
year of the 100-year span that SAS uses to interpret
two-digit years.

In SAS 9.4M5 and later, the default value for this option
is 1926 (a span of 1926 to 2025); earlier SAS 9.4 releases
used 1920 (a span of 1920 to 2019).

You use this option when your date variable contains
a 2-digit year value (e.g., 78 instead of 1778) and 
the intended years fall outside the default 100-year span.

In the example below, we read in SAS 
the dates when four states joined the Union.  
Since these dates (1787 and 1788) fall outside the default
100-year span, we override the default by 
using the option YEARCUTOFF=1720, so that all the
two-digit years we read are interpreted as years from 1720 to 1819.
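A minimal sketch of the effect: the same two-digit-year string is read under two YEARCUTOFF= settings, and the session's original setting is saved and restored with GETOPTION. The data set names `yc_1926`, `yc_1720` and the macro variable `old_ycut` are hypothetical.

```sas
/* Save the current setting so it can be restored afterwards */
%let old_ycut = %sysfunc(getoption(yearcutoff));

options yearcutoff=1926;          /* span 1926-2025: 87 -> 1987 */
data yc_1926;
  d = input('12/07/87', mmddyy8.);
  format d date9.;
run;

options yearcutoff=1720;          /* span 1720-1819: 87 -> 1787 */
data yc_1720;
  d = input('12/07/87', mmddyy8.);
  format d date9.;
run;

options yearcutoff=&old_ycut;     /* restore the original setting */
```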

In [None]:
*Ex6_Formated_Input_Dates.sas (Part 7);
options yearcutoff=1720;
data yc;
   INPUT state_name  & $22. date_entry :mmddyy.; 
   FORMAT date_entry mmddyy10.;
DATALINES;
Delaware  12/07/87
Pennsylvania  12/12/87
New Jersey  12/18/87
South Carolina  05/23/88
;
proc print data=yc noobs; 
run;

In [None]:
*Ex7_Simple_List_Input.sas (Part 1);
OPTIONS nodate nonumber ps=58 ls=98;
DATA Work.Have1;
    INPUT  st_name $ pop percent_pop18p ;
    DATALINES;
    Alabama 4833722          77
    ;
  PROC PRINT data=work.Have1 noobs; run;

* Use the $ modifier to read in character data
* Use the LENGTH statement to avoid unwanted
    truncation of the values of character variables 
    that are more than 8 characters long (the default length)

In [None]:
*Ex7_Simple_List_Input.sas (Part 2);
DATA Work.Have2;   
   LENGTH st_name $ 10;                        
    INPUT st_name $ pop percent_pop18p ;
    DATALINES;
    Alabama            4833722  77
    California 38332521             76.1
  ;
 PROC PRINT data=work.Have2 noobs; run;
 proc contents data=Have2 varnum;
 ods select position;
 run;

* The INFORMAT statement has the same effect as 
  the LENGTH statement for character variables: 
  an informat such as $10. assigned before the INPUT 
  statement sets the variable's length to 10.

In [None]:
*Ex7_Simple_List_Input.sas (Part 3);
DATA Work.Have3;   
   INFORMAT st_name $ 10.;                        
    INPUT st_name $ pop percent_pop18p ;
    DATALINES;
    Alabama 4833722  77
    California 38332521 76.1
  ;
 PROC PRINT data=work.Have3 noobs; run;
 proc contents data=Have3 varnum;
 ods select position;
 run;

In [None]:
*Ex7_Simple_List_Input.sas (Part 4);
* Use the DLM= option to read in comma delimited data;
 DATA Work.Have4;   
   LENGTH  st_name $ 10; 
    infile datalines DLM=','; 
    INPUT st_name $ pop percent_pop18p ;
    DATALINES;
    Alabama, 4833722,  77
    California, 38332521, 76.1
  ;
  PROC PRINT data=work.Have4 noobs; run;

Use a period (.) as a placeholder for a missing value
in the middle of a record in a space-delimited file,
as shown below.

In [None]:
*Ex7_Simple_List_Input.sas (Part 5);
DATA Work.Have5;   
   LENGTH st_name $ 10;                        
    INPUT st_name $ pop percent_pop18p ;
    DATALINES;
    Alabama .    77
    California 38332521 76.1
  ;
 PROC PRINT data=work.Have5 noobs; run;

Use the double trailing @ (@@) to read more than one observation from each data line.

In [None]:
*Ex7_Simple_List_Input.sas (Part 6);
DATA Work.Have6;   
   LENGTH  st_name $ 10;                        
    INPUT st_name $ pop percent_pop18p  @@;
    DATALINES;
    Alabama 4833722  77   California 38332521 76.1
  ;
  PROC PRINT data =work.Have6 noobs; run;

Use the LABEL and FORMAT statements
 in a DATA step to store the labels and formats
 permanently with the data set.

In [None]:
*Ex7_Simple_List_Input.sas (Part 7);
DATA Work.Have7;   
    LENGTH  st_name $ 10;                        
    INPUT st_name $ pop percent_pop18p ;
    FORMAT pop comma10. percent_pop18p 5.1;
    LABEL st_name='State Name'
          pop='Population Size'
          percent_pop18p='Percentage of Population Aged 18 Years and Older';
    DATALINES;
    Alabama 4833722  77
    California 38332521 76.1
  ;
  proc print data=work.Have7 noobs label; run;
  proc contents data=work.Have7 varnum;
  ods select position;
  run;

http://www.sascommunity.org/wiki/DATALINES4_statement

The DATALINES4 statement precedes the lines of data that 
are going to be read into the DATA step. Use it when the data
lines contain semicolons: the lines of data that immediately
follow this statement end when four consecutive semicolons 
(;;;;) are encountered on a new line. If the data do not 
contain any semicolons, the DATALINES statement 
can be used instead.

The DATALINES4 statement must be the last statement in 
the DATA step. The order is: all the other DATA step 
statements, the DATALINES4 statement, the lines of data, 
4 consecutive terminating semicolons on a new line, 
and then the RUN statement.
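Because the semicolons in DATALINES4 data are ordinary characters, they can also serve as delimiters. The sketch below (hypothetical data set `states4` and variable names) assumes that `INFILE DATALINES` refers to the in-stream lines that follow DATALINES4, and splits each line into three variables with DLM=';'.

```sas
data states4;
  infile datalines dlm=';';          /* semicolon is the delimiter */
  input st_name :$10. pop pct_18p;   /* list input skips leading blanks */
  datalines4;
Alabama;  4833722; 77.0
California;  38332521; 76.1
;;;;

proc print data=states4 noobs;
run;
```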

In [None]:
*Ex31_Datalines4;
data Have;
   input state_data $50. ;
   datalines4;
Alabama;  4833722; 77.0
    California;  38332521; 76.1
;;;;

  proc print data=work.HAVE noobs;
   run;

* Use the LABEL and FORMAT statements 
  in the PROC step to apply the labels and formats
  to that report only (they are not stored with the data set)

* You must use a SPLIT= option with PROC PRINT 
    to display descriptive column headings with split text

In [None]:
*Ex7_Simple_List_Input.sas (Part 8);
 DATA Work.Have8;   
    LENGTH  st_name $ 10;                        
    INPUT st_name $ pop percent_pop18p ;
    DATALINES;
    Alabama 4833722  77
    California 38332521 76.1
  ;
PROC PRINT data=work.Have8 noobs split='*';
    FORMAT pop comma10. percent_pop18p 5.1;
    LABEL st_name='State Name'
          pop='Population Size'
          percent_pop18p='Percentage*of Population* Aged 18 Years* and Older';
     
  run;

In [None]:
*Ex8_List_Input_Modified_Input.sas (Part 1);
OPTIONS nodate nonumber ps=58 ls=98;

*List input style with LENGTH and INFORMAT statements; 
data work.Students_x; 
  length Id $7 Name $14 Address $16 City $20 State $2 zip $5 ; 
  informat Reg_date mmddyy10.;
  format Reg_date mmddyy10.;
  infile datalines dlm=',';
  input Id -- Reg_date;
datalines;
G009876, Ann Miller,2219 Pine St, Rockville,MD,28057, 08/20/2016
G008765, Rubi Tyson,6504 Spring St, Philadelphia,PA,19104,08/13/2016
;
PROC PRINT data=students_x noobs; 
RUN;

In [None]:
*Ex8_List_Input_Modified_Input.sas (Part 2);
*List input style with INFORMAT statement; 
data students_y; 
informat Id $7. Name $14. Address $16. City $20. State $2. zip $5. 
         Reg_date mmddyy10.;
format Reg_date mmddyy10.;
infile datalines dlm=',';
input Id -- Reg_date;
datalines;
G009876, Ann Miller,2219 Pine St, Rockville,MD,28057, 08/20/2016
G008765, Rubi Tyson,6504 Spring St, Philadelphia,PA,19104,08/13/2016
;
PROC PRINT data=students_y noobs;  
RUN;

Code Explanation (Part 3)

  The CITY variable is read in as a character variable using 
  the $20. ($w.) informat. This informat tells SAS that the variable 
  is character with a length of 20.
  
  The REG_DATE variable is read with the date informat MMDDYYw.  
  (w is set to 10 because this date field occupies 10 columns).


 Points to remember (Modified List Input)

 The colon (:) format modifier enables you to use list input 
 and also to specify an informat after a variable name, 
 whether character or numeric. 
 
 SAS reads until it encounters 
 a delimiter (a blank by default, or the character specified with DLM=), 
 the defined length of the variable (character only), 
 or the end of the data line, whichever comes first.
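As a small sketch (hypothetical data set `colon_demo`), the step below reads comma-delimited values of different widths with the colon modifier; each value stops at the comma. With plain formatted input ($20. without the colon), SAS would instead read a fixed 20 columns, so the comma and the state code would end up in CITY.

```sas
data colon_demo;
  infile datalines dlm=',';
  input city :$20. state :$2.;   /* each value ends at the next comma */
  datalines;
Rockville,MD
Philadelphia,PA
;

proc print data=colon_demo noobs;
run;
```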


In [None]:
*Ex8_List_Input_Modified_Input.sas (Part 3);
* List input style with colon modifier;
data students_z; 
infile datalines dlm=',';
input Id :$7. Name :$14. Address :$16. City :$20. 
      State :$2. zip :$5. Reg_date :mmddyy.;
format Reg_date mmddyy10.;
datalines;
G009876, Ann Miller,2219 Pine St, Rockville,MD,28057, 08/20/2016
G008765, Rubi Tyson,6504 Spring St, Philadelphia,PA,19104,08/13/2016
;
PROC PRINT data=students_z noobs;  
RUN;

Code Explanation (Part 5)

Notice that the MMDDYYw. informat does not specify a w value, 
   and neither does the COMMAw.d informat.
   Here, we have used the MMDDYY. informat instead of the 
   MMDDYY10. informat, and the COMMA. informat instead of COMMA7.

   The rule is that you do not need to specify a w value to indicate the length 
   of a numeric variable when modifying list input with the 
   colon (:) modifier. 
   
  The quoted text below is from SAS Certification Preparation Guide: 
  Base Programming for SAS® 9 Third Edition (page 552).
  
  "...list input reads each value until the next blank is detected.  
  The default length of numeric variables is 8, so you don’t need to 
  specify a w value to indicate the length of a numeric variable.
  This is different from using a numeric informat with formatted input.  
  In that case, you must specify a w value in order to indicate the 
  number of columns to be read."

In [None]:
*Ex8_List_Input_Modified_Input.sas (Part 5);

  DATA Work.Have2; 
    INPUT visit_date :mmddyy. amount :comma.;
    FORMAT visit_date mmddyy10. amount comma7.;
    DATALINES;
    01/01/2013      125,000
    03/02/2014      38,000
    12/18/2015     145,000
  ;
  PROC PRINT data=Have2 noobs; run;

In [None]:
*Ex9_DLM_DSD_MISSOVER.sas (Part 1);
data DLM_data;
infile datalines DLM=',';
input airport :$3. departures:8. airlines:8. date:mmddyy10.; 
format date date9.;
datalines;
DCA,617,16,05/18/2018
;
proc print data=DLM_data;
run;

In [None]:
*Ex9_DLM_DSD_MISSOVER.sas (Part 2);
*Use of the DLM= and DSD options;
data DSD_data;
infile datalines  DLM='/' DSD;
input airport: $3. departures :8. airlines :8. date :mmddyy10.; 
format date date9.;
datalines;
DCA/617//"05/18/2018"
;
proc print data=DSD_data;
run;

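For the code chunk below, the DLM= option is not needed,
 because the DSD option sets the default delimiter to a comma
 and this is comma-delimited data. 

Use the DSD option on the INFILE statement, and 
the ampersand (&) and tilde (~) modifiers 
in the INPUT statement, for data values that contain
missing fields marked by consecutive delimiters, delimiters 
inside quotation marks, or embedded blanks.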
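The sketch below contrasts DLM=',' alone with DSD on the same record, which has an empty field between two commas. The data set names and the variable `gates` (with its value 44) are hypothetical. Without DSD, consecutive commas count as one delimiter, so 44 shifts into AIRLINES; with DSD, AIRLINES is missing and 44 lands in GATES.

```sas
data no_dsd;
  infile datalines dlm=',' missover;   /* ,, treated as one delimiter */
  input airport :$3. departures airlines gates;
  datalines;
DCA,617,,44
;

data with_dsd;
  infile datalines dsd missover;       /* ,, marks a missing value */
  input airport :$3. departures airlines gates;
  datalines;
DCA,617,,44
;

title 'DLM= only'; proc print data=no_dsd noobs; run;
title 'DSD';       proc print data=with_dsd noobs; run;
title;
```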


In [None]:
*Ex9_DLM_DSD_MISSOVER.sas (Part 3);
data DSD_data_X;
infile datalines  DSD;
input airport: $3. departures :8. airlines :8. date :mmddyy10.; 
format date date9.;
datalines;
DCA,617,,"05/18/2018"
;
proc print data=DSD_data_X;
run;

In the example code below, the ~ (tilde) format modifier enables you to read values 
    that contain embedded delimiters within double quotation marks, and it 
    retains the quotation marks as part of the value. 

The DSD option on the INFILE statement must be used to get the 
   desired effect of this format modifier.
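A minimal sketch of the difference (hypothetical data set names `keep_quotes` and `drop_quotes`): both steps use DSD, but only the tilde keeps the double quotation marks in the stored value, while the colon modifier lets DSD strip them.

```sas
data keep_quotes;
  infile datalines dsd;
  input st_name ~ $33. percent_pop18p;   /* quotes kept */
  datalines;
"Alabama, The Yellowhammer State",77.0
;

data drop_quotes;
  infile datalines dsd;
  input st_name :$33. percent_pop18p;    /* quotes removed by DSD */
  datalines;
"Alabama, The Yellowhammer State",77.0
;

proc print data=keep_quotes noobs; run;
proc print data=drop_quotes noobs; run;
```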

In [None]:
*Ex9_DLM_DSD_MISSOVER.sas (Part 4);
  DATA Work.Quotation_Surrounded_Values;   
    INFILE DATALINES DSD;
    INPUT st_name ~ $33. percent_pop18p ;
 DATALINES;
 "Alabama, The Yellowhammer State", 77.0
 "California, The Golden State",  76.1
 ;
 PROC PRINT;RUN;

In the example code below, the DSD option is not required because the missing data 
   are not marked by consecutive delimiters. The MISSOVER option, which is required,
   prevents SAS from reading a new record when the end of the 
   current record is reached.
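To see why MISSOVER is needed, the sketch below reads the same lines with the default FLOWOVER behavior and with MISSOVER (data set names `m_flow` and `m_miss` are hypothetical). Under FLOWOVER, the third record has no COURSE value, so SAS reads "4" from the next line as COURSE and only three observations result; MISSOVER gives four observations with a blank COURSE for id 3.

```sas
data m_flow;                      /* default: FLOWOVER */
  infile datalines;
  input id course :$9.;
  datalines;
1 Stat4197
2 Stat4197
3
4 Stat6197
;

data m_miss;
  infile datalines missover;
  input id course :$9.;
  datalines;
1 Stat4197
2 Stat4197
3
4 Stat6197
;

title 'FLOWOVER'; proc print data=m_flow; run;
title 'MISSOVER'; proc print data=m_miss; run;
title;
```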

In [None]:
*Ex9_DLM_DSD_MISSOVER.sas (Part5);
* MISSOVER option on the INFILE statement;
data M_data;
   infile datalines missover;
   input id:1. course :$9.;
   datalines; 
   1 Stat4197
   2 Stat4197
   3  
   4 Stat6197
   ;
proc print data=M_data;
run;

In the example code below, the & modifier after the variable ST_NAME indicates 
   that its value should be read until two consecutive blanks 
   are encountered. 

   Also note the COMMA. informat for the variable POP. As in the quotation
   from the SAS Certification Preparation Guide above, list input reads each
   value until the next blank is detected, so you do not need to specify a
   w value for a numeric informat used with list input.

   In the data, two blanks (not one) follow each of the state names
   Alabama, California, and District of Columbia; these two blanks 
   are required.


In [None]:
*Ex10_Modified_List_Input.sas;
OPTIONS nodate nonumber ps=58 ls=98;
  DATA work.Have1;   
    INPUT st_name & $20. pop :comma. percent_pop18p ;
    FORMAT pop comma10.;
     DATALINES;
    Alabama  4,833,722  77
    California  38,332,521 76.1
    District of Columbia  646,449 82.8
  ;
  PROC PRINT data=HAVE1 noobs;  RUN;

The single question mark (?) format modifier 
in the INPUT statement below suppresses the invalid data message.

The second data record has invalid data in the “date” field: 02/29/2015 is not a valid date because 2015 is not a leap year.

In [None]:
*Ex11_Question_marks.sas (Part 1);
data temp2;
   infile datalines DLM = ',';
   input date ? :mmddyy.  copay_amount;
    format date mmddyy10.;
datalines;
10/05/2004,25
02/29/2015,25
;
proc print data=temp2; run;

The ?? format modifier also suppresses the invalid data message and, 
in addition, prevents the automatic variable _ERROR_ 
from being set to 1 when invalid data are read.
[See SAS® Documentation for details]
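A minimal sketch of the difference between ? and ?? (hypothetical data set names `q_one` and `q_two`): each step reads the same invalid date and copies _ERROR_ into a variable right after the INPUT statement. Both leave VISIT_DATE missing, but only ?? keeps _ERROR_ at 0.

```sas
data q_one;
  input visit_date ? mmddyy10.;
  err_flag = _error_;            /* capture _ERROR_ after INPUT */
  format visit_date mmddyy10.;
  datalines;
02/29/2015
;

data q_two;
  input visit_date ?? mmddyy10.;
  err_flag = _error_;
  format visit_date mmddyy10.;
  datalines;
02/29/2015
;

proc print data=q_one; run;
proc print data=q_two; run;
```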

In [None]:
*Ex11_Question_marks.sas (Part 2);
data temp3;
   infile datalines DLM = ',';
   input date ?? :mmddyy.  copay_amount;
   format date mmddyy10.;
datalines;
10/05/2004,25
02/29/2015,25
;
proc print data=temp3; run;

In [None]:
*Ex13_Line_Pointer_controls.sas (Part 1);
options nocenter ls=132 nodate nonumber;
data address1;
      input name  & $ 30.
          /subname  & $ 20.
          /st_address1  & $ 30.
          /st_address2  & $ 30.
          /phone $ 14.;
datalines;
Air Force Personnel Center
HQ AFPC/DPSSRP
550 C Street West
Randolph AFB, TX 78150
1-800-525-0102
Navy Personnel Command
(PERS-312E)
5720 Integrity Drive
Millington, TN 38055
901-874-4885
;
proc print data= address1 noobs; run;

In [None]:
*Ex13_Line_Pointer_controls.sas (Part 2);
*Multiple records per observation using the pound (#) sign;
data address2;
   infile datalines ;
   input name  & $ 30.
         #3 st_address1  & $ 30.
         #4 st_address2  & $ 30. 
         #5 phone $ 14.;
datalines;
Air Force Personnel Center
HQ AFPC/DPSSRP
550 C Street West
Randolph AFB, TX 78150
1-800-525-0102
Navy Personnel Command
(PERS-312E)
5720 Integrity Drive
Millington, TN 38055
901-874-4885
;
proc print data= address2 noobs; run;


In [None]:
*Ex14_Column_Formatted_Input.sas;
data Mix_column_Formatted;    
input software $1-5 @9 date date9. @21 amount comma5.;     
format date date9. amount comma5.;      
datalines;                                                                                                                              
SAS     06jan1976   2,345       
Stata   05jan1998   1,560  
R       07jun1996   4,567  
;                                                                                                      
proc print data=Mix_column_Formatted noobs; 
run; 

In [None]:
*Ex15_Absolute_Relative_Pointer_controls.sas;
data Example_formatted_column_input;
input id $ 1 x1 3-7
     @9 x2 dollar7. 
     +1 x3 dollarx7. 
     +1 x4 6. 
     +1 x5 percent7.;
format x2 dollar7. x3 dollarx7. x5 percent7.;
datalines;
A 12909 $12,909 $12.909 12.909 12%
;
proc print data=Example_formatted_column_input noobs;
run;

In [None]:
*Ex16_Named_Input.sas;
options nocenter nodate nonumber ls=132;
DATA TEST;
input name = & $ 30. address = & $ 30.
      city_zip  = & $ 30. phone= $ 14.
      Num_employees = ;
      FORMAT Num_employees comma7.;    
DATALINES;
name=Air Force Personnel Center /
address=550 C Street West /
city_zip=Randolph AFB, TX 78150 /
phone=1-800-525-0102 /
Num_employees=5876 
name= Navy Personnel Command /
address= 5720 Integrity Drive /
city_zip= Millington, TN 38055 /
phone= 901-874-4885 /
Num_employees=3987 
;
proc print data=TEST noobs; 
run; 

In [None]:
*Ex17_Filename_Libname.sas ;
OPTIONS nocenter nodate nonumber;
%LET Path=C:\SASCourse\Week2;
FILENAME raw "&Path\pop2013_no_headers.txt";
LIBNAME SDS "&Path";
PROC FORMAT; 
     value regionfmt
       1='Northeast' 2='Midwest'
       3='South' 4='West';
run;
DATA SDS.Pop;
 INFILE raw DLM=',';
 input sumlev region division fips name :$22.
         pop :comma12.
         pop18p :comma12.  p_pop18p; 
  LABEL region ='Region'     
        FIPS ='State FIPS'
        name ='State Name'  
        pop ='Population*(All Ages)'
        pop18p ='Population*(Aged 18+)'
        p_pop18p = 'Percent*Population*(Aged 18+)'; 
  FORMAT pop pop18p comma12. FIPS z2.;
run;
proc sort data=SDS.pop out=pop; by descending pop; run;
title '8 most populous states - United States of America, 2013';
proc print data=pop (obs=8) noobs split='*';
var name pop pop18p;
run;
title "United States Total Population, 2013";
LIBNAME SDS "&Path";
proc sql;
 select sum(pop) format=comma12. as TotalPopulation
  from sds.pop;
quit;
title ' ';  

In [None]:
*Ex18_Read_Zipped_File2.sas;
Filename ZIPFILE SASZIPAM 'c:\SASCourse\Week2\names.zip';
DATA newdata;
  INFILE ZIPFILE(yob1920.txt) DLM=',';
  INPUT name $ gender $ number;
       
RUN;
proc sort data=newdata; by gender descending number;
title " 5 most common girls' names";
proc print data=newdata (obs=5) noobs; 
var name number;
format number comma9.;
where gender='F';
run;
title " 5 most common boys' names";
proc print data=newdata (obs=5) noobs; 
var name number;
format number comma9.;
where gender='M';
run;
title ' ';

In the example code below (Part 1), the & modifier indicates that the value for 
Address should be read until two consecutive blanks are encountered.
Therefore, in the FIRST RECORD, the value of Address is read from 
column 1 to 39. When blanks are encountered in both columns 40 and 41, 
the value is written to the PDV.

List input reads until the next blank is detected. The default
length of numeric variables is 8, so you don't need to specify a w value
to indicate the length of a numeric variable.

This is different from using a numeric informat with formatted input 
(see the second example below). In that case, you must specify a w value
in order to indicate the number of columns to be read.

In [None]:
*Ex24_Informat_List_Input_Formatted_Input (Part 1);
Data MLI; 
infile datalines firstobs=2;
input Address & $50. income :comma.;
format income dollar10.2;
datalines;
123456789012345678901234567890123456789012345678901234567890
Ann Tye 2219 Pine St Rockville MD 28057  $89,500.50
Rubi Tyson 6504 Spring St Philadelphia PA 19104  $46,500.00
;
title 'Modified List Input';
PROC print data=MLI; run;
PROC contents data=MLI varnum; 
ods select position; 
RUN;

In [None]:
* Ex24_Informat_List_Input_Formatted_Input (Part 2);
Data FI; 
infile datalines firstobs=2;
input @1 Address $50. @51 income comma10.2;
format income dollar10.2;
datalines;
123456789012345678901234567890123456789012345678901234567890
Ann Tye 2219 Pine St Rockville MD 28057           $89,500.50
Rubi Tyson 6504 Spring St Philadelphia PA 19104   $46,500.00 
;
title 'Formatted Input';
PROC print data=FI; run;
PROC contents data=FI varnum; 
ods select position; 
RUN;

In [None]:
*Ex25_read_from_web.sas;
Filename raw url 
    'http://data.princeton.edu/wws509/datasets/effort.dat';
data have1;
   infile raw  firstobs=2 truncover ;
   input record $80. ;
   put _all_;
   if _n_=5 then stop;
run;
proc print data=have1; run;

data have2;
   infile raw  firstobs=2 obs=5 truncover ;
   input country $ setting  effort  change ;
   put _all_;
 run;
proc print data=have2; run;

In [None]:
*Ex26_Modified_List_Input;
Options nocenter nodate nonumber ls=132;
data annual_exp2013;
infile datalines;
input Age  (Food Housing Clothing Transportation Healthcare	
      Entertainment	Pension_S)(:comma.);

FORMAT Food Housing Clothing Transportation Healthcare
      Entertainment	Pension_S dollar10.;
datalines;
0  4,698    10,379  1,513   5,672     943   1,243   2,153
25 6,197    17,207  1,832   9,183   2,189   2,214   5,178
35 7,920    20,619  1,960  10,519   3,188   2,958   6,791
45 7,907    19,001  1,826  10,78    3,801   2,070   7,305
55 6,711    17,937  1,563   9,482   4,378   2,651   6,593
65 6,020    15,639  1,222   7,972   5,188   2,488   2,833
75 4,144    12,314    768   5,149   4,910   1,422     832
;
proc print data=annual_exp2013 noobs; run;

In [None]:
*Ex30_@@.sas;
data work.HAVE;
 input date: Anydtdte. name $ study_hours @@;
datalines;
27Aug2018 Doris 5.5 28Aug2018 Alice 4.0 
29Aug2018 Mike 2.0 29Aug2018 James 1.0 
30Jun2018 Doris 3.0 31Aug2018 Alice 3.0 
01Sep2018 Mike 3.0 
02Sep2018 James 1.0
;
proc print data=work.HAVE;
Format date mmddyy10.;
run;

In [None]:
*Ex27_Amper_Modifier.sas;
DATA Work.Have; 
    INPUT st_name & $ 22. visit_date :mmddyy. ;
    FORMAT visit_date mmddyy10.;
    DATALINES;
    Washington DC,  01/01/2013  
    ;
PROC PRINT data=work.Have noobs; 
run;

In [None]:
*Ex28_Reading_Multiple_Files.sas (Part 1);
*Reading multiple raw data files into a single SAS data set;
*Method 1; 
FILENAME test ('C:\SASCourse\Week2\testfile1.csv',
               'C:\SASCourse\Week2\testfile2.csv',
               'C:\SASCourse\Week2\testfile3.csv');
data a; 
infile test DLM=','; 
input var1 $ var2 var3; 
run;
title 'Reading multiple raw data files into a single SAS data set (Method 1)';
proc print data=a noobs; run;

In [None]:
*Ex28_Reading_Multiple_Files.sas (Part 2);
* Method 2;
FILENAME test 'C:\SASCourse\Week2\testfile*.csv'; 
data b; 
infile test DLM=','; 
input var1 $ var2 var3; 
run;
title 'Reading multiple raw data files into a single SAS data set (Method 2)';
proc print data= b noobs; run;

## Method 3 

* Use an INFILE statement with the FILEVAR= option
* FILEVAR=variable causes the INFILE statement 
   to close the current input file and open a new 
   input file whenever the value of variable changes
   (e.g., testfile1, testfile2, testfile3). 

* END= option
 
 * END=LASTFILE names a variable, LASTFILE, that SAS sets as each record is read.
 
     * LASTFILE=0 when the current input data record is not the last 
       record in the input file
       
     * LASTFILE=1 when the current input data record is the last 
       record in the input file
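As a small sketch of END= on its own, the step below writes three records to a temporary file (hypothetical fileref `endtemp` and data set `end_demo`) and copies the END= variable into IS_LAST, which is 0 for the first two observations and 1 for the last.

```sas
filename endtemp temp;

data _null_;
  file endtemp;
  put 'a,1,2';
  put 'b,3,4';
  put 'c,5,6';
run;

data end_demo;
  infile endtemp dlm=',' end=lastfile;
  input var1 $ var2 var3;
  is_last = lastfile;      /* 1 only when the last record was just read */
run;

proc print data=end_demo noobs;
run;
```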


In [None]:
*Ex28_Reading_Multiple_Files.sas (Part 3);
data c;
 do i=1 to 3;
    add= "C:\SASCourse\Week2\testfile" || put(i,1.)|| ".csv";
    do until (lastfile);
        infile dummy filevar=add end=lastfile DLM=',';
        filename=add;
        input var1 $ var2 var3;
      output;
     end;
  end;
  stop;
  run;
  title 'Reading multiple raw data files into a single SAS data set (Method 3)';
  proc print data=c noobs; run;

In [None]:
*Ex29_Multiple_Input_Statements.sas;
title ' ';
data work.HAVE(drop=i);
 input date: Anydtdte9. @;
 do i = 1 to 4;
 input name $ hours_studied @;
 label date= 'Date'
       name = "Student's name"
       hours_studied = 'Hours studied*for STAT 4197/6197';
 output;
 end;
datalines;
27Aug2018 Doris 5.5 Alice 4.0 Mike 2.0 James 1.0
28Jun2018 Doris 3.0 Alice 3.0 Mike 3.0 James 1.0
;
proc sort data=work.HAVE; by name; run;
proc print data=work.HAVE noobs split='*';
by name;
Format date worddate.;
run;

In [None]:
*Ex32_putlog_specify_decimals;
options nodate nonumber nosource;
ods exclude all;
data _null_;
set sashelp.class;
putlog name 1-8 sex 9 age 10-15 weight 16-22 .1 height 23-30 .2;
run;
ods exclude none;


[Splitting data based on the values of the variables:  Author: novinosrin](https://communities.sas.com/t5/SAS-Data-Management/Splitting-variables-based-on-variable-values/m-p/511956#M15903)

In [None]:
data work.have;
format date date9.;
infile cards;
if mod(_n_,2) ne 0 then input Information $;
else input date;
cards;
X1
13005
G8
15006
;
proc print data=work.have noobs;
run;

In [None]:
*Ex34_put_putlog.sas (Part 2);
/*Use the PUTLOG or PUT statement to write to the SAS log*/
options nocenter nodate nonumber nonotes nosource; 
ods exclude all;
data _null_;
  input;
  if _N_ =1 then putlog 'Address of the Stat Department:';
   putlog _INFILE_ ;
 datalines4;
Department of Statistics
Columbian College of Arts & Sciences
Rome Hall
801 22nd St NW, 7th Floor
Washington, DC, 20052
Phone: 202-994-6356 | Fax: 202-994-6917
;;;;
ods exclude none;


In [None]:
*Adapted from SAS Documentation ;
*Exa35_input_numeric_character_data.sas (Part 1);
options nocenter nodate nonumber nosource;
data Have1;
 input some_numbers 1-6;  /* only columns 1-6 are read, so the notes after column 6 in the data lines are ignored */
 datalines;
   23    /*input right aligned*/
 23      /*input not aligned */
23       /*input left aligned*/
00023    /*input with leading zeros*/
23.0     /*input with decimal point*/
2.3E1    /*in E notation, 2.3 x 10**1 = 23*/
230E-1   /*in E notation, 230 x 10**-1 = 23*/ 
-23
;
title 'Have1 Data Set';
proc print data=Have1 noobs; run;


The BZw.d informat reads numeric values, converts 
    any trailing or embedded blanks to 0s, and ignores leading blanks.
    
 The BZw.d informat ignores blanks between a minus sign and 
  a numeric value in an input field.
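A compact sketch of these rules, using the INPUT function on literal strings (the data set `bz_demo` and variables `a`, `b`, `c` are hypothetical): BZ. turns the embedded blank in '2 3' into a 0, ignores the blank after the minus sign, and, for comparison, COMMA. simply removes the embedded blank.

```sas
data bz_demo;
  a = input('2 3',  bz3.);      /* embedded blank becomes 0       */
  b = input('- 23', bz4.);      /* blank after minus sign ignored */
  c = input('2 3',  comma3.);   /* COMMA. removes the blank       */
  put a= b= c=;
run;
```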


In [None]:
*Adapted from SAS Documentation ;
*Exa35_input_numeric_character_data.sas (Part 2);
options nocenter nodate nonumber nosource;
data Have2;
 input @1 some_numbers bz4.;
 datalines;
2 3      /*embedded blank in the data - COMMA. or BZ. informat*/
- 23     /*embedded blank in the data - COMMA. or BZ. informat*/
;
title 'Have2 Data Set';
proc print data=Have2 noobs; run;


In [None]:
*Adapted from SAS Documentation;
*Exa35_input_numeric_character_data.sas (Part 3);
options nocenter nodate nonumber nosource;
data Have3;
 input @1 some_numbers comma6.;
 /*comma in the data - COMMA. informat*/
 /*parentheses in the data - COMMA. informat*/
 datalines;
2,341
(23)  
;
title 'Have3 Data Set';
proc print data=Have3 noobs; run;

In [None]:
*Ex37_Formmatted_Input_Formatted_put (Part 1);
options nocenter nodate nonumber nosource;
data Have1;
input @1 date1 date11.  +(-11) date2  date11. 
      +(-11) date3  date11. +(-11) date4  date11.;
Format date1 date11. date2 date9. date3 yymmdd10. date4 comma7.;
datalines;
29JAN2019
;
title 'Have1 Data Set';
proc print data=Have1;
var date:;
run;

In [None]:
*Ex37_Formmatted_Input_Formatted_put (Part 2);
title1; title2;
options nocenter nodate nonumber nosource;
data _null_;
date='29JAN2019'd;
put (4*date) (date11. +1 date9. +1 yymmdd10. +1 comma7. ) ;
run;

In [None]:
*Ex23_Length.sas;
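* With LENGTH 3, SAS stores integers exactly only up to 8,192 (Windows and UNIX),
  so some values of Y between 9006 and 9010 are not stored exactly;
data temp;
length x 4 y 3 ;
     do x=9006 to 9010;
        y=x;
       output;
     end;
run;
proc print data=temp noobs; run;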

In [None]:
*** Ex36_Week_2_List_of_Files.sas;
* Requires SAS/IML and a SAS session started with the RLANG system option;
PROC IML;
SUBMIT / R;
setwd ("C:/SASCourse/Week2")
list.files(pattern="SAS", 
           full.names = TRUE, 
           ignore.case = TRUE)
ENDSUBMIT;
QUIT;