# Chapter 17 - Reading Free-Format Data

## Table of Contents:

1. [Objectives](#Objectives)
2. [Free-Format Data](#Free-Format-Data)
3. [Reading Missing Values](#Reading-Missing-Values)
4. [Specifying the Length of Character Variables](#Specifying-the-Length-of-Character-Variables)
5. [Modifying List Input](#Modifying-List-Input)
6. [Creating Free-Format Data](#Creating-Free-Format-Data)
7. [Mixing Input Styles](#Mixing-Input-Styles)

### Objectives

* use the `INPUT` statement with list input to read free-format data (data that is not organized in fixed fields)
* use the `INPUT` statement with list input to read free-format data that is separated by nonblank delimiters, such as commas
* read free-format data that contains missing values
* read character values that exceed 8 characters
* read nonstandard free-format data
* read character values that contain embedded blanks
* mix column, formatted, and list input styles in a single `INPUT` statement

[(back to top)](#Table-of-Contents:)

### Free-Format Data

**Free-format data** is data that is not arranged in fixed fields; instead, a delimiter usually separates the values in each record.

List input can read both standard and nonstandard free-format data. Its syntax is:

    input <variable list> <$ after every char variable>;

List the variable names in the order of the corresponding raw data fields. Fields must be read from left to right and cannot be skipped or re-read. In a DATA step, the `INPUT` statement generally follows the `INFILE` statement that identifies the raw data file:

    data <output dataset>;
        infile filename dlm = <delimiter>;
        input <variable list>;
    run;
    
Here, the `DLM =` option specifies the delimiter that separates fields in the raw data file; if it is omitted, SAS uses a blank as the delimiter. As always, the `INPUT` statement can accommodate numbered range lists (for example, `Q1-Q4`) if the variables have incremental names.

SAS reads each record, assigning the value that precedes each delimiter to the next variable in the list. It scans until every variable in the `INPUT` statement has received a value or it reaches the end of the record.
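
As a minimal sketch of list input with a nonblank delimiter, the step below reads comma-separated records supplied inline with `DATALINES` instead of an external file. The dataset name `work.sales`, the variable names, and the values are all hypothetical and chosen only for illustration.

    /* Hypothetical data: a region followed by four quarterly figures */
    data work.sales;
        infile datalines dlm = ',';   /* fields are separated by commas */
        input Region $ Q1-Q4;         /* one character and four numeric variables */
        datalines;
    East,100,200,150,175
    West,90,110,130,120
    ;
    run;

Each record fills `Region` first and then `Q1` through `Q4`, in the left-to-right order of the fields.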

[(back to top)](#Table-of-Contents:)

### Reading Missing Values

If missing values occur at the end of a record, use the `MISSOVER` option in the `INFILE` statement to assign missing values to the variables that have no data. It prevents SAS from filling those variables with data from the next record when it does not find values in the current line. `MISSOVER` works only for values that are missing at the end of a record. Its syntax is:

    infile filename missover;

For missing values at the beginning or in the middle of a record, use the Delimiter Sensitive Data (`DSD`) option in the `INFILE` statement:

    infile filename dsd;
    
This sets the default delimiter to a comma, treats two consecutive delimiters as a missing value, and removes quotation marks from values. To use a delimiter other than a comma with `DSD`, add the `DLM =` option.
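
The two options can be compared with a short sketch. The dataset names, variable names, and inline records below are hypothetical; the first step uses blank-delimited data with a value missing at the end of a record, and the second uses comma-delimited data with a value missing in the middle.

    /* Record 2 has no third score; MISSOVER sets Q3 to missing
       instead of reading it from the next record. */
    data work.trailing;
        infile datalines missover;
        input Name $ Q1 Q2 Q3;
        datalines;
    Ann 10 20 30
    Bob 15 25
    Cal 12 22 32
    ;
    run;

    /* Record 2 has two consecutive commas; DSD sets Q2 to missing. */
    data work.embedded;
        infile datalines dsd;
        input Name $ Q1 Q2 Q3;
        datalines;
    Ann,10,20,30
    Bob,15,,35
    ;
    run;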

[(back to top)](#Table-of-Contents:)

### Specifying the Length of Character Variables

With list input, character variables get a default length of 8, so longer values are truncated. The `LENGTH` statement allows you to specify the length of character variables:

    length charvar1 $ n1 charvar2 $ n2 ...;
    
If several variables share the same length, list them together and place the dollar sign and length once after them. Because a variable's attributes are fixed the first time it is encountered in the DATA step, the `LENGTH` statement must precede the `INPUT` (or `SET`) statement that references the variable. Variables are ordered by first appearance, so a variable named in a leading `LENGTH` statement will appear first in the dataset.
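
A minimal sketch of the `LENGTH` statement in use, assuming hypothetical variable names and illustrative values. Both character variables share one length, so they are listed together before the `$ 15`.

    data work.cities;
        length City State $ 15;        /* declared before INPUT, so both hold up to 15 characters */
        infile datalines dlm = ',';
        input City $ State $ Visits;
        datalines;
    Philadelphia,Pennsylvania,12
    Boston,Massachusetts,7
    ;
    run;

Without the `LENGTH` statement, `City` and `State` would be truncated to 8 characters.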

[(back to top)](#Table-of-Contents:)

### Modifying List Input

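To **read character values that contain embedded blanks**, use the **ampersand (&) modifier**. A value then ends at two or more consecutive blanks rather than at the first one, so fields in the raw data must be separated by at least two blanks:

    input charvar & $w.;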
    
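To **read nonstandard data values and/or character values that are longer than 8 characters**, use the **colon (:) modifier** followed by an informat. SAS reads up to the next delimiter rather than a fixed number of columns:

    input variable : informat.;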
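
Both modifiers can appear in the same `INPUT` statement. The sketch below assumes hypothetical variable names and inline records in which fields are separated by two blanks, so that the single blank inside a city name is kept as part of the value.

    data work.visits;
        infile datalines;
        input City & $12.              /* embedded blanks allowed; two blanks end the value */
              Visited : mmddyy10.      /* nonstandard date value */
              Revenue : comma9.;       /* numeric value written with commas */
        format Visited date9.;
        datalines;
    New York  01/15/2020  1,250,000
    Los Angeles  03/02/2020  980,500
    ;
    run;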

[(back to top)](#Table-of-Contents:)

### Creating Free-Format Data

Use the `PUT` statement with list output to create free-format raw data files:

    put variable <: format>;
    
where the colon (:) precedes every format. Specify the file using the `FILE` statement:

    data _null_;
        set <input dataset>;
        file <file path> dlm = <delimiter>;
        put <variable list with formats>;
    run;
    
The `DLM =` option in the `FILE` statement specifies the delimiter written between values; without it, values are separated by a single blank.
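
As a rough illustration, the step below writes the `sashelp.class` sample dataset that ships with SAS to a comma-delimited file. The output path `class.csv` is an assumption; substitute a location that is writable on your system.

    data _null_;
        set sashelp.class;
        file 'class.csv' dlm = ',';                   /* hypothetical output path */
        put Name Sex Age Height : 5.1 Weight : 6.1;   /* colon precedes each format */
    run;

Using `_null_` as the dataset name runs the DATA step without creating an output dataset, so the only result is the raw data file.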

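The `EXPORT` procedure achieves much the same result. `OUTFILE =` and `DBMS =` are options in the `PROC EXPORT` statement, while `DELIMITER =` is a separate statement:

    proc export data = <input dataset>
        outfile = "<file path>"
        dbms = dlm <replace>;
        delimiter = '<delimiter>';
    run;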

If your values contain the delimiter (for example, commas), add the `DSD` option to the `FILE` statement so that those values are enclosed in quotation marks when they are written.

[(back to top)](#Table-of-Contents:)

### Mixing Input Styles

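Input styles can be mixed in a single `INPUT` statement, as in this example, which reads `SSN` with column input, `HireDate` and `Salary` with formatted input, and `Department` and `Phone` with list input:

    data sasuser.mixedstyles;
       infile 'rawdata.dat';
       input SSN $ 1-11 @13 HireDate date7.
             @21 Salary comma6. Department : $9. Phone $;
    run;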

[(back to top)](#Table-of-Contents:)