a-sansom/awk-csv-dequote

An AWK program that 'dequotes' (removes the surrounding quotes from) selected CSV fields, configured per record type.

Record types and the fields to dequote are configured in the dequote-config file. Each entry is a pair in the format record_type=comma-separated field indexes. For example:

# TYPE_1 records to dequote second and fourth fields.
TYPE_1=2,4
# TYPE_2 records to dequote third, fourth and fifth fields.
TYPE_2=3,4,5
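
To make the format concrete, here is a minimal sketch of how such lines could be split into record type and field indexes. It is an assumption about one reasonable approach, not the code in dequote.awk, and the helper name list_dequote_fields is hypothetical.

```sh
#!/bin/sh
# Sketch only: an assumed way to parse record_type=indexes lines,
# not the repository's dequote.awk.

# list_dequote_fields (hypothetical helper): reads config text on stdin
# and prints one "record_type field_index" pair per line.
list_dequote_fields() {
    awk -F= '
        /^[ \t]*#/ || /^[ \t]*$/ { next }   # skip comments and blank lines
        {
            n = split($2, idx, ",")         # idx[1..n] = field indexes
            for (i = 1; i <= n; i++)
                print $1, idx[i] + 0        # + 0 forces a numeric index
        }
    '
}

list_dequote_fields <<'EOF'
# TYPE_1 records to dequote second and fourth fields.
TYPE_1=2,4
# TYPE_2 records to dequote third, fourth and fifth fields.
TYPE_2=3,4,5
EOF
```

Each config entry expands to one line per index, for example "TYPE_1 2" and "TYPE_1 4" for the first entry.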

The configuration file should be the first file in the list to process.

The program accepts a single optional argument, RECORD_TYPE_INDEX: the numeric index of the input field that holds the record type identifier. If it is not supplied, it defaults to the first field.
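
A defaulted -v variable of this kind can be sketched as follows. This only illustrates the general AWK idiom and is not taken from dequote.awk. The sample input is unquoted and split on plain commas to keep the example short.

```sh
#!/bin/sh
# Sketch only: defaulting an optional -v variable in awk.
# The program text is illustrative, not the repository's dequote.awk.
prog='
BEGIN { if (RECORD_TYPE_INDEX == "") RECORD_TYPE_INDEX = 1 }  # default: first field
{ print $RECORD_TYPE_INDEX }                                  # print the record type field
'

# No -v given: the record type is read from field 1.
echo 'TYPE_1,a,b' | awk -F, "$prog"

# -v RECORD_TYPE_INDEX=2: the record type is read from field 2.
echo 'a,TYPE_1,b' | awk -F, -v RECORD_TYPE_INDEX=2 "$prog"
```

Both commands print TYPE_1.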

Usage:

awk -v RECORD_TYPE_INDEX=1 -f dequote.awk dequote-config test_data.csv

Here the dequote-config content is as in the example above, and test_data.csv contains:

"TYPE_1","a","b","c","d","e"
"TYPE_2","f","g","h","i","j"
"TYPE_1","\"k\"","l","m","n","o"
"TYPE_3","k","l","m","n","o"

The result is:

"TYPE_1",a,"b",c,"d","e"
"TYPE_2","f",g,h,i,"j"
"TYPE_1",\"k\","l",m,"n","o"
"TYPE_3","k","l","m","n","o"
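
The whole flow can be approximated with the sketch below. It is a simplified stand-in written under stated assumptions, not the repository's dequote.awk: every field is assumed to be quoted, no field may contain the sequence "," and the config file must be non-empty so that the NR == FNR test identifies it as the first file. The function name dequote_sketch and the temporary file paths are hypothetical.

```sh
#!/bin/sh
# Sketch only: a simplified stand-in for the dequoting flow, NOT dequote.awk.
# Assumes every field is quoted and no field contains the sequence ",".

# dequote_sketch CONFIG DATA [RECORD_TYPE_INDEX]   (hypothetical helper)
dequote_sketch() {
    awk -v RECORD_TYPE_INDEX="${3:-1}" '
        # First file (the config): record which fields to dequote per type.
        NR == FNR {
            if ($0 ~ /^[ \t]*#/ || $0 ~ /^[ \t]*$/) next
            eq = index($0, "=")
            rtype = substr($0, 1, eq - 1)
            n = split(substr($0, eq + 1), idx, ",")
            for (i = 1; i <= n; i++) dequote[rtype, idx[i] + 0] = 1
            next
        }
        # Remaining files (the data): rebuild each record field by field.
        {
            line = $0
            sub(/^"/, "", line); sub(/"$/, "", line)   # drop the outer quotes
            n = split(line, f, /","/)                  # split on quote-comma-quote
            rtype = f[RECORD_TYPE_INDEX]
            out = ""
            for (i = 1; i <= n; i++) {
                cell = ((rtype, i) in dequote) ? f[i] : "\"" f[i] "\""
                out = out (i > 1 ? "," : "") cell
            }
            print out
        }
    ' "$1" "$2"
}

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

cat > "$tmpdir/dequote-config" <<'EOF'
TYPE_1=2,4
TYPE_2=3,4,5
EOF

cat > "$tmpdir/test_data.csv" <<'EOF'
"TYPE_1","a","b","c","d","e"
"TYPE_2","f","g","h","i","j"
"TYPE_1","\"k\"","l","m","n","o"
"TYPE_3","k","l","m","n","o"
EOF

dequote_sketch "$tmpdir/dequote-config" "$tmpdir/test_data.csv"
```

With the sample files, this prints the same four lines as the result shown above. TYPE_3 has no config entry, so its record passes through with every field still quoted.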

Both dequote-config and dequote.awk are commented; see them for more detail.
