Add library of helper functions for creating/processing CSV files #127

Draft: wants to merge 1 commit into base: master

benhoyt (Owner) commented May 28, 2022

This PR adds a lib.awk library with various helper functions, intended mainly for CSV processing and CSV file creation. The idea here is to try these out in pure AWK form (with fairly inefficient implementations!) and then move to native Go implementations later if they prove useful.

printheader / printfields

The intention here is to allow you to create CSV files from scratch. Set OFIELDS and call printheader() in BEGIN to set up and print the field names, then call printfields(a) to output each row. I'm not sure whether setfields() should be "public" or not. Alternatively, we could have only setfields(), and to print you'd just say setfields(a); print. The printfields() function is a bit handier, though.

Note: I'm still not convinced we need these functions at all, and would like to see real-world use cases where they actually make the code clearer or simpler. See the "Why not just this?" comments under the examples below.

# Set fields from array a according to the order in OFIELDS, which must have
# field numbers as keys (from 1 to N) and field names as values, for example
# OFIELDS[1] = "name"; OFIELDS[2] = "age".
function setfields(a,    i) { ... }

# Call setfields(a) and then print the current row.
function printfields(a) { ... }

# Print the header (field names) from OFIELDS.
function printheader(    i) { ... }
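
For discussion, here is a rough sketch of how the three functions above could behave in plain AWK. This is not the code from lib.awk in this PR. The `_nofields` helper and the `row` array are hypothetical names made up for the illustration, and the sketch assumes OFIELDS has consecutive keys starting at 1.

```awk
# Hypothetical sketch only; NOT the implementation in this PR's lib.awk.

# Count the consecutive numeric keys 1..N present in OFIELDS.
# (_nofields is an invented helper name.)
function _nofields(    n) {
	n = 0
	while ((n + 1) in OFIELDS) n++
	return n
}

# Rebuild $0 so that field i holds a[OFIELDS[i]], or "" if a has no such key.
function setfields(a,    i, n) {
	n = _nofields()
	$0 = ""
	for (i = 1; i <= n; i++)
		$i = (OFIELDS[i] in a) ? a[OFIELDS[i]] : ""
}

# Call setfields(a) and then print the current row.
function printfields(a) {
	setfields(a)
	print
}

# Print the field names from OFIELDS as a row.
function printheader(    i, n) {
	n = _nofields()
	$0 = ""
	for (i = 1; i <= n; i++)
		$i = OFIELDS[i]
	print
}

BEGIN {
	OFS = ","
	OFIELDS[1] = "name"
	OFIELDS[2] = "age"
	printheader()          # name,age
	row["name"] = "Jill"
	printfields(row)       # Jill,
}
```

Note that a plain `print` with `OFS = ","` does not quote values that contain commas, so the sketch is only CSV-safe when something else handles quoting, for example GoAWK's CSV output mode.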

Example usage, to create a 3-row CSV file (plus header row):

BEGIN {
	OFIELDS[1] = "name"
	OFIELDS[2] = "age"
	printheader()

	a["name"] = "Smith, Bob"
	a["age"] = 42
	printfields(a)

	a["name"] = "Brown, Jill"
	a["age"] = 37
	printfields(a)

	delete a
	a["name"] = "Bug, June"
	printfields(a)
}

# Why not just this?
#BEGIN {
#	print "name", "age"
#
#	print "Smith, Bob", 42
#	print "Brown, Jill", 37
#	print "Bug, June", ""
#}

Or, to create a CSV file from a much larger input:

BEGIN {
	OFIELDS[1] = "ID"
	OFIELDS[2] = "Name"
	printheader()
}

{
	a["ID"] = @"School_Id"
	a["Name"] = @"Org_Name"
	printfields(a)
}

# Why not just this?
#BEGIN { print "ID", "Name" }
#{ print @"School_Id", @"Org_Name" }

delfield, insfield, fieldnum

The intention with delfield and insfield is to let you easily delete columns from, or insert columns into, many-columned CSV files, in cases where that is simpler than re-printing all the fields.

I think deleting or inserting a single field would be the common case, hence the singular names, but it is a bit weird when you're using them to delete/insert multiple fields (maybe we should have both delfield(n) and delfields(n, num)?). It's also arguably a bit unexpected that calling delfield(n, 0) or insfield(n, 0) actually deletes/inserts 1 field, not 0 (because of how AWK "default" arguments work).
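
To make the default-argument quirk concrete, here is a small standalone sketch. The two function names are hypothetical and exist only for this illustration. It assumes the defaulting is done with a truthiness test, which is the usual AWK idiom, and contrasts that with a comparison against the empty string, which treats only an omitted argument as "not given".

```awk
# Hypothetical helpers, only to illustrate the argument-defaulting quirk.

# Truthiness test: an omitted num and an explicit 0 both become 1.
function default_loose(num) {
	if (!num) num = 1
	return num
}

# String comparison: an omitted (uninitialized) num compares equal to "",
# but an explicit numeric 0 does not.
function default_strict(num) {
	if (num == "") num = 1
	return num
}

BEGIN {
	print default_loose(), default_loose(0)     # 1 1
	print default_strict(), default_strict(0)   # 1 0
}
```

With the second form, delfield(n, 0) could be a no-op, though an explicit empty string argument would still be treated as omitted.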

Both functions can be used with the fieldnum() helper that returns the number of a given field name from FIELDS.

# Delete the nth field from $0. If num is given, delete num fields starting
# from the nth field.
function delfield(n, num,    i) { ... }

# Insert a new empty field just before the nth field in $0. If num is given,
# insert num empty fields just before the nth field.
function insfield(n, num,    i) { ... }

# Return the number of the given named field, or 0 if there's no field with
# that name. Only works in -H/header mode.
function fieldnum(name,    i) { ... }
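
As a rough illustration of the behaviour described above, here is one way these three functions might be written in plain AWK. This is a sketch, not the code from lib.awk in this PR. It assumes that an omitted or zero num means 1, that out-of-range positions are clamped or ignored, and that the AWK implementation rebuilds $0 when NF is decreased (some older implementations do not). The demo fills FIELDS by hand as a stand-in for header mode, and the sample data is invented.

```awk
# Hypothetical sketch only; NOT the implementation in this PR's lib.awk.

# Delete num fields (default 1) starting at field n by shifting later
# fields left and shrinking NF. Out-of-range n is a no-op.
function delfield(n, num,    i) {
	if (!num) num = 1
	if (n < 1 || n > NF) return
	if (n + num - 1 > NF) num = NF - n + 1
	for (i = n; i <= NF - num; i++)
		$i = $(i + num)
	NF -= num
}

# Insert num empty fields (default 1) just before field n by shifting
# later fields right. n is clamped to the range 1..NF+1.
function insfield(n, num,    i, old) {
	if (!num) num = 1
	old = NF
	if (n < 1) n = 1
	if (n > old + 1) n = old + 1
	for (i = old; i >= n; i--)
		$(i + num) = $i
	for (i = n; i < n + num; i++)
		$i = ""
}

# Return the number of the named field in FIELDS, or 0 if not found.
function fieldnum(name,    i) {
	for (i = 1; i in FIELDS; i++)
		if (FIELDS[i] == name) return i
	return 0
}

BEGIN {
	FS = OFS = ","
	# Stand-in for header mode: fill FIELDS by hand for this demo.
	FIELDS[1] = "School_Id"; FIELDS[2] = "Org_Name"; FIELDS[3] = "City"
	$0 = "42,Kea School,Levin"
	delfield(fieldnum("School_Id")); print   # Kea School,Levin
	insfield(1); $1 = "x"; print             # x,Kea School,Levin
}
```

One consequence of this design is that delfield(fieldnum("No_Such_Field")) leaves the record unchanged, because fieldnum returns 0 and delfield ignores positions below 1.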

Examples:

# To delete the first two fields:
{ delfield(1, 2); print }

# To delete the field named "School_Id":
{ delfield(fieldnum("School_Id")); print }

# To add a "Num" record number field as the first field:
{ insfield(1); $1 = NR==1?"Num":NR-1; print }

Fixes #125.

benhoyt changed the title from "Add library of helper functions (mainly for CSV processing/creation)" to "Add library of helper functions for creating/processing CSV files" on May 28, 2022
benhoyt marked this pull request as draft on March 25, 2024 07:37
Successfully merging this pull request may close these issues.

Add helper functions for CSV processing