CSV - loss of double quote when dataset is updated #197

Closed
patrickToca opened this issue Jul 13, 2023 · 2 comments

Comments

patrickToca commented Jul 13, 2023

Using goawk version 1.23.1 on macOS Ventura.

I get a result in which some fields are modified unexpectedly, although the change I was aiming for is made correctly.

The command used:
goawk -i csv 'BEGIN {FS=OFS=","}{if ($41==9999) {$41="NULL"}};{$33="\""$33"\""; print $0}' source.csv > target.csv

The contents of source.csv and target.csv are shown below.

In source.csv, field $3 is double-quoted.
In target.csv, field $3 loses its double quotes. As a consequence, the number of fields changes on certain records.

The $33 fields are correctly modified.
The $41 fields are correctly modified.

Could this be an error in my command, or is it a real goawk issue?

----source.csv----
99190867052015021115470500009697,,"13a Providence Street",,WF1 3BG,672570090000,170,G,B1 Offices and Workshop businesses,2015-02-06,E08000036,E14001009,,2015-02-11,Mandatory issue (Marketed sale).,31,83,3,Grid Supplied Electricity,,,,31,33.99,21.24,56.64,115.71,No,,,4,Heating and Natural Ventilation,"13a Providence Street",Wakefield,Wakefield,WAKEFIELD,2015-02-11 15:47:05,,,13a
99206150022015021213585120790090,,"The Pizza Shop","55 Lake Lock Road",WF3 4HP,927051080000,156,G,A3/A4/A5 Restaurant and Cafes/Drinking Establishments and Hot Food takeaways,2015-01-30,E08000036,E14000826,,2015-02-12,Mandatory issue (Non-marketed sale).,34,99,3,Grid Supplied Electricity,,,,64,102.08,68.93,201.99,317.54,No,,,4,Heating and Natural Ventilation,"The Pizza Shop, 55 Lake Lock Road",Wakefield,Morley and Outwood,WAKEFIELD,2015-02-12 13:58:51,,63063364,Address Matched,55
99208510022015021215584572990411,UNITS 1-12 AND S1-S10,Bizspace,"Headway Business Park, Denby Dale Road",WF2 7AZ,179615280001,93,D,B8 Storage or Distribution,2015-01-14,E08000036,E14001009,,2015-02-12,Mandatory issue (Marketed sale).,26,76,3,Natural Gas,,,,25906,44.99,23.19,67.95,83.27,No,,,4,Heating and Natural Ventilation,"UNITS 1-12 AND S1-S10, Bizspace, Headway Business Park, Denby Dale Road",Wakefield,Wakefield,WAKEFIELD,2015-02-12 15:58:45,,,9999

----target.csv-----
99190867052015021115470500009697,,13a Providence Street,,WF1 3BG,672570090000,170,G,B1 Offices and Workshop businesses,2015-02-06,E08000036,E14001009,,2015-02-11,Mandatory issue (Marketed sale).,31,83,3,Grid Supplied Electricity,,,,31,33.99,21.24,56.64,115.71,No,,,4,Heating and Natural Ventilation,"13a Providence Street",Wakefield,Wakefield,WAKEFIELD,2015-02-11 15:47:05,,,13a
99206150022015021213585120790090,,The Pizza Shop,55 Lake Lock Road,WF3 4HP,927051080000,156,G,A3/A4/A5 Restaurant and Cafes/Drinking Establishments and Hot Food takeaways,2015-01-30,E08000036,E14000826,,2015-02-12,Mandatory issue (Non-marketed sale).,34,99,3,Grid Supplied Electricity,,,,64,102.08,68.93,201.99,317.54,No,,,4,Heating and Natural Ventilation,"The Pizza Shop, 55 Lake Lock Road",Wakefield,Morley and Outwood,WAKEFIELD,2015-02-12 13:58:51,,63063364,Address Matched,55
99208510022015021215584572990411,UNITS 1-12 AND S1-S10,Bizspace,Headway Business Park, Denby Dale Road,WF2 7AZ,179615280001,93,D,B8 Storage or Distribution,2015-01-14,E08000036,E14001009,,2015-02-12,Mandatory issue (Marketed sale).,26,76,3,Natural Gas,,,,25906,44.99,23.19,67.95,83.27,No,,,4,Heating and Natural Ventilation,"UNITS 1-12 AND S1-S10, Bizspace, Headway Business Park, Denby Dale Road",Wakefield,Wakefield,WAKEFIELD,2015-02-12 15:58:45,,,NULL

Owner

benhoyt commented Jul 13, 2023

Hi @patrickToca, this is actually not a bug; it's happening because you're not outputting in CSV mode. You're using -i csv to set "CSV input mode", but you need -o csv as well, to set "CSV output mode". This will properly quote fields in the output that have commas in them. Note that in CSV input mode FS is ignored, and in CSV output mode OFS is ignored, so you don't need to set those.

The particular field that's tripping you up is field $4, which on line 3 has a comma in it: "Headway Business Park, Denby Dale Road". CSV input mode strips the quotes, so when the record is printed without CSV output mode, that comma splits the value into two fields. In CSV output mode the field is quoted again and stays a single field.
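The effect can be reproduced outside goawk. The following is a minimal sketch using Python's standard csv module, not goawk's implementation; the four-field record is a shortened, made-up stand-in for line 3 of source.csv. It shows that parsing strips the quotes, that a plain comma join then yields an extra field, and that a CSV-aware writer restores the quoting.

```python
import csv
import io

# Shortened stand-in for line 3 of source.csv (illustrative, not the real record).
source_line = 'id,UNITS 1-12,Bizspace,"Headway Business Park, Denby Dale Road"\n'

# CSV-aware parsing strips the quotes and keeps the embedded comma inside the field.
fields = next(csv.reader(io.StringIO(source_line)))

# Joining with a plain "," (similar in spirit to printing with OFS=",") drops the quoting,
# so re-reading the line finds one field more than before.
naive = ",".join(fields)
naive_count = len(next(csv.reader(io.StringIO(naive))))

# A CSV-aware writer quotes any field that contains the delimiter.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerow(fields)
proper = out.getvalue()
proper_count = len(next(csv.reader(io.StringIO(proper))))

print(len(fields), naive_count, proper_count)
```

Here the naive join produces five fields from a four-field record, while the CSV writer round-trips the original line unchanged.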

See also the "NOTE" in the docs for CSV output mode -- you'll need to use a bare print rather than print $0 in CSV output mode.

Also, not that it's causing a problem, but you can shorten {if ($41==9999) {$41="NULL"}} to use an AWK pattern-action construct, instead of an if statement -- so it becomes $41==9999 {$41="NULL"}.

Overall, I believe the equivalent script to what you have, but handling quoting correctly, is as follows:

goawk -i csv -o csv '$41==9999 {$41="NULL"} { print }' source.csv > target2.csv
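For comparison, here is a rough Python sketch of the same transformation using the standard csv module. The helper name null_sentinel and its parameters are hypothetical, chosen only for illustration. It also assumes a plain string match on the sentinel, whereas AWK's $41==9999 compares numerically when the field looks like a number.

```python
import csv
import io

def null_sentinel(src, dst, column=41, sentinel="9999"):
    """Copy CSV rows from src to dst, replacing sentinel in the 1-based column with NULL.

    Hypothetical helper: uses an exact string match, unlike AWK's numeric comparison.
    """
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        # Skip the check for rows that are too short to have the column.
        if len(row) >= column and row[column - 1] == sentinel:
            row[column - 1] = "NULL"
        # The writer re-quotes fields containing commas, like CSV output mode.
        writer.writerow(row)

# Small demonstration on made-up data, using column 3 instead of 41.
demo_in = io.StringIO('a,"b, c",9999\nd,e,13a\n')
demo_out = io.StringIO()
null_sentinel(demo_in, demo_out, column=3)
result = demo_out.getvalue()
print(result, end="")
```

The quoted field "b, c" survives the round trip because both reading and writing go through a CSV-aware layer, which is the role -i csv and -o csv play together in the command above.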

benhoyt closed this as not planned on Jul 13, 2023
@patrickToca
Author

Thanks.
