Option to set a default delimiter #44

Open
HerbCSO opened this issue Sep 25, 2016 · 15 comments

Comments

@HerbCSO

HerbCSO commented Sep 25, 2016

OK, can I just say first of all that I'm IN LOVE with this toolkit!? It is an absolute joy and it fills a gaping void for me! ;]

Anyway, I work with a lot of TSV files (tab-separated) and it's kind of annoying to have to type -d "\t" for every single command I run (but thank you for providing the option!). I'd love to be able to change the default separator from , to \t.

Unfortunately it's not easy to alias since the -d must come AFTER the command. I suppose I could do something like xsvtab() { xsv $1 -d "\t" ${@:2}; } (and maybe even alias xsv=xsvtab if I'm feeling really lazy - but then I lose the ability to override the delimiter if I actually do have a different type of file and I have to run it with \xsv to use the original command instead of the alias (specifying -d again with the above alias results in an Invalid arguments. error because now -d is specified twice)). That all feels a little clunky, although it does kind of work for my use case, it would just be slicker to be able to override the default.

@BurntSushi
Owner

I'd support an environment variable to set this.

One trick you may not be aware of (and it may not work for your use case), but if your csv files have a .tsv extension, then xsv should use a tab delimiter automatically.

@HerbCSO
Author

HerbCSO commented Sep 25, 2016

Oh, that's cool, I was indeed not aware of that. Unfortunately a lot of files I work with don't have a .tsv extension for... reasons...

But an environment variable for that would be totally awesome!

@ghuls

ghuls commented Jan 5, 2017

Maybe environment variables based on awk variable names would be a good idea:

XSV_FS = input field separator (input field delimiter)
XSV_OFS = output field separator (output field delimiter)

An environment variable for specifying additional file extensions that should automatically be interpreted as TAB-separated would be useful too. A lot of bioinformatics-related file formats are TAB-separated.

XSV_TSV='bed|gtf|gff|tsv|vcf'
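
Until something like this exists in xsv, the extension list can be approximated in the shell. The following is only a sketch: XSV_TSV is the variable name proposed above and is not read by xsv itself, and delimiter_for is a hypothetical helper name chosen for illustration.

```shell
# XSV_TSV is the variable proposed in this thread; xsv does not read it.
: "${XSV_TSV:=bed|gtf|gff|tsv|vcf}"

# Hypothetical helper: print the delimiter to pass to -d for a file name.
delimiter_for() {
    local ext=''
    # Only treat the text after the last dot as an extension if there is a dot.
    case "$1" in
        *.*) ext="${1##*.}" ;;
    esac
    # Split the extension list on '|' for the loop below.
    local IFS='|'
    local known
    for known in $XSV_TSV; do
        if [ "$ext" = "$known" ]; then
            printf '%s\n' '\t'
            return 0
        fi
    done
    # Anything else falls back to the comma default.
    printf '%s\n' ','
}
```

It could then be used as xsv headers -d "$(delimiter_for peaks.bed)" peaks.bed, where peaks.bed is a made-up file name.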

@link2xt

link2xt commented Jan 18, 2017

How about defaulting to automatic separator detection? Separator is either ',' or '\t', whichever comes first.

@BurntSushi
Owner

@ilabdsf Won't work because escaping/quoting permits either of those characters to be present before the first field separator.

@link2xt

link2xt commented Jan 19, 2017

@BurntSushi right, that is why I say "default to". If someone wants a robust script, he can write "-d,". But when just using xsv from the command line, I think it is acceptable. It is very unlikely to have tabs or commas in column names.
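
The "whichever comes first" heuristic can be sketched in the shell as follows. This is an illustration of the idea, not anything xsv does; sniff_delimiter is a hypothetical name, and the function deliberately ignores quoting, so it has exactly the weakness pointed out above.

```shell
# Hypothetical helper: guess the delimiter from the first line of a file.
# Prints \t if a tab appears before any comma, otherwise prints a comma.
sniff_delimiter() {
    local header=''
    local tab
    tab=$(printf '\t')
    # Read the first line; accept a final line without a trailing newline.
    IFS= read -r header < "$1" || [ -n "$header" ] || return 1
    # Text before the first comma and before the first tab, respectively.
    local before_comma="${header%%,*}"
    local before_tab="${header%%"$tab"*}"
    if [ "${#before_tab}" -lt "${#before_comma}" ]; then
        printf '%s\n' '\t'
    else
        printf '%s\n' ','
    fi
}
```

A header line such as "x<TAB>y",z (a quoted field containing a tab in a comma-separated file) is misdetected as tab-separated, which is why this is only reasonable as an interactive default and not for scripts.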

@iliekturtles

A bash function can be setup to always use a specific delimiter:

function xsvt() {
    # First argument is the xsv subcommand; the rest are passed through.
    local cmd="$1"
    shift && command xsv "$cmd" -d"\t" "$@"
}

@camerondavison

I vote for an environment variable that sets the default input/output delimiter for all of the pipes, maybe just something like
XSV_DEFAULT_DELIMITER

@ghuls

ghuls commented Aug 24, 2017

@iliekturtles
The bash function should be a lot more complex than that.

Some things that don't work:

$ xsvt -h
Unknown flag: '-d'

Usage:
    xsv <command> [<args>...]
    xsv [options]

# No output (instead of help):
$ xsvt

# Overriding -d does not work:
$ printf '1\t2\n' | xsvt input -d '\t'
Invalid arguments.

Usage:
    xsv input [options] [<input>]
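
A somewhat more defensive wrapper that addresses the three failures listed above might look like the sketch below. It assumes that every subcommand it is used with accepts -d/--delimiter and that the flag may directly follow the subcommand name, as in the function quoted above; the detection of an existing delimiter flag is a simple pattern match and not a real argument parser.

```shell
# Sketch of a tab-defaulting wrapper around xsv (xsvt is not part of xsv).
xsvt() {
    # No arguments, or a leading flag such as -h: pass through unchanged.
    case "${1-}" in
        ''|-*)
            xsv "$@"
            return
            ;;
    esac

    local cmd="$1"
    shift

    # If the caller already chose a delimiter, do not add a second one.
    local arg
    for arg in "$@"; do
        case "$arg" in
            -d|-d?*|--delimiter|--delimiter=*)
                xsv "$cmd" "$@"
                return
                ;;
        esac
    done

    # Otherwise default to a tab delimiter.
    xsv "$cmd" -d '\t' "$@"
}
```

With this version, xsvt and xsvt -h reach xsv unchanged, and printf '1\t2\n' | xsvt input -d '\t' passes only one -d flag.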

@camerondavison

Not sure if you take pull requests or not, but I created one for my proposal. I have not written much Rust code, but this seemed like an easy enough thing to dive into and learn some. I was thinking about documenting the variable somewhere, but was not really sure where. Do you want me to put it into all of the help strings?

@jimmywan

@BurntSushi can you take a look at #94 ?

@richardjharris

richardjharris commented Aug 19, 2018

alias tsv='xsv -d "\t"'

@nickray

nickray commented Nov 29, 2018

The main issue for me is that the output delimiter cannot always be configured. If I get some TSV files (with embedded commas), and do something like xsv cat, then since there is no --out-delimiter flag, the files get corrupted.

So to make everything composable, it would be great if an env delimiter influences both input and output.

@BurntSushi
Owner

@nickray You can add xsv fmt -t '\t' to the end of your pipeline.

@nickray

nickray commented Nov 29, 2018

I didn't realize that xsv cat rows (and probably others) inserts double quotes, I would have expected both.csv to be broken in the following (my mental model of CSV is something like line.split(",")...):

printf "a\tb\nx,y\tz\n" > file1.tsv 
printf "a\tb\nx\ty,z\n" > file2.tsv

echo ":: both.csv"
xsv cat rows -d '\t' file?.tsv > both.csv
cat both.csv

echo ":: both.tsv"
xsv fmt -t '\t' both.csv > both.tsv
cat both.tsv

Good to know it's not!

:: both.csv
a,b
"x,y",z
x,"y,z"
:: both.tsv
a       b
x,y     z
x       y,z

It would still be helpful to avoid the quotation dance and use tabs (or 0x1f) throughout by just doing something like export XSV_DELIMITER='\t'; xsv cat rows file?.tsv > both.tsv, particularly when chaining xsv calls (for instance, xsv partition would seem to need an xargs or parallel call to end up with TSV splits). Maybe my Rust skills will soon be good enough to contribute 🌱

9 participants