Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std.getopt: Add character-separated elements support for arrays and associative arrays #2049

Merged
merged 2 commits into from Apr 1, 2014
Merged

Conversation

ghost
Copy link

@ghost ghost commented Mar 24, 2014

I sure hope I got this right. I do love writing templates in D. :>

https://d.puremagic.com/issues/show_bug.cgi?id=5316

@ghost ghost mentioned this pull request Mar 24, 2014
@monarchdodra
Copy link
Collaborator

How does one "escape" the comma though? I'm thinking

./myDb --user=Robertson,Paul --user=Smith,John

Does this still work, or will it now create 4 users (instead of 2)?

We could decide to "split on sep" only if the field appears once? But then I'd expect this:

./myFun --data=1,2,3 --data=4,5,6 to produce a 6 element array.

Maybe it would be better to simply split on white? It's much easier to handle in terms of common usage (people know white is ambiguous), and breaks nothing. EG:

./myDb --user Robertson,Paul Smith,John, "Bush,George W."

@ghost
Copy link
Author

ghost commented Mar 25, 2014

./myDb --user Robertson,Paul Smith,John, "Bush,George W."

What would this produce?

@monarchdodra
Copy link
Collaborator

./myDb --user Robertson,Paul Smith,John, "Bush,George W."

What would this produce?

string[] users = ["Robertson,Paul", "Smith,John", "Bush,George W." ];

@ghost
Copy link
Author

ghost commented Mar 25, 2014

This is definitely a complicated issue. I'll wait for others to comment for suggestions.

@andralex
Copy link
Member

This is fine as is. We don't want to get into issues such as shell quoting etc. One fixed separator should be enough. Only question I have is - should be be string instead of dchar?

@monarchdodra
Copy link
Collaborator

This is fine as is. We don't want to get into issues such as shell quoting etc.

I realize it's much easier to just hand wave the problem away, but this is a breaking change, and there is no "opt-out".

We aren't getting into issues as such as "shell quoting", since these are already taken care of by said shell. Once "getopt" is invoked, it already has tokens. I think it is better to simply let the shell do the tokenizing (with/without escaping) on white.

From there, given my example: ./myDb --user Robertson,Paul Smith,John, "Bush,George W."
getopt would only have to "parse": ["myDb", "--user", "Robertson,Paul", "Smith,John", "Bush,George W."]
Notice how there is strictly no ambiguity in the separators: All this was already taken care of by shell.

As is, I can't give an LGTM.


Another (maybe more involved?) solution, would be to make it possible to work with an opt-in specified separator?

Something like:

string[] s;
getopt(args,
    "users", s, TokenSep(',')
);

This would trigger the opt-in request for token separating. This is an off the top of my head idea. The fact you can specify the separator is a very good thing, IMO, as I've seen programs that separate on ';', ':' or others.

@ghost
Copy link
Author

ghost commented Mar 26, 2014

Only question I have is - should be be string instead of dchar?

@andralex: I've tried to be consistent with the other options, assignChar is a dchar, but I never thought about using multi-char separators! I'll change it to a string and add another test-case.

This would trigger the opt-in request for token separating.

"users", s, TokenSep(',')

There is no need for this "TokenSep"-style API, we could just check whether arraySep was manipulated. I know it's better for a side-effect free API but we already have global assignChar and endOfOptions, I want us to be consistent. A future redesign of getopt should be side-effect free.

@ghost
Copy link
Author

ghost commented Mar 26, 2014

So essentially we could split by whitespace by default, unless the user does things like: arraySep = ",". @monarchdodra : would this work?

@monarchdodra
Copy link
Collaborator

So essentially we could split by whitespace by default, unless the user does things like: arraySep = "," . @monarchdodra : would this work?

Essentially, I don't think we should be splitting on anything by default. Especially not whites, that's the shell's job. You'd break this trivial use case: doit.txt --input "My File.txt" --input "My File 2.txt".

ideally I'd think that having multiple tokens after an array/AA parameter should simply take them into account (doit.txt --input 1 2 3). Currently, this simply sets the input to [1], but I think it should take [1, 2, 3]. However, when programs do this, I don't know what "magic" they use to know when to "stop". parsing parameters?

Having an opt-in separator would be ideal, but I really stress the "opt-in" part.

There is no need for this "TokenSep"-style API, we could just check whether arraySep was manipulated. I know it's better for a side-effect free API but we already have global assignChar and endOfOptions , I want us to be consistent. A future redesign of getopt should be side-effect free.

Hum... works for me as a workaround. It's not the "global" issue that I don't like, but potentially splitting two different fields that have nothing in common. But if we simply label it as a "limitation", I'm fine with it.

@andralex
Copy link
Member

Spec'ing this is getting out of hand. Probably it's best to just close it.

  1. This is already doable by passing the flag multiple times
  2. User code can always split etc. the way they want.

To make this a non-breaking change, we could just set the implicit separator to the empty string. I think it would be nice to accept this with a separator string (by default "" i.e. no multiple elements) with no escaping and no frills, and move on. If this is not to everybody liking let's close this.

@monarchdodra
Copy link
Collaborator

To make this a non-breaking change, we could just set the implicit separator to the empty string. I think it would be nice to accept this with a separator string (by default "" i.e. no multiple elements) with no escaping and no frills, and move on.

Sounds good to me.

If this is not to everybody liking let's close this.

It's to my liking. @AndrejMitrovic ? Seems easy enough to do, I think?

@andralex
Copy link
Member

@AndrejMitrovic let's convert this into a miraculous win!

@ghost
Copy link
Author

ghost commented Mar 31, 2014

I don't think it's that easy. Consider this:

import std.getopt;

void main()
{
    string[] files;
    auto args = (["program.name", "--files", "file1.txt", "file2.txt"]).dup;
    getopt(args, "files", &files);

    assert(files == ["file1.txt"]);
    assert(args == ["program.name", "file2.txt"]);
}

This would break with the proposed enhancement, which would assume "file2.txt" should be put into the files array.

@monarchdodra
Copy link
Collaborator

I don't think it's that easy.

Yeah... What I proposed would indeed be a change of behavior. Better to simply leave the current behavior as is. I'm fine with it.

So all that's left is making a global splitToken variable, and split on that when non-empty? I'm fine with that. To paraphrase andrei, I think it would be an overall improvement made of win.

@ghost
Copy link
Author

ghost commented Mar 31, 2014

Ok, so the supported set of features are:

string[] files;  // files gets "file1.txt" *only* (same behavior as old behavior)
auto args = (["program.name", "--files", "file1.txt", "file2.txt"]).dup;

Ditto, old behavior:

string[] files;  // files gets ["file1.txt", "file2.txt"]
auto args = (["program.name", "--files", "file1.txt", "--files", "file2.txt"]).dup;

New feature:

arraySep = ",";
string[] files;  // files gets ["file1.txt", "file2.txt"]
auto args = (["program.name", "--files=file1.txt,file2.txt"]).dup;

@ghost
Copy link
Author

ghost commented Mar 31, 2014

Is that it?

@monarchdodra
Copy link
Collaborator

Is that it?

Sounds good to me!

What about:

arraySep = ",";
auto args = (["program.name", "--files=file1.txt,file2.txt"]).dup;
vs
auto args = (["program.name", "--files file1.txt,file2.txt"]).dup;

Same behavior for both?

@ghost
Copy link
Author

ghost commented Mar 31, 2014

auto args = (["program.name", "--files file1.txt,file2.txt"]).dup;

That one will throw an exception thinking the entire second string is a switch (current behavior).

Did you mean the following?:

auto args = (["program.name", "--files", "file1.txt,file2.txt"]).dup;

@monarchdodra
Copy link
Collaborator

Did you mean the following?:

Yes. Same behavior for both?

@ghost
Copy link
Author

ghost commented Mar 31, 2014

Yes. Same behavior for both?

Yep. Updated pull request.


if (arraySep == "")
{
*receiver ~= to!(typeof((*receiver)[0]))(val);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized I can use to!E here. Should I fix it up (since the tester is green)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

andralex added a commit that referenced this pull request Apr 1, 2014
std.getopt: Add character-separated elements support for arrays and associative arrays
@andralex andralex merged commit 74d3947 into dlang:master Apr 1, 2014
@monarchdodra
Copy link
Collaborator

andralex merged commit

Awesome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants