[feature request] Allow setnames() to skip column names that don't exist #3030
Comments
It would be much easier to follow your request if you included a minimal reproducible example and the expected output.
I think your case is not a very common one.
@jangorecki I can't give a reproducible example because it isn't fitting in this case; the function I have built gathers data from several external backend systems and then several other operations are carried out from there and the parameters used in the function/queries can affect the outcome in several ways, most of which can't be predicted accurately. The problem I have is how In terms of an expected outcome, what I want isn't an outcome, per se; my issue revolves more around the behaviour/execution of In that case, my expected outcome is for In terms of how common my case is, I think it's more common than you realise because people have been posting issues about very similar things for a number of years, the flavour of which is typically focused on wanting
If your workflow is complicated enough that its behavior cannot be predicted, you can afford a one-liner to define your own setnames, I guess:
This is a reproducible example (illustrating what I assume you're after):
It is fitting to make an example to be clear about the change in behavior that you're requesting. Within your wall of text, it may be a chore to parse out which parts are describing the request and which are arguing for it.
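A one-liner of the kind mentioned above might look like the following sketch. The wrapper name `setnames_present` and the example columns are assumptions made for illustration; they are not part of data.table and not necessarily what the commenter wrote.

```r
library(data.table)

# Hypothetical wrapper (not part of data.table): keep only the old/new
# pairs whose old name is a column of DT, then delegate to setnames().
setnames_present <- function(DT, old, new) {
  keep <- old %in% names(DT)
  if (any(keep)) setnames(DT, old[keep], new[keep])
  invisible(DT)
}

DT <- data.table(a = 1, b = 2, d = 3)
setnames_present(DT, c("A", "b"), c("X", "B"))  # "A" is absent and is skipped
names(DT)
#> [1] "a" "B" "d"
```

Because setnames() modifies by reference, the wrapper changes the caller's DT without needing an assignment.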
Probably an extra argument to setnames(DT, old, new, halt_if_absent = TRUE)
library(data.table)
DT = data.table(a = 1, b = 2, d = 3)
setnames(DT, "A", "B")
#> Error ...
setnames(DT, "A", "B", halt_if_not_found = FALSE)
#> (Returns DT)
Note that the current behaviour of I appreciate that it sucks when a long-running script is halted by a seemingly inconsequential error, but it's hard to see that skipping in the absence of a direction to change names is a useful default. In a similar vein, one could imagine someone writing DT[, x := y / 1000] when
@HughParsonage I concur; this is precisely the sort of functionality I had in mind.
An additional argument (
I'm really not sold on the utility of such an argument. IMHO it's an error to try and supply non-existent names. The workaround only takes one line of code for those, so the convenience factor is next to 0 as well.
I disagree because people won't be trying to supply non-existent names deliberately. In my specific case, all of the names in I appreciate that not everybody will want this, which is why the argument should be set to Also, I anticipate that there will be lots of people using Ultimately, I expect that this would provide a vehicle for a much more self-sufficient resolution because the whole investigative process for newcomers can all happen in a shorter space of time compared to having to trawl through Google/StackOverflow/GitHub in search of a solution.
R is not really a language like Python where you might try to make errors deliberately... Adding an argument that will benefit a tiny % of users is not really "for free" as it adds another part of the API for us to potentially maintain. It might be another thing if the workaround was particularly
IMO, the implication of I'd much rather the base function remain explicit, which allows me to write a custom version that is less explicit if needed. Especially when, as mentioned by others, doing so is quite trivial in this case.
Of course you could because the default behaviour will remain exactly the same unless the additional argument is specifically changed to In which scenario(s) do you imagine the default
It may help that I write down that on this project we've always tried to lean on the side of 'yes'. If a user wants something, especially this strongly, then why not? If 5% or even 1% of users would find it useful, that's good enough for me. Unless it could harm the other 95%, or it's particularly difficult or time-consuming to implement. In this case though, adding
@mattdowle I'm glad that you agree and that you see value in this. Yes, admittedly the original post was too long; I wanted to make a good case is all. I would be delighted to submit a PR; before doing so I will first take a look at the current codebase to familiarise myself with it.
For the sake of completeness, there is a question on SO, Rename variables based on values in another dataframe, which was answered using this new feature.
There have been a number of similar posts to this, but nothing that seems to directly address the issue that I am proposing and which I think would be a very useful update.
The issue I am having is that I have implemented setnames() as part of a large and complicated function (500 lines) that I have built. Being presented with an error after spending quite some time waiting for it to execute is unhelpful and annoying, because the time I spent waiting for it to complete was for nothing.
What would have been really helpful in this scenario is an if/else-type condition that checks the old columns and skips them if they don't exist.
Something along the lines of:
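A rough sketch of such a check, assuming a hypothetical helper `rename_if_present()` (not part of data.table) and made-up column names:

```r
library(data.table)

# Hypothetical helper (not part of data.table): walk over the old/new
# pairs and skip any old name that is not a column of DT.
rename_if_present <- function(DT, old, new) {
  stopifnot(length(old) == length(new))
  for (k in seq_along(old)) {
    if (!(old[k] %in% names(DT))) {
      next  # column absent: skip it and move on to the next name
    } else {
      setnames(DT, old[k], new[k])
    }
  }
  invisible(DT)
}

DT <- data.table(id = 1:3, value = c(10, 20, 30))
rename_if_present(DT, c("value", "missing_col"), c("amount", "other"))
names(DT)
#> [1] "id"     "amount"
```

The loop renames one pair at a time, so it is only a sketch of the control flow; a vectorised filter on `old %in% names(DT)` would achieve the same result in a single setnames() call.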
The reason I would check it with a ! (NOT) is that setnames(), by its very nature, already checks whether the names exist; if they do, it continues as normal, because that is literally what it is designed for. Therefore, checking for existing names within setnames() is futile, because that is what setnames() is doing in the first instance.
The trouble with the current setup of setnames() is that it works on the assumption that I know precisely which columns exist at any given point.
I don't.
I have a good idea, but it isn't always correct because there are literally dozens of potential scenarios that could occur based on numerous variations of how the data I am extracting are gathered.
Approximately 90.00%-95.00% of the time, the columns that I am working with are typically the same ones, but there are those odd occasions (the other 5.00%-10.00%) where this is not the case and all of a sudden, something breaks unexpectedly and I am left trying to figure out a new way to hack another version of this function together to handle this single use case just one time, which is neither a good thing to have to do nor good practice.
All of this could be avoided with a simple check inside setnames() to see if the columns exist or not.
Referring to columns by number might work in some cases, but I think it's neither here nor there because different people will need to refer to columns in different ways; names work best in some scenarios and numbers in others.
In my case, names work best because the columns aren't always in the same order and as such using column numbers could break my data frame and it's possible that I wouldn't know until the end.
Also, in my case if the column doesn't exist then it isn't a problem for me and I am happy to proceed onto the next name in the list. I'm sure that some people would like a warning notification if this happens, and I agree that it is a helpful feature to have, which is why I'm not advocating for it to be removed - not at all. Perhaps a more condensed version of the warning can appear on the fly, or even a summary at the end would be good to have.
An if/else-type condition like this would most certainly make things easier for me, at least, and I'm sure several others would agree.
Update:
@HughParsonage suggested an additional argument and this is precisely what I was thinking of.
Something along the lines of:
It would include an additional argument (skip_absent), which is set to FALSE by default and which won't affect the way that setnames() functions for anybody unless the skip_absent flag is specifically set to TRUE at the time of calling.
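A minimal sketch of how the flag behaves, assuming a data.table version that provides the skip_absent argument to setnames(); the table and column names are made up for illustration:

```r
library(data.table)

DT <- data.table(a = 1, b = 2, d = 3)

# Default (skip_absent = FALSE): a missing `old` name is an error.
outcome <- tryCatch(
  {
    setnames(DT, c("a", "zzz"), c("A", "ZZZ"))
    "renamed"
  },
  error = function(e) "error"
)
print(outcome)
#> [1] "error"

# Opt-in: names absent from DT are skipped, the rest are renamed.
setnames(DT, c("a", "zzz"), c("A", "ZZZ"), skip_absent = TRUE)
print(names(DT))
#> [1] "A" "b" "d"
```

Keeping the default at FALSE means existing scripts still stop on a mistyped column name; only callers who pass skip_absent = TRUE get the lenient behaviour.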