Modulation of appending/overwriting standardized variables in standardize() not working #31

Closed
strengejacke opened this issue Nov 4, 2021 · 12 comments

@strengejacke
Member

Related to #30

d <- iris[1:4, ]

## NOT WORKING
# this should only return the two standardized variables, including suffix
datawizard::standardise(d, select = c("Sepal.Length", "Sepal.Width"), append = FALSE, suffix = "_z")
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1    1.2402159   1.3887301          1.4         0.2  setosa
#> 2    0.3382407  -0.9258201          1.4         0.2  setosa
#> 3   -0.5637345   0.0000000          1.3         0.2  setosa
#> 4   -1.0147221  -0.4629100          1.5         0.2  setosa

## working
# this should return the original data frame with the standardized
# variables column-bound to it, including suffix
datawizard::standardise(d, select = c("Sepal.Length", "Sepal.Width"), append = TRUE, suffix = "_z")
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_z
#> 1          5.1         3.5          1.4         0.2  setosa      1.2402159
#> 2          4.9         3.0          1.4         0.2  setosa      0.3382407
#> 3          4.7         3.2          1.3         0.2  setosa     -0.5637345
#> 4          4.6         3.1          1.5         0.2  setosa     -1.0147221
#>   Sepal.Width_z
#> 1     1.3887301
#> 2    -0.9258201
#> 3     0.0000000
#> 4    -0.4629100

## NOT WORKING
# this should only return the standardized variables, w/o suffix
datawizard::standardise(d, select = c("Sepal.Length", "Sepal.Width"), append = FALSE, suffix = NULL)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1    1.2402159   1.3887301          1.4         0.2  setosa
#> 2    0.3382407  -0.9258201          1.4         0.2  setosa
#> 3   -0.5637345   0.0000000          1.3         0.2  setosa
#> 4   -1.0147221  -0.4629100          1.5         0.2  setosa

## NOT WORKING
# this should return the original data frame, where the standardized variables
# *overwrite* the related variables
datawizard::standardise(d, select = c("Sepal.Length", "Sepal.Width"), append = TRUE, suffix = NULL)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length
#> 1    1.2402159   1.3887301          1.4         0.2  setosa          5.1
#> 2    0.3382407  -0.9258201          1.4         0.2  setosa          4.9
#> 3   -0.5637345   0.0000000          1.3         0.2  setosa          4.7
#> 4   -1.0147221  -0.4629100          1.5         0.2  setosa          4.6
#>   Sepal.Width
#> 1         3.5
#> 2         3.0
#> 3         3.2
#> 4         3.1

Created on 2021-11-04 by the reprex package (v2.0.1)

@strengejacke
Member Author

@mattansb @DominiqueMakowski If we change anything here, we need to be explicit in setting arguments in our other packages where we call standardize() on data frames.

@strengejacke
Member Author

Expected behaviour (from my point of view):

d <- iris[1:4, ]

# this should only return the two standardized variables, including suffix
sjmisc::std(d, Sepal.Length, Sepal.Width, append = FALSE, suffix = "_z")
#>   Sepal.Length_z Sepal.Width_z
#> 1      1.2402159     1.3887301
#> 2      0.3382407    -0.9258201
#> 3     -0.5637345     0.0000000
#> 4     -1.0147221    -0.4629100

# this should return the original data frame with the standardized
# variables column-bound to it, including suffix
sjmisc::std(d, Sepal.Length, Sepal.Width, append = TRUE, suffix = "_z")
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_z
#> 1          5.1         3.5          1.4         0.2  setosa      1.2402159
#> 2          4.9         3.0          1.4         0.2  setosa      0.3382407
#> 3          4.7         3.2          1.3         0.2  setosa     -0.5637345
#> 4          4.6         3.1          1.5         0.2  setosa     -1.0147221
#>   Sepal.Width_z
#> 1     1.3887301
#> 2    -0.9258201
#> 3     0.0000000
#> 4    -0.4629100

# this should only return the standardized variables, w/o suffix
sjmisc::std(d, Sepal.Length, Sepal.Width, append = FALSE, suffix = NULL)
#>   Sepal.Length Sepal.Width
#> 1    1.2402159   1.3887301
#> 2    0.3382407  -0.9258201
#> 3   -0.5637345   0.0000000
#> 4   -1.0147221  -0.4629100

# suffix = NULL doesn't work here; suffix = "" is needed to overwrite

# this should return the original data frame, where the standardized variables
# *overwrite* the related variables
sjmisc::std(d, Sepal.Length, Sepal.Width, append = TRUE, suffix = "")
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1    1.2402159   1.3887301          1.4         0.2  setosa
#> 2    0.3382407  -0.9258201          1.4         0.2  setosa
#> 3   -0.5637345   0.0000000          1.3         0.2  setosa
#> 4   -1.0147221  -0.4629100          1.5         0.2  setosa

Created on 2021-11-04 by the reprex package (v2.0.1)

@strengejacke
Member Author

We could reduce the functionality to either overwriting or appending standardized variables; then we would only need suffix. If we want the behaviour shown above, we would need append as well. So we should first decide how flexible this function needs to be, and then decide on the choice of arguments.
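To make the two-argument variant concrete, here is a minimal base-R sketch of the four append/suffix combinations from the expected-behaviour comment above. The helper name std_sketch is hypothetical and is not the datawizard or sjmisc API; it assumes z-standardization via base R scale() (sample SD), and it treats suffix = NULL and suffix = "" the same way, i.e. "keep the original names".

```r
# Hypothetical helper, NOT the datawizard or sjmisc API: a base-R sketch of
# the four append/suffix combinations from the expected-behaviour comment.
std_sketch <- function(data, select, append = FALSE, suffix = "_z") {
  # z-standardize each selected column (sample SD, as base R scale() does)
  z <- as.data.frame(lapply(data[select], function(x) as.numeric(scale(x))))
  # assumption: NULL and "" both mean "keep the original names"
  if (!is.null(suffix) && nzchar(suffix)) {
    names(z) <- paste0(names(z), suffix)
  }
  if (!append) {
    return(z)  # only the standardized variables
  }
  # append = TRUE: start from the full data frame; columns whose names match
  # are overwritten, new (suffixed) names are added at the end
  out <- data
  out[names(z)] <- z
  out
}
```

With append = TRUE and suffix = NULL this overwrites the selected columns, which is the case that needed suffix = "" in sjmisc::std().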

@strengejacke
Member Author

See #32

@strengejacke strengejacke mentioned this issue Nov 4, 2021
@mattansb
Member

mattansb commented Nov 4, 2021

Not sure I follow - are we talking about dropping the append arg, and having the suffix arg control whether the std-cols are appended or not, but in any case non-std cols are always returned?

If so, that sounds fine to me.

@strengejacke
Member Author

We have two options: either we drop append, in which case we can only overwrite or append the standardized variables (see #32), or we keep append, which also lets us include or drop the full data frame (see #31 (comment)).

@DominiqueMakowski
Member

Mmmh in this light I would say that we only keep append as an argument.

  • append=FALSE (default): only keep standardized; no renaming
  • append=T: original data kept, standardized added with a default suffix (_Std or _z or whatnot)
  • append="_standardized", original data kept, std. added with the said suffix.

The only option not available out of the box is to rename & replace, but I think that's fine, since I hardly see a common use case for it, and standardizing and renaming in a separate step is only a minor added trouble.
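The single-argument proposal can be sketched in base R as follows. The helper name std_append is hypothetical and is not the datawizard API. Following the output of strengejacke's final reprex in this thread, append = FALSE overwrites the selected columns in place, append = TRUE adds copies with a default "_z" suffix, and a string is used as the suffix.

```r
# Hypothetical helper, NOT the datawizard API: a base-R sketch of the proposal
# in which `append` alone controls the output.
#   append = FALSE     -> standardized columns replace the originals, no renaming
#   append = TRUE      -> originals kept, standardized columns added with "_z"
#   append = "<text>"  -> originals kept, standardized columns added with that suffix
std_append <- function(data, select, append = FALSE) {
  # z-standardize each selected column (sample SD, as base R scale() does)
  z <- as.data.frame(lapply(data[select], function(x) as.numeric(scale(x))))
  if (isFALSE(append)) {
    data[select] <- z  # overwrite the selected columns in place
    return(data)
  }
  # TRUE falls back to the default suffix; a string is used as given
  suffix <- if (isTRUE(append)) "_z" else as.character(append)
  names(z) <- paste0(names(z), suffix)
  cbind(data, z)  # column-bind the renamed standardized variables
}
```

One design consequence: because a suffix is always added when append is not FALSE, there is no way to return only the standardized columns, which matches the trade-off described above.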

@strengejacke
Member Author

strengejacke commented Nov 4, 2021

Why append instead of suffix? How can we define the suffix then?

@DominiqueMakowski
Member

because append suggests that it will append the standardized data to the other data, which is the primary effect (the suffix is just a "side-effect")

@strengejacke
Member Author

Ah, you mean append = "<my suffix>" will use the suffix.

@DominiqueMakowski
Member

yep

@strengejacke
Member Author

d <- iris[1:4, ]

# overwrite
datawizard::standardise(d, select = c("Sepal.Length", "Sepal.Width"))
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1    1.2402159   1.3887301          1.4         0.2  setosa
#> 2    0.3382407  -0.9258201          1.4         0.2  setosa
#> 3   -0.5637345   0.0000000          1.3         0.2  setosa
#> 4   -1.0147221  -0.4629100          1.5         0.2  setosa

# append
datawizard::standardise(d, select = c("Sepal.Length", "Sepal.Width"), append = TRUE)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_z
#> 1          5.1         3.5          1.4         0.2  setosa      1.2402159
#> 2          4.9         3.0          1.4         0.2  setosa      0.3382407
#> 3          4.7         3.2          1.3         0.2  setosa     -0.5637345
#> 4          4.6         3.1          1.5         0.2  setosa     -1.0147221
#>   Sepal.Width_z
#> 1     1.3887301
#> 2    -0.9258201
#> 3     0.0000000
#> 4    -0.4629100

# append, suffix
datawizard::standardise(d, select = c("Sepal.Length", "Sepal.Width"), append = "_std")
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_std
#> 1          5.1         3.5          1.4         0.2  setosa        1.2402159
#> 2          4.9         3.0          1.4         0.2  setosa        0.3382407
#> 3          4.7         3.2          1.3         0.2  setosa       -0.5637345
#> 4          4.6         3.1          1.5         0.2  setosa       -1.0147221
#>   Sepal.Width_std
#> 1       1.3887301
#> 2      -0.9258201
#> 3       0.0000000
#> 4      -0.4629100

Created on 2021-11-04 by the reprex package (v2.0.1)
