Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forcing generated column names to lower case is unnecessary #41

Closed
OssiLehtinen opened this issue Dec 9, 2019 · 8 comments
Closed

Forcing generated column names to lower case is unnecessary #41

OssiLehtinen opened this issue Dec 9, 2019 · 8 comments

Comments

@OssiLehtinen
Copy link

@OssiLehtinen OssiLehtinen commented Dec 9, 2019

Issue Description

Here's another one:
When creating new columns with, e.g., dplyr's mutate, the column names are forced to lower case. This happens silently and deviates from the 'default' behaviour of dplyr (at least with the other DBI-backends I've been using).

With the previously implemented quoting of column names, forcing lower case is not necessary, and could be removed.

Reproducible Example

Example:

library(dplyr)

a <-
tbl(con, "some_table) %>%   
mutate(AaBb = 1) %>% 
select(AaBb) %>% 
collect

b <- tibble(AaBb = 1:5)

a %>% inner_join(b)

This leads to the error:

Error: by required, because the data sources have no common variables

I think this is a bit unexpected.

If I'm not missing something, the 'tolower' part from line

Names <- tolower(sapply(data_type, function(x) x$Name))

could be removed and everything would still work OK and the user would get what was requested.

What do you think?

p.s. One would still get surprises when using copy_to with upper-case column names. Athena seems to accept quoted upper case letters in the DDL, but replaces them with lower case automatically, so this is not something that can be helped in RAthena. Still, I think this is a separate issue.

@OssiLehtinen OssiLehtinen changed the title Forcing generated column names to lower case is unecessary Forcing generated column names to lower case is unnecessary Dec 9, 2019
@DyfanJones
Copy link
Owner

@DyfanJones DyfanJones commented Dec 9, 2019

The initial idea of setting colnames tolower is to create a consistent format of colnames returned from Athena. However if this is causing issue when integrating other packages i.e. dplyr I am happy to remove it. For uploading data into Athena users are informed about the column name conversion in an information message. This is more for use cases were users upload a data.frame with column names that don't conform to Athena's naming conversion. The setting of tolower in this case is only to keep with the consistency of the package.

Saying that I am happy to remove the tolower conversion in both cases if it is better for users.

@DyfanJones
Copy link
Owner

@DyfanJones DyfanJones commented Dec 9, 2019

Created a branch: https://github.com/DyfanJones/RAthena/tree/column_name

As the package has recently been released on the cran, I won't push these changes to the cran just yet and try and capture anymore issues/ bugs before pushing to the cran.

In the meantime you can install the dev version with the following command.

remotes::install_github("dyfanjones/rathena", ref = "column_name")

Note: I haven't ran the unit tests just yet. if you notice anything that comes up from this change. Please let me know :)

@OssiLehtinen
Copy link
Author

@OssiLehtinen OssiLehtinen commented Dec 9, 2019

@DyfanJones thanks for your fast response as always! I understand your point too and in some situations things would probably go smoother with the 'forced' lower case. Still, I think it is good to have things aligned with the rest of dplyr, so obviously will not protest the changes you made and :)

Will let you know if I bump into something.

@DyfanJones
Copy link
Owner

@DyfanJones DyfanJones commented Dec 9, 2019

@OssiLehtinen the branch is currently a little buggy, i am troubleshooting about it is kicking up the error. so bewared 👍


This has now been fixed

@DyfanJones
Copy link
Owner

@DyfanJones DyfanJones commented Dec 9, 2019

This branch should be able to cater for the example you have provided above.

@OssiLehtinen
Copy link
Author

@OssiLehtinen OssiLehtinen commented Dec 10, 2019

The example works well now! Now something seems to go haywire with copy_to, however.

For example a simple
tp <- copy_to(con, mtcars, overwrite = T)
leads to notifications of modified column names and another message stating "NA_". The columns get mixed up in the resulting Athena table too.

Not at all sure why, but it seems the changes to table.R are behind this. Are these changes needed for addressing the original issue?

DyfanJones added a commit that referenced this issue Dec 10, 2019
@DyfanJones
Copy link
Owner

@DyfanJones DyfanJones commented Dec 10, 2019

Found it, this is a bug due to my change in the last pull request. My apologies, I have now fixed issue. The issue came from data.frame with row.names, this isn't covered in unit tests, hence my bad. I will add this to the unit tests.

@DyfanJones
Copy link
Owner

@DyfanJones DyfanJones commented Dec 10, 2019

closing this ticket as branch column_name has now unit tests to prevent this from happening in the future

@DyfanJones DyfanJones closed this Dec 10, 2019
DyfanJones added a commit that referenced this issue Dec 10, 2019
DyfanJones added a commit that referenced this issue Dec 10, 2019
#41 request for column name not to be converted to lower case
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.