-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Lists in the New Data Frame dialog #7468
Added Lists in the New Data Frame dialog #7468
Conversation
updating master
Updating Master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anastasia-mbithe this is great, and it mostly works.
a) I suggest adding a checkbox, default unchecked, with label something like R Command
as the label. If checked, then it shows to box with the command, and that could be shorter as well.
b) It mostly works fine and gives the lists we need. A few do not work, and I think they all give the same error. Here is an example:
Others with the same error include words/word_clues, where none of words_five, four, six work.
This error could be a bit more of a puzzle to diagnose?
@rdstern, can we have the lists as a tibble instead of as a data frame? |
I assume that could be nice, but I am not sure why you are asking for this? What's the advantage? I am adding @lilyclements who is much better placed to answer/confirm this. |
@rdstern, a tibble can accommodate the unequal number of rows, and that could fix our issue here. Though the general output looks different from what data.frame gives. |
@anastasia-mbithe can you post the R code for if we used a tibble here and for if we used a table here? Might give me a better idea of the aim of the dialog. I know we do want to use tibbles instead of data frames in R Instat. It does introduce some differences (eg no row names). So this could be a good place to start on that. I’m not entirely sure what the “hold up” is on that. |
@lilyclements, The other option we have is filling the shortest column with NAs to match the number of rows of the longest column, then having it as a data frame. I found cbind.fill()/ rbind.fill(), though I couldn't find the functions in the current R version. Do you know of another function that could do the same? |
@anastasia-mbithe I really like your search for solutions here. I was happy with the data layout for the dinosaurs in a data frame, and would not want to lose that layout. I'd like to solve the importing of these troublesome lists with something that could be used by schoolchildren, even if they have to learn some data manipulation. So, making the data into a proper list, or into multiple variables would be ideal. |
@anastasia-mbithe any progress? |
@N-thony Not yet, am still looking for a way to have the "particular" lists as a data frame. |
@rdstern @anastasia-mbithe I don't think I fully understand what we want the output to be in R-Instat for something like In R, if I run
The "if" statement can be in VB code not R code, and so this can be run in a sub. (side note:
|
@rdstern I've noticed this is in "blocker" - would you mind having a look at my comment above when you get the time? |
I am not sure whether I am answering your points above. I hope: Here is another example, that I quite like - more than cluedo - so I'd be just slightly disappointed if it can't be read?
and so on. It is a long list. I was hoping that could be stored in 3 variables, namely the first - with the description, the second with the name, and the third with the list (e.g. "take advantage of" "bully", etc. . Then we can always use our text handling facilities to split the list into the separate bits if we wish. (We may have to extend our existing dialogues and that would be fine..) |
@rdstern we have three different examples here. A list of two, but with multiple entries in the second part (e.g. "take advantage of", "bully", ...) A list of more than two To handle these three cases, I've written some R code (below):
We could make this into a function to be called into the dialog for this particular package? Or would we rather avoid writing our own functions? |
@lilyclements I like the idea of functions! I don't see why we are avoiding them. |
@rdstern great, should we store the function in I have tested it with some of the data sets in the |
@lilyclements, I made the changes although there's a consistent error for all datasets like I had explained to you before. Kindly have a look at the code. |
@lilyclements please could you help here? Thanks. |
@anastasia-mbithe the issue with the R function was just that the parameter name is `data`, but in the function itself is referred to as `df`. This should fix the issue with the R function!
@lilyclements Thank you for the changes. The function is now working well and even for the "special" case lists. As I was testing randomly, everything was well until I got to "religion/christian_saints" and it gives this error; I checked the list in R and noticed it has two variables with many missing values. Is there a way we can add NAs to the empty cells? |
@anastasia-mbithe good find, and good investigation. The issue here wasn't to do with the NA values, but instead that the function doesn't have a way to handle this "case". The function, until now, has only handled lists. However, this is a data frame. ``` x <- rcorpora::corpora("religion/christian_saints") head(x) class(x) ``` So we just have to add in a new `if` statement for how to handle the data if it is a data frame (that is, in it's simplest case!). I've done this and made the changes in this commit. This is ready for you again now.
Thank you so much @lilyclements. |
@rdstern had a small bug in the last version which prevented some of the functions from working - including
@rdstern had a small bug in the last version which prevented some of the functions from working - including I'll look at "Transportation > Commerical airlines" and "Words/Verbs with conjugations" tomorrow. |
Looks very good now. religion works as do the word clues. The only 2 I can find that don't work now are |
@rdstern i suggest we write them up in an issue. I’m happy to write the issue tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great and almost everything is working now. I am approving in its current form, though there may be a new issue to enable the final couple of lists to be imported - or at least trapped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lilyclements Thanks for the recent changes, I just have a couple of comments about readability
if (length(data[[i]]) == 0) { | ||
data_unlist[[i]] <- data.frame(NA) | ||
} else { | ||
for (j in 1:length(data[[i]])){ | ||
if (class(data[[i]][[j]]) %in% c("character", "factor", "logical", "numeric", "integer")){ | ||
data_unlist_2[[j]] <- data.frame(list = data[[i]][[j]]) | ||
|
||
} else if (class(data[[i]][[j]]) == "list"){ | ||
if (length(data[[i]][[j]]) == 0) { | ||
data_unlist_3[[j]] <- data.frame(list = NA) | ||
} else { | ||
for (k in 1:length(data[[i]][[j]])){ | ||
if (class(data[[i]][[j]][[k]]) %in% c("character", "factor", "logical", "numeric", "integer")){ | ||
data_unlist_3[[k]] <- data.frame(list = data[[i]][[j]][[k]]) | ||
} else if (class(data[[i]][[j]][[k]]) == "list"){ | ||
if (length(data[[i]][[j]][[k]]) == 0){ | ||
data_unlist_4[[k]] <- data.frame(list = NA) | ||
} else { | ||
for (l in 1:length(data[[i]][[j]][[k]])){ | ||
if (class(data[[i]][[j]][[k]][[l]]) %in% c("character", "factor", "logical", "numeric", "integer")){ | ||
data_unlist_4[[l]] <- data.frame(list = data[[i]][[j]][[k]][[l]]) | ||
} else if (class(data[[i]][[j]][[k]][[l]]) == "list"){ | ||
if (length(data[[i]][[j]][[k]][[l]]) == 0) { | ||
data_unlist_4[[l]] <- data.frame(list = NA) | ||
} else { | ||
if (!is.null(names(data[[i]][[j]][[k]][[l]]))){ | ||
data_unlist_2_i <- purrr::map(.x = names(data[[i]][[j]][[k]][[l]]), .f = ~data.frame(list = data[[i]][[j]][[k]][[l]][[.x]])) | ||
names(data_unlist_2_i) <- names(data[[i]][[j]][[k]][[l]]) | ||
data_unlist_4[[l]] <- plyr::ldply(data_unlist_2_i, .id = "variable4") | ||
} else { | ||
data_unlist_4[[l]] <- (plyr::ldply(data[[i]][[j]][[k]][[l]], rbind, .id = "variable4")) | ||
} | ||
} | ||
} | ||
} | ||
} | ||
names(data_unlist_4) <- names(data[[i]][[j]][[k]][1:length(data_unlist_4)]) | ||
data_unlist_3[[k]] <- plyr::ldply(data_unlist_4, .id = "variable4") | ||
} | ||
} | ||
} | ||
names(data_unlist_3) <- names(data[[i]][[j]][1:length(data_unlist_3)]) | ||
data_unlist_2[[j]] <- plyr::ldply(data_unlist_3, .id = "variable3") | ||
} | ||
} | ||
} | ||
names(data_unlist_2) <- names(data[[i]][1:length(data_unlist_2)]) | ||
data_unlist[[i]] <- plyr::ldply(data_unlist_2, .id = "variable2") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lilyclements Would it be more readable to move this code into a separate function?
Also, this code seems to be going down 4 levels and doing similar things at each level (with the exception of the 4th level).
Some lines of code are repeated several times.
Could we replace this nested/duplicated code with a smaller recursive function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I was thinking this too. I wrote a comment, but just realised I wrote it in the "commit" so it was lost!
"these should all work now. The main issue was with the religion one - it contains a list in a list in a list in a list in a list (data$Indigenous Traditional$Historical Polytheism$Indo-European$Hellenistic$Pythagoreanism
) so it took a little while.
I think there is a much nicer way to do it than I am doing. I have seen, for example, Danny call his own function in the function. I think that is what we would want here. I can spend some time on cleaning this up and looking into that, if you think it is worth it? Alternatively, I can do that on another branch once this is merged."
I can do a recursive function. Shall I do it on this branch or elsewhere? I do not know how long it will take as I have not written recursive before. I also have a few other priorities to get to this week
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made some edits to the function to add in recursive and avoid repeating as much!
Co-authored-by: lloyddewit <57253949+lloyddewit@users.noreply.github.com>
Co-authored-by: lloyddewit <57253949+lloyddewit@users.noreply.github.com>
@lloyddewit after our call, I found a solution which avoids recursion but is hopefully still readable - or at least, a lot less complex, and works for infinite list lengths. Let me know what you think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, the latest commit is a big reduction in size/complexity. It looks like you used the power of the R libraries to do the heavy lifting, which is great.
I approved, there's just one open question.
Also, you mentioned that you might add some comments to explain what's happening. Do you still plan to do that? An example of a mult-level list would be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great
@lloyddewit thanks for the reminder - I have just added some comments into the function. However, this has dismissed yours and @rdstern's reviews. Sorry for that - would you mind reapproving? |
Fixes #7387
@rdstern, @N-thony @lloyddewit.
I have added the second part of the issue to the dialog. It is ready for review.