Review comments #5
Thanks a lot for your review @LTLA
Thanks a lot for the review, @LTLA. @mcjmigdal will reply to the more technical comments; here's some feedback on the vignette:
@LTLA thank you for the review and some insightful comments :) I've created issues for the most obvious ones.
Vignettes
Answered by Christian.
R documentation
👍 #9
R code
That was to focus those methods more on the problem of HLA/KIR data. Although most of the concepts we use in our analyses could easily be extended to other types of data, we made some assumptions that hold within the frame of the package. One example is that experiments can have only one assay, or that row names cannot be duplicated. One of the functions that takes heavy advantage of those assumptions is the
In the package the use of
Why is it not a good idea? 😅
Hard to disagree; however, at this point it would be very costly to go back to base R. EDIT: However, the use of the tidy functions from the broom package is critical, as we use them to get standardized outputs from statistical models. This actually allows us to be flexible when it comes to model choice. As the tidy output is a tibble, the use of tidyverse functions is to some extent justified. I still agree that it could be much reduced, but it would be very costly at this point.
That is really only to prevent warnings on
Sounds restrictive. A better approach would be for your functions to handle these situations gracefully rather than coercing all users to use a custom class. Typical behaviors would be to emit a warning on duplicated row names and then take the first instance, or to provide an option to choose the assay to be used from each experiment, either as a string or an integer scalar. Unnecessary subclassing really breaks interoperability. Imagine a situation where someone else writes another package for this type of dataset, and they make their own MAE subclass; then anyone who wants to use both packages will have to convert from one class to another, possibly lossily. If your package can accept an MAE, you'll never be the offending party in any exchange.
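The graceful handling described above could look roughly like the following. This is only a minimal base R sketch: `get_assay_matrix` is a hypothetical helper (not part of MiDAS or MultiAssayExperiment), and a plain named list of matrices stands in for the assays of one experiment.

```r
# Hypothetical helper: select one assay and tolerate duplicated row names.
# `assays` is assumed to be a named list of matrices (a stand-in for the
# assays of a single experiment); `assay` is a string or integer scalar.
get_assay_matrix <- function(assays, assay = 1L) {
  stopifnot(is.list(assays), length(assay) == 1L,
            is.character(assay) || is.numeric(assay))
  if (is.character(assay) && !assay %in% names(assays)) {
    stop("assay '", assay, "' not found")
  }
  mat <- assays[[assay]]
  dup <- duplicated(rownames(mat))
  if (any(dup)) {
    warning(sum(dup), " duplicated row name(s) found; keeping the first instance of each")
    mat <- mat[!dup, , drop = FALSE]
  }
  mat
}

# Toy data with one duplicated row name.
toy <- list(
  counts = matrix(1:6, nrow = 3,
                  dimnames = list(c("A*01", "A*02", "A*01"), c("s1", "s2")))
)
res <- suppressWarnings(get_assay_matrix(toy, "counts"))
```

With a helper along these lines, a function could accept any object it can extract assays from and warn, rather than refuse, when an assumption is violated.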
Why not just
Because it'll make your maintenance programmer think that there's something else going on under the hood. I only mix S3 and S4 when I absolutely have to, e.g., because I want my class object to be used in functions that use S3 dispatch. Typical examples would be
You would make your maintainer's life easier by either using pure S4 or, TBH, just making an ordinary function with a check for the class of the incoming argument. If you're talking about developer simplicity, you can't really beat the latter.
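For illustration, the "ordinary function with a class check" option might be sketched as below. The names `to_matrix` and `row_frequencies` are hypothetical and do not come from the package; the SummarizedExperiment branch assumes that package is installed and is only reached when such an object is passed.

```r
# Hypothetical sketch: one plain function instead of an S3 generic.
# Accepts a matrix or a SummarizedExperiment and always returns a matrix.
to_matrix <- function(x, assay = 1L) {
  if (is.matrix(x)) {
    return(x)
  }
  if (methods::is(x, "SummarizedExperiment")) {
    # Assumes the SummarizedExperiment package is available.
    return(SummarizedExperiment::assay(x, assay))
  }
  stop("expected a matrix or a SummarizedExperiment, got: ", class(x)[1])
}

# A made-up transformation built on top of the check above.
row_frequencies <- function(x) {
  mat <- to_matrix(x)
  rowMeans(mat, na.rm = TRUE)
}

m <- matrix(c(0, 1, 2, 2), nrow = 2,
            dimnames = list(c("f1", "f2"), c("s1", "s2")))
freqs <- row_frequencies(m)
```

Every transformation function would start with the same one-line call, so there is no dispatch machinery for a maintainer to trace.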
I doubt it's that critical, I and many others get along with base R just fine.
Except for object creation and filtering, all of the functions output a data.frame or a list. Moreover, once created, a MiDAS object is still an MAE subclass, so it remains usable by functions that handle MAEs. Interoperability is therefore maintained; a nice addition might be a method for MAE -> MiDAS conversion.
I would say it is not ease of typing we are after but readability of the code; as model definition is an integral part of our workflow, it seems natural to have an as.data.frame method. What also distinguishes this function from *toWide is that the resulting data frame must contain a more complete description of the parent MAE (including information on experiments). This transformation is one of the assumptions we make in the MiDAS class, so that we can easily create statistical models from it.
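To make the kind of transformation under discussion concrete, here is a minimal base R sketch of a wide, sample-by-feature data frame that keeps the experiment of origin in the column names. `experiments_to_wide` is a hypothetical name, a named list of feature-by-sample matrices stands in for the MAE, and this is not how MiDAS or MultiAssayExperiment implement it.

```r
# Hypothetical sketch: build one wide data frame (one row per sample) from a
# named list of feature-by-sample matrices, prefixing each column with the
# name of the experiment it came from.
experiments_to_wide <- function(experiments) {
  stopifnot(is.list(experiments), !is.null(names(experiments)))
  pieces <- lapply(names(experiments), function(nm) {
    mat <- t(experiments[[nm]])            # samples in rows, features in columns
    df <- as.data.frame(mat)
    names(df) <- paste(nm, colnames(mat), sep = "_")
    df$sample <- rownames(mat)
    df
  })
  # Full outer join on the sample identifier across all experiments.
  Reduce(function(a, b) merge(a, b, by = "sample", all = TRUE), pieces)
}

toy <- list(
  hla = matrix(c(1, 0, 2, 1), nrow = 2,
               dimnames = list(c("A*01", "A*02"), c("s1", "s2"))),
  kir = matrix(c(1, 1), nrow = 1,
               dimnames = list("KIR2DL1", c("s1", "s2")))
)
wide <- experiments_to_wide(toy)
```

The resulting data frame can be passed directly as the `data` argument of a model-fitting function, which is the use case described above.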
I doubt it's that confusing for maintenance programmers. The as.data.frame method is the only S3 method I've defined for MiDAS; the other S3 methods are defined for matrices and SummarizedExperiments. Using the S3 mechanism makes it easy to accommodate a new experiment type that would be better handled by, say, DelayedArray.
In our use cases of MiDAS we commonly use at least four statistical models (lm, glm, coxph, coxme). One way to handle this would be to treat each of these models separately, possibly in base R. The use of tidy gives us an easy way to do it, and at the same time we can take advantage of the many other models that tidy handles. We could not beat that by implementing dedicated methods for each statistical model ourselves.
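As a small illustration of the standardized output mentioned here, the sketch below fits an lm and a glm on the built-in mtcars data and tidies both with broom. It assumes the broom package is installed; the models and data are placeholders and have nothing to do with HLA/KIR analyses.

```r
library(broom)

# Two different model classes fitted on a built-in dataset.
fit_lm  <- lm(mpg ~ wt, data = mtcars)
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial())

tidy_lm  <- tidy(fit_lm)
tidy_glm <- tidy(fit_glm)

# Both results share the same column layout, so they can be stacked
# without any model-specific handling.
results <- rbind(
  cbind(model = "lm",  as.data.frame(tidy_lm)),
  cbind(model = "glm", as.data.frame(tidy_glm))
)
```

In base R the corresponding coefficient tables come from `coef(summary(fit))`, whose column names differ between lm and glm, so stacking them needs some per-model code.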
As requested, I've gone through the package. Points in no particular order beyond when I encountered them.
Vignettes
Suggests: on ggpubr, used in the vignette.
MiDAS_tutorial.Rmd is presumably a typo.
include=FALSE, which makes for weird reading in the compiled HTML as you are referring to code chunks and results that do not appear.
runMIDAS. The vignette says "for each step", but why is there only one table? The non-significant genes showed up in the previous table - where did they go?
R documentation
The @return is often very sparse:
This doesn't give a lot to go on. What are the rows? What are the columns?
R code
There is no clear reason for the MiDAS class to exist, given that it does not add any new slots to the MAE. Why not just use an MAE directly? This would make it easier to interoperate with other packages that produce MAEs; but more importantly, it would save you from having to explain to the user what your custom MiDAS class is.
I see you have overloaded as.data.frame for MiDAS class objects. I would not consider this a good idea; I don't have a good mental model for what it means to coerce an MAE-class object into a data.frame. If you need to make a wide DF, you should use a function that is explicitly named as such, see, for example, MultiAssayExperiment::wideFormat().
What's with all the S3 logic in MiDAS_transformationFunctions.R? Mixing S3 and S4 is generally not a good idea. Seems like you could make your life much simpler by just writing a regular function and having a little clause at the top to check whether you're dealing with a matrix or an SE, extracting the former from the assays of the latter if necessary. If you're not expecting other people to extend your methods, there's really no need to make them generics.
I would also say that most of your dependencies are unnecessary, in particular all the tidyverse stuff, and you could make the package leaner and more performant with base R.
Don't understand the point of global.R, your objects seem to be perfectly accessible without it.
Tests
Not all of them pass on my machine.