Feedback to blog post "Mastering the Many Models Approach" #4

Closed
17 tasks done
TimTeaFan opened this issue Jun 14, 2023 · 2 comments · Fixed by #6

TimTeaFan commented Jun 14, 2023

I received some good feedback from Twitter user @isabellaghement to improve the blog post.

The points are listed below for documentation. I will try to revise the blog post to address them where possible.

  • 1. Would it be possible to define at the beginning of the blog post what the “many models” approach entails?

“Many models” could refer to fitting the SAME model to DIFFERENT data subgroups (e.g., one model for males, one model for females, etc.). 

But it could also refer to fitting DIFFERENT models to the SAME data (e.g., a linear model, a quadratic model). 

Both cases are possible.

  • 2. Can you clarify what version of dplyr you are using for your blog post? I think in the newest version, the rowwise() function is no longer needed? If that is true, a comment on how users should modify the code to accommodate the newest version of dplyr would be great.

  • 3. Early on, it would also be good to provide an overview of how the “many models” works from a computational perspective: the input datasets for fitting each model are saved in separate rows of a nested data frame, the model fitting function is applied iteratively to each row, the model fits are appended to a new column of the nested data frame (one summary per row), further processing of the model fits can be performed and the results of that processing can also be appended to a new column of the nested data frame, etc.

  • 4. It’s worth mentioning that users can define their own named or unnamed functions for processing each model fit; if possible, I would show a small example of how one could test if a named function works on one unnested dataset before deploying it to a nested dataset.

  • 5. Is there ever a need for NOT wrapping up a function applied to each row of a nested data frame inside a list() call? Personally, I would love to know the answer to that question. 😜

  • 6. Speaking of functions deployed to rows of a nested data frame, can you clarify in your post that these functions can accept as arguments either one or more of the columns of the nested data frame, as well as arguments that are independent of these columns?

  • 7. In the section “Tidy results with broom”, you no longer use the rowwise() function. Can you explain why?

  • 8. Should the section “Nesting results” be titled “Unnesting results”?

  • 9. What does .after do when used inside mutate()? Can you explain that in the blog post?

  • 10. For the section on “building formulas” in your blog post, can you clarify early on that you are referring to “model formulas”?

  • 11. For reformulate(), if the arguments are named, I think they can be passed in any order? So conceivably, one could use reformulate(response = , termlabels = ), which is the more natural use?

  • 12. So expand_grid() works with one nested data frame as the first input and one vector as the second input? And it only combines the groups in the nested data frame with the values of the vector? Can that be clarified explicitly in the blog post? It’s a pretty cool feature!

  • 13. So much of the stuff that I read seems to be inductive, but my mind really likes it when it is given upfront the big picture - it can then recognize elements of that picture in the ensuing write up and piece them together like a puzzle. It flounders without that. 🤣

With that in mind, is it possible to provide a big picture in the section on model formulas? Short and sweet should do it: 

“In what follows, I will illustrate two possible use cases of model formulas in the context of many models.”

“The first use case covers a situation where we need to fit multiple models to the data in each row of a nested data frame (i.e., multiple models per row). All of these models have the same response variable but different predictor variables, etc.”

“The second use case covers a situation where we also need to fit multiple models to the data in each row of a nested data frame (i.e., multiple models per row). Each of these models is an updated, augmented version of a base model.”

“The formula for the base model is defined outside of the nested data frame; the updates to this formula are then made inside the nested data frame.”

Can even give a small example for each use case to illustrate what the reader can expect.

I confess that I was left wanting more for the second use case! Which is a sign that you can add a few more goodies there. 😁

  • 14. How do we simultaneously update the base model formula to include “just email rating”, “just website rating”, “both with/without interaction”?

  • 15. The name my_formula2() didn’t resonate with me. Why not give this function a more informative name such as: base_formula()? 

The name of the function should help the reader understand what is going on.

  • 17. I liked that mutate() is being used to simultaneously create model formulas AND fit the models. This may warrant a side comment for the reader: 

‘When using mutate() with a nested data frame (df), we can simultaneously define multiple columns and append them to the nested df.’

  • 18. In the section on model formulas, you sometimes use 

select(!filter_ls)

after deploying those formulas and sometimes you don’t. Why the inconsistency? When/why should this command be used? Can this be explained to the reader?
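
For reference, the workflow described in points 3, 4, 5 and 9 can be sketched roughly as follows. This is only a minimal illustration under my own assumptions, not the code from the blog post: it uses the built-in mtcars data, and fit_mpg() is a hypothetical helper name made up for this example.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper (not from the blog post): fit the same linear model
# to whatever subset of the data it receives.
fit_mpg <- function(df) lm(mpg ~ wt, data = df)

# Point 4: try the named function on one plain, unnested subset first.
test_fit <- fit_mpg(filter(mtcars, cyl == 4))

# Point 3: nest_by() stores each subgroup in one row of the list-column
# `data` and returns a rowwise data frame, so mutate() works row by row.
nested <- mtcars %>%
  nest_by(cyl) %>%
  mutate(
    mod = list(fit_mpg(data)),  # a model object must be wrapped in list()
    n   = nrow(data),           # point 5: a length-1 atomic result needs no list()
    .after = cyl                # point 9: put the new columns right after `cyl`
  )

nested
```

Here `mod` becomes a list-column holding one model per row, while `n` is an ordinary integer column, which is one case where list() is not needed.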

@TimTeaFan

Regarding 2:

The original many models approach used purrr::map() instead of dplyr::rowwise(). The latter is still needed and should not be omitted, although it is possible to use purrr::map() instead. However, I wouldn't recommend that, since the code is much more readable with rowwise().
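
A rough side-by-side sketch of the two styles, assuming the built-in mtcars data and a simple lm() call chosen only for illustration (neither is taken from the blog post):

```r
library(dplyr)
library(tidyr)
library(purrr)

# rowwise() style: inside mutate(), `data` is the single data frame
# stored in the current row.
by_rowwise <- mtcars %>%
  nest(data = !cyl) %>%
  rowwise() %>%
  mutate(mod = list(lm(mpg ~ wt, data = data))) %>%
  ungroup()

# purrr::map() style: `data` is the whole list-column, so we iterate
# over it explicitly and refer to each element as `.x`.
by_map <- mtcars %>%
  nest(data = !cyl) %>%
  mutate(mod = map(data, ~ lm(mpg ~ wt, data = .x)))
```

Both versions should produce the same model fits; the difference is whether the iteration is implied by rowwise() or spelled out with map().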

@TimTeaFan

closed by PR #6
