
Refactoring for Extensibility #177

Open
JaySumners opened this issue Jun 30, 2021 · 30 comments
Assignees
Labels
enhancement New feature or request

Comments

@JaySumners
Collaborator

Added as an issue to track decisions and discussion.

Given that the internals of the main functions may be significantly different for different models, my first thought is to refactor the .R files into model-specific files. This way a contributor can add a new model in a single file so long as it matches the interface for the required functions.

Package-wide generics and helper functions would remain in separate .R files (e.g. type-safe versions of mapply in utils.R).

This will involve moving a fair amount of code and likely implementing a few classes/subclasses to build an interface-like infrastructure.

@JaySumners JaySumners self-assigned this Jun 30, 2021
@JaySumners JaySumners added the enhancement New feature or request label Jun 30, 2021
@datalorax
Owner

I like all of that. Looking forward to what you come up with! And I'm definitely up for contributing to this directly as well. Thanks for being willing to take the initial crack at it.

@JaySumners
Collaborator Author

First question, do you have a preference between the S3 or S4 OOP systems in R?

@datalorax
Owner

I think S3 is probably fine. I'm definitely more familiar and comfortable with it than S4. Unless there's any specific reason you can foresee needing to go to S4?

@JaySumners
Collaborator Author

S4 is just more formal and verbose. I'll give it a whack with S3 and see if I run into anywhere it causes ambiguity. Switching isn't that difficult on a small project if we decide on S4 later.
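For anyone following along, the S3 option being discussed can be sketched in a few lines. This is a hypothetical illustration only; `eq_sketch` and its methods are made-up names, not the package's API:

```r
# Hypothetical S3 sketch: one generic, one method per model class.
eq_sketch <- function(model, ...) UseMethod("eq_sketch")

eq_sketch.default <- function(model, ...) {
  stop("No equation method for class: ", paste(class(model), collapse = "/"))
}

# A model-specific method could live in its own file (e.g. lm.R)
eq_sketch.lm <- function(model, ...) "lm equation"

fit <- lm(mpg ~ wt, data = mtcars)
eq_sketch(fit)  # dispatches to eq_sketch.lm
```

S3 keeps the per-model interface to exactly this: define a method named `generic.class` and dispatch is automatic.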

@JaySumners
Collaborator Author

Ok, I've tested a few things and have a general idea of what I want to do.

Right now we have things like rhs, lhs, etc. which may not fit how a particular model is represented mathematically. Arima is an example where I had to do a lot of work combining things after the rhs/lhs functions.

Rather, I'd like to have each model have its own .R file. So lm, lmer, arima, etc. would each have their own file. The package would provide a series of helper classes and methods to structure the equation without the contributor having to write latex. This constrains the code, but allows enough flexibility for a variety of models.

Think of having a base class--eq_term--that has properties like coefficient, greek_representation, superscript, and subscript. These terms combine in operator classes like eq_multiplication and eq_division. With additional classes like eq_equals, eq_newLine, etc., the final output of a contributor's file will be a vector of these classes representing their equation.

extract_eq will use eq_convert2latex() methods to iteratively convert these nested classes to LaTeX.

The result is that a contributor can write equations by combining these classes in a way like you might combine things in ggplot2 rather than manually writing the latex. This also gives us a predictable way of combining all of the terms into a final latex equation.
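As a rough illustration of the idea (every name and field below is a hypothetical stand-in for whatever the real constructors end up being, not the branch's actual code):

```r
# Toy eq_term: a list with class "eq_term" carrying term properties.
eq_term <- function(coefficient = NULL, greek = "beta",
                    superscript = NULL, subscript = NULL) {
  structure(
    list(coefficient = coefficient, greek = greek,
         superscript = superscript, subscript = subscript),
    class = "eq_term"
  )
}

# A generic turns the nested objects into LaTeX, one method per class.
convert2latex <- function(x, ...) UseMethod("convert2latex")

convert2latex.eq_term <- function(x, ...) {
  out <- paste0("\\", x$greek)
  if (!is.null(x$subscript))   out <- paste0(out, "_{", x$subscript, "}")
  if (!is.null(x$superscript)) out <- paste0(out, "^{", x$superscript, "}")
  out
}

convert2latex(eq_term(greek = "beta", subscript = "1"))
#> [1] "\\beta_{1}"
```

The point is that a contributor composes these constructors rather than pasting LaTeX by hand, and the conversion logic lives in one place.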

@datalorax
Owner

I like that idea a lot. The only thing is I'm not 100% sure how well it will always work. For the lme4 models, for example, I had to go through a whole bunch of gymnastics to make sure the variables were all going to the right level and the subscripts were all coming in right, etc. It would probably still be possible the way you're suggesting, but I'm not sure if it would be any easier. If it's more consistent, though, then it's probably still worth it. But we would also end up with a huge number of functions to manage if we essentially tried to port LaTeX to R functions.

Right now, we use broom to ensure that everything comes in consistently. I like that and would like to make sure we continue that way. I would love it if we, or contributors, could write functions that output a data frame in a consistent format, which would then move through a set of consistent functions that process the data frame into an equation-like object that could be manipulated one last time by the contributor before being output as an equation. But... I'm not sure how realistic that is either, because every equation is so different.

So I guess I'm game to try anything, and maybe I'm not seeing the big picture quite as clearly as you are, but it seems like a really difficult task. I'm not sure I want to completely rewrite the lme4 extraction, for example.

Sorry if this is coming off as less than enthusiastic. I do like the idea of a consistent interface a lot but I am just unsure of how well any general solution will actually work. If you want to give it a go, I'd be more than happy to review. If we can make it work for both ARIMA and lme4 models without a crazy amount of effort, that would be a really promising sign.

@JaySumners
Collaborator Author

We're on the same page and those 2 (lmer and Arima) were going to be my test cases. The hope is that we have a number of functions at most equal to the number of major operators in latex (that are usable in R), but that these core functions are relatively simple paste() operations. I'd feel bad for anybody who had to read and update my arima code the way it is built today because it doesn't have a common syntax to other parts of the package.

@datalorax
Owner

Yup, totally with you there. Okay, let's give it a shot and see how it goes! It would be amazing if we could actually get it to work. And it would make it a whole lot more likely to grow organically through the community.

@JaySumners
Collaborator Author

I've done a decent amount of testing. Building out a LaTeX parser isn't all that difficult, but it will feel clunky to people who already know LaTeX, since you have to nest and build equations one step at a time. I've simplified it down and have a working example with Arima.

Question: Do you have a preference for where functions live?

I'd like each model to have its own file containing all of the helper and method functions for that model. So the package would look something like this:

/R
/defaults.R
/eq_term.R
/utils.R
/lm.R
/Arima.R
/....

Where defaults.R would contain the .default and dispatch functions for methods contained in the package. Internal classes would get their own R file (e.g. eq_term.R) containing the definition, any methods, and any class-specific helper functions. Each model would get their own R file (e.g. Arima.R) and would contain all of the methods and helper functions for that model class. So extract_eq() and extract_eq.default() would live in defaults.R and extract_eq.Arima() would live in Arima.R. utils.R would contain what it does now.

There are groups of helper functions to deal with the eq_term class that might get their own R files as well, just to make tracking them easier. I'm struggling a bit between wanting all the internal developer functions held together and individual R files for each model on the one hand, and wanting small, more manageable files on the other hand.

@datalorax
Owner

Okay, I think that all sounds good from an organizational standpoint. I agree with the struggle on manageable files versus ease of file structures. It's always a bit of a struggle. But I'm happy to try whatever you think is best.

@datalorax
Owner

Hey Jay, just wanted to check where you're at here. There are a few things I'd like to address but I don't want to start making changes if we're going to have to just migrate it over to this new structure anyway. No worries if you've run out of time for this currently.

@JaySumners
Collaborator Author

I've been plodding along. Refactoring now that I have a class and structure hasn't been that bad, and I've been trying to tackle things like wrapping that were in progress before. Feel free to make edits. It'll take me a while to get this in the right place.

@datalorax
Owner

Okay, great, I'll take a look probably early next week.

@datalorax
Owner

Hey Jay - would you mind creating a branch on the main repo that includes the changes you have currently, rather than working from your fork?

@JaySumners
Collaborator Author

Sure, I can do that.

@JaySumners
Collaborator Author

I think I found a good way to combine terms at the end. Let me finish up Arima and lm here in the next few days and then I'll push the branch.

@JaySumners
Collaborator Author

Ok, one last thing to do and then I'll publish what I have, along with a list of TODOs. It's been a big project but I've learned quite a bit.

@datalorax
Owner

Sounds great. I've been addressing a lot of issues as you've probably seen so hopefully we don't run into too many conflicts. I've also had some ideas on how to increase extensibility but I'm holding off on anything there until we go through your branch.

Also, just as an FYI, I'm thinking I'm going to publish v0.3 to CRAN after I finish addressing all the non-new-model issues. But the refactoring should probably hold off until the v0.4 CRAN release unless it ends up being really easy to incorporate.

@JaySumners
Collaborator Author

I would hold off for 0.4. The branch will get put up by Wednesday night with cursory instructions and TODOs.

One question is whether or not to make a breaking change to extract_eq. As I have it built, extract_eq is a good place to get the constructed equation object, while convert2latex converts that object to a LaTeX-compliant string. convert2latex called with a model will both extract the equation and convert it to LaTeX. This feels more consistent and allows users to edit anything in the equation before conversion, but it is a breaking change.

@datalorax
Owner

I'm not sure I totally follow, but I am okay with having breaking changes at this point. I think if we want to export two functions (again, not sure I'm totally following) I'd like to keep extract_eq() as the final function that can be used as it is currently, but we could export create_eq() that has all the guts of the equation before it gets converted, if that's what you mean.
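One way that split could look, as a hedged sketch only (create_eq is the hypothetical name floated above; the method bodies here are placeholder stand-ins, not the branch's code):

```r
# create_eq(): returns the intermediate equation object.
create_eq <- function(model, ...) UseMethod("create_eq")

create_eq.lm <- function(model, ...) {
  # Placeholder: the real object would be a list of term objects.
  structure(list(terms = names(coef(model))), class = "eq_equation")
}

# convert2latex(): turns the equation object into a LaTeX string.
convert2latex <- function(x, ...) UseMethod("convert2latex")
convert2latex.eq_equation <- function(x, ...) paste(x$terms, collapse = " + ")

# extract_eq() keeps its current role as the one-step function.
extract_eq <- function(model, ...) {
  eq <- create_eq(model, ...)  # users could stop here and edit `eq`
  convert2latex(eq)
}

fit <- lm(mpg ~ wt, data = mtcars)
extract_eq(fit)  # "(Intercept) + wt"
```

Under this arrangement nothing breaks for existing users of extract_eq(), while create_eq() exposes the editable middle stage.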

@JaySumners
Collaborator Author

That works too. In short, I built a series of classes that fit together like lego blocks, carrying attributes with them (like wrapping behavior). Developers use these and some utility functions to build up an equation object (a list of these blocks) that has a built-in series of parsers from the building blocks to LaTeX.

After I post it up on Wednesday, we can walk through it.

@datalorax
Owner

Sounds great. Looking forward to checking it out!

@JaySumners
Collaborator Author

I've pushed up what I've put together so far. It is still rough and missing some things, like proper alignment, escaping TeX, and fixing coefficient signs. forecast_Arima.R has the basic concept:

1. Convert the model to a data.frame with broom::tidy.
2. Get any information you need from there (term types, etc.), which is easiest to do at this stage.
3. Convert to a list of "term" objects with convert2terms.
4. Edit terms and add variable objects as needed using built-in functions.
5. Deal with any special cases specific to the model.
6. Wrap and align using dispatched functions, or the .default if appropriate.
7. Convert to LaTeX using convert2latex, with either the standard dispatch or model-specific versions of the function.

The following are the built-in classes:

- eq_variable
- eq_term
- c_term
- eq_line
- eq_equation

The logic for converting these to LaTeX is built into the package and freely usable by developers, to avoid the particulars of converting to LaTeX. Additional functions, like eq_divide, can be included to provide standard ways of combining things other than combine_by and c_term.
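A self-contained toy version of that pipeline, to make the flow concrete (the literal data frame stands in for what broom::tidy would return, and the helper bodies are placeholders, not the actual branch code):

```r
# Stand-in for broom::tidy(model) output.
tidied <- data.frame(term = c("ar1", "ma1"), estimate = c(0.5, -0.3))

# Step 3: convert the data frame to a list of "term" objects.
convert2terms <- function(tidied) {
  lapply(tidied$term, function(t) structure(list(name = t), class = "eq_term"))
}

# Step 7: dispatched conversion to LaTeX.
convert2latex <- function(x, ...) UseMethod("convert2latex")
convert2latex.eq_term <- function(x, ...) x$name
convert2latex.list <- function(x, ...) {
  paste(vapply(x, convert2latex, character(1)), collapse = " + ")
}

term_list <- convert2terms(tidied)
convert2latex(term_list)  # "ar1 + ma1"
```

Steps 4-6 (editing terms, special cases, wrapping/alignment) would slot in between the two calls.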

It is a lot to process. Take a look and let me know if you have any questions. I'm on the west coast, so if you ever want to jump on a call we can do that too.

@datalorax
Owner

Thanks Jay, I am super slammed with work again and then am taking time off next week. But I'd love to connect on this more when I get back. I've poked around the code a bit but haven't actually run anything yet.

@datalorax
Owner

Just stopping by to say I'm back at it again for a bit, but I think I'm going to hold off on this altogether for a little while. I'd like to get the next version out there first and then come back to this. I am a little worried about how much work may be involved with moving to this approach, but I also think it could really help grow the package if others are more easily able to contribute.

@JaySumners
Collaborator Author

That's fine by me. I need to give some more thought to things like wrapping, etc. anyway. I'm hoping that these classes make the integration of new models easier, since they give a more general interface to the rest of the model.

I like functional programming, but do miss interfaces and encapsulated classes lol.

@JaySumners
Collaborator Author

I checked the refactoring code for compatibility with the pipe and it looks good. I can rewrite using the pipe. Do you want me to use the native pipe or the magrittr pipe?

@datalorax
Owner

That's a good question. I guess since it probably won't matter in actual functionality let's go with the native pipe. That'll be a good nudge for me to actually start using it as well.
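For reference, the two pipes are nearly interchangeable for simple calls; the native |> (R >= 4.1) needs no package, while %>% comes from magrittr. A toy example with a made-up helper:

```r
# Made-up helper purely for illustration.
square <- function(x) x^2

# Native pipe: square(sqrt(4))
4 |> sqrt() |> square()  # 4

# The magrittr equivalent would be: 4 %>% sqrt() %>% square()
```

The main practical difference is that %>% supports the `.` placeholder and piping into bare function names, while the native pipe requires an explicit call on the right-hand side.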

@JaySumners
Collaborator Author

I haven't implemented the pipe yet as I wanted to get your feedback on the refactoring branch as it is. If it doesn't look like the right direction, I won't be offended if we scrap it and try something different.

@datalorax
Owner

I'm sorry for the delay. I'm way behind on things and honestly kind of lacking motivation to work on this at the moment. Do you want to email me at daniela at uoregon dot edu and we can set up a Zoom session to talk through it?
