Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make check in RSMTool more explicit #393

Closed
desilinguist opened this issue Mar 5, 2020 — with Slack · 8 comments · Fixed by #428
Closed

Make check in RSMTool more explicit #393

desilinguist opened this issue Mar 5, 2020 — with Slack · 8 comments · Fixed by #428
Labels
Projects

Comments

Copy link
Member

desilinguist commented Mar 5, 2020

There's a currently a place in RSMTool where we raise a ValueError if a coefficients file already exists in the output directory. There's no documentation as to why this is being checked.

From @aloukina, this is why we do this:

This is what I think it’s doing in a roundabout way: it uses the existence of coefficients file as a shortcut to establish that we are dealing with a linear model. If you first run a LR experiment and then followed it with SVR using the same id and the same output_directory, you’ll get this error. The coefficients file will be there but the modeler will not be able to scale them because these lines will fail.

We should replace this roundabout check with a much more explicit check, e.g., simply testingwhether predconfig.get_coefficients() fails? That would make it much more readable.

@desilinguist desilinguist added the bug label Mar 5, 2020 — with Slack
@desilinguist desilinguist added this to To do in RSMTool 8 via automation Mar 5, 2020
@desilinguist
Copy link
Member Author

@aloukina perhaps what we should do here is to just delete the original coefficients file and print a warning saying that an old coefficients file was detected and removed?

@desilinguist desilinguist moved this from To do to In progress in RSMTool 8 May 1, 2020
@aloukina
Copy link
Collaborator

aloukina commented May 4, 2020

I am more in favor or raising an error and letting the user do the clean up: if there is indeed an old LR experiment there, there would be other files too: coefficients_scaled, _model_fit, _betas etc. If these stay, it can create a lot of confusion down the line.

@aloukina
Copy link
Collaborator

aloukina commented May 4, 2020

Another solution is if the user specifies -f when running rsmtool, we first clean up any output with the same id from the output directories.

@aloukina
Copy link
Collaborator

aloukina commented May 4, 2020

The danger here is that we cannot really use glob: if the user is not being very smart and has two experiments in the same folder, "exp1" and "exp1_updated" and then decide to re-run "exp1", we'll end up deleting both outputs unless we actually look for specific filenames.

@desilinguist
Copy link
Member Author

desilinguist commented May 4, 2020

Hmm, I am not sure I want to change the semantics of -f at this juncture.

How about if I modify Modeler.scale_coefficients() to raise an error if its call to get_cofficients() fails since should be captured in the API too? And then tweak the error message in rsmtool.py? Would that work?

@aloukina
Copy link
Collaborator

aloukina commented May 4, 2020

So keep the general current logic but make the code more readable by testing for the right thing, right?

@desilinguist
Copy link
Member Author

Yes!

@aloukina
Copy link
Collaborator

aloukina commented May 4, 2020

Yes, I like that!

RSMTool 8 automation moved this from In progress to Done May 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
No open projects
RSMTool 8
  
Done
Development

Successfully merging a pull request may close this issue.

2 participants