Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Examples/documentation fails to set runNumber correctly #1045

Closed
EinarElen opened this issue Jan 6, 2022 · 10 comments · Fixed by #1075
Closed

Examples/documentation fails to set runNumber correctly #1045

EinarElen opened this issue Jan 6, 2022 · 10 comments · Fixed by #1075

Comments

@EinarElen
Copy link
Contributor

Not entirely sure where to place this issue, it touches multiple repositories although perhaps
primarily https://github.com/LDMX-Software/ldmx-software.github.io

When working with some of the simulation samples that were produced for the
prototype testrun, https://github.com/PeterGy noticed that the samples that were
supposed to have different runNumbers didn’t. As it turns out, several parts of
the documentation/examples are telling you to set runNumbers with something like
the following

sim = simulator.simulator('...')
sim.runNumber = 1

I’m assuming this was how things were done before but as far as I can tell, this
is supposed to be done with the process object instead. Since LDMX-sw currently
has no way to warn you if you provide an argument that isn’t used, this ends up
with setting the run number to the default for every run.

I know that we aren’t the only ones that have ran into trouble with runNumbers
not being set so it might be a good idea to have a dedicated warning in SimCore
at least as long as there is no way for LDMX-sw to warn for unused parameters as
a whole.

There are both some uses of this in some of our examples and test code, but the
primary problem is the website which most people, new or old, will look to when
they want to figure out how to configure ldmx-sw.

I think this is a rather urgent thing to fix because mistakes can cost us quite
a lot. I am willing to go through both the website and the sources for all the
occasions where we are currently doing this. Personally, I think it would also
be useful to add a dedicated warning for this in SimCore. If other people agree,
I can implement that as well.

@tomeichlersmith
Copy link
Member

Thank you for offering to fix up the documentation! I think that is very important and I agree that many others have probably run into this issue. You are also correct that the new method for configuring the run number is using the Process object p.run rather than the simulator directly.

As for the more long term solution, I've been trying to figure this out for some time. The basic issue is two fold:

  1. In Python (unlike C++), assigning a value to a previously undefined member variable just creates a new member variable.
  2. As you've implied, the C++ handling of the Parameters does not check if Parameters are not accessed by the C++.

I've spent some time trying to look into solving either of these issues and I've gotten stuck.

  1. Changing Python classes to behave differently is pretty antithetical to how Python is designed and so we'd need to do some pretty fancy nonsense in order to get it working properly. The best idea I could come up with is defining our own metaclass, but I'm not sure how this would operate.
  2. It is pretty simple to add a variable in the Parameters class that keeps track of which variables have been accessed, but then the question is when do you check for unaccessed variables? Currently (especially in SimCore), there are instances of the Parameters class which exist for the entire program execution with variables that are accessed after Simulator::configure is done. The best idea I had for this was to make Parameters a no-copy class so that we know centrally that all variables accessed are done at the end of configuration. This would require some pretty heavy re-working of how some SimCore objects are configured.

In the medium-term, I like the idea of a dedicated warning (or even exception?) within SimCore::configure because changing the run number is so integral to much of our work. Something like

if (parameters.exists("runNumber")) {
  EXCEPTION_RAISE("InvalidParam",
    "Remove old-style of setting the simulation run number (sim.runNumber). Replace with using the Process object (p.run).");
}

I believe you have push access to the documentation website repository, but if you don't feel free to just open a PR and request my review.

As a side note, I'm happy to hear people are using the website! It is beneficial to our collaboration to have good software documentation, and people using it is the best way to catch typos and old bug snippets.

@EinarElen
Copy link
Contributor Author

Yes, I've also been scratching my head a bit trying to think of good ways to deal with this in general. My sense is that, as you say, doing it at the python level is going to get weird. So I've also been thinking more along the lines of doing some bookkeeping at the C++ layer. Ideally, you would want everything to happen at configuration time but a warning at the end of a run would probably still be better than no warning at all?

@tomeichlersmith
Copy link
Member

That's a good point... I'd put the warning in the destructor then. Most instances will be destructed at the end of configuration, but there will be a few copies that won't be destructed until the end of the run. This also forces the check to happen without any additional necessary code.

@EinarElen
Copy link
Contributor Author

I can make an attempt to implement that after I've done the cleanup!

@jmmans
Copy link
Contributor

jmmans commented Jan 6, 2022

I would strongly consider the unintended consequences here. A strict requirement that all parameters are used in all cases is going to be a major pain I expect. No leaving anything in there when trying two different cases, etc. I would even imagine there may be cases where a parameter is only needed for some types of runs, but that may not be clear until the system is running. A "general solution" may not have the desired impact.

In a similar system, however, there is the ability to generate an audit report, indicating which parameters have been used and which have not. This seems useful, but a regular warning seems likely to be ignored -- ignored warnings are a bad trend in systems... :-)

The case in question is a very specific one, though I see the point of inserting similar exceptions whenever a parameter is /removed/ from the system or moved to another place in the software.

@EinarElen
Copy link
Contributor Author

I have been thinking about all of those things. First of all, to be clear, I don't think anyone is advocating for any strict requirements.

This is also not the only case where you can run into similar trouble. The other thing that I've seen a couple of people run into which can be incredibly painful to debug is typos in the python config. These kinds of things have a very real cost to ignore. Best case scenario we waste CPU time and work hours from the person running the simulation. Worst case scenario we both do this and end up relying on results that are wrong.

Warnings that are always ignored are indeed a problem, but these kinds of mistakes are precisely the kind of issue that a warning is good for. Mistakes that are easy to make, hard to spot for a human, but easy to spot for a program. I don't think an audit report is a good solution to that kind of problem.

Whether or not such things are best kept as opt-in or opt-out is worth discussing, I lean towards opt-in but with examples and tutorials demonstrating them being used precisely for the reasons stated above with special setups that both a) have a parameter that can't be determined if its needed until runtime b) isn't handled at the python level c) has some reason that means its value can't be stored regardless of if it is used or not. In other words, handled similar to how compiler warnings are typically used.

@EinarElen
Copy link
Contributor Author

So, I've been looking into where we are doing things like sim.runNumber in our python code and examples. We have a bunch of places that look like https://github.com/LDMX-Software/ldmx-sw/blob/trunk/Biasing/python/target.py where we are defining functions that produce a simulator object. Since these functions don't take a process object, they have no way of actually setting the runNumber anymore.

Is that a concern? Otherwise, the patch would just be to remove the sim.runNumber = bla line, possibly with a comment.

It looks like most of these things are examples, are they used for anything else (e.g. in tests or even production stuff)?

@tomeichlersmith
Copy link
Member

The python objects that create a simulator are used in the test scripts test/*.py and used in some production although many people doing production just use those modules as examples and then write their own creation for the simulator themselves.

I think having the run number not set by those functions is a feature because the run number should be centrally controlled and those functions are just for configuring the simulation. I think the patch of removing those lines with a comment would be helpful to redirect anyone looking at the source code.

@EinarElen
Copy link
Contributor Author

Sounds good!

@EinarElen
Copy link
Contributor Author

EinarElen commented Jan 27, 2022

The following repos have examples that need to be patched

In addition to this, there are a couple of cases in the Biasing module

Patches for these are here: https://github.com/LDMX-Software/ldmx-sw/tree/iss1045-runnumber-documentation

@tomeichlersmith tomeichlersmith linked a pull request Aug 9, 2022 that will close this issue
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants