Stochasticity #2
Comments
How are you planning to do this? It seems to me that Matlab's positional-only argument specification (that is, the lack of support for named parameters) makes this change (and, in general, any other parameter addition or removal) challenging. Options that come to my mind:
Thanks for the ideas, Santi. I'm now preparing to do this next, as a priority. Yes, this is where Matlab functions are a bit restrictive -- would be great to have named parameters in this case. Anyway, this is where we are -- in Matlab land for now. For the purposes of how the engine works, I don't see a problem with just resetting the random seed at the beginning of every function call. This could be done: (i) in TSQ_brawn (i.e., not requiring any changes to any functions at all), (ii) inside all stochastic functions. As long as the seed is reset the same way each time, this would ensure reproducibility. Regarding your options for (ii):
For now I'm favouring (i) as a quick-to-implement solution, provided it's not too slow (otherwise your options 1 or 2). Looking ahead, I can see a future in which the first input is a structure (while still accepting a plain vector if only a vector is provided). Of course, external C++/Fortran mex functions that use randomization will not be reproducible unless I go into their sources and impose seed resetting...
This is underway in the DeterministicStochasticity branch (a sub-branch of OperationChanges). For now it will happen through a uniform BF_ResetSeed function, which all stochastic operations will call to handle inputs specifying a seed reset (the default will be the Matlab default: reset to the Mersenne Twister with seed 0). I am yet to specify defaults or incorporate input arguments controlling this, but hopefully I'll find time to get to it by the end of the week.
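To make the idea of a single shared reset helper concrete, here is a minimal sketch in Python rather than Matlab. Everything in it is an assumption for illustration: `reset_seed`, `noisy_mean`, and the accepted values `"default"`, `"none"`, and an integer are hypothetical and are not the actual BF_ResetSeed interface. Python's `random` module also uses a Mersenne Twister, so reseeding with 0 plays the role of the default described above.

```python
import random


def reset_seed(how="default"):
    """Hypothetical shared helper that every stochastic operation calls first.

    how: "default" -> reseed the global generator with 0
         an int    -> reseed the global generator with that value
         "none"    -> leave the generator state untouched
    """
    if how == "none":
        return
    if how == "default":
        random.seed(0)
    elif isinstance(how, int):
        random.seed(how)
    else:
        raise ValueError(f"unrecognised seed specification: {how!r}")


def noisy_mean(series, n_resamples=100, seed_spec="default"):
    """Toy stochastic operation: average of bootstrap-resampled means."""
    reset_seed(seed_spec)  # one uniform entry point for seed handling
    total = 0.0
    for _ in range(n_resamples):
        sample = [random.choice(series) for _ in series]
        total += sum(sample) / len(sample)
    return total / n_resamples
```

With this shape, two calls such as `noisy_mean(x)` and `noisy_mean(x)` return the same value, while `seed_spec="none"` recovers the old run-to-run variability. The helper still mutates global generator state, which is the trade-off discussed in this thread.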
If I understand correctly, touching TSQ_Brawn would not help library users. So not useful to me ;-). But that is indeed the approach I would initially take in pyopy: reset the global rng state to the specified seed (in Python land) before running the computation. That is ugly in two ways: 1) touching global state is ugly per se; 2) having seed-unaware random functions makes it a tad more difficult to batch operation calls deterministically (e.g. one would need to interleave calls that set the seed between calls to the operator) and with clear provenance (as the seed is always part of the signature of the operation, even if implicit). So for me, the only way to get this correct is to have seed-aware functions (ii). Making the seed the last parameter of the functions is definitely less work, so it is probably the best way to go. My thinking is: putting the seed at the end forces users to specify all the other parameters if they want to play with stochasticity (with all that implies for functions where the defaults are something more elaborate than a number or a string). This is always true for any nth parameter when using vanilla Matlab function dispatch. So the question would be: what will a user want to change from the default parameters more often? That depends on the scenario, so there is no good answer. I will often want to change the seed when I find an interesting (aka discriminative) feature that is stochastic, instead of touching any of the other parameters, to check whether the appeal of the feature is just because Mars was aligned with Pluto or whether it really encodes something useful. What I would definitely try, regardless of where the seed parameter goes, is:
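The contrast drawn in the comment above between resetting global state and seed-aware functions can be sketched in Python as follows. This is an illustration under stated assumptions, not pyopy or library code: `surrogate_stat` and `run_batch` are hypothetical names, and the statistic is a toy. The function owns a private `random.Random(seed)` generator, so the seed is an explicit part of the call signature and the global generator is never touched.

```python
import random


def surrogate_stat(series, n_shuffles=50, seed=0):
    """Toy seed-aware operation: mean lag-1 product over shuffled copies."""
    rng = random.Random(seed)  # private generator; global state is untouched
    total = 0.0
    for _ in range(n_shuffles):
        shuffled = list(series)
        rng.shuffle(shuffled)
        total += sum(a * b for a, b in zip(shuffled, shuffled[1:]))
    return total / n_shuffles


def run_batch(series, calls):
    """Run (operation, kwargs) pairs; each result keeps the kwargs (seed included)."""
    return [(op.__name__, kwargs, op(series, **kwargs)) for op, kwargs in calls]
```

A batch such as `run_batch(x, [(surrogate_stat, {"seed": 1}), (surrogate_stat, {"seed": 2})])` needs no interleaved seed-setting calls, and each result carries the seed it was computed with. Python keyword arguments also sidestep the positional-order problem described for Matlab, since the seed can be changed without restating the other parameters.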
My tests tell me that these operators need to be tagged as stochastic:
There can be false positives: I might have bugs, and other sources of stochasticity might be in place. I might also miss some more because of the simplicity of the series I'm using to test. I still have some operators failing to compute too. I will take a second look, hopefully at the end of the week, but of course nothing can replace careful thinking.
The only remaining (labeled) stochastic operations are the fractaldimensions operations (which rely on mexed code from TSTOOL). I don't know how to control random seeds in C++ (e.g., in |
Some operations output results that depend on the random seed, and thus running the same operation on the same time series can produce different results if run multiple times.
A solution to this is required, and could be done by allowing a random seed input to each non-deterministic function, to allow reproducible results. If none is provided, a default could be rng('default') at the start of each function.
I should implement this as a priority going forward.