To reproduce results for the ground-truth datasets #55
Hi @yoshitomo-matsubara sorry for the delay, and also thanks for taking a deep dive. I will try to answer all your questions below. Some of them involve improvements to the repo workflow that we should separate out as issues.
You're welcome
These are the full steps:
The …

Here we limited jobs to 2 hours and 8GB RAM. The script will only submit jobs for results it does not already find in the results folder (i.e., an interrupted run can be resumed without redoing finished jobs). This step will create …
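For illustration, here is a minimal Python sketch of that resume behavior. This is not the repo's actual submission code; the datasets, methods, seeds, and the file-naming pattern (borrowed from a result file mentioned later in this thread) are all assumptions:

```python
# Hypothetical sketch: skip any (dataset, method, seed) job whose result
# file already exists in the results folder, so interrupted runs resume.
import os

results_dir = '../results_sym_data'
datasets = ['strogatz_bacres1', 'strogatz_bacres2']  # hypothetical subset
methods = ['FE_AFPRegressor']                        # hypothetical subset
seeds = [15795, 860]                                 # hypothetical seeds

for dataset in datasets:
    for ml in methods:
        for seed in seeds:
            out = os.path.join(results_dir, f'{dataset}_tuned.{ml}_{seed}.json')
            if os.path.exists(out):
                continue  # result already on disk; do not resubmit
            print('would submit:', dataset, ml, seed)  # job submission goes here
```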
a) Correct; R2 is also defined in Section 4.3.
No, solution rates are derived computationally, as shown in `groundtruth_results.ipynb`.
Paging @foolnotion.
We used 9:00, but methods are also limited internally (where possible) to 8 hours for symbolic problems.
We used a cluster with ~1100 cores, meaning we were able to run ~1000 jobs simultaneously. The maximum core hours for training the models are given in Table 2, assuming all methods use the whole budget (which is not the case). For the ground-truth datasets, ~440K core hours spread over 1000 cores is 440 hours, or ~18 days. I recall it actually taking about 2 weeks, with symbolic model assessment taking about 2 days.
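As a quick sanity check on that arithmetic, using only the numbers quoted above:

```python
# Back-of-the-envelope: total core hours spread over simultaneous jobs.
core_hours = 440_000  # ~440K core hours for the ground-truth datasets
cores = 1_000         # ~1000 simultaneous jobs
wall_hours = core_hours / cores
print(wall_hours, wall_hours / 24)  # 440.0 hours, ~18.3 days
```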
That is not surprising to me. Some of the benchmarked methods have very slow implementations. Because of my access to a big cluster, I have offered to benchmark methods that are submitted to this repo, pending cluster availability. It is a lot of compute.
Thanks for the detailed questions, and let me know if anything is unclear.
Regarding Operon, maybe running … In any case, I had another compilation problem and @foolnotion helped me out with a patch: remove the following from `environment.yml`:

…

then add the following into `environment.yml`:

…
@yoshitomo-matsubara It's best to let conda handle all the dependencies and not mix system libraries with conda/vcpkg/etc. The Ceres cmake module will complain if the Eigen version detected is different from the version Ceres itself was compiled with. I will have to do a PR soon to update operon. There is one last thing to finish before I do that, namely to integrate NSGA2 into the python module. Since operon switched from …
@lacava Thank you for the detailed answers! I think many of my questions above are resolved now.
Could you clarify this point a little bit more? Does it mean the equation comparison (between estimated and ground-truth equations) is completely left to sympy, and the constant parameters …

At which level is the …
Unfortunately, I do not have access to such a big cluster to distribute jobs, and it would take forever to completely follow the experimental design. @folivetti and @foolnotion Thank you for the suggestions! I followed the suggestions from both of you, but unfortunately they didn't resolve the issue with Eigen and libceres-dev.
Yes. Sympy evaluates the equations symbolically, meaning without any real values being passed. So the result of comparing the true equation and the model is a symbolic equation, and it is assessed as such.
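For intuition, here is a minimal sympy sketch of that kind of check. This is not the repo's actual code (see `groundtruth_results.ipynb` for that); the expressions and the Definition 4.1-style conditions are illustrative assumptions:

```python
# Illustrative symbolic comparison: no data values are involved anywhere.
import sympy as sp

x = sp.symbols('x')
true_model = x**2 + 3*x       # hypothetical ground-truth expression
est_model = x**2 + 3*x + 1.5  # hypothetical fitted expression

# Simplify the difference and the ratio of the two expressions symbolically.
diff = sp.simplify(true_model - est_model)
ratio = sp.simplify(true_model / est_model)

# "Solved" if the difference is a constant, or the ratio a nonzero constant.
solved = bool(diff.is_constant() or (ratio.is_constant() and ratio != 0))
print(diff, solved)  # -1.50000000000000 True
```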
This limits the training of a model on a single dataset for one trial. The time limit is sent to the job scheduler.
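As a rough illustration (my assumption, not necessarily the scheduler or flags used in SRBench), a 9:00 limit could be forwarded to an LSF-style scheduler like this:

```python
# Hypothetical sketch: forward the per-trial time limit to a job scheduler.
import subprocess

time_limit = '9:00'  # hours:minutes for one dataset x one trial

cmd = [
    'bsub',
    '-W', time_limit,          # LSF wall-clock limit (HH:MM)
    '-R', 'rusage[mem=8192]',  # memory request, e.g. 8 GB
    'python', 'evaluate_model.py',  # hypothetical training entry point
]
print(' '.join(cmd))
# subprocess.run(cmd)  # submission itself omitted; illustration only
```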
Here are some options, depending on your goals. Are your goals to reproduce the entire experiment or to compare to another method? We provided the …
"reasonable" is subjective and algorithm/problem dependent. IMO running three trials won't give a very good estimate of the likelihood of a randomized algorithm finding an exact solution to a specific problem. Averaged over all problems is perhaps less problematic, but could be misleading. |
Thank you for the further clarification and suggestions. My goals are 1) to apply the SR methods to my internal datasets (which cannot be shared at this moment) and 2) to compare their performance with my proposed SR model for a paper I'm working on. This is why I've been seeking a way to leverage this great project.
@yoshitomo-matsubara I will try to provide a docker image for you, but I don't know how long that'll take as I don't have a lot of experience with docker. In the meantime, please create an issue on our project's page and describe your installation steps. Regarding the computational costs of running the benchmark, I've recently had a good experience with the AWS cloud. You can get a good price if you wait for spot instances to be available. Setting up an ubuntu cloud machine with conda/srbench is pretty easy. I was able to run Operon/SRBench over one weekend on an AMD Epyc 96-core machine for just a little over 20€. Spot instances do have some caveats (you need to checkpoint your work often) but overall they are a good alternative.
@foolnotion Thank you for the offer. I just created a new issue for a Dockerfile (#56).
Inconsistency between the numbers above and those in Table 2
Table 2 of the paper (when accepted) says …

Exact commands to complete train-to-evaluate pipeline

Besides the commands for training, could you please complete the instructions (exact commands) in the README to 1) train, 2) postprocess the training results, and 3) evaluate?

Reopened the issue for dataset

Since I found an issue with the Feynman datasets in PMLB, I reopened issue #54. Could you please address this issue as well? Thank you so much!
You'll notice there aren't 15 seeds up there. As I mentioned above, …
This is probably unnecessarily complicated, so I'm planning to update SEEDS.py to make the seeds contiguous for reproducing.
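Something like the following is my guess at the shape (not the actual contents of SEEDS.py): with contiguous seeds, trial i simply uses seed i, and no lookup is needed to reproduce a run:

```python
# Hypothetical contiguous version of SEEDS.py: the first n_trials seeds
# of any run are always [0, 1, ..., n_trials - 1].
N_TRIALS_MAX = 10
SEEDS = list(range(N_TRIALS_MAX))
```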
I'll update the README. In the meantime, please read my steps above.
My bad, I didn't actually count them.

Thank you, I'll be looking forward to the updates!
Hi @yoshitomo-matsubara, see the README updates.
Hi @lacava |
Thanks @lacava for helping me resolve the dataset issue last time.
Based on the commands in the README, I tried to reproduce the results reported in Figure 3 of your recently accepted paper for both the Strogatz and Feynman datasets, and I ran into some concerns/questions.
1. How should we interpret the produced results?
For the Strogatz datasets, I ran:
python analyze.py -results ../results_sym_data -target_noise 0.0 "/path/to/pmlb/datasets/strogatz*" -sym_data -n_trials 10 -time_limit 9:00 -tuned --local
Following that, there were many json files produced. In `strogatz_bacres1_tuned.FE_AFPRegressor_15795.json` (AFP_FE), I found the following values: …

I think a) `r2_test` is what is called Accuracy in the paper, b) `symbolic_model` means the symbolic expression resulting from training on `strogatz_bacres1`, and c) the true symbolic expression is associated with `true_model`. Is my understanding correct for all of a), b), and c)?

Also, is the above `symbolic_model` expected as the output of AFP_FE for `strogatz_bacres1`? Since the method is the 2nd best for the ground-truth datasets shown in Fig. 3, I expected a clearer expression.

2. How is the solution rate derived?
Could you please clarify how the solution rate in Fig. 3 is derived?
Did you manually compare the produced expression `symbolic_model` to the true expression `true_model` and consider it solved only when the produced expression exactly matches the true one? Or, if it is fully based on Definition 4.1 (Symbolic Solution) in the paper, what values of a and b are used in Fig. 3?

3. Operon build failed
On Ubuntu 18.04 and 20.04, the Operon build with your provided `install.sh` failed due to a version discrepancy between libceres-dev (which expects Eigen 3.4.0) and libeigen3-dev (whose latest available version is 3.3.7). I even tried to build Eigen v3.4.0 from source, but the build still failed. Do you remember how you set up the dependencies for Operon?
4. Commands to reproduce the results in Fig. 3
Could you provide the exact commands to reproduce the results in Fig. 3?
For the Strogatz datasets with target noise = 0.0, I think the following command was used:
python analyze.py -results ../results_sym_data -target_noise 0.0 "/path/to/pmlb/datasets/strogatz*" -sym_data -n_trials 10 -time_limit 9:00 -tuned
but how about the Feynman datasets? Also, how should we determine `-time_limit`?

5. Computing resource and estimated runtime
To estimate how long it will take to reproduce the results in Fig. 3, could you share the details of the computing resources used in the paper, e.g., how many machines of 24-28 core Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz chipsets and 250 GB of RAM were used, and the (rough) estimated runtime to get the results, if you remember?

On a machine with a 4-core CPU, 128GB RAM, and 2 GPUs, even `strogatz_bacres1` (400 samples) is taking more than a day to complete:

python analyze.py -results ../results_sym_data -target_noise 0.0 /path/to/pmlb/datasets/strogatz_bacres1/strogatz_bacres1.tsv.gz -sym_data -n_trials 10 -time_limit 9:00 -tuned --local
Sorry for the many questions, but your responses would be really appreciated and helpful for using this great work in my research.
Thank you!