Ground-truth datasets are broken? #54
Comments
can you confirm you ran 'git lfs fetch' in the pmlb repo? looks like the dataset files may still be git lfs references rather than the actual gzip files. i need to update the instructions as well, since the feynman and strogatz datasets are now in master in pmlb
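One way to check this locally is to look at the first bytes of each dataset file: a real gzip file starts with the magic bytes 0x1f 0x8b, while a Git LFS pointer is a small text file that starts with "version https://git-lfs.github.com/spec/v1". The following is a minimal sketch under those assumptions; the function names are hypothetical and are not part of pmlb or srbench.

```python
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream
LFS_PREFIX = b"version https://git-lfs.github.com/spec/v1"  # first line of an LFS pointer


def classify_dataset_file(path):
    """Return 'gzip', 'lfs-pointer', or 'unknown' based on the file's first bytes."""
    with open(path, "rb") as fh:
        head = fh.read(len(LFS_PREFIX))
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(LFS_PREFIX):
        return "lfs-pointer"
    return "unknown"


def find_lfs_pointers(root, pattern="*.tsv.gz"):
    """List files under root (hypothetical helper) that are still LFS pointers."""
    return sorted(
        str(p) for p in Path(root).rglob(pattern)
        if classify_dataset_file(p) == "lfs-pointer"
    )
```

Calling find_lfs_pointers on the local pmlb datasets directory should return an empty list once the real files have been downloaded; any path it does return has not been replaced by its gzip content yet, which would explain a "not in gzip format" error.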
Hi @lacava Yes, I did run |
I think we need |
glad you found a solution. i believe |
Thank you for updating the repo! I'll close this issue.
@lacava
|
thanks for checking. hm, some of the changes didn't make it into master... i'll look into it.
issued a PR on PMLB to resolve: EpistasisLab/pmlb#158
@lacava |
merged, please update PMLB
Hi, I am trying this out myself now, and getting an error with all Strogatz problems this time (Feynman's run fine).
I do see that the "true_model" field in the .json results for Strogatz includes a trailing $ at the end. Perhaps it suffices to add a
in symbolic_utils.get_sym_model? I'd do a PR but I am not sure whether this is (somehow) a problem only I got, since I see nobody else raising it. EDIT: removing the $ is not enough.
you caught a set of changes I hadn't pushed into PMLB. once the checks complete on EpistasisLab/pmlb#160, you can update from the pmlb master branch.
srbench/experiment$ python assess_symbolic_model.py ../../../pmlb/datasets/strogatz_shearflow2/strogatz_shearflow2.tsv.gz -ml tuned.GPGOMEARegressor -results ../../analysis/results_sym_data_new/strogatz_shearflow2/ -seed 860
{'INPUT_FILE': '../../../pmlb/datasets/strogatz_shearflow2/strogatz_shearflow2.tsv.gz', 'ALG': 'tuned.GPGOMEARegressor', 'RDIR': '../../analysis/results_sym_data_new/strogatz_shearflow2/', 'RANDOM_STATE': 860, 'TEST': False, 'Y_NOISE': 0.0, 'X_NOISE': 0.0, 'SYM_DATA': False, 'JSON_FILE': ''}
========================================
Assessing tuned.GPGOMEARegressor model for
../../../pmlb/datasets/strogatz_shearflow2/strogatz_shearflow2.tsv.gz
========================================
looking for: ../../analysis/results_sym_data_new/strogatz_shearflow2//strogatz_shearflow2_tuned.GPGOMEARegressor_860.json
> /mnt/d/projects/symbolic-regression/srbench/experiment/symbolic_utils.py(244)get_sym_model()
-> model_sym = parse_expr(model_str,
(Pdb) c
compression: gzip
filename: ../../../pmlb/datasets/strogatz_shearflow2/strogatz_shearflow2.tsv.gz
replacing feature 0 with x
replacing feature 1 with y
parsing 0.000170+2.307729*(((((cos(sin(y))*PLOG(PLOG(14.465000)))*cos((cos(y)/(-11.097000--13.964000))))+cos(-20.929000))*sin(x)))
{'x': x, 'y': y, 'add': <class 'sympy.core.add.Add'>, 'mul': <class 'sympy.core.mul.Mul'>, 'max': Max, 'min': Min, 'sub': <function sub at 0x7f4e8ed32790>, 'div': <function div at 0x7f4e8d7e2040>, 'square': <function square at 0x7f4e8d7e20d0>, 'cube': <function cube at 0x7f4e8d7e2160>, 'quart': <function quart at 0x7f4e8d7e21f0>, 'PLOG': <function PLOG at 0x7f4e8d7e2280>, 'PLOG10': <function PLOG at 0x7f4e8d7e2280>, 'PSQRT': <function PSQRT at 0x7f4e8d7e23a0>}
round_floats
rounded: 2.31*(0.983*cos(sin(y))*cos(0.349*cos(y)) - 0.487)*sin(x)
simplify...
simplified: (2.27*cos(sin(y))*cos(0.349*cos(y)) - 1.12)*sin(x)
saving...
sym_diff: -(2.27*cos(sin(y))*cos(0.349*cos(y)) - 1.12)*sin(x) + (0.1*sin(y)**2 + cos(y)**2)*sin(x)
sym_frac: (2.27*cos(sin(y))*cos(0.349*cos(y)) - 1.12)/(0.1*sin(y)**2 + cos(y)**2)
simplified sym_diff: (-0.9*sin(y)**2 - 2.27*cos(sin(y))*cos(0.349*cos(y)) + 2.12)*sin(x)
{
"dataset": "strogatz_shearflow2",
"algorithm": "tuned.GPGOMEARegressor",
"params": {
"caching": false,
"classweights": false,
"elitism": 1,
"erc": true,
"evaluations": 1000000,
"functions": "+_-_*_p/_plog_sqrt_sin_cos",
"generations": -1,
"gomea": true,
"gomfos": "LT",
"ims": false,
"initmaxtreeheight": 6,
"linearscaling": true,
"maxsize": 1000,
"maxtreeheight": 17,
"parallel": false,
"popsize": 1000,
"prob": "symbreg",
"reproduction": 0.0,
"sbagx": 0.0,
"sblibtype": false,
"sbrdo": 0.0,
"seed": -1,
"silent": true,
"subcross": 0.5,
"submut": 0.5,
"syntuniqinit": 1000,
"time": 28800,
"tournament": 4,
"unifdepthvar": true
},
"random_state": 860,
"process_time": 133.882689869,
"time_time": 133.97960495948792,
"target_noise": 0.0,
"feature_noise": 0.0,
"true_model": "(0.1*sin(y)**2 + cos(y)**2)*sin(x)",
"model_size": 21,
"symbolic_model": "0.000170+2.307729*(((((cos(sin(x1))*plog(plog(14.465000)))*cos((cos(x1)p/(-11.097000--13.964000))))+cos(-20.929000))*sin(x0)))",
"mse_train": 1.2889293272279269e-06,
"mae_train": 0.0008988973869460174,
"r2_train": 0.9999751463811574,
"mse_test": 1.3140537910085879e-06,
"mae_test": 0.0009068998433162603,
"r2_test": 0.9999816769614173,
"simplified_symbolic_model": "(2.27*cos(sin(y))*cos(0.349*cos(y)) - 1.12)*sin(x)",
"simplified_complexity": 15,
"symbolic_error": "(-0.9*sin(y)**2 - 2.27*cos(sin(y))*cos(0.349*cos(y)) + 2.12)*sin(x)",
"symbolic_fraction": "(2.27*cos(sin(y))*cos(0.349*cos(y)) - 1.12)/(0.1*sin(y)**2 + cos(y)**2)",
"symbolic_error_is_zero": false,
"symbolic_error_is_constant": false,
"symbolic_fraction_is_constant": false
}
saving...
done.
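Comparing the stored "symbolic_model" string with the string printed after "parsing" in the log above suggests three rewrites before the expression reaches parse_expr: features x0, x1, ... are replaced by the dataset's column names, "plog" becomes "PLOG", and the protected-division token "p/" becomes "/". The sketch below reproduces only that rewrite as inferred from this log; it is not the actual code in symbolic_utils.get_sym_model, and the function name is hypothetical.

```python
import re


def normalize_model_string(model_str, feature_names):
    """Rewrite a GP-GOMEA model string the way the log above suggests.

    Assumption: this mirrors only what is visible in the log, not srbench's code.
    """
    # Protected division token 'p/' becomes plain division.
    out = model_str.replace("p/", "/")
    # Lowercase protected log becomes the PLOG name seen in the parser's namespace.
    out = re.sub(r"\bplog\b", "PLOG", out)
    # Replace x<i> with the i-th feature name in one pass, so a replaced name
    # is never rewritten a second time.
    out = re.sub(r"\bx(\d+)\b", lambda m: feature_names[int(m.group(1))], out)
    return out


stored = ("0.000170+2.307729*(((((cos(sin(x1))*plog(plog(14.465000)))"
          "*cos((cos(x1)p/(-11.097000--13.964000))))+cos(-20.929000))*sin(x0)))")
print(normalize_model_string(stored, ["x", "y"]))
```

With the feature names x and y from this dataset, the printed string matches the one shown after "parsing" in the log. An index beyond the feature list raises IndexError, which seems preferable to silently leaving an unknown variable in the expression.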
EpistasisLab/pmlb#160 was merged. update PMLB from git and you should be good to go.
Hi!
Thank you for your great work and framework!
I wanted to try the benchmarked methods for the ground-truth datasets (i.e., Feynman and Strogatz datasets) and followed the instructions in README.
However, the datasets fetched from the pmlb repository look broken. Here is one of the errors I got when running
python analyze.py -results ../results_sym_data -target_noise 0.0 "/data/pmlb/datasets/strogatz*" -sym_data -n_trials 10 -time_limit 9:00 -tuned --local
for the Strogatz datasets. (The same errors occurred for the Feynman datasets with "/data/pmlb/datasets/feynman_*" as well.)
Are the datasets not in gzip format?
I also tried to manually gunzip the file, but the error message still says it's not in gzip format.
Could you please resolve this issue for both Feynman and Strogatz datasets?
Thank you!