
Not handling negative log evidence #3

Closed

harpsoe opened this issue Feb 17, 2012 · 11 comments
harpsoe commented Feb 17, 2012

Dear Johannes

It seems that the analyse module breaks on negative log evidence.

```python
def _read_error_line(self, l):
    #print '_read_error_line', l
    name, values = l.split(' ', 1)
    name = name.strip(': ').strip()
    v, error = values.split(" +/- ")
    return name, float(v), float(error)
```

The `split(' ', 1)` breaks because a negative log evidence produces ' -' rather than the expected ' ' delimiter. I tried a fix along these lines:
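The mismatch is easy to see with the two "Global Evidence" lines quoted verbatim later in this thread: Fortran leaves the sign column blank for positive values, so a parser keyed to a fixed run of spaces handles one line but not the other.

```python
# The two "Global Evidence" lines quoted in this thread, copied verbatim:
pos = "Global Evidence:    0.235825006777046781E+03  +/-    0.784753324484598613E-01"
neg = "Global Evidence: -0.443130807308E+01 +/- 0.747079403150E-01"

# Splitting on a fixed run of four spaces works for the positive line...
print(pos.split("    ", 1))  # ['Global Evidence:', '0.2358...E+03  +/-    0.7847...E-01']
# ...but the negative line has no four-space run and comes back unsplit:
print(neg.split("    ", 1))
```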

```python
def _read_error_line(self, l):
    # requires `import re` at module level
    #print '_read_error_line', l
    index = re.search("[-0-9]", l).start()
    name = l[0:index]
    values = l[index:]
    name, values = l.split(':', 1)
    name = name.rstrip().lstrip().strip(':')
    v, error = values.split(" +/- ")
    return name, float(v), float(error)
```
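For what it's worth, anchoring on the colon and the `+/-` token instead of on spaces sidesteps the sign problem entirely for colon-delimited lines. A minimal sketch (not the fix that was eventually committed):

```python
def read_error_line(line):
    # "Name: value +/- error" -> split at the first colon, then at '+/-'
    name, _, values = line.partition(':')
    v, _, error = values.partition('+/-')
    return name.strip(), float(v), float(error)

print(read_error_line("Global Evidence: -0.443130807308E+01 +/- 0.747079403150E-01"))
# ('Global Evidence', -4.43130807308, 0.074707940315)
```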

But it breaks the code when trying to detect the number of modes, somewhere in:

```python
def get_stats(self):
    """
    information about the modes found:
    mean, sigma, maximum a posterior in each dimension
    """
    lines = file(self.stats_file).readlines()
    text = "".join(lines)
    parts = text.split("\n\n\n")
    del parts[0]
    stats = {
        'modes': []
    }
    # Global Evidence
    self._read_error_into_dict(lines[0], stats)
    i = 0
    for p in parts:
        modelines = p.split("\n\n")
        mode = {
            'index': i
        }
        i = i + 1
        modelines1 = modelines[0].split("\n")
        # Strictly local evidence
        self._read_error_into_dict(modelines1[1], mode)
        self._read_error_into_dict(modelines1[2], mode)
        t = self._read_table(modelines[1], title="Parameters")
        mode['mean'] = t[:, 1].tolist()
        mode['sigma'] = t[:, 2].tolist()
        mode['maximum'] = self._read_table(modelines[1])[:, 1].tolist()
        mode['maximum a posterior'] = self._read_table(modelines[1])[:, 1].tolist()
        stats['modes'].append(mode)
    return stats
```
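A defensive variant of the mode loop could skip blocks that are missing the expected sub-sections instead of raising an IndexError on a truncated stats file. A hypothetical sketch (the helper name and structure are illustrative, not the committed fix):

```python
# Hypothetical guard: only parse mode blocks that contain the expected
# sub-sections; a truncated stats file then yields an empty mode list
# instead of an IndexError.
def parse_modes(parts):
    modes = []
    for i, p in enumerate(parts):
        modelines = p.split("\n\n")
        sublines = modelines[0].split("\n")
        if len(modelines) < 2 or len(sublines) < 3:
            continue  # block is truncated: no per-mode statistics present
        modes.append({'index': i})  # real code would read evidences/tables here
    return modes

print(parse_modes(["truncated"]))            # -> []
print(parse_modes(["A\nB\nC\n\ntable..."]))  # -> [{'index': 0}]
```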

Do you have a suggestion for a quick fix?

@JohannesBuchner (Owner)

Thanks for testing pymultinest so thoroughly and getting back to me. It will make pymultinest better for everyone.
Can you please post the MultiNest output file as well?


harpsoe commented Feb 18, 2012

Hmm, now I see that all the files in /chains are generated by MultiNest, not pymultinest. It seems like the 1-stats.dat is not fully formed:

```
Global Evidence: -0.443130807308E+01 +/- 0.747079403150E-01

Local Mode Properties

Total Modes Found: 1

End of File
```

So is this a bug or a feature in MultiNest?

@JohannesBuchner (Owner)

I would expect it to give you the properties of that found mode. Perhaps MultiNest doesn't exit properly.


harpsoe commented Feb 18, 2012

MultiNest seems to exit with:

```
ln(Z): -11.006900
ln(ev)= -11.006900334307353 +/- 7.38961598990944130E-002
Total Likelihood Evaluations: 22391
Sampling finished. Exiting MultiNest
```

Having thought more about it: is the value that MultiNest prints the log evidence or the evidence itself? A negative evidence would clearly be nonsensical, but it does print ln(ev) while running, which is negative, and that should be fine.

Hmm, I tried this with version 2.14 of MultiNest too, and it generates the same result. I have written to the author of MultiNest and asked him whether this is to be expected.
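On the first question: a negative ln(Z) simply means Z < 1, which is perfectly legal for an evidence (it is a likelihood integral, not a probability, so only Z ≤ 0 would be nonsensical). Plugging in the ln(ev) value from the output above:

```python
import math

ln_Z = -11.006900334307353  # the ln(ev) reported by MultiNest above
Z = math.exp(ln_Z)
print(Z)  # ~1.66e-05: small, but strictly positive
```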

@JohannesBuchner (Owner)

I believe this issue is somehow outside of pymultinest; the stats.dat file shouldn't end like that. The demo produces a file like this:

```
Global Evidence:    0.235825006777046781E+03  +/-    0.784753324484598613E-01

Local Mode Properties
-------------------------------------------

Total Modes Found:          18


Mode   1
Strictly Local Evidence    0.232979067703204976E+03 +/-    0.355212277502706675E+00
Local Evidence    0.232979067703204976E+03 +/-    0.946492425130537052E-01

Dim No.       Mean        Sigma
   1    0.125649213878636452E+02    0.102996570384157202E+00
   2    0.251418925939135036E+02    0.101946463036047305E+00

Maximum Likelihood Parameters
Dim No.        Parameter
   1    0.125692364179202940E+02
   2    0.251398510720762829E+02

MAP Parameters
Dim No.        Parameter
   1    0.124322864796754615E+02
   2    0.251018775839391743E+02


Mode   2
```
and so on....

I am unsure how I can help here.

Regarding the other question: MultiNest works on log evidences and will not print without the log, so always expect ln(ev).


harpsoe commented Feb 25, 2012

I have written to the author of MultiNest; he said he would look at it.


harpsoe commented Mar 7, 2012

I have investigated the issue some more. MultiNest by itself, compiled into pure native code, handles negative log-likelihoods just fine, even when I hack the eggboxC example into giving negative log-likelihoods, so it does not seem to be the C interface either. It is something in the interplay between Python and MultiNest.

I still get the issue on an Ubuntu machine with Python 2.6.
Also, the file post-separate is empty.

Can you confirm whether or not you get a similar error when solving a problem that ends up with negative log evidence? What versions of compiler, Python, and Linux are you using?


harpsoe commented Mar 8, 2012

Ok, I think I figured it out. There are two tolerance parameters in the call to nestRun, one called tol and another called Ztol. It is not documented what they do, but from reading the code it seems that modes whose log evidence is larger than Ztol get printed to the output files. (I still have no idea what the parameter tol does.)

So changing the code in run.py from:

```python
lib.run(c_int(multimodal), c_int(const_efficiency_mode),
        c_int(n_live_points), c_double(evidence_tolerance),
        c_double(sampling_efficiency), c_int(n_dims), c_int(n_params),
        c_int(n_clustering_params), c_int(max_modes),
        c_int(n_iter_before_update), c_double(evidence_tolerance),
        outputfiles_basename, c_int(seed), wraps,
        c_int(verbose), c_int(resume),
        c_int(context))
```

to:

```python
lib.run(c_int(multimodal), c_int(const_efficiency_mode),
        c_int(n_live_points), c_double(evidence_tolerance),
        c_double(sampling_efficiency), c_int(n_dims), c_int(n_params),
        c_int(n_clustering_params), c_int(max_modes),
        c_int(n_iter_before_update), c_double(-1e99),
        outputfiles_basename, c_int(seed), wraps,
        c_int(verbose), c_int(resume),
        c_int(context))
```

So it seems that this parameter should be explicitly exposed, so that one can get the statistics out for modes that have small log-evidences.
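The reported behaviour — only modes whose local log evidence exceeds Ztol make it into the stats output — can be illustrated with a toy filter (function name and values hypothetical, not MultiNest code):

```python
# Toy model of the reported Ztol behaviour (names and values hypothetical):
# only modes with local log evidence above the threshold get reported.
def modes_reported(local_log_evidences, Ztol):
    return [lnZ for lnZ in local_log_evidences if lnZ > Ztol]

modes = [-4.4, -11.0, 2.3]
print(modes_reported(modes, Ztol=0.0))    # [2.3] -- negative-lnZ modes vanish
print(modes_reported(modes, Ztol=-1e99))  # all three modes are reported
```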

@JohannesBuchner (Owner)

Great catch!

@JohannesBuchner (Owner)

tol determines to what accuracy the evidence should be calculated. If you look at the table of Jeffreys factors for model comparison, https://en.wikipedia.org/wiki/Bayes_factor#Interpretation, you see that an accuracy better than 0.5 in the logarithm of the evidence is not necessary.
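To put that 0.5 in perspective: an uncertainty of 0.5 in ln(Z) corresponds to a factor of about e^0.5 ≈ 1.65 in the Bayes factor, below the 10^0.5 ≈ 3.2 that Jeffreys' scale counts as "substantial" evidence.

```python
import math

sigma_lnZ = 0.5
print(math.exp(sigma_lnZ))  # ~1.65: Bayes-factor uncertainty from 0.5 in ln(Z)
print(10 ** 0.5)            # ~3.16: Jeffreys' threshold for "substantial"
```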

@JohannesBuchner (Owner)

Thank you for investigating and reporting this issue.
It should be resolved by commit 1a2e15c; if not, please re-open.
