Feature request: remove output messages, restricted result range #3

Closed
vkhodygo opened this issue Mar 23, 2018 · 11 comments
Comments

@vkhodygo
Contributor

Hi again!

Sorry for disturbing, but I would like to ask you a few more things.
First, is it possible to add a key parameter which turns off all the output messages (except errors)?
I have to deal with massive numbers of files and I need to see my own print messages.

Second, can your algorithm in general use boundaries for resulting parameters? Say, I know in general, that the values of slopes a priori lie in range [0:2], thus, the result has to be in the range.

Sincerely,
V.

@cjekel
Owner

cjekel commented Mar 24, 2018

First, is it possible to add a key parameter which turns off all the output messages (except errors)?
I have to deal with massive numbers of files and I need to see my own print messages.

You can use something like

fit(4, disp=False)

to fit four line segments while turning off the optimization output.

What often gets printed are numpy warnings, e.g. divide by zero. You can look at https://stackoverflow.com/questions/14463277/how-to-disable-python-warnings to disable warnings in Python.

Does this help?
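
For reference, the approach from the linked Stack Overflow question can be sketched with the standard `warnings` module. This is only a minimal sketch: `noisy_fit` and `quiet_call` are hypothetical names, and `noisy_fit` merely imitates a fit that emits a numpy-style RuntimeWarning.

```python
import warnings


def noisy_fit(values):
    """Hypothetical stand-in for a fit that emits a numpy-style RuntimeWarning."""
    warnings.warn("invalid value encountered in double_scalars", RuntimeWarning)
    return sum(values) / len(values)


def quiet_call(func, *args, **kwargs):
    """Run func with RuntimeWarning suppressed; filters are restored on exit."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return func(*args, **kwargs)


result = quiet_call(noisy_fit, [1.0, 2.0, 3.0])
print(result)  # the mean is printed and no warning is shown
```

Scoping the filter to a `with` block keeps warnings visible everywhere else in the script, which matters if some of them point at bad data.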

The only print() in the code is

print(res)

which displays the optimization results.

Would you like a keyword to turn this off?

Second, can your algorithm in general use boundaries for resulting parameters? Say, I know in general, that the values of slopes a priori lie in range [0:2], thus, the result has to be in the range.

The fit doesn't solve for the slopes; it actually solves for the y locations at the given x break points. The slopes can then be calculated from this solution.

I'm working on implementing fixing the x,y locations at the boundaries (beginning and end). There have been a few people who have requested this, but it's not ready (or working) at the moment.

@vkhodygo
Contributor Author

you can use something like

fit(4,disp=False)

No, I still get messages like

fun: 4.895025824377757e-07
message: 'Optimization terminated successfully.'
nfev: 393
nit: 12
success: True
x: array([2.6693644 , 3.32241918])
And when I try to use fitfast, it becomes even worse (especially with 4 cores running)

What often gets printed are numpy warnings, e.g. divide by zero.

I observe some messages, indeed, they look like this:

RuntimeWarning: invalid value encountered in double_scalars
  A[i,i] = A[i,i] + sum(((sepDataX[i] - breaks[i+1]) ** 2)) / ((breaks[i+1] - breaks[i]) ** 2)
/home/user/.local/lib/python3.6/site-packages/pwlf/pwlf.py:248: RuntimeWarning: invalid value encountered in double_scalars
  A[i,i+1] = A[i,i+1] - sum((sepDataX[i] - breaks[i]) * (sepDataX[i] - breaks[i+1])) / ((breaks[i+1] - breaks[i]) ** 2)
/home/user/.local/lib/python3.6/site-packages/pwlf/pwlf.py:249: RuntimeWarning: invalid value encountered in double_scalars
  B[i] = B[i] + (-sum(sepDataX[i] * sepDataY[i]) + breaks[i+1] * sum(sepDataY[i])) / (breaks[i+1] - breaks[i])
/home/user/.local/lib/python3.6/site-packages/pwlf/pwlf.py:241: RuntimeWarning: invalid value encountered in double_scalars
  A[i,i-1] = A[i,i-1] - sum((sepDataX[i-1] - breaks[i-1]) * (sepDataX[i-1] - breaks[i])) / ((breaks[i] - breaks[i-1]) ** 2)
/home/user/.local/lib/python3.6/site-packages/pwlf/pwlf.py:242: RuntimeWarning: invalid value encountered in double_scalars
  A[i,i] = A[i,i] + sum((sepDataX[i-1] - breaks[i-1]) ** 2) / ((breaks[i] - breaks[i-1]) ** 2)
/home/user/.local/lib/python3.6/site-packages/pwlf/pwlf.py:243: RuntimeWarning: invalid value encountered in double_scalars
  B[i] = B[i] + (sum(sepDataX[i-1] * sepDataY[i-1]) - breaks[i-1] * sum(sepDataY[i-1])) / (breaks[i] - breaks[i-1])

and

/home/user/.local/lib/python3.6/site-packages/numpy/core/_methods.py:112: RuntimeWarning: invalid value encountered in subtract
  x = asanyarray(arr - arrmean)

Is that what you mean?

Would you like a keyword to turn this off?

Yes, that would be great. Since I can't be sure that results in such cases are correct, I need to know what datasets lead to this (and I have a few thousands =/).
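
One way to find out which datasets trigger such warnings is to record them per dataset instead of letting them scroll past. The following is a minimal sketch using only the standard `warnings` module; `fit_dataset` and `find_noisy_datasets` are hypothetical names, and the fake fit simply warns when it sees a NaN.

```python
import warnings


def fit_dataset(name, values):
    """Hypothetical fit: warns when the data contain a NaN."""
    if any(v != v for v in values):  # NaN is the only value not equal to itself
        warnings.warn("invalid value encountered in double_scalars", RuntimeWarning)
    return name


def find_noisy_datasets(datasets):
    """Return {dataset name: [warning messages]} for datasets that warned."""
    flagged = {}
    for name, values in datasets.items():
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")  # keep repeated warnings from being dropped
            fit_dataset(name, values)
        messages = [str(w.message) for w in caught
                    if issubclass(w.category, RuntimeWarning)]
        if messages:
            flagged[name] = messages
    return flagged


datasets = {"clean.dat": [1.0, 2.0], "broken.dat": [1.0, float("nan")]}
print(find_noisy_datasets(datasets))
```

With a few thousand files, the returned dictionary gives a short list of the ones worth inspecting by hand.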

I'm working on implementing fixing the x,y locations at the boundaries (beginning and end). There have been a few people who have requested this, but it's not ready (or working) at the moment.

Well, that means that I have to do more things manually.

P.S. It seems that your fitfast doesn't work as planned. However, I need to check what data breaks it before I open a new issue.

@cjekel
Owner

cjekel commented Mar 25, 2018

Thanks for the PR!

  1. 66c0245 now defaults to prints being off. Use

piecewise_lin_fit(x, y, disp_res=True)

to turn prints on. This doesn't get rid of numpy warnings.

  2. 9959ec2 fitfast() now defaults to a population of 2. This should be faster than the differential evolution for all cases, at the cost of possibly not finding a good solution. Increase the population of fitfast() to find a better solution.

  3. Can you describe your application for boundary slopes? Are you trying to force a solution range, or speed up the optimization implementation?

@vkhodygo
Contributor Author

Good, now I can see only important messages!
I haven't tried to use updated fitfast yet, hope it works properly now.
I know, that slopes can't be, say, negative and greater than 2. Thus, all values that are not in range can't be accepted. I know, that, for example, scipy allows to use boundaries for ranges, but your package is better for my purposes.
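
Until bounds are supported by the fit itself, the acceptance rule described here can be applied after the fact. This is a minimal sketch under that assumption: `slopes_in_range` is a hypothetical helper and the slope values are made up for illustration.

```python
def slopes_in_range(slopes, low, high):
    """Return True only if every slope lies in the closed interval [low, high]."""
    return all(low <= s <= high for s in slopes)


# Made-up slopes for two fits; the second one has a negative slope.
accepted = slopes_in_range([1.9, 1.1, 0.4], 0.0, 2.0)
rejected = slopes_in_range([1.9, -0.7, 1.2], 0.0, 2.0)
print(accepted, rejected)  # True False
```

This only rejects unacceptable fits; it does not steer the optimizer toward an acceptable one.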

P.S. Your updated version definitely looks faster, however, now I get very strange results. This drives me crazy %)
[image: msd_plot 6]
[image: _msd_plot 0 1 3_0 22]
Same dataset (msd shifted), but correct (at least acceptable) results only in the case of old algo. I use default code from your examples and 3 segments to fit the data.

@cjekel
Owner

cjekel commented Apr 16, 2018

Edit*:
I've found a working example that breaks... will be working on a hotfix. Sorry about this.

Can you send me that msd shifted dataset to troubleshoot? Or reproduce on a simple data set? Are you using version 0.2.3?

In the meantime you can revert to the old release by running

[sudo] pip uninstall pwlf
[sudo] pip install pwlf==0.1.7 

@cjekel
Owner

cjekel commented Apr 16, 2018

Fixed the weird prediction issues in 0.2.4. Sorry about that, not sure how that escaped my test function! I need to think about that more...

I know, that slopes can't be, say, negative and greater than 2. Thus, all values that are not in range can't be accepted. I know, that, for example, scipy allows to use boundaries for ranges, but your package is better for my purposes.

Okay. I need to think about this more, but I think it can be done by setting up inequality constraints.

They would look something like
b_l <= b_1 <= b_h
Then
1 <= b_1/b_l
and
1 >= b_1/b_h

This might be useful in the future: https://math.stackexchange.com/questions/69613/linear-least-squares-with-inequality-constraints
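
As a side note on the ratio form above: it divides by b_l, which fails for a lower bound of zero such as the range [0, 2] requested here. A possible alternative, sketched under the assumption that the break points x_i are fixed and the fit solves for the values y_i at those points (the symbols m_i, x_i and y_i are introduced only for this note), keeps the constraints linear in the unknowns:

```latex
\[
  m_i = \frac{y_{i+1} - y_i}{x_{i+1} - x_i}, \qquad x_{i+1} > x_i,
\]
\[
  b_l \le m_i \le b_h
  \quad\Longleftrightarrow\quad
  b_l \,(x_{i+1} - x_i) \;\le\; y_{i+1} - y_i \;\le\; b_h \,(x_{i+1} - x_i).
\]
```

Each segment then contributes two linear inequalities in the y_i, which is the setting discussed in the linked question on least squares with inequality constraints.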

@vkhodygo
Contributor Author

Okay. I need to think about this more, but I think it can be done by setting up inequality constraints.

Thank you.
The attached file is one of those with strange values. The first column contains time, the last one is the data I need. I import it, skip the first row with zero values and use numpy.log10() to linearize it.

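import numpy as np
import pandas as pd
# `file` is the path to the attached data file; a raw string avoids an invalid escape in the separator
data = pd.read_csv(file, sep=r'\s+', engine='python', usecols=(0, 3), skiprows=1, names=('lag', 'msd_shift'))
lag = np.log10(data['lag'].values)
msd_shift = np.log10(data['msd_shift'].values)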

data.zip

@cjekel
Owner

cjekel commented Apr 16, 2018

Works well in version 0.2.4!
[image: temp]

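import numpy as np
import matplotlib.pyplot as plt
import pwlf
import pandas as pd
# read the lag and shifted MSD columns, skipping the first row of zeros
data = pd.read_csv('ref_msd.16.bin', sep=r'\s+', engine='python', usecols=(0, 3), skiprows=1, names=('lag', 'msd_shift'))
# take log10 of both columns to linearize the data
lag = np.log10(data['lag'].values)
msd_shift = np.log10(data['msd_shift'].values)
# fit three line segments and predict at the original x locations
myPWLF = pwlf.PiecewiseLinFit(lag, msd_shift)
myPWLF.fit(3)
yhat = myPWLF.predict(lag)
# plot the data against the fitted piecewise linear function
plt.figure()
plt.plot(lag, msd_shift, 'o')
plt.plot(lag, yhat)
plt.show()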

@vkhodygo
Contributor Author

vkhodygo commented Apr 16, 2018

Great!
However, could you please take a look at the values of slopes: they are a little bit strange:
myPWLF.slopes
array([ 1.89755427, -0.67540119, 2.00421613])

Upd. I'm pretty sure, that the plot itself is correct. I have found the old one based on the initial algorithm. It seems that the first slope is identical in both cases, however, it's not clear what gives such behaviour.
[image: msd_plot 16]

Upd2. I feel that this is related to this question.

@cjekel
Owner

cjekel commented Apr 16, 2018

I ended up creating a new function to evaluate the slopes by predicting at the break points. The results should be similar to the previous version. See d59e0be for the function.

I will push 0.2.5 to PyPI shortly.

Edit: My previous interpretation of the beta parameters as slopes was incorrect. New 0.2.5 release should give you similar slope values as you had before.
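
The idea of evaluating slopes by predicting at the break points can be sketched as follows. This is not the code from d59e0be: `slopes_from_predictions` is a hypothetical helper, and `hinge_model` is an assumed toy model (an intercept plus one hinge term per segment) chosen only to show why fitted coefficients need not equal the segment slopes.

```python
def slopes_from_predictions(predict, breaks):
    """Slope of each segment from predicted values at consecutive break points."""
    y = [predict(x) for x in breaks]
    return [(y[i + 1] - y[i]) / (breaks[i + 1] - breaks[i])
            for i in range(len(breaks) - 1)]


def hinge_model(x, beta=(1.0, 2.0, -1.5), breaks=(0.0, 1.0, 3.0)):
    """Assumed toy model: beta[0] + sum of beta[k] * max(x - break_k, 0)."""
    y = beta[0]
    for b, start in zip(beta[1:], breaks[:-1]):
        y += b * max(x - start, 0.0)
    return y


print(slopes_from_predictions(hinge_model, [0.0, 1.0, 3.0]))  # [2.0, 0.5]
```

In this toy model the coefficients after the intercept are 2.0 and -1.5 while the segment slopes are 2.0 and 0.5, so only the first coefficient coincides with a slope.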

@vkhodygo
Contributor Author

vkhodygo commented Apr 16, 2018

I've updated it but no result so far. I still get the same values:
>>> myPWLF.slopes
array([ 1.89755342, -0.67540057, 2.00421982])
>>> myPWLF.beta
array([-4.09022781, 1.89755342, -0.67540057, 2.00421982])

Edit. Sorry, everything works fine. I'm simply used to python3 (and pip3), and your upgrade instructions are for Python 2.
Edit #2: I think, that it works like it has to. You can close this issue.

@vkhodygo vkhodygo closed this as completed May 8, 2018