Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to get polynomials coefficients? #44

Closed
joaoantoniocardoso opened this issue Aug 3, 2019 · 12 comments
Closed

How to get polynomials coefficients? #44

joaoantoniocardoso opened this issue Aug 3, 2019 · 12 comments

Comments

@joaoantoniocardoso
Copy link

joaoantoniocardoso commented Aug 3, 2019

Hi,

Is there a way to get the polynomial coefficients in a similar way it was provided in numpy.polyfit?

I did the following as a simple example to illustrate what I was thinking:

import numpy as np
import matplotlib.pyplot as plt
import pwlf

# create the data
x = np.arange(10)
y = np.zeros(10)
y[:5] = 0 + 2 * x[:5]
y[5:] = 20 - 3 * x[5:]

degree = 1
segments = 2

my_pwlf = pwlf.PiecewiseLinFit(x, y, degree=degree, disp_res=True)
res = my_pwlf.fit(segments)

# predict
xHat = np.linspace(min(x), max(x), num=10000)
yHat = my_pwlf.predict(xHat)

# get polynomial coefficients
def get_coeffs(pwlf) -> np.ndarray:
    coefficients = np.array(list(zip(pwlf.slopes, pwlf.intercepts)))
    return coefficients

coeffs = get_coeffs(my_pwlf)
p1 = np.poly1d(coeffs[0])
p2 = np.poly1d(coeffs[1])

breaks = my_pwlf.fit_breaks[1]

# test polynomial
ytest = np.zeros(len(x))
for i in range(len(x)):
    if x[i] <= breaks:
        ytest[i] = p1(x[i])
    else:
        ytest[i] = p2(x[i])

plt.figure()
plt.plot(x, y, 'o', label='data')
plt.plot(xHat, yHat, '-', label='predict')
plt.plot(x, ytest, '--', label='fit test')
plt.legend()
plt.show()

Thank you

@joaoantoniocardoso joaoantoniocardoso changed the title How to get coefficients in the example "fit constants or polynomials"? How to get polynomials coefficients? Aug 3, 2019
@cjekel
Copy link
Owner

cjekel commented Aug 3, 2019

The coefficients should be stored in my_pwlf.beta, you can also take a my_pwlf.slopes but it won't make sense with higher degree terms.

@joaoantoniocardoso
Copy link
Author

joaoantoniocardoso commented Aug 3, 2019

@cjekel thank you for your attention, could you provide more information about how to interpret my_pwlf.beta for n segments? I am really confused..

As in the example that i gave above, I got:

my_pwlf.beta: [-3.91201469e-07,  2.00000039e+00, -5.00000028e+00]  
my_pwlf.fit_breaks: [0.        , 3.99999968, 9.        ]  
my_pwlf.intercepts: [-3.91201469e-07,  1.99999991e+01]  
my_pwlf.slopes: [ 2.00000039, -2.99999989]

For this my_pwlf.beta, how do I get the coefficients [2, 0] for the first segment and [-3, 20] for the second segment?

@joaoantoniocardoso
Copy link
Author

joaoantoniocardoso commented Aug 3, 2019

Maybe right now I got something - for a polynomial described as y(x) = a + b * x + c * x^2 + ..., the my_pwlf.beta is in this format:

beta = np.array([
    [???, # really didn't know what is this first element
    [b0, b1 - b0, b2 - b1 - b0, ...],
    [c0, c1 - c0, c2 - c1 - c0, ...],
    ...
]).flatten()

To illustrate how I would extract the coefficients from it:

print('Polynomial coefficient in a format of: y(x) = a + b*x + c*x^2 ...')

symbol = list('abcdefghijklmnopqrstuvwxyz')
betas = my_pwlf.beta[1:].reshape((my_pwlf.degree, my_pwlf.n_segments))
alphas = my_pwlf.intercepts

p = np.zeros((my_pwlf.degree + 1, my_pwlf.n_segments))

for segment in range(my_pwlf.n_segments):
    print(f'segment {segment}:')
    for coeff in range(my_pwlf.degree +1):
        if coeff is my_pwlf.degree:
            p[segment, coeff] = alphas[segment]
        else:
            p[segment, coeff] = betas[coeff-1,:segment+1].sum()
        print(f'\t {symbol[coeff]}{segment} = {p[segment, coeff]}')

print(p)

let me know if it is partially correct in some way

@cjekel
Copy link
Owner

cjekel commented Aug 4, 2019

For degree = 1, the beta vector is seen in Equation 4 https://github.com/cjekel/piecewise_linear_fit_py/blob/master/paper/pwlf_Jekel_Venter_v2.pdf

Essentially the order goes from left to right. The first value is the y intecept. The seocond value the slope, the third value is the change of slope of the next line, the fourth value is the change of slope between the third and second line...

I can give more detailed answer tomorrow.

@joaoantoniocardoso
Copy link
Author

joaoantoniocardoso commented Aug 4, 2019

Hi, I have refactored in a more organized way in this .ipynb gist.

So far I think my coefficients is not fully corrected because it fails at higher degree than 3 for some simple fake data. For some more complex (but smooth - continous) real data it is only working for degree = 1. Although the pwlf.predict() method is always working for these data for various degrees and segments.

I am kind new to all this regression stuff so I don't fully understand your paper yet, I would appreciate with some help to extract the correct coefficients in the classical format for polynomials.

Thanks again for your attention

@cjekel
Copy link
Owner

cjekel commented Aug 4, 2019

@cjekel thank you for your attention, could you provide more information about how to interpret my_pwlf.beta for n segments? I am really confused..

As in the example that i gave above, I got:

my_pwlf.beta: [-3.91201469e-07,  2.00000039e+00, -5.00000028e+00]  
my_pwlf.fit_breaks: [0.        , 3.99999968, 9.        ]  
my_pwlf.intercepts: [-3.91201469e-07,  1.99999991e+01]  
my_pwlf.slopes: [ 2.00000039, -2.99999989]

For this my_pwlf.beta, how do I get the coefficients [2, 0] for the first segment and [-3, 20] for the second segment?

So in this case, the first line involves the first two parameters, where .beta[0] is the y-intercept, and .beta[1] is the slope. Then the second line is defined by .beta[2], where the slope of the second line is .beta[1] + .beta[2]. This pattern continues for additional line segments.

You could then use point-slope form to obtain the equations of each line.


Now for higher degrees which is a bit more complicated. I essentially assemble all of the degree=1 terms first, then all of the degree=2 second. The code to do this is in the assemble_regression_matrix method, and looks like this:

        # Assemble the regression matrix
        A_list = [np.ones_like(x)]
        if self.degree >= 1:
            A_list.append(x - self.fit_breaks[0])
            for i in range(self.n_segments - 1):
                A_list.append(np.where(x > self.fit_breaks[i+1],
                                       x - self.fit_breaks[i+1],
                                       0.0))
            if self.degree >= 2:
                for k in range(2, self.degree + 1):
                    A_list.append((x - self.fit_breaks[0])**k)
                    for i in range(self.n_segments - 1):
                        A_list.append(np.where(x > self.fit_breaks[i+1],
                                               (x - self.fit_breaks[i+1])**k,
                                               0.0))

Let's take a look at this example which fits five line segments, each of degree=2. The mathematical equation of such a line is shown here. I've uploaded the jupyter notebook of this here.

import numpy as np
import pwlf
# generate sin wave data
x = np.linspace(0, 10, num=100)
y = np.sin(x * np.pi / 2)
# add noise to the data
y = np.random.normal(0, 0.05, 100) + y
my_pwlf_2 = pwlf.PiecewiseLinFit(x, y, degree=2)
res2 = my_pwlf_2.fitfast(5, pop=50)

The first line is defined by:

# first line
xhat = np.linspace(my_pwlf_2.fit_breaks[0], my_pwlf_2.fit_breaks[1], 100)
yhat = (my_pwlf_2.beta[0] +
        my_pwlf_2.beta[1]*(xhat-my_pwlf_2.fit_breaks[0]) +
        my_pwlf_2.beta[6]*(xhat-my_pwlf_2.fit_breaks[0])**2)

The second line is defined by:

# second segment
xhat2 = np.linspace(my_pwlf_2.fit_breaks[1], my_pwlf_2.fit_breaks[2], 100)
yhat2 = (my_pwlf_2.beta[0] +
         (my_pwlf_2.beta[1])*(xhat2-my_pwlf_2.fit_breaks[0]) +
         (my_pwlf_2.beta[2])*(xhat2-my_pwlf_2.fit_breaks[1]) +
         (my_pwlf_2.beta[6])*(xhat2-my_pwlf_2.fit_breaks[0])**2 +
         (my_pwlf_2.beta[7])*(xhat2-my_pwlf_2.fit_breaks[1])**2)

The third line is defined by:

# Third segment
xhat3 = np.linspace(my_pwlf_2.fit_breaks[2], my_pwlf_2.fit_breaks[3], 100)
yhat3 = (my_pwlf_2.beta[0] +
         (my_pwlf_2.beta[1])*(xhat3-my_pwlf_2.fit_breaks[0]) +
         (my_pwlf_2.beta[2])*(xhat3-my_pwlf_2.fit_breaks[1]) +
         (my_pwlf_2.beta[3])*(xhat3-my_pwlf_2.fit_breaks[2]) +
         (my_pwlf_2.beta[6])*(xhat3-my_pwlf_2.fit_breaks[0])**2 +
         (my_pwlf_2.beta[7])*(xhat3-my_pwlf_2.fit_breaks[1])**2 +
         (my_pwlf_2.beta[8])*(xhat3-my_pwlf_2.fit_breaks[2])**2)

The fourth line is defined by:

# Fourth segment
xhat4 = np.linspace(my_pwlf_2.fit_breaks[3], my_pwlf_2.fit_breaks[4], 100)
yhat4 = (my_pwlf_2.beta[0] +
         (my_pwlf_2.beta[1])*(xhat4-my_pwlf_2.fit_breaks[0]) +
         (my_pwlf_2.beta[2])*(xhat4-my_pwlf_2.fit_breaks[1]) +
         (my_pwlf_2.beta[3])*(xhat4-my_pwlf_2.fit_breaks[2]) +
         (my_pwlf_2.beta[4])*(xhat4-my_pwlf_2.fit_breaks[3]) +
         (my_pwlf_2.beta[6])*(xhat4-my_pwlf_2.fit_breaks[0])**2 +
         (my_pwlf_2.beta[7])*(xhat4-my_pwlf_2.fit_breaks[1])**2 +
         (my_pwlf_2.beta[8])*(xhat4-my_pwlf_2.fit_breaks[2])**2 +
         (my_pwlf_2.beta[9])*(xhat4-my_pwlf_2.fit_breaks[3])**2)

The fifth line is defined by:

# Fifth segment
xhat5 = np.linspace(my_pwlf_2.fit_breaks[4], my_pwlf_2.fit_breaks[5], 100)
yhat5 = (my_pwlf_2.beta[0] +
         (my_pwlf_2.beta[1])*(xhat5-my_pwlf_2.fit_breaks[0]) +
         (my_pwlf_2.beta[2])*(xhat5-my_pwlf_2.fit_breaks[1]) +
         (my_pwlf_2.beta[3])*(xhat5-my_pwlf_2.fit_breaks[2]) +
         (my_pwlf_2.beta[4])*(xhat5-my_pwlf_2.fit_breaks[3]) +
         (my_pwlf_2.beta[5])*(xhat5-my_pwlf_2.fit_breaks[4]) +
         (my_pwlf_2.beta[6])*(xhat5-my_pwlf_2.fit_breaks[0])**2 +
         (my_pwlf_2.beta[7])*(xhat5-my_pwlf_2.fit_breaks[1])**2 +
         (my_pwlf_2.beta[8])*(xhat5-my_pwlf_2.fit_breaks[2])**2 +
         (my_pwlf_2.beta[9])*(xhat5-my_pwlf_2.fit_breaks[3])**2 +
         (my_pwlf_2.beta[10])*(xhat5-my_pwlf_2.fit_breaks[4])**2)

And the final picture of the fit is
Pretty picture of five segments with degree=2 fit to sine wave

Hopefully this will help you understand how polynomial coefficients are handled in pwlf. When you look at the mathematical expression, pwlf.predict feels almost magical!

@cjekel
Copy link
Owner

cjekel commented Aug 4, 2019

@joaoantoniocardoso Please let me know if this helps! I really need to update the paper. I've made some significant changes to the library since the 0.4.1 release that aren't covered in the paper.

@joaoantoniocardoso
Copy link
Author

@cjekel Hi! wow, thank you so much, your explanation was very good and now I understood the standard of this beta coefficients.

In a similar math I was able to code a function to convert to the parameters in the way that I need in my application (y(x) = a + b*x + c*x^2 +...) so I have refactored this .ipynb gist to demonstrate it working.

But because it seems very complex to me to find a generalized pattern for a 'n' degree, it only works for degrees of 1 to 3, that fit very well in my application.

Again, thank you for such attention, you done an incredible job programming and explaining it, congrats!

@cjekel
Copy link
Owner

cjekel commented Aug 8, 2019

@joaoantoniocardoso I was looking at your ipynb and noticed that the very first data point has a discrepancy between predict and coeffs test? Is this an artifact in plotting, or a real discrepancy? If there is an issue, it appears it's only in the first segment.

I think perhaps the logic of test_coefs should be swapped to:

(x[i] >= breaks[segment]) and (x[i] < breaks[segment+1])

Otherwise, your implementation shows that you have a full working understanding of this library!


I had an epiphany this morning about how to use sympy to construct these equations. (If you don't mind using sympy). It should work for degrees > 3 if you ever need them.

from sympy import Symbol
from sympy.utilities import lambdify
x = Symbol('x')

def get_symbolic_eqn(pwlf_, segment_number):
    if pwlf_.degree < 1:
        raise ValueError('Degree must be at least 1')
    if segment_number < 1 or segment_number > pwlf_.n_segments:
        raise ValueError('segment_number not possible')
    # assemble degree = 1 first
    for line in range(segment_number):
        if line == 0:
            my_eqn = pwlf_.beta[0] + (pwlf_.beta[1])*(x-pwlf_.fit_breaks[0])
        else:
            my_eqn += (pwlf_.beta[line+1])*(x-pwlf_.fit_breaks[line])
    # assemble all other degrees
    if pwlf_.degree > 1:
        for k in range(2, pwlf_.degree + 1):
            for line in range(segment_number):
                beta_index = pwlf_.n_segments*(k-1) + line + 1 
                my_eqn += (pwlf_.beta[beta_index])*(x-pwlf_.fit_breaks[line])**k
    return my_eqn.simplify()

Note that you could improve the performance of this by saving the previous segment equation.

I added code to run this example at the end of this notebook

@joaoantoniocardoso
Copy link
Author

Man your solution is just amazing, I though going into this direction but I have a very limited time window for using this in a project so I didn't tried.. Don't you think that maybe it would be interest to add it to your library as a feature to be compatible with numpy.poly1d?

And yeah, you was right about test_coefs, thank you again!

@cjekel
Copy link
Owner

cjekel commented Aug 25, 2019

I don't think (right now) I want to add sympy as a requirement to this library, but maybe I'll change my mind in the future.

At the very least, I should add this as an example, since people have asked about equations before.

You bring up a good point, let me reopen this until something is added to the library.

@cjekel cjekel reopened this Aug 25, 2019
@cjekel
Copy link
Owner

cjekel commented Jan 23, 2020

Added example here e57d106

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants