## What I'm trying to do...
I am trying to fit the LPPL model to an S&P 500 dataset using the fitting method outlined in Section IV of [this paper](https://arxiv.org/pdf/1108.0099.pdf). The model is usually fit with a metaheuristic such as tabu search or a genetic algorithm. In the paper linked above, however, the authors reformulate the model so that it can be fit without metaheuristics. Some background on the LPPL model, paraphrased from the paper, follows.

___

## The LPPL Model
The LPPL (log-periodic power law) model provides a flexible framework to detect bubbles and predict regime changes of a financial asset. A bubble is defined as a faster-than-exponential increase in asset price that reflects a positive feedback loop of higher return anticipations competing with negative feedback spirals of crash expectations. The model describes a bubble price as a power law with a finite-time singularity, decorated by oscillations whose frequency increases with time. Here is an example of the LPPL model fitted to the Hang Seng Index from about 1987 to 1989.


<img src="images/hang_seng_index_87-89.png" alt="Hang Seng Index 87-89" width="500"/>


Here is the model:

$$E[\ln p(t)] = A + B(t_c - t)^m + C(t_c - t)^m \cos(\omega \ln(t_c - t) - \phi) \tag{1}$$

where:

- $E[\ln p(t)] :=$ expected log price at time $t$
- $t_c :=$ critical time (date of termination of the bubble and transition to a new regime)
- $A :=$ expected log price at the peak when the end of the bubble is reached at $t_c$
- $B :=$ amplitude of the power law acceleration
- $C :=$ amplitude of the log-periodic oscillations
- $m :=$ degree of the super exponential growth
- $\omega :=$ scaling ratio of the temporal hierarchy of oscillations
- $\phi :=$ time scale of the oscillations

The model has three components representing a bubble. The first, $A + B(t_c - t)^m$, is the hyperbolic power law: for $m < 1$ the price growth becomes unsustainable, and the growth rate becomes infinite at $t_c$. The second, $C(t_c - t)^m$, controls the amplitude of the oscillations, which drops to zero at the critical time $t_c$. The third, $\cos(\omega \ln(t_c - t) - \phi)$, models the oscillations themselves, whose frequency becomes infinite at $t_c$.
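To make the three components concrete, here is a minimal sketch that evaluates equation (1) with NumPy. The function name `lppl_log_price` and all parameter values are made up for illustration; they are not taken from the paper or from any fit.

```python
import numpy as np

def lppl_log_price(t, tc, m, omega, phi, A, B, C):
    """Evaluate equation (1), the expected log price, for times t < tc."""
    t = np.asarray(t, dtype=float)
    dt = tc - t                      # time remaining until the critical time (must be > 0)
    power = dt ** m                  # shared power-law factor (tc - t)^m
    trend = A + B * power            # component 1: hyperbolic power law
    amplitude = C * power            # component 2: oscillation amplitude, -> 0 at tc
    oscillation = np.cos(omega * np.log(dt) - phi)  # component 3: log-periodic oscillation
    return trend + amplitude * oscillation

# Illustrative (assumed) parameter values, with time measured in trading days
t = np.arange(0, 500)
y = lppl_log_price(t, tc=520.0, m=0.5, omega=8.0, phi=1.0, A=8.0, B=-0.05, C=0.005)
```

Because both $(t_c - t)^m$ terms vanish as $t \to t_c$, the curve approaches $A$ at the critical time, which matches the definition of $A$ above.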

___

## Fitting Procedure

The LPPL model has 4 nonlinear parameters $(t_c,m,\omega,\phi)$ and 3 linear parameters $(A,B,C)$. They should be chosen to minimize the difference between the model's predicted values $\ln(\hat{p})$ and the observed values $\ln(p)$. This is a minimization problem over 3 linear and 4 nonlinear parameters. To reduce its complexity, equation (1) is rewritten by introducing two new parameters:

$$C_1 = C \cos\phi, \quad C_2 = C \sin\phi \tag{2}$$

and equation (1) can now be rewritten as:

$$E[\ln p(t)] = A+B(t_c-t)^{m}+C_1(t_c-t)^{m}\cos(\omega \ln(t_c-t))+C_2(t_c-t)^{m} \sin(\omega \ln(t_c-t)) \tag{3}$$
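The step from (1) to (3) is the angle-difference identity for the cosine. Writing $x = \omega \ln(t_c - t)$ as shorthand (my notation, not the paper's):

$$C\cos(x - \phi) = C\cos\phi\,\cos x + C\sin\phi\,\sin x = C_1 \cos x + C_2 \sin x$$

Multiplying both sides by $(t_c - t)^m$ gives the last two terms of (3). After fitting, the original amplitude and phase can be recovered from

$$C = \sqrt{C_1^2 + C_2^2}, \qquad \tan\phi = \frac{C_2}{C_1}$$

with the quadrant of $\phi$ determined by the signs of $C_1$ and $C_2$.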

By doing so, model (3) now has 3 nonlinear parameters $(t_c,\omega,m)$ and 4 linear parameters $(A,B,C_1,C_2)$. The parameters are estimated from the time series by least squares, using the cost function (4):

$$F(t_c,m,\omega,A,B,C_1,C_2) = \sum_{i=1}^{N} \left[\ln p(\tau_{i}) - A - B(t_c-\tau_{i})^{m} - C_1(t_c-\tau_{i})^{m} \cos(\omega \ln(t_c-\tau_{i})) - C_2(t_c-\tau_{i})^{m} \sin(\omega \ln(t_c-\tau_{i}))\right]^{2} \tag{4}$$

where:

- $\tau_1 = t_1$
- $\tau_N = t_2$
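As a sanity check on (4), here is a small sketch of the cost function in NumPy. The names `lppl_model` and `lppl_cost` and the synthetic data are assumptions made for this example. Returning `inf` when $t_c$ is not beyond the last observation is also my own choice, made because $\ln(t_c - \tau_i)$ is undefined there.

```python
import numpy as np

def lppl_model(tc, m, omega, A, B, C1, C2, tau):
    """Right-hand side of equation (3) evaluated at the times tau (all < tc)."""
    dt = tc - tau
    f = dt ** m
    g = f * np.cos(omega * np.log(dt))
    h = f * np.sin(omega * np.log(dt))
    return A + B * f + C1 * g + C2 * h

def lppl_cost(tc, m, omega, A, B, C1, C2, tau, log_price):
    """Sum of squared residuals, equation (4)."""
    tau = np.asarray(tau, dtype=float)
    y = np.asarray(log_price, dtype=float)
    if tc <= tau.max():
        return np.inf  # assumption: reject critical times inside the fit window
    resid = y - lppl_model(tc, m, omega, A, B, C1, C2, tau)
    return float(np.sum(resid ** 2))

# Synthetic, noise-free "log prices" generated from assumed parameter values
tau = np.arange(100.0)
true = dict(tc=120.0, m=0.5, omega=8.0, A=8.0, B=-0.05, C1=0.003, C2=-0.004)
y = lppl_model(tau=tau, **true)

cost_true = lppl_cost(tau=tau, log_price=y, **true)                     # exact parameters
cost_shifted = lppl_cost(tau=tau, log_price=y, **{**true, "A": 8.1})    # A off by 0.1
```

With the exact parameters the residuals vanish. Shifting $A$ by 0.1 adds $0.1^2$ per observation to the cost.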

Slaving the 4 linear parameters $A, B, C_1, C_2$ to the 3 nonlinear parameters $t_c, \omega, m$, we obtain the nonlinear optimization problem

$$\{\hat{t_c},\hat{m},\hat{\omega}\} = \arg\min\limits_{t_c,m,\omega} F_1(t_c,m,\omega), \tag{5}$$

where the cost function $F_1(t_c,m,\omega)$ is given by

$$F_1(t_c,m,\omega) = \min\limits_{A,B,C_1,C_2} F(t_c,m,\omega,A,B,C_1,C_2) \tag{6}$$ 

The optimization problem $(\{\hat{A},\hat{B},\hat{C_1},\hat{C_2}\} = \text{arg} \min_{A,B,C_1,C_2} F(t_c,m,\omega,A,B,C_1,C_2))$ has a unique solution obtained from the matrix equation:


$$
    \begin{pmatrix}
        N & \sum{f_i} & \sum{g_i} & \sum{h_i}\\ 
        \sum{f_i} & \sum{f_i^{2}} & \sum{f_i g_i} & \sum{f_i h_i}\\
        \sum{g_i} & \sum{f_i g_i} & \sum{g_i^{2}} & \sum{g_i h_i}\\
        \sum{h_i} & \sum{f_i h_i} & \sum{g_i h_i} & \sum{h_i^{2}}\\
    \end{pmatrix}
    \begin{pmatrix}
        \hat{A}\\ 
        \hat{B}\\
        \hat{C_1}\\
        \hat{C_2}\\
    \end{pmatrix}
    =
    \begin{pmatrix}
        \sum{y_i}\\ 
        \sum{y_i f_i}\\
        \sum{y_i g_i}\\
        \sum{y_i h_i}\\
    \end{pmatrix}
    \tag{7}
$$

where:

- $y_i = \ln p(\tau_i)$
- $f_i = (t_c - \tau_i)^{m}$
- $g_i = (t_c - \tau_i)^{m} \cos(\omega \ln(t_c-\tau_i))$
- $h_i = (t_c - \tau_i)^{m} \sin(\omega \ln(t_c-\tau_i))$
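One way to read (5) to (7) in code: for every trial $(t_c, m, \omega)$, solve the 4x4 linear system (7) for the linear parameters, then hand only the three nonlinear parameters to the optimizer. The sketch below does this on synthetic data. The names `basis`, `solve_linear` and `F1`, the synthetic parameter values, and the use of Nelder-Mead for the outer search are all assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize

def basis(tc, m, omega, tau):
    """Return f_i, g_i, h_i as defined below equation (7)."""
    dt = tc - tau
    f = dt ** m
    g = f * np.cos(omega * np.log(dt))
    h = f * np.sin(omega * np.log(dt))
    return f, g, h

def solve_linear(tc, m, omega, tau, y):
    """Solve the matrix equation (7) for (A, B, C1, C2)."""
    f, g, h = basis(tc, m, omega, tau)
    X = np.column_stack([np.ones_like(tau), f, g, h])
    # X.T @ X is the 4x4 matrix of sums in (7); X.T @ y is its right-hand side
    return np.linalg.solve(X.T @ X, X.T @ y)

def F1(params, tau, y):
    """Equation (6): cost with the linear parameters slaved to (tc, m, omega)."""
    tc, m, omega = params
    if tc <= tau.max():
        return np.inf  # assumption: reject critical times inside the fit window
    A, B, C1, C2 = solve_linear(tc, m, omega, tau, y)
    f, g, h = basis(tc, m, omega, tau)
    return float(np.sum((y - A - B * f - C1 * g - C2 * h) ** 2))

# Synthetic, noise-free log prices from assumed "true" parameters
tau = np.arange(500.0)
x_true = [520.0, 0.5, 8.0]                     # tc, m, omega
lin_true = np.array([8.0, -0.05, 0.003, -0.004])  # A, B, C1, C2
f, g, h = basis(*x_true, tau)
y = lin_true[0] + lin_true[1] * f + lin_true[2] * g + lin_true[3] * h

# Linear parameters recovered at the true nonlinear parameters
coef = solve_linear(*x_true, tau, y)

# Outer search over the 3 nonlinear parameters only, equation (5)
x0 = [525.0, 0.45, 7.8]
res = minimize(F1, x0=x0, args=(tau, y), method="Nelder-Mead")
```

The optimizer now searches a 3-dimensional space instead of a 7-dimensional one. On real data, `np.linalg.lstsq` applied to `X` directly is a numerically safer alternative to forming `X.T @ X`, and the outer search may still need several starting points.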

___

## Here's what I have so far...

In [1]:
import datetime
import itertools
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import random
from scipy.optimize import minimize
import seaborn as sns

In [2]:
data = pd.read_csv("data/sp500_10.2013-10.2018.csv", index_col="Date")
data.tail()

| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2018-10-22 | 2773.939941 | 2778.939941 | 2749.219971 | 2755.879883 | 2755.879883 | 3307140000 |
| 2018-10-23 | 2721.030029 | 2753.590088 | 2691.429932 | 2740.689941 | 2740.689941 | 4348580000 |
| 2018-10-24 | 2737.870117 | 2742.590088 | 2651.889893 | 2656.100098 | 2656.100098 | 4709310000 |
| 2018-10-25 | 2674.879883 | 2722.699951 | 2667.840088 | 2705.570068 | 2705.570068 | 4634770000 |
| 2018-10-26 | 2667.860107 | 2692.379883 | 2628.159912 | 2658.689941 | 2658.689941 | 4803150000 |


In [3]:
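# organize DataSeries
date = data.index
tLen = len(data)
time = np.arange(tLen, dtype=float)   # 0, 1, ..., tLen-1 (trading-day index)
close = data["Adj Close"].to_numpy()  # adjusted closing prices as a NumPy array
# NOTE: equations (1)-(7) model the log price ln p(t); this cell uses the raw price
DataSeries = [time, close]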

# revised version of the LPPL without φ
# found on page 11 as equation (13)
# signs follow equation (3) above: every term is added, and B < 0 comes from the limits
def lppl(t, x):
    a, b, tc, m, w, c1, c2 = x
    dt = tc - t               # time remaining until the critical time; NaN if tc <= t
    f = np.power(dt, m)       # shared power-law factor (tc - t)^m
    return a + b * f + c1 * f * np.cos(w * np.log(dt)) + c2 * f * np.sin(w * np.log(dt))

# minimization func: sum of squared residuals between the data and the model
def func(x):
    delta = DataSeries[1] - lppl(DataSeries[0], x)  # lppl is vectorized over t
    return np.sum(np.power(delta, 2))

# set limits for minimization func
limits = (
    [2655, 2660],   # A : plus or minus price at Critical Time ???
    [-1, -0.1],     # B :  B < 0
    [tLen-(tLen*.2), tLen+(tLen*.2)],  # Critical Time :
    [.1, .9],       # m : 0.1 ≤ m ≤ 0.9
    [6, 13],        # ω : 6 ≤ ω ≤ 13
    [-1, 1],        # c1 : |c1| < 1 ???
    [-1, 1],        # c2 : |c2| < 1 ???
)

In [None]:
size = 5
result = []
for i in range(size):
    print("Running {}".format(i))
    seed = [random.uniform(a[0], a[1]) for a in limits]
    
    # scipy.optimize.minimize with the Nelder-Mead simplex method:
    # a derivative-free search over all 7 parameters, started from `seed`.
    # Note: `limits` is only used to draw the seed; no bounds are passed to minimize().
    cofs = minimize(fun=func, x0=seed, method='Nelder-Mead')
    
    # add to result for use in matrix equation ???
    result.append({
        'fit': func(cofs.x),
        'cof': cofs.x
    })
    print("Success: {}\nMessage: {}".format(cofs.success, cofs.message))
    print("Number of iterations: {}".format(cofs.nit))
    print("Number of evaluations of obj funcs: {}".format(cofs.nfev))
    print("-"*25)

print(result)

Running 0




Success: False
Message: Maximum number of function evaluations has been exceeded.
Number of iterations: 156
Number of evaluations of obj funcs: 1403
-------------------------
Running 1
Success: False
Message: Maximum number of function evaluations has been exceeded.
Number of iterations: 945
Number of evaluations of obj funcs: 1400
-------------------------
Running 2
Success: False
Message: Maximum number of function evaluations has been exceeded.
Number of iterations: 925
Number of evaluations of obj funcs: 1401
-------------------------
Running 3
Success: False
Message: Maximum number of function evaluations has been exceeded.
Number of iterations: 156
Number of evaluations of obj funcs: 1403
-------------------------
Running 4
