# EF3M Implementation Testing
## Abstract
This notebook tests the implementation of the EF3M algorithm using synthetic data as well as the example used in the source literature. This notebook is intended to provide convincing evidence of the accuracy of this EF3M implementation.
## Introduction
The Exact Fit of the first 3 Moments (EF3M) algorithm estimates the parameters of a mixture of Gaussian distributions from the first 5 moments of the mixture distribution, under the assumption that the mixture is composed of a known number of Gaussian distributions. The algorithm, its development, and its original implementation are described in López de Prado, M. and M. Foreman (2014): "A mixture of Gaussians approach to mathematical portfolio oversight: The EF3M algorithm." _Quantitative Finance_, Vol. 14, No. 5, pp. 913-930. The implementation tested here can be found in the `EF3M\ef3m.py` module located in this directory of the repository.

Three test cases are presented. First, the example from the algorithm's source paper is run to confirm that we can replicate the results in the literature. Second, a user-defined mixture of 2 Gaussian distributions is generated and the EF3M implementation is tasked with recovering the original mixture parameters from the raw moments of the synthetic distribution. Third, the second case is repeated for a series of randomly chosen parameter sets from which the mixture distribution is composed.

The notebook user can change the number of trials and the number of runs per trial at will, but should keep in mind that a large number of trials with hundreds or thousands of runs each will take hours or days, even with the current multiprocessing implementation.
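
To make the algorithm's input concrete, the sketch below computes the first five raw moments of a two-Gaussian mixture directly from its parameters, using the standard closed-form raw moments of a normal distribution. The function names and the parameter values are illustrative assumptions, not part of the `ef3m.py` module. With these particular values, the result matches the `moments` list used in Case 1 of this notebook.

```python
def gaussian_raw_moments(mu, sigma):
    """First five raw moments (about the origin) of N(mu, sigma**2)."""
    s2 = sigma ** 2
    return [
        mu,
        mu**2 + s2,
        mu**3 + 3 * mu * s2,
        mu**4 + 6 * mu**2 * s2 + 3 * s2**2,
        mu**5 + 10 * mu**3 * s2 + 15 * mu * s2**2,
    ]


def mixture_raw_moments(mu1, mu2, sigma1, sigma2, p):
    """Raw moments of the mixture p*N(mu1, sigma1**2) + (1-p)*N(mu2, sigma2**2).

    Each raw moment of a mixture is the weighted average of the
    corresponding raw moments of its components.
    """
    m1 = gaussian_raw_moments(mu1, sigma1)
    m2 = gaussian_raw_moments(mu2, sigma2)
    return [p * a + (1 - p) * b for a, b in zip(m1, m2)]


# illustrative (assumed) parameters: mu1, mu2, sigma1, sigma2, p
example_moments = mixture_raw_moments(-2, 1, 2, 1, 0.1)
print([round(m, 4) for m in example_moments])  # → [0.7, 2.6, 0.4, 25.0, -59.8]
```

EF3M works in the opposite direction: it starts from five such moments and searches for the five parameters that produce them.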
## Conclusion
The tests in this notebook show that using the EF3M algorithm to generate a series of estimates, and then applying a kernel density estimate to that series to pick the most likely value of each parameter, recovers the original mixture parameters to within a very close tolerance.
## Next Steps
The implementation tested here supports only distributions assumed to be composed of two Gaussian distributions. A natural next step would be to extend the algorithm to $n$ Gaussian distributions. Although such an extension would likely take exponentially more computational power, the ability to fit more complex distributions would be useful in more advanced cases.

## Test Cases

In [None]:
# imports
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from dask.diagnostics import ProgressBar
ProgressBar().register()
from scipy.stats import gaussian_kde

# import function from our custom module
from EF3M.ef3m import M2N, rawMoment

# helper function for finding most likely parameter estimates
def most_likely_parameters(data, ignore_columns='error', res=10_000):
    """
    Determines the most likely parameter estimate using a KDE and plots the
    KDE against a histogram of the estimates for each parameter.

    :param data: (pandas.DataFrame) contains parameter estimates from all runs
    :param ignore_columns: (string, list) column or columns to exclude from analysis
    :param res: (int) resolution of the kernel density estimate
    :return: (dict) labels and most likely estimates for parameters
    """
    df = data.copy()
    if isinstance(ignore_columns, str):
        ignore_columns = [ignore_columns]
    columns = [c for c in df.columns if c not in ignore_columns]
    d_results = {}
    for col in columns:
        x_range = np.linspace(df[col].min(), df[col].max(), num=res)
        kde = gaussian_kde(df[col].to_numpy())
        y_kde = kde.evaluate(x_range)
        top_value = round(x_range[np.argmax(y_kde)], 5)
        d_results[col] = top_value
        fig, ax = plt.subplots(figsize=(10,10))
        ax.plot(x_range, y_kde)
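        # sns.distplot is deprecated (seaborn >= 0.11); histplot with
        # stat='density' draws the histogram on the same scale as the KDE
        ax = sns.histplot(df[col].to_numpy(), ax=ax, bins=100, stat='density')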
        plt.show()
    return d_results
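
The core idea of the helper above, taking the grid point where a kernel density estimate peaks, can be sketched without pandas or plotting. This version hand-rolls a Gaussian KDE with Scott's rule bandwidth using only the standard library, as a stand-in for `scipy.stats.gaussian_kde`. The name `kde_mode` and the synthetic estimates are hypothetical and serve only as an illustration.

```python
import math
import random
import statistics


def kde_mode(values, res=1_000):
    """Return the grid point at which a Gaussian KDE of `values` is highest."""
    n = len(values)
    # Scott's rule for one-dimensional data: bandwidth = std * n**(-1/5)
    bandwidth = statistics.stdev(values) * n ** (-1 / 5)
    lo, hi = min(values), max(values)
    step = (hi - lo) / (res - 1)
    best_x, best_density = lo, -1.0
    for i in range(res):
        x = lo + i * step
        # unnormalized density; the constant factor does not change the argmax
        density = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


# synthetic "parameter estimates" scattered around a true value of 3.0
rng = random.Random(42)
estimates = [rng.gauss(3.0, 0.5) for _ in range(400)]
print(round(kde_mode(estimates), 3))
```

Taking the KDE peak rather than the mean makes the summary less sensitive to a minority of runs that converge far from the bulk of the estimates.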


### Case 1: Example data from source literature
López de Prado, M. and M. Foreman (2014): "A mixture of Gaussians approach to mathematical portfolio oversight: The EF3M algorithm." _Quantitative Finance_, Vol. 14, No. 5, pp. 913-930.

In [None]:
# moments and parameters from source literature
moments = [0.7, 2.6, 0.4, 25, -59.8]  # about the origin
epsilon = 10**-5
factor = 5  # this is the 'lambda' referred to in the paper

In [None]:
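n_runs = 20_000

# fit the mixture; epsilon and factor are passed explicitly here, and this
# epsilon (10**-6) is tighter than the 10**-5 defined in the cell above
m2n = M2N(moments)
df2 = m2n.mpFit(moments, epsilon=10**-6, factor=5, n_runs=n_runs, variant=2, maxIter=10_000_000)
df2 = df2.sort_values('error')  # lowest-error fits first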

results = most_likely_parameters(df2)

print(df2.head())
print(results)

### Case 2: User-defined single example
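
The cell below converts central moments to raw moments. As a sketch of the identity it relies on, write $m$ for the mean, $\mu_k$ for the $k$-th central moment (with $\mu_0 = 1$ and $\mu_1 = 0$), and $\mu'_n$ for the $n$-th raw moment. This notation is an assumption introduced here, not taken from the module. Expanding $x = (x - m) + m$ with the binomial theorem gives

$$\mu'_n = E\left[x^n\right] = E\left[\big((x - m) + m\big)^n\right] = \sum_{k=0}^{n} \binom{n}{k}\, \mu_k\, m^{\,n-k}$$

For $n = 2$ this reduces to $\mu'_2 = \mu_2 + m^2$, so with $\mu_2 = 2.11$ and $m = 0.7$ the second raw moment is $2.11 + 0.49 = 2.6$.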

In [None]:
# testing algorithm for raw moments
from math import comb  # binomial coefficient (Python 3.8+)

central_moments = [0, 2.11, -4.3740, 30.8037, -153.5857]
dist_mean = 0.7
# central_moments: the first n (1...n) central moments as a list
# dist_mean: the mean of the distribution
# ====================================
# the first n (1...n) raw moments (about the origin) are
# calculated and stored in raw_moments; the first raw moment is the mean
raw_moments = [dist_mean]
central_moments = [1] + central_moments  # add the zeroth moment so index k is the k-th moment
for n in range(2, len(central_moments)):
    moment_n_parts = []
    for k in range(n+1):
        # k-th term of the binomial expansion of E[((x - mean) + mean)**n]
        sum_part = comb(n, k) * central_moments[k] * dist_mean**(n-k)
        moment_n_parts.append(sum_part)
    moment_n = sum(moment_n_parts)
    raw_moments.append(moment_n)
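
The introduction describes Case 2 as generating a user-defined mixture of two Gaussians and recovering its parameters from the raw moments of the synthetic data. A minimal sketch of the data-generation half follows, using only the standard library. The helper names `sample_mixture` and `sample_raw_moments` and the parameter values are hypothetical and chosen for illustration. The resulting list has the same layout as the `moments` list in Case 1.

```python
import random


def sample_mixture(mu1, mu2, sigma1, sigma2, p, size, seed=0):
    """Draw `size` samples from p*N(mu1, sigma1**2) + (1-p)*N(mu2, sigma2**2)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(size):
        # pick the first component with probability p, otherwise the second
        if rng.random() < p:
            samples.append(rng.gauss(mu1, sigma1))
        else:
            samples.append(rng.gauss(mu2, sigma2))
    return samples


def sample_raw_moments(samples, order=5):
    """Return the first `order` raw moments (about the origin) of the samples."""
    n = len(samples)
    return [sum(x**k for x in samples) / n for k in range(1, order + 1)]


# illustrative (assumed) mixture parameters
samples = sample_mixture(mu1=-2.0, mu2=1.0, sigma1=2.0, sigma2=1.0, p=0.1, size=200_000)
synthetic_moments = sample_raw_moments(samples)
print([round(m, 3) for m in synthetic_moments])
```

Because the moments are estimated from a finite sample, they carry sampling noise that grows with the order of the moment, so the recovered parameters should be expected to match the true ones only approximately.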




### Case 3: Series of randomly generated examples