In [20]:
import ticktack
from jax.numpy import array, pi, exp, sin
from chainconsumer import ChainConsumer
import matplotlib.pyplot as plt

The basic question is: what is the probability of detecting consecutive events, given the distribution of the data? The first step is therefore to determine that distribution. This will be done by resampling the points after the event has been removed.
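A minimal sketch of the resampling step, assuming a plain bootstrap (resampling with replacement) of the event-free residuals. The helper name `resample_residuals` and the residual values are hypothetical and are not taken from `Miyake12.csv`; plain NumPy is used here so the sketch does not depend on `ticktack`.

```python
import numpy as np

def resample_residuals(residuals, n_draws, seed=0):
    """Bootstrap the mean and standard deviation of event-free residuals."""
    rng = np.random.default_rng(seed)
    residuals = np.asarray(residuals, dtype=float)
    # Each row holds the indices of one resample, drawn with replacement
    idx = rng.integers(0, residuals.size, size=(n_draws, residuals.size))
    samples = residuals[idx]
    return samples.mean(axis=1), samples.std(axis=1, ddof=1)

# Hypothetical residuals (made-up numbers, for illustration only)
residuals = np.array([0.8, -1.1, 0.3, -0.4, 1.5, -0.9, 0.2, -0.6])
means, stds = resample_residuals(residuals, n_draws=1000)
```

The spread of `means` and `stds` then gives a rough idea of how well the noise level is constrained by the points that remain once the event is removed.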

I need a dictionary to keep track of the units used by the different models.

In [21]:
model_units = { # This dictionary contains the units for the fluxes and production function
    "Guttler14": {  # Units of the Guttler 2014 paper
        "production_rate_units": "atoms/cm^2/s",    # Units of the production rate 
        "flow_rate_units": "Gt/yr"                  # Units of the fluxes
    },
    "Brehm21": {    # Units used by the Brehm et al. paper
        "production_rate_units": "",    # Units of the production rate 
        "flow_rate_units": ""           # Units of the fluxes
    },
    "Buntgen18": {  # The units used by the Buntgen 2018 paper
        "production_rate_units": "",    # Units of the production function
        "flow_rate_units": ""           # Units of the fluxes 
    },
    "Miyake17": {   # Units used by the Miyake et al. 2017 paper
        "production_rate_units": "",    # Units of the production function 
        "flow_rate_units": ""           # Units of the fluxes.
    }
}

The function below will eventually take a `shape` parameter as well, but first I want to get it running. Damn, this really is well suited to a class structure, since then I could call `self.set_annual_samples()`. I need to check whether this has actually been implemented. The answer is __No__. I need to add the growth seasons to `model_units` (which I might just rename `models`). This will add (probably two) extra fields: `hemisphere_model` (bool) and `growth_seasons`. Alternatively, the growth seasons could go in a further nested dictionary like `hemispheres = {"NH_growth": array([]), "SH_growth": array([])}`.

In [28]:
def get_production_function(model: str, data: str):
    """
    Parameters:
        model: `str` - The `CarbonBoxModel` that is to be used
        data: `str` - The dataset that the production function is to be fitted to (.csv)
    Returns:
        production: `function` - The ideal production function 
    """
    params = array([775., 1./12, pi/2., 81./12]) # Initial parameters of the production function #? This needs major work.

    cbm = ticktack.load_presaved_model( # Generating the CarbonBoxModel using ticktack
        model,  # Name of the model as looped from the models dictionary 
        production_rate_units=model_units[model]["production_rate_units"], 
        flow_rate_units=model_units[model]["flow_rate_units"]
    )
    cbm.compile()   # Generating the transfer operator via the compile() command 

    #? Need to double check that this will work
    mcmc_model = ticktack.fitting.SingleFitter(cbm)     # Fitting a model 
    mcmc_model.load_data(data)                          # Loading the data into the model 
    mcmc_model.prepare_function(model="simple_sinusoid")# Generating the simple sin model 

    sampler = mcmc_model.MarkovChainSampler(
        params, # Initial position within the parameters space    
        likelihood=mcmc_model.log_likelihood,   # likelihood function 
    )

    #? So I need to do some editing on this section.
    #? I also need to do some profiling and see if I can't speed things up to fuel my ambition.

    labels = ["Start Date (yr)", "Duration (yr)", r"$\phi$ (yr)", "Area"]   # Parameters of the model (raw string avoids the invalid `\p` escape)
    c = ChainConsumer().add_chain(sampler, walkers=20, parameters=labels)   # Running samples

    for parameter in labels:
        c.analysis.get_parameter_summary_max(*c.get_mcmc_chains(), parameter=parameter)

In [None]:
def production(t):
    """
    The best fit production function as estimated using `mcmc`
    """
    middle = truth[0] + truth[1] / 2.0
    height = truth[3] / truth[1]
    gauss = height * exp(- ((t - middle) / (1. / 1.93516 * truth[1])) ** 16.)
    sine = mcmc_model.steady_state_production + \
        0.18 * mcmc_model.steady_state_production * \
            sin(2 * pi / 11 * t + truth[2])
    return sine + gauss

Loading the data twice is inefficient. I may look into avoiding the `SingleFitter` to get around this, but that is not ideal.

In [26]:
def get_residual_distribution(production, data):
    return None

NameError: name 'function' is not defined

In [None]:
guttler_production = get_production_function("Guttler14", "Miyake12.csv")

So from here the plan is first to find the ideal sinusoidal model based on the data points up until the event. I then run this model and subtract it from the data, which lets me determine the signal-to-noise ratio of the entire data set instead of a subset. Then I simulate a series of events and minimize the $\chi^{2}$ to determine the best-fitting rectangular event. In addition, I will have some criterion on the $\chi^{2}$ for when an event counts as detected, allowing me to identify the minimum parameters of a detectable event. I will then make a contour plot of the $\chi^{2}$ in the two-dimensional parameter space. This will be the final product.
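A rough sketch of the $\chi^{2}$ grid described in this plan, run on synthetic residuals rather than real data. Everything here is an assumption made for illustration: the two scanned parameters are taken to be duration and area, the start date is held fixed, the noise is Gaussian with a known `sigma`, and the detection threshold of 9 is arbitrary. It uses plain NumPy and does not touch `ticktack`.

```python
import numpy as np

def rectangle(t, start, duration, area):
    """Rectangular event of height area/duration on [start, start + duration)."""
    inside = (t >= start) & (t < start + duration)
    return np.where(inside, area / duration, 0.0)

def chi2(data, model, sigma):
    """Chi-squared of the model against data with uniform uncertainty sigma."""
    return np.sum(((data - model) / sigma) ** 2)

rng = np.random.default_rng(1)
t = np.arange(760.0, 790.0, 1.0 / 12.0)   # hypothetical monthly time grid (yr)
sigma = 0.5                                # hypothetical noise level
start = 775.0                              # start date held fixed (assumption)

# Synthetic residuals: one injected event (duration 1 yr, area 10) plus noise
data = rectangle(t, start, 1.0, 10.0) + rng.normal(0.0, sigma, t.size)

# Scan the (duration, area) plane
durations = np.linspace(0.25, 3.0, 12)
areas = np.linspace(0.0, 20.0, 21)
grid = np.array([[chi2(data, rectangle(t, start, d, a), sigma)
                  for a in areas] for d in durations])

# Best-fitting rectangle and a simple detection criterion
i, j = np.unravel_index(np.argmin(grid), grid.shape)
best_duration, best_area = durations[i], areas[j]
chi2_null = chi2(data, np.zeros_like(t), sigma)   # model with no event
detected = (chi2_null - grid[i, j]) > 9.0         # arbitrary threshold
```

With `matplotlib` already imported as `plt`, `plt.contourf(areas, durations, grid)` would give the contour plot over this plane.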

The next step is to minimize the $\chi^{2}$. After working so hard to get the `mcmc` running, I'm going to try to use it. The problem was that I was not using ticktack; now I can immensely simplify the code by using the `SingleFitter` implementations provided by Q.

To minimize the $\chi^{2}$ statistic I will need to isolate the data before the event and use this to determine the $\sin$ parameters. This is going to require modifications to the `prod` function and also will require a better understanding of the plotting backend.
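One way the pre-event fit could look, as a sketch only: with the period fixed at 11 years (the value used in `production` above), the sinusoid is linear in its offset, sine and cosine coefficients, so ordinary least squares is enough. The helper `fit_sinusoid`, the synthetic series and the event date are all hypothetical, and this is not how `SingleFitter` does its fit.

```python
import numpy as np

def fit_sinusoid(t, y, period=11.0):
    """Least-squares fit of y = offset + amplitude * sin(omega * t + phase)."""
    omega = 2.0 * np.pi / period
    # Linear model: offset + a * sin(omega t) + b * cos(omega t)
    design = np.column_stack([np.ones_like(t),
                              np.sin(omega * t),
                              np.cos(omega * t)])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    offset, a, b = coeffs
    # a = amplitude * cos(phase), b = amplitude * sin(phase)
    return offset, np.hypot(a, b), np.arctan2(b, a)

# Synthetic, noise-free series with a step-like event from 775 onwards
t = np.arange(700.0, 800.0, 1.0)
event_start = 775.0
y = 2.0 + 0.36 * np.sin(2.0 * np.pi / 11.0 * t + 0.7)
y = y + np.where(t >= event_start, 5.0, 0.0)

# Fit only the points before the event
before = t < event_start
offset, amplitude, phase = fit_sinusoid(t[before], y[before])
```

Subtracting `offset + amplitude * sin(omega * t + phase)` from the full series would then leave the event plus noise as the residual.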

It will be easiest to modify the file. It is nice, I must say, to have got fucking nowhere all day. I'm going to go and pack now, after merging these changes on GitHub.