<img src="../../shared/img/banner.svg" width=2560></img>

# Bayesian Inference 03 - Flexibility of Bayesian Inference

In [None]:
%matplotlib inline

In [None]:
import sys

sys.path.append("../../")

from shared.src import quiet
from shared.src import seed
from shared.src import style

In [None]:
from pathlib import Path
import random

from IPython.display import HTML, Image
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc3 as pm
import theano.tensor as tt
import seaborn as sns
import scipy.stats

In [None]:
sns.set_context("notebook", font_scale=1.7)

In [None]:
import shared.src.utils.util as shared_util

import utils.daft
import utils.plot

In [None]:
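def arrays_to_scalars(MAP):
    """Convert zero-dimensional arrays in a find_MAP result to plain Python scalars."""
    for key, value in MAP.items():
        # compare shapes with ==, not `is`; .item() replaces the deprecated np.asscalar
        if isinstance(value, np.ndarray) and value.shape == ():
            MAP[key] = value.item()
    return MAP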

# Once You've Learned to Specify Models, Inference Becomes Easy

In the traditional approach,
each change to the model requires a new statistical test
or the verification/extension of an old one.

In the Bayesian Monte Carlo approach,
each change to a model either requires just changing what we do with our samples
or changing our model specification.

Within the context of our difference-in-means problem,
the traditional approach makes all of the questions below
difficult to answer,
essentially requiring a trip back to the blackboard or Wikipedia.

Can I use the standard $t$-test if...

- ...my standard deviations are not the same?

Yes, the test will still perform well, but only if the groups have roughly the same number of observations.
Otherwise, you must use [Welch's modified $t$ test](https://en.wikipedia.org/wiki/Welch%27s_t-test),
aka `scipy.stats.ttest_ind` with argument `equal_var=False`.

Note: what does it mean for a test to "perform well"? The first-order answer is that, when there is truly no difference between the groups, the distribution of $p$ stays uniform, meaning that applying a threshold of $\alpha$ to $p$ gives a false positive rate equal to $\alpha$. The harder question is what happens to the power. The answers to all of these questions are in terms of the false positive rate.
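
To make the calibration claim concrete, the sketch below simulates many experiments in which there is no true difference and checks how often $p$ falls below $\alpha$. It is an illustration under stated assumptions: it uses a two-sample $z$-test with known unit variance as a stand-in for the $t$-test, so that it needs nothing beyond the standard library, and the sample size, number of simulations, and function names are arbitrary choices.

```python
import random
from statistics import NormalDist, fmean

def z_test_p_value(group_a, group_b, sigma=1.0):
    """Two-sided p-value for a difference in means when sigma is known."""
    n_a, n_b = len(group_a), len(group_b)
    standard_error = sigma * (1 / n_a + 1 / n_b) ** 0.5
    z = (fmean(group_a) - fmean(group_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(alpha=0.05, n=30, n_simulations=4000, seed=0):
    """Fraction of experiments with no true difference that give p < alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_simulations):
        group_a = [rng.gauss(0, 1) for _ in range(n)]
        group_b = [rng.gauss(0, 1) for _ in range(n)]
        if z_test_p_value(group_a, group_b) < alpha:
            rejections += 1
    return rejections / n_simulations

rate = false_positive_rate()
print(f"false positive rate at alpha=0.05: {rate:.3f}")
```

If the test is well calibrated, the printed rate should land close to 0.05. Repeating the simulation with unequal group sizes or standard deviations is one way to probe the questions below.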

- ...the groups have different sizes?

Yes, but only if the groups have roughly the same standard deviation. See above.

- ...my likelihood changes from `Normal`?

Yes, [so long as your sample size is large enough](https://thestatsgeek.com/2013/09/28/the-t-test-and-robustness-to-non-normality/). How large is large enough? That's another hard question.

- ...I care about differences in medians?

Yes, if the likelihood is symmetric, because the mean is then the same as the median.
If the mean and median are different,
which is likely why you're interested in the median rather than the mean, then you need a new test,
the [Mann-Whitney $U$ test](https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test#Comparison_to_Student's_t-test).

And all of that is while still assuming we are working with comparing the "center value" of just two groups in terms of a single measurement!

## All of these questions have simple answers in the Bayesian approach:

What do I do if...

- ...my standard deviations are not the same?

Include that in your model! Just make sure there's a different standard deviation variable for each group,
as in some of the examples from the other set of slides.

- ...the groups have different sizes?

Include that in your model!
pyMC will automatically handle combining information across groups of varying sizes.

- ...my likelihood changes from `Normal`?

Include that in your model! Just switch the likelihood component and possibly redefine some of the hidden parameter variables.

- ...I care about differences in medians?

Include that in your model! Add a `pm.Deterministic` for calculating that value, or just calculate it afterwards!
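
Computing a statistic "afterwards" amounts to applying a function to every posterior sample. The sketch below assumes a hypothetical model with a log-normal likelihood, whose median is $e^{\mu}$, and uses randomly generated stand-ins for the posterior samples of each group's $\mu$ (in practice these would come from `pm.sample`). All names and numbers are illustrative.

```python
import math
import random

rng = random.Random(0)

# Stand-ins for posterior samples of each group's log-scale location;
# in a real analysis these would come from the trace, not a random generator.
mu_a_samples = [rng.gauss(1.0, 0.1) for _ in range(5000)]
mu_b_samples = [rng.gauss(1.3, 0.1) for _ in range(5000)]

# The median of a log-normal distribution with location mu is exp(mu),
# so each posterior sample yields one difference in medians.
median_differences = [
    math.exp(mu_b) - math.exp(mu_a)
    for mu_a, mu_b in zip(mu_a_samples, mu_b_samples)
]

# Posterior probability that group B has the larger median.
prob_b_larger = sum(d > 0 for d in median_differences) / len(median_differences)
print(f"P(median_B > median_A | data) = {prob_b_larger:.3f}")
```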

## The secret is putting modeling first and statistics second 

All of these questions are about how the model behaves when its assumptions are violated.

When you just call a function like `scipy.stats.ttest_ind`,
it's unclear what model is being used, what its assumptions are,
and how to change that model when those assumptions are violated.

It is no different, fundamentally, from looking up the values in a table.
A table cannot tell you how its values would change
if the assumptions used to generate them were different.

When you write out a pyMC model, all of the assumptions are right there, in the program you have written,
and can be changed on the fly, as can the statistic you calculate.

If you want to see what happens when the model is wrong,
you can simulate data from the alternate model and see how your model behaves.

# This flexibility pays major dividends when specifying models

Rather than sticking only to the "off-the-shelf" models we've carefully studied and reviewed in a statistics course,
we can build them from scratch to answer the exact question we are interested in.

## Example: Donald Trump and Hip-Hop

Throughout the 90s,
Donald Trump's name appeared repeatedly in hip-hop lyrics,
especially those by East Coast rappers,
where his name was synonymous with wealth and status.

Trump's rise to the presidency has changed how many people feel about him.
Can we quantify when and to what degree this occurred in hip-hop?

Data collection by [fivethirtyeight](https://fivethirtyeight.com),
downloaded from
[kaggle](https://www.kaggle.com/fivethirtyeight/fivethirtyeight-hip-hop-candidate-lyrics-dataset),
see [this fivethirtyeight post](https://projects.fivethirtyeight.com/clinton-trump-hip-hop-lyrics/)
for more.

Note: the model used here is inspired by two examples in _Bayesian Methods for Hackers_:
1. The "texting switchpoint" example, from [Chapter 1](https://nbviewer.jupyter.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter1_Introduction/Ch1_Introduction_PyMC3.ipynb).
2. The "Challenger O-Ring" example, from [Chapter 2](https://nbviewer.jupyter.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/Ch2_MorePyMC_PyMC3.ipynb).

In [None]:
data_folder = Path(".") / "data"

trump_data = pd.read_csv(data_folder / "trump_hiphop_sentiment_data.csv", index_col=0)

print(trump_data.sample(10))

Let's eliminate the "neutral" category
and treat sentiment as binary: `"positive"` (`1`) or `"negative"` (`0`)

In [None]:
def sentiment_to_posneg(sentiment):
    if sentiment == "negative":
        return 0
    elif sentiment == "positive":
        return 1
    else:
        return np.nan

trump_data["positive"] = trump_data["sentiment"].apply(sentiment_to_posneg)

In [None]:
f, ax = plt.subplots(figsize=(12, 6))
utils.plot.plot_raw_data_sentiment(trump_data, ax)

Darker circles mean more lyrics with that sentiment were observed in that year.

### Switchpoint Model

For each observation (pair of year and sentiment), there is some unknown chance $p$ that it was positive.

One reasonable model for this data is that the chance $p$ switched, one year,
from being high to low.

So our model will have three latent, or hidden, variables
that give rise to the unknown variable $p$:

- Chance of positive sentiment before switch, `p_1` / $p_1$
- Chance of positive sentiment after switch, `p_2` / $p_2$
- Time that switch occurred, `switchpoint` / $s$

The resulting model looks like this:

In [None]:
utils.daft.make_switchpoint_model()

#### What are some reasonable priors for these variables?

- $p_1$/$p_2$: unknown number between 0 and 1.
- $s$: unknown integer between 1989 and 2016

If we want to express no strong prior beliefs,
we might say `Uniform` for the $p_i$ and
`DiscreteUniform` for $s$.

#### What is our likelihood?
- obs: whether sentiment in a lyric was positive or negative 

It's a binary variable, so it's a `Bernoulli` likelihood,
with $p$ set by `pm.math.switch`ing between $p_1$ and $p_2$ depending on the date.
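
One way to check that this specification makes sense is to run the model "forward": draw the hidden variables from their priors, then simulate observations. The sketch below does this in plain Python rather than pyMC. The function name and the example years are illustrative assumptions.

```python
import random

def simulate_switchpoint_data(years, rng):
    """Draw hidden variables from the priors, then simulate one dataset."""
    p_1 = rng.uniform(0, 1)                            # Uniform prior
    p_2 = rng.uniform(0, 1)                            # Uniform prior
    switchpoint = rng.randint(min(years), max(years))  # DiscreteUniform prior
    observations = []
    for year in years:
        # the "switch": p_2 applies from the switchpoint onward, p_1 before
        p = p_2 if year >= switchpoint else p_1
        observations.append(1 if rng.random() < p else 0)  # Bernoulli draw
    return {"p_1": p_1, "p_2": p_2, "switchpoint": switchpoint,
            "observations": observations}

rng = random.Random(0)
years = [1989, 1995, 2001, 2007, 2013, 2015, 2016, 2016]
simulated = simulate_switchpoint_data(years, rng)
print(simulated["switchpoint"], simulated["observations"])
```

Inference runs this story in reverse: given the observations, it asks which values of the hidden variables could plausibly have produced them.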

#### What would we like to infer?

For one, the parameter $s$, which determines when the change happened.
If this value has a wide posterior, then
the change was perhaps gradual or small.

For another,
the difference between $p_2$ and $p_1$,
which captures whether the change was positive or negative and to what degree.
If $p_2 - p_1$ is small in magnitude, or has equal chance of being positive or negative,
then the change was minimal or nonexistent.

Lastly, we might be interested in the relationships between our parameters and fixed quantities:
what is the chance that the final value of sentiment is below 50% positive?
what is the chance that the switch happened after 2012?

In [None]:
trump_data_sub = trump_data.dropna()  # remove all the na values before passing to pyMC
dates = trump_data_sub["album_release_date"]
positive = trump_data_sub["positive"]

with pm.Model() as sentiment_switch_model:
    p1 = pm.Uniform("$p_1$", lower=0, upper=1)  # prior
    p2 = pm.Uniform("$p_2$", lower=0, upper=1)  # prior
    switchpoint = pm.DiscreteUniform(        # prior
        "switchpoint", lower=min(dates), upper=max(dates))  
    
    p = pm.math.switch(shared_util.to_pymc(dates) >= switchpoint, p2, p1)
    
    # likelihood
    positive_sentiment = pm.Bernoulli("positive_sentiment", p, observed=positive)
    
    # statistic / observation of interest
    change_in_sentiment = pm.Deterministic("$p_2 - p_1$", var=p2 - p1)

In [None]:
with sentiment_switch_model:
    sentiment_trace = pm.sample(draws=2500, target_accept=0.9, chains=4)

Above, we draw samples and save them as a trace; below, we put them into a dataframe.

In [None]:
switchpoint_samples = shared_util.samples_to_dataframe(sentiment_trace)

In [None]:
f, axs = plt.subplots(figsize=(12, 12), nrows=3, sharex=True)
sns.distplot(switchpoint_samples["$p_1$"], ax=axs[0], color="C2");
sns.distplot(switchpoint_samples["$p_2$"], ax=axs[1], color="C2");
sns.distplot(switchpoint_samples["$p_2 - p_1$"], ax=axs[2], color="C2");
axs[-1].set_xlim([-1, 1]); axs[0].set_title("Posteriors for Switchpoint Model"); plt.tight_layout();

Looking at the posterior, it seems that sentiment most likely _dropped_ at the switchpoint,
from about 90% positive to about 30% positive.

The total drop was about 60 percentage points, but could be as low as 30 or as high as 70.
Most of the uncertainty comes from the value after the switchpoint:
the posterior for $p_2$ is much wider.

Our posterior for the time of the switch is very tight: it almost certainly happened in 2015.

In [None]:
f, ax = plt.subplots(figsize=(12, 4))
sns.distplot(switchpoint_samples["switchpoint"], kde=False, norm_hist=True, bins=range(1989, 2019),
             color="C2");
ax.set_title("Posterior for Switchpoint Model"); plt.tight_layout();

### For simplicity, could use `plot_posterior` instead

In [None]:
pm.plot_posterior(sentiment_trace, figsize=(12, 12),
                  varnames=["$p_1$", "$p_2$", "switchpoint",  "$p_2 - p_1$"],
                  text_size=24, color="C2");

### But don't be afraid to make a special visualization for your model!

The generic `plot_posterior` visualization is good,
but you can usually do better if you make your own
plot that directly visualizes data and uncertainty estimates together. 

In [None]:
f, ax = plt.subplots(figsize=(12, 6))
utils.plot.plot_raw_data_sentiment(trump_data_sub, ax)

# plot median value for each prediction
utils.plot.plot_switchpoint(switchpoint_samples["$p_1$"].median(),
                 switchpoint_samples["$p_2$"].median(),
                 switchpoint_samples["switchpoint"].median(),
                 range(1989, 2019), ax, lw=4, color="C0", zorder=4);

# plot each sample
for _, sample in switchpoint_samples.sample(frac=0.05).iterrows():
    utils.plot.plot_switchpoint(sample["$p_1$"], sample["$p_2$"], sample["switchpoint"],
                                range(1989, 2019), ax, alpha=0.05, color="C1");
    
ax.legend(
[matplotlib.lines.Line2D([], [], color="C1"),
 matplotlib.lines.Line2D([], [], color="C0", lw=4)],
["Sampled Predictions", "Median Prediction"]);

The median of each parameter is used to select a single model to plot, labeled the "Median Prediction".

The predictions of many samples from the posterior are plotted transparently over the median prediction.
Where the color is darker, more samples were making approximately the same prediction,
whereas where it is lighter, fewer samples were making that prediction.

The median and the mean across samples are not always good choices for single parameters,
but here they seem to fall in a region of high probability under the posterior:
a place where the color is dark in the plot above.

Rather than just viewing the posteriors of the parameters,
it's good to compare the predictions directly to the data,
in some way that makes visually apparent where there's high and low uncertainty.

The fact that all of the samples change heights in 2015 indicates
that there's essentially no uncertainty in the switchpoint,
while the fact that the samples are very spread out afterwards
indicates that there's a good deal of uncertainty in the value after the switch.

## We can also directly compute posterior probabilities from samples

In [None]:
(switchpoint_samples["$p_2$"] < 0.5).mean()  # posterior belief that p_2 is below half

We can check the posterior probability of anything we can express as a boolean function of a sample:

In [None]:
# posterior belief that positive sentiment dropped by a larger amount than remained afterwards
(np.abs(switchpoint_samples["$p_2 - p_1$"]) > switchpoint_samples["$p_2$"]).mean()
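
Both cells above follow the same pattern: evaluate a true/false condition on every sample and take the mean. Here is a minimal standard-library sketch of that pattern, using made-up stand-in samples (drawn from `Beta` distributions purely for illustration) and a hypothetical helper named `posterior_probability`:

```python
import random

def posterior_probability(samples, condition):
    """Fraction of posterior samples for which `condition` holds."""
    return sum(1 for sample in samples if condition(sample)) / len(samples)

rng = random.Random(0)
# Stand-in samples; real ones would be rows of the samples dataframe.
samples = [
    {"p_1": rng.betavariate(18, 2), "p_2": rng.betavariate(3, 7)}
    for _ in range(4000)
]

prob_p2_below_half = posterior_probability(samples, lambda s: s["p_2"] < 0.5)
prob_drop = posterior_probability(samples, lambda s: s["p_2"] < s["p_1"])
print(prob_p2_below_half, prob_drop)
```

Any question that can be phrased as a condition on a single sample, such as "did the switch happen after 2012?", fits the same helper.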

## What about when you absolutely must produce a point estimate?

The most natural way to talk about unknown quantities in Bayesian inference is with distributions:
for each possible value of the unknown quantity,
I give a probability.

More broadly, for a given inference, I state the probability that it is true under my posterior,
as computed in the section immediately above.

The second most natural way to talk about unknowns is with _intervals_,
as in the highest posterior density interval.

This preserves some sense of uncertainty, but it is easier to work with than a full distribution:
it's only two numbers, instead of a number for each outcome or a function or a bag of samples.

This is what we do when we use `plot_posterior`.
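
Given samples, one common way to approximate a highest posterior density interval is to take the narrowest window that contains the desired fraction of the sorted samples. The sketch below implements that idea. It is an illustration rather than pyMC's implementation, and the function name and the `0.94` default are assumptions.

```python
import math

def hpd_interval(samples, mass=0.94):
    """Narrowest interval containing at least `mass` of the samples."""
    ordered = sorted(samples)
    n_inside = math.ceil(mass * len(ordered))  # samples the interval must cover
    best_start = min(
        range(len(ordered) - n_inside + 1),
        key=lambda i: ordered[i + n_inside - 1] - ordered[i],
    )
    return ordered[best_start], ordered[best_start + n_inside - 1]

# Nine tightly clustered values and one outlier: a 90% interval
# should cover the cluster and leave the outlier out.
example = [0, 1, 2, 3, 4, 5, 6, 7, 8, 100]
low, high = hpd_interval(example, mass=0.9)
print(low, high)
```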

It is very unnatural, in Bayesian inference, to talk about unknown values with a single "best guess". A single "best guess" of an unknown quantity is called a _point estimate_.

However, it is sometimes necessary to give a point estimate:
someone has a proverbial "gun to your head" and is demanding it.

- Non-quantitative audiences will often more easily understand a point estimate
- The dominance of frequentist techniques means that scientific audiences often expect a point estimate
- For high-dimensional data, distributions become prohibitively expensive

For example, the median prediction in the visualization above is a point estimate, selected to summarize the predictions of the samples.

### If you have no other choice, use the MAP value

The closest thing to a Bayesian approach to providing point estimates is the **maximum a posteriori** (or MAP) value.

### The MAP is the setting of all of the unknown variables that has the highest probability under the posterior:

That is, the values whose probability is _maximal_ after seeing the data, or _a posteriori_.

Choose params to make $$p(\text{params} \vert \text{data})$$ as high as possible.

For a variety of reasons, some of them numerical,
pyMC and other computational libraries instead do the equivalent:

Choose params to make $$\log p(\text{params} \vert \text{data})$$ as high as possible.
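
Here is a small sketch of why the logarithm helps and why it gives the same answer. It assumes a toy problem (a single unknown chance $p$ with a `Uniform` prior and `Bernoulli` observations) and finds the maximum with a plain grid search, which is not what pyMC does. All names and numbers are illustrative.

```python
import math

n_positive, n_total = 350, 500  # toy data: 350 positive out of 500 observations

def log_posterior(p):
    """Log of the unnormalized posterior: Uniform prior, Bernoulli likelihood."""
    return n_positive * math.log(p) + (n_total - n_positive) * math.log(1 - p)

# Multiplying many probabilities underflows to exactly zero in floating point...
direct_product = 0.01 ** 500
# ...while the equivalent sum of logs is an ordinary number.
sum_of_logs = 500 * math.log(0.01)

# The log is increasing, so maximizing log p picks out the same parameter value.
grid = [i / 1000 for i in range(1, 1000)]
p_map = max(grid, key=log_posterior)
print(direct_product, sum_of_logs, p_map)
```

With a `Uniform` prior, the maximum lands at the observed fraction of positives, 350 / 500.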

### The pyMC function for this is `find_MAP`

Rather than using sampling, this function uses _optimization_:
starting from an initial guess of the parameter values,
it iteratively makes targeted, small changes to their values
until no small changes can increase the probability any further.

In [None]:
with sentiment_switch_model:
    sentiment_MAP = arrays_to_scalars(pm.find_MAP(start=sentiment_trace[-1]))

Notice the `logp` on the left in the output: that's the log-probability we are maximizing.

The most important argument here is `start`,
which is used to tell `find_MAP` what its initial guess should be.
If you have samples available, then one of them is usually a good choice for a starting point.

If you don't provide a `start` value, pyMC will try to guess where to start from.
This often goes poorly.
We'll see what the consequences of a bad initial guess are in the next example.

In [None]:
ax = utils.plot.plot_MAP(sentiment_MAP, switchpoint_samples, "$p_2 - p_1$", compare_mean=True);


The MAP is typically located at a "peak" of the distribution: a _mode_.
If the distribution has more than one mode, MAP inference can run into problems.
A distribution with one mode is called _unimodal_.

For symmetric distributions with only one mode, the mean and the mode are the same.
Since the distribution above is approximately symmetric and unimodal,
the mean and the mode are approximately the same.

But note that `find_MAP` is maximizing the _joint probability_
of all of the parameters at once.
Therefore, it is at a mode of the _joint distribution_,
rather than a mode of any of the _marginal distributions_,
or the distributions of any single variable.
The plot above is of a marginal distribution: the values of just $p_2 - p_1$.
It is very common for modes of the joint distribution to line up with
modes of the marginal distributions.
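
They need not always line up, though. The made-up joint distribution below, over two discrete variables `a` and `b`, has its joint mode at a value of `a` that is not the mode of the marginal distribution of `a`. The numbers are chosen purely for illustration.

```python
from collections import defaultdict

# Joint probabilities for two discrete variables (a, b); they sum to 1.
joint = {
    (0, 0): 0.4,
    (1, 1): 0.3,
    (1, 2): 0.3,
}

# The joint mode is the single most probable (a, b) pair.
joint_mode = max(joint, key=joint.get)

# The marginal distribution of a sums the joint over all values of b.
marginal_a = defaultdict(float)
for (a, _b), probability in joint.items():
    marginal_a[a] += probability
marginal_mode_a = max(marginal_a, key=marginal_a.get)

print(joint_mode, dict(marginal_a), marginal_mode_a)
```

The most probable pair has `a = 0`, yet `a = 1` is more probable overall because its probability is split across two values of `b`.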

## Example: Texting Rates

A major advantage of the Bayesian approach is that
when the type of data changes while the inferential question stays the same,
we only need to make minor adjustments to our model.

The following example is from
[Chapter 1](https://nbviewer.jupyter.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter1_Introduction/Ch1_Introduction_PyMC3.ipynb)
of _Bayesian Methods for Hackers_.

Let's say that I suspect that, at some point in the last few months,
the rate at which I sent and received text messages changed.

This is similar in spirit to the question about sentiment above,
in which we wanted to determine if there was a sudden shift in sentiment,
but the data we observed there was binary.
Now, our data is in terms of counts.

In [None]:
text_data = pd.read_csv(data_folder / "txtdata.csv", index_col=0)
print(text_data.head())

In [None]:
f, ax = plt.subplots(figsize=(16, 6))
ax.bar(text_data.index, text_data["counts"]);
ax.set_ylabel("Number of Texts"); ax.set_xlabel("Day");

So what do we do?

#### We need to update the likelihood portion of our model.

The part that relates parameters to data
needs to be updated to reflect the new form of the data.
Furthermore, if that likelihood has different parameters,
we need to update the prior to reflect that.

But our inference process remains the same:
draw samples and then look at HPDs
and compute posterior probabilities.
This is accomplished _with almost identical code_,
no matter how the model has changed:
`pm.sample`, `pm.plot_posterior`,
and computing the mean of conditions applied to `samples["var"]`.

For an approach based on traditional hypothesis testing,
there is no obvious way to transfer tools or skills from modeling one dataset onto another.

In order to think inferentially about a new type of data,
you have to learn or derive a new hypothesis test.

### Likelihood: `pm.Poisson`

The `Poisson` distribution was derived originally to determine the distribution of the number of falsely-convicted individuals.

It is commonly used for count data,
but it presumes that all of the events are independent:
that the events being counted are instances of a memoryless process,
just as `Exponential` is used for the intervals between such events.

This is decidedly not the case for text messages:
sending a text message makes it more likely to receive a text message.
However, we can hope that the deviations are relatively small.
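
The link between the two distributions can be checked by simulation: if the gaps between events are independent `Exponential` draws, then the average number of events per unit of time matches the event rate, which is the `Poisson` mean. The sketch below uses only the standard library, with an arbitrary rate of 5 events per day.

```python
import random
from statistics import fmean

def simulate_daily_counts(rate, n_days, rng):
    """Count events per day when gaps between events are Exponential(rate)."""
    counts = [0] * n_days
    time = rng.expovariate(rate)  # time of the first event, in days
    while time < n_days:
        counts[int(time)] += 1         # the event falls in day number int(time)
        time += rng.expovariate(rate)  # memoryless wait until the next event
    return counts

rng = random.Random(0)
daily_counts = simulate_daily_counts(rate=5, n_days=4000, rng=rng)
mean_count = fmean(daily_counts)
print(f"average events per day: {mean_count:.2f}")
```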

The `Poisson` distribution has one parameter:
`mu`, the average number of events,
in this case messages sent and received.

The values, and so the average, cannot be negative:
what does it mean to send -10 text messages?

Side note: why is there not also a parameter for the width of the distribution,
as is the case for, e.g. `Normal` and `Cauchy` distributions?

One signature characteristic of the `Poisson` distribution
is that its standard deviation is related to its mean:
the mean is the standard deviation squared.
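
This relationship can be verified numerically from the `Poisson` probability mass function $P(k) = e^{-\mu}\mu^k / k!$. The sketch below sums over counts up to 200, which captures essentially all of the probability for the illustrative choice $\mu = 20$.

```python
import math

def poisson_pmf(k, mu):
    """Probability of observing exactly k events; computed in log space."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

mu = 20.0
support = range(201)  # counts 0..200; the tail beyond is negligible for mu = 20
mean = sum(k * poisson_pmf(k, mu) for k in support)
variance = sum((k - mean) ** 2 * poisson_pmf(k, mu) for k in support)
print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```

Both printed numbers agree with $\mu$, so a single parameter fixes the center and the width together.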

So we need a prior over `mu`, over what the possible average number of messages per day might be.

### Prior: `pm.Exponential`

This is just one of many possible choices.

The main constraint to respect here is non-negativity.

Can you think of any other choices that satisfy this constraint?


- `HalfFlat`, if you don't mind having an improper prior
- `HalfCauchy`, if you want to put more probability into large values
- `TruncatedNormal`, if you think it's probably close to some value but not less than another.

`Exponential` also has a parameter: `lam`, or 1 / average.

Since this value is unknown,
you might be tempted to put a prior over it.
But that prior would most likely itself have a parameter,
which would be unknown.

It's simpler to set this to a relatively neutral value.
For example, if the means were the same,
then `lam` would be 1 / the mean of the data.

In [None]:
lam =  1 / text_data["counts"].mean()

We might alternatively make it even smaller, so the `Exponential` distribution becomes even flatter.

Graphically, this model looks like:

In [None]:
utils.daft.make_text_model()

In [None]:
dates = text_data.index
counts = text_data["counts"]

with pm.Model() as text_switch_model:
    mu1 = pm.Exponential(r"$\mu_1$", lam=lam)  # prior
    mu2 = pm.Exponential(r"$\mu_2$", lam=lam)  # prior
    switchpoint = pm.DiscreteUniform(        # prior
        "switchpoint", lower=min(dates), upper=max(dates))  
    
    mu = pm.Deterministic("mu",
        pm.math.switch(shared_util.to_pymc(dates) >= switchpoint, mu2, mu1))
    
    # likelihood
    text_counts = pm.Poisson("text_count", mu, observed=counts)
    
    # statistic / observation of interest
    change_in_rate = pm.Deterministic(r"$\mu_2 - \mu_1$", var=mu2 - mu1)

In [None]:
with text_switch_model:
    text_samples = pm.sample(draws=1000, chains=5)

On some runs, you may get a warning about the "Gelman-Rubin statistic"
and the need to reparameterize or change `target_accept`.
This message appears when pyMC detects that sampling from the posterior may have failed.
The results below will often look very different in that case.

If this is the case, try setting the `target_accept` keyword argument above to `0.9`.
The default value is `0.8`.
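
For context, the Gelman-Rubin statistic (often written $\hat{R}$) compares the variance between chains to the variance within chains. Values near 1 suggest the chains agree, and larger values suggest they are exploring different regions. The sketch below computes the classic form of the statistic for plain lists of samples. It is an illustration with made-up chains, and pyMC's own diagnostic may use a refined variant.

```python
import random
from statistics import fmean, variance

def gelman_rubin(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])                                 # draws per chain
    within = fmean([variance(chain) for chain in chains])
    between = n * variance([fmean(chain) for chain in chains])
    pooled = (n - 1) / n * within + between / n        # pooled variance estimate
    return (pooled / within) ** 0.5

rng = random.Random(0)
# Four chains sampling the same distribution: they should agree.
agreeing = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
# Three of those chains plus one stuck somewhere else: they should disagree.
disagreeing = agreeing[:3] + [[rng.gauss(3, 1) for _ in range(1000)]]

r_hat_good = gelman_rubin(agreeing)
r_hat_bad = gelman_rubin(disagreeing)
print(f"agreeing chains: {r_hat_good:.3f}, disagreeing chains: {r_hat_bad:.3f}")
```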

In [None]:
pm.plot_posterior(text_samples, figsize=(16, 16), color="C2", text_size=24,
                  varnames=[r"$\mu_1$", r"$\mu_2$", r"$\mu_2 - \mu_1$", "switchpoint"]);

Some notes on the posterior:

1. We are almost completely certain that there was, in fact, a change:
the posterior for the difference in means has no samples below 0.
2. There are several plausible dates for when the change occurred:
it could've been day 44 or 45, perhaps 43, but not any other time.
This is somewhat surprising, given that such a stark pattern is not obvious in the data,
but according to the original example, this is approximately the date
on which the user moved cities.

Again, it is helpful to visualize the posterior
along with the data.

In [None]:
text_samples_df = shared_util.samples_to_dataframe(text_samples)

In [None]:
f, ax = plt.subplots(figsize=(16, 6))
ax.bar(dates, counts);
for _, sample in text_samples_df.sample(n=500).iterrows():
    ax.plot(dates, sample["mu"], color="C1", lw=2, alpha=0.02);
    
ax.legend(
[matplotlib.lines.Line2D([], [], color="C1"),
 matplotlib.lines.Line2D([], [], color="C0", lw=4)],
["Sampled Predictions of Mean", "Raw Data"]);
ax.set_ylabel("Number of Texts"); ax.set_xlabel("Day");

Like the visualization for the sentiment example,
darker lines indicate higher posterior probability:
more samples from the posterior are concentrated around those values.

This visualization gives some perspective:
while we are certain that a change occurred,
it's not a particularly dramatic one.
The spread of the data
is much larger than the difference in means.

### Applying MAP inference

If we want to give a single answer to "what is most likely the day my texting behavior changed, and what was it before and after?" we need to compute the MAP value for all of the parameters.

In [None]:
with text_switch_model:
    # randrange excludes the upper bound; randint(0, len(...)) could index one past the end
    text_MAP = pm.find_MAP(start=text_samples[random.randrange(len(text_samples))])

text_MAP = arrays_to_scalars(text_MAP)

Again, we use a randomly-chosen posterior sample as our starting point.
Using different random `start` values can result in different answers below.

A technical detail:
the `find_MAP` function returns all parameters as arrays,
including parameters that are just a single value.
The function `arrays_to_scalars` above
converts any arrays in the output of `find_MAP`
that have just one entry back into non-array, or scalar, form.

In [None]:
utils.plot.plot_MAP(text_MAP, text_samples_df, r"$\mu_2 - \mu_1$", compare_mean=True);

In [None]:
utils.plot.plot_MAP(text_MAP, text_samples_df, "switchpoint", compare_mean=True,
         bins=range(40, 50), kde=False, norm_hist=True, hist_kws={"align": "left"});

Another nice property of the MAP estimate over the mean is that
the MAP is always a valid parameter value.

Here, the mean switchpoint is not a specific day,
but rather in between two days:
something like 44.3,
while the MAP is always a valid day, e.g. 45.
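
Here is a quick illustration with made-up switchpoint samples: the mean lands between days, while the most frequent value is an actual day. Note that the sample mode used here is the mode of a single variable's marginal distribution, which is a simple stand-in for the MAP (a mode of the joint distribution).

```python
from statistics import fmean, mode

# Made-up posterior samples of the switchpoint, in days.
switchpoint_draws = [43, 44, 44, 44, 45, 45, 45, 45, 45, 44]

mean_day = fmean(switchpoint_draws)  # falls between two days
modal_day = mode(switchpoint_draws)  # always one of the sampled days
print(mean_day, modal_day)
```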

### MAP inference can go horribly wrong

When initialized badly, MAP inference can return very bad answers.

In [None]:
with text_switch_model:
    bad_text_MAP = pm.find_MAP()

bad_text_MAP = arrays_to_scalars(bad_text_MAP);

In [None]:
utils.plot.plot_MAP(bad_text_MAP, text_samples_df, r"$\mu_2 - \mu_1$");
plt.title("Incorrect MAP Estimate");

Most MAP inference algorithms only perform "local" improvements:
they consider small changes to parameter values and look for which make the posterior probability higher.

If they get trapped in a "bad neighborhood",
where everything nearby is worse, they will stop,
even though there are better solutions far away.

Imagine trying to find the highest point on a mountain range in a heavy fog.
Without knowledge of the terrain, your best bet is to just keep climbing.
This will only work if you lucked out and started on the tallest peak.
The more mountains there are in the range,
the less likely this is.
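
The fog analogy can be turned into a few lines of code. The sketch below climbs a made-up one-dimensional "mountain range", with a small peak near 2 and a taller one near 8, by repeatedly taking whichever small step goes uphill. It is a toy hill-climber, not the optimizer that `find_MAP` uses.

```python
import math

def height(x):
    """A landscape with a small peak near x=2 and a taller peak near x=8."""
    return math.exp(-(x - 2) ** 2) + 2 * math.exp(-(x - 8) ** 2)

def hill_climb(start, step=0.01, max_steps=100_000):
    """Take small uphill steps until neither neighbor is higher."""
    x = start
    for _ in range(max_steps):
        best_neighbor = max((x - step, x + step), key=height)
        if height(best_neighbor) <= height(x):
            break  # everything nearby is worse: we are on a (local) peak
        x = best_neighbor
    return x

peak_from_left = hill_climb(start=0.0)   # stops on the small peak
peak_from_right = hill_climb(start=6.0)  # reaches the tall peak
print(round(peak_from_left, 2), round(peak_from_right, 2))
```

The climber that starts at 0 never learns that a taller peak exists, which is exactly the failure that a badly chosen `start` can cause.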

# Returning to the big picture.

By thinking about our data and then writing down generative models in pyMC,
we were able to draw posterior samples
and make inferences about the process by which the data was generated.

We didn't need a recipe book or a prefabricated model:
we just needed to think about our data.

# Don't forget about this flexibility when modeling

For the next few weeks, we'll focus on models that are Bayesian versions of traditional models:
- ANOVA
- Linear Regression
- Logistic Regression



These are all examples of [_generalized linear models_](https://twiecki.io/blog/2013/08/12/bayesian-glms-1/), or GLMs.

These models are ubiquitous not only because they are amenable to traditional analysis,
but also because they are the first tool even very sophisticated statisticians reach for
when modeling new data.

We'll still have more freedom because of our approach, especially when writing down our likelihoods,
but don't forget that, if you can formulate a model for your data that doesn't fit into one of the "cookie-cutter" models,
you can always just use that model directly!