# Recitation 11

UTDallas CS 4375, taught by Dr. Ruozzi

Recitation with Jim Amato

Future Updates:
* Clarify all the text; possibly clean some of it up (remove it)
* Decide: keep or remove likelihood vs. probability discussion
  * Either finish that section out or remove it
* Do MAP estimation (Derivation was done during the session)
* Either continue to the multivariate Gaussian or remove the generation/vis code (very bottom)
* Clarify the term "Prior" (It's the prior on the *model*, not all uses of "prior" are model priors, e.g. class priors in GMMs)

# <font color="blue">Today:</font>

* Any requests?


# <font color="blue">Problem 1:</font> MLE and MAP

The short version is this:

The MLE estimator is based on the data given the hypothesis.

The MAP estimator is based on the hypothesis given the data.

### <font color="blue">Introduction:</font>  Bayes' Theorem and Definitions

#### Question

A quick digression. We've talked about Bayes' theorem a lot. We want to express the probability of $\mathcal{H}$ if we have some data $\mathcal{D}$; how do we write this?

#### Answer

$$P(\mathcal{H}|\mathcal{D}) = \frac{P(\mathcal{D}|\mathcal{H})P(\mathcal{H})}{P(\mathcal{D})}$$

#### Next Question

Because spoken language is a terribly confusing thing, it's easy to get some terms muddled up. In our discussions, there are precise definitions for *likelihood*, *probability*, *prior*, and *posterior*.

Which portions of Bayes' Theorem are the *likelihood*, *probability*, *prior*, and *posterior*?

#### Answer

* Probability/Posterior: $P(\mathcal{H}|\mathcal{D})$
* Likelihood: $P(\mathcal{D}|\mathcal{H})$
* Prior: $P(\mathcal{H})$

#### Question

Suppose I hear a loud noise coming from upstairs. I consider that there are gremlins upstairs bowling. Which of these are high or low?

* Probability/Posterior
* Likelihood
* Prior

#### Answer

* Posterior Probability: $P(\mathcal{H}|\mathcal{D})$ : low
* Likelihood: $P(\mathcal{D}|\mathcal{H})$ : high
* Prior: $P(\mathcal{H})$ : low
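
To put rough numbers on the gremlin story, here is a minimal sketch. Every number in it is made up for illustration (assumptions, not measurements), and `posterior` is a hypothetical helper name. The point is only that a high likelihood cannot rescue a tiny prior.

```python
def posterior(likelihood, prior, likelihood_if_not):
    """Bayes' theorem for a hypothesis H versus its complement (not H)."""
    # P(D), by the law of total probability over H and not-H
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence

# Made-up numbers, for illustration only
p_noise_given_gremlins = 0.95      # likelihood: bowling gremlins would be loud
p_gremlins = 1e-6                  # prior: gremlins are rare
p_noise_given_no_gremlins = 0.05   # loud noises have other causes too

p_gremlins_given_noise = posterior(p_noise_given_gremlins,
                                   p_gremlins,
                                   p_noise_given_no_gremlins)

print(f"Likelihood: {p_noise_given_gremlins}")
print(f"Prior:      {p_gremlins}")
print(f"Posterior:  {p_gremlins_given_noise:.2e}")
```

The data did move the needle (the posterior is larger than the prior), but it is still low.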

#### Question

To illustrate another difference, consider rules of total probability.

Suppose I see someone who is (A) an absolute buffoon and (B) clearly a millionaire. I wonder if they won the lottery last week. What can I say about probability and likelihood if I know they bought a lottery ticket? If I know they did not buy a lottery ticket last week?

#### Question

Further, let's discuss "strength" of hypotheses.

\<Ace of hearts\> discussion

The above is grossly adapted from Sober, 2008, Evidence and Evolution: The Logic Behind the Science. It's been warped to a large degree; credit for errors goes to me, all else to that author. I'm not sure what the book as a whole discusses, but the intro portion on evidence was really interesting to analogy-loving-Jim.

### <font color="blue">Introduction:</font> Connect to MLE vs MAP

#### Question

I can't really estimate over all possible hypotheses, $\mathcal{H}$. But I can make an estimate if I pick a hypothesis space governed by some $\theta$.

We can choose $\theta$ such that it maximizes a probability of some data in conjunction with some hypothesis. Two options have been discussed in class, MLE and MAP. Which one maximizes the likelihood and which the posterior? Hint: what do MLE and MAP stand for?

## <font color="blue">1.1</font> Skittles dataset

I'm stealing from https://jrmeyer.github.io/machinelearning/2017/08/18/mle.html and building on it. Mostly I like his story and it's easy to plot. Unfortunately I cannot read his unrendered LaTeX, so that's about all I can take.

I've assembled some world class Skittles aficionados with the most profoundly discerning palates. I don't really understand what they're talking about, but they describe Skittles flavors in terms of `aromatic lift` and `elegance` (I paid them a lot of money, so I'm sure this is all totally legit).

#### Generation code

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.patches import Circle, PathPatch
from matplotlib.path import Path

In [None]:
yp_dict = {'yellow': 0, 'purple': 1}

def make_skittles(n_samples):
    rng = np.random.default_rng(seed=4375)

    # Note: rng.normal takes the standard deviation (scale), not the variance
    true_mean_yellow = 2
    true_std_yellow =  0.5

    X_yellow = rng.normal(true_mean_yellow,
                          true_std_yellow,
                          size=n_samples//2)
    df_yellow = pd.DataFrame(X_yellow, columns=['Aromatic Lift'])
    df_yellow['Label'] = yp_dict['yellow']

    true_mean_purple = 4
    true_std_purple =  0.5

    X_purple = rng.normal(true_mean_purple,
                          true_std_purple,
                          size=n_samples//2)
    df_purple = pd.DataFrame(X_purple, columns=['Aromatic Lift'])
    df_purple['Label'] = yp_dict['purple']

    df = pd.concat([df_yellow, df_purple])
    skittles_multi = df.sample(frac=1, random_state=42).reset_index(drop=True)

    return skittles_multi

n_samples = 140
train_portion = 0.6
skittles = make_skittles(n_samples)
split_idx = int(train_portion*n_samples)
skittles_train = skittles[:split_idx].copy()
skittles_test = skittles[split_idx:].copy()
skittles_test.loc[139, 'Aromatic Lift'] = 2.85
print(skittles.head(5))
print(skittles_train.shape)
print(skittles_test.shape)

#### Demo Code

In [None]:
def display_skittles_uni(df):
    # Create a grid of plots
    fig, ax = plt.subplots(figsize=(10, 6))

    # Plot the histogram for yellow skittles
    sns.histplot(df[df['Label'] == 0]['Aromatic Lift'], ax=ax, color='yellow', edgecolor='black',
                 kde=True, label='Yellow Skittles', bins=10)

    # Plot the histogram for purple skittles
    sns.histplot(df[df['Label'] == 1]['Aromatic Lift'], ax=ax, color='purple', edgecolor='black',
                 kde=True, label='Purple Skittles', bins=10)

    # Set labels and titles
    ax.set_title('Distribution of Aromatic Lift for Skittles')
    ax.set_xlabel('Aromatic Lift (mH)')
    ax.set_ylabel('Frequency')
    ax.legend()

    plt.show()

In [None]:
def display_skittles_swarm_horizontal(df_train, df_test):
    # Creating copies of the data to prevent modifying original data
    df_train = df_train.copy()
    df_test = df_test.copy()

    # Adding an indicator for train/test datasets
    df_train['Dataset'] = 'Train'
    df_test['Dataset'] = 'Test'

    # Add a common category for y-axis placement
    df_train['Category'] = 'Skittles'
    df_test['Category'] = 'Skittles'

    # Plotting the data
    plt.figure(figsize=(12, 4))

    # Plotting train data with circle markers
    sns.swarmplot(data=df_train, x='Aromatic Lift', y='Category', hue='Label',
                  palette=['yellow', 'purple'], size=8, marker='o',
                  edgecolor='black', linewidth=1)

    # Plotting test data with triangle markers
    sns.swarmplot(data=df_test, x='Aromatic Lift', y='Category', hue='Label',
                  palette=['yellow', 'purple'], size=8, marker='^',
                  edgecolor='red', linewidth=1)

    plt.title('Distribution of Aromatic Lift for Skittles - Train vs Test')
    plt.xlabel('Aromatic Lift (mH)')
    plt.ylabel('')

    # Custom legend
    legend_elements = [plt.Line2D([0], [0], marker='o', color='w', label='Train - Yellow',
                              markersize=8, markerfacecolor='yellow',
                              markeredgewidth=1, markeredgecolor="black"),
                       plt.Line2D([0], [0], marker='^', color='w', label='Test - Yellow',
                              markersize=8, markerfacecolor='yellow',
                              markeredgewidth=1, markeredgecolor="red"),
                       plt.Line2D([0], [0], marker='o', color='w', label='Train - Purple',
                              markersize=8, markerfacecolor='purple',
                              markeredgewidth=1, markeredgecolor="black"),
                       plt.Line2D([0], [0], marker='^', color='w', label='Test - Purple',
                              markersize=8, markerfacecolor='purple',
                              markeredgewidth=1, markeredgecolor="red")]

    plt.legend(handles=legend_elements, title='Skittles')
    plt.tight_layout()
    plt.show()

#### Demo

In [None]:
display_skittles_uni(skittles)

In [None]:
display_skittles_swarm_horizontal(skittles_train, skittles_test)

## <font color="blue">1.2</font> MLE Estimates and Skittles

Now we want to determine MLE estimators for the parameters for these distributions.

Let's pretend I didn't just generate the data. Looking at the plot, I see a general blobby shape for both purple and yellow data. I'll make a guess and assume they follow a normal distribution. If I wanted to go further, I could try other assumed shapes, but the Gaussian is like $0$, or $1$, or the identity matrix $I$: a good first guess.

The maximum likelihood estimator is (trivially) the thing that maximizes the data likelihood.

Assuming that the distribution is Gaussian is choosing a hypothesis space. We now want to find the best hypothesis in that space. That is, the space is of the form

$$P(x|\mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(\frac{-(x-\mu)^2}{2\sigma^2}\right)$$

where we pick some $\mu$ and $\sigma$.

Earlier in this notebook, we referred to the data likelihood with increasing specificity as $P(\mathcal{D}|\mathcal{H})$ and $P(D|\theta)$. Because we've picked a hypothesis space (we take it as a given), we can be even more specific and call it $P(D|\mu, \sigma)$.

### Question

I have a bunch of data about Skittles. We're going to assume that each data point was drawn from that distribution. What's the likelihood of the full data set?

That is to ask: how do we go from $P(x|\mu, \sigma)$ to $P(D|\mu, \sigma)$?

#### Hint

We have many samples: $D=x^{(1)}, \dots, x^{(N)}$.

The likelihood of the first is $P(x^{(1)}|\mu, \sigma)$. The likelihood of the second is $P(x^{(2)}|\mu, \sigma)$.

What's the likelihood of both? What's the likelihood of arbitrarily many?

#### Answer

We assume that each point sampled is *independent*.

Thus, the likelihood of the data,
$$
\begin{align}
P(D|\mu, \sigma) &= \prod_{i=1}^N P(x^{(i)}|\mu, \sigma) \\
&= \prod_{i=1}^N\frac{1}{\sigma\sqrt{2\pi}}\exp\left(\frac{-(x^{(i)}-\mu)^2}{2\sigma^2}\right) \\
&= \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^N \prod_{i=1}^N\exp\left(\frac{-(x^{(i)}-\mu)^2}{2\sigma^2}\right)\\
\end{align}$$
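
As a quick sanity check on that product, here is a minimal sketch on toy data (the data and the names `gaussian_pdf`, `likelihood`, and `log_likelihood` are assumptions for illustration, not the Skittles set). It computes the likelihood as a product, then as a sum of logs, and shows they agree.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of a Gaussian with mean mu and standard deviation sigma at x."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Toy data, drawn only for this illustration
rng = np.random.default_rng(0)
x = rng.normal(2.0, 0.5, size=50)

mu, sigma = 2.0, 0.5

# Independence lets us multiply the per-point likelihoods
likelihood = np.prod(gaussian_pdf(x, mu, sigma))

# The same quantity on the log scale: a sum instead of a product
log_likelihood = np.sum(np.log(gaussian_pdf(x, mu, sigma)))

print(likelihood, np.exp(log_likelihood))
```

A product of many densities can underflow to zero in floating point once $N$ gets large, which is one practical reason to work with the sum of logs.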

### Question

Knowing the likelihood of the data, we want to find the estimates for the parameters $\mu$ and $\sigma$. Let's start with just $\mu$. What is $\mu_\text{MLE}$? How do we find it?

I already gave the general answer above.

#### Hint

The maximum likelihood estimator is the estimator that maximizes the likelihood. This is trivial. It's amazing how easily I can forget that. With all the earnestness at my command, I urge you to remember it and remember how we write that in math:

$$
\mu_\text{MLE} = \arg \max_\mu P(D|\mu, \sigma)
$$

We're going to be more specific to this particular case.

First, though, what follows is A CASE. ONE. ONE CASE.

ONE CASE.

We can do this a bunch of different ways. This one is going to look intimidating. Not gonna lie. But we aren't there yet.

So. like. breathe. chill. (***Jim***, this is to you)

#### Method

Actually, sidebar. What are we going to do?

* Realize that it'll be a lot easier if we take the log of everything <!-- inhale -->
* Plug in the equation we found for $P(D|\mu, \sigma)$ <!-- breathe Jim! it's ok! -->
* Take the derivative <!-- exhale -->
* Set that equal to zero to find the critical point <!-- namaste, dog breath -->

#### Answer (Long)

Great. Let's do that. I'm showing ALL the steps as I work them out. Each one is a small step.

$$
\begin{align}
P(D|\mu, \sigma)
&= \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^N \prod_{i=1}^N\exp\left(\frac{-(x^{(i)}-\mu)^2}{2\sigma^2}\right) \tag{1}\\
\ln P(D|\mu, \sigma)
&= \ln\left(\color{blue}{\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^N} \color{red}{\prod_{i=1}^N\exp\left(\frac{-(x^{(i)}-\mu)^2}{2\sigma^2}\right)}\right) \\
&= \ln\left(\color{blue}{\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^N}\right) + \ln \left(\color{red}{\prod_{i=1}^N\exp\left(\frac{-(x^{(i)}-\mu)^2}{2\sigma^2}\right)}\right) \\
&= N\ln\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right) + \sum_{i=1}^N\ln\exp\left(\frac{-(x^{(i)}-\mu)^2}{2\sigma^2}\right) \\
&= N\ln\left(2\pi\sigma^2\right)^{\frac{-1}{2}} + \sum_{i=1}^N\left(\frac{-(x^{(i)}-\mu)^2}{2\sigma^2}\right) \\
&= -\frac{N}{2}\ln2\pi\sigma^2 + \sum_{i=1}^N\frac{-(x^{(i)}-\mu)^2}{2\sigma^2} \tag{2}\\
\end{align}
$$

Note that $(1)$ is at the top of slide 8, lecture 15, and $(2)$ is at the bottom.

Now we take the partial derivative with respect to $\mu$:

$$
\begin{align}
\frac{\partial}{\partial \mu} \ln P(D|\mu, \sigma)
&= \frac{\partial}{\partial \mu} \left[-\frac{N}{2}\ln2\pi\sigma^2 + \sum_{i=1}^N\frac{-(x^{(i)}-\mu)^2}{2\sigma^2} \right] \\
&= \frac{\partial}{\partial \mu} \left[-\frac{N}{2}\ln2\pi\sigma^2 \right] + \frac{\partial}{\partial \mu}\left[\sum_{i=1}^N\frac{-(x^{(i)}-\mu)^2}{2\sigma^2} \right] \\
&= 0 + \sum_{i=1}^N\frac{-(2)(x^{(i)}-\mu)(-1)}{2\sigma^2} \\
&= \sum_{i=1}^N\frac{1}{\sigma^2}(x^{(i)}-\mu) \\
&= \frac{1}{\sigma^2}\sum_{i=1}^N(x^{(i)}-\mu) \\
&= \frac{1}{\sigma^2}\left(\sum_{i=1}^N(x^{(i)})-\sum_{i=1}^N\mu\right) \\
&= \frac{1}{\sigma^2}\left(-N\mu + \sum_{i=1}^N x^{(i)}\right) \\
\end{align}
$$

Now we can set this to zero and determine $\mu_\text{MLE}$

$$
\begin{align}
0 &= \frac{1}{\sigma^2}\left(-N\mu_\text{MLE} + \sum_{i=1}^N x^{(i)}\right) \\
0 &= -N\mu_\text{MLE} + \sum_{i=1}^N x^{(i)} \\
N\mu_\text{MLE} &= \sum_{i=1}^N x^{(i)} \\
\mu_\text{MLE} &= \frac{1}{N}\sum_{i=1}^N x^{(i)} \\
\end{align}
$$
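
If you don't trust the calculus, a brute-force check is cheap. This is a minimal sketch on toy data (the data, the grid, and the function name `log_likelihood` are assumptions for illustration): evaluate equation $(2)$ on a grid of candidate means and see which candidate wins.

```python
import numpy as np

def log_likelihood(mu, x, sigma=1.0):
    """Gaussian log-likelihood of data x for a candidate mean mu, as in equation (2)."""
    n = len(x)
    return (-n / 2 * np.log(2 * np.pi * sigma ** 2)
            - np.sum((x - mu) ** 2) / (2 * sigma ** 2))

# Toy data, drawn only for this illustration
rng = np.random.default_rng(0)
x = rng.normal(3.0, 1.0, size=200)

# Candidate means on a grid with spacing 0.001
candidates = np.linspace(0, 6, 6001)
scores = np.array([log_likelihood(mu, x) for mu in candidates])
mu_grid = candidates[np.argmax(scores)]

print("best grid candidate:", mu_grid)
print("sample mean:        ", x.mean())
```

The two printed values should agree to within the grid spacing.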

#### Answer (Short)

So... that was a lot.

And... the result says... \*checks notes\* ... the maximum likelihood estimator for the mean is the mean of the data.

At some point, I have to imagine an archaeologist spending a year deciphering a tablet that ends up being a long parable with the moral "floss daily."

I don't know if it's true, but it feels good to imagine that. You should try it at this point if you tried to follow all that.

That being said: it's worth running through the process.

<!-- not the translation process -->
<!-- the derivation of the maximum likelihood estimator for the mean -->
<!-- the flossing is also worth doing, but that's totally beside the point -->

### Implementation: Code Template

We have an estimator for the mean. We can use a simplifying assumption for the variance: unit variance ($\sigma = 1$).

This may feel like a cop-out. It is.

It is also an often-applied cop-out.

<!-- and a call-back! It's in the first cell of section 1.2. Where I talked about great first guesses. Like, 1, or 0, or I. Remember? Uh. #comedy -->

Here's what I'm going to do:
* Estimate the parameters for both the purple and the yellow distributions
* Calculate the likelihood that each test point was generated from each distribution
* Classify them according to the bigger likelihood.

Seems like a lot of math, right?

$$P(x|\mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(\frac{-(x-\mu)^2}{2\sigma^2}\right)$$

In [None]:
# sigma_mle = 1
# sigma_mle_yellow = sigma_mle
# sigma_mle_purple = sigma_mle

# split_idx = int(train_portion*n_samples)
# skittles_train = skittles[:split_idx].copy()
# skittles_test = skittles[split_idx:].copy()

# train_set_yellow = skittles_train[skittles_train['Label']==0]
# train_set_purple = skittles_train[skittles_train['Label']==1]

# mu_mle_yellow = train_set_yellow[['Aromatic Lift']].mean().values
# mu_mle_purple = train_set_purple[['Aromatic Lift']].mean().values

# def gaussian_likelihood_by_mle(datum, sigma, mu):
#     res = (1 / (sigma * np.sqrt(2 * np.pi))) * np.exp(-((datum - mu) ** 2) / (2 * (sigma ** 2)))
#     return res

# predictions = []
# for _, row in skittles_test.iterrows():
#     skittle = row['Aromatic Lift']
#     likelihood_yellow = gaussian_likelihood_by_mle(skittle, sigma_mle_yellow, mu_mle_yellow)
#     likelihood_purple = gaussian_likelihood_by_mle(skittle, sigma_mle_purple, mu_mle_purple)
#     if likelihood_yellow > likelihood_purple:
#         predictions.append(yp_dict['yellow'])
#     else:
#         predictions.append(yp_dict['purple'])

# skittles_test['Predicted'] = predictions
# skittles_test['Correct'] = skittles_test['Label'] == skittles_test['Predicted']

### Completed Code

In [None]:
sigma_mle = 1
sigma_mle_yellow = sigma_mle
sigma_mle_purple = sigma_mle

# Estimate from the training split only, so the test points stay unseen
train_set_yellow = skittles_train[skittles_train['Label']==0]
train_set_purple = skittles_train[skittles_train['Label']==1]

mu_mle_yellow = train_set_yellow[['Aromatic Lift']].mean().values
mu_mle_purple = train_set_purple[['Aromatic Lift']].mean().values

def gaussian_likelihood_by_mle(datum, sigma, mu):
    res = (1 / (sigma * np.sqrt(2 * np.pi))) * np.exp(-((datum - mu) ** 2) / (2 * (sigma ** 2)))
    return res

yellow_likelihoods = []
purple_likelihoods = []
for _, row in skittles_test.iterrows():
    skittle = row['Aromatic Lift']
    yellow_likelihoods.append(gaussian_likelihood_by_mle(skittle, sigma_mle_yellow, mu_mle_yellow))
    purple_likelihoods.append(gaussian_likelihood_by_mle(skittle, sigma_mle_purple, mu_mle_purple))

skittles_test_mle = skittles_test.copy()
skittles_test_mle['L(Y)'] = yellow_likelihoods
skittles_test_mle['L(P)'] = purple_likelihoods
skittles_test_mle['Predicted'] = np.where(skittles_test_mle['L(Y)'] < skittles_test_mle['L(P)'],
                                          yp_dict['purple'], yp_dict['yellow'])
skittles_test_mle['Correct'] = skittles_test_mle['Label'] == skittles_test_mle['Predicted']

### Demonstration Code

In [None]:
def plot_mle_results(data, draw_line=False):
    # 2. Plot with highlighting
    plt.figure(figsize=(14, 6))

    data = data.copy()

    data['Category'] = 'Skittles'

    # Correct skittles
    sns.swarmplot(data=data[data['Correct']],
                  x='Aromatic Lift', y='Category', hue='Label',
                  palette=['yellow', 'purple'], size=8,
                  edgecolor="black", linewidth=1)

    # Incorrect skittles (one fixed color, regardless of label)
    sns.swarmplot(data=data[~data['Correct']],
                  x='Aromatic Lift', y='Category',
                  color='red', size=8,
                  edgecolor="black", linewidth=1)

    # 3. Plotting the distributions
    x = np.linspace(data['Aromatic Lift'].min(), data['Aromatic Lift'].max(), 1000)
    y_yellow = -gaussian_likelihood_by_mle(x, sigma_mle_yellow, mu_mle_yellow[0])
    y_purple = -gaussian_likelihood_by_mle(x, sigma_mle_purple, mu_mle_purple[0])
    plt.plot(x, y_yellow, color='yellow')
    plt.plot(x, y_purple, color='purple')

    # Plotting the vertical line at the intersection point
    idx = np.argwhere(np.diff(np.sign(y_yellow - y_purple))).flatten()
    if draw_line:
        plt.axvline(x[idx[0]], color='red', linestyle='--', label='Intersection Point')

    plt.title('Skittles Test Set Classification')
    plt.xlabel('Aromatic Lift')
    plt.ylabel('Category')
    legend_elements = [plt.Line2D([0], [0], marker='o', color='w', label='Correct - Yellow',
                              markersize=8, markerfacecolor='yellow',
                              markeredgewidth=1, markeredgecolor="black"),
                       plt.Line2D([0], [0], marker='o', color='w', label='Correct - Purple',
                              markersize=8, markerfacecolor='purple',
                              markeredgewidth=1, markeredgecolor="black"),
                       plt.Line2D([0], [0], marker='o', color='w', label='Incorrect',
                              markersize=8, markerfacecolor='red',
                              markeredgewidth=1, markeredgecolor="black")]
    plt.legend(handles=legend_elements, title='Skittles')
    plt.show()

### Demonstration

In [None]:
plot_mle_results(skittles_test_mle)

#### Question

That red circle is incorrect. Is it wrong because it's purple in yellow-town? Or yellow in purple-town? WHY?

#### Answer

In [None]:
plot_mle_results(skittles_test_mle, draw_line=True)
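
Why does the line sit where it does? When both Gaussians share the same $\sigma$, the likelihoods are equal exactly where $(x-\mu_\text{yellow})^2 = (x-\mu_\text{purple})^2$, which is the midpoint between the two means. Here is a minimal sketch with stand-in values (assumptions: the generating means 2 and 4 in place of the estimated ones, and unit $\sigma$; `gaussian_pdf` and `classify` are hypothetical helper names).

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of a Gaussian with mean mu and standard deviation sigma at x."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def classify(x, mu_yellow, mu_purple, sigma):
    """Pick the color whose Gaussian gives x the larger likelihood."""
    if gaussian_pdf(x, mu_yellow, sigma) > gaussian_pdf(x, mu_purple, sigma):
        return 'yellow'
    return 'purple'

# Stand-in values (assumptions), not the fitted estimates
mu_yellow, mu_purple, sigma = 2.0, 4.0, 1.0

# With equal sigma, the boundary is the midpoint of the two means
boundary = (mu_yellow + mu_purple) / 2

# The two likelihoods agree at the boundary
print(gaussian_pdf(boundary, mu_yellow, sigma),
      gaussian_pdf(boundary, mu_purple, sigma))

# Each side of the boundary goes to the closer mean
for x in (2.85, 3.15):
    print(x, classify(x, mu_yellow, mu_purple, sigma))
```

Under these stand-in values, anything left of the boundary gets called yellow no matter what color it really is.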

## <font color="blue">1.3</font> MAP Estimates and Skittles

Ok - so - I'm basically out of time here. My intention was to now do a MAP estimate.

$$
\mu_\text{MAP} = \arg \max_\mu P(\theta | D)
$$

$$
P(\theta | D) = \frac{P(D| \theta) P(\theta)}{P(D)}
$$

$$
P(\theta | D) \propto P(D| \theta) P(\theta)
$$

We again play the "take the log of it"-trick.

$$
\mu_\text{MAP} = \arg \max_\mu P(D| \theta) P(\theta)
$$

$$
\mu_\text{MAP} = \arg \max_\mu \ln \left[ P(D| \theta) P(\theta) \right]
$$
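
Here is one way to carry the log trick through. It is a sketch under two assumptions: $\sigma$ is known, and the prior on $\mu$ is Gaussian with mean $\mu_0$ and variance $\sigma_0^2$. Here $n$ is the number of samples (the $N$ of section 1.2), and "const" collects every term that does not depend on $\mu$.

$$
\begin{align}
\ln \left[ P(D|\mu) P(\mu) \right]
&= \ln P(D|\mu) + \ln P(\mu) \\
&= -\sum_{i=1}^n\frac{(x^{(i)}-\mu)^2}{2\sigma^2} - \frac{(\mu-\mu_0)^2}{2\sigma_0^2} + \text{const} \\
\frac{\partial}{\partial \mu} \ln \left[ P(D|\mu) P(\mu) \right]
&= \frac{1}{\sigma^2}\sum_{i=1}^n(x^{(i)}-\mu) - \frac{1}{\sigma_0^2}(\mu-\mu_0) \\
\end{align}
$$

Setting the derivative to zero and collecting the $\mu$ terms:

$$
\begin{align}
\mu_\text{MAP}\left(\frac{n}{\sigma^2}+\frac{1}{\sigma_0^2}\right) &= \frac{1}{\sigma^2}\sum_{i=1}^n x^{(i)} + \frac{\mu_0}{\sigma_0^2} \\
\mu_\text{MAP} &= \frac{n\sigma_0^2}{n\sigma_0^2+\sigma^2}\left(\frac{1}{n}\sum_{i=1}^n x^{(i)}\right) + \frac{\sigma^2}{n\sigma_0^2+\sigma^2}\mu_0 \\
\end{align}
$$

That is a weighted average of the sample mean and the prior mean.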


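spoilers (for a Gaussian prior on $\mu$ with mean $\mu_0$ and variance $\sigma_0^2$):

$$
\mu_\text{MAP} = \frac{n\sigma_0^2}{n\sigma_0^2+\sigma^2_\text{MLE}}\mu_\text{MLE}+
\frac{\sigma_\text{MLE}^2}{n\sigma_0^2+\sigma^2_\text{MLE}}\mu_0
$$

As $n$ grows, the weight shifts from the prior mean $\mu_0$ to $\mu_\text{MLE}$. The posterior over $\mu$ has variance $\sigma_\text{MAP}^2$:

$$
\frac{1}{\sigma_\text{MAP}^2}=\frac{1}{\sigma_0^2}+\frac{n}{\sigma_\text{MLE}^2}
$$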
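
A brute-force check helps here too. This minimal sketch assumes a known data variance and a Gaussian prior on the mean; the toy data, the prior values, and the names `map_mean` and `log_posterior` are all assumptions for illustration. It compares the standard closed-form MAP mean for that setup against a grid search over the log posterior, for a few sample sizes.

```python
import numpy as np

def map_mean(x, sigma_sq, mu_0, sigma_0_sq):
    """Closed-form MAP mean: Gaussian likelihood (known variance), Gaussian prior."""
    n = len(x)
    w = n * sigma_0_sq / (n * sigma_0_sq + sigma_sq)  # weight on the data
    return w * np.mean(x) + (1 - w) * mu_0

def log_posterior(mu, x, sigma_sq, mu_0, sigma_0_sq):
    """Log-likelihood plus log-prior, dropping terms that do not depend on mu."""
    return (-np.sum((x - mu) ** 2) / (2 * sigma_sq)
            - (mu - mu_0) ** 2 / (2 * sigma_0_sq))

# Toy data and a made-up prior, for illustration only
rng = np.random.default_rng(0)
x_all = rng.normal(4.0, 1.0, size=200)
mu_0, sigma_0_sq, sigma_sq = 3.0, 2.0, 1.0

grid = np.linspace(0, 8, 8001)  # candidate means, spacing 0.001
for n in (1, 5, 50, 200):
    x = x_all[:n]
    closed = map_mean(x, sigma_sq, mu_0, sigma_0_sq)
    scores = [log_posterior(mu, x, sigma_sq, mu_0, sigma_0_sq) for mu in grid]
    brute = grid[np.argmax(scores)]
    print(f"n={n:3d}  mean={x.mean():.3f}  closed-form MAP={closed:.3f}  grid MAP={brute:.3f}")
```

With few samples the MAP mean is pulled toward $\mu_0$; with many samples it sits close to the sample mean.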

### Code Template

### Completed Code

In [None]:
print(skittles_train.shape)
print(skittles_test.shape)

In [None]:
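def gaussian_likelihood_map(datum, sigma_squared, mu):
    sigma = np.sqrt(sigma_squared)
    res = (1 / (sigma * np.sqrt(2 * np.pi))) * np.exp(-((datum - mu) ** 2) / (2 * sigma_squared))
    # A scalar datum returns a plain scalar (mu may be a length-1 array);
    # an array datum (used to plot the curves) returns the whole array
    return np.ravel(res)[0] if np.ndim(datum) == 0 else res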

def map_estimate(n_train='all'):
    if n_train == 'all':
        n_train = skittles_train.shape[0]
    skittles_train_map = skittles_train[:n_train].copy()
    train_map_yellow = skittles_train_map[skittles_train_map['Label']==yp_dict['yellow']].copy()
    train_map_purple = skittles_train_map[skittles_train_map['Label']==yp_dict['purple']].copy()

    # Step 1: Set the prior mean and variance
    mu_prior = skittles_train_map['Aromatic Lift'].mean()
    mu_prior_yellow = mu_prior
    mu_prior_purple = mu_prior

    # A broad variance to represent initial uncertainty
    sigma_prior_squared = 2

    # Number of data points
    n_yellow = len(train_map_yellow)
    n_purple = len(train_map_purple)

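    # Step 2: Calculate the posterior mean and variance for yellow Skittles
    # MLE means come from the same training subset; sigma_mle is the assumed data std
    mu_mle_yellow = train_map_yellow[['Aromatic Lift']].mean().values
    mu_mle_purple = train_map_purple[['Aromatic Lift']].mean().values
    sigma_squared = sigma_mle ** 2

    mu_map_yellow = (n_yellow * sigma_prior_squared / (n_yellow * sigma_prior_squared + sigma_squared) * mu_mle_yellow) + \
                    (sigma_squared / (n_yellow * sigma_prior_squared + sigma_squared) * mu_prior_yellow)

    # Variance of the posterior over mu; it shrinks as n grows
    sigma_map_squared_yellow = 1 / (1/sigma_prior_squared + n_yellow/sigma_squared)

    # Repeat the above for purple Skittles
    mu_map_purple = (n_purple * sigma_prior_squared / (n_purple * sigma_prior_squared + sigma_squared) * mu_mle_purple) + \
                    (sigma_squared / (n_purple * sigma_prior_squared + sigma_squared) * mu_prior_purple)

    sigma_map_squared_purple = 1 / (1/sigma_prior_squared + n_purple/sigma_squared)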

    # Step 3: Predictions using MAP estimates
    yellow_likelihoods_map = []
    purple_likelihoods_map = []
    for _, row in skittles_test.iterrows():
        skittle = row['Aromatic Lift']
        yellow_likelihoods_map.append(gaussian_likelihood_map(skittle,
                                                              sigma_map_squared_yellow,
                                                              mu_map_yellow))
        purple_likelihoods_map.append(gaussian_likelihood_map(skittle,
                                                              sigma_map_squared_purple,
                                                              mu_map_purple))

    skittles_test_map = skittles_test.copy()
    skittles_test_map['L(Y)'] = yellow_likelihoods_map
    skittles_test_map['L(P)'] = purple_likelihoods_map

    skittles_test_map['Predicted'] = np.where(skittles_test_map['L(Y)'] < skittles_test_map['L(P)'],
                                              1, 0)
    skittles_test_map['Correct'] = skittles_test_map['Label'] == skittles_test_map['Predicted']
    params = dict(
        yellow=dict(
            mu=mu_map_yellow,
            sigma=sigma_map_squared_yellow
        ),
        purple=dict(
            mu=mu_map_purple,
            sigma=sigma_map_squared_purple
        )
    )
    return skittles_test_map, params

In [None]:
skittles_test_map_1, params_1 = map_estimate()
pd.set_option('display.float_format', '{:.3e}'.format)
print(skittles_test_map_1.copy().sort_values('Aromatic Lift'))

In [None]:
def plot_map_results(data, params, draw_line=False):
    sigma_map_yellow = params['yellow']['sigma']
    sigma_map_purple = params['purple']['sigma']
    mu_map_yellow = params['yellow']['mu']
    mu_map_purple = params['purple']['mu']
    # 2. Plot with highlighting
    plt.figure(figsize=(14, 6))

    data = data.copy()

    data['Category'] = 'Skittles'

    # Correct skittles
    sns.swarmplot(data=data[data['Correct']],
                  x='Aromatic Lift', y='Category', hue='Label',
                  palette=['yellow', 'purple'], size=8,
                  edgecolor="black", linewidth=1)

    # Incorrect skittles (one fixed color, regardless of label)
    sns.swarmplot(data=data[~data['Correct']],
                  x='Aromatic Lift', y='Category',
                  color='red', size=8,
                  edgecolor="black", linewidth=1)

    # 3. Plotting the distributions
    x = np.linspace(data['Aromatic Lift'].min(), data['Aromatic Lift'].max(), 1000)
    y_yellow = -gaussian_likelihood_map(x, sigma_map_yellow, mu_map_yellow[0])
    y_purple = -gaussian_likelihood_map(x, sigma_map_purple, mu_map_purple[0])
    plt.plot(x, y_yellow, color='yellow')
    plt.plot(x, y_purple, color='purple')

    # Plotting the vertical line at the intersection point
    idx = np.argwhere(np.diff(np.sign(y_yellow - y_purple))).flatten()
    if draw_line:
        plt.axvline(x[idx[0]], color='red', linestyle='--', label='Intersection Point')

    plt.title('Skittles Test Set Classification')
    plt.xlabel('Aromatic Lift')
    plt.ylabel('Category')
    legend_elements = [plt.Line2D([0], [0], marker='o', color='w', label='Correct - Yellow',
                              markersize=8, markerfacecolor='yellow',
                              markeredgewidth=1, markeredgecolor="black"),
                       plt.Line2D([0], [0], marker='o', color='w', label='Correct - Purple',
                              markersize=8, markerfacecolor='purple',
                              markeredgewidth=1, markeredgecolor="black"),
                       plt.Line2D([0], [0], marker='o', color='w', label='Incorrect',
                              markersize=8, markerfacecolor='red',
                              markeredgewidth=1, markeredgecolor="black")]
    plt.legend(handles=legend_elements, title='Skittles')
    plt.show()

In [None]:
plot_map_results(skittles_test_map_1, params_1)

In [None]:
skittles_test_map_2, params_2 = map_estimate(20)
print(skittles_test_map_2.copy().sort_values('Aromatic Lift'))

In [None]:
plot_map_results(skittles_test_map_2, params_2)

### Demonstration Code

### Demonstration

# Resources

In [None]:
def make_skittles_multi():
    rng = np.random.default_rng(seed=516)

    n_samples = 100

    true_mean_yellow = np.array([3, 7])
    true_var_yellow =  np.array([[1, 0], [0, 1]]) * 2

    X_yellow = rng.multivariate_normal(true_mean_yellow,
                                      true_var_yellow,
                                      size=n_samples)
    df_yellow = pd.DataFrame(X_yellow, columns=['Aromatic Lift', 'Elegance'])
    df_yellow['Label'] = 0

    true_mean_purple = [7, 3]
    true_var_purple =  np.array([[1, 0], [0, 1]]) * 2

    X_purple = rng.multivariate_normal(true_mean_purple,
                                      true_var_purple,
                                      size=n_samples)
    df_purple = pd.DataFrame(X_purple, columns=['Aromatic Lift', 'Elegance'])
    df_purple['Label'] = 1

    df = pd.concat([df_yellow, df_purple])
    skittles_multi = df.sample(frac=1, random_state=42).reset_index(drop=True)

    return skittles_multi

skittles_multi = make_skittles_multi()
print(skittles_multi.head(5))

In [None]:
def show_skittles_multi(df):
    # Splitting the data
    train_set = df.iloc[:80]
    test_set = df.iloc[-20:]

    train_set_yellow = train_set[train_set['Label'] == 0]
    train_set_purple = train_set[train_set['Label'] == 1]

    test_set_yellow  = test_set[test_set['Label'] == 0]
    test_set_purple  = test_set[test_set['Label'] == 1]

    # Plotting
    fig, ax = plt.subplots()

    # Plot training set
    ax.scatter(train_set_yellow['Aromatic Lift'],
              train_set_yellow['Elegance'],
              label='Yellow Skittles (Train)', color='yellow')
    ax.scatter(train_set_purple['Aromatic Lift'],
              train_set_purple['Elegance'],
              label='Purple Skittles (Train)', color='purple')

    # Plot test set with different markers
    ax.scatter(test_set_yellow['Aromatic Lift'],
              test_set_yellow['Elegance'],
              label='Yellow Skittles (Test)', color='yellow', marker='x')
    ax.scatter(test_set_purple['Aromatic Lift'],
              test_set_purple['Elegance'],
              label='Purple Skittles (Test)', color='purple', marker='x')

    # Setting axis limits based on the entire dataset
    ax.set_xlim(df['Aromatic Lift'].min() - 1, df['Aromatic Lift'].max() + 1)
    ax.set_ylim(df['Elegance'].min() - 1, df['Elegance'].max() + 1)

    # Legends and titles
    ax.legend()
    ax.set_title('Skittles Distribution')
    ax.set_xlabel('Aromatic Lift (mH)')
    ax.set_ylabel('Elegance (pZA)')

    plt.show()

show_skittles_multi(skittles_multi)