# Exam Solution – Hypothesis 2
__Probabilistic Programming 2025__

<br/><br/>


## Introduction

**Hypothesis&nbsp;2 (H2)**  
> *The programming language `javascript` consumes the least energy compared to any other programming language in the dataset.*

In this notebook we test H2 using **Bayesian modelling** with a separate mean energy for each language.
We follow the style and methodology demonstrated in the lecture notebooks
(regression, GLM, and model‑comparison).  

Our analysis proceeds in seven steps:

1. Load & inspect the data  
2. Visualise energy consumption per language  
3. Build a Bayesian model with language‑specific means and draw posterior samples  
4. Check convergence diagnostics  
5. Compare languages (posterior estimates, pairwise & overall)  
6. Evaluate the probability that `javascript` is the most efficient  
7. Summarise our findings  

Each code cell is accompanied by explanatory text, and every figure is described right underneath it.

<br/><br/><br/>


In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import arviz as az
import pymc as pm
import seaborn as sns

# read the provided CSV (same directory as this notebook)
df = pd.read_csv('dataset.csv')

print(f"Number of observations: {len(df):,}")
df.head()



**Step&nbsp;1 – Data overview**

We import the libraries used throughout the lectures:

* `pandas` / `numpy` for data wrangling  
* `matplotlib` and `seaborn` for visualisation  
* `pymc` + `arviz` for Bayesian inference  

The dataset contains execution‑level energy measurements together with metadata such
as programming language, web framework, endpoint and runtime.  
For H2 we only need **`energy` (the response)** and **`programming_language` (the predictor)**;
the remaining columns are kept because they are natural candidates for additional predictors (see the conclusion).
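Before modelling, it can be worth confirming that the two columns H2 relies on are present, numeric and complete, and how many observations each language contributes. A minimal sketch, assuming only the column names `energy` and `programming_language` used above; the helper `check_h2_columns` and the toy DataFrame are hypothetical stand‑ins for `dataset.csv`:

```python
import pandas as pd


def check_h2_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: validate the H2 columns and count observations per language."""
    required = ['energy', 'programming_language']
    missing_cols = [c for c in required if c not in df.columns]
    if missing_cols:
        raise KeyError(f"Missing columns: {missing_cols}")
    if not pd.api.types.is_numeric_dtype(df['energy']):
        raise TypeError("'energy' must be numeric")
    # Report missing values; such rows would need to be dropped or imputed before modelling
    n_missing = df[required].isna().sum()
    if n_missing.any():
        print("Missing values per column:")
        print(n_missing[n_missing > 0])
    # Observations per language: small groups lead to wide posteriors for their means
    return df.groupby('programming_language').size().rename('n_obs').to_frame()


# Made-up toy data standing in for dataset.csv
toy = pd.DataFrame({
    'programming_language': ['javascript', 'python', 'python', 'go', 'javascript'],
    'energy': [10.0, 14.0, 15.5, 11.0, 9.5],
})
counts = check_h2_columns(toy)
print(counts)
```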

<br/><br/>


In [None]:

plt.figure(figsize=(9, 5))
sns.boxplot(data=df, x='programming_language', y='energy', showfliers=False)
sns.stripplot(data=df, x='programming_language', y='energy', color='black', alpha=0.3, jitter=0.25)
plt.xticks(rotation=45)
plt.title('Energy consumption by programming language')
plt.ylabel('Energy [J]')
plt.xlabel('')
plt.tight_layout()



**Step&nbsp;2 – Visual exploration**

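The box/strip plot gives a first visual impression:

* **Centre & spread** – compare medians and inter‑quartile ranges across languages  
* **Individual measurements** – every observation is overlaid as a semi‑transparent, horizontally jittered black point; the box plot's own outlier markers are switched off with `showfliers=False`  
* At first glance `javascript` appears to sit at the lower end, but only a
formal model can quantify *how likely* it is to be the most efficient.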
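The centre and spread shown by the boxes can also be read off as a table, which makes the ordering of the medians explicit. A sketch under the same column‑name assumptions, using a made‑up toy DataFrame; `energy_summary` is a hypothetical helper:

```python
import pandas as pd


def energy_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: median and inter-quartile range of energy per language."""
    g = df.groupby('programming_language')['energy']
    summary = pd.DataFrame({
        'median': g.median(),
        'q25': g.quantile(0.25),
        'q75': g.quantile(0.75),
    })
    summary['iqr'] = summary['q75'] - summary['q25']
    # Lowest median first, i.e. the visually "most efficient" language on top
    return summary.sort_values('median')


# Made-up toy data standing in for dataset.csv
toy = pd.DataFrame({
    'programming_language': ['javascript'] * 4 + ['python'] * 4,
    'energy': [8.0, 9.0, 10.0, 11.0, 12.0, 14.0, 16.0, 18.0],
})
summary = energy_summary(toy)
print(summary)
```

A lower median is only descriptive evidence; it says nothing yet about the uncertainty in each language's mean.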

<br/><br/><br/>


In [None]:

# Convert language string to category codes for PyMC
df['lang_idx'], lang_categories = pd.factorize(df['programming_language'], sort=True)
n_lang = len(lang_categories)
energy = df['energy'].values
idx = df['lang_idx'].values

print('Languages in dataset:', list(lang_categories))



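`pd.factorize(..., sort=True)` assigns each language an **integer index** from 0 to `n_lang - 1`, in alphabetical order. PyMC's vectorised syntax relies on this index: `mu_lang[idx]` selects the matching language mean for every observation.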
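A small illustration of what the integer index does, using a toy language column and made‑up per‑language means (in the model, `mu_lang` is a PyMC random variable, but the indexing behaves the same way):

```python
import numpy as np
import pandas as pd

# Toy language column (hypothetical values)
langs = pd.Series(['python', 'javascript', 'go', 'javascript'])
idx, categories = pd.factorize(langs, sort=True)
print(list(categories))  # ['go', 'javascript', 'python']
print(idx)               # [2 1 0 1]

# One made-up mean per language, in the same order as `categories`
mu_lang = np.array([11.0, 9.0, 14.0])
# Fancy indexing expands the per-language means to one value per observation
print(mu_lang[idx])      # one mean per observation: 14, 9, 11, 9
```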

<br/><br/>


In [None]:

with pm.Model() as lang_model:
    # language‑specific mean energy (weakly informative prior)
    mu_lang = pm.Normal('mu_lang', mu=energy.mean(), sigma=energy.std()*2, shape=n_lang)

    # shared residual standard deviation
    sigma = pm.HalfNormal('sigma', sigma=energy.std())

    # likelihood
    energy_obs = pm.Normal('energy_obs', mu=mu_lang[idx], sigma=sigma, observed=energy)

    # sample
    trace = pm.sample(4000, tune=2000, target_accept=0.9, random_seed=42, progressbar=True)



**Step&nbsp;3 – Bayesian model**

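We assume every observation comes from a Normal distribution whose mean depends
*only* on the programming language; a single residual standard deviation `sigma` is shared by all languages.  
Each language mean gets its own independent prior, centred on the empirical mean of `energy` with a standard deviation of twice the empirical standard deviation, so the prior is weakly informative.
Sampling uses NUTS (4000 draws per chain after 2000 tuning steps) with `target_accept=0.9`, which makes the sampler take smaller steps and reduces the risk of divergent transitions.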
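Written out, the model in the cell above is the following (notation ours: $y_i$ is the energy of observation $i$, $\ell[i]$ its language index `idx[i]`, $K$ = `n_lang`, and $\bar y$, $s_y$ the empirical mean and standard deviation of `energy`):

$$
\begin{aligned}
\mu_k &\sim \mathcal{N}\left(\bar y,\ (2 s_y)^2\right), && k = 1, \dots, K \\
\sigma &\sim \operatorname{HalfNormal}(s_y) \\
y_i &\sim \mathcal{N}\left(\mu_{\ell[i]},\ \sigma^2\right), && i = 1, \dots, N
\end{aligned}
$$

Here the second argument of $\mathcal{N}$ is a variance, whereas PyMC's `sigma` argument is a standard deviation.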

<br/><br/>


In [None]:

az.summary(trace, var_names=['mu_lang', 'sigma'])



**Step&nbsp;4 – Convergence diagnostics**

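We check that $\hat R$ is close to 1.0 for every parameter (a common threshold is $\hat R < 1.01$) and that the bulk and tail effective sample sizes (`ess_bulk`, `ess_tail`) are large; a frequently used rule of thumb is at least 400.  
If any metric looks suspicious, increase `tune` / `draws` or revise the model.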
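These checks can also be automated on the table returned by `az.summary`, whose columns include `r_hat`, `ess_bulk` and `ess_tail`. A sketch assuming the rule‑of‑thumb thresholds $\hat R < 1.01$ and ESS ≥ 400; `flag_convergence_issues` is a hypothetical helper, demonstrated on a made‑up table:

```python
import pandas as pd


def flag_convergence_issues(summary: pd.DataFrame,
                            rhat_max: float = 1.01,
                            ess_min: float = 400) -> pd.DataFrame:
    """Hypothetical helper: rows of an az.summary-style table failing the R-hat / ESS thresholds."""
    bad = (
        (summary['r_hat'] > rhat_max)
        | (summary['ess_bulk'] < ess_min)
        | (summary['ess_tail'] < ess_min)
    )
    return summary.loc[bad, ['r_hat', 'ess_bulk', 'ess_tail']]


# Made-up table using the column names and row labels az.summary produces
toy_summary = pd.DataFrame(
    {'r_hat': [1.00, 1.03, 1.00],
     'ess_bulk': [5200.0, 350.0, 4800.0],
     'ess_tail': [4100.0, 600.0, 3900.0]},
    index=['mu_lang[0]', 'mu_lang[1]', 'sigma'],
)
flagged = flag_convergence_issues(toy_summary)
print(flagged)  # only mu_lang[1] is flagged
```

In the notebook, `flag_convergence_issues(az.summary(trace, var_names=['mu_lang', 'sigma']))` would return an empty table if every parameter passes. Divergent transitions are not part of this table; they can be counted with `int(trace.sample_stats['diverging'].sum())`.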

<br/><br/>


In [None]:

az.plot_trace(trace, var_names=['mu_lang', 'sigma'])
plt.tight_layout()



The trace plots show healthy mixing of chains and stationary behaviour after the warm‑up phase.

<br/><br/>


In [None]:

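posterior_means = trace.posterior['mu_lang'].mean(dim=('chain', 'draw')).values
# 94% HDI for each language mean, shape (n_lang, 2): column 0 = lower, column 1 = upper bound
hdi = az.hdi(trace, var_names=['mu_lang'], hdi_prob=0.94)['mu_lang'].values

plt.figure(figsize=(8, 5))
y_pos = np.arange(n_lang)
# Asymmetric error bars: distance from the posterior mean to the lower and upper HDI bounds
plt.errorbar(posterior_means, y_pos, xerr=[posterior_means - hdi[:, 0], hdi[:, 1] - posterior_means],
             fmt='o', capsize=4)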
plt.yticks(y_pos, lang_categories)
plt.xlabel('Mean energy [J]')
plt.title('Posterior mean energy per language (94% HDI)')
plt.gca().invert_yaxis()
plt.tight_layout()



**Step&nbsp;5 – Posterior estimates**

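Each dot is the posterior mean energy of a language, and the bars span its 94% highest‑density interval (HDI).
Languages whose intervals do **not** overlap with `javascript`'s interval
are strong candidates for having a different expected energy consumption. Overlapping intervals, however, do not rule out a difference; the posterior distribution of the difference between two language means is the direct quantity to check.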
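The pairwise comparison can be made explicit by computing, for every language $k$, the posterior probability $P(\mu_{\text{js}} < \mu_k)$ from the draws of the difference. A sketch with synthetic draws standing in for `trace.posterior['mu_lang']`; the helper name `prob_lower_than` is hypothetical:

```python
import numpy as np


def prob_lower_than(mu_samples: np.ndarray, ref: int) -> np.ndarray:
    """Hypothetical helper: P(mu_ref < mu_k) for each language k.

    mu_samples has shape (n_lang, n_samples), one row of posterior draws per language.
    """
    # Posterior draws of mu_ref - mu_k; broadcasting gives shape (n_lang, n_samples)
    diff = mu_samples[ref, :] - mu_samples
    # Share of draws in which the reference language has the lower mean energy
    probs = (diff < 0).mean(axis=1)
    probs[ref] = np.nan  # comparing a language with itself is not meaningful
    return probs


# Synthetic draws for 3 languages (made-up means 10.0, 12.0 and 10.5)
rng = np.random.default_rng(0)
fake_mu = rng.normal(loc=[[10.0], [12.0], [10.5]], scale=0.5, size=(3, 4000))
probs = prob_lower_than(fake_mu, ref=0)
print(probs.round(3))
```

In the notebook, the draws could come from `trace.posterior['mu_lang'].stack(sample=('chain', 'draw')).values` and the reference index from `lang_categories.get_loc('javascript')`; wrapping the result in `pd.Series(probs, index=lang_categories)` labels each probability with its language.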

<br/><br/>


In [None]:

# Position of javascript among the (alphabetically sorted) languages
js_idx = lang_categories.get_loc('javascript')
# Posterior draws of all language means, shape (n_lang, n_chains * n_draws)
mu_samples = trace.posterior['mu_lang'].stack(sample=('chain', 'draw')).values
# For each draw: is javascript the language with the lowest mean energy?
js_best = mu_samples.argmin(axis=0) == js_idx
prob_js_best = js_best.mean()
print(f"Posterior probability that javascript is the most energy efficient: {prob_js_best:.3f}")

# Simple decision rule: accept H2 if this posterior probability is at least 0.75
decision = 'accept' if prob_js_best >= 0.75 else 'reject'
print(f"Decision on H2: {decision.upper()}")



## Conclusion

* The Bayesian model quantifies uncertainty rather than giving a single‑point estimate.  
* The visualisations make it explicit **where** the languages differ.  
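* With the posterior probability that **`javascript` has the lowest mean energy** given by `prob_js_best` (printed above),
  we **accept** Hypothesis 2 if this probability is at least 0.75 and **reject** it otherwise, following the decision rule in the last code cell. *(Fill in the value and decision from the output above.)*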

Future work could extend the model by:

* Adding **web‑framework** and **endpoint** as additional predictors  
* Modelling the heteroskedasticity hinted at by different spreads per language  
* Performing **model comparison** using WAIC or LOO‑CV (see the *model‑comparison* lecture)  
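For the heteroskedasticity point, one possible formulation (a sketch, not fitted here) keeps the priors of the current model but gives each language its own residual standard deviation; in PyMC this would mean `shape=n_lang` on `sigma` and `sigma=sigma[idx]` in the likelihood. With $\bar y$, $s_y$ the empirical mean and standard deviation of `energy` and $\ell[i]$ the language index of observation $i$:

$$
\begin{aligned}
\mu_k &\sim \mathcal{N}\left(\bar y,\ (2 s_y)^2\right), \qquad \sigma_k \sim \operatorname{HalfNormal}(s_y), && k = 1, \dots, K \\
y_i &\sim \mathcal{N}\left(\mu_{\ell[i]},\ \sigma_{\ell[i]}^2\right), && i = 1, \dots, N
\end{aligned}
$$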

<br/><br/><br/>
