The related data and code can also be found at https://github.com/Xuanstar42/BERN02_Exercise3.git

Response to Peer Review:
1. Choice and justification of the model
The chosen model fits the problem well and shows that the author understands the data and what
needs to be estimated. Still, it would have been nice to include a short explanation of why this model
was preferred over other possible ones - for example, what benefits it has compared to a simpler or
non-hierarchical version.

I have answered this in the notebook. In short:

I chose a hierarchical Bayesian model because my project combines results from different studies. Hierarchical modeling allows partial pooling, so each study contributes to the overall estimate while still accounting for differences between them. This makes the evaluation of whether putting a sign increases towel reuse more robust than a non-hierarchical approach.

2. Method for estimation and its implementation
The estimation method works as intended and the implementation is clear. It's good that the author
used an appropriate framework, but a short reflection on tuning choices or potential issues (like
convergence or prior sensitivity) would make the section stronger.

I ran into a prior sensitivity problem. Please see Section 3, where the problem appears, and Section 4, where I fixed it.

3. Method for testing and reliability of the results
The testing process looks reasonable, and the results seem consistent. However, it would improve
clarity to explicitly write out the hypotheses being tested and briefly explain how the results support
or reject them. Mentioning any limitations or uncertainty in the results would also add credibility.

I have addressed this in the Interpretation section, which reports a 90% HDI for the effect and an estimated increase of about 5–6 percentage points.

4. Readability of the report
The notebook is generally well-organized, with a logical order from data to results. A few small
comments or summaries at the end of each section could help readers follow the reasoning more
easily. Otherwise, the structure and explanations are quite clear and pleasant to read.

Before each code cell, I added a paragraph explaining why I use that code.

# 0. Introduction
The goal of this analysis is to understand whether social norm messages influence the likelihood that people reuse towels in hotel rooms, compared to a control message.

Each dataset entry summarizes the number of people who reused their towel (reuse) out of the total number of people observed (total) in a given study and condition. There are seven studies, each testing the same idea but in slightly different contexts.

Because the outcome is a count of yes/no decisions (reuse or not), and the data come from multiple studies, a hierarchical binomial model is an appropriate statistical choice.

# 1. Import Packages

In [5]:
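# Compatibility shim: newer SciPy versions expose `gaussian` only under
# scipy.signal.windows, so re-attach it to scipy.signal for packages
# that still look for it under the old name.
try:
    from scipy.signal import gaussian
except ImportError:
    from scipy.signal.windows import gaussian as _gaussian
    import scipy.signal as _sig
    _sig.gaussian = _gaussian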

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import bambi as bmb
import arviz as az
from matplotlib.lines import Line2D
from matplotlib.patches import Patch

# 2. Data Wrangling

In [4]:
# read in data
data_file = "towelData.csv"
data = pd.read_csv(data_file, sep=';', encoding='latin1')
count = data.iloc[:, -1]  # last column holds the counts

# count holds the yes/no counts for the control and social norm groups,
# repeating in blocks of four rows per study
control_yes = count[::4].to_numpy()   # rows 0, 4, 8, ...: control group, yes
control_no = count[2::4].to_numpy()   # rows 2, 6, 10, ...: control group, no
control_total = control_yes + control_no

social_yes = count[1::4].to_numpy()   # rows 1, 5, 9, ...: social norm group, yes
social_no = count[3::4].to_numpy()    # rows 3, 7, 11, ...: social norm group, no
social_total = social_yes + social_no

study = np.arange(1, len(control_yes) + 1)  # 7 different studies

control_data = pd.DataFrame({"reuse": control_yes, "total": control_total, "group": "control", "study": study})
social_data = pd.DataFrame({ "reuse": social_yes, "total": social_total, "group": "social", "study": study})

combined_data = pd.concat([control_data, social_data], ignore_index=True)

# Convert data types - important for bambi that they are correctly set 
# can give errors otherwise
combined_data['reuse'] = combined_data['reuse'].astype(int)
combined_data['total'] = combined_data['total'].astype(int)
combined_data['group'] = combined_data['group'].astype('category')
combined_data['study'] = combined_data['study'].astype('category')
print(combined_data.head(25))

    reuse  total    group study
0      74    211  control     1
1     103    277  control     2
2      77    135  control     3
3      82    187  control     4
4      21     25  control     5
5     123    147  control     6
6      28     30  control     7
7      98    222   social     1
8     587   1318   social     2
9     406    655   social     3
10    278    555   social     4
11     21     24   social     5
12    472    576   social     6
13    101    132   social     7
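
As a quick sanity check before modelling, the raw reuse proportions can be computed directly from the table above. This is a minimal sketch that re-enters the printed counts by hand, so it does not depend on `towelData.csv`; the names `table`, `wide` and `pooled` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Counts copied from the printed combined_data table above.
table = pd.DataFrame({
    "reuse": [74, 103, 77, 82, 21, 123, 28, 98, 587, 406, 278, 21, 472, 101],
    "total": [211, 277, 135, 187, 25, 147, 30, 222, 1318, 655, 555, 24, 576, 132],
    "group": ["control"] * 7 + ["social"] * 7,
    "study": list(range(1, 8)) * 2,
})

# Observed reuse proportion for every study/group cell.
table["rate"] = table["reuse"] / table["total"]

# One row per study, one column per group, plus the raw difference.
wide = table.pivot(index="study", columns="group", values="rate")
wide["diff"] = wide["social"] - wide["control"]
print(wide.round(3))

# Pooled proportions per group, ignoring differences between studies.
pooled = table.groupby("group")[["reuse", "total"]].sum()
pooled["rate"] = pooled["reuse"] / pooled["total"]
print(pooled.round(3))
```

The raw differences do not all point the same way (in study 7 the control rate is higher than the social rate), and the baseline rates differ a lot between studies, which is the kind of between-study variation the hierarchical model is meant to absorb.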


# 3. Apply bmb Model
1. I chose a hierarchical Bayesian model because my project combines results from different studies. Hierarchical modeling allows partial pooling, so each study contributes to the overall estimate while still accounting for differences between them. This makes the evaluation of whether putting a sign increases towel reuse more robust than a non-hierarchical approach.

2. I chose the binomial family because the response variable is not a continuous measurement like temperature or height; it is the number of successes out of a total number of trials.
In this case:

Success = a person reused their towel

Trials = total number of participants in that condition

3. Why use p(reuse, total)?

This part defines the response variable for the binomial model:

reuse = number of people who reused towels (successes)

total = total number of people observed (trials)

So I am modeling the probability of reuse for each group and study.

4. Why use 1 + group?

This specifies the fixed effects, meaning the general effects we expect across all studies.

1 adds an intercept term. This represents the baseline log-odds of towel reuse in the control group.

group adds a predictor for whether the message was “control” or “social.”
It estimates how much the social norm message changes the reuse rate compared to the control message.

5. Why use (1|study)?

This term introduces a random (group-level) effect that captures the variation between studies.
Different studies might naturally have higher or lower reuse rates because of local conditions (different hotels, cultures, or sample sizes).
By including (1|study), we give each study its own intercept, drawn from a shared distribution.

This helps the model account for variation between studies rather than treating them as identical.
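
Putting points 2 to 5 together, the formula corresponds to the model below. The symbols are notation introduced here for illustration and do not appear in the notebook: $y_{ij}$ is `reuse` and $n_{ij}$ is `total` for group $i$ in study $j$, $x_i$ is 1 for the social norm group and 0 for control, $\alpha$ is the intercept, $\beta$ is the `group` effect, and $u_j$ is the study-specific intercept shift with standard deviation $\sigma_u$. Priors are left out of this sketch.

$$
\begin{aligned}
y_{ij} &\sim \mathrm{Binomial}(n_{ij},\, p_{ij}) \\
\mathrm{logit}(p_{ij}) &= \alpha + \beta\, x_i + u_j \\
u_j &\sim \mathrm{Normal}(0,\, \sigma_u)
\end{aligned}
$$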

In [6]:
model_hierarchical = bmb.Model("p(reuse, total) ~ 1 + group + (1|study)", combined_data, family="binomial")
model_hierarchical



       Formula: p(reuse, total) ~ 1 + group + (1|study)
        Family: binomial
          Link: p = logit
  Observations: 14
        Priors: 
    target = p
        Common-level effects
            Intercept ~ Normal(mu: 0.0, sigma: 3.5355)
            group ~ Normal(mu: 0.0, sigma: 5.0)
        
        Group-level effects
            1|study ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 3.5355))

In [8]:
idata_hierarchical = model_hierarchical.fit(random_seed=42)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [Intercept, group, 1|study_sigma, 1|study_offset]


Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 6 seconds.
There were 22 divergences after tuning. Increase `target_accept` or reparameterize.
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters.  A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details


The sampler reported 22 divergences, along with R-hat values above 1.01 and a low effective sample size for some parameters. I therefore tighten the priors in the next section.

# 4. Prior Modification

In [9]:
idata_prior = model_hierarchical.prior_predictive()
prior = az.extract(idata_prior, group="prior_predictive")["p(reuse, total)"]

Sampling: [1|study_offset, 1|study_sigma, Intercept, group, p(reuse, total)]


In [10]:
priors = {
    "Intercept": bmb.Prior("Normal", mu=0, sigma=1),
    "1|study": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("HalfNormal", sigma=1))
}
model_hierarchical = bmb.Model("p(reuse, total) ~ 1 + group + (1|study)", combined_data, family="binomial", priors=priors)
model_hierarchical

       Formula: p(reuse, total) ~ 1 + group + (1|study)
        Family: binomial
          Link: p = logit
  Observations: 14
        Priors: 
    target = p
        Common-level effects
            Intercept ~ Normal(mu: 0.0, sigma: 1.0)
            group ~ Normal(mu: 0.0, sigma: 5.0)
        
        Group-level effects
            1|study ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 1.0))
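
To see what the two intercept priors imply on the probability scale, one can push draws from each through the inverse logit. This is a minimal NumPy sketch and not part of the original notebook: it looks only at the intercept (the control-group baseline with a study effect of zero), and the helper name `extreme_mass`, the seed and the 1%/99% cut-offs are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def inv_logit(x):
    """Map log-odds to probabilities."""
    return 1.0 / (1.0 + np.exp(-x))

def extreme_mass(sigma, n_draws=100_000):
    """Share of prior draws implying a baseline reuse probability below 1% or above 99%."""
    intercept = rng.normal(loc=0.0, scale=sigma, size=n_draws)
    p = inv_logit(intercept)
    return np.mean((p < 0.01) | (p > 0.99))

default_mass = extreme_mass(3.5355)  # default prior: Normal(0, 3.5355)
revised_mass = extreme_mass(1.0)     # revised prior: Normal(0, 1)

print(f"default prior: {default_mass:.3f} of draws are extreme")
print(f"revised prior: {revised_mass:.6f} of draws are extreme")
```

Under the default prior roughly a fifth of the draws imply a baseline reuse rate below 1% or above 99%, while under Normal(0, 1) almost none do. The tighter prior therefore keeps the sampler away from implausible regions without ruling out the observed rates.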

# 5. Fitting Model

In [11]:
idata_hierarchical = model_hierarchical.fit(random_seed=42)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [Intercept, group, 1|study_sigma, 1|study_offset]


Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 6 seconds.
There were 2 divergences after tuning. Increase `target_accept` or reparameterize.


In [12]:
print(idata_hierarchical.posterior)


<xarray.Dataset> Size: 328kB
Dimensions:            (chain: 4, draw: 1000, group_dim: 1, study__factor_dim: 7)
Coordinates:
  * chain              (chain) int64 32B 0 1 2 3
  * draw               (draw) int64 8kB 0 1 2 3 4 5 ... 994 995 996 997 998 999
  * group_dim          (group_dim) <U6 24B 'social'
  * study__factor_dim  (study__factor_dim) <U1 28B '1' '2' '3' '4' '5' '6' '7'
Data variables:
    Intercept          (chain, draw) float64 32kB 0.1683 0.1897 ... 0.05734
    group              (chain, draw, group_dim) float64 32kB 0.08793 ... 0.04418
    1|study_sigma      (chain, draw) float64 32kB 1.052 1.137 ... 1.254 1.257
    1|study            (chain, draw, study__factor_dim) float64 224kB -0.5754...
Attributes:
    created_at:                  2025-10-24T15:15:13.230916
    arviz_version:               0.17.1
    inference_library:           pymc
    inference_library_version:   5.12.0
    sampling_time:               5.54235315322876
    tuning_steps:                1000
    mo

**Research Question**  
Does the social norm intervention increase towel reuse compared to the control condition?

**Hypotheses**  

- **Null hypothesis (H₀):**  
  The intervention has no effect.  
  Mathematically: `group[social] = 0`.

- **Alternative hypothesis (H₁):**  
  The intervention is effective. The social norm group shows a higher reuse rate than the control group.  
  Mathematically: `group[social] > 0`.

In [18]:
post = idata_hierarchical.posterior["group"].sel(group_dim="social").values
p_gt0 = (post > 0).mean()
print("P(group[social] > 0) =", p_gt0)


P(group[social] > 0) = 0.99725


With a posterior probability above 0.99 that `group[social]` is positive, the data strongly support H₁: the intervention is effective.

### Interpretation

- The posterior probability that the intervention effect is greater than zero is nearly 1.0, 
  indicating very strong evidence for effectiveness.
- The posterior mean of `group[social]` is about 0.21 (log-odds), with a 90% HDI of [0.09, 0.34].
- This corresponds to approximately a 5–6 percentage point increase in towel reuse under the social norm condition compared to control.
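
The conversion from log-odds to percentage points depends on the baseline rate. The sketch below is one way to arrive at a figure in the 5–6 point range; it assumes a control baseline equal to the pooled control proportion from the data table (508 of 1012), which is an illustrative choice and not necessarily the baseline implied by the fitted intercept. The function name `pp_increase` is hypothetical.

```python
import numpy as np

def inv_logit(x):
    """Map log-odds to probabilities."""
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    """Map probabilities to log-odds."""
    return np.log(p / (1.0 - p))

def pp_increase(effect, baseline):
    """Percentage-point change in reuse probability for a log-odds effect at a given baseline."""
    return 100.0 * (inv_logit(logit(baseline) + effect) - baseline)

baseline = 508 / 1012  # assumption: pooled control proportion from the data table

for label, effect in [("HDI lower", 0.09), ("posterior mean", 0.21), ("HDI upper", 0.34)]:
    print(f"{label:>14}: {pp_increase(effect, baseline):.1f} percentage points")
```

At this baseline the posterior mean maps to a little over 5 percentage points, and the HDI bounds map to roughly 2 and 8 percentage points. A different baseline would give somewhat different numbers, because the logit link is nonlinear.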
