# Effects of sex?

To understand the participant composition, we count the number of male and female participants in each study. To check for sex effects, we estimate the parameters of a Bayesian linear model, `BMI ~ age * sex`, which includes main effects of age and sex plus their interaction.
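
To make the formula concrete, here is a minimal sketch of the linear predictor that `BMI ~ age * sex` implies under treatment coding with female as the reference level (consistent with the `sex[T.male]` term names in the sampler output). The function name, argument names, and coefficient values are hypothetical and chosen only to illustrate the arithmetic; they are not estimates from this notebook.

```python
def expected_bmi(age, sex, intercept, b_male, b_age, b_age_male):
    """Linear predictor for BMI ~ age * sex with 'female' as the reference level."""
    male = 1.0 if sex == "male" else 0.0  # dummy variable behind sex[T.male]
    return (
        intercept  # Intercept
        + b_male * male  # sex[T.male]: shift in intercept for males
        + b_age * age  # age: slope for the reference group (females)
        + b_age_male * age * male  # age:sex[T.male]: extra slope for males
    )


# Hypothetical coefficients, for illustration only.
coefs = dict(intercept=22.0, b_male=-1.0, b_age=0.1, b_age_male=0.05)

print(expected_bmi(30, "female", **coefs))
print(expected_bmi(30, "male", **coefs))
```

Read this way, the `age` coefficient is the age slope for female participants, and the interaction coefficient is the difference between the male and female slopes.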

In [1]:
# Install Black autoformatter with: pip install nb-black
%load_ext lab_black

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
import pandas as pd
import numpy as np
import pymc3 as pm
import arviz as az
import seaborn as sns
import matplotlib.pyplot as plt

%config InlineBackend.figure_format = 'retina'

## Number of male/female respondents before truncation

In [4]:
s1 = pd.read_csv("../data/02 processed data/study1_processed.csv")
s1["sex"].value_counts()

female    336
male       79
Name: sex, dtype: int64

In [5]:
s2 = pd.read_csv("../data/02 processed data/study2_processed.csv")
s2["sex"].value_counts()

female    366
male       73
Name: sex, dtype: int64
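
As a quick check on the composition, the counts printed above can be turned into sample sizes and female shares. This is a small sketch that hard-codes the counts from the two outputs; the `composition` helper is a name introduced here for illustration.

```python
# Counts copied from the value_counts() outputs above (before truncation).
counts = {
    "study1": {"female": 336, "male": 79},
    "study2": {"female": 366, "male": 73},
}


def composition(c):
    """Return (total participants, share of female participants)."""
    total = c["female"] + c["male"]
    return total, c["female"] / total


for study, c in counts.items():
    total, share = composition(c)
    print(f"{study}: n = {total}, female share = {share:.1%}")
```

In both studies roughly four out of five participants are female, so estimates for male participants rest on far fewer observations.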

## Study 1

In [6]:
data = pd.read_csv("../data/04 final data/study1_final_data.csv")

In [7]:
data["sex"].value_counts()

female    312
male       71
Name: sex, dtype: int64

In [8]:
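with pm.Model() as model:
    # Linear model with PyMC3's default GLM priors: age, sex, and their interaction
    pm.glm.GLM.from_formula("BMI ~ age * sex", data)
    # 10,000 posterior draws per chain, 2 chains run in parallel
    trace = pm.sample(10_000, cores=2)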

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sd, age:sex[T.male], age, sex[T.male], Intercept]
Sampling 2 chains, 0 divergences: 100%|██████████| 21000/21000 [00:16<00:00, 1269.95draws/s]
The acceptance probability does not match the target. It is 0.8898082350005939, but should be close to 0.8. Try to increase the number of tuning steps.


In [9]:
# pm.traceplot(trace)

In [10]:
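# Summary statistics per parameter: posterior SD, median, and 5th/95th percentiles
func_dict = {
    "std": np.std,
    "5%": lambda x: np.percentile(x, 5),
    "median": lambda x: np.percentile(x, 50),
    "95%": lambda x: np.percentile(x, 95),
}
# extend=False reports only the functions above instead of ArviZ's default columns
az.summary(trace, stat_funcs=func_dict, extend=False)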

| | std | 5% | median | 95% |
|---|---|---|---|---|
| Intercept | 0.747 | 21.286 | 22.524 | 23.748 |
| sex[T.male] | 1.833 | -4.227 | -1.226 | 1.842 |
| age | 0.021 | 0.078 | 0.111 | 0.146 |
| age:sex[T.male] | 0.048 | -0.079 | 0.002 | 0.082 |
| sd | 0.19 | 4.782 | 5.074 | 5.411 |
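
In this parameterization the `age` row is the age slope for female participants, and the slope for male participants is `age` plus `age:sex[T.male]`. Adding the two medians from the table only approximates the median of that sum, so a safer route is to add the posterior draws first and summarize afterwards. The sketch below assumes the draws are available as NumPy arrays (in PyMC3, `trace["age"]` and `trace["age:sex[T.male]"]` return such arrays). The helper names and the synthetic stand-in draws are illustrative and are not results from this notebook.

```python
import numpy as np


def male_age_slope(age_draws, interaction_draws):
    """Per-draw age slope for males: age + age:sex[T.male]."""
    return np.asarray(age_draws, dtype=float) + np.asarray(
        interaction_draws, dtype=float
    )


def slope_summary(draws):
    """5th percentile, median, and 95th percentile of a set of draws."""
    draws = np.asarray(draws, dtype=float)
    return {
        "5%": float(np.percentile(draws, 5)),
        "median": float(np.percentile(draws, 50)),
        "95%": float(np.percentile(draws, 95)),
    }


# Synthetic stand-in draws (NOT the notebook's posterior); in the notebook one
# would pass trace["age"] and trace["age:sex[T.male]"] instead.
rng = np.random.default_rng(0)
age_draws = rng.normal(0.1, 0.02, size=1000)
interaction_draws = rng.normal(0.0, 0.05, size=1000)

print(slope_summary(male_age_slope(age_draws, interaction_draws)))
```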


## Study 2

In [11]:
data = pd.read_csv("../data/04 final data/study2_final_data.csv")
data["sex"].value_counts()

female    333
male       67
Name: sex, dtype: int64

In [12]:
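with pm.Model() as model:
    # Same model as for Study 1, fitted to the Study 2 data
    pm.glm.GLM.from_formula("BMI ~ age * sex", data)
    # 10,000 posterior draws per chain, 2 chains run in parallel
    trace = pm.sample(10_000, cores=2)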

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sd, age:sex[T.male], age, sex[T.male], Intercept]
Sampling 2 chains, 0 divergences: 100%|██████████| 21000/21000 [00:18<00:00, 1159.79draws/s]
The acceptance probability does not match the target. It is 0.9249568381126883, but should be close to 0.8. Try to increase the number of tuning steps.


In [13]:
# pm.traceplot(trace)

In [14]:
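# Same summary statistics as for Study 1
func_dict = {
    "std": np.std,
    "5%": lambda x: np.percentile(x, 5),
    "median": lambda x: np.percentile(x, 50),
    "95%": lambda x: np.percentile(x, 95),
}
# extend=False reports only the functions above instead of ArviZ's default columns
az.summary(trace, stat_funcs=func_dict, extend=False)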

| | std | 5% | median | 95% |
|---|---|---|---|---|
| Intercept | 0.875 | 20.493 | 21.922 | 23.36 |
| sex[T.male] | 2.195 | -0.109 | 3.484 | 7.153 |
| age | 0.023 | 0.126 | 0.164 | 0.202 |
| age:sex[T.male] | 0.066 | -0.24 | -0.13 | -0.022 |
| sd | 0.226 | 5.85 | 6.197 | 6.589 |
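
The 5% and 95% columns bound a central 90% posterior interval. One informal way to compare the two studies is to check whether that interval for the `age:sex[T.male]` interaction excludes zero. The sketch below hard-codes the interval bounds reported in the two summary tables; the `excludes_zero` helper is a name introduced here, and the check is a rough reading aid, not a formal test.

```python
# (5%, 95%) bounds for age:sex[T.male], copied from the summary tables.
interaction_intervals = {
    "study1": (-0.079, 0.082),
    "study2": (-0.24, -0.022),
}


def excludes_zero(lower, upper):
    """True if the interval [lower, upper] lies entirely on one side of zero."""
    return lower > 0 or upper < 0


for study, (lower, upper) in interaction_intervals.items():
    verdict = "excludes zero" if excludes_zero(lower, upper) else "includes zero"
    print(f"{study}: 90% interval [{lower}, {upper}] {verdict}")
```

By this reading, the interaction interval includes zero in Study 1 and lies below zero in Study 2. With only 67 to 71 male participants per study, the male-specific estimates remain comparatively imprecise.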
