Most of the simulation functionality is provided by `scipy`, with a few useful extras from `numpy`. We set the seed so that the data can be reproduced.

In [1]:
import pandas as pd
from scipy import stats
import numpy as np

np.random.seed(seed=23)
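
As a quick check of the reproducibility claim, the sketch below reseeds NumPy's global generator and draws from `scipy.stats` twice. It relies on `scipy.stats` falling back to NumPy's global random state when no `random_state` argument is passed. The variable names are illustrative only.

```python
import numpy as np
from scipy import stats

# Seed the global generator, then draw five standard normal values.
np.random.seed(seed=23)
first_draw = stats.norm.rvs(size=5)

# Reseeding with the same value should replay the same stream.
np.random.seed(seed=23)
second_draw = stats.norm.rvs(size=5)

print(np.array_equal(first_draw, second_draw))  # should print True

# Reseed once more so that later cells start from the freshly seeded state.
np.random.seed(seed=23)
```

The final reseed matters if this check is run inside the notebook, since any extra draws would otherwise shift the random stream used to generate the data.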

The `random_cat_covariates` function simulates the properties of a random cat.

In [2]:
def random_cat_covariates():
    # Latent trait that influences both height and loudness; it is not recorded in the data.
    hidden = stats.norm.rvs()
    is_longhaired = stats.bernoulli.rvs(0.5)
    height = stats.norm.rvs(loc = 24 + hidden, scale = 0.5)
    loudness = np.log(stats.expon.rvs(scale = 10 + 5 * (4 + max(hidden,0))) + 5)
    return {
        "time_outdoors": stats.gamma.rvs(3, scale = 2),
        "coat_colour": stats.randint.rvs(low = 1, high = 4),
        "weight": stats.norm.rvs(loc = 4, scale = 0.5),
        "height": height,
        "loudness": loudness,
        "whisker_length": 0.3 * loudness + 0.3 * height + 0.1 * stats.norm.rvs(scale = 2),
        "is_longhaired": is_longhaired,
        "coat_length": stats.gamma.rvs((4 + 3 * is_longhaired) * 4, scale = 1/4)
    }
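
The `coat_length` line uses a gamma distribution with shape `(4 + 3 * is_longhaired) * 4` and scale `1/4`. Since a gamma distribution's mean is shape times scale, the mean coat length should be `4 + 3 * is_longhaired`. The following is a small sketch to confirm this with `scipy`; the dictionary and loop are for illustration and are not part of the simulation.

```python
from scipy import stats

coat_length_means = {}
coat_length_sds = {}

for is_longhaired in (0, 1):
    # Same shape and scale as used in random_cat_covariates.
    shape = (4 + 3 * is_longhaired) * 4
    coat_length_means[is_longhaired] = stats.gamma.mean(shape, scale=1/4)
    coat_length_sds[is_longhaired] = stats.gamma.std(shape, scale=1/4)

print(coat_length_means)  # mean of about 4 for shorthaired, about 7 for longhaired
print(coat_length_sds)
```

Multiplying the shape by 4 and dividing the scale by 4 leaves the mean unchanged but halves the standard deviation compared with a scale of 1.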

The `random_num_pats` function takes the measurements of a random cat and returns the number of pats it received on the day it was observed. This function specifies the relationship between the properties of the cat and the average number of pats it receives.

In [3]:
def random_num_pats(cat_covariates):
    # Flip the sign of coat length for longhaired cats, so that it lowers their mean.
    coat_length_val = cat_covariates["coat_length"] * ((-1) ** cat_covariates["is_longhaired"])
    
    mean_pats = (
        0.3 + 
        1.0 * cat_covariates["height"] +
        1.0 * cat_covariates["coat_colour"] ** 2 +
        1.0 * cat_covariates["weight"] +
        0.1 * cat_covariates["loudness"] +
        0.9 * coat_length_val +
        1.0 * cat_covariates["time_outdoors"]
    )
    
    # Floor the mean at 0.1 so that the Poisson rate stays positive.
    safe_mean_pats = max(0.1, mean_pats)
    
    return stats.poisson.rvs(safe_mean_pats)
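
To make the relationship concrete, here is a worked example of the mean for two made-up cats. The helper `mean_pats_for` is hypothetical: it mirrors the expression in `random_num_pats` but leaves out the floor at 0.1 and the Poisson draw. The covariate values are invented to keep the arithmetic easy to follow.

```python
def mean_pats_for(cat):
    """Hypothetical helper mirroring the mean used in random_num_pats."""
    # Coat length counts positively for shorthaired cats, negatively for longhaired.
    sign = (-1) ** cat["is_longhaired"]
    return (
        0.3
        + 1.0 * cat["height"]
        + 1.0 * cat["coat_colour"] ** 2
        + 1.0 * cat["weight"]
        + 0.1 * cat["loudness"]
        + 0.9 * sign * cat["coat_length"]
        + 1.0 * cat["time_outdoors"]
    )

# Invented covariates, not drawn from the simulation.
shorthaired = {
    "height": 24.0, "coat_colour": 2, "weight": 4.0, "loudness": 3.0,
    "coat_length": 4.0, "is_longhaired": 0, "time_outdoors": 6.0,
}
longhaired = dict(shorthaired, coat_length=7.0, is_longhaired=1)

print(mean_pats_for(shorthaired))  # about 42.2
print(mean_pats_for(longhaired))   # about 32.3
```

For the shorthaired cat the terms are 0.3 + 24 + 4 + 4 + 0.3 + 3.6 + 6. For the longhaired cat the coat term becomes -6.3, which accounts for the whole difference between the two means.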

The `random_observation` function generates a random observation to include in the data set.

In [4]:
def random_observation():
    x = random_cat_covariates()
    y = random_num_pats(x)
    x["num_pats"] = y
    # Cap the recorded value at 24 hours; num_pats was already drawn from the uncapped value.
    x["time_outdoors"] = min(x["time_outdoors"], 24)
    return x
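
The 24-hour cap only matters when the gamma draw for `time_outdoors` exceeds 24. A rough sketch of how often that should happen, using the same shape (3) and scale (2) as `random_cat_covariates`, is shown below. The variable names are illustrative, and the closed form is the standard tail formula for a gamma distribution with integer shape.

```python
import numpy as np
from scipy import stats

# Probability that a gamma draw with shape 3 and scale 2 exceeds 24.
p_capped = stats.gamma.sf(24, 3, scale=2)

# With integer shape 3 the tail is exp(-x) * (1 + x + x**2 / 2), where x = 24 / scale.
x = 24 / 2
closed_form = np.exp(-x) * (1 + x + x**2 / 2)

print(p_capped, closed_form)  # both roughly 5e-4
print(1000 * p_capped)        # expected number of capped rows in 1000 observations
```

With a probability this small, most simulated data sets of 1000 cats should contain no capped rows, or at most one or two.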

In [5]:
pd.DataFrame([random_observation() for _ in range(1000)]).to_csv("cat-pats.csv", index = False)