Different dimension for pre-defined kwargs and data #17

Closed
hyunjimoon opened this issue Nov 24, 2022 · 3 comments

@hyunjimoon
Contributor

With the setting S = 5, Q = 3, R = 2 and the following idata_kwargs:

    idata_kwargs = dict(
        prior_predictive=["prey_obs", "predator_obs"],
        posterior_predictive=["prey_obs", "predator_obs"],
        coords={
            "time": [n for n in range(N)],
            "stock": [q for q in range(Q)],
            "region": [r for r in range(R)]
        },
        dims = {
            'initial_outcome': ["stock"],
            'integrated_result': ["time", "stock"],
            'prey': ["time", "region"],
            'process_noise':  ["time",  "region"],
            'predator': ["time", "region"],
            "prey_obs": ["time", "region"],
            "predator_obs": ["time", "region"],
        }
    )

I get the following error msg:

different number of dimensions on data and dims: 3 vs 4

When I print the shape, dims, and coords at the point where the error is raised (elif len(dims) != len(shape):), I get:

from xarray shape :  (1, 5, 200) dims ['chain', 'draw', 'time', 'region'] coords {'chain': <xarray.IndexVariable 'chain' (chain: 1)>
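The mismatch above (3 data dimensions against 4 declared dims) can be caught before the conversion runs. The following is a minimal sketch, not part of arviz or xarray: the helper name `find_dim_mismatches` is hypothetical, and the shapes are copied from the output reported in this thread. It assumes every variable carries the two sample dimensions chain and draw in front of its own dims.

```python
def find_dim_mismatches(shapes, dims, sample_dims=("chain", "draw")):
    """Return {variable: (data_ndim, declared_ndim)} for every disagreement.

    shapes: variable name -> shape of the sampled data, including chain and draw
    dims:   variable name -> list of user-declared dimension names
    """
    mismatches = {}
    for name, var_dims in dims.items():
        if name not in shapes:
            continue  # nothing sampled under this name, so nothing to compare
        declared = len(sample_dims) + len(var_dims)
        actual = len(shapes[name])
        if actual != declared:
            mismatches[name] = (actual, declared)
    return mismatches


# Shapes as reported in this thread (chain=1, draw=5, time=200, stock=3).
shapes = {
    "initial_outcome": (1, 5, 3),
    "integrated_result": (1, 5, 200, 3),
    "prey": (1, 5, 200),
    "prey_obs": (1, 5, 200),
}
dims = {
    "initial_outcome": ["stock"],
    "integrated_result": ["time", "stock"],
    "prey": ["time", "region"],
    "prey_obs": ["time", "region"],
}

mismatches = find_dim_mismatches(shapes, dims)
for name, (actual, declared) in mismatches.items():
    print(f"{name}: data has {actual} dims, declared {declared}")
```

Only the variables declared with a region dimension are flagged, which matches the "3 vs 4" in the error message.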

region is missing from the shape of the data, i.e. the prior predictive draws from Stan's sample(). I am debugging based on xarray's _infer_coords_and_dims function. The more detailed message raised for this function is:

_infer_coords_and_dims(data.shape, coords, dims) ({'chain': <xarray.IndexVariable 'chain' (chain: 1)>
array([0]), 'draw': <xarray.IndexVariable 'draw' (draw: 5)>
array([0, 1, 2, 3, 4])}, ['chain', 'draw'])
_infer_coords_and_dims(data.shape, coords, dims) ({'chain': <xarray.IndexVariable 'chain' (chain: 1)>
array([0]), 'draw': <xarray.IndexVariable 'draw' (draw: 5)>
array([0, 1, 2, 3, 4])}, ['chain', 'draw'])
_infer_coords_and_dims(data.shape, coords, dims) ({'chain': <xarray.IndexVariable 'chain' (chain: 1)>
array([0]), 'draw': <xarray.IndexVariable 'draw' (draw: 5)>
array([0, 1, 2, 3, 4])}, ['chain', 'draw'])
_infer_coords_and_dims(data.shape, coords, dims) ({'chain': <xarray.IndexVariable 'chain' (chain: 1)>
array([0]), 'draw': <xarray.IndexVariable 'draw' (draw: 5)>
array([0, 1, 2, 3, 4])}, ['chain', 'draw'])
_infer_coords_and_dims(data.shape, coords, dims) ({'chain': <xarray.IndexVariable 'chain' (chain: 1)>
array([0]), 'draw': <xarray.IndexVariable 'draw' (draw: 5)>
array([0, 1, 2, 3, 4])}, ['chain', 'draw'])
_infer_coords_and_dims(data.shape, coords, dims) ({'chain': <xarray.IndexVariable 'chain' (chain: 1)>
array([0]), 'draw': <xarray.IndexVariable 'draw' (draw: 5)>
array([0, 1, 2, 3, 4])}, ['chain', 'draw'])
_infer_coords_and_dims(data.shape, coords, dims) ({'chain': <xarray.IndexVariable 'chain' (chain: 1)>
array([0]), 'draw': <xarray.IndexVariable 'draw' (draw: 5)>
array([0, 1, 2, 3, 4]), 'initial_outcome_dim_0': <xarray.IndexVariable 'initial_outcome_dim_0' (initial_outcome_dim_0: 3)>
array([0, 1, 2])}, ['chain', 'draw', 'initial_outcome_dim_0'])
_infer_coords_and_dims(data.shape, coords, dims) ({'chain': <xarray.IndexVariable 'chain' (chain: 1)>
array([0]), 'draw': <xarray.IndexVariable 'draw' (draw: 5)>
array([0, 1, 2, 3, 4]), 'integrated_result_dim_0': <xarray.IndexVariable 'integrated_result_dim_0' (integrated_result_dim_0: 200)>
array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
...
       196, 197, 198, 199]), 'integrated_result_dim_1': <xarray.IndexVariable 'integrated_result_dim_1' (integrated_result_dim_1: 3)>
array([0, 1, 2])}, ['chain', 'draw', 'integrated_result_dim_0', 'integrated_result_dim_1'])
_infer_coords_and_dims(data.shape, coords, dims) ({'chain': <xarray.IndexVariable 'chain' (chain: 1)>
array([0]), 'draw': <xarray.IndexVariable 'draw' (draw: 5)>
array([0, 1, 2, 3, 4]), 'process_noise_dim_0': <xarray.IndexVariable 'process_noise_dim_0' (process_noise_dim_0: 200)>
array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
...
       196, 197, 198, 199])}, ['chain', 'draw', 'process_noise_dim_0'])
_infer_coords_and_dims(data.shape, coords, dims) ({'chain': <xarray.IndexVariable 'chain' (chain: 1)>
array([0]), 'draw': <xarray.IndexVariable 'draw' (draw: 5)>
array([0, 1, 2, 3, 4]), 'prey_dim_0': <xarray.IndexVariable 'prey_dim_0' (prey_dim_0: 200)>
array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
...
       196, 197, 198, 199])}, ['chain', 'draw', 'prey_dim_0'])
_infer_coords_and_dims(data.shape, coords, dims) ({'chain': <xarray.IndexVariable 'chain' (chain: 1)>
array([0]), 'draw': <xarray.IndexVariable 'draw' (draw: 5)>
array([0, 1, 2, 3, 4]), 'predator_dim_0': <xarray.IndexVariable 'predator_dim_0' (predator_dim_0: 200)>
array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
...
       196, 197, 198, 199])}, ['chain', 'draw', 'predator_dim_0'])
_infer_coords_and_dims(data.shape, coords, dims) ({'chain': <xarray.IndexVariable 'chain' (chain: 1)>
array([0]), 'draw': <xarray.IndexVariable 'draw' (draw: 5)>
array([0, 1, 2, 3, 4]), 'prey_obs_dim_0': <xarray.IndexVariable 'prey_obs_dim_0' (prey_obs_dim_0: 200)>
array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
...
       196, 197, 198, 199])}, ['chain', 'draw', 'prey_obs_dim_0'])
_infer_coords_and_dims(data.shape, coords, dims) ({'chain': <xarray.IndexVariable 'chain' (chain: 1)>
array([0]), 'draw': <xarray.IndexVariable 'draw' (draw: 5)>
array([0, 1, 2, 3, 4]), 'predator_obs_dim_0': <xarray.IndexVariable 'predator_obs_dim_0' (predator_obs_dim_0: 200)>
array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
...
       196, 197, 198, 199])}, ['chain', 'draw', 'predator_obs_dim_0'])
_infer_coords_and_dims(data.shape, coords, dims) ({'chain': <xarray.IndexVariable 'chain' (chain: 1)>
array([0]), 'draw': <xarray.IndexVariable 'draw' (draw: 5)>
array([0, 1, 2, 3, 4])}, ['chain', 'draw'])
_infer_coords_and_dims(data.shape, coords, dims) ({'chain': <xarray.IndexVariable 'chain' (chain: 1)>
array([0]), 'draw': <xarray.IndexVariable 'draw' (draw: 5)>
array([0, 1, 2, 3, 4])}, ['chain', 'draw'])

The chain and draw indices should be reset (reset_index) in advance.

@hyunjimoon
Contributor Author

hyunjimoon commented Nov 24, 2022

From the following stan file,

generated quantities{
    array[n_t] vector[n_r] prey_obs;
    array[n_t] vector[n_r] predator_obs;

    real pred_birth_frac[n_r] = normal_rng(rep_vector(0.05, R), 0.005);
    real m_noise_scale = normal_rng(0.01, 0.001);
    real prey_birth_frac = normal_rng(0.8, 0.08);

    // Initial ODE values
    real prey__init = 30;
    real process_noise__init = 0;
    real predator__init = 4;
    vector[3] initial_outcome;  // Initial ODE state vector
    initial_outcome[1] = prey__init;
    initial_outcome[2] = process_noise__init;
    initial_outcome[3] = predator__init;
    array[n_t] vector[n_r] prey;
    array[n_t] vector[n_r] predator;
    array[n_t] vector[n_r] process_noise;
    for (r in 1:n_r){
        array[n_t] vector[n_q] integrated_result = ode_rk45(vensim_ode_func, initial_outcome, initial_time, times, prey_birth_frac, pred_birth_frac[r], time_step, process_noise_scale);
        prey[:, r] = integrated_result[:, 1];
        process_noise[:, r]  = integrated_result[:, 2];
        predator[:, r]  = integrated_result[:, 3];
        prey_obs[:, r] = normal_rng(prey[:, r], m_noise_scale);
        predator_obs[:, r] = normal_rng(predator[:, r], m_noise_scale);
    }
}

sampled_from_stan.draws_xr() is:

draws2data_xr <xarray.Dataset>
Dimensions:              (draw: 5, chain: 1, initial_outcome_dim_0: 3,
                          integrated_result_dim_0: 200,
                          integrated_result_dim_1: 3, predator_dim_0: 200,
                          process_noise_dim_0: 200, prey_dim_0: 200,
                          prey_obs_dim_0: 200, predator_obs_dim_0: 200)
Coordinates:
  * chain                (chain) int64 1
  * draw                 (draw) int64 0 1 2 3 4
Dimensions without coordinates: initial_outcome_dim_0, integrated_result_dim_0,
                                integrated_result_dim_1, predator_dim_0,
                                process_noise_dim_0, prey_dim_0,
                                prey_obs_dim_0, predator_obs_dim_0
Data variables: (12/13)
    prey_birth_frac      (chain, draw) float64 0.8001 0.8738 0.6499 0.7694 0.734
    m_noise_scale        (chain, draw) float64 0.009252 0.0111 ... 0.01153
    pred_birth_frac      (chain, draw) float64 0.04873 0.05066 ... 0.05469
    predator__init       (chain, draw) float64 4.0 4.0 4.0 4.0 4.0
    process_noise__init  (chain, draw) float64 0.0 0.0 0.0 0.0 0.0
    prey__init           (chain, draw) float64 30.0 30.0 30.0 30.0 30.0
    ...                   ...
    integrated_result    (chain, draw, integrated_result_dim_0, integrated_result_dim_1) float64 ...
    predator             (chain, draw, predator_dim_0) float64 4.027 ... 5.747
    process_noise        (chain, draw, process_noise_dim_0) float64 0.004472 ...
    prey                 (chain, draw, prey_dim_0) float64 30.18 30.73 ... 3.724
    prey_obs             (chain, draw, prey_obs_dim_0) float64 30.19 ... 3.721
    predator_obs         (chain, draw, predator_obs_dim_0) float64 4.023 ... ...

Compared to array[n_t] vector[n_q] integrated_result, which is assigned two dimensions (integrated_result_dim_[0,1]), I wonder why prey, prey_obs, etc., which also have the type array[n_t] vector[n_r], have only one dimension.

@hyunjimoon
Contributor Author

hyunjimoon commented Nov 24, 2022

from_cmdstanpy seems to be using draws_xr() and the same dims problem is observed. When I ran the following debug code

    idata = az.from_cmdstanpy(
        prior=draws2data_data
    )
    print("let's see whether arviz assigns R correctly to prey_obs, prey etc", idata)
    print(idata.prior.prey_obs.shape)
    idata_orig = az.from_cmdstanpy(
        **idata_kwargs
    ).stack(prior_draw=["chain", "draw"], groups="prior_groups")
    idata_orig.reset_index("prior_draw", inplace=True)
    draws2data_xr = draws2data_data.draws_xr()
    print("draws2data_xr", draws2data_xr)
    print("dims says (1,5, 200) but where is n_r?", draws2data_xr.dims)
    idata_orig.add_groups(prior = draws2data_xr)

the following was printed, i.e. the same dims (without n_r):

let's see whether arviz assigns R correctly to prey_obs, prey etc Inference data with groups:
	> prior
	> sample_stats_prior
Frozen({'chain': 1, 'draw': 5, 'initial_outcome_dim_0': 3, 'integrated_result_dim_0': 200, 'integrated_result_dim_1': 3, 'prey_dim_0': 200, 'process_noise_dim_0': 200, 'predator_dim_0': 200, 'prey_obs_dim_0': 200, 'predator_obs_dim_0': 200})

dims says (1,5, 200) but where is n_r? 
Frozen({'draw': 5, 'chain': 1, 'initial_outcome_dim_0': 3, 'integrated_result_dim_0': 200, 'integrated_result_dim_1': 3, 'prey_dim_0': 200, 'process_noise_dim_0': 200, 'predator_dim_0': 200, 'prey_obs_dim_0': 200, 'predator_obs_dim_0': 200})
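To read the printed mapping per variable rather than per dimension, the auto-generated names of the form <var>_dim_<i> can be grouped back into shapes. This is only a sketch: `shapes_from_dim_names` is a hypothetical helper, the naming pattern is inferred from the output in this thread, and the dictionary is copied from the printed Frozen mapping.

```python
import re
from collections import defaultdict


def shapes_from_dim_names(dim_sizes):
    """Group '<var>_dim_<i>' entries into {var: (size_0, size_1, ...)}.

    Names that do not follow the pattern (chain, draw) are skipped.
    """
    pattern = re.compile(r"^(?P<var>.+)_dim_(?P<idx>\d+)$")
    per_var = defaultdict(dict)
    for name, size in dim_sizes.items():
        match = pattern.match(name)
        if match:
            per_var[match.group("var")][int(match.group("idx"))] = size
    # Order each variable's sizes by dimension index.
    return {
        var: tuple(sizes[i] for i in sorted(sizes))
        for var, sizes in per_var.items()
    }


# Copied from the printed mapping above.
dim_sizes = {
    "chain": 1, "draw": 5,
    "initial_outcome_dim_0": 3,
    "integrated_result_dim_0": 200, "integrated_result_dim_1": 3,
    "prey_dim_0": 200, "process_noise_dim_0": 200, "predator_dim_0": 200,
    "prey_obs_dim_0": 200, "predator_obs_dim_0": 200,
}

var_shapes = shapes_from_dim_names(dim_sizes)
for var, shape in var_shapes.items():
    print(var, shape)
```

In this output only integrated_result has two entries; prey, predator, process_noise, and the two observation variables each have a single dimension of length 200, so the region axis is absent from the sampled data itself.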

@hyunjimoon
Contributor Author

This was a false alarm: I was using a pre-made Stan file (instead of the auto-generated one) in which region was hard-coded. Treating a model with prior draws equal to 1 as a hierarchical model, i.e. extending shape (200) to (200, 1), can prevent this error.
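The extension from (200) to (200, 1) can be sketched with NumPy as follows. The helper name `ensure_trailing_axes` is hypothetical and this is not what the auto-generated Stan file does internally; it only shows how length-1 axes appended on the right bring the data up to the number of declared dims.

```python
import numpy as np


def ensure_trailing_axes(values, expected_ndim):
    """Append length-1 axes on the right until the array has expected_ndim dims."""
    arr = np.asarray(values)
    while arr.ndim < expected_ndim:
        arr = arr[..., np.newaxis]
    return arr


# (chain, draw, time) as sampled in this thread, with no region axis.
prey = np.zeros((1, 5, 200))

# Declared dims are chain, draw, time, region, so four dimensions are expected.
prey_with_region = ensure_trailing_axes(prey, expected_ndim=4)
print(prey_with_region.shape)
```

With the axis added this way, the "region" coordinate would need exactly one entry (for example [0]) so that its length matches the data.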
