Compare raises InvalidIndexError: slice(None, None, None) when provided with a dictionary of InferenceData objects #2120

Closed
jerzybaranowski opened this issue Sep 23, 2022 · 3 comments

Comments

@jerzybaranowski

Describe the bug
Strange issue with a fresh installation, using cmdstanpy.
I created two MCMC objects with cmdstanpy. Both Stan programs use the default name (`log_lik`) for the log-likelihood.
I converted both to InferenceData objects and put them into a dictionary.
Both objects display correctly.
Computing LOO and WAIC individually with `az.loo` and `az.waic` also works.
However, passing the dictionary to `az.compare` raises an exception.

To Reproduce
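Two cmdstanpy sampling results are needed (`result2` and `result3` below).

```python
import arviz as az

# result2 and result3 are the CmdStanMCMC objects returned by model.sample()
comp_dict = {'Fractional': az.from_cmdstanpy(result2), 'Integer order': az.from_cmdstanpy(result3)}
```

Both conversions work, and both resulting objects contain `log_lik`.

```python
az.compare(comp_dict)
```

raises: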

```
TypeError                                 Traceback (most recent call last)
File ~/opt/anaconda3/envs/stan_sep2022/lib/python3.10/site-packages/pandas/core/indexes/base.py:3800, in Index.get_loc(self, key, method, tolerance)
   3799 try:
-> 3800     return self._engine.get_loc(casted_key)
   3801 except KeyError as err:

File ~/opt/anaconda3/envs/stan_sep2022/lib/python3.10/site-packages/pandas/_libs/index.pyx:138, in pandas._libs.index.IndexEngine.get_loc()

File ~/opt/anaconda3/envs/stan_sep2022/lib/python3.10/site-packages/pandas/_libs/index.pyx:144, in pandas._libs.index.IndexEngine.get_loc()

TypeError: 'slice(None, None, None)' is an invalid key

During handling of the above exception, another exception occurred:

InvalidIndexError                         Traceback (most recent call last)
Cell In [45], line 1
----> 1 az.compare(comp_dict)

File ~/opt/anaconda3/envs/stan_sep2022/lib/python3.10/site-packages/arviz/stats/stats.py:306, in compare(compare_dict, ic, method, b_samples, alpha, seed, scale, var_name)
    304     std_err = ses.loc[val]
    305     weight = weights[idx]
--> 306     df_comp.at[val] = (
    307         idx,
    308         res[ic],
    309         res[p_ic],
    310         d_ic,
    311         weight,
    312         std_err,
    313         d_std_err,
    314         res["warning"],
    315         res[scale_col],
    316     )
    318 df_comp["rank"] = df_comp["rank"].astype(int)
    319 df_comp["warning"] = df_comp["warning"].astype(bool)

File ~/opt/anaconda3/envs/stan_sep2022/lib/python3.10/site-packages/pandas/core/indexing.py:2438, in _AtIndexer.__setitem__(self, key, value)
   2435     self.obj.loc[key] = value
   2436     return
-> 2438 return super().__setitem__(key, value)

File ~/opt/anaconda3/envs/stan_sep2022/lib/python3.10/site-packages/pandas/core/indexing.py:2393, in _ScalarAccessIndexer.__setitem__(self, key, value)
   2390 if len(key) != self.ndim:
   2391     raise ValueError("Not enough indexers for scalar access (setting)!")
-> 2393 self.obj._set_value(*key, value=value, takeable=self._takeable)

File ~/opt/anaconda3/envs/stan_sep2022/lib/python3.10/site-packages/pandas/core/frame.py:4208, in DataFrame._set_value(self, index, col, value, takeable)
   4206     iindex = cast(int, index)
   4207 else:
-> 4208     icol = self.columns.get_loc(col)
   4209     iindex = self.index.get_loc(index)
   4210 self._mgr.column_setitem(icol, iindex, value)

File ~/opt/anaconda3/envs/stan_sep2022/lib/python3.10/site-packages/pandas/core/indexes/base.py:3807, in Index.get_loc(self, key, method, tolerance)
   3802     raise KeyError(key) from err
   3803 except TypeError:
   3804     # If we have a listlike key, _check_indexing_error will raise
   3805     #  InvalidIndexError. Otherwise we fall through and re-raise
   3806     #  the TypeError.
-> 3807     self._check_indexing_error(key)
   3808     raise
   3810 # GH#42269

File ~/opt/anaconda3/envs/stan_sep2022/lib/python3.10/site-packages/pandas/core/indexes/base.py:5963, in Index._check_indexing_error(self, key)
   5959 def _check_indexing_error(self, key):
   5960     if not is_scalar(key):
   5961         # if key is not a scalar, directly raise an error (the code below
   5962         # would convert to numpy arrays and raise later any way) - GH29926
-> 5963         raise InvalidIndexError(key)

InvalidIndexError: slice(None, None, None)
```
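Reading the traceback, the failing ArviZ line is `df_comp.at[val] = (...)`, which gives the scalar accessor `.at` only a row label. pandas then fills in the missing column key as `slice(None, None, None)`, and `Index.get_loc` rejects that key. The pandas-only sketch below reproduces that pattern on a hypothetical toy table (column names and numbers are made up and are not ArviZ's actual `compare` schema) and contrasts it with a whole-row assignment through `.loc`. It illustrates the pandas behaviour only; it is not a description of how #2104 changed ArviZ.

```python
import pandas as pd

# Hypothetical toy table; column names are illustrative only.
columns = ["rank", "elpd_loo", "p_loo"]
df = pd.DataFrame(index=["Fractional", "Integer order"], columns=columns)

row = (0, -123.4, 5.6)  # made-up values, one per column

# .at is a scalar accessor and expects BOTH a row and a column label.
# With only a row label, pandas pads the column key with slice(None),
# which Index.get_loc rejects (InvalidIndexError on the pandas version
# in the traceback above). Catch broadly, since the exact behaviour
# depends on the installed pandas version.
try:
    df.at["Fractional"] = row
    print(".at row assignment succeeded on this pandas version")
except Exception as err:
    print(f".at failed: {type(err).__name__}: {err}")

# .loc accepts a row label plus one value per column, so assigning
# a whole row this way is the supported pattern.
df.loc["Fractional"] = row
print(df.loc["Fractional"].tolist())
```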

Expected behavior
I'd expect a DataFrame with the LOO comparison, especially since the values can be computed individually.
Additional context
ArviZ version: 0.12.1
CmdStanPy version: 1.0.7
Python version: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:41:54) [Clang 13.0.1 ]

M1 Mac Mini
macOS Monterey 12.5.1
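The context above lists the ArviZ, CmdStanPy and Python versions but not pandas, even though the traceback ends inside pandas. A small standard-library sketch for collecting all of them in one go (the helper name `report_versions` is made up for illustration) uses `importlib.metadata.version`:

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages=("arviz", "pandas", "cmdstanpy")):
    """Return {package: installed version string, or None if missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for pkg, ver in report_versions().items():
    print(f"{pkg}: {ver or 'not installed'}")
```

Pasting this output into a report pins down the exact pandas release involved.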

@ahartikainen
Contributor

Can you try the current main branch?

```
pip install git+https://github.com/arviz-devs/arviz
```

I think this was already fixed in #2104.

@jerzybaranowski
Author

It works now, but with one minor problem: when warnings are displayed, there is a weird artefact (bolded):

```
/opt/anaconda3/envs/stan_sep2022/lib/python3.10/site-packages/arviz/stats/stats.py:802: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.7 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
warnings.warn(
```

@ahartikainen
Contributor

This warning comes from the `loo` code; maybe it should not be raised with `warnings.warn`.
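If the artefact is the trailing `warnings.warn(` line, it is most likely the source line that Python's default warning formatter prints after the message: it echoes the code at the reported file and line, and since the call apparently starts on line 802 and continues on the following lines, only `warnings.warn(` appears. Assuming that is the line in question, a minimal user-side sketch replaces `warnings.formatwarning` with a formatter that omits the source line (the function name is hypothetical). This changes how every warning in the session is displayed; it does not silence any of them.

```python
import warnings


def formatwarning_without_source(message, category, filename, lineno, line=None):
    """Same layout as Python's default formatter, minus the echoed source line."""
    return f"{filename}:{lineno}: {category.__name__}: {message}\n"


# Session-wide opt-in: warnings displayed through warnings.showwarning
# now use this formatter instead of the default one.
warnings.formatwarning = formatwarning_without_source

warnings.warn("Estimated shape parameter of Pareto distribution is greater than 0.7")
```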
