Add an InferenceData object #173

Merged (6 commits) on Aug 26, 2018

ColCarroll
Member

This is a pretty big change - see also #169. I am sure there are some rough edges, but this seems like a flexible way to get to feature parity with current PyMC3 plotting.

  • InferenceData is the object that carries whatever schema data is available. Having worked with it a little, I like it - it is just a light wrapper on a netCDF.Dataset that gives access to xarray.Dataset objects. It supports tab completion, and usage looks like:
data = az.InferenceData('my_analysis.nc')
data.posterior.mu.mean()   # 42.0
data.sample_stats.diverging.sum()  # 2
data.prior.mu.mean()  # AttributeError until it gets implemented
print(data)

Inference data from "/home/my_analysis.nc" with groups:
	> posterior
	> sample_stats
  • I added sample_stats to the PyMC3 extractor. It was pretty easy, and I can follow up with a similar job on PyStan (or @ahartikainen can!). We should argue about names for those sample_stats as well as the required/optional stats then, and update the schema accordingly.

  • A funny thing about InferenceData is that it is file based, not memory based. I did not want every plotting function to require a filename, and I want them to work out of the box with PyMC3 or PyStan objects, so making a plot with one of these objects will write a file to disk. It uses tempfile to get a unique filename, and always writes into the same directory. By default, it writes to .arviz_data/, but that can be changed with:

    import arviz as az
    az.config['default_data_directory'] = 'somewhere_else'

    If anyone has a more elegant way of handling this, I'm all ears. I was thinking of at least adding a warning every once in a while about the existence of this folder along with a suggestion to clean it out. Maybe every time the number of files in the directory is a multiple of 10, spawn a warning?

  • I updated the sample data to use InferenceData. az.load_arviz_data('centered_eight') is a nice way to start playing with this.

  • I tried to update documentation and function names, and got most of the way there.
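
The temp-file behaviour and the every-tenth-file warning floated in the third bullet could look roughly like the sketch below. It is a stdlib-only illustration under stated assumptions, not the code in this PR: `config` is a plain dict standing in for `az.config`, and `new_data_file` and `WARN_EVERY` are hypothetical names.

```python
import os
import tempfile
import warnings

# Stand-in for az.config; only the key mentioned above is modelled.
config = {"default_data_directory": ".arviz_data"}

# Hypothetical threshold: warn whenever the file count is a multiple of this.
WARN_EVERY = 10


def new_data_file(suffix=".nc"):
    """Create a uniquely named empty file in the data directory; return its path."""
    directory = config["default_data_directory"]
    os.makedirs(directory, exist_ok=True)

    # mkstemp guarantees a unique name; close the descriptor and keep the path.
    fd, path = tempfile.mkstemp(suffix=suffix, dir=directory)
    os.close(fd)

    # Remind the user about the folder every WARN_EVERY files.
    n_files = len(os.listdir(directory))
    if n_files % WARN_EVERY == 0:
        warnings.warn(
            f"{directory!r} now holds {n_files} files; consider cleaning it out.",
            UserWarning,
        )
    return path
```

Counting files on every write keeps the check stateless, at the cost of one directory listing per plot.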

@ahartikainen
Contributor

I can update PyStan stuff, I just need the normalized names.

sample_stats == rhat, etc ?

And then we also have sampler_stats or diagnostics?

@ColCarroll
Member Author

Heh, this maybe means we need better names. I took these straight from their names in pymc3 after a NUTS run:

data.sample_stats

<xarray.Dataset>
Dimensions:           (chain: 4, draw: 500)
Coordinates:
  * chain             (chain) int64 0 1 2 3
  * draw              (draw) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ...
Data variables:
    depth             (chain, draw) int64 ...
    diverging         (chain, draw) bool False False False False False False ...
    energy            (chain, draw) float64 ...
    energy_error      (chain, draw) float64 ...
    max_energy_error  (chain, draw) float64 ...
    mean_tree_accept  (chain, draw) float64 ...
    step_size         (chain, draw) float64 ...
    step_size_bar     (chain, draw) float64 ...
    tree_size         (chain, draw) float64 ...
    tune              (chain, draw) bool ...

@@ -1,18 +1,21 @@
from abc import ABC, abstractmethod, abstractstaticmethod
import re
Member

Does it make sense to call this xarray_utils now?

Member Author

Good point - I think we're close to getting rid of some of the original dataframe functions in utils.py, so there should probably be a refactor then. Really, these classes that convert objects should probably live in their own module, along with the inference_data object.

@ahartikainen
Contributor

ahartikainen commented Aug 25, 2018

I have updated the PyStan code (and also some of the dim calculations).
I can't push to your branch, so maybe I'll wait until this is merged and create another PR.

@ahartikainen
Contributor

Also, I found that only divergent__ --> diverging and energy__ --> energy are the "same". Not sure about the rest.
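
As a rough sketch of how those two matches could be applied on the PyStan side: the mapping below contains only the pairs named above, and everything else is passed through unchanged until names are agreed. `PYSTAN_TO_SCHEMA` and `normalize_names` are hypothetical names, not part of ArviZ or PyStan.

```python
# Only these two pairs are confirmed in this thread; any other PyStan
# sampler parameter keeps its original name until the schema is settled.
PYSTAN_TO_SCHEMA = {
    "divergent__": "diverging",
    "energy__": "energy",
}


def normalize_names(stats):
    """Return a copy of `stats` with known PyStan names mapped to schema names."""
    return {PYSTAN_TO_SCHEMA.get(name, name): values for name, values in stats.items()}


# Example: one known name is renamed, one unknown name is left alone.
renamed = normalize_names({"divergent__": [0, 1], "treedepth__": [3, 3]})
```

Keeping the mapping as plain data means new pairs can be added without touching the conversion logic.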

@ColCarroll
Member Author

Sounds good to me! Most sampler stats will have to be optional (divergent doesn't make much sense for Metropolis-Hastings samples, for example), so we will also have to implement some schema checking after this.
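
A minimal sketch of what such a check might look like, assuming the schema ends up as a set of known names plus a (possibly empty) set of required ones. The known names are copied from the PyMC3 NUTS output earlier in this thread; `REQUIRED_STATS` is left empty because nothing has been decided yet, and `check_sample_stats` is a hypothetical helper.

```python
# Names taken from the PyMC3 NUTS sample_stats shown earlier in this thread.
KNOWN_STATS = frozenset({
    "depth", "diverging", "energy", "energy_error", "max_energy_error",
    "mean_tree_accept", "step_size", "step_size_bar", "tree_size", "tune",
})

# Assumption: no stat is required yet, since most must stay optional.
REQUIRED_STATS = frozenset()


def check_sample_stats(names, required=REQUIRED_STATS, known=KNOWN_STATS):
    """Return (missing, unknown): required names absent, and names not in the schema."""
    names = set(names)
    missing = sorted(set(required) - names)
    unknown = sorted(names - set(known))
    return missing, unknown
```

Returning both lists, rather than raising, lets the caller decide whether an unknown stat is an error or just a warning.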

@ColCarroll merged commit affe364 into arviz-devs:master on Aug 26, 2018
@ColCarroll deleted the move_to_netcdf branch on July 4, 2019 02:55