# An explanation of binned_statistic_2d

SciPy includes a function called binned_statistic_2d. I found it very useful, so this post explains how to use it.


## Reference documentation

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binned_statistic_2d.html

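## Summary from the documentation

A generalization of histogram2d. A histogram divides the data into bins and returns the number of data points in each bin. This function can instead compute a statistic, such as the sum or the mean, over the values that fall in each bin.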
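
To make the relationship with a histogram concrete, here is a minimal sketch. The data, the weights `w`, and the bin edges are illustrative values chosen for this example. It checks that `'count'` matches `numpy.histogram2d`, and that `'sum'` matches the same histogram computed with `weights`.

```python
import numpy as np
from scipy import stats

# Illustrative data (assumed for this sketch)
x = np.array([0.1, 0.1, 0.1, 0.6])
y = np.array([2.1, 2.6, 2.1, 2.1])
w = np.array([1.0, 2.0, 3.0, 4.0])  # one value per (x, y) point

binx = [0.0, 0.5, 1.0]
biny = [2.0, 2.5, 3.0]

# 'count' ignores `values`, so None can be passed
count = stats.binned_statistic_2d(x, y, None, 'count', bins=[binx, biny]).statistic
hist, _, _ = np.histogram2d(x, y, bins=[binx, biny])

# 'sum' adds up the values that fall into each bin
total = stats.binned_statistic_2d(x, y, w, 'sum', bins=[binx, biny]).statistic
weighted, _, _ = np.histogram2d(x, y, bins=[binx, biny], weights=w)

print(np.array_equal(count, hist))    # True
print(np.allclose(total, weighted))   # True
```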

# The function itself

binned_statistic_2d(x, y, values, statistic='mean', bins=10, range=None, expand_binnumbers=False)


Parameters
----------
x : (N,) array_like
    A sequence of values to be binned along the first dimension.
    
    
y : (N,) array_like
    A sequence of values to be binned along the second dimension.
    
    
values : (N,) array_like or list of (N,) array_like
    The data on which the statistic will be computed.  This must be
    the same shape as `x`, or a list of sequences - each with the same
    shape as `x`.  If `values` is such a list, the statistic will be
    computed on each independently.
    
    
statistic : string or callable, optional
    The statistic to compute (default is 'mean').
    The following statistics are available:
    
      * 'mean' : compute the mean of values for points within each bin.
        Empty bins will be represented by NaN.
        
      * 'std' : compute the standard deviation within each bin. This
        is implicitly calculated with ddof=0.
        
      * 'median' : compute the median of values for points within each
        bin. Empty bins will be represented by NaN.
        
      * 'count' : compute the count of points within each bin.  This is
        identical to an unweighted histogram.  `values` array is not
        referenced.
        
      * 'sum' : compute the sum of values for points within each bin.
        This is identical to a weighted histogram.
        
      * 'min' : compute the minimum of values for points within each bin.
        Empty bins will be represented by NaN.
        
      * 'max' : compute the maximum of values for points within each bin.
        Empty bins will be represented by NaN.
        
      * function : a user-defined function which takes a 1D array of
        values, and outputs a single numerical statistic. This function
        will be called on the values in each bin.  Empty bins will be
        represented by function([]), or NaN if this returns an error.
        
bins : int or [int, int] or array_like or [array, array], optional
    The bin specification:
    
      * the number of bins for the two dimensions (nx = ny = bins),
      
      * the number of bins in each dimension (nx, ny = bins),
      
      * the bin edges for the two dimensions (x_edge = y_edge = bins),
      
      * the bin edges in each dimension (x_edge, y_edge = bins).
      
    If the bin edges are specified, the number of bins will be,
    (nx = len(x_edge)-1, ny = len(y_edge)-1).
    
range : (2,2) array_like, optional

    The leftmost and rightmost edges of the bins along each dimension
    (if not specified explicitly in the `bins` parameters):
    [[xmin, xmax], [ymin, ymax]]. All values outside of this range will be
    considered outliers and not tallied in the histogram.
    
expand_binnumbers : bool, optional

    'False' (default): the returned `binnumber` is a shape (N,) array of
    linearized bin indices.
    'True': the returned `binnumber` is 'unraveled' into a shape (2,N)
    ndarray, where each row gives the bin numbers in the corresponding
    dimension.
    See the `binnumber` returned value, and the `Examples` section.
    .. versionadded:: 0.17.0
    
Returns
-------
statistic : (nx, ny) ndarray
    The values of the selected statistic in each two-dimensional bin.
    
x_edge : (nx + 1) ndarray
    The bin edges along the first dimension.
    
y_edge : (ny + 1) ndarray
    The bin edges along the second dimension.
    
binnumber : (N,) array of ints or (2,N) ndarray of ints
    This assigns to each element of `sample` an integer that represents the
    bin in which this observation falls.  The representation depends on the
    `expand_binnumbers` argument.  See `Notes` for details.
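
The `bins` and `range` parameters described above can be sketched as follows. The sample points and values are made up for illustration; the last point is deliberately placed outside the range in `x` so that it is treated as an outlier and left out of every bin.

```python
import numpy as np
from scipy import stats

# Illustrative data (assumed for this sketch); x = 1.5 lies outside [0, 1]
x = np.array([0.1, 0.4, 0.6, 0.9, 1.5])
y = np.array([0.2, 0.7, 0.3, 0.8, 0.4])
v = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

rng = [[0.0, 1.0], [0.0, 1.0]]  # [[xmin, xmax], [ymin, ymax]]

# A single int: the same number of bins in both dimensions
r1 = stats.binned_statistic_2d(x, y, v, 'sum', bins=2, range=rng)

# Two ints: a different number of bins per dimension
r2 = stats.binned_statistic_2d(x, y, v, 'sum', bins=[4, 2], range=rng)

# Explicit edges: `range` is not needed because the edges define the extent
edges = [0.0, 0.5, 1.0]
r3 = stats.binned_statistic_2d(x, y, v, 'sum', bins=[edges, edges])

print(r1.statistic.shape)   # (2, 2)
print(r2.statistic.shape)   # (4, 2)
print(len(r2.x_edge), len(r2.y_edge))   # 5 3, i.e. nx + 1 and ny + 1
print(r1.statistic.sum())   # 10.0, the outlier value 100.0 is not counted
```

The first axis of `statistic` follows `x` and the second follows `y`, which is why `bins=[4, 2]` yields a `(4, 2)` array.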


# Trying it out

In [1]:
from scipy import stats

In [2]:
# Coordinates of the data points
x = [0.1, 0.1, 0.1, 0.6]
y = [2.1, 2.6, 2.1, 2.1]


# Bin edges of the grid used to partition the data coordinates
binx = [0.0, 0.5, 1.0]
biny = [2.0, 2.5, 3.0]

'''
Diagram


    0.0  0.5  1.0
2.0 -|----|----| 
2.5 -|----|----| 
3.0 -|----|----| 
'''

'\nDiagram\n\n\n    0.0  0.5  1.0\n2.0 -|----|----| \n2.5 -|----|----| \n3.0 -|----|----| \n'

In [3]:
# 'count' is specified here, so the result tells us how many data points fall in each bin
ret = stats.binned_statistic_2d(x, y, None, 'count', bins=[binx, biny])
print(ret.statistic)

[[2. 1.]
 [1. 0.]]
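
The same data can also show what `binnumber` and `expand_binnumbers` return. This sketch repeats the inputs so that it runs on its own. As far as I understand the SciPy notes, bin indices start at 1 because index 0 (and the last index) in each dimension is reserved for points outside the bin edges.

```python
from scipy import stats

x = [0.1, 0.1, 0.1, 0.6]
y = [2.1, 2.6, 2.1, 2.1]
binx = [0.0, 0.5, 1.0]
biny = [2.0, 2.5, 3.0]

# Default: one linearized bin index per data point, shape (N,)
flat = stats.binned_statistic_2d(x, y, None, 'count', bins=[binx, biny])
print(flat.binnumber)
# [5 6 5 9]

# expand_binnumbers=True: one row per dimension, shape (2, N)
expanded = stats.binned_statistic_2d(x, y, None, 'count', bins=[binx, biny],
                                     expand_binnumbers=True)
print(expanded.binnumber)
# [[1 1 1 2]
#  [1 2 1 1]]
```

Reading the expanded form column by column, the first point sits in x bin 1 and y bin 1, which corresponds to `statistic[0, 0]` after subtracting 1 from each index.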


# Trying out the statistics

> binned_statistic_2d(x, y, values, statistic='mean', bins=10, range=None, expand_binnumbers=False)

The `statistic` argument in this signature accepts a variety of options.



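### It can compute the mean

If `values` is given a sequence whose elements correspond, in order, to the points in `x` and `y`, the function returns the mean of the values in each bin.

> 'mean' : compute the mean of values for points within each bin. Empty bins will be represented by NaN.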


### It can also take a function

> function : a user-defined function which takes a 1D array of values, and outputs a single numerical statistic. This function will be called on the values in each bin. Empty bins will be represented by function([]), or NaN if this returns an error.
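
As a rough illustration of the callable form, the sketch below passes a hypothetical helper, `value_range`, that returns the spread (max minus min) of the values in a bin. The helper name and the data are assumptions made for this example. Since `np.max` raises an error on an empty array, empty bins should come back as NaN, as the quoted documentation describes.

```python
import numpy as np
from scipy import stats

# Illustrative data (assumed for this sketch)
x = [0.1, 0.2, 0.3, 0.7]
y = [0.1, 0.2, 0.3, 0.7]
v = [1.0, 5.0, 2.0, 10.0]
edges = [0.0, 0.5, 1.0]


def value_range(values):
    """Hypothetical helper: spread of the values inside one bin."""
    return np.max(values) - np.min(values)


ret = stats.binned_statistic_2d(x, y, v, statistic=value_range,
                                bins=[edges, edges])

print(ret.statistic[0, 0])  # 4.0, from the values 1.0, 5.0 and 2.0
print(ret.statistic[1, 1])  # 0.0, only the value 10.0 is in this bin
print(np.isnan(ret.statistic[0, 1]))  # True, the bin is empty
```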

## Computing the mean

In [4]:
# Coordinates of the data points
x = [0.5, 0.3, 0.7]
y = [0.5, 0.6, 0.2]

# Value carried by each data point
v = [1, 2, 3]

# Bin edges of the grid
binx = [0.0, 1.0, 2.0]
biny = [0.0, 1.0, 2.0]

In [5]:
ret = stats.binned_statistic_2d(x, y, v, 'mean', bins=[binx, biny])
print(ret.statistic)

[[ 2. nan]
 [nan nan]]
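
All three points land in the first bin, so its mean is (1 + 2 + 3) / 3 = 2 and the other bins are empty. The docstring also says that `values` may be a list of sequences, each processed independently. A minimal sketch of that, with a fourth point and a second value array `v2` added purely for illustration:

```python
import numpy as np
from scipy import stats

# Illustrative data (assumed); the fourth point falls in the last bin
x = [0.5, 0.3, 0.7, 1.5]
y = [0.5, 0.6, 0.2, 1.2]
v1 = [1, 2, 3, 4]
v2 = [10, 20, 30, 40]

edges = [0.0, 1.0, 2.0]

# Passing a list of value arrays computes the statistic for each one
ret = stats.binned_statistic_2d(x, y, [v1, v2], 'mean', bins=[edges, edges])

print(ret.statistic.shape)  # (2, 2, 2): one (nx, ny) grid per value array
print(ret.statistic[0, 0, 0], ret.statistic[0, 1, 1])  # 2.0 4.0
print(ret.statistic[1, 0, 0], ret.statistic[1, 1, 1])  # 20.0 40.0
```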
