## Elementary Statistics in numpy
### Order statistics
**Function**   |       **Action/Result**
-----------------   | -----------------
ptp( a, axis=None, ...)   | Range of values (maximum - minimum) along an axis.
percentile( a, q, axis=None, ..., method='linear', keepdims=False)  | Compute the q-th percentile of the data along the specified axis, with q in [0, 100].
quantile( a, q, axis=None, ..., method='linear', keepdims=False)   | Compute the q-th quantile of the data along the specified axis, with q in [0, 1].

**keepdims:** bool, optional --
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.
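
As a quick illustration of the three functions and of `keepdims`, the sketch below uses a small made-up 2-D array (the values are illustrative only) and reduces along `axis=1`, i.e. row by row.

```python
import numpy as np

# Illustrative data: 2 rows, 4 columns
a = np.array([[1, 5, 2, 8],
              [4, 4, 9, 0]])

# Range (max - min) of each row
row_range = np.ptp(a, axis=1)

# percentile expects q in [0, 100], quantile expects q in [0, 1];
# the 50th percentile and the 0.5 quantile are both the median
p50 = np.percentile(a, 50, axis=1)
q50 = np.quantile(a, 0.5, axis=1)

# keepdims=True keeps the reduced axis with length 1 (shape (2, 1)),
# so the result broadcasts against `a`: here each row is centred on its median
med = np.percentile(a, 50, axis=1, keepdims=True)
centred = a - med

print(row_range)   # [7 9]
print(p50, q50)    # [3.5 4. ] [3.5 4. ]
print(centred)
```

Without `keepdims=True`, `p50` has shape `(2,)`, and `a - p50` raises a broadcasting error because the trailing dimensions (4 and 2) do not match.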

### Averages and variances
**Function**   |       **Action/Result**
-----------------   | -----------------
median( a, axis=None, ..., keepdims=False)       |       Compute the median along the specified axis.
average( a, axis=None, weights=None, ..., keepdims=False)         |       Compute the weighted average along the specified axis.
mean( a, axis=None, ..., keepdims=False, where)            |       Compute the arithmetic mean along the specified axis.
std( a, axis=None, ..., ddof=0, keepdims=False, where)       |       Compute the standard deviation along the specified axis.
var( a, axis=None, ..., ddof=0, keepdims=False, where)       |       Compute the variance along the specified axis.

**median:** Given a vector $V$ of length $N$, let $V_{sorted}$ be a sorted copy of $V$. The median of $V$ is the middle value of $V_{sorted}$, i.e., $V_{sorted}\left[\frac{N-1}{2}\right]$ (0-based indexing) when $N$ is odd, and the average of the two middle values of $V_{sorted}$ when $N$ is even.
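
The definition can be checked directly against `np.median`. The helper `median_by_definition` below is hypothetical, written only for this illustration, and assumes a non-empty 1-D input.

```python
import numpy as np

def median_by_definition(v):
    """Median of a non-empty 1-D sequence, following the definition above."""
    s = np.sort(np.asarray(v))            # sorted copy, V_sorted
    n = s.size
    if n % 2 == 1:
        return s[(n - 1) // 2]            # odd N: the single middle element
    return (s[n // 2 - 1] + s[n // 2]) / 2  # even N: mean of the two middle elements

odd = [7, 1, 3]        # sorted: [1, 3, 7]      -> 3
even = [7, 1, 3, 10]   # sorted: [1, 3, 7, 10]  -> (3 + 7) / 2 = 5.0

print(median_by_definition(odd), np.median(odd))
print(median_by_definition(even), np.median(even))
```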

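**average:** avg = sum(a * weights) / sum(weights). If weights=None, every element is assumed to have a weight of one, so the result equals the arithmetic mean.  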
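
The weighted-average formula can be compared with `np.average` on a few illustrative values; the sketch also uses the real `returned=True` option, which additionally returns the sum of the weights.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([4.0, 3.0, 2.0, 1.0])   # illustrative weights

# The formula written out: (1*4 + 2*3 + 3*2 + 4*1) / (4 + 3 + 2 + 1) = 20 / 10
manual = np.sum(a * w) / np.sum(w)
weighted = np.average(a, weights=w)

# returned=True also gives the sum of the weights
avg, wsum = np.average(a, weights=w, returned=True)

# Without weights every element has weight 1, so average equals mean
unweighted = np.average(a)

print(manual, weighted, wsum, unweighted)   # 2.0 2.0 10.0 2.5
```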

**where:** array_like of bool, optional --
Elements to include in the mean (for std and var, the elements to include in the standard deviation or variance).
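
One common use of `where` is to skip missing values. The sketch below, with made-up data, excludes NaN entries from a row-wise mean and standard deviation; it assumes numpy 1.20 or later, where `mean`, `std` and `var` accept `where` (`np.nanmean` and `np.nanstd` are the dedicated alternatives for NaN handling).

```python
import numpy as np

# Illustrative data with missing values
x = np.array([[1.0, 2.0, np.nan],
              [4.0, np.nan, 6.0]])

mask = ~np.isnan(x)   # True where the element should be included

row_means = np.mean(x, axis=1, where=mask)   # means of [1, 2] and [4, 6]: [1.5, 5.0]
row_std = np.std(x, axis=1, where=mask)      # population std of the same subsets: [0.5, 1.0]

print(row_means, row_std)
```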

**ddof:** int, optional --
*Delta Degrees of Freedom*. The divisor used in the calculation is N - ddof, where N is the number of elements. By default ddof is zero.
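
The effect of `ddof` is easiest to see by computing the divisor by hand. The data below are illustrative; `ddof=0` gives the population variance and `ddof=1` the usual sample (unbiased) variance.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # illustrative data
n = x.size
ss = np.sum((x - x.mean()) ** 2)   # sum of squared deviations = 32.0

pop_var = np.var(x)              # ddof=0: divides by N     -> 32 / 8 = 4.0
sample_var = np.var(x, ddof=1)   # ddof=1: divides by N - 1 -> 32 / 7
pop_std = np.std(x)              # sqrt(4.0) = 2.0
sample_std = np.std(x, ddof=1)   # sqrt(32 / 7)

print(pop_var, sample_var, pop_std, sample_std)
```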


### Histograms
**Function**   |       **Action/Result**
-----------------   | -----------------
histogram( a, bins=10, range=None, density=None, weights=None)  |        Compute the histogram of a dataset.
histogram2d( x, y, bins=10, range=None, density=None, weights=None)  |   Compute the two-dimensional histogram of two data samples.
histogram_bin_edges( a, bins=10, range=None, weights=None) | Compute only the edges of the bins used by the histogram function.
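
A minimal sketch of the three functions on small made-up samples: `histogram` returns counts and edges, `density=True` rescales the counts so the histogram integrates to 1, `histogram_bin_edges` returns only the edges, and in `histogram2d` the first index of the result runs over the bins of `x`.

```python
import numpy as np

data = np.array([0.5, 1.2, 1.7, 2.1, 2.4, 2.9, 3.3, 3.8])  # illustrative sample

# Four equal-width bins over [0, 4]: edges 0, 1, 2, 3, 4
counts, edges = np.histogram(data, bins=4, range=(0, 4))

# density=True: counts / (total count * bin width), so the area is 1
dens, _ = np.histogram(data, bins=4, range=(0, 4), density=True)

# Only the edges, without counting
only_edges = np.histogram_bin_edges(data, bins=4, range=(0, 4))

# 2-D histogram: H[i, j] counts points with x in x-bin i and y in y-bin j
x = np.array([0.1, 0.4, 0.6, 0.9])
y = np.array([0.2, 0.3, 0.8, 0.7])
H, xedges, yedges = np.histogram2d(x, y, bins=2, range=[[0, 1], [0, 1]])

print(counts)   # [1 2 3 2]
print(H)        # two points in the lower-left cell, two in the upper-right
```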

The methods used to estimate the optimal number of bins are well founded in the literature and are inspired by the choices R provides for histogram visualisation. Having the number of bins proportional to $n^{1/3}$ is asymptotically optimal, which is why this factor appears in most estimators.

If `bins` is a string from the list below, `histogram_bin_edges` uses the chosen method to calculate the optimal bin width, and from it the number of bins, using only the data that falls within the requested range (the Notes section of the numpy documentation for `histogram_bin_edges` describes each estimator in detail). While the bin width is optimal for the actual data in the range, the number of bins is computed to fill the entire range, including any empty portions. For visualisation, the `'auto'` option is suggested. Automated bin-size selection does not support weighted data.
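
To make this concrete, the sketch below reproduces the Freedman-Diaconis width, $2 \cdot \mathrm{IQR} \cdot n^{-1/3}$, by hand on illustrative data and checks it against `histogram_bin_edges(..., bins="fd")`. It then widens `range` to show that the same width produces more bins to cover the empty part of the range.

```python
import numpy as np

x = np.arange(1, 101, dtype=float)   # illustrative data: 1, 2, ..., 100

# Freedman-Diaconis bin width: 2 * IQR * n**(-1/3)
iqr = np.percentile(x, 75) - np.percentile(x, 25)
fd_width = 2 * iqr * x.size ** (-1 / 3)

# The number of bins is the data range divided by the width, rounded up
n_bins = int(np.ceil(np.ptp(x) / fd_width))
edges = np.histogram_bin_edges(x, bins="fd")

# With a wider requested range the width stays the same (same data inside the
# range), but more bins are created to cover the empty part [100, 200]
wide = np.histogram_bin_edges(x, bins="fd", range=(0, 200))

print(n_bins, len(edges) - 1, len(wide) - 1)   # 5 5 10
```

Because the returned edges are equally spaced over the range, the bins actually produced are range / n_bins wide (here 99 / 5 = 19.8), slightly narrower than the estimated Freedman-Diaconis width.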

* `'auto'`: Maximum of the `'sturges'` and `'fd'` estimators. Provides good all-around performance.

* `'fd'` (Freedman-Diaconis estimator): Robust (resilient to outliers) estimator that takes into account data variability and data size.

* `'doane'`: An improved version of Sturges’ estimator that works better with non-normal datasets.

* `'scott'`: Less robust estimator that takes into account data variability and data size.

* `'stone'`: Estimator based on a leave-one-out cross-validation estimate of the integrated squared error. Can be regarded as a generalization of Scott’s rule.

* `'rice'`: Estimator that does not take variability into account, only data size. Commonly overestimates the number of bins required.

* `'sturges'`: R’s default method; only accounts for data size. Only optimal for Gaussian data, and underestimates the number of bins for large non-Gaussian datasets.

* `'sqrt'`: Square-root (of data size) estimator, used by Excel and other programs for its speed and simplicity.
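
The estimators can be compared side by side on the same sample. The sketch below draws 1000 standard-normal values (an illustrative choice; the exact counts depend on the sample and the numpy version) and prints the number of bins each method chooses. It also shows that passing `weights` together with a string estimator raises a `TypeError`.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)   # illustrative sample: 1000 standard-normal draws

methods = ["auto", "fd", "doane", "scott", "stone", "rice", "sturges", "sqrt"]
n_bins = {m: len(np.histogram_bin_edges(data, bins=m)) - 1 for m in methods}

for m, n in n_bins.items():
    print(f"{m:>8}: {n} bins")

# Weighted data cannot be used with a string estimator
raised = False
try:
    np.histogram(data, bins="auto", weights=np.ones_like(data))
except TypeError as err:
    raised = True
    print("TypeError:", err)
```

`'sturges'` and `'sqrt'` depend only on the sample size ($\log_2 n + 1$ and $\sqrt{n}$ bins, rounded up), so for $n = 1000$ they give 11 and 32 bins regardless of the values drawn.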


### Correlating
**Function**   |       **Action/Value**
-----------------   | -----------------
cov( m, y=None, rowvar=True, bias=False, ddof=None, fweights=None, aweights=None, ...) |Estimate a covariance matrix, given data and weights.
corrcoef( x, y=None, rowvar=True) | Return Pearson product-moment correlation coefficients.

* **bias:** bool, optional  
Default normalization (False) is by $N - 1$, where $N$ is the number of observations given (unbiased estimate). If bias is True, normalization is by $N$. In numpy versions >= 1.5, passing the keyword **ddof** overrides the normalization implied by **bias**.
* The relationship between the *correlation coefficient* matrix, $R$, and the *covariance matrix*, $C$, is
$$R_{ij}=\frac{C_{ij}}{\sqrt{C_{ii}C_{jj}}}$$
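
The normalization options and the relation between $R$ and $C$ can be checked numerically. In the sketch below (illustrative data, two variables as rows since `rowvar=True`), `ddof=0` reproduces `bias=True`, and dividing $C$ by the outer product of its diagonal square roots reproduces `corrcoef`.

```python
import numpy as np

# Each row is a variable, each column an observation (rowvar=True)
m = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 5.0, 9.0]])

C = np.cov(m)                     # normalized by N - 1 (bias=False)
C_biased = np.cov(m, bias=True)   # normalized by N
C_ddof0 = np.cov(m, ddof=0)       # ddof overrides bias: same as bias=True

R = np.corrcoef(m)

# R_ij = C_ij / sqrt(C_ii * C_jj), written with an outer product
d = np.sqrt(np.diag(C))
R_manual = C / np.outer(d, d)

print(C)   # [[5/3, 11/3], [11/3, 26/3]]
print(R)
```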
