## pandas: Data Exploration and Cleaning

### Descriptive and summary statistics

Method | Description
-------|------------ 
count | Number of non-NA values
describe | Compute set of summary statistics for Series or each DataFrame column
min, max | Compute minimum and maximum values
argmin, argmax | Compute index locations (integers) at which minimum or maximum value obtained, respectively
idxmin, idxmax | Compute index values at which minimum or maximum value obtained, respectively
quantile | Compute a sample quantile; the requested quantile is given as a value from 0 to 1
sum | Sum of values
mean | Mean of values
median | Arithmetic median (50% quantile) of values
mad | Mean absolute deviation from mean value (removed in pandas 2.0)
var | Sample variance of values
std | Sample standard deviation of values
skew | Sample skewness (3rd moment) of values
kurt | Sample kurtosis (4th moment) of values
cumsum | Cumulative sum of values
cummin, cummax | Cumulative minimum or maximum of values, respectively
cumprod | Cumulative product of values
diff | Compute 1st arithmetic difference (useful for time series)
pct_change | Compute percent changes
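
As a rough illustration of a few methods from the table, the sketch below uses a small made-up DataFrame (the column names and values are assumptions chosen for the example). It relies on the default behavior of skipping NA values in reductions.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame(
    {"one": [1.4, 7.1, None, 0.75], "two": [None, -4.5, None, -1.3]},
    index=["a", "b", "c", "d"],
)

col_sums = df.sum()            # column sums; NA values are skipped by default
row_means = df.mean(axis=1)    # axis=1 reduces across the columns of each row
max_labels = df.idxmax()       # index label of each column's maximum
running = df["one"].cumsum()   # NA stays NA; the running total continues after it
summary = df.describe()        # count, mean, std, min, quartiles, max per column
```

Note that `idxmax` returns index labels (`'b'`, `'d'`), whereas `argmax` on a Series returns an integer position.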

### Unique, value counts, and binning methods

Method | Description
-------|-----------
isin | Compute boolean array indicating whether each Series value is contained in the passed sequence of values.
unique | Compute array of unique values in a Series, returned in the order observed.
value_counts | Return a Series containing unique values as its index and frequencies as its values, ordered by count in descending order.
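
A minimal sketch of the three methods on a made-up Series of strings (the values are assumptions for the example):

```python
import pandas as pd

# Hypothetical data, for illustration only.
obj = pd.Series(["c", "a", "d", "a", "a", "b", "b", "c", "c"])

uniques = obj.unique()         # unique values, in order of first appearance
counts = obj.value_counts()    # frequencies, most common first
mask = obj.isin(["b", "c"])    # boolean Series: True where the value is 'b' or 'c'
subset = obj[mask]             # use the mask to filter the Series
```

The boolean mask returned by `isin` is mostly useful for filtering, as in the last line.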

### Correlation and Covariance

The `corr` method of Series computes the correlation of the overlapping, non-NA, aligned-by-index values in two Series. Relatedly, `cov` computes the covariance.

DataFrame’s `corr` and `cov` methods, on the other hand, return a full correlation or
covariance matrix as a DataFrame, respectively.

Using DataFrame’s `corrwith` method, you can compute pairwise correlations between a DataFrame’s columns or rows with another Series or DataFrame.
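
The sketch below shows these calls on a small made-up DataFrame whose columns are exact linear functions of one another, so the correlations come out as 1 or -1 (up to floating-point error). The column names and values are assumptions for the example.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.0, 6.0, 8.0, 10.0],   # exactly 2 * x
    "z": [5.0, 4.0, 3.0, 2.0, 1.0],    # exactly 6 - x
})

r_xy = df["x"].corr(df["y"])     # correlation between two Series
c_xy = df["x"].cov(df["y"])      # sample covariance between two Series
corr_matrix = df.corr()          # full correlation matrix as a DataFrame
cov_matrix = df.cov()            # full covariance matrix as a DataFrame
with_x = df.corrwith(df["x"])    # correlation of every column with one Series
```

Here `with_x` is a Series indexed by column name; passing a DataFrame to `corrwith` instead computes correlations between matching column names.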

### NA handling methods

Method | Description
---------|------------
dropna | Filter axis labels based on whether values for each label have missing data, with varying thresholds for how much missing data to tolerate.
fillna | Fill in missing data with some value or using an interpolation method such as 'ffill' or 'bfill'.
isnull | Return an object of the same type containing boolean values indicating which values are missing / NA.
notnull | Negation of isnull.

#### Filtering Out Missing Data

You have a number of options for filtering out missing data. While doing it by hand is always an option, `dropna()` can be very helpful. On a Series, it returns the Series with only the non-null data and index values.
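
As a small sketch of both approaches on a Series (the values are made up for the example), `dropna` and manual boolean indexing with `notnull` give the same result:

```python
import numpy as np
import pandas as pd

# Hypothetical data, for illustration only.
data = pd.Series([1.0, np.nan, 3.5, np.nan, 7.0])

cleaned = data.dropna()            # keep only non-null values and their index labels
by_hand = data[data.notnull()]     # the same filter written out with a boolean mask
```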

With DataFrame objects, things are a bit more complex. You may want to drop rows or columns that are all NA, or just those containing any NAs. By default, `dropna` drops *any row containing a missing value*. Passing `how='all'` drops only rows that are all NA. To drop columns in the same way, pass `axis=1`.

A related way to filter out DataFrame rows tends to concern time series data. Suppose you want to keep only rows containing at least a certain number of non-NA observations. You can indicate this with the `thresh` argument: `df.dropna(thresh=n)`, where `n` is the minimum number of non-NA values a row must have to be kept.
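
The following sketch walks through the `dropna` options described in this section on a small made-up DataFrame. The column names, the values, and the extra all-NA column `empty` are assumptions for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame(
    [[1.0, 6.5, 3.0],
     [1.0, np.nan, np.nan],
     [np.nan, np.nan, np.nan],
     [np.nan, 6.5, 3.0]],
    columns=["a", "b", "c"],
)

rows_any = df.dropna()             # default: drop rows with any NA -> keeps row 0
rows_all = df.dropna(how="all")    # drop only rows that are entirely NA -> drops row 2

# Add a column that is entirely NA, then drop all-NA columns with axis=1.
wide = df.assign(empty=np.nan)
cols_all = wide.dropna(axis=1, how="all")

# Keep rows that have at least 2 non-NA values.
at_least_two = df.dropna(thresh=2)
```

Each call returns a new object, so `df` itself still contains all four rows afterwards.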

#### Filling In Missing Data

Calling `fillna` with a constant replaces missing values with that value: `df.fillna(0)`. Calling `fillna` with a dict lets you use a different fill value for each column: `df.fillna({1: 0.5, 3: -1})`. `fillna` returns a new object, but you can modify the existing object in place: `df.fillna(0, inplace=True)`.

With `fillna` you can do lots of other things with a little creativity. For example, you might pass the mean or median value of a Series: `data.fillna(data.mean())`.
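
The fill styles above can be sketched on a small made-up DataFrame (the column names and values are assumptions for the example):

```python
import numpy as np
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, np.nan]})

zeros = df.fillna(0)                         # one constant for every missing value
per_column = df.fillna({"a": 0.5, "b": -1})  # dict keys are column labels
means = df.fillna(df.mean())                 # fill each column with its own mean
```

Passing `df.mean()` works because the resulting Series is indexed by column label, so each column is filled with its own statistic. None of these calls modifies `df`.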

**fillna function arguments**

Argument | Description
---------|------------
value | Scalar value or dict-like object to use to fill missing values
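method | Interpolation method, such as 'ffill' or 'bfill'; older pandas versions defaulted to 'ffill' when called with no other arguments, while recent versions require a value or method and deprecate this argument in favor of the `ffill` and `bfill` methods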
axis | Axis to fill on, default axis=0
inplace | Modify the calling object without producing a copy
limit | For forward and backward filling, maximum number of consecutive periods to fill
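
A minimal sketch of forward and backward filling with `limit`, on a made-up Series. It uses the dedicated `ffill` and `bfill` methods rather than `fillna(method=...)`, on the assumption that a recent pandas version is in use; the results are the same as the `method` argument described above.

```python
import numpy as np
import pandas as pd

# Hypothetical data, for illustration only.
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

filled = s.ffill()            # carry the last valid value forward through the gap
limited = s.ffill(limit=2)    # fill at most 2 consecutive missing values
back = s.bfill(limit=1)       # fill backward from the next valid value, 1 step only
```

With `limit=2`, the third missing value in the run stays NA, which can help avoid stretching a stale observation over a long gap.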


*******************************************
Source:

**Python for Data Analysis**  
by Wes McKinney  
Copyright © 2013 Wes McKinney. All rights reserved.  
Printed in the United States of America.  
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.