# Summarizing and Computing Descriptive Statistics

pandas objects are equipped with a set of common mathematical and statistical methods. Most of these fall into the category of *reductions* or *summary statistics*: methods that extract a single value (like the sum or mean) from a Series, or a Series of values from the rows or columns of a DataFrame. Unlike the equivalent methods of plain NumPy arrays, they are built to exclude missing data by default. Consider a small DataFrame:

In [2]:
import pandas as pd
from pandas import Series, DataFrame
import numpy as np

In [11]:
df = DataFrame([[1, -5], [np.nan, 3], [8, 2], [-4, np.nan]],
               index=['a', 'b', 'c', 'd'], columns=['one', 'two'])

df

|   | one | two |
|---|-----|-----|
| a | 1.0 | -5.0 |
| b | NaN | 3.0 |
| c | 8.0 | 2.0 |
| d | -4.0 | NaN |


Calling DataFrame’s sum method returns a Series containing column sums:

In [8]:
df.sum()

one    5.0
two    0.0
dtype: float64

Passing axis=1 sums over the rows instead:

In [9]:
df.sum(axis=1)

a    -4.0
b     3.0
c    10.0
d    -4.0
dtype: float64
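
As a minimal sketch of how the axis argument selects the direction of a reduction, the example below rebuilds the same DataFrame and uses the string aliases 'index' and 'columns', which pandas accepts in place of 0 and 1. The variable names col_sums and row_sums are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, -5], [np.nan, 3], [8, 2], [-4, np.nan]],
                  index=['a', 'b', 'c', 'd'], columns=['one', 'two'])

# axis=0 (alias 'index') reduces down the rows: one result per column.
col_sums = df.sum(axis='index')

# axis=1 (alias 'columns') reduces across the columns: one result per row.
row_sums = df.sum(axis='columns')

# The string aliases give the same results as the integer forms.
print(col_sums.equals(df.sum(axis=0)))
print(row_sums.equals(df.sum(axis=1)))
```

The result is always indexed by the axis that was not reduced: column labels for axis=0, row labels for axis=1.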

NA values are excluded unless the entire slice (row or column in this case) is NA. Passing skipna=False disables this exclusion, so a single NA in a slice makes the result for that slice NA:

In [12]:
df.mean(axis=1, skipna=False)

a   -2.0
b    NaN
c    5.0
d    NaN
dtype: float64
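
To make the contrast with plain NumPy concrete, the sketch below sums the values of column one in four ways. The variable names are illustrative; np.nansum is NumPy's own NaN-skipping function, shown only for comparison.

```python
import numpy as np
import pandas as pd

# The values of column 'one' from the example DataFrame.
values = np.array([1.0, np.nan, 8.0, -4.0])

numpy_sum = values.sum()            # NaN propagates through ndarray.sum
numpy_nansum = np.nansum(values)    # NumPy needs a separate function to skip NaN

series = pd.Series(values)
pandas_sum = series.sum()                  # skipna=True is the default
pandas_strict = series.sum(skipna=False)   # opt back in to NaN propagation

print(numpy_sum, numpy_nansum, pandas_sum, pandas_strict)
```

With skipna=False the pandas result matches the plain NumPy behavior, which can be useful when a missing value should invalidate the whole statistic.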

![Options for reduction methods](../../Pictures/Options%20for%20reduction%20methods.png)

Some methods, like *idxmin* and *idxmax*, return indirect statistics, such as the index value where the minimum or maximum value is attained:

In [22]:
df.idxmax()

one    c
two    b
dtype: object
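
Two related calls can be sketched on the same data: idxmin, the counterpart of idxmax, and cumsum, an accumulation that returns running totals rather than a single value per column. The names lowest and running are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, -5], [np.nan, 3], [8, 2], [-4, np.nan]],
                  index=['a', 'b', 'c', 'd'], columns=['one', 'two'])

# Index label of the smallest non-missing value in each column.
lowest = df.idxmin()

# Running total down each column. The result has the same shape as df;
# missing entries stay missing, and the total carries on past them.
running = df.cumsum()

print(lowest)
print(running)
```

Unlike a reduction, the accumulation keeps the original row labels, so its output can be aligned against the input row by row.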

Other methods are neither reductions nor accumulations (methods such as cumsum that return running totals). *describe* is one such example, producing multiple summary statistics in one shot:

In [23]:
df.describe()

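|       | one      | two      |
|-------|----------|----------|
| count | 3.000000 | 3.000000 |
| mean  | 1.666667 | 0.000000 |
| std   | 6.027714 | 4.358899 |
| min   | -4.000000 | -5.000000 |
| 25%   | -1.500000 | -1.500000 |
| 50%   | 1.000000 | 2.000000 |
| 75%   | 4.500000 | 2.500000 |
| max   | 8.000000 | 3.000000 |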
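
Each row of this summary corresponds to an ordinary reduction. The sketch below recomputes several of them individually and compares them with the describe output; the variable names are illustrative, and np.allclose is used only to compare floating-point results.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, -5], [np.nan, 3], [8, 2], [-4, np.nan]],
                  index=['a', 'b', 'c', 'd'], columns=['one', 'two'])

summary = df.describe()

counts = df.count()                          # number of non-NA values
means = df.mean()
stds = df.std()                              # sample standard deviation (ddof=1)
quartiles = df.quantile([0.25, 0.5, 0.75])   # rows labeled 0.25, 0.50, 0.75

print(np.allclose(summary.loc['count'], counts))
print(np.allclose(summary.loc['mean'], means))
print(np.allclose(summary.loc['std'], stds))
print(np.allclose(summary.loc['50%'], quartiles.loc[0.5]))
```

All of these skip missing values, which is why count reports 3 for each column even though the DataFrame has four rows.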


On non-numeric data, describe produces alternative summary statistics:

In [24]:
obj = Series(['a', 'a', 'b', 'c'] * 4)

obj

0     a
1     a
2     b
3     c
4     a
5     a
6     b
7     c
8     a
9     a
10    b
11    c
12    a
13    a
14    b
15    c
dtype: object

In [25]:
obj.describe()

count     16
unique     3
top        a
freq       8
dtype: object
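
The top and freq entries can be cross-checked against value_counts, which tallies how often each distinct value occurs. This is a minimal sketch on the same Series; the names counts and summary are illustrative.

```python
import pandas as pd

obj = pd.Series(['a', 'a', 'b', 'c'] * 4)

# Occurrences of each distinct value, most frequent first.
counts = obj.value_counts()

summary = obj.describe()

# 'top' is the most frequent value and 'freq' is how often it occurs.
print(counts)
print(summary['top'], summary['freq'])
print(obj.nunique())   # number of distinct values, matching 'unique'
```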

![Descriptive and summary statistics](../../Pictures/Descriptive%20and%20summary%20statistics.png)