# Computing Descriptive Statistics

In [2]:
import pandas as pd
import numpy as np

pandas objects are equipped with a set of common mathematical and statistical
methods. Most of these fall into the category of reductions or summary statistics:
methods that extract a single value (like the sum or mean) from a Series, or a Series
of values from the rows or columns of a DataFrame. Unlike the equivalent methods of
plain NumPy arrays, they skip missing data by default.
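
To make the contrast with NumPy concrete, here is a minimal sketch. The list `values` and the variable names are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Illustrative data containing one missing value
values = [1.4, np.nan, 0.75]

# A plain NumPy reduction propagates NaN into the result
numpy_total = np.array(values).sum()

# The pandas reduction skips NaN by default (skipna=True)
pandas_total = pd.Series(values).sum()

# NumPy needs its dedicated NaN-aware function to do the same
nan_aware_total = np.nansum(values)

print(numpy_total, pandas_total, nan_aware_total)
```

With NumPy, skipping missing values has to be requested through a separate function such as `np.nansum`, whereas the pandas methods do it unless told otherwise.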

In [3]:
df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],
                   [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'],
                  columns=['one', 'two'])

In [4]:
df

    one  two
a  1.40  NaN
b  7.10 -4.5
c   NaN  NaN
d  0.75 -1.3


Calling DataFrame’s sum method returns a Series containing column sums:

In [7]:
df.sum() 

one    9.25
two   -5.80
dtype: float64

In [8]:
df.sum(axis=1)

a    1.40
b    2.60
c    0.00
d   -0.55
dtype: float64
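
The row sums above report 0.00 for row c, where every value is missing, because the NaN values are skipped and an empty sum is zero. If an all-missing row should yield NaN instead, one option is the `min_count` parameter of `sum`. The sketch below rebuilds the same `df`; the name `row_sums` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],
                   [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'],
                  columns=['one', 'two'])

# Require at least one non-missing value per row;
# rows with fewer valid values sum to NaN instead of 0
row_sums = df.sum(axis=1, min_count=1)
print(row_sums)
```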

In [10]:
df.mean(axis=1, skipna=True)

a    1.400
b    1.300
c      NaN
d   -0.275
dtype: float64

In [11]:
df.mean(axis=1, skipna=False)

a      NaN
b    1.300
c      NaN
d   -0.275
dtype: float64

Some methods, like idxmin and idxmax, return indirect statistics, such as the index
label at which the minimum or maximum value is attained.

In [15]:
df.idxmax()

one    b
two    d
dtype: object
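
As a rough illustration of how these labels relate to the values themselves, the sketch below uses each label returned by `idxmax` to look up the corresponding entry and compares it with `df.max()`. The names `max_labels`, `min_labels`, and `looked_up` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],
                   [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'],
                  columns=['one', 'two'])

# Index label of the largest and smallest value in each column
max_labels = df.idxmax()
min_labels = df.idxmin()

# Looking up each label recovers the column maximum itself
looked_up = pd.Series({col: df.loc[label, col]
                       for col, label in max_labels.items()})

print(min_labels)
print(np.allclose(looked_up, df.max()))
```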

In [18]:
df.cumsum()

    one  two
a  1.40  NaN
b  8.50 -4.5
c   NaN  NaN
d  9.25 -5.8


In [19]:
df.describe()

            one       two
count  3.000000  2.000000
mean   3.083333 -2.900000
std    3.493685  2.262742
min    0.750000 -4.500000
25%    1.075000 -3.700000
50%    1.400000 -2.900000
75%    4.250000 -2.100000
max    7.100000 -1.300000
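
Each row of the describe output corresponds to an individual reduction that can also be called on its own. The following sketch rebuilds the summary for column `one` by hand; the names `col` and `summary` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],
                   [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'],
                  columns=['one', 'two'])

col = df['one']

# Assemble the same statistics that describe() reports, one call each
summary = pd.Series({
    'count': col.count(),        # number of non-missing values
    'mean': col.mean(),
    'std': col.std(),            # sample standard deviation (ddof=1)
    'min': col.min(),
    '25%': col.quantile(0.25),
    '50%': col.quantile(0.50),
    '75%': col.quantile(0.75),
    'max': col.max(),
})

print(summary)
```

All of these skip the missing value in row c, which is why the count is 3 rather than 4.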


# Unique Values, Value Counts, and Membership

In [20]:
sr = pd.Series(['c', 'a', 'd', 'a', 'a', 'b', 'b', 'c', 'c'])

In [24]:
sr.unique()

array(['c', 'a', 'd', 'b'], dtype=object)

In [25]:
sr.value_counts()

a    3
c    3
b    2
d    1
dtype: int64

isin computes a boolean Series indicating whether each Series value is contained in the passed sequence of values.


In [27]:
sr.isin(['a'])

0    False
1     True
2    False
3     True
4     True
5    False
6    False
7    False
8    False
dtype: bool
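
A common use of the boolean result is to filter a Series down to a subset of values. The sketch below assumes the same `sr` as above; the names `mask` and `subset` are illustrative.

```python
import pandas as pd

sr = pd.Series(['c', 'a', 'd', 'a', 'a', 'b', 'b', 'c', 'c'])

# True wherever the value is either 'b' or 'c'
mask = sr.isin(['b', 'c'])

# Boolean indexing keeps only the matching entries and their original labels
subset = sr[mask]
print(subset)
```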