# Summary Statistics
Summary statistics are the part of descriptive statistics that condenses sample data into a few values that convey its gist. Statisticians commonly describe and characterize the observations by finding a measure of location, or central tendency, such as the arithmetic mean.
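As a minimal sketch of the arithmetic mean as a measure of location, the heights from the dataset loaded below can be averaged by definition (sum divided by count) and with the pandas shortcut. The series is rebuilt inline from the table values because the CSV file is not available here.

```python
import pandas as pd

# Heights (cm) of the seven dogs in the dataset used below
heights = pd.Series([56, 43, 46, 49, 59, 18, 77], name="Height(cm)")

# Arithmetic mean by definition: sum of observations divided by their count
manual_mean = heights.sum() / heights.count()

# pandas computes the same measure of central tendency directly
print(manual_mean)     # 49.714285714285715
print(heights.mean())  # 49.714285714285715
```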

In [1]:
import pandas as pd
import numpy as np

In [3]:
# read dataset
df = pd.read_csv('Srt_dta.csv')
df

| Unnamed: 0 | Name | Breed | Color | Height(cm) | Weight(kg) | Date of Birth |
|---|---|---|---|---|---|---|
| 0 | Bella | Labrador | Brown | 56 | 25 | 2013-07-01 |
| 1 | Charlie | Poddle | Black | 43 | 23 | 2016-09-16 |
| 2 | Lucy | Chow Chow | Brown | 46 | 22 | 2014-08-25 |
| 3 | Copper | Schnauzer | Gray | 49 | 17 | 2011-12-11 |
| 4 | Max | Labrador | Black | 59 | 29 | 2017-01-20 |
| 5 | Stella | Chihuahua | Tan | 18 | 2 | 2015-04-20 |
| 6 | Bernle | St. Bernard | White | 77 | 74 | 2018-02-27 |
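Srt_dta.csv is not included with this notebook. The sketch below rebuilds the same table in memory (the values are copied from the output above) so the summaries can be reproduced. It then calls describe(), which reports several summary statistics for every numeric column at once. The Unnamed: 0 column looks like an index that was written into the CSV. That is an assumption about the file. If it holds, reading it with pd.read_csv('Srt_dta.csv', index_col=0) should turn that column back into the index.

```python
import pandas as pd

# In-memory copy of the table above; the "Unnamed: 0" column is left out
# on the assumption that it is just a saved index
df = pd.DataFrame({
    "Name": ["Bella", "Charlie", "Lucy", "Copper", "Max", "Stella", "Bernle"],
    "Breed": ["Labrador", "Poddle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "Color": ["Brown", "Black", "Brown", "Gray", "Black", "Tan", "White"],
    "Height(cm)": [56, 43, 46, 49, 59, 18, 77],
    "Weight(kg)": [25, 23, 22, 17, 29, 2, 74],
    "Date of Birth": ["2013-07-01", "2016-09-16", "2014-08-25", "2011-12-11",
                      "2017-01-20", "2015-04-20", "2018-02-27"],
})

# describe() gives count, mean, std, min, quartiles and max
# for each numeric column (string columns are skipped by default)
summary = df.describe()
print(summary)
```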


# Summarizing numerical data

In [4]:
df['Height(cm)'].mean()

49.714285714285715
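The mean is only one numerical summary. As a sketch, a few other common pandas reductions on the same height column are shown below. The data is rebuilt inline from the table. Note that std() and var() use the sample formula (ddof=1) by default.

```python
import pandas as pd

# Numeric columns from the dog table shown earlier
df = pd.DataFrame({
    "Height(cm)": [56, 43, 46, 49, 59, 18, 77],
    "Weight(kg)": [25, 23, 22, 17, 29, 2, 74],
})

height = df["Height(cm)"]
print(height.median())             # 49.0 (middle of the sorted values)
print(height.min(), height.max())  # 18 77
print(height.std())                # sample standard deviation (ddof=1)
print(height.var())                # sample variance (ddof=1)
```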

# Summarizing dates

In [5]:
df['Date of Birth'].min()

'2011-12-11'

In [6]:
df['Date of Birth'].max()

'2018-02-27'
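The outputs above are strings. Read without date parsing, the Date of Birth column holds plain text, so min() and max() compare the values lexicographically. That gives the right answer here only because the dates are in ISO YYYY-MM-DD form. As a sketch, converting the column with pd.to_datetime yields Timestamps, which also support date arithmetic:

```python
import pandas as pd

# Dates of birth from the table, as strings (as read from the CSV)
dob = pd.Series(["2013-07-01", "2016-09-16", "2014-08-25", "2011-12-11",
                 "2017-01-20", "2015-04-20", "2018-02-27"], name="Date of Birth")

# String comparison: correct only because of the ISO date format
print(dob.min())  # 2011-12-11

# Parse to datetime64 so min()/max() return Timestamps
dob_dt = pd.to_datetime(dob)
oldest, youngest = dob_dt.min(), dob_dt.max()
print(oldest)             # 2011-12-11 00:00:00
print(youngest - oldest)  # age gap between oldest and youngest, as a Timedelta
```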

# The .agg() method
The agg() method applies a function, or a list of functions, to a Series (or to each column of a DataFrame) and returns the aggregated result. When it receives a list of functions, it returns one result per function, labeled with the function's name.

In [9]:
def pct30(column):
    return column.quantile(0.3)

df['Weight(kg)'].agg(pct30)

21.0
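A named function like pct30 is not required. As a sketch, the same 30th percentile can be computed with an inline lambda. Built-in reductions can also be passed to agg() by name as strings.

```python
import pandas as pd

weight = pd.Series([25, 23, 22, 17, 29, 2, 74], name="Weight(kg)")

# A lambda behaves like the named pct30 function above
p30 = weight.agg(lambda col: col.quantile(0.3))
print(p30)  # 21.0

# Built-in reductions can be passed by name
print(weight.agg("median"))  # 23.0
```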

# Summaries on multiple columns

In [10]:
df[['Height(cm)', 'Weight(kg)']].agg(pct30)

Height(cm)    45.4
Weight(kg)    21.0
dtype: float64
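When different columns need different summaries, agg() also accepts a dict that maps each column name to its own function. A sketch is below, reusing pct30 and the table values. Because every function returns a scalar, the result is a Series indexed by column name.

```python
import pandas as pd

df = pd.DataFrame({
    "Height(cm)": [56, 43, 46, 49, 59, 18, 77],
    "Weight(kg)": [25, 23, 22, 17, 29, 2, 74],
})

def pct30(column):
    # 30th percentile of a column
    return column.quantile(0.3)

# Mean height, but 30th-percentile weight
result = df.agg({"Height(cm)": "mean", "Weight(kg)": pct30})
print(result)
```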

# Multiple summaries

In [12]:
def pct40(column):
    return column.quantile(0.4)

df['Height(cm)'].agg([pct30, pct40])

pct30    45.4
pct40    47.2
Name: Height(cm), dtype: float64
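The two ideas combine. As a sketch, passing the list of functions to a DataFrame of several columns produces a table with one row per function and one column per original column:

```python
import pandas as pd

df = pd.DataFrame({
    "Height(cm)": [56, 43, 46, 49, 59, 18, 77],
    "Weight(kg)": [25, 23, 22, 17, 29, 2, 74],
})

def pct30(column):
    return column.quantile(0.3)

def pct40(column):
    return column.quantile(0.4)

# Rows: pct30, pct40; columns: Height(cm), Weight(kg)
table = df.agg([pct30, pct40])
print(table)
```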

# Cumulative sum

In [14]:
df['Weight(kg)'].cumsum()
# other cumulative methods:
# .cummax()
# .cumprod()
# .cummin()

0     25
1     48
2     70
3     87
4    116
5    118
6    192
Name: Weight(kg), dtype: int64
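The other cumulative methods listed in the comment work the same way: each entry summarizes every row up to and including that position. A short sketch with the same weights shows cummax() and cummin():

```python
import pandas as pd

weight = pd.Series([25, 23, 22, 17, 29, 2, 74], name="Weight(kg)")

# Running maximum and running minimum down the rows
running_max = weight.cummax()
running_min = weight.cummin()
print(running_max.tolist())  # [25, 25, 25, 25, 29, 29, 74]
print(running_min.tolist())  # [25, 23, 22, 17, 17, 2, 2]
```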