# Summary Statistics

pandas provides several methods for calculating summary statistics over DataFrame columns or rows. Summary statistics give a quick overview of the data and help reveal its characteristics, such as its center, spread, and range. Here are some common methods for computing summary statistics in pandas:

In [3]:
import pandas as pd

In [4]:
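# Local path on the author's machine; point it at your own copy of the file.
# Reading .xlsx files requires the openpyxl package.
path = r"C:\Users\Alysson\Downloads\database.xlsx"

data = pd.read_excel(path)
data.head()  # first five rows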

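| | name | breed | color | height_cm | weight_kg | age |
|---|---|---|---|---|---|---|
| 0 | Paçoca | Labrador | Brown | 56 | 25 | 5 |
| 1 | Ivo | Poodle | Black | 43 | 22 | 4 |
| 2 | Lola | Schnauzer | Gray | 49 | 23 | 4 |
| 3 | Maracatu | King Cavalier | Brown | 43 | 21 | 3 |
| 4 | Chantal | Labrador | Black | 59 | 29 | 6 |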
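The Excel file lives on the author's machine, so the snippets in this notebook cannot be rerun as-is. A minimal sketch that rebuilds the dataset in memory from the `head()` output; the cumulative outputs later in the notebook show only indices 0 to 4, so these five rows appear to be the whole column.

```python
import pandas as pd

# The five rows from the data.head() output, typed in by hand
# so the examples can run without the local Excel file.
data = pd.DataFrame({
    "name": ["Paçoca", "Ivo", "Lola", "Maracatu", "Chantal"],
    "breed": ["Labrador", "Poodle", "Schnauzer", "King Cavalier", "Labrador"],
    "color": ["Brown", "Black", "Gray", "Brown", "Black"],
    "height_cm": [56, 43, 49, 43, 59],
    "weight_kg": [25, 22, 23, 21, 29],
    "age": [5, 4, 4, 3, 6],
})

print(data["weight_kg"].mean())  # 24.0, matching the notebook output
```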


In [5]:
data['weight_kg'].mean()

24.0
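The call above reduces one column to a single number. On a DataFrame, reductions such as `mean()` run down each column by default (`axis=0`) or across each row with `axis=1`. A small sketch with two numeric columns copied from the `head()` output; averaging centimeters with kilograms has no physical meaning here, it only shows the direction of the reduction.

```python
import pandas as pd

# Two numeric columns from the dog dataset (values copied from head()).
df = pd.DataFrame({
    "height_cm": [56, 43, 49, 43, 59],
    "weight_kg": [25, 22, 23, 21, 29],
})

col_means = df.mean()        # axis=0 (default): one value per column
row_means = df.mean(axis=1)  # axis=1: one value per row

print(col_means)
print(row_means)
```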

In [6]:
data[['weight_kg']].mode()

| | weight_kg |
|---|---|
| 0 | 21 |
| 1 | 22 |
| 2 | 23 |
| 3 | 25 |
| 4 | 29 |
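Every weight in the sample appears exactly once, so all five values tie for most frequent and `mode()` returns them all, sorted. A quick comparison with a hypothetical series in which one weight repeats:

```python
import pandas as pd

# The dataset's weights: no repeats, so every value is a mode.
all_unique = pd.Series([25, 22, 23, 21, 29]).mode()
print(all_unique.tolist())  # [21, 22, 23, 25, 29]

# Hypothetical weights where 22 appears twice.
weights = pd.Series([25, 22, 23, 22, 29])
print(weights.mode().tolist())  # [22]

# mode() always returns a Series; take the first element for a single value.
most_common = weights.mode().iloc[0]
print(most_common)  # 22
```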


In [7]:
data['weight_kg'].min()

21

In [8]:
data['weight_kg'].max()

29
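`min()` and `max()` return the extreme values themselves. To find which dog is lightest or heaviest, `idxmin()` and `idxmax()` return the index label of the extreme value instead (the first occurrence, if there is a tie), which can then be used to look up the rest of the row. A short sketch using the names and weights from the `head()` output:

```python
import pandas as pd

# Names and weights copied from the head() output.
data = pd.DataFrame({
    "name": ["Paçoca", "Ivo", "Lola", "Maracatu", "Chantal"],
    "weight_kg": [25, 22, 23, 21, 29],
})

# Look up the name on the row where the minimum / maximum weight occurs.
lightest = data.loc[data["weight_kg"].idxmin(), "name"]
heaviest = data.loc[data["weight_kg"].idxmax(), "name"]
print(lightest, heaviest)  # Maracatu Chantal
```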

In [9]:
data['weight_kg'].var()

10.0

In [10]:
data['weight_kg'].std()

3.1622776601683795
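Both `var()` and `std()` default to the sample estimator (`ddof=1`), which divides the sum of squared deviations by n - 1. For these weights, the deviations from the mean of 24 are 1, -2, -1, -3 and 5; their squares sum to 40, so the sample variance is 40 / 4 = 10 and the standard deviation is the square root of 10, about 3.162. Passing `ddof=0` gives the population version, 40 / 5 = 8. A sketch checking both by hand:

```python
import math
import pandas as pd

weights = pd.Series([25, 22, 23, 21, 29])  # weight_kg values from above

sample_var = weights.var()            # ddof=1 (default): divide by n - 1
population_var = weights.var(ddof=0)  # ddof=0: divide by n

# The same results computed directly from the squared deviations.
squared_dev = ((weights - weights.mean()) ** 2).sum()  # 40.0
n = len(weights)
print(sample_var, squared_dev / (n - 1))     # 10.0 10.0
print(population_var, squared_dev / n)       # 8.0 8.0
print(weights.std(), math.sqrt(sample_var))  # both about 3.1623
```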

In [11]:
def pct30(column):
    """Custom aggregation: return the 30th percentile of a Series."""
    return column.quantile(0.3)

data['weight_kg'].agg(pct30)

22.2
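`quantile()` uses linear interpolation by default. With the five sorted weights 21, 22, 23, 25, 29, the 30th percentile sits at position 0.3 × (5 - 1) = 1.2 (counting from 0), one fifth of the way from 22 to 23: 22 + 0.2 × (23 - 22) = 22.2. A hand-rolled version of that rule, written only to show the arithmetic; `linear_quantile` is a hypothetical helper, not a pandas function.

```python
import math
import pandas as pd

def linear_quantile(values, q):
    """Hypothetical helper mirroring pandas' default 'linear' interpolation."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)  # fractional position in the sorted list
    lower = math.floor(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower            # how far to move toward the next value
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

weights = pd.Series([25, 22, 23, 21, 29])
print(linear_quantile(weights, 0.3), weights.quantile(0.3))
```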

In [12]:
data[['weight_kg','height_cm']].agg(pct30)

weight_kg    22.2
height_cm    44.2
dtype: float64

In [13]:
def pct40(column):
    return column.quantile(0.4)

data['weight_kg'].agg([pct30,pct40])

pct30    22.2
pct40    22.6
Name: weight_kg, dtype: float64
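Writing one function per percentile gets repetitive. Two alternatives, sketched below: `quantile()` also accepts a list of probabilities, and a small factory (the name `make_pct` is hypothetical) can generate named functions for `agg()`, which labels each result with the function's `__name__`, as the `pct30`/`pct40` labels above show.

```python
import pandas as pd

weights = pd.Series([25, 22, 23, 21, 29], name="weight_kg")

# Option 1: quantile() accepts a list; the result is indexed by q.
print(weights.quantile([0.3, 0.4]))

# Option 2: generate named percentile functions for agg().
def make_pct(q):
    """Hypothetical factory returning a function that computes the q-quantile."""
    def pct(column):
        return column.quantile(q)
    pct.__name__ = f"pct{round(q * 100)}"  # agg() uses __name__ as the label
    return pct

summary = weights.agg([make_pct(0.3), make_pct(0.4), make_pct(0.5)])
print(summary)
```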

## Cumulative Statistics

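Cumulative methods return a Series (or DataFrame) of the same length as the input, where each row holds the result computed over all rows up to and including that one. Instead of a single summary value, they show the intermediate cumulative result at every row.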
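Each cumulative method is a running reduction: row i holds the result of applying the operation to rows 0 through i. A minimal sketch that reproduces the four pandas results in this section with the standard library's `itertools.accumulate` on the same weights:

```python
from itertools import accumulate
import operator
import pandas as pd

weights = pd.Series([25, 22, 23, 21, 29], name="weight_kg")

# accumulate() applies a binary function as a running fold over the values.
checks = {
    "cumsum": (weights.cumsum(), accumulate(weights, operator.add)),
    "cummax": (weights.cummax(), accumulate(weights, max)),
    "cummin": (weights.cummin(), accumulate(weights, min)),
    "cumprod": (weights.cumprod(), accumulate(weights, operator.mul)),
}

for name, (pandas_result, running) in checks.items():
    print(name, pandas_result.tolist() == list(running))  # True for each
```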

In [15]:
data['weight_kg'].cumsum()

0     25
1     47
2     70
3     91
4    120
Name: weight_kg, dtype: int64
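Dividing the running total by the number of rows seen so far turns `cumsum()` into a running mean; its last value equals the overall mean of 24.0. pandas also computes this directly with `expanding().mean()`, and the sketch compares the two:

```python
import numpy as np
import pandas as pd

weights = pd.Series([25, 22, 23, 21, 29], name="weight_kg")

# Running mean = running total / count of rows seen so far.
counts = np.arange(1, len(weights) + 1)
running_mean = weights.cumsum() / counts

# expanding() grows the window from the first row to the current one.
expanding_mean = weights.expanding().mean()

print(running_mean.tolist())
print(np.allclose(running_mean, expanding_mean))  # True
```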

In [16]:
data['weight_kg'].cummax()

0    25
1    25
2    25
3    25
4    29
Name: weight_kg, dtype: int64

In [17]:
data['weight_kg'].cummin()

0    25
1    22
2    22
3    21
4    21
Name: weight_kg, dtype: int64

In [18]:
data['weight_kg'].cumprod()

0         25
1        550
2      12650
3     265650
4    7703850
Name: weight_kg, dtype: int64
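The cumulative methods skip missing values by default (`skipna=True`): a `NaN` stays `NaN` in the output but does not interrupt the running result. With `skipna=False`, everything after the first `NaN` becomes `NaN`. A short check with a hypothetical series that has one missing weight:

```python
import numpy as np
import pandas as pd

# Hypothetical weights with one missing measurement.
weights = pd.Series([25, np.nan, 23, 21])

print(weights.cumsum().tolist())              # [25.0, nan, 48.0, 69.0]
print(weights.cumsum(skipna=False).tolist())  # [25.0, nan, nan, nan]
```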