## Useful Functions for Summarizing Data

- **describe()**: Provides summary statistics for each column
- **mean()**: Returns the mean of each column
- **corr()**: Returns the pairwise correlation between the columns of a DataFrame
- **count()**: Returns the number of non-null values in each DataFrame column
- **max()**: Returns the highest value in each column
- **min()**: Returns the lowest value in each column
- **median()**: Returns the median of each column
- **std()**: Returns the standard deviation of each column
- **groupby()**: Splits the data into groups by one or more keys, so that summary functions can be applied to each group
- **sum()**: Returns the sum of each column
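The methods listed above can be tried on a small frame. This is a minimal sketch, assuming pandas 2.x. It reuses the dog names, ages and weights from this notebook, and the `Size` column is a hypothetical grouping key invented here only so that `groupby()` has something to split on.

```python
import numpy as np
import pandas as pd

# Dog names, ages and weights, as used elsewhere in this notebook
df = pd.DataFrame({
    'Name': ['Bella', 'Lucy', 'Max', 'Charlie', 'Buddy', 'Rocky', 'Molly', 'Daisy', 'Bailey', 'Lola'],
    'Age': [3, 5, 2, 4, 6, 1, 7, 3, 5, 2],
    'Weight': [22, 18, 25, 20, 10, 30, 12, 8, 28, 6],
})

# count() works on every column, including text columns
counts = df.count()

# Select the numeric columns first: in pandas 2.x, mean(), median(), std()
# and corr() raise an error on text columns such as 'Name'
num = df[['Age', 'Weight']]
print(num.describe())     # count, mean, std, min, quartiles and max per column

means = num.mean()
medians = num.median()
stds = num.std()          # sample standard deviation (ddof=1) by default
maxima = num.max()
minima = num.min()
totals = num.sum()
corr_matrix = num.corr()  # Pearson correlation by default

# groupby() needs a key; 'Size' is a hypothetical column derived here for illustration
df['Size'] = np.where(df['Weight'] >= 20, 'large', 'small')
mean_age_by_size = df.groupby('Size')['Age'].mean()
print(mean_age_by_size)
```

Selecting the numeric columns up front is one design choice; passing `numeric_only=True` to each method is an alternative.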

```python
# Summarizing numerical data with pandas
import pandas as pd
import numpy as np

#* In this notebook we learn how to use cumulative functions such as cumsum() and cummax().
#* We also learn about the agg() function, which lets us apply multiple functions to a DataFrame at once.

# Create a dictionary with data about dogs
data = {
    'Name': ['Bella', 'Lucy', 'Max', 'Charlie', 'Buddy', 'Rocky', 'Molly', 'Daisy', 'Bailey', 'Lola'],
    'Age': [3, 5, 2, 4, 6, 1, 7, 3, 5, 2],
    'Breed': ['Labrador', 'Poodle', 'Bulldog', 'Beagle', 'Pug', 'Boxer', 'Dachshund', 'Shih Tzu', 'Husky', 'Chihuahua'],
    'Weight': [22, 18, 25, 20, 10, 30, 12, 8, 28, 6]
}
# Create a DataFrame from the dictionary
df_dogs = pd.DataFrame(data)

# Running total of the 'Age' column (double brackets keep the result a DataFrame)
df_dogs[['Age']].cumsum()
```

|   | Age |
|---|-----|
| 0 | 3 |
| 1 | 8 |
| 2 | 10 |
| 3 | 14 |
| 4 | 20 |
| 5 | 21 |
| 6 | 28 |
| 7 | 31 |
| 8 | 36 |
| 9 | 38 |
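`cumsum()` is one of several cumulative methods; the intro comment also mentions `cummax()`. The sketch below, assuming nothing beyond standard pandas, applies four of them to the same ages so the running values can be compared side by side.

```python
import pandas as pd

# Ages from the dog data above, as a Series
ages = pd.Series([3, 5, 2, 4, 6, 1, 7, 3, 5, 2], name='Age')

running = pd.DataFrame({
    'cumsum': ages.cumsum(),    # running total
    'cummax': ages.cummax(),    # largest value seen so far
    'cummin': ages.cummin(),    # smallest value seen so far
    'cumprod': ages.cumprod(),  # running product
})
print(running)
```

The `cumsum` column reproduces the table above; `cummax` only changes when a new maximum age appears.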


`DataFrame.agg(func=None, axis=0, *args, **kwargs)`

Aggregate using one or more operations over the specified axis.

**Parameters:**

- **func**: function, str, list or dict
  Function to use for aggregating the data. If a function, it must work either when passed a DataFrame or when passed to `DataFrame.apply`. Accepted combinations are:
  - a function
  - a string function name
  - a list of functions and/or function names, e.g. `[np.sum, 'mean']`
  - a dict of axis labels -> functions, function names or lists of such
- **axis**: {0 or `'index'`, 1 or `'columns'`}, default 0
  If 0 or `'index'`: apply the function to each column. If 1 or `'columns'`: apply the function to each row.
- **\*args**: Positional arguments to pass to `func`.
- **\*\*kwargs**: Keyword arguments to pass to `func`.

**Returns:** scalar, Series or DataFrame. The return can be:

- scalar: when `Series.agg` is called with a single function
- Series: when `DataFrame.agg` is called with a single function
- DataFrame: when `DataFrame.agg` is called with several functions
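The accepted forms of `func` and the three return types can be checked directly. This is a minimal sketch, assuming pandas 1.x or 2.x; `quantile_range` is a hypothetical helper written only to show that extra keyword arguments given to `agg()` are forwarded to `func`.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [3, 5, 2, 4, 6, 1, 7, 3, 5, 2],
    'Weight': [22, 18, 25, 20, 10, 30, 12, 8, 28, 6],
})

# Hypothetical helper: receives one column plus the keyword arguments passed to agg()
def quantile_range(column, low, high):
    return column.quantile(high) - column.quantile(low)

scalar_result = df['Age'].agg('max')       # Series.agg, one function -> scalar
series_result = df.agg('mean')             # DataFrame.agg, one function -> Series
frame_result = df.agg(['sum', 'mean'])     # several functions -> DataFrame
# Dict form: different functions per column; missing combinations are filled with NaN
dict_result = df.agg({'Age': ['min', 'max'], 'Weight': ['mean']})
# **kwargs (low, high) are forwarded to quantile_range for each column
kwargs_result = df.agg(quantile_range, low=0.25, high=0.75)

print(frame_result)
print(dict_result)
print(kwargs_result)
```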

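```python
# Custom aggregation: interquartile range (75th percentile minus 25th percentile)
def iqr(column):
    return column.quantile(0.75) - column.quantile(0.25)

# Calculate the interquartile range for the 'Age' column (2.75 for this data).
# Only the last expression of a cell is displayed, so store this result in a variable.
age_iqr = df_dogs['Age'].agg(iqr)

#* agg() can be called on the whole DataFrame or on a selection of its columns,
# and it accepts more than one function in a single agg() call
df_dogs[['Age']].agg(['mean', 'median', iqr])
```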

|        | Age  |
|--------|------|
| mean   | 3.8  |
| median | 3.5  |
| iqr    | 2.75 |
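`agg()` also combines with `groupby()` from the list at the top, so a custom function like `iqr` can be computed per group. This is a minimal sketch, assuming pandas 0.25 or later (for named aggregation); the `Size` column is a hypothetical grouping key derived from `Weight` only for illustration.

```python
import numpy as np
import pandas as pd

def iqr(column):
    # Interquartile range: 75th percentile minus 25th percentile
    return column.quantile(0.75) - column.quantile(0.25)

df = pd.DataFrame({
    'Age': [3, 5, 2, 4, 6, 1, 7, 3, 5, 2],
    'Weight': [22, 18, 25, 20, 10, 30, 12, 8, 28, 6],
})

# Hypothetical grouping key derived from Weight, for illustration only
df['Size'] = np.where(df['Weight'] >= 20, 'large', 'small')

# Named aggregation: output_name=(input column, function or function name)
summary = df.groupby('Size').agg(
    mean_age=('Age', 'mean'),
    age_iqr=('Age', iqr),
    mean_weight=('Weight', 'mean'),
)
print(summary)
```

Named aggregation keeps the output column names explicit, which avoids the MultiIndex columns produced by passing a dict of lists.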
