# Descriptive Statistics via `DTable()`

The function `DTable()` displays descriptive statistics for a set of variables in a single, common table layout.

## Basic usage of `DTable`
Specify the variables to summarize with `vars`. You can also pass a dictionary via `labels` to rename the variables, and add a title with `caption`.


```python
# Import necessary libraries
import numpy as np
import pandas as pd
import maketables as mt

# Load sample dataset
df = pd.read_csv("data/salaries.csv")

# Define variable labels
labels = {
    "logwage": "ln(Wage)",
    "wage": "Wage",
    "age": "Age",
    "female": "Female",
    "tenure": "Years of Tenure",
    "occupation": "Occupation",
    "worker_type": "Worker Type",
    "education": "Education Level"
}
```


```python
mt.DTable(
    df,
    vars=["wage", "logwage", "age", "tenure"],
    labels=labels,
    caption="Descriptive statistics",
)
```

**Descriptive statistics**

| | N | Mean | Std. Dev. |
|:--|--:|--:|--:|
| Wage | 1800.0 | 62741.77 | 28312.41 |
| ln(Wage) | 1800.0 | 10.94 | 0.48 |
| Age | 1800.0 | 40.77 | 11.1 |
| Years of Tenure | 1800.0 | 17.62 | 11.18 |
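For intuition, numbers like those in this table can be approximated in plain pandas. The sketch below uses a small made-up DataFrame (not the salaries data) and assumes that `N`, `Mean` and `Std. Dev.` correspond to pandas' `count`, `mean` and `std` (the latter with its default `ddof=1`, i.e. the sample standard deviation). The `stat_names` mapping is hypothetical and only mimics the column headers shown above.

```python
import pandas as pd

# Toy data standing in for the salaries dataset (values are made up)
df = pd.DataFrame({"wage": [50000, 60000, 70000], "age": [30, 40, 50]})
labels = {"wage": "Wage", "age": "Age"}

# Hypothetical mapping from pandas statistic names to column headers
stat_names = {"count": "N", "mean": "Mean", "std": "Std. Dev."}

# Aggregate each variable, then transpose so that variables are rows
# and statistics are columns; finally apply the display labels
summary = (
    df[["wage", "age"]]
    .agg(["count", "mean", "std"])
    .T
    .rename(index=labels, columns=stat_names)
    .round(2)
)
print(summary)
```

Because the counts share a float column with the means and standard deviations, `N` prints as `3.0` here, which is plausibly why `N` appears as `1800.0` in the table above.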




Choose the statistics to display with `stats`. Any pandas aggregation function name can be used, for example `min` and `max`.


```python
mt.DTable(
    df,
    vars=["wage", "logwage", "age", "tenure"],
    stats=["count", "mean", "std", "min", "max"],
    labels=labels,
    caption="Descriptive statistics",
)
```

**Descriptive statistics**

| | N | Mean | Std. Dev. | Min | Max |
|:--|--:|--:|--:|--:|--:|
| Wage | 1800.0 | 62741.77 | 28312.41 | 25000.0 | 166589.0 |
| ln(Wage) | 1800.0 | 10.94 | 0.48 | 10.13 | 12.02 |
| Age | 1800.0 | 40.77 | 11.1 | 22.0 | 65.0 |
| Years of Tenure | 1800.0 | 17.62 | 11.18 | 0.0 | 43.0 |
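Since `stats` takes pandas aggregation names, it can help to check what a given name computes before putting it in a table. The sketch below calls pandas' `DataFrame.agg` directly on made-up data with the same names as above plus `median`; whether a particular name is accepted by `DTable` in your installed version is an assumption worth verifying.

```python
import pandas as pd

# Made-up data; not the salaries dataset
df = pd.DataFrame({"wage": [40.0, 50.0, 60.0, 90.0], "age": [25, 35, 45, 55]})

# The statistic names passed to `stats` above, plus "median"
stats = ["count", "mean", "std", "min", "max", "median"]

# DataFrame.agg accepts a list of function names; transpose so that
# variables are rows and statistics are columns
summary = df.agg(stats).T
print(summary)
```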




## Summarize by characteristics in columns and rows
Use the `bycol` argument to summarize by characteristics that are displayed as groups in the columns. When the number of observations is the same for all variables within a group, you can display it only once per group, in a separate row at the bottom of the table, with `counts_row_below=True`.


```python
# Generate a categorical variable for gender from the dummy variable
df["gender"] = df["female"].map({0: "Male", 1: "Female"})

mt.DTable(
    df,
    vars=["wage", "logwage", "age", "tenure"],
    labels=labels,
    bycol=["worker_type", "gender"],
    stats=["count", "mean", "std"],
    caption="Descriptive statistics by worker type and gender",
    stats_labels={"count": "Number of observations"},
    counts_row_below=True,
)
```

**Descriptive statistics by worker type and gender**

| | Blue Collar | | | | White Collar | | | |
|:--|--:|--:|--:|--:|--:|--:|--:|--:|
| | Female | | Male | | Female | | Male | |
| | Mean | Std. Dev. | Mean | Std. Dev. | Mean | Std. Dev. | Mean | Std. Dev. |
| Wage | 53899.74 | 24679.29 | 54360.28 | 26129.05 | 65614.76 | 27897.84 | 71399.23 | 29204.37 |
| ln(Wage) | 10.79 | 0.47 | 10.79 | 0.49 | 11.00 | 0.45 | 11.08 | 0.46 |
| Age | 41.10 | 10.96 | 39.83 | 11.14 | 41.79 | 11.02 | 40.20 | 11.17 |
| Years of Tenure | 17.86 | 11.19 | 16.73 | 11.15 | 18.59 | 11.08 | 17.10 | 11.23 |
| Number of observations | 357 | | 368 | | 530 | | 545 | |
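Showing a single count per group only makes sense when every variable has the same number of non-missing values within that group. The pandas sketch below, on made-up data, shows one way to check this condition with `groupby(...).count()`; it illustrates the requirement described above and is not necessarily how `DTable` checks it internally. The single missing `wage` value makes the counts differ in one group.

```python
import pandas as pd

# Made-up data; one wage value is missing
df = pd.DataFrame({
    "worker_type": ["Blue", "Blue", "Blue", "White", "White", "White"],
    "gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
    "wage": [50.0, 55.0, 60.0, 70.0, None, 80.0],
    "age": [30, 40, 50, 35, 45, 55],
})
variables = ["wage", "age"]

# Non-missing observations per variable within each (worker_type, gender) group
counts = df.groupby(["worker_type", "gender"])[variables].count()

# A group can share one count row only if all its variables have the same N
same_n = counts.nunique(axis=1).eq(1)
can_use_single_count_row = bool(same_n.all())

print(counts)
print(can_use_single_count_row)
```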




You can also use custom aggregation functions to compute further statistics or to change how statistics are presented. `DTable` supports two such functions, `mean_std` and `mean_newline_std`, which compute the mean and the standard deviation and display both in the same cell, either on one line (`mean_std`) or with a line break between them (`mean_newline_std`). This makes tables more compact when you show statistics for many characteristics in the columns.
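The idea behind a combined statistic like `mean_std` can be sketched as an aggregation function that returns a formatted string instead of a number. The helper `format_mean_std` below is hypothetical, written for illustration with two decimals and a parenthesized standard deviation to mirror the output shown in this section; it is not the implementation used by `DTable`.

```python
import pandas as pd


def format_mean_std(values: pd.Series, newline: bool = False) -> str:
    """Hypothetical helper: format mean and sample std as 'mean (std)'.

    With newline=True the standard deviation goes on its own line,
    mimicking the idea of `mean_newline_std`.
    """
    sep = "\n" if newline else " "
    return f"{values.mean():.2f}{sep}({values.std():.2f})"


# Made-up data with a grouping column
df = pd.DataFrame({
    "gender": ["Female", "Female", "Female", "Male", "Male", "Male"],
    "wage": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# One formatted cell per group
cells = df.groupby("gender")["wage"].agg(format_mean_std)
print(cells)
```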

You can also hide the statistics labels in the table header with `hide_stats=True`. In that case, unless you provide a custom note, a table note is added that lists the displayed statistics by their labels.


```python
mt.DTable(
    df,
    vars=["wage", "logwage", "age", "tenure"],
    labels=labels,
    bycol=["worker_type", "gender"],
    stats=["mean_newline_std", "count"],
    caption="Descriptive statistics by worker type and gender",
    stats_labels={"count": "Number of observations"},
    counts_row_below=True,
    hide_stats=True,
)
```

**Descriptive statistics by worker type and gender**

| | Blue Collar | | White Collar | |
|:--|--:|--:|--:|--:|
| | Female | Male | Female | Male |
| Wage | 53899.74 (24679.29) | 54360.28 (26129.05) | 65614.76 (27897.84) | 71399.23 (29204.37) |
| ln(Wage) | 10.79 (0.47) | 10.79 (0.49) | 11.00 (0.45) | 11.08 (0.46) |
| Age | 41.10 (10.96) | 39.83 (11.14) | 41.79 (11.02) | 40.20 (11.17) |
| Years of Tenure | 17.86 (11.19) | 16.73 (11.15) | 18.59 (11.08) | 17.10 (11.23) |
| Number of observations | 357 | 368 | 530 | 545 |

Note: Displayed statistics are Mean (Std. Dev.).




You can also group by characteristics in both columns and rows. Note that `byrow` accepts only one grouping variable, whereas `bycol` accepts several (as shown above).


```python
mt.DTable(
    df,
    vars=["wage", "logwage", "age", "tenure"],
    labels=labels,
    bycol=["worker_type"],
    byrow="gender",
    stats=["count", "mean", "std"],
    caption="Descriptive statistics by worker type and gender",
)
```

**Descriptive statistics by worker type and gender**

| | Blue Collar | | | White Collar | | |
|:--|--:|--:|--:|--:|--:|--:|
| | N | Mean | Std. Dev. | N | Mean | Std. Dev. |
| **Female** | | | | | | |
| Wage | 357 | 53899.74 | 24679.29 | 530 | 65614.76 | 27897.84 |
| ln(Wage) | 357 | 10.79 | 0.47 | 530 | 11.00 | 0.45 |
| Age | 357 | 41.10 | 10.96 | 530 | 41.79 | 11.02 |
| Years of Tenure | 357 | 17.86 | 11.19 | 530 | 18.59 | 11.08 |
| **Male** | | | | | | |
| Wage | 368 | 54360.28 | 26129.05 | 545 | 71399.23 | 29204.37 |
| ln(Wage) | 368 | 10.79 | 0.49 | 545 | 11.08 | 0.46 |
| Age | 368 | 39.83 | 11.14 | 545 | 40.20 | 11.17 |
| Years of Tenure | 368 | 16.73 | 11.15 | 545 | 17.10 | 11.23 |
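A layout with groups in both rows and columns can be approximated in pandas by grouping on the row and column variables together, aggregating, and moving the column grouping into the header with `unstack`. The sketch below uses made-up data and pandas' `melt`, `groupby` and `unstack`; it reproduces only the shape of the table (variables nested within gender in the rows, statistics nested within worker type in the columns), not `DTable`'s formatting, and it orders rows and columns alphabetically rather than in the order of `vars`.

```python
import pandas as pd

# Made-up data: two observations per (gender, worker_type) cell
df = pd.DataFrame({
    "gender": ["Female", "Female", "Male", "Male"] * 2,
    "worker_type": ["Blue"] * 4 + ["White"] * 4,
    "wage": [10.0, 20.0, 30.0, 50.0, 40.0, 60.0, 70.0, 90.0],
    "age": [20, 30, 40, 50, 30, 40, 50, 70],
})

# Long format: one row per (observation, variable) pair
long = df.melt(
    id_vars=["gender", "worker_type"],
    value_vars=["wage", "age"],
    var_name="variable",
)

# Rows: gender (the row grouping), then variable.
# Columns: worker_type (the column grouping), then statistic.
table = (
    long.groupby(["gender", "variable", "worker_type"])["value"]
    .agg(["count", "mean", "std"])
    .unstack("worker_type")
    .swaplevel(axis=1)
    .sort_index(axis=1)
)
print(table.round(2))
```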


