
CHAI-NU/pyartool


PyARTool: Aligned Rank Transform for Nonparametric Factorial ANOVAs

Python port of the R ARTool package.

PyARTool implements the Aligned Rank Transform (ART) for conducting nonparametric analyses of variance on factorial models. It is a faithful Python translation of the R ARTool package by Wobbrock, Findlater, Gergle, Higgins, Kay, and Elkin, and reproduces R's results numerically on all bundled example datasets.



Overview

The Aligned Rank Transform (ART) is a nonparametric technique that allows you to use standard ANOVA procedures on ranked data, while correctly handling main effects, interactions, and contrasts in factorial designs. It works by:

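  1. Aligning the response variable separately for each effect (main effect or interaction), stripping out every effect except the one of interest.
  2. Ranking the aligned responses, separately for each effect.
  3. Running a standard full-factorial ANOVA on each effect's aligned ranks and reporting only the effect for which those responses were aligned.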
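To make the steps concrete, here is a minimal pandas sketch of steps 1 and 2 for a balanced two-factor between-subjects design. The helper art_align_two_way is hypothetical and not part of PyARTool's API; PyARTool's own art() additionally handles grouping terms, Error() strata, and any number of factors.

```python
import numpy as np
import pandas as pd


def art_align_two_way(df: pd.DataFrame, y: str, a: str, b: str):
    """Align and rank `y` for A, B and A:B (hypothetical helper, balanced design assumed)."""
    grand = df[y].mean()
    cell = df.groupby([a, b])[y].transform("mean")  # cell means (full-factorial fit)
    mean_a = df.groupby(a)[y].transform("mean")     # marginal means of A
    mean_b = df.groupby(b)[y].transform("mean")     # marginal means of B
    resid = df[y] - cell                            # residuals from the cell-means model

    # Estimated effect of each term, evaluated at every observation.
    effects = {
        a: mean_a - grand,
        b: mean_b - grand,
        f"{a}:{b}": cell - mean_a - mean_b + grand,
    }

    # Step 1: aligned response = residual + estimated effect of the term of interest.
    aligned = pd.DataFrame({term: resid + eff for term, eff in effects.items()})
    # Step 2: rank each aligned column separately (ties get average ranks).
    ranks = aligned.rank(method="average")
    return aligned, ranks


rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": np.repeat(["a1", "a2"], 6),
    "B": np.tile(np.repeat(["b1", "b2", "b3"], 2), 2),
    "Y": rng.exponential(size=12),  # skewed, non-normal response
})
aligned, ranks = art_align_two_way(df, "Y", "A", "B")
print(aligned.sum().round(10))  # each aligned column sums to ~0
print(ranks.head())
```

Step 3 would then fit the full-factorial model to each column of ranks (for example with statsmodels) and keep only the row for that column's own term.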

PyARTool automates this entire pipeline and additionally supports the ART-C procedure for post-hoc contrast tests (Elkin et al., 2021).

When to Use ART

Use the Aligned Rank Transform when:

  • Your data violates ANOVA assumptions (non-normality, heteroscedasticity).
  • You have a factorial design (two or more factors) — ART handles interactions correctly, unlike simpler rank-based tests.
  • You need post-hoc pairwise or interaction contrasts on nonparametric data.

For more background, see the ARTool project page.


Installation

From PyPI (recommended)

pip install pyartool

From source

git clone <this-repo>
cd PyARTool

# Create and activate a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate   # macOS/Linux
# .venv\Scripts\activate    # Windows

# Install in editable mode
pip install -e .

Requirements

  • Python >= 3.9
  • numpy >= 1.22
  • pandas >= 1.4
  • scipy >= 1.8
  • statsmodels >= 0.13

Quick Start

from pyartool import art, anova_art, art_con, load_higgins1990_table5

# Load data
df = load_higgins1990_table5()

# Step 1: Apply the Aligned Rank Transform
m = art("DryMatter ~ Moisture * Fertilizer + (1|Tray)", data=df)

# Step 2: Run the nonparametric ANOVA
print(anova_art(m))
#                  Term  Df  Df.res        F        Pr(>F)
# 0            Moisture   3     8.0   23.833  2.419913e-04
# 1          Fertilizer   3    24.0  122.402  1.110223e-14
# 2  Moisture:Fertilizer  9    24.0    5.118  6.466476e-04

# Step 3: Post-hoc contrasts
print(art_con(m, "Moisture"))
#   contrast  estimate    SE  df  t.ratio   p.value
# 0  m1 - m2   -23.083  4.12   8   -5.607    0.0023
# ...

API Reference

art() — Aligned Rank Transform

from pyartool import art

result = art(formula, data)

Parameter   Type           Description
formula     str            R-style formula (see Formula Syntax).
data        pd.DataFrame   Data in long format. Factor columns should be pd.Categorical or string.

Returns an ArtResult object containing:

Attribute                  Description
result.formula             Original formula string.
result.data                Original DataFrame.
result.aligned             DataFrame of aligned responses (one column per term).
result.aligned_ranks       DataFrame of ranks of aligned responses.
result.residuals           Residuals from the cell-means model.
result.cell_means          Cell means for every term.
result.estimated_effects   Estimated effects for every term.

anova_art() — ANOVA on ART Data

from pyartool import anova_art

anova_table = anova_art(m)

Parameter   Type        Description
m           ArtResult   Object returned by art().

Returns a pd.DataFrame with columns: Term, Df, Df.res, F, Pr(>F).

The model type is determined automatically by the formula:

Formula pattern        Model                  R equivalent
Y ~ A * B              OLS (fixed effects)    lm()
Y ~ A * B + (1|S)      Mixed-effects (REML)   lmer()
Y ~ A * B + Error(S)   Repeated measures      aov(Error())
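The dispatch rule in this table depends only on which extra term appears on the right-hand side. A minimal sketch of that rule using regular expressions; model_type is a hypothetical helper, and PyARTool's real parser (formula.py) may classify formulas differently in edge cases:

```python
import re


def model_type(formula: str) -> str:
    """Return "rm", "mixed" or "ols" for an ART formula (hypothetical helper)."""
    rhs = formula.split("~", 1)[1]
    if re.search(r"\bError\s*\(", rhs):
        return "rm"      # Y ~ A * B + Error(S): repeated-measures aov
    if re.search(r"\(\s*1\s*\|\s*\w+\s*\)", rhs):
        return "mixed"   # Y ~ A * B + (1|S): random intercept, fit by REML
    return "ols"         # fixed effects only


for f in ["Y ~ A * B", "Y ~ A * B + (1|S)", "Y ~ A * B + Error(S)"]:
    print(f, "->", model_type(f))
```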

summary_art() — Diagnostic Summary

from pyartool import summary_art

s = summary_art(m)

Returns an ArtSummary object with:

Attribute                  Description
s.aligned_col_sums         Dict of column sums of aligned responses (should all be ~0).
s.aligned_anova_f_values   Array of F values from ANOVAs on non-target aligned responses (should all be ~0).

These diagnostics verify that the ART alignment procedure correctly stripped out effects not of interest. If values are not close to zero, the ART may not be appropriate for your data.
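The idea behind both diagnostics can be reproduced by hand for a single aligned column. The sketch below aligns a response for the main effect of A in a balanced 2x3 design, then checks that the column sums to ~0 and that a (simplified, one-way) ANOVA of the non-target factor B gives F ~ 0. It only illustrates the reasoning; it is not how summary_art() computes its values.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "A": np.repeat(["a1", "a2"], 6),
    "B": np.tile(np.repeat(["b1", "b2", "b3"], 2), 2),
    "Y": rng.lognormal(size=12),
})

# Align Y for the main effect of A: residual + estimated A effect.
grand = df["Y"].mean()
cell = df.groupby(["A", "B"])["Y"].transform("mean")
mean_a = df.groupby("A")["Y"].transform("mean")
df["aligned_A"] = (df["Y"] - cell) + (mean_a - grand)

# Check 1: the aligned column should sum to ~0.
col_sum = df["aligned_A"].sum()

# Check 2: B's effect was stripped out, so an ANOVA of B on the
# A-aligned response should give an F value of ~0.
groups = [g["aligned_A"].to_numpy() for _, g in df.groupby("B")]
f_b = f_oneway(*groups).statistic

print(f"column sum = {col_sum:.2e}, F for B = {f_b:.2e}")
```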

art_con() — Contrast Tests (ART-C)

from pyartool import art_con

contrasts = art_con(m, formula, *, response="art", method="pairwise",
                    interaction=False, adjust="tukey")

Parameter     Type        Default      Description
m             ArtResult   (required)   Object returned by art().
formula       str         (required)   Term to contrast: "A", "A:B", or "A:B:C".
response      str         "art"        "art" (ranked) or "aligned" (unranked).
method        str         "pairwise"   Contrast method.
interaction   bool        False        If True, compute difference-of-differences contrasts.
adjust        str|None    "tukey"      P-value adjustment (see below).

Returns a pd.DataFrame with columns: contrast, estimate, SE, df, t.ratio, p.value.
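With interaction=True, each contrast is a difference of differences: how much a pairwise difference in one factor changes across a pair of levels of another. The sketch below computes those estimates from a table of raw cell means, purely to show the arithmetic. The helper diff_of_diffs and its label format are hypothetical; art_con() computes its estimates on ART-C aligned ranks and takes SE, df, and t.ratio from the fitted model.

```python
from itertools import combinations

import pandas as pd


def diff_of_diffs(cell_means: pd.DataFrame) -> pd.Series:
    """All difference-of-differences contrasts from cell means
    (rows = levels of A, columns = levels of B). Hypothetical helper."""
    out = {}
    for a1, a2 in combinations(cell_means.index, 2):
        for b1, b2 in combinations(cell_means.columns, 2):
            # (a1 - a2 at b1) minus (a1 - a2 at b2)
            d = (cell_means.loc[a1, b1] - cell_means.loc[a2, b1]) - (
                cell_means.loc[a1, b2] - cell_means.loc[a2, b2]
            )
            out[f"{a1} - {a2} : {b1} - {b2}"] = d
    return pd.Series(out, name="estimate")


means = pd.DataFrame(
    {"b1": [10.0, 14.0, 9.0], "b2": [12.0, 13.0, 15.0]},
    index=["a1", "a2", "a3"],
)
print(diff_of_diffs(means))
```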

artlm() / artlm_con() — Access Fitted Models

For advanced users who need the underlying statsmodels fit objects:

from pyartool import artlm, artlm_con

# Get the fitted model for a specific ART term
lm_result = artlm(m, "A:B")

# Get the fitted model for an ART-C contrast term
lm_con_result = artlm_con(m, "A:B")

Dataset Loaders

PyARTool bundles the same example datasets as the R package:

from pyartool import (
    load_higgins1990_table1,   # 3x3 between-subjects
    load_higgins1990_table5,   # 4x4 split-plot (Moisture x Fertilizer)
    load_elkin_ab,             # 2x2 within-subjects
    load_elkin_abc,            # 2x2x2 within-subjects
    load_higgins_abc,          # 2x2x2 mixed design
)

df = load_higgins1990_table5()
print(df.head())
#   Tray Moisture Fertilizer  DryMatter
# 0   t1       m1         f1        3.3
# 1   t1       m1         f2        4.3
# 2   t1       m1         f3        4.5
# 3   t1       m1         f4        5.8
# 4   t2       m1         f1        4.0

Supported Designs

Design                       Formula example              Model
Between-subjects factorial   Y ~ A * B                    OLS (lm)
Split-plot / mixed-effects   Y ~ A * B + (1|Subject)      Mixed (lmer via MixedLM)
Repeated measures (aov)      Y ~ A * B + Error(Subject)   RM-ANOVA (aov)
2-factor                     Y ~ A * B                    Any of the above
3-factor                     Y ~ A * B * C + (1|S)        Any of the above
N-factor                     Y ~ A * B * C * D + ...      Any of the above

Formula Syntax

PyARTool uses R-style formula strings. The parser supports all the same patterns as the R ARTool package:

Fixed effects

# Full factorial (A + B + A:B)
"Y ~ A * B"

# Three-way factorial (all main effects, 2-way, and 3-way interactions)
"Y ~ A * B * C"

# You can also spell out terms explicitly
"Y ~ A + B + A:B"

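The * operator expands to every main effect and every interaction among the listed factors, in order of increasing size (the same order as the ANOVA tables in the walkthrough). A minimal sketch of that expansion; expand_star is a hypothetical helper, not PyARTool's parser:

```python
from itertools import combinations


def expand_star(factors):
    """Expand A * B * ... into all main effects and interactions (hypothetical helper)."""
    terms = []
    for k in range(1, len(factors) + 1):
        terms.extend(":".join(combo) for combo in combinations(factors, k))
    return terms


rhs = "A * B * C"
factors = [f.strip() for f in rhs.split("*")]
print(" + ".join(expand_star(factors)))
# A + B + C + A:B + A:C + B:C + A:B:C
```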
Random / grouping effects (mixed-effects model)

# Random intercept for Subject — fits a mixed-effects model (lmer)
"Y ~ A * B + (1|Subject)"

Error terms (repeated measures ANOVA)

# Repeated measures — fits an aov() with Error()
"Y ~ A * B + Error(Subject)"

Important notes

  • The response variable (left of ~) must be a single numeric column.
  • All factor columns should be pd.Categorical or string type. Numeric columns used as factors will raise a warning.
  • The formula must specify the full factorial — all lower-order terms must be present for any interaction term. PyARTool will raise an error if the design is not fully crossed.
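Before calling art(), you can check the "fully crossed" requirement yourself by looking for factor-level combinations with no observations. missing_cells is a hypothetical helper; PyARTool's own check and error message may differ.

```python
import itertools

import pandas as pd


def missing_cells(df: pd.DataFrame, factors: list) -> list:
    """List factor-level combinations with no observations (hypothetical helper)."""
    levels = [df[f].unique() for f in factors]
    observed = set(map(tuple, df[factors].drop_duplicates().to_numpy()))
    return [combo for combo in itertools.product(*levels) if combo not in observed]


# A 2x2 design with one empty cell (a2, b2).
df = pd.DataFrame({"A": ["a1", "a1", "a2"], "B": ["b1", "b2", "b1"], "Y": [1.0, 2.0, 3.0]})
empty = missing_cells(df, ["A", "B"])
if empty:
    print("Design is not fully crossed; empty cells:", empty)
```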

Detailed Walkthrough

Example 1: Between-Subjects Factorial

A simple 3x3 factorial design with no repeated measures:

from pyartool import art, anova_art, load_higgins1990_table1

df = load_higgins1990_table1()
print(df.head())
#   Subject Row Column  Response
# 0      s1   1      1         9
# 1      s2   1      1         6
# ...

# Fit the ART model (no grouping term = OLS)
m = art("Response ~ Row * Column", data=df)

# Run ANOVA
print(anova_art(m))
#         Term  Df  Df.res       F        Pr(>F)
# 0        Row   2    27.0  29.993  1.383278e-07
# 1     Column   2    27.0  77.867  6.149827e-12
# 2  Row:Column  4    27.0   0.642  6.374203e-01

Example 2: Split-Plot / Mixed-Effects

When you have a grouping factor (e.g., trays, subjects), include (1|Group) to fit a mixed-effects model:

from pyartool import art, anova_art, summary_art, art_con
from pyartool import load_higgins1990_table5

df = load_higgins1990_table5()

# Moisture varies between trays; Fertilizer varies within trays
m = art("DryMatter ~ Moisture * Fertilizer + (1|Tray)", data=df)

# Check diagnostics
s = summary_art(m)
print("Aligned column sums:", s.aligned_col_sums)
# Should all be ~0

# ANOVA
print(anova_art(m))
#                  Term  Df  Df.res        F        Pr(>F)
# 0            Moisture   3     8.0   23.833  2.419913e-04
# 1          Fertilizer   3    24.0  122.402  1.110223e-14
# 2  Moisture:Fertilizer  9    24.0    5.118  6.466476e-04

# Post-hoc: pairwise contrasts on Moisture
# Default adjustment is Tukey HSD
print(art_con(m, "Moisture"))
#   contrast  estimate    SE  df  t.ratio   p.value
# 0  m1 - m2   -23.083  4.12   8   -5.607    0.0023
# 1  m1 - m3   -33.750  4.12   8   -8.198    0.0002
# ...

# Interaction contrasts with Holm adjustment
print(art_con(m, "Moisture:Fertilizer", adjust="holm"))
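The residual degrees of freedom in this ANOVA (8 for Moisture, 24 for Fertilizer and the interaction) are consistent with the classic split-plot layout: 4 moisture levels with 3 trays each, and 4 fertilizer levels within every tray. The sketch below computes that textbook decomposition on synthetic data of the same shape. It assumes a balanced design with one observation per tray and fertilizer level and is not PyARTool's implementation; in an ART analysis, a table like this would be computed once per term on that term's aligned ranks.

```python
import numpy as np
import pandas as pd
from scipy.stats import f as f_dist


def split_plot_anova(df, y, whole, sub, block):
    """Balanced split-plot ANOVA: `whole` varies between blocks, `sub` within blocks."""
    def means(keys):
        return df.groupby(keys)[y].transform("mean")

    grand = df[y].mean()
    a, b = df[whole].nunique(), df[sub].nunique()
    n = df[block].nunique() // a  # blocks per whole-plot level

    m_w, m_s, m_blk, m_cell = means(whole), means(sub), means(block), means([whole, sub])
    ss_w = ((m_w - grand) ** 2).sum()
    ss_wpe = ((m_blk - m_w) ** 2).sum()                   # whole-plot error: blocks within `whole`
    ss_s = ((m_s - grand) ** 2).sum()
    ss_ws = ((m_cell - m_w - m_s + grand) ** 2).sum()
    ss_res = ((df[y] - m_cell - m_blk + m_w) ** 2).sum()  # subplot error

    df_w, df_wpe = a - 1, a * (n - 1)
    df_s, df_ws, df_res = b - 1, (a - 1) * (b - 1), a * (n - 1) * (b - 1)
    ms_wpe, ms_res = ss_wpe / df_wpe, ss_res / df_res

    rows = []
    for term, ss, d, ms_err, d_err in [
        (whole, ss_w, df_w, ms_wpe, df_wpe),       # tested against whole-plot error
        (sub, ss_s, df_s, ms_res, df_res),         # tested against subplot error
        (f"{whole}:{sub}", ss_ws, df_ws, ms_res, df_res),
    ]:
        F = (ss / d) / ms_err
        rows.append({"Term": term, "Df": d, "Df.res": d_err, "F": F,
                     "Pr(>F)": f_dist.sf(F, d, d_err)})
    return pd.DataFrame(rows)


# Synthetic data shaped like Higgins1990Table5 (random values, not the real data).
rng = np.random.default_rng(2)
trays = [(f"m{i}", f"t{3 * (i - 1) + j}") for i in range(1, 5) for j in range(1, 4)]
df = pd.DataFrame([
    {"Moisture": m, "Tray": t, "Fertilizer": f"f{k}", "DryMatter": rng.gamma(2.0)}
    for m, t in trays for k in range(1, 5)
])
table = split_plot_anova(df, "DryMatter", "Moisture", "Fertilizer", "Tray")
print(table)
```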

Example 3: Multi-Factor Within-Subjects

A 2x2x2 fully within-subjects design:

from pyartool import art, anova_art, art_con, load_elkin_abc

df = load_elkin_abc()
m = art("Y ~ A * B * C + (1|S)", data=df)

# Full ANOVA table
print(anova_art(m))
#    Term  Df  Df.res        F        Pr(>F)
# 0     A   1    49.0  288.181  0.000000e+00
# 1     B   1    49.0   28.103  2.732842e-06
# 2     C   1    49.0   60.510  4.168039e-10
# 3   A:B   1    49.0   28.528  2.377711e-06
# 4   A:C   1    49.0   16.545  1.720573e-04
# 5   B:C   1    49.0   76.258  1.481193e-11
# 6 A:B:C   1    49.0   75.592  1.690836e-11

# Contrasts on the 3-way interaction
print(art_con(m, "A:B:C", adjust="holm"))

# Contrasts on a 2-way interaction (averaged over 3rd factor)
print(art_con(m, "A:B", adjust="holm"))

# Single-factor contrasts
print(art_con(m, "A"))  # Tukey by default

# Different adjustment methods
print(art_con(m, "B:C", adjust="bonferroni"))

Example 4: Repeated Measures with Error()

If you prefer traditional repeated-measures ANOVA (via aov) instead of mixed-effects models, use Error():

from pyartool import art, anova_art, load_higgins_abc

df = load_higgins_abc()
m = art("Y ~ A * B * C + Error(Subject)", data=df)

print(anova_art(m))
#    Term  Df  Df.res        F        Pr(>F)
# 0     A   1     4.0  120.471  3.914986e-04
# 1     B   1     4.0  120.471  3.914986e-04
# 2     C   1     4.0   14.322  1.936216e-02
# 3   A:B   1     4.0   81.920  8.257143e-04
# 4   A:C   1     4.0    0.126  7.406643e-01
# 5   B:C   1     4.0    0.232  6.552898e-01
# 6 A:B:C   1     4.0    0.972  3.800992e-01

P-Value Adjustment Methods

The adjust parameter in art_con() supports these methods:

Value            Method               Description
"tukey"          Tukey HSD            Default. Uses the studentized range distribution. Best for pairwise comparisons.
"holm"           Holm-Bonferroni      Step-down procedure. Good general-purpose choice.
"bonferroni"     Bonferroni           Conservative; multiplies p-values by the number of tests (capped at 1).
"fdr" or "bh"    Benjamini-Hochberg   Controls the false discovery rate; less conservative than family-wise methods.
"none" or None   No adjustment        Raw (unadjusted) p-values.

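The non-Tukey adjustments are simple enough to write out. According to the dependency mapping later in this README, PyARTool delegates them to statsmodels' multipletests; the p_adjust function below is a hypothetical numpy re-implementation meant only to show what each method does.

```python
import numpy as np


def p_adjust(p, method):
    """Bonferroni, Holm and Benjamini-Hochberg adjustments (illustrative only)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    if method in ("none", None):
        return p.copy()
    if method == "bonferroni":
        return np.minimum(p * m, 1.0)
    order = np.argsort(p)
    ranked = p[order]
    if method == "holm":
        # Step-down: multiply the i-th smallest p by (m - i), then enforce monotonicity.
        adj = np.maximum.accumulate(ranked * (m - np.arange(m)))
    elif method in ("fdr", "bh"):
        # Step-up: multiply by m / rank, then take running minima from the largest p down.
        adj = np.minimum.accumulate((ranked * m / np.arange(1, m + 1))[::-1])[::-1]
    else:
        raise ValueError(f"unsupported method: {method!r}")
    out = np.empty(m)
    out[order] = np.minimum(adj, 1.0)  # restore the original order, cap at 1
    return out


raw = [0.01, 0.04, 0.03]
for method in ["bonferroni", "holm", "bh"]:
    print(method, p_adjust(raw, method).round(3))
```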
Note: The default is "tukey", matching R's emmeans / art.con() behavior.
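For a single pairwise contrast among k means, the Tukey-adjusted p-value comes from the studentized range distribution evaluated at q = sqrt(2) * |t|. A minimal sketch with scipy.stats.studentized_range; tukey_p is a hypothetical helper, and the t ratio, k, and df plugged in below are simply read off the Moisture example (m1 - m2, 4 levels, 8 df) for illustration.

```python
import numpy as np
from scipy.stats import studentized_range, t as t_dist


def tukey_p(t_ratio: float, k: int, df: float) -> float:
    """Tukey-adjusted p-value for one pairwise t ratio among k means (hypothetical helper)."""
    # A pairwise t ratio maps onto the range statistic as q = sqrt(2) * |t|.
    return float(studentized_range.sf(np.sqrt(2.0) * abs(t_ratio), k, df))


t_ratio, df = -5.607, 8
print("unadjusted:", 2 * t_dist.sf(abs(t_ratio), df))
print("Tukey, k=4:", tukey_p(t_ratio, 4, df))
```

With only k = 2 means, the Tukey p-value reduces to the ordinary two-sided t-test p-value, which makes a handy sanity check.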


R Parity & Validation

PyARTool has been validated to produce numerically identical results to R's ARTool package across all bundled datasets and model types:

Dataset             Design                  ANOVA         Contrasts
Higgins1990Table1   3x3 OLS                 Exact match
Higgins1990Table5   4x4 split-plot (lmer)   Exact match   Moisture (Tukey), Fertilizer (Tukey), Moisture:Fertilizer (Holm, 120 pairs)
ElkinABC            2x2x2 within (lmer)     Exact match   A:B:C (Holm), A:B (Holm), A (Tukey), B:C (Bonferroni)
ElkinAB             2x2 within (lmer)       Exact match   A (Tukey), B (Tukey), A:B (Holm)
HigginsABC          2x2x2 mixed (aov)       Exact match

The companion files artool_example.r and example.py run the same analyses in R and Python respectively, allowing side-by-side output comparison.

To run both:

# R version (requires R and the ARTool package)
Rscript artool_example.r

# Python version
python example.py

Architecture & Implementation Notes

Package Structure

PyARTool/
  src/pyartool/
    __init__.py          # Public API exports
    art.py               # Core ART: alignment + ranking (art())
    formula.py           # R-style formula parser
    effects.py           # Cell means & estimated effects
    anova.py             # ANOVA: OLS, split-plot, and RM dispatching
    models.py            # artlm: model fitting (OLS / MixedLM / aov)
    summary.py           # Diagnostic checks (summary_art())
    contrasts.py         # ART-C contrasts (art_con(), artlm_con())
    datasets.py          # Bundled dataset loaders
    data/                # CSV files for bundled datasets
  tests/                 # Test suite (84 tests)
  example.py             # Python example script
  artool_example.r       # R reference script
  pyproject.toml         # Package metadata & dependencies
  README.md              # This file

Key Design Decisions

1. R-style formula parsing. PyARTool parses R formula syntax (Y ~ A * B + (1|S)) with a custom parser rather than relying on patsy for formula interpretation. This ensures identical handling of interactions, Error() terms, and grouping terms.

2. Patsy name-conflict handling. Factor column names that conflict with patsy reserved words (e.g., a column literally named C or S) are automatically prefixed with _f_ internally before model fitting and unaliased in output. This is transparent to the user.

3. Sum (deviation) coding. For Type III ANOVA equivalence with R, all categorical variables are explicitly coded with statsmodels Sum coding (C(var, Sum)) rather than the default Treatment coding.

4. Split-plot ANOVA for mixed models. The ANOVA for mixed-effects models implements a full split-plot SS decomposition to correctly compute between-group and within-group error strata, matching R's lmer + Kenward-Roger behavior.

5. Satterthwaite degrees of freedom. For mixed-model contrasts, per-contrast degrees of freedom are computed using the Satterthwaite approximation with analytical gradients and the REML Fisher information matrix, matching R's lmerTest / emmeans.

6. Tukey HSD via studentized range. The default p-value adjustment for pairwise contrasts uses scipy.stats.studentized_range, matching R's emmeans Tukey method.
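The Sum (deviation) coding from decision 3 can be illustrated with its contrast matrix. For k levels, patsy's Sum contrast (used by statsmodels' C(var, Sum)) omits the last level by default: each of the first k - 1 levels gets a unit vector and the last level gets a row of -1s. The helper below is a hypothetical numpy sketch of that matrix, not PyARTool code.

```python
import numpy as np


def sum_contrast_matrix(levels):
    """Deviation ("Sum") coding matrix: k rows (levels) by k - 1 columns (hypothetical helper)."""
    k = len(levels)
    return np.vstack([np.eye(k - 1), -np.ones((1, k - 1))])


levels = ["m1", "m2", "m3", "m4"]
X = sum_contrast_matrix(levels)
print(X)
# Every column sums to zero, so the intercept corresponds to the unweighted mean of
# the level means and each coefficient to a level's deviation from it. Under the
# default Treatment coding, the intercept is the mean of a reference level instead.
print(X.sum(axis=0))
```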

Dependency Mapping (R to Python)

R package              Python equivalent
base R (lm, aov)       statsmodels (OLS, formula API)
lme4 (lmer)            statsmodels.regression.mixed_linear_model
car (Type III Anova)   statsmodels.stats.anova + custom split-plot
emmeans (contrasts)    Custom implementation in contrasts.py
stats::p.adjust        statsmodels.stats.multitest.multipletests
stats::qtukey          scipy.stats.studentized_range

Dependencies

PyARTool requires:

numpy >= 1.22
pandas >= 1.4
scipy >= 1.8
statsmodels >= 0.13

All dependencies are automatically installed when installing PyARTool via pip.


Example Scripts

Two companion scripts are included for cross-validation:

example.py — Python

Runs all five example analyses using PyARTool. It is the best starting point for learning how to use the package.

python example.py

artool_example.r — R

Runs the same five analyses using R's ARTool package. Use this to compare outputs side-by-side.

Rscript artool_example.r

Both scripts cover:

  1. Between-subjects 3x3: Higgins1990Table1 (OLS, no repeated measures)
  2. Split-plot 4x4: Higgins1990Table5 (mixed-effects with (1|Tray))
  3. 2x2x2 within-subjects: ElkinABC (mixed-effects with (1|S))
  4. 2x2 within-subjects: ElkinAB (mixed-effects with (1|S))
  5. 2x2x2 mixed with Error(): HigginsABC (repeated measures ANOVA)

Citations

If you use PyARTool in your research, please cite the original R package and methods papers:

Package:

Kay, M., Elkin, L. A., Higgins, J. J., and Wobbrock, J. O. (2025). ARTool: Aligned Rank Transform for Nonparametric Factorial ANOVAs. R package version 0.11.2. https://github.com/mjskay/ARTool. DOI: 10.5281/zenodo.594511.

ART procedure (used by art() and anova_art()):

Wobbrock, J. O., Findlater, L., Gergle, D., and Higgins, J. J. (2011). The Aligned Rank Transform for Nonparametric Factorial Analyses Using Only ANOVA Procedures. Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 2011). Vancouver, British Columbia (May 7–12, 2011). New York: ACM Press, pp. 143–146. DOI: 10.1145/1978942.1978963.

ART-C procedure (used by art_con() and artlm_con()):

Elkin, L. A., Kay, M., Higgins, J. J., and Wobbrock, J. O. (2021). An Aligned Rank Transform Procedure for Multifactor Contrast Tests. Proceedings of the ACM Symposium on User Interface Software and Technology (UIST 2021). Virtual Event (October 10–14, 2021). New York: ACM Press, pp. 754–768. DOI: 10.1145/3472749.3474784.


License

GPL-2.0-or-later, matching the original R ARTool package.
