
Introduce Enhanced Q-Q Plot Support for Distribution GOF Diagnostics #111

Merged: Oscar-Gil-Data merged 2 commits into main from qq_utility_LS on Dec 14, 2025.


@lshpaner commented Dec 14, 2025

The goal is to make distributional goodness-of-fit diagnostics more expressive, statistically correct, and easier to interpret when comparing multiple candidate distributions.


Key Features

1. Theoretical and Empirical Q-Q Plots

  • Theoretical Q-Q: compares sample quantiles against the quantiles of a parametric distribution fitted with SciPy.
  • Empirical Q-Q: compares matched quantiles between the sample and a user-provided reference dataset.
  • Behavior is explicitly controlled via the qq_type parameter.
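A minimal sketch of the two quantile computations, assuming (i - 0.5)/n plotting positions and a maximum-likelihood fit through SciPy's `rv_continuous.fit`. The helper names `theoretical_qq` and `empirical_qq` are hypothetical and are not the code merged in this PR.

```python
import numpy as np
from scipy import stats


def theoretical_qq(sample, dist_name="norm"):
    """Pair sorted sample values with quantiles of a SciPy distribution fitted to them."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    dist = getattr(stats, dist_name)  # e.g. stats.norm, stats.lognorm, stats.gamma
    params = dist.fit(x)              # MLE estimates: shape(s) first, then loc, scale
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1), so ppf stays finite.
    probs = (np.arange(1, n + 1) - 0.5) / n
    return dist.ppf(probs, *params), x


def empirical_qq(sample, reference, n_quantiles=None):
    """Quantiles of the reference (x-axis) and the sample (y-axis) at shared probabilities."""
    x = np.asarray(sample, dtype=float)
    ref = np.asarray(reference, dtype=float)
    if n_quantiles is None:
        n_quantiles = min(x.size, ref.size)  # match the resolution of the smaller sample
    probs = (np.arange(1, n_quantiles + 1) - 0.5) / n_quantiles
    return np.quantile(ref, probs), np.quantile(x, probs)
```

Both functions return (x-axis, y-axis) pairs, so the same scatter code can render either Q-Q type.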

2. Optional Reference Line Control

  • Reference lines are drawn only for theoretical Q-Q plots, where a 1:1 identity line is statistically meaningful.
  • Users can explicitly toggle reference lines using show_reference=True|False.
  • Reference lines are labeled per distribution to avoid ambiguous legends when plotting multiple fits.
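The gating rule can be sketched as follows. `maybe_add_reference_line` is a hypothetical helper, and spanning the line across the combined range of both quantile arrays is an assumption about how the line is sized.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def maybe_add_reference_line(ax, theo_q, sample_q, dist_name,
                             qq_type="theoretical", show_reference=True, color=None):
    """Draw a labeled y = x line for one fit; return None when no line applies."""
    # Empirical Q-Q plots never get a reference line, and users can opt out.
    if qq_type != "theoretical" or not show_reference:
        return None
    lo = float(min(np.min(theo_q), np.min(sample_q)))
    hi = float(max(np.max(theo_q), np.max(sample_q)))
    # One line per distribution, colored and labeled to match its scatter points.
    (line,) = ax.plot([lo, hi], [lo, hi], linestyle="--", color=color,
                      label=f"{dist_name} reference (y = x)")
    return line
```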

3. Multi-Distribution Overlay Support

  • Multiple candidate distributions can be overlaid on the same Q-Q axis.
  • Colors are resolved consistently via an optional palette mapping.
  • Legends clearly distinguish between data points and reference lines.
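One way to resolve colors consistently, assuming that distributions missing from `palette` fall back, in order, to the ten `tab:` colors of Matplotlib's default cycle while skipping colors the palette already claims. `resolve_colors` is a hypothetical helper, not the PR's implementation.

```python
# The ten "tab:" colors, in the order of Matplotlib's default color cycle.
TAB10 = ("tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple",
         "tab:brown", "tab:pink", "tab:gray", "tab:olive", "tab:cyan")


def resolve_colors(dists, palette=None, fallback=TAB10):
    """Return {distribution name: color}, preferring explicit palette entries."""
    palette = palette or {}
    used = set(palette.values())
    # Avoid reusing a color the user already assigned, unless every fallback is taken.
    free = [c for c in fallback if c not in used] or list(fallback)
    colors, i = {}, 0
    for name in dists:
        if name in palette:
            colors[name] = palette[name]
        else:
            colors[name] = free[i % len(free)]  # wrap around when there are many fits
            i += 1
    return colors
```

The scatter points and the reference line for a distribution would then both use `colors[name]`, which keeps each fit visually paired in the legend.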

4. Axis Scaling and Limits

  • Optional xlim and ylim parameters allow users to enforce consistent axis ranges across distributions.
  • Axis limits are applied after plotting to avoid clipping during rendering.
  • Supports linear and log scaling where appropriate.
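A sketch of applying scaling and limits as the final step, after every series is on the axis. `format_qq_axes` and its `log_scale` flag are hypothetical; the PR does not name the parameter that controls log scaling.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def format_qq_axes(ax, xlim=None, ylim=None, log_scale=False):
    """Apply optional log scaling and fixed limits once all series are drawn."""
    if log_scale:
        # Log axes are only appropriate when every plotted quantile is positive.
        ax.set_xscale("log")
        ax.set_yscale("log")
    # Setting limits last makes them the final word on the visible range,
    # so every distribution is compared inside the same window.
    if xlim is not None:
        ax.set_xlim(*xlim)
    if ylim is not None:
        ax.set_ylim(*ylim)
```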

5. Defensive Validation

  • Explicit checks for minimum sample size.
  • Strict validation for empirical Q-Q plots requiring valid reference_data.
  • Clear error messages for invalid configurations.
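A sketch of these checks, assuming a hypothetical minimum of three finite observations and that NaN/inf values are dropped before counting. `validate_qq_inputs` and `min_size` are illustrative names, not the PR's actual function or threshold.

```python
import numpy as np

QQ_TYPES = ("theoretical", "empirical")


def validate_qq_inputs(sample, qq_type="theoretical", reference_data=None, min_size=3):
    """Return cleaned (sample, reference) arrays or raise ValueError with a clear message."""
    if qq_type not in QQ_TYPES:
        raise ValueError(f"qq_type must be one of {QQ_TYPES}, got {qq_type!r}")
    x = np.asarray(sample, dtype=float)
    x = x[np.isfinite(x)]  # NaN/inf carry no quantile information
    if x.size < min_size:
        raise ValueError(f"Q-Q plot needs at least {min_size} finite values, got {x.size}")
    if qq_type == "theoretical":
        return x, None
    # An empirical Q-Q plot cannot be drawn without a usable reference sample.
    if reference_data is None:
        raise ValueError("qq_type='empirical' requires reference_data")
    ref = np.asarray(reference_data, dtype=float)
    ref = ref[np.isfinite(ref)]
    if ref.size < min_size:
        raise ValueError(
            f"reference_data needs at least {min_size} finite values, got {ref.size}")
    return x, ref
```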

API Additions

New or updated parameters in distribution_gof_plots():

  • qq_type: {"theoretical", "empirical"}
  • reference_data: np.ndarray | None
  • show_reference: bool
  • xlim: tuple[float, float] | None
  • ylim: tuple[float, float] | None
  • palette: dict[str, str] | None

Internal Refactoring

  • Introduced a dedicated _qq_plot() helper with a focused responsibility:
    • quantile computation
    • scatter rendering
    • optional reference line
    • formatting and scaling
  • Added clear section headers to group plotting utilities by responsibility.
  • Improved readability and maintainability of plots.py by making helper intent explicit.

Design Notes

  • Reference lines are intentionally disabled for empirical Q-Q plots, since there is no theoretical identity relationship between two empirical samples.
  • Reference lines are colored and labeled per distribution to avoid legend collisions when overlaying multiple fits.
  • The implementation avoids hidden defaults and favors explicit user control.

Backward Compatibility

  • Existing usage of distribution_gof_plots() with plot_types="qq" continues to work with default behavior.
  • New functionality is opt-in via additional parameters.

Example

```python
distribution_gof_plots(
    df,
    var="age",
    dist=["norm", "lognorm", "gamma"],
    plot_types="qq",
    qq_type="theoretical",
    show_reference=True,
    palette={
        "norm": "tab:blue",
        "lognorm": "tab:orange",
        "gamma": "tab:green",
    },
    xlim=(5, 100),
    ylim=(5, 100),
)
```

@Oscar-Gil-Data merged commit 140aee2 into main on Dec 14, 2025.
@Oscar-Gil-Data deleted the qq_utility_LS branch on Dec 14, 2025, 20:51.