Releases: cmg777/expdpy

v0.4.19 — richer sample selector (category/range/period filters, persistence, cross-section)

26 Jun 04:12
b626afc

Apps — a richer, more robust sample selector

  • Filter by category, value range, and period. Pick factor variables and the categories to keep, set a min–max range for continuous variables, and use a dedicated Period slider over the time id. All filters combine (AND), with an always-visible Active sample summary (dataset · period · filters · "rows kept = n / N") and a Reset filters button on every page.
  • Single-year → cross-section. Collapse the Period slider to one year and the app switches to cross-sectional mode — the panel / over-time views hide with a notice; the cross-sectional views keep working.
  • Fixed the disappearing-selection bug. Filter options are computed from the pre-subset frame, so choosing a category no longer collapses the factor and resets your selection — selections persist across pages and every analysis updates as you filter.
  • Missing-values view no longer looks blank. When a sample has no missing values (e.g. the unbalanced firms panel) it now says so and points to Panel structure for the structural gaps.
  • Export reproduces the filters as code. The exported notebook ships the unfiltered working frame plus a runnable subset cell, so it rebuilds the exact analysis sample from the full data.
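
The filter behaviour described above can be illustrated with a minimal pandas sketch. It is not the app's code: the toy frame, the column names, and the helpers `category_options` and `apply_filters` are hypothetical. It shows category options being read from the unfiltered frame, filters combined with AND, the "rows kept = n / N" summary, and the single-period check that would trigger cross-sectional mode.

```python
import pandas as pd

# Hypothetical toy panel; column names are illustrative only.
df = pd.DataFrame({
    "region": ["north", "north", "south", "south", "east", "east"],
    "year": [2019, 2020, 2019, 2020, 2019, 2020],
    "income": [10.0, 12.0, 7.0, 8.0, 15.0, 16.0],
})


def category_options(full, column):
    """List selectable categories from the unfiltered frame, so that
    choosing one category never removes the others from the picker."""
    return sorted(full[column].dropna().unique().tolist())


def apply_filters(full, categories=None, ranges=None, period=None, time_col="year"):
    """Combine category, range and period filters with AND and return the subset."""
    mask = pd.Series(True, index=full.index)
    for col, keep in (categories or {}).items():
        mask &= full[col].isin(keep)
    for col, (lo, hi) in (ranges or {}).items():
        mask &= full[col].between(lo, hi)  # inclusive on both ends
    if period is not None:
        mask &= full[time_col].between(period[0], period[1])
    return full[mask]


subset = apply_filters(
    df,
    categories={"region": ["north", "south"]},
    ranges={"income": (8.0, 20.0)},
    period=(2020, 2020),
)
options = category_options(df, "region")          # computed before subsetting
summary = f"rows kept = {len(subset)} / {len(df)}"
cross_section = subset["year"].nunique() == 1     # one period left: cross-section
print(summary, options, cross_section)
```

Reading the options from `df` rather than from `subset` is the design choice that keeps a selection from collapsing the list it was chosen from.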

Full changelog: https://cmg777.github.io/expdpy/changelog.html

v0.4.18 — two-file (data + dictionary) app workflow, all datasets, Colab export

26 Jun 02:42
3c7fb94

Apps — the two-file (data + dictionary) workflow

The three ExPdPy apps now revolve around two files: a data file and a data dictionary (df_def) — for both input and output.

  • Upload your data and (optionally) its dictionary, and every figure/table updates with the right labels and panel structure. Upload data only and the app auto-builds an editable dictionary in the sidebar: set the entity/time rows to unlock the panel views, then Apply (and Download .csv).
  • New public helper build_data_def(df) infers that dictionary (types + entity/time) from any frame, so the same workflow runs in notebooks: ex.set_labels(df, ex.build_data_def(df), set_panel=True).
  • The dictionary is applied throughout — labelled axes/legends/headers now render for the bundled datasets too, switching datasets resets stale selections, and views no longer silently disappear on upload (e.g. the missing-values map) because the panel is now declared.
  • The dataset picker offers every bundled dataset — adds Productivity and Bolivia (provinces).
  • Export ships the dictionary, and the notebook is Colab-ready — "Export notebook + data" writes expdpy_data_def.csv alongside the sample, and the generated .ipynb is a Google Colab notebook (pinned expdpy==0.4.18 install + one-time runtime restart + Plotly colab renderer) whose load cell calls set_labels(..., set_panel=True). The .py script stays a plain local-run script.

Full changelog: https://cmg777.github.io/expdpy/changelog.html

v0.4.17 — Learn case-study redesign

25 Jun 05:05
5611976

Completes the Explore → Analyze → Learn trilogy: the Learn surface is rebuilt as a single-pass tutorial — "the ideas behind the case study" — in lockstep across the website, the Colab notebook, and the Streamlit app.

Learn

  • The Learn page is now a single-pass tutorial that opens by interpreting a real two-way fixed-effects Kuznets model (.interpret() / .explain()), browses the full 27-topic concept index (list_topics) grouped by theme, then isolates each idea in a simulated sandbox where the truth is known: the within-transformation identity, why fixed effects matter, two inference classics, convergence, and the Kuznets wave. Each learn_* sandbox appears exactly once, which removes the old tour/gallery duplication and adds the previously undemonstrated learn_kuznets_waves.
  • The Learn Streamlit app mirrors the tutorial — the Concept sandboxes page grows from seven to nine tabs (adding learn_sigma_convergence and learn_convergence_clubs), reordered to the case-study sequence; the explainers page browses all 27 topics.
  • The Learn Colab notebook is regenerated so the website, notebook and app present the same sequence.
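
As a rough illustration of the within-transformation identity mentioned above, the following NumPy sketch simulates a panel where the truth is known (slope 2, regressor correlated with the unit effect). It does not use the library's learn_* sandboxes; the data-generating process and the helper `demean_by` are assumptions made for the example. It checks that OLS on unit-demeaned data gives the same slope as OLS with one dummy per unit, and that pooled OLS is biased.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods = 30, 8
unit = np.repeat(np.arange(n_units), n_periods)

# Simulated sandbox (assumed DGP): x is correlated with the unit effect,
# and the true slope on x is 2.
alpha = rng.normal(size=n_units)
x = alpha[unit] + rng.normal(size=unit.size)
y = 2.0 * x + 3.0 * alpha[unit] + rng.normal(scale=0.1, size=unit.size)


def demean_by(values, groups):
    """Subtract each group's mean from its own observations."""
    sums = np.bincount(groups, weights=values)
    counts = np.bincount(groups)
    return values - (sums / counts)[groups]


# Within estimator: OLS on unit-demeaned data.
x_w, y_w = demean_by(x, unit), demean_by(y, unit)
beta_within = (x_w @ y_w) / (x_w @ x_w)

# Dummy-variable estimator: OLS on x plus one indicator per unit.
dummies = (unit[:, None] == np.arange(n_units)[None, :]).astype(float)
design = np.column_stack([x, dummies])
beta_lsdv = np.linalg.lstsq(design, y, rcond=None)[0][0]

# Pooled OLS ignores the unit effect and is biased in this setup.
xc, yc = x - x.mean(), y - y.mean()
beta_pooled = (xc @ yc) / (xc @ xc)

print(beta_within, beta_lsdv, beta_pooled)
```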

v0.4.16 — Analyze case-study redesign

25 Jun 04:20
ba3a259

Redesigns the Analyze surface as a single-pass, pedagogical Kuznets case study — in lockstep across the website, the Colab notebook, and the Streamlit app — the sequel to the 0.4.15 Explore redesign.

Analyze

  • The Analyze page is now a single-pass Kuznets case study introducing all 17 analyze_* functions once each, in the order an analyst works: fit a first model and add fixed effects → enrich the estimation → read the fitted model → stress-test the inference → choose the panel estimator → the flagship Kuznets-waves curve → a related income-convergence question → a causal DiD design. The four previously-undemonstrated functions (the convergence trio + Kuznets waves) now appear.
  • The Analyze Streamlit app mirrors the case study — reorganized into Regression, Post-estimation (new), Panel models, Kuznets waves, Convergence (β/σ/clubs consolidated), Event study & DiD — surfacing the seven functions the app did not previously render.
  • The Analyze Colab notebook is regenerated so the website, notebook and app present the same sequence of tools.

v0.4.15 — Explore case-study redesign

25 Jun 02:02
08d30e8

Explore

  • The Explore page is now a single-pass Kuznets case study. A complete redesign walks newcomers through all 21 explore_* functions (and the set_panel / resolve_panel / treat_outliers utilities) once each, in the order an analyst actually works: know the panel's skeleton (explore_panel_structure, explore_missing_values_plot, explore_value_heatmap) → describe variables → split within vs between variation (explore_xtsum_table, explore_spaghetti_plot) → trends (incl. explore_distribution_over_time) → compare groups → relationships and the N-shaped curve (incl. explore_scatter_plot_within_between) → dynamics (explore_transition_matrix, explore_within_persistence). The eight previously-undemonstrated panel-aware functions now appear in the narrative.
  • The Explore Streamlit app mirrors the case study. Its pages are reorganized to the same workflow — Overview & Data, Describe variables, Within & between, Trends, By group, Relationships, Dynamics — replacing the single catch-all "Panel structure" page. The Colab notebook is regenerated from the redesigned page.

v0.4.14 — drop inconsistent scatter LOESS band and the spaghetti legend

24 Jun 11:17

0.4.14 (2026-06-24)

Explore

  • explore_scatter_plot no longer draws a LOESS confidence band. The shaded band was an
    unweighted bootstrap while the LOESS line is size-weighted under loess=2, so the line could
    fall outside its own band; the smoother now shows just the line.
  • explore_spaghetti_plot no longer shows a legend. It was a single "mean (…)" entry that ate
    horizontal plot space; highlighted units still render in saturated colour.

v0.4.13 — panel-aware descriptives, histogram overlays, df_def-driven readability

24 Jun 10:01

0.4.13 (2026-06-24)

Explore

  • explore_descriptive_table is now panel-aware. When a time column is known (declared
    via set_panel / set_labels, or passed explicitly) each statistic is shown by period
    — by default at the first and last period — under a spanning column header (e.g. Mean over
    2015 and 2025); without a time dimension it falls back to one column per statistic. The
    default statistics are now Mean, Std. dev., Median, Min., Max., rows are labelled from
    the data dictionary, and the notes report the number of observations and any variable with
    missing data. Breaking: the old length-8 digits vector is replaced by a stats=
    selection (any of the eight statistics), a scalar-or-mapping digits=, and a new periods=
    argument; the result gains a tidy .by_period frame (.df still carries all eight
    pooled statistics).
  • explore_histogram gains opt-in density overlays. New kde= and normal= flags draw a
    Gaussian kernel-density estimate and/or a normal curve on the Density scale (off by default;
    the Count/Density toggle hides them in Count view).
  • The trend / time-series plots no longer show a draggable range slider
    (explore_trend_plot, explore_quantile_trend_plot, explore_spaghetti_plot).
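
The by-period layout of the descriptive table can be approximated in plain pandas. This is a sketch of the idea only, not the implementation of explore_descriptive_table: the helper `describe_by_period`, its arguments, and the toy data are hypothetical. Statistics are computed at the first and last period by default, with one row per (variable, statistic) pair and one column per period.

```python
import pandas as pd

# Hypothetical toy panel; names are illustrative only.
df = pd.DataFrame({
    "country": ["A", "B", "C"] * 3,
    "year": [2015] * 3 + [2020] * 3 + [2025] * 3,
    "gini": [30.0, 40.0, 50.0, 32.0, 41.0, 49.0, 34.0, 42.0, 47.0],
})


def describe_by_period(frame, value_cols, time_col, periods=None, stats=("mean", "std")):
    """Return statistics by period: rows are (variable, statistic), columns are periods.

    With periods=None, only the first and last period are reported.
    """
    if periods is None:
        periods = [frame[time_col].min(), frame[time_col].max()]
    kept = frame[frame[time_col].isin(periods)]
    by_period = kept.groupby(time_col)[list(value_cols)].agg(list(stats))
    return by_period.T


table = describe_by_period(df, ["gini"], "year")
print(table)
```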

Data dictionary (df_def) everywhere

  • Every function now leans on the data dictionary for readable output, while still working
    without it.
    Regression / estimation / CRE tables relabel their coefficient and
    dependent-variable rows from the dictionary (the tidy .df keeps raw term names), and the
    panel estimators (analyze_panel_table, analyze_hausman_test, analyze_cre_table) and DiD
    views (analyze_event_study, analyze_panel_view) now resolve entity / time / unit
    from the declared panel, so those arguments can be omitted after set_panel / set_labels.
  • Examples across the library now illustrate the dictionary, opening with
    df = ex.set_labels(load_kuznets(), load_kuznets_data_def(), set_panel=True).

v0.4.11 — Kuznets waves + convergence-clubs correctness fixes

23 Jun 03:46
d890c1a

This release supersedes the unreleased 0.4.10, so it brings both the new Kuznets-waves analysis and the convergence-clubs fixes to PyPI in one step.

Added — analyze_kuznets_waves (the extended Kuznets curve)

Tests the inequality–development relationship with a polynomial of up to fourth degree (a quartic), gini = b₁g + b₂g² + b₃g³ + b₄g⁴ (g = log GDP per capita), under three panel estimators side by side: pooled OLS, the between estimator (the cross-country curve, a polynomial in the entity means) and the within estimator (two-way country + year fixed effects). Each estimator gets a cumulative-stepwise (csw) comparison table: linear, then quadratic, and so on up to the full polynomial of the chosen degree.

Three figures tell the pooled → between → within story: a raw scatter with the pooled wave overlaid, and between/within Frisch–Waugh–Lovell partial-residual (component) plots that draw the fitted wave once optional controls (and the two-way fixed effects) are partialled out.

The result exposes gt_pooled / gt_between / gt_within, the three figures, a per-estimator curvature summary (turning points, peak, top-order term), the fitted models, and .interpret() / .explain(). Ships with a learn_kuznets_waves() sandbox, a kuznets_waves concept explainer, a Streamlit Kuznets waves tab, and a Quarto → Colab notebook.
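
To make the cumulative-stepwise idea and the turning-point summary concrete, here is a minimal NumPy sketch of the pooled case on simulated data. It is not analyze_kuznets_waves: the data-generating process (an inverted U peaking at g = 8.5), the inclusion of an intercept, and the helpers `fit_polynomial` and `turning_points` are assumptions for illustration. Each step adds one power of g, and turning points are the real roots of the fitted curve's derivative inside the observed range.

```python
import numpy as np

rng = np.random.default_rng(1)
g = rng.uniform(6.0, 11.0, size=400)  # simulated log GDP per capita
# Assumed truth for the sandbox: an inverted U with its peak at g = 8.5.
gini = 40.0 - 2.0 * (g - 8.5) ** 2 + rng.normal(scale=0.5, size=g.size)


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit (with intercept); returns coefficients and R^2."""
    coefs = np.polyfit(x, y, degree)  # highest power first
    resid = y - np.polyval(coefs, x)
    centred = y - y.mean()
    return coefs, 1.0 - (resid @ resid) / (centred @ centred)


def turning_points(coefs, lo, hi):
    """Real roots of the fitted curve's first derivative that fall inside [lo, hi]."""
    roots = np.roots(np.polyder(coefs))
    real = roots[np.isclose(roots.imag, 0.0)].real
    return np.sort(real[(real >= lo) & (real <= hi)])


# Cumulative-stepwise comparison: linear, quadratic, cubic, quartic.
steps = {d: fit_polynomial(g, gini, d) for d in (1, 2, 3, 4)}
r2_by_degree = [steps[d][1] for d in (1, 2, 3, 4)]
peak = turning_points(steps[2][0], g.min(), g.max())
print(r2_by_degree, peak)
```

The between and within variants would apply the same fit to entity means and to two-way demeaned data respectively; they are left out to keep the sketch short.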

Fixed — analyze_convergence_clubs (Phillips-Sul clustering)

  • The default method="adjust" (Schnurbus et al. 2016) club refinement scored each candidate against the core group rather than the growing club and lacked a final joint-test fallback, so it could emit a group labelled a convergence "club" whose own log(t) t-statistic was below the threshold. It now scores against the accumulating club and falls back to the core when the refined club still fails (matching the method="ps" branch).
  • A variable whose per-period cross-sectional mean is at/near zero (a demeaned, centered or growth series) made the relative transition h = x/mean blow up to inf and silently corrupt every frame and figure — it now raises a clear error.
  • A constant / already-identical panel produced a non-finite global log(t) statistic that was silently reported as "divergent" (printing a literal NaN) — it now raises a clear "not estimable" error.
  • The user-supplied tcrit threshold is now threaded into the summary table's converging column, the table source-note and .interpret() (previously hardcoded to -1.65), and is exposed on ConvergenceClubsResult.tcrit.
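
The near-zero-mean failure mode is easy to reproduce. The sketch below computes the relative transition h = x / mean per period and refuses to proceed when a cross-sectional mean is at or near zero. The function name `relative_transition`, the tolerance, and the error message are hypothetical, not the library's actual check.

```python
import numpy as np
import pandas as pd


def relative_transition(frame, entity, time, value, tol=1e-8):
    """Return h_it = x_it / mean_t(x) as a period-by-entity frame.

    Raises ValueError when any cross-sectional mean is (near) zero,
    since the ratio would otherwise explode to inf.
    """
    wide = frame.pivot(index=time, columns=entity, values=value)
    period_mean = wide.mean(axis=1)
    if (period_mean.abs() < tol).any():
        raise ValueError(
            "cross-sectional mean is at or near zero; "
            "the relative transition x / mean is not defined"
        )
    return wide.div(period_mean, axis=0)


levels = pd.DataFrame({
    "unit": ["a", "b", "c"] * 2,
    "year": [2000] * 3 + [2001] * 3,
    "x": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
})
h = relative_transition(levels, "unit", "year", "x")

# A demeaned (centred) series has a per-period mean of zero and must be rejected.
centred = levels.assign(x=levels["x"] - levels.groupby("year")["x"].transform("mean"))
try:
    relative_transition(centred, "unit", "year", "x")
    raised = False
except ValueError:
    raised = True
print(h, raised)
```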

Full changelog: https://cmg777.github.io/expdpy/changelog.html

v0.4.9 — bundle the bolivia112_gdppc subnational convergence panel

22 Jun 21:51

Added

  • Data: a new bundled bolivia112_gdppc dataset — a real-world balanced panel of 112 Bolivian provinces (nested within 9 departments) over 1990-2024 with GDP per capita and its natural log; the empirical counterpart to the synthetic productivity panel, for the convergence workflows (analyze_beta_convergence / analyze_sigma_convergence / analyze_convergence_clubs) and general subnational exploration. Load with load_bolivia112_gdppc() / load_bolivia112_gdppc_data_def().

Source: Kummu, Kosonen & Masoumzadeh Sayyar, "Downscaled gridded global dataset for GDP per capita PPP over 1990-2022," Sci Data 12, 178 (2025), https://doi.org/10.1038/s41597-025-04487-x

v0.4.8 — convergence clubs (Phillips-Sul log t)

22 Jun 06:07

Added

  • analyze_convergence_clubs — a faithful Python port of the Phillips & Sul (2007/2009) log(t) convergence test and data-driven club clustering (the Stata psecta package). Full workflow from one variable: per-unit Hodrick-Prescott trend (lambda=400), relative transition paths, the log(t) test, recursive clustering when global convergence is rejected, and adjacent-club merging. The log(t) statistic uses the Phillips-Sul scalar long-run-variance HAC (Andrews 1991 quadratic-spectral kernel, AR(1) bandwidth). Returns a tidy long frame, three figures (within-club averages, paths by club, per-club small multiples), a classification table, and an entity -> club membership frame.
  • Learn: a learn_convergence_clubs sandbox (recovers a planted club structure), an explain("convergence_clubs") explainer, and a "Convergence clubs" page in the Analyze app.
  • Data: a new bundled productivity dataset — a balanced 108-country x 25-year Penn World Table panel of log GDP per capita and log labor productivity (load_productivity).
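
For orientation, the core of the log(t) regression can be sketched in a few lines of NumPy. This is a simplified illustration, not the library's port: it skips the Hodrick-Prescott trend step, uses plain OLS rather than the HAC long-run variance (so it returns only the slope, not the t-statistic), and assumes L(t) = log(t + 1) and a trimming fraction r = 0.3. The function name `log_t_slope` and the deterministic test panel are hypothetical.

```python
import numpy as np


def log_t_slope(panel, r=0.3):
    """OLS slope b in  log(H_1 / H_t) - 2 log L(t) = a + b log t + u_t.

    panel: array of shape (T, N) with positive levels.
    Assumptions: L(t) = log(t + 1); the first floor(r * T) - 1 periods are dropped.
    """
    n_periods = panel.shape[0]
    h = panel / panel.mean(axis=1, keepdims=True)   # relative transition paths
    H = ((h - 1.0) ** 2).mean(axis=1)               # cross-sectional variance of h
    t = np.arange(1, n_periods + 1)
    keep = t >= int(np.floor(r * n_periods))
    lhs = np.log(H[0] / H[keep]) - 2.0 * np.log(np.log(t[keep] + 1.0))
    design = np.column_stack([np.ones(keep.sum()), np.log(t[keep])])
    return np.linalg.lstsq(design, lhs, rcond=None)[0][1]


# Deterministic converging panel: deviations shrink like 1 / (log(t + 1) * t**decay),
# so the slope should equal 2 * decay.
n_periods, decay = 40, 0.5
t = np.arange(1, n_periods + 1)[:, None]
spread = np.array([-0.2, -0.1, 0.0, 0.1, 0.2])[None, :]
common = 10.0 + 0.1 * t
panel = common * (1.0 + spread / (np.log(t + 1.0) * t ** decay))

b = log_t_slope(panel)
print(b)
```

In the full test, the slope's HAC t-statistic is what gets compared with the critical value; a positive slope, as here, points to convergence.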

Fixed

  • analyze_beta_convergence: a constant/collinear conditional control was silently dropped by the estimator, mislabelling the unconditional fit as conditional; it is now skipped with an explicit note.
  • analyze_beta_convergence: NaN-blind duplicate-key de-duplication could evict a valid observation; it now keeps the first non-missing value (matching the sigma/clubs paths).
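
The de-duplication fix can be illustrated with plain pandas. This is a sketch of the behaviour, not the library's code, and the toy frame is hypothetical. Keeping the first row per key is NaN-blind, whereas `GroupBy.first()` returns the first non-missing value per key.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a duplicated (country, year) key whose first row is missing.
df = pd.DataFrame({
    "country": ["A", "A", "A", "B"],
    "year": [2000, 2000, 2001, 2000],
    "gdp": [np.nan, 5.0, 6.0, 7.0],
})
keys = ["country", "year"]

# NaN-blind: keeps the first row per key, here the missing one.
naive = df.drop_duplicates(subset=keys, keep="first")

# NaN-aware: GroupBy.first() skips missing values within each key.
robust = df.groupby(keys, as_index=False, sort=False)["gdp"].first()

naive_value = naive.loc[(naive["country"] == "A") & (naive["year"] == 2000), "gdp"].iloc[0]
robust_value = robust.loc[(robust["country"] == "A") & (robust["year"] == 2000), "gdp"].iloc[0]
print(naive_value, robust_value)
```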