Releases: cmg777/expdpy
v0.4.19 — richer sample selector (category/range/period filters, persistence, cross-section)
Apps — a richer, more robust sample selector
- Filter by category, value range, and period. Pick factor variables and the categories to keep, continuous variables and a min–max range, and a dedicated Period slider over the time id. All filters combine (AND), with an always-visible Active sample summary (dataset · period · filters · "rows kept = n / N") and a Reset filters button on every page.
- Single-year → cross-section. Collapse the Period slider to one year and the app switches to cross-sectional mode — the panel / over-time views hide with a notice; the cross-sectional views keep working.
- Fixed the disappearing-selection bug. Filter options are computed from the pre-subset frame, so choosing a category no longer collapses the factor and resets your selection — selections persist across pages and every analysis updates as you filter.
- Missing-values view no longer looks blank. When a sample has no missing values (e.g. the unbalanced firms panel) it now says so and points to Panel structure for the structural gaps.
- Export reproduces the filters as code. The exported notebook ships the unfiltered working frame plus a runnable subset cell, so it rebuilds the exact analysis sample from the full data.
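The filter behaviour described above can be sketched in a few lines. This is an illustration only, not the app's code: `SampleFilter`, `apply_filters`, `category_options`, `is_cross_section` and the toy firm rows are all hypothetical names and data. It assumes filters combine with AND, that category options are read from the unfiltered rows, and that a one-year period means cross-sectional mode.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SampleFilter:
    """Hypothetical container for the three filter kinds."""
    categories: dict = field(default_factory=dict)  # column -> set of levels to keep
    ranges: dict = field(default_factory=dict)      # column -> (low, high), inclusive
    period: Optional[tuple] = None                  # (first, last) on the time id


def apply_filters(rows, flt, time_col="year"):
    """Keep a row only if it passes every active filter (logical AND)."""
    def keep(row):
        for col, levels in flt.categories.items():
            if row[col] not in levels:
                return False
        for col, (low, high) in flt.ranges.items():
            if row[col] is None or not (low <= row[col] <= high):
                return False
        if flt.period is not None:
            first, last = flt.period
            if not (first <= row[time_col] <= last):
                return False
        return True
    return [row for row in rows if keep(row)]


def category_options(rows, col):
    """Options come from the unfiltered rows, so a selection never removes its own choices."""
    return sorted({row[col] for row in rows})


def is_cross_section(flt):
    """A period collapsed to a single year means cross-sectional mode."""
    return flt.period is not None and flt.period[0] == flt.period[1]


rows = [
    {"firm": "A", "sector": "tech", "year": 2019, "sales": 10.0},
    {"firm": "A", "sector": "tech", "year": 2020, "sales": 12.0},
    {"firm": "B", "sector": "retail", "year": 2019, "sales": 7.0},
    {"firm": "B", "sector": "retail", "year": 2020, "sales": 30.0},
]
flt = SampleFilter(
    categories={"sector": {"tech"}},
    ranges={"sales": (5.0, 20.0)},
    period=(2020, 2020),
)
kept = apply_filters(rows, flt)
summary = f"rows kept = {len(kept)} / {len(rows)}"
```

Because `category_options` reads the full row list rather than `kept`, the "retail" level stays selectable after filtering to "tech", which is the idea behind the disappearing-selection fix.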
Full changelog: https://cmg777.github.io/expdpy/changelog.html
v0.4.18 — two-file (data + dictionary) app workflow, all datasets, Colab export
Apps — the two-file (data + dictionary) workflow
The three ExPdPy apps now revolve around two files: a data file and a data dictionary (df_def) — for both input and output.
- Upload your data and (optionally) its dictionary, and every figure/table updates with the right labels and panel structure. Upload data only and the app auto-builds an editable dictionary in the sidebar: set the entity / time rows to unlock the panel views, then Apply (and Download .csv).
- New public helper build_data_def(df) infers that dictionary (types + entity/time) from any frame, so the same workflow runs in notebooks: ex.set_labels(df, ex.build_data_def(df), set_panel=True).
- The dictionary is applied throughout: labelled axes/legends/headers now render for the bundled datasets too, switching datasets resets stale selections, and views no longer silently disappear on upload (e.g. the missing-values map) because the panel is now declared.
- The dataset picker offers every bundled dataset, adding Productivity and Bolivia (provinces).
- Export ships the dictionary, and the notebook is Colab-ready: "Export notebook + data" writes expdpy_data_def.csv alongside the sample, and the generated .ipynb is a Google Colab notebook (pinned expdpy==0.4.18 install + one-time runtime restart + Plotly colab renderer) whose load cell calls set_labels(..., set_panel=True). The .py script stays a plain local-run script.
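To make "infers that dictionary (types + entity/time)" concrete, here is a rough sketch of one way such an inference could work. It is not the logic of build_data_def: `sketch_data_def`, `column_type`, the `var_name` / `type` / `role` keys and the heuristic itself are assumptions made up for illustration. The heuristic takes the first factor column and integer column that together identify every row.

```python
def column_type(values):
    """Classify a column from its non-missing values (hypothetical type labels)."""
    present = [v for v in values if v is not None]
    if not present:
        return "factor"
    if all(isinstance(v, bool) for v in present):
        return "logical"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "numeric"
    return "factor"


def sketch_data_def(rows):
    """Return one dictionary entry per column; NOT expdpy's build_data_def."""
    columns = list(rows[0])
    types = {c: column_type([r[c] for r in rows]) for c in columns}
    # Candidate time ids: columns holding only integers (assumed heuristic).
    int_cols = [
        c for c in columns
        if all(isinstance(r[c], int) and not isinstance(r[c], bool) for r in rows)
    ]
    entity = time = None
    for e in (c for c in columns if types[c] == "factor"):
        for t in int_cols:
            # An (entity, time) pair must identify each row exactly once.
            if len({(r[e], r[t]) for r in rows}) == len(rows):
                entity, time = e, t
                break
        if entity is not None:
            break
    return [
        {
            "var_name": c,
            "type": types[c],
            "role": "entity" if c == entity else "time" if c == time else "",
        }
        for c in columns
    ]


rows = [
    {"country": "BOL", "year": 2015, "gini": 46.7},
    {"country": "BOL", "year": 2016, "gini": 44.6},
    {"country": "PER", "year": 2015, "gini": 43.4},
    {"country": "PER", "year": 2016, "gini": 43.6},
]
data_def = sketch_data_def(rows)
```

An editable dictionary matters because any such guess can be wrong, which is presumably why the app lets you set the entity / time rows before applying.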
Full changelog: https://cmg777.github.io/expdpy/changelog.html
v0.4.17 — Learn case-study redesign
Completes the Explore → Analyze → Learn trilogy: the Learn surface is rebuilt as a single-pass tutorial — "the ideas behind the case study" — in lockstep across the website, the Colab notebook, and the Streamlit app.
Learn
- The Learn page is now a single-pass tutorial that opens by interpreting a real two-way fixed-effects Kuznets model (.interpret() / .explain()), browses the full 27-topic concept index (list_topics) grouped by theme, then isolates each idea in a simulated sandbox where the truth is known: the within-transformation identity, why fixed effects matter, two inference classics, convergence, and the Kuznets wave. Every learn_* sandbox appears exactly once, removing the old tour/gallery duplication and adding the previously-undemonstrated learn_kuznets_waves.
- The Learn Streamlit app mirrors the tutorial: the Concept sandboxes page grows from seven to nine tabs (adding learn_sigma_convergence and learn_convergence_clubs), reordered to the case-study sequence; the explainers page browses all 27 topics.
- The Learn Colab notebook is regenerated so the website, notebook and app present the same sequence.
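The "why fixed effects matter" idea can be reproduced in a tiny simulated sandbox where the truth is known. The sketch below is not one of the learn_* functions; the data, `ols_slope` and `demean_by_entity` are hypothetical. Entity effects are built to be negatively related to x, so pooled OLS gets the sign wrong while the within (entity-demeaned) slope recovers the true coefficient.

```python
def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def demean_by_entity(entity, values):
    """Within transformation: subtract each entity's own mean."""
    totals, counts = {}, {}
    for e, v in zip(entity, values):
        totals[e] = totals.get(e, 0.0) + v
        counts[e] = counts.get(e, 0) + 1
    return [v - totals[e] / counts[e] for e, v in zip(entity, values)]


# Simulated truth: y = alpha_i + 2 * x, with no noise.
true_beta = 2.0
alpha = {"A": 0.0, "B": 10.0, "C": 20.0}  # entity effects
x_by_entity = {"A": [5.0, 6.0, 7.0], "B": [3.0, 4.0, 5.0], "C": [1.0, 2.0, 3.0]}

entity, x, y = [], [], []
for e, xs in x_by_entity.items():
    for value in xs:
        entity.append(e)
        x.append(value)
        y.append(alpha[e] + true_beta * value)

pooled_beta = ols_slope(x, y)  # ignores the entity effects
within_beta = ols_slope(demean_by_entity(entity, x), demean_by_entity(entity, y))
```

With these numbers the pooled slope is negative even though every entity's own relationship has slope 2; demeaning removes the entity effects and the bias with them.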
v0.4.16 — Analyze case-study redesign
Redesigns the Analyze surface as a single-pass, pedagogical Kuznets case study — in lockstep across the website, the Colab notebook, and the Streamlit app — the sequel to the 0.4.15 Explore redesign.
Analyze
- The Analyze page is now a single-pass Kuznets case study introducing all 17 analyze_* functions once each, in the order an analyst works: fit a first model and add fixed effects → enrich the estimation → read the fitted model → stress-test the inference → choose the panel estimator → the flagship Kuznets-waves curve → a related income-convergence question → a causal DiD design. The four previously-undemonstrated functions (the convergence trio + Kuznets waves) now appear.
- The Analyze Streamlit app mirrors the case study: it is reorganized into Regression, Post-estimation (new), Panel models, Kuznets waves, Convergence (β/σ/clubs consolidated), and Event study & DiD, surfacing the seven functions the app did not previously render.
- The Analyze Colab notebook is regenerated so the website, notebook and app present the same sequence of tools.
v0.4.15 — Explore case-study redesign
Explore
- The Explore page is now a single-pass Kuznets case study. A complete redesign walks newcomers through all 21 explore_* functions (and the set_panel / resolve_panel / treat_outliers utilities) once each, in the order an analyst actually works: know the panel's skeleton (explore_panel_structure, explore_missing_values_plot, explore_value_heatmap) → describe variables → split within vs between variation (explore_xtsum_table, explore_spaghetti_plot) → trends (incl. explore_distribution_over_time) → compare groups → relationships and the N-shaped curve (incl. explore_scatter_plot_within_between) → dynamics (explore_transition_matrix, explore_within_persistence). The eight previously-undemonstrated panel-aware functions now appear in the narrative.
- The Explore Streamlit app mirrors the case study. Its pages are reorganized to the same workflow (Overview & Data, Describe variables, Within & between, Trends, By group, Relationships, Dynamics), replacing the single catch-all "Panel structure" page. The Colab notebook is regenerated from the redesigned page.
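The "within vs between variation" step can be illustrated with a standalone sketch of an xtsum-style decomposition. It is not the code behind explore_xtsum_table: `xtsum_sketch` and the toy data are hypothetical. The between part is the spread of entity means; the within part is the spread of deviations from each entity's mean, re-centred on the overall mean. Degrees-of-freedom conventions differ across implementations, and this one simply uses the sample standard deviation throughout.

```python
from statistics import mean, stdev


def xtsum_sketch(entity, values):
    """Overall, between and within standard deviations of one variable."""
    overall_mean = mean(values)
    groups = {}
    for e, v in zip(entity, values):
        groups.setdefault(e, []).append(v)
    entity_means = {e: mean(vs) for e, vs in groups.items()}
    # Between: variation across entity means.
    between = list(entity_means.values())
    # Within: deviation from own entity mean, shifted back to the overall mean.
    within = [v - entity_means[e] + overall_mean for e, v in zip(entity, values)]
    return {
        "overall": stdev(values),
        "between": stdev(between),
        "within": stdev(within),
    }


entity = ["A", "A", "A", "B", "B", "B"]
values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
result = xtsum_sketch(entity, values)
```

Here almost all variation is between entities (A sits near 2, B near 12), so a spaghetti plot of this variable would show two flat, well-separated lines.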
v0.4.14 — drop inconsistent scatter LOESS band and the spaghetti legend
0.4.14 (2026-06-24)
Explore
- explore_scatter_plot no longer draws a LOESS confidence band. The shaded band was an unweighted bootstrap while the LOESS line is size-weighted under loess=2, so the line could fall outside its own band; the smoother now shows just the line.
- explore_spaghetti_plot no longer shows a legend. It was a single "mean (…)" entry that ate horizontal plot space; highlighted units still render in saturated colour.
v0.4.13 — panel-aware descriptives, histogram overlays, df_def-driven readability
0.4.13 (2026-06-24)
Explore
- explore_descriptive_table is now panel-aware. When a time column is known (declared via set_panel / set_labels, or passed explicitly) each statistic is shown by period (by default at the first and last period) under a spanning column header (e.g. Mean over 2015 and 2025); without a time dimension it falls back to one column per statistic. The default statistics are now Mean, Std. dev., Median, Min., Max.; rows are labelled from the data dictionary, and the notes report the number of observations and any variable with missing data. Breaking: the old length-8 digits vector is replaced by a stats= selection (any of the eight statistics), a scalar-or-mapping digits=, and a new periods= argument; the result gains a tidy .by_period frame (.df still carries all eight pooled statistics).
- explore_histogram gains opt-in density overlays. New kde= and normal= flags draw a Gaussian kernel-density estimate and/or a normal curve on the Density scale (off by default; the Count/Density toggle hides them in Count view).
- The trend / time-series plots no longer show a draggable range slider (explore_trend_plot, explore_quantile_trend_plot, explore_spaghetti_plot).
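The by-period layout of the descriptive table can be pictured with a small standalone sketch. It does not call explore_descriptive_table and does not reproduce its output format: `describe_by_period` and the sample rows are hypothetical. It assumes the default is "first and last period" and that missing values are dropped per period.

```python
from statistics import mean, median, stdev


def describe_by_period(rows, var, time_col, periods=None):
    """Five default statistics for `var`, computed separately for each period."""
    times = sorted({r[time_col] for r in rows})
    if periods is None:
        periods = [times[0], times[-1]]  # assumed default: first and last period
    table = {}
    for p in periods:
        vals = [r[var] for r in rows if r[time_col] == p and r[var] is not None]
        table[p] = {
            "Mean": mean(vals),
            "Std. dev.": stdev(vals) if len(vals) > 1 else float("nan"),
            "Median": median(vals),
            "Min.": min(vals),
            "Max.": max(vals),
            "N": len(vals),
        }
    return table


rows = [
    {"country": "A", "year": 2015, "gini": 30.0},
    {"country": "B", "year": 2015, "gini": 40.0},
    {"country": "A", "year": 2020, "gini": 31.0},
    {"country": "B", "year": 2020, "gini": 45.0},
    {"country": "A", "year": 2025, "gini": 32.0},
    {"country": "B", "year": 2025, "gini": 50.0},
    {"country": "C", "year": 2025, "gini": None},  # missing value, dropped
]
table = describe_by_period(rows, var="gini", time_col="year")
```

Passing `periods=[2015, 2020, 2025]` to this sketch would add the middle year, loosely mirroring what a periods= selection is for.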
Data dictionary (df_def) everywhere
- Every function now leans on the data dictionary for readable output, while still working without it. Regression / estimation / CRE tables relabel their coefficient and dependent-variable rows from the dictionary (the tidy .df keeps raw term names), and the panel estimators (analyze_panel_table, analyze_hausman_test, analyze_cre_table) and DiD views (analyze_event_study, analyze_panel_view) now resolve entity / time / unit from the declared panel, so those arguments can be omitted after set_panel / set_labels.
- Examples across the library now illustrate the dictionary, opening with df = ex.set_labels(load_kuznets(), load_kuznets_data_def(), set_panel=True).
v0.4.11 — Kuznets waves + convergence-clubs correctness fixes
This release supersedes the unreleased 0.4.10, so it brings both the new Kuznets-waves analysis and the convergence-clubs fixes to PyPI in one step.
Added — analyze_kuznets_waves (the extended Kuznets curve)
Tests the inequality–development relationship taken up to a quartic, gini = b₁g + b₂g² + b₃g³ + b₄g⁴ (g = log GDP per capita), under three panel estimators side by side: pooled OLS, the between estimator (the cross-country curve, a polynomial in the entity means) and the within estimator (two-way country + year fixed effects). Each is a cumulative-stepwise (csw) comparison table — linear, then quadratic, up to the full degree-order polynomial. Three figures tell the pooled → between → within story: a raw scatter with the pooled wave overlaid, and between/within Frisch–Waugh–Lovell partial-residual (component) plots that draw the fitted wave once optional controls (and the two-way fixed effects) are partialled out. The result exposes gt_pooled / gt_between / gt_within, the three figures, a per-estimator curvature summary (turning points, peak, top-order term), the fitted models, and .interpret() / .explain(). Ships with a learn_kuznets_waves() sandbox, a kuznets_waves concept explainer, a Streamlit Kuznets waves tab, and a Quarto → Colab notebook.
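The curvature summary (turning points, peak) rests on a simple calculation: a turning point is where the derivative of the fitted polynomial changes sign. The sketch below is a generic numerical illustration, not the routine used by analyze_kuznets_waves. `slope`, `curvature`, `turning_points` and the example coefficients are hypothetical, and it assumes simple (non-repeated) roots inside the scanned range.

```python
def slope(g, coefs):
    """First derivative of b1*g + b2*g**2 + ... at g; coefs = [b1, b2, ...]."""
    return sum(k * b * g ** (k - 1) for k, b in enumerate(coefs, start=1))


def curvature(g, coefs):
    """Second derivative of the same polynomial at g."""
    return sum(k * (k - 1) * b * g ** (k - 2) for k, b in enumerate(coefs, start=1) if k >= 2)


def turning_points(coefs, low, high, steps=500):
    """Scan [low, high] for sign changes of the slope and refine each by bisection."""
    grid = [low + i * (high - low) / steps for i in range(steps + 1)]
    vals = [slope(g, coefs) for g in grid]
    roots = []
    for i in range(steps):
        a, b, fa, fb = grid[i], grid[i + 1], vals[i], vals[i + 1]
        if fa == 0.0:
            roots.append(a)
        elif fa * fb < 0:
            for _ in range(60):  # bisection halves the bracket each pass
                m = (a + b) / 2
                fm = slope(m, coefs)
                if fa * fm <= 0:
                    b = m
                else:
                    a, fa = m, fm
            roots.append((a + b) / 2)
    if vals[-1] == 0.0:
        roots.append(grid[-1])
    # Negative curvature at a turning point marks a peak, positive a trough.
    return [(g, "peak" if curvature(g, coefs) < 0 else "trough") for g in roots]


# Illustrative cubic g**3 - 3*g (b1=-3, b2=0, b3=1, b4=0): peak at -1, trough at +1.
points = turning_points([-3.0, 0.0, 1.0, 0.0], low=-2.5, high=2.5)
```

A quartic can have up to three turning points, which is what lets the extended curve describe more than one rise and fall.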
Fixed — analyze_convergence_clubs (Phillips-Sul clustering)
- The default method="adjust" (Schnurbus et al. 2016) club refinement scored each candidate against the core group rather than the growing club and lacked a final joint-test fallback, so it could emit a group labelled a convergence "club" whose own log(t) t-statistic was below the threshold. It now scores against the accumulating club and falls back to the core when the refined club still fails (matching the method="ps" branch).
- A variable whose per-period cross-sectional mean is at/near zero (a demeaned, centered or growth series) made the relative transition h = x/mean blow up to inf and silently corrupt every frame and figure; it now raises a clear error.
- A constant / already-identical panel produced a non-finite global log(t) statistic that was silently reported as "divergent" (printing a literal NaN); it now raises a clear "not estimable" error.
- The user-supplied tcrit threshold is now threaded into the summary table's converging column, the table source-note and .interpret() (previously hardcoded to -1.65), and is exposed on ConvergenceClubsResult.tcrit.
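The near-zero-mean failure is easy to reproduce. The sketch below shows the relative transition h = x/mean with an explicit guard. It is an illustration, not the library's check: `relative_transition`, the `tol` parameter, the tolerance rule and the error message are all hypothetical.

```python
import math


def relative_transition(panel, tol=1e-8):
    """h_it = x_it / mean_t for a balanced panel given as {entity: [x_1, ..., x_T]}."""
    series = list(panel.values())
    n_periods = len(series[0])
    means = [sum(s[t] for s in series) / len(series) for t in range(n_periods)]
    scale = max(abs(v) for s in series for v in s)
    for t, m in enumerate(means):
        # A mean at or near zero makes x / mean explode, so fail loudly instead.
        if not math.isfinite(m) or abs(m) <= tol * max(scale, 1.0):
            raise ValueError(
                f"cross-sectional mean is ~0 in period {t}; "
                "relative transition paths need a variable in levels"
            )
    return {e: [x / m for x, m in zip(s, means)] for e, s in panel.items()}


levels = {"A": [1.0, 2.0], "B": [3.0, 2.0]}
h = relative_transition(levels)
```

Calling the same function on a demeaned series such as `{"A": [-1.0, 0.5], "B": [1.0, -0.5]}` raises the error rather than returning `inf`.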
Full changelog: https://cmg777.github.io/expdpy/changelog.html
v0.4.9 — bundle the bolivia112_gdppc subnational convergence panel
Added
- Data: a new bundled bolivia112_gdppc dataset, a real-world balanced panel of 112 Bolivian provinces (nested within 9 departments) over 1990-2024 with GDP per capita and its natural log; the empirical counterpart to the synthetic productivity panel, for the convergence workflows (analyze_beta_convergence / analyze_sigma_convergence / analyze_convergence_clubs) and general subnational exploration. Load with load_bolivia112_gdppc() / load_bolivia112_gdppc_data_def().
Source: Kummu, Kosonen & Masoumzadeh Sayyar, "Downscaled gridded global dataset for GDP per capita PPP over 1990-2022," Sci Data 12, 178 (2025), https://doi.org/10.1038/s41597-025-04487-x
v0.4.8 — convergence clubs (Phillips-Sul log t)
Added
- analyze_convergence_clubs: a faithful Python port of the Phillips & Sul (2007/2009) log(t) convergence test and data-driven club clustering (the Stata psecta package). Full workflow from one variable: per-unit Hodrick-Prescott trend (lambda=400), relative transition paths, the log(t) test, recursive clustering when global convergence is rejected, and adjacent-club merging. The log(t) statistic uses the Phillips-Sul scalar long-run-variance HAC (Andrews 1991 quadratic-spectral kernel, AR(1) bandwidth). Returns a tidy long frame, three figures (within-club averages, paths by club, per-club small multiples), a classification table, and an entity -> club membership frame.
- Learn: a learn_convergence_clubs sandbox (recovers a planted club structure), an explain("convergence_clubs") explainer, and a "Convergence clubs" page in the Analyze app.
- Data: a new bundled productivity dataset, a balanced 108-country x 25-year Penn World Table panel of log GDP per capita and log labor productivity (load_productivity).
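The quantity the log(t) test tracks is the cross-sectional variance of the relative transition paths, H_t = (1/N) Σ (h_it − 1)², which shrinks toward zero when units converge. The sketch below computes only that series from paths taken as given. It omits the Hodrick-Prescott step, the log(t) regression and the HAC standard error, and `cross_sectional_variance` and the toy paths are hypothetical.

```python
def cross_sectional_variance(h):
    """H_t = mean over units of (h_it - 1)**2, for h given as {entity: [h_1, ..., h_T]}."""
    paths = list(h.values())
    n_units = len(paths)
    n_periods = len(paths[0])
    return [
        sum((p[t] - 1.0) ** 2 for p in paths) / n_units
        for t in range(n_periods)
    ]


# Two toy units whose relative paths close in on 1 from opposite sides.
h = {
    "A": [0.5, 0.75, 0.9],
    "B": [1.5, 1.25, 1.1],
}
H = cross_sectional_variance(h)
ratio = [H[0] / value for value in H]  # H_1 / H_t grows as the paths converge
```

A rising H_1 / H_t ratio is the raw signal of convergence; the full test asks whether it rises fast enough, which is where the regression on log t comes in.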
Fixed
- analyze_beta_convergence: a constant/collinear conditional control was silently dropped by the estimator, mislabelling the unconditional fit as conditional; it is now skipped with an explicit note.
- analyze_beta_convergence: NaN-blind duplicate-key de-duplication could evict a valid observation; it now keeps the first non-missing value (matching the sigma/clubs paths).
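The de-duplication rule in the second fix ("keep the first non-missing value") can be sketched independently of the library. `is_missing`, `dedupe_first_non_missing` and the sample rows are hypothetical; the assumption is that a duplicate (entity, time) key should resolve to its earliest row that actually carries a value.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def dedupe_first_non_missing(rows, keys, value):
    """One row per key: the first whose `value` is present, else the first seen."""
    chosen = {}
    for row in rows:
        k = tuple(row[c] for c in keys)
        if k not in chosen:
            chosen[k] = row
        elif is_missing(chosen[k][value]) and not is_missing(row[value]):
            # A NaN-blind "keep first" would stop at the missing row and lose this one.
            chosen[k] = row
    return list(chosen.values())


rows = [
    {"entity": "BOL", "year": 2000, "lgdp": float("nan")},
    {"entity": "BOL", "year": 2000, "lgdp": 7.1},
    {"entity": "BOL", "year": 2000, "lgdp": 9.9},
    {"entity": "PER", "year": 2000, "lgdp": 8.0},
]
clean = dedupe_first_non_missing(rows, keys=("entity", "year"), value="lgdp")
```

A plain keep-first rule would retain the NaN row for BOL and drop the valid 7.1 observation, which is the kind of eviction the fix describes.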