https://onlinelibrary.wiley.com/doi/abs/10.3982/ECTA11319

Problems with the standard Fixed Effects approaches:
- Incidental parameter bias, especially in short panels
- Constant unobserved heterogeneity over time

This paper proposes a framework that allows for clustered time patterns of
unobserved heterogeneity that are common within groups of individuals.
- The group-specific time patterns and individual group membership are left unrestricted and are estimated from the data. In particular, as in fixed-effects, our time-varying specification allows for general forms of covariate endogeneity.
- The main assumption is that the number of distinct individual time patterns of unobserved heterogeneity is relatively small.

\begin{equation}
y_{it} =   x'_{it}\theta+\alpha_{g_{i}t}+v_{it}, \quad i=1,\ldots,N,t=1,\ldots,T, g_{i} \in \{1,\ldots,G\}
\end{equation}

where the covariates $x_{it}$ are contemporaneously uncorrelated with $v_{it}$, but may
be arbitrarily correlated with the group-specific unobservables $\alpha_{g_{i}t}$.

Units in the same group share the same time profile $\alpha_{gt}$ (e.g., all $i$ such that $g_{i} = 1$ share the profile $\alpha_{1t}$). The number of groups $G$ is to be set or estimated by the researcher. 
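To make the model concrete, here is a minimal simulation sketch. The function name `simulate_grouped_panel`, the scalar covariate, the normal draws, and the 0.5 loading that ties $x_{it}$ to the group profile are all illustrative assumptions, not part of the paper; only the structure $y_{it} = x'_{it}\theta+\alpha_{g_{i}t}+v_{it}$ comes from the text.

```python
import numpy as np

def simulate_grouped_panel(N=90, T=7, G=3, theta=1.0, seed=0):
    """Draw one panel from y_it = x_it * theta + alpha_{g_i t} + v_it (scalar x)."""
    rng = np.random.default_rng(seed)
    g = rng.integers(0, G, size=N)             # group membership g_i, coded 0..G-1
    alpha = rng.normal(0.0, 2.0, size=(G, T))  # group-specific time profiles alpha_gt
    # Covariate loads on the group profile, so x is correlated with alpha_{g_i t}
    x = 0.5 * alpha[g] + rng.normal(size=(N, T))
    v = rng.normal(size=(N, T))                # idiosyncratic error, drawn independently of x
    y = theta * x + alpha[g] + v
    return y, x, g, alpha

y, x, g, alpha = simulate_grouped_panel()
print(y.shape, alpha.shape)
```

Because `x` depends on `alpha[g]`, pooled OLS of `y` on `x` would generally be biased in this design, which is the situation the grouped specification is meant to handle.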

Our estimator, which we will refer to as 'grouped fixed-effects' (GFE), is based on an optimal grouping of the $N$ cross-sectional units, according to a least squares criterion. Units whose time profiles of outcomes, net of the effect of covariates, are most similar are grouped together in estimation. In the absence of covariates in model (1), the estimation problem coincides with the standard minimum sum-of-squares partitioning problem, and a simple computational method is given by the “kmeans” algorithm (Forgy (1965), Steinley (2006)).
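The no-covariate case can be sketched as a plain alternating (Lloyd-style) k-means iteration on the $T$-dimensional outcome profiles. The function name `kmeans_profiles`, the random choice of starting centers, and the stopping rule are assumptions made for illustration; a single run may stop at a local minimum, so several starting values would normally be tried.

```python
import numpy as np

def kmeans_profiles(y, G, n_iter=100, seed=0):
    """Cluster the rows of y (N x T outcome profiles) into G groups by least squares."""
    rng = np.random.default_rng(seed)
    N, T = y.shape
    # Start from the profiles of G randomly chosen units
    centers = y[rng.choice(N, size=G, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance of every unit's profile to every group profile: (N, G)
        dist = ((y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        g = dist.argmin(axis=1)
        new_centers = centers.copy()
        for k in range(G):
            if np.any(g == k):                 # an empty group keeps its old profile
                new_centers[k] = y[g == k].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return g, centers
```

Here the rows of `centers` play the role of the profiles $\alpha_{gt}$ and `g` the role of the memberships $g_i$.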

We derive the statistical properties of the grouped fixed-effects estimator in an asymptotic where $N$ and $T$ tend to infinity simultaneously. In our framework, $N$ can grow substantially faster than $T$, in contrast with models with unit-specific fixed-effects. While fixed-effects estimators generally suffer from an $O(1/T)$ bias as $N/T$ tends to a constant (Arellano and Hahn (2007)), we show that the GFE estimator is consistent and asymptotically normal as $N/T^{\nu}$ tends to zero for some $\nu > 0$, provided groups are well separated and errors $v_{it}$ satisfy suitable tail and dependence conditions.

As the two dimensions of the panel diverge, the GFE estimator is asymptotically equivalent to the infeasible least squares estimator with known population groups. As a consequence, in a large-$T$ perspective standard errors are unaffected by the fact that group membership has been estimated. In short panels, however, group misclassification may contribute to the finite-sample dispersion of the estimator. For this reason, we also study the properties of the GFE estimator for fixed $T$ as $N$ tends to infinity. In a Monte Carlo exercise calibrated to the empirical application, we provide evidence that using the GFE estimator in combination with an estimate of its fixed-$T$ variance yields reliable inference for the population parameters.

We use our approach to study the effect of income on democracy in a panel of countries that spans the last part of the twentieth century. In an influential paper, Acemoglu, Johnson, Robinson, and Yared (2008) found that the positive association between income and democracy disappears when controlling for additive country- and time-effects. They interpreted the country fixed-effects as reflecting long-run, historical factors that have shaped the political and economic development of countries.
In the context of this application, the grouped fixed-effects model allows for time-varying unobservables in a period that is characterized by a large number of transitions to democracy, and it is well suited to deal with the short length of the panel ($T = 7$). Grouped patterns are also consistent with the empirical observation that regime types and transitions tend to cluster in time and space (e.g., Gleditsch and Ward (2006), Ahlquist and Wibbels (2012)). An early conceptual framework was laid out in Huntington’s (1991) work on the “third wave of democracy,” which argues that international and regional factors, such as the influence of the Catholic Church or the European Union, may have induced grouped patterns of democratization. We find robust evidence of heterogeneous, group-specific paths of democratization in the data.


THE GROUPED FIXED-EFFECTS ESTIMATOR

\begin{equation}
(\hat{\theta}, \hat{\alpha},\hat{\gamma}) = \underset{(\theta,\alpha,\gamma) \in \Theta \times \mathcal{A}^{GT} \times \Gamma_{G}}{argmin} \sum\limits^{N}_{i=1} \sum\limits^{T}_{t=1} (y_{it}-x'_{it}\theta-\alpha_{g_{i}t})^2
\end{equation}

For given values of $\theta$ and $\alpha$, the optimal group assignment for each individual unit is 

\begin{equation}
    \hat{g}_{i}(\theta,\alpha) = \underset{g \in \{ 1,\ldots, G\}}{argmin} \sum\limits^{T}_{t=1}(y_{it}-x'_{it}\theta-\alpha_{gt})^2
\end{equation}
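
Substituting this assignment rule back into the least squares criterion concentrates the group memberships out, leaving a problem in $(\theta,\alpha)$ only. As a sketch in the notation of these notes:

\begin{equation}
(\hat{\theta}, \hat{\alpha}) = \underset{(\theta,\alpha) \in \Theta \times \mathcal{A}^{GT}}{argmin} \sum\limits^{N}_{i=1} \sum\limits^{T}_{t=1} (y_{it}-x'_{it}\theta-\alpha_{\hat{g}_{i}(\theta,\alpha)t})^2
\end{equation}

The estimated membership of unit $i$ is then $\hat{g}_{i}(\hat{\theta},\hat{\alpha})$.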

Extension 1: Unit-Specific Heterogeneity
\begin{equation}
y_{it} =   x'_{it}\theta+\alpha_{g_{i}t}+\eta_{i}+v_{it}
\end{equation}
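
One way to see why unit-specific effects are easy to accommodate is to subtract unit-level time averages, which removes $\eta_{i}$ and leaves a model of the same form as (1). A sketch, where the double-dot notation is introduced here for illustration and $z$ stands for any of $y$, $x$, $v$:

\begin{equation}
\ddot{y}_{it} = \ddot{x}'_{it}\theta + \ddot{\alpha}_{g_{i}t} + \ddot{v}_{it}, \quad \ddot{z}_{it} = z_{it} - \frac{1}{T}\sum\limits^{T}_{s=1} z_{is}, \quad \ddot{\alpha}_{gt} = \alpha_{gt} - \frac{1}{T}\sum\limits^{T}_{s=1} \alpha_{gs}
\end{equation}

Only the deviations of each group profile from its own time average are identified after this transformation, since the level is absorbed by $\eta_{i}$.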
Extension 2: Heterogeneous Coefficient
\begin{equation}
y_{it} =   x'_{it}\theta_{g_{i}}+\alpha_{g_{i}t}+\eta_{i}+v_{it}
\end{equation}

Algorithm 1:
For smaller problems, pick the initial values randomly.
For larger-scale problems, use k-means.

Exact computations and the performance of the 2 algorithms are in the Supplemental Material.
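
A minimal sketch of an iterative scheme of this kind, alternating the assignment rule for $\hat{g}_{i}(\theta,\alpha)$ with a least squares update of $(\theta,\alpha)$ given the groups. The function name `gfe_iterate`, the start at $\theta = 0$ with randomly drawn unit profiles, and the handling of empty groups are assumptions for illustration, not the paper's exact algorithm. The update step uses the fact that OLS on covariates plus group-by-time dummies equals OLS on data demeaned within each group-period cell.

```python
import numpy as np

def gfe_iterate(y, x, G, n_iter=100, seed=0):
    """y: (N, T) outcomes, x: (N, T, K) covariates. Returns theta, alpha (G, T), g (N,)."""
    rng = np.random.default_rng(seed)
    N, T, K = x.shape
    theta = np.zeros(K)
    # Random start: net-of-covariate profiles of G randomly drawn units
    alpha = (y - x @ theta)[rng.choice(N, size=G, replace=False)].copy()
    g = np.full(N, -1)
    for _ in range(n_iter):
        # Assignment step: g_i = argmin_g sum_t (y_it - x_it'theta - alpha_gt)^2
        resid = y - x @ theta
        dist = ((resid[:, None, :] - alpha[None, :, :]) ** 2).sum(axis=2)
        g_new = dist.argmin(axis=1)
        if np.array_equal(g_new, g):
            break                              # memberships are stable: stop
        g = g_new
        # Update step: demean y and x within each (group, period) cell, then OLS
        y_dm, x_dm = y.copy(), x.copy()
        for k in np.unique(g):
            y_dm[g == k] -= y[g == k].mean(axis=0)
            x_dm[g == k] -= x[g == k].mean(axis=0)
        theta = np.linalg.lstsq(x_dm.reshape(-1, K), y_dm.reshape(-1), rcond=None)[0]
        resid = y - x @ theta
        for k in np.unique(g):                 # groups that are empty keep their old profile
            alpha[k] = resid[g == k].mean(axis=0)
    return theta, alpha, g
```

Each step weakly lowers the sum of squared residuals, so the loop terminates, but possibly at a local minimum; in practice one would rerun it from many starting values and keep the solution with the lowest objective.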

Asymptotic properties:

Nevertheless, (15) implies that the group misclassification probability tends to zero at an exponential rate, which intuitively means that the incidental parameter problem vanishes very rapidly as $T$ increases.
Extending the analysis of model (14) to a more general setup raises two main challenges. First, consistency is not straightforward to establish since, as $N$ and $T$ tend to infinity, both the number of group membership variables $g_{i}$ and the number of group-specific time effects $\alpha_{gt}$ tend to infinity, causing an incidental parameter problem in both dimensions. Second, the argument leading to the exponential rate of convergence in (15) relies on i.i.d. normal errors. In order to bound tail probabilities under more general conditions, approximations based on a central limit theorem are not sufficient.
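
To illustrate the exponential rate numerically, here is a sketch under a simplified setup that is assumed here, since these notes do not reproduce (14) and (15): two groups with known, time-invariant effects separated by `delta`, and i.i.d. normal errors with standard deviation `sigma`. A unit is then misclassified by least squares when its time-averaged error exceeds `delta / 2`, which happens with probability $1 - \Phi(\sqrt{T}\,\delta/(2\sigma))$. The function name and default values are hypothetical.

```python
import math

def misclassification_prob(T, delta=1.0, sigma=1.0):
    """P(mean of T iid N(0, sigma^2) errors > delta / 2) = 1 - Phi(sqrt(T) * delta / (2 * sigma))."""
    z = math.sqrt(T) * delta / (2.0 * sigma)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # standard normal upper tail at z

for T in (5, 10, 20, 40):
    z = math.sqrt(T) / 2.0
    # The Gaussian tail bound exp(-z^2 / 2) decays exponentially in T
    print(T, misclassification_prob(T), math.exp(-z * z / 2.0))
```

The bound $\exp(-T\delta^2/(8\sigma^2))$ printed in the last column dominates the exact probability, which is one way to see the exponential decay in $T$ for this special case.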


Theorem 1: Consistency is established (proof in A)

Theorem 2: Asymptotic distribution: (proof in B), Corollary 1

Under the conditions of Corollary 1, the GFE estimator of $\theta^0$ is root-$NT$ consistent and asymptotically normal in an asymptotic where $T$ can increase polynomially more slowly than $N$. The GFE estimates of group-specific time effects are root-$N$ consistent and asymptotically normal under the same conditions. Moreover, the estimated group membership indicators are uniformly consistent for the population ones as $N/T^{\nu} \to 0$ for some $\nu > 0$, in the sense that $\Pr\left(\sup_{i \in \{1,\ldots,N\}} |\hat{g}_{i} - g^0_{i}| > 0\right) \to 0$. As a result:
These properties contrast with those of estimators that allow for unit-specific fixed-effects in combination with time fixed-effects. Given the interactive structure of model (12), “interactive fixed-effects” estimators are particularly relevant in our context. The interactive fixed-effects estimator of $\theta^0$, as fixed-effects estimators in other settings, has an $O(1/T)$ bias in general when $N/T \to c > 0$; see Theorem 3 in Bai (2009). In addition, the conditions for root-$N$ consistency of the time-varying factors require that $N/T^2 \to 0$; see Theorem 1 in Bai (2003). Lastly, when using interactive fixed-effects, the components $\alpha^0_{g^0_{i}t}$ are estimated at a rate of $\min(\sqrt{N}, \sqrt{T})$; see Theorem 3 in Bai (2003). These properties suggest that, when a grouped structure is a reasonable assumption, GFE may be better suited than interactive fixed-effects in panels of moderate length. Simulations calibrated to the empirical application, summarized below, are in line with this theoretical discussion.











Inference
The large-$N,T$ asymptotic analysis above provides conditions under which group membership estimation does not affect inference. In the Supplemental Material, we discuss various estimators of the matrices defined in Assumption 3 that allow one to conduct feasible inference under those conditions.
When $T$ is kept fixed as $N$ tends to infinity, in contrast, estimation of group membership matters for inference. In the Supplemental Material, we extend previous results by Pollard (1981, 1982) to allow for covariates, and derive an analytical formula for the fixed-$T$ variance of the GFE estimator. In this alternative asymptotic framework, the variance reflects the additional contribution of observations that are at the margin between two groups, so that an infinitesimal change in parameter values may entail reclassifying these observations.
A fixed-$T$ asymptotic analysis is not directly informative to perform valid inference for the population parameters since, for fixed $T$, the GFE estimator $(\hat{\theta}, \hat{\alpha})$ is root-$N$ consistent and asymptotically normal for a pseudo-true value $(\bar{\theta}, \bar{\alpha})$. This pseudo-true value, which minimizes an expected within-group sum of squared residuals, does not coincide with the true parameter value in general, but the difference between the two vanishes as $T$ increases. A practical possibility to account for the effect of group membership estimation on inference is to use the GFE estimator in combination with a fixed-$T$ consistent estimator of its variance. In the Supplemental Material, we propose two such estimators: an estimator of the analytical variance formula, and a bootstrap-based estimator.


Choice of the Number of Groups
Following Bai and Ng (2002), we study in the Supplemental Material how to estimate the number of groups $G^0$ using information criteria. In addition, to explore the impact of misspecifying the number of groups, we analytically study a simple model with time-invariant group-specific effects, where the true number of groups is $G^0 = 1$ but the researcher postulates $G = 2$ (so $\alpha^0_{1} = \alpha^0_{2}$). In this example, common parameter estimates are consistent for fixed $T$, but group-specific effects suffer from large biases. Moreover, specifying $G < G^0$ generally leads to biases on common parameters and group-specific effects. The choice of $G$, and the related issue of how inference on the model’s parameters is affected by this choice, are difficult questions that deserve further investigation.


Simulation Evidence
In order to assess the finite-sample performance of the GFE estimator, we conduct several exercises on simulated data. The designs mimic the cross-country data set that we use in the empirical application ($N = 90$, $T = 7$). We find small probabilities of group misclassification (less than 10% when $G = 3$ and $G = 5$), and moderate biases on common parameters. Moreover, when comparing the GFE estimator to the interactive fixed-effects estimator on a simulated data set with grouped heterogeneity, we find that the latter has large biases and imprecisely estimated components of unobserved heterogeneity. Finally, we compare different inference methods, and conclude that estimators of the fixed-$T$ variance lead to more reliable inference for the population parameters. Details and additional exercises can be found in the Supplementary Material.
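
When measuring group misclassification in a simulation, group labels are arbitrary, so estimated and true memberships need to be compared up to a relabeling. A minimal sketch of one way to do this, assuming memberships are coded 0 to G-1; the function name `misclassification_rate` is hypothetical, and the brute-force search over permutations is only practical for small $G$.

```python
from itertools import permutations
import numpy as np

def misclassification_rate(g_hat, g_true, G):
    """Share of units assigned to the wrong group, minimized over relabelings of g_hat."""
    g_hat = np.asarray(g_hat)
    g_true = np.asarray(g_true)
    best = 1.0
    for perm in permutations(range(G)):
        relabeled = np.array(perm)[g_hat]      # rename estimated group k to perm[k]
        best = min(best, float(np.mean(relabeled != g_true)))
    return best

print(misclassification_rate([1, 1, 0, 0, 2, 2], [0, 0, 1, 1, 2, 2], G=3))
```

In the printed example the estimated partition matches the true one after swapping labels 0 and 1, so no unit counts as misclassified.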



4. APPLICATION: INCOME AND (WAVES OF) DEMOCRACY

The left and middle panels show that this pattern is mostly
driven by a decrease in the coefficient of lagged democracy. This is consistent
with unobserved country heterogeneity being positively correlated with lagged
democracy, causing an upward bias in OLS.


Grouped fixed-effects (GFE) offers a flexible yet parsimonious approach to model unobserved heterogeneity. The approach delivers estimates of common regression parameters, together with interpretable estimates of group-specific time patterns and group membership. The framework allows for strictly exogenous covariates and lagged outcomes. It also easily accommodates unit-specific fixed-effects in addition to the time-varying grouped patterns, and grouped heterogeneity in coefficients. Importantly, the relationship between group membership and observed covariates is left unrestricted.
The GFE approach should be useful in applications where time-varying grouped effects may be present in the data. As a first example, the empirical analysis of the evolution of democracy shows evidence of a clustering of political regimes and transitions. More generally, GFE should be well-suited in difference-in-difference designs, as a way to relax parallel trend assumptions. Other potential applications include models of social interactions and spatial dependence where the reference groups or the spatial weights matrix are estimated from the panel data.
The extension to nonlinear models is a natural next step. While it is possible to define GFE estimators in more general models (see, e.g., equation (9)), the analysis raises statistical challenges. One area of applications is static or dynamic discrete choice modeling, where a discrete specification of unobserved heterogeneity may be appealing (Kasahara and Shimotsu (2009), Browning and Carro (2014)). See Saggio (2012) for a first attempt in this direction.
Lastly, another interesting extension is to relax the assumption that there is a finite number of well-separated groups in the population. As an alternative approach, one could view the grouped model as an approximation to the underlying data generating process, and characterize the statistical properties of GFE as the number of groups $G$ increases with the two dimensions of the panel.