# <center>Generalized Method of Moments</center>
### <center>Alastair R. Hall</center>

## Introduction
<div style="font-family:'Optima'">
1. **Generalized Method of Moments in Econometrics**  
  The Generalized Method of Moments (GMM) is central to modern econometrics. Before GMM, Maximum Likelihood estimation (MLE), first introduced in the early 20th century, was regarded as the best available estimator within the classical statistics paradigm. However, its dependence on the probability distribution of the data (through the likelihood function) can be restrictive. Two examples follow:  
  - Sensitivity of statistical properties to the distributional assumption:  
  <span style="font-family:'Optima-Italic'; color: #666666">The desirable statistical properties of MLE are only attained if the distribution is correctly specified. Unfortunately, economic theory rarely provides the complete specification of the probability distribution of the data. One solution is to choose a distribution arbitrarily. However, unless this guess coincides with the truth, the resulting estimator is no longer optimal and, worse still, its use may lead to biased inferences.  </span>
  - Computational burden:  
  <span style="font-family:'Optima-Italic'; color: #666666">Maximum Likelihood estimation can be computationally very burdensome. In some cases, the economic model coincides with the joint probability distribution of the data but the implied likelihood function is extremely difficult to evaluate numerically. In other cases, the economic model only involves some aspects of the probability distribution and the completion of the specification introduces many additional parameters which must also be estimated.</span>  

  In contrast, the GMM framework provides a computationally convenient method of performing inference in these models without the need to specify the likelihood function.

2. **Population Moment Conditions and the Statistical Antecedents of GMM**  
  The term population moment was originally used in statistics to denote the expectation of the polynomial powers of a random variable. So if $v_t$ is a discrete random variable with probability mass function $P(v_t=v)$ defined on a sample space $\mathcal{V}$, then its $r^{th}$ population moment is given by  

  $$
  E[v_t^r]=\underset{\{v\in\mathcal{V}\}}{\sum}v^rP(v_t=v)=\nu_r
  $$  

  where the summation is over all values in $\mathcal{V}$ and $r$ is a positive integer. If $v_t$ is a continuous random variable with probability density function $p(v)$ then its $r^{th}$ moment is given by  
  
  $$
  E[v_t^r]=\int_{-\infty}^{\infty}v^rp(v)dv=\nu_r
  $$  
  
  From the above, the population mean is the first population moment $\nu_1$, and the population variance is $\nu_2-\nu_1^2$. Now, consider a distribution from the Pearson family with population mean $\mu_0$ and population variance $\sigma_0^2$. These satisfy the relations  
  
  $$
  \begin{aligned}
      E[v_t]-\mu_0&=0\\
      E[v_t^2]-(\sigma_0^2+\mu_0^2)&=0
  \end{aligned}\tag{1.1}
  $$  
  
  which are now called the [population moment conditions](#population moment condition).  
  
  1. **Method of Moments**  
    Pearson's method involves estimating ($\mu_0,\sigma_0^2$) by the values ($\hat\mu_T,\hat\sigma_T^2$) which satisfy the analogous sample moment conditions, indexed by the sample size $T$. Therefore, ($\hat\mu_T,\hat\sigma_T^2$) are the solutions to  
  
    $$
    \begin{aligned}
        T^{-1}\sum_{t=1}^Tv_t-\hat\mu_T&=0\\
        T^{-1}\sum_{t=1}^Tv_t^2-(\hat\sigma_T^2+\hat\mu_T^2)&=0
    \end{aligned}
    $$  
      
    and thus with rearrangement, it follows that  
    
    $$
    \begin{aligned}
        \hat\mu_T&=T^{-1}\sum_{t=1}^Tv_t\\
        \hat\sigma_T^2&=T^{-1}\sum_{t=1}^T(v_t-\hat\mu_T)^2
    \end{aligned}\tag{1.2}
    $$
    
    which was called by Pearson the "Method of Moments" for obvious reasons. However, as all the higher moments depend on ($\mu_0,\sigma_0^2$), this technique could have been applied equally well to the $3^{rd}$ and $4^{th}$ moments, say, of the distribution. In these cases, the resulting estimators of ($\mu_0,\sigma_0^2$) would be different.  
  
    Let's consider another weakness inherent in the Method of Moments framework. Suppose we now base estimation of ($\mu_0,\sigma_0^2$) on the first three moments of $v_t$, that is  
  
    $$
    \begin{aligned}
        E[v_t]-\mu_0&=0\\
        E[v_t^2]-(\sigma_0^2+\mu_0^2)&=0\\
        E[v_t^3]-3E[v_t^2]\mu_0+3E[v_t]\mu_0^2-\mu_0^3&=0
    \end{aligned}\tag{1.3}
    $$
    
    In this case, we form a system of three equations in two unknowns, and such a system typically has no solution, which means the Method of Moments is infeasible. Clearly, this problem is not specific to this example. Some modification is needed in order to produce estimates of $p$ parameters based on more than $p$ population moment conditions. This brings us to the second important statistical antecedent of GMM, namely the method of Minimum Chi-Square.
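As a minimal sketch of the estimators in (1.2) and of the difficulty with (1.3), the following Python snippet computes $(\hat\mu_T,\hat\sigma_T^2)$ from a small made-up sample and then evaluates the sample analog of the third condition at $\hat\mu_T$. The data and the function names are illustrative assumptions, not part of the original text.

```python
def method_of_moments(v):
    """Solve the two sample moment conditions for (mu_hat, sigma2_hat), as in (1.2)."""
    T = len(v)
    mu_hat = sum(v) / T
    sigma2_hat = sum((x - mu_hat) ** 2 for x in v) / T
    return mu_hat, sigma2_hat


def third_condition_residual(v, mu):
    """Sample analog of the third condition in (1.3), evaluated at mu."""
    T = len(v)
    m1 = sum(v) / T
    m2 = sum(x ** 2 for x in v) / T
    m3 = sum(x ** 3 for x in v) / T
    return m3 - 3 * m2 * mu + 3 * m1 * mu ** 2 - mu ** 3


sample = [1.0, 2.0, 2.5, 4.0, 7.0]  # made-up data, for illustration only
mu_hat, sigma2_hat = method_of_moments(sample)
residual = third_condition_residual(sample, mu_hat)
print(mu_hat, sigma2_hat, residual)
```

The first two sample conditions already pin down both parameters, so the third residual is generally nonzero at those values. This is the infeasibility described above.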
    
  2. **Minimum Chi-Square**  
    The method of Minimum Chi-Square was originally proposed to facilitate inference about whether or not an observed sample was generated from a particular distribution, but the basic idea can be applied to estimation in a wide variety of problems, including the estimation of ($\mu_0,\sigma_0^2$) based on the aforementioned equations. However, it is instructive to introduce the method in the context of the specific example considered by Neyman and Pearson (1928). They considered the particular case in which a researcher wishes to model the probability that the outcome of an experiment lies in one of $k$ mutually exclusive and exhaustive groups. If $p_i$ is used to denote the probability that the outcome lies in the $i^{th}$ group, then the null hypothesis of interest is that
    
    $$
    p_i=h(i;\theta_0)\tag{1.4}
    $$
    
    where $h(.)$ is some specified functional form indexed by an unknown parameter vector $\theta_0$. The question was how to test this hypothesis. Karl Pearson (1900) had shown that inference could be based on the goodness of fit statistic
    
    $$
    GF_T(\theta_0)=\sum_{i=1}^k\dfrac{[T_i-Th(i;\theta_0)]^2}{T_i}\tag{1.5}
    $$
    
    where $T_i$ is the frequency of outcomes in the $i^{th}$ group in a sample of size $T$. Pearson (1900) showed that this statistic was approximately distributed $\chi_{k-1}^2$ under the null hypothesis. Neyman and Pearson recognized that if $\theta_0$ is unknown then the goodness of fit statistic can provide the basis for estimation of $\theta_0$ as well as inference about the null hypothesis. Their idea was to estimate $\theta_0$ by $\hat\theta_T$, the value of $\theta$ which minimizes the goodness of fit statistic. This was later referred to as a "Minimum Chi-Square estimator". Furthermore, they showed that under the null hypothesis, $GF_T(\hat\theta_T)$ is approximately distributed $\chi_{k-1-p}^2$ where $p$ is the dimension of $\theta_0$.
    
    In order to develop the connection between this example and the ($\mu_0,\sigma_0^2$) estimation, it is necessary to rewrite the goodness of fit statistic as
    
    $$
    GF_T(\theta_0)=T\sum_{i=1}^k\dfrac{[\hat p_i-h(i;\theta_0)]^2}{\hat p_i}\tag{1.6}
    $$
    
    where $\hat p_i=T_i/T$, the relative frequency in the sample of outcomes in the $i^{th}$ group. Now consider the set of indicator variables $\{D_t(i);i=1,2,\ldots,k;t=1,2,\ldots,T\}$ which take the value one if the $t^{th}$ outcome of the experiment lies in the $i^{th}$ group and take the value zero otherwise. Under the null hypothesis it follows that $\mathbb{P}[D_t(i)=1]=h(i;\theta_0)$ and hence that $E[D_t(i)]=h(i;\theta_0)$. So, the null hypothesis implies the following vector of $k$ population moment conditions
    
    $$
    E
    \begin{bmatrix}
    D_t(1)-h(1;\theta_0)\\
    D_t(2)-h(2;\theta_0)\\
    .\\
    .\\
    D_t(k)-h(k;\theta_0)
    \end{bmatrix}
    =0\tag{1.7}
    $$
    
    Since $\sum_{i=1}^k\{D_t(i)-h(i;\theta_0)\}=0$ by definition, only $k-1$ population moment conditions actually provide unique information about $\theta_0$. However, we retain all $k$ to elicit the connection with the goodness of fit statistic. If $k-1\ge p$, which we have assumed implicitly all along, then these population moment equations can be used to estimate $\theta_0$. The sample analogs to (1.7) are given by
    
    $$
    \begin{bmatrix}
    \hat p_1-h(1;\theta)\\
    \hat p_2-h(2;\theta)\\
    .\\
    .\\
    \hat p_k-h(k;\theta)\\
    \end{bmatrix}=0\tag{1.8}
    $$
    
    The elements on the left-hand side can be recognized as the same terms which appear inside the square in the numerator of the version of the goodness of fit statistic in (1.6). We are now in a position to establish the connection between Minimum Chi-Square estimation of $\theta_0$ and estimation based on the population moment conditions in (1.7). First consider the case in which there are as many unique moment conditions as unknown parameters, that is $k-1=p$. By definition, the Method of Moments estimator, $\hat\theta_T$ say, satisfies $\hat p_i-h(i;\hat\theta_T)=0$ for $i=1,2,\ldots,p$, and therefore for $i=k$ as well, because the $k$ sample conditions sum to zero. This property implies that $GF_T(\hat\theta_T)=0$, and since $GF_T(\theta)\ge 0$, it must follow that $\hat\theta_T$ also minimizes $GF_T(\theta)$. So if $k-1=p$ then the Minimum Chi-Square estimator is just the Method of Moments estimator based on (1.7). Now consider the case in which there are more unique moment conditions than parameters, that is $k-1>p$. In this case, the principle of Method of Moments estimation does not work, but Minimum Chi-Square is still valid. The key difference is that Method of Moments is defined as the solution to a set of moment conditions and this solution only exists if $k-1=p$, whereas Minimum Chi-Square is defined in terms of a minimization, which can be performed for any $k-1\ge p$. This suggests that to estimate ($\mu_0,\sigma_0^2$) from the first three moments of the distribution, it is necessary to formulate the estimation in terms of a minimization. To implement such a strategy, it is necessary to specify an appropriate minimand. Once again, Minimum Chi-Square provides the answer. It is easily verified that
    
    $$
    GF_T(\theta)=T
    \begin{bmatrix}
    \hat p_1-h(1;\theta)\\
    \hat p_2-h(2;\theta)\\
    .\\
    .\\
    \hat p_k-h(k;\theta)
    \end{bmatrix}^{\ \prime}
    \begin{bmatrix}
    \hat p_1^{-1} & 0 & . & . & 0\\
    0 & \hat p_2^{-1} & . & . & 0\\
    . & . & . & . & .\\
    . & . & . & . & .\\
    0 & 0 & . & . & \hat p_k^{-1}
    \end{bmatrix}
    \begin{bmatrix}
    \hat p_1-h(1;\theta)\\
    \hat p_2-h(2;\theta)\\
    .\\
    .\\
    \hat p_k-h(k;\theta)
    \end{bmatrix}\tag{1.9}
    $$
    
    and so $GF_T(\theta)$ can be interpreted as a quadratic form in the sample moment conditions (1.8). Notice that the matrix in the center of (1.9) is positive definite by construction and so ensures that $GF_T(\theta)\ge 0$. This structure leads to the following intuitively appealing interpretation of the Minimum Chi-Square estimator: it is the value of $\theta$ which is closest to solving the sample moment conditions in the metric of $GF_T(\theta)$. It takes only a little reflection to realize that the same approach can be applied to the estimation of any problem in which there are more moments than parameters to be estimated. To illustrate how, let us return to estimation of ($\mu_0,\sigma_0^2$) based on (1.1)-(1.3). For this problem, the minimand takes the form
    
    $$
    MC_T(\mu,\sigma^2)=
    \begin{bmatrix}
    m_v(1)-\mu\\
    m_v(2)-(\sigma^2+\mu^2)\\
    m_v(3)-3m_v(2)\mu+3m_v(1)\mu^2-\mu^3
    \end{bmatrix}^{\ \prime}
    M_T
    \begin{bmatrix}
    m_v(1)-\mu\\
    m_v(2)-(\sigma^2+\mu^2)\\
    m_v(3)-3m_v(2)\mu+3m_v(1)\mu^2-\mu^3
    \end{bmatrix}
    \tag{1.10}
    $$
    
    where $M_T$ is a positive definite matrix which may depend on $T$, and $m_v(i)=T^{-1}\sum_{t=1}^Tv_t^i$. The Minimum Chi-Square estimators of ($\mu_0,\sigma_0^2$) are the values of ($\mu,\sigma^2$) that minimize $MC_T(\mu,\sigma^2)$.
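One way to carry out this minimization numerically is sketched below. It assumes $M_T$ is the identity matrix unless another positive definite matrix is supplied, uses a Gauss-Newton iteration with step halving (a choice made here for illustration, not prescribed by the text), and starts from the Method of Moments estimates in (1.2). All function names and the sample are hypothetical, and $\sigma^2$ is not constrained to be positive.

```python
import numpy as np


def sample_moments(v):
    """Return m_v(1), m_v(2), m_v(3), the first three raw sample moments."""
    v = np.asarray(v, dtype=float)
    return np.array([np.mean(v), np.mean(v ** 2), np.mean(v ** 3)])


def g(params, m):
    """Sample moment vector appearing in (1.10), at params = (mu, sigma2)."""
    mu, s2 = params
    return np.array([
        m[0] - mu,
        m[1] - (s2 + mu ** 2),
        m[2] - 3.0 * m[1] * mu + 3.0 * m[0] * mu ** 2 - mu ** 3,
    ])


def jac(params, m):
    """Jacobian of g with respect to (mu, sigma2)."""
    mu, _ = params
    return np.array([
        [-1.0, 0.0],
        [-2.0 * mu, -1.0],
        [-3.0 * m[1] + 6.0 * m[0] * mu - 3.0 * mu ** 2, 0.0],
    ])


def mc(params, m, M):
    """Minimand MC_T = g' M g."""
    gv = g(params, m)
    return float(gv @ M @ gv)


def minimize_mc(v, M=None, n_iter=50):
    """Gauss-Newton with step halving, started from the estimates in (1.2)."""
    m = sample_moments(v)
    M = np.eye(3) if M is None else M
    theta = np.array([m[0], m[1] - m[0] ** 2])  # Method of Moments start
    for _ in range(n_iter):
        gv, J = g(theta, m), jac(theta, m)
        step = np.linalg.solve(J.T @ M @ J, J.T @ M @ gv)
        scale = 1.0
        # Halve the step until the minimand does not increase.
        while scale > 1e-8 and mc(theta - scale * step, m, M) > mc(theta, m, M):
            scale /= 2.0
        if scale <= 1e-8:
            break
        theta = theta - scale * step
    return theta, mc(theta, m, M)


sample = [1.0, 2.0, 2.5, 4.0, 7.0]  # made-up data, for illustration only
theta_hat, mc_min = minimize_mc(sample)
print(theta_hat, mc_min)
```

At the starting values the first two elements of the moment vector are zero, so the initial value of the minimand is the squared third sample central moment. The iteration then trades off all three conditions against one another.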
  
  3. **Instrumental Variables (IV)**  
    Unlike Method of Moments and Minimum Chi-Square, IV was specifically developed to exploit the information in moment conditions for the estimation of structural economic models. To illustrate we consider the system of equations
    
    $$
    \begin{aligned}
    q_t^D&=\alpha_0 p_t+u_t^D\\
    q_t^S&=\beta_{0,1}^{\prime}n_t+\beta_{0,2}p_t+u_t^S\\
    q_t^D&=q_t^S=q_t
    \end{aligned}\tag{1.11}
    $$
    
    where $q_t^D$, $q_t^S$ represent demand and supply in year $t$, $p_t$ is the price of the commodity in that year and $n_t$ is a vector containing factors that affect supply. The market is assumed to clear and the total quantity produced is denoted $q_t$. For our purpose here, it suffices to consider the problem of how to estimate $\alpha_0$ given a sample of $T$ observations on $q_t$ and $p_t$. An Ordinary Least Squares (OLS) regression of $q_t$ on $p_t$ runs into problems here because price and output are simultaneously determined and this causes OLS estimates to be biased. Sewall Wright solved these problems as follows. Suppose there is an observable variable $z_t^D$ which is related to price but satisfies $Cov[z_t^D,u_t^D]=0$. An example would be any of the factors that affect supply, such as an input price or yield per acre. Then by taking the covariance of $z_t^D$ with both sides of the demand equation in (1.11) it follows that
    
    $$
    Cov[z_t^D,q_t]-\alpha_0Cov[z_t^D,p_t]=0\tag{1.12}
    $$
    
    It is convenient to simplify this moment condition using other properties of the model. Typically, it is assumed that $E[u_t^D]=0$ and thus $E[q_t]=\alpha_0E[p_t]$. Using this identity in (1.12), the moment condition can be rewritten as
    
    $$
    E[z_t^Dq_t]-\alpha_0E[z_t^Dp_t]=0\tag{1.13}
    $$
    
    Equation (1.13) provides a population moment condition involving the observable variables and the unknown parameter, $\alpha_0$, which can be used as a basis for estimation. Pearson's Method of Moments principle leads to the estimation of the parameter by the value which solves the analogous sample moment condition, namely
    
    $$
    \hat\alpha_T=\dfrac{\sum_{t=1}^Tz_t^Dq_t}{\sum_{t=1}^Tz_t^Dp_t}\tag{1.14}
    $$
    
    This estimator is now called the Instrumental Variables estimator, with $z_t^D$ being referred to as the "instrument".
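The contrast between OLS and the estimator in (1.14) can be illustrated with a small simulation. The sketch below assumes a scalar supply shifter $n_t$ used as the instrument, standard normal shocks, and the parameter values $\alpha_0=-1$, $\beta_{0,1}=\beta_{0,2}=1$. These choices, and all function names, are hypothetical and made only for illustration.

```python
import random


def simulate_market(T, alpha0=-1.0, beta1=1.0, beta2=1.0, seed=0):
    """Draw T market-clearing (z, q, p) triples from a scalar version of (1.11)."""
    rng = random.Random(seed)
    z, q, p = [], [], []
    for _ in range(T):
        n = rng.gauss(0.0, 1.0)    # supply shifter, used as the instrument
        u_d = rng.gauss(0.0, 1.0)  # demand shock
        u_s = rng.gauss(0.0, 1.0)  # supply shock
        # Market clearing: alpha0*p + u_d = beta1*n + beta2*p + u_s
        price = (beta1 * n + u_s - u_d) / (alpha0 - beta2)
        z.append(n)
        p.append(price)
        q.append(alpha0 * price + u_d)
    return z, q, p


def iv_estimate(z, q, p):
    """IV estimator in (1.14): ratio of sample cross moments."""
    return sum(a * b for a, b in zip(z, q)) / sum(a * b for a, b in zip(z, p))


def ols_estimate(q, p):
    """OLS slope from regressing q on p without an intercept."""
    return sum(a * b for a, b in zip(p, q)) / sum(a * a for a in p)


z, q, p = simulate_market(20000)
alpha_iv = iv_estimate(z, q, p)
alpha_ols = ols_estimate(q, p)
print(alpha_iv, alpha_ols)
```

In this particular design the price is positively correlated with the demand shock, so the OLS slope is pulled away from $\alpha_0=-1$ toward roughly $-1/3$, while the IV estimate stays close to $-1$.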
    
  4. **Some definitions**  
    **Population Moment Condition** Let $\theta_0$ be a vector of unknown parameters which are to be estimated, $v_t$ a vector of random variables and $f(.)$ a vector of functions. Then a population moment condition takes the form
    
    $$
    E[f(v_t,\theta_0)]=0\tag{1.17}
    $$
    
    for all $t$.  
    **Generalized Method of Moments Estimator** The Generalized Method of Moments estimator based on (1.17) is the value of $\theta$ which minimizes
    
    $$
    Q_T(\theta)=T^{-1}\sum_{t=1}^Tf(v_t,\theta)^{\prime}W_TT^{-1}\sum_{t=1}^Tf(v_t,\theta)\tag{1.18}
    $$
    
    where $W_T$ is a positive semi-definite matrix which may depend on the data but converges in probability to a positive definite matrix of constants.
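The minimand in (1.18) can be written as a short generic function. The sketch below applies it to the grouped-data example of (1.7), assuming a hypothetical one-parameter model in which the group is the number of successes in two Bernoulli($\theta$) trials, made-up counts, $W_T=\text{diag}(\hat p_i^{-1})$, and a simple grid search in place of a proper optimizer. With this weighting matrix, $T\cdot Q_T(\theta)$ coincides with the goodness of fit statistic in (1.6).

```python
import numpy as np


def gmm_objective(theta, data, f, W):
    """Q_T(theta) in (1.18): quadratic form in the sample mean of f(v_t, theta)."""
    g_bar = np.mean([f(v, theta) for v in data], axis=0)
    return float(g_bar @ W @ g_bar)


def h(theta):
    """Hypothetical cell probabilities for 0, 1 or 2 successes in two trials."""
    return np.array([(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2])


def f(v, theta):
    """Moment function D_t - h(theta); v is the observed group index 0, 1 or 2."""
    return np.eye(3)[v] - h(theta)


counts = np.array([30, 50, 20])          # made-up group frequencies T_i
data = np.repeat(np.arange(3), counts)   # one group index per observation
p_hat = counts / counts.sum()
W = np.diag(1.0 / p_hat)                 # weighting matrix as in (1.9), without the factor T

grid = np.arange(1, 1000) / 1000.0       # candidate values of theta in (0, 1)
q_values = np.array([gmm_objective(th, data, f, W) for th in grid])
theta_hat = grid[np.argmin(q_values)]
print(theta_hat, counts.sum() * q_values.min())
```

Here $k-1=2>p=1$, so the sample moment conditions cannot all be set to zero and the estimate is the value of $\theta$ that comes closest in the metric defined by $W_T$.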
    
3. **Five Examples of Moment Conditions in Economic Models**
    1. **Consumption-Based Asset Pricing Model**
    2. **Evaluation of Mutual Fund Performance**
    3. **Conditional Capital Asset Pricing Model**
    4. **Inventory Holdings by Firms**
    5. **Stochastic Volatility Models of Exchange Rates**
4. **Review of Statistical Theory**
    1. **Properties of Random Sequences**  
      We begin with a sequence that is deterministic and so not random. Let $\{h_T;T=1,2,\ldots\}$ be a sequence of real numbers. If this sequence has a limit, $h$, then this is denoted by
      
      $$
      \lim_{T\to\infty}h_T=h
      $$
      
      which implies that for every $\epsilon>0$ there is a positive, finite integer $T_{\epsilon}$ such that
      
      $$
      \left|h_T-h\right|<\epsilon\quad\text{for}\quad T>T_{\epsilon}.
      $$
      
      Now let $\{h_T\}$ be a sequence of random variables. **Convergence in Probability** of $h_T$ to $h$ means that for all $\epsilon>0$
      
      $$
      \lim_{T\to\infty}P\left[\left|h_T-h\right|<\epsilon\right]=1
      $$
      
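      In this case $h$ is known as the probability limit, or plim, of $h_T$ and this is denoted by $\text{plim }h_T=h$ or $h_T\overset{p}{\to}h$.  
      **Orders in Probability** For a sequence of positive constants $c_T$, $h_T=O_p(c_T)$ ("at most of order $c_T$ in probability") if $h_T/c_T$ is bounded in probability, and $h_T=o_p(c_T)$ ("of smaller order than $c_T$ in probability") if $h_T/c_T\overset{p}{\to}0$.  
      **Slutsky's Theorem** For a continuous vector function $f(.)$, we have $f(h_T)\overset{p}{\to}f(h)$ if the random vector $h_T\overset{p}{\to}h$.  
      **Consistency of an Estimator** means $\text{plim }\hat\theta_T=\theta_0$.  
      **Convergence in Distribution** means $F_{h_T}(c)\to F(c)$ at every continuity point $c$ of $F$, where $F_{h_T}$ is the distribution function of $h_T$.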
    2. **Stationary Time Series, the Weak Law of Large Numbers and the Central Limit Theorem**  
      We introduce the following concepts for later use.  
      **Strictly Stationary Processes** means the joint probability distribution function, $F(.)$, of any subset of random vector sequence $\{v_t;t\in\mathcal{N}(T)\}$ satisfies:
      
      $$
      F(v_{t_1},v_{t_2},\ldots,v_{t_n})=F(v_{t_1+c},v_{t_2+c},\ldots,v_{t_n+c})
      $$
      
      for any integer $n$ and integer constant $c$ such that $\{v_{t_1+c},v_{t_2+c},\ldots,v_{t_n+c}\}$ is a subset of $\mathcal{N}(T)$.  
      **Weak Law of Large Numbers (WLLN)** means
      
      $$
      T^{-1}\sum_{t=1}^T v_t\overset{p}{\to}\mu
      $$
      
      **Central Limit Theorem (CLT)** means
      
      $$
      T^{-1/2}\sum_{t=1}^T(v_t-\mu)\overset{d}{\to}N(0,\Sigma)
      $$
      
      where $N(0,\Sigma)$ denotes the $s$ dimensional multivariate normal distribution with mean $0$ and positive definite covariance matrix
      
      $$
      \Sigma=\lim_{T\to\infty}\text{Var}[T^{-1/2}\sum_{t=1}^T(v_t-\mu)]
      $$
      
      The matrix $\Sigma$ is called the long run covariance matrix of $v_t$ to distinguish it from the contemporaneous covariance matrix $E\left[(v_t-\mu)(v_t-\mu)^{\prime}\right]$.  
      **The Limiting Distribution of Random Linear Functions of Vector Converging to a Normal Distribution** means given a sequence of random matrices $M_T\overset{p}{\to}M$ a constant matrix, and a sequence of random vectors $h_T\overset{d}{\to}N(0,\Sigma)$, then we have
      
      $$
      M_Th_T\overset{d}{\to}N(0,M\Sigma M^{\prime}).
      $$
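A small simulation can make the WLLN and CLT statements concrete. The sketch below assumes scalar i.i.d. exponential draws with mean $\mu=1$, for which the long run variance $\Sigma$ equals the ordinary variance, also $1$. The distribution, sample sizes and names are arbitrary illustrative choices.

```python
import random
import statistics


def scaled_sum(T, rng, mu=1.0):
    """T^{-1/2} * sum_t (v_t - mu) for i.i.d. exponential draws with mean mu = 1."""
    return sum(rng.expovariate(1.0) - mu for _ in range(T)) / T ** 0.5


rng = random.Random(0)

# WLLN: the sample mean of a long i.i.d. sample is close to mu = 1.
T_large = 20000
sample_mean = sum(rng.expovariate(1.0) for _ in range(T_large)) / T_large

# CLT: across replications the scaled sums have mean near 0 and variance near Sigma = 1.
draws = [scaled_sum(200, rng) for _ in range(400)]
clt_mean = statistics.mean(draws)
clt_var = statistics.variance(draws)
print(sample_mean, clt_mean, clt_var)
```

With dependent data the variance of the scaled sums would instead approach the long run variance, which also reflects the autocovariances of $v_t$.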
      
5. **Overview of Later Chapters**

## **The Instrumental Variable Estimator in Linear Regression Models**
<div style='font-family:Optima'>
1. **The Population Moment Condition and Parameter Identification**  
  Consider the linear regression model  
  
  $$
  y_t=x_t^{\prime}\theta_0+u_t,\quad t=1,2,\ldots,T
  $$
  
  in which $x_t$ is a $(p\times1)$ vector of observed explanatory variables for the observed variable $y_t$, and $u_t$ is the unobserved error term. The $(p\times1)$ vector $\theta_0$ is an element of the parameter space $\Theta$, a subset of the $p$-dimensional Euclidean space $\mathfrak{R}^p.$ The instruments are contained in the $(q\times1)$ vector $z_t$. To facilitate the discussion, it is useful to define: $u_t(\theta)=y_t-x_t^{\prime}\theta$. Notice that $u_t(\theta_0)=u_t$. As the analysis progresses, certain restrictions need to be placed on the variables, but these will only be imposed as they become necessary, to emphasize their role. At this stage, we only require the following.  
  **Assumption 1. Strict Stationarity** requires that the random vector $v_t=(x_t^{\prime},z_t^{\prime},u_t)^{\prime}$ is a strictly stationary process. This implies that any population moments of $v_t$ are independent of $t$.  
  **Assumption 2. Population Moment Condition** requires that the $(q\times1)$ vector $z_t$ satisfies $E[z_tu_t(\theta_0)]=0$. This is also referred to as an "orthogonality condition" since it states that $z_t$ is statistically orthogonal to $u_t$.  
  **Assumption 3. Identification Condition** requires that $\text{rank }E[z_tx_t^{\prime}]=p$. Together with Assumption 2, this implies that $\theta_0$ is the unique value of $\theta$ in the parameter space at which $E[z_tu_t(\theta)]=0$.
2. **The Estimator and a Fundamental Decomposition**  
  Let $y$ be a $(T\times1)$ vector and $X$ a $(T\times p)$ matrix with $t^{th}$ row as $x_t^{\prime}$. Similarly we define the $(T\times q)$ matrix $Z$ and the vector $u(\theta)$. So now it holds that $u(\theta)=y-X\theta$. Using this notation, the GMM minimand for this model is 
  
  $$
  Q_T(\theta)=\left\{T^{-1}u(\theta)^{\prime}Z\right\}W_T\left\{T^{-1}Z^{\prime}u(\theta)\right\}
  $$
  
  and the GMM estimator is defined as 
  
  $$
  \hat\theta_T=\text{argmin}_{\theta\in\Theta}Q_T(\theta)
  $$
  
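  Because $Q_T(\theta)$ is quadratic in $\theta$, the first order conditions can be solved in closed form (provided $X^{\prime}ZW_TZ^{\prime}X$ is nonsingular) to give 
  
  $$
  \hat\theta_T=\left(X^{\prime}ZW_TZ^{\prime}X\right)^{-1}X^{\prime}ZW_TZ^{\prime}y
  $$
  
  In the just-identified case $q=p$, with $Z^{\prime}X$ and $W_T$ nonsingular, this reduces to $\hat\theta_T=\left(T^{-1}Z^{\prime}X\right)^{-1}\left(T^{-1}Z^{\prime}y\right)$, which does not depend on $W_T$.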
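The following sketch computes the minimizer of $Q_T(\theta)$ numerically, assuming the standard closed form for the linear model, $\hat\theta_T=(X^{\prime}ZW_TZ^{\prime}X)^{-1}X^{\prime}ZW_TZ^{\prime}y$, which collapses to $(Z^{\prime}X)^{-1}Z^{\prime}y$ when $q=p$. The simulated design (three instruments, one endogenous regressor, $\theta_0=2$) and the function name are hypothetical.

```python
import numpy as np


def linear_gmm(y, X, Z, W):
    """Closed-form minimizer of {u'Z/T} W {Z'u/T} with u = y - X @ theta."""
    T = len(y)
    ZX = Z.T @ X / T  # q x p sample cross moment
    Zy = Z.T @ y / T  # q-vector
    return np.linalg.solve(ZX.T @ W @ ZX, ZX.T @ W @ Zy)


rng = np.random.default_rng(0)
T = 500
Z = rng.normal(size=(T, 3))                               # q = 3 instruments
e = rng.normal(size=T)
X = (Z @ np.array([1.0, 0.5, -0.5]) + e).reshape(-1, 1)   # p = 1 regressor
u = 0.5 * e + rng.normal(size=T)                          # error correlated with x_t
theta0 = np.array([2.0])
y = X @ theta0 + u

theta_over = linear_gmm(y, X, Z, np.eye(3))               # over-identified, W_T = I

Z1 = Z[:, :1]                                             # keep one instrument so q = p
theta_w1 = linear_gmm(y, X, Z1, np.array([[1.0]]))
theta_w7 = linear_gmm(y, X, Z1, np.array([[7.0]]))
theta_iv = np.linalg.solve(Z1.T @ X, Z1.T @ y)
print(theta_over, theta_w1, theta_w7, theta_iv)
```

In the just-identified case the two weighting matrices give the same estimate as $(Z^{\prime}X)^{-1}Z^{\prime}y$. The choice of $W_T$ only matters when $q>p$.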
  
3. **Asymptotic Properties**  
  GMM estimation generates two important statistics which play a central role in inference about the underlying model: the parameter estimator and the estimated sample moment. Since the latter depends on the former, it makes most sense to begin our discussion of their asymptotic properties with the parameter estimator, and then use these results to analyze the behaviour of the estimated sample moment. The asymptotic analysis of the parameter estimator focuses on the twin properties of consistency and asymptotic normality. The latter facilitates the construction of large sample confidence intervals for the elements of $\theta_0$. The asymptotic analysis rests on applications of the Weak Law of Large Numbers (WLLN) and Central Limit Theorem (CLT). Our purpose here is to illustrate the basic ideas, so it is convenient to assume away any dependence structure in the data for the time being.  
  **Assumption 4. Independence** requires that the vector $v_t=(x_t^{\prime},z_t^{\prime},u_t)^{\prime}$ is independent of $v_{t+s}$ for all $s\not =0$. Together with Assumption 1, this implies that $v_t$ is an independently and identically distributed process.  
  **Assumption 5. Classical Assumptions about $u_t$** are generally:
      1. $E[u_t]=0$,
      2. $E[u_t^2]=\sigma_0^2$,
      3. $u_t$ and $z_t$ are independent.
4. **The Optimal Choice of Weighting Matrix**  
  It is best to introduce the asymptotic variance of $\hat\theta_T$ here, which is given by
  
  $$
  V(W)=\left\{\text{E}[x_tz_t^{\prime}]W\text{E}[z_tx_t^{\prime}]\right\}^{-1}\text{E}[x_tz_t^{\prime}]WSW\text{E}[z_tx_t^{\prime}]\left\{\text{E}[x_tz_t^{\prime}]W\text{E}[z_tx_t^{\prime}]\right\}^{-1}\tag{2.35}
  $$
  
  where $S=\lim_{T\to\infty}\text{Var}[T^{-1/2}\sum_{t=1}^Tz_tu_t]$ is the long run covariance matrix of $z_tu_t$. The optimal value of $W$, $W^0$ say, is the value that minimizes $V(W)$ in a matrix sense and so satisfies
  
  $$
  V(W)-V(W^0)=\text{a psd matrix}
  $$
  
  where $W$ is any other valid choice of weighting matrix. It can be shown that $W^0=S^{-1}$. Substituting this value into (2.35) yields
  
  $$
  V(S^{-1})=\left\{\text{E}[x_tz_t^{\prime}]S^{-1}\text{E}[z_tx_t^{\prime}]\right\}^{-1}
  $$
  
  The matrix $V(S^{-1})$ represents the efficiency bound for GMM estimation of $\theta_0$ based on the population moment condition $\text{E}[z_tu_t(\theta_0)]=0$ because all other choices of $W$ result in a variance which is at least as large.
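The efficiency bound can be checked numerically for particular population matrices. The sketch below assumes made-up values for $\text{E}[z_tx_t^{\prime}]$ (called `G` here, with $q=3$ and $p=2$) and for a positive definite $S$, and compares $V(W)$ at $W=I$ with $V(S^{-1})$. The names and numbers are illustrative only.

```python
import numpy as np


def asymptotic_variance(G, S, W):
    """V(W) as in (2.35), with G standing for E[z_t x_t'] (q x p)."""
    bread = np.linalg.inv(G.T @ W @ G)
    return bread @ G.T @ W @ S @ W @ G @ bread


# Hypothetical population quantities, chosen only for illustration.
G = np.array([[1.0, 0.2],
              [0.5, 1.0],
              [0.3, -0.4]])
S = np.array([[2.0, 0.5, 0.1],
              [0.5, 1.0, 0.3],
              [0.1, 0.3, 1.5]])

V_identity = asymptotic_variance(G, S, np.eye(3))
V_optimal = asymptotic_variance(G, S, np.linalg.inv(S))

gap = V_identity - V_optimal
gap = (gap + gap.T) / 2.0                 # symmetrize against rounding error
gap_eigs = np.linalg.eigvalsh(gap)
print(gap_eigs)
```

The eigenvalues of the difference are nonnegative up to rounding, which is what "a psd matrix" means here, and `V_optimal` matches $\{\text{E}[x_tz_t^{\prime}]S^{-1}\text{E}[z_tx_t^{\prime}]\}^{-1}$.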
5. **Specification Error: Consequences and Detection**  
  Omitted.
6. **Summary**  
  In this chapter we introduced the main elements of the GMM framework using the example of the IV estimator in the static linear regression model. The advantage of deriving IV in this way is that it enables us to highlight seven key features of the GMM framework:
  1. Identification
  2. Identifying and overidentifying restrictions
  3. Asymptotic properties
  4. Estimated sample moment
  5. Long run covariance estimation
  6. Optimal choice of weighting matrix
  7. Model diagnostics

## **GMM Estimation in Correctly Specified Models**
<div style='font-family:optima'>
1. **Population Moment Condition and Parameter Identification**
2. **The Estimator and Numerical Optimization**
3. **The Identifying and Overidentifying Restrictions**
4. **Asymptotic Properties**
    1. **Consistency of the Parameter Estimator**
    2. **Asymptotic Normality of the Parameter Estimator**
    3. **Asymptotic Normality of the Estimated Sample Moments**
5. **Long Run Covariance Matrix Estimation**
    1. **Serially Uncorrelated Sequences**
    2. **VARMA Processes**
    3. **Heteroscedasticity and Autocorrelation Covariance Matrix Estimators**
6. **The Optimal Choice of Weighting Matrix**
7. **Transformations, Normalizations and the Continuous Updating GMM Estimator**
8. **GMM as a Unifying Principle of Estimation**
    1. **Single Step Estimators**
    2. **Sequential Estimators**
9. **Summary**

---
<div style='font-family:optima'>
1. <p id='population moment condition'> Population moment conditions are functions of model parameters and the data, such that their expectation is zero at the true values of the parameters.</p>