In [1]:
import pandas as pd

This notebook builds on the analysis performed by [Niehaus and Sukhtankar (2013)](https://www.aeaweb.org/articles?id=10.1257/pol.5.4.230). I first assess potential shortcomings of their analysis and then reestimate some of their regressions using the Tobit model of James Tobin (1958). I find stronger substitution effects for panchayats that make use of daily-wage projects, while panchayats that relied only on piece-rate projects showed no increase in daily-wage rent extraction.

The following abbreviations are used frequently throughout the text, so I list them at the beginning of my robustness analysis.

* Bwd.: Backward
* DiD: Difference-in-Difference 
* DW: Daily-Wage
* Forw.: Forward
* Frac.: Fraction
* PR: Piece-Rate

# Table of Contents
* Assessing the Quality of the Analysis
    * Multicollinearity
    * Limited Dependent Variables
    * Multiway Clustering
    * Unblocked Backdoor Paths
* Estimation Strategy
* Results
    * Forward Daily-Wage Fraction
    * Theft on Daily-Wage Projects
    * Theft on Piece-Rate Projects
    * Difference-in-Difference
* Conclusion and Discussion
* References

# Assessing the Quality of the Analysis

### Multicollinearity

When performing the replication in *Python 3.6.9*, the *statsmodels* output warned that multicollinearity was present in all regressions. To assess this problem I calculated the Pearson correlation coefficient between the most important variables before performing additional robustness checks (Table 1). Out of interest I also added the variables $DW\ Diff.$ and $PR\ Diff.$, which are the differences between the official and the actual days worked and amounts paid, respectively. Correlation between the independent variables is mostly weak, with two exceptions. First, the correlation between $Backw.\ fraction\ DW$ and $Forw.\ fraction\ DW$ is 0.789, which is quite high but not surprising. This could become problematic when estimating equation $(6')$. Second, the correlation between $Shock$ and $Day$ is very high (0.866). This is especially problematic for the later regression analysis, since it makes it harder to isolate the effect of the wage hike from the time trend. Furthermore, multicollinearity leads to wider standard errors, although this problem is partially alleviated by the large sample size.
When calculating the direct measure of theft from daily-wage and piece-rate projects I found that some observations were negative, indicating that the actual payment was above the officially reported one. Of course, it might be that the official underreported on that day. However, an official reporting less than the actual payment seems to expose himself to unnecessary risk. Incomplete recall by the survey respondents seems to be the more plausible explanation. If measurement error is present, this could explain the low correlation between the officially reported quantities and the actual quantities. Furthermore, it would bias the estimates if the measurement error is systematic.
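To illustrate why a high correlation between $Shock$ and $Day$ is almost built into this kind of design, the following minimal sketch computes the Pearson correlation between a step indicator and a linear time trend, together with the implied variance inflation factor. The window length and the shock date are hypothetical and are not taken from the dataset; the VIF formula $1/(1-r^2)$ assumes that these two variables are the only regressors.

```python
import numpy as np

# Hypothetical sample window: 120 consecutive days, shock switching on at day 60.
days = np.arange(120)
shock = (days >= 60).astype(float)

# Pearson correlation between the linear trend and the step indicator.
r = np.corrcoef(days, shock)[0, 1]

# Variance inflation factor when only these two regressors enter the model.
vif = 1.0 / (1.0 - r**2)

print(f"corr = {r:.3f}, VIF = {vif:.2f}")
```

A step placed in the middle of the window already produces a correlation of roughly 0.87, so the coefficient variances of both regressors are inflated by a factor of about four relative to the uncorrelated case.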

In [3]:
pd.read_csv("tables/CorTable.csv", index_col = [0], keep_default_na = False)

Unnamed: 0,Off. DW days,Act. DW days,Off. rate PR,Act. rate PR,DW Diff.,Rate PR diff,Forw. fraction DW,Backw. fraction DW,Shock,Day,Day of Month
Off. DW days,1.0,0.234,-0.008,0.031,0.969,-0.013,0.187,0.162,-0.066,-0.091,-0.035
Act. DW days,,1.0,0.054,0.141,-0.014,0.03,0.021,0.047,-0.11,-0.104,-0.003
Off. rate PR,,,1.0,0.124,-0.022,0.986,-0.262,-0.241,-0.054,-0.046,-0.016
Act. rate PR,,,,1.0,-0.004,-0.043,-0.108,-0.118,0.02,0.008,0.014
DW Diff.,,,,,1.0,-0.021,0.187,0.156,-0.04,-0.067,-0.036
Rate PR diff,,,,,,1.0,-0.246,-0.223,-0.057,-0.047,-0.018
Forw. fraction DW,,,,,,,1.0,0.789,0.058,0.064,0.012
Backw. fraction DW,,,,,,,,1.0,-0.063,-0.071,-0.015
Shock,,,,,,,,,1.0,0.866,0.0
Day,,,,,,,,,,1.0,0.238


### Limited Dependent Variables

__Figure 1: Histogram Forward Daily-Wage Fraction (2 month window)__

<img src="figures/Hist_fdwfrac.png" width=720 height=720 />

As can be seen in the plot above, the variable $Fwd.\ DW\ Frac.$ from regression equation $(1')$ in the [Replication notebook](https://github.com/HumanCapitalAnalysis/student-project-LeonardMK/blob/master/Replication%20and%20Causal%20Graphs/Replication.ipynb) is bounded between 0 and 1. When running OLS on all observations the estimates will be biased downwards. This problem isn't addressed by Niehaus and Sukhtankar. In their analysis they find that the shock had no effect on project shelf composition, which led them to the conclusion that officials didn't try to reclassify piece-rate projects as daily-wage projects in order to extract higher rents.  
Censoring is also a problem for the variables $Days\ DW\ Off.$ and $Rate\ PR\ Off.$. Here the value range is limited from below, since it isn't possible to report a negative number of days worked or a negative amount of money paid.

__Figure 2: Histogram Officially reported Days Daily-Wage Projects__

<img src="figures/Hist_daysdw.png" width=720 height=720/>

__Figure 3:  Histogram Officially reported Rate paid for Piece-Rate Projects__

<img src="figures/Hist_rateproff.png" width=720 height=720/>

Not addressing the limited range of the dependent variable will lead to downward-biased coefficients. To solve this problem I make use of the Tobit model introduced by James Tobin (1958). The Tobit model belongs to the class of censored regression models, which fits the context described above well. It assumes that we want to estimate the following regression equation

$$ y_{pt}^{*}=x_{pt}^{'}\beta + \delta_p + \epsilon_{pt} \tag{1}$$

where $p$ and $t$ denote panchayat and time, and $y^{*}$ is a latent variable that coincides with the observed variable whenever it falls in the uncensored value range. For the observed variable it holds that

$$y_{pt} = \begin{cases}
a   &{if\ y_{pt}^{*} \leq a} \\
y_{pt}^{*}   &{if\ y_{pt}^{*}\ \in\ (a, b)} \\
b   &{if\ y_{pt}^{*} \geq b}
\end{cases} \tag{2} $$

The coefficients estimated by the Tobit model don't have the same interpretation as in the case of OLS. A coefficient measures the change in the latent variable $y_{pt}^{*}$; the change in the expected observed outcome $y_{pt}$ equals this coefficient times the probability that $y_{pt}^{*}$ falls in the uncensored range $(a, b)$. Note that for the value range in which we observe $y_{pt}^{*}$ the estimated coefficients have the same interpretation as the usual OLS coefficients. Crucially, Tobit requires an assumption about the error distribution. For my analysis I assume that $\epsilon_{pt} \sim \mathcal{N}(0, \sigma^2)$. Modelling the censored part is then similar to estimating a Probit regression.
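The likelihood behind equations $(1)$ and $(2)$ can be sketched in a few lines of Python. This is only an illustration on simulated data, not the estimation code used in this notebook: the function name `tobit_negloglik`, the simulated coefficients and the censoring bounds are all assumptions, and the fixed effects $\delta_p$ are omitted for brevity.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def tobit_negloglik(params, X, y, lower, upper):
    """Negative log-likelihood of a two-limit Tobit with normal errors."""
    beta, sigma = params[:-1], np.exp(params[-1])  # log-sigma keeps sigma > 0
    xb = X @ beta
    at_lower = y <= lower
    at_upper = y >= upper
    inside = ~(at_lower | at_upper)
    ll = np.empty_like(y, dtype=float)
    # Censored observations contribute the probability mass beyond the bound.
    ll[at_lower] = norm.logcdf((lower - xb[at_lower]) / sigma)
    ll[at_upper] = norm.logsf((upper - xb[at_upper]) / sigma)
    # Uncensored observations contribute the normal density.
    ll[inside] = norm.logpdf((y[inside] - xb[inside]) / sigma) - np.log(sigma)
    return -ll.sum()


# Simulated data (all values hypothetical): latent outcome censored to [0, 1].
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([0.6, 0.5])
y_star = X @ true_beta + rng.normal(scale=0.5, size=n)
y = np.clip(y_star, 0.0, 1.0)

# OLS on the censored outcome serves as baseline and as starting values.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
start = np.append(beta_ols, np.log(y.std()))
fit = minimize(tobit_negloglik, start, args=(X, y, 0.0, 1.0), method="BFGS")
beta_tobit, sigma_tobit = fit.x[:-1], np.exp(fit.x[-1])

print("OLS slope:  ", round(beta_ols[1], 3))
print("Tobit slope:", round(beta_tobit[1], 3))
```

In this simulation the OLS slope is attenuated towards zero because roughly half of the observations sit at a bound, while the Tobit slope recovers the coefficient of the latent equation.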

### Multiway Clustering
    
To account for the heterogeneity in their sample, Niehaus and Sukhtankar use two-way cluster-robust standard errors. They cluster on the panchayat and day level. In the following I refer to A. Colin Cameron and Douglas L. Miller (2015). The two-way cluster-robust covariance estimator assumes that $E[u_i u_j \mid \mathbf{x}_i, \mathbf{x}_j] = 0$ unless $i$ and $j$ share a cluster in at least one of the two dimensions. In the two-way cluster case the covariance matrix can be written in the following form

$$ \hat{V}_{clu}[\hat{\beta}] = (\mathbf{X}^{'} \mathbf{X})^{-1} \hat{\mathbf{B}}_{clu} (\mathbf{X}^{'} \mathbf{X})^{-1} \tag{4} $$

where $\hat{\mathbf{B}}_{clu}$ is defined by

$$ \hat{\mathbf{B}}_{clu} = \sum_{i = 1}^N \sum_{j = 1}^N \mathbf{x}_i \mathbf{x}_j^{'} \hat{u}_i \hat{u}_j \mathbf{1}[i,\ j\ \text{share}\ \textbf{any}\ \text{cluster}] \tag{5} $$

where $\hat{u}_i$ is the residual estimated for observation $i$ and $\mathbf{x}_i$ is a vector containing the independent variables of observation $i$. However, the assumption that errors are uncorrelated beyond the panchayat level might be too restrictive. Compliance is controlled by, among others, block development officers. Blocks might therefore differ in the intensity of enforcement and thus in their level of corruption. This could lead to correlated residuals ($u_i$) at the block level. To control for this I compare the significance of regression estimates clustered by panchayat and day with those obtained from clustering by block and day.
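Equations $(4)$ and $(5)$ translate almost literally into NumPy. The sketch below is an illustration for an OLS fit on a small simulated panel, not the code used for the tables in this notebook; the function names and the simulated data are assumptions, and no finite-sample correction is applied. It also checks the result against the equivalent inclusion-exclusion form (cluster on the first dimension, plus cluster on the second, minus cluster on their intersection).

```python
import numpy as np


def twoway_cluster_cov(X, resid, g1, g2):
    """Two-way cluster-robust covariance following equations (4) and (5).

    Builds the N x N indicator 1[i, j share any cluster] explicitly, which is
    only practical for small N.
    """
    share = (g1[:, None] == g1[None, :]) | (g2[:, None] == g2[None, :])
    scores = X * resid[:, None]            # row i holds x_i * u_i
    B = scores.T @ (share * 1.0) @ scores  # sum_ij x_i x_j' u_i u_j 1[.]
    bread = np.linalg.inv(X.T @ X)
    return bread @ B @ bread


def oneway_meat(X, resid, groups):
    """Sum over clusters g of (X_g' u_g)(X_g' u_g)'."""
    scores = X * resid[:, None]
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        s = scores[groups == g].sum(axis=0)
        meat += np.outer(s, s)
    return meat


# Hypothetical balanced panel: 20 units observed on 15 days.
rng = np.random.default_rng(1)
n_units, n_days = 20, 15
unit = np.repeat(np.arange(n_units), n_days)
day = np.tile(np.arange(n_days), n_units)
n = unit.size
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta

V = twoway_cluster_cov(X, resid, unit, day)

# Inclusion-exclusion check: unit + day - (unit and day).
pair = unit * n_days + day
bread = np.linalg.inv(X.T @ X)
meat = (oneway_meat(X, resid, unit) + oneway_meat(X, resid, day)
        - oneway_meat(X, resid, pair))
V_check = bread @ meat @ bread

print(np.allclose(V, V_check))
```

Replacing `unit` by a coarser grouping, such as a block identifier, gives the alternative clustering discussed above.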

### Unblocked Backdoor Paths

What is striking in Niehaus and Sukhtankar's analysis is the low $R^2$. Throughout all specifications the value is below $0.14$. Although this doesn't have to be a problem, the low $R^2$ could be a sign that a backdoor path between the Shock variable and the dependent variable is unblocked, in which case the estimate of the treatment effect is biased. This situation is depicted in Figure 4. To account for this I try to block backdoor paths by including block instead of district fixed effects in my regression equations. Unfortunately, not all models were solvable and I didn't achieve an improvement for all models.

__Figure 4: Causal Graph showing unblocked backdoor paths__

<img src="figures/CG2.png" width=720 height=720/>


# Estimation Strategy

As explained above, I use Tobit regression to estimate the effect of the wage increase. All models were fit using the *R* package *AER*, which estimates the model by maximum likelihood. Standard errors are clustered by panchayat or block and by day, to account for heterogeneity at these levels. To adjust for some unobserved effects I estimated some models with block fixed effects instead of district fixed effects. Unfortunately, due to multicollinearity, not all models were solvable. To get a sense of how well a model fits the data I also report McFadden's pseudo $R^2$. I reestimated equations $(1')$ to $(6')$, which are listed below. However, I excluded the squared time trend and the squared and cubic day-of-month effects from estimation since they didn't improve model fit. For an in-depth explanation of the equations see the [Replication](https://github.com/HumanCapitalAnalysis/student-project-LeonardMK/blob/master/Robustness_checks.ipynb) notebook.

$$ {Fwd\ DW\ Frac}_{pt} = \beta_0 + \beta_1 Shock_t + \textbf{T}_t^{'} \gamma + \textbf{R}_p^{'} \zeta + \delta_{d(p)} + \epsilon_{pt} \tag{1'}$$

$$ \hat{y}_{pt} = \beta_0 + \beta_1 y_{pt} + \beta_2 Shock_t + \textbf{T}_t^{'} \gamma + \textbf{R}_p^{'} \zeta + \delta_{d(p)} + \epsilon_{pt} \tag{2'} $$

$$ \hat{y}_{pt} = \beta_0 + \beta_1 y_{pt} + \beta_2 (OR\ Shock)_t \times OR_p + \beta_3 (AP\ Shock\ 1)_t \times AP_p + \beta_4 (AP\ Shock\ 2)_t \times AP_p + \beta_5 (OR\ Shock)_t + \beta_6 (AP\ Shock\ 1)_t + \beta_7 (AP\ Shock\ 2)_t + OR_p + \textbf{T}_t^{'} \gamma + \textbf{R}_p^{'} \zeta + \delta_{d(p)} + \epsilon_{pt} \tag{3'} $$

$$ \hat{y}_{pt} = \beta_0 + \beta_1 y_{pt} + \beta_2 Shock_t + \beta_3 Shock_t \times (Always\ DW)_{pt} + \beta_4 (Always\ DW)_{pt} + \textbf{T}_t^{'} \gamma + \textbf{R}_p^{'} \zeta + \delta_{d(p)} + \epsilon_{pt} \tag{4'} $$

$$ \hat{y}_{pt} = \beta_0 + \beta_1 y_{pt} + \beta_2 Shock_t + \beta_3 Shock_t \times (Fdw\ All)_{pt} + \beta_4 (Fdw\ All)_{pt} + \beta_5 Shock_t \times (Fdw\ Some)_{pt}\\ + \beta_6 (Fdw\ Some)_{pt} + \textbf{T}_t^{'} \gamma + \textbf{R}_p^{'} \zeta + \delta_{d(p)} + \epsilon_{pt} \tag{5'} $$

$$ \hat{y}_{pt} = \beta_0 + \beta_1 y_{pt} + \beta_2 Shock_t + \beta_3 Shock_t \times (Fdw\ All)_{pt} + \beta_4 Shock_t \times (Bdw\ All)_{pt} + \beta_5 Shock_t \times (Fdw\ Some)_{pt} + \beta_6 Shock_t \times (Bdw\ Some)_{pt} + \beta_7 (Fdw\ All)_{pt} + \beta_8 (Bdw\ All)_{pt} + \beta_9 (Fdw\ Some)_{pt} + \beta_{10} (Bdw\ Some)_{pt} + \textbf{T}_t^{'} \gamma + \textbf{R}_p^{'} \zeta + \delta_{d(p)} + \epsilon_{pt} \tag{6'} $$

where $p$ and $t$ index panchayat and time, $\hat{y}_{pt}$ is either the officially reported days of daily-wage work or the officially reported piece-rate payment, $y_{pt}$ is the corresponding actual quantity, $\textbf{T}_t$ contains the time-trend controls, $\textbf{R}_p$ contains the panchayat-level controls, and $\delta_{d(p)}$ denotes district or block fixed effects.
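For reference, McFadden's pseudo $R^2$ compares the maximized log-likelihood of the fitted model with that of an intercept-only model. The helper below is a minimal sketch; the function name and the two log-likelihood values are hypothetical and do not come from the models in this notebook.

```python
def mcfadden_r2(loglik_model: float, loglik_null: float) -> float:
    """McFadden's pseudo R^2: 1 - lnL(model) / lnL(intercept-only model)."""
    return 1.0 - loglik_model / loglik_null


# Hypothetical log-likelihoods of a fitted and an intercept-only model.
example = mcfadden_r2(loglik_model=-900.0, loglik_null=-1000.0)
print(round(example, 3))
```

Because a Tobit likelihood mixes probabilities and densities, its log-likelihood is not necessarily negative, so the measure is not guaranteed to lie between 0 and 1 and should be read as a rough indicator of relative fit only.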

# Results

### Forward Daily-Wage Fraction

Before I present the results from equation $(1')$ I want to use Figure 5 below to illustrate the problem that the high correlation between the time trend and the shock indicator poses for the analysis. The figure plots the average $Fwd.\ DW\ Frac.$ for each day, with the vertical line indicating the day the wage reform came into effect. As we can see, there is a strong increase in $Fwd.\ DW\ Frac.$ over time and especially after the shock. From the beginning of the fiscal year on 01.03.2007 to the end of the sample period, $Fwd.\ DW\ Frac.$ rose from 66% to 72%. Although it is tempting to suppose that the shock is responsible, this explanation might be too simple. Niehaus and Sukhtankar note that there is a clear classification of daily-wage and piece-rate projects. For example, environmental conditions (monsoon) could necessitate the switch to daily-wage projects over the summer. Usually this problem could be solved using regression analysis. However, the high correlation between the time trend and the shock indicator prevents the effects from being isolated. Longer observation periods could help to rule out seasonal patterns.

__Figure 5: Average Fwd DW Frac. over time__

<img src="figures/Fdwfrac.png" width=720 height=720/>

Table 2 below presents the coefficients from estimating equation $(1')$ using the Tobit model. The coefficients can be interpreted as the linear increase in $y$ if $y^{*}$ is observed. Columns 1 and 2 are the same model, with the difference that column 1 clusters by panchayat and column 2 by block. Although clustering by block increases the standard errors, the significance of the coefficients remains nearly unchanged. We see that the effect of the shock is negative but insignificant, while the linear time trend is positive and highly significant. Column 3 replaces the district fixed effects with block fixed effects. This leads to a stronger and significant negative effect of the shock on $Fwd.\ DW\ Frac.$ The effect of the linear time trend is still positive although weaker. Rough calculations based on the coefficients from column 3 show that day and shock together resulted in an increase in $Fwd.\ DW\ Frac.$ of 2.5%. The odd sign of the shock effect is probably due to multicollinearity. Contrary to Niehaus and Sukhtankar, my findings show that panchayats did adapt their project shelf over time. Due to multicollinearity these results have to be treated with caution.

In [9]:
pd.read_csv("tables/Table2.csv", index_col = [0], keep_default_na = False)

Unnamed: 0,1,2,3
Shock,-0.0173,-0.0173,-0.0367 **
Shock SE,(0.0393),(0.0616),(0.015)
Day,0.0023 ***,0.0023 **,0.0016 ***
Day SE,(6e-04),(9e-04),(3e-04)
Mc-Fadden R2,0.09,0.09,0.34
Time controls,Day,Day,Day
Fixed effects,District,District,Block
Observations,12103,12103,12103


\*\*\*, \*\*, \* Denotes significance at the 1, 5, 10 percent level

### Theft on Daily-Wage Projects

I turn next to the effect of the shock on theft from daily-wage projects. Columns 1-3 present the results from estimating equation $(2')$, while columns 4-6 report estimates from equation $(4')$. As in Table 2, the estimates in columns 1 and 2, and likewise 4 and 5, differ only in their standard errors, since the first clusters by panchayat while the second clusters by block. The significance of the coefficients is unaffected by this. The same structure also holds for Table 3B.
Interestingly, the shock effect is still consistent with proposition 1. However, contrary to Niehaus and Sukhtankar, the effect is much stronger (2.68 compared to 0.89) and highly significant. This means that if a panchayat actually employed daily-wage projects, the shock increased the days reported. The same holds for $Shock \times (Always\ DW)$: the effect is stronger than calculated by Niehaus and Sukhtankar. A drawback in this setting is that the McFadden pseudo $R^2$ actually decreases when implementing Tobit. Adjusting for block fixed effects leads only to a marginal improvement. However, keep in mind that it is not directly comparable to the $R^2$ calculated from an OLS regression.

In [3]:
# FE are missing
pd.read_csv("tables/Table3A.csv", index_col = [0], keep_default_na = False)

Unnamed: 0,1,2,3,4,5,6
Shock,2.68 ***,2.68 ***,2.53 ***,3.43 ***,3.43 ***,3.33 ***
Shock SE,(0.5),(0.67),(0.59),(0.47),(0.63),(0.55)
Shock x (Always DW),,,,-2.31 ***,-2.31 ***,-2.37 ***
Shock x (Always DW) SE,,,,(0.48),(0.52),(0.5)
Always DW,,,,7.19 ***,7.19 ***,7.28 ***
Always DW SE,,,,(0.36),(0.37),(0.32)
Mc-Fadden R2,0.03,0.03,0.05,0.04,0.04,0.06
Time controls,Day,Day,Day,Shock x Day,Shock x Day,Shock x Day
Fixed effects,District,District,Block,District,District,Block
Observations,13542,13542,13542,13542,13542,13542


\*\*\*, \*\*, \* Denotes significance at the 1, 5, 10 percent level
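The shock effect for panchayats with $Always\ DW = 1$ is the sum of the $Shock$ coefficient and the interaction coefficient. The short sketch below does this arithmetic for the point estimates in Table 3A; the helper name `net_effect` is only illustrative. A standard error for the sum would additionally require the covariance between the two estimates, which the table does not report.

```python
def net_effect(base: float, interaction: float) -> float:
    """Shock effect for a group: base coefficient plus interaction coefficient."""
    return base + interaction


# Point estimates copied from Table 3A, columns 4 and 6.
net_col4 = net_effect(3.43, -2.31)
net_col6 = net_effect(3.33, -2.37)

print(round(net_col4, 2), round(net_col6, 2))
```

The net effect remains positive in both columns, but it is roughly a third of the effect estimated for panchayats with $Always\ DW = 0$.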

Table 3B presents equations $(5')$ and $(6')$ in columns 1-3 and 4-6, respectively. Here I estimate a dynamic equation testing whether future and past rent extraction opportunities influence the officially reported days worked on daily-wage projects. As in Niehaus and Sukhtankar, the shock coefficient is positive and highly significant, although stronger. Keep in mind that the class with $Fwd.\ DW\ Frac. = 0$ is the reference class. We can see from columns 1-3 that daily-wage reporting increased the most for this reference class, while the increase was weaker for panchayats with higher levels of $Fwd.\ DW\ Frac.$ This could mean that panchayats that rely more heavily on daily-wage projects fear detection the most. Turning to columns 4-6, we see that the effect of the shock increases for those panchayats that don't have $Fwd.\ DW\ All = 1$. Interestingly, the effect is largely offset for panchayats that have only planned daily-wage projects in the next two months, further supporting the claim that those panchayats might fear detection the most. Not in line with the findings of Niehaus and Sukhtankar is the effect of $Shock \times (Bwd.\ DW\ Some)$. The sign of the coefficient is rather odd and hard to explain.

In [5]:
pd.read_csv("tables/Table3B.csv", index_col = [0], keep_default_na = False)

Unnamed: 0,1,2,3,4,5,6
Shock,6.48 ***,6.48 ***,5.36 ***,8.96 ***,8.96 ***,7.4 ***
Shock SE,(0.85),(0.97),(0.92),(1.11),(1.28),(1.17)
Shock x (Forward DW All),-5.1 ***,-5.1 ***,-4.45 ***,-6.84 ***,-6.84 ***,-7.34 ***
Shock x (Forward DW All) SE,(0.83),(0.87),(0.84),(1),(0.98),(0.9)
Shock x (Forward DW Some),-2.91 ***,-2.91 ***,-1.9 **,0.66,0.66,0.41
Shock x (Forward DW Some) SE,(0.85),(0.83),(0.85),(0.86),(0.82),(0.83)
Shock x (Backward DW All),,,,0.36,0.36,1.85 *
Shock x (Backward DW All) SE,,,,(1.06),(1.1),(1.06)
Shock x (Backward DW Some),,,,-7.2 ***,-7.2 ***,-6.08 ***
Shock x (Backward DW Some) SE,,,,(0.93),(0.92),(0.87)


\*\*\*, \*\*, \* Denotes significance at the 1, 5, 10 percent level

### Theft on Piece-Rate Projects

We see from Table 4A that similar patterns hold for piece-rate projects. Estimating block fixed effects was not possible since the maximum likelihood estimation found no unique solution. Consequently these controls are missing. Columns 1-2 and 3-4, respectively, yield the same estimates and differ only in their standard errors: the first column of each pair clusters by panchayat, the second by block. Columns 1-2 estimate equation $(2')$ and columns 3-4 equation $(4')$.  
The shock coefficient has the same sign as in the original analysis but is larger in magnitude. This means that panchayats that actually made use of piece-rate projects tend to overreport less on piece-rate projects after the shock. However, this only holds for panchayats with $Always\ PR = 0$. For panchayats that only employed piece-rate projects ($Always\ PR=1$) the interaction term almost fully offsets the shock coefficient, so the net change is small ($-139 + 126.96 \approx -12$). There could be a reason for some panchayats to use only piece-rate projects. As mentioned by Niehaus and Sukhtankar, there is a clear classification of daily-wage and piece-rate projects. It might be that most of the projects these panchayats can perform are piece-rate projects. An official from such a panchayat who substitutes to daily-wage projects might stand out and draw unwanted attention. To avoid this, these officials would forgo the increased rent extraction profitability because of the expected costs.

In [7]:
pd.read_csv("tables/Table4A.csv", index_col = [0], keep_default_na = False)

Unnamed: 0,1,2,3,4
Shock,-108.31 **,-108.31 *,-139 ***,-139 **
Shock SE,(44.73),(55.75),(46.44),(59.2)
Shock x (Always PR),,,126.96 ***,126.96 **
Shock x (Always PR) SE,,,(49.23),(56.76)
Always PR,,,145.43 ***,145.43 ***
Always PR SE,,,(36.1),(39.38)
Mc-Fadden R2,0.04,0.04,0.05,0.05
Time controls,Day,Day,Shock x Day,Shock x Day
Fixed effects,District,District,District,District
Observations,13542,13542,13542,13542


\*\*\*, \*\*, \* Denotes significance at the 1, 5, 10 percent level

The structure from above holds for the results in Table 4B as well. Columns 1-2 estimate equation $(5')$ and columns 3-4 estimate equation $(6')$. I find no effect on panchayats that had no daily-wage projects planned in the next two months, while the decrease in officially reported piece-rate rates is strongest for panchayats that have $Fwd.\ DW\ Some = 1$, implying that the substitution effect is strongest for these panchayats. Again, this could be due to panchayat-level constraints. While officials from panchayats that only make use of piece-rate projects might be unable to credibly switch to daily-wage projects, this constraint might be less restrictive for officials from panchayats that employ a mix of projects. When adding the shock-trend interaction term, the magnitudes of the effects for $Fwd.\ DW\ Some$ and $Fwd.\ DW\ All$ switch. Keep in mind that $Fwd.\ DW\ Frac.$ and $Bwd.\ DW\ Frac.$ are strongly correlated (Table 1). It will then often be the case that a panchayat that has $Bwd.\ DW\ All = 1$ also has $Fwd.\ DW\ All = 1$. The reduction in piece-rate payment is then most likely still strongest for panchayats with $Fwd.\ DW\ Frac. \in (0, 1)$.

In [8]:
pd.read_csv("tables/Table4B.csv", index_col = [0], keep_default_na = False)

Unnamed: 0,1,2,3,4
Shock,68.8,68.8,45.79,45.79
Shock SE,(50.93),(58.04),(55.29),(56.21)
Shock x (Forward DW All),-224.93 ***,-224.93 ***,-349.66 ***,-349.66 ***
Shock x (Forward DW All) SE,(72.11),(72.39),(82.67),(90.82)
Shock x (Forward DW Some),-315.03 ***,-315.03 ***,-228.9 ***,-228.9 ***
Shock x (Forward DW Some) SE,(48.88),(51.88),(57.35),(61.34)
Shock x (Backward DW All),,,205.08 **,205.08 **
Shock x (Backward DW All) SE,,,(92.86),(95.77)
Shock x (Backward DW Some),,,-93.11 *,-93.11 *
Shock x (Backward DW Some) SE,,,(55.07),(52.69)


\*\*\*, \*\*, \* Denotes significance at the 1, 5, 10 percent level

### Difference-in-Difference

As the last step of my analysis I reestimated equation $(3')$. Before doing so I want to graphically assess whether the assumptions needed for DiD to be consistent are actually met. The most important assumption is that the treated and the untreated follow the same trend before treatment. If this condition is not met, the estimated treatment effect will be biased. Figure 6 below plots the sum of rates paid on piece-rate projects by day for Andhra Pradesh and Orissa. The vertical lines indicate the times at which Andhra Pradesh or Orissa made adjustments to their payment schemes. Remember that Andhra Pradesh employs only piece-rate projects during the sample period. It is therefore debatable whether such differing institutional setups are compatible with the parallel trend assumption. Interestingly, the wage hike in Orissa coincides with a strong increase in piece-rate payments in Andhra Pradesh. Also, the curves only start moving in parallel after the wage hike in Orissa.

__Figure 6: Sum of Rates Piece-Rate over Time__

<img src="figures/DiD_assump.png" width=720 height=720/>
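Besides the visual check, the parallel trend assumption can be probed numerically by comparing pre-treatment slopes of the two groups. The sketch below does this on simulated series; the window length, levels, slopes and noise are hypothetical and unrelated to the data behind Figure 6.

```python
import numpy as np

# Hypothetical pre-treatment window with two simulated daily series that
# share a common slope but differ in level.
rng = np.random.default_rng(2)
days = np.arange(60)
treated = 100 + 0.8 * days + rng.normal(scale=5, size=days.size)
control = 150 + 0.8 * days + rng.normal(scale=5, size=days.size)


def slope(t, y):
    """Slope of a least-squares line through (t, y)."""
    return np.polyfit(t, y, 1)[0]


# Under parallel pre-trends the slope gap should be close to zero.
gap = slope(days, treated) - slope(days, control)
print(f"pre-treatment slope gap: {gap:.3f}")
```

A gap that is large relative to its sampling variability would point to diverging pre-treatment trends, in which case the DiD estimate has to be interpreted with caution.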

Table 5 presents estimates from equation $(3')$. Columns 3 and 4 yield the same point estimates; however, the former clusters by panchayat while the latter clusters by block. The estimate of interest is $OR \times (OR\ Shock)$, which is stronger than the one found by Niehaus and Sukhtankar (-87.54) and also stronger than the shock coefficients estimated in Tables 4A and 4B. The coefficient is highly significant. This strongly supports proposition 2.

In [3]:
pd.read_csv("tables/Table5.csv", index_col = [0], keep_default_na = False)

Unnamed: 0,1,2,3,4
OR x (OR Shock),-209.77 ***,-204.93 ***,-202.74 ***,-202.74 ***
OR x (OR Shock) SE,(54.1),(53.85),(53.75),(75.39)
AP x (AP Shock 1),1.9,11.13,11.85,11.85
AP x (AP Shock 1) SE,(69.64),(70.29),(70.08),(77.22)
AP x (AP Shock 2),368.61 ***,378.62 ***,383.41 ***,383.41 ***
AP x (AP Shock 2) SE,(55.07),(55.07),(54.94),(76.73)
OR Shock,104.06 ***,108.25 ***,167.64 ***,167.64 ***
OR Shock SE,(39.35),(39.22),(40),(52.41)
AP Shock 1,193.34 ***,184.96 ***,82.09,82.09
AP Shock 1 SE,(51.69),(52.71),(54.93),(69.56)


\*\*\*, \*\*, \* Denotes significance at the 1, 5, 10 percent level

# Conclusion and Discussion

Using Tobit regression I can confirm the propositions made by Niehaus and Sukhtankar. However, with Tobit regression I find that the effect for the uncensored panchayats is much stronger than the one found by Niehaus and Sukhtankar. I further find that clustering on the block level rarely changes the significance of my estimates and doesn't change my conclusions. A remaining point of concern is the low explanatory power of my regressions, which leaves open the possibility of an unblocked backdoor path. I was also unable to fully use the panel structure of the dataset in *R*. Adding lagged dependent variables could account for many uncontrolled effects and help isolate the treatment effect. Also, the strong correlation between the time trend and the shock indicator makes reliable inference hard; longer observation periods could help to circumvent this problem.  
Given the setting, it could also be of interest to examine whether a worker's social status has an effect. Data from India might be especially useful in this regard, since the NREGS officials also collect data about caste, age and family status. It is possible that officials also adjust for the risk of underpaying a particular individual.

# References

***A Practitioner's Guide to Cluster-Robust Inference***, A. Colin Cameron and Douglas L. Miller; (*Journal of Human Resources*, Spring 2015, 50(2); 317-373).

***Econometric Analysis, 5th edition***, William H. Greene; (*Upper Saddle River, NJ Prentice Hall*, 2003).

***Corruption Dynamics: The Golden Goose Effect***, Paul Niehaus and Sandip Sukhtankar; (*American Economic Journal: Economic Policy*, 2013, 5(4); 230-269).

***Estimation of Relationships for Limited Dependent Variables***, James Tobin; (*Econometrica*, 1958, 26; 24-36).