Panel-Data-Regression

Panel data regression models fit to UN General Assembly voting data from https://www.kaggle.com/unitednations/general-assembly (compiled and published by Professor Erik Voeten of Georgetown University).

Panel data is essentially cross-sectional data that is sampled many times rather than once. This adds a time dimension that can be controlled for alongside the group variable (in this case, nations). Controlling for time makes it possible to account for factors that change over time but are constant across groups.

This UN General Assembly voting data qualifies as panel data because it:

1.) Samples the same groups multiple times, over time
2.) Collects attributes on these groups (yes votes, no votes, abstentions, affinity scores)
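
As a quick check of that structure, here is a minimal sketch using the `Grunfeld` dataset that ships with plm as a stand-in. It is not the UN data, and its index columns (`firm`, `year`) are specific to that dataset; for the UN data the index would presumably be `c("state_name", "year")`.

```r
library(plm)

# Stand-in panel that ships with plm: 10 firms observed over 20 years
data("Grunfeld", package = "plm")

# Declare which columns identify the group and the time period
pgrun <- pdata.frame(Grunfeld, index = c("firm", "year"))

# pdim() reports the number of groups (n), periods (T), observations (N),
# and whether every group is observed in every period (balanced)
dims <- pdim(pgrun)
print(dims)
```

An unbalanced result here would not rule out panel regression, but it is worth knowing before interpreting the models.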

This code compares three panel data regression techniques: Pooled OLS, Fixed Effects, and Random Effects.

Assumptions of each model:

Pooled OLS assumes the same effects hold across all groups and time periods, and that there is no unobserved individual heterogeneity.
Fixed Effects assumes individual heterogeneity that does not vary over time and that may or may not be correlated with the regressors.
Random Effects assumes there are unique, time-constant attributes of groups/individuals that are not correlated with the regressors.
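
In equation form, one common textbook way to write the three models is sketched below. The notation is introduced here for illustration and does not appear in the code: $y_{it}$ is the outcome for country $i$ in year $t$, $x_{it}$ is the vector of regressors, $\alpha_i$ is the time-constant country effect, and $\varepsilon_{it}$ is the error term.

```latex
\begin{aligned}
\text{Pooled OLS:} \quad & y_{it} = \beta_0 + x_{it}'\beta + \varepsilon_{it} \\
\text{Fixed effects:} \quad & y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it},
  \quad \operatorname{Cov}(\alpha_i, x_{it}) \text{ unrestricted} \\
\text{Random effects:} \quad & y_{it} = \beta_0 + x_{it}'\beta + \alpha_i + \varepsilon_{it},
  \quad \operatorname{Cov}(\alpha_i, x_{it}) = 0
\end{aligned}
```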

#Packages required
library(plm) #for panel data regressions
library(lmtest) #for coeftest()

Data Description

This data has many missing values, so I replaced every NA in the continuous variables with that variable's mean. Rows with a missing value for the categorical variable year were removed in Excel.

# Replace NAs with the column mean, writing back into paneldata so the models below see the imputed values
paneldata$abstain[is.na(paneldata$abstain)] <- mean(paneldata$abstain, na.rm = TRUE)
paneldata$yes_votes[is.na(paneldata$yes_votes)] <- mean(paneldata$yes_votes, na.rm = TRUE)
paneldata$no_votes[is.na(paneldata$no_votes)] <- mean(paneldata$no_votes, na.rm = TRUE)
paneldata$idealpoint_estimate[is.na(paneldata$idealpoint_estimate)] <- mean(paneldata$idealpoint_estimate, na.rm = TRUE)
paneldata$affinityscore_usa[is.na(paneldata$affinityscore_usa)] <- mean(paneldata$affinityscore_usa, na.rm = TRUE)
paneldata$affinityscore_russia[is.na(paneldata$affinityscore_russia)] <- mean(paneldata$affinityscore_russia, na.rm = TRUE)
paneldata$affinityscore_israel[is.na(paneldata$affinityscore_israel)] <- mean(paneldata$affinityscore_israel, na.rm = TRUE)
paneldata$affinityscore_china[is.na(paneldata$affinityscore_china)] <- mean(paneldata$affinityscore_china, na.rm = TRUE)
paneldata$affinityscore_brazil[is.na(paneldata$affinityscore_brazil)] <- mean(paneldata$affinityscore_brazil, na.rm = TRUE)
paneldata$affinityscore_india[is.na(paneldata$affinityscore_india)] <- mean(paneldata$affinityscore_india, na.rm = TRUE)
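
The same mean imputation can be written more compactly with a small helper. The sketch below is illustrative only: `impute_mean` and the toy data frame are hypothetical names and values, not part of the original script.

```r
# Toy data frame standing in for paneldata (values are made up)
toy <- data.frame(
  yes_votes = c(10, NA, 30),
  no_votes  = c(NA, 4, 6)
)

# Replace NAs in a numeric vector with the mean of its observed values
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Apply the helper to every listed column
num_cols <- c("yes_votes", "no_votes")
toy[num_cols] <- lapply(toy[num_cols], impute_mean)
print(toy)
```

For the real data, `num_cols` would list the continuous columns imputed above.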

1.) Pooled OLS

Simple OLS regression that ignores the time and group aspect of the data.

pooled = lm(no_votes~yes_votes+abstain+idealpoint_estimate+affinityscore_usa+affinityscore_brazil+affinityscore_china+affinityscore_india
+affinityscore_israel+affinityscore_russia,data=paneldata)

summary(pooled)

The adjusted R-squared is .65, which may improve with time dummy variables.

#Pooled OLS estimator with time dummies:
Pooled2=plm(no_votes~yes_votes+abstain+idealpoint_estimate+affinityscore_usa+affinityscore_brazil+affinityscore_china+affinityscore_india
            +affinityscore_israel+affinityscore_russia+factor(year),data=paneldata,index=c("state_name","year"),model='pooling')
summary(Pooled2)

Many of the year dummies are statistically significant, so the year affects the number of no votes. Adjusted R-squared increased to .75, meaning about 75% of the variation in no_votes is explained by the model.

# coeftest() (from lmtest) reports cluster-robust standard errors, here clustered by time (cluster can also be "group"; for clustering on both, plm provides vcovDC)

coeftest(Pooled2,vcov.=vcovHC,cluster="time")
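
To see how the choice of cluster changes the standard errors but not the coefficients, here is a minimal sketch. It uses the `Grunfeld` dataset bundled with plm as a stand-in for the UN data, so the variable names (`inv`, `value`, `capital`, `firm`) and the object names are assumptions for illustration.

```r
library(plm)
library(lmtest)

data("Grunfeld", package = "plm")  # stand-in data, not the UN votes

pooled_demo <- plm(inv ~ value + capital, data = Grunfeld,
                   index = c("firm", "year"), model = "pooling")

# Same coefficients, two different clusterings of the standard errors
by_group <- coeftest(pooled_demo, vcov. = vcovHC(pooled_demo, cluster = "group"))
by_time  <- coeftest(pooled_demo, vcov. = vcovHC(pooled_demo, cluster = "time"))

print(by_group)
print(by_time)
```

Passing the covariance matrix directly, rather than the `vcovHC` function plus extra arguments, makes it explicit which clustering each table uses.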

2.) Fixed Effects

Accounts for the group variable (country) by removing each country's time-constant effect.

fixedeffects =plm(no_votes~yes_votes+abstain+idealpoint_estimate+affinityscore_usa+affinityscore_brazil+affinityscore_china+affinityscore_india
        +affinityscore_israel+affinityscore_russia,data=paneldata,index=c("state_name","year"),model='within')
summary(fixedeffects)

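The R-squared is fairly low at .27. Keep in mind that plm reports the within R-squared (the fit after each country's mean has been subtracted), so it is not directly comparable to the OLS R-squared values in this document. Missing time-related factors may also contribute.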

OLS with dummy variables for country

olscountrydv =lm(no_votes~yes_votes+abstain+idealpoint_estimate+affinityscore_usa+affinityscore_brazil+affinityscore_china+affinityscore_india
      +affinityscore_israel+affinityscore_russia+factor(state_name),data=paneldata)
summary(olscountrydv)

This performs quite well, with an adjusted R-squared of .786. However, it still omits time-related factors.
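
The dummy-variable regression and the within estimator are two routes to the same slope estimates, which a small sketch can confirm. It uses the `Grunfeld` dataset bundled with plm as a stand-in for the UN data, so the variable and object names here are assumptions for illustration.

```r
library(plm)

data("Grunfeld", package = "plm")  # stand-in data, not the UN votes

# Within (fixed effects) estimator
fe_demo <- plm(inv ~ value + capital, data = Grunfeld,
               index = c("firm", "year"), model = "within")

# OLS with one dummy per firm (least-squares dummy variables)
lsdv_demo <- lm(inv ~ value + capital + factor(firm), data = Grunfeld)

# The slope estimates agree
print(coef(fe_demo))
print(coef(lsdv_demo)[c("value", "capital")])

# The R-squared values differ: plm reports fit on the demeaned data,
# while lm also credits the variation absorbed by the firm dummies
print(summary(fe_demo)$r.squared)
print(summary(lsdv_demo)$r.squared)
```

This is why a gap in R-squared between the two fits does not by itself mean one specification is better than the other.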

3.) Random Effects

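Treats the country-specific effects as random draws that are uncorrelated with the regressors, which lets the model use variation both within and between countries. If that assumption does not hold, the estimates are biased and fixed effects is the safer choice.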

#random effects model
randomeffects=plm(no_votes~yes_votes+abstain+idealpoint_estimate+affinityscore_usa+affinityscore_brazil+affinityscore_china+affinityscore_india
                  +affinityscore_israel+affinityscore_russia,data=paneldata,index=c("state_name","year"),model='random')
summary(randomeffects)

Predictive power is still relatively low, so let's try random effects over time (effect="time") instead of over countries:

randomeffect2 =plm(no_votes~yes_votes+abstain+idealpoint_estimate+affinityscore_usa+affinityscore_brazil+affinityscore_china+affinityscore_india
        +affinityscore_israel+affinityscore_russia,data=paneldata,index=c("state_name","year"),effect="time",model='random')
summary(randomeffect2)

About 72% of the variation in no_votes is explained by this random effects model.
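
One standard way to check the random effects assumption (effects uncorrelated with the regressors) is a Hausman test, available in plm as `phtest`. The sketch below is not part of the original analysis; it uses the bundled `Grunfeld` dataset as a stand-in, and the object names are hypothetical.

```r
library(plm)

data("Grunfeld", package = "plm")  # stand-in data, not the UN votes

fe_demo <- plm(inv ~ value + capital, data = Grunfeld,
               index = c("firm", "year"), model = "within")
re_demo <- plm(inv ~ value + capital, data = Grunfeld,
               index = c("firm", "year"), model = "random")

# Null hypothesis: the group effects are uncorrelated with the regressors,
# so random effects is consistent. A small p-value favors fixed effects.
hausman <- phtest(fe_demo, re_demo)
print(hausman)
```

For the UN data, the analogous call would compare models fit with the same formula and the same index.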

Panel Data conclusion:

OLS with country dummies (the dummy-variable form of fixed effects) had the highest predictive power, with an adjusted R-squared of .78. Second was pooled OLS with time dummies, with an adjusted R-squared of .75. Last was random effects over time, with an adjusted R-squared of .72. All models had fairly similar predictive power; being a particular country explained slightly more of the variation in no_votes than being in a particular year.
