may you share what book you used for First-Difference Estimator #1

Open
Sandy4321 opened this issue May 12, 2022 · 15 comments

Comments

@Sandy4321

Great material, thanks.
Could you share which book you used for the First-Difference Estimator?
https://www.youtube.com/watch?v=p9NhSrTugYM&list=PLOQU3c_3DSpLTBa0vqPFVwDCqXlXiu49j&index=55

Also for all of the DiD sections:
14.2) Algebra of Difference-in-Differences (DID)
14.3) Python: Diff-in-Diff (DD)
14.4) Quasi-Experiment Diff-in-Diff (DID)

What other material would help with understanding DiD and alpha_i?

@causal-methods

Angrist, J. D. and Pischke, J. (2014). Mastering ’Metrics: The Path from Cause to Effect, Princeton University Press.

Jeffrey M. Wooldridge (2016), Introductory Econometrics: A Modern Approach, 6th Edition, Cengage Learning.

Kamada, Vitor. (2020b). Causal Inference with Python. https://causal-methods.github.io/Book

Heiss, Florian and Brunner, Daniel. Using Python for Introductory Econometrics.

Angrist, Joshua D. and Pischke, Jörn-Steffen (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press

Wooldridge, J. (2010). Econometric Analysis of Cross Section and Panel Data, 2nd ed. Cambridge: MIT Press.

@Sandy4321

Sandy4321 commented May 13, 2022 via email

@causal-methods

These are the simplest and best books for beginners. They have great chapters on Panel Data (fixed effects, first difference, etc.).

Real Econometrics: The Right Tools to Answer Important Questions, by Michael Bailey (Jan 3, 2019)

Introduction to Econometrics (3rd Edition), by James H. Stock and Mark W. Watson (Jan 1, 2017)

@Sandy4321

Sandy4321 commented May 13, 2022 via email

@causal-methods

Econometrics textbooks do not cover Machine Learning. Econometrics focuses on causal inference, not forecasting. The exception is Time Series Econometrics.

If you want to see examples and solutions for a problem like yours, study the book An Introduction to Statistical Learning:
https://www.statlearning.com/
It is easy to find Python code for the book's examples on the Internet.

@Sandy4321

Sandy4321 commented May 13, 2022 via email

@causal-methods

Panel data has a time dimension. But the Econometrics of Panel Data does not traditionally deal with this type of problem: prediction with 200 features. You are better off using Machine Learning textbooks. The combination of Panel Data techniques and Machine Learning methods is only covered in highly technical papers. There is no simple book for beginners. You can study the two techniques separately, using different books.

@Sandy4321

Sandy4321 commented May 13, 2022 via email

@causal-methods

  1. Panel data may refer to the data structure, that is, the same entities are observed across time.
  2. Another meaning is Panel methods (Econometrics estimators for causal inference, such as fixed effects, first difference, DID, etc.).

The article "Assigning Panel Data to Training, Testing and Validation Groups for Machine Learning Models" is about (1): forecasting with panel data using Machine Learning methods. That is what you learn from Machine Learning textbooks.

The article "A Guide to Panel Data Regression: Theoretics and Implementation with Python" is about (2). It is what you learn from Econometrics textbooks.

If your goal is forecasting, go for Deep Learning (Neural Networks). If you want to establish causality, study econometrics. There is no reason to run a marathon in ballet pointe shoes or to dance ballet in running shoes.

If you can read the papers of Susan Athey and implement her methods, that is excellent. She has been developing methods at the intersection of Causal Inference and Machine Learning. She and her coauthors use Machine Learning methods to strengthen Causal Inference methods. Fundamentally, they are attacking Causal Inference questions.

@Sandy4321

Sandy4321 commented May 15, 2022 via email

@causal-methods

First, I would ignore the Panel Data structure and deploy a Neural Network using Keras. The best book is: Deep Learning with Python, Second Edition, by Francois Chollet (Dec 21, 2021).

Another decent approach is xgboost. Book: Hands-On Gradient Boosting with XGBoost and scikit-learn: Perform accessible machine learning and extreme gradient boosting with Python, by Corey Wade and Kevin Glynn.

If the results are unsatisfactory and/or you want to go deeper, try to integrate the Panel Data structure.
Paper: Interpretable Neural Networks for Panel Data Analysis in Economics
Yucheng Yang, Zhong Zheng, Weinan E

[How to process panel data for use in a recurrent neural network (RNN)]
https://stackoverflow.com/questions/40008240/how-to-process-panel-data-for-use-in-a-recurrent-neural-network-rnn

@Sandy4321

Sandy4321 commented May 15, 2022 via email

@causal-methods

Earlier you mentioned a matrix with 1,000,000 rows. That is more than enough for Deep Learning.

Panel Data estimators use the information that we observe the same unit at different points in time. Let's say that we observe the revenue of Microsoft over several years. The observations (rows) for Microsoft are likely to be dependent because, at the end of the day, they are observations of the same company. This information is useful to mitigate bias, that is, to deal with endogeneity problems. It is unlikely to improve forecasting accuracy. A Machine Learning algorithm is designed to maximize forecasting accuracy. Panel Data is not the typical data structure of most Machine Learning problems. Panel Data estimators actually transform the data (time demeaning, first difference, etc.). These transformations are not useful for forecasting.
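To make the two transformations named above concrete, here is a minimal sketch on a made-up balanced panel. It assumes the textbook model y_it = alpha_i + beta * x_it with no noise; the numbers, the variable names, and the helper `slope_no_intercept` are all hypothetical and chosen only for illustration. Both transformations remove the unit effect alpha_i, which is why they help with bias.

```python
import numpy as np

# Hypothetical balanced panel: 3 units observed over 4 periods.
# Assumed model: y_it = alpha_i + beta * x_it (no noise, to keep the algebra visible).
beta = 2.0
alpha = np.array([10.0, -5.0, 3.0])           # unit-specific effects alpha_i
x = np.array([[1.0, 2.0, 4.0, 7.0],
              [0.0, 3.0, 3.5, 6.0],
              [2.0, 2.5, 5.0, 5.5]])          # shape: (units, periods)
y = alpha[:, None] + beta * x

# Time demeaning (within transformation): subtract each unit's own mean.
x_within = x - x.mean(axis=1, keepdims=True)
y_within = y - y.mean(axis=1, keepdims=True)

# First difference: subtract the previous period within each unit.
x_fd = np.diff(x, axis=1)
y_fd = np.diff(y, axis=1)


def slope_no_intercept(x_t, y_t):
    """OLS slope through the origin on flattened, transformed data."""
    x_flat, y_flat = x_t.ravel(), y_t.ravel()
    return float(x_flat @ y_flat / (x_flat @ x_flat))


beta_within = slope_no_intercept(x_within, y_within)
beta_fd = slope_no_intercept(x_fd, y_fd)
print(beta_within, beta_fd)
```

In this noise-free setup both estimates should recover beta, because alpha_i is constant within a unit and cancels in either transformation. Note that first differencing also drops one period per unit.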

Each Machine Learning algorithm needs the data in a "certain way". Whatever that way is, it is your job to make the modifications. Even for Panel Data estimators, you have to set (declare) the time and unit-of-analysis variables. In this case, you would have two columns as indices. Usually, you cannot use this data format for Machine Learning algorithms.
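As a rough illustration of the two data layouts, the sketch below uses pandas. The column names (`firm`, `year`, `revenue`, `employees`) and all values are made up. It declares the unit and time columns as a two-level index, the layout panel estimators usually expect, and then flattens the frame back into a plain feature matrix. The lagged-revenue feature is only one possible way to carry time information into a Machine Learning model, not a recommendation from this thread.

```python
import pandas as pd

# Made-up long-format panel: one row per (firm, year).
df = pd.DataFrame({
    "firm": ["firm_a", "firm_a", "firm_a", "firm_b", "firm_b", "firm_b"],
    "year": [2019, 2020, 2021, 2019, 2020, 2021],
    "revenue": [10.0, 12.0, 15.0, 20.0, 21.0, 25.0],
    "employees": [100.0, 110.0, 130.0, 200.0, 205.0, 220.0],
})

# Panel layout: declare unit and time as a two-level index.
panel = df.set_index(["firm", "year"]).sort_index()

# Machine Learning layout: back to plain columns, plus a within-firm lag feature.
flat = panel.reset_index()
flat["revenue_lag1"] = flat.groupby("firm")["revenue"].shift(1)
flat = flat.dropna(subset=["revenue_lag1"])   # first year of each firm has no lag

X = flat[["employees", "revenue_lag1"]].to_numpy()   # feature matrix
y = flat["revenue"].to_numpy()                        # target vector
print(panel.index.names, X.shape, y.shape)
```

The `groupby("firm")` before `shift(1)` matters: without it, the first row of one firm would take its lag from the last row of another.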

If you have a small sample size, use whatever Machine Learning algorithm is more appropriate.

@causal-methods

Even with Panel Data, you can run regular OLS that ignores the Panel Data structure. In this case, each observation of Microsoft is treated as independent. Obviously, the results are different. Regular OLS is expected to be biased. Roughly speaking, a Machine Learning algorithm does the same as regular OLS.
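The difference between the two approaches can be seen in a small simulation. This is a sketch under assumptions that are not from this thread: the data-generating process, the 0.8 correlation loading, the sample sizes, and the seed are all hypothetical. The regressor is built to be correlated with the unit effect alpha_i, which is the situation where pooled OLS is expected to be biased while first differencing is not.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods, beta = 500, 5, 1.0

# Assumed data-generating process: x_it is correlated with the unit effect alpha_i.
alpha = rng.normal(size=(n_units, 1))
x = 0.8 * alpha + rng.normal(size=(n_units, n_periods))
y = alpha + beta * x + 0.1 * rng.normal(size=(n_units, n_periods))

# Pooled OLS: stack all rows and ignore which unit each row belongs to.
X_pooled = np.column_stack([np.ones(x.size), x.ravel()])
beta_pooled = float(np.linalg.lstsq(X_pooled, y.ravel(), rcond=None)[0][1])

# First-difference estimator: differencing within each unit removes alpha_i.
dx = np.diff(x, axis=1).ravel()
dy = np.diff(y, axis=1).ravel()
beta_fd = float(dx @ dy / (dx @ dx))

print(f"true beta = {beta}, pooled OLS = {beta_pooled:.3f}, first difference = {beta_fd:.3f}")
```

Under this setup the pooled estimate should land well above the true beta, since it absorbs part of alpha_i, while the first-difference estimate should stay close to it. The pooled fit can still predict y reasonably well, which matches the point that bias and forecasting accuracy are separate concerns.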

@Sandy4321

Sandy4321 commented May 16, 2022 via email
