Data Processing of MA Project

This GitHub repository contains the data pre-processing part of the MA (merger and acquisition) project.

This repository does not contain all of the data files because of GitHub's file-size restrictions; the data files must be obtained and placed separately (see Folder Structure).

Folder Structure

By default, the overall directory layout should look like this:

root
-- MA_data
---- SDC
------ [year].xlsx # annual Merger and Acquisition file (.xlsx) downloaded from the SDC Platinum database
---- wrds_bridge.csv # the GVKEY-CUSIP linking table downloaded from WRDS
---- evans_bridge.csv # 
---- tmp
------ [filename].pickle # cached pickle files are stored here
-- MA
---- <all code, i.e. the contents of this GitHub repository>
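Below is a minimal Python sketch of how code in `MA/` might resolve the paths in this layout and reuse the `tmp` pickle cache. The helper names (`data_paths`, `sdc_year_files`, `cached`) are hypothetical and are not taken from the repository. The sketch assumes that `root` is the common parent of `MA_data` and `MA`, and that SDC files are named by four-digit year.

```python
import pickle
import re
from pathlib import Path
from typing import Any, Callable


def data_paths(root: Path) -> dict:
    """Return the expected data locations under `root` (hypothetical helper)."""
    data = root / "MA_data"
    return {
        "sdc_dir": data / "SDC",
        "wrds_bridge": data / "wrds_bridge.csv",
        "evans_bridge": data / "evans_bridge.csv",
        "tmp_dir": data / "tmp",
    }


def sdc_year_files(sdc_dir: Path) -> list:
    """List (year, path) pairs for SDC files named like `1998.xlsx`, sorted by year."""
    pairs = []
    for path in sdc_dir.glob("*.xlsx"):
        # Assumption: only files whose stem is a 4-digit year are SDC extracts.
        if re.fullmatch(r"\d{4}", path.stem):
            pairs.append((int(path.stem), path))
    return sorted(pairs)


def cached(tmp_dir: Path, name: str, compute: Callable[[], Any]) -> Any:
    """Load `tmp_dir/name.pickle` if present; otherwise compute, store, and return it."""
    cache_file = tmp_dir / f"{name}.pickle"
    if cache_file.exists():
        with cache_file.open("rb") as fh:
            return pickle.load(fh)
    result = compute()
    tmp_dir.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as fh:
        pickle.dump(result, fh)
    return result
```

For example, a `compute` function could apply `pandas.read_excel` to each path returned by `sdc_year_files(...)` and concatenate the results. The combined SDC table would then be parsed from Excel only once, and later runs would read it from the pickle cache.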

Data Sources

Thomson Reuters SDC Platinum
- provides comprehensive M&A event data
Compustat (financial variables)
- Unfortunately, this dataset has many missing values (poor data quality). I suggest that large-scale (big data) projects avoid relying on this database.
EDGAR (financial disclosures)
- Operated by the SEC. Because annual disclosure (Form 10-K) is required by law, there is no missing data. However, the filings are raw textual data.
TNIC (Text-based Network Industry Classifications Data)
- TNIC is built with a bag-of-words approach, which may seem outdated by current machine-learning NLP standards. Nevertheless, TNIC is surprisingly popular in finance.
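To make the bag-of-words idea concrete, the sketch below scores how related two firms are by the cosine similarity of the word-count vectors of their business descriptions. It then ranks peers, in the spirit of the top-k peer step. This is a simplified illustration only. It uses raw counts and a toy tokenizer, whereas the published TNIC construction applies additional word filtering and normalization that is not reproduced here. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-case the text and count alphabetic tokens (toy tokenizer, no stop-word removal)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    # Counter returns 0 for missing words without inserting them.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_peers(descriptions: dict, firm: str, k: int) -> list:
    """Rank the other firms by similarity to `firm` and keep the k most similar."""
    target = bag_of_words(descriptions[firm])
    scores = [
        (other, cosine_similarity(target, bag_of_words(text)))
        for other, text in descriptions.items()
        if other != firm
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]
```

Because common words such as "and" also contribute to the dot product, a real pipeline would filter the vocabulary before computing similarities.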

Usage

Run all Master files to generate the dataset for predicting mergers and acquisitions. The generated dataset.pkl file can then be loaded with a PyTorch DataLoader in the modeling repository.
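This README does not document the internal structure of `dataset.pkl`. Purely for illustration, suppose it unpickles to an indexable sequence of samples, such as `(features, label)` pairs. In that case, a thin wrapper implementing `__len__` and `__getitem__` satisfies PyTorch's map-style dataset protocol, which is what `DataLoader` consumes. The class name `MADataset` and the sample layout are hypothetical.

```python
import pickle
from pathlib import Path
from typing import Any, Sequence


class MADataset:
    """Map-style dataset over samples stored in a pickle file.

    Assumption (not documented in this repo): the pickle holds an indexable
    sequence of samples, e.g. a list of (features, label) pairs.
    """

    def __init__(self, pickle_path: Path):
        with Path(pickle_path).open("rb") as fh:
            self.samples: Sequence[Any] = pickle.load(fh)

    def __len__(self) -> int:
        # Number of samples; used by DataLoader's samplers.
        return len(self.samples)

    def __getitem__(self, index: int) -> Any:
        # Return one sample by position.
        return self.samples[index]
```

In the modeling repository, the wrapper could be passed to `torch.utils.data.DataLoader(MADataset("dataset.pkl"), batch_size=32, shuffle=True)`. Subclassing `torch.utils.data.Dataset` is equivalent and more conventional once PyTorch is already a dependency.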

Files

- .ipynb_checkpoints
- __pycache__
- .DS_Store
- Appendix_1.1_variable_description.ipynb
- Appendix_1.1x_sdc_data_explore.ipynb
- Appendix_1.2_majority_takeover.ipynb
- Appendix_1.3_CRSP_all_CUSIP.ipynb
- Appendix_1.4_PERMNO_GVKEY.ipynb
- Appendix_1.5_EWENS.ipynb
- Appendix_1.6_check_linking_correctness.ipynb
- Appendix_1.7_sdc_deal_distribution.ipynb
- Appendix_2.1.x_choose_financial_vars.ipynb
- Appendix_2.1_prepare_fin_vars.ipynb
- Appendix_2.2_financial_var_NA_threshold.ipynb
- Appendix_3.1_TNIC_dataexplore.ipynb
- Appendix_3.2_TNIC_to_Adjacency.ipynb
- Appendix_3.3_TNIC_top_k_peers.ipynb
- Appendix_4.1_frequent_acquirers.ipynb
- Appendix_4.2_processing_neighbors.ipynb
- Appendix_4.3_timeline_of_a_firm.ipynb
- Appendix_4.4_timeline_of_a_firm_2.ipynb
- Benchmark1_Logistic_regression.ipynb
- Master1_data_prepare.py
- Master_1_Linking.ipynb
- Master_2_financial_var.ipynb
- Master_3_TNIC.ipynb
- Master_4_DataLoader_2.ipynb
- Master_4_DataLoader_3.ipynb
- Master_4_DataLoader_tesing_4.ipynb
- Master_4_Dataloader_1.ipynb
- Master_5_testing.ipynb
- Master_6_main.ipynb
- OLD_Master1_Linking.ipynb
- OLD_Master1_data_prepare.ipynb
- OLD_Master_2_add_financial_vars.ipynb
- README.md
- Testing_1_creating_predictive_data.ipynb
- config.yaml
- data_loader.py
- dataloader_helpers.py
- filter_helpers.py
- fin_var_helpers.py
- merge_helpers.py

dayuyang1999/Merger_Acquisition_Data
