
dayuyang1999/Merger_Acquisition_Data


Data Processing of MA Project

This GitHub repo contains the data pre-processing part of the M&A (Merger and Acquisition) project.

  • This repo does not contain all of the data files because of GitHub's file-size limits; the data files must be obtained and placed separately.

Folder Structure

By default, the overall directory layout should look like this:

root
-- MA_data
---- SDC
------ [year].xlsx # the annual Merger and Acquisition xlsx file downloaded from the SDC Platinum database
---- wrds_bridge.csv # the GVKEY-CUSIP linking table downloaded from WRDS
---- evans_bridge.csv # 
---- tmp
------ [filename].pickle # cached pickle files are stored here
-- MA
---- <all the code, i.e., everything in this GitHub repo>
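A quick way to confirm the layout before running anything is a small check script. The sketch below is a hypothetical helper, not part of the repo: it assumes the paths in the tree above are relative to `root`, and that each SDC file is named by its four-digit year (e.g. `2005.xlsx`).

```python
from pathlib import Path

# Expected layout relative to the project root, taken from the tree above.
# Hypothetical helper constants; adjust if your layout differs.
REQUIRED_DIRS = [
    "MA_data/SDC",
    "MA_data/tmp",
    "MA",
]
REQUIRED_FILES = [
    "MA_data/wrds_bridge.csv",
    "MA_data/evans_bridge.csv",
]


def missing_paths(root):
    """Return the expected directories/files that do not exist under `root`."""
    root = Path(root)
    missing = [p for p in REQUIRED_DIRS if not (root / p).is_dir()]
    missing += [p for p in REQUIRED_FILES if not (root / p).is_file()]
    return missing


def sdc_files_by_year(root):
    """Map year -> MA_data/SDC/[year].xlsx, skipping files whose stem is not a number."""
    sdc_dir = Path(root) / "MA_data" / "SDC"
    return {
        int(f.stem): f
        for f in sorted(sdc_dir.glob("*.xlsx"))
        if f.stem.isdigit()
    }
```

For example, `missing_paths("root")` returns an empty list once everything is in place, and `sdc_files_by_year("root")` returns the annual SDC files keyed by year for whichever loader reads them.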


Data Sources

  • Thomson Reuters SDC Platinum
    • provides comprehensive M&A event data
  • Compustat (financial variables)
    • Unfortunately, this dataset has many missing values (poor quality). Large-scale, data-driven projects should probably avoid relying on it.
  • EDGAR (financial disclosures)
    • Operated by the SEC. Because annual disclosures (10-K filings) are required by law, there is no missing data; however, the filings are raw textual data.
  • TNIC (Text-based Network Industry Classifications Data)
    • TNIC is built on a bag-of-words representation, which may seem outdated by current machine-learning NLP standards. Nevertheless, TNIC is widely used in finance research.
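To make the bag-of-words idea concrete, the toy sketch below scores how similar two firms' product descriptions are using word counts and cosine similarity. It illustrates the general approach only: the firm names and descriptions are made up, and the published TNIC construction applies additional vocabulary filtering and thresholding that are not reproduced here.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case word counts; a deliberately simple tokenizer for illustration."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    # Counter returns 0 for missing words, so unshared words add nothing to the dot product.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical product descriptions standing in for 10-K business sections.
descriptions = {
    "FIRM_A": "cloud software and data storage services",
    "FIRM_B": "data storage hardware and cloud services",
    "FIRM_C": "frozen food and beverage products",
}
vectors = {firm: bag_of_words(text) for firm, text in descriptions.items()}
```

Here FIRM_A and FIRM_B share five of their six words and score 5/6, while FIRM_A and FIRM_C share only "and" and score much lower. That stray match on a common word is one reason real text-based classifications filter the vocabulary first.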

Usage

Run all of the Master files to generate the dataset for predicting mergers and acquisitions. The generated dataset.pkl file can then be loaded with the PyTorch DataLoader object defined in the modeling repository.
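The modeling repository defines its own loader, which is not shown here. As a hedged sketch, assuming `dataset.pkl` unpickles to a sequence of samples (for example `(features, label)` pairs), a minimal map-style wrapper could look like this; `PickledMADataset` is a hypothetical name, and the real sample structure is whatever the Master files produce.

```python
import pickle
from pathlib import Path


class PickledMADataset:
    """Map-style dataset over a pickled sequence of samples (hypothetical wrapper).

    Assumes the pickle holds an indexable sequence; the actual structure is
    defined by the Master files and the modeling repo, not by this sketch.
    """

    def __init__(self, path):
        with Path(path).open("rb") as f:
            self.samples = pickle.load(f)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]
```

With PyTorch installed, `torch.utils.data.DataLoader(PickledMADataset("dataset.pkl"), batch_size=32, shuffle=True)` would batch these samples, since DataLoader accepts any map-style object implementing `__len__` and `__getitem__`. Only unpickle files you generated yourself: loading a pickle can execute arbitrary code.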

About

Data pre-processing step of the M&A prediction project. The generated pickle file can be loaded directly by the PyTorch DataLoader defined in the modeling repo.
