Predicting the first-day performance of Hong Kong IPO stocks: an example pipeline for a machine learning project.
Crawl, parse, and clean Hong Kong IPO data from AAStocks.com using Selenium WebDriver and PhantomJS (around 400 data points).
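
As a rough illustration of the parse-and-clean step, the sketch below turns table rows in an already-fetched HTML string into typed records. It makes several assumptions: the HTML layout, the column meanings, and the names `TableRowParser`, `page_source`, and `records` are all made up for illustration and do not reflect the real AAStocks.com markup. With Selenium, the string would typically come from `driver.page_source`. The standard-library `html.parser` is used here only to keep the example self-contained.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []        # one list of cell strings per data row
        self._row = None      # cells of the row currently being read
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            # Header rows contain only <th> cells, so they stay empty and are skipped.
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


# Made-up page source standing in for what the browser would return.
page_source = """
<table>
  <tr><th>Name</th><th>Offer price</th><th>First-day change</th></tr>
  <tr><td>Example Co</td><td>1.50</td><td>+12.3%</td></tr>
  <tr><td>Sample Ltd</td><td>3.20</td><td>-4.0%</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(page_source)

# Clean: convert price and percentage strings into floats.
records = [
    {
        "name": name,
        "offer_price": float(price),
        "first_day_change_pct": float(change.rstrip("%")),
    }
    for name, price, change in parser.rows
]
print(records)
```

Each entry in `records` is a plain dictionary, so the list can be passed directly to `pandas.DataFrame(records)` for the next step.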
Use pandas for data cleaning and feature engineering, including feature selection and the handling of large values, missing values, and categorical values (one-hot encoding).
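
A minimal sketch of what these cleaning steps could look like in pandas. The column names (`offer_size_hkd`, `oversubscription`, `industry`), the sample values, and the helper `clean_ipo_frame` are hypothetical. The specific choices are also assumptions rather than the project's confirmed method: "large values" is interpreted here as heavily skewed magnitudes compressed with a log transform, and missing values are filled with the column median.

```python
import numpy as np
import pandas as pd


def clean_ipo_frame(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply three illustrative cleaning steps to a hypothetical IPO table."""
    df = raw.copy()

    # Large values: compress skewed magnitudes with log(1 + x) (assumed approach).
    df["offer_size_hkd"] = np.log1p(df["offer_size_hkd"])

    # Missing values: fill numeric gaps with the column median (assumed approach).
    median = df["oversubscription"].median()
    df["oversubscription"] = df["oversubscription"].fillna(median)

    # Categorical values: one-hot encode into 0/1 indicator columns.
    df = pd.get_dummies(df, columns=["industry"], prefix="industry", dtype=int)
    return df


# Made-up sample rows, not real IPO data.
raw = pd.DataFrame(
    {
        "offer_size_hkd": [1.2e9, 3.5e8, 8.0e10],
        "oversubscription": [12.0, None, 250.0],
        "industry": ["Finance", "Technology", "Finance"],
    }
)

cleaned = clean_ipo_frame(raw)
print(cleaned)
```

The function copies its input so the raw frame stays available for comparison, which helps when checking that each transformation did what was intended.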
Use XGBoost to train a regression model that predicts first-day performance. The generated feature importance plot is worth a look: it shows which features the model relies on most.