This repository collects data analysis projects I have done on real-world data, using a range of statistical models and machine learning or deep learning algorithms.
For each project, I mainly use Python, MySQL, and a few libraries: NumPy (for vectorized computation), Pandas (for data manipulation and analysis), SciPy (for scientific computing), Matplotlib (for plotting), NLTK (for natural language processing), and scikit-learn (for machine learning). In addition, I implemented a learning algorithm "from scratch" with the TensorFlow framework, written in an object-oriented style.
- Predicting sentiment from product reviews (planned)
- Predicting house prices in Seattle (Regression, TensorFlow, Object-Oriented Programming, Python) (ongoing)
- Studying Test Relationships in the Dognition Database (MySQL queries)
- Spelling Recommender (Natural Language Processing, Python)
- Predicting the propensity to pay renewal premium and building an incentive plan for agents to maximise net revenue (McKinsey Analytics Online Hackathon) (Binary Classification, scikit-learn, Python)
- Life Satisfaction vs. GDP per capita (Linear Regression, scikit-learn, Python)
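
To give a flavour of the Spelling Recommender idea, here is a minimal sketch of one common approach: pick the vocabulary word whose character bigrams have the smallest Jaccard distance to the misspelled word. This is an illustration under my own assumptions, not necessarily the method used in the project; the function names (`char_ngrams`, `jaccard_distance`, `recommend`) and the toy vocabulary are hypothetical, and only the Python standard library is used.

```python
def char_ngrams(word, n=2):
    """Return the set of character n-grams of `word`."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |intersection| / |union|."""
    union = a | b
    if not union:
        # Two empty sets are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


def recommend(misspelled, vocabulary, n=2):
    """Return the vocabulary word closest to `misspelled` by n-gram Jaccard distance."""
    target = char_ngrams(misspelled, n)
    return min(vocabulary, key=lambda w: jaccard_distance(target, char_ngrams(w, n)))


# Hypothetical toy vocabulary, for illustration only.
VOCABULARY = ["regression", "recommender", "sentiment", "satisfaction"]
suggestion = recommend("recomender", VOCABULARY)
print(suggestion)
```

A real recommender would typically draw its vocabulary from a word list or corpus and might restrict candidates to words sharing the first letter to cut down on comparisons.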