Demonstrates data munging, analysis and visualization techniques

Hacker's Guide to Machine Learning and Predictive Modelling

In November 2014, I completed Andrew Ng's Machine Learning course and wanted to apply some of the techniques I had learnt to see how they work in real-world situations. I realized that there are very few practical machine learning guides available on the internet. This project aims to fill that gap.

The repository serves as a step-by-step guide for people who want to get started in machine learning and predictive modelling. The project is still in its initial stages, and any contributions are highly appreciated. It currently contains code snippets I have written while participating in a few Kaggle competitions. I plan to work on it full time in December so that it can help people like me become better at machine learning and predictive modelling.

Titanic: Machine Learning from Disaster

This is one of the best places to get your hands dirty. In this Kaggle challenge, we are asked to complete the analysis of what sorts of people were likely to survive. In particular, we are asked to apply the tools of machine learning to predict which passengers survived the tragedy.

Here is my example code for this competition.
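
Separately from the linked code, the prediction task can be illustrated with a very small baseline: estimate the survival rate of each passenger group and predict survival when that rate is at least 50%. This is only a sketch under stated assumptions. The rows in `TRAIN` are invented for illustration, the helper names `fit_group_rates` and `predict` are hypothetical, and only the column names `Sex`, `Pclass` and `Survived` follow the Kaggle data files.

```python
from collections import defaultdict

# Toy rows invented for illustration; the real train.csv has many more
# rows and columns.
TRAIN = [
    {"Sex": "female", "Pclass": 1, "Survived": 1},
    {"Sex": "female", "Pclass": 2, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 0},
    {"Sex": "female", "Pclass": 3, "Survived": 1},
    {"Sex": "male", "Pclass": 1, "Survived": 1},
    {"Sex": "male", "Pclass": 2, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
]


def fit_group_rates(rows, keys):
    """Return ({group: survival rate}, overall survival rate)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [survivors, total]
    for row in rows:
        group = tuple(row[k] for k in keys)
        counts[group][0] += row["Survived"]
        counts[group][1] += 1
    rates = {group: s / n for group, (s, n) in counts.items()}
    overall = sum(row["Survived"] for row in rows) / len(rows)
    return rates, overall


def predict(row, rates, overall, keys):
    """Predict 1 (survived) if the group's rate is at least 0.5."""
    group = tuple(row[k] for k in keys)
    # Groups never seen in training fall back to the overall rate.
    return int(rates.get(group, overall) >= 0.5)


keys = ("Sex",)
rates, overall = fit_group_rates(TRAIN, keys)
female_pred = predict({"Sex": "female", "Pclass": 3}, rates, overall, keys)
male_pred = predict({"Sex": "male", "Pclass": 1}, rates, overall, keys)
```

Passing `keys=("Sex", "Pclass")` instead splits passengers into finer groups. A learned model such as a decision tree makes this kind of split automatically.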

Digit Recognizer

This Kaggle competition asks us to take an image of a single handwritten digit and determine what that digit is. The data for this competition were taken from the popular MNIST dataset.

Here is my example code for this competition.
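
To make the task concrete (this is not the linked code), here is a minimal nearest-neighbour sketch: each image is flattened into a list of pixel values, and a new image gets the label of the closest training images. The 3x3 "images" below are invented stand-ins for the 28x28 MNIST digits, and `knn_predict` is a hypothetical helper name. k-nearest neighbours is chosen here only because it is easy to read, not because the repository uses it.

```python
from collections import Counter

# Toy 3x3 binary "images", flattened row by row (invented for illustration).
ZERO = [1, 1, 1,
        1, 0, 1,
        1, 1, 1]
ONE = [0, 1, 0,
       0, 1, 0,
       0, 1, 0]
TRAIN = [(ZERO, 0), (ONE, 1)]  # (pixels, label) pairs


def squared_distance(a, b):
    """Squared Euclidean distance between two flattened images."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train, query, k=1):
    """Return the most common label among the k nearest training images."""
    nearest = sorted(train, key=lambda item: squared_distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# A vertical bar with one extra pixel switched on in the bottom-right corner.
noisy_one = [0, 1, 0,
             0, 1, 0,
             0, 1, 1]
predicted = knn_predict(TRAIN, noisy_one)
```

On the real data the same idea applies to 784-value pixel vectors, although a brute-force search like this becomes slow with tens of thousands of training images.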

Sentiment Analysis on Movie Reviews

This Kaggle competition presents a chance to benchmark our sentiment-analysis ideas on the Rotten Tomatoes dataset. We are asked to label phrases on a scale of five values: negative, somewhat negative, neutral, somewhat positive, positive. Obstacles like sentence negation, sarcasm, terseness, language ambiguity, and many others make this task very challenging.

Here is my example code for this competition.
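
As an illustration of the negation obstacle (again, not the linked code), one common trick is to tag every word that follows a negation, up to the next punctuation mark, so that "good" and "not good" produce different bag-of-words features. The sketch below assumes a small, non-exhaustive negation list. The names `tokenize`, `mark_negation` and the `_NEG` suffix are illustrative choices.

```python
import re
from collections import Counter

NEGATIONS = {"not", "no", "never"}  # illustrative, far from exhaustive
PUNCTUATION = set(".,!?;")


def tokenize(text):
    """Lowercase the text and split it into words and punctuation marks."""
    return re.findall(r"[a-z']+|[.,!?;]", text.lower())


def mark_negation(tokens):
    """Append _NEG to words between a negation and the next punctuation mark."""
    marked, negated = [], False
    for token in tokens:
        if token in PUNCTUATION:
            negated = False  # punctuation ends the negated scope
            marked.append(token)
        elif token in NEGATIONS or token.endswith("n't"):
            negated = True
            marked.append(token)
        else:
            marked.append(token + "_NEG" if negated else token)
    return marked


phrase = "This movie is not good, but the cast is great."
marked = mark_negation(tokenize(phrase))
features = Counter(marked)  # bag-of-words counts a classifier could consume
```

Here `good` becomes `good_NEG` while `great` is left untouched. Sarcasm and ambiguity are not addressed by this kind of preprocessing.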

Forecast use of a city bikeshare system

In this Kaggle competition, participants are asked to combine historical usage patterns with weather data in order to forecast bike rental demand in the Capital Bikeshare program in Washington, D.C. Here, we will create an ensemble of Random Forest and Gradient Boosting and tune parameters for these learning algorithms.

Here is my example code for this competition.
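
The blending step of such an ensemble can be sketched without any particular library (this is not the linked code). The two prediction lists below are invented numbers standing in for the outputs of a Random Forest and a Gradient Boosting model on a held-out validation set. The error measure is RMSLE, which to my knowledge is the metric this competition uses. The sketch tunes only the blend weight, not the parameters of the underlying models, and the function names are hypothetical.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error."""
    total = sum(
        (math.log1p(pred) - math.log1p(true)) ** 2
        for true, pred in zip(y_true, y_pred)
    )
    return math.sqrt(total / len(y_true))


def blend(pred_a, pred_b, weight_a):
    """Weighted average of two models' predictions."""
    return [weight_a * a + (1 - weight_a) * b for a, b in zip(pred_a, pred_b)]


def best_blend_weight(y_true, pred_a, pred_b, steps=10):
    """Grid-search the weight for model A that minimises validation RMSLE."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: rmsle(y_true, blend(pred_a, pred_b, w)))


# Invented validation data: the two models err in opposite directions,
# so an even blend cancels their errors.
y_valid = [10, 50, 120, 200]
rf_pred = [14, 46, 128, 192]  # stand-in for Random Forest output
gb_pred = [6, 54, 112, 208]   # stand-in for Gradient Boosting output

weight = best_blend_weight(y_valid, rf_pred, gb_pred)
blended_error = rmsle(y_valid, blend(rf_pred, gb_pred, weight))
```

In practice the weight should be chosen on data that neither model was trained on. Otherwise the blend tends to favour whichever model overfits more.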

License

All code in this repository is released under the MIT License.
