# Predicting Tips in the New York Taxi Market

Read the report:

# Contents
- Downloads spatial and tabular data from the NYC TLC website, samples it and stores it locally
- Implements data cleaning on the tabular data
- Implements data cleaning on the spatial data
- Implements data cleaning on the weather data
- Implements new feature creation
- Produces a number of plots relating tipping to key features
- Regression approach to tip prediction
- 08_Classification - Classification approach to tip prediction
- This contains a number of utility functions used by the scripts and must be included in the same location. It is not intended to be run directly.

# Instructions
Run all files, except the utility module, in order of the two-digit prefix of the filename.
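
As a rough illustration of that ordering, the sketch below lists the numbered scripts in the current directory sorted by prefix. It assumes the filenames start with two digits followed by an underscore (as in `08_Classification`); the helper name `ordered_scripts` is hypothetical and not part of the repository. It only prints the order and does not execute anything.

```python
import os
import re

# Assumption: numbered scripts start with two digits and an underscore,
# e.g. "08_Classification...". Files without the prefix (such as the
# utility module) are skipped.
PREFIX = re.compile(r"^(\d{2})_")


def ordered_scripts(filenames):
    """Return the prefixed filenames sorted by their two-digit prefix."""
    numbered = []
    for name in filenames:
        match = PREFIX.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


if __name__ == "__main__":
    # Print the run order for whatever is in the current directory.
    for script in ordered_scripts(os.listdir(".")):
        print(script)
```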

# Notes
- All scripts are written in Python 2.7.
- Scripts should be run in order as some have dependencies on data produced by the previous scripts.
- The download script can be configured to download data from specific months or years. It is recommended that at least
one full year's data is downloaded.
- You will need to manually place the weather data CSV in a directory called data in the location the scripts run from,
i.e. ./data/Weather_extract_2018.csv. The rest of the data will be downloaded and unzipped automatically by the scripts.
- You will also need to create the directory ./figures where the plots will be created.
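
The manual setup described in the last two notes could be scripted along these lines. This is a minimal sketch, not part of the repository: the function name `prepare_workspace` is hypothetical, and it only creates `./data` and `./figures` and reports whether the weather CSV is already in place. You still have to supply the CSV yourself.

```python
import os


def prepare_workspace(base_dir=".", weather_csv="Weather_extract_2018.csv"):
    """Create ./data and ./figures if missing; return True if the weather CSV exists."""
    data_dir = os.path.join(base_dir, "data")
    figures_dir = os.path.join(base_dir, "figures")
    for directory in (data_dir, figures_dir):
        # os.makedirs without exist_ok keeps this compatible with Python 2.7.
        if not os.path.isdir(directory):
            os.makedirs(directory)
    # The weather file is not downloaded by the scripts, so only check for it.
    return os.path.isfile(os.path.join(data_dir, weather_csv))
```

Calling `prepare_workspace()` from the directory the scripts run in returns `False` until `./data/Weather_extract_2018.csv` has been copied into place.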

# Dependencies
Beyond the core Python 2.7 libraries, you will need the following packages installed to run these scripts:
- numpy
- pandas
- geopandas
- keras
- scikit-learn (sklearn)
- XGBoost
- scipy
- statsmodels
- prettytable
- matplotlib
- shapely
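
Before running the scripts, it can help to confirm that the packages above can be imported. The sketch below is an illustration only: `missing_packages` is a hypothetical helper, and the import names are assumed to be the usual ones (`sklearn` for scikit-learn, `xgboost` for XGBoost).

```python
import importlib

# Assumption: these are the import names for the packages listed above.
REQUIRED = [
    "numpy", "pandas", "geopandas", "keras", "sklearn", "xgboost",
    "scipy", "statsmodels", "prettytable", "matplotlib", "shapely",
]


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling `missing_packages()` returns an empty list when every dependency is importable; anything it returns still needs to be installed.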


My MSc Data Science Thesis Submission