Hands-on tutorials demonstrating the concepts of prediction engineering, feature engineering, and automation in data science. In a series of notebooks, we show how to build predictive models from raw data within a day, all using open source software.
pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
Featuretools is a DARPA-sponsored open source library that enables data scientists to automatically extract features from time-varying data.
scikit-learn is a free software machine learning library for the Python programming language.
Prediction engineering
Feature engineering
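
To make the two concepts concrete, here is a minimal sketch using only pandas. It is not taken from the notebooks: the toy transaction table, its column names, the cutoff date, and the "will the customer purchase again" label are all assumptions chosen for illustration. Prediction engineering defines the label from data on or after a cutoff time; feature engineering builds the inputs from data strictly before it, so no future information leaks into the features.

```python
import pandas as pd

# Hypothetical toy transaction log; column names are illustrative only.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "time": pd.to_datetime([
        "2024-01-05", "2024-01-20", "2024-02-10",
        "2024-01-15", "2024-01-25", "2024-02-12",
    ]),
    "amount": [20.0, 35.0, 50.0, 10.0, 15.0, 80.0],
})

# Assumed cutoff: features come from before it, labels from on or after it.
cutoff = pd.Timestamp("2024-02-01")
customers = sorted(transactions["customer_id"].unique())

before = transactions[transactions["time"] < cutoff]
after = transactions[transactions["time"] >= cutoff]

# Prediction engineering: label each customer by whether they
# made any purchase on or after the cutoff.
labels = (
    after.groupby("customer_id")["amount"].sum()
    .reindex(customers, fill_value=0.0)
    .gt(0)
    .rename("will_purchase")
)

# Feature engineering: aggregate each customer's history before the cutoff.
# Customers with no history before the cutoff get zeros.
features = (
    before.groupby("customer_id")["amount"]
    .agg(total_spent="sum", num_purchases="count", mean_amount="mean")
    .reindex(customers)
    .fillna(0.0)
)

# One row per customer: features plus the label to predict.
training_table = features.join(labels)
print(training_table)
```

The resulting table has one row per customer and could be passed to a scikit-learn estimator. Writing such aggregations by hand is the step that a tool like Featuretools is meant to automate.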
NYC-Taxi-Dataset - Learn feature engineering
Retail-Dataset - Learn prediction engineering
Linux
sh install_linux.sh
source venv/bin/activate
pip install -r requirements.txt
jupyter notebook
Mac
sh install_osx.sh
source venv/bin/activate
pip install -r requirements.txt
jupyter notebook