forked from alteryx/DSx

Hands-on tutorials demonstrating the concepts of prediction engineering, feature engineering, and automation in data science.

claudio-toledo/DSx

 
 

DSx

Hands-on tutorials demonstrating the concepts of prediction engineering, feature engineering, and automation in data science. In a series of notebooks, we show how to build predictive models from raw data within a day, using only open source software.

Open source tools used

  • pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
  • Featuretools is a DARPA-sponsored open source library that enables data scientists to automatically extract features from time-varying data.
  • scikit-learn is a free software machine learning library for the Python programming language.

Concepts to learn

  • Prediction engineering
  • Feature engineering
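
As a rough illustration of the first concept, prediction engineering can be thought of as turning raw, timestamped records into labeled training examples relative to a cutoff time. The sketch below is not taken from the notebooks: the purchase log, the `make_labels` helper, and the cutoff, window, and threshold values are all hypothetical, and it uses plain Python rather than pandas so that it runs without any dependencies.

```python
from datetime import datetime, timedelta

# Hypothetical purchase log: (customer_id, timestamp, amount).
purchases = [
    ("c1", datetime(2011, 1, 5), 20.0),
    ("c1", datetime(2011, 2, 10), 35.0),
    ("c2", datetime(2011, 1, 20), 15.0),
    ("c2", datetime(2011, 3, 15), 50.0),
    ("c3", datetime(2011, 2, 25), 10.0),
]


def make_labels(purchases, cutoff, window, threshold):
    """Label each customer seen before `cutoff` with whether they spend
    more than `threshold` in the interval [cutoff, cutoff + window)."""
    end = cutoff + window
    # Only customers with history before the cutoff can be labeled,
    # because features may only use data from before the cutoff.
    known = {cid for cid, ts, _ in purchases if ts < cutoff}
    spend = {cid: 0.0 for cid in known}
    for cid, ts, amount in purchases:
        if cid in known and cutoff <= ts < end:
            spend[cid] += amount
    return {cid: total > threshold for cid, total in spend.items()}


labels = make_labels(
    purchases,
    cutoff=datetime(2011, 2, 1),
    window=timedelta(days=30),
    threshold=25.0,
)
print(labels)
```

Changing the cutoff, window, or threshold defines a different prediction problem over the same raw data, which is the point of treating label generation as its own step.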

Notebooks

  • NYC-Taxi-Dataset - Learn feature engineering
  • Retail-Dataset - Learn prediction engineering
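
To give a flavor of what feature engineering means on trip-style data, here is a small hand-written sketch. It is an assumption-laden illustration, not the notebook's code: the `trips` records, their field names, and the `trip_features` helper are hypothetical, and the notebooks automate this kind of work with Featuretools rather than writing each feature by hand.

```python
from datetime import datetime

# Hypothetical trip records; field names are illustrative only.
trips = [
    {"pickup": datetime(2016, 1, 4, 8, 30), "passengers": 1},
    {"pickup": datetime(2016, 1, 9, 23, 15), "passengers": 3},
]


def trip_features(trip):
    """Derive simple features using only information known at pickup time."""
    pickup = trip["pickup"]
    weekday = pickup.weekday()  # Monday is 0, Sunday is 6
    return {
        "pickup_hour": pickup.hour,
        "pickup_weekday": weekday,
        "is_weekend": weekday >= 5,
        "passengers": trip["passengers"],
    }


features = [trip_features(t) for t in trips]
for row in features:
    print(row)
```

Each feature is computed only from values available when the trip starts, so nothing about the outcome being predicted leaks into the inputs.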

Installation

Linux

sh install_linux.sh
source venv/bin/activate
pip install -r requirements.txt
jupyter notebook

Mac

sh install_osx.sh
source venv/bin/activate
pip install -r requirements.txt
jupyter notebook
