Please visit shifu.ml for download information, installation instructions, and tutorials.
Shifu is an open-source, end-to-end machine learning and data mining framework built on top of Hadoop. Shifu is designed for data scientists and simplifies the life cycle of building machine learning models. Although it was originally built for fraud modeling, Shifu has since been generalized to many other modeling domains.
Shifu provides a simple command-line interface for each step of the model building process, including:
- Statistics calculation & variable selection to determine the most predictive variables in your data
- Variable normalization
- Distributed variable selection based on sensitivity analysis
- Distributed neural network model training
- Post-training analysis & model evaluation
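Three of the steps above can be illustrated in miniature. The following is a single-machine Java sketch, not Shifu's implementation: it computes a variable's mean and standard deviation (a stand-in for the statistics step), applies z-score normalization clamped at a cutoff (the cutoff value 4.0 and the class name `ToyModelSteps` are hypothetical choices), and scores a model with ROC AUC as one possible evaluation metric.

```java
/**
 * Hypothetical, single-machine illustration of three steps from the list:
 * statistics, normalization, and evaluation. Not Shifu's implementation.
 */
final class ToyModelSteps {

    /** Statistics step: returns {mean, population standard deviation}. */
    static double[] meanStd(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        double mean = sum / xs.length;
        double sq = 0.0;
        for (double x : xs) sq += (x - mean) * (x - mean);
        return new double[] {mean, Math.sqrt(sq / xs.length)};
    }

    /** Normalization step: z-score, clamped to [-cutoff, cutoff] to limit outliers. */
    static double zScore(double x, double mean, double std, double cutoff) {
        if (std == 0.0) return 0.0; // a constant variable carries no signal
        double z = (x - mean) / std;
        return Math.max(-cutoff, Math.min(cutoff, z));
    }

    /**
     * Evaluation step: ROC AUC computed as the fraction of (positive, negative)
     * pairs that the scores rank correctly; tied scores count as half.
     */
    static double auc(double[] scores, int[] labels) {
        double correct = 0.0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (labels[i] != 1) continue;
            for (int j = 0; j < scores.length; j++) {
                if (labels[j] != 0) continue;
                pairs++;
                if (scores[i] > scores[j]) correct += 1.0;
                else if (scores[i] == scores[j]) correct += 0.5;
            }
        }
        return pairs == 0 ? Double.NaN : correct / pairs;
    }

    public static void main(String[] args) {
        double[] stats = meanStd(new double[] {2, 4, 4, 4, 5, 5, 7, 9});
        System.out.println(stats[0] + " " + stats[1]);              // 5.0 2.0
        System.out.println(zScore(100.0, stats[0], stats[1], 4.0)); // 4.0 (clamped)
        System.out.println(auc(new double[] {0.9, 0.3, 0.8, 0.1},
                               new int[] {1, 1, 0, 0}));           // 0.75
    }
}
```

The pairwise AUC loop is quadratic and only suited to toy data; on large score sets a sort-based rank computation gives the same value in O(n log n).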
Shifu’s fast, Hadoop-based distributed neural network training can reduce model training time from days to hours on 500 GB data sets. Shifu integrates with Pig workflows on Hadoop, and Shifu-trained models can be integrated into production code with a simple Java API. Shifu leverages Pig, Akka, Encog and other open-source projects.
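Distributed training of this kind is commonly done with synchronous data parallelism: each worker computes a gradient over its own shard of the data, and a master sums those gradients and updates a single set of weights. The Java sketch below illustrates that general scheme on one machine using a thread pool. It is an assumption for illustration, not a description of Shifu's trainer; the class `DataParallelSketch` is hypothetical, and logistic regression (a one-layer network) stands in for a full neural network to keep the code short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical sketch of synchronous data-parallel training. Workers compute
 * log-loss gradients on their shards; the master sums them and updates the
 * shared weights once per iteration. Not Shifu's implementation.
 */
final class DataParallelSketch {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Model output for a single feature x; weights = {w, bias}. */
    static double predict(double[] weights, double x) {
        return sigmoid(weights[0] * x + weights[1]);
    }

    /** Worker step: log-loss gradient summed over one shard. */
    static double[] shardGradient(double[] weights, double[] xs, int[] ys) {
        double[] grad = new double[2];
        for (int i = 0; i < xs.length; i++) {
            double err = predict(weights, xs[i]) - ys[i];
            grad[0] += err * xs[i];
            grad[1] += err;
        }
        return grad;
    }

    /** Master loop: share current weights, gather shard gradients, update. */
    static double[] train(double[][] shardXs, int[][] shardYs,
                          int iterations, double learningRate) {
        double[] weights = new double[2];
        int total = 0;
        for (double[] xs : shardXs) total += xs.length;
        ExecutorService pool = Executors.newFixedThreadPool(shardXs.length);
        try {
            for (int it = 0; it < iterations; it++) {
                final double[] snapshot = weights.clone(); // same weights for every worker
                List<Future<double[]>> futures = new ArrayList<>();
                for (int s = 0; s < shardXs.length; s++) {
                    final int shard = s;
                    futures.add(pool.submit(
                        () -> shardGradient(snapshot, shardXs[shard], shardYs[shard])));
                }
                double[] sum = new double[2];
                for (Future<double[]> f : futures) { // gather in shard order
                    double[] g;
                    try {
                        g = f.get();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException(e);
                    } catch (ExecutionException e) {
                        throw new IllegalStateException(e.getCause());
                    }
                    sum[0] += g[0];
                    sum[1] += g[1];
                }
                // Average over all records, then take one gradient-descent step.
                weights[0] -= learningRate * sum[0] / total;
                weights[1] -= learningRate * sum[1] / total;
            }
        } finally {
            pool.shutdown();
        }
        return weights;
    }

    public static void main(String[] args) {
        double[][] xs = {{-2.0, -1.0}, {1.0, 2.0}}; // two shards
        int[][] ys = {{0, 0}, {1, 1}};
        double[] w = train(xs, ys, 200, 0.5);
        System.out.printf("p(x=-2)=%.3f p(x=2)=%.3f%n", predict(w, -2.0), predict(w, 2.0));
    }
}
```

Because every worker sees the same weight snapshot and the master waits for all shards before updating, each iteration computes the same step as full-batch gradient descent on the combined data; only the gradient computation is split across workers.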
Contributors:
- Zhanghao Hu
- Grahame Jastrebski
- Lavar Li
- Mark Liu
- David Zhang
- Xin Zhong
- Simon Zhang
Please join the Shifu group for questions, bug reports, or anything else.
Copyright 2012-2015, PayPal Software Foundation under the Apache License.