Shifu

Getting Started

Please visit shifu.ml for download information, installation instructions, and tutorials.

What is Shifu?

Shifu is an open-source, end-to-end machine learning and data mining framework built on top of Hadoop. It is designed for data scientists and simplifies the life cycle of building machine learning models. Although originally built for fraud modeling, Shifu has been generalized to many other modeling domains.

Shifu provides a simple command-line interface for each step of the model building process, including:

  • Statistics calculation & variable selection to determine the most predictive variables in your data
  • Variable normalization
  • Distributed variable selection based on sensitivity analysis
  • Distributed neural network model training
  • Post training analysis & model evaluation

Shifu’s fast, Hadoop-based distributed neural network training can reduce model training time from days to hours on 500 GB data sets. Shifu integrates with Pig workflows on Hadoop, and Shifu-trained models can be integrated into production code with a simple Java API. Shifu leverages Pig, Akka, Encog, and other open-source projects.

Contributors

  • Zhanghao Hu
  • Grahame Jastrebski
  • Lavar Li
  • Mark Liu
  • David Zhang
  • Xin Zhong
  • Simon Zhang

Google Group

Please join the Shifu Google Group to ask questions, report bugs, or discuss anything else.

Copyright and License

Copyright 2012-2015 PayPal Software Foundation. Licensed under the Apache License.
