Feature Engineering as Composable Functions


Coppersmith - Feature Generation, as Functions


  1. a person who makes artifacts from copper.

  2. data is malleable; fold and hammer it into various shapes that are more attractive to analysts and data scientists.

coppersmith is a library to enable the joining, aggregation, and synthesis of "features", streams of facts about entities derived from "analytical records".

This library was originally written by a squad within the Analytics & Information group at Commonwealth Bank, looking to improve the task of authoring and maintaining features for use in predictive analytics and machine learning.

Our working hypothesis was that, for all the complexity of the business domain and the size of the data sets involved, the logic used in feature generation can fundamentally be described as simple functions, and that those functions should be composable. The framework now called coppersmith grew out of our efforts to improve the lives of feature authors.
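To make the hypothesis concrete, here is a minimal sketch in plain Scala. It does not use coppersmith's actual API: `Account`, `FeatureValue`, `feature`, and every other name below are hypothetical and exist only to show small functions being composed into a feature.

```scala
// Hypothetical sketch: none of these names come from coppersmith itself.
object FeatureSketch {
  // An "analytical record": one row of source data about an entity.
  final case class Account(customerId: String, balance: BigDecimal)

  // A feature value: a named fact about an entity.
  final case class FeatureValue(entity: String, name: String, value: BigDecimal)

  // Simple building blocks, each an ordinary function.
  val balance: Account => BigDecimal = _.balance
  val inThousands: BigDecimal => BigDecimal = _ / BigDecimal(1000)
  val rounded: BigDecimal => BigDecimal =
    _.setScale(1, BigDecimal.RoundingMode.HALF_UP)

  // Composing the blocks yields a new function from record to value.
  val balanceInThousands: Account => BigDecimal =
    balance andThen inThousands andThen rounded

  // Lift any record-to-value function into a named feature.
  def feature(name: String)(f: Account => BigDecimal): Account => FeatureValue =
    account => FeatureValue(account.customerId, name, f(account))

  val balanceFeature: Account => FeatureValue =
    feature("BALANCE_K")(balanceInThousands)
}
```

Applying `FeatureSketch.balanceFeature` to each record in a collection (for example with `map`) would then produce one feature value per record; the library's own abstractions for this are described in the user guide.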

Quick Start

Add the following dependency to your SBT build configuration (typically build.sbt):

libraryDependencies += "au.com.cba.omnia" %% "coppersmith-scalding" % "<coppersmith-version>"

Replace <coppersmith-version> with the version number of coppersmith you want to use; the latest version is listed on the project's GitHub Pages site.


We have a richly detailed user guide, which we consider a good introduction to coppersmith. PRs to the user guide are especially encouraged as you become familiar with the library!

A troubleshooting guide is also available, as is a GitHub Pages site with additional information, including the latest version and usage details.

Generated Code

Classes and objects from the commbank.coppersmith.generated, commbank.coppersmith.scalding.generated and commbank.coppersmith.lift.generated packages are generated at build time with MultiwayJoinGenerator. The generated files can be found under the target/scala-2.11/src_managed/main/ directory of the core, scalding and test subprojects respectively.


The change log lists all backwards-incompatible changes to the library (i.e. changes which might break existing client code). Any such changes require bumping the second number in the version.
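As a rough illustration of that rule, the sketch below checks whether moving between two versions crosses a change in the first or second number, in which case the change log is worth reading before upgrading. It assumes version strings of the form "a.b.c" with numeric leading parts; `VersionCheck`, `majorMinor`, and `mayBreak` are hypothetical helpers, not part of coppersmith.

```scala
import scala.util.Try

// Hypothetical helper: assumes versions look like "a.b.c".
object VersionCheck {
  // Extract the first two numeric components, or None if the string does not fit.
  def majorMinor(version: String): Option[(Int, Int)] =
    version.split('.') match {
      case Array(a, b, _*) => Try((a.toInt, b.toInt)).toOption
      case _               => None
    }

  // Some(true) when the first or second number differs, meaning
  // backwards-incompatible changes may be present between the two versions.
  def mayBreak(from: String, to: String): Option[Boolean] =
    for {
      f <- majorMinor(from)
      t <- majorMinor(to)
    } yield f != t
}
```

For example, with made-up version numbers, `VersionCheck.mayBreak("0.1.0", "0.1.5")` gives `Some(false)`, while `VersionCheck.mayBreak("0.1.5", "0.2.0")` gives `Some(true)`.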