More detailed CONTRIBUTING.md

Adding a more detailed and structured process to the contributors' guide.
microsoft · Jul 11, 2017 · bb6a495
1 parent 882f4c2
commit bb6a495
Showing 1 changed file with 74 additions and 25 deletions.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -1,34 +1,83 @@
 ## Interested in contributing to MMLSpark?  We're excited to work with you.
 
-### You can contribute in many ways
+### You can contribute in many ways:
 
-* Use the library and give feedback
-* Report a bug
-* Request a feature
-* Fix a bug
-* Add examples and documentation
-* Code a new feature
-* Review pull requests
+* Use the library and give feedback: report bugs, request features.
+* Add sample Jupyter notebooks, Python or Scala code examples, documentation
+  pages.
+* Fix bugs and issues.
+* Add new features, such as data transformations or machine learning algorithms.
+* Review pull requests from other contributors.
 
 ### How to contribute?
 
-You can give feedback, report bugs and request new features anytime by
-opening an issue. Also, you can up-vote and comment on existing issues.
+You can give feedback, report bugs and request new features anytime by opening
+an issue.  Also, you can up-vote or comment on existing issues.
 
-To make a pull request into the repo, such as bug fixes, documentation
-or new features, follow these steps:
+If you want to add code, examples or documentation to the repository, follow
+this process:
 
-* If it's a new feature, open an issue for preliminary discussion with
-  us, to ensure your contribution is a good fit and doesn't duplicate
+#### Propose a contribution
+
+* Preferably, get started by tackling existing issues to get yourself acquainted
+  with the library source and the process.
+* Open an issue, or comment on an existing issue, to discuss your contribution
+  and design, to ensure your contribution is a good fit and doesn't duplicate
   on-going work.
-* Typically, you'll need to accept Microsoft Contributor Licence
-  Agreement (CLA).
-* Familiarize yourself with coding style and guidelines.
-* Fork the repository, code your contribution, and create a pull
-  request.
-* Wait for an MMMLSpark team member to review and accept it.  Be patient
-  as we iron out the process for a new project.
-
-A good way to get started contributing is to look for issues with a "help
-wanted" label.  These are issues that we do want to fix, but don't have
-resources to work on currently.
+* Any algorithm you're planning to contribute should be well known and accepted
+  for production use, and backed by research papers.
+* Algorithms should be highly scalable and suitable for very large datasets.
+* All contributions need to comply with the MIT License.  Contributors external
+  to Microsoft need to sign the Contributor License Agreement (CLA).
+
+#### Implement your contribution
+
+* Fork the MMLSpark repository.
+* Implement your algorithm in Scala, using our wrapper generation mechanism to
+  produce PySpark bindings.
+* Use SparkML `PipelineStage`s so your algorithm can be used as part of a
+  pipeline.
+* For parameters, use `MMLParam`s.
+* Implement model saving and loading by extending SparkML `MLReadable`.
+* Use good Scala style.
+* Binary dependencies should be on Maven Central.
+* See this [pull request](https://github.com/Azure/mmlspark/pull/22) for an
+  example contribution.
+
+#### Implement tests
+
+* Set up the build environment.  Use a Linux machine or VM (we use Ubuntu, but
+  other distros should work too), and install the environment using the
+  [`runme` script](runme).
+* Test your code locally.
+* Add tests using ScalaTest.  Unit tests are required.
+* A sample notebook is required as an end-to-end test.
+
+#### Implement documentation
+
+* Add a [sample Jupyter notebook](notebooks/samples) that shows the intended use
+  case of your algorithm, with step-by-step instructions.  (The same notebook
+  could be used for testing the code.)
+* Add in-line ScalaDoc comments to your source code, to generate the [API
+  reference documentation](https://mmlspark.azureedge.net/docs/pyspark/).
+
+#### Open a pull request
+
+* In most cases, you should squash your commits into one.
+* Open a pull request, and link it to the discussion issue you created earlier.
+* An MMLSpark core team member will trigger a build to test your changes.
+* Fix any build failures.  (The pull request will have comments from the build
+  with useful links.)
+* Wait for code reviews from core team members and others.
+* Fix issues found in code review and repeat as needed.
+
+#### Build and check-in
+
+* Wait for a core team member to merge your code in.
+* Your feature will be available through a Docker image and script installation
+  in the next release, which typically happens around once a month.  You can try
+  out your features sooner by using build artifacts for the version that has
+  your changes merged in (such versions end with `.devN`).
+
+If in doubt about how to do something, see how it was done in existing code or
+pull requests, and don't hesitate to ask.
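
For readers unfamiliar with the "Implement your contribution" checklist in this change, here is a minimal sketch of what a SparkML `PipelineStage` with parameters and save/load support can look like. It is an illustration only, not code from the MMLSpark repository: the `UpperCaseColumn` class and its column parameters are hypothetical, and it uses plain SparkML `Param`s together with `DefaultParamsWritable`/`DefaultParamsReadable` (which build on `MLWritable`/`MLReadable`) as stand-ins, because the `MMLParam` API and the wrapper generation mechanism are not shown in this commit. It assumes Spark 2.x on the classpath.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.{Param, ParamMap}
import org.apache.spark.ml.util.{DefaultParamsReadable, DefaultParamsWritable, Identifiable}
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.{col, upper}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical example stage: copies a string column to a new column in upper case.
// Transformer is a PipelineStage, so instances can be placed in a SparkML Pipeline.
class UpperCaseColumn(override val uid: String)
    extends Transformer with DefaultParamsWritable {

  def this() = this(Identifiable.randomUID("UpperCaseColumn"))

  // Plain SparkML params; an actual MMLSpark contribution would use MMLParams instead.
  val inputCol = new Param[String](this, "inputCol", "name of the input column")
  val outputCol = new Param[String](this, "outputCol", "name of the output column")

  def setInputCol(value: String): this.type = set(inputCol, value)
  def setOutputCol(value: String): this.type = set(outputCol, value)

  // Adds the output column; the input data is left unchanged.
  override def transform(dataset: Dataset[_]): DataFrame =
    dataset.withColumn($(outputCol), upper(col($(inputCol))))

  // Declares the schema change so a pipeline can validate stages before running them.
  override def transformSchema(schema: StructType): StructType =
    schema.add(StructField($(outputCol), StringType, nullable = true))

  override def copy(extra: ParamMap): UpperCaseColumn = defaultCopy(extra)
}

// Companion object provides UpperCaseColumn.load(path) via MLReadable.
object UpperCaseColumn extends DefaultParamsReadable[UpperCaseColumn]
```

Under these assumptions, a stage configured with `new UpperCaseColumn().setInputCol("text").setOutputCol("textUpper")` can be persisted with `.write.overwrite().save(path)` and restored with `UpperCaseColumn.load(path)`. Stages that hold fitted state beyond their params would need a custom writer and reader rather than the default ones.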