From 8c293a5c93efc1bb196dcf3ac5b42d0827141caa Mon Sep 17 00:00:00 2001
From: BenFradet <benjamin.fradet@gmail.com>
Date: Thu, 10 Dec 2015 22:40:06 +0100
Subject: [PATCH 1/2] added a paragraph regarding
 StringIndexer#setHandleInvalid to the ml-features doc

---
 docs/ml-features.md | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)
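
Reviewer note: the two strategies described in the added paragraph can be
modelled in a few lines of plain Python. This is only a sketch of the documented
behaviour, not Spark's implementation. The helpers `fit_labels` and `transform`
are hypothetical names, the training values are assumed so that "a" is the most
frequent label, followed by "c" and then "b", the alphabetical tie-break is an
assumption, and `ValueError` stands in for whatever exception Spark raises.

```python
from collections import Counter


def fit_labels(values):
    """Map each label to an index, most frequent label first."""
    counts = Counter(values)
    # Assumption: ties are broken alphabetically to keep the sketch deterministic.
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


def transform(rows, index, handle_invalid="error"):
    """Index (id, category) rows, raising on or skipping unseen labels."""
    out = []
    for row_id, category in rows:
        if category not in index:
            if handle_invalid == "skip":
                continue  # drop the whole row containing the unseen label
            raise ValueError("Unseen label: " + category)
        out.append((row_id, category, index[category]))
    return out


# Assumed training column: "a" three times, "c" twice, "b" once.
index = fit_labels(["a", "b", "c", "a", "a", "c"])

# The second dataset from the paragraph, which contains the unseen label "d".
rows = [(0, "a"), (1, "b"), (2, "c"), (3, "d")]
indexed = transform(rows, index, handle_invalid="skip")
print(indexed)
```

With `handle_invalid="skip"` the row with id 3 is dropped. With the default
`"error"`, the same call raises instead of returning a result.
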
diff --git a/docs/ml-features.md b/docs/ml-features.md
index 7ad7c4eb7ea65..3478963056ddb 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -459,6 +459,42 @@ column, we should get the following:
 "a" gets index `0` because it is the most frequent, followed by "c" with index `1` and "b" with
 index `2`.
 
+Additionally, there are two strategies regarding how `StringIndexer` will handle
+unseen labels when you have set up a `StringIndexer` on a dataset which you want
+to reuse on another:
+
+- throw an exception (which is the default)
+- skip the row containing the unseen label entirely
+
+**Examples**
+
+Let's go back to our previous example but this time reuse our previously defined
+`StringIndexer` on the following dataset:
+
+~~~~
+ id | category
+----|----------
+ 0  | a
+ 1  | b
+ 2  | c
+ 3  | d
+~~~~
+
+If you have not set how `StringIndexer` handles unseen labels, or have set it to
+"error", an exception will be thrown.
+However, if you have called `setHandleInvalid("skip")`, the following dataset
+will be generated:
+
+~~~~
+ id | category | categoryIndex
+----|----------|---------------
+ 0  | a        | 0.0
+ 1  | b        | 2.0
+ 2  | c        | 1.0
+~~~~
+
+Notice that the row containing "d" does not appear.
+
 <div class="codetabs">
 
 <div data-lang="scala" markdown="1">

From 0fb5e2b9880477501dc959f503fb10d142350ee9 Mon Sep 17 00:00:00 2001
From: BenFradet <benjamin.fradet@gmail.com>
Date: Fri, 11 Dec 2015 23:28:14 +0100
Subject: [PATCH 2/2] addressed comments

---
 docs/ml-features.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/ml-features.md b/docs/ml-features.md
index 3478963056ddb..72eb2de6aeae1 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -460,8 +460,8 @@ column, we should get the following:
 index `2`.
 
 Additionally, there are two strategies regarding how `StringIndexer` will handle
-unseen labels when you have set up a `StringIndexer` on a dataset which you want
-to reuse on another:
+unseen labels when you have fit a `StringIndexer` on one dataset and then use it
+to transform another:
 
 - throw an exception (which is the default)
 - skip the row containing the unseen label entirely