diff --git a/docs/quick-start.md b/docs/quick-start.md
index a85e5b28a6996..200b97230e866 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -153,7 +153,7 @@ This first maps a line to an integer value and aliases it as "numWords", creatin
 One common data flow pattern is MapReduce, as popularized by Hadoop. Spark can implement MapReduce flows easily:
 
 {% highlight python %}
->>> wordCounts = textFile.select(explode(split(textFile.value, "\s+")).as("word")).groupBy("word").count()
+>>> wordCounts = textFile.select(explode(split(textFile.value, "\s+")).alias("word")).groupBy("word").count()
 {% endhighlight %}
 
 Here, we use the `explode` function in `select` to transform a Dataset of lines to a Dataset of words, and then combine `groupBy` and `count` to compute the per-word counts in the file as a DataFrame of two columns: "word" and "count". To collect the word counts in our shell, we can call `collect`:
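
For reference, here is a minimal standalone sketch of the word-count flow this hunk documents, including the `collect` call the surrounding prose mentions. The `spark` session, the app name, and the `README.md` input path are assumptions for illustration and are not part of the patch; the `.alias("word")` call reflects the fix in this diff (`.as` is not valid here, since `as` is a Python keyword):

{% highlight python %}
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

# Hypothetical session and input path, for illustration only.
spark = SparkSession.builder.appName("WordCountSketch").getOrCreate()
textFile = spark.read.text("README.md")

# split() tokenizes each line on whitespace; explode() emits one row
# per word; alias() names the resulting column "word".
wordCounts = textFile.select(
    explode(split(textFile.value, r"\s+")).alias("word")
).groupBy("word").count()

# collect() returns the per-word counts to the driver as a list of Rows.
print(wordCounts.collect())

spark.stop()
{% endhighlight %}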