Skip to content

Commit

Permalink
[SPARK-22107] Change as to alias in python quickstart
Browse files Browse the repository at this point in the history
## What changes were proposed in this pull request?

Updated docs so that a line of python in the quick start guide executes. Closes #19283

## How was this patch tested?

Existing tests.

Author: John O'Leary <jgoleary@gmail.com>

Closes #19326 from jgoleary/issues/22107.
  • Loading branch information
John O'Leary authored and HyukjinKwon committed Sep 25, 2017
1 parent 576c43f commit 20adf9a
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/quick-start.md
Original file line number Diff line number Diff line change
Expand Up @@ -153,7 +153,7 @@ This first maps a line to an integer value and aliases it as "numWords", creatin
One common data flow pattern is MapReduce, as popularized by Hadoop. Spark can implement MapReduce flows easily:

{% highlight python %}
>>> wordCounts = textFile.select(explode(split(textFile.value, "\s+")).as("word")).groupBy("word").count()
>>> wordCounts = textFile.select(explode(split(textFile.value, "\s+")).alias("word")).groupBy("word").count()
{% endhighlight %}

Here, we use the `explode` function in `select`, to transfrom a Dataset of lines to a Dataset of words, and then combine `groupBy` and `count` to compute the per-word counts in the file as a DataFrame of 2 columns: "word" and "count". To collect the word counts in our shell, we can call `collect`:
Expand Down

0 comments on commit 20adf9a

Please sign in to comment.