Skip to content

Commit

Permalink
address comments
Browse files Browse the repository at this point in the history
  • Loading branch information
brkyvz committed May 5, 2015
1 parent a417ba5 commit b30ace2
Show file tree
Hide file tree
Showing 2 changed files with 10 additions and 8 deletions.
9 changes: 5 additions & 4 deletions python/pyspark/sql/dataframe.py
Original file line number Diff line number Diff line change
Expand Up @@ -934,10 +934,11 @@ def cov(self, col1, col2):
def crosstab(self, col1, col2):
"""
Computes a pair-wise frequency table of the given columns. Also known as a contingency
table. The number of distinct values for each column should be less than 1e4. The first
column of each row will be the distinct values of `col1` and the column names will be the
distinct values of `col2`. The name of the first column will be `$col1_$col2`. Pairs that
have no occurrences will have `null` as their counts.
table. The number of distinct values for each column should be less than 1e4. At most, 1e6
non-zero pair frequencies returned will be returned.
The first column of each row will be the distinct values of `col1` and the column names
will be the distinct values of `col2`. The name of the first column will be `$col1_$col2`.
Pairs that have no occurrences will have `null` as their counts.
:func:`DataFrame.crosstab` and :func:`DataFrameStatFunctions.crosstab` are aliases.
:param col1: The name of the first column. Distinct items will make the first item of
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -65,10 +65,11 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {

/**
* Computes a pair-wise frequency table of the given columns. Also known as a contingency table.
* The number of distinct values for each column should be less than 1e4. The first
* column of each row will be the distinct values of `col1` and the column names will be the
* distinct values of `col2`. The name of the first column will be `$col1_$col2`. Counts will be
* returned as `Long`s. Pairs that have no occurrences will have `null` as their counts.
* The number of distinct values for each column should be less than 1e4. At most, 1e6 non-zero
* pair frequencies returned will be returned.
* The first column of each row will be the distinct values of `col1` and the column names will
* be the distinct values of `col2`. The name of the first column will be `$col1_$col2`. Counts
* will be returned as `Long`s. Pairs that have no occurrences will have `null` as their counts.
*
* @param col1 The name of the first column. Distinct items will make the first item of
* each row.
Expand Down

0 comments on commit b30ace2

Please sign in to comment.