title: "rx_summary: Generate summary statistics (revoscalepy)"
description: Produce univariate summaries of objects in revoscalepy.
keywords: summary
author: chuckheinzelman
ms.author: charlhe
manager: cgronlun
ms.date: 07/15/2019
ms.topic: reference
ms.service: mlserver
ms.devlang: Python

rx_summary

Usage

revoscalepy.rx_summary(formula: str, data, by_group_out_file=None,
    summary_stats: list = None, by_term: bool = True, pweights=None,
    fweights=None, row_selection: str = None, transforms=None,
    transform_objects=None, transform_function=None, transform_variables=None,
    transform_packages=None, transform_environment=None,
    overwrite: bool = False, use_sparse_cube: bool = None,
    remove_zero_counts: bool = None, blocks_per_read: int = None,
    rows_per_block: int = 100000, report_progress: int = None,
    verbose: int = 0, compute_context=None, **kwargs)

Description

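Produce univariate summaries of objects in revoscalepy. These are statistics such as the mean, standard deviation, minimum, maximum, sum, and counts of valid and missing observations for each term in the formula, optionally computed by factor.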

Arguments

formula

A character string that specifies the variables to summarize using symbolic formula notation. The formula typically has no response variable; that is, it takes the form ~ terms.
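To make the "~ terms" form concrete, the sketch below splits a one-sided formula string into its right-hand-side terms. The helper rhs_terms and the column names are hypothetical. The helper handles only "+"-separated terms and keeps an interaction expression such as sex:age as a single term. It is not how revoscalepy parses formulas internally.

```python
def rhs_terms(formula: str) -> list:
    """Split a one-sided formula such as '~ age + income' into its terms.

    Hypothetical helper for illustration only; it assumes the formula has
    no response variable and uses '+' as the only term separator.
    """
    lhs, sep, rhs = formula.partition("~")
    if not sep:
        raise ValueError("formula must contain '~'")
    if lhs.strip():
        raise ValueError("expected a one-sided formula of the form '~ terms'")
    # Strip whitespace around each '+'-separated term and drop empty pieces.
    return [term.strip() for term in rhs.split("+") if term.strip()]


print(rhs_terms("~ age + income"))
print(rhs_terms("~ income + sex:age"))
```

A formula with a left-hand side, such as "y ~ x", is rejected here because rx_summary formulas typically omit the response variable.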

data

Either a data source object, a character string specifying an .xdf file, or a data frame object to summarize. If a Spark compute context is in use, this argument may also be an RxHiveData, RxOrcData, RxParquetData, or RxSparkDataFrame object, or a Spark data frame (pyspark.sql.DataFrame).

by_group_out_file

None, a character string or list of character strings specifying .xdf file name(s), or an RxXdfData object or list of RxXdfData objects. If not None and the formula includes computations by factor, the by-group summary results are written to one or more .xdf files. If more than one .xdf file is created and a single character string is specified, an integer is appended to the base by_group_out_file name to form the additional file names. The resulting RxXdfData objects are listed in the categorical component of the output object.

summary_stats

A list of strings containing one or more of the following values: "Mean", "StdDev", "Min", "Max", "ValidObs", "MissingObs", "Sum".
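The following pure-Python sketch shows what each of the seven statistic names measures for a single numeric variable. It is an illustration, not revoscalepy code. The function summarize is hypothetical, missing values are represented as None, and StdDev is assumed to be the sample (n - 1) standard deviation.

```python
import math
import statistics


def summarize(values):
    """Compute the seven statistics named in summary_stats for one variable.

    Sketch only: missing values are represented as None, and StdDev is
    assumed to be the sample (n - 1) standard deviation.
    """
    valid = [v for v in values if v is not None]
    return {
        "Mean": statistics.mean(valid) if valid else math.nan,
        "StdDev": statistics.stdev(valid) if len(valid) > 1 else math.nan,
        "Min": min(valid) if valid else math.nan,
        "Max": max(valid) if valid else math.nan,
        "ValidObs": len(valid),
        "MissingObs": len(values) - len(valid),
        "Sum": sum(valid),
    }


print(summarize([2.0, 4.0, None, 6.0]))
```

For the input above, the one None value counts toward MissingObs, and the remaining three values produce the other statistics.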

by_term

Bool. If True, missing values are removed term by term (by variable or by interaction expression) before summary statistics are computed. If False, observations with missing values in any term are removed before any computation.
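The difference between the two settings shows up in the number of valid observations per term. The sketch below is a minimal illustration, assuming rows are plain dicts with None marking a missing value and that each term is a single variable (interaction expressions are not handled). The function valid_counts is hypothetical.

```python
def valid_counts(rows, terms, by_term=True):
    """Count the observations that survive missing-value removal for each term.

    Hypothetical sketch of the by_term behavior: rows are dicts, None marks
    a missing value, and each term is assumed to be a single variable.
    """
    if by_term:
        # Each term keeps every row in which that term itself is present.
        return {t: sum(1 for r in rows if r[t] is not None) for t in terms}
    # Otherwise drop any row with a missing value in any term, then count.
    complete = [r for r in rows if all(r[t] is not None for t in terms)]
    return {t: len(complete) for t in terms}


rows = [
    {"x": 1.0, "y": 10.0},
    {"x": None, "y": 20.0},
    {"x": 3.0, "y": None},
    {"x": 4.0, "y": 40.0},
]
print(valid_counts(rows, ["x", "y"], by_term=True))   # {'x': 3, 'y': 3}
print(valid_counts(rows, ["x", "y"], by_term=False))  # {'x': 2, 'y': 2}
```

With by_term=False, the second and third rows are dropped for every term, so both x and y are summarized over the same two observations.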

pweights

Character string specifying the variable to use as probability weights for the observations.
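As a rough illustration of how observation weights enter a summary statistic, the sketch below computes a weighted mean, sum(w * x) / sum(w). This is an assumption made for illustration. The exact weighted formulas that rx_summary applies with pweights, particularly for StdDev, are not stated here. The function weighted_mean is hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean sum(w * x) / sum(w), skipping pairs with a missing value.

    Hypothetical sketch; it is not taken from revoscalepy.
    """
    pairs = [
        (x, w) for x, w in zip(values, weights)
        if x is not None and w is not None
    ]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in pairs) / total_weight


print(weighted_mean([1.0, 2.0, 3.0], [0.5, 0.25, 0.25]))  # 1.75
```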