DataFrame

pyspark.pandas

Constructor

DataFrame

Attributes and underlying data

DataFrame.index DataFrame.columns DataFrame.empty

DataFrame.dtypes DataFrame.shape DataFrame.axes DataFrame.ndim DataFrame.size DataFrame.select_dtypes DataFrame.values

Conversion

DataFrame.copy DataFrame.isna DataFrame.astype DataFrame.isnull DataFrame.notna DataFrame.notnull DataFrame.pad DataFrame.bool

Indexing, iteration

DataFrame.at DataFrame.iat DataFrame.head DataFrame.idxmax DataFrame.idxmin DataFrame.loc DataFrame.iloc DataFrame.items DataFrame.iteritems DataFrame.iterrows DataFrame.itertuples DataFrame.keys DataFrame.pop DataFrame.tail DataFrame.xs DataFrame.get DataFrame.where DataFrame.mask DataFrame.query

Binary operator functions

DataFrame.add DataFrame.radd DataFrame.div DataFrame.rdiv DataFrame.truediv DataFrame.rtruediv DataFrame.mul DataFrame.rmul DataFrame.sub DataFrame.rsub DataFrame.pow DataFrame.rpow DataFrame.mod DataFrame.rmod DataFrame.floordiv DataFrame.rfloordiv DataFrame.lt DataFrame.gt DataFrame.le DataFrame.ge DataFrame.ne DataFrame.eq DataFrame.dot DataFrame.combine_first

Function application, GroupBy & Window

DataFrame.apply DataFrame.applymap DataFrame.pipe DataFrame.agg DataFrame.aggregate DataFrame.groupby DataFrame.rolling DataFrame.expanding DataFrame.transform

Computations / Descriptive Stats

DataFrame.abs DataFrame.all DataFrame.any DataFrame.clip DataFrame.corr DataFrame.corrwith DataFrame.count DataFrame.cov DataFrame.describe DataFrame.ewm DataFrame.kurt DataFrame.kurtosis DataFrame.mad DataFrame.max DataFrame.mean DataFrame.min DataFrame.median DataFrame.mode DataFrame.pct_change DataFrame.prod DataFrame.product DataFrame.quantile DataFrame.nunique DataFrame.sem DataFrame.skew DataFrame.sum DataFrame.std DataFrame.var DataFrame.cummin DataFrame.cummax DataFrame.cumsum DataFrame.cumprod DataFrame.round DataFrame.diff DataFrame.eval

Reindexing / Selection / Label manipulation

DataFrame.add_prefix DataFrame.add_suffix DataFrame.align DataFrame.at_time DataFrame.between_time DataFrame.drop DataFrame.droplevel DataFrame.drop_duplicates DataFrame.duplicated DataFrame.equals DataFrame.filter DataFrame.first DataFrame.head DataFrame.last DataFrame.rename DataFrame.rename_axis DataFrame.reset_index DataFrame.set_index DataFrame.swapaxes DataFrame.swaplevel DataFrame.take DataFrame.isin DataFrame.sample DataFrame.truncate

Missing data handling

DataFrame.backfill DataFrame.dropna DataFrame.fillna DataFrame.replace DataFrame.bfill DataFrame.ffill DataFrame.interpolate

Reshaping, sorting, transposing

DataFrame.pivot_table DataFrame.pivot DataFrame.sort_index DataFrame.sort_values DataFrame.nlargest DataFrame.nsmallest DataFrame.stack DataFrame.unstack DataFrame.melt DataFrame.explode DataFrame.squeeze DataFrame.T DataFrame.transpose DataFrame.reindex DataFrame.reindex_like DataFrame.rank

Combining / joining / merging

DataFrame.append DataFrame.assign DataFrame.merge DataFrame.join DataFrame.update DataFrame.insert

Time series-related

DataFrame.resample DataFrame.shift DataFrame.first_valid_index DataFrame.last_valid_index

Serialization / IO / Conversion

DataFrame.from_records DataFrame.info DataFrame.to_table DataFrame.to_delta DataFrame.to_parquet DataFrame.to_spark_io DataFrame.to_csv DataFrame.to_pandas DataFrame.to_html DataFrame.to_numpy DataFrame.to_spark DataFrame.to_string DataFrame.to_json DataFrame.to_dict DataFrame.to_excel DataFrame.to_clipboard DataFrame.to_markdown DataFrame.to_records DataFrame.to_latex DataFrame.style

Spark-related

DataFrame.spark provides features that exist in Spark but not in pandas. These can be accessed by DataFrame.spark.<function/property>.

DataFrame.spark.frame DataFrame.spark.cache DataFrame.spark.persist DataFrame.spark.hint DataFrame.spark.to_table DataFrame.spark.to_spark_io DataFrame.spark.apply DataFrame.spark.repartition DataFrame.spark.coalesce
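A minimal usage sketch of the DataFrame.spark accessor may help here. It assumes pyspark is installed and a JVM is available when the function is called; the function name spark_accessor_demo and the sample data are hypothetical and exist only for illustration. The import is deferred into the function so that defining it does not require Spark.

```python
def spark_accessor_demo():
    """Exercise a few DataFrame.spark members (needs pyspark and a JVM at call time)."""
    # Deferred import: defining this function has no Spark dependency.
    import pyspark.pandas as ps

    # Hypothetical sample data.
    psdf = ps.DataFrame({"id": [1, 2, 3], "value": [10.0, 20.0, 30.0]})

    # View the pandas-on-Spark frame as a plain pyspark.sql.DataFrame.
    sdf = psdf.spark.frame()

    # Spark-level partition control; returns a new pandas-on-Spark DataFrame.
    repartitioned = psdf.spark.repartition(2)

    # Run a Spark DataFrame -> Spark DataFrame function on the underlying data.
    filtered = psdf.spark.apply(lambda s: s.filter("value > 10"))

    # cache() can be used as a context manager; the cache is released on exit.
    with psdf.spark.cache() as cached:
        total = cached["value"].sum()

    return sdf, repartitioned, filtered, total
```

Call spark_accessor_demo() in an environment with a working Spark installation to try it. None of these operations has a pandas counterpart, which is why they are grouped under the accessor rather than exposed directly on DataFrame.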

Plotting

DataFrame.plot is both a callable method and a namespace attribute for specific plotting methods of the form DataFrame.plot.<kind>.

DataFrame.plot DataFrame.plot.area DataFrame.plot.barh DataFrame.plot.bar DataFrame.plot.hist DataFrame.plot.box DataFrame.plot.line DataFrame.plot.pie DataFrame.plot.scatter DataFrame.plot.density DataFrame.hist DataFrame.boxplot DataFrame.kde
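The "callable and namespace" behaviour described above can be illustrated with a small pure-Python sketch. This is a toy stand-in, not the pyspark implementation: the classes ToyPlotAccessor and ToyFrame are hypothetical, support only two kinds, and return tuples instead of drawing anything.

```python
class ToyPlotAccessor:
    """Hypothetical accessor that is both callable and a namespace of methods."""

    _KINDS = ("line", "bar")  # the toy supports only these two kinds

    def __init__(self, data):
        self._data = data

    def __call__(self, kind="line", **kwargs):
        # frame.plot(kind="bar") dispatches to the same method as frame.plot.bar().
        if kind not in self._KINDS:
            raise ValueError(f"unsupported plot kind: {kind!r}")
        return getattr(self, kind)(**kwargs)

    def line(self, **kwargs):
        # A real accessor would render a figure; the toy just records the request.
        return ("line", self._data, kwargs)

    def bar(self, **kwargs):
        return ("bar", self._data, kwargs)


class ToyFrame:
    """Hypothetical frame exposing the accessor through a property."""

    def __init__(self, data):
        self._data = data

    @property
    def plot(self):
        return ToyPlotAccessor(self._data)


frame = ToyFrame([1, 2, 3])
same_result = frame.plot(kind="bar") == frame.plot.bar()
```

In this sketch, frame.plot(kind="bar") and frame.plot.bar() are two spellings of the same request, which is the relationship the DataFrame.plot and DataFrame.plot.<kind> entries express.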

Pandas-on-Spark specific

DataFrame.pandas_on_spark provides features that exist only in the pandas API on Spark. These can be accessed by DataFrame.pandas_on_spark.<function/property>.

DataFrame.pandas_on_spark.apply_batch DataFrame.pandas_on_spark.transform_batch
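As a rough sketch of how the two batch functions are typically used: both pass chunks of the data to a user function as ordinary pandas DataFrames. The function name batch_accessor_demo and the sample data are hypothetical, and the example assumes pyspark, pandas, and a JVM are available when it is called.

```python
def batch_accessor_demo():
    """Exercise apply_batch and transform_batch (needs pyspark and a JVM at call time)."""
    # Deferred import: defining this function has no Spark dependency.
    import pyspark.pandas as ps

    # Hypothetical sample data.
    psdf = ps.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

    # apply_batch: the function receives each chunk as a pandas DataFrame and
    # may return a result of a different length (here, a row filter).
    filtered = psdf.pandas_on_spark.apply_batch(lambda pdf: pdf[pdf["a"] > 1])

    # transform_batch: the function must return output of the same length
    # as the chunk it was given (here, an element-wise increment).
    shifted = psdf.pandas_on_spark.transform_batch(lambda pdf: pdf + 1)

    return filtered, shifted
```

Because the function sees one chunk at a time rather than the whole frame, operations that depend on global state (for example, a mean over all rows) are a poor fit for either method.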