
MLlib (DataFrame-based)

Pipeline APIs

pyspark.ml

Transformer UnaryTransformer Estimator Model Predictor PredictionModel Pipeline PipelineModel

Parameters

pyspark.ml.param

Param Params TypeConverters

Feature

pyspark.ml.feature

Binarizer BucketedRandomProjectionLSH BucketedRandomProjectionLSHModel Bucketizer ChiSqSelector ChiSqSelectorModel CountVectorizer CountVectorizerModel DCT ElementwiseProduct FeatureHasher HashingTF IDF IDFModel Imputer ImputerModel IndexToString Interaction MaxAbsScaler MaxAbsScalerModel MinHashLSH MinHashLSHModel MinMaxScaler MinMaxScalerModel NGram Normalizer OneHotEncoder OneHotEncoderModel PCA PCAModel PolynomialExpansion QuantileDiscretizer RobustScaler RobustScalerModel RegexTokenizer RFormula RFormulaModel SQLTransformer StandardScaler StandardScalerModel StopWordsRemover StringIndexer StringIndexerModel Tokenizer UnivariateFeatureSelector UnivariateFeatureSelectorModel VarianceThresholdSelector VarianceThresholdSelectorModel VectorAssembler VectorIndexer VectorIndexerModel VectorSizeHint VectorSlicer Word2Vec Word2VecModel

Classification

pyspark.ml.classification

LinearSVC LinearSVCModel LinearSVCSummary LinearSVCTrainingSummary LogisticRegression LogisticRegressionModel LogisticRegressionSummary LogisticRegressionTrainingSummary BinaryLogisticRegressionSummary BinaryLogisticRegressionTrainingSummary DecisionTreeClassifier DecisionTreeClassificationModel GBTClassifier GBTClassificationModel RandomForestClassifier RandomForestClassificationModel RandomForestClassificationSummary RandomForestClassificationTrainingSummary BinaryRandomForestClassificationSummary BinaryRandomForestClassificationTrainingSummary NaiveBayes NaiveBayesModel MultilayerPerceptronClassifier MultilayerPerceptronClassificationModel MultilayerPerceptronClassificationSummary MultilayerPerceptronClassificationTrainingSummary OneVsRest OneVsRestModel FMClassifier FMClassificationModel FMClassificationSummary FMClassificationTrainingSummary

Clustering

pyspark.ml.clustering

BisectingKMeans BisectingKMeansModel BisectingKMeansSummary KMeans KMeansModel KMeansSummary GaussianMixture GaussianMixtureModel GaussianMixtureSummary LDA LDAModel LocalLDAModel DistributedLDAModel PowerIterationClustering

Functions

pyspark.ml.functions

array_to_vector vector_to_array predict_batch_udf

Vector and Matrix

pyspark.ml.linalg

Vector DenseVector SparseVector Vectors Matrix DenseMatrix SparseMatrix Matrices

Recommendation

pyspark.ml.recommendation

ALS ALSModel

Regression

pyspark.ml.regression

AFTSurvivalRegression AFTSurvivalRegressionModel DecisionTreeRegressor DecisionTreeRegressionModel GBTRegressor GBTRegressionModel GeneralizedLinearRegression GeneralizedLinearRegressionModel GeneralizedLinearRegressionSummary GeneralizedLinearRegressionTrainingSummary IsotonicRegression IsotonicRegressionModel LinearRegression LinearRegressionModel LinearRegressionSummary LinearRegressionTrainingSummary RandomForestRegressor RandomForestRegressionModel FMRegressor FMRegressionModel

Statistics

pyspark.ml.stat

ChiSquareTest Correlation KolmogorovSmirnovTest MultivariateGaussian Summarizer SummaryBuilder

Tuning

pyspark.ml.tuning

ParamGridBuilder CrossValidator CrossValidatorModel TrainValidationSplit TrainValidationSplitModel

Evaluation

pyspark.ml.evaluation

Evaluator BinaryClassificationEvaluator RegressionEvaluator MulticlassClassificationEvaluator MultilabelClassificationEvaluator ClusteringEvaluator RankingEvaluator

Frequency Pattern Mining

pyspark.ml.fpm

FPGrowth FPGrowthModel PrefixSpan

Image

pyspark.ml.image

ImageSchema _ImageSchema

Distributor

pyspark.ml.torch.distributor

TorchDistributor

Utilities

pyspark.ml.util

BaseReadWrite DefaultParamsReadable DefaultParamsReader DefaultParamsWritable DefaultParamsWriter GeneralMLWriter HasTrainingSummary Identifiable MLReadable MLReader MLWritable MLWriter