```python
from sklearn.pipeline import Pipeline
```

#### Pipeline Functions

The sklearn.pipeline module implements utilities to build a composite estimator, as a chain of transforms and estimators.

- `pipeline.FeatureUnion(transformer_list[, …])`: Concatenates results of multiple transformer objects.
  https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.FeatureUnion.html#sklearn.pipeline.FeatureUnion
- `pipeline.Pipeline(steps[, memory, verbose])`: Pipeline of transforms with a final estimator.
  https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline
- `pipeline.make_pipeline(*steps, **kwargs)`: Construct a Pipeline from the given estimators.
  https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.make_pipeline.html#sklearn.pipeline.make_pipeline
- `pipeline.make_union(*transformers, **kwargs)`: Construct a FeatureUnion from the given transformers.
  https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.make_union.html#sklearn.pipeline.make_union
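A minimal sketch of how these four entry points differ, mainly in how steps get their names. It assumes the Iris data bundled with scikit-learn; the choice of `StandardScaler`, `TruncatedSVD` and `LogisticRegression` is illustrative only and not part of the original notes.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline, make_pipeline, make_union
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

# Pipeline: each step name is chosen explicitly
pipe = Pipeline([("scaler", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])
pipe.fit(X, y)

# make_pipeline: step names are generated from the lowercased class names
auto_pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
auto_names = [name for name, _ in auto_pipe.steps]

# FeatureUnion: each transformer sees the same input; outputs are stacked column-wise
union = FeatureUnion([("pca", PCA(n_components=2)),
                      ("svd", TruncatedSVD(n_components=1))])
X_union = union.fit_transform(X)

# make_union: same behavior as FeatureUnion, with generated names
auto_union = make_union(PCA(n_components=2), TruncatedSVD(n_components=1))
union_names = [name for name, _ in auto_union.transformer_list]

print(auto_names)     # ['standardscaler', 'logisticregression']
print(X_union.shape)  # (150, 3)
print(union_names)    # ['pca', 'truncatedsvd']
```

The `make_*` helpers are convenient for quick experiments. Explicit names are easier to reference later in grid-search keys.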

#### Pipeline for Classifier

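A classifier is added as a `(name, estimator)` step, e.g. `('classifier', ClassifierName(options))`. The name is commonly shortened to `clf`, and `ClassifierName(options)` is a placeholder for any classifier and its constructor arguments. The classifier goes last, because every step before the final one must be a transformer.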
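A minimal sketch, assuming the bundled Iris data and using `DecisionTreeClassifier` as an arbitrary stand-in for `ClassifierName`. It shows that the step name becomes the prefix for that step's parameters (`<step name>__<parameter>`):

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# The classifier is the last (name, estimator) tuple; earlier steps are transformers
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", DecisionTreeClassifier(random_state=0)),
])

# Parameters of a step are addressed as <step name>__<parameter>
pipe.set_params(clf__max_depth=3)
pipe.fit(X, y)

print(pipe.named_steps["clf"].max_depth)  # 3
print(pipe.predict(X[:2]).shape)          # (2,)
```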

#### Pipeline for PCA

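PCA is added as a transformer step, e.g. `('PCA', PCA(options))`; the name is often lowercased to `pca`, as in the sample below. Step names are case-sensitive and become parameter prefixes (`PCA__n_components` vs `pca__n_components`), so use one spelling consistently.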
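A minimal sketch of a PCA step feeding a classifier, assuming the bundled Iris data (4 features) and `LogisticRegression` as an illustrative final estimator. After fitting, the PCA step can be inspected through `named_steps`, and slicing the pipeline runs only the transform steps:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)  # 4 features

pipe = Pipeline([
    ("PCA", PCA(n_components=2)),                # transformer: 4 -> 2 features
    ("clf", LogisticRegression(max_iter=1000)),  # final estimator sees 2 features
])
pipe.fit(X, y)

pca = pipe.named_steps["PCA"]       # the fitted PCA instance
X_reduced = pipe[:-1].transform(X)  # apply every step except the last
explained = pca.explained_variance_ratio_.sum()

print(X_reduced.shape)  # (150, 2)
print("Variance kept: %.3f" % explained)
```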

#### Pipeline w/ GridSearch

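- Set the grid search parameters: a dict (or list of dicts) whose keys are `<step name>__<parameter>`, e.g. `clf__max_depth`
- Construct `GridSearchCV` with `estimator=` set to the pipeline object
- Fit the grid search object (not the pipeline); with the default `refit=True`, the best pipeline is refit on the full training set and exposed as `best_estimator_`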
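The three steps above can be sketched as follows, assuming the bundled Iris data and an SVC pipeline chosen purely for illustration (the grid values are arbitrary):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
pipe_svm = Pipeline([("scaler", StandardScaler()), ("clf", SVC())])

# 1. Set grid search parameters (keys are <step name>__<parameter>)
param_grid = {"clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]}

# 2. Construct the grid search with the pipeline as the estimator
gs = GridSearchCV(estimator=pipe_svm, param_grid=param_grid,
                  scoring="accuracy", cv=3)

# 3. Fit the grid search; the scaler is refit inside every CV fold,
#    so the validation folds never leak into the preprocessing
gs.fit(X, y)

print(gs.best_params_)
print("Best CV accuracy: %.3f" % gs.best_score_)
```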

#### Naming pipelines for looping

```python
# Assumes pipe_svm, pipe_tree, pipe_rf and X_train, X_test, y_train, y_test are defined earlier
pipelines = [pipe_svm, pipe_tree, pipe_rf]
pipeline_names = ['Support Vector Machine', 'Decision Tree', 'Random Forest']

# Loop to fit pipelines
for pipe in pipelines:
    print(pipe)
    pipe.fit(X_train, y_train)

# Compare test accuracies (score() returns mean accuracy for classifiers)
for name, pipe in zip(pipeline_names, pipelines):
    print('%s pipeline test accuracy: %.3f' % (name, pipe.score(X_test, y_test)))
```
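A self-contained sketch of the same loop, assuming the bundled breast-cancer dataset and a hypothetical helper `build_pipe` that gives all three models the same scaling + PCA front end (the `n_components=10` value is arbitrary). It keeps names and pipelines together in a dict, so the two cannot drift out of sync the way parallel lists can:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=123, stratify=y)


def build_pipe(clf):
    """Hypothetical helper: fresh scaler + PCA in front of the given classifier."""
    return Pipeline([("scl", StandardScaler()),
                     ("pca", PCA(n_components=10)),
                     ("clf", clf)])


pipelines = {
    "Support Vector Machine": build_pipe(SVC(random_state=123)),
    "Decision Tree": build_pipe(DecisionTreeClassifier(random_state=123)),
    "Random Forest": build_pipe(RandomForestClassifier(random_state=123)),
}

scores = {}
for name, pipe in pipelines.items():
    pipe.fit(X_train, y_train)
    scores[name] = pipe.score(X_test, y_test)  # mean accuracy on the test split
    print("%s pipeline test accuracy: %.3f" % (name, scores[name]))
```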

#### Sample RF Pipeline w/ GridSearch


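```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# PCA(n_components=27) requires X_train to have at least 27 features
pipe_rf = Pipeline([('pca', PCA(n_components=27)),
                    ('clf', RandomForestClassifier(random_state=123))])

# Set grid search params (step name + '__' + parameter name)
# Float values for min_samples_leaf / min_samples_split are fractions of the sample count
param_grid_forest = [
    {'clf__n_estimators': [120],
     'clf__criterion': ['entropy', 'gini'],
     'clf__max_depth': [4, 5, 6],
     'clf__min_samples_leaf': [0.05, 0.1, 0.2],
     'clf__min_samples_split': [0.05, 0.1, 0.2]}
]

# Construct grid search: 2 x 3 x 3 x 3 = 54 candidates, each fit 3 times (cv=3)
gs_rf = GridSearchCV(estimator=pipe_rf,
                     param_grid=param_grid_forest,
                     scoring='accuracy',
                     cv=3, verbose=2, return_train_score=True)

# Fit using grid search (X_train, y_train assumed defined earlier)
gs_rf.fit(X_train, y_train)

# Best mean cross-validated accuracy
print('Best accuracy: %.3f' % gs_rf.best_score_)

# Best params
print('\nBest params:\n', gs_rf.best_params_)
```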
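The sample leaves `X_train` and `y_train` undefined and never checks the tuned pipeline on unseen data. A self-contained sketch, assuming the bundled breast-cancer dataset (30 features, so `PCA(n_components=27)` is valid) and a deliberately reduced grid to keep runtime short; the grid values here are illustrative, not tuned:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

X, y = load_breast_cancer(return_X_y=True)  # 569 samples, 30 features
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=123, stratify=y)

pipe_rf = Pipeline([("pca", PCA(n_components=27)),
                    ("clf", RandomForestClassifier(random_state=123))])

# Reduced grid (assumption): 1 x 2 x 2 = 4 candidates
param_grid_forest = [{
    "clf__n_estimators": [50],
    "clf__criterion": ["entropy", "gini"],
    "clf__max_depth": [4, 6],
}]

gs_rf = GridSearchCV(estimator=pipe_rf, param_grid=param_grid_forest,
                     scoring="accuracy", cv=3, return_train_score=True)
gs_rf.fit(X_train, y_train)

# best_estimator_ is the best pipeline refit on all of X_train (refit=True by default)
test_acc = gs_rf.best_estimator_.score(X_test, y_test)
train_scores = gs_rf.cv_results_["mean_train_score"]  # available because return_train_score=True

print("Best CV accuracy: %.3f" % gs_rf.best_score_)
print("Test accuracy of best pipeline: %.3f" % test_acc)
print("Mean train score per candidate:", train_scores)
```

Comparing `mean_train_score` against `mean_test_score` in `cv_results_` is a quick way to spot candidates that overfit.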