#### Q1. You are working on a machine learning project where you have a dataset containing numerical and categorical features. You have identified that some of the features are highly correlated and there are missing values in some of the columns. You want to build a pipeline that automates the feature engineering process and handles the missing values.
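
One possible way to approach this is sketched below, under several assumptions that do not come from the question itself: missing numerical values are filled with the column mean, missing categorical values with the most frequent category, and the correlated numerical features are compressed with PCA (dropping one feature from each correlated pair would be an equally reasonable choice). The helper `build_pipeline`, the toy DataFrame, and its column names are hypothetical and only serve to make the sketch runnable.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_pipeline(numeric_cols, categorical_cols):
    """Hypothetical helper: impute, scale, decorrelate, encode, then classify."""
    numeric_steps = Pipeline([
        ("impute", SimpleImputer(strategy="mean")),   # fill missing numbers
        ("scale", StandardScaler()),                  # PCA is scale-sensitive
        ("decorrelate", PCA(n_components=0.95)),      # keep 95% of the variance
    ])
    categorical_steps = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),  # fill missing categories
        ("encode", OneHotEncoder(handle_unknown="ignore")),   # one-hot encode
    ])
    preprocess = ColumnTransformer([
        ("num", numeric_steps, numeric_cols),
        ("cat", categorical_steps, categorical_cols),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", RandomForestClassifier(random_state=42)),
    ])


# Made-up data: two strongly correlated numeric columns, one categorical
# column, and a few missing values in each.
toy = pd.DataFrame({
    "height": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    "width": [2.1, 3.9, 6.0, 8.1, np.nan, 12.0],
    "colour": ["red", "blue", np.nan, "red", "blue", "red"],
})
labels = np.array([0, 0, 0, 1, 1, 1])

pipeline = build_pipeline(["height", "width"], ["colour"])
pipeline.fit(toy, labels)
predictions = pipeline.predict(toy)
print(predictions)
```

Because every preprocessing step lives inside the pipeline, the imputation statistics and PCA components are learned from the training data only and then reused unchanged at prediction time.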

#### Q2. Build a pipeline that includes a random forest classifier and a logistic regression classifier, and then use a voting classifier to combine their predictions. Train the pipeline on the iris dataset and evaluate its accuracy.

In [6]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create individual pipelines for classifiers
rf_pipeline = Pipeline([
    ('rf_classifier', RandomForestClassifier(random_state=42))
])

lr_pipeline = Pipeline([
    ('lr_classifier', LogisticRegression(random_state=42))
])

# Create a Voting Classifier that combines the two pipelines
voting_classifier = VotingClassifier(estimators=[
    ('random_forest', rf_pipeline),
    ('logistic_regression', lr_pipeline)
], voting='soft')  # 'soft' voting averages the predicted class probabilities of both models

# Train the Voting Classifier on the training data
voting_classifier.fit(X_train, y_train)

# Make predictions using the Voting Classifier
y_pred = voting_classifier.predict(X_test)

# Evaluate the accuracy of the Voting Classifier
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy of Voting Classifier:", accuracy)

Accuracy of Voting Classifier: 1.0
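
A perfect score on a single 45-sample test split can be optimistic, so a cross-validated estimate may be a useful complement. The sketch below is one way to do that, not part of the original solution: the choice of 5 stratified folds and `max_iter=1000` for the logistic regression (to give the solver room to converge on unscaled features) are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

# Same ensemble as above, built without the single-step pipelines for brevity.
ensemble = VotingClassifier(
    estimators=[
        ("random_forest", RandomForestClassifier(random_state=42)),
        ("logistic_regression", LogisticRegression(max_iter=1000, random_state=42)),
    ],
    voting="soft",
)

# Assumed setup: 5 stratified folds, shuffled with a fixed seed.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(ensemble, X, y, cv=cv, scoring="accuracy")

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

The spread of the fold accuracies gives a rough sense of how much the single-split result depends on which samples happened to land in the test set.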
