# Q1. You are working on a machine learning project where you have a dataset containing numerical and categorical features. You have identified that some of the features are highly correlated and there are missing values in some of the columns. You want to build a pipeline that automates the feature engineering process and handles the missing values.

# Design a pipeline that includes the following steps:

- Use an automated feature selection method to identify the important features in the dataset.

- Create a numerical pipeline that includes the following steps:

  - Impute the missing values in the numerical columns using the mean of the column values.

  - Scale the numerical columns using standardisation.

- Create a categorical pipeline that includes the following steps:

  - Impute the missing values in the categorical columns using the most frequent value of the column.

  - One-hot encode the categorical columns.

- Combine the numerical and categorical pipelines using a ColumnTransformer.

- Use a Random Forest Classifier to build the final model.

- Evaluate the accuracy of the model on the test dataset.
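
The question notes that some features are highly correlated, which none of the listed steps addresses directly. One simple way to handle this before the pipeline, sketched here under assumptions, is to drop one feature from every pair whose absolute Pearson correlation exceeds a threshold. The helper name `drop_correlated`, the 0.9 threshold, and the toy DataFrame are illustrative and not part of the original exercise.

```python
import numpy as np
import pandas as pd


def drop_correlated(df, threshold=0.9):
    """Drop one numeric column from each pair with |correlation| > threshold.

    Hypothetical helper for illustration; non-numeric columns are left untouched.
    """
    corr = df.select_dtypes(include="number").corr().abs()
    # Keep only the upper triangle so each pair is inspected once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop


# Toy data, invented for illustration only
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "a_copy": a * 2.0 + rng.normal(scale=0.01, size=200),  # nearly collinear with "a"
    "b": rng.normal(size=200),                              # independent of "a"
    "colour": rng.choice(["red", "blue"], size=200),        # categorical
})

reduced, dropped = drop_correlated(toy)
print(dropped)
print(list(reduced.columns))
```

The later column of each correlated pair is the one removed, so column order determines which feature survives. If the filter is meant to be part of model selection, it should be fitted on the training split only.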

The following pipeline implements the preprocessing, modelling, and evaluation steps:

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed to exist already: X_train, X_test, y_train, y_test (train/test split),
# plus numerical_cols and categorical_cols (lists of column names).

# Numerical pipeline: mean imputation, then standardisation
numerical_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())
])

# Categorical pipeline: most-frequent imputation, then one-hot encoding
categorical_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('encoder', OneHotEncoder(handle_unknown='ignore'))
])

# ColumnTransformer to combine the numerical and categorical pipelines
preprocessor = ColumnTransformer([
    ('num', numerical_pipeline, numerical_cols),
    ('cat', categorical_pipeline, categorical_cols)
])

# One option for automated feature selection: SelectFromModel is a transformer,
# so it can sit inside the pipeline. It keeps the features whose random-forest
# importance is at or above the mean importance (the default threshold here).
feature_selector = SelectFromModel(RandomForestClassifier(random_state=42))

# Final pipeline: preprocessing, feature selection, then the classifier
pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('feature_selection', feature_selector),
    ('classifier', RandomForestClassifier(random_state=42))
])

# Fit the pipeline on the training data
pipeline.fit(X_train, y_train)

# Evaluate the accuracy of the model on the test dataset
score = pipeline.score(X_test, y_test)
print(f'Test accuracy: {score:.2f}')
```
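
The block above assumes that `X_train`, `X_test`, `y_train`, `y_test`, `numerical_cols`, and `categorical_cols` already exist. To try the design end to end, the following sketch builds a small synthetic DataFrame, injects missing values, and uses `SelectFromModel` as one possible automated feature selector. The column names, sample size, target rule, and choice of selector are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data, invented for illustration only
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "income": rng.normal(50_000, 15_000, n),
    "noise": rng.normal(size=n),
    "city": rng.choice(["A", "B", "C"], n),
})
# The target depends on age and city, so those features carry the signal
y = ((df["age"] > 40) | (df["city"] == "A")).astype(int)

# Inject roughly 10% missing values into one numerical and one categorical column
df.loc[rng.random(n) < 0.1, "income"] = np.nan
df.loc[rng.random(n) < 0.1, "city"] = np.nan

numerical_cols = ["age", "income", "noise"]
categorical_cols = ["city"]

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, random_state=42, stratify=y
)

preprocessor = ColumnTransformer([
    ("num", Pipeline([
        ("imputer", SimpleImputer(strategy="mean")),
        ("scaler", StandardScaler()),
    ]), numerical_cols),
    ("cat", Pipeline([
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("encoder", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

pipeline = Pipeline([
    ("preprocessor", preprocessor),
    # Keeps features whose importance is at least the mean importance
    ("selector", SelectFromModel(RandomForestClassifier(random_state=0))),
    ("classifier", RandomForestClassifier(random_state=0)),
])

pipeline.fit(X_train, y_train)
score = pipeline.score(X_test, y_test)
n_selected = int(pipeline.named_steps["selector"].get_support().sum())

print(f"Test accuracy: {score:.2f}")
print(f"Features kept after selection: {n_selected}")
```

Because imputation, scaling, encoding, and selection all live inside the pipeline, they are fitted on the training split only, which avoids leaking test-set statistics into the model.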

# Q2. Build a pipeline that includes a random forest classifier and a logistic regression classifier, and then use a voting classifier to combine their predictions. Train the pipeline on the iris dataset and evaluate its accuracy.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load the iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a pipeline that includes a Random Forest Classifier and a Logistic Regression Classifier
rf_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('rf', RandomForestClassifier(random_state=42))
])

lr_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('lr', LogisticRegression(random_state=42))
])

# Use a Voting Classifier to combine the predictions of the Random Forest Classifier and the Logistic Regression Classifier
voting_pipeline = VotingClassifier(
    estimators=[('rf', rf_pipeline), ('lr', lr_pipeline)],
    voting='soft'
)

# Train the pipeline on the iris dataset
voting_pipeline.fit(X_train, y_train)

# Evaluate the accuracy of the pipeline on the test dataset
y_pred = voting_pipeline.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
```

Output:

```
Accuracy: 1.00
```
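
With `voting='soft'`, the ensemble averages the class probabilities predicted by each fitted estimator and picks the class with the highest mean probability. The sketch below checks this by hand on the same iris split. It rebuilds the models without the scaling step so that it stays short and runs on its own; the variable names and the `max_iter=1000` setting are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

voter = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=42)),
        ("lr", LogisticRegression(max_iter=1000, random_state=42)),
    ],
    voting="soft",
)
voter.fit(X_train, y_train)

# Average the per-model class probabilities by hand, using the fitted clones
# that VotingClassifier stores in estimators_
manual_proba = np.mean(
    [est.predict_proba(X_test) for est in voter.estimators_], axis=0
)
# Pick the class with the highest averaged probability
manual_pred = voter.classes_[np.argmax(manual_proba, axis=1)]

proba_matches = np.allclose(manual_proba, voter.predict_proba(X_test))
pred_matches = np.array_equal(manual_pred, voter.predict(X_test))
print(proba_matches, pred_matches)
```

Both comparisons should hold, since no `weights` are passed and the average is therefore unweighted. Switching to `voting='hard'` would instead take a majority vote over the predicted labels, and `predict_proba` would no longer be available on the ensemble.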
