># ***Pipeline***

- In machine learning, a pipeline is a sequence of data processing steps.
- A pipeline allows you to combine multiple data preprocessing and model training steps into a single object, making it easier to organize and manage your machine learning code.

>**Here are the key components of a pipeline:**

**`Data Preprocessing Steps:`**
- Pipelines typically start with data preprocessing steps, such as feature scaling, feature encoding, handling missing values, or dimensionality reduction.

**`Model Training:`**
- After the data preprocessing steps, the pipeline includes the training of a machine learning model.
- This can be a classifier for classification tasks or a regressor for regression tasks.
  
**`Model Evaluation:`**
- Once the model is trained, the pipeline often incorporates steps for evaluating its performance.
- This may involve metrics calculation, cross-validation, or any other evaluation technique to assess the model's effectiveness.

**`Predictions:`**
- After the model has been evaluated, the pipeline allows you to make predictions on new, unseen data using the trained model.
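
The four components above can be sketched in a few lines. This is a minimal illustration, separate from the Titanic example that follows. It assumes synthetic data from `make_classification`, and the `StandardScaler` and `LogisticRegression` steps, the step names, and the `demo_` variable names are chosen only for demonstration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (assumption for illustration)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# Preprocessing and model training are combined into one object
demo_pipeline = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("model", LogisticRegression()),
])

# Evaluation: 5-fold cross-validation refits the whole pipeline on each fold
cv_scores = cross_val_score(demo_pipeline, X_tr, y_tr, cv=5)
print("Mean CV accuracy:", cv_scores.mean())

# Predictions: fit once on the training data, then predict on unseen rows
demo_pipeline.fit(X_tr, y_tr)
demo_predictions = demo_pipeline.predict(X_te)
print(demo_predictions[:5])
```

Because the whole pipeline is passed to `cross_val_score`, the scaler is refit on the training portion of each fold, so the held-out fold never influences the preprocessing.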

In [1]:
# Import libraries
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

In [2]:
# Load the Titanic dataset from Seaborn
titanic_data = sns.load_dataset('titanic')

In [3]:
# Select features and target variable
X = titanic_data[['pclass', 'sex', 'age', 'fare', 'embarked']]
y = titanic_data['survived']


In [4]:
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
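
To see what `test_size=0.2` and `random_state=42` do, here is a small sketch on placeholder arrays. It assumes 891 rows, which is the size of the Seaborn Titanic dataset. The `_placeholder` arrays and the other variable names are hypothetical and carry no real passenger data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with 891 rows (assumed size of the Titanic dataset)
X_placeholder = np.arange(891).reshape(-1, 1)
y_placeholder = np.arange(891) % 2

X_a, X_b, y_a, y_b = train_test_split(
    X_placeholder, y_placeholder, test_size=0.2, random_state=42
)
print(len(X_a), len(X_b))  # 712 179

# Repeating the call with the same random_state reproduces the same split
X_a2, X_b2, y_a2, y_b2 = train_test_split(
    X_placeholder, y_placeholder, test_size=0.2, random_state=42
)
print(np.array_equal(X_b, X_b2))  # True
```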

In [5]:
# Define the column transformer: impute missing values and one-hot encode the categorical features
numeric_features = ['age', 'fare']
categorical_features = ['pclass', 'sex', 'embarked']

numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median'))
])

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('encoder', OneHotEncoder(handle_unknown='ignore'))
])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)
    ])

In [6]:
# Create a pipeline with the preprocessor and RandomForestClassifier
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier(random_state=42))
])

In [7]:
# Fit the pipeline on the training data
pipeline.fit(X_train, y_train)

In [8]:
# Make predictions on the test data
y_pred = pipeline.predict(X_test)

In [9]:
# Calculate accuracy score
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.7821229050279329


**Explanation:**

In this example, we start by loading the Titanic dataset from Seaborn using `sns.load_dataset('titanic')`. We then select the relevant features and the target variable (`survived`) to train our model. Next, we split the data into training and test sets using `train_test_split` from scikit-learn.

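The pipeline is created using the `Pipeline` class from scikit-learn. It has two top-level steps, `preprocessor` and `classifier`, which together do three things:

Data preprocessing step: `SimpleImputer` handles missing values. Numeric columns (`age` and `fare`) are filled with the column median, and categorical columns (`pclass`, `sex`, and `embarked`) are filled with the most frequent value in each column.

Feature encoding step: `OneHotEncoder` encodes the categorical columns (`pclass`, `sex`, and `embarked`) as binary indicator features. With `handle_unknown='ignore'`, a category that did not appear in the training data is encoded as all zeros instead of raising an error.

Model training step: `RandomForestClassifier` is used as the machine learning model for classification.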

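We then fit the pipeline on the training data using `pipeline.fit(X_train, y_train)`. Afterward, we make predictions on the test data using `pipeline.predict(X_test)`. The imputers and the encoder are fitted on the training data only, and the same fitted transformations are applied automatically to the test data before the classifier sees it.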
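
The following sketch shows that a fitted pipeline accepts raw rows directly, including a missing value and a category it never saw during training. The training rows, labels, and `sketch_` names are hypothetical and exist only for this illustration. `n_estimators=10` is chosen to keep the example fast, and the numeric imputer is passed to the `ColumnTransformer` without a wrapping `Pipeline` for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training rows (not the real Titanic data)
train_rows = pd.DataFrame({
    "pclass": [3, 1, 3, 2, 1, 3],
    "sex": ["male", "female", "female", "male", "female", "male"],
    "age": [22.0, 38.0, np.nan, 35.0, 54.0, 2.0],
    "fare": [7.25, 71.28, 7.92, 26.0, 51.86, 21.08],
    "embarked": ["S", "C", "S", "S", np.nan, "S"],
})
train_labels = [0, 1, 1, 0, 1, 0]

sketch_preprocessor = ColumnTransformer(transformers=[
    ("num", SimpleImputer(strategy="median"), ["age", "fare"]),
    ("cat", Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("encoder", OneHotEncoder(handle_unknown="ignore")),
    ]), ["pclass", "sex", "embarked"]),
])
sketch_pipeline = Pipeline(steps=[
    ("preprocessor", sketch_preprocessor),
    ("classifier", RandomForestClassifier(n_estimators=10, random_state=42)),
])
sketch_pipeline.fit(train_rows, train_labels)

# New raw row: age is missing, and embarked "Q" never appeared in training
new_passenger = pd.DataFrame({
    "pclass": [2], "sex": ["female"], "age": [np.nan],
    "fare": [13.0], "embarked": ["Q"],
})
prediction = sketch_pipeline.predict(new_passenger)
print(prediction)

# Look at what the classifier actually received for this row
encoded = sketch_pipeline.named_steps["preprocessor"].transform(new_passenger)
encoded = encoded.toarray() if hasattr(encoded, "toarray") else encoded
print(encoded[0, 0])    # age filled with the training median, 35.0
print(encoded[0, -2:])  # embarked indicators (C, S): both zero for unseen "Q"
```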

Finally, we calculate the accuracy score, which is the fraction of test samples whose predicted label (`y_pred`) matches the actual label (`y_test`).
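
As a small worked example of that calculation, the sketch below computes accuracy by hand and compares it with `accuracy_score`. The `actual` and `predicted` arrays are hypothetical labels chosen only to make the arithmetic easy to follow.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels, chosen only to illustrate the calculation
actual = np.array([1, 0, 1, 1, 0, 0, 1, 0])
predicted = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Accuracy = number of matching labels / total number of labels
manual_accuracy = (actual == predicted).mean()
library_accuracy = accuracy_score(actual, predicted)
print(manual_accuracy, library_accuracy)  # 0.75 0.75
```

Six of the eight labels match, giving 6 / 8 = 0.75. By the same arithmetic, the accuracy of 0.7821229050279329 reported above corresponds to 140 correct predictions out of 179 test rows.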

Note that you may need to install Seaborn (`pip install seaborn`) if it's not already installed in your environment.