## Automating Data Cleaning in Python

### Task: Basic Pipeline with Scaling
1. Objective: Create a pipeline that scales numerical features in a dataset.
2. Steps:
    - Load a sample dataset with Pandas.
    - Define a pipeline using `Pipeline` from `sklearn.pipeline`.
    - Use `StandardScaler` to scale features.

In [5]:
# Write your code from here
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Sample dataset
data = {
    'Feature1': [10, 20, 30, 40, 50],
    'Feature2': [5, 15, 25, 35, 45]
}
df = pd.DataFrame(data)

# Define pipeline with StandardScaler
pipeline = Pipeline([
    ('scaler', StandardScaler())
])

# Apply pipeline to scale data
scaled_features = pipeline.fit_transform(df)

# Convert back to DataFrame for better visualization
df_scaled = pd.DataFrame(scaled_features, columns=df.columns)
print(df_scaled)

   Feature1  Feature2
0 -1.414214 -1.414214
1 -0.707107 -0.707107
2  0.000000  0.000000
3  0.707107  0.707107
4  1.414214  1.414214
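
The extreme values in the output above are ±1.414214 rather than ±1.264911 because `StandardScaler` divides by the population standard deviation (`ddof=0`), whereas pandas' `DataFrame.std()` defaults to the sample standard deviation (`ddof=1`). The following is a minimal sketch that checks this against the same sample data; the variable names (`manual`, `matches`, `col_means`, `col_stds`) are illustrative and not part of the original exercise.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Same sample data as in the task above
df = pd.DataFrame({
    'Feature1': [10, 20, 30, 40, 50],
    'Feature2': [5, 15, 25, 35, 45],
})

scaler = StandardScaler()
scaled = scaler.fit_transform(df)

# Reproduce the scaling by hand: subtract the column mean and divide by
# the population standard deviation (ddof=0), not pandas' default ddof=1
manual = (df - df.mean()) / df.std(ddof=0)
matches = np.allclose(scaled, manual.to_numpy())

# After scaling, each column should have mean ~0 and standard deviation ~1
col_means = scaled.mean(axis=0)
col_stds = scaled.std(axis=0)

print("Matches manual computation:", matches)
print("Column means:", col_means)
print("Column standard deviations:", col_stds)
```

The fitted statistics remain available on the scaler (`scaler.mean_` and `scaler.scale_`), so the same transformation can be applied to new data with `scaler.transform(...)` without refitting.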


### Task: Pipeline with Imputation
1. Objective: Automate data cleaning by handling missing values.
2. Steps:
    - Load a dataset with missing values.
    - Define a pipeline that uses `SimpleImputer` to fill missing values.

In [6]:
# Write your code from here
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer

# Sample dataset with missing values (pandas converts None to NaN in numeric columns)
data = {
    'Age': [25, 30, None, 22, 28, None, 35],
    'Income': [50000, None, 62000, 58000, None, 52000, 61000]
}
df = pd.DataFrame(data)

# Define pipeline with SimpleImputer (mean strategy)
pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='mean'))
])

# Apply pipeline to fill missing values
imputed_data = pipeline.fit_transform(df)

# Convert back to DataFrame for better visualization
df_imputed = pd.DataFrame(imputed_data, columns=df.columns)
print(df_imputed)

    Age   Income
0  25.0  50000.0
1  30.0  56600.0
2  28.0  62000.0
3  22.0  58000.0
4  28.0  56600.0
5  28.0  52000.0
6  35.0  61000.0
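
The two tasks above can also be chained, since a `Pipeline` runs its steps in order. The sketch below is one possible combination, not part of the original exercise: it imputes missing values with the column mean and then standardizes the result. The names `cleaning_pipeline`, `cleaned`, and `learned_means` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Same sample data as in the imputation task, with NaN written explicitly
df = pd.DataFrame({
    'Age': [25, 30, np.nan, 22, 28, np.nan, 35],
    'Income': [50000, np.nan, 62000, 58000, np.nan, 52000, 61000],
})

# Steps run in order: fill missing values first, then scale the filled data
cleaning_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler()),
])

cleaned = pd.DataFrame(cleaning_pipeline.fit_transform(df), columns=df.columns)

# The column means learned by the imputer step during fitting
learned_means = cleaning_pipeline.named_steps['imputer'].statistics_

print(cleaned)
print("Imputation values:", learned_means)
```

Because the imputed entries equal the column mean, they land at approximately 0 after scaling. Ordering matters here: placing the imputer first means the scaler computes its statistics on the filled-in data rather than on the observed values alone.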
