# **Breast Cancer Wisconsin (Diagnostic) Dataset Preprocessing**

In this tutorial, we will preprocess the **Breast Cancer Wisconsin (Diagnostic) Dataset** from the UCI Machine Learning Repository. We will apply Min-Max scaling to the features so that every feature lies in the range [0, 1].

---

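## Step 1: Import Necessary Libraries

We start by importing the necessary libraries:

- `fetch_ucirepo`, from the `ucimlrepo` package (installable with `pip install ucimlrepo`), to fetch the dataset.
- `MinMaxScaler`, from scikit-learn, for normalization.
- `pandas`, for data manipulation.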



```python
# Import necessary libraries
from ucimlrepo import fetch_ucirepo
from sklearn.preprocessing import MinMaxScaler
import pandas as pd
```


## Step 2: Fetch the Dataset

We fetch the Breast Cancer Wisconsin (Diagnostic) Dataset from the UCI repository with the `fetch_ucirepo` function, passing the dataset's repository ID (`id=17`).

```python
# Fetch dataset from UCI repository
breast_cancer_wisconsin_diagnostic = fetch_ucirepo(id=17)
```
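If the UCI server cannot be reached, scikit-learn bundles a copy of the same diagnostic data (569 samples, 30 numeric features) as `sklearn.datasets.load_breast_cancer`. The sketch below uses it as an offline stand-in. This is an assumption for experimentation, not part of the original workflow.

The bundled copy differs from the `ucimlrepo` version in two ways:

- Its column names differ (for example `mean radius`).
- Its target is encoded as integers (0 = malignant, 1 = benign).

So it is a substitute rather than a drop-in replacement.

```python
from sklearn.datasets import load_breast_cancer

# Offline stand-in for fetch_ucirepo(id=17): scikit-learn ships this dataset locally.
# as_frame=True returns pandas objects instead of NumPy arrays.
bunch = load_breast_cancer(as_frame=True)
X_offline = bunch.data      # DataFrame with 569 rows and 30 feature columns
y_offline = bunch.target    # Series of integer labels: 0 (malignant) or 1 (benign)

print(X_offline.shape)               # (569, 30)
print(bunch.target_names.tolist())   # ['malignant', 'benign']
```

The distinct names `X_offline` and `y_offline` are chosen so that running this cell does not overwrite the tutorial's own `X` and `y`.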


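## Step 3: Extract Features and Target

We extract the features (`X`) and the target variable (`y`) from the dataset. Both are pandas DataFrames: `X` holds the 30 real-valued features, and `y` holds the diagnosis (malignant or benign).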

```python
# Extract features (X) and target (y)
X = breast_cancer_wisconsin_diagnostic.data.features
y = breast_cancer_wisconsin_diagnostic.data.targets
```
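Before scaling, a few quick checks confirm that the features and labels line up and that nothing will trip up the scaler. The sketch runs them on `sklearn.datasets.load_breast_cancer`, scikit-learn's bundled copy of the same dataset. That substitution is an assumption made so the example runs offline.

The same pandas calls (`len`, `select_dtypes`, `isna`, `value_counts`) apply unchanged to the `ucimlrepo` DataFrames. Following the original UCI file, those encode the diagnosis as the letters `M` and `B` rather than 0/1.

```python
from sklearn.datasets import load_breast_cancer

# Offline stand-in for the ucimlrepo frames; distinct names avoid clobbering X and y.
bunch = load_breast_cancer(as_frame=True)
X_demo, y_demo = bunch.data, bunch.target

# 1. Features and labels must describe the same samples.
assert len(X_demo) == len(y_demo)

# 2. MinMaxScaler expects numeric input; list any non-numeric columns.
non_numeric = X_demo.select_dtypes(exclude="number").columns.tolist()

# 3. MinMaxScaler ignores NaNs when fitting and passes them through when transforming,
#    so it is worth knowing whether any are present.
n_missing = int(X_demo.isna().sum().sum())

# 4. Class balance (in this encoding, 0 = malignant and 1 = benign).
counts = y_demo.value_counts().sort_index()

print(X_demo.shape, non_numeric, n_missing)
print(counts)
```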


## Step 4: Display Metadata and Variable Information

To understand the dataset better, we print its metadata and the variable information table, which describes each feature and the target.

```python
# Display metadata and variable information
print("Metadata:\n", breast_cancer_wisconsin_diagnostic.metadata)
print("\nVariable Information:\n", breast_cancer_wisconsin_diagnostic.variables)
```
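Beyond the metadata, the feature values themselves explain why rescaling is useful: their scales differ by orders of magnitude. The sketch below compares the spread (maximum minus minimum) of every feature.

It uses `sklearn.datasets.load_breast_cancer`, scikit-learn's bundled copy of the same data. This is an assumption made so it runs offline, and the column names follow scikit-learn's convention rather than the UCI one.

```python
from sklearn.datasets import load_breast_cancer

# Offline stand-in for the UCI features.
X_demo = load_breast_cancer(as_frame=True).data

# Spread of each feature, computed column by column and sorted ascending.
ranges = (X_demo.max() - X_demo.min()).sort_values()

print(ranges.head(3))  # features with the narrowest spread
print(ranges.tail(3))  # features with the widest spread

# Two features on very different scales.
print(X_demo[["mean area", "mean smoothness"]].agg(["min", "max"]))
```

Without rescaling, distance-based methods would be dominated by area-like features simply because their raw numbers are larger.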


## Step 5: Initialize the Min-Max Scaler

We initialize `MinMaxScaler` from `sklearn.preprocessing`. With its default `feature_range=(0, 1)`, it rescales each feature to the range [0, 1].
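For each feature column, the scaler uses the column minimum $x_{\min}$ and maximum $x_{\max}$ seen during fitting and maps every value $x$ as follows. Here $(a, b)$ is `feature_range`; with the default $(0, 1)$, the second step leaves $x_{\text{std}}$ unchanged.

$$
x_{\text{std}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \qquad
x_{\text{scaled}} = x_{\text{std}}\,(b - a) + a
$$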

```python
# Initialize the MinMaxScaler
scaler = MinMaxScaler()
```
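To see what `MinMaxScaler` computes, the sketch below applies min-max scaling by hand with NumPy: (x - min) / (max - min) per column, then a shift into `feature_range`. It compares the result with the scaler on a small made-up array (the numbers are illustrative only). A non-default `feature_range=(-1, 1)` is used so that both steps are exercised.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Small made-up matrix: 4 samples, 2 features on very different scales.
X_toy = np.array([[1.0, 200.0],
                  [2.0, 400.0],
                  [3.0, 600.0],
                  [5.0, 1000.0]])

a, b = -1.0, 1.0  # target feature_range

# Manual min-max scaling, column by column (axis=0).
col_min = X_toy.min(axis=0)
col_max = X_toy.max(axis=0)
X_std = (X_toy - col_min) / (col_max - col_min)  # each column now in [0, 1]
X_manual = X_std * (b - a) + a                   # shifted into [a, b]

# The same transformation via scikit-learn.
scaler_toy = MinMaxScaler(feature_range=(a, b))
X_sklearn = scaler_toy.fit_transform(X_toy)

print(np.allclose(X_manual, X_sklearn))  # True
# The fitted per-column statistics are exposed as data_min_ and data_max_.
print(scaler_toy.data_min_, scaler_toy.data_max_)
```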


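## Step 6: Apply Min-Max Scaling

We call `fit_transform`, which does two things:

1. **Fit:** it learns the minimum and maximum of each feature column in `X`.
2. **Transform:** it uses those statistics to rescale every value into [0, 1].

The result is a NumPy array.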

```python
# Apply Min-Max Scaling to the features
X_scaled = scaler.fit_transform(X)
```
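Here `fit_transform` learns the minima and maxima from every row, which is fine for exploring the data. If you later train a model, the usual practice is to split the data first and fit the scaler on the training rows only, so statistics from the test rows do not leak into preprocessing.

A minimal sketch on made-up data follows. `train_test_split`, `test_size=0.25` and `random_state=0` are illustrative choices, not part of the original tutorial.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Made-up data standing in for X and y: 100 samples, 3 features, binary labels.
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
y_demo = rng.integers(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

scaler_demo = MinMaxScaler()
X_train_scaled = scaler_demo.fit_transform(X_train)  # learn min/max from training rows only
X_test_scaled = scaler_demo.transform(X_test)        # reuse those statistics; no refitting

# Training data spans [0, 1] exactly; test data may fall slightly outside that range.
print(X_train_scaled.min(), X_train_scaled.max())
print(X_test_scaled.min(), X_test_scaled.max())
```

If out-of-range test values are a problem downstream, `MinMaxScaler(clip=True)` (scikit-learn 0.24 and later) clips transformed values to `feature_range`.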


## Step 7: Convert Scaled Features Back to DataFrame

Because `fit_transform` returns a plain NumPy array, we wrap the result back into a pandas DataFrame for readability, reusing the original column names and row index.

```python
# Wrap the scaled array in a DataFrame for readability (optional),
# keeping the original column names and row index
X_scaled_df = pd.DataFrame(X_scaled, columns=X.columns, index=X.index)
```


## Step 8: Display the First Few Rows of Scaled Data

Finally, we print the first few rows of the scaled dataset as a quick visual check that the values now lie between 0 and 1.

```python
# Display the first few rows of the scaled dataset
print("\nScaled Data:\n", X_scaled_df.head())
```
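`head()` shows only five rows. A more direct check is that every column of the scaled frame now has a minimum of 0 and a maximum of 1, up to floating-point rounding. With the tutorial's variables you would apply the same `min()` and `max()` calls to `X_scaled_df`.

The sketch below performs the check on `sklearn.datasets.load_breast_cancer`, scikit-learn's bundled copy of the same dataset. This is an assumption made so it runs offline.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import MinMaxScaler

# Offline stand-in for X_scaled_df, built the same way as in Steps 5 to 7.
X_demo = load_breast_cancer(as_frame=True).data
X_demo_scaled = pd.DataFrame(
    MinMaxScaler().fit_transform(X_demo), columns=X_demo.columns, index=X_demo.index
)

# Every column should now span exactly [0, 1], up to floating-point rounding.
col_min = X_demo_scaled.min()
col_max = X_demo_scaled.max()
print(np.allclose(col_min, 0.0), np.allclose(col_max, 1.0))  # True True
```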
