## Importing Necessary Libraries

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
from sklearn.feature_selection import SelectKBest, chi2

## Data Collection

In [3]:
# Load the dataset
data = pd.read_csv("C:/Users/Feras/Downloads/breast-cancer.csv")

# Separate the features (X) from the target (y); 'id' is an identifier, not a predictor
X = data.drop(columns=['id', 'diagnosis'])
y = data['diagnosis']



## Data Preprocessing / Normalizing data using MinMaxScaler

In [4]:
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the input features to the range [0, 1]
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # Scale X_test using the same scaler


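Min-Max scaling linearly rescales each feature to a fixed range ([0, 1] by default) while preserving the relative spacing of the original values within that feature. This matters when features have very different scales, because features with large magnitudes can otherwise dominate training. The scaler is fit on the training set only and then applied to the test set, so no information from the test data leaks into preprocessing.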
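As a minimal sketch of what the scaling step does, the snippet below applies the formula x' = (x - min) / (max - min) by hand. The helper names `min_max_fit` and `min_max_transform` and the toy values are hypothetical and only for illustration; the notebook itself relies on `MinMaxScaler`.

```python
def min_max_fit(rows):
    """Learn the per-column (min, max) from the training rows only."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def min_max_transform(rows, stats):
    """Apply x' = (x - min) / (max - min) column by column."""
    scaled = []
    for row in rows:
        scaled.append([
            (value - lo) / (hi - lo) if hi != lo else 0.0
            for value, (lo, hi) in zip(row, stats)
        ])
    return scaled


# Toy values (hypothetical, not taken from the dataset)
train = [[10.0, 200.0], [20.0, 400.0], [30.0, 1000.0]]
test = [[25.0, 1200.0]]

stats = min_max_fit(train)                     # statistics come from train only
train_scaled = min_max_transform(train, stats)
test_scaled = min_max_transform(test, stats)   # reuse the training statistics

print(train_scaled)
print(test_scaled)
```

Because the minimum and maximum come from the training data, a test value beyond the training range maps outside [0, 1], as the second test column does here. That is expected behavior and not a sign of a bug.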

## Perform Feature Selection

In [23]:
# Initialize SelectKBest with chi-squared test

selector = SelectKBest(score_func=chi2, k=6)

# Fitting selector to training data
selector.fit(X_train_scaled, y_train)

# Get selected features for better interpretability
selected_features = X.columns[selector.get_support()]


# Selecting features
X_train_selected = selector.transform(X_train_scaled)
X_test_selected = selector.transform(X_test_scaled)  

The chi-squared test is suited to feature selection when the target variable is categorical and the input features are non-negative. Here, the input features are numerical measurements linked to the diagnosis of breast cancer, which Min-Max scaling maps to non-negative values, while the target variable (diagnosis) is categorical, indicating whether a tumor is malignant or benign.

SelectKBest with the chi-squared test is computationally cheap compared to approaches like recursive feature elimination (RFE), which requires training a model many times. Because it scores every feature independently, it scales to large datasets with many features, although for the same reason it cannot account for redundancy or interactions between features.
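To make the scoring idea concrete, here is a simplified sketch of a chi-squared score for non-negative features. It treats each feature value as a count-like quantity and compares the per-class sum with the sum expected if the feature were independent of the class. The functions `chi2_scores` and `select_k_best` and the toy data are hypothetical stand-ins, not the library's implementation.

```python
def chi2_scores(rows, labels):
    """Score each feature by how far its per-class sums deviate from
    what class frequencies alone would predict."""
    classes = sorted(set(labels))
    n_samples = len(rows)
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        feature_total = sum(row[j] for row in rows)
        score = 0.0
        for cls in classes:
            class_prob = labels.count(cls) / n_samples
            observed = sum(row[j] for row, lab in zip(rows, labels) if lab == cls)
            expected = class_prob * feature_total
            score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(scores, k):
    """Return the (sorted) indices of the k highest-scoring features."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical scaled data: column 0 separates the classes, column 1 does not
rows = [[0.9, 0.5], [0.8, 0.4], [0.1, 0.5], [0.2, 0.4]]
labels = ['M', 'M', 'B', 'B']

scores = chi2_scores(rows, labels)
kept = select_k_best(scores, k=1)

print(scores)
print(kept)
```

Each feature is scored on its own, so the cost grows linearly with the number of features and no model has to be trained during selection.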

## Model Building

In [6]:
mlp_classifier = MLPClassifier(
    hidden_layer_sizes=(100,),  # Single hidden layer with 100 neurons
    activation='relu',  # ReLU activation in the hidden layer
    solver='adam',  # Adam optimizer
    alpha=0.0001,  # L2 regularization strength
    batch_size=32,  # Mini-batch size
    learning_rate_init=0.001,  # Initial learning rate
    max_iter=200,  # Maximum number of training epochs
    random_state=42  # Fixed seed for reproducible results
)


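A single hidden layer with 100 neurons gives the network enough capacity to capture the underlying patterns in a small tabular dataset like this one while keeping the risk of overfitting manageable. The rectified linear unit (ReLU) activation function is a common choice for MLPs because it is cheap to compute and works well in practice. The Adam optimizer was used because it adapts the step size for each parameter and generally performs well on larger datasets with many features; on a dataset of this size it remains a reasonable default.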
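The forward pass of such a network can be sketched in a few lines. This is an illustration only: the weights below are hypothetical hand-picked numbers (a trained model learns them), the hidden layer has 3 units instead of 100, and a sigmoid output is assumed for the binary case.

```python
import math


def relu(values):
    """ReLU: keep positive values, clamp negatives to zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def sigmoid(z):
    """Squash a real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Input -> hidden layer with ReLU -> single sigmoid output."""
    hidden = relu(dense(x, w_hidden, b_hidden))
    logit = dense(hidden, [w_out], [b_out])[0]
    return sigmoid(logit)


# Hypothetical weights: 2 input features, 3 hidden units
w_hidden = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
b_hidden = [0.0, 0.0, 0.0]
w_out = [1.0, 0.0, -1.0]
b_out = 0.0

prob = forward([0.8, 0.2], w_hidden, b_hidden, w_out, b_out)
print(prob)
```

The third hidden unit receives a negative pre-activation for this input, so ReLU switches it off and it contributes nothing to the output.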

In [7]:
# Train the classifier
mlp_classifier.fit(X_train_selected, y_train)

# Predictions
y_pred = mlp_classifier.predict(X_test_selected)


## Evaluation Metrics

In [8]:
# Evaluate the model
conf_matrix = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, pos_label='M')
recall = recall_score(y_test, y_pred, pos_label='M')
f1 = f1_score(y_test, y_pred, pos_label='M')

# Print evaluation metrics
print("Confusion Matrix:")
print(conf_matrix)
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)

Confusion Matrix:
[[69  2]
 [ 3 40]]
Accuracy: 0.956140350877193
Precision: 0.9523809523809523
Recall: 0.9302325581395349
F1 Score: 0.9411764705882352


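An accuracy of 95.61% indicates strong overall performance: 109 of the 114 test samples are classified correctly. Of the 5 errors, 3 are malignant tumors predicted as benign (false negatives), which is the costlier mistake in a diagnostic setting and is reflected in the lower recall of 93.02%.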
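The reported metrics can be re-derived by hand from the confusion matrix printed above. The sketch assumes the rows are true classes and the columns are predicted classes, with labels in sorted order ('B' first, then 'M'), so malignant is the positive class.

```python
# Confusion matrix from the output above
# rows = true class, columns = predicted class, label order ['B', 'M']
conf_matrix = [[69, 2],
               [3, 40]]

tn, fp = conf_matrix[0]  # true benign: predicted benign / predicted malignant
fn, tp = conf_matrix[1]  # true malignant: predicted benign / predicted malignant

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of predicted malignant, how many are malignant
recall = tp / (tp + fn)      # of true malignant, how many were found
f1 = 2 * precision * recall / (precision + recall)

print("Accuracy:", round(accuracy, 4))
print("Precision:", round(precision, 4))
print("Recall:", round(recall, 4))
print("F1 Score:", round(f1, 4))
```

The results agree with the scikit-learn output, which confirms the label ordering assumed here.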