# New Section

## Abstract
This study presents a deep learning approach using PyTorch to predict breast cancer diagnoses from clinical and imaging-derived features. A publicly available breast cancer dataset was preprocessed by dropping an empty column, encoding the diagnosis label as a binary target, and standardizing the numerical features. An artificial neural network (ANN) with two hidden layers was designed, employing ReLU activations and a sigmoid output for binary classification. The model was trained for 100 epochs using binary cross-entropy loss and the Adam optimizer. Evaluation on the held-out test set (20% of the data) achieved an accuracy of 97.4% in the run reported here; because the weight initialization and batch shuffling are not seeded, this figure varies slightly between runs. These results indicate that PyTorch-based deep learning models can effectively distinguish malignant from benign cases. The framework can be further extended with hyperparameter optimization, alternative architectures, or cross-validation for enhanced robustness, and it highlights the potential of deep learning to support computer-aided diagnostic systems in oncology.

In [13]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np
import pandas as pd

In [14]:
df = pd.read_csv('/content/Breast_cancer_dataset.csv')

## Data preprocessing

### Subtask:
Handle missing values if any, encode categorical variables (the 'diagnosis' column), and scale the features. Split the data into training and testing sets.


**Reasoning**:
Inspect the DataFrame for missing values and drop the 'Unnamed: 32' column, which contains only missing values. Encode the 'diagnosis' column as a binary label, then separate the features from the target.



In [15]:
print(df.isnull().sum())

# The column 'Unnamed: 32' has only missing values, so we drop it.
df = df.drop('Unnamed: 32', axis=1)

# Encode the 'diagnosis' column (M=1, B=0)
df['diagnosis'] = df['diagnosis'].map({'M': 1, 'B': 0})

# Separate features (X) and target (y), dropping 'id'
X = df.drop(['id', 'diagnosis'], axis=1)
y = df['diagnosis']

id                           0
diagnosis                    0
radius_mean                  0
texture_mean                 0
perimeter_mean               0
area_mean                    0
smoothness_mean              0
compactness_mean             0
concavity_mean               0
concave points_mean          0
symmetry_mean                0
fractal_dimension_mean       0
radius_se                    0
texture_se                   0
perimeter_se                 0
area_se                      0
smoothness_se                0
compactness_se               0
concavity_se                 0
concave points_se            0
symmetry_se                  0
fractal_dimension_se         0
radius_worst                 0
texture_worst                0
perimeter_worst              0
area_worst                   0
smoothness_worst             0
compactness_worst            0
concavity_worst              0
concave points_worst         0
symmetry_worst               0
fractal_dimension_worst      0
Unnamed:
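
The cleaning step above relies on two pandas behaviors worth making explicit: a column that is entirely empty can be dropped without naming it, and `.map()` silently produces `NaN` for any label that is not in the mapping. The following is a minimal sketch on a made-up toy frame; the helper name `clean` and the toy values are assumptions for illustration, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real CSV (values are made up for illustration).
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "diagnosis": ["M", "B", "M"],
    "radius_mean": [17.9, 11.4, 20.1],
    "Unnamed: 32": [np.nan, np.nan, np.nan],
})


def clean(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop all-NaN columns and encode the diagnosis, failing loudly on surprises."""
    # Removes any column that contains only missing values.
    frame = frame.dropna(axis=1, how="all")
    encoded = frame["diagnosis"].map({"M": 1, "B": 0})
    # .map() yields NaN for labels outside the mapping, so check before casting.
    if encoded.isna().any():
        raise ValueError("unexpected diagnosis label")
    return frame.assign(diagnosis=encoded.astype(int))


cleaned = clean(toy)
print(cleaned.columns.tolist())       # ['id', 'diagnosis', 'radius_mean']
print(cleaned["diagnosis"].tolist())  # [1, 0, 1]
```

Checking for unmapped labels before training avoids `NaN` targets propagating into the loss.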

**Reasoning**:
Scale the features and split the data into training and testing sets.



In [16]:
# Scale the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
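
Note that the cell above fits the scaler on the full dataset before splitting, so the test rows contribute to the mean and standard deviation used for scaling. A common alternative is to split first and fit the scaler on the training rows only. The sketch below illustrates that ordering on synthetic data; the `*_demo` names, the synthetic arrays, and the use of `stratify` are assumptions, not the notebook's code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X and y (hypothetical data with 30 features).
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(200, 30))
y_demo = rng.integers(0, 2, size=200)

# Split first; stratify keeps the class ratio similar in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Fit the scaler on the training rows only, then reuse it for the test rows.
scaler_demo = StandardScaler()
X_tr_scaled = scaler_demo.fit_transform(X_tr)
X_te_scaled = scaler_demo.transform(X_te)

print(X_tr_scaled.shape, X_te_scaled.shape)  # (160, 30) (40, 30)
```

With this ordering the training features have zero mean and unit variance exactly, while the test features are only approximately standardized, which mirrors how unseen data would be handled.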

## Model definition

### Subtask:
Define a simple feedforward neural network in PyTorch, which is a reasonable fit for a small tabular dataset like this one.


**Reasoning**:
Define a PyTorch neural network class for binary classification.



In [17]:
class BreastCancerNet(nn.Module):
    """Feedforward binary classifier: input -> 64 -> 32 -> 1 (sigmoid)."""

    def __init__(self, input_features):
        super().__init__()
        self.fc1 = nn.Linear(input_features, 64)  # first hidden layer
        self.relu = nn.ReLU()                     # shared by both hidden layers
        self.fc2 = nn.Linear(64, 32)              # second hidden layer
        self.fc3 = nn.Linear(32, 1)               # single output unit
        self.sigmoid = nn.Sigmoid()               # maps the output to a probability

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        return self.sigmoid(self.fc3(x))

# Initialize the model
input_features = X_train.shape[1]
model = BreastCancerNet(input_features)

print(model)

BreastCancerNet(
  (fc1): Linear(in_features=30, out_features=64, bias=True)
  (relu): ReLU()
  (fc2): Linear(in_features=64, out_features=32, bias=True)
  (fc3): Linear(in_features=32, out_features=1, bias=True)
  (sigmoid): Sigmoid()
)
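
The network above ends in a sigmoid and is later paired with `nn.BCELoss`. A frequently used alternative is to return raw logits and use `nn.BCEWithLogitsLoss`, which applies the sigmoid inside the loss in a numerically more stable way. The sketch below shows that variant; the class name `BreastCancerLogitNet`, the random batch, and the seed are assumptions for illustration and do not reproduce the notebook's results.

```python
import torch
import torch.nn as nn


class BreastCancerLogitNet(nn.Module):  # hypothetical name, not from the notebook
    """Same layer sizes as above, but the output is a raw logit."""

    def __init__(self, input_features: int):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_features, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1),  # no sigmoid here
        )

    def forward(self, x):
        return self.layers(x)


torch.manual_seed(0)
logit_model = BreastCancerLogitNet(30)
criterion_logits = nn.BCEWithLogitsLoss()  # sigmoid + BCE in one step

# Random batch standing in for real features and labels.
x_batch = torch.randn(8, 30)
y_batch = torch.randint(0, 2, (8, 1)).float()

logits = logit_model(x_batch)
loss = criterion_logits(logits, y_batch)

# At inference time, apply the sigmoid explicitly to recover probabilities.
probs = torch.sigmoid(logits)
print(logits.shape, loss.item())
```

With this design the 0.5 probability threshold used at evaluation time corresponds to a logit threshold of 0.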


## Model training

### Subtask:
Train the defined model using the training data. This involves defining a loss function, an optimizer, and iterating over the training data for a number of epochs.


**Reasoning**:
Convert the training data to PyTorch tensors, create a DataLoader, define the loss function and optimizer, and then train the model for 100 epochs, printing the loss of the final mini-batch every 10 epochs.



In [18]:
# Convert data to PyTorch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train.values, dtype=torch.float32).unsqueeze(1) # Shape (N, 1), matching the model output

# Create DataLoader
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# Define Loss Function and Optimizer
criterion = nn.BCELoss() # Binary Cross-Entropy Loss
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training Loop
epochs = 100
model.train() # Set model to training mode

for epoch in range(epochs):
    for inputs, labels in train_loader:
        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')


Epoch [10/100], Loss: 0.0031
Epoch [20/100], Loss: 0.0046
Epoch [30/100], Loss: 0.1519
Epoch [40/100], Loss: 0.0003
Epoch [50/100], Loss: 0.0072
Epoch [60/100], Loss: 0.0007
Epoch [70/100], Loss: 0.0233
Epoch [80/100], Loss: 0.0000
Epoch [90/100], Loss: 0.0003
Epoch [100/100], Loss: 0.0003
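
The values logged above come from the last mini-batch of each reporting epoch, which is why they jump around (for example 0.1519 at epoch 30 between much smaller values). Averaging the loss over all batches in an epoch usually gives a smoother signal, and seeding PyTorch makes a run repeatable. The following is a minimal sketch on synthetic data; the small network, the data, the learning rate, and all `*_demo` names are assumptions rather than the notebook's setup.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)  # fixes weight init and shuffle order for this sketch

# Synthetic stand-in data (hypothetical): 128 samples, 30 features.
X_demo = torch.randn(128, 30)
y_demo = (X_demo[:, :1] > 0).float()  # label depends on the first feature only
loader = DataLoader(TensorDataset(X_demo, y_demo), batch_size=32, shuffle=True)

net = nn.Sequential(nn.Linear(30, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
criterion_demo = nn.BCELoss()
optimizer_demo = optim.Adam(net.parameters(), lr=0.01)

epoch_losses = []
for epoch in range(20):
    running, seen = 0.0, 0
    for inputs, labels in loader:
        optimizer_demo.zero_grad()
        loss = criterion_demo(net(inputs), labels)
        loss.backward()
        optimizer_demo.step()
        running += loss.item() * inputs.size(0)  # weight each batch by its size
        seen += inputs.size(0)
    epoch_losses.append(running / seen)  # mean loss over the whole epoch

print(f"first epoch: {epoch_losses[0]:.4f}, last epoch: {epoch_losses[-1]:.4f}")
```

Weighting by batch size keeps the average correct when the final batch is smaller than the others.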


## Model evaluation

### Subtask:
Evaluate the trained model on the testing data and calculate the accuracy of the predictions.


**Reasoning**:
Convert test data to tensors, set the model to evaluation mode, and evaluate the model on the test data to calculate accuracy.



In [19]:
# Convert test data to PyTorch tensors
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test.values, dtype=torch.float32).unsqueeze(1)

# Set model to evaluation mode
model.eval()

# Disable gradient calculation
with torch.no_grad():
    # Get predictions
    outputs = model(X_test_tensor)
    # Apply a 0.5 threshold to get binary predictions
    predicted = (outputs > 0.5).float()

    # Calculate accuracy
    correct = (predicted == y_test_tensor).sum().item()
    accuracy = correct / y_test_tensor.size(0)

print(f'Accuracy of the model on the test data: {accuracy:.4f}')

Accuracy of the model on the test data: 0.9737
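
Accuracy alone does not show how the errors are distributed between missed malignant cases and false alarms. A confusion matrix, with sensitivity and specificity derived from it, can be computed from the same kind of 0/1 tensors as `predicted` and `y_test_tensor`. The sketch below uses a hypothetical helper `binary_metrics` and made-up demo tensors; it does not report results for the notebook's model.

```python
import torch


def binary_metrics(predicted: torch.Tensor, target: torch.Tensor) -> dict:
    """Confusion-matrix counts and derived rates for 0/1 tensors (1 = malignant)."""
    predicted = predicted.flatten().bool()
    target = target.flatten().bool()
    tp = (predicted & target).sum().item()    # malignant, predicted malignant
    tn = (~predicted & ~target).sum().item()  # benign, predicted benign
    fp = (predicted & ~target).sum().item()   # benign, predicted malignant
    fn = (~predicted & target).sum().item()   # malignant, predicted benign
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Made-up predictions and labels, shaped (N, 1) like the tensors above.
demo_pred = torch.tensor([[1.], [0.], [1.], [0.], [0.], [1.]])
demo_true = torch.tensor([[1.], [0.], [0.], [0.], [1.], [1.]])
metrics = binary_metrics(demo_pred, demo_true)
print(metrics)  # tp=2, tn=2, fp=1, fn=1
```

In the notebook's context the call would be `binary_metrics(predicted, y_test_tensor)`.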


## Summary:

### Data Analysis Key Findings

*   The 'Unnamed: 32' column was removed from the dataset as it contained only missing values.
*   The 'diagnosis' column was successfully encoded into numerical values, with 'M' mapped to 1 and 'B' mapped to 0.
*   The features were scaled using StandardScaler, and the data was split into training and testing sets with a test size of 20%.
*   A deep learning model with two hidden layers and a sigmoid output layer was defined using PyTorch.
*   The model was trained for 100 epochs using the Binary Cross-Entropy Loss function and the Adam optimizer.
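*   The accuracy of the trained model on the test data is 0.9737 (about 97.4%) in the run shown above; the value can change slightly between runs because training is not seeded.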

### Insights or Next Steps

*   The high accuracy on the test set suggests the model separates malignant from benign cases well on this particular train/test split, based on the provided features.
*   Further steps could involve exploring different model architectures, hyperparameter tuning, or using cross-validation to ensure the robustness of the model's performance.
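
As one way to act on the cross-validation suggestion above, the sketch below runs stratified 5-fold cross-validation with the scaler inside a pipeline, so it is refit on each training fold. To keep the example short it swaps the PyTorch model for scikit-learn's `MLPClassifier` with the same hidden layer sizes (64 and 32) and uses synthetic data from `make_classification`; both substitutions are assumptions, and the scores say nothing about the notebook's model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 30-feature dataset (hypothetical data).
X_cv, y_cv = make_classification(
    n_samples=300, n_features=30, n_informative=10, random_state=0
)

# Scaling lives inside the pipeline, so each fold fits its own scaler.
pipeline = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X_cv, y_cv, cv=cv, scoring="accuracy")

print(f"fold accuracies: {np.round(scores, 3)}")
print(f"mean = {scores.mean():.3f}, std = {scores.std():.3f}")
```

Reporting the mean and standard deviation across folds gives a sense of how much a single-split accuracy figure could move.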
