# Breast Cancer Detection using Convolutional Neural Networks (CNN)

This project aims to detect breast cancer using the Wisconsin Diagnostic Breast Cancer dataset (bundled with scikit-learn) and a Convolutional Neural Network (CNN). CNNs are usually applied to images. Here, each sample's 30 numeric features are reshaped into a 30 × 1 single-channel array so that a convolutional layer can slide along the feature axis. This lets us explore how a deep learning model designed for images behaves on a structured (tabular) dataset.
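The reshape can be sketched in NumPy on random stand-in data (5 made-up samples, not the real dataset). Keras's `Conv2D` expects channels-last input by default, i.e. `(samples, height, width, channels)`. Each 30-feature row therefore becomes a 30 × 1 "image" with one channel, and no values are moved or changed:

```python
import numpy as np

# Random stand-in for a scaled feature matrix: 5 samples x 30 features (illustrative only)
rng = np.random.default_rng(0)
demo_features = rng.normal(size=(5, 30))

# (samples, height, width, channels): each sample becomes a 30x1 single-channel "image"
demo_cnn = demo_features.reshape(-1, 30, 1, 1)
print(demo_cnn.shape)  # (5, 30, 1, 1)

# Sample i's single "pixel column" is exactly its original feature row, in the same order
print(np.array_equal(demo_cnn[2, :, 0, 0], demo_features[2]))  # True
```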

## 1. Load the Dataset

```python
from sklearn.datasets import load_breast_cancer
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

# Load the 30 numeric features into a DataFrame and append the label column
data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target  # 0 = malignant, 1 = benign
df.head()
```
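Before plotting, it helps to confirm the size of the dataset and how scikit-learn encodes the labels. Index 0 in `target_names` is `'malignant'` and index 1 is `'benign'`, so in the class-distribution plot the bar at 0 is the malignant class. A minimal check:

```python
from sklearn.datasets import load_breast_cancer
import numpy as np

data = load_breast_cancer()

# 569 samples x 30 numeric features
print(data.data.shape)  # (569, 30)

# Label encoding: position in target_names equals the integer label
label_names = [str(name) for name in data.target_names]
print(label_names)  # ['malignant', 'benign']

# Number of samples per label (0 = malignant, 1 = benign)
counts = np.bincount(data.target)
for name, n in zip(label_names, counts):
    print(f"{name}: {n}")
```

The dataset holds 212 malignant and 357 benign samples, so the classes are moderately imbalanced.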

## 2. Exploratory Data Analysis (EDA)

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Class distribution
sns.countplot(x='target', data=df)
plt.title('Class Distribution')
plt.show()

# Correlation heatmap
plt.figure(figsize=(12,10))
sns.heatmap(df.corr(), cmap='coolwarm', annot=False)
plt.title('Feature Correlation Heatmap')
plt.show()
```
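The heatmap is dominated by feature-to-feature correlations. The 30 columns are 10 cell-nucleus measurements, each summarized as a mean, a standard error and a "worst" value, so related features such as radius, perimeter and area form strongly correlated blocks. A numeric ranking of each feature's Pearson correlation with the label complements the plot. Because `target` = 1 means benign, a negative value means larger feature values go with malignant samples. This is a sketch; `target_corr` and `ranked` are illustrative names:

```python
from sklearn.datasets import load_breast_cancer
import pandas as pd

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Correlation of every feature with the label, excluding the label itself
target_corr = df.corr()['target'].drop('target')

# Order features by the strength of the correlation, keeping the sign for interpretation
ranked = target_corr.reindex(target_corr.abs().sort_values(ascending=False).index)
print(ranked.head(10))
```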

## 3. Preprocessing

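```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.utils import to_categorical

# Features (30 numeric columns) and labels (0 = malignant, 1 = benign)
x = df.drop('target', axis=1).values
y = df['target'].values

# Split first so the scaler never sees test data; stratify keeps the class ratio in both splits
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42, stratify=y)

# Fit the scaler on the training split only, then apply the same transform to the test split
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

# Reshape for the CNN: (samples, height=30, width=1, channels=1)
x_train = x_train.reshape(-1, 30, 1, 1)
x_test = x_test.reshape(-1, 30, 1, 1)

# One-hot encode labels to match the 2-unit softmax output
y_train = to_categorical(y_train, num_classes=2)
y_test = to_categorical(y_test, num_classes=2)
```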
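If `StandardScaler` is fitted on the training split only, the test set's statistics never influence preprocessing. The sketch below re-creates an 80/20 split with `random_state=42`. It adds `stratify=y`, an assumption that keeps both splits at the dataset's class ratio. It uses separate variable names so it does not overwrite the notebook's arrays. It then checks that training features come out centred while test features are only approximately centred:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_all, y_all = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.2, random_state=42, stratify=y_all)

# Learn per-feature mean and standard deviation from the training split only
scaler_check = StandardScaler().fit(X_tr)
X_tr_s = scaler_check.transform(X_tr)
X_te_s = scaler_check.transform(X_te)

# Training means are zero up to float error; test means are not, since the test
# data played no part in fitting the scaler
print(np.abs(X_tr_s.mean(axis=0)).max())
print(np.abs(X_te_s.mean(axis=0)).max())

# The stored training statistics are what gets reused on any new data
print(scaler_check.mean_.shape)  # (30,)
```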

## 4. Build and Train CNN Model

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
model.add(Input(shape=(30, 1, 1)))
# 3x1 kernels slide along the feature axis only (the width is 1)
model.add(Conv2D(32, kernel_size=(3, 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 1)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(2, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Validate on 20% of the training data so the test set stays unseen until evaluation
history = model.fit(x_train, y_train, epochs=20, batch_size=16, validation_split=0.2)
```
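A single filter of this architecture can be traced in plain NumPy. The sketch uses a random stand-in sample and random weights, not the trained model. It shows how the feature axis shrinks from 30 to 28 after the `'valid'` 3 × 1 convolution (Keras's default padding), then to 14 after 2 × 1 max pooling. With 32 filters, `Flatten` therefore produces 14 × 32 = 448 values. The same arithmetic gives the trainable parameter counts that `model.summary()` should report for the model above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins: one sample with 30 features, one 3x1 filter and its bias (not trained weights)
sample = rng.normal(size=30)
kernel = rng.normal(size=3)
bias = 0.1

# Valid convolution along the features; like Keras, this is a cross-correlation
# (no kernel flip): 30 - 3 + 1 = 28 outputs, each mixing 3 neighboring features
conv = np.array([sample[i:i + 3] @ kernel + bias for i in range(30 - 3 + 1)])
relu = np.maximum(conv, 0.0)

# MaxPooling2D(pool_size=(2, 1)) uses non-overlapping windows: 28 // 2 = 14 outputs
pooled = relu[: len(relu) // 2 * 2].reshape(-1, 2).max(axis=1)
print(conv.shape, pooled.shape)  # (28,) (14,)

# Trainable parameters (weights + biases) per layer of the model above
conv_params = 32 * (3 * 1 * 1) + 32   # 128
dense1_params = (14 * 32) * 64 + 64   # 28736
dense2_params = 64 * 2 + 2            # 130
print(conv_params + dense1_params + dense2_params)  # 28994
```

One design consequence is that about 99% of the parameters sit in the first `Dense` layer, not in the convolution. The convolution itself is small and acts as a learned filter over neighboring columns.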

## 5. Evaluate the Model

```python
loss, accuracy = model.evaluate(x_test, y_test)
print(f"Test Accuracy: {accuracy:.4f}")
```
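About 63% of the samples are benign (357 of 569). A classifier that always predicts "benign" therefore already reaches roughly that accuracy, and the CNN's test accuracy is best read against it. The sketch below uses scikit-learn's `DummyClassifier` as that baseline. It assumes an 80/20 split with `random_state=42` plus `stratify=y`, and uses its own variable names:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

X_all, y_all = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.2, random_state=42, stratify=y_all)

# Always predicts the most frequent training label (benign) and ignores the features
baseline = DummyClassifier(strategy='most_frequent').fit(X_tr, y_tr)
baseline_acc = baseline.score(X_te, y_te)
print(f"Majority-class baseline accuracy: {baseline_acc:.4f}")
```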

## 6. Plot Training History

```python
plt.figure(figsize=(12, 5))

# Left panel: accuracy per epoch
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Val Accuracy')
plt.title('Accuracy')
plt.xlabel('Epoch')
plt.legend()

# Right panel: loss per epoch
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Val Loss')
plt.title('Loss')
plt.xlabel('Epoch')
plt.legend()

plt.tight_layout()
plt.show()
```
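To read the curves numerically, a small helper can report the epoch with the best validation score. `best_epoch` is a hypothetical name, not part of Keras. It only assumes that `history.history` is a dict mapping metric names to per-epoch lists, which is what Keras's `History` object provides:

```python
def best_epoch(history_dict, metric='val_loss', mode='min'):
    """Return (1-based epoch, value) of the best entry for `metric`.

    `history_dict` is assumed to look like Keras's History.history:
    {'loss': [...], 'val_loss': [...], ...}, one float per epoch.
    Use mode='min' for losses and mode='max' for accuracies.
    """
    if mode not in ('min', 'max'):
        raise ValueError("mode must be 'min' or 'max'")
    values = list(history_dict[metric])
    best_value = min(values) if mode == 'min' else max(values)
    # list.index returns the first occurrence, i.e. the earliest best epoch
    return values.index(best_value) + 1, best_value
```

In this notebook it would be called as `best_epoch(history.history)` or `best_epoch(history.history, 'val_accuracy', 'max')`. Suppose the best validation loss comes well before the last of the 20 epochs while training loss keeps falling. That pattern usually indicates overfitting. Keras's `EarlyStopping` callback (with `monitor='val_loss'` and `restore_best_weights=True`) can stop training at that point automatically.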

## 7. Confusion Matrix and Classification Report

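```python
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns

# Convert softmax probabilities and one-hot labels back to class indices (0 = malignant, 1 = benign)
y_pred = model.predict(x_test)
y_pred_classes = np.argmax(y_pred, axis=1)
y_true = np.argmax(y_test, axis=1)

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred_classes)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=data.target_names, yticklabels=data.target_names)
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

# Per-class precision, recall and F1 score
print(classification_report(y_true, y_pred_classes, target_names=data.target_names))
```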
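In a diagnostic setting, a missed malignant case (false negative) is usually costlier than a false alarm. A useful summary therefore treats malignant (label 0 in this dataset) as the positive class and reports sensitivity and specificity. Sensitivity matches the recall shown for the malignant class, and specificity matches the recall for benign. `malignant_rates` is a hypothetical helper, sketched here with scikit-learn's `confusion_matrix`:

```python
from sklearn.metrics import confusion_matrix

def malignant_rates(y_true, y_pred):
    """Sensitivity and specificity with malignant (label 0) as the positive class."""
    # labels=[0, 1] fixes the order: row/column 0 = malignant, row/column 1 = benign
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    tp, fn = cm[0, 0], cm[0, 1]  # malignant caught / malignant missed
    fp, tn = cm[1, 0], cm[1, 1]  # benign flagged as malignant / benign correctly cleared
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Toy labels for illustration: 3 malignant, 2 benign
sens, spec = malignant_rates([0, 0, 0, 1, 1], [0, 0, 1, 1, 0])
print(round(float(sens), 3), round(float(spec), 3))  # 0.667 0.5
```

In this notebook it would be called as `malignant_rates(y_true, y_pred_classes)`.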

## 8. Conclusion
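This project demonstrated how to apply a CNN to breast cancer classification on structured tabular data. CNNs are designed primarily for images, but reshaping the 30 features into a one-dimensional "image" lets 3 × 1 filters learn patterns across neighboring features. That neighborhood comes from the dataset's column order, not from any spatial structure. The test accuracy, confusion matrix and classification report above quantify how well the model performs on held-out data. The model also provides a foundation for further work such as feature engineering or more advanced deep learning techniques.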