# Breast Tumour Classification

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score as accuracy

# loading the dataset and dropping the 'Unnamed: 32' column, which holds no feature data
data = pd.read_csv('breast cancer.csv')
data.drop('Unnamed: 32', inplace=True, axis=1)
# encoding the target: malignant (M) -> 1, benign (B) -> 0
data['diagnosis'] = data['diagnosis'].map({'M': 1, 'B': 0})
data.head()
```

| | id | diagnosis | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | concave points_mean | ... | radius_worst | texture_worst | perimeter_worst | area_worst | smoothness_worst | compactness_worst | concavity_worst | concave points_worst | symmetry_worst | fractal_dimension_worst |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 842302 | 1 | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | ... | 25.38 | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 |
| 1 | 842517 | 1 | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | ... | 24.99 | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 |
| 2 | 84300903 | 1 | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | ... | 23.57 | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 |
| 3 | 84348301 | 1 | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | ... | 14.91 | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 |
| 4 | 84358402 | 1 | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | ... | 22.54 | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 |


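```python
# splitting the data into features X and target y
# (X still contains the 'id' column alongside the 30 measurement features)
X = data.drop('diagnosis', axis=1)
y = data['diagnosis']

# scaling every column of X to the range [-1, 1]
# (the scaler is fit on all rows, before the train/test split)
scaler = MinMaxScaler(feature_range=(-1, 1))
X_scaled = scaler.fit_transform(X)

# splitting the data into train and test sets, holding out 33% for testing
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.33, random_state=42)
```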
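With `feature_range=(a, b) = (-1, 1)`, `MinMaxScaler` maps each column `x` to `x' = (x - min) / (max - min) * (b - a) + a`. The cell above learns `min` and `max` from all 569 rows before splitting, so the test rows influence the scaling. The NumPy sketch below reproduces the transform and, as an alternative that this notebook does not use, learns the column minima and maxima from the training rows only. The helper names `fit_minmax` and `transform_minmax` and the toy data are hypothetical.

```python
import numpy as np

# hypothetical helpers reproducing MinMaxScaler(feature_range=(a, b))
def fit_minmax(X_train):
    """Learn per-column minima and maxima from the training rows only."""
    return X_train.min(axis=0), X_train.max(axis=0)

def transform_minmax(X, col_min, col_max, feature_range=(-1.0, 1.0)):
    """Map each column linearly so the training min -> a and the training max -> b."""
    a, b = feature_range
    span = col_max - col_min
    span = np.where(span == 0, 1.0, span)  # guard against constant columns
    return (X - col_min) / span * (b - a) + a

# toy data: 4 training rows and 1 test row, 2 features
X_train_toy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [5.0, 50.0]])
X_test_toy = np.array([[4.0, 60.0]])

col_min, col_max = fit_minmax(X_train_toy)
print(transform_minmax(X_train_toy, col_min, col_max))
print(transform_minmax(X_test_toy, col_min, col_max))  # test values may fall outside [-1, 1]
```

Fitting on the training rows and reusing their minima and maxima for the test rows corresponds to calling `scaler.fit(X_train)` and then `scaler.transform(X_test)` in scikit-learn.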

## Logistic Regression Model

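```python
# project module: logistic regression implemented from scratch
from logistic_regression import LogisticRegression

# creating an instance of the model and fitting it for 500 epochs with a learning rate of 0.5
lrmodel = LogisticRegression()
lrmodel.fit(X_train, y_train, epochs=500, lr=0.5)

# predicting on the held-out test set
y_pred = lrmodel.predict(X_test)

print('Accuracy:', accuracy(y_test, y_pred))
```

```
Accuracy: 0.973404255319149
```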
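The `logistic_regression` module is not included in this notebook. The sketch below shows one way a from-scratch model with the same `fit(X, y, epochs=..., lr=...)` and `predict(X)` interface could work, assuming full-batch gradient descent on the mean binary cross-entropy loss and a 0.5 decision threshold. It is an illustration under those assumptions, not the project's actual code; the class name `ScratchLogisticRegression` and the toy data are hypothetical.

```python
import numpy as np

class ScratchLogisticRegression:
    """Hypothetical from-scratch logistic regression (full-batch gradient descent)."""

    def fit(self, X, y, epochs=500, lr=0.5):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n_samples, n_features = X.shape
        self.w = np.zeros(n_features)
        self.b = 0.0
        for _ in range(epochs):
            p = self._sigmoid(X @ self.w + self.b)    # predicted P(y = 1)
            error = p - y                             # gradient of the loss w.r.t. the logits
            self.w -= lr * (X.T @ error) / n_samples  # averaged gradient step for the weights
            self.b -= lr * error.mean()               # and for the bias
        return self

    def predict_proba(self, X):
        return self._sigmoid(np.asarray(X, dtype=float) @ self.w + self.b)

    def predict(self, X):
        # label 1 (malignant) when the predicted probability is at least 0.5
        return (self.predict_proba(X) >= 0.5).astype(int)

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

# usage on linearly separable toy data
rng = np.random.default_rng(0)
X_toy = rng.uniform(-1, 1, size=(200, 2))
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(int)
model = ScratchLogisticRegression().fit(X_toy, y_toy, epochs=500, lr=0.5)
print((model.predict(X_toy) == y_toy).mean())
```

Averaging the gradient over the samples keeps the step size independent of the dataset size, which is one reason a fixed learning rate such as 0.5 can be usable.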


## SVM Model

```python
from sklearn.svm import SVC

# creating an instance of the model with default settings (RBF kernel, C=1.0)
svmmodel = SVC()
svmmodel.fit(X_train, y_train)

# predicting on the held-out test set
y_pred = svmmodel.predict(X_test)

print('Accuracy:', accuracy(y_test, y_pred))
```

```
Accuracy: 0.9787234042553191
```
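`SVC()` with no arguments uses the RBF kernel `K(x, z) = exp(-gamma * ||x - z||^2)` with `gamma='scale'`, which scikit-learn documents as `1 / (n_features * X.var())` computed on the training data. The NumPy sketch below evaluates that kernel matrix for a tiny hypothetical input; the function name `rbf_kernel_scale` is an assumption, and the sketch reproduces only the kernel, not the SVM training itself.

```python
import numpy as np

def rbf_kernel_scale(X, Z=None):
    """Hypothetical helper: RBF kernel matrix with gamma chosen like SVC's gamma='scale'."""
    X = np.asarray(X, dtype=float)
    Z = X if Z is None else np.asarray(Z, dtype=float)
    gamma = 1.0 / (X.shape[1] * X.var())  # 'scale' heuristic from the training data X
    # squared Euclidean distances between every row of X and every row of Z
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Z**2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negative values caused by rounding
    return np.exp(-gamma * sq_dists), gamma

X_demo = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
K, gamma = rbf_kernel_scale(X_demo)
print(gamma)
print(K)
```

Because the kernel depends on distances between samples, scaling every column to the same range first keeps any single feature from dominating `||x - z||^2` purely because of its units.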


## Neural Network Model

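```python
from sklearn.neural_network import MLPClassifier

# creating an instance of the model with default settings
# (no random_state is set, so the initial weights and the result can vary between runs)
mlpmodel = MLPClassifier()
mlpmodel.fit(X_train, y_train)

# predicting on the held-out test set
y_pred = mlpmodel.predict(X_test)

print('Accuracy:', accuracy(y_test, y_pred))
```

```
Accuracy: 0.9680851063829787
```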
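With its defaults, `MLPClassifier()` builds one hidden layer of 100 ReLU units (`hidden_layer_sizes=(100,)`, `activation='relu'`), trains it with the Adam solver for at most `max_iter=200` iterations, and for a binary target uses a single logistic output unit. The NumPy sketch below shows only the forward pass of such a network with randomly initialised, untrained weights. The helper names `init_mlp` and `mlp_forward`, the uniform initialisation bounds and the random demo inputs are assumptions for illustration.

```python
import numpy as np

def init_mlp(n_features, n_hidden=100, seed=0):
    """Hypothetical init: uniform weights in +/- sqrt(6 / (fan_in + fan_out))."""
    rng = np.random.default_rng(seed)
    b1 = np.sqrt(6.0 / (n_features + n_hidden))
    b2 = np.sqrt(6.0 / (n_hidden + 1))
    return {
        "W1": rng.uniform(-b1, b1, size=(n_features, n_hidden)),
        "c1": rng.uniform(-b1, b1, size=n_hidden),
        "W2": rng.uniform(-b2, b2, size=(n_hidden, 1)),
        "c2": rng.uniform(-b2, b2, size=1),
    }

def mlp_forward(params, X):
    """One ReLU hidden layer followed by a single logistic output unit."""
    hidden = np.maximum(0.0, X @ params["W1"] + params["c1"])  # ReLU activation
    logits = hidden @ params["W2"] + params["c2"]
    prob = 1.0 / (1.0 + np.exp(-logits))                       # P(malignant)
    return prob.ravel()

params = init_mlp(n_features=30)
X_demo = np.random.default_rng(1).uniform(-1, 1, size=(5, 30))  # 5 random inputs, 30 features
prob = mlp_forward(params, X_demo)
pred = (prob >= 0.5).astype(int)
print(prob.shape, pred)
```

Training adjusts `W1`, `c1`, `W2` and `c2` to reduce the log-loss on the training set; with untrained weights, as here, the predictions carry no information.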




# Summary

## Objective

The goal of this project was to classify breast tumors as benign or malignant using three machine learning models: Logistic Regression, a Support Vector Machine (SVM), and a Neural Network (a multilayer perceptron). The focus was on accurate prediction, and each model was evaluated by its accuracy on a held-out test set.
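Accuracy is the only metric used in this project. Because the target encodes malignant as 1, a confusion matrix separates the two kinds of error, in particular malignant tumors predicted as benign (false negatives). The NumPy sketch below computes the four counts and the derived recall and precision; the helper name `confusion_counts` and the labels are made up for illustration and are not the notebook's predictions.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) with 1 = malignant as the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tn, fp, fn, tp

# made-up labels for illustration only
y_true_demo = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred_demo = [1, 1, 1, 0, 0, 0, 0, 0]

tn, fp, fn, tp = confusion_counts(y_true_demo, y_pred_demo)
acc = (tp + tn) / len(y_true_demo)
recall = tp / (tp + fn)      # share of malignant tumors that were detected
precision = tp / (tp + fp)   # share of "malignant" predictions that were correct
print(tn, fp, fn, tp, acc, recall, precision)
```

Here accuracy is 0.875 while one of the four malignant cases is missed (recall 0.75). In the notebook, `sklearn.metrics.confusion_matrix(y_test, y_pred).ravel()` returns the same four counts in the order `tn, fp, fn, tp`.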

## Dataset Overview

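The dataset (the Wisconsin Diagnostic Breast Cancer data) contains 569 samples: 357 labeled benign and 212 malignant. Each sample is a digitized image of a fine needle aspirate (FNA) of a breast mass, and the cell nuclei in the image are described by ten real-valued features: radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension. For each feature, the mean, the standard error, and the "worst" value (the mean of the three largest values) are computed over the image, giving 30 features per image.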
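These counts also give a reference point for the accuracies reported below: a trivial classifier that always predicts the majority class (benign) would be right 357 / 569, or about 62.7%, of the time on the full dataset. The short sketch below spells out that arithmetic and the 10 × 3 = 30 feature count. The base names follow the dataset's column names; the `_se` suffix for the standard-error columns is an assumption, since the truncated `head()` output does not show those columns.

```python
# the ten base measurements, named as in the dataset's columns
base_features = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry", "fractal_dimension",
]
# three statistics per measurement; the "se" suffix is an assumption
statistics = ["mean", "se", "worst"]

columns = [f"{feat}_{stat}" for stat in statistics for feat in base_features]
print(len(columns))  # 30

n_benign, n_malignant = 357, 212
n_total = n_benign + n_malignant
majority_baseline = n_benign / n_total  # accuracy of always predicting "benign"
print(n_total, round(majority_baseline, 3))
```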

## Model Accuracies

The models achieved the following accuracies on the held-out test set (33% of the samples):

- Logistic Regression: 97.34%
- Support Vector Machine (SVM): 97.87%
- Neural Network: 96.81%
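With a float `test_size`, scikit-learn's `train_test_split` rounds the test-set size up, so `test_size=0.33` on 569 samples gives ceil(0.33 × 569) = 188 test samples. Converting the reported accuracies back into counts shows how small the gaps between the models are:

```python
import math

n_samples = 569
n_test = math.ceil(0.33 * n_samples)  # train_test_split rounds the test size up

# accuracies as printed by the notebook
reported = {
    "Logistic Regression": 0.973404255319149,
    "SVM": 0.9787234042553191,
    "Neural Network": 0.9680851063829787,
}

for model, acc in reported.items():
    correct = round(acc * n_test)
    print(f"{model}: {correct}/{n_test} correct, {n_test - correct} misclassified")
```

This gives 183, 184 and 182 correct predictions respectively, so the best and worst models differ by two test samples. A different split (`random_state`) or a different initialisation of the neural network could plausibly change their order.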

## Note

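The logistic regression model was not taken from a library such as scikit-learn; it was implemented from scratch in the project's own `logistic_regression` module. The SVM and neural network models use scikit-learn's `SVC` and `MLPClassifier`.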