# Breast Cancer Diagnosis prediction model

Project Overview
This project aims to build a machine learning model that predicts whether a breast tumor is benign (non-cancerous) or malignant (cancerous) based on various diagnostic features derived from breast mass cell nuclei.
By using this model, healthcare professionals could potentially assist in early diagnosis and improve patient treatment outcomes.

In [1]:
#import the libraries required
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
#load the dataset
df=pd.read_csv("/kaggle/input/breast-cancer-wisconsin-data/data.csv")
df.head()
df.shape

(569, 33)

In [2]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 33 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   id                       569 non-null    int64  
 1   diagnosis                569 non-null    object 
 2   radius_mean              569 non-null    float64
 3   texture_mean             569 non-null    float64
 4   perimeter_mean           569 non-null    float64
 5   area_mean                569 non-null    float64
 6   smoothness_mean          569 non-null    float64
 7   compactness_mean         569 non-null    float64
 8   concavity_mean           569 non-null    float64
 9   concave points_mean      569 non-null    float64
 10  symmetry_mean            569 non-null    float64
 11  fractal_dimension_mean   569 non-null    float64
 12  radius_se                569 non-null    float64
 13  texture_se               569 non-null    float64
 14  perimeter_se             5

In [3]:
#cleaning the data 
df = df.drop(columns=['id', 'Unnamed: 32'], errors='ignore')

In [4]:
df['diagnosis']=df['diagnosis'].map({'M':1,'B':0})

In [5]:
#checking for any null values
df.isnull().sum()

diagnosis                  0
radius_mean                0
texture_mean               0
perimeter_mean             0
area_mean                  0
smoothness_mean            0
compactness_mean           0
concavity_mean             0
concave points_mean        0
symmetry_mean              0
fractal_dimension_mean     0
radius_se                  0
texture_se                 0
perimeter_se               0
area_se                    0
smoothness_se              0
compactness_se             0
concavity_se               0
concave points_se          0
symmetry_se                0
fractal_dimension_se       0
radius_worst               0
texture_worst              0
perimeter_worst            0
area_worst                 0
smoothness_worst           0
compactness_worst          0
concavity_worst            0
concave points_worst       0
symmetry_worst             0
fractal_dimension_worst    0
dtype: int64

In [6]:
X=df.drop(columns=['diagnosis'])# it specifies that all the other columns are X except diagnosis
y=df['diagnosis']



In [7]:
X_train,X_test,y_train,y_test=train_test_split(
    X,y,test_size=0.2,random_state=42
)

In [8]:
#make the meausurement ro be standard 
from sklearn.preprocessing import StandardScaler
scalar=StandardScaler()
X_train = scalar.fit_transform(X_train)
X_test=scalar.fit_transform(X_test)

In [9]:
#initialize the model
dt_model = DecisionTreeClassifier(random_state=42)

In [10]:
# Train the model
dt_model.fit(X_train,y_train)

In [11]:
#predict
y_pred=dt_model.predict(X_test)

In [12]:
#calculate the accuracy
print("Accuracy: ",accuracy_score(y_pred,y_test))

Accuracy:  0.9210526315789473


Conclusion:
In this project, I developed a machine learning model to predict whether a breast tumor is malignant or benign using the Breast Cancer Wisconsin dataset. After cleaning and standardizing the data, we trained a Decision Tree Classifier.The model achieved an accuracy of approximately 0.9473684210526315, showing strong predictive performance. This indicates that the features provided (such as radius, texture, and area) are highly effective in distinguishing between malignant and benign tumors.Such models can support medical professionals by providing an early, data-driven indication of potential breast cancer cases, improving the speed and accuracy of diagnosis.