# Breast Cancer Wisconsin (Diagnostic) ML Application

### About the Dataset

<p>This is the Breast Cancer Wisconsin (Diagnostic) dataset. It describes characteristics of cells from breast tissue samples and can be used to classify each sample as benign or malignant.

The dataset contains 569 samples, each described by 30 features. These features capture properties of the cells such as radius, texture, perimeter, area, smoothness, compactness, concavity, and symmetry.

Each sample also carries a class label (diagnosis) that takes one of two values: "M" (malignant) or "B" (benign).

The dataset is widely used to train and evaluate classification algorithms for breast cancer diagnosis. Because diagnosis is a consequential medical task, analyses and models built on this data can have a significant impact in real-world applications.

</p><br>

<div style="display: flex;">
    <img src="jpg/breastcancer.png" alt="breastcancer" style="width:500px;height:200px; margin-left: 10px;">
</div>

Attribute Information:

1) ID number
2) Diagnosis (M = malignant, B = benign)
3-32) Ten real-valued features, each reported as mean, standard error (`se`), and worst value:

- a) radius
- b) texture
- c) perimeter
- d) area
- e) smoothness 
- f) compactness 
- g) concavity 
- h) concave points 
- i) symmetry
- j) fractal dimension 

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [2]:
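# No header row in wdbc.csv: the first record becomes the column names, so 568 of the 569 rows load (header=None avoids this)
df = pd.read_csv("wdbc.csv")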

In [3]:
df.head(5)

Unnamed: 0,842302,M,17.99,10.38,122.8,1001,0.1184,0.2776,0.3001,0.1471,...,25.38,17.33,184.6,2019,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
0,842517,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
1,84300903,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
2,84348301,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
3,84358402,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678
4,843786,M,12.45,15.7,82.57,477.1,0.1278,0.17,0.1578,0.08089,...,15.47,23.75,103.4,741.6,0.1791,0.5249,0.5355,0.1741,0.3985,0.1244
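
The header row of the `df.head(5)` output above holds data values (`842302`, `M`, `17.99`, ...) rather than column names, which suggests the file has no header line and that `read_csv` consumed the first record as the header. The sketch below reproduces that effect on three shortened in-memory records and shows that `header=None` together with `names=` keeps every record. The four-column layout and the variable names are illustrative assumptions, not the full file.

```python
import io

import pandas as pd

# Three shortened records in the same layout as wdbc.csv (no header line).
raw = (
    "842302,M,17.99,10.38\n"
    "842517,M,20.57,17.77\n"
    "84300903,M,19.69,21.25\n"
)

# Default behaviour: the first record is consumed as the column header.
df_default = pd.read_csv(io.StringIO(raw))

# header=None keeps every record as data; names= supplies the labels.
# These four names are a shortened, illustrative subset of the real columns.
df_all_rows = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=["id", "diagnosis", "radius_mean", "texture_mean"],
)

print("rows with default header handling:", len(df_default))
print("rows with header=None:", len(df_all_rows))
```

Loading the real file this way would also make the later column renaming step unnecessary, at the cost of changing the row count and therefore the downstream results.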


In [4]:
df.columns = ['id', 'diagnosis', 'radius_mean', 'texture_mean', 'perimeter_mean', 'area_mean',
                'smoothness_mean', 'compactness_mean', 'concavity_mean', 'concave_points_mean',
                'symmetry_mean', 'fractal_dimension_mean', 'radius_se', 'texture_se', 'perimeter_se',
                'area_se', 'smoothness_se', 'compactness_se', 'concavity_se', 'concave_points_se',
                'symmetry_se', 'fractal_dimension_se', 'radius_worst', 'texture_worst', 'perimeter_worst',
                'area_worst', 'smoothness_worst', 'compactness_worst', 'concavity_worst', 'concave_points_worst',
                'symmetry_worst', 'fractal_dimension_worst']
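
The 30 feature names follow a regular pattern: the ten base measurements are repeated for three statistics (`mean`, `se`, `worst`). As a sketch, the same list can be generated rather than typed out; the variable names here are illustrative.

```python
# Ten base measurements, in the order they appear in the file.
base_features = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]

# Each measurement is reported three times: mean, standard error, worst.
statistic_suffixes = ["mean", "se", "worst"]

# The suffix loop is the outer loop, so all *_mean columns come first,
# then all *_se columns, then all *_worst columns.
column_names = ["id", "diagnosis"] + [
    f"{feature}_{suffix}"
    for suffix in statistic_suffixes
    for feature in base_features
]

print(len(column_names))
print(column_names[:4])
```

The generated list has the same 32 entries, in the same order, as the explicit assignment above.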

In [5]:
df.head()

Unnamed: 0,id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave_points_mean,...,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave_points_worst,symmetry_worst,fractal_dimension_worst
0,842517,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
1,84300903,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
2,84348301,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
3,84358402,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678
4,843786,M,12.45,15.7,82.57,477.1,0.1278,0.17,0.1578,0.08089,...,15.47,23.75,103.4,741.6,0.1791,0.5249,0.5355,0.1741,0.3985,0.1244


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 568 entries, 0 to 567
Data columns (total 32 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   id                       568 non-null    int64  
 1   diagnosis                568 non-null    object 
 2   radius_mean              568 non-null    float64
 3   texture_mean             568 non-null    float64
 4   perimeter_mean           568 non-null    float64
 5   area_mean                568 non-null    float64
 6   smoothness_mean          568 non-null    float64
 7   compactness_mean         568 non-null    float64
 8   concavity_mean           568 non-null    float64
 9   concave_points_mean      568 non-null    float64
 10  symmetry_mean            568 non-null    float64
 11  fractal_dimension_mean   568 non-null    float64
 12  radius_se                568 non-null    float64
 13  texture_se               568 non-null    float64
 14  perimeter_se             5

In [7]:
df.isnull().sum()

id                         0
diagnosis                  0
radius_mean                0
texture_mean               0
perimeter_mean             0
area_mean                  0
smoothness_mean            0
compactness_mean           0
concavity_mean             0
concave_points_mean        0
symmetry_mean              0
fractal_dimension_mean     0
radius_se                  0
texture_se                 0
perimeter_se               0
area_se                    0
smoothness_se              0
compactness_se             0
concavity_se               0
concave_points_se          0
symmetry_se                0
fractal_dimension_se       0
radius_worst               0
texture_worst              0
perimeter_worst            0
area_worst                 0
smoothness_worst           0
compactness_worst          0
concavity_worst            0
concave_points_worst       0
symmetry_worst             0
fractal_dimension_worst    0
dtype: int64

In [8]:
X = df.drop(['id', 'diagnosis'], axis=1) 
y = df['diagnosis']

# Map the 'diagnosis' labels to integers: M (malignant) -> 1, B (benign) -> 0
y = y.map({'M': 1, 'B': 0})

In [9]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
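
The split above is purely random, so the malignant/benign ratio in the test set can drift from the ratio in the full data. One common option, not used in this notebook, is the `stratify` parameter of `train_test_split`, which preserves class proportions in both subsets. The sketch uses synthetic stand-in data (`X_demo`, `y_demo`) rather than the real features.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 samples, 60 of class 0 and 40 of class 1.
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([0] * 60 + [1] * 40)

# stratify=y_demo asks for the same class ratio in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("train class counts:", np.bincount(y_tr))
print("test class counts:", np.bincount(y_te))
```

Switching the notebook to a stratified split would change which samples land in the test set, so the scores reported below would no longer apply as printed.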

In [10]:
X_train

Unnamed: 0,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave_points_mean,symmetry_mean,fractal_dimension_mean,...,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave_points_worst,symmetry_worst,fractal_dimension_worst
248,11.520,14.93,73.87,406.3,0.10130,0.07808,0.04328,0.02929,0.1883,0.06168,...,12.65,21.19,80.88,491.8,0.1389,0.1582,0.18040,0.09608,0.2664,0.07809
88,14.640,15.24,95.77,651.9,0.11320,0.13390,0.09966,0.07064,0.2116,0.06346,...,16.34,18.24,109.40,803.6,0.1277,0.3089,0.26040,0.13970,0.3151,0.08473
334,17.060,21.00,111.80,918.6,0.11190,0.10560,0.15080,0.09934,0.1727,0.06071,...,20.99,33.15,143.20,1362.0,0.1449,0.2053,0.39200,0.18270,0.2623,0.07599
362,16.500,18.29,106.60,838.1,0.09686,0.08468,0.05862,0.04835,0.1495,0.05593,...,18.13,25.45,117.20,1009.0,0.1338,0.1679,0.16630,0.09123,0.2394,0.06469
33,16.130,17.88,107.00,807.2,0.10400,0.15590,0.13540,0.07752,0.1998,0.06515,...,20.21,27.26,132.70,1261.0,0.1446,0.5804,0.52740,0.18640,0.4270,0.12330
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
71,17.200,24.52,114.20,929.4,0.10710,0.18300,0.16920,0.07944,0.1927,0.06487,...,23.32,33.82,151.60,1681.0,0.1585,0.7394,0.65660,0.18990,0.3313,0.13390
106,12.360,18.54,79.01,466.7,0.08477,0.06815,0.02643,0.01921,0.1602,0.06066,...,13.29,27.49,85.56,544.1,0.1184,0.1963,0.19370,0.08442,0.2983,0.07185
270,11.290,13.04,72.23,388.0,0.09834,0.07608,0.03265,0.02755,0.1769,0.06270,...,12.32,16.18,78.27,457.5,0.1358,0.1507,0.12750,0.08750,0.2733,0.08022
435,12.870,19.54,82.67,509.2,0.09136,0.07883,0.01797,0.02090,0.1861,0.06347,...,14.45,24.38,95.14,626.9,0.1214,0.1652,0.07127,0.06384,0.3313,0.07735


In [11]:
X_test

Unnamed: 0,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave_points_mean,symmetry_mean,fractal_dimension_mean,...,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave_points_worst,symmetry_worst,fractal_dimension_worst
218,19.53,32.47,128.00,1223.0,0.08420,0.11300,0.11450,0.06637,0.1428,0.05313,...,27.90,45.41,180.20,2477.0,0.14080,0.4097,0.3995,0.16250,0.2713,0.07568
79,11.45,20.97,73.81,401.5,0.11020,0.09362,0.04591,0.02233,0.1842,0.07005,...,13.11,32.16,84.53,525.1,0.15570,0.1676,0.1755,0.06127,0.2762,0.08851
104,13.11,15.56,87.21,530.2,0.13980,0.17650,0.20710,0.09601,0.1925,0.07692,...,16.31,22.40,106.40,827.2,0.18620,0.4099,0.6376,0.19860,0.3147,0.14050
208,15.27,12.91,98.17,725.5,0.08182,0.06230,0.05892,0.03157,0.1359,0.05526,...,17.38,15.92,113.70,932.7,0.12220,0.2186,0.2962,0.10350,0.2320,0.07474
543,13.87,20.70,89.77,584.8,0.09578,0.10180,0.03688,0.02369,0.1620,0.06688,...,15.05,24.75,99.17,688.6,0.12640,0.2037,0.1377,0.06845,0.2249,0.08492
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
148,13.74,17.91,88.12,585.0,0.07944,0.06376,0.02881,0.01329,0.1473,0.05580,...,15.34,22.46,97.19,725.9,0.09711,0.1824,0.1564,0.06019,0.2350,0.07014
247,10.65,25.22,68.01,347.0,0.09657,0.07234,0.02379,0.01615,0.1897,0.06329,...,12.25,35.19,77.98,455.7,0.14990,0.1398,0.1125,0.06136,0.3409,0.08147
15,14.68,20.13,94.74,684.5,0.09867,0.07200,0.07395,0.05259,0.1586,0.05922,...,19.07,30.88,123.40,1138.0,0.14640,0.1871,0.2914,0.16090,0.3029,0.08216
557,14.59,22.68,96.39,657.1,0.08473,0.13300,0.10290,0.03736,0.1454,0.06147,...,15.48,27.27,105.90,733.5,0.10260,0.3171,0.3662,0.11050,0.2258,0.08004


In [12]:
y_train

248    0
88     0
334    1
362    0
33     1
      ..
71     1
106    0
270    0
435    0
102    0
Name: diagnosis, Length: 454, dtype: int64

In [13]:
y_test

218    1
79     0
104    1
208    0
543    0
      ..
148    0
247    0
15     1
557    0
245    0
Name: diagnosis, Length: 114, dtype: int64

In [14]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
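
`StandardScaler` learns a mean and standard deviation per column from the training data only, then applies those same statistics to the test data. The sketch below spells out that arithmetic for one column, using a handful of `radius_mean` values from the previews above as a toy subset; because it does not use the full training set, its numbers will not match the scaled arrays in this notebook.

```python
import statistics

# Toy subset of radius_mean values taken from the X_train / X_test previews.
train_values = [11.52, 14.64, 17.06, 16.50, 16.13]
test_values = [19.53, 11.45]

# Statistics are estimated from the training values only.
mu = statistics.fmean(train_values)
# Population standard deviation (ddof=0), which is what StandardScaler uses.
sigma = statistics.pstdev(train_values)

# The same mu and sigma are applied to both subsets: z = (x - mu) / sigma.
train_scaled = [(v - mu) / sigma for v in train_values]
test_scaled = [(v - mu) / sigma for v in test_values]

print("train mean after scaling:", round(statistics.fmean(train_scaled), 6))
print("train std after scaling:", round(statistics.pstdev(train_scaled), 6))
print("test values after scaling:", [round(z, 3) for z in test_scaled])
```

Reusing the training statistics for the test data is what keeps information about the test set from leaking into preprocessing.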

In [15]:
X_train_scaled

array([[-0.73316907, -1.02361042, -0.73799935, ..., -0.25761464,
        -0.38719873, -0.30523587],
       [ 0.14937775, -0.94846571,  0.16257668, ...,  0.40762874,
         0.44940541,  0.06920992],
       [ 0.83391728,  0.44777149,  0.82176544, ...,  1.06341658,
        -0.45763153, -0.42365999],
       ...,
       [-0.79822861, -1.48175075, -0.80543975, ..., -0.38846719,
        -0.2686655 , -0.18511998],
       [-0.35129785,  0.09386415, -0.37612405, ..., -0.74930301,
         0.72770084, -0.34696628],
       [-1.19820336,  0.05992783, -1.14593151, ..., -0.2361109 ,
        -0.4593494 ,  0.07879663]])

In [16]:
X_test_scaled

array([[ 1.53260018,  3.22812578,  1.48794497, ...,  0.75534881,
        -0.30302296, -0.44114165],
       [-0.7529698 ,  0.44049942, -0.74046668, ..., -0.78849777,
        -0.21884718,  0.28237333],
       [-0.28340963, -0.87089697, -0.18942929, ...,  1.30590557,
         0.44253392,  3.21421619],
       ...,
       [ 0.16069246,  0.2368815 ,  0.12022082, ...,  0.7309474 ,
         0.23982491, -0.07571865],
       [ 0.13523438,  0.85500734,  0.18807244, ..., -0.03769695,
        -1.08465517, -0.19527062],
       [-0.25795155, -0.41760469, -0.31608565, ..., -0.96495045,
        -0.21025782, -0.64979367]])

In [17]:
model = LogisticRegression()
model.fit(X_train_scaled, y_train)

y_pred = model.predict(X_test_scaled)

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
print("Confusion Matrix:\n", conf_matrix)
print("\nClassification Report:\n", class_report)

Accuracy: 0.9649122807017544
Confusion Matrix:
 [[67  1]
 [ 3 43]]

Classification Report:
               precision    recall  f1-score   support

           0       0.96      0.99      0.97        68
           1       0.98      0.93      0.96        46

    accuracy                           0.96       114
   macro avg       0.97      0.96      0.96       114
weighted avg       0.97      0.96      0.96       114
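
Every figure in the classification report can be recovered from the confusion matrix. With scikit-learn's convention (rows are true labels, columns are predicted labels), the matrix above reads as 67 true negatives, 1 false positive, 3 false negatives, and 43 true positives. A short worked check:

```python
# Confusion matrix printed above: rows = true class, columns = predicted class.
conf = [[67, 1],
        [3, 43]]

tn, fp = conf[0]  # true benign: predicted benign, predicted malignant
fn, tp = conf[1]  # true malignant: predicted benign, predicted malignant

total = tn + fp + fn + tp
accuracy = (tp + tn) / total

# Metrics for the malignant class (label 1).
precision_malignant = tp / (tp + fp)
recall_malignant = tp / (tp + fn)
f1_malignant = (
    2 * precision_malignant * recall_malignant
    / (precision_malignant + recall_malignant)
)

print(f"accuracy  = {accuracy:.4f}")
print(f"precision = {precision_malignant:.2f}")
print(f"recall    = {recall_malignant:.2f}")
print(f"f1-score  = {f1_malignant:.2f}")
```

The recall of 0.93 for class 1 corresponds to the 3 malignant samples out of 46 that the model labeled as benign.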



In [18]:
from sklearn.svm import SVC

svm_model = SVC(kernel='linear', random_state=42)
svm_model.fit(X_train_scaled, y_train)

svm_y_pred = svm_model.predict(X_test_scaled)

svm_accuracy = accuracy_score(y_test, svm_y_pred)
print("SVM Accuracy:", svm_accuracy)

svm_conf_matrix = confusion_matrix(y_test, svm_y_pred)
svm_class_report = classification_report(y_test, svm_y_pred)
print("SVM Confusion Matrix:\n", svm_conf_matrix)
print("\nSVM Classification Report:\n", svm_class_report)

SVM Accuracy: 0.9649122807017544
SVM Confusion Matrix:
 [[67  1]
 [ 3 43]]

SVM Classification Report:
               precision    recall  f1-score   support

           0       0.96      0.99      0.97        68
           1       0.98      0.93      0.96        46

    accuracy                           0.96       114
   macro avg       0.97      0.96      0.96       114
weighted avg       0.97      0.96      0.96       114



In [19]:
from sklearn.neighbors import KNeighborsClassifier

knn_model = KNeighborsClassifier(n_neighbors=3)  # k = 3 (use the 3 nearest neighbors)
knn_model.fit(X_train_scaled, y_train)

knn_y_pred = knn_model.predict(X_test_scaled)

knn_accuracy = accuracy_score(y_test, knn_y_pred)
print("KNN Accuracy:", knn_accuracy)

knn_conf_matrix = confusion_matrix(y_test, knn_y_pred)
knn_class_report = classification_report(y_test, knn_y_pred)
print("KNN Confusion Matrix:\n", knn_conf_matrix)
print("\nKNN Classification Report:\n", knn_class_report)

KNN Accuracy: 0.9649122807017544
KNN Confusion Matrix:
 [[68  0]
 [ 4 42]]

KNN Classification Report:
               precision    recall  f1-score   support

           0       0.94      1.00      0.97        68
           1       1.00      0.91      0.95        46

    accuracy                           0.96       114
   macro avg       0.97      0.96      0.96       114
weighted avg       0.97      0.96      0.96       114



In [20]:
from sklearn.ensemble import RandomForestClassifier

rf_model = RandomForestClassifier(n_estimators=100, random_state=42)  # use 100 decision trees
rf_model.fit(X_train_scaled, y_train)

rf_y_pred = rf_model.predict(X_test_scaled)

rf_accuracy = accuracy_score(y_test, rf_y_pred)
print("Random Forest Accuracy:", rf_accuracy)

rf_conf_matrix = confusion_matrix(y_test, rf_y_pred)
rf_class_report = classification_report(y_test, rf_y_pred)
print("Random Forest Confusion Matrix:\n", rf_conf_matrix)
print("\nRandom Forest Classification Report:\n", rf_class_report)

Random Forest Accuracy: 0.9736842105263158
Random Forest Confusion Matrix:
 [[68  0]
 [ 3 43]]

Random Forest Classification Report:
               precision    recall  f1-score   support

           0       0.96      1.00      0.98        68
           1       1.00      0.93      0.97        46

    accuracy                           0.97       114
   macro avg       0.98      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114



In [21]:
from sklearn.naive_bayes import GaussianNB  # the features are continuous, so the Gaussian variant is used

nb_model = GaussianNB()
nb_model.fit(X_train_scaled, y_train)

nb_y_pred = nb_model.predict(X_test_scaled)

nb_accuracy = accuracy_score(y_test, nb_y_pred)
print("Naive Bayes Accuracy:", nb_accuracy)

nb_conf_matrix = confusion_matrix(y_test, nb_y_pred)
nb_class_report = classification_report(y_test, nb_y_pred)
print("Naive Bayes Confusion Matrix:\n", nb_conf_matrix)
print("\nNaive Bayes Classification Report:\n", nb_class_report)

Naive Bayes Accuracy: 0.9298245614035088
Naive Bayes Confusion Matrix:
 [[66  2]
 [ 6 40]]

Naive Bayes Classification Report:
               precision    recall  f1-score   support

           0       0.92      0.97      0.94        68
           1       0.95      0.87      0.91        46

    accuracy                           0.93       114
   macro avg       0.93      0.92      0.93       114
weighted avg       0.93      0.93      0.93       114
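
To compare the five models side by side, the confusion matrices printed above can be collected into one table. The sketch below copies those matrices by hand and reports accuracy together with the number of missed malignant cases (false negatives), which is arguably the costlier error in a diagnostic setting. All results come from a single 114-sample test split, so differences of one or two samples should be read with caution.

```python
# Confusion matrices copied from the outputs above
# (rows = true class, columns = predicted class; 0 = benign, 1 = malignant).
confusion_matrices = {
    "Logistic Regression": [[67, 1], [3, 43]],
    "SVM (linear)": [[67, 1], [3, 43]],
    "KNN (k=3)": [[68, 0], [4, 42]],
    "Random Forest": [[68, 0], [3, 43]],
    "Gaussian Naive Bayes": [[66, 2], [6, 40]],
}

summary = {}
for name, ((tn, fp), (fn, tp)) in confusion_matrices.items():
    total = tn + fp + fn + tp
    summary[name] = {
        "accuracy": (tn + tp) / total,
        "missed_malignant": fn,  # malignant samples predicted as benign
        "false_alarms": fp,      # benign samples predicted as malignant
    }

# Print the models from highest to lowest accuracy.
ranked = sorted(summary.items(), key=lambda item: item[1]["accuracy"], reverse=True)
for name, stats in ranked:
    print(
        f"{name:22s} accuracy={stats['accuracy']:.4f} "
        f"missed malignant={stats['missed_malignant']} "
        f"false alarms={stats['false_alarms']}"
    )
```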

