<div style='text-align:center'>
  <h2 style="color:Orange">
    Fully Automated Code
    <img
      src="https://cdn-icons-png.flaticon.com/512/2825/2825945.png"
      style="width:60px;margin-right: 10px"
    />
  </h2>
  <p>All you have to do is change the dataset name and the desired target name.</p>
</div>

<hr />

### Importing the dataset

In [1]:
import pandas as pd

Data_set = pd.read_csv("Breast_Cancer.csv", delimiter=",")
print(Data_set.shape)  # 4024 rows, 16 columns
print(Data_set)

(4024, 16)
      Age   Race Marital Status T Stage  N Stage 6th Stage  \
0      68  White        Married       T1      N1       IIA   
1      50  White        Married       T2      N2      IIIA   
2      58  White       Divorced       T3      N3      IIIC   
3      58  White        Married       T1      N1       IIA   
4      47  White        Married       T2      N1       IIB   
...   ...    ...            ...      ...     ...       ...   
4019   62  Other        Married       T1      N1       IIA   
4020   56  White       Divorced       T2      N2      IIIA   
4021   68  White        Married       T2      N1       IIB   
4022   58  Black       Divorced       T2      N1       IIB   
4023   46  White        Married       T2      N1       IIB   

                  differentiate  Grade   A Stage  Tumor Size Estrogen Status  \
0         Poorly differentiated      3  Regional           4        Positive   
1     Moderately differentiated      2  Regional          35        Positive   
2   
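
The column listing printed further down shows `'T Stage '` with a trailing space, and `'Single '` appears among the marital-status values. A minimal sketch of how such stray whitespace could be stripped right after loading is given here; the `toy` frame and its values are made up for illustration and are not the real CSV.

```python
import pandas as pd

# Toy frame standing in for the real CSV (values are made up for illustration).
toy = pd.DataFrame({
    "T Stage ": ["T1", "T2"],
    "Marital Status": ["Single ", "Married"],
    "Age": [68, 50],
})

# Strip stray whitespace from the column names.
toy.columns = toy.columns.str.strip()

# Strip stray whitespace from the values of every text column.
for col in toy.columns:
    if pd.api.types.is_string_dtype(toy[col]):
        toy[col] = toy[col].str.strip()

print(list(toy.columns))
print(toy["Marital Status"].tolist())
```

Cleaning the names once at load time avoids having to remember the trailing space whenever a column is selected by name.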

<hr />

### Choosing, retrieving and formatting the target/goal (i.e., the Status feature)

In [2]:
target_name = 'Status'

target = Data_set[target_name].tolist()
target = sorted(set(target))  # a set keeps only the unique names; sorting gives a stable order

print(target)

['Alive', 'Dead']
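
Knowing the class names is only part of the picture; how often each class occurs matters for the evaluation later on. The sketch below shows one way to count classes with `value_counts`. The `toy_status` series is invented for illustration and does not reproduce the real `Status` column.

```python
import pandas as pd

# Made-up labels for illustration; the real 'Status' column is not reproduced here.
toy_status = pd.Series(["Alive", "Alive", "Alive", "Dead"], name="Status")

counts = toy_status.value_counts()                # absolute count per class
shares = toy_status.value_counts(normalize=True)  # fraction per class

print(counts)
print(shares)
```

On the real data, the same two calls on `Data_set[target_name]` would show whether one class dominates.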


<hr />

### Getting the desired features (i.e., columns) from the dataset

In [3]:
print(f'{Data_set.shape[0]} rows, {Data_set.shape[1]} columns\n')  # 4024, 16

Features_names = Data_set.columns[0:Data_set.shape[1]-1]  # all columns except the last one, assumed to be the target ('Status')
print(Features_names)

4024 rows, 16 columns

Index(['Age', 'Race', 'Marital Status', 'T Stage ', 'N Stage', '6th Stage',
       'differentiate', 'Grade', 'A Stage', 'Tumor Size', 'Estrogen Status',
       'Progesterone Status', 'Regional Node Examined',
       'Reginol Node Positive', 'Survival Months'],
      dtype='object')
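
The positional slice above works because `Status` happens to be the last column. For a dataset where the target sits elsewhere, a name-based selection may be more robust. The following is a sketch under that assumption, with a made-up `toy` frame in which the target is deliberately placed in the middle.

```python
import pandas as pd

# Toy frame with the target deliberately NOT in the last position.
toy = pd.DataFrame({
    "Age": [68, 50],
    "Status": ["Alive", "Dead"],
    "Grade": [3, 2],
})
target_name = "Status"

# Drop the target by name instead of relying on its position.
feature_names = toy.columns.drop(target_name)
X_toy = toy[feature_names].values

print(list(feature_names))
print(X_toy.shape)
```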


<hr />

### Getting the values of the retrieved columns

In [4]:
X = Data_set[Features_names].values
print(X)

[[68 'White' 'Married' ... 24 1 60]
 [50 'White' 'Married' ... 14 5 62]
 [58 'White' 'Divorced' ... 14 7 75]
 ...
 [68 'White' 'Married' ... 11 3 69]
 [58 'Black' 'Divorced' ... 11 1 72]
 [46 'White' 'Married' ... 7 2 100]]
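
Because the selected columns mix numbers and strings, `.values` returns a NumPy array of dtype `object`. The sketch below illustrates this on a made-up two-column frame and shows that the array can be cast to a numeric dtype once every column holds numbers; the hand-written encoding of `Race` is purely illustrative.

```python
import pandas as pd

# Toy frame with one numeric and one text column (made-up values).
toy = pd.DataFrame({"Age": [68, 50], "Race": ["White", "Black"]})

X_toy = toy.values.copy()   # copy so the array can be modified safely
print(X_toy.dtype)          # object, because the columns have mixed types

# Replace the text column with numeric codes (hand-written for illustration),
# then cast the whole array to float.
X_toy[:, 1] = [1, 0]
X_num = X_toy.astype(float)
print(X_num.dtype)
```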


<hr />

### Data Preprocessing Step (Categorical data to numeric data, for distance functions)
##### LabelEncoder() is used to encode categorical data as numeric data

In [5]:
from sklearn import preprocessing

for index, feature in enumerate(Features_names):

  # Encode only categorical (object-dtype) features; numerical ones are left unchanged
  if (Data_set.dtypes[feature] == 'object'):
    print(f'{index}: {feature} → {Data_set.dtypes[feature]}')

    # 1) Get the unique values
    label = Data_set[feature].tolist()
    label = list(set(label))
    print(label, end='\n\n')

    # 2) Convert to numerical
    label_numeric = preprocessing.LabelEncoder()
    label_numeric.fit(label)
    X[:, index] = label_numeric.transform(X[:, index])


print(X)

1: Race → object
['Black', 'White', 'Other']

2: Marital Status → object
['Widowed', 'Single ', 'Married', 'Separated', 'Divorced']

3: T Stage  → object
['T4', 'T2', 'T3', 'T1']

4: N Stage → object
['N1', 'N3', 'N2']

5: 6th Stage → object
['IIB', 'IIIC', 'IIA', 'IIIA', 'IIIB']

6: differentiate → object
['Well differentiated', 'Undifferentiated', 'Moderately differentiated', 'Poorly differentiated']

8: A Stage → object
['Regional', 'Distant']

10: Estrogen Status → object
['Negative', 'Positive']

11: Progesterone Status → object
['Negative', 'Positive']

[[68 2 1 ... 24 1 60]
 [50 2 1 ... 14 5 62]
 [58 2 0 ... 14 7 75]
 ...
 [68 2 1 ... 11 3 69]
 [58 0 0 ... 11 1 72]
 [46 2 1 ... 7 2 100]]
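
To read the encoded matrix, it helps to know which integer stands for which category. `LabelEncoder` sorts the class names and encodes each one as its position in `classes_`. A small sketch using the `Race` categories printed above:

```python
from sklearn.preprocessing import LabelEncoder

# Category names as printed for the 'Race' column above.
races = ["Black", "White", "Other"]

encoder = LabelEncoder()
encoder.fit(races)

# classes_ is sorted; each class is encoded as its index in classes_.
codes = encoder.transform(encoder.classes_)
mapping = {str(label): int(code) for label, code in zip(encoder.classes_, codes)}
print(mapping)

# inverse_transform maps integer codes back to the original strings.
decoded = [str(name) for name in encoder.inverse_transform([2, 0])]
print(decoded)
```

Note that these integer codes impose an order (Black < Other < White) that a distance function will treat as meaningful, even though the categories themselves have no natural ordering.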


<hr />

### Splitting the dataset into training and testing sets

In [9]:
import pandas as pd
from sklearn.model_selection import train_test_split

Y = Data_set[target_name]  # Target/Goal
print(Y)

# Dimensions of the dataset
print(Data_set.shape)  # 4024, 16

# Split the data into 80% for training and 20% for testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=3)  # random_state seeds the shuffle, making the split reproducible

print(X_train.shape)
print(X_test.shape)

print(Y_train.shape)
print(Y_test.shape)

0       Alive
1       Alive
2       Alive
3       Alive
4       Alive
        ...  
4019    Alive
4020    Alive
4021    Alive
4022    Alive
4023    Alive
Name: Status, Length: 4024, dtype: object
(4024, 16)
(3219, 15)
(805, 15)
(3219,)
(805,)
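
A plain random split does not guarantee that both subsets contain the classes in the same proportion. As a possible variation (not what the cell above does), `train_test_split` accepts a `stratify` argument. The sketch below uses synthetic labels with an 80/20 class ratio, chosen only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels (80 'Alive', 20 'Dead'); not the real data.
y_toy = np.array(["Alive"] * 80 + ["Dead"] * 20)
X_toy = np.arange(100).reshape(-1, 1)

# stratify=y_toy asks for the class ratio to be preserved in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=3, stratify=y_toy
)

dead_train = int((y_tr == "Dead").sum())
dead_test = int((y_te == "Dead").sum())
print(dead_train, dead_test)
```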


<hr />

### KNN (K-Nearest Neighbors) Classifier

In [7]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
from math import sqrt

# Heuristic choice of the number of neighbors: the square root of the number of samples
n = int(sqrt(Data_set.shape[0]))
print(n)

# Create the KNN classifier with n neighbors
neigh = KNeighborsClassifier(n_neighbors=n)


# Train the model on the training set (i.e., labeled data)
neigh.fit(X_train, Y_train)


# Do prediction on the testing set
predicted = neigh.predict(X_test)
print(predicted.shape)
print("\nPredicted by KNN:\n", predicted)


# Compare the predictions with the true labels (confusion matrix and accuracy)
results = metrics.confusion_matrix(Y_test, predicted)
print("\nKNN confusion matrix:\n", results)

print("\nKNN Accuracy: ", metrics.accuracy_score(Y_test, predicted))

63
(805,)

Predicted by KNN:
 ['Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Dead'
 'Alive' 'Alive' 'Alive' 'Alive' 'Dead' 'Alive' 'Dead' 'Alive' 'Alive'
 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive'
 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive'
 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive'
 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive'
 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive'
 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Dead' 'Alive'
 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive'
 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Dead'
 'Alive' 'Alive' 'Alive' 'Dead' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive'
 'Alive' 'Alive' 'Alive' 'Alive' 'Dead' 'Alive' 'Alive' 'Alive' 'Alive'
 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive'
 'Alive' 'Alive' 'Alive' 'Al
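
KNN classifies by distance, so a feature with a large numeric range can outweigh small integer codes such as the encoded categories. One common remedy, shown here only as a sketch on synthetic data, is to standardize the features before the classifier. The data from `make_classification`, the inflated first feature, and the choice of 15 neighbors are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the encoded feature matrix.
X_syn, y_syn = make_classification(n_samples=400, n_features=6, random_state=3)
X_syn[:, 0] *= 1000  # exaggerate one feature's scale on purpose

X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=3
)

# The pipeline fits the scaler on the training data only, then applies KNN.
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15))
knn_scaled.fit(X_tr, y_tr)

predicted_syn = knn_scaled.predict(X_te)
print("Accuracy with scaling:", knn_scaled.score(X_te, y_te))
```

Wrapping both steps in a pipeline keeps the test set out of the scaling statistics.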

<hr />

### Naive Bayes Classifier

In [8]:
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics

# Create a GaussianNB Classifier
model = GaussianNB()

# Train the model on the training set (i.e., labeled data)
model.fit(X_train, Y_train)


# Do prediction on the testing set
predicted = model.predict(X_test)
print(predicted.shape)
print("\nPredicted by Naive Bayes:\n", predicted)


# Compare the predictions with the true labels (confusion matrix and accuracy)
results = metrics.confusion_matrix(Y_test, predicted)
print("\nNaive Bayes confusion matrix:\n", results)

print("\nNaive Bayes Accuracy: ", metrics.accuracy_score(Y_test, predicted))

(805,)

Predicted by Naive Bayes:
 ['Alive' 'Alive' 'Dead' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Dead'
 'Alive' 'Alive' 'Dead' 'Alive' 'Dead' 'Alive' 'Dead' 'Alive' 'Alive'
 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive'
 'Alive' 'Alive' 'Alive' 'Dead' 'Alive' 'Alive' 'Alive' 'Alive' 'Dead'
 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive'
 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive'
 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive'
 'Alive' 'Alive' 'Alive' 'Dead' 'Alive' 'Dead' 'Alive' 'Dead' 'Alive'
 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive'
 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Dead'
 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Dead' 'Alive' 'Alive' 'Alive'
 'Alive' 'Alive' 'Alive' 'Dead' 'Dead' 'Alive' 'Alive' 'Alive' 'Alive'
 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive' 'Alive'
 'Alive' 'Dead' 'Dead' 'Alive'
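
Both classifiers predict `'Alive'` far more often than `'Dead'`, so accuracy alone can hide how well the rarer class is recognized. The sketch below uses made-up labels and a hypothetical classifier that always answers `'Alive'` to show how the confusion matrix and per-class recall expose this.

```python
from sklearn import metrics

# Made-up ground truth: 8 'Alive', 2 'Dead'.
y_true = ["Alive"] * 8 + ["Dead"] * 2
# Hypothetical classifier that always predicts 'Alive'.
y_pred = ["Alive"] * 10

labels = ["Alive", "Dead"]
cm = metrics.confusion_matrix(y_true, y_pred, labels=labels)
acc = metrics.accuracy_score(y_true, y_pred)
dead_recall = metrics.recall_score(y_true, y_pred, pos_label="Dead")

print(cm)  # rows: true class, columns: predicted class
print("Accuracy:", acc)
print("Recall for 'Dead':", dead_recall)
```

Here the accuracy looks respectable while no `'Dead'` case is found, which is why the confusion matrices printed by the cells above deserve as much attention as the accuracy figure.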