# Breast Cancer Recurrence Prediction - Fuzzy System Implementation
The system predicts whether a breast cancer patient will experience a recurrence of the cancer in the future. <br>

To implement the system, we first have to install the necessary libraries.

In [1]:
%pip install pandas
%pip install numpy
%pip install -U scikit-fuzzy
%pip install ucimlrepo


Then we import the following libraries:
* pandas: to manipulate the DataFrame
* numpy: to perform array calculations
* scikit-fuzzy (imported as `skfuzzy`): to implement the fuzzy system
* `skfuzzy.control`: to define the fuzzy variables, rules and control system

In [1]:
import pandas as pd
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

Then we load the dataset from the UCI Machine Learning Repository using the `ucimlrepo` package. We will be using the Breast Cancer dataset (id 14).

In [2]:
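from ucimlrepo import fetch_ucirepo

# Fetch the Breast Cancer dataset (UCI ML Repository id 14)
breast_cancer = fetch_ucirepo(id=14)
X = breast_cancer.data.features
y = breast_cancer.data.targets

# Combine features and target in one DataFrame; copy so X is not modified
data = X.copy()
data['Class'] = y  # Adding the target variable 'Class'
data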

| | age | menopause | tumor-size | inv-nodes | node-caps | deg-malig | breast | breast-quad | irradiat | Class |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30-39 | premeno | 30-34 | 0-2 | no | 3 | left | left_low | no | no-recurrence-events |
| 1 | 40-49 | premeno | 20-24 | 0-2 | no | 2 | right | right_up | no | no-recurrence-events |
| 2 | 40-49 | premeno | 20-24 | 0-2 | no | 2 | left | left_low | no | no-recurrence-events |
| 3 | 60-69 | ge40 | 15-19 | 0-2 | no | 2 | right | left_up | no | no-recurrence-events |
| 4 | 40-49 | premeno | 0-4 | 0-2 | no | 2 | right | right_low | no | no-recurrence-events |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 281 | 30-39 | premeno | 30-34 | 0-2 | no | 2 | left | left_up | no | recurrence-events |
| 282 | 30-39 | premeno | 20-24 | 0-2 | no | 3 | left | left_up | yes | recurrence-events |
| 283 | 60-69 | ge40 | 20-24 | 0-2 | no | 1 | right | left_up | no | recurrence-events |
| 284 | 40-49 | ge40 | 30-34 | 5-Mar | no | 3 | left | left_low | no | recurrence-events |


Then we clean the data:
* Select only the age, tumor-size, inv-nodes, deg-malig and Class columns.
* For tumor-size and inv-nodes, some ranges were imported as dates. Convert these dates (e.g. 5-Mar) back to ranges (e.g. 3-5).
* Clean the age, tumor-size and inv-nodes columns by replacing each range (e.g. 3-5) with its midpoint (e.g. 4).
* Clean the Class column by replacing no-recurrence-events and recurrence-events with 0 and 1 respectively.
* Convert the datatype of age, tumor-size, inv-nodes and Class to float.
* Save the clean data as a CSV file named "clean-breast-cancer.csv".

In [3]:
from datetime import datetime

# Select only the age, tumor-size, inv-nodes, deg-malig and Class columns.
# .copy() creates an independent DataFrame, which avoids SettingWithCopyWarning.
data_clean = data[["age", "tumor-size", "inv-nodes", "deg-malig", "Class"]].copy()

for index, row in data_clean.iterrows():

    # Clean age column
    age_list = row["age"].split("-")    # Extract range values
    age_average = (float(age_list[0]) + float(age_list[1])) / 2     # Calculate midpoint
    data_clean.at[index, "age"] = age_average   # Replace range with age midpoint

    # Clean tumor-size column
    tumor_size_list = row["tumor-size"].split("-")  # Extract range values
    try:    # Convert a month abbreviation (as in 5-Mar) back to its number
        tumor_size_list[1] = datetime.strptime(tumor_size_list[1], '%b').month
    except ValueError:
        pass    # The value was already numeric
    tumor_size_average = (float(tumor_size_list[0]) + float(tumor_size_list[1])) / 2    # Calculate midpoint
    data_clean.at[index, "tumor-size"] = tumor_size_average     # Replace range with tumor-size midpoint

    # Clean inv-nodes column
    inv_nodes_list = row["inv-nodes"].split("-")    # Extract range values
    try:    # Convert a month abbreviation (as in 5-Mar) back to its number
        inv_nodes_list[1] = datetime.strptime(inv_nodes_list[1], '%b').month
    except ValueError:
        pass    # The value was already numeric
    inv_nodes_average = (float(inv_nodes_list[0]) + float(inv_nodes_list[1])) / 2   # Calculate midpoint
    data_clean.at[index, "inv-nodes"] = inv_nodes_average   # Replace range with inv-nodes midpoint

    # Clean Class column: no-recurrence-events -> 0, recurrence-events -> 1
    data_clean.at[index, "Class"] = 0 if row["Class"] == "no-recurrence-events" else 1

# Convert datatype of age, tumor-size, inv-nodes and Class to float
data_clean["age"] = data_clean["age"].astype(float)
data_clean["tumor-size"] = data_clean["tumor-size"].astype(float)
data_clean["inv-nodes"] = data_clean["inv-nodes"].astype(float)
data_clean["Class"] = data_clean["Class"].astype(float)

# Save clean data as CSV file named "clean-breast-cancer.csv"
data_clean.to_csv('clean-breast-cancer.csv', index=False, header=True)

# Display the clean data
data_clean


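| | age | tumor-size | inv-nodes | deg-malig | Class |
|---|---|---|---|---|---|
| 0 | 34.5 | 32.0 | 1.0 | 3 | 0.0 |
| 1 | 44.5 | 22.0 | 1.0 | 2 | 0.0 |
| 2 | 44.5 | 22.0 | 1.0 | 2 | 0.0 |
| 3 | 64.5 | 17.0 | 1.0 | 2 | 0.0 |
| 4 | 44.5 | 2.0 | 1.0 | 2 | 0.0 |
| ... | ... | ... | ... | ... | ... |
| 281 | 34.5 | 32.0 | 1.0 | 2 | 1.0 |
| 282 | 34.5 | 22.0 | 1.0 | 3 | 1.0 |
| 283 | 64.5 | 22.0 | 1.0 | 1 | 1.0 |
| 284 | 44.5 | 32.0 | 4.0 | 3 | 1.0 |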
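
The range-to-midpoint conversion used in the cleaning step can also be written as one small helper, which may be easier to test than the per-row loop. The following is a minimal sketch, not the notebook's own code: the function name `range_to_midpoint` and the sample values are hypothetical, and it assumes month abbreviations are in English (the default locale).

```python
from datetime import datetime

def range_to_midpoint(value):
    """Return the midpoint of a range string such as "30-34".

    Date-mangled values such as "5-Mar" (originally "3-5") are handled by
    mapping the month abbreviation back to its month number.
    """
    bounds = []
    for part in value.split("-"):
        try:
            bounds.append(float(part))
        except ValueError:
            # Not a number: assume an English month abbreviation, e.g. "Mar" -> 3
            bounds.append(float(datetime.strptime(part, "%b").month))
    return sum(bounds) / len(bounds)

# Illustrative sample values (hypothetical, not rows copied from the dataset)
samples = ["30-34", "0-2", "5-Mar"]
print([range_to_midpoint(s) for s in samples])  # [32.0, 1.0, 4.0]
```

If this helper were used in the notebook, it could be applied to a whole column with something like `data_clean["inv-nodes"].map(range_to_midpoint)`.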


Then we define the fuzzy variables and the range of values (universe) for each:
* age: 10 - 100
* tumor size: 0 - 60
* number of inv nodes: 0 - 40
* degree of malignancy: 1 - 3
* recurrence: 0 - 100

In [4]:
# Define fuzzy variables
age = ctrl.Antecedent(np.arange(10, 101, 1), 'age')
tumor_size = ctrl.Antecedent(np.arange(0, 61, 1), 'tumor_size')
inv_nodes = ctrl.Antecedent(np.arange(0, 41, 1), 'inv_nodes')
deg_malig = ctrl.Antecedent(np.arange(1, 4, 1), 'deg_malig')
recurrence = ctrl.Consequent(np.arange(0, 101, 1), 'recurrence')

Then we define the membership functions for each variable, using trapezoidal (`trapmf`) and triangular (`trimf`) shapes.

In [5]:
#Age membership functions
age['young'] = fuzz.trapmf(age.universe, [10, 10, 25, 50])
age['middle_aged'] = fuzz.trimf(age.universe, [25, 50, 70])
age['elderly'] = fuzz.trapmf(age.universe, [50, 70, 100, 100])

#Tumor size membership function
tumor_size['small'] = fuzz.trapmf(tumor_size.universe, [0, 0, 2, 10])
tumor_size['medium'] = fuzz.trimf(tumor_size.universe, [2, 10, 18])
tumor_size['large'] = fuzz.trapmf(tumor_size.universe, [10, 18, 60, 60])

#inv nodes membership function
inv_nodes['few'] = fuzz.trapmf(inv_nodes.universe, [0, 0, 2, 10])
inv_nodes['moderate'] = fuzz.trimf(inv_nodes.universe, [2, 10, 20])
inv_nodes['many'] = fuzz.trapmf(inv_nodes.universe, [10, 20, 40, 40])

#deg_malig membership function
deg_malig['low'] = fuzz.trimf(deg_malig.universe, [1, 1, 4])
deg_malig['high'] = fuzz.trimf(deg_malig.universe, [2, 4, 4])

# Define membership functions for the output (recurrence)
recurrence['low'] = fuzz.trimf(recurrence.universe, [0, 0, 50])
recurrence['medium'] = fuzz.trimf(recurrence.universe, [0, 50, 100])
recurrence['high'] = fuzz.trimf(recurrence.universe, [50, 100, 100])
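
To see what these membership functions mean for a single value, it can help to evaluate them by hand. The sketch below re-implements the triangular and trapezoidal shapes in plain Python for scalar inputs (these helpers are illustrative stand-ins, not the scikit-fuzzy functions) and computes the membership degrees of age 35 using the same breakpoints as the age variable above.

```python
def trimf(x, abc):
    """Triangular membership: feet at a and c, peak at b (a <= b <= c)."""
    a, b, c = abc
    if x == b:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0

def trapmf(x, abcd):
    """Trapezoidal membership: feet at a and d, plateau from b to c."""
    a, b, c, d = abcd
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0

# Same breakpoints as the age membership functions in the notebook
age_value = 35
age_degrees = {
    "young": trapmf(age_value, (10, 10, 25, 50)),
    "middle_aged": trimf(age_value, (25, 50, 70)),
    "elderly": trapmf(age_value, (50, 70, 100, 100)),
}
print(age_degrees)  # {'young': 0.6, 'middle_aged': 0.4, 'elderly': 0.0}
```

A 35-year-old patient is therefore treated as partly "young" and partly "middle_aged", so rules for both terms contribute to the result.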


Then we define the fuzzy rules. <br>
After that, we add the rules to the control system. <br>
Finally, we create a fuzzy system simulation based on those controls.

In [6]:
# Define fuzzy rules based on the membership functions and reasoning
rule =[]
rule.append(ctrl.Rule(age['young'], recurrence['high']))
rule.append(ctrl.Rule(age['middle_aged'], recurrence['medium']))
rule.append(ctrl.Rule(age['elderly'], recurrence['medium']))

rule.append(ctrl.Rule(inv_nodes['few'], recurrence['low']))
rule.append(ctrl.Rule(inv_nodes['moderate'], recurrence['medium']))
rule.append(ctrl.Rule(inv_nodes['many'], recurrence['high']))

rule.append(ctrl.Rule(deg_malig['low'], recurrence['low']))
rule.append(ctrl.Rule(deg_malig['high'], recurrence['high']))

rule.append(ctrl.Rule(tumor_size['small'], recurrence['low']))
rule.append(ctrl.Rule(tumor_size['medium'], recurrence['medium']))
rule.append(ctrl.Rule(tumor_size['large'], recurrence['high']))

rule.append(ctrl.Rule(inv_nodes['few'] & tumor_size['large'], recurrence['medium']))
rule.append(ctrl.Rule(deg_malig['high'] & tumor_size['large'], recurrence['high']))
rule.append(ctrl.Rule(age['middle_aged'] & inv_nodes['few'], recurrence['low']))
rule.append(ctrl.Rule(age['middle_aged'] & inv_nodes['moderate'], recurrence['low']))

#add the rules to the control system
recurrence_ctrl = ctrl.ControlSystem(rules=rule)

# Create simulation
recurrence_sim = ctrl.ControlSystemSimulation(recurrence_ctrl)
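
As a rough illustration of how such rules combine, the sketch below computes rule firing strengths in plain Python. It assumes the common Mamdani conventions, which to our understanding are also the scikit-fuzzy defaults: `&` is evaluated as the minimum of the membership degrees, and rules that share an output term are aggregated with the maximum. The membership degrees are hypothetical values chosen for illustration, and only a subset of the rules is shown.

```python
# Hypothetical membership degrees for one patient (illustrative values,
# not computed from the dataset)
degrees = {
    ("inv_nodes", "few"): 0.75,
    ("tumor_size", "large"): 0.5,
    ("deg_malig", "high"): 0.25,
}

# A subset of the notebook's rules: (terms joined by AND, output term)
rules = [
    ([("inv_nodes", "few")], "low"),
    ([("tumor_size", "large")], "high"),
    ([("inv_nodes", "few"), ("tumor_size", "large")], "medium"),
    ([("deg_malig", "high"), ("tumor_size", "large")], "high"),
]

def firing_strength(antecedent):
    """AND is taken as the minimum of the membership degrees."""
    return min(degrees[term] for term in antecedent)

def aggregate(rule_list):
    """Keep the strongest firing strength per output term (max aggregation)."""
    strengths = {}
    for antecedent, output_term in rule_list:
        strength = firing_strength(antecedent)
        strengths[output_term] = max(strengths.get(output_term, 0.0), strength)
    return strengths

strengths = aggregate(rules)
print(strengths)  # {'low': 0.75, 'high': 0.5, 'medium': 0.5}
```

Each output term ends up with one strength, which limits how much that term can contribute to the final score.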

The fuzzy system returns a defuzzified recurrence score on a scale of 0 to 100.<br>
If the score is greater than 50, the system predicts that recurrence is likely for the patient.<br>
The cell below demonstrates the fuzzy system on one example patient.

In [7]:
# Define inputs
recurrence_sim.input['age'] = 35
recurrence_sim.input['tumor_size'] = 50
recurrence_sim.input['inv_nodes'] = 30
recurrence_sim.input['deg_malig'] = 3

# Compute the defuzzified score
recurrence_sim.compute()
fuzzy_output_demo = recurrence_sim.output['recurrence']

# Threshold the score at 50 to get the predicted label
crisp_output_demo = "recurrence" if fuzzy_output_demo > 50 else "no recurrence"

# Report the prediction
print(f"The patient is predicted to experience {crisp_output_demo}, with a score of {fuzzy_output_demo}.")

The patient is predicted to experience recurrence, with a score of 58.576856890377215.
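
To give an intuition for where a single score comes from, the sketch below shows a discrete centroid defuzzification in plain Python. It is an approximation under stated assumptions, not the library's implementation: each output set is clipped at its firing strength, the clipped sets are combined with the maximum, and the centroid is taken over the integer points 0 to 100. The firing strengths are hypothetical, so the result is not expected to match the score printed above.

```python
def trimf(x, abc):
    """Triangular membership: feet at a and c, peak at b (a <= b <= c)."""
    a, b, c = abc
    if x == b:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0

# Same breakpoints as the recurrence output terms in the notebook
OUTPUT_TERMS = {"low": (0, 0, 50), "medium": (0, 50, 100), "high": (50, 100, 100)}

def defuzzify_centroid(strengths):
    """Discrete centroid of the clipped, max-aggregated output sets."""
    numerator = 0.0
    denominator = 0.0
    for x in range(0, 101):
        # Clip each output set at its firing strength, then take the maximum
        mu = max(min(strengths.get(name, 0.0), trimf(x, abc))
                 for name, abc in OUTPUT_TERMS.items())
        numerator += x * mu
        denominator += mu
    if denominator == 0:
        raise ValueError("No rule fired, so the output is undefined.")
    return numerator / denominator

# Hypothetical firing strengths (illustrative only)
score = defuzzify_centroid({"low": 0.2, "medium": 0.4, "high": 0.6})
label = "recurrence" if score > 50 else "no recurrence"
print(label, round(score, 2))
```

Because the score is a centroid, it is pulled toward whichever output sets fire most strongly, and the threshold of 50 then converts it into a yes/no prediction.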


After that, we evaluate the fuzzy system by passing each row of the clean data through it. <br>
We calculate and display the following details:
* Number of correct diagnoses
* Number of wrong diagnoses
* Total number of diagnoses
* Accuracy of the fuzzy system in %

In [8]:
corrects = []   # Whether each prediction matches the actual class
outputs = []    # Predicted class (0 or 1) for each patient
count = 0
correct = 0
for index, row in data_clean.iterrows():
    count += 1
    recurrence_sim.input['age'] = row["age"]
    recurrence_sim.input['tumor_size'] = row["tumor-size"]
    recurrence_sim.input['inv_nodes'] = row["inv-nodes"]
    recurrence_sim.input['deg_malig'] = row["deg-malig"]
    recurrence_sim.compute()

    # Threshold the score at 50: 1 = recurrence, 0 = no recurrence
    crisp_output = 1 if recurrence_sim.output['recurrence'] > 50 else 0
    outputs.append(crisp_output)

    # Compare the prediction with the actual class
    is_correct = bool(crisp_output == row["Class"])
    corrects.append(is_correct)
    if is_correct:
        correct += 1

print(f"Number of correct diagnoses {correct}")
print(f"Number of wrong diagnoses {count - correct}")
print(f"Total diagnoses {count}")
print(f"Accuracy {correct / count * 100}%")

Number of correct diagnoses 208
Number of wrong diagnoses 78
Total diagnoses 286
Accuracy 72.72727272727273%
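
Accuracy alone does not show which kind of mistake the system makes. As a minimal sketch, the helper below splits the predictions into true/false positives and negatives. The function names and the sample labels are hypothetical and are not the notebook's results.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = recurrence)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == 1:
            counts["tp" if a == 1 else "fp"] += 1
        else:
            counts["tn" if a == 0 else "fn"] += 1
    return counts

def accuracy(counts):
    """Share of predictions that match the actual label."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total

# Hypothetical labels for illustration only
actual = [0, 0, 0, 1, 1, 0, 1, 0]
predicted = [0, 0, 1, 1, 0, 0, 0, 0]

counts = confusion_counts(actual, predicted)
print(counts)            # {'tp': 1, 'fp': 1, 'tn': 4, 'fn': 2}
print(accuracy(counts))  # 0.625
```

In the notebook, the same helper could be called as `confusion_counts(data_clean["Class"], outputs)` to see how many recurrence cases the system misses.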


Then we create an output table that compares the actual class with the output class and adds a Boolean "Correct" column that is True when the two match. <br>
The output table is saved as output_table.xlsx (pandas needs an Excel writer package such as openpyxl for this).

In [9]:
output_table = data_clean.copy()
output_table["Output"] = outputs
output_table["Correct"] = corrects
output_table.to_excel('output_table.xlsx', index=False)
output_table

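| | age | tumor-size | inv-nodes | deg-malig | Class | Output | Correct |
|---|---|---|---|---|---|---|---|
| 0 | 34.5 | 32.0 | 1.0 | 3 | 0.0 | 0 | True |
| 1 | 44.5 | 22.0 | 1.0 | 2 | 0.0 | 0 | True |
| 2 | 44.5 | 22.0 | 1.0 | 2 | 0.0 | 0 | True |
| 3 | 64.5 | 17.0 | 1.0 | 2 | 0.0 | 0 | True |
| 4 | 44.5 | 2.0 | 1.0 | 2 | 0.0 | 0 | True |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 281 | 34.5 | 32.0 | 1.0 | 2 | 1.0 | 0 | False |
| 282 | 34.5 | 22.0 | 1.0 | 3 | 1.0 | 0 | False |
| 283 | 64.5 | 22.0 | 1.0 | 1 | 1.0 | 0 | False |
| 284 | 44.5 | 32.0 | 4.0 | 3 | 1.0 | 1 | True |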


## Conclusion
The fuzzy system reaches an accuracy of 72.73% (208 of 286 diagnoses correct), which we consider satisfactory.<br>
For comparison, other methods such as neural network classification and logistic regression have accuracies of 66.67% and 68.056% respectively.<br>
Source: <a href="https://archive.ics.uci.edu/dataset/14/breast+cancer">Breast Cancer - UCI ML Repository</a>