# Prediction of Recurrence in Differentiated Thyroid Cancer

### Problem Statement: __Are we able to Predict Recurrence of Thyroid Cancer based on Individuals’ Physical Attributes?__

## Part 1: Data Preparation for Exploratory Data Analysis (EDA)

In [12]:
# For data processing
import pandas as pd


In [30]:
# Load the raw dataset
thyroid_data = pd.read_csv("Thyroid_Diff.csv")


# Data Cleaning

### Data before Cleaning

In [34]:
thyroid_data.head()

Unnamed: 0,Age,Gender,Smoking,Hx Smoking,Hx Radiothreapy,Thyroid Function,Physical Examination,Adenopathy,Pathology,Focality,Risk,T,N,M,Stage,Response,Recurred
0,27,F,No,No,No,Euthyroid,Single nodular goiter-left,No,Micropapillary,Uni-Focal,Low,T1a,N0,M0,I,Indeterminate,No
1,34,F,No,Yes,No,Euthyroid,Multinodular goiter,No,Micropapillary,Uni-Focal,Low,T1a,N0,M0,I,Excellent,No
2,30,F,No,No,No,Euthyroid,Single nodular goiter-right,No,Micropapillary,Uni-Focal,Low,T1a,N0,M0,I,Excellent,No
3,62,F,No,No,No,Euthyroid,Single nodular goiter-right,No,Micropapillary,Uni-Focal,Low,T1a,N0,M0,I,Excellent,No
4,62,F,No,No,No,Euthyroid,Multinodular goiter,No,Micropapillary,Multi-Focal,Low,T1a,N0,M0,I,Excellent,No
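
Before renaming anything, it can help to check the raw table for missing cells, duplicate rows, and the distinct values each column holds. The following is a minimal sketch, not part of the original notebook: the `audit` helper and the three-row `sample` frame are hypothetical and only reuse a few of the column names shown above. In the notebook, `thyroid_data` would be passed instead.

```python
import pandas as pd

# Hypothetical three-row sample reusing a few raw column names from the table above.
sample = pd.DataFrame({
    "Age": [27, 34, 34],
    "Gender": ["F", "F", "F"],
    "Hx Smoking": ["No", "Yes", "Yes"],
    "Recurred": ["No", "No", "No"],
})

def audit(df: pd.DataFrame) -> dict:
    """Summarise missing cells, duplicate rows, and distinct values per column."""
    return {
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "distinct_values": {
            col: sorted(df[col].dropna().unique().tolist()) for col in df.columns
        },
    }

report = audit(sample)
print(report["missing_per_column"])
print(report["duplicate_rows"])
print(report["distinct_values"]["Hx Smoking"])
```

The distinct values are the useful part for the next step: they show exactly which category codes a relabelling dictionary has to cover.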


### Cleaning Column Titles and Values for better clarity

In [41]:
thyroid_data = thyroid_data.rename(columns={'Hx Smoking': 'Smoking History', 'Smoking': 'Currently Smoking',
                                         'Hx Radiothreapy': 'Radiotherapy History',
                                         'Pathology': 'Types of Thyroid Cancer (Pathology)',
                                         'T': 'Tumor',
                                         'N': 'Lymph Nodes',
                                         'M': 'Cancer Metastasis',
                                         'Response': 'Treatment Response'})

thyroid_data['Adenopathy'] = thyroid_data['Adenopathy'].replace({'No': 'No Lympth Adenopathy',
                                                               'Left': 'Left Side Body Adenopathy',
                                                               'Right': 'Right Side Body Adenopathy',
                                                               'Extensive': 'Extensive and Widespread'})

thyroid_data['Stage'] = thyroid_data['Stage'].replace({'I': 'First-Stage',
                                                     'II': 'Second-Stage',
                                                     'III': 'Third-Stage'})

thyroid_data['Tumor'] = thyroid_data['Tumor'].replace({'T1a': 'tumor is less than or equal to 1cm',
                                                     'T1b': 'tumor between the size of 1cm to 2cm inclusive',
                                                     'T2': 'tumor between the size of 2cm to 4cm inclusive',
                                                     'T3a': 'tumor larger than the size of 4 cm',
                                                     'T3b': 'tumor that has grown outside the thyroid',
                                                     'T4a': 'tumor that has invaded nearby Head and Neck structures',
                                                     'T4b': 'tumor that has invaded nearby Cervicothoracic Spine and Vascular structures'})
# N1a denotes central-neck nodes and N1b lateral-neck nodes in TNM staging
thyroid_data['Lymph Nodes'] = thyroid_data['Lymph Nodes'].replace({'N0': 'no evidence of regional lymph node metastasis',
                                                                 'N1a': 'regional lymph node metastasis in the central of the neck',
                                                                 'N1b': 'regional lymph node metastasis in the lateral of the neck'})

thyroid_data['Cancer Metastasis'] = thyroid_data['Cancer Metastasis'].replace({'M0': 'no evidence of distant metastasis',
                                                                             'M1': 'presence of distant metastasis'})
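
`Series.replace` with a dictionary matches whole values and leaves anything without a key unchanged, so a code that is missing from a mapping passes through silently. The sketch below shows one way to list such codes. The `unmapped_values` helper and the sample `stage` series are hypothetical and not part of the original notebook; the check would need to run on the raw column, before the replacement is applied.

```python
import pandas as pd

def unmapped_values(series: pd.Series, mapping: dict) -> list:
    """Return the distinct values in `series` that `mapping` has no key for."""
    return sorted(set(series.dropna().unique()) - set(mapping))

# Hypothetical sample of raw stage codes, standing in for the 'Stage' column.
stage = pd.Series(["I", "II", "III", "IVA", "IVB", "I"])
stage_labels = {"I": "First-Stage", "II": "Second-Stage", "III": "Third-Stage"}

# Codes that the mapping does not cover
missing = unmapped_values(stage, stage_labels)
print(missing)

# Whole-value matching: 'I' is relabelled, while 'IVA' and 'IVB' are left as they are
relabelled = stage.replace(stage_labels)
print(relabelled.tolist())
```

With this sample, the stage mapping leaves `IVA` and `IVB` untouched, so mapped and unmapped labels would sit side by side in the same column.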


### Technical Terms

1. __Currently Smoking__: Whether the individual currently smokes.

2. __Smoking History__: Whether the individual has a history of smoking.

3. __Radiotherapy History__: Whether the individual has a history of radiotherapy treatment.

4. __Thyroid Function__: How well the thyroid gland is functioning.
    - __Subclinical Hyper/Hypo-thyroidism__: A milder form of hyper/hypo-thyroidism; patients may be asymptomatic (with presence of thyroid cancer).
    - __Clinical Hyper/Hypo-thyroidism__: A more severe and noticeable form of hyper/hypo-thyroidism (with presence of thyroid cancer).
    - __Euthyroid__: Normal thyroid function (with presence of thyroid cancer).

5. __Physical Examination__: Results of a physical examination of the thyroid gland.
    - __Diffuse goiter__: Refers to an enlargement of the thyroid gland where the entire gland is swollen, appearing smooth and uniformly enlarged. It can be a simple goiter, where thyroid hormone levels are normal, or a toxic goiter, where there is an overproduction of thyroid hormones, often associated with Graves' disease.
    - __Multinodular goiter__: Condition where the thyroid gland becomes enlarged and contains multiple nodules. These nodules can be either benign or cancerous.
    - __Single nodular goiter left/right__: Refers to an enlarged thyroid gland (on the left/right) with a single, palpable nodule. This nodule is a localized overgrowth of thyroid tissue, often benign, but can be a sign of thyroid cancer in some cases.

6. __Adenopathy__: Presence and location of enlarged lymph nodes.
    - __No Lymph Adenopathy__: No swelling or enlargement of the lymph nodes.
    - __Left Side Body Adenopathy__: Swelling or enlargement of lymph nodes on the left side of the body. This can be a sign of various conditions, including infections, immune system issues, or cancer.
    - __Right Side Body Adenopathy__: Swelling or enlargement of lymph nodes on the right side of the body, with the same possible causes as on the left.
    - __Extensive and Widespread (Adenopathy)__: Swollen lymph nodes are present in multiple areas throughout the body rather than in one or two regions. This often suggests a systemic illness, meaning a problem affecting the entire body, rather than a localized infection.

7. __Focality__: Number of distinct tumor foci within the thyroid gland.
    - __Uni-Focal__: The thyroid tumor has a single, isolated focus of cancer cells.
    - __Multi-Focal__: The thyroid tumor has two or more foci of cancer cells within the thyroid gland.

8. __Lymph Nodes__: Represents the N (Node) stage of thyroid cancer, indicating the involvement of nearby lymph nodes.
    - __No evidence of regional lymph node metastasis__: Cancer cells have not spread to the nearby lymph nodes, which are part of the body's immune system, indicating that the cancer is potentially contained and not yet in a later stage of progression.
    - __Regional Lymph Node Metastasis in the central neck__: Refers to cancer cells spreading from a primary tumor in the head and neck region to the lymph nodes in the central part of the neck.
    - __Regional Lymph Node Metastasis in the lateral neck__: Refers to cancer cells spreading from a primary tumor to lymph nodes located on the sides of the neck.

9. __Cancer Metastasis__: Represents the M (Metastasis) stage of thyroid cancer - whether the cancer has spread to distant organs.
    - __No evidence of distant metastasis__: Indicates that no signs of cancer spreading to distant parts of the body have been found, which suggests that imaging tests and physical examinations haven't revealed any tumors or other indicators of metastatic disease beyond the original site of the cancer. 
    - __Presence of Distant Metastasis__: Refers to the spread of cancer cells from the primary tumor to distant organs or lymph nodes. It is a key factor in cancer staging and prognosis, often indicating a more advanced stage of cancer.

10. __Cancer Stage__: Classifies the extent of the cancer's spread and is crucial for treatment planning and prognosis.
    - __Stage I__: Cancer is localized within the thyroid gland, hasn't spread to lymph nodes or other parts of the body, and the tumor is typically small.
    - __Stage II__: Tumor is any size, and the cancer may or may not have spread to nearby lymph nodes, but it has not spread to distant sites in the body.
    - __Stage III__: Cancer has grown beyond the thyroid gland and may have spread to nearby tissues.
    - __Stage IVA__: Indicates that the cancer has spread from the thyroid to nearby tissues like the larynx, trachea, or esophagus, or it has spread to nearby lymph nodes.
    - __Stage IVB__: Cancer has spread beyond the thyroid gland and into surrounding tissues, but it has not spread to distant parts of the body.

11. __Treatment Response__: Represents the change in a patient's condition following a therapeutic intervention, reflecting how well a treatment is working.
    - __Excellent__: Refers to achieving a complete remission or a sustained partial remission.
    - __Biochemical Incomplete__: After treatment there's no structural evidence of disease, but the levels of thyroglobulin (Tg) or anti-Tg antibodies remain abnormal or are rising.
    - __Indeterminate__: Treatment has not resulted in a clear improvement or complete remission, but also not a clear indication of disease progression. 
    - __Structural Incomplete__: Indicates persistent or recurrent structural disease after initial treatment, suggesting that cancer is still present, either locally, regionally, or at distant sites, despite the initial treatment. 

### Data after cleaning

In [43]:
thyroid_data.head()

Unnamed: 0,Age,Gender,Currently Smoking,Smoking History,Radiotherapy History,Thyroid Function,Physical Examination,Adenopathy,Types of Thyroid Cancer (Pathology),Focality,Risk,Tumor,Lymph Nodes,Cancer Metastasis,Stage,Treatment Response,Recurred
0,27,F,No,No,No,Euthyroid,Single nodular goiter-left,No Lympth Adenopathy,Micropapillary,Uni-Focal,Low,tumor is less than or equal to 1cm,no evidence of regional lymph node metastasis,no evidence of distant metastasis,First-Stage,Indeterminate,No
1,34,F,No,Yes,No,Euthyroid,Multinodular goiter,No Lympth Adenopathy,Micropapillary,Uni-Focal,Low,tumor is less than or equal to 1cm,no evidence of regional lymph node metastasis,no evidence of distant metastasis,First-Stage,Excellent,No
2,30,F,No,No,No,Euthyroid,Single nodular goiter-right,No Lympth Adenopathy,Micropapillary,Uni-Focal,Low,tumor is less than or equal to 1cm,no evidence of regional lymph node metastasis,no evidence of distant metastasis,First-Stage,Excellent,No
3,62,F,No,No,No,Euthyroid,Single nodular goiter-right,No Lympth Adenopathy,Micropapillary,Uni-Focal,Low,tumor is less than or equal to 1cm,no evidence of regional lymph node metastasis,no evidence of distant metastasis,First-Stage,Excellent,No
4,62,F,No,No,No,Euthyroid,Multinodular goiter,No Lympth Adenopathy,Micropapillary,Multi-Focal,Low,tumor is less than or equal to 1cm,no evidence of regional lymph node metastasis,no evidence of distant metastasis,First-Stage,Excellent,No


### Saving the result as a new file
Explanation: This step is included for administrative and grading purposes only. It is not part of the main merged code.

In [46]:
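# Use one variable for the file name so the message always matches the file written
output_file = 'thyroiddata.csv'
thyroid_data.to_csv(output_file, index=False)
print(f"Saved modified data as '{output_file}'")

Saved modified data as 'thyroiddata.csv'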
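
One way to confirm that the saved file preserves the cleaned table is to read it back and compare it with the frame in memory. This is a minimal sketch under stated assumptions, not part of the original notebook: the two-row `cleaned` frame is a hypothetical stand-in for `thyroid_data`, and the file is written to a temporary directory so nothing in the working folder is overwritten.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical two-row stand-in for the cleaned thyroid_data frame.
cleaned = pd.DataFrame({
    "Age": [27, 34],
    "Smoking History": ["No", "Yes"],
    "Stage": ["First-Stage", "First-Stage"],
    "Recurred": ["No", "No"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "thyroiddata.csv"
    cleaned.to_csv(path, index=False)   # index=False avoids an extra unnamed index column
    reloaded = pd.read_csv(path)

# Compare structure and content of the round-tripped frame
same_columns = list(reloaded.columns) == list(cleaned.columns)
same_shape = reloaded.shape == cleaned.shape
print(same_columns, same_shape)
print(reloaded.equals(cleaned))
```

Note that `read_csv` treats some strings, such as `NA`, as missing values by default, so a category that happened to use such a label would not survive the round trip unchanged.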
