In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.ticker as ticker
from sklearn.preprocessing import LabelEncoder
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

In [2]:
df = pd.read_csv("/kaggle/input/brain-tumor-dataset/brain_tumor_dataset.csv")

# Columns Information

## Patient Information
- **Patient_ID**: Unique identifier for each patient.  
- **Age**: Age of the patient (in years).  
- **Gender**: Gender of the patient (**Male/Female**).  
- **Family_History**: Whether the patient has a family history of brain tumors (**Yes/No**).  

## Tumor Characteristics
- **Tumor_Type**: Type of tumor (**Benign/Malignant**).  
- **Tumor_Size**: Size of the tumor (in centimeters).  
- **Location**: The part of the brain where the tumor is located (e.g., **Frontal, Temporal**).  
- **Histology**: The histological type of the tumor (e.g., **Astrocytoma, Glioblastoma**).  
- **Stage**: The stage of the tumor (**I, II, III, IV**).  
- **Tumor_Growth_Rate**: The growth rate of the tumor (cm per month).  

## Symptoms
- **Symptom_1**: The first symptom observed (e.g., **Headache, Seizures**).  
- **Symptom_2**: The second symptom observed.  
- **Symptom_3**: The third symptom observed.  

## Diagnosis & Treatment
- **MRI_Result**: The result of the MRI scan (**Positive/Negative**).  
- **Radiation_Treatment**: Whether radiation treatment was administered (**Yes/No**).  
- **Surgery_Performed**: Whether surgery was performed (**Yes/No**).  
- **Chemotherapy**: Whether chemotherapy was administered (**Yes/No**).  

## Follow-Up & Prognosis
- **Survival_Rate**: The estimated survival rate of the patient (percentage).  
- **Follow_Up_Required**: Whether follow-up is required (**Yes/No**).  
- **Treatment_Response**: The response to the treatment (**Improved/Worsened/Stable**). Note that this column does not appear among the 19 columns of the loaded file.  
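
The column list above can be cross-checked against the file itself. The following is a minimal sketch, not part of the original notebook: `documented_columns` is transcribed from the descriptions above, and `loaded_columns` is copied by hand from the `df.isnull().sum()` output so that the snippet runs on its own. In the notebook, `list(df.columns)` would replace the hard-coded list.

```python
# Columns described in the "Columns Information" section above.
documented_columns = [
    "Patient_ID", "Age", "Gender", "Family_History",
    "Tumor_Type", "Tumor_Size", "Location", "Histology", "Stage",
    "Tumor_Growth_Rate", "Symptom_1", "Symptom_2", "Symptom_3",
    "MRI_Result", "Radiation_Treatment", "Surgery_Performed", "Chemotherapy",
    "Survival_Rate", "Follow_Up_Required", "Treatment_Response",
]

# Columns reported for the loaded CSV (assumption: copied from the notebook
# output; in the notebook itself use `list(df.columns)` instead).
loaded_columns = [
    "Patient_ID", "Age", "Gender", "Tumor_Type", "Tumor_Size", "Location",
    "Histology", "Stage", "Symptom_1", "Symptom_2", "Symptom_3",
    "Radiation_Treatment", "Surgery_Performed", "Chemotherapy",
    "Survival_Rate", "Tumor_Growth_Rate", "Family_History", "MRI_Result",
    "Follow_Up_Required",
]

# Described but absent from the file, and present but never described.
missing_from_file = [c for c in documented_columns if c not in loaded_columns]
undocumented = [c for c in loaded_columns if c not in documented_columns]

print("Described but not in file:", missing_from_file)
print("In file but not described:", undocumented)
```

Running a check like this before any analysis avoids a `KeyError` later on when a described column is referenced but was never loaded.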


# Data Cleaning

In [3]:
df.isnull().sum()

Patient_ID             0
Age                    0
Gender                 0
Tumor_Type             0
Tumor_Size             0
Location               0
Histology              0
Stage                  0
Symptom_1              0
Symptom_2              0
Symptom_3              0
Radiation_Treatment    0
Surgery_Performed      0
Chemotherapy           0
Survival_Rate          0
Tumor_Growth_Rate      0
Family_History         0
MRI_Result             0
Follow_Up_Required     0
dtype: int64

Good news: every column reports zero missing values, so the dataset is complete and no imputation is needed.
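
`isnull()` only counts `NaN`/`None`. It does not flag blank strings or repeated patient records, so a slightly broader check can be worthwhile. The sketch below is an illustration under assumptions: `completeness_report` is a hypothetical helper, and `sample` is a small stand-in built from the first rows of this dataset so the snippet is self-contained. In the notebook, pass `df` instead.

```python
import pandas as pd

# Stand-in for `df`: a few columns from the first five rows of the dataset.
sample = pd.DataFrame({
    "Patient_ID": [1, 2, 3, 4, 5],
    "Age": [73, 26, 31, 29, 54],
    "Gender": ["Male", "Male", "Male", "Male", "Female"],
    "Tumor_Type": ["Malignant", "Benign", "Benign", "Malignant", "Benign"],
})


def count_blank(col: pd.Series) -> int:
    """Count entries that are strings made up only of whitespace (or empty)."""
    return int(col.map(lambda v: isinstance(v, str) and v.strip() == "").sum())


def completeness_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Per-column counts of NaN entries and of blank-string entries."""
    return pd.DataFrame({
        "n_missing": frame.isnull().sum(),
        "n_blank": pd.Series({name: count_blank(frame[name]) for name in frame.columns}),
    })


report = completeness_report(sample)
# Number of rows whose Patient_ID already appeared earlier in the frame.
n_duplicate_ids = int(sample["Patient_ID"].duplicated().sum())

print(report)
print("Duplicate Patient_ID rows:", n_duplicate_ids)
```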

In [4]:
df.dtypes

Patient_ID               int64
Age                      int64
Gender                  object
Tumor_Type              object
Tumor_Size             float64
Location                object
Histology               object
Stage                   object
Symptom_1               object
Symptom_2               object
Symptom_3               object
Radiation_Treatment     object
Surgery_Performed       object
Chemotherapy            object
Survival_Rate          float64
Tumor_Growth_Rate      float64
Family_History          object
MRI_Result              object
Follow_Up_Required      object
dtype: object
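
The Yes/No columns are stored as `object` (strings). One optional cleaning step, not taken in the original notebook, is to convert them to a boolean dtype so they can be summed or averaged directly. The sketch below assumes the only values are exactly `"Yes"` and `"No"`; anything else becomes `<NA>`. `sample` is a stand-in for `df`, using the values of the first five rows shown by `df.head()`.

```python
import pandas as pd

# Stand-in for `df`: the Yes/No columns from the first five rows.
sample = pd.DataFrame({
    "Radiation_Treatment": ["No", "Yes", "No", "Yes", "No"],
    "Surgery_Performed": ["No", "Yes", "No", "No", "No"],
    "Chemotherapy": ["No", "Yes", "No", "Yes", "Yes"],
    "Family_History": ["No", "Yes", "No", "Yes", "No"],
    "Follow_Up_Required": ["Yes", "Yes", "No", "No", "Yes"],
})

yes_no_columns = list(sample.columns)

converted = sample.copy()
for col in yes_no_columns:
    # Unmapped values turn into NaN, which the nullable "boolean" dtype keeps as <NA>.
    converted[col] = sample[col].map({"Yes": True, "No": False}).astype("boolean")

print(converted.dtypes)
# With booleans, the share of "Yes" answers is simply the column mean.
print(converted.mean())
```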

In [5]:
df.head()

| | Patient_ID | Age | Gender | Tumor_Type | Tumor_Size | Location | Histology | Stage | Symptom_1 | Symptom_2 | Symptom_3 | Radiation_Treatment | Surgery_Performed | Chemotherapy | Survival_Rate | Tumor_Growth_Rate | Family_History | MRI_Result | Follow_Up_Required |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 73 | Male | Malignant | 5.375612 | Temporal | Astrocytoma | III | Vision Issues | Seizures | Seizures | No | No | No | 51.312579 | 0.111876 | No | Positive | Yes |
| 1 | 2 | 26 | Male | Benign | 4.847098 | Parietal | Glioblastoma | II | Headache | Headache | Nausea | Yes | Yes | Yes | 46.373273 | 2.165736 | Yes | Positive | Yes |
| 2 | 3 | 31 | Male | Benign | 5.588391 | Parietal | Meningioma | I | Vision Issues | Headache | Seizures | No | No | No | 47.072221 | 1.884228 | No | Negative | No |
| 3 | 4 | 29 | Male | Malignant | 1.4366 | Temporal | Medulloblastoma | IV | Vision Issues | Seizures | Headache | Yes | No | Yes | 51.853634 | 1.283342 | Yes | Negative | No |
| 4 | 5 | 54 | Female | Benign | 2.417506 | Parietal | Glioblastoma | I | Headache | Headache | Seizures | No | No | Yes | 54.708987 | 2.069477 | No | Positive | Yes |


In [6]:
df['Tumor_Type'].unique()

array(['Malignant', 'Benign'], dtype=object)

In [7]:
df['Location'].unique()

array(['Temporal', 'Parietal', 'Frontal', 'Occipital'], dtype=object)

In [8]:
df['Histology'].unique()

array(['Astrocytoma', 'Glioblastoma', 'Meningioma', 'Medulloblastoma'],
      dtype=object)

In [9]:
df['Stage'].unique()

array(['III', 'II', 'I', 'IV'], dtype=object)
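
`unique()` returns values in order of first appearance, which is why the stages come back as III, II, I, IV. If the clinical order matters for sorting or plotting, it can be made explicit with an ordered categorical. This is a sketch of one possible approach, not a step from the original notebook; `stage_order` and `Stage_Code` are illustrative names, and `stages` holds the `Stage` values of the first five rows as a stand-in for `df['Stage']`.

```python
import pandas as pd

# Stand-in for df['Stage']: the values of the first five rows.
stages = pd.Series(["III", "II", "I", "IV", "I"], name="Stage")

# Explicit clinical ordering of the four stages.
stage_order = ["I", "II", "III", "IV"]
stage_cat = pd.Categorical(stages, categories=stage_order, ordered=True)

# Integer codes start at 0, so add 1 to get stage numbers 1..4.
stage_code = pd.Series(stage_cat.codes + 1, index=stages.index, name="Stage_Code")

print(list(stage_cat.categories))
print(stage_code.tolist())
```

With an ordered categorical, sorting, comparisons such as "stage above II", and plot axes follow the stage order rather than the order of appearance.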

In [10]:
df['Symptom_1'].unique()

array(['Vision Issues', 'Headache', 'Seizures', 'Nausea'], dtype=object)

In [11]:
df['MRI_Result'].unique()

array(['Positive', 'Negative'], dtype=object)
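
The six cells above repeat the same call for one column at a time. The same inspection can be done in a single loop, sketched below. `sample` is a stand-in for `df` built from the first five rows of the dataset, so it shows fewer distinct values than the full data; in the notebook, replace `sample` with `df`.

```python
import pandas as pd

# Stand-in for `df`: the categorical columns from the first five rows.
sample = pd.DataFrame({
    "Tumor_Type": ["Malignant", "Benign", "Benign", "Malignant", "Benign"],
    "Location": ["Temporal", "Parietal", "Parietal", "Temporal", "Parietal"],
    "Histology": ["Astrocytoma", "Glioblastoma", "Meningioma",
                  "Medulloblastoma", "Glioblastoma"],
    "Stage": ["III", "II", "I", "IV", "I"],
    "Symptom_1": ["Vision Issues", "Headache", "Vision Issues",
                  "Vision Issues", "Headache"],
    "MRI_Result": ["Positive", "Positive", "Negative", "Negative", "Positive"],
})

categorical_columns = ["Tumor_Type", "Location", "Histology",
                       "Stage", "Symptom_1", "MRI_Result"]

# Distinct values per column, in order of first appearance.
unique_values = {col: sample[col].unique().tolist() for col in categorical_columns}

for col, values in unique_values.items():
    print(f"{col} ({len(values)} distinct): {values}")
```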