<h1 align=center>Pancreatic Cancer Prediction</h1>
<style>
        h1 {
            color: white;
        }
        h1:hover {
            color: red;
        }
</style>

<p align=center>
This dataset is designed for predicting pancreatic cancer outcomes based on patient demographics, medical history, risk factors, symptoms, and treatments. It incorporates real-world biases such as late-stage diagnosis prevalence, survival rate disparities, and socioeconomic influences.<br>This dataset can be used for machine learning models, survival analysis, and healthcare policy assessments.
</p>

# Exploratory Data Analysis (EDA)

In [2]:
# import the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


In [4]:
# load the dataset and view the first rows
df = pd.read_csv("pancreatic_cancer_prediction_sample.csv")
df.head()

,Country,Age,Gender,Smoking_History,Obesity,Diabetes,Chronic_Pancreatitis,Family_History,Hereditary_Condition,Jaundice,...,Stage_at_Diagnosis,Survival_Time_Months,Treatment_Type,Survival_Status,Alcohol_Consumption,Physical_Activity_Level,Diet_Processed_Food,Access_to_Healthcare,Urban_vs_Rural,Economic_Status
0,Canada,64,Female,0,0,0,0,0,0,0,...,Stage III,13,Surgery,0,0,Medium,Low,High,Urban,Low
1,South Africa,77,Male,1,1,0,0,0,0,0,...,Stage III,13,Chemotherapy,0,1,Medium,Medium,Medium,Urban,Low
2,India,71,Female,0,0,0,0,0,0,0,...,Stage IV,3,Chemotherapy,1,0,Medium,High,Low,Rural,Middle
3,Germany,56,Male,0,0,0,0,1,0,1,...,Stage IV,6,Radiation,0,1,Low,Low,Medium,Rural,Middle
4,United States,82,Female,0,0,0,0,1,0,0,...,Stage IV,9,Chemotherapy,1,0,Low,Medium,Medium,Rural,Low


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 24 columns):
 #   Column                         Non-Null Count  Dtype 
---  ------                         --------------  ----- 
 0   Country                        50000 non-null  object
 1   Age                            50000 non-null  int64 
 2   Gender                         50000 non-null  object
 3   Smoking_History                50000 non-null  int64 
 4   Obesity                        50000 non-null  int64 
 5   Diabetes                       50000 non-null  int64 
 6   Chronic_Pancreatitis           50000 non-null  int64 
 7   Family_History                 50000 non-null  int64 
 8   Hereditary_Condition           50000 non-null  int64 
 9   Jaundice                       50000 non-null  int64 
 10  Abdominal_Discomfort           50000 non-null  int64 
 11  Back_Pain                      50000 non-null  int64 
 12  Weight_Loss                    50000 non-null  int64 
 13  D

In [6]:
df.describe()

,Age,Smoking_History,Obesity,Diabetes,Chronic_Pancreatitis,Family_History,Hereditary_Condition,Jaundice,Abdominal_Discomfort,Back_Pain,Weight_Loss,Development_of_Type2_Diabetes,Survival_Time_Months,Survival_Status,Alcohol_Consumption
count,50000.0,50000.0,50000.0,50000.0,50000.0,50000.0,50000.0,50000.0,50000.0,50000.0,50000.0,50000.0,50000.0,50000.0,50000.0
mean,64.54094,0.29954,0.24826,0.19998,0.0993,0.15168,0.04944,0.19922,0.2965,0.25286,0.34998,0.19622,13.89804,0.12844,0.30346
std,9.973847,0.458061,0.432008,0.399989,0.299067,0.358714,0.216787,0.399418,0.456719,0.434656,0.476968,0.397141,11.272151,0.334582,0.459757
min,30.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
25%,58.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,0.0,0.0
50%,65.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,10.0,0.0,0.0
75%,71.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,19.0,0.0,1.0
max,90.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,59.0,1.0,1.0
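
Because `Survival_Status` is coded 0/1, its mean in the summary (0.12844) is the share of rows coded 1, which suggests the two classes are far from balanced. A minimal sketch of that check, run on a tiny made-up frame rather than the real data (the ten rows below are invented for illustration):

```python
import pandas as pd

# Toy stand-in for df; these ten rows are invented for illustration only.
toy = pd.DataFrame({"Survival_Status": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]})

# Share of each class; for a 0/1 column the share of 1s equals the column mean.
class_share = toy["Survival_Status"].value_counts(normalize=True).sort_index()
positive_rate = toy["Survival_Status"].mean()

print(class_share)
print(f"Share of rows coded 1: {positive_rate:.2%}")
```

On the real data the same two lines could be applied to `df["Survival_Status"]`; an imbalance of this size is usually worth keeping in mind when choosing evaluation metrics later.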


In [7]:
df.columns

Index(['Country', 'Age', 'Gender', 'Smoking_History', 'Obesity', 'Diabetes',
       'Chronic_Pancreatitis', 'Family_History', 'Hereditary_Condition',
       'Jaundice', 'Abdominal_Discomfort', 'Back_Pain', 'Weight_Loss',
       'Development_of_Type2_Diabetes', 'Stage_at_Diagnosis',
       'Survival_Time_Months', 'Treatment_Type', 'Survival_Status',
       'Alcohol_Consumption', 'Physical_Activity_Level', 'Diet_Processed_Food',
       'Access_to_Healthcare', 'Urban_vs_Rural', 'Economic_Status'],
      dtype='object')

### Checking for missing values

In [8]:
df.isna().sum()

Country                          0
Age                              0
Gender                           0
Smoking_History                  0
Obesity                          0
Diabetes                         0
Chronic_Pancreatitis             0
Family_History                   0
Hereditary_Condition             0
Jaundice                         0
Abdominal_Discomfort             0
Back_Pain                        0
Weight_Loss                      0
Development_of_Type2_Diabetes    0
Stage_at_Diagnosis               0
Survival_Time_Months             0
Treatment_Type                   0
Survival_Status                  0
Alcohol_Consumption              0
Physical_Activity_Level          0
Diet_Processed_Food              0
Access_to_Healthcare             0
Urban_vs_Rural                   0
Economic_Status                  0
dtype: int64

There are no missing values in the dataset.
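
The same conclusion can be reduced to a single boolean instead of reading the per-column counts. A small sketch, assuming a hypothetical helper named `has_missing` and two made-up frames (neither is the real dataset):

```python
import numpy as np
import pandas as pd

# Made-up frames for illustration; neither is the real dataset.
complete = pd.DataFrame({"Age": [64, 77], "Gender": ["Female", "Male"]})
with_gap = pd.DataFrame({"Age": [64, np.nan], "Gender": ["Female", "Male"]})


def has_missing(frame: pd.DataFrame) -> bool:
    """Hypothetical helper: True if any cell in the frame is NaN or None."""
    # First .any() reduces each column, second .any() reduces across columns.
    return bool(frame.isna().any().any())


print(has_missing(complete))
print(has_missing(with_gap))
```

Calling `has_missing(df)` on the loaded data should agree with the all-zero counts shown in the output of `df.isna().sum()`.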

### Checking for unique values in each column

In [10]:
for col in df.columns:
    print(f"{col} : {df[col].unique()}\n")

Country : ['Canada' 'South Africa' 'India' 'Germany' 'United States' 'Australia'
 'China' 'United Kingdom' 'Brazil']

Age : [64 77 71 56 82 49 67 50 53 68 75 79 60 58 74 78 84 70 69 59 61 57 51 73
 62 65 45 76 90 47 72 80 55 63 66 52 54 38 48 81 42 85 36 35 44 87 30 46
 83 40 86 88 41 34 43 39 89 37 33 31 32]

Gender : ['Female' 'Male']

Smoking_History : [0 1]

Obesity : [0 1]

Diabetes : [0 1]

Chronic_Pancreatitis : [0 1]

Family_History : [0 1]

Hereditary_Condition : [0 1]

Jaundice : [0 1]

Abdominal_Discomfort : [0 1]

Back_Pain : [0 1]

Weight_Loss : [0 1]

Development_of_Type2_Diabetes : [0 1]

Stage_at_Diagnosis : ['Stage III' 'Stage IV' 'Stage II' 'Stage I']

Survival_Time_Months : [13  3  6  9  4  8 12 14  1 35  7 11 10 58  5 20 22 21 16 29 31  2 17 49
 54 19 18 27 23 15 28 26 36 42 45 32 56 30 33 25 53 24 37 40 43 34 44 57
 47 41 52 39 59 55 48 51 50 46 38]

Treatment_Type : ['Surgery' 'Chemotherapy' 'Radiation']

Survival_Status : [0 1]

Alcohol_Consumption : [0 1]

Physi
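
Printing every unique value gets long for columns such as `Age` and `Survival_Time_Months`. One possible alternative, sketched here on a small made-up frame, is to report the number of distinct values per column and list the values only for low-cardinality columns. The cutoff `max_levels` is an arbitrary assumption, not something taken from the notebook.

```python
import pandas as pd

# Small made-up sample that reuses column names from the dataset.
toy = pd.DataFrame({
    "Age": [64, 77, 71, 56, 82, 49],
    "Gender": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "Stage_at_Diagnosis": ["Stage III", "Stage III", "Stage IV",
                           "Stage IV", "Stage II", "Stage I"],
})

max_levels = 4  # arbitrary cutoff for "low cardinality" (assumption)

# Number of distinct values in each column.
n_unique = toy.nunique()
print(n_unique)

# List the actual values only where there are few of them.
low_cardinality = {
    col: sorted(toy[col].unique())
    for col in toy.columns
    if n_unique[col] <= max_levels
}
for col, values in low_cardinality.items():
    print(f"{col}: {values}")
```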

## Graphs

# Feature Engineering
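
The unique-value listing shows several text columns that would need numeric encoding before most models can use them. The sketch below is one possible approach, not necessarily the one this notebook goes on to use: an ordinal map for `Stage_at_Diagnosis` (which has a natural order), a 0/1 flag for `Gender`, and one-hot columns for `Treatment_Type`. The rows are made up, and the name `stage_order` is an assumption introduced for illustration.

```python
import pandas as pd

# Made-up rows using category values that appear in the unique-value output.
toy = pd.DataFrame({
    "Gender": ["Female", "Male", "Female"],
    "Stage_at_Diagnosis": ["Stage III", "Stage IV", "Stage I"],
    "Treatment_Type": ["Surgery", "Chemotherapy", "Radiation"],
})

# Stage has a natural order, so map it to the integers 1 to 4.
stage_order = {"Stage I": 1, "Stage II": 2, "Stage III": 3, "Stage IV": 4}
encoded = toy.copy()
encoded["Stage_at_Diagnosis"] = encoded["Stage_at_Diagnosis"].map(stage_order)

# Gender has two levels in this dataset, so a single 0/1 column is enough.
encoded["Gender"] = (encoded["Gender"] == "Male").astype(int)

# Treatment type has no order, so expand it into one indicator column per level.
encoded = pd.get_dummies(encoded, columns=["Treatment_Type"], dtype=int)

print(encoded)
```

The ordinal map treats the gap between consecutive stages as equal, which is a modelling choice; one-hot encoding the stage instead would avoid that assumption at the cost of more columns.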

## Model Selection and Training