# <center>Cancer Diagnosis Analytics Dashboard</center>

# 1. Data Acquisition And Overview #

## Introduction To Project

This project analyzes breast cancer data to help improve diagnosis and treatment planning. Using a UK-based dataset of 2,000+ patients (METABRIC), we explore patterns in cancer outcomes and build tools to predict patient risks. The goal is to create an interactive dashboard that helps healthcare professionals make data-driven decisions.

## Dataset Overview

The METABRIC (Molecular Taxonomy of Breast Cancer International Consortium) dataset contains genomic and clinical information for breast cancer patients. The dataset includes:

- RNA expression data for various genes
- Mutation information
- Patient clinical data
- Treatment and outcome information

## Objectives

1. **Explore Patient Data**: Analyze cancer gene expression patterns and molecular characteristics
2. **Predict Outcomes**: Build models to forecast survival risks and treatment success
3. **Optimize Care**: Identify factors that improve diagnosis speed and resource use
4. **Create Dashboard**: Develop an interactive tool for visualizing key insights

## Hypotheses

1. **Molecular Subtype Hypothesis**: Breast cancer molecular subtypes will show significant differences in 5-year survival rates. We expect Luminal A patients to have >85% survival rate compared to <60% for Basal-like subtypes.

2. **Age and Treatment Response Hypothesis**: Patients diagnosed under 45 years old will show different treatment response patterns compared to those over 70. Younger patients will have more aggressive tumors but better response to chemotherapy.

3. **Genomic Marker Prediction Hypothesis**: A combination of 3-5 key genetic markers will predict treatment outcomes with >75% accuracy.

4. **Tumor Characteristics and Survival Hypothesis**: Tumor size and lymph node involvement will be stronger predictors of survival than patient age.

5. **Treatment Optimization Hypothesis**: Machine learning models can identify optimal treatment protocols for specific patient subgroups.

## Imports And Settings


### Import Required Libraries

In [3]:
#import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import os
from datetime import datetime

ModuleNotFoundError: No module named 'numpy'

In [4]:
import sys
print(sys.executable)

c:\Users\User\AppData\Local\Programs\Python\Python312\python.exe


### Configure Visualization Settings

In [2]:
plt.style.use('seaborn-v0_8-whitegrid')  # Use seaborn style for plots
sns.set(font_scale=1.2)  # Increase font size for better readability
plt.rcParams['figure.figsize'] = (12, 8)  # Set default figure size
plt.rcParams['axes.labelsize'] = 12  # Set axis label size

NameError: name 'plt' is not defined