# Problem Framing: Hospital Billing Prediction

## Objective
Predict the total **hospital billing amount** using only information available **at or before admission** to enable upfront cost estimates for patients and insurers.

## Problem Type & Target
- **Type**: Supervised Regression  
- **Target Variable**: `Billing_Amount` (continuous, in local currency)

## Input Features (Admission-Time Only)
To prevent data leakage, features are limited to those known at admission:

- **Demographics**: Age, Gender, Blood Type  
- **Clinical**: Primary Medical Condition, Admission Medication, Initial Test Results  
- **Administrative**: Admission Type, Doctor, Hospital, Insurance Provider, Room Type  
- **Temporal**: Admission Date → engineered as day of week, season, etc.  

> ❌ **Excluded**: Discharge Date and Length of Stay (would leak future information)
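
As a rough illustration of the temporal engineering listed above, the sketch below derives calendar features from an admission date using only the standard library. The feature names, the weekend flag, and the northern-hemisphere season mapping are assumptions made for this example; they are not columns of the dataset.

```python
from datetime import date

# Meteorological seasons for the northern hemisphere (an assumption;
# adjust the mapping for other regions).
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def admission_date_features(admission_date: date) -> dict:
    """Derive calendar features from the admission date only."""
    day_of_week = admission_date.weekday()  # Monday=0 ... Sunday=6
    return {
        "admission_day_of_week": day_of_week,
        "admission_is_weekend": day_of_week >= 5,  # hypothetical extra feature
        "admission_month": admission_date.month,
        "admission_season": SEASON_BY_MONTH[admission_date.month],
    }


# Example call with an arbitrary date.
features = admission_date_features(date(2024, 1, 6))
print(features)
```

Because every value comes from the admission date alone, none of these features depends on the discharge date or the length of stay.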

## Evaluation Strategy
- **Metrics**: RMSE, MAE  
- **Baselines**: 
  - Global mean billing
  - Mean billing per medical condition  
- **Validation**: Time-based split (e.g., train on 2022–2023, test on 2024) to simulate real-world deployment
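
The evaluation plan above can be sketched without any dependencies. The example below uses a handful of made-up records (hypothetical values, not taken from the dataset) to show a year-based split, the two baselines, and both metrics. The record layout and variable names are assumptions for illustration only.

```python
import math
from collections import defaultdict
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Hypothetical toy records: (admission_year, medical_condition, billing_amount)
records = [
    (2022, "Diabetes", 100.0),
    (2022, "Asthma", 200.0),
    (2023, "Diabetes", 300.0),
    (2023, "Asthma", 400.0),
    (2024, "Diabetes", 250.0),
    (2024, "Asthma", 350.0),
]

# Time-based split: fit on earlier years, evaluate on the latest year.
train = [r for r in records if r[0] <= 2023]
test = [r for r in records if r[0] == 2024]

# Baseline 1: global mean billing, computed on the training rows only.
global_mean = mean(r[2] for r in train)

# Baseline 2: mean billing per medical condition, also from training rows.
amounts_by_condition = defaultdict(list)
for _, condition, amount in train:
    amounts_by_condition[condition].append(amount)
condition_mean = {c: mean(v) for c, v in amounts_by_condition.items()}

y_true = [r[2] for r in test]
pred_global = [global_mean] * len(test)
# Fall back to the global mean for conditions unseen during training.
pred_condition = [condition_mean.get(r[1], global_mean) for r in test]

for name, pred in [("global mean", pred_global), ("per condition", pred_condition)]:
    print(f"{name}: RMSE={rmse(y_true, pred):.2f}, MAE={mae(y_true, pred):.2f}")
```

Both baselines are fitted on the training years only, so the test year stays unseen, which is the point of the time-based split.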

## Key Considerations
- **No data leakage**: Only admission-time data is used.  
- **Fairness**: Monitor for bias across insurance types, hospitals, or demographics.  
- **Interpretability**: Model decisions should be explainable to stakeholders.  
- **Clinical alignment**: Predicted billing should reflect care intensity, not clinically irrelevant patient attributes.
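
One simple way to monitor the fairness point above is to compare prediction error across groups. The sketch below computes MAE per group label. The helper name `mae_by_group` and the toy values are hypothetical, and a real audit would likely need more than a single metric.

```python
from collections import defaultdict
from statistics import mean


def mae_by_group(groups, y_true, y_pred):
    """Mean absolute error computed separately for each group label."""
    errors = defaultdict(list)
    for group, actual_value, predicted_value in zip(groups, y_true, y_pred):
        errors[group].append(abs(actual_value - predicted_value))
    return {group: mean(values) for group, values in errors.items()}


# Hypothetical toy values, not taken from the dataset.
insurers = ["A", "A", "B", "B"]
actual = [100.0, 200.0, 100.0, 200.0]
predicted = [110.0, 190.0, 150.0, 260.0]

group_mae = mae_by_group(insurers, actual, predicted)
print(group_mae)
```

A large gap between groups, as in this toy data, would be a prompt to investigate further. It does not prove bias by itself.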

### Data Import

* The dataset is hosted in my GitHub repository. The notebook loads the file through its **raw GitHub URL**, because the standard GitHub page link returns an HTML page rather than the CSV file itself.
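
For reference, the conversion from a standard GitHub page link to a raw link can be automated. The helper below is a minimal sketch and is not part of the notebook's pipeline. The function name `github_blob_to_raw` is hypothetical, and it assumes the usual `owner/repo/blob/branch/path` link layout.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url: str) -> str:
    """Convert a github.com 'blob' page link into a raw.githubusercontent.com link."""
    parts = urlsplit(url)
    if parts.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {url}")

    # Expected path layout: owner/repo/blob/branch/path/to/file
    segments = parts.path.lstrip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob link: {url}")

    # Drop the 'blob' segment and keep everything else unchanged.
    raw_path = "/" + "/".join(segments[:2] + segments[3:])
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))


page_link = "https://github.com/username/repo/blob/branch/path/to/file.ext"
raw_link = github_blob_to_raw(page_link)
print(raw_link)
```

Percent-encoded characters such as `%20` pass through untouched, since the path is only split and rejoined.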

In [None]:
# To turn a GitHub page link into a raw file link, replace the host
# github.com with raw.githubusercontent.com and remove the blob/ segment.
# The rest of the path (user, repo, branch, file path) stays the same.
# For example:
#
#   https://github.com/username/repo/blob/branch/path/to/file.ext
#
# becomes:
#
#   https://raw.githubusercontent.com/username/repo/branch/path/to/file.ext



url = "https://raw.githubusercontent.com/Rooney-tech/Machine-Learning/main/Linear%20regression/Data/healthcare_dataset.csv"

