# Medical Treatment Outcomes and Financial Analysis

### 📍 TASK 1 : Problem Definition & Dataset Selection.
#####      Clear problem statement, relevant domain, quality dataset (size, source, variety)

### 🔍 Overview of Healthcare Dataset

 * The healthcare dataset consists of 55500 rows × 15 columns.
 * It contains detailed patient hospitalization records, including name, age, and gender.
 * The records span about five years, from 2019 to 2024, and include the admission type and associated medical expenses.
 * It also records the insurance provider, the attending doctor, and the hospital for each stay, along with the room number.
 * Diagnosis-related details such as blood type and medical condition are included.
 * Disease management is captured through the medication given and the test results used to evaluate its outcome.
  
  

###  📚 DATA DESCRIPTION OF HEALTHCARE DATASET


| Variable           | Description                                                  |
|--------------------|--------------------------------------------------------------|
| Name               | Name of the patient (not a unique identifier: 49992 distinct values across 55500 rows) |
| Age                | Age of the patients in years                                  |
| Gender             | Gender of the patients (2 unique values: Male and Female) |
| Blood Type         | Blood type of the patient (8 unique values : B-, A+, A-, O+, AB+, AB-, B+, O-)              |
| Medical Condition  | Disease condition (6 unique values : Cancer, Obesity, Diabetes, Asthma, Hypertension, Arthritis)   |
| Date of Admission  | Hospital admission date                                       |
| Doctor             | Doctors attending the patients                                |
| Hospital           | Hospital in which patients are treated                        |
| Insurance Provider | Insurance provider (5 unique values : Blue Cross, Medicare, Aetna, UnitedHealthcare, Cigna)                     |
| Billing Amount     | Billing amount of the patient (decimal format)                |
| Room Number        | Room number assigned to the patient                           |
| Admission Type     | Admission type of patient (3 unique values : Urgent, Emergency, Elective)                          |
| Discharge Date     | Discharge date of the patients                                |
| Medication         | Medication administered to the patient (5 unique values : Paracetamol, Ibuprofen, Aspirin, Penicillin, Lipitor)                  |
| Test Results       | Result of the patient's tests (3 unique values : Normal, Inconclusive, Abnormal)                           |

### 📌 Objective 

1. Data Loading and Initial Overview
   * Import the dataset using Pandas and provide an overview:
      * Number of rows and columns
      * Data types of each column
      * Initial observations (e.g., head(), info(), describe())
2. Data Pre-processing
   * Perform all necessary cleaning steps such as:
      * Handling missing values
      * Removing duplicates
      * Correcting data types
      * Creating derived columns
      * Filtering or aggregating data
3. Exploratory Data Analysis (EDA)
   * Conduct descriptive and exploratory analysis to uncover patterns and trends:
      * Univariate, bivariate, and multivariate analysis
      * Use groupby, pivot tables, and correlation analysis
      * Include statistical summaries to support findings
4. Visualizations
   * Use Matplotlib / Seaborn / Plotly to generate meaningful visualizations:
      * Bar plots, line charts, pie charts, histograms, box plots, scatter plots, heatmaps, etc.
      * Ensure visuals have proper titles, labels, legends, and color schemes
      * Use subplots where applicable for better layout
5. Insight Generation and Report
   * Summarize key insights from your analysis:
      * Use Markdown cells throughout the notebook to clearly explain the logic, methods, and interpretation of results
      * Highlight significant patterns, correlations, or anomalies found in the data
      * Conclude with a short summary of your overall findings and any potential recommendations or next steps 

### 💡 Possible Key Insights About The Healthcare Dataset Over Five Years!!


* What patterns link medication to test results?
* Is there a relationship between blood type, gender, and medical condition?
* How does length of stay relate to billing amount and treatment outcomes?
* How do insurance provider and medical expenses vary by demographics?
* How does billing amount vary by hospital and doctor?
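
Length of stay is not a column in the dataset, so it would have to be derived from the two date columns. A minimal sketch of that derivation, using two rows copied from the dataset preview; the column name `Length of Stay` is an assumption introduced here for illustration:

```python
import pandas as pd

# Two rows copied from the dataset preview; the notebook itself would use `df`.
sample = pd.DataFrame({
    "Date of Admission": ["2024-01-31", "2019-08-20"],
    "Discharge Date": ["2024-02-02", "2019-08-26"],
    "Billing Amount": [18856.281306, 33643.327287],
})

# The dates are stored as text, so parse them before subtracting.
admitted = pd.to_datetime(sample["Date of Admission"])
discharged = pd.to_datetime(sample["Discharge Date"])

# "Length of Stay" is a hypothetical derived column: whole days in hospital.
sample["Length of Stay"] = (discharged - admitted).dt.days

print(sample[["Billing Amount", "Length of Stay"]])
```

With the derived column in place, it can be compared against `Billing Amount` or grouped by `Test Results` like any other numeric column.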

### 1️⃣ : Importing Libraries

In [102]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### 2️⃣ : Reading The Dataset

In [5]:
df = pd.read_csv(r"C:\Users\Surface Pro\Downloads\healthcare_dataset.csv")
df                 

Unnamed: 0,Name,Age,Gender,Blood Type,Medical Condition,Date of Admission,Doctor,Hospital,Insurance Provider,Billing Amount,Room Number,Admission Type,Discharge Date,Medication,Test Results
0,Bobby JacksOn,30,Male,B-,Cancer,2024-01-31,Matthew Smith,Sons and Miller,Blue Cross,18856.281306,328,Urgent,2024-02-02,Paracetamol,Normal
1,LesLie TErRy,62,Male,A+,Obesity,2019-08-20,Samantha Davies,Kim Inc,Medicare,33643.327287,265,Emergency,2019-08-26,Ibuprofen,Inconclusive
2,DaNnY sMitH,76,Female,A-,Obesity,2022-09-22,Tiffany Mitchell,Cook PLC,Aetna,27955.096079,205,Emergency,2022-10-07,Aspirin,Normal
3,andrEw waTtS,28,Female,O+,Diabetes,2020-11-18,Kevin Wells,"Hernandez Rogers and Vang,",Medicare,37909.782410,450,Elective,2020-12-18,Ibuprofen,Abnormal
4,adrIENNE bEll,43,Female,AB+,Cancer,2022-09-19,Kathleen Hanna,White-White,Aetna,14238.317814,458,Urgent,2022-10-09,Penicillin,Abnormal
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
55495,eLIZABeTH jaCkSOn,42,Female,O+,Asthma,2020-08-16,Joshua Jarvis,Jones-Thompson,Blue Cross,2650.714952,417,Elective,2020-09-15,Penicillin,Abnormal
55496,KYle pEREz,61,Female,AB-,Obesity,2020-01-23,Taylor Sullivan,Tucker-Moyer,Cigna,31457.797307,316,Elective,2020-02-01,Aspirin,Normal
55497,HEATher WaNG,38,Female,B+,Hypertension,2020-07-13,Joe Jacobs DVM,"and Mahoney Johnson Vasquez,",UnitedHealthcare,27620.764717,347,Urgent,2020-08-10,Ibuprofen,Abnormal
55498,JENniFER JOneS,43,Male,O-,Arthritis,2019-05-25,Kimberly Curry,"Jackson Todd and Castro,",Medicare,32451.092358,321,Elective,2019-05-31,Ibuprofen,Abnormal


## 3️⃣ : Dataset Overview

#### 3.1  Let's Find Out The Total Number of Rows and Columns
           ★ HealthCare Dataset has 55500 Rows and 15 Columns.
        

In [7]:
df.shape

(55500, 15)

### 3.2  Basic Info : Column Data Types in the HealthCare Dataset


####    HealthCare Dataset contains:
         ★ 15 columns in total with 55500 entries.
         ★ Billing Amount is a float, Age and Room Number are integers, and the remaining 12 columns (including both date columns) are stored as strings (object dtype).

In [81]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55500 entries, 0 to 55499
Data columns (total 15 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Name                55500 non-null  object 
 1   Age                 55500 non-null  int64  
 2   Gender              55500 non-null  object 
 3   Blood Type          55500 non-null  object 
 4   Medical Condition   55500 non-null  object 
 5   Date of Admission   55500 non-null  object 
 6   Doctor              55500 non-null  object 
 7   Hospital            55500 non-null  object 
 8   Insurance Provider  55500 non-null  object 
 9   Billing Amount      55500 non-null  float64
 10  Room Number         55500 non-null  int64  
 11  Admission Type      55500 non-null  object 
 12  Discharge Date      55500 non-null  object 
 13  Medication          55500 non-null  object 
 14  Test Results        55500 non-null  object 
dtypes: float64(1), int64(2), object(12)
memory usage: 6.4
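
The `info()` output shows the date columns and the low-cardinality fields stored as `object`. One possible way to correct the data types is sketched below on three rows from the preview. Treating `Room Number` as text is an assumption made here (it is an identifier rather than a measurement), not a step taken in the notebook:

```python
import pandas as pd

# Three rows copied from the dataset preview.
sample = pd.DataFrame({
    "Gender": ["Male", "Male", "Female"],
    "Date of Admission": ["2024-01-31", "2019-08-20", "2022-09-22"],
    "Room Number": [328, 265, 205],
})

# Parse the text dates into datetime64 values.
sample["Date of Admission"] = pd.to_datetime(sample["Date of Admission"])

# Columns with only a handful of distinct values fit the category dtype.
sample["Gender"] = sample["Gender"].astype("category")

# Assumption: store the room identifier as text so it is not summarised as a number.
sample["Room Number"] = sample["Room Number"].astype(str)

print(sample.dtypes)
```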

### 3.3 First 5 Rows of the HealthCare Dataset

In [92]:
df.head()

Unnamed: 0,Name,Age,Gender,Blood Type,Medical Condition,Date of Admission,Doctor,Hospital,Insurance Provider,Billing Amount,Room Number,Admission Type,Discharge Date,Medication,Test Results
0,Bobby JacksOn,30,Male,B-,Cancer,2024-01-31,Matthew Smith,Sons and Miller,Blue Cross,18856.281306,328,Urgent,2024-02-02,Paracetamol,Normal
1,LesLie TErRy,62,Male,A+,Obesity,2019-08-20,Samantha Davies,Kim Inc,Medicare,33643.327287,265,Emergency,2019-08-26,Ibuprofen,Inconclusive
2,DaNnY sMitH,76,Female,A-,Obesity,2022-09-22,Tiffany Mitchell,Cook PLC,Aetna,27955.096079,205,Emergency,2022-10-07,Aspirin,Normal
3,andrEw waTtS,28,Female,O+,Diabetes,2020-11-18,Kevin Wells,"Hernandez Rogers and Vang,",Medicare,37909.78241,450,Elective,2020-12-18,Ibuprofen,Abnormal
4,adrIENNE bEll,43,Female,AB+,Cancer,2022-09-19,Kathleen Hanna,White-White,Aetna,14238.317814,458,Urgent,2022-10-09,Penicillin,Abnormal


### 3.4 Last 5 Rows of the HealthCare Dataset

In [93]:
df.tail()

Unnamed: 0,Name,Age,Gender,Blood Type,Medical Condition,Date of Admission,Doctor,Hospital,Insurance Provider,Billing Amount,Room Number,Admission Type,Discharge Date,Medication,Test Results
55495,eLIZABeTH jaCkSOn,42,Female,O+,Asthma,2020-08-16,Joshua Jarvis,Jones-Thompson,Blue Cross,2650.714952,417,Elective,2020-09-15,Penicillin,Abnormal
55496,KYle pEREz,61,Female,AB-,Obesity,2020-01-23,Taylor Sullivan,Tucker-Moyer,Cigna,31457.797307,316,Elective,2020-02-01,Aspirin,Normal
55497,HEATher WaNG,38,Female,B+,Hypertension,2020-07-13,Joe Jacobs DVM,"and Mahoney Johnson Vasquez,",UnitedHealthcare,27620.764717,347,Urgent,2020-08-10,Ibuprofen,Abnormal
55498,JENniFER JOneS,43,Male,O-,Arthritis,2019-05-25,Kimberly Curry,"Jackson Todd and Castro,",Medicare,32451.092358,321,Elective,2019-05-31,Ibuprofen,Abnormal
55499,jAMES GARCiA,53,Female,O+,Arthritis,2024-04-02,Dennis Warren,Henry Sons and,Aetna,4010.134172,448,Urgent,2024-04-29,Ibuprofen,Abnormal


## 4️⃣ . Statistical Analysis

### 4.1 Summary Statistics for Numerical Variables

#### A Statistical Overview of the Numerical Data in the HealthCare Dataset
 * Brief Description:
   * Age : The oldest patient is 89 and the youngest is 13 years old.
   * Billing Amount : Across the 55500 bills, the average bill amount is 25539.316097. The minimum is negative (-2008.49214), which is worth checking during cleaning.
   * Room Number : These statistics are of little relevance; the column is an identifier and appears here only because it is stored as a number.

In [75]:
df.describe()

Unnamed: 0,Age,Billing Amount,Room Number
count,55500.0,55500.0,55500.0
mean,51.539459,25539.316097,301.134829
std,19.602454,14211.454431,115.243069
min,13.0,-2008.49214,101.0
25%,35.0,13241.224652,202.0
50%,52.0,25538.069376,302.0
75%,68.0,37820.508436,401.0
max,89.0,52764.276736,500.0
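
The `min` row shows a negative billing amount, which is unusual for a bill. A minimal sketch of how such rows could be flagged, using a toy series built from two preview values plus the reported minimum. Dropping the rows is only one possible treatment and is an assumption here:

```python
import pandas as pd

# Toy series: two billing amounts from the preview plus the minimum from describe().
billing = pd.Series([18856.281306, 33643.327287, -2008.49214], name="Billing Amount")

# Boolean mask marking the negative bills.
negative = billing < 0
print(f"Negative bills: {negative.sum()} of {len(billing)}")

# Assumption: exclude negative bills; they could instead be refunds worth keeping.
cleaned = billing[~negative]
```

On the full dataset, the same mask applied to `df["Billing Amount"]` would show how many rows are affected before deciding how to treat them.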


### 4.2  Summary Statistics for Categorical Variables

#### Summarize non-numerical data by count, number of unique values, most frequent value, and its frequency.

In [87]:
df.describe(include = 'object')

Unnamed: 0,Name,Gender,Blood Type,Medical Condition,Date of Admission,Doctor,Hospital,Insurance Provider,Admission Type,Discharge Date,Medication,Test Results
count,55500,55500,55500,55500,55500,55500,55500,55500,55500,55500,55500,55500
unique,49992,2,8,6,1827,40341,39876,5,3,1856,5,3
top,DAvId muNoZ,Male,A-,Arthritis,2024-03-16,Michael Smith,LLC Smith,Cigna,Elective,2020-03-15,Lipitor,Abnormal
freq,3,27774,6969,9308,50,27,44,11249,18655,53,11140,18627
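
The most frequent name, `DAvId muNoZ`, shows that names are stored with inconsistent capitalization. A minimal sketch of one way to standardize them, using names from the preview. The choice of `str.title()` is an assumption, and it can mis-case names with internal capitals such as "McDonald":

```python
import pandas as pd

# Names copied from the dataset preview, with their original mixed casing.
names = pd.Series(["Bobby JacksOn", "LesLie TErRy", "DaNnY sMitH"], name="Name")

# Trim stray whitespace, then capitalise the first letter of each word.
names_clean = names.str.strip().str.title()

print(names_clean.tolist())
```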


## 5️⃣  Counting Duplicate Rows in the HealthCare Dataset

In [23]:
df.duplicated().sum()

534
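
`duplicated()` marks a row only when every column matches an earlier row, so the count above refers to exact repeats. A minimal sketch of counting and removing them on a toy frame, where the third row deliberately repeats the first:

```python
import pandas as pd

# Two distinct rows from the preview; the first is repeated to create a duplicate.
first = {"Name": "Bobby JacksOn", "Age": 30, "Billing Amount": 18856.281306}
second = {"Name": "LesLie TErRy", "Age": 62, "Billing Amount": 33643.327287}
sample = pd.DataFrame([first, second, first])

# Count rows that are identical to an earlier row.
print(sample.duplicated().sum())

# Keep the first occurrence of each row and rebuild a clean 0..n-1 index.
deduped = sample.drop_duplicates().reset_index(drop=True)
print(deduped.shape)
```

Whether to drop the duplicates in the real dataset is a judgment call; the sketch only shows the mechanics.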

## 6️⃣  Counting Missing Values per Column in the HealthCare Dataset

In [24]:
df.isnull().sum()

Name                  0
Age                   0
Gender                0
Blood Type            0
Medical Condition     0
Date of Admission     0
Doctor                0
Hospital              0
Insurance Provider    0
Billing Amount        0
Room Number           0
Admission Type        0
Discharge Date        0
Medication            0
Test Results          0
dtype: int64

## 7️⃣  Finding Unique values in HealthCare Dataset

### 7.1  Number of Unique Values per Column

In [50]:
df.nunique()

Name                  49992
Age                      77
Gender                    2
Blood Type                8
Medical Condition         6
Date of Admission      1827
Doctor                40341
Hospital              39876
Insurance Provider        5
Billing Amount        50000
Room Number             400
Admission Type            3
Discharge Date         1856
Medication                5
Test Results              3
dtype: int64

### 7.2 Unique Values per Column

In [27]:
df['Gender'].unique() # Unique Entries in Gender Column

array(['Male', 'Female'], dtype=object)

In [31]:
df['Blood Type'].unique() # Unique Entries in Blood Type Column

array(['B-', 'A+', 'A-', 'O+', 'AB+', 'AB-', 'B+', 'O-'], dtype=object)

In [58]:
df['Medical Condition'].unique() # Unique Entries in Medical Condition Column

array(['Cancer', 'Obesity', 'Diabetes', 'Asthma', 'Hypertension',
       'Arthritis'], dtype=object)

In [97]:
df['Doctor'].unique()  # Unique Entries in Doctor Column

array(['Matthew Smith', 'Samantha Davies', 'Tiffany Mitchell', ...,
       'Deborah Sutton', 'Mary Bartlett', 'Alec May'], dtype=object)

In [98]:
df['Hospital'].unique()     # Unique Entries in Hospital Column

array(['Sons and Miller', 'Kim Inc', 'Cook PLC', ...,
       'Guzman Jones and Graves,', 'and Williams, Brown Mckenzie',
       'Moreno Murphy, Griffith and'], dtype=object)

In [59]:
df['Insurance Provider'].unique()    # Unique Entries in Insurance Provider Column

array(['Blue Cross', 'Medicare', 'Aetna', 'UnitedHealthcare', 'Cigna'],
      dtype=object)

In [99]:
df['Room Number'].unique()   # Unique Entries in Room Number Column

array([328, 265, 205, 450, 458, 389, 277, 316, 249, 394, 288, 134, 309,
       182, 465, 114, 449, 260, 115, 295, 327, 119, 109, 162, 401, 157,
       223, 293, 371, 108, 245, 494, 285, 228, 481, 212, 113, 272, 478,
       196, 418, 410, 300, 211, 413, 138, 456, 234, 492, 180, 250, 296,
       330, 405, 306, 333, 244, 325, 378, 468, 368, 263, 489, 241, 231,
       377, 407, 135, 131, 102, 255, 422, 320, 273, 395, 152, 321, 428,
       482, 268, 120, 318, 144, 226, 459, 208, 227, 402, 442, 425, 373,
       290, 361, 251, 440, 414, 424, 307, 476, 388, 326, 178, 177, 302,
       130, 430, 133, 104, 408, 376, 331, 275, 480, 233, 384, 380, 310,
       406, 213, 427, 500, 451, 485, 267, 154, 466, 453, 261, 167, 179,
       490, 258, 483, 202, 198, 308, 278, 103, 400, 192, 128, 238, 136,
       218, 348, 486, 147, 126, 314, 271, 341, 498, 168, 189, 438, 286,
       266, 392, 156, 315, 322, 184, 472, 398, 435, 174, 137, 111, 464,
       117, 493, 183, 471, 164, 356, 497, 421, 488, 317, 247, 15

In [40]:
df['Admission Type'].unique()    # Unique Entries in Admission Type Column

array(['Urgent', 'Emergency', 'Elective'], dtype=object)

In [35]:
df['Medication'].unique()      # Unique Entries in Medication Column

array(['Paracetamol', 'Ibuprofen', 'Aspirin', 'Penicillin', 'Lipitor'],
      dtype=object)

In [56]:
df['Test Results'].unique()     # Unique Entries in Test Results Column

array(['Normal', 'Inconclusive', 'Abnormal'], dtype=object)
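
The per-column calls above can also be written as a single loop. The sketch below uses `value_counts()` rather than `unique()`, which is a substitution made here so that each category is listed together with its frequency. The sample holds the first five rows of three categorical columns from the preview:

```python
import pandas as pd

# First five rows of three categorical columns, copied from the preview.
sample = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Female"],
    "Admission Type": ["Urgent", "Emergency", "Emergency", "Elective", "Urgent"],
    "Test Results": ["Normal", "Inconclusive", "Normal", "Abnormal", "Abnormal"],
})

# Build one frequency table per column instead of calling unique() repeatedly.
counts = {col: sample[col].value_counts() for col in sample.columns}

for col, table in counts.items():
    print(f"--- {col} ---")
    print(table)
```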