### Estimating Healthcare Insurance Expenses through Machine Learning

In the ever-evolving landscape of healthcare, where costs have soared due to the complexities of the modern healthcare system, the power of data has emerged as a beacon of hope. Healthcare analytics, much like a skilled navigator, holds the key to understanding and transforming the intricate web of health insurance costs. This is the mission of our project: to harness the transformative potential of data.

Our journey begins with the realization that health insurance costs have reached unprecedented levels. This escalation is a result of numerous factors, including the rising cost of healthcare services and an array of individual characteristics. It's a multifaceted puzzle that calls for a comprehensive solution.

We approach this mission with the standard toolkit of healthcare analytics. The dataset records age, sex, BMI, number of children, smoking status, and region for each person, together with the charges we aim to predict. Like a sculptor chiseling away at a block of marble, we clean and encode this data step by step so that the patterns within it can be modeled.

At the core of our analysis lies the prediction of medical costs – a crucial indicator that unlocks a world of possibilities. This prediction empowers individuals, healthcare providers, and insurers to make informed decisions. It guides patients toward better planning, assists healthcare professionals in optimizing treatment plans, and helps insurance providers in setting fair premiums.

Collaboration is at the heart of our endeavor. We work closely with experts in the field, building a bridge between the world of data and the world of healthcare. The insights we generate through data analysis become the guiding stars, steering our stakeholders towards better decision-making. With compelling data visualizations, we breathe life into these insights, making them accessible and actionable.

Our work, much like that of the unsung hero in the healthcare analytics narrative, may often go unnoticed. Yet its impact is real: by turning data about insurance costs into actionable insights, it supports financial well-being, more efficient healthcare planning, and a more accessible healthcare system.

#### Step 1: Analyzing Insurance Costs

In [1]:
#--- Import Pandas ---
import pandas as pd

#--- Read in dataset ---
df = pd.read_csv("insurance.csv")

#--- Inspect data ---
df

Unnamed: 0,age,sex,bmi,children,smoker,region,charges
0,19,female,27.900,0,yes,southwest,16884.92400
1,18,male,33.770,1,no,southeast,1725.55230
2,28,male,33.000,3,no,southeast,4449.46200
3,33,male,22.705,0,no,northwest,21984.47061
4,32,male,28.880,0,no,northwest,3866.85520
...,...,...,...,...,...,...,...
1333,50,male,30.970,3,no,northwest,10600.54830
1334,18,female,31.920,0,no,northeast,2205.98080
1335,18,female,36.850,0,no,southeast,1629.83350
1336,21,female,25.800,0,no,southwest,2007.94500
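
Before cleaning, it can help to check what pandas inferred about each column. The sketch below is not part of the original notebook: it rebuilds the first five rows shown above from an inline string (an assumption made so the example runs without `insurance.csv`) and prints the shape, dtypes, numeric summary, and category counts.

```python
import io
import pandas as pd

# First five rows copied from the output above; the notebook itself reads insurance.csv.
sample_csv = """age,sex,bmi,children,smoker,region,charges
19,female,27.900,0,yes,southwest,16884.92400
18,male,33.770,1,no,southeast,1725.55230
28,male,33.000,3,no,southeast,4449.46200
33,male,22.705,0,no,northwest,21984.47061
32,male,28.880,0,no,northwest,3866.85520
"""
sample = pd.read_csv(io.StringIO(sample_csv))

# Row/column counts and the dtype pandas inferred for each column
print(sample.shape)
print(sample.dtypes)

# Summary statistics; by default describe() covers the numeric columns only
numeric_summary = sample.describe()
print(numeric_summary)

# Frequency of each category in the text columns
for col in ["sex", "smoker", "region"]:
    print(sample[col].value_counts())
```

The same calls can be run on the full 'df' to spot columns that still need encoding before modeling.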


#### Step 2: Unearthing Data Duplications

In [2]:
#--- Count rows that exactly repeat an earlier row ---
duplicates = df.duplicated().sum()

#--- Inspect result ---
duplicates

1
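
The count alone does not show which rows are duplicated. As a minimal sketch on a hypothetical toy frame (the values are illustrative, not taken from the full dataset), `duplicated(keep=False)` flags every member of a duplicate group so the rows can be viewed side by side:

```python
import pandas as pd

# Hypothetical toy frame; the second and third rows are identical on purpose.
toy = pd.DataFrame({
    "age": [19, 18, 18, 28],
    "sex": ["female", "male", "male", "male"],
    "bmi": [27.9, 33.77, 33.77, 33.0],
    "charges": [16884.924, 1725.5523, 1725.5523, 4449.462],
})

# By default duplicated() marks only the later copies, so the sum counts removable rows
n_duplicates = toy.duplicated().sum()

# keep=False marks every row in a duplicate group, including the first occurrence
duplicate_rows = toy[toy.duplicated(keep=False)]

print(n_duplicates)
print(duplicate_rows)
```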

#### Step 3: Eliminating Duplications

In [3]:
#--- Drop duplicate rows in place, keeping the first occurrence ---
df.drop_duplicates(inplace=True)

#--- Inspect data ---
df

Unnamed: 0,age,sex,bmi,children,smoker,region,charges
0,19,female,27.900,0,yes,southwest,16884.92400
1,18,male,33.770,1,no,southeast,1725.55230
2,28,male,33.000,3,no,southeast,4449.46200
3,33,male,22.705,0,no,northwest,21984.47061
4,32,male,28.880,0,no,northwest,3866.85520
...,...,...,...,...,...,...,...
1333,50,male,30.970,3,no,northwest,10600.54830
1334,18,female,31.920,0,no,northeast,2205.98080
1335,18,female,36.850,0,no,southeast,1629.83350
1336,21,female,25.800,0,no,southwest,2007.94500
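
Note that `drop_duplicates` keeps the original index labels, so a gap is left where a row was removed. The following sketch, on a hypothetical toy frame, contrasts the default behaviour with `ignore_index=True`, which relabels the rows consecutively. Whether to renumber is a design choice; the notebook above keeps the original labels.

```python
import pandas as pd

# Hypothetical toy frame; rows 1 and 2 are identical.
toy = pd.DataFrame({"age": [19, 18, 18, 28], "bmi": [27.9, 33.77, 33.77, 33.0]})

# Default: the first copy is kept and the original index labels are preserved
deduped = toy.drop_duplicates()
print(deduped.index.tolist())

# ignore_index=True relabels the remaining rows 0..n-1
deduped_reindexed = toy.drop_duplicates(ignore_index=True)
print(deduped_reindexed.index.tolist())
```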


#### Step 4: Checking for Null Values

In [4]:
#--- Count missing values in each column ---
null_values = df.isnull().sum()

#--- Inspect data ---
null_values

age         0
sex         0
bmi         0
children    0
smoker      0
region      0
charges     0
dtype: int64
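
Every column reports zero missing values, so no imputation is needed here. For comparison, this sketch shows what the same check looks like when a value is missing, using a hypothetical toy frame with one `NaN` inserted deliberately:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing BMI value.
toy = pd.DataFrame({
    "age": [19, 18, 28],
    "bmi": [27.9, np.nan, 33.0],
    "smoker": ["yes", "no", "no"],
})

# Missing values per column, as in the step above
null_counts = toy.isnull().sum()

# A single number for the whole frame
total_nulls = int(null_counts.sum())

# The rows that contain at least one missing value
rows_with_nulls = toy[toy.isnull().any(axis=1)]

print(null_counts)
print(total_nulls)
print(rows_with_nulls)
```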

#### Step 5: Setting the Stage for Precise Predictions

Label Encoding of Categorical Features.

Apply label encoding to the columns 'sex' and 'smoker' in the DataFrame 'df'.

Using the LabelEncoder from the sklearn library, we convert the non-numeric attributes 'sex' and 'smoker' into integers. LabelEncoder assigns codes in sorted order of the labels, so 'female' becomes 0 and 'male' becomes 1, while 'no' becomes 0 and 'yes' becomes 1. This transformation is needed because most scikit-learn estimators expect numeric inputs, and it lets both features take part in modeling insurance costs.

In [5]:
from sklearn.preprocessing import LabelEncoder
label_encode = LabelEncoder()

# Encode each text column as integers; the encoder is refit for every column
for col in ["sex", "smoker"]:
    df[col] = label_encode.fit_transform(df[col])

#--- Inspect data ---
df

Unnamed: 0,age,sex,bmi,children,smoker,region,charges
0,19,0,27.900,0,1,southwest,16884.92400
1,18,1,33.770,1,0,southeast,1725.55230
2,28,1,33.000,3,0,southeast,4449.46200
3,33,1,22.705,0,0,northwest,21984.47061
4,32,1,28.880,0,0,northwest,3866.85520
...,...,...,...,...,...,...,...
1333,50,1,30.970,3,0,northwest,10600.54830
1334,18,0,31.920,0,0,northeast,2205.98080
1335,18,0,36.850,0,0,southeast,1629.83350
1336,21,0,25.800,0,0,southwest,2007.94500
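
After encoding, it is easy to lose track of which integer stands for which label. One way to keep that mapping is to fit a separate encoder per column and read its `classes_` attribute. The `encoders` dictionary and the toy frame below are illustrative assumptions, not part of the notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frame with the two binary text columns.
toy = pd.DataFrame({
    "sex": ["female", "male", "male", "female"],
    "smoker": ["yes", "no", "no", "no"],
})

# Keep one fitted encoder per column so each mapping can be recovered later
encoders = {}
for col in ["sex", "smoker"]:
    encoder = LabelEncoder()
    toy[col] = encoder.fit_transform(toy[col])
    encoders[col] = encoder

# classes_ holds the labels in sorted order; a label's position is its integer code
for col, encoder in encoders.items():
    mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
    print(col, mapping)

# inverse_transform turns the integer codes back into the original labels
original_smoker = encoders["smoker"].inverse_transform(toy["smoker"])
print(list(original_smoker))
```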


#### Step 6: Unleashing the Power of One-Hot Encoding

One-Hot Encoding of Categorical Data.

Perform one-hot encoding on the 'region' column of the DataFrame 'df' and store the result in the variable 'one_hot_encode'.

This step translates the geographical regions into a numerical form that can be used in the analysis. `pd.get_dummies` creates one binary column per region (northeast, northwest, southeast, southwest), and in each row exactly one of them is True. The result is stored separately in 'one_hot_encode'; 'df' itself is not changed by this call.

In [6]:
# Build a one-hot encoded DataFrame with one column per region
one_hot_encode = pd.get_dummies(df["region"])

#--- Inspect data ---
one_hot_encode

Unnamed: 0,northeast,northwest,southeast,southwest
0,False,False,False,True
1,False,False,True,False
2,False,False,True,False
3,False,True,False,False
4,False,True,False,False
...,...,...,...,...
1333,False,True,False,False
1334,True,False,False,False
1335,False,False,True,False
1336,False,False,False,True
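
The one-hot columns above live in a separate DataFrame and hold booleans. A possible follow-up, sketched here as an assumption rather than as the notebook's actual next step, is to request integer columns with `dtype=int` and join them back in place of the original 'region' column. The toy frame and the `region_` prefix are illustrative.

```python
import pandas as pd

# Hypothetical toy frame with one row per region.
toy = pd.DataFrame({
    "age": [19, 18, 33, 18],
    "region": ["southwest", "southeast", "northwest", "northeast"],
})

# dtype=int yields 0/1 columns instead of booleans; prefix labels their origin
region_dummies = pd.get_dummies(toy["region"], prefix="region", dtype=int)

# Replace the text column with its one-hot columns, aligned on the index
encoded = pd.concat([toy.drop(columns="region"), region_dummies], axis=1)

print(encoded)
```

If the encoded frame later feeds a linear model with an intercept, passing `drop_first=True` to `pd.get_dummies` is one common way to avoid a redundant column.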
