# U.S. Medical Insurance Costs

## Introduction
This notebook explores trends and relationships within a U.S. medical insurance dataset. The dataset contains attributes like age, sex, BMI, number of children, smoking status, region, and insurance charges.

The goal is to answer several exploratory questions to uncover patterns related to smoking, gender, region, and insurance cost.


## Goals
We aim to answer the following questions:
- Do people in the Southwest smoke more than people in the Northwest?
- What is the average age of someone with at least one child?
- What is the average insurance cost for smokers in the Northwest?
- What is the average BMI for people who smoke?
- What is the average BMI for people who do not smoke?
- What is the average insurance cost for females?
- What is the average insurance cost for males?
- Is there a significant difference in average insurance cost between smokers and non-smokers?
- Which region has the highest number of smokers?
- Are there more male smokers than female smokers?


## Load Data



In [None]:
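import csv

# Read every row of the CSV into a list of dictionaries (all values are strings).
with open("insurance.csv", newline="") as insurance_data:
    insurance_data_object = csv.DictReader(insurance_data)
    insurance_object = list(insurance_data_object)

# Printing all rows is very long; show the row count and the first record instead.
print(len(insurance_object))
print(insurance_object[0])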
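Because `csv.DictReader` returns every field as a string, the cells below convert `age`, `children`, `bmi`, and `charges` again and again. A minimal sketch, assuming the column names listed in the introduction, converts each row once at load time. The helpers `parse_row` and `load_rows` and the two sample rows are hypothetical, made up for illustration.

```python
import csv
import io

# Hypothetical sample in the same column layout as insurance.csv (values are made up).
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
30,female,25.0,1,yes,southwest,20000.0
45,male,31.5,2,no,northwest,9000.0
"""


def parse_row(row):
    """Return a copy of a DictReader row with the numeric fields converted."""
    parsed = dict(row)
    parsed["age"] = int(row["age"])
    parsed["children"] = int(row["children"])
    parsed["bmi"] = float(row["bmi"])
    parsed["charges"] = float(row["charges"])
    return parsed


def load_rows(text_file):
    """Read an open CSV file object and return a list of parsed rows."""
    return [parse_row(row) for row in csv.DictReader(text_file)]


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(len(rows), rows[0]["age"], rows[1]["charges"])  # 2 30 9000.0
```

With the real file, `with open("insurance.csv", newline="") as f:` followed by `insurance_object = load_rows(f)` would load and convert the data in one step, so later cells could skip their conversion loops.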

## Exploratory Data Analysis (EDA)

### Are there more smokers in the Southwest than in the Northwest?

In [44]:
southwest_smokers = []
northwest_smokers = []
for i in insurance_object:
    if i["smoker"] == "yes" and i["region"] == "southwest":
        southwest_smokers.append(i)
    elif i["smoker"] == "yes" and i["region"] == "northwest":
        northwest_smokers.append(i)
print(len(southwest_smokers), len(northwest_smokers))

58 58
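The two lists above are only used for their lengths. A minimal sketch counts the matching rows directly with a generator expression instead of building lists; `count_smokers` and the sample rows are hypothetical.

```python
# Hypothetical rows (made-up values); only the fields used here are included.
rows = [
    {"smoker": "yes", "region": "southwest"},
    {"smoker": "no", "region": "southwest"},
    {"smoker": "yes", "region": "northwest"},
    {"smoker": "yes", "region": "northwest"},
]


def count_smokers(rows, region):
    """Count rows in `region` whose smoker field is "yes"."""
    return sum(1 for row in rows if row["smoker"] == "yes" and row["region"] == region)


southwest = count_smokers(rows, "southwest")
northwest = count_smokers(rows, "northwest")
print(southwest, northwest, southwest > northwest)  # 1 2 False
```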


### What is the average age of someone with at least one child?

In [45]:
people_with_children = []

# csv.DictReader returns strings, so convert the numeric fields first.
for i in insurance_object:
    i["children"] = int(i["children"])
    i["age"] = int(i["age"])

sum_of_ages = 0

# Sum the ages of everyone with at least one child.
for j in insurance_object:
    if j["children"] > 0:
        people_with_children.append(j)
        sum_of_ages += j["age"]
average_age = round(sum_of_ages/len(people_with_children), 1)
print(average_age)

39.8
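The same average can be computed with the standard library's `statistics.mean` on a filtered list. A minimal sketch on made-up rows whose `age` and `children` fields are already integers:

```python
from statistics import mean

# Hypothetical rows with integer fields (made-up values).
rows = [
    {"age": 30, "children": 0},
    {"age": 40, "children": 1},
    {"age": 51, "children": 3},
]

# Keep only people with at least one child, then average their ages.
parent_ages = [row["age"] for row in rows if row["children"] >= 1]
average_parent_age = round(mean(parent_ages), 1)
print(average_parent_age)  # 45.5
```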


### What is the average insurance cost for smokers in the Northwest?

In [46]:
sum_of_insurance_cost = 0
count = 0

# Convert charges from strings to floats.
for i in insurance_object:
    i["charges"] = float(i["charges"])

# Only include smokers who live in the Northwest.
for j in insurance_object:
    if j["region"] == "northwest" and j["smoker"] == "yes":
        sum_of_insurance_cost += j["charges"]
        count += 1
average_insurance_northwest = round(sum_of_insurance_cost/count, 2)
print(average_insurance_northwest)

30192.0
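This question filters on two columns at once (region and smoking status). A minimal sketch of a reusable helper that averages a column over rows matching any number of `field=value` conditions; `average_where` is a hypothetical name and the rows are made up.

```python
from statistics import mean


def average_where(rows, value_key, **conditions):
    """Average `value_key` over rows matching every field=value condition.

    Returns None when no row matches, instead of dividing by zero.
    """
    values = [
        row[value_key]
        for row in rows
        if all(row[field] == wanted for field, wanted in conditions.items())
    ]
    return round(mean(values), 2) if values else None


# Hypothetical rows with charges already converted to float (made-up values).
rows = [
    {"region": "northwest", "smoker": "yes", "charges": 30000.0},
    {"region": "northwest", "smoker": "yes", "charges": 31000.5},
    {"region": "northwest", "smoker": "no", "charges": 8000.0},
    {"region": "southeast", "smoker": "yes", "charges": 35000.0},
]

print(average_where(rows, "charges", region="northwest", smoker="yes"))  # 30500.25
```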


### What is the average BMI for people who smoke?

In [47]:
bmi_sum = 0
count = 0

for i in insurance_object:
    if i["bmi"]:
        i["bmi"] = float(i["bmi"])

for j in insurance_object:
    if j["smoker"] == "yes":
        bmi_sum += j["bmi"]
        count += 1
average_smoker_bmi = round(bmi_sum/count, 2)
print(average_smoker_bmi)


30.71


### What is the average BMI for people who do not smoke?

In [48]:
nonsmoker_bmi_sum = 0
count = 0

for i in insurance_object:
    if i["bmi"]:
        i["bmi"] = float(i["bmi"])

for j in insurance_object:
    if j["smoker"] == "no":
        nonsmoker_bmi_sum += j["bmi"]
        count += 1
average_nonsmoker_bmi = round(nonsmoker_bmi_sum/count, 2)
print(average_nonsmoker_bmi)


30.65
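The two BMI averages differ by only 0.06, so it can help to look at the spread within each group too. A minimal sketch, on made-up rows, reports the mean and the sample standard deviation (`statistics.stdev`) per smoking status; comparing the gap between means with these standard deviations shows whether the gap is small relative to the variation inside each group. `bmi_summary` is a hypothetical helper.

```python
from statistics import mean, stdev

# Hypothetical rows with BMI already converted to float (made-up values).
rows = [
    {"smoker": "yes", "bmi": 28.0},
    {"smoker": "yes", "bmi": 32.0},
    {"smoker": "no", "bmi": 27.0},
    {"smoker": "no", "bmi": 33.0},
]


def bmi_summary(rows, smoker_value):
    """Return (mean, sample standard deviation) of BMI for one smoking status."""
    bmis = [row["bmi"] for row in rows if row["smoker"] == smoker_value]
    return round(mean(bmis), 2), round(stdev(bmis), 2)


print(bmi_summary(rows, "yes"), bmi_summary(rows, "no"))  # (30.0, 2.83) (30.0, 4.24)
```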


### What is the average insurance cost for females?

In [49]:
total_female_insurance_cost = 0
count = 0

for i in insurance_object:
    i["charges"] = float(i["charges"])

# Add up the charges of all female policyholders, then divide by their count.
for j in insurance_object:
    if j["sex"] == "female":
        total_female_insurance_cost += j["charges"]
        count += 1
average_cost = round(total_female_insurance_cost/count, 2)
print(average_cost)

12569.58


### What is the average insurance cost for males?

In [50]:
total_male_insurance_cost = 0
count = 0

for i in insurance_object:
    i["charges"] = float(i["charges"])

# Add up the charges of all male policyholders, then divide by their count.
for j in insurance_object:
    if j["sex"] == "male":
        total_male_insurance_cost += j["charges"]
        count += 1
average_cost = round(total_male_insurance_cost/count, 2)
print(average_cost)

13956.75
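The female and male cells are identical apart from the filter value. A minimal sketch computes the mean of one column for every value of a grouping column in a single pass with `collections.defaultdict`; `mean_by_group` is a hypothetical helper and the rows are made up.

```python
from collections import defaultdict


def mean_by_group(rows, group_key, value_key):
    """Return {group: mean of value_key}, rounded to 2 decimals, in one pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
        counts[row[group_key]] += 1
    return {group: round(totals[group] / counts[group], 2) for group in totals}


# Hypothetical rows with charges already converted to float (made-up values).
rows = [
    {"sex": "female", "charges": 10000.0},
    {"sex": "female", "charges": 12000.0},
    {"sex": "male", "charges": 15000.5},
]

print(mean_by_group(rows, "sex", "charges"))  # {'female': 11000.0, 'male': 15000.5}
```

Passing `"smoker"` or `"region"` as `group_key` would cover several of the other questions with the same function.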


### Is there a significant difference in average insurance cost between smokers and non-smokers?

In [58]:
sum_of_smokers_insurance_cost = 0
sum_of_nonsmokers_insurance_cost = 0
smoker_count = 0
non_smoker_count = 0

for i in insurance_object:
    i["charges"] = float(i["charges"])

# Average charges for smokers.
for j in insurance_object:
    if j["smoker"] == "yes":
        sum_of_smokers_insurance_cost += j["charges"]
        smoker_count += 1
average_smoker_insurance_cost = round(sum_of_smokers_insurance_cost/smoker_count, 2)

# Average charges for non-smokers.
for k in insurance_object:
    if k["smoker"] == "no":
        sum_of_nonsmokers_insurance_cost += k["charges"]
        non_smoker_count += 1
average_non_smoker_insurance_cost = round(sum_of_nonsmokers_insurance_cost/non_smoker_count, 2)

difference_in_cost = round(abs(average_smoker_insurance_cost - average_non_smoker_insurance_cost), 2)
print(difference_in_cost, average_smoker_insurance_cost, average_non_smoker_insurance_cost)

23615.96 32050.23 8434.27
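Comparing the two averages shows how large the gap is, but deciding whether it is statistically significant needs a test. A minimal sketch of Welch's t statistic (which does not assume equal variances) using only the `statistics` module, on made-up charges. The p-value helper is a normal approximation to the t distribution and is only a rough guide when the degrees of freedom are large; the `df >= 30` cut-off is a common rule of thumb, not a fixed rule. `scipy.stats.ttest_ind(a, b, equal_var=False)` would give an exact Welch p-value if SciPy is available.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors, using sample variances (n - 1 denominator).
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


def approx_two_sided_p(t):
    """Two-sided p-value from the standard normal; reasonable only for large df."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Made-up charges for illustration, not values from insurance.csv.
smoker_charges = [30000.0, 34000.0, 28000.0, 36000.0]
nonsmoker_charges = [8000.0, 9000.0, 7500.0, 8500.0]

t, df = welch_t(smoker_charges, nonsmoker_charges)
print(round(t, 2), round(df, 1))  # 12.81 3.2
if df >= 30:
    print(approx_two_sided_p(t))
```

With these four-value samples `df` is about 3, so the normal approximation is skipped; with the full dataset's group sizes it would apply.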


### Which region has the highest number of smokers?

In [56]:
# Regions in the dataset: southwest, northeast, northwest, southeast
southwest_list = []
northeast_list = []
northwest_list = []
southeast_list = []

for i in insurance_object:
    if i["region"] == "southwest" and i["smoker"] == "yes":
        southwest_list.append(i)
    elif i["region"] == "northeast" and i["smoker"] == "yes":
        northeast_list.append(i)
    elif i["region"] == "northwest" and i["smoker"] == "yes":
        northwest_list.append(i)
    elif i["region"] == "southeast" and i["smoker"] == "yes":
        southeast_list.append(i)
southwest_count = len(southwest_list)
northeast_count = len(northeast_list)
northwest_count = len(northwest_list)
southeast_count = len(southeast_list)

print(southwest_count, northeast_count, northwest_count, southeast_count)


58 67 58 91
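Four separate lists work, but `collections.Counter` can tally smokers for every region at once, and `most_common(1)` returns the region with the highest count. A minimal sketch on made-up rows; note that a region with no smokers does not appear in the counter at all (looking it up returns 0), and ties are returned in first-seen order.

```python
from collections import Counter

# Hypothetical rows (made-up values).
rows = [
    {"region": "southeast", "smoker": "yes"},
    {"region": "southeast", "smoker": "yes"},
    {"region": "northeast", "smoker": "yes"},
    {"region": "southwest", "smoker": "no"},
]

# Count smokers per region in one pass.
smokers_by_region = Counter(row["region"] for row in rows if row["smoker"] == "yes")
print(smokers_by_region.most_common(1))  # [('southeast', 2)]
```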


### Are there more male smokers than female smokers?

In [57]:
male_smokers = []
female_smokers = []

for i in insurance_object:
    if i["sex"] == "female" and i["smoker"] == "yes":
        female_smokers.append(i)
    elif i["sex"] == "male" and i["smoker"] == "yes":
        male_smokers.append(i)
male_smoker_count = len(male_smokers)
female_smoker_count = len(female_smokers)
print(male_smoker_count, female_smoker_count)

159 115
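Raw counts answer the question as asked, but they also depend on how many men and women are in the dataset. A minimal sketch, on made-up rows, also reports the share of each sex who smoke, which normalizes for group size.

```python
from collections import Counter

# Hypothetical rows (made-up values).
rows = [
    {"sex": "male", "smoker": "yes"},
    {"sex": "male", "smoker": "no"},
    {"sex": "male", "smoker": "no"},
    {"sex": "female", "smoker": "yes"},
    {"sex": "female", "smoker": "no"},
]

people_by_sex = Counter(row["sex"] for row in rows)
smokers_by_sex = Counter(row["sex"] for row in rows if row["smoker"] == "yes")

# Share of each sex who smoke; Counter returns 0 for a sex with no smokers.
smoking_rate = {sex: round(smokers_by_sex[sex] / total, 2) for sex, total in people_by_sex.items()}
print(smoking_rate)  # {'male': 0.33, 'female': 0.5}
```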


## Conclusion

- The Southwest does not have more smokers than the Northwest: both regions have 58 smokers.
- The average age of someone with at least one child is 39.8, or roughly 40 years.
- The average insurance cost for smokers in the Northwest is $30192.00.
- The average BMI for smokers is 30.71.
- The average BMI for non-smokers is 30.65.
- The average insurance cost for females is $12569.58.
- The average insurance cost for males is $13956.75.
- The average insurance cost for smokers is $32050.23, compared with $8434.27 for non-smokers, a difference of $23615.96. Smokers pay about 3.8 times as much on average, a large gap, although no statistical test was run to establish significance.
- The Southeast has the highest number of smokers (91).
- There are more male smokers (159) than female smokers (115).

## Next Steps

- Creating visualizations to display data findings
- Building a predictive model to estimate insurance charges
- Examining interactions between multiple factors like age, BMI, and children
- Comparing this dataset to newer health datasets