# U.S. Medical Insurance Costs

Goals:
- What is the average cost for individuals who have children vs individuals who do not?
- Which region has the highest average cost?
- What is the average cost for smokers vs non-smokers?
- Which sex has the higher average insurance cost?

In [312]:
import csv

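# Map each row index to a record whose numeric fields are converted from strings.
insurance_data = {}

# newline='' follows the csv module's recommendation for opening CSV files.
with open('insurance.csv', newline='') as file:
    insurance_reader = csv.DictReader(file)
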
    for index, row in enumerate(insurance_reader):
        insurance_data[index] = {
            'age': int(row['age']),
            'sex': row['sex'],
            'children': int(row['children']),
            'smoker': row['smoker'],
            'region': row['region'],
            'charges': float(row['charges'])
        }
    #print(insurance_data)
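
The `int(...)` and `float(...)` calls in the loading cell are needed because `csv.DictReader` returns every field as a string. A small sketch of that behaviour, using an in-memory file; the single data row is made up for illustration and uses only column names that appear in the cell above:

```python
import csv
import io

# Made-up one-row CSV (not taken from insurance.csv), only to show the types.
sample_csv = io.StringIO(
    'age,sex,children,smoker,region,charges\n'
    '30,female,1,no,southwest,1234.5\n'
)

row = next(csv.DictReader(sample_csv))

# Every value comes back as str, including the numeric-looking ones.
print(type(row['age']).__name__, type(row['charges']).__name__)

# Explicit conversion gives numbers that can be compared and summed.
age = int(row['age'])
charges = float(row['charges'])
print(age, charges)
```

Without the conversion, a comparison such as `row['children'] > 0` would raise a `TypeError`, since a string cannot be ordered against an integer.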

## What is the average cost for individuals who have children vs individuals who do not?

In [299]:
has_children = {}
no_children = {}

for key, row in insurance_data.items():
    if row['children'] > 0:
        has_children[key] = row
    else:
        no_children[key] = row

# Sum the charges in each group.
total_charges_with_children = sum(row['charges'] for row in has_children.values())
total_charges_without_children = sum(row['charges'] for row in no_children.values())

average_charges_with_children = round(total_charges_with_children / len(has_children))
average_charges_without_children = round(total_charges_without_children / len(no_children))
print(f'The average insurance cost for someone with children is ${average_charges_with_children}.')
print(f'The average insurance cost for someone with no children is ${average_charges_without_children}.')
print(f'That is a ${average_charges_with_children - average_charges_without_children} difference.')

The average insurance cost for someone with children is $13950.
The average insurance cost for someone with no children is $12366.
That is a $1584 difference.
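
The cell above follows a pattern that recurs in every question in this notebook: group rows by a label, then average `charges` within each group. A minimal sketch of that pattern as one reusable function, assuming the same record layout as `insurance_data`. The name `average_charges_by` and the three sample rows are hypothetical and are not values from `insurance.csv`:

```python
from collections import defaultdict


def average_charges_by(data, label_for):
    """Group rows by label_for(row) and return {label: rounded mean charges}.

    Hypothetical helper: `data` is assumed to map keys to dicts that
    contain a numeric 'charges' field, like insurance_data.
    """
    groups = defaultdict(list)
    for row in data.values():
        groups[label_for(row)].append(row['charges'])
    return {
        label: round(sum(charges) / len(charges))
        for label, charges in groups.items()
    }


# Made-up rows, only to show the call shape.
sample = {
    0: {'children': 0, 'charges': 100.0},
    1: {'children': 2, 'charges': 300.0},
    2: {'children': 1, 'charges': 500.0},
}

by_children = average_charges_by(
    sample,
    lambda row: 'children' if row['children'] > 0 else 'no children',
)
print(by_children)
```

With the real data, the same function could be called with `lambda row: row['region']` or `lambda row: row['smoker']` to answer the other questions.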


## Which region has the highest average cost?

In [301]:
region_groups = {}

for index, row in insurance_data.items():
    region = row['region']
    if region not in region_groups:
        region_groups[region] = []
    region_groups[region].append(row)

region_avg_charges = {}
highest_region = None
highest_avg_region_charges = 0

for region, rows in region_groups.items():
    total_charges = 0
    for row in rows:
        total_charges += row['charges']
    average_charges = round(total_charges / len(rows))
    region_avg_charges[region] = average_charges

    if average_charges > highest_avg_region_charges:
        highest_avg_region_charges = average_charges
        highest_region = region

for region, average_charges in region_avg_charges.items():
    print(f"Average charges for {region}: {average_charges}")

print(f"The region with the highest average charges is {highest_region} with an average charge of {highest_avg_region_charges}")

Average charges for southwest: 12347
Average charges for southeast: 14735
Average charges for northwest: 12418
Average charges for northeast: 13406
The region with the highest average charges is southeast with an average charge of 14735
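
Once the per-region averages are in a dictionary, the highest one can also be picked with the built-in `max` instead of tracking a running maximum. A short sketch, using the four averages printed above as literal values:

```python
# Averages copied from the output above.
region_avg_charges = {
    'southwest': 12347,
    'southeast': 14735,
    'northwest': 12418,
    'northeast': 13406,
}

# max() iterates over the keys and ranks them by their mapped average.
highest_region = max(region_avg_charges, key=region_avg_charges.get)
print(highest_region, region_avg_charges[highest_region])  # southeast 14735
```

This form does not depend on a starting value of `0` for the running maximum.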


## What is the average cost for smokers vs non-smokers?

In [303]:
smokers = {}
nonsmokers = {}

for key, row in insurance_data.items():
    if row['smoker'] == 'yes':
        smokers[key] = row
    else:
        nonsmokers[key] = row

# Sum the charges in each group.
smokers_total_charges = sum(row['charges'] for row in smokers.values())
nonsmokers_total_charges = sum(row['charges'] for row in nonsmokers.values())

smoker_avg_charges = round(smokers_total_charges / len(smokers))
nonsmoker_avg_charges = round(nonsmokers_total_charges / len(nonsmokers))

print(f'The average insurance cost for a smoker is ${smoker_avg_charges}.')
print(f'The average insurance cost for a non-smoker is ${nonsmoker_avg_charges}.')
print(f'That is a ${smoker_avg_charges - nonsmoker_avg_charges} difference.')

The average insurance cost for a smoker is $32050.
The average insurance cost for a non-smoker is $8434.
That is a $23616 difference.
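
The dollar difference can also be read as a ratio, which makes the size of the gap easier to compare with the other questions. A small worked example using the two averages printed above:

```python
# Averages copied from the output above.
smoker_avg_charges = 32050
nonsmoker_avg_charges = 8434

# How many times larger the smoker average is than the non-smoker average.
ratio = smoker_avg_charges / nonsmoker_avg_charges
print(f'Smokers are charged about {ratio:.1f} times as much as non-smokers on average.')
```

This prints a ratio of about 3.8.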


## Which sex has the higher average insurance cost?

In [305]:
sexes = {}

for index, row in insurance_data.items():
    sex = row['sex']
    if sex not in sexes:
        sexes[sex] = []
    sexes[sex].append(row)

sex_avg_charges = {}
highest_sex = None
highest_avg_sex_charges = 0

for sex, rows in sexes.items():
    total_charges = 0
    for row in rows:
        total_charges += row['charges']
    average_charges = round(total_charges / len(rows))
    sex_avg_charges[sex] = average_charges

    if average_charges > highest_avg_sex_charges:
        highest_avg_sex_charges = average_charges
        highest_sex = sex

for sex, average_charges in sex_avg_charges.items():
    print(f"Average charges for {sex}: ${average_charges}")

print(f"The sex with the highest average charges is {highest_sex}.")
# Double quotes outside so the single-quoted keys also parse on Python versions before 3.12.
print(f"The difference is ${sex_avg_charges['male'] - sex_avg_charges['female']}")

Average charges for female: $12570
Average charges for male: $13957
The sex with the highest average charges is male.
The difference is $1387
