# U.S. Medical Insurance Costs

## Goals
1. Find the average age of the patients in the dataset.
2. Identify the region that most of the individuals are from.
3. Look at the gender of the individuals.
4. Look at the costs based on gender.
5. Look at the costs based on Body Mass Index (BMI). 

#### 1. Importing the data

In [30]:
import csv

# Load every row as a dict keyed by column name; all values are read as strings.
with open('insurance.csv', 'r', newline='') as insurance_file:
    reader = csv.DictReader(insurance_file)
    headers = reader.fieldnames
    insurance_data = list(reader)

print(insurance_data[0])

{'age': '19', 'sex': 'female', 'bmi': '27.9', 'children': '0', 'smoker': 'yes', 'region': 'southwest', 'charges': '16884.924'}
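Every value in the row above is a string, which is why the later cells wrap fields in `int(...)` and `float(...)`. As a minimal sketch, the conversion could be done once per row instead. The helper name `parse_row` and the choice to turn `smoker` into a boolean are assumptions for illustration and are not part of the original notebook.

```python
def parse_row(row):
    """Convert one csv.DictReader row (all strings) into typed values."""
    return {
        'age': int(row['age']),
        'sex': row['sex'],
        'bmi': float(row['bmi']),
        'children': int(row['children']),
        'smoker': row['smoker'] == 'yes',  # assumption: store as a boolean
        'region': row['region'],
        'charges': float(row['charges']),
    }


# The first row of the dataset, copied from the output above.
sample = {'age': '19', 'sex': 'female', 'bmi': '27.9', 'children': '0',
          'smoker': 'yes', 'region': 'southwest', 'charges': '16884.924'}

parsed = parse_row(sample)
print(parsed)
```

With typed rows, later calculations could read `row['charges']` directly without repeating the conversion in each cell.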


#### 2. Average age of the patients in the dataset

In [31]:
average_age = round(sum(int(data['age']) for data in insurance_data) / len(insurance_data))

print(f"The average age is {average_age}.")

The average age is 39.


#### 3. Where most of the individuals are from

In [32]:
from collections import Counter

regions = [data['region'] for data in insurance_data]
region_counts = Counter(regions)
most_common_region = region_counts.most_common(1)[0]

print(f"The most common region is {most_common_region[0]} with {most_common_region[1]} individuals.")

The most common region is southeast with 364 individuals.


#### 4. Look at the gender of the individuals

In [33]:
gender_counts = Counter([data['sex'] for data in insurance_data])
print(f"Gender distribution: male - {gender_counts['male']} vs female - {gender_counts['female']}")

Gender distribution: male - 676 vs female - 662


#### 5. Look at the costs based on gender

In [37]:
def get_average_costs_by_gender(patient_data):
    # Collect the charges for each gender, then average each list.
    charges_by_gender = {'male': [], 'female': []}

    for data in patient_data:
        charges_by_gender[data['sex']].append(float(data['charges']))

    return {gender: sum(costs) / len(costs) for gender, costs in charges_by_gender.items()}


average_costs_by_gender = get_average_costs_by_gender(insurance_data)

print(f"Average cost for males: {average_costs_by_gender['male']:.2f}")
print(f"Average cost for females: {average_costs_by_gender['female']:.2f}")


Average cost for males: 13956.75
Average cost for females: 12569.58


On average, healthcare costs are about 1,387 USD higher for male patients than for female patients (13,956.75 USD vs. 12,569.58 USD).
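The difference noted above can be computed directly from the two printed averages. The following is a small sketch of that arithmetic; the helper name `cost_gap` is hypothetical, and the two input values are copied from the output of the previous cell rather than recomputed from the dataset.

```python
# Averages copied from the printed output above.
male_avg = 13956.75
female_avg = 12569.58


def cost_gap(a, b):
    """Return (absolute difference a - b, difference as a percentage of b)."""
    diff = a - b
    return diff, diff / b * 100


diff, pct = cost_gap(male_avg, female_avg)
print(f"Difference: {diff:.2f} USD ({pct:.1f}% higher for males)")
```

Expressing the gap as a percentage of the female average makes it easier to compare with the differences between the BMI groups in the next section.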

#### 6. Look at the costs based on Body Mass Index (BMI)

In [43]:
def get_average_costs_by_bmi(patient_data):
    bmi_categories = {
        'Underweight': [],
        'Normal weight': [],
        'Overweight': [],
        'Obesity': []
    }

    for data in patient_data:
        bmi = float(data['bmi'])
        charges = float(data['charges'])
        # Note: with these bounds, BMI values in [24.9, 25) and [29.9, 30)
        # match no range and fall through to 'Obesity'.
        if bmi < 18.5:
            bmi_categories['Underweight'].append(charges)
        elif 18.5 <= bmi < 24.9:
            bmi_categories['Normal weight'].append(charges)
        elif 25 <= bmi < 29.9:
            bmi_categories['Overweight'].append(charges)
        else:
            bmi_categories['Obesity'].append(charges)

    return {category: {'count': len(costs), 'average': sum(costs) / len(costs) if costs else 0} for category, costs in bmi_categories.items()}

average_costs_by_bmi = get_average_costs_by_bmi(insurance_data)

print("Average costs based on BMI categories:")

for category, stats in average_costs_by_bmi.items():
    print(f"{category}: ${stats['average']:.2f} (Count: {stats['count']})")

# Compute the ratio once, after the loop, so it is printed a single time.
overweight_obese_count = average_costs_by_bmi['Overweight']['count'] + average_costs_by_bmi['Obesity']['count']
underweight_normal_count = average_costs_by_bmi['Underweight']['count'] + average_costs_by_bmi['Normal weight']['count']

ratio = overweight_obese_count / underweight_normal_count

print(f"Count of overweight/obese patients divided by underweight/normal patients: {ratio:.2f}")

Average costs based on BMI categories:
Underweight: $8852.20 (Count: 20)
Normal weight: $10379.50 (Count: 222)
Overweight: $11017.60 (Count: 374)
Obesity: $15448.68 (Count: 722)
Count of overweight/obese patients divided by underweight/normal patients: 4.53


We group the patients into the typical BMI categories (underweight, normal weight, overweight, obesity) and calculate the average cost for each group. The average cost rises with each category, from 8,852.20 USD for underweight patients to 15,448.68 USD for obese patients.
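The grouping described above can also be written as a lookup against a sorted list of cutoffs. The sketch below assumes the commonly cited adult cutoffs of 18.5, 25 and 30; these differ slightly from the 24.9 and 29.9 bounds used in the cell above, so the counts it produces on the dataset may not match the printed ones exactly. The names `BMI_CUTOFFS`, `BMI_LABELS` and `bmi_category` are hypothetical.

```python
from bisect import bisect_right

# Assumed cutoffs: each value is the lower bound of the next category.
BMI_CUTOFFS = [18.5, 25.0, 30.0]
BMI_LABELS = ['Underweight', 'Normal weight', 'Overweight', 'Obesity']


def bmi_category(bmi):
    """Map a BMI value to its category label using the assumed cutoffs."""
    # bisect_right counts how many cutoffs are <= bmi, which is the label index.
    return BMI_LABELS[bisect_right(BMI_CUTOFFS, bmi)]


for value in (17.0, 18.5, 24.95, 27.9, 29.95, 30.0):
    print(value, bmi_category(value))
```

Because the cutoffs form one contiguous sequence, every BMI value lands in exactly one category and no value between two ranges is left to a catch-all branch.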

### Conclusion

The analysis shows that average costs are higher for patients with a higher BMI, and the increase is largest for obese patients. In addition, overweight and obese patients together outnumber underweight and normal-weight patients by a factor of more than four and a half. This suggests that reducing the number of obese patients could help reduce overall healthcare costs.
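The figures behind this conclusion can be rechecked from the counts and averages printed in section 6. The snippet below is a small sketch of that arithmetic; the dictionary names are hypothetical, and the values are copied from the output above rather than recomputed from `insurance.csv`.

```python
# Counts and average costs copied from the BMI output in section 6.
counts = {'Underweight': 20, 'Normal weight': 222, 'Overweight': 374, 'Obesity': 722}
averages = {'Underweight': 8852.20, 'Normal weight': 10379.50,
            'Overweight': 11017.60, 'Obesity': 15448.68}

total = sum(counts.values())
high_bmi = counts['Overweight'] + counts['Obesity']
low_bmi = counts['Underweight'] + counts['Normal weight']

share_high = high_bmi / total * 100            # share of all patients
count_ratio = high_bmi / low_bmi               # the ratio reported above
cost_ratio = averages['Obesity'] / averages['Normal weight']

print(f"Overweight/obese share of patients: {share_high:.1f}%")
print(f"Overweight/obese vs. underweight/normal count ratio: {count_ratio:.2f}")
print(f"Obesity vs. normal-weight average cost ratio: {cost_ratio:.2f}")
```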