# U.S. Medical Insurance Costs

## Introduction

In this analysis, I will explore an insurance dataset containing information about patients, including their age, sex, BMI, number of children, smoking status, region, and charges.

## Data

### Data Description
The dataset contains the following columns:
- Age: Age of the patient
- Sex: Sex of the patient (female/male)
- BMI: Body Mass Index of the patient
- Children: Number of children the patient has
- Smoker: Smoking status of the patient (yes/no)
- Region: U.S. region where the patient resides (northeast, northwest, southeast, or southwest)
- Charges: Medical charges billed to the patient

## Goals

1. Analyze the demographic characteristics of the individuals in the dataset.
2. Determine the geographic distribution of the individuals.
3. Compare the costs between smokers and non-smokers and analyze their demographics.
4. Determine the average age of individuals with at least one child.

## Analysis (Without Pandas)

In [1]:
import csv

In [2]:
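# function to read the csv file into a list of dictionaries (one per row)
def read_csv(filename):
    data = []
    # newline="" lets the csv module handle line endings itself
    with open(filename, "r", newline="") as file:
        reader = csv.DictReader(file)
        for row in reader:
            data.append(row)
    return data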

In [3]:
# load the dataset
insurance_data = read_csv("insurance.csv")
print(insurance_data[:5])

[{'age': '19', 'sex': 'female', 'bmi': '27.9', 'children': '0', 'smoker': 'yes', 'region': 'southwest', 'charges': '16884.924'}, {'age': '18', 'sex': 'male', 'bmi': '33.77', 'children': '1', 'smoker': 'no', 'region': 'southeast', 'charges': '1725.5523'}, {'age': '28', 'sex': 'male', 'bmi': '33', 'children': '3', 'smoker': 'no', 'region': 'southeast', 'charges': '4449.462'}, {'age': '33', 'sex': 'male', 'bmi': '22.705', 'children': '0', 'smoker': 'no', 'region': 'northwest', 'charges': '21984.47061'}, {'age': '32', 'sex': 'male', 'bmi': '28.88', 'children': '0', 'smoker': 'no', 'region': 'northwest', 'charges': '3866.8552'}]


In [4]:
# extract columns from the dataset
ages = []
sexes = []
bmis = []
num_children = []
smoker_statuses = []
regions = []
charges = []

for row in insurance_data:
    ages.append(float(row['age']))
    sexes.append(row['sex'])
    bmis.append(float(row['bmi']))
    num_children.append(int(row['children']))
    smoker_statuses.append(row['smoker'])
    regions.append(row['region'])
    charges.append(float(row['charges']))
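
The loop above converts each field by hand. As an alternative sketch, the conversions can be kept in a single mapping so that each row is converted in one step. The names `CONVERTERS` and `convert_row` are hypothetical and not part of the notebook; the sample row is the first record printed by the loading cell.

```python
# Hypothetical mapping from column name to the type used in the analysis.
CONVERTERS = {
    "age": float,
    "bmi": float,
    "children": int,
    "charges": float,
}

def convert_row(row):
    """Return a copy of a csv.DictReader row with numeric fields converted."""
    # Columns without an entry in CONVERTERS (sex, smoker, region) stay strings.
    return {key: CONVERTERS.get(key, str)(value) for key, value in row.items()}

# First record of the dataset, as printed by the loading cell.
raw_row = {'age': '19', 'sex': 'female', 'bmi': '27.9', 'children': '0',
           'smoker': 'yes', 'region': 'southwest', 'charges': '16884.924'}

typed_row = convert_row(raw_row)
print(typed_row)
```

Keeping the conversions in one place means a new numeric column only needs a new entry in the mapping.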


In [5]:
# function to calculate the average of a list of numbers
# (used here for ages and later for charges)
def average(values):
    total = 0
    for value in values:
        total += value
    return total / len(values)

average_age = average(ages)
print("Average_age", average_age)

Average_age 39.20702541106129
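
As a cross-check, the standard library's `statistics.fmean` should agree with the hand-written `average` function. The sketch below also guards against an empty list, which would otherwise raise a `ZeroDivisionError`. The helper name `safe_average` is hypothetical, and the sample ages are those of the first five records only.

```python
from statistics import fmean

def safe_average(values):
    """Return the mean of values, or None when the list is empty."""
    if not values:
        return None
    total = 0
    for value in values:
        total += value
    return total / len(values)

# Ages of the first five records printed earlier.
sample_ages = [19.0, 18.0, 28.0, 33.0, 32.0]

print(safe_average(sample_ages))  # 26.0
print(fmean(sample_ages))         # 26.0
print(safe_average([]))           # None
```

Returning `None` is one possible design choice; raising a descriptive `ValueError` would work equally well if an empty group should be treated as an error.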


In [6]:
# function to count the number of insured individuals by region
def count_by_region(regions):
    region_counts = {}
    for region in regions:
        region_counts[region] = region_counts.get(region, 0)  + 1
    return region_counts

region_counts = count_by_region(regions)
print("Count by Region:", region_counts)

Count by Region: {'southwest': 325, 'southeast': 364, 'northwest': 325, 'northeast': 324}
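
Raw counts are easier to compare as shares of the total. The following sketch converts the counts printed above into percentages; `region_shares` is a hypothetical helper, not part of the notebook.

```python
# Counts as printed by the cell above.
region_counts = {'southwest': 325, 'southeast': 364,
                 'northwest': 325, 'northeast': 324}

def region_shares(counts):
    """Return each region's share of the total, in percent."""
    total = sum(counts.values())
    return {region: 100 * count / total for region, count in counts.items()}

shares = region_shares(region_counts)

# Print the regions from largest to smallest share.
for region, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{region}: {share:.1f}%")
```

With these counts, `southeast` accounts for roughly 27% of the 1,338 records and each of the other regions for roughly 24%.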


In [7]:
# function to compare the insurance charges between smokers and non-smokers
def compare_charges_by_smoking_status(charges, smoker_statuses):
    smoker_charges = []
    non_smoker_charges = []
    for charge, smoker_status in zip(charges, smoker_statuses):
        if smoker_status == "yes":
            smoker_charges.append(charge)
        else:
            non_smoker_charges.append(charge)
    avg_charges = {'smoker': average(smoker_charges),
                   'non-smoker': average(non_smoker_charges)}
    return avg_charges

avg_charges_by_smoking = compare_charges_by_smoking_status(charges, smoker_statuses)
print("Average Charges by Smoking Status:", avg_charges_by_smoking)

Average Charges by Smoking Status: {'smoker': 32050.23183153285, 'non-smoker': 8434.268297856199}
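
To put the two averages in proportion, a small sketch can compute their difference and ratio. It uses the values printed above; `compare_groups` is a hypothetical helper, not part of the notebook.

```python
# Averages as printed by the cell above.
avg_charges_by_smoking = {'smoker': 32050.23183153285,
                          'non-smoker': 8434.268297856199}

def compare_groups(averages, group, baseline):
    """Return (difference, ratio) of one group's average relative to a baseline group."""
    difference = averages[group] - averages[baseline]
    ratio = averages[group] / averages[baseline]
    return difference, ratio

difference, ratio = compare_groups(avg_charges_by_smoking, 'smoker', 'non-smoker')
print(f"Difference: {difference:,.2f}")
print(f"Ratio: {ratio:.2f}")
```

On these figures, smokers are charged about 23,616 more on average, which is roughly 3.8 times the non-smoker average.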


In [8]:
# function to determine the average age of individuals with at least one child
def average_age_with_children(ages, num_children):
    ages_with_children = [age for age, children in zip(ages, num_children) if children >= 1]
    return average(ages_with_children)

# store the result under a different name so the function is not overwritten
# (otherwise re-running this cell would fail)
avg_age_with_children = average_age_with_children(ages, num_children)
print("Average Age of Individuals with Children:", avg_age_with_children)


Average Age of Individuals with Children: 39.78010471204188


## Conclusion

Here are the key findings from my analysis:

1. **Average Age**: The average age of insured individuals in the dataset is approximately 39 years.

2. **Distribution by Region**: Insured individuals are spread fairly evenly across the four regions. `southeast` has the most (`364`), followed by `southwest` (`325`) and `northwest` (`325`), with `northeast` (`324`) last.

3. **Comparison of Insurance Charges**: I compared the average insurance charges of smokers and non-smokers. Smokers have much higher average charges (`32,050`) than non-smokers (`8,434`), roughly 3.8 times as much.

4. **Average Age of Individuals with Children**: The average age of individuals who have at least one child is about `39.8 years`, only slightly above the overall average.

These findings can help insurance companies understand their customer demographics and inform their pricing strategies. Further analysis could investigate additional factors, such as BMI and sex, and their impact on insurance charges.
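
The further analysis suggested above could start from a generic grouping helper, sketched here under the same no-pandas constraint. `average_by_group` is a hypothetical function, and the sample data covers only the first five records, so the printed numbers are not representative of the full dataset.

```python
def average_by_group(values, groups):
    """Return {group label: average of the values carrying that label}."""
    totals = {}
    counts = {}
    for value, group in zip(values, groups):
        totals[group] = totals.get(group, 0) + value
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}

# Charges and sex of the first five records only (illustrative sample).
sample_charges = [16884.924, 1725.5523, 4449.462, 21984.47061, 3866.8552]
sample_sexes = ['female', 'male', 'male', 'male', 'male']

sample_result = average_by_group(sample_charges, sample_sexes)
print(sample_result)
```

With the full lists from the analysis, the same helper could be called as `average_by_group(charges, sexes)` or `average_by_group(charges, regions)`.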

Overall, this analysis contributes to our understanding of the insurance landscape and can help inform decision-making processes for insurance providers.