# U.S. Medical Insurance Costs

## Overview

This notebook explores a dataset of U.S. medical insurance records and summarizes patterns across demographic and lifestyle attributes. The analysis organizes the raw CSV data into Python lists and encapsulates common computations in a small utility class.

The dataset (insurance.csv) contains the following columns:
- age (years)
- sex (female/male)
- bmi (body mass index)
- children (number of dependents)
- smoker (yes/no)
- region (U.S. region)
- charges (annual medical insurance cost in USD)

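Results include the average patient age, BMI, and number of children; the percentage of smokers; the ratio of the average charge for smokers to that for non-smokers; the average annual insurance charge; counts by sex; and the set of unique regions represented. A consolidated dictionary of all columns is also produced to facilitate later exploration.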

Implementation notes are embedded as TODO comments in code cells.

In [7]:
import csv

## Data containers

Seven lists serve as simple, readable containers for each column from the CSV file: ages, sexes, bmis, children, smokers, regions, and charges. This structure mirrors the file layout and keeps the analysis approachable without external libraries.

In [8]:
ages = []
sexes = []
bmis = []
children = []
smokers = []
regions = []
charges = []

## Loading utility

A small helper function reads the CSV with Python’s built-in csv module and appends values from a specified column into a provided list. Using a single loader avoids repeating similar file-reading loops and centralizes assumptions about the input format.

In [9]:
def load_data(list_name, csv_file, column_name):
    """Append every value of column_name in csv_file to list_name (as strings) and return the list."""
    with open(csv_file, newline='') as csvfile:
        csv_dict = csv.DictReader(csvfile)
        for row in csv_dict:
            list_name.append(row[column_name])

    return list_name

## Column ingestion

Each list is populated from the corresponding column in the CSV file (age, sex, bmi, children, smoker, region, charges). Values are read as strings at first, deferring type conversion until the analysis step where numeric operations are needed.

In [10]:
# call load_data(...) for each column
load_data(ages, 'insurance.csv', 'age')
load_data(sexes, 'insurance.csv', 'sex')
load_data(bmis, 'insurance.csv', 'bmi')
load_data(children, 'insurance.csv', 'children')
load_data(smokers, 'insurance.csv', 'smoker')
load_data(regions, 'insurance.csv', 'region')
load_data(charges, 'insurance.csv', 'charges')

# print first 10 items in each list to verify data was loaded correctly
print(ages[:10])
print(sexes[:10])
print(bmis[:10])
print(children[:10])
print(smokers[:10])
print(regions[:10])
print(charges[:10])

['19', '18', '28', '33', '32', '31', '46', '37', '37', '60']
['female', 'male', 'male', 'male', 'male', 'female', 'female', 'female', 'male', 'female']
['27.9', '33.77', '33', '22.705', '28.88', '25.74', '33.44', '27.74', '29.83', '25.84']
['0', '1', '3', '0', '0', '0', '1', '3', '2', '0']
['yes', 'no', 'no', 'no', 'no', 'no', 'no', 'no', 'no', 'no']
['southwest', 'southeast', 'southeast', 'northwest', 'northwest', 'southeast', 'southeast', 'northwest', 'northeast', 'northwest']
['16884.924', '1725.5523', '4449.462', '21984.47061', '3866.8552', '3756.6216', '8240.5896', '7281.5056', '6406.4107', '28923.13692']
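
Each `load_data` call above reopens and rescans the file, so loading seven columns means seven passes. As a minimal sketch of an alternative, a single `csv.DictReader` pass can fill every list at once. The helper name `load_columns` is hypothetical and not part of this notebook, and an in-memory sample built from the first two rows of the output above stands in for `insurance.csv` so the snippet runs on its own.

```python
import csv
import io


def load_columns(csvfile, column_names):
    """Read an open CSV file once and return {column name: list of string values}.

    Hypothetical helper for illustration; values stay as strings, as in load_data.
    """
    columns = {name: [] for name in column_names}
    for row in csv.DictReader(csvfile):
        for name in column_names:
            columns[name].append(row[name])
    return columns


# Two rows copied from the printed output above, standing in for insurance.csv.
sample = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "19,female,27.9,0,yes,southwest,16884.924\n"
    "18,male,33.77,1,no,southeast,1725.5523\n"
)
columns = load_columns(sample, ["age", "sex", "charges"])
print(columns["age"])
print(columns["charges"])
```

With the real file, the same function would presumably be called on a handle from `open('insurance.csv', newline='')`.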


## Analysis class

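The PatientsInfo class encapsulates simple computations on the loaded lists. Methods include:
- average_age: computes the mean patient age
- average_bmi: computes the mean BMI
- average_children: computes the mean number of children
- smoker_percentage: computes the percentage of patients who smoke
- smoker_to_charge_ratio: computes the ratio of the mean charge for smokers to the mean charge for non-smokers
- average_charge: computes the mean annual charge
- analyze_sexes: reports counts by sex
- unique_regions: collects the distinct regions present
- create_dictionary: assembles a dictionary mapping each column name to its list of values
- print_summary: prints the computed statistics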

Numeric casting (e.g., age to int, charges to float) occurs inside methods to keep the data containers uniform.

In [11]:
# TODO: implement the PatientsInfo class and its methods
class PatientsInfo:
    def __init__(self, ages, sexes, bmis, children, smokers, regions, charges):
        self.ages = ages
        self.sexes = sexes
        self.bmis = bmis
        self.children = children
        self.smokers = smokers
        self.regions = regions
        self.charges = charges

    def average_age(self):
        return sum(map(int, self.ages)) / len(self.ages)

    def average_bmi(self):
        return sum(map(float, self.bmis)) / len(self.bmis)

    def average_children(self):
        return sum(map(int, self.children)) / len(self.children)

    def smoker_percentage(self):
        return sum(1 for x in self.smokers if x == "yes") / len(self.smokers) * 100
    
    def smoker_to_charge_ratio(self):
        # Ratio of the mean charge for smokers to the mean charge for non-smokers.
        smoker_charges = [float(c) for s, c in zip(self.smokers, self.charges) if s == "yes"]
        non_smoker_charges = [float(c) for s, c in zip(self.smokers, self.charges) if s == "no"]
        avg_smoker_charge = sum(smoker_charges) / len(smoker_charges) if smoker_charges else 0
        avg_non_smoker_charge = sum(non_smoker_charges) / len(non_smoker_charges) if non_smoker_charges else 0
        return avg_smoker_charge / avg_non_smoker_charge if avg_non_smoker_charge != 0 else float('inf')
    
    def average_charge(self):
        return sum(map(float, self.charges)) / len(self.charges)

    def analyze_sexes(self):
        females = sum(1 for i in self.sexes if i == "female")
        males = sum(1 for i in self.sexes if i == "male")
        return {"female": females, "male": males}

    def unique_regions(self):
        return set(self.regions)

    def create_dictionary(self):
        return {
            "ages": self.ages,
            "sexes": self.sexes,
            "bmis": self.bmis,
            "children": self.children,
            "smokers": self.smokers,
            "regions": self.regions,
            "charges": self.charges
        }
    
    def print_summary(self):
        print(f"Average Age: {self.average_age()}")
        print(f"Average BMI: {self.average_bmi()}")
        print(f"Average Number of Children: {self.average_children()}")
        print(f"Smoker Percentage: {self.smoker_percentage()}%")
        print(f"Smoker to Charge Ratio: {self.smoker_to_charge_ratio()}")
        print(f"Average Charge: ${self.average_charge()}")
        print(f"Unique Regions: {self.unique_regions()}")
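
`create_dictionary` returns a column-oriented view, with one list per attribute. For per-patient lookups, a row-oriented view may be more convenient. The following is a sketch only: the function name `to_records` and the decision to cast numeric fields and turn `smoker` into a boolean are assumptions, not part of the class above.

```python
def to_records(ages, sexes, bmis, children, smokers, regions, charges):
    """Zip parallel string lists into one dict per patient, casting numeric fields."""
    return [
        {
            "age": int(age),
            "sex": sex,
            "bmi": float(bmi),
            "children": int(kids),
            "smoker": smoker == "yes",
            "region": region,
            "charges": float(charge),
        }
        for age, sex, bmi, kids, smoker, region, charge in zip(
            ages, sexes, bmis, children, smokers, regions, charges
        )
    ]


# First two patients from the loaded lists printed earlier.
records = to_records(
    ["19", "18"], ["female", "male"], ["27.9", "33.77"], ["0", "1"],
    ["yes", "no"], ["southwest", "southeast"], ["16884.924", "1725.5523"],
)
print(records[0]["age"], records[0]["smoker"], records[1]["region"])
```

Casting once while building the records keeps later filters and aggregations free of repeated `int(...)` and `float(...)` calls.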
        

## Results checkpoint

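An instance of PatientsInfo provides a quick snapshot of the dataset: the average age, BMI, and number of children, the share of smokers, the smoker-to-non-smoker charge ratio, the mean annual charge, and the set of represented regions. Sex counts and the dictionary representation are returned by their methods but are not part of the printed summary; the dictionary is intended for downstream uses such as plotting or grouping.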

In [12]:
# TODO: create an instance and call each method
patients_info = PatientsInfo(ages, sexes, bmis, children, smokers, regions, charges)

# Each method returns a value. These bare calls display nothing in a notebook cell;
# only print_summary() below produces output.
patients_info.average_age()
patients_info.average_bmi()
patients_info.average_children()
patients_info.smoker_percentage()
patients_info.smoker_to_charge_ratio()
patients_info.average_charge()
patients_info.analyze_sexes()
patients_info.unique_regions()
patients_info.create_dictionary()

patients_info.print_summary()

Average Age: 39.20702541106129
Average BMI: 30.663396860986538
Average Number of Children: 1.0949177877429
Smoker Percentage: 20.47832585949178%
Smoker to Charge Ratio: 3.8000014582983206
Average Charge: $13270.422265141257
Unique Regions: {'southeast', 'southwest', 'northwest', 'northeast'}
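
The summary prints floats at full precision, which is harder to read than necessary. A minimal sketch of rounding at display time with format specifiers follows, using values copied from the output above; the variable names are illustrative and not defined elsewhere in the notebook.

```python
# Values copied from the summary output above.
average_age = 39.20702541106129
smoker_percentage = 20.47832585949178
average_charge = 13270.422265141257

# Round only when formatting, so the underlying values keep full precision.
age_line = f"Average Age: {average_age:.1f}"
smoker_line = f"Smoker Percentage: {smoker_percentage:.1f}%"
charge_line = f"Average Charge: ${average_charge:,.2f}"

print(age_line)     # Average Age: 39.2
print(smoker_line)  # Smoker Percentage: 20.5%
print(charge_line)  # Average Charge: $13,270.42
```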


## Further exploration

Additional avenues include distributional summaries (min, max, median, standard deviation of age), comparisons of average charges by smoker status, region, or number of children, BMI category analyses, and basic correlation checks between numeric attributes. Such extensions can remain in pure Python or transition to libraries like pandas and numpy for convenience.
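
Two of these ideas can be sketched in pure Python with the standard `statistics` module. The helper name `mean_charge_by` is hypothetical, and the short `sample_*` lists hold only the first five values from the loaded output above rather than the full dataset, so the resulting numbers are illustrative only.

```python
import statistics
from collections import defaultdict


def mean_charge_by(keys, charges):
    """Group charges by the parallel `keys` list and return {key: mean charge}."""
    groups = defaultdict(list)
    for key, charge in zip(keys, charges):
        groups[key].append(float(charge))
    return {key: statistics.mean(values) for key, values in groups.items()}


# First five entries of each list, as printed earlier in the notebook.
sample_ages = ["19", "18", "28", "33", "32"]
sample_smokers = ["yes", "no", "no", "no", "no"]
sample_charges = ["16884.924", "1725.5523", "4449.462", "21984.47061", "3866.8552"]

# Distributional summary of age.
age_values = [int(a) for a in sample_ages]
age_summary = {
    "min": min(age_values),
    "max": max(age_values),
    "median": statistics.median(age_values),
    "stdev": statistics.stdev(age_values),  # sample standard deviation
}
print(age_summary)

# Average charge by smoker status; any parallel list (e.g. regions) works as keys.
charge_by_smoker = mean_charge_by(sample_smokers, sample_charges)
print(charge_by_smoker)
```

Passing the full `smokers`, `regions`, or `children` list as `keys` together with `charges` would give the corresponding group comparisons for the whole dataset.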