# U.S. Medical Insurance Costs

For this project, we will be analyzing an insurance dataset in the form of a .csv file. 

# Importing the necessary libraries

First, we import the library needed for the project.

In [77]:
import csv 

# Reading and Storing the data

Next, we will open 'insurance.csv' and store the data in associated lists.

In [78]:
ages = []
sex = []
bmi = []
children = []
smoker_status = []
region = []
insurance_charges = []

In [79]:
def load_data(lst, csv_file, column_name):
    # Open the file passed in by the caller instead of a hard-coded name
    with open(csv_file, newline='') as insurance_file:
        csv_dict = csv.DictReader(insurance_file)
        for row in csv_dict:
            lst.append(row[column_name])
    return lst

In [85]:
load_data(ages, 'insurance.csv', 'age');
load_data(sex, 'insurance.csv', 'sex');
load_data(bmi, 'insurance.csv', 'bmi');
load_data(children, 'insurance.csv', 'children');
load_data(smoker_status, 'insurance.csv', 'smoker');
load_data(region, 'insurance.csv', 'region');
load_data(insurance_charges, 'insurance.csv', 'charges');
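
Each call above reopens and rereads the file. As a minimal sketch of an alternative, the hypothetical helper `load_columns` below (not part of the original notebook) reads the file once and returns every requested column as a list of strings.

```python
import csv

def load_columns(csv_file, column_names):
    """Read csv_file once and return {column_name: list of string values}.

    Hypothetical helper for illustration; values stay as strings,
    just as csv.DictReader yields them.
    """
    columns = {name: [] for name in column_names}
    with open(csv_file, newline='') as f:
        for row in csv.DictReader(f):
            for name in column_names:
                columns[name].append(row[name])
    return columns

# Example call (assumes 'insurance.csv' is in the working directory):
# data = load_columns('insurance.csv', ['age', 'smoker', 'charges'])
# ages = data['age']
```

The trade-off is one dictionary of lists instead of seven separate variables, which keeps the columns together when passing them to other functions.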

# Analysis 

Function to find the average age in the dataset. 

In [81]:
def averageAge(ages):
    total = 0  # avoid shadowing the built-in sum()
    for age in ages:
        total += int(age)
    return total / len(ages)

print(averageAge(ages))

39.20702541106129


We can see that the average age is `~39`
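
The hand-written loop can be cross-checked against the standard library. The sketch below uses `statistics.mean` on a small made-up sample (the values are illustrative, not taken from the dataset); `average_age` is a hypothetical name.

```python
from statistics import mean

def average_age(ages):
    """Return the mean of ages given as strings, as the csv module reads them."""
    return mean(int(age) for age in ages)

# Made-up sample for illustration only
sample_ages = ['19', '18', '28', '33']
print(average_age(sample_ages))  # 24.5
```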

Finding the region that appears most often in the dataset.

In [82]:
def averageRegion(regions):
        regionCount = {}
        for region in regions:
            if region not in regionCount:
                # The first occurrence counts as 1, not 0
                regionCount[region] = 1
            else:
                regionCount[region] += 1
        return regionCount

print(averageRegion(region))

{'southwest': 325, 'southeast': 364, 'northwest': 325, 'northeast': 324}


We can see that the most common region is the `Southeast`, with `364` records. No region holds a majority, since the four regions are close to evenly split.
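
The counting step can also be sketched with `collections.Counter` from the standard library, which starts every key at the correct count and can report the most frequent value directly. The sample list is made up for illustration and `count_values` is a hypothetical name.

```python
from collections import Counter

def count_values(values):
    """Return a Counter mapping each distinct value to its number of occurrences."""
    return Counter(values)

# Made-up sample for illustration only
sample_regions = ['southeast', 'southwest', 'southeast', 'northwest']
counts = count_values(sample_regions)
print(counts.most_common(1))  # [('southeast', 2)]
```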

Analyzing the difference in cost between smokers and non-smokers.

In [83]:
def smokerDifference(smoker_status, insurance_charges):
        smokerPatients = []
        nonSmokerPatients = []
        smokerSum = 0
        nonSmokerSum = 0
        
        for i,patient in enumerate(smoker_status):
            if(patient == "yes"):
                smokerPatients.append(i)
            else:
                nonSmokerPatients.append(i)
        
        for smoker in smokerPatients:
            smokerSum += float(insurance_charges[smoker])
        # Compute each average once, after its sum is complete
        smokerAverage = smokerSum / len(smokerPatients)
            
        for nonSmoker in nonSmokerPatients:
            nonSmokerSum += float(insurance_charges[nonSmoker])
        nonSmokerAverage = nonSmokerSum / len(nonSmokerPatients)
         
        smokerDifference = abs(smokerAverage - nonSmokerAverage)
        
        print("The average insurance charge of smokers: " + str(smokerAverage))
        print("The average insurance charge of non-smokers: " + str(nonSmokerAverage))
        print("The insurance charge difference between smokers and non-smokers: " + str(smokerDifference))
        print("Total number of smokers: " + str(len(smokerPatients)))
        print("Total number of non-smokers: " + str(len(nonSmokerPatients)))
        
# The function prints its own results and returns nothing, so call it directly
smokerDifference(smoker_status, insurance_charges)

The average insurance charge of smokers: 32050.23183153285
The average insurance charge of non-smokers: 8434.268297856199
The insurance charge difference between smokers and non-smokers: 23615.96353367665
Total number of smokers: 274
Total number of non-smokers: 1064


With this function, we can see that the difference in average insurance charges between smokers and non-smokers is `~23616`. On average, smokers are charged more than non-smokers. We can also see that most people in the dataset are non-smokers (`1064` versus `274`). This imbalance may indicate bias in how the data was sampled, and it means the smoker average rests on a much smaller group.
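
A variant that returns the averages instead of printing them makes the comparison easier to reuse. The sketch below is one possible shape, not the notebook's own method: `average_charge_by_group` is a hypothetical helper, and the sample values are made up.

```python
def average_charge_by_group(labels, charges):
    """Return {label: average charge} for parallel lists of labels and charges."""
    totals = {}
    counts = {}
    for label, charge in zip(labels, charges):
        totals[label] = totals.get(label, 0.0) + float(charge)
        counts[label] = counts.get(label, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}

# Made-up sample for illustration only
sample_status = ['yes', 'no', 'no', 'yes']
sample_charges = ['300.0', '100.0', '200.0', '500.0']
averages = average_charge_by_group(sample_status, sample_charges)
print(averages)                               # {'yes': 400.0, 'no': 150.0}
print(abs(averages['yes'] - averages['no']))  # 250.0
```

Because the function groups by whatever labels it sees, the same helper would also work for the `sex` or `region` columns.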

Average age for someone with at least one child.

In [84]:
def childrenAgeAverage(children, ages):
    patientsWithChildren = []
    ageSum = 0
        
    for i,count in enumerate(children):
        if int(count) > 0:
            patientsWithChildren.append(i)
        
    for i in patientsWithChildren:
        ageSum += int(ages[i])
            
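    # Divide by the number of patients with children, not the whole dataset
    return ageSum / len(patientsWithChildren)

print(childrenAgeAverage(children, ages))


The divisor has to be the number of patients with at least one child. Dividing by `len(ages)` instead gives `~22.7`, which understates the result, because the age sum covers only patients with children while the divisor counts everyone in the dataset.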
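
As a sanity check on the divisor, the sketch below computes the same kind of subgroup average on a tiny made-up sample where the answer can be verified by hand. `average_age_with_children` is a hypothetical name, and returning `None` for an empty group is an assumption of this sketch.

```python
def average_age_with_children(children, ages):
    """Average age over patients whose child count is at least 1."""
    parent_ages = [int(age) for count, age in zip(children, ages) if int(count) > 0]
    if not parent_ages:
        return None  # no patient with children, so there is no average
    return sum(parent_ages) / len(parent_ages)

# Made-up sample for illustration only: two of the four patients have children
sample_children = ['0', '2', '1', '0']
sample_ages = ['20', '40', '50', '30']
print(average_age_with_children(sample_children, sample_ages))  # 45.0
# Dividing the same sum (90) by all 4 patients would give 22.5 instead.
```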

# Conclusion

In this project, we analyzed a dataset containing information about U.S. medical insurance costs. We examined the average age, the most common region, the difference in insurance cost between smokers and non-smokers, and the average age of someone with at least one child.