# U.S. Medical Insurance Costs

Goal: This project investigates a CSV file of medical insurance costs using Python fundamentals. It analyzes various attributes within insurance.csv to learn more about the patients in the file and to gain insight into potential use cases for the dataset.

Actions: Provide the medical insurance industry with analysis to improve customer experience.

Analyses:
 1. Basic information.
 2. Find out the average age of the patients in the dataset.
 3. Analyze where a majority of the individuals are from.
 4. Look at the different costs between smokers vs. non-smokers.
 5. Figure out what the average age is for someone who has at least one child in this dataset.

First step: Load the dataset.

In [9]:
import csv
full_data_set = []

with open('insurance.csv', 'r', newline='') as insurance_csv_dataset:
    csv_reader = csv.DictReader(insurance_csv_dataset)
    for row in csv_reader:
        full_data_set.append(row)

# Create separate lists for each category
ages = []
sexes = []
bmis = []
children = []
smokers = []
regions = []
charges = []

#Extract data from the full_data_set and populate the separate lists
for data in full_data_set:
    ages.append(int(data['age']))
    sexes.append(data['sex'])
    bmis.append(float(data['bmi']))
    children.append(int(data['children']))
    smokers.append(data['smoker'])
    regions.append(data['region'])
    charges.append(float(data['charges']))

# Now you have separate lists for each category
#print(ages)
#print(sexes)
#print(bmis)
#print(children)
#print(smokers)
#print(regions)
#print(charges)
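
The loading and column-extraction steps above can also be folded into one helper. The following is a minimal sketch rather than the notebook's own method: `load_columns`, `CONVERTERS`, and the three-row `SAMPLE_CSV` are hypothetical names, and the sample text stands in for insurance.csv so that the snippet runs on its own.

```python
import csv
import io

# Hypothetical three-row sample with the same header as insurance.csv.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
19,female,27.9,0,yes,southwest,16884.924
18,male,33.77,1,no,southeast,1725.5523
28,male,33.0,3,no,southeast,4449.462
"""

# Converter for each numeric column; columns not listed stay as strings.
CONVERTERS = {"age": int, "bmi": float, "children": int, "charges": float}


def load_columns(file_obj):
    """Read CSV rows and return a dict mapping column name to a list of values."""
    columns = {}
    for row in csv.DictReader(file_obj):
        for name, value in row.items():
            convert = CONVERTERS.get(name, str)
            columns.setdefault(name, []).append(convert(value))
    return columns


columns = load_columns(io.StringIO(SAMPLE_CSV))
print(columns["age"])
print(columns["region"])
```

With the real file, the same helper could be called as `load_columns(open('insurance.csv', newline=''))` inside a `with` block. Keeping the converters in one dict means a new numeric column needs only one extra entry.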

In [11]:
#Basic data

total_patients = len(full_data_set)
print("Total number of patients is: " + str(total_patients))

def total_men_calculation(lst):
    # Count the entries labeled 'male'
    number_of_men = 0
    for patient in lst:
        if patient == 'male':
            number_of_men += 1
    return number_of_men

def total_women_calculation(lst):
    # Count the entries labeled 'female'
    number_of_women = 0
    for patient in lst:
        if patient == 'female':
            number_of_women += 1
    return number_of_women

total_men = total_men_calculation(sexes)
print("Total number of men is: " + str(total_men))
total_women = total_women_calculation(sexes)
print("Total number of women is: " + str(total_women))

Total number of patients is: 1338
Total number of men is: 676
Total number of women is: 662
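
The two counting functions above follow the same pattern, so the tally can also be done in one pass. This is a small sketch using `collections.Counter` from the standard library; `sample_sexes` is a hypothetical stand-in for the notebook's `sexes` list.

```python
from collections import Counter

# Hypothetical sample values standing in for the notebook's `sexes` list.
sample_sexes = ["female", "male", "male", "female", "male"]

# Counter tallies every distinct label in a single pass over the list.
sex_counts = Counter(sample_sexes)

print("Patients:", len(sample_sexes))
print("Men:", sex_counts["male"])
print("Women:", sex_counts["female"])
```

A `Counter` returns 0 for labels it has never seen, so a misspelled or missing category does not raise an error.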


Build out analysis functions or class methods

In [13]:
# Calculate the average age of the patients in the dataset
average_age_patients = round(sum(ages) / len(ages), 1)
print("The average age of the patients is:", average_age_patients, "years old")

The average age of the patients is: 39.2 years old
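
The same average can be computed with the standard `statistics` module, which also provides a median for comparison. This is a sketch only; `sample_ages` is a hypothetical stand-in for the notebook's `ages` list.

```python
import statistics

# Hypothetical sample values standing in for the notebook's `ages` list.
sample_ages = [19, 18, 28, 33, 32]

mean_age = round(statistics.mean(sample_ages), 1)
median_age = statistics.median(sample_ages)

print("Mean age:", mean_age)
print("Median age:", median_age)
```

Comparing the mean with the median is a quick way to see whether a few very young or very old patients are pulling the average.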


In [20]:
# Analyze where a majority of the individuals are from.

# First, find unique regions
def unique_regions(lst):
    unique_regions_list = []
    for region in lst:
        if region not in unique_regions_list:
            unique_regions_list.append(region)
    return unique_regions_list

print(unique_regions(regions))

def majority_indiv_regions(lst):
    southwest = 0
    southeast = 0
    northwest = 0
    northeast = 0
    
    for region in lst:
        if region == 'southwest':
            southwest += 1
        elif region == 'southeast':
            southeast += 1
        elif region == 'northwest':
            northwest += 1
        elif region == 'northeast':
            northeast += 1
            
    print("In southwest, there are " + str(southwest) + " patients.")
    print("In southeast, there are " + str(southeast) + " patients.")
    print("In northwest, there are " + str(northwest) + " patients.")
    print("In northeast, there are " + str(northeast) + " patients.")

majority_indiv_regions(regions)

['southwest', 'southeast', 'northwest', 'northeast']
In southwest, there are 325 patients.
In southeast, there are 364 patients.
In northwest, there are 325 patients.
In northeast, there are 324 patients.
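
The region tally above hard-codes the four region names. A possible alternative, sketched here with `collections.Counter`, discovers the regions from the data and reports the largest one along with its share of patients. `sample_regions` is a hypothetical stand-in for the notebook's `regions` list.

```python
from collections import Counter

# Hypothetical sample values standing in for the notebook's `regions` list.
sample_regions = [
    "southwest", "southeast", "southeast",
    "northwest", "northeast", "southeast",
]

region_counts = Counter(sample_regions)

# most_common(1) returns a list holding the single (region, count) pair
# with the highest count.
top_region, top_count = region_counts.most_common(1)[0]
share = top_count / len(sample_regions)

for region, count in region_counts.items():
    print(f"In {region}, there are {count} patients.")
print(f"Largest region: {top_region} ({share:.1%} of patients)")
```

In the notebook's output, the southeast has the most patients (364 of 1338, about 27%), which makes it the largest group rather than a strict majority.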


In [16]:
# Look at the different costs between smokers vs. non-smokers.

# Combine smoker and costs

smoker_costs = list(zip(smokers, charges))

# Define a function to calculate the average cost difference between smokers and non-smokers

def average_cost_difference_smoker_vs_nonsmokers(smokercosts):
    
    smoker_costs_smoker = 0
    num_smokers = 0
    smoker_costs_non_smoker = 0
    num_non_smokers = 0
    
    for smoker in smokercosts:
        if smoker[0] == 'yes':
            smoker_costs_smoker += smoker[1]
            num_smokers += 1
        elif smoker[0] == 'no':
            smoker_costs_non_smoker += smoker[1]
            num_non_smokers += 1
    
    average_smoker_cost = smoker_costs_smoker / num_smokers
    average_non_smoker_cost = smoker_costs_non_smoker / num_non_smokers
    difference_smoker_costs = average_smoker_cost - average_non_smoker_cost
    
    print("The average smoker's insurance cost is: " + str(round(average_smoker_cost, 2)) + " dollars.")
    print("The average non-smoker's insurance cost is: " + str(round(average_non_smoker_cost, 2)) + " dollars.")
    print("The difference in costs on average is: " + str(round(difference_smoker_costs, 2)) + " dollars.")

average_cost_difference_smoker_vs_nonsmokers(smoker_costs)

The average smoker's insurance cost is: 32050.23 dollars.
The average non-smoker's insurance cost is: 8434.27 dollars.
The difference in costs on average is: 23615.96 dollars.
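
The comparison above prints its results inside the function. A variant that returns the averages makes them reusable in later cells. The following is a minimal sketch under that assumption: `average_cost_by_group` and `sample_smoker_costs` are hypothetical names, and the sample pairs are made up for illustration.

```python
from collections import defaultdict

# Hypothetical (smoker, charge) pairs standing in for `smoker_costs`.
sample_smoker_costs = [
    ("yes", 30000.0),
    ("no", 8000.0),
    ("yes", 34000.0),
    ("no", 9000.0),
]


def average_cost_by_group(pairs):
    """Return a dict mapping each group label to its average cost."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, cost in pairs:
        totals[group] += cost
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


averages = average_cost_by_group(sample_smoker_costs)
difference = averages["yes"] - averages["no"]

print(f"Average smoker cost: {averages['yes']:.2f} dollars")
print(f"Average non-smoker cost: {averages['no']:.2f} dollars")
print(f"Difference: {difference:.2f} dollars")
```

Because the function groups by whatever label appears first in each pair, the same helper could be reused for other columns such as region or sex.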


In [18]:
# Figure out what the average age is for someone who has at least one child in this dataset

# Combine age and children

age_with_children = list(zip(ages, children))

# Define a function to calculate the average age for those with at least one child

def average_age_atleast_1_child(lst):
    sum_age_with_children = 0
    patient_with_child = 0
    
    for patient in lst:
        if patient[1] > 0:
            sum_age_with_children += patient[0]
            patient_with_child += 1
    
    average_age_with_1_or_more_child = round(sum_age_with_children / patient_with_child, 1)
    
    print("The average age for someone with at least one child in the patient data set is: " + str(average_age_with_1_or_more_child) + " years old.")

average_age_atleast_1_child(age_with_children)

The average age for someone with at least one child in the patient data set is: 39.8 years old.
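
The filter-then-average step above divides by the number of matching patients, which would fail if no patient had a child. This sketch returns the value instead of printing it and guards against that case. `average_age_with_children` and the sample lists are hypothetical.

```python
# Hypothetical sample values standing in for the notebook's `ages` and `children` lists.
sample_ages = [19, 18, 28, 33, 32]
sample_children = [0, 1, 3, 0, 0]


def average_age_with_children(ages, children):
    """Return the average age of patients with at least one child, or None if there are none."""
    matching = [age for age, kids in zip(ages, children) if kids > 0]
    if not matching:
        return None  # avoid dividing by zero when nobody has children
    return round(sum(matching) / len(matching), 1)


result = average_age_with_children(sample_ages, sample_children)
print("Average age with at least one child:", result)
```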


# Summary

1. The total number of patients is 1338: 676 men and 662 women.
2. The average age of the patients is 39.2 years.
3. Patients per region:
- 325 in the southwest.
- 364 in the southeast.
- 325 in the northwest.
- 324 in the northeast.
4. Insurance costs:
- The average smoker's insurance cost is 32050.23 dollars.
- The average non-smoker's insurance cost is 8434.27 dollars.
- The difference in average costs is 23615.96 dollars.
5. The average age for someone with at least one child in the dataset is 39.8 years.