# U.S. Medical Insurance Costs

Given a set of insurance cost data, provided in insurance.csv, the following analysis seeks to determine if the region in which a customer lives has a significant impact on the cost of their health insurance. This will be done by comparing the average costs of coverage in each region

In [37]:
#imports necessary csv module and sets kernel to display matplotlib in notebook mode
%matplotlib notebook
import csv


regions= []
charges= []

#Opens the insurance csv file and reads through each row, appending the regions to the regions list, and the charges to the charges list
with open('insurance.csv', newline = '') as insurance_f:
    csv_dict = csv.DictReader(insurance_f)
    for row in csv_dict:
        regions.append(row['region'])
        charges.append(row['charges'])

#Declares lists for filling with associated charges        
southwest = []
southeast = []
northwest = []
northeast = []
     
#Zips region and charge lists into paired tuples for iteration    
zipped_lst = zip(regions, charges)

#Iterates through list of tuples, populates region lists with associated charges. Also Converts charges from string to float.
for i in zipped_lst:
    if i[0] == 'southwest':
        southwest.append(float(i[1]))
    if i[0] == 'southeast':
        southeast.append(float(i[1]))
    if i[0] == 'northwest':
        northwest.append(float(i[1]))
    else:
        northeast.append(float(i[1]))


In [42]:
#Function for calculating average cost for a region
def region_average(lst):
    return sum(lst) / len(lst)



In [44]:
#Calculates and stores the averages to associated variables
southwest_average = region_average(southwest)
southeast_average = region_average(southeast)
northwest_average = region_average(northwest)
northeast_average = region_average(northeast)


12346.93737729231


In [55]:
#imports statistics module for easy calculation of more complex statistical indicators
import statistics

#calculate, store, and print the variance of each sample set
southwest_variance = statistics.variance(southwest)
southeast_variance = statistics.variance(southeast)
northwest_variance = statistics.variance(northwest)
northeast_variance = statistics.variance(northeast)

print(
    f"""The variance for the southwest set is {southwest_variance}, the variance for the southeast set is {southeast_variance}, 
the variance for the northwest set is {northwest_variance}, and the variance for the northeast set is {northeast_variance}.
"""
    )

#calculate, store and print the standard deviation of each sample set
southwest_stdv = statistics.stdev(southwest)
southeast_stdv = statistics.stdev(southeast)
northwest_stdv = statistics.stdev(northwest)
northeast_stdv = statistics.stdev(northeast)

print(
f"""The standard deviation for the southwest set is {southwest_stdv} USD, the standard deviation for the southeast set is {southeast_stdv} USD, 
the standard deviation for the northwest set is {northwest_stdv} USD, and the standard deviation for the northeast set is {northeast_stdv} USD."""
)

The variance for the southwest set is 133568388.76678441, the variance for the southeast set is 195191595.78332722, 
the variance for the northwest set is 122595316.3610199, and the variance for the northeast set is 154190820.98318765.

The standard deviation for the southwest set is 11557.179100748781 USD, the standard deviation for the southeast set is 13971.098588991748 USD, 
the standard deviation for the northwest set is 11072.276927579976 USD, and the standard deviation for the northeast set is 12417.359662310972 USD.


In [56]:
#imports matplotlib for plotting
import matplotlib.pyplot as plt

#plots a categorical bar graph to display cost disparity between regions
plt.bar(["Southwest", "Southeast", "Northwest", "Northeast"], [southwest_average, southeast_average, northwest_average, northeast_average])

<IPython.core.display.Javascript object>

<BarContainer object of 4 artists>

The bar graph above shows a slight difference between regions in average cost of care, but due to the high variance and standard deviations, these variations are likely not significant. Overall, the proximity of costs in the different regions as well as our statistical measures indicates that region of the country does not correlate with insurance costs. This is a likely indicator that these two things are unrelated. 