The US Medical Insurance Project is a classic exercise that uses Python to work with a data set from 2018. The objective is to organize and analyze insurance.csv to learn more about the patient information it contains.
There are three questions I want to answer in this project:
- Compare the average insurance cost for male and female patients to see which gender costs more
- Find out how much money a non-smoker saves
- Find the average cost per region

In [20]:
import csv

First, import the only library this project needs, which is csv.
After looking through insurance.csv, we found that each patient record provides the following information:

Age
Sex
BMI
Number of Children
Smoker Status
Region
Charges

An empty list is created for each of these fields; each list will be populated with the corresponding values from the .csv file.

In [21]:
ages = []
gender_list = []
bmi_list = []
children_list = []
smoker_status = []
regions = []
costs = []

Now we will create a function that loads one column of insurance.csv into a Python list.

In [22]:
def load_data(lst, csv_file, column_name): 
    with open(csv_file) as csv_info:
        csv_dict = csv.DictReader(csv_info)
        for row in csv_dict:
            lst.append(row[column_name])

The function above saves us from repeating the code that opens the .csv file and appends each column's values to a list. We can simply call it once for every list and column we want.

In [147]:
load_data(ages, 'insurance.csv', 'age')
load_data(gender_list, 'insurance.csv', 'sex')
load_data(bmi_list, 'insurance.csv', 'bmi')
load_data(children_list, 'insurance.csv', 'children')
load_data(smoker_status, 'insurance.csv', 'smoker')
load_data(regions, 'insurance.csv', 'region')
load_data(costs, 'insurance.csv', 'charges')
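
Each of the calls above reopens and re-reads insurance.csv. As a possible alternative, here is a minimal sketch that reads the file once and returns every requested column together. The function name load_columns and its parameters are hypothetical and not part of the original project; the values are kept as strings, just as load_data does.

```python
import csv

def load_columns(csv_file, column_names):
    """Read csv_file once and return {column_name: [values as strings]}."""
    # One empty list per requested column (hypothetical helper, not from the project).
    columns = {name: [] for name in column_names}
    with open(csv_file, newline="") as csv_info:
        for row in csv.DictReader(csv_info):
            for name in column_names:
                columns[name].append(row[name])
    return columns
```

With this helper, a single call such as load_columns('insurance.csv', ['age', 'sex', 'bmi', 'children', 'smoker', 'region', 'charges']) would give a dictionary whose 'charges' entry plays the same role as the costs list.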

Our data is all set, so we can start the analysis.
Here again are the three questions from the beginning:
- Compare the average insurance cost for male and female patients to see which gender costs more
- Find out how much money a non-smoker saves
- Find the average cost per region

To answer these questions, we can create a class called CustmorInfo with the following methods:
analyze_gender_costs()
analyze_smoker_status()
analyze_regions()

In [148]:
class CustmorInfo:
    def __init__(self, ages, gender_list, bmi_list, children_list, smoker_status, regions, costs):
        self.ages = ages
        self.gender_list = gender_list
        self.bmi_list = bmi_list
        self.children_list = children_list
        self.smoker_status = smoker_status
        self.regions = regions
        self.costs = costs
    def analyze_gender_costs(self):
        # Question 1: separate the costs for male and female patients
        male_costs = []
        female_costs = []
        gender_cost = list(zip(self.gender_list, self.costs))
        for gender, cost in gender_cost:
            if gender == "male":
                male_costs.append(float(cost))
            else:
                female_costs.append(float(cost))
        # finding average cost for each gender rounded to two decimal places, and the difference between them, as below
        male_average_cost = round(sum(male_costs) / len(male_costs), 2)
        female_average_cost = round(sum(female_costs) / len(female_costs), 2)
        cost_difference = abs(male_average_cost - female_average_cost)
        # returning the result
        if male_average_cost < female_average_cost:
            return "The average cost for male insurance is {}, and the average cost for female insurance is {}, so male insurance is cheaper by {} on average.".format(male_average_cost, female_average_cost, cost_difference)
        else:
            return "The average cost for male insurance is ${}, and the average cost for female insurance is {}, so female insurance is cheaper by ${} on average.".format(male_average_cost, female_average_cost, cost_difference) 

    def analyze_smoker_status(self):
        smoker_cost_sum = 0
        smoker_count = 0
        nonsmoker_cost_sum = 0
        nonsmoker_count = 0
        # use the instance's own lists rather than the global variables
        for smoker, cost in zip(self.smoker_status, self.costs):
            if smoker == 'yes':
                smoker_cost_sum += float(cost)
                smoker_count += 1
            else:
                nonsmoker_cost_sum += float(cost)
                nonsmoker_count += 1
        smoker_cost_avg = round(smoker_cost_sum/smoker_count, 2)
        nonsmoker_cost_avg = round(nonsmoker_cost_sum/nonsmoker_count, 2)
        avg_diff = smoker_cost_avg - nonsmoker_cost_avg
        # printing results
        print("On average, smokers pay ${} more for their insurance than non-smokers.".format(avg_diff))

    def analyze_regions(self):
        # separating costs from each region into their own respective lists
        ne_costs = []
        se_costs = []
        sw_costs = []
        nw_costs = []
        region_cost = list(zip(self.regions, self.costs))
        for region, cost in region_cost:
            if region == 'northeast':
                ne_costs.append(float(cost))
            elif region == 'southeast':
                se_costs.append(float(cost))
            elif region == 'southwest':
                sw_costs.append(float(cost))
            elif region == 'northwest':
                nw_costs.append(float(cost))
        # finding average cost for each list, rounded to two decimal places
        ne_avg_cost = round(sum(ne_costs)/len(ne_costs), 2)
        se_avg_cost = round(sum(se_costs)/len(se_costs), 2)
        sw_avg_cost = round(sum(sw_costs)/len(sw_costs), 2)
        nw_avg_cost = round(sum(nw_costs)/len(nw_costs), 2)
        # printing results
        print("The average insurance cost per region of the U.S. is as follows:\nNortheast: ${} \nSoutheast: ${} \nSouthwest: ${} \nNorthwest: ${}".format(ne_avg_cost, se_avg_cost, sw_avg_cost, nw_avg_cost))
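
The three methods above follow the same pattern: pair a label (gender, smoker status, or region) with a cost, then average the costs per label. As a rough sketch of how that pattern could be written once, the helper below groups costs with a dictionary instead of one hard-coded list per label. The name average_by_group is hypothetical, and the sample values are made up for illustration; they are not rows from insurance.csv.

```python
from collections import defaultdict

def average_by_group(labels, costs):
    """Return {label: average cost rounded to two decimals} for paired lists."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for label, cost in zip(labels, costs):
        totals[label] += float(cost)  # values loaded from the CSV are strings
        counts[label] += 1
    return {label: round(totals[label] / counts[label], 2) for label in totals}

# Tiny made-up sample, not taken from insurance.csv
sample_regions = ["northeast", "southwest", "northeast"]
sample_costs = ["100.0", "50.0", "200.0"]
print(average_by_group(sample_regions, sample_costs))
```

Under this approach, a call such as average_by_group(regions, costs) would cover every region that appears in the data without listing the regions in the code, and the same helper could be reused for the gender and smoker questions.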


Now we will create an instance of the class and call each method to check that it works.

In [149]:
custmor_info = CustmorInfo(ages, gender_list, bmi_list, children_list, smoker_status, regions, costs)

In [150]:
custmor_info.analyze_gender_costs()

'The average cost for male insurance is $13956.75, and the average cost for female insurance is 12569.58, so female insurance is cheaper by $1387.17 on average.'

In [151]:
custmor_info.analyze_smoker_status()

On average, smokers pay $23615.96 more for their insurance than non-smokers.


In [152]:
custmor_info.analyze_regions()

The average insurance cost per region of the U.S. is as follows:
Northeast: $13406.38 
Southeast: $14735.41 
Southwest: $12346.94 
Northwest: $12417.58
