# U.S. Medical Insurance Costs

Here, as a start, is a low-fi version that uses `csv` as the only library.
1. A fair and desirable goal would be to work in parallel and show how the go-to data modules (pandas, numpy, matplotlib, seaborn, and whatever else is out there) not only save time and a lot of code, but also open up new possibilities, such as exploring the same data in greater depth or visualizing it nicely.
2. The recommendations Codecademy gives for this project are an interesting challenge in terms of learning Python, but one could wonder whether there are more original questions to be answered
(see Kanban board: Scoping Your Project).

In [497]:
# first things first
import csv

Following the solution that is enclosed in this repository, here is a function that creates a list from a given attribute, 
pretty much as in the solution, but using list comprehensions instead of empty lists, for-loops and the append function.

In [498]:
def create_list(attribute):
    """From having read the file, we know there are seven specific columns:
    age, sex, bmi, children, smoker, region, charges.
    Any of these can be passed as the argument (input: string).
    """
    with open("insurance.csv", newline="") as insur_csv:
        # read the dataset row by row (each row is a dictionary with attributes as keys)
        insur_data = csv.DictReader(insur_csv)
        # and get a list of the values for the attribute we chose
        return [i[attribute] for i in insur_data]

In [499]:
# uncomment the print statement and add an attribute between the quotes to try:
# print(create_list(" "))
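
To make the difference between the two styles concrete, here is a minimal sketch that builds the same list once with an empty list, a for-loop and `append`, and once with a list comprehension. The sample rows and the two function names are made up for illustration, and `io.StringIO` stands in for the real `insurance.csv` so the cell runs on its own.

```python
import csv
import io

# Hypothetical sample rows, made up for illustration; not taken from insurance.csv
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
30,female,25.0,0,no,southwest,4000.0
45,male,31.5,2,yes,southeast,30000.0
52,female,28.1,1,no,northwest,11000.0
"""

def create_list_with_loop(attribute, source=SAMPLE_CSV):
    """Loop-and-append style: start with an empty list and grow it."""
    values = []
    for row in csv.DictReader(io.StringIO(source)):
        values.append(row[attribute])
    return values

def create_list_with_comprehension(attribute, source=SAMPLE_CSV):
    """List-comprehension style: build the same list in one expression."""
    return [row[attribute] for row in csv.DictReader(io.StringIO(source))]

# both calls return the same list of strings
print(create_list_with_loop("age"))
print(create_list_with_comprehension("age"))
```

Either way the values come back as strings, which is why the class further down converts them with `int()` and `float()` before doing any arithmetic.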

The next super sweet thing that can be seen in the solution is a class for patient info.
I don't know about you guys, but I'm still keen on getting a better grip on classes! I want to try this, though, without having to create all those lists beforehand.

In [500]:
class PatientsInfo:
    "here I'll build the class, starting with the init method"
    def __init__(self):
        "why would we use the attributes as arguments here, as we'll end up with only one instance of this class anyway?"
        self.age = create_list("age")
        self.sex = create_list("sex")
        self.bmi = create_list("bmi")
        self.children = create_list("children")
        self.smoker = create_list("smoker")
        self.region = create_list("region")
        self.charges = create_list("charges")
        
    def list_of_dict(self):
        # one dictionary per patient, with the numeric fields converted from strings
        return [
            {"Age": int(age), "Sex": sex, "BMI": float(bmi), "Children": int(children),
             "Smoker": smoker, "Region": region, "Charges": float(charges)}
            for age, sex, bmi, children, smoker, region, charges
            in zip(self.age, self.sex, self.bmi, self.children, self.smoker, self.region, self.charges)
        ]
    
    def average_age_of_parents(self):
        counter = 0
        total_age = 0
        for person in self.list_of_dict():
            if person["Children"] >= 1:
                total_age += person["Age"]
                counter += 1
        return total_age / counter
        # alternate version:
        # ages_and_children = [(age, child) for age, child in zip(self.age, self.children) if child != "0"]
        # return sum(int(i[0]) for i in ages_and_children) / len(ages_and_children)
    
    def most_common_regions(self):
        """Ok, to add something of my own, I would like to know what regions exist and how often we encounter them"""
        regions_as_dict = {}
        for i in self.region:
            if i not in regions_as_dict:
                regions_as_dict[i] = 1
            else:
                regions_as_dict[i] += 1
        return regions_as_dict

    def average(self, attribute):
        if attribute == "age": # discrete
            return self.calculate_average("age")
        if attribute == "sex": # nominal
            sex_binary = self.convert_to_binary("sex", ["female", "male"])
            return sum(sex_binary) / len(sex_binary)
        if attribute == "bmi": # continuous 
            return self.calculate_average("bmi")
        if attribute == "children": # discrete
            return self.calculate_average("children")
        if attribute == "smoker": # nominal
            smoker_binary = self.convert_to_binary("smoker", ["no", "yes"])
            return sum(smoker_binary) / len(smoker_binary)
        if attribute == "region": # nominal
            return None  # four unordered categories: an average is not meaningful
        if attribute == "charges": # continuous
            return self.calculate_average("charges")

    def convert_to_binary(self, attr, input_value_list):
        # map the first value to 0 and the second to 1; a lookup avoids the
        # substring pitfall of str.replace ("female" contains "male")
        mapping = {input_value_list[0]: 0, input_value_list[1]: 1}
        return [mapping[i] for i in getattr(self, attr)]

    def calculate_average(self, attr):
        temp_list = getattr(self, attr)
        return sum(float(i) for i in temp_list) / len(temp_list)

    def smoker_charges(self):
        smokers_list = [i["Charges"] for i in self.list_of_dict() if i["Smoker"] == "yes"]
        smokers_average = sum(smokers_list) / len(smokers_list)
        return f"Smokers are charged ${round(smokers_average, 2)} on average, compared to ${round(self.calculate_average('charges'), 2)} as the general average."

    def sex_and_region(self):
        print("\nComparison of sexes and regions:")
        sex = set(self.sex)
        region = set(self.region)
        print("".rjust(6) + "|" + "total".rjust(10) + "|", end="")
        for i in region:
            print(i.rjust(10) + "|", end="")
        print("")
        for i in sex:
            print(i.rjust(6), end="|")
            charges_total = sum(float(charge) for charge, s in zip(self.charges, self.sex) if s == i)
            number_of = self.sex.count(i)
            print(str(round(charges_total / number_of, 2)).rjust(10), end="|")
            for j in region:
                # distinct names (s, r) avoid shadowing the outer sex and region sets
                charges = [float(charge) for charge, s, r in zip(self.charges, self.sex, self.region) if s == i and r == j]
                number_of = len(charges)
                print(str(round(sum(charges) / number_of, 2)).rjust(10), end="|")
            print("")
        

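What could a function look like that calculates an average for any given attribute? The `average` method above is one attempt: numeric attributes are averaged directly, the two-valued ones (sex, smoker) are converted to 0/1 first, and region is left out.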
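
One possible answer, as a rough sketch rather than a finished design: try to treat the values as numbers, and if that fails, fall back to the share of each category. The function name `average_of` and the example values are hypothetical and not part of the Codecademy solution.

```python
def average_of(values):
    """Return the mean for numeric values, or the share of each category otherwise."""
    try:
        numbers = [float(v) for v in values]
    except ValueError:
        # not numeric: report relative frequencies instead of a single number
        return {category: values.count(category) / len(values) for category in set(values)}
    return sum(numbers) / len(numbers)

# hypothetical example values, not taken from the dataset
print(average_of(["30", "45", "52"]))         # numeric strings: a single mean
print(average_of(["yes", "no", "no", "no"]))  # categories: a share per category
```

With this approach an attribute like region would no longer need a special case, since it simply gets one share per region. An empty list would still raise a `ZeroDivisionError`, just like `calculate_average` does.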

In [501]:
print(PatientsInfo().smoker_charges())
PatientsInfo().sex_and_region()


Smokers are charged $32050.23 on average, compared to $13270.42 as the general average.

Comparison of sexes and regions:
      |     total| southwest| northeast| northwest| southeast|
  male|  13956.75|  13412.88|  13854.01|  12354.12|  15879.62|
female|  12569.58|  11274.41|   12953.2|  12479.87|  13499.67|


In [502]:
# convert to int first: max() on strings compares alphabetically ("10" < "5")
print(max(int(c) for c in PatientsInfo().children))
print(PatientsInfo().average("smoker"))

5
0.20478325859491778


Let's actually try the ages:

In [503]:
p = PatientsInfo()
# the average age of the patients was 39.21 years in the codecademy solution
print("Average age of all patients:", p.average("age"))
# the average age of the parents was 39.78 years in Murad's solution
print("Average age of all parents:", p.average_age_of_parents())
print("Age difference is:", abs(p.average("age") - p.average_age_of_parents()))

Average age of all patients: 39.20702541106129
Average age of all parents: 39.78010471204188
Age difference is: 0.573079300980595


Let's try the regions as well:

In [504]:
print(p.most_common_regions())

{'southwest': 325, 'southeast': 364, 'northwest': 325, 'northeast': 324}
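
As a side note, the counting loop in `most_common_regions` can also be handed to the standard library. The sketch below uses `collections.Counter` on a short made-up list that stands in for `p.region`.

```python
from collections import Counter

# hypothetical sample values standing in for p.region
regions = ["southwest", "southeast", "southeast", "northwest", "northeast", "southeast"]

# Counter does the "add the key or increment it" bookkeeping for us
region_counts = Counter(regions)

# most_common() returns (value, count) pairs, highest count first
print(region_counts.most_common())
```

A `Counter` behaves like a dictionary, so `dict(region_counts)` gives the same kind of result as the method in the class.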


Now this seems surprisingly balanced to me! Though without any understanding of statistics, I couldn't tell whether it really is, or whether it is biased towards the southeast.
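
One common way to approach this question is a chi-square goodness-of-fit test against the assumption that all four regions are equally likely. The sketch below does the arithmetic by hand on the counts printed above. The choice of test and the 5% level are assumptions made for illustration, and the critical value is the standard table value for 3 degrees of freedom.

```python
# region counts as printed by p.most_common_regions() above
observed = {"southwest": 325, "southeast": 364, "northwest": 325, "northeast": 324}

total = sum(observed.values())
expected = total / len(observed)  # count per region if all four were equally likely

# chi-square statistic: sum of (observed - expected)^2 / expected
chi_square = sum((count - expected) ** 2 / expected for count in observed.values())

# standard table value for 3 degrees of freedom at the 5% level
CRITICAL_VALUE_DF3_5PCT = 7.815

print(f"expected per region: {expected}")
print(f"chi-square statistic: {chi_square:.2f}")
print("differs from uniform at the 5% level:", chi_square > CRITICAL_VALUE_DF3_5PCT)
```

The statistic comes out at roughly 3.47, below the critical value, so under these assumptions the extra southeast entries are within what chance alone could produce.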

In [505]:
p.average("sex")

0.5052316890881914