# U.S. Medical Insurance Costs Project

#### The following cells use Python functions, loops, and classes to draw conclusions from a medical insurance cost dataset.

Dataset info:

`age`: age of primary beneficiary in years

`sex`: insurance contractor gender (female, male)

`bmi`: Body mass index, an objective index of body weight relative to height, calculated as weight divided by height squared (kg / m ^ 2); it indicates weights that are relatively high or low for a given height, and the ideal range is 18.5 to 24.9

`children`: Number of children covered by health insurance / Number of dependents

`smoker`: Smoking (yes or no)

`region`: the beneficiary's residential area in the US (northeast, southeast, southwest, or northwest)

`charges`: Individual medical costs billed by health insurance

Source: https://www.kaggle.com/mirichoi0218/insurance
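As a quick check of the `bmi` definition above, BMI is weight in kilograms divided by the square of height in meters. The dataset already supplies BMI values, so this sketch is only for illustrating the formula; `compute_bmi` and `in_ideal_range` are hypothetical helpers, not part of the dataset or the project.

```python
def compute_bmi(weight_kg: float, height_m: float) -> float:
    """Return body mass index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def in_ideal_range(bmi: float) -> bool:
    """Return True if the BMI falls in the 18.5 to 24.9 range quoted above."""
    return 18.5 <= bmi <= 24.9


# Example: 70 kg at 1.75 m
example = compute_bmi(70, 1.75)
print(round(example, 2), in_ideal_range(example))  # 22.86 True
```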

***

### Step 1: Importing the dataset

In [243]:
import csv

age_raw = []
sex_raw = []
bmi_raw = []
children_raw = []
smoker_raw = []
region_raw = []
charges_raw = []

with open(r'C:\Users\hchld\Documents\Programming\Codecademy\US_Medical_Insurance\insurance.csv', newline = '') as csv_data:
    csv_data_dict = csv.DictReader(csv_data)
    for row in csv_data_dict:
        age_raw.append(row['age'])
        sex_raw.append(row['sex'])
        bmi_raw.append(row['bmi'])
        children_raw.append(row['children'])
        smoker_raw.append(row['smoker'])
        region_raw.append(row['region'])
        charges_raw.append(row['charges'])

#checking:
#print(age_raw)
#print(sex_raw)
#print(bmi_raw)
#print(children_raw)
#print(smoker_raw)
#print(region_raw)
#print(charges_raw)
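The seven parallel `append` calls above can also be written as one loop that builds a list per column. This is a minimal sketch assuming the same column headers as `insurance.csv`; `load_columns` is a hypothetical helper, and the example reads two invented rows from an in-memory `io.StringIO` so it runs without the file.

```python
import csv
import io


def load_columns(csv_file):
    """Read a CSV file object into a dict mapping each column name to a list of raw strings."""
    reader = csv.DictReader(csv_file)
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            columns[name].append(row[name])
    return columns


# Two invented rows in the same layout as insurance.csv (illustrative values only)
sample = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "30,female,25.0,2,no,northeast,5000.00\n"
    "45,male,31.5,0,yes,southeast,40000.00\n"
)
columns = load_columns(sample)
print(columns['age'])     # ['30', '45']
print(columns['region'])  # ['northeast', 'southeast']
```

With the real file, the file object opened in Step 1 (with `newline = ''`) can be passed to `load_columns` in place of `sample`.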

### Step 2: Transforming the dataset

In [244]:
#convert age, bmi, children, and charges strings to integers or floats with list comprehensions
age = [int(years) for years in age_raw]
bmi = [float(value) for value in bmi_raw]
children = [int(kids) for kids in children_raw]
charges = [float(cost) for cost in charges_raw]

#checking:
#print(age)
#print(bmi)
#print(children)
#print(charges)
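The four list comprehensions above each apply a different type to a numeric column. One way to keep that column-to-type mapping in a single place is a dict of converter functions. This is a sketch; `CONVERTERS` and `convert_columns` are hypothetical names, not part of the original project.

```python
# Which type each numeric column should become; other columns stay as strings
CONVERTERS = {'age': int, 'bmi': float, 'children': int, 'charges': float}


def convert_columns(raw_columns, converters=CONVERTERS):
    """Return a new dict where listed columns are converted and the rest are copied unchanged."""
    converted = {}
    for name, values in raw_columns.items():
        convert = converters.get(name)
        converted[name] = [convert(v) for v in values] if convert else list(values)
    return converted


raw = {'age': ['30', '45'], 'bmi': ['25.0', '31.5'], 'sex': ['female', 'male']}
typed = convert_columns(raw)
print(typed)  # {'age': [30, 45], 'bmi': [25.0, 31.5], 'sex': ['female', 'male']}
```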

In [245]:
#convert sex strings to integers (1 = male, 0 = female)
#start from a copy of the raw column so sex_raw is left unchanged
sex = list(sex_raw)
for gender in range(len(sex)):
    if sex[gender] == 'male':
        sex[gender] = 1
    elif sex[gender] == 'female':
        sex[gender] = 0

#print(sex)

In [246]:
#convert smoker strings to integers (yes = 1, no = 0)
#start from a copy of the raw column so smoker_raw is left unchanged
smoker = list(smoker_raw)
for item in range(len(smoker)):
    if smoker[item] == 'yes':
        smoker[item] = 1
    elif smoker[item] == 'no':
        smoker[item] = 0

#print(smoker)

In [247]:
#convert region strings to integers (southwest = 0, northwest = 1, southeast = 2, northeast = 3)
#start from a copy of the raw column so region_raw is left unchanged
region = list(region_raw)
for area in range(len(region)):
    if region[area] == 'southwest':
        region[area] = 0
    elif region[area] == 'northwest':
        region[area] = 1
    elif region[area] == 'southeast':
        region[area] = 2
    elif region[area] == 'northeast':
        region[area] = 3

#print(region)
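The three encoding cells above follow the same pattern: look up each string and replace it with a code. A lookup dict expresses the same mappings (male = 1, female = 0; yes = 1, no = 0; southwest = 0 through northeast = 3) with one line per column. This is a sketch; the `*_CODES` dicts and `encode` are hypothetical names. Unlike the loops, an unexpected value raises `KeyError` instead of silently staying a string, which surfaces data problems early.

```python
SEX_CODES = {'female': 0, 'male': 1}
SMOKER_CODES = {'no': 0, 'yes': 1}
REGION_CODES = {'southwest': 0, 'northwest': 1, 'southeast': 2, 'northeast': 3}


def encode(values, codes):
    """Map each string in values to its integer code; raises KeyError on an unknown value."""
    return [codes[value] for value in values]


print(encode(['male', 'female', 'male'], SEX_CODES))                  # [1, 0, 1]
print(encode(['no', 'yes'], SMOKER_CODES))                            # [0, 1]
print(encode(['southeast', 'northeast', 'southwest'], REGION_CODES))  # [2, 3, 0]
```

Applied to the project's columns, this would read `sex = encode(sex_raw, SEX_CODES)`, and likewise for `smoker` and `region`.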

### Step 3: Analyzing the dataset

In [248]:
#Average and range calculator function
def avg_rng_finder(list_name):
    list_total = 0
    for item in list_name:
        list_total += item
    list_average = list_total / len(list_name)
    list_range = max(list_name) - min(list_name)
    return list_average, list_range

#Average and range for age
age_average, age_range = avg_rng_finder(age)
print("The average age is {} and the range is {}.".format(round(age_average, 1), age_range))

#Average and range for bmi
bmi_average, bmi_range = avg_rng_finder(bmi)
print("The average bmi is {} and the range is {}.".format(round(bmi_average, 1), bmi_range))

#Average and range for children
children_average, children_range = avg_rng_finder(children)
print("The average number of children is {} and the range is {}.".format(round(children_average, 1), children_range))

#Average and range for charges
charges_average, charges_range = avg_rng_finder(charges)
print("The average insurance cost is ${} and the range is ${}.".format(round(charges_average, 2), round(charges_range, 2)))

The average age is 39.2 and the range is 46.
The average bmi is 30.7 and the range is 37.17.
The average number of children is 1.1 and the range is 5.
The average insurance cost is $13270.42 and the range is $62648.55.
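Python's standard `statistics` module provides the same mean as `avg_rng_finder`, plus a median, which is less affected by a few very large values such as the highest charges. This sketch uses an invented list, not the dataset; `summarize` is a hypothetical helper.

```python
import statistics


def summarize(values):
    """Return (mean, median, range) for a non-empty list of numbers."""
    return statistics.mean(values), statistics.median(values), max(values) - min(values)


charges_sample = [1200.0, 2500.0, 3100.0, 45000.0]  # invented values, not from the dataset
mean, median, value_range = summarize(charges_sample)
print(mean, median, value_range)  # 12950.0 2800.0 43800.0
```

Here the single large charge pulls the mean far above the median, which is why both are worth reporting for skewed data like insurance costs.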


In [249]:
#Determining the most common value of each categorical attribute using a class
class Common:

    def __init__(self, sex, smoker, region):
        self.sex = sex
        self.smoker = smoker
        self.region = region

    def common_sex(self):
        #since male = 1 and female = 0, the total of the list = total number of males
        total_male = 0
        for gender in self.sex:
            total_male += gender
        population = len(self.sex)
        if total_male > population / 2:
            return ("The most common sex is male, with a total number of {} in a population of {}.".format(total_male, population))
        elif total_male < population / 2:
            return ("The most common sex is female, with a total number of {} in a population of {}.".format(population - total_male, population))
        else:
            return ("The number of males and females is equal in a population of {}.".format(population))

    def common_smoker(self):
        #smoker = 1, non-smoker = 0
        total_smoker = 0
        total_nonsmoker = 0
        for answer in self.smoker:
            if answer == 1:
                total_smoker += 1
            elif answer == 0:
                total_nonsmoker += 1
        population = len(self.smoker)
        if total_smoker > total_nonsmoker:
            return ("Most individuals are smokers, with a total of {} smokers in a population of {}.".format(total_smoker, population))
        elif total_smoker < total_nonsmoker:
            return ("Most individuals are non-smokers, with a total of {} non-smokers in a population of {}.".format(total_nonsmoker, population))
        else:
            return ("There is an equal number of smokers and non-smokers in a population of {}.".format(population))

    def common_region(self):
        #(southwest = 0, northwest = 1, southeast = 2, northeast = 3), so each code is also a list index
        region_names = ['southwest', 'northwest', 'southeast', 'northeast']
        counts = [0, 0, 0, 0]
        for location in self.region:
            counts[location] += 1
        highest = max(counts)
        if counts.count(highest) > 1:
            return "Two or more regions are tied for the most residents in this dataset."
        return "More people in this dataset live in the {} region than in any other region.".format(region_names[counts.index(highest)])

common = Common(sex, smoker, region)
#most common sex
most_common_sex = common.common_sex()
print(most_common_sex)
#most common smoking status
most_common_smoker = common.common_smoker()
print(most_common_smoker)
#most common region
most_common_region = common.common_region()
print(most_common_region)

The most common sex is male, with a total number of 676 in a population of 1338.
Most individuals are non-smokers, with a total of 1064 non-smokers in a population of 1338.
More people in this dataset live in the southeast region than in any other region.
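The standard library's `collections.Counter` performs the same tallying that the `Common` class does by hand, and its `most_common()` method returns values ordered from most to least frequent. This is an alternative technique sketched on invented data, not what the project uses; `mode_report` is a hypothetical helper that also reports a tie for first place.

```python
from collections import Counter

REGION_NAMES = ['southwest', 'northwest', 'southeast', 'northeast']  # codes 0 to 3


def mode_report(encoded_regions):
    """Return (region name, count) for the most frequent code in a non-empty list, or None on a tie for first."""
    ranked = Counter(encoded_regions).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    code, count = ranked[0]
    return REGION_NAMES[code], count


print(mode_report([2, 2, 0, 3, 2, 1]))  # ('southeast', 3)
print(mode_report([0, 1]))              # None
```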


In [250]:
#Create master 2D list
all_dataset = [[year, gender, mass, kids, smoke, area, cost] for year, gender, mass, kids, smoke, area, cost in zip(age,sex,bmi,children,smoker,region,charges)]

#Most expensive plans w/ attributes
most_expensive_charge = 0
most_expensive_attributes = []

for item in all_dataset:
    if item[-1] > most_expensive_charge:
        most_expensive_charge = item[-1]
        most_expensive_attributes = item
print("The most expensive charge is ${} and has the following attributes: {}".format(round(most_expensive_charge, 2), most_expensive_attributes[:-1]))

#Cheapest plan w/ attributes (start at infinity so any real charge is lower)
least_expensive_charge = float('inf')
least_expensive_attributes = []

for item in all_dataset:
    if item[-1] < least_expensive_charge:
        least_expensive_charge = item[-1]
        least_expensive_attributes = item
print("The least expensive charge is ${} and has the following attributes: {}".format(round(least_expensive_charge, 2), least_expensive_attributes[:-1]))

The most expensive charge is $63770.43 and has the following attributes: [54, 0, 47.41, 0, 1, 2]
The least expensive charge is $1121.87 and has the following attributes: [18, 1, 23.21, 0, 0, 2]
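The attribute lists printed above are still encoded, in the order [age, sex, bmi, children, smoker, region]. A small decoding helper turns them back into words using the codes from Step 2, which avoids mistakes when reading the codes by hand. `describe_row` and the `*_NAMES` dicts are hypothetical names, not part of the original project.

```python
SEX_NAMES = {0: 'female', 1: 'male'}
SMOKER_NAMES = {0: 'non-smoker', 1: 'smoker'}
REGION_NAMES = {0: 'southwest', 1: 'northwest', 2: 'southeast', 3: 'northeast'}


def describe_row(attributes):
    """Describe an encoded [age, sex, bmi, children, smoker, region] list in words.

    Extra trailing items (such as the charge) are ignored.
    """
    age, sex, bmi, children, smoker, region = attributes[:6]
    return "{}-year-old {} {} with a BMI of {}, {} children, {} region".format(
        age, SEX_NAMES[sex], SMOKER_NAMES[smoker], bmi, children, REGION_NAMES[region])


print(describe_row([54, 0, 47.41, 0, 1, 2]))
# 54-year-old female smoker with a BMI of 47.41, 0 children, southeast region
print(describe_row([18, 1, 23.21, 0, 0, 2]))
# 18-year-old male non-smoker with a BMI of 23.21, 0 children, southeast region
```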


## Results

This project demonstrates how core Python can be applied to basic data analysis. The calculations in Step 3 support the following conclusions:
- The average age is 39.2 years, with a range of 46 years.
- The average BMI is 30.7, with a range of 37.17.
- The average number of children is 1.1, with a range of 5.
- The average insurance cost is \$13270.42, with a range of \$62648.55.
- The most common sex is male, with a total number of 676 in a population of 1338.
- Most individuals are non-smokers, with a total of 1064 non-smokers in a population of 1338.
- More people in this dataset live in the southeast region than in any other region.
- The most expensive charge is \$63770.43 and has the following attributes, in the order [age, sex, bmi, children, smoker, region]: [54, 0, 47.41, 0, 1, 2]
    - This individual is a 54-year-old female smoker with a BMI of 47.41 and no children who lives in the southeast region.
- The least expensive charge is \$1121.87 and has the following attributes, in the same order: [18, 1, 23.21, 0, 0, 2]
    - This individual is an 18-year-old male non-smoker with a BMI of 23.21 and no children who lives in the southeast region.

These results suggest some actionable steps. For example, where possible, individuals can try to match the modifiable attributes of the least expensive plan, such as not smoking and keeping a BMI within the 18.5 to 24.9 range, to reduce their insurance cost.
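Comparing the most and least expensive plans looks at only two individuals. A broader way to see whether a modifiable attribute such as smoking tracks with cost is to compare average charges between groups. This is a sketch with a hypothetical helper, `mean_charge_by_group`, run on invented values; passing the project's `smoker` and `charges` lists would give the group averages for the actual dataset.

```python
from collections import defaultdict


def mean_charge_by_group(groups, charges):
    """Return a dict mapping each group code to the mean charge for that group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, charge in zip(groups, charges):
        totals[group] += charge
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Invented example: smoker codes (1 = smoker, 0 = non-smoker) and matching charges
example_smoker = [1, 0, 0, 1]
example_charges = [30000.0, 4000.0, 6000.0, 40000.0]
print(mean_charge_by_group(example_smoker, example_charges))  # {1: 35000.0, 0: 5000.0}
```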