# U.S. Medical Insurance Costs

### Getting familiar with the database

First, we import our database and count how many patients we have on hand.

In [153]:
import csv

filename = 'insurance.csv'
user_count = 0
insurance_list = list()

# The csv module recommends opening files with newline=''.
with open(filename, 'r', newline='') as insurance_csv:
    # DictReader yields one dict per row, keyed by the CSV header.
    insurance_dict = csv.DictReader(insurance_csv)
    for user in insurance_dict:
        insurance_list.append(user)
        user_count += 1

print("There are", user_count, "users in the 'insurance.csv' database we are going to analyse."); print()

There are 1338 users in the 'insurance.csv' database we are going to analyse.
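
Every value read by `csv.DictReader` is a string, which is why `charges` has to be wrapped in `float()` later on. As a minimal sketch, the numeric columns could instead be converted once at load time. The `parse_row` helper and the two made-up sample rows below are assumptions for illustration, not part of the original notebook or dataset:

```python
import csv
import io

# Made-up two-row sample standing in for insurance.csv (values are illustrative).
sample_csv = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "40,female,25.0,2,no,north,5000.0\n"
    "30,male,31.5,0,yes,south,20000.0\n"
)

def parse_row(row):
    """Hypothetical helper: copy a DictReader row, converting numeric columns."""
    parsed = dict(row)
    parsed["age"] = int(row["age"])
    parsed["bmi"] = float(row["bmi"])
    parsed["children"] = int(row["children"])
    parsed["charges"] = float(row["charges"])
    return parsed

patients = [parse_row(row) for row in csv.DictReader(sample_csv)]
print(len(patients), type(patients[0]["charges"]).__name__)
```

With the conversion done up front, later cells could sum `charges` directly without repeating the `float()` call.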



What keys does the database have?

In [154]:
insurance_keys = list(insurance_list[0].keys())
print(insurance_keys)

['age', 'sex', 'bmi', 'children', 'smoker', 'region', 'charges']


We now know we have access to each patient's:
 - `age`: Age in years.
 - `sex`: Either male or female.
 - `bmi`: Body Mass Index.
 - `children`: Number of children.
 - `smoker`: Either 'yes' or 'no'.
 - `region`: Region of residence.
 - `charges`: Insurance cost.

Let's look at the distribution of male and female patients in the database.
But first, we need a function that converts our values into percentages.

In [155]:
def percentage_convert(amount, in_data):
    # Share of `amount` within `in_data`, as a percentage rounded to 2 decimals.
    return round((amount / in_data) * 100, 2)
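
A quick way to sanity-check the helper is to call it on values whose answer is obvious. The sketch below also adds a zero-denominator guard; `safe_percentage` is a hypothetical variant that is not in the original notebook and assumes that returning `0.0` for an empty total is acceptable:

```python
def percentage_convert(amount, in_data):
    return round((amount / in_data) * 100, 2)

def safe_percentage(amount, in_data):
    """Hypothetical variant: return 0.0 instead of raising on an empty total."""
    if in_data == 0:
        return 0.0
    return percentage_convert(amount, in_data)

print(percentage_convert(1, 4))  # 25.0
print(safe_percentage(3, 0))     # 0.0
```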

Now let's get on with the counting.

In [156]:
male_patients = 0
female_patients = 0

for user in insurance_list:
    if user['sex'] == 'male':
        male_patients += 1
    else:
        female_patients += 1

# Turning the values into percentages.
male_patients_percentage = percentage_convert(male_patients, user_count)
female_patients_percentage = percentage_convert(female_patients, user_count)

print("There are {} male patients, which is equal to {}% of the database.".format(male_patients, male_patients_percentage))
print("And {} female patients, which is equal to {}% of the database.".format(female_patients, female_patients_percentage)); print()

There are 676 male patients, which is equal to 50.52% of the database.
And 662 female patients, which is equal to 49.48% of the database.
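
Note that the `else` branch above counts every value other than `'male'` as female, including blanks or typos if there were any. One way to see every distinct value is `collections.Counter`. This is a sketch on made-up records, not the notebook's own method:

```python
from collections import Counter

# Made-up records; only the 'sex' key matters here. The empty string
# simulates a missing value that an if/else count would silently absorb.
sample_patients = [
    {"sex": "male"}, {"sex": "female"}, {"sex": "female"}, {"sex": ""},
]

sex_counts = Counter(patient["sex"] for patient in sample_patients)
for value, count in sex_counts.items():
    share = round(count / len(sample_patients) * 100, 2)
    print(repr(value), count, f"{share}%")
```

If the counter only ever contains `'male'` and `'female'`, the simpler if/else count is safe to use.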



## Analysing data

### Smokers

How many of the individuals are smokers? And which sex smokes more? We'll answer both questions in the code below.


In [178]:
# Counting smokers and non smokers in the database.
smokers = 0
non_smokers = 0
smoking_patients = []
non_smoking_patients = []

for user in insurance_list:
    if user['smoker'] == 'yes':
        smokers += 1
        smoking_patients.append(user)
    else:
        non_smokers += 1
        non_smoking_patients.append(user)

smokers_percentage = percentage_convert(smokers, user_count)
print("There are a total of", smokers, "smokers out of the", user_count, "individuals, which amounts to {}% of the patients.".format(smokers_percentage))
print("This means there are", non_smokers, "non-smokers.")

There are a total of 274 smokers out of the 1338 individuals, which amounts to 20.48% of the patients.
This means there are 1064 non-smokers.


What is the distribution of smokers between both genders?

In [158]:
# Determining the smokers' sex.
male_smokers = 0
female_smokers = 0

for smoker in smoking_patients:
    if smoker['sex'] == 'male':
        male_smokers += 1
    else:
        female_smokers += 1

male_smokers_percentage = percentage_convert(male_smokers, male_patients)
female_smokers_percentage = percentage_convert(female_smokers, female_patients)
male_smokers_compared = percentage_convert(male_smokers, smokers)
female_smokers_compared = percentage_convert(female_smokers, smokers)

print("Out of the", smokers, "smokers,", male_smokers, "are male and", female_smokers, "are female, so {}% male and {}% female.".format(male_smokers_compared, female_smokers_compared))
print("{}% of the male patients are smokers (male smokers / male patients), against {}% of the female patients.".format(male_smokers_percentage, female_smokers_percentage)); print()

Out of the 274 smokers, 159 are male and 115 are female, so 58.03% male and 41.97% female.
23.52% of the male patients are smokers (male smokers / male patients), against 17.37% of the female patients.



How likely is a randomly picked person to be a male smoker, and how likely a female smoker?

In [159]:
# Probability that a randomly picked person is a male smoker or a female smoker.
m_smokers_user_count = percentage_convert(male_smokers, user_count)
f_smokers_user_count = percentage_convert(female_smokers, user_count)
male_smoking_against_female = round((percentage_convert(m_smokers_user_count, f_smokers_user_count)) - 100, 2)

print("If you were to randomly pick a person from the database, you'd have a {}% chance of picking a male smoker and a {}% chance of picking a female smoker.".format(m_smokers_user_count, f_smokers_user_count))
print("This means a randomly picked person is {}% more likely to be a male smoker than a female smoker.".format(male_smoking_against_female)); print()

If you were to randomly pick a person from the database, you'd have a 11.88% chance of picking a male smoker and a 8.59% chance of picking a female smoker.
This means a randomly picked person is 38.3% more likely to be a male smoker than a female smoker.
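
The 38.3% figure compares two joint probabilities, so it also reflects the fact that the database holds slightly more men than women. To compare how likely a man is to smoke with how likely a woman is to smoke, one option is to divide the smoking rates within each sex instead. A sketch using the counts printed earlier in this notebook (the variable names `male_rate`, `female_rate` and `relative_difference` are introduced here for illustration):

```python
# Counts reported earlier in this notebook.
male_smokers, male_patients = 159, 676
female_smokers, female_patients = 115, 662

male_rate = male_smokers / male_patients        # share of men who smoke
female_rate = female_smokers / female_patients  # share of women who smoke

# Relative difference between the two smoking rates, in percent.
relative_difference = round((male_rate / female_rate - 1) * 100, 2)
print(relative_difference)
```

By this measure, the men in the database are roughly 35% more likely to be smokers than the women.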



### Insurance cost for smokers and non smokers

What is the average insurance cost for someone who smokes compared to someone who doesn't? How large is the difference?

Let's start by computing the average insurance cost for a typical smoker.

In [172]:
# We sum the total charges of each smoker
total_smokers_insurance_cost = 0
for smoker in smoking_patients:
    total_smokers_insurance_cost += float(smoker['charges'])

# Then divide the value by the number of smokers
average_smokers_insurance_cost = round(total_smokers_insurance_cost / smokers, 2)

In [175]:
print("On average, a smoker has to pay", average_smokers_insurance_cost, "dollars in charges for their insurance."); print()

On average, a smoker has to pay 32050.23 dollars in charges for their insurance.



Now let's get the average insurance cost for a non-smoker.

In [180]:
# Same process as before, we sum the total charges of each non-smoker
total_non_smokers_insurance_cost = 0
for non_smoker in non_smoking_patients:
    total_non_smokers_insurance_cost += float(non_smoker['charges'])

# Then divide the value by the number of non smokers
average_non_smokers_insurance_cost = round(total_non_smokers_insurance_cost / non_smokers, 2)

In [181]:
print("On average, a non-smoker has to pay", average_non_smokers_insurance_cost, "dollars in charges for their insurance."); print()

On average, a non-smoker has to pay 8434.27 dollars in charges for their insurance.



That's a big difference! Let's see exactly how much.

In [191]:
# Round the difference so floating-point noise never reaches the printed value.
price_difference = round(average_smokers_insurance_cost - average_non_smokers_insurance_cost, 2)
difference_in_times = round(average_smokers_insurance_cost / average_non_smokers_insurance_cost, 3)

print("On average, a smoker has to pay", "$" + str(price_difference), "more yearly. That's a whopping", difference_in_times, "times what non-smokers pay!")
print("One conclusion that should help you lower your insurance cost: don't smoke, eat clean, lift weights."); print()

On average, a smoker has to pay $23615.96 more yearly. That's a whopping 3.8 times what non-smokers pay!
One conclusion that should help you lower your insurance cost: don't smoke, eat clean, lift weights.
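
The two averaging cells above repeat the same sum-and-divide pattern. One possible way to fold them into a single step is a small grouping helper built on `statistics.mean`. The `average_charges_by` function and the made-up records below are assumptions for illustration, not part of the original analysis:

```python
from collections import defaultdict
from statistics import mean

# Made-up records for illustration; real rows would come from insurance.csv.
sample_patients = [
    {"smoker": "yes", "charges": "30000.0"},
    {"smoker": "yes", "charges": "34000.0"},
    {"smoker": "no", "charges": "8000.0"},
    {"smoker": "no", "charges": "9000.0"},
]

def average_charges_by(patients, key):
    """Hypothetical helper: mean of 'charges' for each distinct value of `key`."""
    groups = defaultdict(list)
    for patient in patients:
        groups[patient[key]].append(float(patient["charges"]))
    return {value: mean(charges) for value, charges in groups.items()}

averages = average_charges_by(sample_patients, "smoker")
print(averages)
```

The same helper could be called with another key such as `'sex'` or `'region'` to compare average charges across those groups.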

