# U.S. Medical Insurance Costs

In [2]:
import numpy as np
import pandas as pd

In [96]:
medical_cost = pd.read_csv("insurance.csv")
print(medical_cost)

      age     sex     bmi  children smoker     region      charges
0      19  female  27.900         0    yes  southwest  16884.92400
1      18    male  33.770         1     no  southeast   1725.55230
2      28    male  33.000         3     no  southeast   4449.46200
3      33    male  22.705         0     no  northwest  21984.47061
4      32    male  28.880         0     no  northwest   3866.85520
...   ...     ...     ...       ...    ...        ...          ...
1333   50    male  30.970         3     no  northwest  10600.54830
1334   18  female  31.920         0     no  northeast   2205.98080
1335   18  female  36.850         0     no  southeast   1629.83350
1336   21  female  25.800         0     no  southwest   2007.94500
1337   61  female  29.070         0    yes  northwest  29141.36030

[1338 rows x 7 columns]


There are 7 features (columns) and 1338 records (rows). Let's see if there is any missing data.

In [97]:
print(medical_cost.dtypes,"\n")
print(medical_cost.isna().any())

age           int64
sex          object
bmi         float64
children      int64
smoker       object
region       object
charges     float64
dtype: object 

age         False
sex         False
bmi         False
children    False
smoker      False
region      False
charges     False
dtype: bool


As shown above, there are no missing values in any column.
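`isna().any()` only says whether a column has a gap at all. If a data set did have gaps, a per-column count would be more informative. The sketch below is one way to do that; the helper name `missing_summary` and the toy frame are made up for illustration and are not part of `insurance.csv`.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column."""
    counts = df.isna().sum()          # number of NaN/None entries per column
    percent = df.isna().mean() * 100  # same, as a share of all rows
    return pd.DataFrame({"missing": counts, "percent": percent})


# Toy frame with made-up values (an assumption, not taken from insurance.csv)
toy = pd.DataFrame({
    "age": [19, None, 28],
    "smoker": ["yes", "no", None],
    "charges": [1.0, 2.0, 3.0],
})
summary = missing_summary(toy)
print(summary)
```

Calling `missing_summary(medical_cost)` on the data loaded above should report zero for every column, in line with the output of `isna().any()`.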

### Let's check for any bias between parents and non-parents

#### Helper Functions

In [137]:
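def get_percent_no_children():
    # Share of records with zero children, as a percentage of all records
    return (medical_cost.children == 0).mean() * 100

def get_average_age(rule):
    # Mean age over the rows selected by the boolean mask `rule`
    return medical_cost[rule].age.mean()

def get_average_cost(rule):
    # Mean charges over the rows selected by the boolean mask `rule`
    return medical_cost[rule].charges.mean()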

### Percentage of People that do not have any children

In [131]:
percent_no_children = get_percent_no_children()
print("{}%".format(round(percent_no_children,2)))

42.9%


As shown above, people with no children make up 42.9% of the data, slightly fewer than parents of at least one child (57.1%).
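Both sides of this split can also be computed in one step. The following is a minimal sketch using `value_counts(normalize=True)`; the function name `parent_share` and the toy data are assumptions made for illustration.

```python
import pandas as pd


def parent_share(df: pd.DataFrame) -> pd.Series:
    """Percentage of records with at least one child vs. none."""
    # Label each record, then count the labels as fractions of the total
    label = (df["children"] > 0).map({True: "parent", False: "non-parent"})
    return label.value_counts(normalize=True) * 100


# Toy frame with made-up values (not taken from insurance.csv)
toy = pd.DataFrame({"children": [0, 1, 3, 0, 2]})
shares = parent_share(toy)
print(shares.round(2))
```

Applied to `medical_cost`, the `non-parent` entry should match the percentage printed above.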

### Average age

In [135]:
parents_avg_age = get_average_age(medical_cost.children > 0)
non_parent_avg_age = get_average_age(medical_cost.children == 0)
print("Average age for parents:", round(parents_avg_age, 2))
print("Average age for non-parents:", round(non_parent_avg_age, 2))

Average age for parents: 39.78
Average age for non-parents: 38.44


The result above might be counterintuitive: the average ages of parents and non-parents differ by only about 1.3 years (39.78 vs. 38.44).
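A mean alone hides how spread out the ages are, so a gap of about a year is easier to judge next to the median and the standard deviation. Here is a rough sketch of such a summary; the helper name `age_summary_by_parenthood` and the toy data are hypothetical.

```python
import pandas as pd


def age_summary_by_parenthood(df: pd.DataFrame) -> pd.DataFrame:
    """Count, mean, median and standard deviation of age per group."""
    # Build a group label aligned with the rows of df
    group = (df["children"] > 0).map({True: "parent", False: "non-parent"})
    group = group.rename("group")
    return df.groupby(group)["age"].agg(["count", "mean", "median", "std"])


# Toy frame with made-up values (not taken from insurance.csv)
toy = pd.DataFrame({"children": [0, 1, 0, 2], "age": [20, 40, 30, 50]})
age_summary = age_summary_by_parenthood(toy)
print(age_summary)
```

If the standard deviation within each group is much larger than the gap between the two means, the gap is small relative to the natural variation in age.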

### Insurance Cost

In [138]:
parents_avg_cost = get_average_cost(medical_cost.children > 0)
non_parent_avg_cost = get_average_cost(medical_cost.children == 0) 
print("Average insurance cost for parents: {:,.2f}".format(parents_avg_cost))
print("Average insurance cost for non-parents: {:,.2f}".format(non_parent_avg_cost))

Average insurance cost for parents: 13,949.94
Average insurance cost for non-parents: 12,365.98


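In the end, the difference in average insurance cost between the two groups makes sense: parents pay about 1,584 more on average (13,949.94 vs. 12,365.98), plausibly because their insurance also covers their children. This suggests that the cost rises with the number of children, although a two-group comparison does not show that directly.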
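The comparison above lumps all parents together, whatever the size of the family. One way to check whether the cost really rises with each additional child would be to average the charges per number of children. The sketch below assumes the same column names as `insurance.csv`; the function name `charges_by_children` and the toy data are made up.

```python
import pandas as pd


def charges_by_children(df: pd.DataFrame) -> pd.DataFrame:
    """Record count and mean charges for each number of children."""
    return df.groupby("children")["charges"].agg(["count", "mean"]).sort_index()


# Toy frame with made-up values (not taken from insurance.csv)
toy = pd.DataFrame({
    "children": [0, 0, 1, 2, 2],
    "charges": [100.0, 200.0, 300.0, 400.0, 600.0],
})
by_children = charges_by_children(toy)
print(by_children)
```

Keeping the `count` column next to the mean helps here, since groups with very few records can give noisy averages.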

### Conclusion

There is no notable bias between parents and non-parents in the given data set: the two groups are similar in size and average age. Having children does, however, influence what you end up paying for your insurance.