# U.S. Medical Insurance Costs


Download a zip file here with the necessary datasets and an empty Jupyter Notebook where you can write your code.

Open insurance.csv and take a look at the file. Take note of how information is organized. How will this affect how you analyze the data in Python? Is there anything of particular interest to you in the dataset that you want to investigate? Think about these things before you jump into analyzing it.

In [1]:
import pandas as pd
import csv
import numpy as np


Import your dataset
Import insurance.csv into your Python file and inspect the contents.

In [2]:
insurance = pd.read_csv('/Users/nochomo/Documents/GitHub/insurance.csv')
print(insurance.head())
print('\nTotal number of records is {}\n'.format(len(insurance)))
print('The data was collected from the following regions: {}'.format(insurance.region.unique()))

   age     sex     bmi  children smoker     region      charges
0   19  female  27.900         0    yes  southwest  16884.92400
1   18    male  33.770         1     no  southeast   1725.55230
2   28    male  33.000         3     no  southeast   4449.46200
3   33    male  22.705         0     no  northwest  21984.47061
4   32    male  28.880         0     no  northwest   3866.85520

Total number of records is 1338

The data was collected from the following regions: ['southwest' 'southeast' 'northwest' 'northeast']
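
Before analyzing, it can help to check which columns are numeric and whether any values are missing, since that determines which columns can be averaged directly and which need grouping. A minimal sketch, assuming a tiny frame named `sample` (a name introduced here) built from the first two rows and a subset of the columns printed above, so the block runs on its own:

```python
import pandas as pd

# First two rows of the printed output above, with a subset of the columns.
# Swap `sample` for the full `insurance` DataFrame to inspect the real data.
sample = pd.DataFrame({
    'age': [19, 18],
    'sex': ['female', 'male'],
    'bmi': [27.9, 33.77],
    'charges': [16884.924, 1725.5523],
})

# Number of missing values in each column.
missing_per_column = sample.isna().sum()

# Columns holding numeric data; the rest are categorical labels.
numeric_columns = sample.select_dtypes(include='number').columns.tolist()

print(missing_per_column)
print(numeric_columns)
```

Categorical columns such as `sex` are the natural candidates for `groupby`, while the numeric ones can be summarized with `mean()` or `median()`.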


## The goals of this analysis are to:
1. Find the average cost of insurance across the country and within each region.
2. Determine which regions most of the individuals come from.
3. Investigate the impact of smoking on insurance costs.
4. Compare insurance costs between males and females while holding smoking status, number of children, and age range constant.

## Average cost of insurance
Calculate the mean, median, quartiles, and standard deviation of the insurance charges.

In [24]:
insurance_cost_mean = insurance.charges.mean()
insurance_cost_q1 = np.percentile(insurance.charges, 25)
insurance_cost_q3 = np.percentile(insurance.charges, 75)
insurance_cost_median = insurance.charges.median()
insurance_cost_std = insurance.charges.std()
print('The average cost of insurance is {a}$, with a standard deviation of {b}$.\n\
The median insurance cost is {c}$.\n\
The Interquartile range is {d}$'.format(
                                    a = round(insurance_cost_mean),
                                    b = round(insurance_cost_std),
                                    c = round(insurance_cost_median),
                                    d = round(insurance_cost_q3 - insurance_cost_q1)))


The average cost of insurance is 13270$, with a standard deviation of 12110$.
The median insurance cost is 9382$.
The Interquartile range is 11900$
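
The first goal also asks for the average cost within each region, which the cell above does not compute. A minimal sketch of one way to get it with `groupby`, using a small made-up DataFrame (the rows in `sample` are hypothetical and not taken from insurance.csv) so the block runs on its own:

```python
import pandas as pd

# Hypothetical rows for illustration only; replace `sample` with the
# `insurance` DataFrame loaded earlier to get the real figures.
sample = pd.DataFrame({
    'region': ['southwest', 'southeast', 'southeast', 'northwest'],
    'charges': [1000.0, 2000.0, 4000.0, 2500.0],
})

# Mean charges per region, sorted from most to least expensive.
region_means = (sample.groupby('region')['charges']
                      .mean()
                      .sort_values(ascending=False))
print(region_means.round(2))
```

Because the overall mean (13270$) sits well above the median (9382$), the charges are right-skewed, so it may be worth comparing regional medians as well by swapping `.mean()` for `.median()`.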


## Regional distribution of the insurance data

In [44]:
insurance_by_region = insurance.groupby(['region'])['children'].count()
print(insurance_by_region)

region
northeast    324
northwest    325
southeast    364
southwest    325
Name: children, dtype: int64
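
The counts above show that the southeast contributes 364 of the 1338 records, roughly 27%, so no single region holds a majority. A short sketch of how such shares could be computed directly, assuming a hypothetical `sample` frame in place of the real data:

```python
import pandas as pd

# Hypothetical region labels for illustration only.
sample = pd.DataFrame(
    {'region': ['southeast', 'southeast', 'northwest', 'southwest']})

# Row counts per region; size() counts rows regardless of missing values,
# so it does not depend on picking a column such as 'children'.
region_counts = sample.groupby('region').size()

# Share of all records per region, as a percentage.
region_share = (region_counts / region_counts.sum() * 100).round(1)
print(region_share)
```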


## The impact of smoking on insurance costs

In [57]:
smokers_insurance_costs = insurance[insurance.smoker == 'yes']
smokers_insurance_mean = round(smokers_insurance_costs.charges.mean(), 2)

non_smokers_insurance_costs = insurance[insurance.smoker == 'no']
non_smokers_insurance_mean = round(non_smokers_insurance_costs.charges.mean(), 2)

# Keep the space before the line-continuation backslash so the words do not run together.
print('The average insurance cost for smokers was {a}$, while the average insurance \
cost for non-smokers was {b}$.'.format(a = smokers_insurance_mean, b = non_smokers_insurance_mean))

The average insurance cost for smokers was 32050.23$, while the average insurance cost for non-smokers was 8434.27$.
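
To put the two averages in perspective, the gap can be expressed as a difference and as a ratio. The sketch below only does arithmetic on the two means printed above; the variable names are introduced here for illustration:

```python
# Means copied from the printed output above.
smokers_mean = 32050.23
non_smokers_mean = 8434.27

# Absolute gap and relative size of the smokers' average.
difference = smokers_mean - non_smokers_mean
ratio = smokers_mean / non_smokers_mean

print('Difference: {:.2f}$'.format(difference))
print('Ratio: {:.2f}x'.format(ratio))
```

On these figures smokers pay about 23616$ more on average, roughly 3.8 times the non-smoker average. This is a raw comparison that does not control for age, BMI, or other factors.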


## Insurance costs for non-smoking females aged 18-50 with no children

In [79]:
females_insurance = insurance[(insurance.smoker == 'no')
                              & (insurance.children == 0)
                              & (insurance.age >= 18)
                              & (insurance.age <= 50)
                              & (insurance.sex == 'female')]
females_insurance_mean = round(females_insurance.charges.mean(), 2)


print('The average insurance cost for a non-smoking female\
 aged 18-50 with no children is {}$'.format(females_insurance_mean))

The average insurance cost for a non-smoking female aged 18-50 with no children is 4884.54$


## Insurance costs for non-smoking males aged 18-50 with no children

In [80]:
males_insurance = insurance[(insurance.smoker == 'no')
                            & (insurance.children == 0)
                            & (insurance.age >= 18)
                            & (insurance.age <= 50)
                            & (insurance.sex == 'male')]
males_insurance_mean = round(males_insurance.charges.mean(), 2)


print('The average insurance cost for a non-smoking male\
 aged 18-50 with no children is {}$'.format(males_insurance_mean))

The average insurance cost for a non-smoking male aged 18-50 with no children is 4487.33$
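
The two cells above apply the same filter and differ only in the `sex` value, so the comparison could be wrapped in a small helper to avoid copy-paste slips. A minimal sketch, assuming a hypothetical function `mean_charges` and a made-up `sample` frame (neither appears in the original notebook):

```python
import pandas as pd

def mean_charges(df, sex, min_age=18, max_age=50):
    """Mean charges for childless non-smokers of one sex within an age range."""
    subset = df[(df.smoker == 'no')
                & (df.children == 0)
                & (df.age >= min_age)
                & (df.age <= max_age)
                & (df.sex == sex)]
    return round(subset.charges.mean(), 2)

# Hypothetical rows for illustration only; pass the real `insurance`
# DataFrame instead to reproduce the figures from the cells above.
sample = pd.DataFrame({
    'age': [25, 40, 30, 60, 35],
    'sex': ['female', 'female', 'male', 'male', 'male'],
    'children': [0, 0, 0, 0, 2],
    'smoker': ['no', 'no', 'no', 'no', 'no'],
    'charges': [3000.0, 5000.0, 3500.0, 12000.0, 6000.0],
})

for sex in ('female', 'male'):
    print('Non-smoking {}, aged 18-50, no children: {}$'.format(
        sex, mean_charges(sample, sex)))
```

Note that this comparison holds smoking status, number of children, and the age range fixed, but not BMI or region, so any remaining gap between the sexes may partly reflect those factors.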
