# U.S. Medical Insurance Costs

In this project, a **CSV** file of medical insurance costs is explored with pandas in Python. The goal is to analyze the attributes in **insurance.csv** to learn more about the patients it describes and to identify potential use cases for the dataset.

In [1]:
import pandas as pd

In [2]:
#load the dataset
insurance_csv = pd.read_csv('insurance.csv')

In [3]:
print(insurance_csv.head())

   age     sex     bmi  children smoker     region      charges
0   19  female  27.900         0    yes  southwest  16884.92400
1   18    male  33.770         1     no  southeast   1725.55230
2   28    male  33.000         3     no  southeast   4449.46200
3   33    male  22.705         0     no  northwest  21984.47061
4   32    male  28.880         0     no  northwest   3866.85520


In [4]:
print(len(insurance_csv))

#see if there are any missing values in the data set
print(insurance_csv.isna().sum())

1338
age         0
sex         0
bmi         0
children    0
smoker      0
region      0
charges     0
dtype: int64


### Project Goals

* Determine which region most individuals come from
* Compare average charges for smokers and non-smokers
* Check whether sex (male, female) is associated with a difference in charges

In [5]:
#Pull each column into its own variable
age = insurance_csv.age
sex = insurance_csv.sex
bmi = insurance_csv.bmi
children = insurance_csv.children
smoker = insurance_csv.smoker
region = insurance_csv.region
charges = insurance_csv.charges

In [6]:
#Count individuals per region to find the most common one
region.value_counts()

southeast    364
northwest    325
southwest    325
northeast    324
Name: region, dtype: int64
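
The counts above show that no region holds an outright majority; southeast is simply the largest group. As a minimal sketch, the counts can be turned into percentage shares. The counts are rebuilt by hand from the output above, and the names `region_counts` and `region_share` are illustrative, not part of the notebook.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
region_counts = pd.Series(
    {"southeast": 364, "northwest": 325, "southwest": 325, "northeast": 324},
    name="region",
)

# Share of all individuals in each region, as a percentage.
region_share = region_counts / region_counts.sum() * 100
print(region_share.round(2))

# Label of the largest group.
print(region_share.idxmax())
```

On the full DataFrame, `region.value_counts(normalize=True)` should give the same proportions directly.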

In [7]:
#Compare average charges for smokers vs. non-smokers
smoker_charges = insurance_csv.groupby('smoker')['charges'].mean()
print(smoker_charges)
#Index by label ('yes'/'no'); positional access on a label-indexed Series is deprecated
percentage = (smoker_charges['yes']-smoker_charges['no'])/smoker_charges['no']*100

print('')
print('On average, smokers are charged ' +"{:.2f}".format(percentage) +'% more than non-smokers')

smoker
no      8434.268298
yes    32050.231832
Name: charges, dtype: float64

On average, smokers are charged 280.00% more than non-smokers
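
The percentage above is a relative difference, (smoker mean - non-smoker mean) / non-smoker mean. A small sketch of that arithmetic follows, using the two group means printed above. The helper name `percent_more` is hypothetical and not part of the notebook.

```python
def percent_more(value, baseline):
    """Return how much larger `value` is than `baseline`, as a percentage of `baseline`."""
    return (value - baseline) / baseline * 100

# Group means copied from the output above.
smoker_mean = 32050.231832
non_smoker_mean = 8434.268298

print("{:.2f}%".format(percent_more(smoker_mean, non_smoker_mean)))

# The same comparison expressed as a ratio of the two means.
print("{:.1f}x".format(smoker_mean / non_smoker_mean))
```

Being charged 280% more is the same as being charged about 3.8 times as much, which is often the easier figure to read.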


In [8]:
#Check whether average charges differ by sex (male, female)
#numeric_only=True skips the text columns, which newer pandas versions refuse to average
sex_charges = insurance_csv.groupby('sex').mean(numeric_only=True)
print(sex_charges)
#Look up each group by label rather than by position
percentage_2 = (sex_charges.loc['male', 'charges']-sex_charges.loc['female', 'charges'])/sex_charges.loc['female', 'charges']*100

print('')
print('On average, men are charged ' +"{:.2f}".format(percentage_2) +'% more than women')

              age        bmi  children       charges
sex                                                 
female  39.503021  30.377749  1.074018  12569.578844
male    38.917160  30.943129  1.115385  13956.751178

On average, men are charged 11.04% more than women
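
A gap between group means does not by itself show that sex drives charges, since the groups may differ in other ways, such as smoking status. One way to probe this is to group by both columns. The sketch below is only a toy: it uses the five rows printed by `head()` earlier, so its numbers say nothing about the full dataset, and the names `preview` and `by_sex_and_smoker` are illustrative.

```python
import pandas as pd

# Toy frame built from the five preview rows shown earlier; not the full dataset.
preview = pd.DataFrame({
    "sex": ["female", "male", "male", "male", "male"],
    "smoker": ["yes", "no", "no", "no", "no"],
    "charges": [16884.92400, 1725.55230, 4449.46200, 21984.47061, 3866.85520],
})

# Mean charges for each (sex, smoker) combination present in the data.
by_sex_and_smoker = preview.groupby(["sex", "smoker"])["charges"].mean()
print(by_sex_and_smoker)
```

Running the same `groupby` on `insurance_csv` would show whether the difference between men and women persists within the smoker and non-smoker groups.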
