# U.S. Medical Insurance Costs

## Introduction 

Welcome to the U.S. Medical Insurance Costs Analysis. For this project, we were provided a dataset with seven variables:

   * Age = The age in years of the customer (Integer)
   * Sex = Is the customer male or female (String)
   * BMI = The body mass index (BMI) of the customer (Float)
   * Children = The number of children that the customer has (Integer)
   * Smoker = Does the customer smoke? (String)
   * Region = The region of the country the customer lives in (String)
   * Charges = The amount the customer spends for medical care per year (Float)

1) Exploring our customer population
    
    * What is the average age of the customers in this dataset?
    * What percentage of the customers are women? 
    * What is the average BMI of customers in the dataset?
    * On average, how many children do the customers have?
    * What percentage of the customers are smokers?
    * Which region do the most customers come from?
    * What is the average annual cost of medical care?
    
2) Exploring cost differences

    * Difference in average cost by decade of life
    * Difference in average cost by sex
    * Difference in average cost by BMI classification
    * Difference in average cost by number of children
    * Difference in average cost by smoker status
    * Difference in average cost by region
    
    

## Setup

Our project begins by importing Python's built-in csv (Comma Separated Values) module, which will allow us to read the table that we will open next.

In [16]:
import csv

To assess the type and quality of the data, we opened the insurance.csv file and viewed its contents.
To view this raw information for yourself, please uncomment the following code segment.

In [97]:
# with open('insurance.csv') as insurance_data:
#     print(insurance_data.read())

Although the raw data contains all of the information we will need, it is not in a format that is very useful to us. In order to explore the data, we will have to separate the variables discussed above into separate lists. To ensure we do not get any duplicate values from running this code multiple times, we define and populate the lists in a single code block.

In [57]:
# Creating empty lists to populate using the function: 
age = []
sex = []
bmi = []
children = []
smoker = []
region = []
charges = []

# Declaring the function with the arguments lst, file and column:
# lst is the list we want to populate, file is the file we want to read, and column is the column we want to read
def list_populator(lst, file, column):
    # Opens the file in read-only mode and assigns it to the variable name insurance_data.
    with open(file, newline='') as insurance_data:
        # csv.DictReader yields each row of insurance_data as a dictionary keyed by column name; saved as csv_dict
        csv_dict = csv.DictReader(insurance_data)
        # For every row in csv_dict, the value in the chosen column is appended to the designated list.
        # The list is modified in place, so nothing is returned.
        for row in csv_dict:
            lst.append(row[column])

# Calling the function for each list:
list_populator(age, 'insurance.csv', 'age')
list_populator(sex, 'insurance.csv', 'sex')
list_populator(bmi, 'insurance.csv', 'bmi')
list_populator(children, 'insurance.csv', 'children')
list_populator(smoker, 'insurance.csv', 'smoker')
list_populator(region, 'insurance.csv', 'region')
list_populator(charges, 'insurance.csv', 'charges')

# Print statements to double-check each list. These should be commented out when done testing.
# print(age)
# print(sex)
# print(bmi)
# print(children)
# print(smoker)
# print(region)
# print(charges)
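The seven calls above each reopen and re-read `insurance.csv`, and they leave every value as a string. As a minimal sketch of an alternative, assuming the same column names, the file could be read once and the numeric columns converted on the way in. The `load_columns` helper, the `CONVERTERS` mapping, and the inline sample rows are hypothetical and exist only so the example runs without the real file.

```python
import csv
import io

# Hypothetical sample with the same header as insurance.csv (values are made up).
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
19,female,27.9,0,yes,southwest,16884.92
33,male,22.7,1,no,northwest,21984.47
"""

# Assumed converters: the type each column should be parsed into.
CONVERTERS = {
    "age": int, "sex": str, "bmi": float, "children": int,
    "smoker": str, "region": str, "charges": float,
}

def load_columns(file_obj):
    """Read an open CSV file once; return {column name: list of typed values}."""
    columns = {name: [] for name in CONVERTERS}
    for row in csv.DictReader(file_obj):
        for name, convert in CONVERTERS.items():
            columns[name].append(convert(row[name]))
    return columns

columns = load_columns(io.StringIO(SAMPLE_CSV))
print(columns["age"])      # [19, 33]
print(columns["charges"])  # [16884.92, 21984.47]
```

With the real file, the same function could be called inside `with open('insurance.csv', newline='') as f:` by passing `f` instead of the `io.StringIO` object.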

Lastly, we are going to define a few universal variables and functions which will save us time in the long run.

In [105]:
# Total number of customers in the dataset (1338)
total = len(age)

# Function for computing averages
def make_average(num, whole):
    return round(num/whole,1)

# Function for computing percentages
def make_percent(num, whole):
    return round(num/whole*100 ,1)
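As a quick check of how these helpers behave, the sketch below redefines them so that it runs on its own and applies `make_percent` to the counts reported in this notebook (662 women out of 1338 customers). The second call uses made-up numbers. Both helpers round to one decimal place, and both would raise `ZeroDivisionError` if `whole` were 0.

```python
def make_average(num, whole):
    # Mean rounded to one decimal place
    return round(num / whole, 1)

def make_percent(num, whole):
    # Share of the whole, as a percentage rounded to one decimal place
    return round(num / whole * 100, 1)

print(make_percent(662, 1338))  # 49.5
print(make_average(7, 2))       # 3.5
```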

Now that we have some functions defined, and our lists populated and tested, it's time to explore our data!

## Exploring Customer Population

### Average Age

In [128]:
# To find the average age of the customers in this dataset, we first need their combined age.
# We declare a variable called combined_age and set it to 0.
combined_age = 0

# This for loop iterates over each item in age and adds its value to combined_age.
for i in range(0, total):
    combined_age += float(age[i])


average_age = make_average(combined_age, total)
print('The average age of a customer in our database is ' + str(average_age) + ' years old')

The average age of a customer in our database is 39.2 years old
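The same kind of accumulation can be written without an index-based loop. As a sketch using a short made-up list of ages (stored as strings, the way `csv.DictReader` returns them), the built-in `sum` with a generator expression does the adding:

```python
# Hypothetical sample; the real age list has 1338 entries.
ages = ["19", "18", "28", "33", "32"]

# Convert each string to a number while summing
combined_age = sum(float(a) for a in ages)
average_age = round(combined_age / len(ages), 1)
print(average_age)  # 26.0
```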


### Percentage of Women

In [126]:
female_customers = 0

for i in range(0,total):
    if sex[i] == 'female':
        female_customers += 1

female_percent = make_percent(female_customers, total)

print('Out of the ' + str(total) + ' customers in our dataset, ' + str(female_customers) + ' or ' + str(female_percent) 
      + '% are women' )

Out of the 1338 customers in our dataset, 662 or 49.5% are women


### Average Body Mass Index (BMI)

In [125]:
total_bmi = 0

for i in range(0,total):
    total_bmi += float(bmi[i])

average_bmi = make_average(total_bmi, total)

print('The average BMI for this dataset is: '+ str(average_bmi))

The average BMI for this dataset is: 30.7


### Average Number of Children

In [124]:
total_kids = 0

for i in range(0, total):
    total_kids += float(children[i])

average_kids = make_average(total_kids, total)
print('Average number of kids per customer: ' + str(average_kids))

# What percentage have no kids
no_kids = 0 

for i in range(0,total):
    if children[i] == '0':
        no_kids += 1

percent_no_kids = make_percent(no_kids, total)
print(str(no_kids) + ' customers or ' + str(percent_no_kids) + '% of the database have no children')

has_kids = total - no_kids
new_average_kids = make_average(total_kids, has_kids)
print('When you factor out the customers without children, the average number of children per customer is ' + str(new_average_kids))

Average number of kids per customer: 1.1
574 customers or 42.9% of the database have no children
When you factor out the customers without children, the average number of children per customer is 1.9


### Percentage of Smokers

In [121]:
num_smokers = 0 

for i in range(0, total):
    if smoker[i] == 'yes':
        num_smokers += 1
        

smoker_percent = make_percent(num_smokers, total)
print(str(num_smokers) + ' or ' + str(smoker_percent) + '% of the customers in our dataset smoke')


274 or 20.5% of the customers in our dataset smoke


### Percentage of Customers from Each Region

In [154]:
north_east_region = 0
north_west_region = 0
south_east_region = 0
south_west_region = 0

for i in range(0,total):
    if region[i] == 'southwest':
        south_west_region += 1
    elif region[i] == 'southeast':
        south_east_region += 1
    elif region[i] == 'northeast':
        north_east_region += 1
    elif region[i] == 'northwest':
        north_west_region += 1


percent_nw = make_percent(north_west_region, total)
percent_ne = make_percent(north_east_region, total)
percent_sw = make_percent(south_west_region, total)
percent_se = make_percent(south_east_region, total)

print('Percentage of customers from the North West: ' + str(percent_nw) + '%')
print('Percentage of customers from the North East: ' + str(percent_ne) + '%')
print('Percentage of customers from the South West: ' + str(percent_sw) + '%')
print('Percentage of customers from the South East: ' + str(percent_se) + '%')

Percentage of customers from the North West: 24.3%
Percentage of customers from the North East: 24.2%
Percentage of customers from the South West: 24.3%
Percentage of customers from the South East: 27.2%
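The four counters and the `if`/`elif` chain could also be replaced by `collections.Counter`, which tallies every distinct value in one pass and does not need to know the region names in advance. This is a sketch on a small made-up list, not the real `region` data:

```python
from collections import Counter

# Hypothetical sample of region values
regions = ["southwest", "southeast", "southeast", "northwest", "northeast", "southeast"]

region_counts = Counter(regions)

# most_common() lists the regions from most to least frequent
for name, count in region_counts.most_common():
    percent = round(count / len(regions) * 100, 1)
    print(name + ": " + str(percent) + "%")
```

The same pattern would work for the `sex` and `smoker` lists, since they are also categorical.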


### Average Annual Cost of Healthcare

In [145]:
total_cost = 0

for i in range(0,total):
    total_cost += float(charges[i])

# Round once, after the loop, rather than on every iteration
rounded_cost = round(total_cost, 2)

average_cost = make_average(rounded_cost, total)

print('Average annual cost of healthcare for customers in databank: '+ str(average_cost))

Average annual cost of healthcare for customers in databank: 13270.4


## Exploring Cost Differences

### Difference in Average Cost by Decade of Life
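One way this comparison could be computed is sketched below. It assumes that "decade of life" means the age rounded down to the nearest ten, and it uses a hypothetical helper, `average_by_group`, with made-up sample rows rather than the real dataset.

```python
from collections import defaultdict

def average_by_group(keys, values):
    """Return {key: mean of the values sharing that key}, rounded to 2 decimals."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for key, value in zip(keys, values):
        totals[key] += value
        counts[key] += 1
    return {key: round(totals[key] / counts[key], 2) for key in totals}

# Hypothetical sample; the real lists hold strings and need int()/float() first.
ages = [19, 23, 35, 38, 61]
costs = [1000.0, 3000.0, 5000.0, 7000.0, 9000.0]

# Assumed definition of a decade: 19 -> 10, 23 -> 20, 35 -> 30, ...
decades = [(a // 10) * 10 for a in ages]
print(average_by_group(decades, costs))
# {10: 1000.0, 20: 3000.0, 30: 6000.0, 60: 9000.0}
```

Because the helper only needs a list of keys and a list of values, the same function could be reused for the sex, children, smoker, and region comparisons.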

### Difference in Average Cost by Sex

### Difference in Average Cost by BMI Classification
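The dataset does not include a BMI classification, so one has to be derived from the BMI value. The sketch below assumes the commonly used adult cutoffs (under 18.5 underweight, 18.5 to under 25 normal, 25 to under 30 overweight, 30 and above obese). The `classify_bmi` function and the sample values are hypothetical and are not taken from the dataset.

```python
def classify_bmi(bmi):
    """Map a BMI value to a category, using assumed standard adult cutoffs."""
    if bmi < 18.5:
        return "underweight"
    elif bmi < 25:
        return "normal"
    elif bmi < 30:
        return "overweight"
    return "obese"

# Hypothetical sample values
bmis = [17.0, 22.7, 27.9, 33.0, 36.5]
costs = [2000.0, 4000.0, 6000.0, 10000.0, 14000.0]

# Collect the costs that fall into each category
costs_by_class = {}
for bmi, cost in zip(bmis, costs):
    costs_by_class.setdefault(classify_bmi(bmi), []).append(cost)

# Average the costs within each category
averages = {name: round(sum(vals) / len(vals), 2) for name, vals in costs_by_class.items()}
print(averages)
# {'underweight': 2000.0, 'normal': 4000.0, 'overweight': 6000.0, 'obese': 12000.0}
```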

### Difference in Average Cost by Number of Children

### Difference in Average Cost by Smoker Status

### Difference in Average Cost by Region