# U.S. Medical Insurance Costs

The goal of this project is to explore the provided data (a CSV file) using Python fundamentals. The file contains personal data used to calculate insurance costs.

The data is provided by Codecademy.com.

## Introduction

The CSV file (insurance.csv) consists of the following columns for each person:
- age
- sex
- bmi (body mass index)
- number of children
- smoker status
- region 
- charges (insurance cost)

There is no missing data in the file.

The following formula is used to calculate the insurance cost:

$$\text{insurance} = 250 \cdot \text{age} - 128 \cdot \text{sex} + 370 \cdot \text{bmi} + 425 \cdot \text{children} + 24000 \cdot \text{smoker} - 12500$$
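To see how each term contributes, the formula can be written as a small Python function. This is a sketch, and the numeric encoding is an assumption: the CSV stores `sex` as 'male'/'female' and `smoker` as 'yes'/'no', so here `sex` is taken as 1 for male and 0 for female, and `smoker` as 1 for a smoker and 0 otherwise. The name `insurance_cost` is hypothetical.

```python
def insurance_cost(age, sex, bmi, children, smoker):
    """Estimate the insurance cost with the linear formula.

    Assumed encoding (not how insurance.csv stores these columns):
    sex = 1 for male, 0 for female; smoker = 1 for a smoker, 0 otherwise.
    """
    return 250 * age - 128 * sex + 370 * bmi + 425 * children + 24000 * smoker - 12500


# Hypothetical person: 40-year-old female smoker, BMI 30, two children
print(insurance_cost(40, 0, 30, 2, 1))  # 33450
```

With this encoding, each extra BMI point adds 370 to the estimate, so going from a BMI of 25 to 35 adds 3,700, while being a smoker adds 24,000 on its own.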

For the analysis, I chose the body mass index (BMI), since it has a fairly wide range of values and directly affects the cost of insurance.

BMI has the following official classification:
- bmi < 16.5 - severe underweight
- 16.5 <= bmi < 18.5 - underweight
- 18.5 <= bmi < 24.5 - normal weight
- 24.5 <= bmi < 30 - overweight
- bmi >= 30 - obesity
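The classification above can be expressed as a lookup on sorted thresholds. The sketch below uses Python's standard `bisect` module; the function name `bmi_category` and the two constant lists are illustrative, not part of the original notebook.

```python
from bisect import bisect_right

# Lower bound of every category except the first, as listed above
BMI_THRESHOLDS = [16.5, 18.5, 24.5, 30]
BMI_CATEGORIES = ['severe underweight', 'underweight', 'normal weight',
                  'overweight', 'obesity']


def bmi_category(bmi):
    """Return the category name for a BMI value.

    bisect_right counts how many thresholds are <= bmi, so a value
    exactly on a boundary (e.g. 30) falls into the higher category.
    """
    return BMI_CATEGORIES[bisect_right(BMI_THRESHOLDS, bmi)]


print(bmi_category(17.0))  # underweight
print(bmi_category(30.0))  # obesity
```

Using `bisect_right` keeps the boundary rule of the list: a value equal to a threshold, such as 30, belongs to the higher category.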

I focus on people categorized as obese (the BMI category that adds the most to the insurance cost under the formula above) and answer the following questions:

- The ratio of obese men to obese women
- The percentage of smokers among obese men and among obese women
- The percentage of obese women who have children
- The percentage of obese people who are younger than 21

## The functions used in the program

`bmi_grouping` splits the patients dictionary into five dictionaries, one per BMI category:

```python
def bmi_grouping(dictionary):
    """Split patients into five dictionaries, one per BMI category."""
    severe_underweight, underweight, normal, overweight, obese = {}, {}, {}, {}, {}
    for key, value in dictionary.items():
        bmi = float(value[2])  # BMI is stored as a string in column 2
        if bmi < 16.5:
            severe_underweight[key] = value
        elif bmi < 18.5:
            underweight[key] = value
        elif bmi < 24.5:
            normal[key] = value
        elif bmi < 30:
            overweight[key] = value
        else:
            obese[key] = value
    return severe_underweight, underweight, normal, overweight, obese
```



`male_to_female` counts the men and women in a dictionary:

```python
def male_to_female(dictionary):
    """Count men and women; sex is stored in column 1."""
    m_count, f_count = 0, 0
    for value in dictionary.values():
        if value[1] == 'male':
            m_count += 1
        elif value[1] == 'female':
            f_count += 1
    return m_count, f_count
```

`obesity_ratio` prints by how many percent the larger group (obese men or obese women) exceeds the smaller one:

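```python
def obesity_ratio(m_count, f_count):
    """Print by how many percent obese men outnumber obese women, or vice versa."""
    if m_count == f_count:
        print('There are equal numbers of obese men and obese women.')
    elif min(m_count, f_count) == 0:
        # Avoid dividing by zero when one group is empty
        print('The percentage is undefined because one group is empty.')
    elif m_count > f_count:
        print('There are {0:.1f}% more obese men than obese women.'.format((m_count / f_count - 1) * 100))
    else:
        print('There are {0:.1f}% more obese women than obese men.'.format((f_count / m_count - 1) * 100))
```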
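The percentage printed by `obesity_ratio` is `(larger / smaller - 1) * 100`. The hypothetical helper below returns that number instead of printing it, which makes it easier to reuse or check; it also returns `None` when the smaller count is zero, since the ratio is undefined.

```python
def percent_more(larger, smaller):
    """Return by how many percent `larger` exceeds `smaller`.

    Same quantity that obesity_ratio prints: (larger / smaller - 1) * 100.
    Returns None when `smaller` is 0, because the ratio is undefined.
    """
    if smaller == 0:
        return None
    return (larger / smaller - 1) * 100


# Hypothetical counts: 110 obese men and 100 obese women
print('{0:.1f}'.format(percent_more(110, 100)))  # 10.0
```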

`smokers` counts the male and female smokers in a dictionary:

```python
def smokers(dictionary):
    """Count smokers by sex: returns (male_smokers, female_smokers)."""
    m_count, f_count = 0, 0
    for value in dictionary.values():
        if value[4] != 'yes':  # smoker status is column 4
            continue
        if value[1] == 'male':
            m_count += 1
        elif value[1] == 'female':
            f_count += 1
    return m_count, f_count
```

`woman_with_children` counts the women who have at least one child:

```python
def woman_with_children(dictionary):
    """Count women with at least one child (children is column 3)."""
    child_count = 0
    for value in dictionary.values():
        if value[1] == 'female' and int(value[3]) >= 1:
            child_count += 1
    return child_count
```

`less_than_age` counts the people younger than a given age:

```python
def less_than_age(dictionary, age):
    """Count people younger than `age` (age is column 0)."""
    count = 0
    for value in dictionary.values():
        if int(value[0]) < age:
            count += 1
    return count
```
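`smokers`, `woman_with_children` and `less_than_age` all follow the same pattern: loop over the records and count those that match a condition. A minimal sketch of one generic helper that takes the condition as a function (the name `count_where` and the sample records are hypothetical):

```python
def count_where(dictionary, condition):
    """Count the records (dictionary values) for which condition(record) is true."""
    return sum(1 for record in dictionary.values() if condition(record))


# Hypothetical records in the notebook's layout:
# [age, sex, bmi, children, smoker, region, charges], all stored as strings
sample = {
    'Patient1': ['19', 'female', '31.0', '0', 'yes', 'southwest', '17000.00'],
    'Patient2': ['33', 'male', '33.0', '1', 'no', 'southeast', '4500.00'],
    'Patient3': ['45', 'female', '35.5', '2', 'no', 'northeast', '8000.00'],
}

# Counts equivalent to smokers() (men only), woman_with_children()
# and less_than_age(..., 21)
male_smokers = count_where(sample, lambda r: r[1] == 'male' and r[4] == 'yes')
women_with_children = count_where(sample, lambda r: r[1] == 'female' and int(r[3]) >= 1)
under_21 = count_where(sample, lambda r: int(r[0]) < 21)
print(male_smokers, women_with_children, under_21)  # 0 1 1
```

Passing the condition as a `lambda` avoids writing a new loop for every question, at the cost of losing a descriptive function name for each count.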

`create_lst` collects the values of one column into a list (this helper is not used in the analysis below):

```python
def create_lst(dictionary, column):
    """Return the values of one column (still as strings) as a list."""
    return [value[column] for value in dictionary.values()]
```
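`create_lst` returns the raw CSV strings, so numeric columns need converting before any arithmetic. A sketch of a hypothetical `column_mean` helper that does the conversion and averages with `statistics.mean` (which raises `StatisticsError` on an empty dictionary):

```python
from statistics import mean


def column_mean(dictionary, column):
    """Average one numeric column of a patients-style dictionary.

    Values are read from the CSV as strings, so each one is
    converted to float before averaging.
    """
    return mean(float(record[column]) for record in dictionary.values())


# Hypothetical records: [age, sex, bmi, children, smoker, region, charges]
sample = {
    'Patient1': ['19', 'female', '31.0', '0', 'yes', 'southwest', '17000.00'],
    'Patient2': ['45', 'male', '35.0', '2', 'no', 'northeast', '8000.00'],
}

print(column_mean(sample, 2))  # 33.0 (mean BMI)
```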

## The main body of the program

```python
import csv

# dictionary {'PatientN': [age, sex, bmi, children, smoker, region, charges]}
# All values are kept as strings, exactly as read from the CSV.
columns = ['age', 'sex', 'bmi', 'children', 'smoker', 'region', 'charges']
patients = {}

with open('insurance.csv', newline='') as data_csv:
    data_dict = csv.DictReader(data_csv)
    for count, item in enumerate(data_dict, start=1):
        patients['Patient{}'.format(count)] = [item[column] for column in columns]
```
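To check the shape of the resulting dictionary without the real file, the same loading approach can be run on an in-memory CSV via `io.StringIO`. The two data rows below are made up for illustration; only the header matches insurance.csv. `load_patients` is a hypothetical name.

```python
import csv
import io

# Hypothetical two-row CSV with the same header as insurance.csv
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
19,female,31.0,0,yes,southwest,17000.00
33,male,33.0,1,no,southeast,4500.00
"""

COLUMNS = ['age', 'sex', 'bmi', 'children', 'smoker', 'region', 'charges']


def load_patients(csv_file):
    """Build {'Patient1': [age, sex, bmi, ...], ...} from an open CSV file."""
    reader = csv.DictReader(csv_file)
    return {
        'Patient{}'.format(number): [row[column] for column in COLUMNS]
        for number, row in enumerate(reader, start=1)
    }


patients = load_patients(io.StringIO(SAMPLE_CSV))
print(patients['Patient2'])
# ['33', 'male', '33.0', '1', 'no', 'southeast', '4500.00']
```

For the real data, pass `open('insurance.csv', newline='')` instead of the `StringIO` object; the `csv` module documentation recommends opening files with `newline=''`.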

```python
# keep only the obese category (the fifth group returned)
_, _, _, _, obesity = bmi_grouping(patients)
```

```python
ob_male, ob_female = male_to_female(obesity)
obesity_ratio(ob_male, ob_female)
```

There are 11.7% more obese men than obese women.


```python
ob_male_smokers, ob_female_smokers = smokers(obesity)
print('{0:.1f}% of obese men are smokers.'.format((ob_male_smokers / ob_male) * 100))
print('{0:.1f}% of obese women are smokers.'.format((ob_female_smokers / ob_female) * 100))
```

24.9% of obese men are smokers.

15.6% of obese women are smokers.


```python
ob_female_with_children = woman_with_children(obesity)
print('{0:.1f}% of obese women have children.'.format((ob_female_with_children / ob_female) * 100))
```

57.2% of obese women have children.


```python
ob_young_count = less_than_age(obesity, 21)
print('{0:.1f}% of obese people are under 21.'.format((ob_young_count / len(obesity)) * 100))
```

12.3% of obese people are under 21.


## Conclusions

All the questions were answered using Python fundamentals such as functions and dictionaries.

The formula for calculating the insurance cost has several drawbacks:

- Because the formula is linear in BMI, the higher the BMI, the more expensive the insurance. However, people with a very low BMI also have health problems, so I think the formula should use weights for BMI categories instead.
- People in the same BMI category may pay different amounts because their exact BMI values differ, even though the official BMI classification assesses their health equally.
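One way to act on both points is to replace the continuous `370 * bmi` term with a fixed amount per BMI category, so that both ends of the scale cost more and everyone in a category pays the same BMI component. The surcharge values below are hypothetical, chosen only to show the shape of such a rule; the normal-weight value, 8000, is roughly what the original term gives for a BMI near 22 (370 × 22 = 8140). The encoding `sex` = 1 for male / 0 for female and `smoker` = 1 / 0 is likewise an assumption, and `weighted_insurance_cost` is a hypothetical name.

```python
from bisect import bisect_right

# Category lower bounds used in this project
BMI_THRESHOLDS = [16.5, 18.5, 24.5, 30]
# HYPOTHETICAL surcharges for: severe underweight, underweight,
# normal weight, overweight, obesity (illustration only)
CATEGORY_SURCHARGE = [12000, 10000, 8000, 10000, 14000]


def weighted_insurance_cost(age, sex, bmi, children, smoker):
    """Formula variant with a per-category BMI surcharge instead of 370 * bmi.

    Assumed encoding: sex = 1 for male, 0 for female; smoker = 1 or 0.
    """
    surcharge = CATEGORY_SURCHARGE[bisect_right(BMI_THRESHOLDS, bmi)]
    return 250 * age - 128 * sex + surcharge + 425 * children + 24000 * smoker - 12500


print(weighted_insurance_cost(40, 0, 16.0, 0, 0))  # 9500 (severe underweight)
print(weighted_insurance_cost(40, 0, 22.0, 0, 0))  # 5500 (normal weight)
```

In practice the surcharges and the intercept (-12500) would have to be fitted to the data; the sketch only shows how a category-based rule removes the "higher BMI is always worse" effect and the within-category differences.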
