# U.S. Medical Insurance Costs

This project comes from the Codecademy Data Analyst syllabus. It is the final portion of the Python material, before the syllabus moves on to data mining and analytics.

The insurance.csv file is comma delimited and has the following seven columns:

- age
- sex
- bmi
- children
- smoker
- region
- charges

There is no missing data. Four columns are numerical (age, bmi, children, charges) and three are categorical (sex, smoker, region).
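
The claim that no values are missing can be checked with a short script. The following is a minimal sketch, not part of the original project: `count_missing` is a hypothetical helper, and the sample rows are made up for illustration (one `bmi` value is left blank on purpose). To check the real file, pass it the rows read from insurance.csv instead.

```python
import csv
import io

# Made-up sample rows for illustration only; these are NOT taken from insurance.csv.
# The second row deliberately has an empty bmi value.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
30,female,25.0,0,no,southwest,4000.0
45,male,,2,yes,southeast,30000.0
"""


def count_missing(rows):
    """Return a dict mapping each column name to its number of empty values."""
    missing = {}
    for row in rows:
        for column, value in row.items():
            # DictReader yields "" for an empty field and None for a short row.
            if value is None or value.strip() == "":
                missing[column] = missing.get(column, 0) + 1
    return missing


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
missing_counts = count_missing(rows)
print(missing_counts)
```

An empty dictionary means that every field in every row has a value.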

For this project we will look at the following areas:

- Average age of patients
- Most represented region
- Difference in cost between smokers and non-smokers
- Average age of patients with at least one child

In [1]:
# import required modules
import csv

In [8]:
# create list to hold each patient record
record_data = []

# open insurance.csv, create a dictionary for each record,
# and add it to the record_data list
with open("insurance.csv") as patient_records:
  patient_reader = csv.DictReader(patient_records)
  for record in patient_reader:
    record_data.append(record)
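
`csv.DictReader` returns every value as a string, which is why the functions in this project call `int()` and `float()` before doing arithmetic. An alternative is to convert the numerical columns once, at load time. The sketch below assumes that approach; `parse_record` is a hypothetical helper that is not part of the original project, and the sample row is made up for illustration.

```python
import csv
import io

# Made-up sample row for illustration only; NOT taken from insurance.csv.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
30,female,25.0,0,no,southwest,4000.0
"""


def parse_record(record):
    """Return a copy of the record with the numerical columns converted."""
    parsed = dict(record)
    parsed["age"] = int(record["age"])
    parsed["bmi"] = float(record["bmi"])
    parsed["children"] = int(record["children"])
    parsed["charges"] = float(record["charges"])
    return parsed


parsed_records = [parse_record(row) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(parsed_records[0])
```

The rest of this project keeps the values as strings and converts them inside each function, so the functions here do not depend on this helper.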

Now that the data has been loaded, the next step is to write a function for each part of the analysis.

Starting with the average age of patients:

In [10]:
def average_age(records):
  total = 0
  for record in records:
    total += int(record["age"])
  return total / len(records)

avg_age = average_age(record_data)
print(avg_age)

39.20702541106129
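
The same average can be computed with the standard library's `statistics.mean`. This is a minimal sketch of that alternative, not the approach used in the project; `average_age_mean` is a hypothetical name and the sample records are made up for illustration.

```python
from statistics import mean

# Made-up records for illustration only; the real ones come from insurance.csv.
sample_records = [{"age": "20"}, {"age": "30"}, {"age": "40"}]


def average_age_mean(records):
    """Return the mean age, converting each age string to an int first."""
    return mean(int(record["age"]) for record in records)


sample_average = average_age_mean(sample_records)
print(sample_average)
```

One difference in behavior: with an empty list, `statistics.mean` raises `StatisticsError`, while dividing by `len(records)` raises `ZeroDivisionError`.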


Next, find the most represented region:

In [15]:
def region_rep(records):
  regions = {}
  for record in records:
    if record["region"] not in regions:
      regions[record["region"]] = 1
    else:
      regions[record["region"]] += 1
  max_region = max(regions, key=regions.get)
  return (max_region, regions[max_region])

max_region = region_rep(record_data)
print(max_region)

('southeast', 364)
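
The manual counting dictionary can also be replaced with `collections.Counter`, whose `most_common` method returns `(value, count)` pairs sorted by count. This is a sketch of that alternative under made-up sample data; `region_rep_counter` is a hypothetical name, not part of the original project.

```python
from collections import Counter

# Made-up records for illustration only; the real ones come from insurance.csv.
sample_records = [
    {"region": "southeast"},
    {"region": "northwest"},
    {"region": "southeast"},
]


def region_rep_counter(records):
    """Return a (region, count) tuple for the most common region."""
    counts = Counter(record["region"] for record in records)
    return counts.most_common(1)[0]


top_region = region_rep_counter(sample_records)
print(top_region)
```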


How much more do smokers pay than non-smokers, on average?

In [20]:
def smoker_diff(records):
  smoker_cost = [0, 0]
  nosmoker_cost = [0, 0]
  for record in records:
    if record["smoker"] == "yes":
      smoker_cost[0] += 1
      smoker_cost[1] += float(record["charges"])
    elif record["smoker"] == "no":
      nosmoker_cost[0] += 1
      nosmoker_cost[1] += float(record["charges"])
  return (smoker_cost[1] / smoker_cost[0]) - (nosmoker_cost[1] / nosmoker_cost[0])

print(smoker_diff(record_data))

23615.96353367665
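
The function returns only the difference between the two averages. To see each group's average as well, the totals can be tracked per smoker status in one dictionary. The following is a sketch under that assumption; `average_charges_by_smoker` is a hypothetical helper and the sample records are made up for illustration.

```python
# Made-up records for illustration only; the real ones come from insurance.csv.
sample_records = [
    {"smoker": "yes", "charges": "30000.0"},
    {"smoker": "yes", "charges": "20000.0"},
    {"smoker": "no", "charges": "5000.0"},
    {"smoker": "no", "charges": "3000.0"},
]


def average_charges_by_smoker(records):
    """Return a dict mapping smoker status to the average charges for that group."""
    totals = {}
    for record in records:
        # Each entry holds a (count, running total) pair for one smoker status.
        count, total = totals.get(record["smoker"], (0, 0.0))
        totals[record["smoker"]] = (count + 1, total + float(record["charges"]))
    return {status: total / count for status, (count, total) in totals.items()}


averages = average_charges_by_smoker(sample_records)
print(averages)
print(averages["yes"] - averages["no"])
```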


Finally, what is the average age of patients who have at least one child?

In [22]:
def average_age_of_parent(records):
  parents = []
  for record in records:
    if int(record["children"]) > 0:
      parents.append(record)
  return average_age(parents)

print(average_age_of_parent(record_data))

39.78010471204188
