# U.S. Medical Insurance Costs

## STEP 1: BREAKING DOWN CSV DATA

In this first step, I break down the data from the csv file. **insurance.csv** has the headers _age, sex, bmi, children, smoker, region and charges_, so I start by creating one list per column and then append each row's values to those lists.

In [2]:
import csv

age = list()
sex = list()
bmi = list()
children = list()
smoker = list()
region = list()
charges = list()

with open('insurance.csv', newline='') as insurance_csv:
  # DictReader maps each row to a dict keyed by the header names
  reader = csv.DictReader(insurance_csv)

  for row in reader:
    age.append(row['age'])
    sex.append(row['sex'])
    bmi.append(row['bmi'])
    children.append(row['children'])
    smoker.append(row['smoker'])
    region.append(row['region'])
    charges.append(row['charges'])
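
The same loading step can also be written without naming every column by hand. The block below is only a minimal sketch: `SAMPLE_CSV` and `load_columns` are hypothetical names, and the two sample rows are made up so the example runs without the real file.

```python
import csv
import io

# Hypothetical two-row sample standing in for insurance.csv (values are made up).
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
30,female,25.0,0,no,southwest,1000.0
40,male,30.5,2,yes,southeast,2000.5
"""

def load_columns(csv_file):
    """Return a dict mapping each header to the list of its raw string values."""
    reader = csv.DictReader(csv_file)
    # One empty list per header found in the first line of the file
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)
    return columns

columns = load_columns(io.StringIO(SAMPLE_CSV))
print(columns["age"])
```

With the real file, the `io.StringIO(...)` argument would be replaced by the handle returned from `open('insurance.csv', newline='')`. Note that every value is still a string at this point.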

## STEP 2: GOALS

Now that the data is split into lists, I want to answer the following questions in my analysis:
  
  
- **What is the average age?**

- **Are there more males or females?**

- **Which region has the most insurances?**

- **What is the average cost for non-smokers vs. smokers?**

### Question #1
The first goal I'll be working on is **What is the average age?** Because `csv.DictReader` returns every value as a string, the age list holds strings rather than integers. So I first convert those strings to integers and then write a **function** to calculate the average.

In [3]:
age_formated = list()

for index in age:
  age_formated.append(int(index))

def calculate_average_age(ages):
  total_ages = 0

  for age in ages:
    total_ages += age

  return round((total_ages / len(ages)), 0)

average_age = calculate_average_age(age_formated)
print("The average age of individuals in this analysis is {average_age} years old".format(average_age = average_age))


The average age of individuals in this analysis is 39.0 years old
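
For comparison, the conversion and the average can be done in a couple of lines with a list comprehension and the standard library's `statistics.mean`. This is a sketch on a hypothetical sample (`sample_ages` is made up), not a replacement for the function above.

```python
from statistics import mean

# Hypothetical sample of age strings, in the form csv.DictReader returns them.
sample_ages = ["19", "33", "62"]

# Convert every string to an integer before doing any arithmetic
ages_as_int = [int(value) for value in sample_ages]

average = mean(ages_as_int)
# round() without a second argument returns an int, so no trailing ".0" is printed
print(f"Average age: {round(average)}")
```

Calling `round(value, 0)` as in the cell above keeps the result a float, which is why the output there reads 39.0.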


### Question #2
Now that we have the **average age** of the individuals, I want to know how many _males_ and _females_ are in this dataset. To do that, I loop over the **sex list**, keep a counter for each of the values _male_ and _female_, and then compare the two totals.

In [4]:
def counting_males_females(sex_list):
  total_males = 0
  total_females = 0
  not_rated = 0

  for sex in sex_list:
    if sex == 'male':
      total_males += 1
    elif sex == 'female':
      total_females += 1
    else:
      not_rated +=1

  total_count = total_males + total_females

  male_percentage = round(((total_males / total_count) * 100), 2)
  female_percentage = round(((total_females / total_count) * 100), 2)

  print("There are {total_males} males and {total_females} females, giving us a total of {total_count}. The percentage of males in this dataset is {male_percentage}% vs. a {female_percentage}% of females.".format(total_males = total_males, total_females = total_females, total_count = total_count, male_percentage = male_percentage, female_percentage = female_percentage))
  

# The function prints its summary and returns nothing, so there is no result to store
counting_males_females(sex)

There are 676 males and 662 females, giving us a total of 1338. The percentage of males in this dataset is 50.52% vs. a 49.48% of females.
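
The counting and percentage logic can also be sketched with `collections.Counter`, which tallies each distinct value without a separate variable per category. The sample list below is hypothetical and only there to make the example runnable.

```python
from collections import Counter

# Hypothetical sample of values from the sex column.
sample_sex = ["male", "female", "female", "male", "female"]

# Counter builds a mapping of value -> number of occurrences
counts = Counter(sample_sex)
total = sum(counts.values())

# Share of each value as a percentage, rounded to two decimals
percentages = {label: round(count / total * 100, 2) for label, count in counts.items()}

for label, count in counts.items():
    print(f"{label}: {count} ({percentages[label]}%)")
```

One difference from the function above: any unexpected value would show up as its own key here instead of being collected in `not_rated`.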


### Question #3
Moving on to the next question: **Which region has the most insurances?** The _region_ list is populated from the region column of the csv. Since all of these values are strings, we only need to know which regions exist and how many times each one appears.

In [36]:
def total_regions(regions):
  all_regions = list()

  for region in regions:
    if region not in all_regions:
      all_regions.append(region)

  # Counters are initialised once, outside the loop above
  southwest = 0
  southeast = 0
  northwest = 0
  northeast = 0

  for count in regions:
    if count == 'southwest':
      southwest += 1
    elif count == 'southeast':
      southeast += 1
    elif count == 'northwest':
      northwest += 1
    elif count == 'northeast':
      northeast += 1
    else: 
      pass

  regions_count = {
    "southwest": southwest,
    "southeast": southeast,
    "northwest": northwest,
    "northeast": northeast
  }

  max_region_value = ''
  max_region_count = 0

  for region in regions_count.items():
    if region[1] > max_region_count:
      max_region_value = region[0]
      max_region_count = region[1]
    
  print("There are a total of four regions in our analysis, which are Southwest with {southwest} people, Southeast with {southeast}, Northwest with {northwest}, and Northeast with {northeast}. Given this information, we can say that {max_region_value} is the region which has the most insurances with a total count of {max_region_count}".format(southwest = southwest, southeast = southeast, northwest = northwest, northeast = northeast, max_region_value = max_region_value, max_region_count = max_region_count))

total_regions(region)



There are a total of four regions in our analysis, which are Southwest with 325 people, Southeast with 364, Northwest with 325, and Northeast with 324. Given this information, we can say that southeast is the region which has the most insurances with a total count of 364
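
The same idea works without hard-coding the four region names: build the count dictionary from whatever values appear, then pick the key with the largest count. This is a minimal sketch on a hypothetical sample list, not the result for the real dataset.

```python
# Hypothetical sample of values from the region column.
sample_regions = ["southeast", "northwest", "southeast",
                  "northeast", "southwest", "southeast"]

# Build the count dictionary from the data itself
regions_count = {}
for name in sample_regions:
    regions_count[name] = regions_count.get(name, 0) + 1

# max() with key=regions_count.get returns the key whose count is largest;
# on a tie it returns the first such key it encounters
top_region = max(regions_count, key=regions_count.get)

print(f"{top_region} has the most entries: {regions_count[top_region]}")
```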
