# US medical insurance cost
---------------------------

### Project Objectives:
1. Import a dataset into your program
    + [Dataset source](https://www.kaggle.com/mirichoi0218/insurance)
    + [csv library](https://docs.python.org/3/library/csv.html)
2. Analyze a dataset by building out functions or class methods
3. Use libraries to assist in your analysis
    + pandas
    + NumPy
4. *Optional: Document and organize your findings*
5. *Optional: Make predictions about a dataset’s features based on your findings*
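
As a minimal sketch of objectives 1 and 2, the import step can be wrapped in a function built on `csv.DictReader`, so later code can refer to columns by name instead of by position. The function name `load_patients` and the three sample rows are hypothetical and made up for illustration; only the column names are assumed to match the header of `insurance.csv`.

```python
import csv
import io

# Illustrative rows only (made up, not taken from insurance.csv).
# The column names are assumed to match the header of the Kaggle file.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
25,female,22.0,0,no,northwest,3000.0
40,male,30.5,2,yes,southeast,20000.0
55,female,28.1,1,no,southeast,10000.0
"""


def load_patients(csv_file):
    """Read an open CSV file and return one dict per patient with typed values."""
    patients = []
    for row in csv.DictReader(csv_file):
        patients.append({
            'age': int(row['age']),
            'sex': row['sex'],
            'bmi': float(row['bmi']),
            'children': int(row['children']),
            'smoker': row['smoker'],
            'region': row['region'],
            'charges': float(row['charges']),
        })
    return patients


# io.StringIO stands in for the real file so the sketch runs on its own.
patients = load_patients(io.StringIO(SAMPLE_CSV))
average_age = sum(p['age'] for p in patients) / len(patients)
print(f'{len(patients)} patients, average age {average_age:.1f}')
```

With the real dataset, the same function could be called as `load_patients(f)` inside `with open('insurance.csv', newline='') as f:`.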

### Analysis
+ What is the **average age** of the patients?
+ What is the **average age** for someone who has **at least one child**?
+ At what **age** does the insurance **cost rise above average**?
+ What is the **average** insurance cost for:
    - all genders
    - female
    - male
    - per region
+ Where are the **majority** of the individuals **from**?
+ Look at the **different costs** between **smokers vs. non-smokers**.
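
Several of the questions above (average cost per gender, per region, and for smokers vs. non-smokers) share one pattern: group the rows by a column and average another column. One possible way to express that pattern once is sketched below. The helper name `average_by` and the sample rows are hypothetical and made up for illustration; they are not part of the dataset.

```python
from collections import Counter, defaultdict


def average_by(rows, group_key, value_key):
    """Return {group: mean of value_key} for a list of dict rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
        counts[row[group_key]] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up rows for illustration only.
rows = [
    {'sex': 'female', 'smoker': 'no', 'region': 'northwest', 'charges': 3000.0},
    {'sex': 'male', 'smoker': 'yes', 'region': 'southeast', 'charges': 20000.0},
    {'sex': 'female', 'smoker': 'no', 'region': 'southeast', 'charges': 10000.0},
    {'sex': 'male', 'smoker': 'no', 'region': 'southwest', 'charges': 5000.0},
]

by_sex = average_by(rows, 'sex', 'charges')
by_region = average_by(rows, 'region', 'charges')
by_smoker = average_by(rows, 'smoker', 'charges')

# Region with the most individuals (a plurality, not necessarily over 50%).
most_common_region, region_count = Counter(r['region'] for r in rows).most_common(1)[0]

print(by_sex)
print(by_smoker)
print(most_common_region, region_count)
```

Grouping through a dictionary avoids one counter and one running total per category, so a new region or category in the data needs no code change.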

In [35]:
# Import the CSV file in Python using the csv library
import csv

# Open the file for reading
with open('insurance.csv', 'r', newline='') as csv_file:
    csv_reader = csv.reader(csv_file)
    number_patients = -1  # start at -1 so the header row is not counted
    fields = []
    patients_list = []

    # Iterate through csv_reader and append each data row to patients_list
    for row in csv_reader:
        if number_patients == -1:
            fields += row       # store the header row in the fields list
            number_patients += 1
        else:
            patients_list.append(row)
            number_patients += 1
 
    ## What is the average age of the patients?
    age_sum = 0
    for patient in patients_list:
        age_sum += int(patient[0])

    average_age = int(age_sum) / int(number_patients)

    print('The average age of patients is {age} years old'.format(age=str(round(average_age))))


    ## What is the average age for someone who has at least one child?
    age_sum_parents = 0
    number_parents = 0 
    for patient in patients_list:
        if int(patient[3]) >= 1:
            age_sum_parents += int(patient[0])
            number_parents += 1

    average_age_parents = int(age_sum_parents) / int(number_parents)

    print('The average age for parents with at least 1 child is {age} years old'.format(age=str(round(average_age_parents))))


    ## Around which age does the insurance cost start to be above average?
    # Approximated here as the mean age of patients whose charges exceed the overall average
    charges_sum = 0
    age_above_sum = 0
    age_count = 0  
    
    for patient in patients_list:
        charges_sum += float(patient[6])
    average_charges = charges_sum / number_patients
    
    for patient in patients_list:
        if float(patient[6]) > average_charges:
            age_above_sum += int(patient[0])
            age_count += 1
    age_cost_above = age_above_sum / age_count
    
    print('After {age} years old, medical insurance cost will be above the ${cost} average'.format(age=str(round(age_cost_above)), cost=str(round(average_charges))))
    
    
    ## What is the average insurance cost for all genders, females, males, and each region?
    # genders
    females = 0
    males = 0
    fem_sum_charges = 0
    male_sum_charges = 0
    # regions
    nw_count = 0
    nw_charges = 0
    ne_count = 0
    ne_charges = 0
    sw_count = 0
    sw_charges = 0
    se_count = 0
    se_charges = 0
    
    for patient in patients_list:
        if patient[1] == 'female':   # exact match; 'male' is also a substring of 'female'
            females += 1
            fem_sum_charges += float(patient[6])
        elif patient[1] == 'male':
            males += 1
            male_sum_charges += float(patient[6])
    
    for patient in patients_list:
        if 'northwest' in patient[5]:
            nw_count += 1
            nw_charges += float(patient[6])
        elif 'northeast' in patient[5]:
            ne_count += 1
            ne_charges += float(patient[6])
        elif 'southwest' in patient[5]:
            sw_count += 1
            sw_charges += float(patient[6])
        elif 'southeast' in patient[5]:
            se_count += 1
            se_charges += float(patient[6])
    f_average_cost = fem_sum_charges / females
    m_average_cost = male_sum_charges / males
    nw_average_cost = nw_charges / nw_count
    ne_average_cost = ne_charges / ne_count
    sw_average_cost = sw_charges / sw_count
    se_average_cost = se_charges / se_count

    print('All gender average insurance cost: ${cost}'.format(cost=round(average_charges)))
    print('Females average insurance cost: ${cost}'.format(cost=round(f_average_cost)))
    print('Males average insurance cost: ${cost}'.format(cost=round(m_average_cost)))
    print('Northwest average insurance cost: ${cost}'.format(cost=round(nw_average_cost)))
    print('Northeast average insurance cost: ${cost}'.format(cost=round(ne_average_cost)))
    print('Southwest average insurance cost: ${cost}'.format(cost=round(sw_average_cost)))
    print('Southeast average insurance cost: ${cost}'.format(cost=round(se_average_cost)))


    ## Where are the majority of the individuals from?
    # max() picks the region with the highest count; on a tie, the first matching branch below wins
    region_majority = max(nw_count, ne_count, sw_count, se_count)
    
    if region_majority == nw_count:
        print('A majority of individuals are from the Northwest')
    elif region_majority == ne_count:
        print('A majority of individuals are from the Northeast')
    elif region_majority == sw_count:
        print('A majority of individuals are from the Southwest')
    elif region_majority == se_count:
        print('A majority of individuals are from the Southeast')


    ## How do costs differ between smokers and non-smokers?
    smoker = 0
    non_smoker = 0
    smoker_cost = 0
    nonsmoker_cost = 0

    for patient in patients_list:
        if 'yes' in patient[4]:
            smoker += 1
            smoker_cost += float(patient[6])
        elif 'no' in patient[4]:
            non_smoker += 1
            nonsmoker_cost += float(patient[6])
    smoker_cost_average = smoker_cost / smoker
    nonsmoker_cost_average = nonsmoker_cost / non_smoker
    smoke_difference = round(smoker_cost_average - nonsmoker_cost_average)

    print('On average, a non-smoker will save ${smoke_diff} on medical insurance compared to a smoker'.format(smoke_diff=smoke_difference))

The average age of patients is 39 years old
The average age for parents with at least 1 child is 40 years old
After 42 years old, medical insurance cost will be above the $13270 average
All gender average insurance cost: $13270
Females average insurance cost: $12570
Males average insurance cost: $13957
Northwest average insurance cost: $12418
Northeast average insurance cost: $13406
Southwest average insurance cost: $12347
Southeast average insurance cost: $14735
A majority of individuals are from the Southeast
On average, a non-smoker will save $23616 on medical insurance compared to a smoker
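
The 42-year figure above is the mean age of the patients whose charges exceed the overall average, so it is a proxy and not a strict threshold. A rough alternative, sketched here under assumptions, is to compute the mean charge for each age and report the youngest age whose mean exceeds the overall mean. The function name `first_age_above_average` and the sample rows are hypothetical and made up for illustration; the result on the real dataset is not shown here.

```python
from collections import defaultdict


def first_age_above_average(patients):
    """Return (age, overall_mean), where age is the youngest age whose mean
    charge exceeds the overall mean charge, or None if no age qualifies."""
    overall = sum(p['charges'] for p in patients) / len(patients)

    # Collect the charges observed at each age.
    by_age = defaultdict(list)
    for p in patients:
        by_age[p['age']].append(p['charges'])

    # Walk the ages in ascending order and stop at the first one above the mean.
    for age in sorted(by_age):
        charges = by_age[age]
        if sum(charges) / len(charges) > overall:
            return age, overall
    return None, overall


# Made-up rows for illustration only.
patients = [
    {'age': 20, 'charges': 2000.0},
    {'age': 20, 'charges': 4000.0},
    {'age': 35, 'charges': 6000.0},
    {'age': 50, 'charges': 12000.0},
    {'age': 50, 'charges': 16000.0},
]

age, overall = first_age_above_average(patients)
print(age, overall)
```

On real data the per-age means can be noisy, so a single young age with a few expensive patients may trigger the threshold early; grouping ages into brackets would be one way to smooth that.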
