# U.S. Medical Insurance Costs

## Project Goals 

The project loads US medical insurance cost data from a csv file. The analysis examines how different variables, including `age`, `sex`, `bmi`, `children`, `smoker` and `region`, influence insurance `charges`.

## Data

- Load the US medical insurance data from a csv file into a dictionary called `insurance_dictionary`.

In [1]:
# import csv library
import csv

# Read the insurance data from the csv file and store each row in a dictionary keyed by row number (starting at 1)
with open('insurance.csv',newline='') as insurance_csv:
    insurance_csv_dict = csv.DictReader(insurance_csv)
    insurance_dictionary = {}
    i=1
    for row in insurance_csv_dict:
        insurance_dictionary[i]=row
        i+=1

# print(insurance_dictionary)


# Define a function to convert the string values in the dictionary to numeric types
# The records are modified in place, so the returned dictionary is the same object as the input
# Female is 0 and male is 1
# Non-smoker is 0 and smoker is 1

def convert_dictionary_format(insurance_dictionary):
    for keys, value in insurance_dictionary.items():
        value['age']=int(value['age'])
        value['children']=int(value['children'])
        if value['sex']=="female":
            value['sex']=0
        else: 
            value['sex']=1
        value['bmi']=float(value['bmi'])
        if value['smoker']=="yes":
            value['smoker']=1
        else: 
            value['smoker']=0
        value['charges']=float(value['charges'])
    return insurance_dictionary

updated_insurance_dictionary = convert_dictionary_format(insurance_dictionary)


# test
# print(updated_insurance_dictionary)
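
The conversion can be checked on a couple of made-up rows without touching `insurance.csv`. The following is a minimal sketch, assuming the column names used in the cell above; the column order, `sample_csv`, `convert_record` and the values are hypothetical. Unlike `convert_dictionary_format`, it builds a new dictionary for each row instead of modifying the row in place.

```python
import csv
import io

# Two made-up rows using the column names from the notebook (illustrative values only)
sample_csv = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "30,female,25.5,1,no,southwest,4000.5\n"
    "45,male,31.2,2,yes,northeast,21000.0\n"
)

def convert_record(row):
    """Return a new dict with numeric fields parsed and sex/smoker encoded as 0/1."""
    return {
        "age": int(row["age"]),
        "sex": 0 if row["sex"] == "female" else 1,
        "bmi": float(row["bmi"]),
        "children": int(row["children"]),
        "smoker": 1 if row["smoker"] == "yes" else 0,
        "region": row["region"],
        "charges": float(row["charges"]),
    }

# Key the converted rows by row number, starting at 1
sample_records = {
    i: convert_record(row)
    for i, row in enumerate(csv.DictReader(sample_csv), start=1)
}
print(sample_records[1])
```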

- Group the insurance records by region

In [2]:
# Define a function to group the insurance records by region

def group_records_by_region(updated_insurance_dictionary):
    records_by_region = dict()
    for key, record in updated_insurance_dictionary.items():
        region = record["region"]
        if region not in records_by_region:
            records_by_region[region] = [record]
        else:
            records_by_region[region].append(record)
    return records_by_region

# The function has its own name so that re-running this cell does not overwrite it
insurance_records_by_regions = group_records_by_region(updated_insurance_dictionary)

# test
# print(insurance_records_by_regions)
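
The same grouping can also be written with `collections.defaultdict`, which removes the explicit membership check. This is a sketch of an alternative, not the notebook's own code; `group_with_defaultdict` and the three sample records are hypothetical.

```python
from collections import defaultdict

# Hypothetical records, already converted to numeric form
records = {
    1: {"region": "southwest", "charges": 4000.5},
    2: {"region": "northeast", "charges": 21000.0},
    3: {"region": "southwest", "charges": 1500.0},
}

def group_with_defaultdict(records):
    """Map each region name to the list of records from that region."""
    grouped = defaultdict(list)
    for record in records.values():
        # A missing region key is created with an empty list on first access
        grouped[record["region"]].append(record)
    return dict(grouped)

by_region = group_with_defaultdict(records)
print({region: len(rows) for region, rows in by_region.items()})
```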

## Descriptive Analysis

- By regions:
     - Total insurance costs
     - Total number of insurees
     - Average cost per person
     - Average age
     - Male proportion
     - Average BMI
     - Average number of children
     - Smoker proportion

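
Because `sex` and `smoker` were encoded as 0/1 earlier, summing a column counts the males or smokers, and dividing by the number of records gives the proportion. A small worked sketch with made-up flags (`smoker_flags` is hypothetical):

```python
# Hypothetical 0/1 smoker flags for five insurees in one region
smoker_flags = [1, 0, 0, 1, 0]

# The sum of the flags is the number of smokers; dividing by the count gives the share
smokers_pct = sum(smoker_flags) / len(smoker_flags) * 100
print(smokers_pct)
```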
In [3]:
# Define a function that builds a summary dictionary: each key is an attribute name, and each value
# maps 'attribute' to that name and every region to the attribute's value in that region
def summarize_regions(insurance_records_by_regions):
    attributes = ["tot insurance charges", "tot insurees", "avg cost pp", "avg age",
                  "males%", "avg bmi", "avg children", "smokers%"]
    summary = {attribute: {'attribute': attribute} for attribute in attributes}

    for region, records in insurance_records_by_regions.items():
        # Running totals for the current region
        tot_charges = 0
        tot_age = 0
        num = 0
        tot_males = 0
        tot_bmi = 0
        tot_children = 0
        tot_smokers = 0
        for record in records:
            tot_charges += record["charges"]
            num += 1
            tot_age += record["age"]
            tot_males += record["sex"]       # sex is 1 for male, 0 for female
            tot_bmi += record["bmi"]
            tot_children += record["children"]
            tot_smokers += record["smoker"]  # smoker is 1 for yes, 0 for no

        summary["tot insurance charges"][region] = tot_charges
        summary["tot insurees"][region] = num
        summary["avg cost pp"][region] = tot_charges/num
        summary["avg age"][region] = tot_age/num
        summary["males%"][region] = tot_males/num*100
        summary["avg bmi"][region] = tot_bmi/num
        summary["avg children"][region] = tot_children/num
        summary["smokers%"][region] = tot_smokers/num*100
    return summary

# The function has its own name so that re-running this cell does not overwrite it
region_characteristics = summarize_regions(insurance_records_by_regions)

print("Characteristics of insurance charges across regions are: {}".format(region_characteristics))    

Characteristics of insurance charges across regions are: {'tot insurance charges': {'attribute': 'tot insurance charges', 'southwest': 4012754.647620001, 'southeast': 5363689.763290002, 'northwest': 4035711.9965399993, 'northeast': 4343668.583308999}, 'tot insurees': {'attribute': 'tot insurees', 'southwest': 325, 'southeast': 364, 'northwest': 325, 'northeast': 324}, 'avg cost pp': {'attribute': 'avg cost pp', 'southwest': 12346.93737729231, 'southeast': 14735.411437609895, 'northwest': 12417.575373969228, 'northeast': 13406.3845163858}, 'avg age': {'attribute': 'avg age', 'southwest': 39.45538461538462, 'southeast': 38.93956043956044, 'northwest': 39.19692307692308, 'northeast': 39.26851851851852}, 'males%': {'attribute': 'males%', 'southwest': 50.153846153846146, 'southeast': 51.92307692307693, 'northwest': 49.53846153846154, 'northeast': 50.308641975308646}, 'avg bmi': {'attribute': 'avg bmi', 'southwest': 30.59661538461538, 'southeast': 33.35598901098903, 'northwest': 29.199784615
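
The raw dictionary is hard to read when printed on one line. One way to inspect it, sketched below, is to print one line per attribute with rounded values. The `summary` dictionary here is a subset copied from the output above (two attributes, two regions); `format_summary` and its `decimals` parameter are hypothetical helpers, not part of the notebook.

```python
# Subset of the summary printed above, restricted to two attributes and two regions
summary = {
    "tot insurees": {"attribute": "tot insurees", "southwest": 325, "southeast": 364},
    "avg cost pp": {
        "attribute": "avg cost pp",
        "southwest": 12346.93737729231,
        "southeast": 14735.411437609895,
    },
}

def format_summary(summary, decimals=2):
    """Return one text line per attribute, with floats rounded to `decimals` places."""
    lines = []
    for attribute, values in summary.items():
        cells = []
        for region, value in values.items():
            if region == "attribute":
                continue  # skip the label stored alongside the regions
            if isinstance(value, float):
                value = round(value, decimals)
            cells.append(f"{region}={value}")
        lines.append(f"{attribute}: " + ", ".join(cells))
    return lines

for line in format_summary(summary):
    print(line)
```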

- Write the summary of the regions to a new csv file named "region-summary-table.csv"

In [11]:
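# newline='' stops the csv module from writing extra blank lines on some platforms
with open('region-summary-table.csv','w',newline='') as region_summary_csv:
    # Header row: 'attribute' followed by the region names
    fieldnames = list(region_characteristics["tot insurance charges"])
    writer = csv.DictWriter(region_summary_csv,fieldnames)
    writer.writeheader()
    # One row per attribute, with one column per region
    for attribute, regions in region_characteristics.items():
        writer.writerow(regions)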
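
To confirm the layout of such a file, it can be read back with `csv.DictReader`. The sketch below writes a two-row summary (values copied from the output above) to a temporary file rather than to `region-summary-table.csv`; the file name `region-summary-check.csv` and the variable names are hypothetical. Note that every value comes back as a string.

```python
import csv
import os
import tempfile

# Two-attribute summary in the same shape as region_characteristics
summary = {
    "tot insurees": {"attribute": "tot insurees", "southwest": 325, "southeast": 364},
    "avg age": {
        "attribute": "avg age",
        "southwest": 39.45538461538462,
        "southeast": 38.93956043956044,
    },
}

path = os.path.join(tempfile.mkdtemp(), "region-summary-check.csv")

# Write one row per attribute, with 'attribute' and the regions as columns
with open(path, "w", newline="") as out_file:
    fieldnames = list(summary["tot insurees"])
    writer = csv.DictWriter(out_file, fieldnames)
    writer.writeheader()
    writer.writerows(summary.values())

# Read the file back; DictReader returns every field as a string
with open(path, newline="") as in_file:
    rows = list(csv.DictReader(in_file))

print(rows[0])
```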