# U.S. Medical Insurance Costs
In this project, a **CSV** file of medical insurance costs is investigated using Python fundamentals. The goal is to analyze the attributes in **insurance.csv** to learn more about the patient information in the file and to gain insight into potential use cases for the dataset.

In [3]:
import csv

To start the analysis, Python's built-in csv module has to be imported.

The next step is to look through insurance.csv to get acquainted with the data. The following aspects of the file are checked in order to plan how to import the data into Python:

* The names of columns and rows
* Any noticeable missing data
* Types of values (numerical vs. categorical)
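The checks listed above can also be done programmatically. The following is a minimal sketch, not part of the original notebook: `preview` is a hypothetical helper, and the three sample rows stand in for insurance.csv so that the example runs on its own.

```python
import csv
import io

# Sample rows in the same layout as insurance.csv (assumption: used here only
# so the sketch is self-contained; open the real file instead in practice).
sample = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "19,female,27.9,0,yes,southwest,16884.924\n"
    "18,male,33.77,1,no,southeast,1725.5523\n"
    "28,male,33,3,no,southeast,4449.462\n"
)

def preview(csv_handle, num_rows=2):
    """Return the column names and the first num_rows rows of a CSV."""
    reader = csv.DictReader(csv_handle)
    rows = []
    for row in reader:
        if len(rows) >= num_rows:
            break
        rows.append(row)
    return reader.fieldnames, rows

columns, first_rows = preview(sample)
print(columns)
for row in first_rows:
    print(row)
```

Looking at a few rows this way shows which columns hold numbers (age, bmi, children, charges) and which hold categories (sex, smoker, region).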

In [15]:
# creating lists for the different attributes in insurance.csv

ages = []
sex = []
bmi = []
num_children = []
smoker_status = []
region = []
insurance_charges = []

There are seven columns in total: Age, Sex, BMI, Children, Smoker, Region and Charges.

A quick look over the data shows no obviously missing values.
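A visual check can overlook empty fields in a file with many rows. As a sketch under the assumption that a missing value appears as an empty field, the hypothetical helper `count_missing` below counts empty fields per column. The second sample row has its bmi blanked on purpose to show the effect; it is not real data from insurance.csv.

```python
import csv
import io

def count_missing(csv_handle):
    """Count empty or absent fields per column of a CSV."""
    reader = csv.DictReader(csv_handle)
    missing = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in missing:
            value = row[name]
            # DictReader fills fields missing from a short row with None
            if value is None or value.strip() == "":
                missing[name] += 1
    return missing

# Illustrative input: the bmi field of the second row is deliberately empty.
sample = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "19,female,27.9,0,yes,southwest,16884.924\n"
    "18,male,,1,no,southeast,1725.5523\n"
)
missing_counts = count_missing(sample)
print(missing_counts)
```

For the real file, the handle would come from `open('insurance.csv', newline='')`; a result with all zeros would confirm the observation above.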

In [18]:
# importing the csv data into each respective list

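def load_data(lst, csv_file, column_name):
    """Append every value of column_name in csv_file to lst and return lst."""
    # newline='' is the setting the csv module documentation recommends
    with open(csv_file, newline='') as csv_file_handle:
        csv_dict = csv.DictReader(csv_file_handle)
        for row in csv_dict:
            lst.append(row[column_name])
    return lst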
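Because load_data opens and parses the file on every call, filling seven lists reads insurance.csv seven times. One possible alternative, sketched here with a hypothetical helper `load_columns`, reads the file once and collects every column in a dictionary of lists. The sample rows stand in for the real file.

```python
import csv
import io

def load_columns(csv_handle):
    """Read a CSV once and return {column_name: [values as strings]}."""
    reader = csv.DictReader(csv_handle)
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in columns:
            columns[name].append(row[name])
    return columns

# Sample rows in the layout of insurance.csv (assumption: stand-in data).
sample = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "19,female,27.9,0,yes,southwest,16884.924\n"
    "18,male,33.77,1,no,southeast,1725.5523\n"
)
columns = load_columns(sample)
print(columns["age"])
print(columns["region"])
```

With the real file, `columns["age"]` would hold the same strings as the `ages` list above.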

In [22]:
load_data(ages, 'insurance.csv', 'age')
load_data(sex, 'insurance.csv', 'sex')
load_data(bmi, 'insurance.csv', 'bmi')
load_data(num_children, 'insurance.csv', 'children')
load_data(smoker_status, 'insurance.csv', 'smoker')
load_data(region, 'insurance.csv', 'region')
load_data(insurance_charges, 'insurance.csv', 'charges')

['16884.924',
 '1725.5523',
 '4449.462',
 '21984.47061',
 '3866.8552',
 '3756.6216',
 '8240.5896',
 '7281.5056',
 '6406.4107',
 '28923.13692',
 '2721.3208',
 '27808.7251',
 '1826.843',
 '11090.7178',
 '39611.7577',
 '1837.237',
 '10797.3362',
 '2395.17155',
 '10602.385',
 '36837.467',
 '13228.84695',
 '4149.736',
 '1137.011',
 '37701.8768',
 '6203.90175',
 '14001.1338',
 '14451.83515',
 '12268.63225',
 '2775.19215',
 '38711',
 '35585.576',
 '2198.18985',
 '4687.797',
 '13770.0979',
 '51194.55914',
 '1625.43375',
 '15612.19335',
 '2302.3',
 '39774.2763',
 '48173.361',
 '3046.062',
 '4949.7587',
 '6272.4772',
 '6313.759',
 '6079.6715',
 '20630.28351',
 '3393.35635',
 '3556.9223',
 '12629.8967',
 '38709.176',
 '2211.13075',
 '3579.8287',
 '23568.272',
 '37742.5757',
 '8059.6791',
 '47496.49445',
 '13607.36875',
 '34303.1672',
 '23244.7902',
 '5989.52365',
 '8606.2174',
 '4504.6624',
 '30166.61817',
 '4133.64165',
 '14711.7438',
 '1743.214',
 '14235.072',
 '6389.37785',
 '5920.1041',
 '176
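The output above shows that csv.DictReader returns every value as a string, including the numeric ones. Before doing arithmetic, the values have to be converted. A minimal sketch with a hypothetical helper `to_numbers`, applied to the first three charges and child counts from the dataset:

```python
def to_numbers(values, number_type=float):
    """Convert a list of numeric strings to number_type (float or int)."""
    return [number_type(value) for value in values]

sample_charges = ["16884.924", "1725.5523", "4449.462"]
sample_children = ["0", "1", "3"]

charges_as_float = to_numbers(sample_charges)
children_as_int = to_numbers(sample_children, int)
print(charges_as_float)
print(children_as_int)
```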

Once every list is filled with data, functions are defined to analyze it. It has to be considered which information is interesting and how the analysis should be performed. The following functions are going to be defined:

* calculating average age of patients
* returning the number of males vs. females counted in the dataset
* finding geographical location of the patients
* creating a dictionary that contains all patient information


In [27]:
def calculate_average_age(lst):
    age = 0
    
    for patient in lst:
        age += int(patient)
    average_age = round(age/len(lst),2)
    
    return average_age

print("The average age of a patient is:", calculate_average_age(ages))

The average age of a patient is: 39.21
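The same calculation can be written with the standard library's statistics module. This is a sketch of an alternative, not a replacement for the function above; `average_age` is a hypothetical name and the sample holds the ages of the first five patients.

```python
from statistics import mean

def average_age(age_strings):
    """Average of ages given as strings, rounded to two decimals."""
    return round(mean(int(age) for age in age_strings), 2)

sample_ages = ["19", "18", "28", "33", "32"]
sample_average = average_age(sample_ages)
print("The average age in the sample is:", sample_average)
```

Note that `mean` raises `statistics.StatisticsError` for an empty list, whereas the manual version would raise `ZeroDivisionError`.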


In [33]:
def compare_male_female(lst):
    num_male = 0
    num_female = 0
    
    for patient in lst:
        if patient == "male":
            num_male += 1
        elif patient == "female":
            num_female += 1
    
    return num_male, num_female

num_male, num_female = compare_male_female(sex)
print(f"The number of males in the data is {num_male}. The number of females in the data is {num_female}.")

The number of males in the data is 676. The number of females in the data is 662.


In [56]:
def find_location(lst):
    northwest = 0
    northeast = 0
    southwest = 0
    southeast = 0
    
    for location in lst:
        if location == "northwest":
            northwest += 1
        elif location == "northeast":
            northeast += 1
        elif location == "southwest":
            southwest += 1
        elif location == "southeast":
            southeast += 1
    
    return northwest, northeast, southwest, southeast

num_nw, num_ne, num_sw, num_se = find_location(region)
print(num_nw, "patients live in the northwest.", num_ne, "patients live in the northeast.",
      num_sw, "patients live in the southwest.", num_se, "patients live in the southeast.")

325 patients live in the northwest. 324 patients live in the northeast. 325 patients live in the southwest. 364 patients live in the southeast.
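Returning four counts positionally requires the caller to unpack them in exactly the same order as the return statement. A dictionary keyed by region name avoids that coupling. The sketch below uses collections.Counter; `count_by_value` is a hypothetical helper and the sample holds the regions of the first six patients.

```python
from collections import Counter

def count_by_value(values):
    """Return a dict mapping each distinct value to how often it occurs."""
    return dict(Counter(values))

sample_regions = ["southwest", "southeast", "southeast",
                  "northwest", "northwest", "southeast"]

region_counts = count_by_value(sample_regions)
for region_name, count in sorted(region_counts.items()):
    print(count, "patients live in the", region_name + ".")
```

The same helper would also work for the sex and smoker columns, since it makes no assumption about which values occur.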


In [61]:
# patient IDs 1..N, one per row in the dataset
list_patients = list(range(1, len(sex) + 1))
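The patient IDs are simply 1-based row positions. As a sketch of how such IDs can be paired with the column lists, `enumerate` with `start=1` and `zip` walk the columns row by row. `build_records` is a hypothetical helper limited to three columns for brevity, and the sample values are the first three rows of the dataset.

```python
def build_records(ages, sex, charges):
    """Map 1-based patient IDs to a dict of that patient's values."""
    records = {}
    rows = zip(ages, sex, charges)
    for patient_id, (age, sex_value, charge) in enumerate(rows, start=1):
        records[patient_id] = {
            "Age": age,
            "Sex": sex_value,
            "Insurance charges": charge,
        }
    return records

sample_ages = ["19", "18", "28"]
sample_sex = ["female", "male", "male"]
sample_charges = ["16884.924", "1725.5523", "4449.462"]

records = build_records(sample_ages, sample_sex, sample_charges)
print(records[1])
```

One caveat: `zip` stops at the shortest input, so lists of unequal length would silently drop rows.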

In [63]:
def create_dictionary(list_patients, ages, sex, bmi, num_children, smoker_status, region, insurance_charges):
    dictionary = {}
    num_patients = len(list_patients)
    
    for i in range(num_patients):
        dictionary[list_patients[i]] = {"Age": ages[i],
                             "Sex": sex[i],
                             "BMI": bmi[i],
                             "Number of children": num_children[i],
                             "Smoker status": smoker_status[i],
                             "Region": region[i],
                             "Insurance charges": insurance_charges[i]}
    
    return dictionary

dictionary = create_dictionary(list_patients, ages, sex, bmi, num_children, smoker_status, region, insurance_charges)
print(dictionary)

{1: {'Age': '19', 'Sex': 'female', 'BMI': '27.9', 'Number of children': '0', 'Smoker status': 'yes', 'Region': 'southwest', 'Insurance charges': '16884.924'}, 2: {'Age': '18', 'Sex': 'male', 'BMI': '33.77', 'Number of children': '1', 'Smoker status': 'no', 'Region': 'southeast', 'Insurance charges': '1725.5523'}, 3: {'Age': '28', 'Sex': 'male', 'BMI': '33', 'Number of children': '3', 'Smoker status': 'no', 'Region': 'southeast', 'Insurance charges': '4449.462'}, 4: {'Age': '33', 'Sex': 'male', 'BMI': '22.705', 'Number of children': '0', 'Smoker status': 'no', 'Region': 'northwest', 'Insurance charges': '21984.47061'}, 5: {'Age': '32', 'Sex': 'male', 'BMI': '28.88', 'Number of children': '0', 'Smoker status': 'no', 'Region': 'northwest', 'Insurance charges': '3866.8552'}, 6: {'Age': '31', 'Sex': 'female', 'BMI': '25.74', 'Number of children': '0', 'Smoker status': 'no', 'Region': 'southeast', 'Insurance charges': '3756.6216'}, 7: {'Age': '46', 'Sex': 'female', 'BMI': '33.44', 'Number of