# U.S. Medical Insurance Costs

##### Thoughts when looking over the dataset

Column titles:
- age - int
- sex - female/male
- bmi - float
- children - int
- smoker - yes/no
- region - northeast etc.
- charges - float

**No missing data**

Items of particular interest: 
- How does the number of children affect insurance cost? Does the effect differ between females and males?

##### Scoping the project

**Overall goal**
- Describe the relationship between the number of children and insurance cost. Assess whether the relationship differs between males and females and across age groups.

**Data** 
- Load the data from the CSV file into a dictionary keyed by row number, where each entry is a dictionary with the column headers as keys.

**Analyse**  
- Group the data by number of children and find the min, max and average insurance cost for each group.
- Group the data by number of children and sex (female/male) and find the min, max and average insurance cost for each group.
- Group the data by number of children and 20-year age bracket and find the min, max and average insurance cost for each group.
- Group the data by number of children, 20-year age bracket and sex and find the min, max and average insurance cost for each group.
- Record the size of each group.
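The grouping steps above could also be done in a single pass by building a key for each patient and collecting charges under that key. The sketch below is one possible approach, not the method used later in this notebook. The helper names `age_bracket` and `group_charges`, the bracket boundaries (0-19, 20-39, 40-59, 60-79) and the sample records are hypothetical and only mimic the shape of `data_dict` after the type conversions in In [2].

```python
from collections import defaultdict


def age_bracket(age, width=20):
    """Return the lower bound of a 20-year bracket, e.g. 37 -> 20 (the 20-39 bracket)."""
    return (age // width) * width


def group_charges(patients, key_func):
    """Group charges by key_func(patient) and return {group: (count, min, max, mean)}."""
    groups = defaultdict(list)
    for patient in patients.values():
        groups[key_func(patient)].append(patient['charges'])
    return {
        key: (len(values), min(values), max(values), sum(values) / len(values))
        for key, values in groups.items()
    }


# Hypothetical sample records (made-up values, not taken from insurance.csv).
sample = {
    0: {'age': 19, 'sex': 'female', 'children': 0, 'charges': 1000.0},
    1: {'age': 28, 'sex': 'male', 'children': 3, 'charges': 2000.0},
    2: {'age': 33, 'sex': 'male', 'children': 0, 'charges': 4000.0},
    3: {'age': 46, 'sex': 'female', 'children': 1, 'charges': 3000.0},
}

# Group by number of children only.
by_children = group_charges(sample, lambda p: p['children'])
print(by_children[0])  # (2, 1000.0, 4000.0, 2500.0)

# Group by number of children, sex and age bracket at once.
by_all = group_charges(
    sample, lambda p: (p['children'], p['sex'], age_bracket(p['age']))
)
```

Because the key function is a parameter, each of the four groupings in the plan only changes the lambda; the group sizes come for free as the first element of each tuple.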

**Evaluate**
- Check that each group is large enough for its average to be a reliable estimate, then compare the averages of the different groups and draw conclusions.
- Analyse the data further based on these evaluations.
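The group-size check in the Evaluate step could be made concrete with the standard error of each group's mean, which shrinks as the group grows. The sketch below assumes a normal-approximation 95% interval (mean ± 1.96 × standard error), which is only rough for small or skewed groups, and an arbitrary minimum group size of 30; both choices, and the helper names, are assumptions rather than part of the original plan. The group sizes passed to `flag_small_groups` are the patient counts reported later in this notebook for 0, 1, 4 and 5 children.

```python
import math
import statistics


def mean_with_interval(values, z=1.96):
    """Return (mean, lower, upper) for a normal-approximation interval.

    statistics.stdev needs at least two values.
    """
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - z * standard_error, mean + z * standard_error


def flag_small_groups(group_sizes, minimum=30):
    """Return the group keys whose size is below the chosen minimum (assumed rule of thumb)."""
    return [key for key, size in group_sizes.items() if size < minimum]


# Hypothetical charges for one group.
estimate = mean_with_interval([1000.0, 2000.0, 3000.0])

# Patient counts per number of children, as printed in the cells below.
small = flag_small_groups({0: 574, 1: 324, 4: 25, 5: 18})
print(small)  # [4, 5]
```

A wide interval for the 4- and 5-children groups would be a signal to treat differences in their averages with caution.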

**Output**
- Write a paragraph summarising findings. 

In [1]:
# Importing data from the CSV file.

import csv

# Stores the data in a dictionary with the following format - id_number: patient information dictionary.
# csv.DictReader returns every value as a string; numeric columns are converted in the next cell.
with open('insurance.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    data_dict = {i: row for i, row in enumerate(reader)}
#print(data_dict)


In [2]:
# Function to convert the values stored under a key from strings to floats.
# (The parameter is named patients to avoid shadowing the built-in dict.)
def string_to_float(patients, key):
    for data in patients.values():
        data[key] = float(data[key])


# Calling the function on bmi and charges
string_to_float(data_dict, 'charges')
string_to_float(data_dict, 'bmi')

# Function to convert the values stored under a key from strings to ints.
def string_to_int(patients, key):
    for data in patients.values():
        data[key] = int(data[key])


# Calling the function on age and number of children
string_to_int(data_dict, 'age')
string_to_int(data_dict, 'children')

#print(data_dict)
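`string_to_float` and `string_to_int` differ only in the conversion they apply, so they could be folded into one function that takes the converter as an argument. The sketch below is a hypothetical alternative (the name `convert_column` and the sample rows are assumptions); it relies only on the fact that `float` and `int` are ordinary callables in Python.

```python
def convert_column(patients, key, converter):
    """Replace the value under key with converter(value) for every patient."""
    for patient in patients.values():
        patient[key] = converter(patient[key])


# Hypothetical rows shaped like csv.DictReader output: every value is a string.
sample = {
    0: {'age': '30', 'bmi': '25.5', 'children': '2', 'charges': '1000.50'},
    1: {'age': '45', 'bmi': '31.0', 'children': '0', 'charges': '2500.00'},
}

# One table of column -> converter replaces the four separate calls.
for key, converter in [('age', int), ('children', int), ('bmi', float), ('charges', float)]:
    convert_column(sample, key, converter)
```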
        


In [3]:
# Function to print the number of patients and the average value of a column.
def average(patients, key):
    print('Number of patients: ' + str(len(patients)))
    total = 0
    for data in patients.values():
        total += data[key]
    mean = round(total / len(patients), 2)
    print('Average ' + key + ': ' + str(mean))

# Calling the average function on charges and number of children
average(data_dict, 'charges')
average(data_dict, 'children')
  

Number of patients: 1338
Average charges: 13270.42
Number of patients: 1338
Average children: 1.09
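Insurance charges have a long upper tail (for example, the 0-children group below ranges from 1121.87 to 63770.43 with an average of 12365.98), so a few large claims can pull the average up. Reporting the median alongside the mean is one way to check this. The sketch below uses the standard-library `statistics` module; the helper name `mean_and_median` and the sample charges are hypothetical.

```python
import statistics


def mean_and_median(patients, key):
    """Return (mean, median) of a column, both rounded to 2 decimal places."""
    values = [patient[key] for patient in patients.values()]
    return round(statistics.mean(values), 2), round(statistics.median(values), 2)


# Hypothetical charges with one large outlier.
sample = {
    0: {'charges': 1000.0},
    1: {'charges': 1200.0},
    2: {'charges': 1400.0},
    3: {'charges': 20000.0},
}
print(mean_and_median(sample, 'charges'))  # (5900.0, 1300.0)
```

Here the single large value drags the mean far above the median, which is the pattern to look for in the real groups.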


In [4]:
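# Function to find the maximum value
def find_max(patients, key):
    # Start below any possible value so negative values are handled too,
    # and round only when printing so comparisons use the unrounded values.
    largest = float('-inf')
    for data in patients.values():
        value = data[key]
        if value > largest:
            largest = value
    print('Maximum ' + key + ': ' + str(round(largest, 2)))

# Calling max function on children
find_max(data_dict, 'children')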

Maximum children: 5
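Python's built-in `min()` and `max()` accept any iterable, so the loops in `find_max` and `find_min` could be replaced by a single pass over the column's values. The sketch below (the helper name `find_extremes` and the sample data are hypothetical) returns the values instead of printing them. One behavioural difference worth noting: the built-ins raise `ValueError` on an empty group, whereas the loop versions silently print a sentinel (`-inf` or `inf`).

```python
def find_extremes(patients, key):
    """Return (minimum, maximum) of a column using the built-in min() and max()."""
    values = [patient[key] for patient in patients.values()]
    return min(values), max(values)


# Hypothetical sample data.
sample = {0: {'children': 2}, 1: {'children': 0}, 2: {'children': 5}}
print(find_extremes(sample, 'children'))  # (0, 5)

# An empty group raises instead of returning a sentinel value.
try:
    find_extremes({}, 'children')
except ValueError:
    print('Empty group: no minimum or maximum')
```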


In [5]:
# Function to find the minimum value
def find_min(dict, key):
    min = float('inf')
    for data in dict.values():
        value = data[key]
        if value < min:
            min = round(value, 2)
    print('Minimum ' + key + ': ' + str(min))

# Calling min function on children as test
find_min(data_dict, 'children')

Minimum children: 0


In [6]:
# Function to create a smaller dictionary of patients matching a criterion, e.g. patients with no children
def create_dict(patients, key, criterion):
    new_dict = {}
    i = 0
    for data in patients.values():
        if data[key] == criterion:
            new_dict[i] = data
            i += 1
    return new_dict

# Calling create_dict to group patients with 0 children
zero_children = create_dict(data_dict, 'children', 0)

# Function to print the min, max and average of a column (e.g. insurance charges) for a group
def min_max_avg(patients, key):
    find_min(patients, key)
    find_max(patients, key)
    average(patients, key)

# Calling min_max_avg on zero_children dictionary
min_max_avg(zero_children, 'charges')

Minimum charges: 1121.87
Maximum charges: 63770.43
Number of patients: 574
Average charges: 12365.98
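`create_dict` only selects patients whose value equals a single criterion, but the planned 20-year age brackets need a range test, and the combined groups need several conditions at once. A predicate-based filter covers both. The sketch below is a hypothetical alternative (`filter_patients` and the sample records are assumptions) that keeps the same `{index: record}` output shape as `create_dict`.

```python
def filter_patients(patients, predicate):
    """Return {new_index: record} for every record where predicate(record) is True."""
    matching = (patient for patient in patients.values() if predicate(patient))
    return {i: patient for i, patient in enumerate(matching)}


# Hypothetical sample records.
sample = {
    0: {'age': 19, 'sex': 'female', 'children': 0},
    1: {'age': 28, 'sex': 'male', 'children': 0},
    2: {'age': 45, 'sex': 'male', 'children': 2},
}

# Male patients aged 20-39 with no children.
males_20_to_39_no_children = filter_patients(
    sample,
    lambda p: p['sex'] == 'male' and 20 <= p['age'] < 40 and p['children'] == 0,
)
print(males_20_to_39_no_children)  # {0: {'age': 28, 'sex': 'male', 'children': 0}}
```

Equality filtering is still available as a special case, e.g. `filter_patients(data_dict, lambda p: p['children'] == 0)`.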


In [7]:
# Finding min, max and average insurance cost for patients with 1 child
one_child = create_dict(data_dict, 'children', 1)
min_max_avg(one_child, 'charges')

Minimum charges: 1711.03
Maximum charges: 58571.07
Number of patients: 324
Average charges: 12731.17


In [8]:
# Finding min, max and average insurance cost for patients with 2 children
two_children = create_dict(data_dict, 'children', 2)
min_max_avg(two_children, 'charges')

Minimum charges: 2304.0
Maximum charges: 49577.66
Number of patients: 240
Average charges: 15073.56


In [9]:
# Finding min, max and average insurance cost for patients with 3 children
three_children = create_dict(data_dict, 'children', 3)
min_max_avg(three_children, 'charges')

Minimum charges: 3443.06
Maximum charges: 60021.4
Number of patients: 157
Average charges: 15355.32


In [10]:
# Finding min, max and average insurance cost for patients with 4 children
four_children = create_dict(data_dict, 'children', 4)
min_max_avg(four_children, 'charges')

Minimum charges: 4504.66
Maximum charges: 40182.25
Number of patients: 25
Average charges: 13850.66


In [11]:
# Finding min, max and average insurance cost for patients with 5 children
five_children = create_dict(data_dict, 'children', 5)
min_max_avg(five_children, 'charges')

Minimum charges: 4687.8
Maximum charges: 19023.26
Number of patients: 18
Average charges: 8786.04
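With all six children groups summarised, one simple way to compare them is the percentage difference between each group's average and the overall average of 13270.42 printed in In [3]. The sketch below copies those averages from the outputs above; the helper name `relative_difference` is hypothetical. It does not account for group size, so the 4- and 5-children groups (25 and 18 patients) should be read with caution.

```python
def relative_difference(group_average, overall_average):
    """Percentage by which a group's average differs from the overall average."""
    return (group_average - overall_average) / overall_average * 100


# Averages copied from the outputs of the cells above.
overall_average = 13270.42
children_averages = {
    0: 12365.98,
    1: 12731.17,
    2: 15073.56,
    3: 15355.32,
    4: 13850.66,
    5: 8786.04,
}

for children, group_average in children_averages.items():
    difference = round(relative_difference(group_average, overall_average), 1)
    print(str(children) + ' children: ' + str(difference) + '%')
```

On these figures the 2- and 3-children groups sit above the overall average, while the 5-children group comes out roughly 34% below it.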


In [12]:
# Finding min, max and average insurance cost for male patients
male = create_dict(data_dict, 'sex', 'male')
min_max_avg(male, 'charges')

Minimum charges: 1121.87
Maximum charges: 62592.87
Number of patients: 676
Average charges: 13956.75


In [13]:
# Finding the max number of children for male patients
find_max(male, 'children')

Maximum children: 5


In [14]:
# Finding min, max and average insurance cost for male patients with 0 children
male_zero_child = create_dict(male, 'children', 0)
min_max_avg(male_zero_child, 'charges')

Minimum charges: 1121.87
Maximum charges: 62592.87
Number of patients: 285
Average charges: 12832.7


In [15]:
# Finding min, max and average insurance cost for male patients with 1 child
male_one_child = create_dict(male, 'children', 1)
min_max_avg(male_one_child, 'charges')

Minimum charges: 1711.03
Maximum charges: 51194.56
Number of patients: 166
Average charges: 13273.52


In [25]:
# Iterating through the number of children for the male group
children = [0, 1, 2, 3, 4, 5]
male_groups = [create_dict(male, 'children', number) for number in children]
for number, group in zip(children, male_groups):
    # Use the singular "child" only when there is exactly one
    label = 'child' if number == 1 else 'children'
    print('\n For male patients with ' + str(number) + ' ' + label + ':')
    min_max_avg(group, 'charges')


 For male patients with 0 children:
Minimum charges: 1121.87
Maximum charges: 62592.87
Number of patients: 285
Average charges: 12832.7

 For male patients with 1 child:
Minimum charges: 1711.03
Maximum charges: 51194.56
Number of patients: 166
Average charges: 13273.52

 For male patients with 2 children:
Minimum charges: 2304.0
Maximum charges: 49577.66
Number of patients: 121
Average charges: 16187.1

 For male patients with 3 children:
Minimum charges: 3443.06
Maximum charges: 60021.4
Number of patients: 80
Average charges: 16789.17

 For male patients with 4 children:
Minimum charges: 4504.66
Maximum charges: 40182.25
Number of patients: 14
Average charges: 13782.28

 For male patients with 5 children:
Minimum charges: 4915.06
Maximum charges: 14478.33
Number of patients: 10
Average charges: 7931.66
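The plan also calls for a female/male comparison, but so far only the male groups have been summarised. The sketch below pairs the male and female average charges for each number of children and reports their difference, skipping any children count that is missing from either sex. The helper names and the sample records are hypothetical; on the real data it would be called as `compare_sexes(data_dict)`.

```python
def average_charges_by_children(patients, sex):
    """Return {children: mean charges} for one sex; counts with no patients are absent."""
    totals = {}
    for patient in patients.values():
        if patient['sex'] != sex:
            continue
        count, total = totals.get(patient['children'], (0, 0.0))
        totals[patient['children']] = (count + 1, total + patient['charges'])
    return {children: total / count for children, (count, total) in sorted(totals.items())}


def compare_sexes(patients):
    """Return {children: (male_mean, female_mean, male_minus_female)} for shared counts."""
    male = average_charges_by_children(patients, 'male')
    female = average_charges_by_children(patients, 'female')
    return {
        children: (male[children], female[children], male[children] - female[children])
        for children in sorted(male.keys() & female.keys())
    }


# Hypothetical sample records (made-up values).
sample = {
    0: {'sex': 'male', 'children': 0, 'charges': 3000.0},
    1: {'sex': 'male', 'children': 0, 'charges': 5000.0},
    2: {'sex': 'female', 'children': 0, 'charges': 2000.0},
    3: {'sex': 'female', 'children': 2, 'charges': 6000.0},
}
comparison = compare_sexes(sample)
print(comparison)  # {0: (4000.0, 2000.0, 2000.0)}
```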
