# U.S. Medical Insurance Costs

## Dataset Overview:
* 7 columns- age:INT, sex:STR, bmi:FLOAT, children:INT, smoker:STR, region:STR, charges:FLOAT
    * Age- Range of discrete integers
    * Sex- Categorical binary between male and female
    * BMI- Continuous float values of varying precision, some written as integers
    * Children- Discrete integers that start at 0
    * Smoker- Categorical binary between yes and no
    * Region- Categorical set of 4 possible values
    * Charges- Continuous float values of varying precision, some written as integers
* 1 header row, 1338 records

## Data Formatting and Cleaning:
* Convert sex and smoker values to binaries (if needed)
* Code regions into integers (if needed)
* Make sure bmi and charges are all floats, not integers (if needed)
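
The conversions listed above could look like the following minimal sketch. The helper names (`encode_binary`, `encode_region`, `to_floats`) and the integer codes assigned to each region are illustrative assumptions, not part of the original notebook.

```python
# Hypothetical helpers: the names and the integer codes are illustrative assumptions.
REGION_CODES = {'northeast': 0, 'northwest': 1, 'southeast': 2, 'southwest': 3}

def encode_binary(values, positive):
    """Map each value to 1 if it equals `positive`, otherwise 0."""
    return [1 if value == positive else 0 for value in values]

def encode_region(values):
    """Map each region name to its integer code."""
    return [REGION_CODES[value] for value in values]

def to_floats(values):
    """Convert numeric strings such as '33' or '27.9' to floats."""
    return [float(value) for value in values]

# Small samples taken from the first records of the dataset
sample_smoker = ['yes', 'no', 'no']
sample_region = ['southwest', 'southeast', 'northwest']
sample_bmi = ['27.9', '33.77', '33']

encoded_smoker = encode_binary(sample_smoker, 'yes')
encoded_region = encode_region(sample_region)
float_bmi = to_floats(sample_bmi)

print(encoded_smoker)
print(encoded_region)
print(float_bmi)
```

Keeping the region codes in a single dictionary makes the mapping easy to inspect and reverse later.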

## Data Analysis Plan:
Model of the expected relationships between the insurance variables and charges:
***
![insurance_analysis_model.png](attachment:insurance_analysis_model.png)
***
The relationships and their rationale are hypothetical and based on personal expectations.

Analysis Procedure:
* Use classes to define and manipulate categorical data
* Use functions to analyze numeric data against categorical and other numeric data
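
As one possible shape for a function that analyzes numeric data against categorical data, the sketch below groups a numeric list by a categorical list and averages each group. The function name `mean_by_category` is a hypothetical choice, and the sample values are the first five records of the dataset.

```python
from collections import defaultdict

def mean_by_category(categories, values):
    """Return the mean of `values` for each distinct entry in `categories`."""
    grouped = defaultdict(list)
    for category, value in zip(categories, values):
        grouped[category].append(value)
    return {
        category: round(sum(group) / len(group), 2)
        for category, group in grouped.items()
    }

# First five records of the dataset (smoker status and charges)
sample_smoker = ['yes', 'no', 'no', 'no', 'no']
sample_charges = [16884.924, 1725.5523, 4449.462, 21984.47061, 3866.8552]

smoker_means = mean_by_category(sample_smoker, sample_charges)
print(smoker_means)
```

The same function would work unchanged for `sex` or `region`, since it only assumes two lists of equal length.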

In [2]:
# Reads the whole insurance CSV file into a single string

with open('insurance.csv') as insurance_csv:
    insurance_data = insurance_csv.read()
print(insurance_data)
print(type(insurance_data))

age,sex,bmi,children,smoker,region,charges
19,female,27.9,0,yes,southwest,16884.924
18,male,33.77,1,no,southeast,1725.5523
28,male,33,3,no,southeast,4449.462
33,male,22.705,0,no,northwest,21984.47061
32,male,28.88,0,no,northwest,3866.8552
31,female,25.74,0,no,southeast,3756.6216
46,female,33.44,1,no,southeast,8240.5896
37,female,27.74,3,no,northwest,7281.5056
37,male,29.83,2,no,northeast,6406.4107
60,female,25.84,0,no,northwest,28923.13692
25,male,26.22,0,no,northeast,2721.3208
62,female,26.29,0,yes,southeast,27808.7251
23,male,34.4,0,no,southwest,1826.843
56,female,39.82,0,no,southeast,11090.7178
27,male,42.13,0,yes,southeast,39611.7577
19,male,24.6,1,no,southwest,1837.237
52,female,30.78,1,no,northeast,10797.3362
23,male,23.845,0,no,northeast,2395.17155
56,male,40.3,0,no,southwest,10602.385
30,male,35.3,0,yes,southwest,36837.467
60,female,36.005,0,no,northeast,13228.84695
30,female,32.4,1,no,southwest,4149.736
18,male,34.1,0,no,southeast,1137.011
34,female,31.92,1,yes,northeast,37701

## Import Inspection:
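Reading the file with `read()` returns the entire CSV as one string, so every value, including the numeric ones, is still text.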
***
**Task:** Format CSV columns into variables that contain lists of their values
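
An alternative to splitting the string by hand is the standard library's `csv.DictReader`, which handles the header row and skips blank lines. The sketch below is only an illustration: it parses a small inline sample (standing in for `insurance.csv`) so that it runs on its own, and the `sample_` variable names are assumptions.

```python
import csv
import io

# A small inline sample stands in for insurance.csv so the sketch is self-contained.
sample_csv = """age,sex,bmi,children,smoker,region,charges
19,female,27.9,0,yes,southwest,16884.924
18,male,33.77,1,no,southeast,1725.5523
28,male,33,3,no,southeast,4449.462
"""

# Build one list of raw string values per column name
columns = {}
for row in csv.DictReader(io.StringIO(sample_csv)):
    for name, value in row.items():
        columns.setdefault(name, []).append(value)

# Convert the numeric columns; categorical columns stay as strings
sample_age = [int(value) for value in columns['age']]
sample_bmi = [float(value) for value in columns['bmi']]
sample_charges = [float(value) for value in columns['charges']]
sample_region = columns['region']

print(sample_age)
print(sample_bmi)
print(sample_region)
```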

In [3]:
# Creates lists of variables from CSV data

split_insurance_data = insurance_data.splitlines()

listed_insurance_records = []
for record in split_insurance_data[1:]:  # Skips the header row
    # Skips blank lines (the file's trailing newline would otherwise
    # produce an empty record)
    if record.strip():
        listed_insurance_records.append(record.split(','))

age = [int(record[0]) for record in listed_insurance_records]
sex = [record[1] for record in listed_insurance_records]
bmi = [float(record[2]) for record in listed_insurance_records]
children = [int(record[3]) for record in listed_insurance_records]
smoker = [record[4] for record in listed_insurance_records]
region = [record[5] for record in listed_insurance_records]
charges = [float(record[6]) for record in listed_insurance_records]

print(charges)

[16884.924, 1725.5523, 4449.462, 21984.47061, 3866.8552, 3756.6216, 8240.5896, 7281.5056, 6406.4107, 28923.13692, 2721.3208, 27808.7251, 1826.843, 11090.7178, 39611.7577, 1837.237, 10797.3362, 2395.17155, 10602.385, 36837.467, 13228.84695, 4149.736, 1137.011, 37701.8768, 6203.90175, 14001.1338, 14451.83515, 12268.63225, 2775.19215, 38711.0, 35585.576, 2198.18985, 4687.797, 13770.0979, 51194.55914, 1625.43375, 15612.19335, 2302.3, 39774.2763, 48173.361, 3046.062, 4949.7587, 6272.4772, 6313.759, 6079.6715, 20630.28351, 3393.35635, 3556.9223, 12629.8967, 38709.176, 2211.13075, 3579.8287, 23568.272, 37742.5757, 8059.6791, 47496.49445, 13607.36875, 34303.1672, 23244.7902, 5989.52365, 8606.2174, 4504.6624, 30166.61817, 4133.64165, 14711.7438, 1743.214, 14235.072, 6389.37785, 5920.1041, 17663.1442, 16577.7795, 6799.458, 11741.726, 11946.6259, 7726.854, 11356.6609, 3947.4131, 1532.4697, 2755.02095, 6571.02435, 4441.21315, 7935.29115, 37165.1638, 11033.6617, 39836.519, 21098.55405, 43578.9394, 

## Group Tasks:
* Get descriptive statistics for parsed variables.
* Organize and define data for children and region.
* Use class methods predominantly to carry out the analysis, and standalone functions where appropriate.
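
For the descriptive statistics task, Python's built-in `statistics` module covers the common measures. The following is a minimal sketch rather than the notebook's own approach; the `describe` function name is hypothetical and the sample uses the ages of the first five records.

```python
import statistics

def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        'mean': round(statistics.mean(values), 2),
        'median': round(statistics.median(values), 2),
        'stdev': round(statistics.stdev(values), 2),  # sample standard deviation
        'min': min(values),
        'max': max(values),
    }

# Ages of the first five records in the dataset
sample_age = [19, 18, 28, 33, 32]

age_stats = describe(sample_age)
print(age_stats)
```

`statistics.stdev` computes the sample standard deviation; `statistics.pstdev` would be the choice if the list is treated as the whole population.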

In [11]:
# Class for analysis via methods

class Med_data():
    
    def __init__(self, age, sex, bmi, children, smoker, region, charges):
        self.age = age
        self.sex = sex
        self.bmi = bmi
        self.children = children
        self.smoker = smoker
        self.region = region
        self.charges = charges
        
    def complete_means(self):
        means = {}
        
        means['Mean Age of Dataset'] = round(sum(self.age) / len(self.age), 2)
        means['Mean BMI of Dataset'] = round(sum(self.bmi) / len(self.bmi), 2)
        means['Mean Number of Children of Dataset'] = round(sum(self.children) / len(self.children), 2)
        means['Mean Charges of Dataset'] = round(sum(self.charges) / len(self.charges), 2)
        
        return means
    
    def complete_proportions(self):
        proportions = {}
        
        proportions['Percentage of Males in Dataset'] = round((self.sex.count('male') / len(self.sex)) * 100, 2)
        proportions['Percentage of Females in Dataset'] = round((self.sex.count('female') / len(self.sex)) * 100, 2)
        proportions['Percentage of Smokers in Dataset'] = round((self.smoker.count('yes') / len(self.smoker)) * 100, 2)
        proportions['Percentage of Non-smokers in Dataset'] = round((self.smoker.count('no') / len(self.smoker)) * 100, 2)
        proportions['Percentage of Northeast in Dataset'] = round((self.region.count('northeast') / len(self.region)) * 100, 2)
        proportions['Percentage of Northwest in Dataset'] = round((self.region.count('northwest') / len(self.region)) * 100, 2)
        proportions['Percentage of Southeast in Dataset'] = round((self.region.count('southeast') / len(self.region)) * 100, 2)
        proportions['Percentage of Southwest in Dataset'] = round((self.region.count('southwest') / len(self.region)) * 100, 2)
        
        return proportions
    
    def children_mean(self, num):
        # Validate the requested value before doing any work
        max_children = max(self.children)
        if num < 0 or num > max_children:
            raise Exception("Chosen value is out of range!")

        # Collect the charges of every record with exactly `num` children
        matching_charges = [
            charge
            for kids, charge in zip(self.children, self.charges)
            if kids == num
        ]

        # Guard against dividing by zero when no record matches
        if not matching_charges:
            raise Exception(f"No records found with {num} children!")

        mean_charge = sum(matching_charges) / len(matching_charges)

        return f"The average insurance cost for an individual with {num} children is {round(mean_charge, 2)}"
            
    
    

## Class Tests:

In [13]:
medical_insurance = Med_data(age, sex, bmi, children, smoker, region, charges)

print(medical_insurance.complete_means())
print(medical_insurance.complete_proportions())
print(medical_insurance.children_mean(50))

{'Mean Age of Dataset': 39.21, 'Mean BMI of Dataset': 30.66, 'Mean Number of Children of Dataset': 1.09, 'Mean Charges of Dataset': 13270.42}
{'Percentage of Males in Dataset': 50.52, 'Percentage of Females in Dataset': 49.48, 'Percentage of Smokers in Dataset': 20.48, 'Percentage of Non-smokers in Dataset': 79.52, 'Percentage of Northeast in Dataset': 24.22, 'Percentage of Northwest in Dataset': 24.29, 'Percentage of Southeast in Dataset': 27.2, 'Percentage of Southwest in Dataset': 24.29}


Exception: Chosen value is out of range!