# U.S. Medical Insurance Costs Project

In this project, a **CSV** file with medical insurance costs is investigated using **Python fundamentals**. The goal is to analyze various attributes within **insurance.csv** to learn more about the patient information in the file and to gain insight into potential use cases for the dataset. I decided to analyze two column variables at a time, setting one as the *focus* or *constraint* variable.

1. Import necessary library

Since the file is in **.csv** format, the only library needed is Python's built-in *csv* module, which is imported below.

In [4]:
import csv

2. Determine what data you want to analyze

After looking through the data, I decided to analyze how smoker status relates to age, sex, insurance cost, and obesity, and how obesity relates to age, sex, smoker status, and insurance cost. I then created the function below to obtain the data I needed.

In [5]:
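def get_data(filename, column, constraint_column, constraint_value):
    # Collect the values of `column` for every row that satisfies the constraint
    data = []
    with open(filename, newline='') as csv_file:
        reader = csv.DictReader(csv_file)
        for row in reader:
            # Categorical constraint (e.g. smoker == 'yes'): exact match
            if row[constraint_column].isalpha() and row[constraint_column] == constraint_value:
                data.append(row[column])
            # Numeric constraint (e.g. bmi >= 30.0): keep rows at or above the threshold
            elif not row[constraint_column].isalpha() and float(row[constraint_column]) >= constraint_value:
                data.append(row[column])

    return data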

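The conditional statement handles both kinds of constraint column: categorical values are matched exactly, while numeric values are kept when they are at or above the given threshold. I then call the **get_data** function with the appropriate arguments, as shown below.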

In [7]:
filename = 'insurance.csv'

smoker_ages = get_data(filename, 'age', 'smoker', 'yes')
smoker_sex = get_data(filename, 'sex', 'smoker', 'yes')
smoker_bmi = get_data(filename, 'bmi', 'smoker', 'yes')
smoker_charges = get_data(filename, 'charges', 'smoker', 'yes')

obese_ages = get_data(filename, 'age', 'bmi', 30.0)
obese_sex = get_data(filename, 'sex', 'bmi', 30.0)
obese_smoker = get_data(filename, 'smoker', 'bmi', 30.0)
obese_charges = get_data(filename, 'charges', 'bmi', 30.0)
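
To make the two branches of the conditional easier to see, here is a minimal sketch that applies the same test to a tiny in-memory sample instead of the real file. The `SAMPLE` text and the `select` helper are hypothetical names introduced only for this illustration; they are not part of the project code.

```python
import csv
import io

# Hypothetical three-row sample using column names from insurance.csv.
SAMPLE = """age,sex,bmi,smoker,charges
19,female,27.9,yes,16884.924
18,male,33.77,no,1725.5523
28,male,33.0,no,4449.462
"""

def select(rows, column, constraint_column, constraint_value):
    """Mirror the get_data conditional on rows that are already parsed."""
    selected = []
    for row in rows:
        cell = row[constraint_column]
        if cell.isalpha():
            # Categorical cell such as 'yes' or 'male': exact match.
            if cell == constraint_value:
                selected.append(row[column])
        elif float(cell) >= constraint_value:
            # Numeric cell such as '27.9': threshold comparison.
            selected.append(row[column])
    return selected

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(select(rows, 'age', 'smoker', 'yes'))  # ['19']
print(select(rows, 'age', 'bmi', 30.0))      # ['18', '28']
```

Note that the selected values come back as strings, which is why the analysis code converts them with `int()` or `float()` before doing arithmetic.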

3. Analyze Data

Now that I have my data, the next step was to create a class with five methods that perform the analysis I needed. This class is shown below.

In [20]:
class PatientInfos:
    def __init__(self,smoker_ages,smoker_sex,smoker_bmi,smoker_charges,obese_ages,obese_sex,obese_smoker,obese_charges):
        self.smoker_ages =smoker_ages
        self.obese_ages =obese_ages
        self.smoker_sex=smoker_sex
        self.obese_sex=obese_sex
        self.smoker_bmi = smoker_bmi
        self.obese_smoker = obese_smoker
        self.smoker_charges =smoker_charges
        self.obese_charges =obese_charges

    #Analyze min, max and average age for focus
    def analyze_age(self,focus):
        total_age = 0
        min_age=float("inf")
        max_age=0
        if focus =="smoker":
            ages =self.smoker_ages
        elif focus =="obese":
            ages =self.obese_ages
        else:
            raise ValueError("focus must be 'smoker' or 'obese'")
        for age in ages:
            total_age+=int(age)
            min_age=min(min_age,int(age))
            max_age=max(max_age,int(age))
        return "{}=> average age: {}, youngest: {}, oldest:{}".format(focus,int(total_age/len(ages)),min_age,max_age)

    #Analyze percentage of males and females for focus
    def analyze_sex(self,focus):
        male_count=0
        female_count=0
        if focus =="smoker":
            sexes =self.smoker_sex
        elif focus =="obese":
            sexes =self.obese_sex
        else:
            raise ValueError("focus must be 'smoker' or 'obese'")
        for sex in sexes:
            if sex=="male":
                male_count+=1
            else:
                female_count+=1
        male_percentage =round((male_count*100)/len(sexes),2)
        female_percentage =round((female_count*100)/len(sexes),2)
        return "{} => male: {}({}%), female: {}({}%)".format(focus,male_count,male_percentage,female_count,female_percentage)            
    
    #Analyze overweight and obesity stats for smokers
    def analyze_bmi(self):
        obese_count =0
        overweight_count =0
        bmis = self.smoker_bmi
        focus = "smoker"
        for bmi in bmis:
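            # Note: a BMI of exactly 30.0 falls in neither bucket here,
            # although get_data selected the obese subset with bmi >= 30.0
            if float(bmi) > 30.0:
                obese_count+=1
            elif float(bmi)>=25.0 and float(bmi) < 30.0:
                overweight_count+=1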
        over_p = round((overweight_count*100)/len(bmis),2)  
        obese_p =round((obese_count*100)/len(bmis),2)        
        return "{}=> overweight:{}({}%), obese:{}({}%)".format(focus,overweight_count,over_p,obese_count,obese_p)

    #Analyze average insurance charge for given focus
    def analyze_charges(self,focus):
        total_charges = 0
        if focus =="smoker":
            charges =self.smoker_charges
        elif focus =="obese":
            charges =self.obese_charges
        else:
            raise ValueError("focus must be 'smoker' or 'obese'")
        for charge in charges:
            total_charges+=float(charge)
        average_charge = round(total_charges/len(charges),2)    
        return '{}=> average charge: ${}'.format(focus,average_charge)

    #Analyze smoker and non-smoker counts and percentages for the obese subset
    def analyze_smoker(self):
        smoker_count=0
        non_smoker_count = 0
        smokers=self.obese_smoker
        focus = "obese"
        for smoker in smokers:
            if smoker=='yes':
                smoker_count+=1
            else:
                non_smoker_count+=1   
        smoker_count_p = round((smoker_count*100)/len(smokers),2)   
        non_smoker_count_p = round((non_smoker_count*100)/len(smokers),2)      
        return '{} => smokers:{}({}%), non-smokers:{}({}%)'.format(focus,smoker_count, smoker_count_p,non_smoker_count, non_smoker_count_p)             
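
As a side note, the running-total loop in **analyze_age** could also be expressed with Python's built-in `sum`, `min`, and `max`. The sketch below is only an illustration of that alternative, not the code used for the results in this notebook; `summarize_ages` is a hypothetical helper name.

```python
def summarize_ages(ages):
    """Return (average, youngest, oldest) for a list of age strings."""
    values = [int(age) for age in ages]  # csv values arrive as strings
    if not values:
        raise ValueError("no ages to summarize")
    # int() truncates the average, matching the behaviour of analyze_age.
    return int(sum(values) / len(values)), min(values), max(values)

print(summarize_ages(['19', '18', '28']))  # (21, 18, 28)
```

Converting the strings once up front avoids calling `int()` three times per value, and the explicit empty-list check replaces a division-by-zero error with a clearer message.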

Let's see the class in action.

In [21]:
p_i = PatientInfos(smoker_ages,smoker_sex,smoker_bmi,smoker_charges,obese_ages,obese_sex,obese_smoker,obese_charges)
print(p_i.analyze_age('smoker'))   
print(p_i.analyze_sex('smoker'))   
print(p_i.analyze_bmi())   
print(p_i.analyze_charges('smoker'))

print(p_i.analyze_age('obese'))
print(p_i.analyze_sex('obese'))
print(p_i.analyze_smoker())
print(p_i.analyze_charges('obese')) 

smoker=> average age: 38, youngest: 18, oldest:64
smoker => male: 159(58.03%), female: 115(41.97%)
smoker=> overweight:74(27.01%), obese:144(52.55%)
smoker=> average charge: $32050.23
obese=> average age: 40, youngest: 18, oldest:64
obese => male: 373(52.76%), female: 334(47.24%)
obese => smokers:145(20.51%), non-smokers:562(79.49%)
obese=> average charge: $15552.34
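
As a quick sanity check, the percentages above can be recomputed from the printed counts alone. This is a small sketch that uses only the numbers in the output; `percent` is a hypothetical helper that applies the same rounding as the class methods.

```python
def percent(count, total):
    """Percentage rounded to two decimals, as in the class methods."""
    return round(count * 100 / total, 2)

smokers_total = 159 + 115  # male + female smokers from the output
obese_total = 373 + 334    # male + female obese patients from the output

print(smokers_total, percent(159, smokers_total))  # 274 58.03
print(obese_total, percent(145, obese_total))      # 707 20.51

# Ratio of the average smoker charge to the average obese charge.
print(round(32050.23 / 15552.34, 2))               # 2.06
```

The obese total of 707 also equals the 145 smokers plus 562 non-smokers reported for that group, so the counts are consistent with each other.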


4. My conclusions

Of all the analyses performed, the result that surprised me most was the smoker breakdown for obese patients. I expected there to be more smokers than non-smokers, but only 20.51% of the obese patients smoke.