# U.S. Medical Insurance Costs
In this project, a **CSV** file with medical insurance costs will be investigated using Python fundamentals. The goal with this project will be to analyze various attributes within **insurance.csv** to learn more about the patient information in the file and gain insight into potential use cases for the dataset.

In [95]:
# Importing CSV Library
import csv

The next step is to look through **insurance.csv** to get acquainted with the data. The following aspects of the data file will be checked to plan how to import the data into Python:
* The names of columns and rows
* Any noticeable missing data
* Types of values (numerical vs. categorical)
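
These checks can also be scripted instead of done by eye. The following is a minimal sketch, assuming the file has a header row and that an empty string marks a missing value; `inspect_csv` is a hypothetical helper name and is not part of the original notebook.

```python
import csv

def inspect_csv(csv_file):
    """Return column names, per-column missing counts, and a rough value type."""
    with open(csv_file, newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        columns = reader.fieldnames or []
        missing = {column: 0 for column in columns}
        numeric = {column: True for column in columns}
        for row in reader:
            for column in columns:
                # Assumption: an empty (or absent) cell counts as missing
                value = (row[column] or '').strip()
                if value == '':
                    missing[column] += 1
                    continue
                try:
                    float(value)
                except ValueError:
                    # At least one value is not a number
                    numeric[column] = False
    kinds = {c: 'numerical' if numeric[c] else 'categorical' for c in columns}
    return columns, missing, kinds
```

Calling `inspect_csv('insurance.csv')` would return the three results as a tuple. Note that a column whose cells are all empty is labelled numerical by this sketch, so the missing counts should be read first.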

In [96]:
# Create an empty list for each column in the CSV file
ages = []
sex = []
bmis = []
num_of_children = []
smoker_status = []
regions = []
insurance_charges = []


Next, we define a function that reads one column of the CSV file and appends its values to the matching list.

In [97]:
# Append the values of one CSV column to the given list
def append_info_list(target_list, csv_file, column_name):
    # "target_list" avoids shadowing Python's built-in "list" type
    with open(csv_file, newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            target_list.append(row[column_name])
    return target_list
# This function works for any CSV file, any list, and any column name,
# so we avoid repeating the same reading code for every column.

Now we append the values of each column to its list:


In [98]:
# Add each column of the file to its list
ages = append_info_list(ages, 'insurance.csv', 'age')
sex = append_info_list(sex, 'insurance.csv', 'sex')
bmis = append_info_list(bmis, 'insurance.csv', 'bmi')
num_of_children = append_info_list(num_of_children, 'insurance.csv', 'children')
smoker_status = append_info_list(smoker_status, 'insurance.csv', 'smoker')
regions = append_info_list(regions, 'insurance.csv', 'region')
insurance_charges = append_info_list(insurance_charges, 'insurance.csv', 'charges')
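
Each call above opens and reads **insurance.csv** again, so the file is read seven times. That is fine for a small file, but the same result can be obtained in one pass. A minimal sketch, assuming every requested column exists in the header; `load_columns` is a hypothetical name introduced here for illustration.

```python
import csv

def load_columns(csv_file, column_names):
    """Read csv_file once and return a dict mapping each column name to its values."""
    columns = {name: [] for name in column_names}
    with open(csv_file, newline='') as csvfile:
        for row in csv.DictReader(csvfile):
            for name in column_names:
                # Values stay as strings, exactly as csv.DictReader yields them
                columns[name].append(row[name])
    return columns
```

With this helper, `data = load_columns('insurance.csv', ['age', 'sex', 'bmi', 'children', 'smoker', 'region', 'charges'])` would give `data['age']`, `data['sex']`, and so on, which hold the same string values as the lists built above.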


Now we can start the analysis. We will go through the data and:
* find the average age of the patients
* return the number of males vs. females counted in the dataset
* find the geographical locations of the patients
* return the average yearly medical charges of the patients
* return the average number of children that patients have
* return the number of non-smokers vs. smokers counted in the dataset
* create a dictionary that contains all patient information


Every item in this first analysis concerns patient information, so let's begin.


In [99]:
class Patinent_info:
    # The constructor takes seven parameters, each holding one aspect of patient information:
    # ages, sexes, BMIs, number of children, smoker statuses, regions, and insurance charges.
    # Each is a list of strings read from the CSV file; the same index across the lists refers to the same patient.
    def __init__(self, patients_ages, patients_sexes, patients_bmis, patients_num_children, 
                 patients_smoker_statuses, patients_regions, patients_charges):
        self.patients_ages = patients_ages
        self.patients_sexes = patients_sexes
        self.patients_bmis = patients_bmis
        self.patients_num_children = patients_num_children
        self.patients_smoker_statuses = patients_smoker_statuses
        self.patients_regions = patients_regions
        self.patients_charges = patients_charges

    
    #Function to get the average age of the patients
    def age_analyze(self):
        total_age = 0
        for age in self.patients_ages:
            total_age += int(age)
        return ("The average age of the patients is: ", str(round(total_age / len(self.patients_ages), 2)))

    
    #Function to get the number of males vs females counted in the dataset
    def analyze_sexes(self):
        # initialize number of males and females to zero
        females = 0
        males = 0
        # iterate through each sex in the sexes list
        for sex in self.patients_sexes:
            # if female add to female variable
            if sex == 'female':
                females += 1
            # if male add to male variable
            elif sex == 'male':
                males += 1
        # print out the number of each
        print("Count for female: ", females)
        print("Count for male: ", males)

    #Function to find geographical location of the patients
    def analyze_regions(self):
        unique_regions = []
        for region in self.patients_regions:
            #This will analyze if the region is already in the unique_regions list, 
            # if not it will append it to the list
            if region not in unique_regions:
                unique_regions.append(region)
        return ("The geographical locations of the patients are: ", unique_regions)
    
    #Function to return the average yearly medical charges of the patients
    def analyze_charges(self):
        total_charges = 0
        for charge in self.patients_charges:
            total_charges += float(charge)
        return ("The average yearly medical charges of the patients is: ", str(round(total_charges / len(self.patients_charges), 2)))
    
    #Function to return the average amount of children the patients have
    def average_children(self):
        total_children = 0
        for child in self.patients_num_children:
            total_children += int(child)
        return ("The average amount of children the patients have is: ", str(round(total_children / len(self.patients_num_children), 2)))

    #Function to get the total smokers vs non-smokers counted in the dataset
    def analyze_smoker_statuses(self):
        # initialize number of smokers and non-smokers to zero
        smokers = 0
        non_smokers = 0
        # iterate through each smoker status in the smoker statuses list
        for status in self.patients_smoker_statuses:
            # if smoker add to smoker variable
            if status == 'yes':
                smokers += 1
            # if non-smoker add to non-smoker variable
            elif status == 'no':
                non_smokers += 1
        # print out the number of each
        print("Count for smokers: ", smokers)
        print("Count for non-smokers: ", non_smokers)

    #Function to create a dictionary that contains all patient information
    def create_patient_dict(self):
        # Store each column as a flat list (not a list wrapped in another list),
        # converting numeric columns from strings to numbers
        self.patient_dict = {}
        self.patient_dict['ages'] = [int(age) for age in self.patients_ages]
        self.patient_dict['sex'] = list(self.patients_sexes)
        self.patient_dict['bmi'] = [float(bmi) for bmi in self.patients_bmis]
        self.patient_dict['num_of_children'] = [int(n) for n in self.patients_num_children]
        self.patient_dict['smoker_status'] = list(self.patients_smoker_statuses)
        self.patient_dict['regions'] = list(self.patients_regions)
        self.patient_dict['insurance_charges'] = [float(c) for c in self.patients_charges]
        return self.patient_dict
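
The averaging and counting loops in the class above follow the same two patterns, so they can also be written with standard-library helpers. This is a sketch of an alternative, not the approach the notebook uses; `average` and `count_values` are hypothetical names, and the inputs are assumed to be non-empty lists of strings like the ones read from the CSV file.

```python
from collections import Counter
from statistics import mean

def average(values, cast=float, digits=2):
    """Convert each string with cast, then return the rounded mean."""
    return round(mean(cast(value) for value in values), digits)

def count_values(values):
    """Return a dict mapping each distinct value to how often it occurs."""
    return dict(Counter(values))
```

For example, `average(ages, cast=int)` would replace the loop in `age_analyze`, `count_values(smoker_status)` would replace the two counters in `analyze_smoker_statuses`, and `list(count_values(regions))` would list the distinct regions in first-seen order.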



Now that all the analysis is collected in a class, we need to create an instance of that class to see the results:

In [101]:
# Create an instance of the class; its methods are called in the cells below
patient_info = Patinent_info(ages, sex, bmis, num_of_children, smoker_status, regions, insurance_charges)

Now I would like to answer the following questions:
* What is the average age of the patients in the dataset?
* Are there more smokers or non-smokers?
* What are the average yearly medical charges of the patients?



In [104]:
#The average age of the patients in the dataset
patient_info.age_analyze()

('The average age of the patients is: ', '39.21')

In [105]:
#Are there more smokers or non-smokers?
patient_info.analyze_smoker_statuses()

Count for smokers:  274
Count for non-smokers:  1064
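
So non-smokers outnumber smokers in this dataset. As a quick arithmetic check, the two counts printed above can be turned into a share; the variable names below are introduced here for illustration only.

```python
# Counts copied from the output above
smokers = 274
non_smokers = 1064

total = smokers + non_smokers
smoker_share = round(100 * smokers / total, 2)

print("Total patients:", total)                # 1338
print("Share of smokers (%):", smoker_share)   # 20.48
```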


In [106]:
#What are the average yearly medical charges of the patients?
patient_info.analyze_charges()

('The average yearly medical charges of the patients is: ', '13270.42')