# U.S. Medical Insurance Costs

In this project, I will investigate medical insurance data using Python fundamentals. I will analyze various attributes within the file **insurance.csv** to learn more about the data and gain insight into potential use cases.

In [3]:
import csv

To start, I import my libraries. For this project, I only need to use the **csv** library to work with the data.

Next, I will look through the data to acquaint myself with it. This will allow me to think more critically about my analysis and plan how I will import the data into my program.

In [4]:
ages = []
sexes = []
bmis = []
num_children = []
smoker_statuses = []
regions = []
insurance_charges = []

The data file contains the following columns:
* Age
* Sex
* BMI
* Number of children
* Smoking status
* U.S. geographic region
* Yearly insurance cost

There are no signs of missing data. Knowing this, I decided to create seven empty lists to hold the data from each column.

In [3]:
def load_list_data(lst, csv_file, column_name):
    with open(csv_file, newline='') as csv_info: # Opens the csv file (newline='' as the csv docs recommend)
        csv_dict = csv.DictReader(csv_info) # Reads data
        for row in csv_dict: # Loops through data in each row
            lst.append(row[column_name]) # Adds data to a list
        return lst # Returns list

I built this helper function to keep the data-loading code concise. Without it, I would need to write seven separate for-loops. With it, I simply call the function seven times (each call reopens and rereads the file, which is acceptable for a dataset of this size):

In [None]:
load_list_data(ages, 'insurance.csv', 'age')
load_list_data(sexes, 'insurance.csv', 'sex')
load_list_data(bmis, 'insurance.csv', 'bmi')
load_list_data(num_children, 'insurance.csv', 'children')
load_list_data(smoker_statuses, 'insurance.csv', 'smoker')
load_list_data(regions, 'insurance.csv', 'region')
load_list_data(insurance_charges, 'insurance.csv', 'charges')
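The seven calls above each reread the whole file. As an alternative, here is a minimal sketch of a single-pass loader. The function name `load_columns` and the two sample rows are illustrative assumptions, not part of the original notebook or taken from the dataset; the sample is held in memory so the snippet runs without **insurance.csv**.

```python
import csv
import io

def load_columns(csv_source, column_names):
    """Read an open CSV source once and return {column name: list of values}."""
    columns = {name: [] for name in column_names}
    for row in csv.DictReader(csv_source):
        for name in column_names:
            columns[name].append(row[name])  # Values stay strings, as in the original lists
    return columns

# Made-up rows that only mimic the column layout of insurance.csv
sample_csv = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "30,female,25.0,1,no,northeast,5000.00\n"
    "45,male,31.2,2,yes,southwest,7250.50\n"
)
sample_columns = load_columns(sample_csv, ["age", "sex", "charges"])
print(sample_columns["age"])
```

With the real file, the same function could be called as `load_columns(f, [...])` inside a `with open('insurance.csv', newline='') as f:` block.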

The data are now neatly organized into labeled lists. I can now begin my analysis. This is where I must plan out what I want to investigate and how to perform the analysis. I have decided to perform the following analyses:

* Find the average age of patients
* Count the number of males and females in the dataset
* Find the geographic locations of patients
* Find the average yearly medical charge of the patients
* Create a dictionary containing all patient information

To perform these inspections, I built a class called `PatientsInfo` that contains five methods:
* `analyze_ages()`
* `analyze_sexes()`
* `unique_regions()`
* `average_charges()`
* `create_dictionary()`

In [5]:
class PatientsInfo:
    
    def __init__(self, patients_ages, patients_sexes, patients_bmis, patients_num_children, 
                 patients_smoker_statuses, patients_regions, patients_charges):
        self.patients_ages = patients_ages
        self.patients_sexes = patients_sexes
        self.patients_bmis = patients_bmis
        self.patients_num_children = patients_num_children
        self.patients_smoker_statuses = patients_smoker_statuses
        self.patients_regions = patients_regions
        self.patients_charges = patients_charges
        
    def analyze_ages(self):
        total_age = 0
        for age in self.patients_ages:
            total_age += int(age)
        return("Average Patient Age:", str(round(total_age/len(self.patients_ages), 2)), "years")
    
    def analyze_sexes(self):
        females = 0
        males = 0
        for sex in self.patients_sexes:
            if sex == 'female':
                females += 1
            elif sex == 'male':
                males += 1
        print("Number of females:", females)
        print("Number of males:", males)
        
    def unique_regions(self):
        unique_regions = []
        for region in self.patients_regions:
            if region not in unique_regions:
                unique_regions.append(region)
        return unique_regions
    
    def average_charges(self):
        total_charges = 0
        for charge in self.patients_charges:
            total_charges += float(charge)
        return("Average Annual Medical Insurance Charges:", str(round(total_charges/len(self.patients_charges), 2)), "dollars.")
    
    def create_dictionary(self):
        self.patients_dictionary = {}
        self.patients_dictionary["age"] = [int(age) for age in self.patients_ages]
        self.patients_dictionary["sex"] = self.patients_sexes
        self.patients_dictionary["bmi"] = self.patients_bmis
        self.patients_dictionary["children"] = self.patients_num_children
        self.patients_dictionary["smoker"] = self.patients_smoker_statuses
        self.patients_dictionary["region"] = self.patients_regions
        self.patients_dictionary["charges"] = self.patients_charges
        return self.patients_dictionary

I then create an instance of the `PatientsInfo` class called `patient_info`. With this instance, I call each method to see the results of my analysis and interpret my findings.

In [6]:
patient_info = PatientsInfo(ages, sexes, bmis, num_children, smoker_statuses, regions, insurance_charges)

In [7]:
patient_info.analyze_ages()

('Average Patient Age:', '39.21', 'years')

The average patient age in the dataset is about 39 years. This is worth checking because we want to know whether the data are representative of a broad population. If we want to use these data to make inferences about other populations, we must ensure that the data are abundant and broad enough to support such inferences.

Further analysis of the range and standard deviation of patient ages would be needed to judge whether the dataset resembles a random sample of individuals.
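One possible way to start on the range and spread of ages is sketched below using the standard-library `statistics` module. The helper name `summarize_ages` and the sample ages are hypothetical; with the real data, the `ages` list loaded earlier could be passed in instead.

```python
import statistics

def summarize_ages(ages):
    """Return the min, max, mean, and sample standard deviation of the ages."""
    values = [int(age) for age in ages]  # The CSV values are strings, so convert first
    return {
        "min": min(values),
        "max": max(values),
        "mean": round(statistics.mean(values), 2),
        "stdev": round(statistics.stdev(values), 2),  # Sample (n - 1) standard deviation
    }

# Made-up ages for illustration only
sample_ages = ["20", "30", "40", "50"]
age_summary = summarize_ages(sample_ages)
print(age_summary)
```

A wide range and a large standard deviation would suggest that the dataset covers many age groups, although this alone does not show that the sample is random.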

In [8]:
patient_info.analyze_sexes()

Number of females: 662
Number of males: 676


For reasons similar to those stated above, it is important to check the balance of sexes in the data to ensure that they are representative of a broad population. Often, real-world data are unbalanced, which can lead to statistical issues when performing analyses.
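To make the balance easier to judge, the counts can be turned into proportions. The sketch below is one way to do this with `collections.Counter`. The function name `category_proportions` is an assumption, and the list is rebuilt from the counts printed above (662 and 676) rather than read from the file.

```python
from collections import Counter

def category_proportions(labels):
    """Return each distinct label's share of the total, rounded to three decimals."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(count / total, 3) for label, count in counts.items()}

# Rebuild a list with the same counts as the output above
sexes_from_counts = ["female"] * 662 + ["male"] * 676
sex_shares = category_proportions(sexes_from_counts)
print(sex_shares)
```

Because `Counter` tallies every distinct label, any unexpected value in the column would appear as its own key instead of being silently merged into another group.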

In [9]:
patient_info.unique_regions()

['southwest', 'southeast', 'northwest', 'northeast']

There are four unique geographic regions in this dataset. All patients come from the United States.

In [10]:
patient_info.average_charges()

('Average Annual Medical Insurance Charges:', '13270.42', 'dollars.')

The average yearly medical insurance cost per individual is about USD 13,270. Further analysis could show which patient attributes contribute most strongly to lower or higher insurance charges. For example, one could check whether patient age correlates with insurance charge.
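As a rough sketch of the age-versus-charge check mentioned above, the Pearson correlation coefficient can be computed with plain Python. The function name `pearson_correlation` and the sample values are hypothetical and are not results from the dataset.

```python
import math

def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    xs = [float(x) for x in xs]  # Accept the string values read from the CSV
    ys = [float(y) for y in ys]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

# Made-up, perfectly linear values for illustration only
sample_ages = ["20", "30", "40", "50"]
sample_charges = ["2000", "3000", "4000", "5000"]
age_charge_r = pearson_correlation(sample_ages, sample_charges)
print(round(age_charge_r, 2))
```

A value near 1 or -1 indicates a strong linear relationship and a value near 0 indicates a weak one. With the real data, `ages` and `insurance_charges` could be passed in directly.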

In [None]:
patient_info.create_dictionary()

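All patient data are now neatly organized into a dictionary. Note that only the ages are converted to integers; the other values remain strings as read from the CSV file. This is convenient for further analysis if I decide to continue investigating the dataset.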
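If per-patient records are more convenient than per-column lists, a column dictionary like this one can be reshaped. The sketch below assumes every column list has the same length. The helper name `to_records` and the sample values are illustrative only.

```python
def to_records(columns):
    """Turn {column name: list of values} into a list of one dict per row."""
    keys = list(columns)
    # zip(*...) walks the column lists in parallel, yielding one tuple per patient
    return [dict(zip(keys, values)) for values in zip(*columns.values())]

# Made-up column dictionary with the same shape as the one built above
sample_dictionary = {
    "age": [30, 45],
    "sex": ["female", "male"],
    "charges": ["5000.00", "7250.50"],
}
sample_records = to_records(sample_dictionary)
print(sample_records[0])
```

Each record can then be filtered or sorted on its own, for example to select only the patients from one region.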