# Data Science Portfolio - US Medical Insurance Notebook ##

**Created By: Albert B. Schultz**

**Date Created: 05/21/2023**

**Version: 1.00**

## Table of Contents ##

1. [Introduction](#1.-Introduction)
2. [Understanding Purpose, Goals, and Vision](#2.-Understanding-Purpose,-Goals,-and-Vision)
3. [Import the Raw CSV Medical Insurance Data Set](#3.-Import-the-Raw-CSV-Medical-Insurance-Data-Set)
4. [Perform Exploratory Data Analysis](#4.-Perform-Exploratory-Data-Analysis)

## 1. Introduction ##

In this project, a **csv** file of medical insurance costs will be investigated using Python fundamentals in order to understand the medical insurance data and make it understandable to stakeholders who wish to analyze the data set further. The goal of this portfolio project is to analyze the attributes in the **insurance.csv** file and gain insights into potential use cases for the medical insurance data set.

## 2. Understanding Purpose, Goals, and Vision ##

The vision of this Medical Insurance Portfolio notebook is to create an understandable, easy-to-read analysis of the cleaned medical insurance data set, so that others interested in this data science mission can continue working through the analysis and, hopefully, identify root causes of and solutions to the medical insurance issues presented in the csv file.

**Vision:** To understand the medical insurance data set and meet this vision, we set the goals below. They outline how to arrive at a clean medical insurance data set so that others can see and understand our point of view from the data analysis through the exploration phases.

**Goals:**
1. Review the insurance.csv file to check for missing information in the observations and for abnormalities in the attributes of the table.
2. Import the insurance.csv file into the Python environment for staging, extraction, data manipulation, and presentation.
3. Create lists and load the insurance.csv data into them by attribute (column) name, for brevity and organization.
4. Perform Exploratory Data Analysis to understand aspects of the cleaned medical data set and organize it into a dictionary of lists, using tools such as loops, list comprehensions, and functions.
5. Present the cleaned data set to stakeholders so they can continue using the cleaned data set dictionary for their own purposes.

## 3. Import the Raw CSV Medical Insurance Data Set ##

**Introduction:** In this section, the data set, insurance.csv, is imported into this project and loaded into separate lists, one per attribute, so that the analysis can be performed on the clean data.

1. To get started, import Python's built-in csv module before importing the data set.

In [None]:
import csv

2. The next step is to look through the **insurance.csv** file and become acquainted with the attributes and observations in the raw data set. Some things to keep in mind:
a. Names of the attributes (columns) and observations (rows).
b. Any noticeable missing information from the observations.
c. Types of values in the attributes.

In [None]:
#Create one empty list for each of the 7 attributes (columns) in the raw insurance.csv file.
ages = []
sexes = []
bmis = []
num_children = []
smoker_statuses = []
regions = []
insurance_charges = []

3. Based on the review of the **insurance.csv** file, I found 7 attributes:
a. **Patient Age**
b. **Patient Sex**
c. **Patient BMI**
d. **Patient Number of Children**
e. **Patient Smoking Status**
f. **Patient U.S. Geographical Region**
g. **Patient Yearly Medical Insurance Premium**

4. Based on the preliminary review of the raw data set, there were **no missing data** in the insurance.csv file.
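The manual review in steps 2 to 4 can also be double-checked in code. The sketch below is a minimal example, assuming the file has a header row like insurance.csv; `summarize_csv` is a hypothetical helper name, and the two sample rows are made up for illustration rather than taken from the real file. It treats an empty or whitespace-only cell as missing.

```python
import csv
import io


def summarize_csv(csv_file):
    """Return (column names, row count, blank-cell count per column) for an open CSV file.

    Hypothetical helper: a cell counts as missing when it is empty or whitespace,
    or when a short row leaves it unfilled (DictReader then supplies None).
    """
    reader = csv.DictReader(csv_file)
    column_names = reader.fieldnames  # accessing fieldnames reads the header row
    missing_counts = {name: 0 for name in column_names}
    row_count = 0
    for row in reader:
        row_count += 1
        for name in column_names:
            value = row[name]
            if value is None or value.strip() == "":
                missing_counts[name] += 1
    return column_names, row_count, missing_counts


# Made-up sample rows in the same column layout as insurance.csv; the second row has a blank bmi.
sample_file = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "30,female,25.0,0,no,northeast,5000.00\n"
    "45,male,,2,yes,southwest,20000.00\n"
)
sample_columns, sample_row_count, sample_missing = summarize_csv(sample_file)
print(sample_columns)
print(sample_row_count, sample_missing)
```

To run it on the real file, pass `open('Data Science Datasets/insurance.csv', newline='')` in place of the sample; every count in the returned dictionary should be 0 if the file truly has no missing data.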

5. To load the data into the seven empty lists created above, a custom function, **load_list_data**, is defined with the parameters **lst, csv_file, column_name**.

In [None]:
#Create a function that loads one column of the insurance.csv file into the given list.
def load_list_data(lst, csv_file, column_name):
    #Open the csv file (newline='' is the setting the csv module documentation recommends)
    with open(csv_file, newline='') as csv_info:
        #Read each row as a dictionary keyed by the header names
        csv_dict = csv.DictReader(csv_info)
        #Loop through each row of the csv
        for row in csv_dict:
            #Append this row's value for the requested column to the list
            lst.append(row[column_name])
    #Return the filled list
    return lst
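Because `load_list_data` opens and re-reads insurance.csv once per column, filling the seven lists means reading the file seven times. A single-pass alternative is sketched below for comparison; `load_all_columns` is a hypothetical name, not part of the original notebook, and the sample rows are made up.

```python
import csv
import io


def load_all_columns(csv_file):
    """Read an open CSV file once and return a dict mapping each column name to a list of its values."""
    reader = csv.DictReader(csv_file)
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            # Values stay as strings, exactly as load_list_data() stores them.
            columns[name].append(row[name])
    return columns


# Made-up sample rows standing in for insurance.csv.
sample_file = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "30,female,25.0,0,no,northeast,5000.00\n"
    "45,male,31.5,2,yes,southwest,20000.00\n"
)
sample_columns = load_all_columns(sample_file)
print(sample_columns["age"])     # ['30', '45']
print(sample_columns["region"])  # ['northeast', 'southwest']
```

With the real file, `with open('Data Science Datasets/insurance.csv', newline='') as f: columns = load_all_columns(f)` would yield the same values as the seven separate calls, accessible as `columns['age']`, `columns['sex']`, and so on.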

6. The helper function above makes it easier to load data into each list. Below, load_list_data is called once per attribute to fill each list with the values from the matching column of the csv file.

In [None]:
load_list_data(ages, 'Data Science Datasets/insurance.csv', 'age')
load_list_data(sexes, 'Data Science Datasets/insurance.csv', 'sex')
load_list_data(bmis, 'Data Science Datasets/insurance.csv', 'bmi')
load_list_data(num_children, 'Data Science Datasets/insurance.csv', 'children')
load_list_data(smoker_statuses, 'Data Science Datasets/insurance.csv', 'smoker')
load_list_data(regions, 'Data Science Datasets/insurance.csv', 'region')
load_list_data(insurance_charges, 'Data Science Datasets/insurance.csv', 'charges')
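Two things are worth checking once the lists are filled: every list should have the same length (one entry per row), and `csv.DictReader` returns every value as a string, so numeric columns need converting before arithmetic. A minimal sketch with made-up sample values; `columns_have_equal_length` is a hypothetical helper, and the `sample_*` names are used so the real lists are not overwritten.

```python
def columns_have_equal_length(*columns):
    """Return True when every list passed in has the same number of entries."""
    return len({len(column) for column in columns}) <= 1


# Made-up sample values; the real lists filled by load_list_data() also hold strings.
sample_ages = ["30", "45"]
sample_bmis = ["25.0", "31.5"]
sample_charges = ["5000.00", "20000.00"]

if not columns_have_equal_length(sample_ages, sample_bmis, sample_charges):
    raise ValueError("The column lists have different lengths.")

# List comprehensions convert the string values into numbers for arithmetic.
sample_ages_int = [int(age) for age in sample_ages]
sample_bmis_float = [float(bmi) for bmi in sample_bmis]
sample_charges_float = [float(charge) for charge in sample_charges]
print(sample_ages_int, sample_bmis_float, sample_charges_float)
```

On the real data the check would be `columns_have_equal_length(ages, sexes, bmis, num_children, smoker_statuses, regions, insurance_charges)`. Since load_list_data() appends to lists that may already be filled, re-running only some of the loading calls leaves the lists with different lengths, which this check would flag.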

**Summary:** Now that the data has been loaded into separate lists, the analysis can be performed to gather the needed information about the medical insurance data set.

## 4. Perform Exploratory Data Analysis ##

**Introduction:** Now that the lists are filled, we can perform various analyses of the medical insurance data set. Here are some examples:
a. Finding the average age of the patients.
b. Counting the male vs. female patients in the cleaned data set.
c. Finding the unique geographical regions of the patients.
d. Finding the average yearly medical insurance charges.
e. Creating a dictionary that contains all of the medical records.

1. To perform the analyses above, let us create a **class** called **PatientInfo** with the five methods below:
a. analyze_ages()
b. analyze_sexes()
c. unique_regions()
d. average_charges()
e. medical_dictionary()

In [None]:
#Create an overall class called PatientInfo, which contains five methods that analyze the loaded data set.
class PatientInfo:
    #The __init__ method takes each list as a parameter and stores it on the instance as a self.patients_* attribute.
    def __init__(self,
                 patients_ages,
                 patients_sexes,
                 patients_bmis,
                 patients_num_children,
                 patients_smoker_statuses,
                 patients_regions,
                 patients_charges
                 ):
        self.patients_ages = patients_ages
        self.patients_sexes = patients_sexes
        self.patients_bmis = patients_bmis
        self.patients_num_children = patients_num_children
        self.patients_smoker_statuses = patients_smoker_statuses
        self.patients_regions = patients_regions
        self.patients_charges = patients_charges

    #Method that calculates and prints the average age of the patients in the insurance.csv data set.
    def analyze_ages(self):
        #Create the variable total_age and set it to zero for the initial value.
        total_age = 0
        #Iterate through the ages in the self.patients_ages list stored by the __init__ method.
        for age in self.patients_ages:
            #Ages are loaded as strings (Section 3, Step 6), so convert each to int before adding it to the total.
            total_age += int(age)
        #Print the average age rounded to the hundredths place.
        print(f"The average patient age is {total_age / len(self.patients_ages):.2f} years old.")

    #Method that counts and prints the number of male and female patients in the insurance.csv data set.
    def analyze_sexes(self):
        #Create the male and female counters and set them to 0.
        males = 0
        females = 0
        #Iterate through the sexes in the self.patients_sexes list stored by the __init__ method.
        for sex in self.patients_sexes:
            #If male, add one to the male count.
            if sex == "male":
                males += 1
            #If female, add one to the female count.
            elif sex == "female":
                females += 1
        #Print the total count for each sex.
        print(f"Count for male is {males}.")
        print(f"Count for female is {females}.")

    #Method that returns each unique region the patients came from, in first-seen order.
    def unique_regions(self):
        #Create an empty initialized list called, unique_regions.
        unique_regions = []
        #Iterate through the regions in the self.patients_regions list that was defined from the def __init__ method.
        for region in self.patients_regions:
            #If region is not already in the unique regions list, then add the region to the unique regions list.
            if region not in unique_regions:
                unique_regions.append(region)
        #Return unique region list.
        return unique_regions

    #Method that calculates and prints the average medical charges for the patients in the insurance.csv data set.
    def average_charges(self):
        #Create an initialized variable called total_charges, set to 0.
        total_charges = 0
        #Iterate through the charges in the self.patients_charges list stored by the __init__ method.
        for charge in self.patients_charges:
            #Charges are loaded as strings, so convert each to float before adding it to total_charges.
            total_charges += float(charge)
        #Print the average charge rounded to the hundredths place.
        print(f"The average yearly medical insurance charge is ${total_charges / len(self.patients_charges):,.2f}.")

    #Method that compiles the individual lists into one dictionary, converting the numeric columns from strings to numbers.
    def medical_dictionary(self):
        self.patients_dictionary = {}
        self.patients_dictionary["ages"] = [int(age) for age in self.patients_ages]
        self.patients_dictionary["sex"] = self.patients_sexes
        self.patients_dictionary["bmi"] = [float(bmi) for bmi in self.patients_bmis]
        self.patients_dictionary["children"] = [int(child) for child in self.patients_num_children]
        self.patients_dictionary["smoker"] = self.patients_smoker_statuses
        self.patients_dictionary["regions"] = self.patients_regions
        self.patients_dictionary["charges"] = [float(charge) for charge in self.patients_charges]
        #Return the compiled dictionary
        return self.patients_dictionary
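The loops in analyze_ages(), analyze_sexes(), and average_charges() can also be expressed with the standard library's `statistics.mean` and `collections.Counter`. The sketch below is an alternative for comparison, not part of the original class; the sample lists are made up, and on the real data `self.patients_ages`, `self.patients_sexes`, and `self.patients_charges` would take their place.

```python
from collections import Counter
from statistics import mean

# Made-up sample lists in the same string form that load_list_data() produces.
sample_ages = ["30", "45", "52"]
sample_sexes = ["female", "male", "female"]
sample_charges = ["5000.00", "20000.00", "8000.00"]

# statistics.mean accepts any iterable of numbers, including a generator expression.
average_age = mean(int(age) for age in sample_ages)
average_charge = mean(float(charge) for charge in sample_charges)

# Counter tallies each distinct value; looking up a value that never appeared returns 0.
sex_counts = Counter(sample_sexes)

print(f"The average patient age is {average_age:.2f} years old.")
print(f"Count for male is {sex_counts['male']}.")
print(f"Count for female is {sex_counts['female']}.")
print(f"The average yearly medical insurance charge is ${average_charge:,.2f}.")
```

A side benefit of Counter is that it reports every distinct value it sees, so an unexpected entry in the sex column (for example, a typo) would show up as its own key instead of being silently skipped by the if/elif checks.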

2. The next step is to create an instance of the **PatientInfo** class, called **patient_info**, whose methods can be used to see the results of the medical data analysis.

In [None]:
patient_info = PatientInfo(ages, sexes, bmis, num_children, smoker_statuses, regions, insurance_charges) #These arguments are the lists filled by load_list_data() in Section 3, Step 6.

In [None]:
#Print out the patient average age to the console below.
patient_info.analyze_ages()

In [None]:
#Print out the patient sex counts from the insurance.csv data set below.
patient_info.analyze_sexes()

In [None]:
#Display the unique patient regions below.
patient_info.unique_regions()
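unique_regions() checks `region not in unique_regions` for every row, and each membership test on a list scans the list. With only a handful of regions that is cheap, but a dictionary (or set) does the same job with average constant-time lookups. A sketch with made-up sample regions:

```python
# Made-up sample of region strings in file order.
sample_regions = ["southwest", "southeast", "southwest", "northwest", "southeast"]

# dict keys are unique and keep insertion order (guaranteed since Python 3.7),
# so this keeps each region in the order it first appears, like unique_regions().
unique_in_order = list(dict.fromkeys(sample_regions))

# A set also removes duplicates but has no defined order; sorted() gives a stable listing.
unique_sorted = sorted(set(sample_regions))

print(unique_in_order)  # ['southwest', 'southeast', 'northwest']
print(unique_sorted)    # ['northwest', 'southeast', 'southwest']
```

`list(dict.fromkeys(...))` matches the original method's output order, while the sorted set is handy when a consistent alphabetical listing is preferred.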

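In [None]:
#Print out the average yearly medical insurance charges below.
patient_info.average_charges()

In [None]:
#Display the patient medical dictionary below.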
patient_info.medical_dictionary()
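As one illustration of further analysis, the dictionary returned by medical_dictionary() can be grouped by one key and averaged over another. `average_by_group` is a hypothetical helper, and the sample dictionary below is made up; it does not reflect results from insurance.csv. Because it calls float() on each value, it works whether the charges are stored as strings or numbers.

```python
from collections import defaultdict


def average_by_group(records, group_key, value_key):
    """Average the values under value_key within each distinct value of group_key.

    records is a dict of equal-length lists, shaped like the dictionary
    medical_dictionary() returns (for example, keys "smoker" and "charges").
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, value in zip(records[group_key], records[value_key]):
        totals[group] += float(value)
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up sample dictionary, not data from insurance.csv.
sample_dictionary = {
    "smoker": ["yes", "no", "no", "yes"],
    "charges": ["20000.00", "5000.00", "7000.00", "30000.00"],
}
smoker_averages = average_by_group(sample_dictionary, "smoker", "charges")
print(smoker_averages)  # {'yes': 25000.0, 'no': 6000.0}
```

On the real data this would be called as `average_by_group(patient_info.medical_dictionary(), "smoker", "charges")`, or with `"regions"` as the grouping key, since that is the key name medical_dictionary() uses.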

3. All of the patient data is now organized in a way that allows it to be analyzed further later.

**Summary:** This data science portfolio project walks through importing the medical insurance data set and analyzing the cleaned data, to make further data analysis easier for future stakeholders who decide to use this notebook.