# U.S. Medical Insurance Costs
In this project, a **CSV** file with medical insurance costs will be investigated using Python fundamentals. The goal with this project will be to analyze various attributes within **insurance.csv** to learn more about the patient information in the file and gain insight into potential use cases for the dataset.

In [47]:
# import csv library
import csv

The **CSV** library is imported to handle reading and processing the **insurance.csv** file. This library provides functionality to work with **CSV** files, allowing for easy manipulation of row-based data stored in this common format.

In [50]:
# Open the CSV file
with open('insurance.csv', mode='r') as insurance_datafile:
    
    # Create a CSV reader
    reader = csv.DictReader(insurance_datafile)

    # Read and store the data
    data = [row for row in reader]

# Print the first few rows for inspection
print(data[:5])


[{'age': '19', 'sex': 'female', 'bmi': '27.9', 'children': '0', 'smoker': 'yes', 'region': 'southwest', 'charges': '16884.924'}, {'age': '18', 'sex': 'male', 'bmi': '33.77', 'children': '1', 'smoker': 'no', 'region': 'southeast', 'charges': '1725.5523'}, {'age': '28', 'sex': 'male', 'bmi': '33', 'children': '3', 'smoker': 'no', 'region': 'southeast', 'charges': '4449.462'}, {'age': '33', 'sex': 'male', 'bmi': '22.705', 'children': '0', 'smoker': 'no', 'region': 'northwest', 'charges': '21984.47061'}, {'age': '32', 'sex': 'male', 'bmi': '28.88', 'children': '0', 'smoker': 'no', 'region': 'northwest', 'charges': '3866.8552'}]


This code opens the `insurance.csv` file in read mode. Using The **csv.DictReader**, the file's contents are read into a list of dictionaries, where each row becomes a dictionary. The keys of these dictionaries correspond to the column headers of the `CSV` file, simplifying data access.

The **insurance.csv** contains the following columns:

* Patient Age
* Patient Sex
* Patient BMI
* Patient Number of Children
* Patient Smoking Status
* Patient U.S Geopraphical Region
* Patient Yearly Medical Insurance Cost

There are no signs of missing data.

In [31]:
def read_csv_to_dicts(file_path):
    """
    Reads a CSV file and converts each row into a dictionary.

    :file_path: Path to the CSV file.
    :return: A list of dictionaries, each representing a row in the CSV.

    This function opens a specified CSV file, reads the first line to 
    extract column headers, and then iterates over each line in the file,
    converting them into dictionaries where the keys are the column headers
    and the values are the corresponding data entries.
    """

    # Initialize an empty list to store our dictionaries
    data = []

    # Open the CSV file in read mode
    with open(file_path, 'r') as file:
        # Read the first line to get headers (column names)
        headers = file.readline().strip().split(',')
        
        # Iterate over the remaining lines in the file
        for line in file:
            # Split each line into a list of values
            values = line.strip().split(',')
            
            # Create a dictionary by zipping headers with values
            # and append it to the data list
            row = dict(zip(headers, values))
            data.append(row)

    # Return the list of dictionaries
    return data

# Example usage of the function
file_path = r'/Users/bunnie/Desktop/Python Class/insurance.csv' 
insurancedata_dicts = read_csv_to_dicts(file_path)
print(insurancedata_dicts[:5])  # Print the first 5 entries for verification


[{'age': '19', 'sex': 'female', 'bmi': '27.9', 'children': '0', 'smoker': 'yes', 'region': 'southwest', 'charges': '16884.924'}, {'age': '18', 'sex': 'male', 'bmi': '33.77', 'children': '1', 'smoker': 'no', 'region': 'southeast', 'charges': '1725.5523'}, {'age': '28', 'sex': 'male', 'bmi': '33', 'children': '3', 'smoker': 'no', 'region': 'southeast', 'charges': '4449.462'}, {'age': '33', 'sex': 'male', 'bmi': '22.705', 'children': '0', 'smoker': 'no', 'region': 'northwest', 'charges': '21984.47061'}, {'age': '32', 'sex': 'male', 'bmi': '28.88', 'children': '0', 'smoker': 'no', 'region': 'northwest', 'charges': '3866.8552'}]


This code also does the same job as the first one, but this function provides a more reusable, adaptable and potentially robust solution, especialy for larger projects or when working with multiple `CSV` files.

In this step, we focus on importing the data from the **insurance.csv** file into Python in a structured and accessible format. To achieve this, we define a function **read_csv_to_dicts** which takes the file path of a `CSV` file as input and returns a list of dictionaries. Each dictionary represents a row in the `CSV` file, with keys corresponding to column headers.

This approach is beneficial as it provides a flexible and reusable way to read `CSV` files. The function can be used for any `CSV` file, making it a versatile tool in our data processing toolkit. Additionally, having data in the form of dictionaries makes it easier to access and manipulate specific data fields in subsequent analysis steps.

In [51]:
# Initialize lists to store data
ages = []
sexes = []
bmis = []
num_children = []
smokers = []
regions = []
charges = []


This step is to initialize `lists` to store the different categories of data. This helps in organizing the data for easy access during analysis. These `lists` will be filled with data extracted from the CSV file

In [52]:
def average_age(ages):
    """Calculate and return the average age of the patients."""
    return sum(ages) / len(ages)

def count_sexes(sexes):
    """Return the count of males and females in the dataset."""
    males = sexes.count('male')
    females = sexes.count('female')
    return males, females

def unique_regions(regions):
    """Find and return the unique geographical locations of the patients."""
    return set(regions)

def average_charges(charges):
    """Calculate and return the average yearly medical charges of the patients."""
    return sum(charges) / len(charges)

def create_patients_dict(ages, sexes, bmis, num_children, smokers, regions, charges):
    """Create and return a dictionary containing all patient information."""
    patients_dict = {}
    for i in range(len(ages)):
        patients_dict[i] = {
            "Age": ages[i],
            "Sex": sexes[i],
            "BMI": bmis[i],
            "Children": num_children[i],
            "Smoker": smokers[i],
            "Region": regions[i],
            "Charges": charges[i]
        }
    return patients_dict


These `functions` are created to do some summary statistics on the data
* **average_age:** This function calculates the average age of patients. It sums up all the ages and divides by the number of patients.
* **count_sexes:** It counts and returns the number of male and female patients in the dataset.
* **unique_regions:** This function identifies all the unique geographical regions represented in the dataset.
* **average_charges:** It computes the average yearly medical charges for all patients.
* **create_patients_dict:** This function creates a comprehensive dictionary where each patient's information is stored under a unique key.


In [64]:
# Function to load data from CSV file
def load_data(filename):
    with open(filename) as file:
        csv_reader = csv.DictReader(file)
        for row in csv_reader:
            ages.append(int(row['age']))
            sexes.append(row['sex'])
            bmis.append(float(row['bmi']))
            num_children.append(int(row['children']))
            smokers.append(row['smoker'])
            regions.append(row['region'])
            charges.append(float(row['charges']))

# Load the data
load_data('insurance.csv')

This code loads data from `insurance.csv` using Python's csv module. By reading the CSV file as a dictionary, each row is easily accessible by column names, facilitating the extraction and conversion of data into appropriate types (e.g., integer for age, float for BMI). This method is advantageous for its simplicity and flexibility, allowing for easy adaptation to other `CSV` files and straightforward data manipulation for subsequent analysis.

In [79]:
# Execute analysis functions
average_age_result = average_age(ages)
sex_count = count_sexes(sexes)
unique_regions_result = unique_regions(regions)
average_charges_result = average_charges(charges)
patients_info = create_patients_dict(ages, sexes, bmis, num_children, smokers, regions, charges)
print(patients_info)

{0: {'Age': 19, 'Sex': 'female', 'BMI': 27.9, 'Children': 0, 'Smoker': 'yes', 'Region': 'southwest', 'Charges': 16884.924}, 1: {'Age': 18, 'Sex': 'male', 'BMI': 33.77, 'Children': 1, 'Smoker': 'no', 'Region': 'southeast', 'Charges': 1725.5523}, 2: {'Age': 28, 'Sex': 'male', 'BMI': 33.0, 'Children': 3, 'Smoker': 'no', 'Region': 'southeast', 'Charges': 4449.462}, 3: {'Age': 33, 'Sex': 'male', 'BMI': 22.705, 'Children': 0, 'Smoker': 'no', 'Region': 'northwest', 'Charges': 21984.47061}, 4: {'Age': 32, 'Sex': 'male', 'BMI': 28.88, 'Children': 0, 'Smoker': 'no', 'Region': 'northwest', 'Charges': 3866.8552}, 5: {'Age': 31, 'Sex': 'female', 'BMI': 25.74, 'Children': 0, 'Smoker': 'no', 'Region': 'southeast', 'Charges': 3756.6216}, 6: {'Age': 46, 'Sex': 'female', 'BMI': 33.44, 'Children': 1, 'Smoker': 'no', 'Region': 'southeast', 'Charges': 8240.5896}, 7: {'Age': 37, 'Sex': 'female', 'BMI': 27.74, 'Children': 3, 'Smoker': 'no', 'Region': 'northwest', 'Charges': 7281.5056}, 8: {'Age': 37, 'Sex': 

This segment of the code executes various analytical functions on the data loaded previously. `Functions` like `average_age`, `count_sexes`, and `average_charges` perform specific calculations to extract meaningful insights from the data (e.g., average age of patients, gender distribution, average medical charges). The **create_patients_dict** function compiles all patient data into a comprehensive dictionary, enhancing data accessibility and facilitating detailed analysis. These functions exemplify modular and focused analysis, allowing for clear and maintainable code.

In [82]:
def average_age(ages):
    return sum(ages) / len(ages) if ages else 0

patients_average_age = average_age(ages)
print("Patients Average Age:", patients_average_age)

Patients Average Age: 39.20702541106129


This `function` calculates the `average age` of patients in the dataset by taking a list of ages as input, summing them up, and dividing by the total number of patients. It returns the `average age` as a float. Its primary use is to provide insight into the demographic age distribution of the dataset.

In [83]:
def count_sexes(sexes):
    sex_count = {"male": 0, "female": 0}
    for sex in sexes:
        if sex.lower() in sex_count:
            sex_count[sex.lower()] += 1
    return sex_count

patients_sex_count = count_sexes(sexes)
print("Patients Sex Count:", patients_sex_count)

Patients Sex Count: {'male': 6084, 'female': 5958}


The `count_sexes` function takes a list of gender data and iterates through it to count the number of males and females. The result is returned as a dictionary with the counts for each gender. This function is essential for analyzing the gender composition of the patient dataset.

In [73]:
def BMI_category_analysis(bmis):
    categories = {"Underweight": 0, "Normal": 0, "Overweight": 0, "Obese": 0}
    for bmi in bmis:
        if bmi < 18.5:
            categories["Underweight"] += 1
        elif 18.5 <= bmi < 25:
            categories["Normal"] += 1
        elif 25 <= bmi < 30:
            categories["Overweight"] += 1
        else:
            categories["Obese"] += 1
    return categories
# to print out the BMI Category Analysis
patients_BMIcategories = BMI_category_analysis(bmis)
print("Patients BMI Category Analysis:", patients_BMIcategories)

Patients BMI Category Analysis: {'Underweight': 180, 'Normal': 2025, 'Overweight': 3474, 'Obese': 6363}


Categorizes patients into `BMI` groups and counts the number in each category.Utilizes the `BMI` to classify patients into categories such as `Underweight`,`Normal`, `Overweight`, and `Obese`. This function is beneficial for understanding the health demographics of the patient population in terms of weight-related issues.

In [72]:
def smoking_status_analysis(smokers):
    smoker_count = smokers.count("yes")
    non_smoker_count = smokers.count("no")
    return {"Smokers": smoker_count, "Non-Smokers": non_smoker_count}

patients_smokinganalysis = smoking_status_analysis(smokers)
print("Patients Smoking Status:", patients_smokinganalysis)

Patients Smoking Status: {'Smokers': 2466, 'Non-Smokers': 9576}


Compares the number of `smokers` and `non-smokers` in the dataset.This analysis is crucial for understanding smoking prevalence and its potential impact on health in the patient population.

In [74]:
def children_count_analysis(num_children):
    children_distribution = {}
    for children in num_children:
        if children not in children_distribution:
            children_distribution[children] = 1
        else:
            children_distribution[children] += 1
    return children_distribution

patients_childrencounts = children_count_analysis(num_children)
print("Patients' Children Count:", patients_childrencounts)

Patients' Children Count: {0: 5166, 1: 2916, 3: 1413, 2: 2160, 5: 162, 4: 225}


Analyzes the distribution of the number of children among patients. This provides insights into family sizes, which can be a factor in healthcare needs and insurance planning.

In [75]:
def region_charges_analysis(regions, charges):
    region_charges = {}
    for region, charge in zip(regions, charges):
        if region not in region_charges:
            region_charges[region] = [charge]
        else:
            region_charges[region].append(charge)
    average_charges = {region: sum(charges) / len(charges) for region, charges in region_charges.items()}
    return average_charges

patients_region_analysis = region_charges_analysis(region, charges)
print("Patient's Region Analysis",patients_region_analysis)

Patient's Region Analysis {'n': 16884.924, 'o': 1725.5523, 'r': 4449.462, 't': 14195.440655, 'h': 3866.8552, 'e': 3756.6216, 'a': 8240.5896, 's': 7281.5056}


Compares average medical charges across different regions. This analysis helps identify regional cost variations, which can be crucial for healthcare policy and insurance rate setting.

In [76]:
def smoking_impact_on_charges(smokers, charges):
    smoker_charges = [charge for smoker, charge in zip(smokers, charges) if smoker == "yes"]
    non_smoker_charges = [charge for smoker, charge in zip(smokers, charges) if smoker == "no"]
    average_smoker_charge = sum(smoker_charges) / len(smoker_charges)
    average_non_smoker_charge = sum(non_smoker_charges) / len(non_smoker_charges)
    return {"Average Smoker Charge": average_smoker_charge, "Average Non-Smoker Charge": average_non_smoker_charge}

patients_smoking_on_charges =smoking_impact_on_charges(smokers, charges)
print("Patients Smoking Impact On Charges:", patients_smoking_on_charges)

Patients Smoking Impact On Charges: {'Average Smoker Charge': 32050.231831532852, 'Average Non-Smoker Charge': 8434.268297856235}


Analyzes the difference in medical charges between `smokers `and `non-smokers`.This sheds light on the financial impact of smoking on healthcare costs,important for public health campaigns and insurance risk assessments.

This patient data has been meticulously structured into a comprehensive dictionary, facilitating an efficient framework for any further investigative analyses of the attributes contained within the `insurance.csv` dataset. This organized format is instrumental for advanced data analysis and examination.