# U.S. Medical Insurance Costs

## Project Overview
This project aims to analyze medical insurance costs in the United States, focusing on identifying key demographic and lifestyle factors that affect insurance prices. By exploring this dataset, we seek to uncover insights that can help better understand the relationship between insurance costs and factors such as age, smoking habits, and geographical region.
## Objectives
The primary objectives of this analysis are to:
1. *Calculate the average age of patients* - Gain insight into the age distribution of individuals in the dataset.
2. *Identify the predominant regions* - Find where most individuals are located, giving a geographical perspective on the data.
3. *Compare insurance costs between smokers and non-smokers* - Explore how lifestyle choices impact medical insurance expenses.
4. *Analyze the average insurance cost of individuals with at least one child* - Assess how family responsibilities relate to insurance costs.
5. *Determine whether insurance costs are higher for female or male patients* - Compare the average cost for women and men.



In [1]:
# Importing the .csv file and appending its rows to a Python list:
import csv
list_of_insurance = []
# newline='' is what the csv module documentation recommends for files passed to a csv reader
with open('insurance.csv', newline='') as insurance_csv:
    insurance_reader = csv.DictReader(insurance_csv)
    for row in insurance_reader:
        list_of_insurance.append(row)
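
`csv.DictReader` returns every field as a string, which is why the later cells call `int()` and `float()` on each value. As a minimal sketch, the conversion could instead happen once at load time. The helper name `load_insurance` and the in-memory sample rows are made up for illustration; only the column names are taken from the cells in this notebook.

```python
import csv
import io


def load_insurance(file_obj):
    """Read insurance rows and convert the numeric columns used in this analysis."""
    rows = []
    for row in csv.DictReader(file_obj):
        row["age"] = int(row["age"])
        row["children"] = int(row["children"])
        row["charges"] = float(row["charges"])
        rows.append(row)
    return rows


# Made-up sample rows (not taken from insurance.csv), used only to show the conversion
sample_csv = io.StringIO(
    "age,sex,children,smoker,region,charges\n"
    "25,female,0,no,northeast,3000.50\n"
    "41,male,2,yes,southeast,21000.00\n"
)
sample_rows = load_insurance(sample_csv)
print(sample_rows[0])
```

With the real file, the same function could be called as `load_insurance(open('insurance.csv', newline=''))`, ideally inside a `with` block.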


## 1. Calculating the average age of patients:

In [2]:
# Creating a function to get the average age
def age_average_funct():
    # Building the list inside the function so repeated calls do not append duplicate ages
    list_of_ages = []
    for person in list_of_insurance:
        # Converting the age value to an integer and appending it to the list of ages
        list_of_ages.append(int(person['age']))

    # Calculating the average only if the list is not empty (avoids division by zero)
    if len(list_of_ages) > 0:
        return sum(list_of_ages) / len(list_of_ages)
    else:
        return 0

# Calling the function
average_age = age_average_funct()
print("Average Age:", average_age)


Average Age: 39.20702541106129
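
The same calculation can be written more compactly with the standard library. This is only a sketch: the function name `average_age` and the three sample rows are invented for illustration, and returning `None` for an empty dataset is a design choice rather than something the notebook does.

```python
from statistics import mean


def average_age(rows):
    """Return the mean age of the rows, or None if there are no rows."""
    ages = [int(row["age"]) for row in rows]
    return mean(ages) if ages else None


# Made-up rows for illustration only
sample_rows = [{"age": "20"}, {"age": "30"}, {"age": "40"}]
print("Average Age:", average_age(sample_rows))
```

Returning `None` instead of `0` keeps "no data" distinguishable from a real average of zero.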


## 2. Identifying the predominant regions:

In [3]:
# Creating a list of regions:
list_of_regions = []
for person in list_of_insurance:
    list_of_regions.append(person['region'])
# Counting each region and storing the counts in separate variables:
northeast = 0
northwest = 0
southeast = 0
southwest = 0
for region in list_of_regions:
    # A row belongs to exactly one region, so elif skips the remaining checks after a match
    if region == 'northeast':
        northeast += 1
    elif region == 'northwest':
        northwest += 1
    elif region == 'southeast':
        southeast += 1
    elif region == 'southwest':
        southwest += 1
print('Northeast: ' + str(northeast))
print('Northwest: ' + str(northwest))
print('Southeast: ' + str(southeast))
print('Southwest: ' + str(southwest))


Northeast: 324
Northwest: 325
Southeast: 364
Southwest: 325
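
Counting with one variable per region works for four known values, but `collections.Counter` handles any number of categories without listing them in advance. A minimal sketch, with a hypothetical helper name `count_regions` and made-up sample rows:

```python
from collections import Counter


def count_regions(rows):
    """Count how many rows fall in each region."""
    return Counter(row["region"] for row in rows)


# Made-up rows for illustration only
sample_rows = [
    {"region": "southeast"},
    {"region": "northwest"},
    {"region": "southeast"},
]
region_counts = count_regions(sample_rows)

# most_common() yields (region, count) pairs sorted from most to least frequent
for region, count in region_counts.most_common():
    print(f"{region.capitalize()}: {count}")
```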


## 3. Comparing insurance costs between smokers and non-smokers:

In [4]:
# Creating two lists to store the charges of smokers and non-smokers:
list_of_smokers = []
list_of_non_smokers = []
# Appending each patient's charges to the matching list
def smokers():
    for person in list_of_insurance:
        if person['smoker'] == 'yes':
            list_of_smokers.append(float(person['charges']))
        elif person['smoker'] == 'no':
            list_of_non_smokers.append(float(person['charges']))
smokers()
# Creating a function to calculate and print the average insurance cost of each group
def smoker_avg():
    smokers_avg = sum(list_of_smokers) / len(list_of_smokers)
    non_smokers_avg = sum(list_of_non_smokers) / len(list_of_non_smokers)
    print('The average insurance cost for smokers is $' + str(smokers_avg) + ', while the average insurance cost for non-smokers is $' + str(non_smokers_avg))
    print('The difference in average insurance cost between smokers and non-smokers is $' + str(smokers_avg - non_smokers_avg))
smoker_avg()

The average insurance cost for smokers is $32050.23183153285, while the average insurance cost for non-smokers is $8434.268297856202
The difference in average insurance cost between smokers and non-smokers is $23615.963533676644
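
An absolute difference is easier to interpret next to a relative one. The short calculation below uses only the two averages printed above and expresses the gap as a ratio and as a percentage; the variable names `ratio` and `percent_more` are illustrative.

```python
# Averages copied from the output above
smokers_avg = 32050.23183153285
non_smokers_avg = 8434.268297856202

difference = smokers_avg - non_smokers_avg
ratio = smokers_avg / non_smokers_avg
percent_more = ratio - 1

print(f"Smokers' average cost is about {ratio:.1f} times the non-smoker average.")
print(f"That is roughly {percent_more:.0%} more.")
```

On these figures the smoker average is about 3.8 times the non-smoker average.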


## 4. Analyzing the average insurance cost of individuals with at least one child:

In [5]:
# Creating two separate lists for people with at least one child and people with no children:
no_kids_list = []
kids_list = []
# Appending data to our lists:
def kids_categorizer():
    for person in list_of_insurance:
        if person['children'] == '0':
            no_kids_list.append(float(person['charges']))
        else:
            kids_list.append(float(person['charges']))
kids_categorizer()
# Creating a function that prints the average cost for each list:
def cost_avg_if_kids():
    avg_for_non_parents = sum(no_kids_list) / len(no_kids_list)
    avg_for_parents = sum(kids_list) / len(kids_list)
    difference = avg_for_parents - avg_for_non_parents
    print('The average insurance cost for people with at least one child is $' + str(avg_for_parents) + '.')
    print('The average insurance cost for people with no children is $' + str(avg_for_non_parents) + '.')
    print('On average, people with at least one child in our data pay $' + str(difference) + ' more than people without children.')
cost_avg_if_kids()

The average insurance cost for people with at least one child is $13949.941093481675.
The average insurance cost for people with no children is $12365.975601635888.
On average, people with at least one child in our data pay $1583.9654918457873 more than people without children.
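
The smoker and children comparisons follow the same pattern: split the rows into groups, then average the charges in each group. As a sketch of how that pattern could be factored out, the hypothetical helper `average_charges_by` below takes any grouping function. The helper, the `has_children` function, and the sample rows are assumptions made for illustration, not part of the original notebook.

```python
from collections import defaultdict


def average_charges_by(rows, key_func):
    """Group rows with key_func and return the mean charge per group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = key_func(row)
        totals[key] += float(row["charges"])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


def has_children(row):
    """Return True when the row reports at least one child."""
    return int(row["children"]) > 0


# Made-up rows for illustration only
sample_rows = [
    {"children": "0", "charges": "1000.0"},
    {"children": "2", "charges": "3000.0"},
    {"children": "1", "charges": "2000.0"},
]
averages = average_charges_by(sample_rows, has_children)
print(averages)
```

Passing `lambda row: row["smoker"]` or `lambda row: row["sex"]` as `key_func` would cover the other comparisons with the same function.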


## 5. Determining whether insurance costs are higher for female or male patients:

In [6]:
# Lists to store the charges of female and male patients
women = []
men = []
# Function that appends data to our lists:
def sex_function():
    for person in list_of_insurance:
        if person['sex'] == 'female':
            women.append(float(person['charges']))
        elif person['sex'] == 'male':
            men.append(float(person['charges']))
sex_function()
# Function that prints the average insurance cost for female patients and male patients
def avg_for_sex():
    avg_charges_women = sum(women) / len(women)
    avg_charges_men = sum(men) / len(men)
    difference_wm = avg_charges_men - avg_charges_women
    print('The average insurance cost for female patients is $' + str(avg_charges_women) + '.')
    print('The average insurance cost for male patients is $' + str(avg_charges_men) + '.')
    print('On average, male patients in our data pay $' + str(difference_wm) + ' more than female patients.')
avg_for_sex()


The average insurance cost for female patients is $12569.578843835347.
The average insurance cost for male patients is $13956.751177721893.
On average, male patients in our data pay $1387.1723338865468 more than female patients.
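
The printed values carry many more decimal places than a dollar amount needs. Python's format specification can round to cents and add thousands separators. The helper name `format_usd` is hypothetical; the amounts are copied from the output above.

```python
def format_usd(amount):
    """Format a number as dollars with thousands separators and two decimals."""
    return f"${amount:,.2f}"


# Values copied from the output above
avg_charges_women = 12569.578843835347
avg_charges_men = 13956.751177721893
difference_wm = 1387.1723338865468

print("The average insurance cost for female patients is " + format_usd(avg_charges_women) + ".")
print("The average insurance cost for male patients is " + format_usd(avg_charges_men) + ".")
print("On average, male patients in our data pay " + format_usd(difference_wm) + " more than female patients.")
```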


# Key Findings
After completing an in-depth analysis of the patient data, several key insights have emerged. These findings provide a clearer understanding of the demographic trends, cost differentials based on specific characteristics, and potential implications for cost management.

### 1.- Average Patient Age
The average age of patients in the dataset is approximately 39 years (39.2). This suggests that the typical patient profile leans toward a middle-aged demographic, which may influence healthcare needs and associated costs due to age-related health risks.
### 2.- Regional Concentration of Patients
The Southeast region is the most represented in the dataset, with 364 of the 1,338 records (about 27%), while each of the other three regions has 324 or 325 records. The lead is modest, so the dataset is close to balanced across regions. Whether it reflects a higher demand for health services or different health risk factors in the Southeast cannot be determined from the counts alone, and it would require further exploration of environmental or lifestyle factors specific to that region.
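
To put the regional counts in proportion, they can be converted to shares of the total. The counts below are copied from the output of section 2; the variable names are illustrative.

```python
# Counts copied from the output of section 2
region_counts = {
    "northeast": 324,
    "northwest": 325,
    "southeast": 364,
    "southwest": 325,
}
total = sum(region_counts.values())

# Share of all records that belong to each region
region_shares = {region: count / total for region, count in region_counts.items()}
for region, share in region_shares.items():
    print(f"{region.capitalize()}: {share:.1%}")
```

The Southeast accounts for about 27.2% of the records, against roughly 24% for each of the other regions.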
### 3.- Impact of Smoking on Insurance Costs
A significant cost disparity is observed between smokers and non-smokers. The average cost difference attributed to smoking status is approximately $23,615.96. This substantial difference underscores the economic impact of smoking on healthcare costs, suggesting that targeted health interventions could not only improve health outcomes but potentially reduce costs for both insurers and patients.
### 4.- Effect of Having Children on Insurance Costs
Patients with children pay $1,583.97 more on average in insurance costs than those without children. While this difference is less pronounced than the one for smoking, it may reflect additional healthcare requirements or a correlation with age-related health conditions common among individuals with dependents.
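
One way to probe the age explanation is to compare the average age of the two groups. The sketch below is not part of the original analysis: the function name `average_age_by_parenthood` and the sample rows are made up, and no result for the real dataset is claimed here.

```python
from statistics import mean


def average_age_by_parenthood(rows):
    """Return the mean age of rows with and without children."""
    parents = [int(row["age"]) for row in rows if int(row["children"]) > 0]
    non_parents = [int(row["age"]) for row in rows if int(row["children"]) == 0]
    return {
        "parents": mean(parents) if parents else None,
        "non_parents": mean(non_parents) if non_parents else None,
    }


# Made-up rows for illustration only
sample_rows = [
    {"age": "50", "children": "2"},
    {"age": "40", "children": "1"},
    {"age": "30", "children": "0"},
]
age_by_group = average_age_by_parenthood(sample_rows)
print(age_by_group)
```

If the two averages were clearly different on the real data, part of the cost gap could be an age effect rather than an effect of having children.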
### 5.- Gender-Based Cost Differences
On average, male patients pay $1,387.17 more in insurance costs than female patients. This gender-based cost differential could reflect differing risk profiles, healthcare usage patterns, or other factors tied to gender, warranting further investigation to identify the underlying causes.
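
Since smoking shows by far the largest cost gap, a natural first check is whether the share of smokers differs between the two groups. This is a sketch of such a follow-up, not something the notebook computed: the helper `smoker_share_by_sex` and the sample rows are hypothetical.

```python
from collections import Counter


def smoker_share_by_sex(rows):
    """Return the fraction of smokers within each sex group."""
    totals = Counter(row["sex"] for row in rows)
    smokers = Counter(row["sex"] for row in rows if row["smoker"] == "yes")
    # Counter returns 0 for groups that contain no smokers
    return {sex: smokers[sex] / totals[sex] for sex in totals}


# Made-up rows for illustration only
sample_rows = [
    {"sex": "male", "smoker": "yes"},
    {"sex": "male", "smoker": "no"},
    {"sex": "female", "smoker": "no"},
    {"sex": "female", "smoker": "no"},
]
shares = smoker_share_by_sex(sample_rows)
print(shares)
```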


# Conclusion
This analysis provides insights into factors associated with insurance costs among different patient demographics. Smoking status, having children, and sex are each associated with differences in average costs, with smoking status linked to by far the largest increase in expenses. The data also show a slightly higher number of patients in the Southeast, which may merit further investigation before informing targeted health programs or regional policy adjustments.

These insights suggest opportunities for strategic interventions that can help reduce insurance costs and improve patient outcomes. For instance, promoting smoking cessation programs or offering family-oriented insurance plans could mitigate some of the costs associated with these high-impact factors. Additionally, understanding and addressing regional trends can enable insurers to better serve concentrated demographics, potentially reducing costs over time.

By leveraging these findings, stakeholders can design more tailored, cost-effective insurance solutions that respond to the unique needs of different patient groups, ultimately fostering a more efficient and responsive healthcare system. Further analysis could deepen our understanding of these trends and explore additional factors that may influence insurance costs.









This project is part of my GitHub portfolio, demonstrating proficiency in Python, data analysis, and visual storytelling.