# U.S. Medical Insurance Costs

In this project, a **CSV** file with medical insurance costs will be investigated using Python fundamentals plus some additional techniques I found online. The goal of this project is to analyze various attributes within **insurance.csv** to learn more about the patient information in the file and gain insight into potential use cases for the dataset.

In [1]:
import pandas as pd

To start, all necessary libraries must be imported. For this project the library needed is the `pandas` library in order to work with the **insurance.csv** data. There are other potential libraries that could help with this project; however, for this analysis, using just the `pandas` library will suffice.

In [2]:
df = pd.read_csv("insurance.csv")

The next step is to look through **insurance.csv** in order to get acquainted with the data. Using `pandas` makes this fairly easy. The following aspects of the data file will be checked in order to plan out how to import the data into a Python file:
* The names of columns and rows
* Any noticeable missing data
* Types of values (numerical vs. categorical)
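
The three checks listed above could be sketched roughly as follows. The small `sample` DataFrame is a made-up stand-in so the snippet runs without the CSV file; its values are illustrative and are not taken from **insurance.csv**.

```python
import pandas as pd

# Hypothetical stand-in for insurance.csv (values are invented for illustration)
sample = pd.DataFrame({
    "age": [21, 35, 48],
    "sex": ["female", "male", "male"],
    "bmi": [24.5, 31.2, 28.9],
    "children": [0, 2, 1],
    "smoker": ["yes", "no", "no"],
    "region": ["southwest", "southeast", "northeast"],
    "charges": [15000.0, 4200.5, 9100.25],
})

# Names of the columns
column_names = list(sample.columns)

# Number of missing values in each column
missing_per_column = sample.isna().sum()

# Split columns into numerical and categorical by dtype
numeric_columns = list(sample.select_dtypes(include="number").columns)
categorical_columns = list(sample.select_dtypes(exclude="number").columns)

print(column_names)
print(missing_per_column)
print("Numerical:", numeric_columns)
print("Categorical:", categorical_columns)
```

Running the same calls on `df` instead of `sample` would apply the checks to the real file.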

In [3]:
df.columns

Index(['age', 'sex', 'bmi', 'children', 'smoker', 'region', 'charges'], dtype='object')

**insurance.csv** contains the following columns:
* Patient Age
* Patient Sex 
* Patient BMI
* Patient Number of Children
* Patient Smoking Status
* Patient U.S. Geographical Region
* Patient Yearly Medical Insurance Cost

There are no signs of missing data. To store this information, seven variables will be created to hold each individual column of data from **insurance.csv**.


In [4]:
ages = df.age
sexes = df.sex
bmis = df.bmi
num_of_children = df.children
smokers = df.smoker
regions = df.region
charges = df.charges

Now that all the data from **insurance.csv** is neatly organized into labeled variables, the analysis can be started. The following operations will be implemented:
* find average age of the patients
* find average BMI of the patients
* return the number of males vs. females counted in the dataset
* find geographical location where the most patients live
* return the average yearly medical charges of the patients
* return the average yearly medical charges of patients that are smokers vs non smokers
* return the average age of smokers vs non smokers
* return the average age of patients with at least one child
* return the average insurance cost based on region
* return the average insurance cost of males and females by region
* return the average BMI of males and females based on region

# Average Age All Patients

In [160]:
print(f"The average age of patients is about {round(ages.mean(),2)} years old.")

The average age of patients is about 39.21 years old.


The average age of the patients in **insurance.csv** is about 39 years old. This is important to check in order to ensure the data in **insurance.csv** is representative of a broader population. If it is decided to use the dataset to make inferences about other populations, the data must be abundant and broad enough for such use cases.

A further analysis would have to be done to make sure the [range](https://www.mathsisfun.com/data/range.html#:~:text=The%20Range%20is%20the%20difference,is%209%20%E2%88%92%203%20%3D%206.) and [standard deviation](https://www.mathsisfun.com/data/standard-deviation.html) of the patient ages in **insurance.csv** are indicative of a random sampling of individuals.
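
As a rough sketch of that further analysis, the range and standard deviation can be computed directly on a `pandas` Series. The `ages_sample` values below are invented for illustration; on the real data the same calls would be made on `ages`.

```python
import pandas as pd

# Hypothetical ages, not taken from insurance.csv
ages_sample = pd.Series([18, 25, 39, 52, 64])

# Range: difference between the oldest and youngest patient
age_range = ages_sample.max() - ages_sample.min()

# Standard deviation (pandas uses the sample formula, ddof=1, by default)
age_std = ages_sample.std()

print(f"Age range: {age_range} years")
print(f"Age standard deviation: {round(age_std, 2)} years")
```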

# Number of Males VS Females

In [161]:
females = df.loc[df['sex'] == 'female']
males = df.loc[df['sex'] == 'male']

In [162]:
def analyze_m_f_count(m, f):
    return f"There are {len(m)} males and {len(f)} females."

In [163]:
analyze_m_f_count(males, females)

'There are 676 males and 662 females.'

The next step of the analysis is to check the balance of males vs. females in **insurance.csv**. It is important to check that this dataset is representative of a broader population of individuals. If a person were to use this dataset to create a classification model, it would be imperative to make sure that the attributes are balanced.
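
One possible way to express this balance check is with `value_counts`, which can return either raw counts or proportions. The `sexes_sample` Series is a made-up example; on the real data the calls would be made on `sexes`.

```python
import pandas as pd

# Hypothetical values, not taken from insurance.csv
sexes_sample = pd.Series(["male", "female", "female", "male", "male"])

# Raw counts per category
counts = sexes_sample.value_counts()

# Share of each category (fractions that sum to 1)
shares = sexes_sample.value_counts(normalize=True)

print(counts)
print(shares.round(2))
```

Proportions close to 0.5 for each category would suggest the attribute is balanced.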

# Geographical Location Where Most Patients Live

In [164]:
regions.value_counts()

southeast    364
northwest    325
southwest    325
northeast    324
Name: region, dtype: int64

The next step of the analysis is to check the number of unique regions and which region has the most patients in **insurance.csv**.
There are four unique geographical regions in this dataset. The southeast has the most patients (364), while the other three regions are nearly equal (324 to 325 each). It is important to note that all the patients come from the United States.

# Average Yearly Medical Charges

In [165]:
def analyze_total_costs(total):
    print(f"Average Insurance: {total}")

In [166]:
def calculate_avg_charges(data):
    total = 0
    for charge in data:
        total += charge
    return "$"+'{:,}'.format(round(total / len(data),2))

In [167]:
analyze_total_costs(calculate_avg_charges(charges))

Average Insurance: $13,270.42


Here I analyze the average insurance cost for all patients.
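
A possible alternative to the manual loop, assuming the input is a `pandas` Series, is to use the built-in `mean` method and a format specifier that always keeps two decimal places. `format_avg_charges` is a hypothetical helper name, and the sample values are invented.

```python
import pandas as pd

def format_avg_charges(data):
    """Return the mean of a Series as a dollar string with two decimals."""
    return f"${data.mean():,.2f}"

# Hypothetical charges, not taken from insurance.csv
charges_sample = pd.Series([1000.0, 2500.5, 20761.1])

print(f"Average Insurance: {format_avg_charges(charges_sample)}")
```

The `,.2f` specifier adds thousands separators and pads with trailing zeros, so a value such as 8087.2 is shown as `$8,087.20`.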

# Average Yearly Medical Charges Smokers VS Non Smokers

In [168]:
smokers = df.loc[df['smoker'] == 'yes']
non_smokers = df.loc[df['smoker'] == 'no']

In [169]:
smokers_avg_cost = calculate_avg_charges(smokers['charges'])
non_smokers_avg_cost = calculate_avg_charges(non_smokers['charges'])

In [170]:
def analyze_cost_by_smoker_and_non(s, ns):
    print(f"Smokers Average Cost:\t  {s}")
    print(f"Non Smokers Average Cost: {ns}")

In [171]:
analyze_cost_by_smoker_and_non(smokers_avg_cost, non_smokers_avg_cost)

Smokers Average Cost:	  $32,050.23
Non Smokers Average Cost: $8,434.27


Here I analyze the average insurance cost broken down by smokers and non smokers.
Smokers pay significantly more on average than non smokers ($32,050.23 vs. $8,434.27).
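
The same split can also be sketched with `groupby`, which computes the average for every smoking status in one step. The `sample` DataFrame below is a made-up stand-in; on the real data the call would be made on `df`.

```python
import pandas as pd

# Hypothetical rows, not taken from insurance.csv
sample = pd.DataFrame({
    "smoker": ["yes", "no", "yes", "no"],
    "charges": [30000.0, 8000.0, 34000.0, 9000.0],
})

# Average charges for each smoking status
avg_by_smoker = sample.groupby("smoker")["charges"].mean()

print(avg_by_smoker)
```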

# Male Average Cost

In [172]:
def analyze_cost_smoker(sm,ns):
    print(f"Smoker Average Cost:\t {sm}")
    print(f"Non Smoker Average Cost: {ns}")

In [173]:
male_smokers = df.loc[(df['smoker'] == 'yes') & (df['sex'] == 'male')]
male_non_smokers = df.loc[(df['smoker'] == 'no') & (df['sex'] == 'male')]

In [174]:
male_smoker_avg = calculate_avg_charges(male_smokers['charges'])
male_non_smoker_avg = calculate_avg_charges(male_non_smokers['charges'])

In [175]:
analyze_cost_smoker(male_smoker_avg, male_non_smoker_avg)

Smoker Average Cost:	 $33,042.01
Non Smoker Average Cost: $8,087.2


# Female Average Cost

In [176]:
female_smokers = df.loc[(df['smoker'] == 'yes') & (df['sex'] == 'female')]
female_non_smokers = df.loc[(df['smoker'] == 'no') & (df['sex'] == 'female')]

In [177]:
female_smoker_avg = calculate_avg_charges(female_smokers['charges'])
female_non_smoker_avg = calculate_avg_charges(female_non_smokers['charges'])

In [178]:
analyze_cost_smoker(female_smoker_avg, female_non_smoker_avg)

Smoker Average Cost:	 $30,679.0
Non Smoker Average Cost: $8,762.3


For some extra analysis, I calculate the average cost for male smokers and female smokers, and then the average cost for male non smokers and female non smokers. Male smokers pay more than female smokers, and male non smokers pay less than female non smokers.

# Average Male Age

In [179]:
def analyze_avg_age(sa, nsa):
    print(f"Smoker Average Age:\t {round(sa,2)}")
    print(f"Non Smoker Average Age:\t {round(nsa,2)}")

In [180]:
male_non_smoker_avg_age = male_non_smokers['age'].mean()
male_smoker_avg_age = male_smokers['age'].mean()

In [181]:
analyze_avg_age(male_smoker_avg_age, male_non_smoker_avg_age)

Smoker Average Age:	 38.45
Non Smoker Average Age:	 39.06


# Average Female Age

In [182]:
female_smoker_avg_age = female_smokers['age'].mean()
female_non_smoker_avg_age = female_non_smokers['age'].mean()

In [183]:
analyze_avg_age(female_smoker_avg_age, female_non_smoker_avg_age)

Smoker Average Age:	 38.61
Non Smoker Average Age:	 39.69


The next step of the analysis is to look at the ages of smokers and non smokers in **insurance.csv** to see whether there is any correlation between age and smoking. The data is also separated into male and female patients.

The average ages of smokers and non smokers are roughly the same.
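
Comparing averages only hints at a relationship. As a rough sketch, a correlation coefficient can be computed by encoding smoking status as 0 or 1 and using `Series.corr` (Pearson correlation, which for a binary variable is the point-biserial correlation). The `sample` values are invented for illustration.

```python
import pandas as pd

# Hypothetical rows, not taken from insurance.csv
sample = pd.DataFrame({
    "age": [20, 30, 40, 50, 60, 25],
    "smoker": ["yes", "no", "yes", "no", "no", "yes"],
})

# Encode smoking status as 1 (smoker) or 0 (non smoker)
is_smoker = (sample["smoker"] == "yes").astype(int)

# Pearson correlation between age and the 0/1 smoking flag
age_smoking_corr = sample["age"].corr(is_smoker)

print(f"Correlation between age and smoking: {round(age_smoking_corr, 2)}")
```

A value near 0 would support the observation that age and smoking are not strongly related.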

# Average Age Patients With Kids VS Without

In [184]:
def analyze_avg_age_parent_vs_non(parent, non_parent):
    print(f"Average Age of Patients with Children: {round(parent)}")
    print(f"Average Age of Patients without Children: {round(non_parent)}")

In [185]:
patients_without = df.loc[df['children'] < 1]
patients_with_children = df.loc[df['children'] > 0]

In [186]:
avg_age_with_children = patients_with_children['age'].mean()
avg_age_without = patients_without['age'].mean()

In [187]:
analyze_avg_age_parent_vs_non(avg_age_with_children, avg_age_without)

Average Age of Patients with Children: 40
Average Age of Patients without Children: 38


The next step of the analysis is to find out the average age of patients with at least one child and the average age of patients without children in **insurance.csv**.

The average ages of patients with kids and patients without kids are roughly the same (40 vs. 38).

# Average Yearly Medical Costs by Region

In [188]:
def analyze_avg_by_region(swa,sea,nea,nwa):
    print(f"Southwest Average: {swa}")
    print(f"Southeast Average: {sea}")
    print(f"Northeast Average: {nea}")
    print(f"Northwest Average: {nwa}")

In [189]:
northwest = df.loc[df['region'] == 'northwest']
southwest = df.loc[df['region'] == 'southwest']
southeast = df.loc[df['region'] == 'southeast']
northeast = df.loc[df['region'] == 'northeast']

In [190]:
southwest_average = calculate_avg_charges(southwest['charges'])
southeast_average = calculate_avg_charges(southeast['charges'])
northeast_average = calculate_avg_charges(northeast['charges'])
northwest_average = calculate_avg_charges(northwest['charges'])

In [191]:
analyze_avg_by_region(southwest_average, southeast_average, northeast_average, northwest_average)

Southwest Average: $12,346.94
Southeast Average: $14,735.41
Northeast Average: $13,406.38
Northwest Average: $12,417.58


Here we see the Southeast has the highest average yearly medical costs.
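
Rather than filtering each region by hand, the regional averages could be sketched with `groupby`, and `idxmax` can name the region with the highest average. The `sample` DataFrame is a made-up stand-in for `df`.

```python
import pandas as pd

# Hypothetical rows, not taken from insurance.csv
sample = pd.DataFrame({
    "region": ["southwest", "southeast", "northeast",
               "northwest", "southeast", "northwest"],
    "charges": [12000.0, 15000.0, 13000.0, 12500.0, 14000.0, 12300.0],
})

# Average charges per region, highest first
avg_by_region = (
    sample.groupby("region")["charges"].mean().sort_values(ascending=False)
)

# Label of the region with the highest average
highest_region = avg_by_region.idxmax()

print(avg_by_region)
print(f"Highest average: {highest_region}")
```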

# Female Average Yearly Medical Costs by Region:

In [192]:
def analyze_by_region_smokers_v_non(nws, nwn, nes, nen, sws, swn, ses, sen):
    print(f"Northwest Smoker Average:\t {nws}")
    print(f"Northwest Non Smoker Average:\t {nwn}\n")
    print(f"Northeast Smoker Average:\t {nes}")
    print(f"Northeast Non Smoker Average:\t {nen}\n")
    print(f"Southwest Smoker Average:\t {sws}")
    print(f"Southwest Non Smoker Average:\t {swn}\n")
    print(f"Southeast Smoker Average:\t {ses}")
    print(f"Southeast Non Smoker Average:\t {sen}")

In [193]:
f_n_west_smokers = df.loc[(df['smoker'] == 'yes') & (df['region'] == 'northwest') & (df['sex'] == 'female')]
f_n_west_non_smokers = df.loc[(df['smoker'] == 'no') & (df['region'] == 'northwest') & (df['sex'] == 'female')]
f_s_east_smokers = df.loc[(df['smoker'] == 'yes') & (df['region'] == 'southeast') & (df['sex'] == 'female')]
f_s_east_non_smokers = df.loc[(df['smoker'] == 'no') & (df['region'] == 'southeast') & (df['sex'] == 'female')]
f_n_east_smokers = df.loc[(df['smoker'] == 'yes') & (df['region'] == 'northeast') & (df['sex'] == 'female')]
f_n_east_non_smokers = df.loc[(df['smoker'] == 'no') & (df['region'] == 'northeast') & (df['sex'] == 'female')]
f_s_west_smokers = df.loc[(df['smoker'] == 'yes') & (df['region'] == 'southwest') & (df['sex'] == 'female')]
f_s_west_non_smokers = df.loc[(df['smoker'] == 'no') & (df['region'] == 'southwest') & (df['sex'] == 'female')]

In [194]:
f_n_west_smoker_avg = calculate_avg_charges(f_n_west_smokers['charges'])
f_n_west_non_smoker_avg = calculate_avg_charges(f_n_west_non_smokers['charges'])
f_s_west_smoker_avg = calculate_avg_charges(f_s_west_smokers['charges'])
f_s_west_non_smoker_avg = calculate_avg_charges(f_s_west_non_smokers['charges'])
f_s_east_smoker_avg = calculate_avg_charges(f_s_east_smokers['charges'])
f_s_east_non_smoker_avg = calculate_avg_charges(f_s_east_non_smokers['charges'])
f_n_east_smoker_avg = calculate_avg_charges(f_n_east_smokers['charges'])
f_n_east_non_smoker_avg = calculate_avg_charges(f_n_east_non_smokers['charges'])

In [195]:
# Arguments must follow the helper's parameter order: northwest, northeast, southwest, southeast
female_by_region = analyze_by_region_smokers_v_non(f_n_west_smoker_avg, f_n_west_non_smoker_avg, f_n_east_smoker_avg, f_n_east_non_smoker_avg, f_s_west_smoker_avg, f_s_west_non_smoker_avg, f_s_east_smoker_avg, f_s_east_non_smoker_avg)

Northwest Smoker Average:	 $29,670.82
Northwest Non Smoker Average:	 $8,787.0

Northeast Smoker Average:	 $28,032.05
Northeast Non Smoker Average:	 $9,640.43

Southwest Smoker Average:	 $31,687.99
Southwest Non Smoker Average:	 $8,234.09

Southeast Smoker Average:	 $33,034.82
Southeast Non Smoker Average:	 $8,440.21


Here we see that, on average, female smokers in the southeast pay the most, and female smokers in the northeast pay the least.

# Cost Analysis for Males by Region:

In [196]:
m_n_west_smokers = df.loc[(df['smoker'] == 'yes') & (df['region'] == 'northwest') & (df['sex'] == 'male')]
m_n_west_non_smokers = df.loc[(df['smoker'] == 'no') & (df['region'] == 'northwest') & (df['sex'] == 'male')]
m_n_east_smokers = df.loc[(df['smoker'] == 'yes') & (df['region'] == 'northeast') & (df['sex'] == 'male')]
m_n_east_non_smokers = df.loc[(df['smoker'] == 'no') & (df['region'] == 'northeast') & (df['sex'] == 'male')]
m_s_west_smokers = df.loc[(df['smoker'] == 'yes') & (df['region'] == 'southwest') & (df['sex'] == 'male')]
m_s_west_non_smokers = df.loc[(df['smoker'] == 'no') & (df['region'] == 'southwest') & (df['sex'] == 'male')]
m_s_east_smokers = df.loc[(df['smoker'] == 'yes') & (df['region'] == 'southeast') & (df['sex'] == 'male')]
m_s_east_non_smokers = df.loc[(df['smoker'] == 'no') & (df['region'] == 'southeast') & (df['sex'] == 'male')]

In [197]:
m_n_west_smoker_avg = calculate_avg_charges(m_n_west_smokers['charges'])
m_n_west_non_smoker_avg = calculate_avg_charges(m_n_west_non_smokers['charges'])
m_s_west_smoker_avg = calculate_avg_charges(m_s_west_smokers['charges'])
m_s_west_non_smoker_avg = calculate_avg_charges(m_s_west_non_smokers['charges'])
m_s_east_smoker_avg = calculate_avg_charges(m_s_east_smokers['charges'])
m_s_east_non_smoker_avg = calculate_avg_charges(m_s_east_non_smokers['charges'])
m_n_east_smoker_avg = calculate_avg_charges(m_n_east_smokers['charges'])
m_n_east_non_smoker_avg = calculate_avg_charges(m_n_east_non_smokers['charges'])

In [198]:
# Arguments must follow the helper's parameter order: northwest, northeast, southwest, southeast
male_by_region = analyze_by_region_smokers_v_non(m_n_west_smoker_avg, m_n_west_non_smoker_avg, m_n_east_smoker_avg, m_n_east_non_smoker_avg, m_s_west_smoker_avg, m_s_west_non_smoker_avg, m_s_east_smoker_avg, m_s_east_non_smoker_avg)

Northwest Smoker Average:	 $30,713.18
Northwest Non Smoker Average:	 $8,320.69

Northeast Smoker Average:	 $30,926.25
Northeast Non Smoker Average:	 $8,664.04

Southwest Smoker Average:	 $32,598.86
Southwest Non Smoker Average:	 $7,778.91

Southeast Smoker Average:	 $36,029.84
Southeast Non Smoker Average:	 $7,609.0


Here we can see that male smokers in the southeast pay the most, and male smokers in the northwest pay the least.
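
Passing eight positional values to a print helper makes it easy to pair a number with the wrong label. One possible alternative is a pivot table, where every value carries its own sex, region, and smoker labels. The `sample` DataFrame is a made-up stand-in for `df`.

```python
import pandas as pd

# Hypothetical rows, not taken from insurance.csv
sample = pd.DataFrame({
    "sex": ["male", "male", "male", "male", "female", "female"],
    "region": ["southeast", "southeast", "northwest",
               "northwest", "southeast", "northwest"],
    "smoker": ["yes", "no", "yes", "no", "yes", "no"],
    "charges": [36000.0, 8600.0, 30000.0, 8300.0, 33000.0, 8700.0],
})

# Rows: (sex, region); columns: smoker status; cells: average charges
avg_table = sample.pivot_table(
    values="charges",
    index=["sex", "region"],
    columns="smoker",
    aggfunc="mean",
)

print(avg_table.round(2))
```

Combinations with no matching rows appear as `NaN` instead of being silently skipped.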

# Average Patient BMI

In [199]:
round(bmis.mean(),2)

30.66

Here we see the average BMI for all patients is about 31.

# Average BMI Smoker VS Non Smoker

In [200]:
average_smoker_bmi = round(smokers['bmi'].mean(),2)
average_non_smoker_bmi = round(non_smokers['bmi'].mean(),2)

In [201]:
def analyze_bmi(asb, ansb):
    print(f"Average Smoker BMI: {asb}")
    print(f"Average Non Smoker BMI: {ansb}")

In [202]:
analyze_bmi(average_smoker_bmi, average_non_smoker_bmi)

Average Smoker BMI: 30.71
Average Non Smoker BMI: 30.65


The average BMI of smokers and non smokers is roughly the same.

# Average BMI by Region

In [203]:
avg_n_west_bmi = round(northwest['bmi'].mean(),2)
avg_n_east_bmi = round(northeast['bmi'].mean(),2)
avg_s_west_bmi = round(southwest['bmi'].mean(),2)
avg_s_east_bmi = round(southeast['bmi'].mean(),2)

In [204]:
def analyze_bmi_by_region(anw, ane, asw, ase):
    print(f"Average Northwest BMI: {anw}")
    print(f"Average Northeast BMI: {ane}")
    print(f"Average Southwest BMI: {asw}")
    print(f"Average Southeast BMI: {ase}")

In [205]:
analyze_bmi_by_region(avg_n_west_bmi, avg_n_east_bmi,  avg_s_west_bmi, avg_s_east_bmi)

Average Northwest BMI: 29.2
Average Northeast BMI: 29.17
Average Southwest BMI: 30.6
Average Southeast BMI: 33.36


Patients in the southeast have the highest average BMI.

# Average Male BMI by Region

In [206]:
m_n_west_smoker_avg_bmi = round(m_n_west_smokers['bmi'].mean(),2)
m_n_west_non_smoker_avg_bmi = round(m_n_west_non_smokers['bmi'].mean(),2)
m_n_east_smoker_avg_bmi = round(m_n_east_smokers['bmi'].mean(),2)
m_n_east_non_smoker_avg_bmi = round(m_n_east_non_smokers['bmi'].mean(),2)
m_s_west_smoker_avg_bmi = round(m_s_west_smokers['bmi'].mean(),2)
m_s_west_non_smoker_avg_bmi = round(m_s_west_non_smokers['bmi'].mean(),2)
m_s_east_smoker_avg_bmi = round(m_s_east_smokers['bmi'].mean(),2)
m_s_east_non_smoker_avg_bmi = round(m_s_east_non_smokers['bmi'].mean(),2)

In [207]:
def analyze_bmi_by_region_smokers_v_non(nws, nwn, nes, nen, sws, swn, ses, sen):
    print(f"Northwest Smoker Average BMI:\t  {nws}")
    print(f"Northwest Non Smoker Average BMI: {nwn}\n")
    print(f"Northeast Smoker Average BMI:\t  {nes}")
    print(f"Northeast Non Smoker Average BMI: {nen}\n")
    print(f"Southwest Smoker Average BMI:\t  {sws}")
    print(f"Southwest Non Smoker Average BMI: {swn}\n")
    print(f"Southeast Smoker Average BMI:\t  {ses}")
    print(f"Southeast Non Smoker Average BMI: {sen}")

In [208]:
analyze_bmi_by_region_smokers_v_non(m_n_west_smoker_avg_bmi, m_n_west_non_smoker_avg_bmi, m_n_east_smoker_avg_bmi, m_n_east_non_smoker_avg_bmi, m_s_west_smoker_avg_bmi, m_s_west_non_smoker_avg_bmi, m_s_east_smoker_avg_bmi, m_s_east_non_smoker_avg_bmi)

Northwest Smoker Average BMI:	  29.98
Northwest Non Smoker Average BMI: 28.93

Northeast Smoker Average BMI:	  29.56
Northeast Non Smoker Average BMI: 28.86

Southwest Smoker Average BMI:	  31.5
Southwest Non Smoker Average BMI: 31.02

Southeast Smoker Average BMI:	  33.65
Southeast Non Smoker Average BMI: 34.13


Here we see the average male BMI is highest in the southeast for both smokers and non smokers.

# Average Female BMI by Region

In [209]:
f_n_west_smoker_avg_bmi = round(f_n_west_smokers['bmi'].mean(),2)
f_n_west_non_smoker_avg_bmi = round(f_n_west_non_smokers['bmi'].mean(),2)
f_n_east_smoker_avg_bmi = round(f_n_east_smokers['bmi'].mean(),2)
f_n_east_non_smoker_avg_bmi = round(f_n_east_non_smokers['bmi'].mean(),2)
f_s_west_smoker_avg_bmi = round(f_s_west_smokers['bmi'].mean(),2)
f_s_west_non_smoker_avg_bmi = round(f_s_west_non_smokers['bmi'].mean(),2)
f_s_east_smoker_avg_bmi = round(f_s_east_smokers['bmi'].mean(),2)
f_s_east_non_smoker_avg_bmi = round(f_s_east_non_smokers['bmi'].mean(),2)

In [210]:
analyze_bmi_by_region_smokers_v_non(f_n_west_smoker_avg_bmi, f_n_west_non_smoker_avg_bmi, f_n_east_smoker_avg_bmi, f_n_east_non_smoker_avg_bmi, f_s_west_smoker_avg_bmi, f_s_west_non_smoker_avg_bmi, f_s_east_smoker_avg_bmi, f_s_east_non_smoker_avg_bmi)

Northwest Smoker Average BMI:	  28.3
Northwest Non Smoker Average BMI: 29.49

Northeast Smoker Average BMI:	  27.26
Northeast Non Smoker Average BMI: 29.78

Southwest Smoker Average BMI:	  30.13
Southwest Non Smoker Average BMI: 30.05

Southeast Smoker Average BMI:	  32.25
Southeast Non Smoker Average BMI: 32.78


Here we see the average female BMI is highest in the southeast for both smokers and non smokers.

# Average Male Age by Region

In [211]:
def analyze_age_by_region_smokers_v_non(nws, nwn, nes, nen, sws, swn, ses, sen):
    print(f"Northwest Smoker Average Age:\t  {nws}")
    print(f"Northwest Non Smoker Average Age: {nwn}\n")
    print(f"Northeast Smoker Average Age:\t  {nes}")
    print(f"Northeast Non Smoker Average Age: {nen}\n")
    print(f"Southwest Smoker Average Age:\t  {sws}")
    print(f"Southwest Non Smoker Average Age: {swn}\n")
    print(f"Southeast Smoker Average Age:\t  {ses}")
    print(f"Southeast Non Smoker Average Age: {sen}")

In [212]:
m_n_west_smoker_avg_age = round(m_n_west_smokers['age'].mean(),2)
m_n_west_non_smoker_avg_age = round(m_n_west_non_smokers['age'].mean(),2)
m_n_east_smoker_avg_age = round(m_n_east_smokers['age'].mean(),2)
m_n_east_non_smoker_avg_age = round(m_n_east_non_smokers['age'].mean(),2)
m_s_west_smoker_avg_age = round(m_s_west_smokers['age'].mean(),2)
m_s_west_non_smoker_avg_age = round(m_s_west_non_smokers['age'].mean(),2)
m_s_east_smoker_avg_age = round(m_s_east_smokers['age'].mean(),2)
m_s_east_non_smoker_avg_age = round(m_s_east_non_smokers['age'].mean(),2)

In [213]:
analyze_age_by_region_smokers_v_non(m_n_west_smoker_avg_age, m_n_west_non_smoker_avg_age, m_n_east_smoker_avg_age, m_n_east_non_smoker_avg_age, m_s_west_smoker_avg_age, m_s_west_non_smoker_avg_age, m_s_east_smoker_avg_age, m_s_east_non_smoker_avg_age)

Northwest Smoker Average Age:	  39.83
Northwest Non Smoker Average Age: 38.57

Northeast Smoker Average Age:	  37.87
Northeast Non Smoker Average Age: 39.22

Southwest Smoker Average Age:	  35.57
Southwest Non Smoker Average Age: 40.28

Southeast Smoker Average Age:	  40.05
Southeast Non Smoker Average Age: 38.26


Here we see that, among male patients, smokers in the southeast and non smokers in the southwest have the highest average ages.

# Average Female Age by Region

In [214]:
f_n_west_smoker_avg_age = round(f_n_west_smokers['age'].mean(),2)
f_n_west_non_smoker_avg_age = round(f_n_west_non_smokers['age'].mean(),2)
f_n_east_smoker_avg_age = round(f_n_east_smokers['age'].mean(),2)
f_n_east_non_smoker_avg_age = round(f_n_east_non_smokers['age'].mean(),2)
f_s_west_smoker_avg_age = round(f_s_west_smokers['age'].mean(),2)
f_s_west_non_smoker_avg_age = round(f_s_west_non_smokers['age'].mean(),2)
f_s_east_smoker_avg_age = round(f_s_east_smokers['age'].mean(),2)
f_s_east_non_smoker_avg_age = round(f_s_east_non_smokers['age'].mean(),2)

In [215]:
analyze_age_by_region_smokers_v_non(f_n_west_smoker_avg_age, f_n_west_non_smoker_avg_age, f_n_east_smoker_avg_age, f_n_east_non_smoker_avg_age, f_s_west_smoker_avg_age, f_s_west_non_smoker_avg_age, f_s_east_smoker_avg_age, f_s_east_non_smoker_avg_age)

Northwest Smoker Average Age:	  38.83
Northwest Non Smoker Average Age: 39.76

Northeast Smoker Average Age:	  38.72
Northeast Non Smoker Average Age: 39.84

Southwest Smoker Average Age:	  37.05
Southwest Non Smoker Average Age: 40.1

Southeast Smoker Average Age:	  39.25
Southeast Non Smoker Average Age: 39.07


Here we see that female patients throughout all regions have fairly similar ages on average.

# Conclusion