# Questions to Answer:
Area vs. Heart Disease: Does living in a rural or urban area affect the risk of heart disease?

Area vs. Stroke: Does living in a rural or urban area affect the risk of stroke?

In [1]:
%matplotlib notebook

In [2]:
# Import dependencies
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

In [3]:
stroke_df = pd.read_csv("Resources/healthcare-dataset-stroke-data-cleaned.csv")
stroke_df

Unnamed: 0,id,gender,age,hypertension,heart_disease,ever_married,work_type,Residence_type,avg_glucose_level,bmi,smoking_status,stroke
0,9046,Male,67.0,0,1,1,Private,Urban,228.69,36.600000,formerly smoked,1
1,51676,Female,61.0,0,0,1,Self-employed,Rural,202.21,31.735817,never smoked,1
2,31112,Male,80.0,0,1,1,Private,Rural,105.92,32.500000,never smoked,1
3,60182,Female,49.0,0,0,1,Private,Urban,171.23,34.400000,smokes,1
4,1665,Female,79.0,1,0,1,Self-employed,Rural,174.12,24.000000,never smoked,1
...,...,...,...,...,...,...,...,...,...,...,...,...
5105,18234,Female,80.0,1,0,1,Private,Urban,83.75,33.905702,never smoked,0
5106,44873,Female,81.0,0,0,1,Self-employed,Urban,125.20,40.000000,never smoked,0
5107,19723,Female,35.0,0,0,1,Self-employed,Rural,82.99,30.600000,never smoked,0
5108,37544,Male,51.0,0,0,1,Private,Rural,166.29,25.600000,formerly smoked,0


In [4]:
# Filter the rows based on the conditions (heart disease/urban)
filtered_rows = stroke_df[(stroke_df['Residence_type'] == 'Urban') & (stroke_df['heart_disease'] == 1)]

# Count the number of rows
row_count = len(filtered_rows)

print("The number of people with heart disease who live in an urban residence type is", row_count)

# Filter the rows based on the conditions (heart disease/rural)
filtered_rows = stroke_df[(stroke_df['Residence_type'] == 'Rural') & (stroke_df['heart_disease'] == 1)]

# Count the number of rows
row_count = len(filtered_rows)

print("The number of people with heart disease who live in a rural residence type is", row_count)

The number of people with heart disease who live in an urban residence type is 142
The number of people with heart disease who live in a rural residence type is 134


In [5]:
# Data
residence_type = ["Urban", "Rural"]
people_with_HeartDisease = [142, 134]
x_axis = np.arange(len(people_with_HeartDisease))

# Create a bar chart based upon the above data
plt.bar(x_axis, people_with_HeartDisease, color="orchid", align="center")

<BarContainer object of 2 artists>

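**Analysis:** Urban residents account for slightly more heart disease cases than rural residents (142 vs. 134), a difference of only 8 cases. These are raw counts, not rates, so they do not by themselves show that urban residents are at higher risk: if the dataset contains more urban than rural respondents, a higher urban count is expected even when the underlying risk is the same. Comparing risk requires dividing each count by the number of respondents in that residence type.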
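To separate the effect of group size from the effect of residence type, the case counts can be turned into rates per group. The following is a minimal sketch, not part of the original analysis: the helper name `prevalence_by_group` and the `toy_df` rows are made up for illustration, and only the column names `Residence_type` and `heart_disease` come from the dataset.

```python
import pandas as pd


def prevalence_by_group(df, group_col, outcome_col):
    """Return cases, group size, and rate (%) of outcome_col == 1 per group."""
    grouped = df.groupby(group_col)[outcome_col]
    summary = pd.DataFrame({
        "cases": grouped.sum(),    # rows where the 0/1 outcome is 1
        "total": grouped.count(),  # all rows in the group
    })
    summary["rate_pct"] = 100 * summary["cases"] / summary["total"]
    return summary


# Made-up rows for illustration only; NOT taken from the stroke dataset.
toy_df = pd.DataFrame({
    "Residence_type": ["Urban", "Urban", "Urban", "Urban", "Rural", "Rural"],
    "heart_disease": [1, 1, 0, 0, 1, 0],
})

toy_summary = prevalence_by_group(toy_df, "Residence_type", "heart_disease")
print(toy_summary)
```

In the toy data the urban group has twice as many cases as the rural group but the same rate, which is the situation raw counts cannot distinguish. On the real data the same helper could be called as `prevalence_by_group(stroke_df, "Residence_type", "heart_disease")`.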

In [6]:
# Give the chart a title, x label, and y label
plt.title("Heart Disease vs Residence Type")
plt.xlabel("Residence Type")
plt.ylabel("Number of People with Heart Disease")
plt.show()

In [7]:
# Label the bars on the x axis with the residence types
plt.xticks(x_axis, residence_type)

# Set the limits of the x axis
plt.xlim(-0.75, len(x_axis)-0.25)

# Set the limits of the y axis
plt.ylim(0, max(people_with_HeartDisease)+10)
plt.show()

In [8]:
# Labels for the sections of our pie chart
labels = ["Urban", "Rural"]
plt.title('Heart Disease in Urban and Rural Areas')

# The values of each section of the pie chart
sizes = [142, 134]

# The colors of each section of the pie chart
colors = ["darkseagreen", "lightskyblue"]

# Tells matplotlib to separate the "Urban" section from the other
explode = (0.1, 0)

# Creates the pie chart based upon the values above & automatically finds the percentages of each part of the pie chart
plt.pie(sizes, explode=explode, labels=labels, colors=colors, autopct="%1.1f%%", shadow=True, startangle=90)
plt.show()

**Analysis:** The pie chart expresses the heart disease counts as shares of all cases: 51.4% urban and 48.6% rural. It makes clear how small the difference between urban and rural areas is.

In [9]:
# Filter the rows based on the conditions (stroke/urban)
filtered_rows = stroke_df[(stroke_df['Residence_type'] == 'Urban') & (stroke_df['stroke'] == 1)]

# Count the number of rows
row_count = len(filtered_rows)

print("The number of people who've had a stroke and live in an urban residence type is", row_count)

# Filter the rows based on the conditions (stroke/rural)
filtered_rows = stroke_df[(stroke_df['Residence_type'] == 'Rural') & (stroke_df['stroke'] == 1)]

# Count the number of rows
row_count = len(filtered_rows)

print("The number of people who've had a stroke and live in a rural residence type is", row_count)

The number of people who've had a stroke and live in an urban residence type is 135
The number of people who've had a stroke and live in a rural residence type is 114


In [10]:
# Data
residence_type = ["Urban", "Rural"]
people_with_stroke = [135, 114]
x_axis = np.arange(len(people_with_stroke))

# Create a bar chart based upon the above data
plt.bar(x_axis, people_with_stroke, color="salmon", align="center")

<BarContainer object of 2 artists>

**Analysis:** More of the people who suffered a stroke live in urban areas than in rural areas (135 vs. 114). This gap of 21 cases is larger than the gap of 8 cases seen for heart disease. As with heart disease, these are raw counts, so the gap suggests rather than proves a higher stroke risk in urban areas.
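Whether a gap between two groups is larger than chance alone would produce can be checked with a two-proportion z-test. The sketch below is an illustration under stated assumptions, not a step from the original notebook: the function name `two_proportion_z_test` is hypothetical, and the counts passed to it are invented because the group sizes for urban and rural respondents are not printed anywhere above.

```python
import math


def two_proportion_z_test(cases_a, total_a, cases_b, total_b):
    """Two-sided z-test for the difference between two proportions."""
    p_a = cases_a / total_a
    p_b = cases_b / total_b
    # Pooled proportion under the null hypothesis of equal rates
    pooled = (cases_a + cases_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts for illustration only; NOT the values from the dataset.
z_stat, p_val = two_proportion_z_test(30, 200, 20, 200)
print(f"z = {z_stat:.2f}, p = {p_val:.3f}")
```

To apply this to the real data, the stroke counts (135 and 114) would be paired with the group sizes from `stroke_df['Residence_type'].value_counts()`.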

In [11]:
# Give the chart a title, x label, and y label
plt.title("Stroke vs Residence Type")
plt.xlabel("Residence Type")
plt.ylabel("Number of People who've suffered a Stroke")
plt.show()

In [12]:
# Label the bars on the x axis with the residence types
plt.xticks(x_axis, residence_type)

# Set the limits of the x axis
plt.xlim(-0.75, len(x_axis)-0.25)

# Set the limits of the y axis
plt.ylim(0, max(people_with_stroke)+10)
plt.show()

In [13]:
# Labels for the sections of our pie chart
labels = ["Urban", "Rural"]
plt.title('Stroke in Urban and Rural Areas')

# The values of each section of the pie chart
sizes = [135, 114]

# The colors of each section of the pie chart
colors = ["burlywood", "darkturquoise"]

# Tells matplotlib to separate the "Urban" section from the other
explode = (0.1, 0)

# Creates the pie chart based upon the values above & automatically finds the percentages of each part of the pie chart
plt.pie(sizes, explode=explode, labels=labels, colors=colors, autopct="%1.1f%%", shadow=True, startangle=90)
plt.show()

**Analysis:** Again, the pie chart shows how stroke cases split by residence type: 54.2% urban and 45.8% rural, a gap of about 8.4 percentage points. This is a wider gap than for heart disease, so residence type may be related to stroke risk, although the shares of cases alone do not account for how many urban and rural respondents are in the dataset.
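The percentages behind the pie chart can be checked directly from the two stroke counts printed earlier (135 urban, 114 rural). This short sketch, with variable names chosen only for illustration, also separates two figures that are easy to confuse: the gap in percentage points between the two shares, and the relative difference between the two counts.

```python
# Stroke counts printed by the filtering cell above
urban_cases, rural_cases = 135, 114
total_cases = urban_cases + rural_cases

# Share of all stroke cases in each residence type (what the pie chart shows)
urban_share = 100 * urban_cases / total_cases
rural_share = 100 * rural_cases / total_cases

# Gap between the two shares, in percentage points
gap_points = urban_share - rural_share

# How much lower the rural count is, relative to the urban count
relative_drop = 100 * (urban_cases - rural_cases) / urban_cases

print(f"Urban share: {urban_share:.1f}%, rural share: {rural_share:.1f}%")
print(f"Gap: {gap_points:.1f} percentage points")
print(f"Rural count is {relative_drop:.1f}% lower than the urban count")
```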