![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fdata-viz-of-the-week&branch=main&subPath=nhanes-blood-pressure/examining-blood-pressure.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# Callysto’s Weekly Data Visualization

## Health Data

### Recommended Grade levels: 8-12
<br>

### Instructions
#### “Run” the cells to see the graphs
Click “Cell” and select “Run All”.<br> This will import the data and run all the code, so you can see this week's data visualization. Scroll to the top after you’ve run the cells.<br> 

![instructions](https://github.com/callysto/data-viz-of-the-week/blob/main/images/instructions.png?raw=true)

**You don’t need to do any coding to view the visualizations**.
The plots generated in this notebook are interactive. You can hover over and click on elements to see more information. 

Email contact@callysto.ca if you experience issues.

### About this Notebook

Callysto's Weekly Data Visualization is a learning resource that aims to develop data literacy skills. We provide Grades 5-12 teachers and students with a data visualization, like a graph, to interpret. This companion resource walks learners through how the data visualization is created and interpreted by a data scientist. 

The steps of the data analysis process are listed below and applied to each weekly topic.

1. Question - What are we trying to answer? 
2. Gather - Find the data source(s) you will need. 
3. Organize - Arrange the data, so that you can easily explore it. 
4. Explore - Examine the data to look for evidence to answer the question. This includes creating visualizations. 
5. Interpret - Describe what's happening in the data visualization. 
6. Communicate - Explain how the evidence answers the question. 

# Question

How do the health measurements in the supplied dataset compare to those in the classroom? Students can make this notebook a classroom activity by collecting their own measurements and comparing them with the data.

### Goal
Our goal is to investigate an accessible type of data (body measurements) and compare it against at least one other set of the same measurements. 

- How are the datasets similar? 
- How are they different? 
- What could be a cause of the differences?
- Is there anything in the data that's surprising or sticks out?

The data that we're looking at in particular is taken from several thousand American children and adults. For each person we have height measurements, systolic and diastolic blood pressure measurements, as well as [resting heart rate](https://www.heartandstroke.ca/heart-disease/what-is-heart-disease/how-a-healthy-heart-works). Blood pressure and heart rate were each recorded three times.

### Blood Pressure

Blood pressure is an [important indicator of cardiovascular health](https://www.heartandstroke.ca/heart-disease/risk-and-prevention/condition-risk-factors/high-blood-pressure). The measurement typically consists of two numbers: systolic blood pressure, and diastolic blood pressure. Blood pressure is usually recorded as:

$$
\frac{120}{80}\ \text{mmHg}
$$

where the units are **millimetres of mercury (mmHg)**.

The first number, systolic, is the pressure exerted by the heart on the arteries as it contracts and forces blood out of the left ventricle. This number can change quickly and frequently, often as a response to negative stressors on the body (emotion, illness) or as a natural response to healthy states (exercise, sleep). The **typical healthy systolic blood pressure in adults is ~120 mmHg**, though it's not uncommon to have lower blood pressures in active adults and children. Higher (resting) values of systolic blood pressure are generally a negative sign.

Diastolic blood pressure is the second value reported and is more stable than systolic blood pressure. It represents the pressure in the arteries when the heart is between beats. A **typical healthy diastolic blood pressure is ~80 mmHg**, though similar to systolic, it's not unusual to have lower values. As this measurement is more stable, seeing higher values of diastolic blood pressure is more concerning than higher values of systolic.

When *either* resting blood pressure value is above a certain number (>135 for systolic, >85 for diastolic), it indicates the presence of a condition called **hypertension**. Hypertension, or just simply high blood pressure, [can put you at risk of several nasty health problems](https://www.heartandstroke.ca/-/media/pdf-files/canada/health-information-catalogue/en-managing-your-blood-pressure.ashx) such as heart attack and stroke. The routine measurement of blood pressure is important to detect hypertension. It's also critical to not make a diagnosis on a single measurement as that value can be abnormally high (or low) on that particular day and may not truly be of concern.
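
The cutoff rule and the warning about single measurements can be sketched in a few lines of Python. This is only an illustration, not a diagnostic tool: the function names `average` and `is_above_cutoff` and the readings are made up for the example, and the cutoffs are the ones quoted above.

```python
def average(readings):
    """Mean of a list of readings."""
    return sum(readings) / len(readings)


def is_above_cutoff(systolic, diastolic, sys_cutoff=135, dia_cutoff=85):
    """True when either resting value is above its cutoff (values from the text)."""
    return systolic > sys_cutoff or diastolic > dia_cutoff


# Made-up readings for one person: the first systolic reading is unusually high
systolic_readings = [138, 128, 127]
diastolic_readings = [82, 80, 78]

# Judging from the first reading alone
single_flag = is_above_cutoff(systolic_readings[0], diastolic_readings[0])

# Judging from the average of all three readings
avg_systolic = average(systolic_readings)    # 131.0
avg_diastolic = average(diastolic_readings)  # 80.0
averaged_flag = is_above_cutoff(avg_systolic, avg_diastolic)

print(single_flag, averaged_flag)  # True False
```

Here the first reading on its own crosses the systolic cutoff, but the average of the three readings does not, which is why repeated measurements matter.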

### Heart Rate

Probably one of the simplest measurements of body function to collect is heart rate. There are [multiple locations on the body that can be used to measure heart rate](https://www.healthline.com/health/how-to-check-heart-rate), but the easiest is the radial artery in the wrist:

![Radial HR measurement](img/VHFC0134_How_do_I_measure_my_heart_rate_image1.jpeg)
<p>
<font size="1"> 
    https://s32917.pcdn.co/wp-content/uploads/2020/02/VHFC0134_How_do_I_measure_my_heart_rate_image1.jpeg
</font>
</p>



By counting the number of beats that occur in a 15-second time period, and multiplying that number by 4, you can measure your heart rate in **beats per minute (bpm)**.
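
The same arithmetic can be written as a small Python function. This is a minimal sketch; the name `beats_per_minute` and the example counts are assumptions made for illustration.

```python
def beats_per_minute(beats_counted, seconds=15):
    """Scale a pulse count taken over `seconds` up to one minute."""
    return beats_counted * 60 / seconds


# 18 beats counted in 15 seconds is the same as multiplying 18 by 4
print(beats_per_minute(18))      # 72.0

# The same function works for a 30-second count
print(beats_per_minute(35, 30))  # 70.0
```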

A typical (resting) heart rate in adults is between 60 and 72 bpm (up to 100 bpm in children), though extremely fit individuals can have a much lower heart rate. Heart rate can be elevated by the same stressors as blood pressure (exercise, emotional state) and can drop considerably during sleep.

Though blood pressure requires equipment to properly measure, both heart rate and height can be measured in the classroom. As you learn about visualizing this data, think about collecting your own and see how it compares.

# Gather

### Code:
The code below will import the Python programming libraries we need to gather and organize the data to answer our question.

In [None]:
## Import libraries
%pip install -q pyodide_http plotly nbformat
import pyodide_http
pyodide_http.patch_all()
import pandas as pd
import plotly.express as px
import plotly.graph_objs as go

### Data:

We're going to collect the data from three different sources. The links below are to the files themselves, but to save time they've already been downloaded and are accessed in the next step.

- [National Health and Nutrition Examination Survey (NHANES) Body Measurement](https://wwwn.cdc.gov/nchs/nhanes/search/datapage.aspx?Component=Examination&Cycle=2017-2020)
- [National Health and Nutrition Examination Survey (NHANES) Blood Pressure](https://wwwn.cdc.gov/nchs/nhanes/search/datapage.aspx?Component=Examination&Cycle=2017-2020)
- [National Health and Nutrition Examination Survey (NHANES) Demographics](https://wwwn.cdc.gov/nchs/nhanes/search/datapage.aspx?Component=Demographics&Cycle=2017-2020)

### Import the data

In [None]:
# Import data
nhanesDataHt = pd.read_sas('https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/nhanes-blood-pressure/data/P_BMX.XPT')[['SEQN','BMXHT']]
nhanesDataBp = pd.read_sas('https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/nhanes-blood-pressure/data/P_BPXO.XPT')[['SEQN', 
                                                                                                                                           'BPXOSY1', 
                                                                                                                                           'BPXOSY2', 
                                                                                                                                           'BPXOSY3', 
                                                                                                                                           'BPXODI1', 
                                                                                                                                           'BPXODI2', 
                                                                                                                                           'BPXODI3', 
                                                                                                                                           'BPXOPLS1', 
                                                                                                                                           'BPXOPLS2', 
                                                                                                                                           'BPXOPLS3']]
nhanesDataDem = pd.read_sas('https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/nhanes-blood-pressure/data/P_DEMO.XPT')[['SEQN', 'RIAGENDR']]

### Comment on the data
We have three datasets from the NHANES: one containing data about height, and another containing data about blood pressure and heart rate. A third dataset has information about the sex of each participant, so we can use it to show differences between males and females. 

The datasets contain far more data than we actually need, so we selected only the columns that we need and discarded the rest. However, the names of the columns aren't very descriptive, so in the next step we'll rename them to make the data easier to handle.
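
Selecting a few columns from a wider table can be tried on a tiny made-up DataFrame. In this sketch the values and the `EXTRA_A` and `EXTRA_B` columns are hypothetical stand-ins for the many columns we don't need; only `SEQN` and `BMXHT` are names from the real files.

```python
import pandas as pd

# A made-up table with more columns than we need
raw = pd.DataFrame({
    'SEQN': [1001.0, 1002.0, 1003.0],
    'EXTRA_A': [61.2, 48.9, 80.4],   # hypothetical unused column
    'BMXHT': [165.1, 152.3, 178.0],
    'EXTRA_B': [22.5, 21.1, 25.4],   # hypothetical unused column
})

# A list of column names inside the square brackets keeps only those columns
subset = raw[['SEQN', 'BMXHT']]

print(raw.shape)     # (3, 4)
print(subset.shape)  # (3, 2)
```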

# Organize

Now we will rename our columns, as the coding given to them by the researchers who collected the data isn't very helpful. Thankfully, the researchers also published a 'data dictionary' that helps us convert the codes into something more helpful.

In [None]:
# Data cleaning
anthNames = {
    'SEQN': 'ID', 
    'BMXHT': 'Height (cm)'
}
bpNames = { 
    'SEQN' : 'ID',
    'BPXOSY1' : 'Systolic BP (1st reading)',
    'BPXOSY2' : 'Systolic BP (2nd reading)',
    'BPXOSY3' : 'Systolic BP (3rd reading)',
    'BPXODI1' : 'Diastolic BP (1st reading)',
    'BPXODI2' : 'Diastolic BP (2nd reading)',
    'BPXODI3' : 'Diastolic BP (3rd reading)',
    'BPXOPLS1' : 'Heart rate (1st reading)',
    'BPXOPLS2' : 'Heart rate (2nd reading)',
    'BPXOPLS3' : 'Heart rate (3rd reading)'
}
demNames = {
    'SEQN': 'ID',
    'RIAGENDR': 'Sex'
}
nhanesDataHt.rename(anthNames, axis=1, inplace=True)
nhanesDataBp.rename(bpNames, axis=1, inplace=True)
nhanesDataDem.rename(demNames, axis=1, inplace=True)

In [None]:
# Display the data
print('Height data')
display(nhanesDataHt)
print('Blood pressure and heart rate data')
display(nhanesDataBp)
print('Demographic data')
display(nhanesDataDem)

Hey wait a second, what does a sex of '1.0' mean? Or '2.0'? 

Another quirk of working with data is that computers much prefer numbers to words. Therefore, features that would normally have a text (or 'categorical') response are initially recorded as numbers. We're not computers though, so we can switch the values to something that makes a little more sense to us.

In [None]:
# Replace sex coding (only in the 'Sex' column, so ID values are never touched)
nhanesDataDem['Sex'] = nhanesDataDem['Sex'].replace({1.0: 'Male', 2.0: 'Female'})
display(nhanesDataDem)
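
One thing to watch when recoding values: calling `replace` on a whole DataFrame looks at every column, not just the one you have in mind. The made-up table below is a sketch of the difference. The tiny ID values are chosen on purpose to trigger the problem; the real NHANES IDs are much larger numbers, so they don't collide with the codes 1.0 and 2.0.

```python
import pandas as pd

# Made-up data where two ID values happen to match the sex codes
toy = pd.DataFrame({'ID': [1.0, 2.0, 3.0], 'Sex': [2.0, 1.0, 2.0]})
codes = {1.0: 'Male', 2.0: 'Female'}

# Replacing across the whole table also rewrites matching IDs
whole = toy.replace(codes)

# Recoding only the 'Sex' column leaves the IDs alone
scoped = toy.copy()
scoped['Sex'] = scoped['Sex'].map(codes)

print(whole)
print(scoped)
```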

### Comment on the data

We have our datasets prepared and with column names that are descriptive of the data they contain. There's one final step that we need to do though: add the information about sex to the other two datasets. This will allow us to look at the differences between males and females.

Joining datasets is extremely common in data science as it's a necessary step anytime you want to compare and contrast two separate datasets.

In [None]:
# Combine datasets
nhanesDataHt_comb = pd.merge(nhanesDataHt, 
                        nhanesDataDem, 
                        how='inner', 
                        on='ID')
nhanesDataBp_comb = pd.merge(nhanesDataBp, 
                        nhanesDataDem, 
                        how='inner', 
                        on='ID')
display(nhanesDataHt_comb)
display(nhanesDataBp_comb)

Now it should be clearer what exactly the data represents. We can also see that the first dataset of heights has 14,300 observations, whereas the second dataset of blood pressures and heart rates is somewhat smaller at 11,656. 
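
To see how an inner join decides which rows to keep, here is a sketch on three tiny made-up tables. The IDs and values are invented for illustration; the point is that a row survives only when its ID appears in both tables being joined.

```python
import pandas as pd

# Made-up tables: ID 5 has a blood pressure reading but no demographic record
heights = pd.DataFrame({'ID': [1, 2, 3, 4],
                        'Height (cm)': [150.2, 163.0, 171.5, 158.8]})
bp = pd.DataFrame({'ID': [2, 3, 5],
                   'Systolic BP (1st reading)': [118.0, 126.0, 131.0]})
demo = pd.DataFrame({'ID': [1, 2, 3, 4],
                     'Sex': ['Female', 'Male', 'Male', 'Female']})

# how='inner' keeps only IDs present in both the left and right table
heights_comb = pd.merge(heights, demo, how='inner', on='ID')
bp_comb = pd.merge(bp, demo, how='inner', on='ID')

print(len(heights_comb))       # 4
print(bp_comb['ID'].tolist())  # [2, 3]
```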

In the height dataset above, some of the data have values of "NaN", which indicates missing data. This is not uncommon in data science; data can be missing for a variety of reasons, such as equipment malfunction, errors in entry, or it simply never existed to begin with. Using "NaN" is more helpful than leaving the value blank or replacing it with a value of zero, as either of those can cause confusion or throw off any calculations.

For our purposes here, missing values are not a big problem, as our plotting functions will just ignore them. However, in more advanced statistics, handling missing data becomes a very important task.
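
A quick sketch with made-up readings shows why "NaN" is safer than a zero. pandas skips missing values when it calculates a mean, whereas a zero is treated as a real measurement.

```python
import numpy as np
import pandas as pd

# Three made-up systolic readings, with the second one missing
readings = pd.Series([120.0, np.nan, 124.0])

missing_count = readings.isna().sum()        # how many values are missing
mean_with_nan = readings.mean()              # NaN is skipped
mean_with_zero = readings.fillna(0).mean()   # zero is counted as a reading

print(missing_count)             # 1
print(mean_with_nan)             # 122.0
print(round(mean_with_zero, 1))  # 81.3
```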

As you may have also noticed, for the blood pressure and heart rate measurements, we have multiple recordings. It might make for an interesting visualization to compare the individual measurements to their average. In order to do that, we have to create a new column in our dataset that represents the mean of the three measurements for both blood pressure and heart rate.

In [None]:
# Create mean columns
nhanesDataBp_comb['Average Systolic BP'] = round(nhanesDataBp_comb[['Systolic BP (1st reading)', 'Systolic BP (2nd reading)', 'Systolic BP (3rd reading)']].mean(axis=1),1)
nhanesDataBp_comb['Average Diastolic BP'] = round(nhanesDataBp_comb[['Diastolic BP (1st reading)', 'Diastolic BP (2nd reading)', 'Diastolic BP (3rd reading)']].mean(axis=1),1)
nhanesDataBp_comb['Average Heart Rate'] = round(nhanesDataBp_comb[['Heart rate (1st reading)', 'Heart rate (2nd reading)', 'Heart rate (3rd reading)']].mean(axis=1),1)
display(nhanesDataBp_comb)
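
The `axis=1` argument is what makes the mean run across the columns of each row, giving one average per participant. A minimal sketch on two made-up participants:

```python
import numpy as np
import pandas as pd

# Made-up heart rate readings; the second person is missing one reading
toy = pd.DataFrame({
    'Heart rate (1st reading)': [70.0, 82.0],
    'Heart rate (2nd reading)': [72.0, np.nan],
    'Heart rate (3rd reading)': [75.0, 80.0],
})

# axis=0 (the default) averages down each column: one value per reading
column_means = toy.mean(axis=0)

# axis=1 averages across each row: one value per person
row_means = toy.mean(axis=1).round(1)

print(column_means.tolist())  # [76.0, 72.0, 77.5]
print(row_means.tolist())     # [72.3, 81.0]
```

The second person's average is based on the two readings that exist.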

# Explore

We'll start by plotting our data, which is always a good first step when you're analyzing data as it can reveal patterns or anomalies.

Another important aspect of science is using your background knowledge to put your data in context. The health data we're looking at in this notebook is commonly collected and studied by health professionals. We can use that information to give more meaning to our data by comparing it to the population averages and to specified cutoffs for healthy values.

In [None]:
# Height data
fig = go.Figure()
fig.add_trace(go.Histogram(x=nhanesDataHt_comb['Height (cm)'], 
                           name='Both', 
                           nbinsx=100, 
                           opacity=0.6))
fig.add_trace(go.Histogram(x=nhanesDataHt_comb[nhanesDataHt_comb['Sex']=='Male']['Height (cm)'], 
                           name='Male', 
                           nbinsx=100, 
                           opacity=0.6))
fig.add_trace(go.Histogram(x=nhanesDataHt_comb[nhanesDataHt_comb['Sex']=='Female']['Height (cm)'], 
                           name='Female', 
                           nbinsx=100, 
                           opacity=0.6))
fig.add_vline(175.3, annotation_text='Male (adult) average', annotation_position='right top')
fig.add_vline(161.5, annotation_text='Female (adult) average', annotation_position='left top')
fig.update_layout(barmode='overlay',
                 title='Histogram of NHANES heights',
                 legend=dict(title='Sex'),
                 height=800)
fig.update_xaxes(title_text='Height (cm)')
fig.update_yaxes(title_text='Count')
fig.show()
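
A histogram is a count of how many values fall into each bin. The sketch below does that counting by hand with NumPy on a few made-up heights and bin edges chosen for the example. Plotly picks its own bin edges, so this illustrates the idea rather than reproducing the plot above.

```python
import numpy as np

# Made-up heights in cm
heights = np.array([150, 152, 158, 161, 165, 166, 171, 178])

# Three bins: 150-160, 160-170, 170-180
counts, edges = np.histogram(heights, bins=[150, 160, 170, 180])

print(counts.tolist())  # [3, 3, 2]
print(edges.tolist())   # [150, 160, 170, 180]
```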

In [None]:
# Systolic blood pressure data
fig = px.histogram(nhanesDataBp_comb,
                   x=['Systolic BP (1st reading)', 'Systolic BP (2nd reading)', 'Systolic BP (3rd reading)', 'Average Systolic BP'],
                   title='Histogram of Systolic BP readings',
                   barmode='overlay',
                   nbins=100,
                   height=800
                   )
fig.add_vline(120, annotation_text='Healthy  ', annotation_position='top left')
fig.add_vline(135, annotation_text='High      ', annotation_position='top left')
fig.add_vline(135, annotation_text='  Hypertension', annotation_position='top right')

fig.show()

In [None]:
# Diastolic blood pressure data
fig = px.histogram(nhanesDataBp_comb,
                   x=['Diastolic BP (1st reading)', 'Diastolic BP (2nd reading)', 'Diastolic BP (3rd reading)', 'Average Diastolic BP'],
                   title='Histogram of Diastolic BP readings',
                   barmode='overlay',
                   nbins=100,
                   height=800
                   )
fig.add_vline(80, annotation_text='Healthy  ', annotation_position='top left')
fig.add_vline(80, annotation_text='High', annotation_position='top right')
fig.add_vline(85, annotation_text='  Hypertension', annotation_position='top right')

fig.show()

In [None]:
# Heart rate data
fig = px.histogram(nhanesDataBp_comb,
                   x=['Heart rate (1st reading)', 'Heart rate (2nd reading)', 'Heart rate (3rd reading)', 'Average Heart Rate'],
                   title='Histogram of heart rate readings',
                   barmode='overlay',
                   nbins=100,
                   height=800
                   )
fig.add_vrect(x0=60, x1=100,
              annotation_text="Typical resting HR range (adolescents)", 
              annotation_position="top right",
              fillcolor="grey", 
              opacity=0.2, 
              line_width=0)

fig.show()

# Interpret

As we can see in the plots above, there's a generally "normal" distribution to the blood pressure and heart rate data. The vast majority of subjects tend to have measurements that cluster around the mean, with counts dropping off quickly as you get farther away from the center in either direction. There are significant outliers in both directions, but that's an inevitability in any biological dataset. [Normal distributions are common when describing natural measurements](https://wiki.kidzsearch.com/wiki/Normal_distribution).
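
To get a feel for what "clustering around the mean" looks like in numbers, we can simulate a normal distribution. This is a sketch only: the mean of 72 and standard deviation of 12 are arbitrary values picked for the example, not estimates from the NHANES data.

```python
import numpy as np

# Simulate 10,000 values from a normal distribution (arbitrary mean and spread)
rng = np.random.default_rng(seed=0)
simulated = rng.normal(loc=72, scale=12, size=10_000)

# Share of values within one standard deviation of the mean
within_one_sd = np.abs(simulated - simulated.mean()) <= simulated.std()
share = within_one_sd.mean()

print(round(share, 2))
```

For a normal distribution this share comes out at roughly two thirds, which matches the tall middle and quickly thinning tails seen in the histograms.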

However, height doesn't follow the same pattern. Why might that be?

#### Height
- How do your measured values compare to the data? Why would there be any difference? 
- Why is neither the male nor female average height located at the highest point of the histogram?

#### Blood Pressure
- Is there a noticeable difference between subsequent measurements of blood pressure (you can turn them on and off by clicking on the legend)? 
- Why might that exist (or not exist)?
- Given the shape of the histogram, would you say most participants in this dataset have a healthy blood pressure? How many do not?
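
One way to put a number on the last question is to count the rows above the cutoffs. The sketch below uses a made-up table with the same column names as our averages and the cutoffs from the plots (135 and 85). To try it on the real data, the toy table could be swapped for `nhanesDataBp_comb`, keeping in mind that rows with missing averages compare as not above the cutoff.

```python
import pandas as pd

# Made-up averages for five participants
toy = pd.DataFrame({
    'Average Systolic BP': [112.0, 118.3, 141.0, 125.7, 136.2],
    'Average Diastolic BP': [70.0, 88.0, 90.3, 79.0, 72.0],
})

# True when either value is above its cutoff
above = (toy['Average Systolic BP'] > 135) | (toy['Average Diastolic BP'] > 85)

count_above = int(above.sum())
percent_above = round(above.mean() * 100, 1)

print(count_above, percent_above)  # 3 60.0
```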

#### Heart Rate
- How does your measured heart rate compare to the data? Can you think of any reason why it might not fall in the center of the plot?
- Perform a quick exercise for 30 seconds or so, like jumping jacks or pushups. How does that change your heart rate? Is it still in the normal range?
- Why would subsequent measurements of heart rate differ? You can investigate this by showing and hiding the different variables by clicking on the legend.

## Try it yourself!

Enter your heart rate in the code cell below and re-run the cell to see how your resting and elevated heart rates compare to the data:

In [None]:
# Enter heart rate below as a number (e.g., 70 or 70.0):

userRestHR = 70
userElevHR = 90

# Code for plotting
fig = px.histogram(nhanesDataBp_comb,
                   x=['Heart rate (1st reading)', 'Heart rate (2nd reading)', 'Heart rate (3rd reading)', 'Average Heart Rate'],
                   title='Histogram of heart rate readings',
                   barmode='overlay',
                   nbins=100,
                   height=800
                   )
fig.add_vrect(x0=60, x1=100,
              annotation_text="Typical resting HR range (adolescents)", 
              annotation_position="top right",
              fillcolor="grey", 
              opacity=0.2, 
              line_width=0)
fig.add_vline(userRestHR, annotation_text='User Rest HR', 
              annotation_position='left',
              line_dash='dash',
              line_width=3)
fig.add_vline(userElevHR, annotation_text='User Elevated HR', 
              annotation_position='right',
              line_dash='dashdot',
              line_width=3)

fig.show()

And we can do the same with your height:

In [None]:
# Enter height below as a number (e.g., 170 or 170.0):

userHt = 170

# Code for plotting
fig = go.Figure()
fig.add_trace(go.Histogram(x=nhanesDataHt_comb['Height (cm)'], 
                           name='Both', 
                           nbinsx=100, 
                           opacity=0.6))
fig.add_trace(go.Histogram(x=nhanesDataHt_comb[nhanesDataHt_comb['Sex']=='Male']['Height (cm)'], 
                           name='Male', 
                           nbinsx=100, 
                           opacity=0.6))
fig.add_trace(go.Histogram(x=nhanesDataHt_comb[nhanesDataHt_comb['Sex']=='Female']['Height (cm)'], 
                           name='Female', 
                           nbinsx=100, 
                           opacity=0.6))
fig.add_vline(175.3, annotation_text='Male (adult) average', annotation_position='right top')
fig.add_vline(161.5, annotation_text='Female (adult) average', annotation_position='left top')
fig.add_vline(userHt, annotation_text='User height', line_dash='dash', annotation_position='left')
fig.update_layout(barmode='overlay',
                 title='Histogram of NHANES heights',
                 legend=dict(title='Sex'),
                 height=800)
fig.update_xaxes(title_text='Height (cm)')
fig.update_yaxes(title_text='Count')
fig.show()
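
Besides reading your position off the plot, you can describe it with a number: the percentage of people in the data who are shorter than you. This is a sketch on ten made-up heights; on the real data, the Series could be replaced with the `Height (cm)` column of `nhanesDataHt_comb`.

```python
import pandas as pd

# Ten made-up heights in cm
heights = pd.Series([148.0, 155.5, 160.2, 163.0, 167.4,
                     170.1, 172.8, 176.5, 181.0, 185.3])

userHt = 170

# Comparing gives True/False per person; the mean of that is the share of True
share_shorter = (heights < userHt).mean() * 100

print(share_shorter)  # 50.0
```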

# Communicate
Below are some writing prompts to help you reflect on the new information presented by the data. As you look at the evidence, think about what you perceive about the information. Is this perception based on what the evidence shows? If others were to view it, what perceptions might they have?

- I used to think __but now I know__.
- I wish I knew more about __.
- This visualization reminds me of __.
- I really like __.


[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)