In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("project1.ipynb")

<img src="./ccsf.png" alt="CCSF Logo" width=200px style="margin:0px -5px">

# Project 1: Midterm Project

In this project, you will utilize the skills and concepts you've learned so far in the course to re-create and explore research surrounding cardiovascular disease, which is the leading cause of death in the United States according to the [U.S. Centers for Disease Control and Prevention](https://www.cdc.gov/nchs/fastats/leading-causes-of-death.htm) (CDC).


---

## 🎗️ Assignment Reminders

As you work on this project, consider the following:
- **Initialize Otter.** 🚨 Make sure to run the code cell at the top of this notebook that starts with `# Initialize Otter` to load the auto-grader.
- **Complete the Tasks.** Your Tasks are categorized as auto-graded (📍) and manually graded (📍🔎).
    - For all the auto-graded tasks:
        - Replace the `...` in the provided code cell with your own code.
        - Run the `grader.check` code cell to run some tests on your code.
        - Keep in mind that homework and project assignments sometimes include hidden tests, whose results you cannot see, that we use to score the correctness of your response. **Passing the auto-grader does not guarantee that your answer is correct.**
    - For all the manually graded tasks:
        - You might need to provide your own response to the provided prompt. Do so by replacing the template text "_Type your answer here, replacing this text._" with your own words.
        - You might need to produce a graphic or something else using code. Do so by replacing the `...` in the code cell to generate the image, table, etc.
        - In either case, [review the rubric](https://community.canvaslms.com/t5/Student-Guide/How-do-I-view-the-rubric-for-my-assignment/ta-p/275) on the associated <a href="https://ccsf.instructure.com" target="_blank">Canvas</a> Assignment page to understand the scoring criteria.
- **Code Sharing.** By submitting this project, you agree that you will not share your code directly with anybody but your partner. You are welcome to discuss questions with others but don't share the answers directly. The experience of solving the problems in this project will prepare you for our exams and potentially for future work in this field. If someone asks you for the answer, resist! Instead, you can demonstrate how you would solve a similar problem or you can focus on a specific part of a task.
- **Support.** You are not alone! Review the [Course Support page](https://ccsf-math-108.github.io/materials-sp25/resources/course-support.html) for how to get support in this course. If you're ever feeling overwhelmed or don't know how to make progress, talk to your instructor or a tutor.
- **Advice.** Develop your answers incrementally. To perform a complicated table manipulation, break it up into steps, perform each step on a different line, give a new name to each result, and check that each intermediate result is what you expect. You can add any additional names or functions you want to the provided cells. Make sure that you are using distinct and meaningful variable names throughout the notebook.
- **Variable Names.** Throughout this assignment and all future ones, please be sure to not re-assign variables throughout the notebook! _For example, if you use `max_temperature` in your answer to one question, do not reassign it later on. Otherwise, you will fail tests that you thought you were passing previously!_
- **Re-Submitting.** You may [submit](#🏁-Submit-Your-Assignment-to-Canvas) this assignment as many times as you want before the deadline. Your instructor will score the last version you submit once the deadline has passed.


---

## Configure the Notebook

Run the following cell to configure this Notebook.

In [None]:
from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

---

## 📈 Section 1: Leading Causes of Death

The [National Center for Health Statistics (NCHS)](https://www.cdc.gov/nchs/about/index.html) is part of the CDC and serves as the nation's principal health statistics agency. Their data provide insights into the health of people across the United States and how their health changes over time. The following table from the [NCHS Data Brief No. 521](https://www.cdc.gov/nchs/products/databriefs/db521.htm), published in December 2024, shows the leading causes of death in the U.S. for 2023.


| Cause of Death                                      | Number of Deaths per 100,000 U.S. Standard Population |
|-----------------------------------------------------|------------------|
| Heart disease                                      | 162.1          |
| Cancer                                            | 141.8         |
| Unintentional Injuries               | 62.3          |
| Stroke                  | 39         |
| Chronic lower respiratory diseases                 | 33.4         |
| Alzheimer disease                                | 27.7          |
| Diabetes                                          | 22.4          |
| Kidney disease                                    | 13.1 |
| Chronic liver disease and cirrhosis                | 13.0          |
| COVID-19                                          | 11.9        |

From the table, you should see that **the number one cause of death in the U.S. for 2023 was heart disease**. Has this always been the case?

---

### The Data

The CSV file called `causes_of_death.csv` contains the age-adjusted death rates for heart disease, cancer, unintentional injuries, and stroke from 1900 through 2023. Run the following code cell to create the table called `causes_of_death` from the data in `causes_of_death.csv`.

**Note**: An age-adjusted death rate is a mortality rate standardized to a specific age distribution, enabling fair comparisons between populations with different age structures. The `causes_of_death` table presents the number of deaths per 100,000 people in the U.S. standard population.
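
As a rough illustration of the age adjustment described in the note, the sketch below weights age-specific death rates by the share of each age group in a standard population and adds the results. The age groups, rates, and population shares are hypothetical numbers chosen for easy arithmetic; they are not taken from `causes_of_death.csv` or from the NCHS.

```python
# Hypothetical numbers for illustration only; they are NOT from causes_of_death.csv.
# Each entry: (age group, deaths per 100,000 in that group, share of the standard population)
age_groups = [
    ("0-44", 20.0, 0.60),
    ("45-64", 200.0, 0.25),
    ("65+", 1000.0, 0.15),
]

# Weight each age-specific rate by its standard-population share, then add them up.
age_adjusted_rate = sum(rate * share for _, rate, share in age_groups)

print(round(age_adjusted_rate, 1))
```

Because every population is weighted by the same standard shares, two populations with different age structures can be compared on equal footing.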

In [None]:
causes_of_death = Table.read_table('causes_of_death.csv')
causes_of_death

---

### Task 1.1 📍

What year is associated with the maximum age-adjusted death rate for heart disease? Using `causes_of_death` and table methods, assign `max_heart_death_rate_year` to the year (`int`) associated with the highest age-adjusted death rate.

_Points:_ 2

In [None]:
max_heart_death_rate_year = ...
max_heart_death_rate_year

In [None]:
grader.check("task_1_1")

---

### Task 1.2 📍

Soon, we will have you visualize the trend in these death rates, but you will need to re-organize the data first. To prepare for that, create a function called `first` that returns the first element of an array provided as input. For example, `first(make_array(1, 2, 3))` should return `1`.

_Points:_ 2

In [None]:
...

In [None]:
grader.check("task_1_2")

---

### Task 1.3 📍

From `causes_of_death`, create a table called `causes_for_plotting`. 
* The table should have the columns `'Year'`, `'Cancer'`, `'Heart Disease'`, `'Stroke'`, and `'Unintentional Injuries'` presented in that order.
* Each row should contain the year and the relevant age-adjusted death rates for each year.
* The data in the table should be sorted in ascending order based on the years.

The first row should contain the following information:

|Year|Cancer|Heart Disease|Stroke|Unintentional Injuries|
|---:|---:|---:|---:|---:|
|1900|114.8|265.4|244.2|90.3|

_Points:_ 3

In [None]:
causes_for_plotting = ...
causes_for_plotting

In [None]:
grader.check("task_1_3")

---

### Task 1.4 📍🔎

<!-- BEGIN QUESTION -->

Using `causes_for_plotting`, visualize the trend of death rates for each of the 4 leading causes of death by year.

_Points:_ 2

In [None]:
...

# Keep this code to give your image a title
plt.title('Age-Adjusted Death Rates over Time')
plt.show()

<!-- END QUESTION -->

---

### Epidemiological Transition

In many countries like the U.S., infectious diseases have declined over time—aside from pandemics in the early 20th and 21st centuries—while chronic diseases have become more prevalent. This shift, known as the epidemiological transition, highlights the growing burden of conditions like heart disease. 

Epidemiological studies investigate how diseases and other health conditions affect populations by identifying differences between those who develop a condition and those who do not. Originally focused on epidemics, epidemiology evolved to study the natural history of diseases and their contributing factors, with the goal of control and prevention. While it may not always resolve fundamental disease mechanisms, it helps identify key risk factors and their relative importance. Historical successes, such as John Snow’s cholera investigation, demonstrated that disease control is possible without fully understanding the underlying pathology by interrupting causal pathways. By the mid-1940s, epidemiology had proven effective in studying various health conditions, making it a promising approach for cardiovascular disease (CVD) research. 

The decline in heart disease-related deaths over the past half-century is one of public health's greatest achievements. Notably, the Framingham Heart Study was the first formal study to establish associations between heart disease and key risk factors such as smoking, high cholesterol, high blood pressure, obesity, and physical inactivity.  

---

## ❤️ Section 2: The Framingham Heart Study

---

### Background

The [Framingham Heart Study](https://www.framinghamheartstudy.org/) is a landmark study that has greatly advanced our understanding of CVD. In 1947, Dr. Joseph W. Mountin, Assistant Surgeon General, and Drs. Lewis C. Robinson and Gilcin Meadors were sent to organize a heart disease study in cooperation with the Massachusetts State Department of Health, Harvard Medical School's Department of Preventive Medicine, and the Heart Disease Demonstration Section of the United States Public Health Service. Launched in response to the mid-20th century rise in CVD, the study aimed to identify modifiable risk factors through a preventive approach rather than focusing solely on treatment. As public health advancements reduced infectious diseases, CVD emerged as the leading cause of death, yet little was known about its causes or prevention. Recognizing this gap, researchers set out to uncover the determinants of CVD by studying the interplay between host and environmental factors. The Framingham Study became instrumental in shaping modern epidemiology, influencing preventive medicine, and guiding public health policies.



---

### Study Design

The study began in 1948 and 5,209 subjects were initially enrolled in the study. Participants have been examined biennially since the inception of the study and all subjects are continuously followed through regular surveillance for cardiovascular outcomes. Clinic examination data has included cardiovascular disease risk factors and markers of disease such as blood pressure, blood chemistry, lung function, smoking history, health behaviors, ECG tracings, echocardiography, and medication use. Through regular surveillance of area hospitals, participant contact, and death certificates, the Framingham Heart Study reviews and adjudicates events for the occurrence of Angina Pectoris, Myocardial Infarction, Heart Failure, and Cerebrovascular disease.

---

### Task 2.1 📍

Based on the above description, which of the following best describes the study design of the Framingham Heart Study?

1. An observational study
2. A non-randomized, non-controlled experiment
3. A randomized experiment
4. A controlled experiment
5. A randomized, controlled experiment

Assign `framingham_design` to the integer that reflects your choice.

_Points:_ 2

In [None]:
framingham_design = ...
framingham_design

In [None]:
grader.check("task_2_1")

---

### The Data

The following data table sourced from [`framingham.csv`](https://search.r-project.org/CRAN/refmans/riskCommunicator/html/framingham.html) is a subset of the data collected as part of the Framingham study and includes laboratory, clinic, questionnaire, and adjudicated event data on 4,434 participants. Participant clinic data was collected during three examination periods, approximately 6 years apart, from roughly 1956 to 1968. Each participant was followed for a total of 24 years for the outcome of the following events: Angina Pectoris, Myocardial Infarction, Atherothrombotic Infarction or Cerebral Hemorrhage (Stroke) or death. 

In this section of the project, we will have you focus on the variables:
* `'RANDID'`: Unique identification number for each participant
* `'TOTCHOL'`: Serum Total Cholesterol (mg/dL)
* `'ANYCVD'`: Angina Pectoris, Myocardial infarction (Hospitalized and silent or unrecognized), Coronary Insufficiency (Unstable Angina), or Fatal Coronary Heart Disease
    * `0`: Did not occur during followup
    * `1`: Did occur during followup
* `'DEATH'`: Death from any cause
    * `0`: Did not occur during followup
    * `1`: Did occur during followup

You can learn about the variables in the data set at [the Framingham data set website](https://search.r-project.org/CRAN/refmans/riskCommunicator/html/framingham.html).

Run the following code cell to load the data into the table `framingham`. Note that we are excluding subjects who already had heart disease (`'PREVCHD'`).

In [None]:
framingham = (
    Table.read_table('framingham.csv')
    .where('PREVCHD', 0)
    .select('RANDID', 'TOTCHOL', 'ANYCVD', 'DEATH')
)
framingham

---

### Task 2.2 📍

Data sets often contain missing values or errors, requiring cleanup before analysis. Update the `framingham` table by removing any rows with missing values. In this dataset, missing values are represented as `nan`.

**Hint:** Since the data is presented as numerical data types (`int` and `float`), you cannot just search for the string `nan` or the value `np.nan`. Instead, we recommend completing this task either by combining the fact that `np.nan >= 0` is `False` with the `where` table method, or by using the NumPy function `np.isnan`.
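
The behavior behind this hint can be seen on a small made-up list. The sketch below uses Python's built-in `math.isnan` on individual values; `np.isnan` is the NumPy counterpart that works element-wise on arrays. The numbers are invented for illustration and the list only stands in for a numeric column.

```python
import math

missing = float("nan")  # the same kind of value that appears as nan in a numeric column

# Every ordered comparison with nan is False, and nan is not even equal to itself.
print(missing >= 0)        # False
print(missing < 0)         # False
print(missing == missing)  # False

# Toy list standing in for a numeric column (values are made up).
values = [195.0, missing, 250.0, missing, 180.0]

# Approach 1: keep values for which the comparison is True; nan never passes.
kept_by_comparison = [v for v in values if v >= 0]

# Approach 2: explicitly test each value for nan.
kept_by_isnan = [v for v in values if not math.isnan(v)]

print(kept_by_comparison)
print(kept_by_isnan)
```

Note that the comparison approach only works when all valid values are known to satisfy the comparison, as is the case for non-negative measurements.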

_Points:_ 3

In [None]:
framingham = ...
framingham

In [None]:
grader.check("task_2_2")

---

### Task 2.3 📍

Since cholesterol levels fluctuate over time, you will calculate the average cholesterol level for each individual over the given time period. Create a table called `framingham_by_id`, where each row represents a unique individual from the study, with the following four columns (in order):

* `'RANDID'`: Unique identification number for each participant
* `'AVE TOTCHOL'`: Average `'TOTCHOL'` over the data collection period for the individual
* `'ANYCVD'`: Angina Pectoris, Myocardial infarction (Hospitalized and silent or unrecognized), Coronary Insufficiency (Unstable Angina), or Fatal Coronary Heart Disease
    * `0`: Did not occur during the data collection period
    * `1`: Did occur during the data collection period
* `'DEATH'`: Death from any cause
    * `0`: Did not occur during the data collection period
    * `1`: Did occur during the data collection period

_Points:_ 3

In [None]:
framingham_by_id = ...
framingham_by_id

In [None]:
grader.check("task_2_3")

---

### Task 2.4 📍🔎

<!-- BEGIN QUESTION -->

Create overlaid histograms to compare the distribution of average serum total cholesterol values for individuals who developed CVD (`ANYCVD`: `1`) and those who did not (`ANYCVD`: `0`). Set the `unit` parameter to `'mg/dL'` and use `np.arange(min_ave_TOTCHOL, max_ave_TOTCHOL + 1, 25)` for the bins, where `min_ave_TOTCHOL` and `max_ave_TOTCHOL` are the minimum and maximum average cholesterol values.


_Points:_ 3

In [None]:
max_ave_TOTCHOL = ...
min_ave_TOTCHOL = ...
...

# Give a title to the graphic
plt.title('Distribution of Average Serum Total Cholesterol')
plt.show()

<!-- END QUESTION -->

---

### Framingham Heart Study Research Goal

As a student researcher, you are now going to use these data to examine one of the main findings of the Framingham study. The question to be addressed is: Is there an association between average serum cholesterol level (i.e., how much cholesterol is in a person's blood) and whether or not a person develops heart disease during the studied period? For the rest of this section, you will use the following hypotheses:

**Null Hypothesis:** In the population, the distribution of average cholesterol levels among those who get heart disease is the same as the distribution of average cholesterol levels among those who do not.

**Alternative Hypothesis:** The average cholesterol levels of people in the population who get heart disease are higher, on average, than the average cholesterol level of people who do not.

---

### Developing a Test Statistic

With the hypotheses and research goal established, you need to define a test statistic for which larger values point towards the alternative. In this case, the alternative states that the average cholesterol levels of people who get heart disease are higher on average than those of people who do not. As such, it is not enough to measure the magnitude of the difference between average cholesterol levels; the statistic must also preserve the direction implied by the alternative. So, use the test statistic defined as **the average serum total cholesterol of those who developed CVD minus the average serum total cholesterol of those who didn't develop CVD.**
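
To see why direction matters, the sketch below compares a signed difference of group means with an absolute difference on two tiny made-up groups. The values and the helper `mean` are hypothetical and written in plain Python; this is not the function you are asked to define on the project tables.

```python
# Made-up average cholesterol values (mg/dL) for two small groups.
developed_cvd = [250.0, 240.0, 260.0]
no_cvd = [220.0, 230.0, 210.0]

def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

# Signed difference: positive when the CVD group is higher on average.
signed_difference = mean(developed_cvd) - mean(no_cvd)
absolute_difference = abs(signed_difference)

# Swap the groups: the signed statistic flips sign, the absolute one does not.
swapped_signed = mean(no_cvd) - mean(developed_cvd)
swapped_absolute = abs(swapped_signed)

print(signed_difference, swapped_signed)
print(absolute_difference, swapped_absolute)
```

The absolute difference cannot tell the two situations apart, which is why a one-sided alternative calls for the signed statistic.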

---

### Task 2.5 📍

Define a function called `compute_framingham_test_statistic` that computes the test statistic defined above. It should take a table like `framingham_by_id` with four columns, `'RANDID'`, `'AVE TOTCHOL'`, `'ANYCVD'`, and `'DEATH'`, and it should return the value of the test statistic defined above.

Lastly, call the function you defined to compute the observed test statistic based on the `framingham_by_id` table and assign it to `framingham_observed_statistic`.

_Points:_ 4

In [None]:
# Define the function
...

# Call the function
framingham_observed_statistic = ...
framingham_observed_statistic

In [None]:
grader.check("task_2_5")

---

### Task 2.6 📍

Define a function, `simulate_framingham_null`, to generate a test statistic under the null hypothesis. This function should 
* Take a table like `framingham_by_id` with four columns, `'RANDID'`, `'AVE TOTCHOL'`, `'ANYCVD'`, and `'DEATH'` as input.
* Use that table to create one simulated data set assuming the null hypothesis is true.
* Return the corresponding test statistic calculated on the simulated data set.

Lastly, call the function with `framingham_by_id` to generate one statistic and assign it to `one_simulated_test_stat`.
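
One common way to simulate under a null hypothesis of "no difference between groups" is to shuffle the group labels and recompute the statistic. The sketch below shows that idea on made-up data with plain Python lists; the names `difference_of_means` and `one_shuffled_statistic` are hypothetical helpers, and your own solution should work with tables rather than lists.

```python
import random

# Made-up data: one cholesterol value and one 0/1 outcome label per person.
cholesterol = [250.0, 240.0, 260.0, 220.0, 230.0, 210.0]
labels = [1, 1, 1, 0, 0, 0]

def difference_of_means(values, group_labels):
    """Mean of the values labeled 1 minus the mean of the values labeled 0."""
    group_1 = [v for v, g in zip(values, group_labels) if g == 1]
    group_0 = [v for v, g in zip(values, group_labels) if g == 0]
    return sum(group_1) / len(group_1) - sum(group_0) / len(group_0)

def one_shuffled_statistic(values, group_labels, rng):
    """Shuffle a copy of the labels and return the statistic for that shuffle."""
    shuffled = list(group_labels)  # copy so the original labels stay untouched
    rng.shuffle(shuffled)
    return difference_of_means(values, shuffled)

rng = random.Random(0)  # seeded only to make this illustration repeatable
simulated = one_shuffled_statistic(cholesterol, labels, rng)
print(simulated)
```

Shuffling keeps the group sizes fixed while breaking any link between the labels and the values, which is what the null hypothesis describes.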

_Points:_ 3

In [None]:
# Define the function
...

# Call the function
one_simulated_test_stat = ...
one_simulated_test_stat

In [None]:
grader.check("task_2_6")

---

### Task 2.7 📍

Define a function called `simulate_n`. The function should:
* Take an integer `n` and a table `tbl` like `framingham_by_id` as input.
* Run `simulate_framingham_null` `n` times to generate `n` test statistics using data from `tbl` (not `framingham_by_id` directly).
* Return an array of the `n` generated test statistics.

Lastly, call `simulate_n(1_000, framingham_by_id)` and assign `framingham_simulated_stats` to the array resulting from the function call.

**Note**: It should take around 1 minute to run this simulation.

_Points:_ 3

In [None]:
# Define the function
def simulate_n(n, tbl):
    ...

# Call the function
framingham_simulated_stats = ...

In [None]:
grader.check("task_2_7")

---

### Distribution of Test Statistics

Run the following code to generate a histogram showing the distribution of the simulated test statistics in `framingham_simulated_stats` along with the observed test statistic calculated earlier.

In [None]:
Table().with_column('Simulated Test Statistic', framingham_simulated_stats).hist(unit='mg/dL')
plt.text(framingham_observed_statistic, 0.02, 'Observed Test Statistic', 
         color='red', fontsize=10, ha='center')
plt.scatter(framingham_observed_statistic, 0, color='red', marker='^', s=60, zorder=3)
plt.title('Distribution of Test Statistics')
plt.show()

---

### Task 2.8 📍

Compute the p-value for this hypothesis test, and assign it to the name `framingham_p_value`.

**Hint**: One of the key findings of the Framingham study was a strong association between cholesterol levels and heart disease. If your p-value doesn't match up with this finding, you may want to take another look at your test statistic and/or your simulation.
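
As a reminder of the general idea, an empirical p-value is the proportion of simulated statistics that are at least as extreme as the observed one, in the direction of the alternative. The sketch below uses ten made-up simulated values and a made-up observed value, and assumes that larger statistics favor the alternative.

```python
# Made-up simulated statistics and a made-up observed statistic.
simulated_stats = [-2.0, 0.5, 1.0, -1.5, 3.0, 0.0, 2.5, -0.5, 1.5, 4.0]
observed = 2.5

# Count the simulated values at least as large as the observed value.
count_at_least_as_extreme = sum(1 for s in simulated_stats if s >= observed)

# The empirical p-value is that count divided by the number of simulations.
p_value = count_at_least_as_extreme / len(simulated_stats)
print(p_value)
```

With only ten simulated values the estimate is coarse; more repetitions give a finer estimate.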

_Points:_ 2

In [None]:
framingham_p_value = ...
framingham_p_value

In [None]:
grader.check("task_2_8")

---

### Task 2.9 📍

Is the following statement true or false? Based on the results from this hypothesis test, we can say that high cholesterol levels cause CVD. Assign `True` or `False` to `CVD_causation` to answer this question.

_Points:_ 2

In [None]:
CVD_causation = ...
CVD_causation

In [None]:
grader.check("task_2_9")

---

## 🥓 Section 3: National Diet-Heart Study

---

### Background

To establish a causal link between saturated fat intake, serum cholesterol, and heart disease, a group of doctors in the U.S. launched the [National Diet-Heart Study](https://jamanetwork.com/journals/jamainternalmedicine/article-abstract/575481). The study was conducted across six centers: Baltimore, Boston, Chicago, Minneapolis-St. Paul, Oakland, and Faribault, MN. In the first five locations, volunteers from the local population—along with their families—were asked to modify their diets by either increasing or decreasing their saturated fat intake. The sixth center was located at Faribault State Hospital in Faribault, MN, where participants were institutionalized individuals with disabilities or mental illness. This center was led by Dr. Ivan Frantz and was part of the Minnesota Coronary Experiment. A strong advocate for reducing saturated fat to lower the risk of heart disease, Dr. Frantz was so committed to this idea that he maintained a strict low-saturated-fat diet for his own household.

You might already have a sense of what the doctors expected to find, but the trial's results turned out to be more complex than anticipated.

---

### Study Design

In the Faribault institution, the subjects were randomly divided into two equal groups where half of the subjects were fed meals cooked with saturated fats (which was normal at the time) and half were fed meals cooked with polyunsaturated fats.

---

### Task 3.1 📍

Based on the above description, which of the following best describes the study design of the National Diet-Heart Study?

1. An observational study
2. A non-randomized, non-controlled experiment
3. A randomized experiment
4. A controlled experiment
5. A randomized, controlled experiment

Assign `nation_diet_heart_design` to the integer that reflects your choice.

_Points:_ 2

In [None]:
nation_diet_heart_design = ...
nation_diet_heart_design

In [None]:
grader.check("task_3_1")

---

### Informed Consent

Although standards for informed consent in participation weren't as strict then as they are today, the study was described as follows:

> No consent forms were required because the study diets were considered to be acceptable as house diets and the testing was considered to contribute to better patient care.  Prior to beginning the diet phase, the project was explained and sample foods were served. Residents were given the opportunity to decline participation.

Despite the level of detail and effort in the study, its results were not extensively examined until the early 21st century. Over 40 years after the data were collected, Dr. Christopher Ramsden heard about the experiment, and asked Dr. Frantz's son Robert to uncover the files in the Frantz family home's dusty basement. You can learn more about the story of how the data was recovered on the [Revisionist History podcast](http://revisionisthistory.com/episodes/20-the-basement-tapes) or in [Scientific American magazine](https://www.scientificamerican.com/article/records-found-in-dusty-basement-undermine-decades-of-dietary-advice/).

---

### Accountability

In recent years, poor treatment of patients at Faribault State Hospital (and other similar institutions in Minnesota) has come to light: the state has recently [changed patients' gravestones from numbers to their actual names](https://www.tcdailyplanet.net/minnesota-saying-sorry-treatment-persons-disabilities/), and [apologized for inhumane treatment of patients](https://www.tcdailyplanet.net/minnesota-saying-sorry-treatment-persons-disabilities/).

---

### The Data

Unfortunately, the data for each individual in the 1968 study is not available; only summary statistics are available. Run the following code cell to load the summarized data table.

In [None]:
national_diet_heart = Table.read_table('national_diet_heart.csv')
national_diet_heart

`national_diet_heart` is a table with four columns: `'Age'`, `'Diet'`, `'Participated'`, and `'Died'`. Each row represents one patient and records their age group and diet, a `True` or `False` in the `'Died'` column indicating whether they had died by the time of the research reporting, and a `True` in the `'Participated'` column (since everyone participated in the experiment).

---

### National Diet-Heart Study Research Goal

As a student researcher, your goal now is to use the data in the `national_diet_heart` table along with methods from our course to **determine whether or not CVD death rates were reduced for people on a low saturated fat diet.** For this section, base your work on the following hypotheses:

**Null Hypothesis**: The death rates associated with a polyunsaturated fat diet and a saturated fat diet are the same. Any difference in death rates is due to chance.

**Alternative Hypothesis**: The death rates associated with a polyunsaturated fat diet and a saturated fat diet are different.

---

### Developing a Test Statistic

### Task 3.2 📍

Create a table named `summed_data`, with three columns and two rows. The three columns should be `'Diet'`, `'Death Total'`, and `'Participation Total'`. There should be one row for the polyunsaturated diet group and one row for the saturated diet group, and each row should encode the total number of people who participated in that group and the total number of people who died in that group. 

_Points:_ 3

In [None]:
summed_data = ...
summed_data

In [None]:
grader.check("task_3_2")

---

#### Hazard Rate

Raw death totals alone do not account for how many people participated in each group. To address this, you will use the absolute difference in hazard rates between the two groups as your test statistic. **The *hazard rate* is defined as the proportion of people who died in a specific group out of the total number who participated in the study from that group.**
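
The arithmetic behind this definition can be sketched with hypothetical counts. The group names and numbers below are invented for illustration and are not the values in `national_diet_heart.csv`.

```python
# Hypothetical counts for two groups; NOT the study's actual numbers.
groups = {
    "Group A": {"died": 30, "participated": 1000},
    "Group B": {"died": 45, "participated": 900},
}

# Hazard rate = deaths in the group / participants in the group.
hazard_rates = {
    name: counts["died"] / counts["participated"]
    for name, counts in groups.items()
}

# Absolute difference between the two hazard rates.
absolute_difference = abs(hazard_rates["Group A"] - hazard_rates["Group B"])

print(hazard_rates)
print(round(absolute_difference, 4))
```

Dividing by the number of participants puts groups of different sizes on the same scale, which a comparison of raw death counts would not do.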

---

### Task 3.3 📍

Define a new table `summed_hazard_data` that contains the columns of `summed_data` along with an additional column, `'Hazard Rate'`, that contains the hazard rates for each condition.

_Points:_ 3

In [None]:
summed_hazard_data = ...
summed_hazard_data

In [None]:
grader.check("task_3_3")

---

### Task 3.4 📍

Define the function `compute_death_rate_test_statistic`, which takes in a table like `national_diet_heart` (with the same column labels) and returns the absolute difference between the hazard rates of the two diet groups. Then, call the function to calculate the observed test statistic and assign it to `observed_death_rate_statistic`.

_Points:_ 3

In [None]:
# Define the function
...

# Calculate the observed test statistic
observed_death_rate_statistic = ...
observed_death_rate_statistic

In [None]:
grader.check("task_3_4")

---

### Task 3.5 📍

We are now ready to perform a hypothesis test to determine whether the observed difference in death rates between the two diet groups is statistically significant.  

Define a function, `complete_test`, which takes in `tbl`, a table structured like `national_diet_heart`. The function should:  

1. **Only use the data in `tbl`** and should **not reference `national_diet_heart` directly**. 
2. Simulate 100 samples under the **null hypothesis**, where death rates are **randomly shuffled** between the two groups.  
3. Compute the **absolute difference in death rates** (the test statistic defined above) for each simulated sample.  
4. Use these simulated differences to calculate a **p-value** based on our observed data.
5. Return the p-value. 

Lastly, assign `national_diet_heart_p_value` to the p-value generated by `complete_test` when applied to the `national_diet_heart` table.

**Hint**: This problem involves multiple steps. To make implementation easier:  
* Outline the necessary steps before coding.  
* Work incrementally, testing small parts of your function as you go.
* Utilize work you've already done such as using functions you've already created.
* Use comments to keep track of each step.  

**Note:** Your function may take a few minutes to run.

_Points:_ 4

In [None]:
# Define the function
def complete_test(tbl):
    ...

# Call the function
national_diet_heart_p_value = ...
national_diet_heart_p_value

In [None]:
grader.check("task_3_5")

---

### Task 3.6 📍

Using the `national_diet_heart_p_value` above, can you conclude that the change in diet causes a difference in death rate? Assume a conventional p-value cutoff of 0.05. Set `reject_national_diet_heart_null` to `True` if we reject the null hypothesis and `False` if we fail to reject the null hypothesis.

_Points:_ 2

In [None]:
reject_national_diet_heart_null = ...
reject_national_diet_heart_null

In [None]:
grader.check("task_3_6")

---

## 🎉 Reflection

Great work making it through this project! You've just used many of the skills and concepts from the course to study the leading causes of death in the U.S. over more than a century and replicated two major studies surrounding cardiovascular disease. Step back and think about how far you've come since day one of this course!

---

## 🏁 Submit Your Assignment to Canvas

Follow these steps to submit your project assignment:

1. **Review the Rubric:** View the rubric on the associated Canvas Assignment page to understand the scoring criteria.
2. **Run the Auto-Grader:** Ensure you have executed the code cell containing the command `grader.check_all()` to run all tests for auto-graded tasks marked with 📍. This command will execute all auto-grader tests sequentially.
3. **Complete Manually Graded Tasks:** Verify that you have responded to all the manually graded tasks marked with 📍🔎.
4. **Save Your Work:** In the notebook's Toolbar, go to `File -> Save Notebook` to save your work and create a checkpoint.
5. **Download the Notebook:** In the notebook's Toolbar, go to `File -> Download IPYNB` to download the notebook (`.ipynb`) file.
6. **Upload to Canvas:** On the Canvas Assignment page, click "Start Assignment" or "New Attempt" to upload the downloaded `.ipynb` file.

---

## Attribution

This content is licensed under the <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)</a> and derived from the <a href="https://www.data8.org/">Data 8: The Foundations of Data Science</a> offered by the University of California, Berkeley.

<img src="./by-nc-sa.png" width=100px>

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()