# COGS 108 - Data Checkpoint

## Authors

- Hannah Daniel: Conceptualization, background research, writing
- Isaac Cordova: Conceptualization, data curation, analysis
- Evenie Osorio: Data curation, analysis, visualization
- Evelyn Cobian: Analysis, visualization, writing
- Deandre Juguilon: Project administration, writing, coordination

## Research Question

How well do diagnosed depression rates predict obesity rates in the 50 U.S. states?

## Background and Prior Work

Obesity and depression are both major public health issues in the United States, and rates of both vary substantially from state to state. Obesity rates have increased over time, while depression is one of the most commonly diagnosed mental health conditions among adults. Because both conditions are influenced by factors such as income, access to healthcare, and lifestyle, it is reasonable to expect that they are related at the population level.

Previous research has examined the relationship between obesity and depression mostly at the individual level. Many studies have found that people with a higher body mass index (BMI) are more likely to report symptoms of depression. Some research also suggests the relationship is bidirectional: obesity can increase the risk of depression, and depression can contribute to weight gain through behavioral or biological changes.

Researchers have also used geographic data to study health trends across the United States. Public health datasets, such as those collected by the CDC, allow researchers to compare health outcomes across states and identify regional patterns. These datasets are commonly used to examine differences in physical and mental health at the population level.

While the link between obesity and depression has been studied at the individual level, fewer studies examine how the two variables relate to each other at the state level. This project explores whether U.S. states with higher obesity rates also tend to report higher rates of diagnosed depression. Examining this relationship at the state level may provide insight into broader public health trends and help guide future research.

## Hypothesis

We hypothesize that U.S. states with higher obesity rates will also tend to report higher rates of diagnosed depression. This is based on previous research showing a connection between obesity and depression at the individual level, as well as shared factors such as socioeconomic conditions and access to healthcare. We expect to see a positive association between these two variables across states. Because this analysis is correlational, a positive association would not by itself show that either condition causes the other.

Additional factors may also link these variables, such as stigma around mental health leading to unhealthy coping mechanisms, or societal attitudes toward overweight people. This project is limited to quantifying the strength of the state-level relationship.

## Data

### Data overview

For this project, we are using the following datasets:

- **Dataset #1: CDC BRFSS Obesity Prevalence**
  - **Link:** [CDC Data Portal](https://data.cdc.gov/)
  - **Observations:** 50 states + DC
  - **Variables:** State name, Obesity percentage (BMI ≥ 30.0)

- **Dataset #2: CDC BRFSS Depression Prevalence**
  - **Link:** [CDC Data Portal](https://data.cdc.gov/)
  - **Observations:** 50 states + DC
  - **Variables:** State name, Percentage of adults ever told they have a depressive disorder

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
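import sys

import pandas as pd

# get_data is loaded from ./modules, so that directory must be on the
# import path before the import below.
sys.path.append('./modules')
import get_data

# Placeholder URLs: replace with the actual CSV download links.
datafiles = [
    {'url': 'REPLACE_WITH_OBESITY_CSV_URL', 'filename': 'obesity_data.csv'},
    {'url': 'REPLACE_WITH_DEPRESSION_CSV_URL', 'filename': 'depression_data.csv'},
]

# Uncomment once the URLs above are filled in.
# get_data.get_raw(datafiles, destination_directory='data/00-raw/')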
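
Once both files are available, the two tables need to be joined on state name before any analysis. The sketch below shows one way this might look. The column names (`state`, `obesity_pct`, `depression_pct`) and the inline CSV contents are assumptions made for illustration; the real CDC exports will likely use different column names and the values shown are placeholders, not real figures.

```python
import io

import pandas as pd

# Placeholder CSV contents for illustration only; NOT real CDC figures.
# Column names are assumptions; the real exports may differ.
obesity_csv = io.StringIO(
    "state,obesity_pct\n"
    "Alabama,30.0\n"
    "Alaska,31.0\n"
    "District of Columbia,24.0\n"
)
depression_csv = io.StringIO(
    "state,depression_pct\n"
    "Alabama,20.0\n"
    "Alaska,18.0\n"
    "District of Columbia,19.0\n"
)

obesity = pd.read_csv(obesity_csv)
depression = pd.read_csv(depression_csv)

# validate="one_to_one" raises an error if a state appears twice in either table.
merged = obesity.merge(depression, on="state", how="inner", validate="one_to_one")

# Every state in one table should have a match in the other.
assert len(merged) == len(obesity) == len(depression)

# Both variables are percentages, so they should fall between 0 and 100.
pct_cols = ["obesity_pct", "depression_pct"]
assert ((merged[pct_cols] >= 0) & (merged[pct_cols] <= 100)).all().all()

# Optional: the research question names the 50 states, so DC can be dropped.
states_only = merged[merged["state"] != "District of Columbia"]
print(states_only)
```

An inner join with the row-count check makes mismatched state names (for example, differing spellings or abbreviations) fail loudly instead of silently dropping states. Whether to keep or drop DC is a design choice that should match the wording of the research question.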

## Ethics 

### A. Data Collection
 - [x] **A.1 Informed consent**: Participants gave informed consent when the data was originally collected by the CDC.
 - [x] **A.2 Collection bias**: Both measures come from self-reported survey responses, which may introduce reporting bias.
 - [x] **A.3 Limit PII exposure**: Data is aggregated at the state level; no individual PII is used.

*(Additional ethics sections omitted for brevity in this fix)*

## Project Timeline Proposal

| Meeting Date | Meeting Time | Completed Before Meeting | Discuss at Meeting |
|-------------|--------------|--------------------------|--------------------|
| 1/30 | 11:59 PM | Brainstorming | Finalize topic |
| 2/4 | 11:59 PM | Ethics | Submit Proposal |
| 2/18 | 11:59 PM | Data Cleaning | Submit Checkpoint |
| 3/4 | 11:59 PM | EDA | Discuss patterns |
| 3/13 | 11:59 PM | Results draft | Refine visuals |
| 3/18 | 11:59 PM | Final edits | Submit Project |