William Bo Individual Project Proposal

11/04/2024

STA130


Requested Group: William Bo, Leo Liu, Ivanna Garcia, Alexey Albert

**PROPOSAL 1:**  

# Project Proposal: COVID-19 Vaccination and Life Satisfaction

## Research Question

**Is there a relationship between an individual's COVID-19 vaccination status and their reported level of life satisfaction?**

- **Purpose:** To investigate the association between COVID-19 vaccination status and life satisfaction using categorical data analysis techniques. This analysis aims to identify potential psychological and social benefits associated with vaccination, offering insight into how vaccination status relates to subjective well-being.
- **Significance:** Understanding this relationship could support public health efforts by highlighting the well-being benefits of vaccination, which may encourage higher vaccination uptake.

## Variables

### Outcome Variable
- **Variable Name:** `WELLNESS_life_satisfaction`
- **Description:** Represents an individual’s self-reported life satisfaction, originally measured on a scale from 1 to 10.
- **Data Type:** Numeric (`float64`). For analysis, the variable will be categorized as:
  - **Low** (1–3),
  - **Moderate** (4–7),
  - **High** (8–10).
- **Reason for Selection:** Life satisfaction is a central measure of well-being, making it ideal for assessing the subjective impact of vaccination. Grouping this variable into categories enhances interpretability across vaccination groups, facilitating categorical data analysis.

### Predictor Variable
- **Variable Name:** `COVID_vaccinated`
- **Description:** Indicates the individual’s COVID-19 vaccination status, specifying the number of doses received.
- **Categories:** "No" (0 doses), "Yes, one dose" (1 dose), "Yes, two doses" (2 doses), "Yes, three or more doses" (3+ doses).
- **Reason for Categorization:**  
  Treating vaccination status as a categorical variable, rather than as a numeric count of doses, is necessary because the top category, "three or more doses," is open-ended and does not correspond to a single number. Keeping the four levels as categories represents each group accurately without assigning an arbitrary value to the top one.
- **Reason for Selection:** Vaccination status may influence well-being by providing psychological reassurance or enhancing social interactions. Using it as a categorical predictor facilitates an investigation into potential links with life satisfaction.

## Data Cleaning and Preparation

1. **Handling Missing Values**: Any missing values in `WELLNESS_life_satisfaction` or `COVID_vaccinated` will be addressed. Records with missing data in these variables may either be removed if the proportion is minimal or imputed if appropriate, depending on the dataset size and distribution.
   
2. **Categorizing Life Satisfaction**: Convert `WELLNESS_life_satisfaction` from a numeric to categorical variable by assigning values of 1–3 as "Low," 4–7 as "Moderate," and 8–10 as "High." This transformation is essential to align the variable with categorical analysis methods.
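The two cleaning steps above can be sketched in pandas as follows. This is a minimal illustration on a small hypothetical DataFrame, assuming the survey has been loaded into pandas with the column names used in this proposal; `life_sat_cat` is a hypothetical name for the new column, and dropping rows (rather than imputing) is shown only as the simpler of the two options.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for the real survey data.
df = pd.DataFrame({
    "WELLNESS_life_satisfaction": [2.0, 5.0, 8.0, np.nan, 10.0, 4.0, 7.0, 1.0],
    "COVID_vaccinated": ["No", "Yes, two doses", "Yes, three or more doses",
                         "Yes, one dose", None, "No", "Yes, two doses", "Yes, one dose"],
})

# Step 1: complete-case analysis, dropping rows missing either variable.
cols = ["WELLNESS_life_satisfaction", "COVID_vaccinated"]
n_before = len(df)
clean = df.dropna(subset=cols).copy()
print(f"Dropped {n_before - len(clean)} of {n_before} rows with missing values")

# Step 2: bin the 1-10 score. pd.cut uses right-closed intervals by default,
# so (0, 3] -> Low, (3, 7] -> Moderate, (7, 10] -> High.
clean["life_sat_cat"] = pd.cut(
    clean["WELLNESS_life_satisfaction"],
    bins=[0, 3, 7, 10],
    labels=["Low", "Moderate", "High"],
)
print(clean[["WELLNESS_life_satisfaction", "life_sat_cat"]])
```

Because `pd.cut` returns an ordered categorical, later tables and charts keep the Low, Moderate, High order automatically.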

## Visualization and Summary Statistics

### Visualizations for Each Variable

1. **Outcome Variable (`WELLNESS_life_satisfaction`)**:
   - **Visualization**: A **bar chart** of life satisfaction categories (Low, Moderate, High).
   - **Purpose**: To show the distribution of life satisfaction levels across the sample.
   - **Justification**: This visualization provides an overview of the frequency of each life satisfaction category, offering insights into the general well-being reported by the sample.

2. **Predictor Variable (`COVID_vaccinated`)**:
   - **Visualization**: A **bar chart** of vaccination status categories (No doses, One dose, Two doses, Three or more doses).
   - **Purpose**: To display the distribution of vaccination statuses within the sample.
   - **Justification**: This bar chart reveals how vaccination uptake is distributed across categories, which helps in understanding the sample's vaccination profile.

### Summary Statistics for Both Variables

1. **Life Satisfaction Levels**:
   - **Metrics**: Calculate the proportion of individuals in each life satisfaction category (Low, Moderate, High).
   - **Purpose**: To provide a numerical summary of life satisfaction levels, which complements the visual insights.

2. **Vaccination Status**:
   - **Metrics**: Calculate the proportion of individuals in each vaccination status category.
   - **Purpose**: To provide a clear view of the sample's vaccination distribution, supporting the analysis by confirming how many individuals fall into each category.
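One way to compute these proportions is sketched below, assuming the cleaned data sit in a pandas DataFrame; the toy rows and the `life_sat_cat` column name are hypothetical. Reindexing to a fixed order keeps the summaries in the same order as the bar charts and reports an empty category as 0 instead of omitting it.

```python
import pandas as pd

# Hypothetical cleaned data (after dropping missing values and binning).
clean = pd.DataFrame({
    "life_sat_cat": ["Low", "Moderate", "High", "Moderate",
                     "High", "High", "Moderate", "Moderate"],
    "COVID_vaccinated": ["No", "Yes, two doses", "Yes, three or more doses", "Yes, two doses",
                         "Yes, one dose", "Yes, three or more doses", "No", "Yes, two doses"],
})

sat_order = ["Low", "Moderate", "High"]
vax_order = ["No", "Yes, one dose", "Yes, two doses", "Yes, three or more doses"]

def proportions(series, order):
    """Share of respondents in each category, listed in a fixed display order.
    Categories with no respondents are reported as 0 rather than dropped."""
    return series.value_counts(normalize=True).reindex(order, fill_value=0.0)

sat_props = proportions(clean["life_sat_cat"], sat_order)
vax_props = proportions(clean["COVID_vaccinated"], vax_order)
print(sat_props.round(3))
print(vax_props.round(3))
```

Calling `.plot.bar()` on either result (which requires matplotlib) draws the corresponding bar chart described above.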

## Analysis Method

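The primary method selected for analysis is **Fisher’s Exact Test**, which suits this dataset because it does not rely on a large-sample approximation and remains valid when sample sizes are small or some categories are sparsely populated. Because the contingency table is 4 × 3 (four vaccination categories by three life satisfaction levels), the test applied is the extension of Fisher’s test to tables larger than 2 × 2, often called the Freeman–Halton test. This test will assess whether there is a statistically significant association between COVID-19 vaccination status and life satisfaction.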

1. **Fisher’s Exact Test (Primary Test)**:
   - **Objective**: To assess whether the distribution of life satisfaction levels differs significantly across vaccination status categories.
   - **Procedure**: Fisher’s Exact Test calculates an exact p-value based on the observed frequencies in each combination of vaccination status and life satisfaction category, thus bypassing the assumptions required by large-sample tests.
   - **Hypotheses**:
     - **Null Hypothesis (H₀)**: Vaccination status and life satisfaction levels are independent, indicating no association.
     - **Alternative Hypothesis (H₁)**: There is an association between vaccination status and life satisfaction levels.
   - **Interpretation**: A p-value below the significance level (e.g., 0.05) would suggest rejecting the null hypothesis, implying an association between vaccination status and life satisfaction. If the p-value is above 0.05, the null hypothesis cannot be rejected, indicating insufficient evidence for an association.

2. **Chi-Squared Test of Independence (Secondary Option)**:
   - **Purpose**: The chi-squared test is considered if sample sizes are sufficiently large (expected counts ≥ 5 in each cell).
   - **Procedure**: This test compares observed and expected frequencies to evaluate if there is a statistically significant association.
   - **Assumptions**: Requires that expected cell frequencies meet the threshold. If assumptions are not met, Fisher’s Exact Test will be preferred for accurate results.
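Because an exact test on a table larger than 2 × 2 can be expensive to compute, one practical substitute is a Monte Carlo permutation test. The sketch below is that swapped-in technique, not Fisher’s test itself: shuffling the life satisfaction labels holds both sets of margins fixed, which matches the conditioning of Fisher’s test, but it ranks tables by the Pearson chi-squared statistic rather than by table probability, and its p-value is an estimate whose precision depends on `n_perm`. The function names, toy data, and seed are hypothetical; `scipy.stats.chi2_contingency` is used for the expected-count check that decides whether the asymptotic chi-squared p-value is also appropriate.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def contingency_table(x_codes, y_codes, n_x, n_y):
    """Count table with rows = levels of x and columns = levels of y."""
    flat = np.bincount(x_codes * n_y + y_codes, minlength=n_x * n_y)
    return flat.reshape(n_x, n_y)

def pearson_chi2(table):
    """Pearson chi-squared statistic: sum of (observed - expected)^2 / expected."""
    table = table.astype(float)
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())

def permutation_independence_test(x, y, n_perm=5000, seed=0):
    """Monte Carlo permutation test of independence for two categorical variables.
    Shuffling y keeps both margins fixed, so the null tables are drawn from the
    same conditional distribution that Fisher/Freeman-Halton enumerates exactly."""
    x_codes, x_levels = pd.factorize(pd.Series(x))
    y_codes, y_levels = pd.factorize(pd.Series(y))
    n_x, n_y = len(x_levels), len(y_levels)
    observed = contingency_table(x_codes, y_codes, n_x, n_y)
    obs_stat = pearson_chi2(observed)

    rng = np.random.default_rng(seed)
    n_extreme = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(y_codes)
        if pearson_chi2(contingency_table(x_codes, shuffled, n_x, n_y)) >= obs_stat - 1e-9:
            n_extreme += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    p_value = (n_extreme + 1) / (n_perm + 1)
    return obs_stat, p_value, observed

# Hypothetical toy data: 60 respondents drawn independently (no real association).
rng = np.random.default_rng(1)
vax = rng.choice(["No", "Yes, one dose", "Yes, two doses", "Yes, three or more doses"], size=60)
sat = rng.choice(["Low", "Moderate", "High"], size=60)

stat, p_perm, observed = permutation_independence_test(vax, sat, n_perm=2000)
chi2_stat, p_chi2, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi2 = {stat:.3f}, permutation p = {p_perm:.3f}")
if expected.min() >= 5:
    print(f"All expected counts >= 5; asymptotic chi-squared p = {p_chi2:.3f}")
else:
    print(f"Smallest expected count {expected.min():.2f} < 5; report the permutation p-value")
```

Increasing `n_perm` narrows the Monte Carlo error of the p-value at the cost of run time; fixing `seed` makes the result reproducible.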

## Hypothesized Results and Discussion

**Hypothesis:** Vaccinated individuals are hypothesized to report higher levels of life satisfaction compared to unvaccinated individuals. This hypothesis is grounded in the expectation that vaccination may reduce anxiety related to COVID-19 and encourage social participation, both of which could positively impact life satisfaction.

### Relevance
The findings will provide insights into the psychological and social implications of vaccination, which may be valuable for public health campaigns. An observed association would support messaging that emphasizes the broader well-being benefits of vaccination.

### Limitations
1. **Categorization of Life Satisfaction**: Collapsing the 1–10 scale into three categories reduces sensitivity, because distinctions within each band (for example, between a 4 and a 7) are lost, and the choice of cut-points is somewhat arbitrary.
2. **Cross-Sectional Design**: This study’s design limits causal interpretation, as it captures data at a single point in time without tracking changes over time.
3. **Confounding Variables**: Factors such as socioeconomic status, general health, or personality traits may influence both vaccination status and life satisfaction. These confounders are not controlled for in this analysis, which may affect the observed association.

By applying Fisher’s Exact Test as the primary method and considering the Chi-Squared Test for larger samples, this proposal provides a structured and interpretable approach to examining the potential link between COVID-19 vaccination and life satisfaction.


**PROPOSAL 2:**

# Project Proposal 2: COVID-19 Prevention Practices and Social Connections

## Research Question

**Is there a relationship between the extent to which individuals follow COVID-19 handwashing practices and the number of close friends they have?**

- **Purpose:** To assess whether adherence to COVID-19 handwashing guidelines is associated with social connectedness, as measured by the number of close friends.
- **Significance:** Insights from this analysis may shed light on how health practices during a pandemic correlate with social behaviors, potentially guiding public health communication strategies.

## Variables

### Predictor Variable: `COVID_prevention_hand_washing`
- **Description:** This variable captures an individual’s self-reported adherence to COVID-19 handwashing practices.
- **Unique Values:** "Very closely," "Somewhat closely," "Not at all."
- **Data Type:** Categorical (Ordinal).
- **Interpretation:** Varying levels of adherence to handwashing guidelines reflect differences in health-conscious behavior.
- **Reason for Selection:** Handwashing is a key preventive measure during COVID-19. Examining its association with social behaviors could help reveal whether health consciousness is related to social habits, offering insight into how pandemic health practices intersect with social connectivity.

### Outcome Variable: `CONNECTION_social_num_close_friends_grouped`
- **Description:** This variable categorizes the number of close friends an individual reports having.
- **Unique Values:** "5 or more," "3–4," "1–2."
- **Data Type:** Categorical (Ordinal).
- **Interpretation:** Reflects an individual’s social connectivity, with higher categories indicating larger social circles.
- **Reason for Selection:** The number of close friends is a useful indicator of social connectedness. Investigating its relationship with health-conscious behaviors, such as handwashing, could provide insights into how pandemic-related practices relate to social relationships.

## Data Cleaning and Preparation

1. **Handling Missing Values:** Missing values in `COVID_prevention_hand_washing` and `CONNECTION_social_num_close_friends_grouped` will be addressed. Records with missing data may be excluded if they represent a small proportion of the dataset; otherwise, suitable imputation methods will be considered.

2. **Standardizing Categories:** Ensure consistency in categorical responses for both variables. For instance, responses for `COVID_prevention_hand_washing` (e.g., "Very closely") should be checked for variations in spelling or punctuation to ensure data consistency.

3. **Encoding Variables:** Encode ordinal categories numerically to prepare the data for statistical analysis. For example, encode "Very closely" as 3, "Somewhat closely" as 2, and "Not at all" as 1. This preserves the ordinal structure for tables, charts, and any order-aware comparisons; note that the Chi-Squared Test itself treats the categories as unordered, so its result does not depend on these codes.
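A minimal sketch of the standardization and encoding steps, assuming pandas and some hypothetical raw variants of the labels (extra spaces, inconsistent capitalization, a trailing period). The cleaning rules in the hypothetical `standardize` helper are illustrative rather than a fixed specification; any label that still does not match after cleaning becomes a missing value so it can be inspected instead of silently miscoded.

```python
import pandas as pd

# Hypothetical raw responses with the kinds of inconsistencies described above.
raw = pd.Series([" Very closely", "somewhat closely", "Not at all",
                 "Very Closely.", "Somewhat closely", None])

def standardize(s):
    """Trim whitespace, drop trailing periods, and convert to sentence case."""
    return s.str.strip().str.rstrip(".").str.capitalize()

# Ordinal codes from the encoding scheme above.
HANDWASH_CODES = {"Not at all": 1, "Somewhat closely": 2, "Very closely": 3}

clean = standardize(raw)
codes = clean.map(HANDWASH_CODES)  # unmatched or missing labels become NaN
print(pd.DataFrame({"raw": raw, "clean": clean, "code": codes}))
```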

## Visualization and Summary Statistics

### Visualizations and Summary Statistics for Each Variable

1. **Predictor Variable (`COVID_prevention_hand_washing`):**
   - **Visualization:** A **bar chart** displaying the distribution of handwashing adherence levels.
   - **Purpose:** This chart illustrates the frequency of each adherence level, providing a snapshot of health-conscious behavior across the sample.
   - **Summary Statistics:** Calculate the proportion of individuals within each adherence level. This provides a numerical summary that supports the visualization, offering a clear view of adherence patterns.

2. **Outcome Variable (`CONNECTION_social_num_close_friends_grouped`):**
   - **Visualization:** A **bar chart** showing the distribution of the number of close friends.
   - **Purpose:** This chart shows the distribution of social connectivity across the sample, illustrating the prevalence of different friend group sizes.
   - **Summary Statistics:** Calculate the proportion of individuals in each category of social connectivity. This provides a clear breakdown of social circles within the sample, setting the stage for further comparison across handwashing adherence levels.
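To set up the comparison across adherence levels, the outcome proportions can also be computed within each handwashing category. The sketch below assumes pandas and uses hypothetical toy rows; `normalize="index"` in `pd.crosstab` makes each row sum to 1, so each row shows the distribution of friend-group sizes for one adherence level.

```python
import pandas as pd

# Hypothetical cleaned data; category labels follow the proposal.
df = pd.DataFrame({
    "COVID_prevention_hand_washing": ["Very closely", "Very closely", "Somewhat closely",
                                      "Not at all", "Very closely", "Somewhat closely",
                                      "Not at all", "Very closely"],
    "CONNECTION_social_num_close_friends_grouped": ["5 or more", "3–4", "1–2", "1–2",
                                                    "3–4", "5 or more", "3–4", "1–2"],
})

hand_order = ["Not at all", "Somewhat closely", "Very closely"]
friend_order = ["1–2", "3–4", "5 or more"]

# Rows: adherence level; columns: friend-group size. Each row sums to 1.
row_props = pd.crosstab(
    df["COVID_prevention_hand_washing"],
    df["CONNECTION_social_num_close_friends_grouped"],
    normalize="index",
).reindex(index=hand_order, columns=friend_order, fill_value=0.0)
print(row_props.round(2))
```

If these row distributions look similar across adherence levels, that is informally consistent with independence; the formal test follows in the Analysis Method section.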

## Analysis Method

The primary method for examining the association between handwashing practices and social connectivity is the **Chi-Squared Test of Independence**. This method evaluates whether the distribution of one categorical variable differs significantly across the levels of another categorical variable.

1. **Chi-Squared Test of Independence (Primary Test):**
   - **Objective:** Assess whether COVID-19 handwashing adherence levels are associated with the number of close friends.
   - **Procedure:** The test compares the observed distribution of social connectivity levels across each category of handwashing adherence to the expected distribution if there were no association.
   - **Hypotheses:**
     - **Null Hypothesis (H₀):** Handwashing adherence and social connectivity are independent (i.e., there is no association).
     - **Alternative Hypothesis (H₁):** There is an association between handwashing adherence and social connectivity.
   - **Interpretation:** A p-value below 0.05 would suggest rejecting the null hypothesis, indicating that social connectivity varies with handwashing adherence. A p-value above 0.05 would mean there is insufficient evidence to reject the null; this does not by itself show that the two variables are independent.
   - **Assumptions:** This test requires that expected cell frequencies are at least 5. If any expected cell frequency falls below this threshold, an alternative test is required.

2. **Fisher’s Exact Test (Secondary Option):**
   - **Purpose:** Fisher’s Exact Test is considered when the Chi-Squared Test assumptions are not met (e.g., if any cell has an expected frequency below 5).
   - **Procedure:** This test calculates an exact p-value, which is especially useful for small sample sizes or sparse data.
   - **Interpretation:** Similar to the Chi-Squared Test, a significant p-value would indicate a relationship between handwashing adherence and social connectivity.
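The Chi-Squared procedure can be made concrete by computing the statistic by hand. In the sketch below the 3 × 3 table of counts is hypothetical. The expected count for each cell is its row total times its column total divided by the grand total, the statistic is the sum of (O - E)² / E over all cells, and the p-value comes from a chi-squared distribution with (rows - 1) × (columns - 1) degrees of freedom. The result is cross-checked against `scipy.stats.chi2_contingency`, which applies its continuity correction only when there is one degree of freedom.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Hypothetical counts: rows = handwashing adherence (Not at all, Somewhat closely,
# Very closely); columns = close friends (1–2, 3–4, 5 or more).
observed = np.array([
    [12, 10,  8],
    [20, 25, 25],
    [30, 45, 55],
])

n = observed.sum()
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
# Expected count under independence: (row total * column total) / grand total.
expected = row_totals * col_totals / n
stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = chi2.sf(stat, dof)

print(f"min expected count = {expected.min():.2f}")
print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p_value:.4f}")

# Cross-check against SciPy (no continuity correction is applied when dof > 1).
stat_sp, p_sp, dof_sp, expected_sp = chi2_contingency(observed)
print(f"scipy: chi2 = {stat_sp:.3f}, p = {p_sp:.4f}")
```

Printing `expected.min()` first makes the "expected counts at least 5" check explicit before the chi-squared p-value is reported.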

## Hypotheses, Expected Results, and Discussion

### Hypotheses
- **Null Hypothesis (H₀):** There is no association between COVID-19 handwashing adherence and the number of close friends.
- **Alternative Hypothesis (H₁):** There is an association between COVID-19 handwashing adherence and the number of close friends.

### Expected Results
- **Test Output:**
   - **P-Value:** A p-value below 0.05 would indicate a statistically significant association between handwashing adherence and social connectivity.
   - **Interpretation:** A significant result would suggest that levels of social connectivity vary with adherence to handwashing practices, hinting at a possible link between health behaviors and social habits.

### Discussion

**Interpretation:** If significant, the findings could suggest that those who adhere more closely to handwashing guidelines tend to have different patterns of social connectivity. Such an association might be relevant for public health strategies aimed at balancing health practices with social well-being.

**Limitations:** This study relies on self-reported data, which may be subject to bias. Additionally, a cross-sectional design limits causal inference. Other unobserved factors, such as personality traits or lifestyle choices, could also influence both handwashing adherence and social connectivity, which are not accounted for here.

**Implications for Future Research:** Insights from this study could inform strategies to promote social well-being in public health campaigns. Future studies could investigate other health practices in relation to social behaviors and ideally use longitudinal data to explore potential causal relationships.

This approach, using the Chi-Squared Test of Independence and, if necessary, Fisher’s Exact Test, provides a structured way to analyze the potential relationship between health practices and social connectivity.


**PROPOSAL 3:**

# Project Proposal: Hugging Frequency and Self-Rated Physical Health

## Research Question

**Is there an association between how often individuals hugged others in the past three months and their current self-rated physical health?**

- **Purpose:** To explore whether physical affection, measured by the frequency of hugging, is associated with self-assessed physical health. Understanding this relationship may provide insights into how social interactions and physical closeness relate to health perceptions.
- **Significance:** Identifying non-medical factors that influence health perceptions could inform public health and social support strategies aimed at improving well-being.

## Variables

### Predictor Variable
- **Variable Name:** `CONNECTION_activities_hug_p3m`
- **Description:** This variable measures the frequency of hugging in the past three months, with categories such as "Daily or almost daily," "A few times a week," "Weekly," and "Not in the past three months."
- **Data Type:** Ordinal, where higher categories represent more frequent hugging.
- **Reason for Selection:** Hugging frequency is chosen as it may affect health perception through social and psychological benefits like stress reduction and increased social bonds, potentially linking physical affection to perceived health.

### Outcome Variable
- **Variable Name:** `WELLNESS_self_rated_physical_health`
- **Description:** Represents an individual’s self-assessment of their physical health, ranging from "Poor" to "Excellent."
- **Data Type:** Ordinal, with higher values indicating better self-rated health.
- **Reason for Selection:** Self-rated health is a widely recognized subjective measure of well-being, aligning with the study’s focus on perceived health in relation to social behavior.

## Data Cleaning and Preparation

1. **Handling Missing Data**: Any records with missing values in `CONNECTION_activities_hug_p3m` or `WELLNESS_self_rated_physical_health` will be reviewed. If missing values are minimal, records may be excluded to maintain data integrity. If a larger proportion of data is missing, appropriate imputation techniques will be considered to preserve sample size.

2. **Category Standardization**: Review and standardize categories to ensure uniformity across responses. For instance, ensure consistency in response options like "Daily or almost daily" across records, avoiding variations that could affect analysis.

3. **Encoding for Analysis**: Encode ordinal variables numerically, preserving their inherent order. For example, assign values to `CONNECTION_activities_hug_p3m` as follows: "Not in the past three months" = 1, "Weekly" = 2, "A few times a week" = 3, "Daily or almost daily" = 4. This encoding will support the statistical methods selected for ordinal data.
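The encoding above can be enforced with an ordered pandas categorical, sketched below under the assumption that the responses use exactly the four labels listed in this proposal. "Monthly" is a hypothetical label included only to show how an unexpected response is handled. An ordered categorical also lets comparisons such as `hug >= "Weekly"` respect the frequency order.

```python
import pandas as pd

# Least to most frequent, matching the 1-4 encoding scheme above.
HUG_LEVELS = ["Not in the past three months", "Weekly",
              "A few times a week", "Daily or almost daily"]
hug_dtype = pd.CategoricalDtype(categories=HUG_LEVELS, ordered=True)

# Hypothetical responses; "Monthly" is not one of the known labels.
responses = pd.Series(["Weekly", "Daily or almost daily", "Not in the past three months",
                       "A few times a week", "Monthly"])
hug = responses.astype(hug_dtype)  # labels outside HUG_LEVELS become NaN

# .cat.codes is 0-based and uses -1 for missing values; shift to the 1-4 scheme
# and keep unmatched responses as missing rather than as a spurious code.
hug_code = (hug.cat.codes + 1).where(hug.notna())
print(pd.DataFrame({"response": responses, "code": hug_code}))
```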

## Visualization and Summary Statistics

### Visualizations and Summary Statistics for Each Variable

1. **Predictor Variable (`CONNECTION_activities_hug_p3m`)**:
   - **Visualization**: A **bar chart** displaying the distribution of hugging frequency within the sample.
   - **Purpose**: This chart illustrates how often individuals report engaging in physical affection.
   - **Summary Statistics**: Calculate the proportion of responses within each frequency category. This provides a clear view of the sample’s general behavior concerning hugging.

2. **Outcome Variable (`WELLNESS_self_rated_physical_health`)**:
   - **Visualization**: A **bar chart** displaying the distribution of self-rated health.
   - **Purpose**: To show how individuals rate their physical health within the sample.
   - **Summary Statistics**: Calculate the proportion of individuals within each health rating category, allowing a detailed look at health perceptions across the sample.

## Analysis Method

The **Chi-Squared Test of Independence** will be used as the primary method to assess the association between hugging frequency and self-rated health, as both variables are categorical.

1. **Chi-Squared Test of Independence (Primary Test)**:
   - **Objective**: To determine if there is an association between hugging frequency and self-rated physical health.
   - **Procedure**: Compare the observed frequency distribution of health ratings across each hugging frequency category against the expected distribution under the null hypothesis of independence.
   - **Hypotheses**:
     - **Null Hypothesis (H₀)**: Hugging frequency and self-rated health are independent; health ratings do not vary significantly by hugging frequency.
     - **Alternative Hypothesis (H₁)**: There is an association between hugging frequency and self-rated health.
   - **Interpretation**: A p-value below the significance level (e.g., 0.05) would suggest rejecting the null hypothesis, indicating an association between hugging frequency and health perception. A p-value above the significance level would imply insufficient evidence to reject the null hypothesis.
   - **Assumptions**: The test requires expected cell frequencies to be at least 5. If this assumption is not met, an alternative test will be used.

2. **Fisher’s Exact Test (Secondary Option)**:
   - **Purpose**: Fisher’s Exact Test serves as an alternative when sample sizes are small or expected counts fall below 5 in any cell.
   - **Procedure**: Fisher’s Exact Test calculates an exact p-value based on observed frequencies, providing accurate results without large-sample assumptions.
   - **Interpretation**: A significant p-value from Fisher’s Test would also indicate an association between hugging frequency and health perception, similar to the interpretation of the Chi-Squared Test.

## Hypotheses, Expected Results, and Discussion

### Hypotheses
- **Null Hypothesis (H₀):** There is no association between hugging frequency and self-rated physical health; health ratings are similar across all hugging frequencies.
- **Alternative Hypothesis (H₁):** There is an association between hugging frequency and self-rated physical health; health ratings differ across hugging frequencies.

### Expected Results
- **Test Output**:
   - **P-Value**: A p-value below 0.05 would suggest a statistically significant association between hugging frequency and health perception.
   - **Interpretation**: A significant result would indicate that self-rated health varies across different levels of hugging frequency, potentially linking physical affection to health perceptions.

### Discussion

**Interpretation:** If an association is found, and inspection of the contingency table shows that it runs in the expected direction (the test itself does not indicate direction), it may suggest that individuals who engage in physical affection more frequently report better self-rated health. This correlation could be due to psychological or social benefits, such as stress reduction or increased social support associated with hugging.

**Limitations:** This study uses self-reported data, which may be influenced by personal biases. The cross-sectional design limits causal inference, capturing data at one point in time without tracking changes over time. Additionally, unobserved confounding factors, such as lifestyle or personality traits, could influence both hugging frequency and health ratings, which could affect the results.

**Implications for Future Research:** To better understand the link between physical affection and health, future studies could use longitudinal designs to explore potential causal effects. Further research could also examine specific mechanisms, like stress reduction or social support, to clarify how physical affection may impact health perceptions.
