## Research Question 1
How does the frequency of connecting with friends (`CONNECTION_activities_visited_friends_p3m`) affect feelings of being down (`WELLNESS_phq_feeling_down`)?

## Variables

### Independent Variable
- **Variable**: `CONNECTION_activities_visited_friends_p3m`
- **Type**: Ordinal categorical
- **Possible Values** (most to least frequent): Weekly, a few times a month, monthly, less than monthly, not in the past three months

### Dependent Variable
- **Variable**: `WELLNESS_phq_feeling_down`
- **Type**: Ordinal categorical
- **Possible Values**: Nearly every day, more than half the days, several days, slightly, not at all

# Visualization Choice: Heatmap

### Heatmap Details
- **Axes**:
  - X-axis: Categories of connection frequency (`CONNECTION_activities_visited_friends_p3m`)
  - Y-axis: Categories of feeling down (`WELLNESS_phq_feeling_down`)
  
- **Color Intensity**:
  - Reflects count or proportion of respondents for each pairing. A darker or more intense color indicates a higher count or proportion of respondents who experience that specific combination.

### Interpretation of the Heatmap
- **Patterns**: By examining the color gradient, patterns in the data can be identified.
  - If cells with higher connection frequencies (e.g., "weekly") align with lower frequencies of feeling down (e.g., "not at all"), this may suggest a relationship between social connection and emotional well-being.
  - Conversely, if cells with low connection frequencies (e.g., "not in the past three months") align with higher frequencies of feeling down (e.g., "nearly every day"), it may indicate an association between low social interaction and negative feelings.
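
The heatmap described above can be sketched as follows. This is a minimal illustration on synthetic stand-in data, not the real survey: the DataFrame `df`, the random responses, and the assumed ordering of the category labels are all placeholders. Each column is normalised so that color shows the proportion of respondents within a connection category, which is one of the two options (count or proportion) mentioned above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in data (assumption); replace with the real survey DataFrame.
rng = np.random.default_rng(0)
visit_levels = ["weekly", "a few times a month", "monthly",
                "less than monthly", "not in the past three months"]
# The relative position of "slightly" is an assumption made for this sketch.
down_levels = ["not at all", "slightly", "several days",
               "more than half the days", "nearly every day"]
df = pd.DataFrame({
    "CONNECTION_activities_visited_friends_p3m": rng.choice(visit_levels, size=400),
    "WELLNESS_phq_feeling_down": rng.choice(down_levels, size=400),
})

# Cross-tabulate, normalise within each connection category (column),
# and put rows and columns into their ordinal order.
table = pd.crosstab(
    df["WELLNESS_phq_feeling_down"],
    df["CONNECTION_activities_visited_friends_p3m"],
    normalize="columns",
).reindex(index=down_levels, columns=visit_levels, fill_value=0.0)

# Draw the heatmap: darker cells hold a larger share of respondents.
fig, ax = plt.subplots(figsize=(8, 4))
image = ax.imshow(table.to_numpy(), cmap="Blues", aspect="auto")
ax.set_xticks(range(len(visit_levels)))
ax.set_xticklabels(visit_levels, rotation=30, ha="right")
ax.set_yticks(range(len(down_levels)))
ax.set_yticklabels(down_levels)
ax.set_xlabel("Visited friends (past 3 months)")
ax.set_ylabel("Feeling down")
fig.colorbar(image, ax=ax, label="Proportion within connection category")
fig.tight_layout()
```

Normalising by column makes categories with few respondents comparable to large ones; passing `normalize=False` instead would show raw counts.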

# Bootstrap Analysis

## Purpose of Bootstrap Analysis
Bootstrap analysis is useful for:
- Small sample sizes that don’t meet assumptions of parametric tests.
- Non-normal data distributions.
- Estimating confidence intervals without relying on parametric assumptions.

## Steps for Bootstrap Analysis

1. **Define the Statistic of Interest**:
   - Example: Mean difference or proportion difference in feeling down across connection frequencies, or use a chi-square statistic to test for association.

2. **Generate Bootstrap Samples**:
   - Randomly sample with replacement from the original data to create a large number of bootstrap samples (e.g., 1,000 to 10,000 samples).
   - Each bootstrap sample contains the same number of observations as the original dataset.

3. **Calculate the Statistic for Each Bootstrap Sample**:
   - For each bootstrap sample, compute the statistic (e.g., mean difference, proportion difference, or chi-square statistic).

4. **Construct Confidence Intervals**:
   - **Percentile Method**: Sort the bootstrap statistics and select the 2.5th and 97.5th percentiles for a 95% confidence interval.
   - **Bias-Corrected and Accelerated (BCa) Interval**: Adjusts the percentile endpoints for bias and skewness in the bootstrap distribution, which usually gives a more accurate interval.

5. **Hypothesis Testing (Optional)**:
   - Compare the bootstrap distribution to the hypothesized value.
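   - Calculate the p-value from resamples generated under the null hypothesis (for example, by shuffling one variable to break any association), as the proportion of those statistics that are at least as extreme as the statistic observed in the original data. Ordinary bootstrap resamples are centered on the observed statistic, so they cannot be used for this directly.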
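
Steps 1 to 4 can be sketched as below. This is a minimal illustration, assuming a difference in proportions as the statistic of interest and two synthetic groups (`weekly` and `rarely` are hypothetical names with made-up response rates); it uses the percentile method only.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in data (assumption): 1 = reports feeling down,
# 0 = does not, for two hypothetical connection-frequency groups.
weekly = rng.binomial(1, 0.30, size=120)
rarely = rng.binomial(1, 0.45, size=80)


def prop_diff(a, b):
    """Step 1: statistic of interest, difference in proportions (b minus a)."""
    return b.mean() - a.mean()


observed = prop_diff(weekly, rarely)

# Steps 2 and 3: resample each group with replacement (same size as the
# original group) and recompute the statistic on every resample.
n_boot = 5000
boot_stats = np.empty(n_boot)
for i in range(n_boot):
    a_star = rng.choice(weekly, size=weekly.size, replace=True)
    b_star = rng.choice(rarely, size=rarely.size, replace=True)
    boot_stats[i] = prop_diff(a_star, b_star)

# Step 4: percentile method for a 95% confidence interval.
ci_low, ci_high = np.percentile(boot_stats, [2.5, 97.5])
print(f"observed difference: {observed:.3f}")
print(f"95% percentile CI: [{ci_low:.3f}, {ci_high:.3f}]")
```

For a BCa interval, `scipy.stats.bootstrap` offers `method="BCa"`, which may be preferable to hand-rolling the correction.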

## Assumptions for Bootstrap Analysis
- **Independence**: Each observation in the dataset should be independent.
- **Representative Sample**: The original sample should be representative of the population.

Bootstrap analysis does not require assumptions of normality or equal variances, making it suitable for various data types.

# Interpretation of Results

### Confidence Interval
The confidence interval from the bootstrap distribution indicates the likely range of the true parameter. If the interval for the mean difference does not include zero, it suggests a significant difference.

### P-Value (if Hypothesis Testing)
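- The p-value is estimated as the proportion of statistics from resamples generated under the null hypothesis (e.g., with one variable shuffled) that are at least as extreme as the observed statistic.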
- A p-value below 0.05 indicates a statistically significant association between the variables.
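
One common way to obtain a resampling-based p-value is a permutation scheme: shuffling one variable makes independence true by construction, so the shuffled statistics form a reference distribution for the observed one. The sketch below assumes synthetic responses with three hypothetical categories per variable and uses the chi-square statistic; the helper `chi_square` is introduced here for illustration only.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(1)

# Synthetic stand-in responses (assumption), generated independently.
x = rng.choice(["weekly", "monthly", "not in the past three months"], size=300)
y = rng.choice(["not at all", "several days", "nearly every day"], size=300)

# Encode the labels as integer codes so tables can be built quickly.
x_levels, x_codes = np.unique(x, return_inverse=True)
y_levels, y_codes = np.unique(y, return_inverse=True)
shape = (x_levels.size, y_levels.size)


def chi_square(row_codes, col_codes):
    """Chi-square statistic of the contingency table of two coded variables."""
    table = np.zeros(shape)
    np.add.at(table, (row_codes, col_codes), 1)
    return chi2_contingency(table, correction=False)[0]


observed = chi_square(x_codes, y_codes)

# Shuffle one variable to simulate the null hypothesis of independence.
n_perm = 1000
null_stats = np.array(
    [chi_square(x_codes, rng.permutation(y_codes)) for _ in range(n_perm)]
)

# Proportion of null statistics at least as large as the observed one;
# the +1 terms keep the estimate strictly above zero.
p_value = (np.sum(null_stats >= observed) + 1) / (n_perm + 1)
print(f"observed chi-square: {observed:.2f}, permutation p-value: {p_value:.3f}")
```

Shuffling preserves both marginal distributions, so every permuted table has the same row and column totals as the original.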

# Hypotheses and Analysis Method

## Hypotheses
1. **Null Hypothesis (H0)**: No association exists between connection frequency and feelings of being down.
2. **Alternative Hypothesis (H₁)**: There is an association between connection frequency and feelings of being down.

## Analysis Method
- **Sample**: Use the provided dataset to create 1,000 bootstrap samples.
- **Statistic of Interest**: Chi-square statistic to test for independence between the two ordinal categorical variables.
- **Bootstrap Sampling**: Calculate the chi-square statistic for each bootstrap sample.

## Confidence Interval
- **95% Confidence Interval**: Calculate the chi-square statistic for each bootstrap sample, then find the 2.5th and 97.5th percentiles for a 95% confidence interval.
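- **Interpretation of CI**: A chi-square statistic is never negative, so the interval cannot be checked against zero. Instead, compare it with values expected under the null hypothesis, such as the 0.05 critical value of the chi-square distribution with (r - 1)(c - 1) degrees of freedom, where r and c are the numbers of categories of the two variables. An interval lying entirely above that value suggests an association.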
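
A minimal sketch of this procedure follows, assuming integer-coded synthetic responses with a built-in association (the variables `connection` and `feeling_down` and the helper `chi_square` are illustrative stand-ins). Comparing the interval with the 0.05 critical value is one possible reading of "the chi-square statistic under the null hypothesis", not the only one.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

rng = np.random.default_rng(7)
n = 500
n_levels = 5

# Synthetic integer-coded ordinal data (assumption): codes 0..4 for both
# variables, with more frequent connection linked to feeling down less.
connection = rng.integers(0, n_levels, size=n)
noise = rng.integers(-1, 2, size=n)
feeling_down = np.clip(4 - connection + noise, 0, n_levels - 1)


def chi_square(row_codes, col_codes):
    """Chi-square statistic, ignoring categories absent from the sample."""
    table = np.zeros((n_levels, n_levels))
    np.add.at(table, (row_codes, col_codes), 1)
    # A resample can miss a category entirely; empty rows or columns would
    # give zero expected counts, so they are dropped first.
    table = table[table.sum(axis=1) > 0]
    table = table[:, table.sum(axis=0) > 0]
    return chi2_contingency(table, correction=False)[0]


observed = chi_square(connection, feeling_down)

# Resample respondents (pairs of answers) with replacement.
n_boot = 1000
boot_stats = np.empty(n_boot)
for i in range(n_boot):
    idx = rng.integers(0, n, size=n)
    boot_stats[i] = chi_square(connection[idx], feeling_down[idx])

ci_low, ci_high = np.percentile(boot_stats, [2.5, 97.5])
critical = chi2.ppf(0.95, df=(n_levels - 1) * (n_levels - 1))
print(f"observed: {observed:.1f}, 95% CI: [{ci_low:.1f}, {ci_high:.1f}]")
print(f"0.05 critical value under independence: {critical:.1f}")
```

Whole respondents are resampled, so each pair of answers stays together and the association in the data is preserved in every bootstrap sample.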


## Research Question 2
How does the degree of fatigue and burnout (`WELLNESS_malach_pines_burnout_measure_disappointed`) affect how lonely a person feels (`LONELY_direct(one week)`)?

## Variables

### Independent Variable
- **Variable**: `WELLNESS_malach_pines_burnout_measure_disappointed`
- **Type**: Ordinal categorical
- **Possible Values**: Always, very often, often, sometimes, almost never
- **Visualization**:
  - A bar plot showing the distribution of responses across burnout levels.
  - If most responses are in higher categories (e.g., "Always," "Very often"), it suggests a high level of burnout in the group.
  - If responses are in lower categories (e.g., "Almost never"), burnout may not be prevalent.

### Dependent Variable
- **Variable**: `LONELY_direct(one week)`
- **Type**: Ordinal categorical
- **Possible Values**: All of the time (5-7 days), Occasionally or moderate amount of time (3-4 days), Some or a little of the time (1-2 days), Rarely (less than 1 day), None of the time (0 days)
- **Visualization**:
  - A bar plot illustrating the frequency of loneliness over the past week.
  - High frequency in "5-7 days" suggests that individuals feel lonely most of the week, indicating high loneliness.
  - If responses cluster in the lower categories (e.g., "Some or a little of the time"), it suggests relatively low loneliness.
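
The bar plots described for both variables can be produced along these lines. The sketch uses synthetic burnout responses with made-up probabilities (an assumption for illustration); the same pattern applies to the loneliness variable with its own category list.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Category order from lowest to highest burnout.
burnout_levels = ["Almost never", "Sometimes", "Often", "Very often", "Always"]

# Synthetic stand-in responses (assumption); replace with the real column
# `WELLNESS_malach_pines_burnout_measure_disappointed`.
rng = np.random.default_rng(3)
responses = pd.Series(
    rng.choice(burnout_levels, size=200, p=[0.3, 0.3, 0.2, 0.1, 0.1])
)

# Count responses and put the bars in ordinal rather than alphabetical order.
counts = responses.value_counts().reindex(burnout_levels, fill_value=0)

fig, ax = plt.subplots(figsize=(6, 3.5))
ax.bar(counts.index, counts.to_numpy())
ax.set_xlabel("Burnout response")
ax.set_ylabel("Number of respondents")
ax.set_title("Distribution of burnout responses")
fig.tight_layout()
```

Reindexing by the ordered category list matters because the default sort is by count, which would hide the ordinal shape of the distribution.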

# Analysis Method: Ordinal Logistic Regression

Since both `WELLNESS_malach_pines_burnout_measure_disappointed` and `LONELY_direct(one week)` are ordinal categorical variables, linear regression is not directly applicable: it would require coding the categories as numbers and treating the gaps between them as equal. **Ordinal Logistic Regression** is more suitable because it uses only the ordering of the outcome categories.

## Steps for Ordinal Logistic Regression Analysis

1. **Encode Ordinal Categories**: 
   - Convert both burnout and loneliness categories to numerical values (e.g., "Always" = 5, "Very often" = 4, "Often" = 3, etc.).
   
2. **Run Regression**:
   - Use burnout as the independent variable and loneliness as the dependent variable in an ordinal logistic regression model.
   
3. **Check Assumptions**:
   - **Proportional Odds**: Assumes that the effect of burnout is the same at every threshold between adjacent loneliness levels (a single coefficient for all cut-points). This can be tested using the "Test of Parallel Lines" in statistical software.
   - If this assumption is violated, use a Generalized Ordinal Logistic Regression model, which does not require proportional odds.

# Assumptions

## Proportional Odds Assumption
- **Explanation**: The proportional odds assumption means that a one-level increase in burnout changes the odds of being at or below any given loneliness level by the same factor, regardless of which loneliness threshold is considered.
- **How to Check**: Use the Test of Parallel Lines. If violated, use Generalized Ordinal Logistic Regression.
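
In symbols, the proportional odds model is usually written as below. The notation is introduced here for illustration: $Y$ is the loneliness level with $J$ ordered categories, $x$ is the numerically coded burnout level, $\alpha_j$ are the thresholds, and $\beta$ is the single slope shared by all thresholds.

$$
\log \frac{P(Y \le j \mid x)}{P(Y > j \mid x)} = \alpha_j - \beta x, \qquad j = 1, \dots, J - 1
$$

The assumption is that $\beta$ does not depend on $j$. A generalized ordinal model relaxes this by allowing a separate $\beta_j$ for each threshold.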

## Independence of Observations
- Each participant’s response should be independent, with no repeated measurements or dependencies. Each person should contribute exactly one pair of burnout and loneliness responses.

## No Multicollinearity (if additional predictors are introduced)
- If other predictors are added to the model, they should not be highly correlated with each other. Since this model only includes burnout, multicollinearity is not relevant here.

# Hypothesis Testing

## Hypotheses

1. **Null Hypothesis (H0)**: There is no association between burnout (`WELLNESS_malach_pines_burnout_measure_disappointed`) and loneliness (`LONELY_direct`).
2. **Alternative Hypothesis (H₁)**: There is an association between burnout and loneliness.

# Expected Results and Interpretation

If the ordinal logistic regression shows a significant relationship, we would reject the null hypothesis and conclude that burnout level is significantly associated with loneliness.

### Interpretation of Results
- If the p-value is less than 0.05, it indicates a statistically significant association between burnout and loneliness.
- The direction and strength of the relationship can be interpreted from the coefficients in the ordinal logistic regression model.


# Research Question and Variables

## Research Question
How does the frequency of video chat with others (`CONNECTION_activities_video_chat_p3m`) affect emotional stability, specifically anxiousness (`PSYCH_ten_item_personality_inventory_emotional_stability_anxious_1r`)?

## Variables

### Independent Variable
- **Variable**: `CONNECTION_activities_video_chat_p3m`
- **Type**: Ordinal categorical
- **Possible Values**: Daily or almost daily, a few times a week, weekly, less than monthly, not in the past three months
- **Visualization**:
  - A bar plot showing the distribution of responses across each video chat frequency level.

### Dependent Variable
- **Variable**: `PSYCH_ten_item_personality_inventory_emotional_stability_anxious_1r`
- **Type**: Ordinal categorical
- **Possible Values**: Agree strongly, agree moderately, agree a little, neither agree nor disagree, disagree moderately, disagree strongly
- **Visualization**:
  - A bar plot showing the frequency of responses for each level of emotional stability, providing insights into how participants rate their emotional stability.

# Analysis: Chi-Square Test of Independence

To investigate whether the frequency of video chatting is associated with emotional stability levels, we will perform a **Chi-Square test of independence**.

## Steps for Chi-Square Analysis

1. **Construct Contingency Table**:
   - Create a contingency table that counts occurrences of each combination of video chat frequency and emotional stability level.

2. **Conduct Chi-Square Test**:
   - Calculate the Chi-Square statistic to test for independence between video chat frequency and emotional stability.
   - The Chi-Square test will compare observed counts in each cell to expected counts under the assumption of independence.

3. **Expected Counts**:
   - Expected counts are calculated based on the assumption of no association between video chat frequency and emotional stability. The Chi-Square statistic reflects the overall difference between observed and expected counts.

4. **Interpret Results**:
   - **Chi-Square Statistic**: Reflects the degree of difference between observed and expected counts.
   - **p-value**: The probability of obtaining a Chi-Square statistic at least as large as the one observed if there were no association between the variables.

5. **Decision Rule**:
   - If the p-value is less than the significance level (e.g., p < 0.05), reject the null hypothesis, suggesting a significant association between video chat frequency and emotional stability.
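   - If the p-value is at or above the significance level, fail to reject the null hypothesis: the data do not provide sufficient evidence of an association, which is not the same as showing that none exists.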
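
The five steps above can be sketched with `scipy.stats.chi2_contingency`. The responses below are synthetic and generated independently (an assumption for illustration), so a non-significant result is the expected outcome here; the DataFrame `df` stands in for the real survey data.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

rng = np.random.default_rng(5)
chat_levels = ["Daily or almost daily", "A few times a week", "Weekly",
               "Less than monthly", "Not in the past three months"]
stability_levels = ["Agree strongly", "Agree moderately", "Agree a little",
                    "Neither agree nor disagree", "Disagree moderately",
                    "Disagree strongly"]

# Synthetic stand-in data (assumption); replace with the real survey columns.
chat_col = "CONNECTION_activities_video_chat_p3m"
stab_col = "PSYCH_ten_item_personality_inventory_emotional_stability_anxious_1r"
df = pd.DataFrame({
    chat_col: rng.choice(chat_levels, size=600),
    stab_col: rng.choice(stability_levels, size=600),
})

# Step 1: contingency table of observed counts.
observed = pd.crosstab(df[chat_col], df[stab_col])

# Steps 2 and 3: the test returns the statistic, the p-value, the degrees
# of freedom, and the expected counts under independence.
chi2_stat, p_value, dof, expected = chi2_contingency(observed)

# Steps 4 and 5: interpret against a 0.05 significance level.
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"chi-square = {chi2_stat:.2f}, dof = {dof}, p = {p_value:.3f}: {decision}")
```

Each expected count is the row total times the column total divided by the sample size, which is what `expected` contains.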

## Analyzing Specific Associations
If the Chi-Square test is significant, examine the contingency table to identify specific patterns:
   - For example, individuals who video chat frequently (e.g., "daily") may report higher emotional stability (e.g., "agree strongly").
   - Conversely, those who video chat infrequently (e.g., "not in the past three months") may report lower emotional stability.

This analysis can provide insights into whether frequent social connections through video chat contribute to emotional stability.
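
One way to examine specific cells is to compute Pearson residuals, (observed - expected) / sqrt(expected), for each cell; large positive values mark combinations that occur more often than independence would predict. The counts and the shortened category labels below are invented purely for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical counts (assumption): rows are video chat frequency,
# columns are agreement with the anxiety item.
observed = pd.DataFrame(
    [[30, 15, 5],
     [20, 20, 10],
     [10, 15, 25]],
    index=["daily", "weekly", "not in the past three months"],
    columns=["disagree", "neither", "agree"],
)

# Expected counts under independence.
_, _, _, expected = chi2_contingency(observed)

# Pearson residual per cell; values beyond roughly +/-2 are often
# treated as noteworthy departures from independence.
residuals = (observed - expected) / np.sqrt(expected)
print(residuals.round(2))
```

The squared residuals sum to the Chi-Square statistic, so this table shows which cells contribute most to a significant result.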

# Assumptions

## Assumptions for Chi-Square Test of Independence

1. **Independence of Observations**:
   - Each participant’s response should be independent. This means there should be no repeated measures or dependencies within the data; each observation should represent a unique individual’s video chat frequency and emotional stability level.

2. **Expected Frequency**:
   - The Chi-Square test assumes that the expected frequency in each cell of the contingency table is at least 5. If any cell has an expected frequency below 5, the test may not be valid, and a different test, such as Fisher’s Exact Test, might be more appropriate.

3. **Sufficient Sample Size**:
   - The test relies on a sufficiently large sample size to approximate the Chi-Square distribution. Smaller samples may not meet this assumption, reducing the reliability of the test results.

4. **Random Sampling**:
   - Observations should ideally be randomly sampled from the population to ensure generalizability. Non-random samples may lead to biased results that don’t accurately reflect the broader population.
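
The expected-frequency assumption can be checked before running the test. The sketch below uses a small table of invented counts (an assumption for illustration) and `scipy.stats.contingency.expected_freq` to compute the expected counts under independence.

```python
import numpy as np
from scipy.stats.contingency import expected_freq

# Hypothetical observed counts (assumption), deliberately small.
observed = np.array([
    [12, 7, 3],
    [9, 8, 2],
    [4, 3, 1],
])

# Expected count per cell: row total * column total / grand total.
expected = expected_freq(observed)

n_small = int((expected < 5).sum())
share_small = n_small / expected.size
print(f"cells with expected count below 5: {n_small} of {expected.size}")
print(f"smallest expected count: {expected.min():.2f}")
```

A commonly cited rule of thumb tolerates up to about 20% of cells with expected counts below 5, provided none falls below 1. A table like this one fails both conditions, so merging sparse categories or using an exact or permutation test would be worth considering.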

## Assumptions for Bootstrap Confidence Interval

1. **Representative Sample for Resampling**:
   - The original sample should be representative of the population to ensure that the bootstrap samples provide meaningful estimates.

2. **Large Number of Resamples**:
   - A sufficient number of bootstrap samples (e.g., 1,000 or more) should be generated to ensure stability and accuracy in estimating the confidence interval.



# Hypothesis Testing

## Hypotheses

1. **Null Hypothesis (H0)**: There is no association between video chat frequency and emotional stability.
2. **Alternative Hypothesis (H₁)**: There is an association between video chat frequency and emotional stability.

## Bootstrap for Confidence Interval

1. **Calculate Original Chi-Square**:
   - Create a contingency table for video chat frequency vs. emotional stability and compute the Chi-Square statistic on the original sample.

2. **Generate Bootstrap Samples**:
   - Resample the data with replacement to create 1,000 bootstrap samples and calculate the Chi-Square statistic for each.

3. **Construct 95% Confidence Interval**:
   - Use the 2.5th and 97.5th percentiles of the bootstrap Chi-Square distribution to construct a 95% confidence interval.

4. **Expected Results**:
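   - A Chi-Square statistic is never negative, so its bootstrap interval will practically always exclude zero. Compare the interval with the 0.05 critical value of the Chi-Square distribution with (r - 1)(c - 1) degrees of freedom instead.
   - If the whole interval lies above the critical value, conclude that an association exists between video chat frequency and emotional stability. This indicates association, not that video chatting causes the difference.
   - If the interval overlaps or falls below the critical value, the data do not provide clear evidence of an association.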
