# Statistical Analysis Project Proposal

## Analysis 1: Impact of Social Connection on Mental Health During COVID-19

### Research Question
How does the frequency of social interactions (both virtual and in-person) correlate with reported mental health outcomes during the COVID-19 pandemic?

### Variables and Exploration Plan
- **Outcome Variable**: `WELLNESS_self_rated_mental_health`
  - Description: Self-reported mental health rating on an ordinal scale
  - Visualization: Histogram
    - Why: Shows the overall distribution of mental health ratings
    - Reveals whether ratings are roughly symmetric or skewed
    - Helps identify potential ceiling or floor effects
  - Additional Visualization: Box plot
    - Why: Shows key summary statistics (median, quartiles, outliers)
    - Useful for identifying unusual patterns or outliers

- **Predictor Variables**: `CONNECTION_activities_*`
  - **CONNECTION_activities_video_chat_p3m**
    - Description: Frequency of video chat interactions in past 3 months
    - Visualization: Bar chart
      - Why: Shows distribution across frequency categories
      - Easily compare proportions of different usage levels

  - **CONNECTION_activities_visited_friends_p3m**
    - Description: Frequency of in-person friend visits in past 3 months
    - Visualization: Bar chart 
      - Why: Allows direct comparison with video chat frequencies
      - Shows how often in-person contact occurred relative to virtual contact

  - **CONNECTION_activities_phone_p3m**
    - Description: Frequency of phone calls in past 3 months
    - Visualization: Bar chart
      - Why: Shows the distribution while allowing comparison with other communication methods
      - Helps identify the most frequently used communication modes

  - **Composite Social Interaction Score**
    - Description: Combined measure of all three interaction types (e.g., the average of the numerically coded frequency items)
    - Visualization: Scatterplot against mental health
      - Why: Shows potential linear relationship
      - Can add trend line to visualize correlation
      - Helps identify any non-linear patterns
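
One way to build the composite score is to code each frequency category as an integer and average the three items. The sketch below assumes hypothetical response labels (`FREQ_CODES`) and a hypothetical helper name (`composite_social_score`); the survey's actual category labels must be checked and the mapping adjusted before use.

```python
import pandas as pd

# Hypothetical ordinal coding of the frequency categories; the real survey
# labels may differ and must be checked before use.
FREQ_CODES = {
    "Never": 0,
    "Less than monthly": 1,
    "Monthly": 2,
    "Weekly": 3,
    "Daily": 4,
}

ACTIVITY_COLS = [
    "CONNECTION_activities_video_chat_p3m",
    "CONNECTION_activities_visited_friends_p3m",
    "CONNECTION_activities_phone_p3m",
]


def composite_social_score(df: pd.DataFrame) -> pd.Series:
    """Mean of the coded frequency items for each respondent."""
    # Labels not found in FREQ_CODES become NaN after .map()
    coded = df[ACTIVITY_COLS].apply(lambda col: col.map(FREQ_CODES))
    # skipna=False: any missing or unmapped item makes the composite NaN
    return coded.mean(axis=1, skipna=False)


# Toy example with two made-up respondents
example = pd.DataFrame({
    "CONNECTION_activities_video_chat_p3m": ["Weekly", "Never"],
    "CONNECTION_activities_visited_friends_p3m": ["Monthly", "Daily"],
    "CONNECTION_activities_phone_p3m": ["Daily", "Never"],
})
print(composite_social_score(example))
```

Averaging rather than summing keeps the composite on the same 0 to 4 scale as the individual items, and respondents with a missing or unrecognized answer get NaN instead of a score based on partial data.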
    
### Analysis Method
Will use simple linear regression, with the composite social interaction score as the predictor and the mental health rating (treated as numeric) as the outcome.
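
Simple linear regression has closed-form estimates: the slope is the covariance of the predictor and outcome divided by the variance of the predictor, and the line passes through the two means. A minimal NumPy sketch, treating the ordinal rating as numeric and using made-up toy values rather than survey data (the function name is illustrative):

```python
import numpy as np


def simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1, r_squared)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Keep only pairs where both values are present
    mask = ~(np.isnan(x) | np.isnan(y))
    x, y = x[mask], y[mask]
    x_mean, y_mean = x.mean(), y.mean()
    # Slope = sum of cross-deviations / sum of squared x-deviations
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    b0 = y_mean - b1 * x_mean
    residuals = y - (b0 + b1 * x)
    # Share of outcome variance explained by the fitted line
    r_squared = 1 - np.sum(residuals ** 2) / np.sum((y - y_mean) ** 2)
    return b0, b1, r_squared


# Toy data: composite social score vs. mental health rating
social = [0.5, 1.0, 2.0, 2.5, 3.0, 4.0]
mental = [2, 2, 3, 3, 4, 5]
b0, b1, r2 = simple_linear_regression(social, mental)
print(f"intercept={b0:.3f}, slope={b1:.3f}, R^2={r2:.3f}")
```

In practice, `scipy.stats.linregress` returns the same slope and intercept together with a p-value for the null hypothesis that the slope is zero.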

### Hypothesis and Expected Results
Hypothesis: Higher levels of social interaction (even virtual) will be associated with better mental health ratings.
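- Expect a positive regression slope
- Anticipate stronger associations for in-person interactions than for virtual ones; testing this requires examining each interaction type separately, since the composite score combines them
- Results will show whether social connection is associated with better mental health during pandemic isolation; because the data are observational, a positive slope would not by itself establish a protective (causal) effect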
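
To compare in-person and virtual contact, each interaction item can be related to the outcome on its own. A hedged sketch using Spearman's rank correlation, which suits ordinal responses; it assumes the items are already coded numerically, and the helper name and toy values are illustrative only.

```python
import numpy as np
import pandas as pd


def spearman_by_predictor(df, outcome, predictors):
    """Spearman's rho between each predictor and the outcome, largest first."""
    results = {}
    for col in predictors:
        # Use only respondents who answered both items
        pair = df[[col, outcome]].dropna()
        # Spearman's rho = Pearson correlation of the ranks (ties share average ranks)
        ranks = pair.rank(method="average")
        results[col] = np.corrcoef(ranks[col], ranks[outcome])[0, 1]
    return pd.Series(results).sort_values(ascending=False)


# Toy data with items already coded 0 to 4 (made-up values)
toy = pd.DataFrame({
    "CONNECTION_activities_visited_friends_p3m": [0, 1, 2, 3, 4],
    "CONNECTION_activities_video_chat_p3m": [4, 3, 2, 3, 0],
    "WELLNESS_self_rated_mental_health": [1, 2, 3, 4, 5],
})
rho = spearman_by_predictor(
    toy,
    outcome="WELLNESS_self_rated_mental_health",
    predictors=[
        "CONNECTION_activities_visited_friends_p3m",
        "CONNECTION_activities_video_chat_p3m",
    ],
)
print(rho)
```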

## Analysis 2: Burnout and Work-From-Home Transition

### Research Question
Is there a significant difference in burnout levels between those who transitioned to working from home and those who didn't during the pandemic?

### Variables and Exploration Plan
- **Outcome Variables**: Burnout measures (`WELLNESS_malach_pines_burnout_measure_*`)
  - **WELLNESS_malach_pines_burnout_measure_tired**
    - Description: Frequency of feeling tired
    - Visualization: Histogram for each group (WFH vs non-WFH)
      - Why: Shows distribution shape in each group
      - Can visually assess if distributions look similar
      - Shows how often each response level occurs within each group
    - Additional: Box plot
      - Why: Clearly shows median differences
      - Shows spread of data in each group
      - Identifies potential outliers

  - **WELLNESS_malach_pines_burnout_measure_hopeless**
    - Description: Frequency of feeling hopeless
    - Visualization: Side-by-side box plots
      - Why: Compare distributions between groups
      - Shows median, quartiles, and range
      - Easy to see group differences

  - **WELLNESS_malach_pines_burnout_measure_depressed**
    - Description: Frequency of feeling depressed
    - Visualization: Overlaid histograms (WFH vs non-WFH)
      - Why: Direct comparison of distributions
      - Shows shape differences between groups
      - Reveals any skewness or unusual patterns

  - **Composite Burnout Score**
    - Description: Combined measure of all burnout indicators
    - Visualization: Histograms by group
      - Why: Check the distribution shape
      - Compare WFH and non-WFH groups
      - Reveals skewness or outliers that could make the mean a misleading summary in the bootstrap comparison
    - Additional: Box plot
      - Why: Clear visual comparison between groups
      - Shows key summary statistics
      - Helps identify any unusual patterns

- **Grouping Variable**: `WORK_shift_from_home`
  - Description: Whether the respondent shifted to working from home during the pandemic
  - Visualization: Bar chart
    - Why: Shows the proportion in each category
    - Checks group sizes
    - Important for bootstrap planning, since a small group yields noisy resampled means
  - Additional: Stacked bar chart of burnout levels
    - Why: Shows the burnout distribution within each group
    - Reveals whether higher burnout levels are more common under one work arrangement
    - Helps visualize the relationship between work arrangement and burnout

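
The group-size check and the stacked bar chart both reduce to small tables. A sketch, assuming `burnout_level` is a hypothetical categorical column (for example, the composite burnout score binned into levels) and using toy values:

```python
import pandas as pd


def burnout_by_group(df, group_col="WORK_shift_from_home", level_col="burnout_level"):
    """Return group sizes and the share of each burnout level within each group."""
    sizes = df[group_col].value_counts()
    # normalize="index" makes each row (group) sum to 1, which is what a
    # 100% stacked bar chart displays
    shares = pd.crosstab(df[group_col], df[level_col], normalize="index")
    return sizes, shares


# Toy data (made up, not survey results)
toy = pd.DataFrame({
    "WORK_shift_from_home": ["Yes", "Yes", "No", "No", "No"],
    "burnout_level": ["High", "Low", "Low", "Low", "High"],
})
sizes, shares = burnout_by_group(toy)
print(sizes)
print(shares)
# With matplotlib installed: shares.plot(kind="bar", stacked=True)
```
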
### Analysis Method
Will use a bootstrap hypothesis test to compare mean composite burnout scores between the work-from-home (WFH) and non-WFH groups.
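
A minimal sketch of a two-sided bootstrap test for a difference in means. It follows one common approach: both bootstrap samples are drawn from the pooled data, which simulates the null hypothesis that work arrangement makes no difference. The function name, number of resamples, and toy scores are illustrative assumptions.

```python
import numpy as np


def bootstrap_mean_diff_test(group_a, group_b, n_boot=10_000, seed=0):
    """Two-sided bootstrap test of H0: both groups have the same mean.

    Resampling both groups from the pooled data simulates a world in
    which group membership makes no difference.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    # Each row is one bootstrap replicate drawn with replacement
    boot_a = rng.choice(pooled, size=(n_boot, a.size), replace=True).mean(axis=1)
    boot_b = rng.choice(pooled, size=(n_boot, b.size), replace=True).mean(axis=1)
    boot_diffs = boot_a - boot_b
    # p-value: share of null differences at least as extreme as the observed one
    p_value = np.mean(np.abs(boot_diffs) >= abs(observed))
    return observed, p_value


# Toy composite burnout scores (made up, not survey data)
wfh = [3.2, 2.8, 3.5, 4.0, 3.1, 2.9, 3.7, 3.3]
non_wfh = [2.5, 2.9, 2.2, 3.0, 2.7, 2.4, 2.8, 2.6]
diff, p = bootstrap_mean_diff_test(wfh, non_wfh)
print(f"observed difference = {diff:.3f}, bootstrap p-value = {p:.4f}")
```

Pooled resampling treats the groups as identical under the null; an alternative is to shift each group to the pooled mean before resampling, which tests equality of means more narrowly. A p-value of 0 only means no resample was as extreme, so it is better reported as less than 1 / `n_boot`.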

### Hypothesis and Expected Results
Hypothesis: Workers who shifted to working from home will have a different mean burnout score from those who did not (a two-sided hypothesis; no direction is predicted).
- Results will provide insights into workplace policy implications
- May inform future hybrid work arrangements
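
Stated formally, with $\mu$ denoting the population mean composite burnout score in each group and $B$ the number of bootstrap resamples drawn under $H_0$, the two-sided test and its bootstrap p-value are:

$$
H_0:\ \mu_{\text{WFH}} = \mu_{\text{non-WFH}} \qquad H_1:\ \mu_{\text{WFH}} \neq \mu_{\text{non-WFH}}
$$

$$
\hat{\delta} = \bar{x}_{\text{WFH}} - \bar{x}_{\text{non-WFH}}, \qquad
p \approx \frac{1}{B} \sum_{b=1}^{B} \mathbf{1}\left( \lvert \hat{\delta}^{*}_{b} \rvert \ge \lvert \hat{\delta} \rvert \right)
$$

where $\hat{\delta}^{*}_{b}$ is the difference in group means computed on the $b$-th resample.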

## Analysis 3: Predictors of Loneliness During COVID-19

### Research Question
What demographic and lifestyle factors are most strongly associated with reported loneliness during the pandemic?

### Variables and Exploration Plan
- **Outcome Variables**: UCLA Loneliness Scale measures (`LONELY_ucla_loneliness_scale_*`)
  - **LONELY_ucla_loneliness_scale_companionship**
    - Description: Frequency of lacking companionship
    - Visualization: Histogram with density curve
      - Why: Shows distribution shape
      - Can compare to normal curve
      - Identifies potential skewness

  - **LONELY_ucla_loneliness_scale_isolated**
    - Description: Frequency of feeling isolated
    - Visualization: Box plot by age group
      - Why: Shows relationship with age
      - Identifies age-related patterns
      - Shows outliers within groups

  - **LONELY_ucla_loneliness_scale_left_out**
    - Description: Frequency of feeling left out
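    - Visualization: Heat map of mean score by age group and relationship status
      - Why: Shows the relationship with two predictors at once
      - Reveals possible interactions (e.g., whether the age pattern differs by relationship status)
      - Highlights clusters of high-scoring groups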

- **Predictor Variables**:
  - **DEMO_age**
    - Description: Age of respondent
    - Visualization: Histogram
      - Why: Shows age distribution
      - Reveals whether some age ranges are under-represented in the sample
      - Helps determine age groupings
    - Additional: Scatterplot against loneliness
      - Why: Shows potential age-related trends
      - Reveals non-linear patterns

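  - **`GEO_housing_live_with_*`** (multiple binary variables)
    - Description: Living situation indicators
    - Visualization: Grouped bar chart of loneliness responses by living situation
      - Why: Compares loneliness across living situations
      - Shows the relative frequency of each loneliness response within each living situation
      - Highlights whether respondents living alone stand out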

  - **DEMO_relationship_status**
    - Description: Current relationship status
    - Visualization: Box plots of loneliness by status
      - Why: Compare distributions across groups
      - Shows how loneliness differs by relationship status
      - Flags outliers within each status group
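
The age groupings and the heat map both start from a binned age variable. A sketch using `pd.cut` with hypothetical cut points (to be revised after inspecting the `DEMO_age` histogram) that builds the table a heat map would display: the mean "left out" score for each age group and relationship status. The helper name and toy values are illustrative.

```python
import pandas as pd

# Hypothetical cut points; pd.cut intervals are right-closed, so these give
# 18-29, 30-44, 45-64 and 65+ for whole-number ages from 18 to 120
AGE_BINS = [17, 29, 44, 64, 120]
AGE_LABELS = ["18-29", "30-44", "45-64", "65+"]


def left_out_heatmap_table(df):
    """Mean 'left out' score for each (age group, relationship status) cell."""
    age_group = pd.cut(df["DEMO_age"], bins=AGE_BINS, labels=AGE_LABELS)
    return df.assign(age_group=age_group).pivot_table(
        index="age_group",
        columns="DEMO_relationship_status",
        values="LONELY_ucla_loneliness_scale_left_out",
        aggfunc="mean",
        observed=True,  # keep only age groups present in the data
    )


# Toy data (made up, not survey results)
toy = pd.DataFrame({
    "DEMO_age": [20, 25, 50, 70],
    "DEMO_relationship_status": ["Single", "Married", "Married", "Single"],
    "LONELY_ucla_loneliness_scale_left_out": [3, 1, 2, 4],
})
table = left_out_heatmap_table(toy)
print(table)
# Cells with no respondents are NaN; with seaborn installed:
# seaborn.heatmap(table, annot=True)
```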

### Analysis Method
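Will use bootstrapping to construct confidence intervals for the association between each predictor and loneliness scores: a correlation for age, and differences in mean loneliness between categories for living situation and relationship status.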
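
A minimal percentile-bootstrap sketch: respondents are resampled with replacement (keeping each predictor value paired with its loneliness score), the statistic is recomputed on every resample, and the middle 95% of the resampled values forms the interval. The choice of statistics (Pearson's r for age, a difference in means for a binary indicator such as living alone) and all function names are illustrative assumptions.

```python
import numpy as np


def bootstrap_ci(x, y, statistic, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for statistic(x, y), resampling (x, y) pairs."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    stats = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample respondents with replacement
        stats[i] = statistic(x[idx], y[idx])
    # nanquantile skips degenerate resamples where the statistic is undefined
    lower, upper = np.nanquantile(stats, [alpha / 2, 1 - alpha / 2])
    return lower, upper


def pearson_r(x, y):
    """Correlation, e.g. between DEMO_age and a loneliness score."""
    return np.corrcoef(x, y)[0, 1]


def mean_diff(indicator, score):
    """Mean score where indicator == 1 minus mean score where indicator == 0."""
    return score[indicator == 1].mean() - score[indicator == 0].mean()


# Simulated toy data (not survey data): a 0/1 living-alone indicator
rng = np.random.default_rng(1)
lives_alone = rng.integers(0, 2, size=200)
loneliness = 5 + lives_alone + rng.normal(0, 1, size=200)
print(bootstrap_ci(lives_alone, loneliness, mean_diff))
```

Under this approach, a predictor would be judged significant at the 5% level if its 95% interval excludes 0. For the multi-category `DEMO_relationship_status`, `mean_diff` could be applied to an indicator for each status versus a chosen reference group.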

### Hypothesis and Expected Results
Hypothesis: Age, living alone, and relationship status will be significant predictors of loneliness, meaning their bootstrap confidence intervals will exclude zero.
- Expect stronger associations with living situation than with demographic factors such as age
- Results will help identify high-risk groups for social isolation
- Findings could inform targeted intervention strategies