### Analysis 1

#### Research Question:
Is there an association between the perception of fair treatment at work and life satisfaction?

#### Variables:
- **WORK_feel_fair**: Ordinal variable from the 2021 cross-sectional survey, assessing the extent to which respondents feel treated fairly at work, with the following categorical responses:
  - **Categories**: "Completely," "Very Well," "Very Little," "Somewhat," "Presented but no response," "Not at all"
  - **Note**: In our analysis, we will exclude the "Presented but no response" category to focus only on responses with substantive answers.

- **WELLNESS_life_satisfaction**: Ordinal variable from the 2021 cross-sectional survey, measuring life satisfaction on a numerical scale from 1 to 10.

#### Visualizations:

To show the relationship between **WORK_feel_fair** and **WELLNESS_life_satisfaction**, we will use a **violin plot**. This type of plot combines a box plot and a kernel density plot, allowing us to visualize both the distribution and density of **WELLNESS_life_satisfaction** scores across different levels of **WORK_feel_fair** responses.

- **Violin Plot**: This plot will display **WELLNESS_life_satisfaction** (y-axis) across the categories of **WORK_feel_fair** (x-axis). Each "violin" represents the distribution of life satisfaction scores for a specific response category of fair treatment at work (e.g., "Completely," "Very Well," etc.).
- The width of each violin indicates the density of responses, showing where scores are most concentrated for each level of **WORK_feel_fair**.
  
This visualization will provide an intuitive assessment of how life satisfaction scores vary with perceptions of fair treatment at work, allowing us to see potential associations between the two variables directly.
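
As a rough illustration of the plot described above, the following sketch draws one violin per response category with matplotlib. The category ordering in `FAIR_ORDER`, the helper name `plot_violin`, and the randomly generated scores are assumptions made for this example; they are not taken from the survey.

```python
import random

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Assumed ordinal ordering of the substantive response categories.
FAIR_ORDER = ["Not at all", "Very Little", "Somewhat", "Very Well", "Completely"]


def plot_violin(rows, order=FAIR_ORDER):
    """Draw one violin per WORK_feel_fair category from (category, score) pairs."""
    groups = {cat: [] for cat in order}
    for cat, score in rows:
        if cat in groups:  # skips "Presented but no response"
            groups[cat].append(score)
    positions = list(range(1, len(order) + 1))
    fig, ax = plt.subplots(figsize=(8, 4))
    parts = ax.violinplot([groups[c] for c in order],
                          positions=positions, showmedians=True)
    ax.set_xticks(positions)
    ax.set_xticklabels(order)
    ax.set_xlabel("WORK_feel_fair")
    ax.set_ylabel("WELLNESS_life_satisfaction")
    return fig, parts


# Made-up scores purely for illustration; these are NOT survey data.
rng = random.Random(0)
demo_rows = [(cat, rng.randint(1, 10)) for cat in FAIR_ORDER for _ in range(40)]
demo_rows.append(("Presented but no response", 5))
fig, parts = plot_violin(demo_rows)
```

Calling `fig.savefig(...)` on the returned figure would write the plot to a file. Because life satisfaction takes only ten integer values, the kernel density smoothing can make the violins look more continuous than the underlying data.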

#### Analysis:
Using data from the 2021 cross-sectional survey, we will construct a **contingency table** to examine the relationship between **WORK_feel_fair** and **WELLNESS_life_satisfaction**. The contingency table will display the frequencies of each combination of categories, with **WORK_feel_fair** (excluding the "Presented but no response" category) on one axis and **WELLNESS_life_satisfaction** (1 to 10 scale) on the other. This will result in a 5x10 table.
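
A minimal sketch of how such a table could be tabulated is shown below, using only the Python standard library. The row ordering in `FAIR_LEVELS`, the helper name `contingency_table`, and the handful of example rows are assumptions for illustration, not survey data.

```python
from collections import Counter

# Assumed ordering of the five substantive WORK_feel_fair categories.
FAIR_LEVELS = ["Not at all", "Very Little", "Somewhat", "Very Well", "Completely"]
SCORES = list(range(1, 11))  # life satisfaction scale, 1 to 10


def contingency_table(rows):
    """Return a 5x10 list of counts from (fair_category, score) pairs."""
    counts = Counter(
        (fair, score)
        for fair, score in rows
        if fair in FAIR_LEVELS and score in SCORES  # drops non-substantive responses
    )
    return [[counts[(fair, score)] for score in SCORES] for fair in FAIR_LEVELS]


# Made-up rows purely for illustration.
demo_rows = [
    ("Completely", 9),
    ("Completely", 9),
    ("Somewhat", 6),
    ("Presented but no response", 5),  # excluded from the table
    ("Not at all", 2),
]
table = contingency_table(demo_rows)
```

Each row of `table` corresponds to one fairness category and each column to one life satisfaction score, so cells with no respondents appear as explicit zeros.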

To determine if there is an association between these variables, we will perform a **Fisher's Exact Test** on the contingency table. Because exhaustively enumerating every possible table is computationally impractical for a table of this size, we will use a **Monte Carlo simulation** to estimate the p-value. This approach generates a large number of simulated contingency tables with the same row and column totals under the null hypothesis of no association and calculates the proportion of tables that are at least as extreme as the observed table. We will compare the resulting p-value to our pre-determined significance level of α = 0.05.

This analysis will allow us to evaluate whether the observed frequencies of life satisfaction scores differ significantly across levels of perceived fair treatment at work, providing insight into the potential association between these variables.
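
The simulation idea can be sketched as follows. This is one possible implementation under stated assumptions, not necessarily the routine we will run: it shuffles one variable's labels (which keeps both sets of marginal totals fixed) and treats a simulated table as "at least as extreme" when its probability under fixed margins is no larger than that of the observed table, which is equivalent to its sum of log-factorials of cell counts being at least as large. The function names and the example data are hypothetical.

```python
import math
import random
from collections import Counter


def log_factorial_sum(xs, ys):
    """Sum of log(n_ij!) over the cells of the contingency table of xs vs ys."""
    table = Counter(zip(xs, ys))
    return sum(math.lgamma(n + 1) for n in table.values())


def monte_carlo_fisher(xs, ys, n_sim=2000, seed=0):
    """Estimate a Fisher-type p-value by permuting ys against xs."""
    rng = random.Random(seed)
    observed = log_factorial_sum(xs, ys)
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)  # permuting labels keeps row and column totals fixed
        # Larger log-factorial sum means a less probable table under the null.
        if log_factorial_sum(xs, shuffled) >= observed - 1e-9:
            extreme += 1
    return (extreme + 1) / (n_sim + 1)  # add-one estimate, never exactly zero


# Made-up data: perfectly associated vs. perfectly balanced responses.
fair_assoc = ["Not at all"] * 30 + ["Completely"] * 30
score_assoc = [2] * 30 + [9] * 30
p_assoc = monte_carlo_fisher(fair_assoc, score_assoc)

fair_bal = ["Not at all", "Completely"] * 30
score_bal = [2, 2, 9, 9] * 15
p_balanced = monte_carlo_fisher(fair_bal, score_bal)
```

In the perfectly associated example almost no shuffled table is as improbable as the observed one, so the estimated p-value is small; in the balanced example every shuffled table is at least as extreme, so the estimate is 1.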

#### Assumptions:
- Independence of observations
- Mutually exclusive categories
- Random sampling
- Fixed row and column totals

#### Hypotheses:
- **Null Hypothesis (H₀)**: There is no association between feeling treated fairly at work and life satisfaction.
- **Alternative Hypothesis (H₁)**: There is an association between feeling treated fairly at work and life satisfaction.

#### Possible Results:
- **Statistically significant association**: p-value ≤ 0.05 (Reject Null Hypothesis)
- **No statistically significant association**: p-value > 0.05 (Fail to reject Null Hypothesis)

#### Relevance to Question:
The results of this analysis will help determine if perceptions of fair treatment at work are associated with overall life satisfaction among survey respondents. If a significant association is found, it would suggest that how fairly individuals feel treated at work is related to their overall life satisfaction, although a cross-sectional survey cannot establish that one causes the other.

### Analysis 2

#### Research Question:
Is there an association between household income and life satisfaction?

#### Variables:
- **DEMO_household_income**: Ordinal variable from the 2021 cross-sectional survey, representing the respondent's estimated total household income before taxes in the past year. The income is categorized into distinct income brackets.
  
- **WELLNESS_life_satisfaction**: Ordinal variable from the 2021 cross-sectional survey, measuring life satisfaction on a numerical scale from 1 to 10.

#### Visualizations:

To illustrate the mean life satisfaction for each income bracket and the uncertainty around it, we will use **error bar plots** based on bootstrap sampling.

- **Error Bar Plot**: This plot will display the **estimated mean life satisfaction** (y-axis) for each **income bracket** (x-axis), along with **95% confidence intervals** for each income category.
- The mean values and confidence intervals will be estimated through bootstrapping, providing estimates of the central tendency for each income bracket and of the uncertainty in those estimates.

#### Analysis:
To assess the relationship between household income and life satisfaction, we will use a **bootstrap resampling method** to estimate the mean life satisfaction for each income bracket:

1. **Bootstrap Resampling**: For each income bracket, we will take a large number of bootstrap samples (e.g., 10,000 samples) of life satisfaction scores with replacement. For each sample, we calculate the mean life satisfaction.
2. **Confidence Interval Calculation**: Using the bootstrap sample means, we calculate a 95% confidence interval for the mean life satisfaction score for each income bracket.
3. **Comparison Across Brackets**: By comparing the confidence intervals across income brackets, we can assess whether mean life satisfaction differs between income levels. Non-overlapping 95% confidence intervals for two brackets indicate a statistically significant difference in mean life satisfaction between them. Overlapping intervals are inconclusive, because two means can differ significantly even when their individual intervals overlap.
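
The steps above can be sketched with the standard library as follows. The percentile method for the interval, the function name `bootstrap_mean_ci`, the bracket labels, and the scores are all assumptions made for this illustration; they are not survey values.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_boot=10_000, level=0.95, seed=0):
    """Return (sample mean, lower, upper) using a percentile bootstrap interval."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return statistics.fmean(values), means[lo_idx], means[hi_idx]


# Made-up scores for two hypothetical income brackets.
demo_scores = {
    "lower bracket": [3, 4, 5, 4, 6, 5, 3, 4, 5, 4, 3, 5, 4, 4, 5, 3, 4, 6, 4, 5],
    "higher bracket": [8, 7, 9, 8, 8, 7, 9, 8, 7, 8, 9, 8, 7, 8, 8, 9, 7, 8, 8, 9],
}
intervals = {
    bracket: bootstrap_mean_ci(scores, n_boot=2000)
    for bracket, scores in demo_scores.items()
}
```

Each entry of `intervals` holds the sample mean followed by the lower and upper interval bounds for that bracket, which are the values an error bar plot would display.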

This method allows us to estimate the mean life satisfaction score for each income bracket, providing insight into how life satisfaction varies across income levels.

#### Assumptions:
- Independence of observations within each income bracket
- The observed sample within each income bracket is representative of the population in that bracket, so that resampling from it approximates sampling from the population

#### Hypotheses:
- **Null Hypothesis (H₀)**: There is no difference in mean life satisfaction scores between income brackets.
- **Alternative Hypothesis (H₁)**: There is a difference in mean life satisfaction scores between income brackets.

#### Possible Results:
- **Statistically significant differences**: Non-overlapping confidence intervals for at least one pair of income brackets (suggesting different means for those brackets)
- **No clear evidence of differences**: Overlapping confidence intervals for all income brackets (consistent with similar means, although overlap alone does not establish that the means are equal)

#### Relevance to Question:
The results of this analysis will help determine if household income is associated with overall life satisfaction among survey respondents. By comparing the estimated mean life satisfaction for each income level, we can assess whether life satisfaction is consistently higher in certain income brackets, which would be consistent with a relationship between household income and life satisfaction.

### Analysis 3

#### Research Question:
Is there an association between ethnicity and the number of hours worked per week?

#### Variables:
- **DEMO_ethnicity**: Categorical variable from the survey, indicating the respondent’s ethnicity with the following possible responses:
  - **Categories**: "White," "Indigenous," "Latin American," "Japanese," "African, Caribbean, or Black," "Southeast Asian (e.g., Vietnamese, Cambodian, Laotian, Thai, etc.)," "Filipino," "South Asian (e.g., East Indian, Pakistani, Sri Lankan, etc.)," "Chinese," "Arab," "Korean," "West Asian (e.g., Iranian, Afghan, etc.)," "Presented but no response," and "None of the above."
  - **Note**: In the analysis, we will exclude the "Presented but no response" and "None of the above" categories to focus on meaningful ethnicity responses.

- **WORK_hours_per_week**: Numerical variable from the survey, indicating the average number of hours worked per week.

#### Visualizations:

To show the relationship between **DEMO_ethnicity** and **WORK_hours_per_week**, we will use a **violin plot**. This plot will help visualize the distribution and density of **WORK_hours_per_week** across different ethnic groups, making it easier to see how working hours vary by ethnicity.

- **Violin Plot**: This plot will display **WORK_hours_per_week** (y-axis) across categories of **DEMO_ethnicity** (x-axis). Each "violin" represents the distribution of weekly working hours for a specific ethnicity group, with the width indicating the density of responses within that range of hours.
- We will exclude the "Presented but no response" and "None of the above" categories from the plot to focus only on meaningful ethnicity responses.

#### Analysis:

Using data from the survey, we will analyze the association between **DEMO_ethnicity** and **WORK_hours_per_week** by comparing the mean working hours for each ethnicity category using **bootstrap resampling**. This approach allows us to estimate the distribution of the mean working hours for each ethnicity and construct confidence intervals around these means.

#### Bootstrap Method:
1. **Bootstrap Resampling**: For each ethnicity group, we will perform a bootstrap procedure by repeatedly resampling (with replacement) from the **WORK_hours_per_week** data within that group. For each resample, we will calculate the mean working hours.
2. **Confidence Intervals**: Using the bootstrap samples, we will compute a confidence interval (e.g., 95%) for the mean working hours of each ethnicity group.
3. **Exclusions**: We will exclude the "Presented but no response" and "None of the above" categories from the analysis, as they do not provide meaningful information on ethnicity.

#### Interpretation:
Once we have confidence intervals for the mean working hours of each ethnicity, we will compare these intervals:
- **No Overlap**: If the confidence intervals for two ethnicity groups do not overlap, it would indicate a likely association between ethnicity and working hours, as it suggests a meaningful difference in average working hours between those groups.
- **Overlap**: If the confidence intervals overlap substantially, we would not have evidence of a difference in average working hours between those groups.
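
With many groups, checking every pair of intervals by eye is error-prone, so the comparison could be automated along the following lines. The function name `non_overlapping_pairs`, the placeholder group labels, and the interval values are hypothetical and chosen only to show the logic.

```python
from itertools import combinations


def non_overlapping_pairs(intervals):
    """Return pairs of groups whose (lower, upper) intervals do not overlap."""
    pairs = []
    for (g1, (lo1, hi1)), (g2, (lo2, hi2)) in combinations(intervals.items(), 2):
        # Two intervals are disjoint when one ends before the other begins.
        if hi1 < lo2 or hi2 < lo1:
            pairs.append((g1, g2))
    return pairs


# Made-up 95% intervals for mean weekly hours in three placeholder groups.
demo_intervals = {
    "Group A": (33.0, 36.5),
    "Group B": (35.0, 39.0),
    "Group C": (38.0, 42.0),
}
disjoint = non_overlapping_pairs(demo_intervals)
```

In this example only the first and third groups have disjoint intervals, so they are the only pair flagged as differing.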

#### Assumptions:
- Independence of observations within each ethnicity group.
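- The number of respondents in each ethnicity group is large enough for the bootstrap distribution to approximate the sampling distribution of the mean, and the number of bootstrap resamples is large enough to give stable interval estimates.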

#### Hypotheses:
- **Null Hypothesis (H₀)**: There is no difference in mean working hours between ethnicity groups.
- **Alternative Hypothesis (H₁)**: There are differences in mean working hours between ethnicity groups.

#### Possible Results:
- **Significant Difference**: Non-overlapping confidence intervals for at least one pair of ethnicity groups (suggesting an association between ethnicity and working hours).
- **No Significant Difference**: Overlapping confidence intervals for all ethnicity groups (providing no clear evidence of an association between ethnicity and working hours).

#### Relevance to Question:
This analysis will help us understand if there are significant differences in working hours across ethnic groups. If non-overlapping confidence intervals are observed, it would indicate that ethnicity is associated with different average working hours.