# Topic 1: Impact of exercise on stress levels across different age groups

## Research question and what might be interesting to study

Does regular physical activity reduce self-reported stress levels, and how does this relationship vary across different age groups?

Understanding how different age groups benefit from physical activity in terms of stress reduction could guide tailored health recommendations for various age demographics, improving overall stress management strategies.


## Population parameter

Mean difference in stress levels across physical activity levels and among different age groups.

## Variables, the description of variables, why I chose them and visualizations

I'm interested in exploring how physical activity levels and age groups are associated with stress levels.

1. BELIEFS_rank_physical_activity

Physical Activity Level: This variable categorizes individuals based on their frequency and intensity of exercise, which is critical because physical activity is often linked to stress reduction.

Visualization: Histogram to show the frequency of different activity levels across all participants.

2. BELIEFS_rank_stress

Stress Level: A continuous or ordinal scale rating of perceived stress. Estimating the average stress level across different activity levels helps understand the efficacy of exercise as a stress management tool.

Visualization: Box Plot to display the distribution of stress levels within each age group.

3. DEMO_age

Age Group: Age can influence both stress response and exercise habits, making it essential to control or examine this variable separately.

Visualization: Bar Plot to visualize the average stress level within each activity category.

Reason for Choosing These Variables:
Understanding the relationship between exercise and stress across age groups can help tailor public health recommendations for stress management.


## Brief justification of each type of visualization

Bar Plot: This visualization is chosen to depict the average stress level within each physical activity category. Bar plots are particularly effective for comparing a numerical variable (here, the average stress score) across categories (here, the activity levels). This helps in visually assessing how stress differs between activity levels.

Box Plot: Used to display the distribution of stress levels within each age group, box plots are ideal because they show the median, quartiles, and potential outliers within the data. This makes it easier to identify variations and trends in stress levels across different age demographics, which is crucial for understanding how age might influence stress response to physical activity.

Histogram: This is employed to show the frequency of different activity levels across all participants. Histograms are useful for understanding the distribution of a single variable and can reveal the skewness or symmetry of the data, helping to contextualize the typical activity levels of the study population.

## Simplified analysis plan

1. Bootstrap Analysis:

Data Collection: Collect data on exercise frequency, intensity, and stress levels from a diverse age group sample.

Bootstrap Procedure: Perform bootstrap resampling to estimate the mean stress level of each subgroup defined by age and exercise habits, using 10,000 resamples to build the bootstrap distributions.

Comparison: Analyze the bootstrap distributions to determine differences in stress levels across age groups and exercise variables.
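The bootstrap steps above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the final analysis code. The stress scores, the group labels `low_activity` and `high_activity`, and the helper names `bootstrap_means` and `percentile_interval` are all hypothetical and are not taken from the survey data.

```python
import random
import statistics


def bootstrap_means(values, rng, n_resamples=10_000):
    """Return the bootstrap distribution of the sample mean."""
    n = len(values)
    # Each resample draws n observations with replacement from the original data.
    return [statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)]


def percentile_interval(distribution, level=0.95):
    """Percentile interval read off a bootstrap distribution."""
    ordered = sorted(distribution)
    lower = ordered[int((1 - level) / 2 * len(ordered))]
    upper = ordered[int((1 + level) / 2 * len(ordered)) - 1]
    return lower, upper


# Hypothetical stress scores for one age group, split by activity level.
stress_by_activity = {
    "low_activity": [7, 8, 6, 9, 7, 8, 6, 7],
    "high_activity": [4, 5, 3, 6, 5, 4, 5, 3],
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
boot = {group: bootstrap_means(scores, rng) for group, scores in stress_by_activity.items()}

# The groups are resampled independently, so pairing the draws gives a
# bootstrap distribution for the difference in means (low minus high).
differences = [low - high for low, high in zip(boot["low_activity"], boot["high_activity"])]
ci_lower, ci_upper = percentile_interval(differences)
print(f"95% bootstrap interval for the mean difference: ({ci_lower:.2f}, {ci_upper:.2f})")
```

In the real analysis the same procedure would be repeated within each age group, so that the intervals for the mean differences can be compared across groups.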

2. Confidence Interval Estimation

Data Collection: Gather detailed data on exercise and stress levels across different age groups.

Statistical Modeling: Calculate mean stress levels for each age group and compute 95% confidence intervals based on the t-distribution, or on a normal approximation when the groups are large.

Interpretation: Evaluate the confidence intervals to gauge the true mean stress levels for different exercise intensities and age groups.
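As a rough sketch of the interval calculation, the snippet below computes a normal-approximation 95% confidence interval for each group mean. It assumes the groups are large enough for the Central Limit Theorem to apply; for small groups a t critical value would be more appropriate. The scores, the group keys, and the function name `mean_confidence_interval` are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(values, level=0.95):
    """Normal-approximation confidence interval for a mean (assumes a large sample)."""
    n = len(values)
    mean = statistics.fmean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value, roughly 1.96 for a 95% interval.
    z = statistics.NormalDist().inv_cdf((1 + level) / 2)
    return mean - z * std_err, mean + z * std_err


# Hypothetical stress scores keyed by (age group, activity level).
groups = {
    ("18-34", "low"): [7, 8, 6, 9, 7, 8],
    ("18-34", "high"): [4, 5, 6, 5, 4, 6],
}

intervals = {key: mean_confidence_interval(scores) for key, scores in groups.items()}
for (age_group, activity), (lower, upper) in intervals.items():
    print(f"{age_group}, {activity} activity: ({lower:.2f}, {upper:.2f})")
```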

3. Linear Regression

Modeling Strategy: Model stress levels (the dependent variable) as a function of exercise frequency and intensity (two independent variables), using linear regression.

Inclusion of Covariates: Include age as a covariate to adjust for its effect.

Analysis: Interpret regression coefficients to understand the impact of increased exercise on stress levels, considering age adjustments.
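One way to picture the regression step is the sketch below, which fits stress on exercise frequency, intensity, and age by solving the least-squares normal equations with the standard library. The predictor values, the coefficients in `true_coefs`, and the helper names are hypothetical. The toy stress scores are generated from a known linear rule purely so that the fit can be checked, and a real analysis would more likely use a statistics package.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical predictors: (exercise days per week, intensity 1-3, age in years).
predictors = [(0, 1, 22), (1, 1, 35), (2, 2, 41), (3, 1, 58),
              (3, 3, 29), (4, 2, 63), (5, 3, 37), (6, 2, 50)]
# Toy stress scores built from a known linear rule (intercept, frequency, intensity, age).
true_coefs = [9.0, -0.5, -0.8, 0.01]
stress = [true_coefs[0] + sum(b * v for b, v in zip(true_coefs[1:], row))
          for row in predictors]

coefs = fit_ols(predictors, stress)
for name, value in zip(["intercept", "frequency", "intensity", "age"], coefs):
    print(f"{name}: {value:.3f}")
```

Because age enters the model as its own column, the frequency and intensity coefficients are read as effects at a fixed age, which is what adjusting for age means here.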

## Assumptions

1. Bootstrap Analysis:

• The sample is representative of the population.

• Resampled data adequately reflect the distribution of the original dataset.

• Observations are independent of one another.

2. Confidence Interval Estimation

• The data are normally distributed within each group or the sample size is large enough for the Central Limit Theorem to apply.

• Samples are independent and identically distributed.

3. Linear Regression Analysis

• Linear relationship between the independent variables and the dependent variable.
• Homoscedasticity (constant variance of residuals across the range of values of an independent variable).
• Normal distribution of residuals.
• Independence of observations.

## Hypothesis about the possible results

Regular physical activity significantly reduces self-reported stress levels, and the strength of this relationship varies across different age groups.

## Potential results

It is hypothesized that there will be a negative correlation between regular physical activity and self-reported stress levels, indicating that as physical activity increases, stress levels decrease. The hypothesis also suggests that the impact of physical activity on stress reduction varies by age group, with middle-aged adults potentially experiencing a more pronounced decrease in stress levels compared to younger or older adults, possibly due to varying stress factors or physical resilience.

## Relevance

Public Health Implications: Showing how the stress-reducing effect of physical activity differs by age can shape public health efforts to promote exercise across age groups. Programs can be developed with strategies tailored to different ages, particularly emphasizing the age groups that benefit most.

Resource Allocation: Insights into the link between physical activity and stress reduction can guide healthcare providers and policymakers in focusing resources on promoting exercise as a stress management tool, especially for those more susceptible to stress-related health problems.

Personalized Recommendations: Fitness and wellness programs can be tailored to different age demographics, ensuring they meet physical needs and maximize mental health benefits.

# Topic 2: The impact of age on income levels

## Research question and what might be interesting to study

How does age impact income levels?

Investigating how income changes with age can reveal typical career lifecycle trends, such as income peaks and plateaus. This can help identify the critical periods when workers are likely to experience income growth and when they might face stagnation or decline.

## Population parameter

Mean income level across the age groups.

## Variables, the description of variables, why I chose them and visualizations

I'm interested in exploring the association between age and income level to understand how income varies across different age groups.

1. DEMO_age

Age: Measured continuously or categorized into brackets (e.g., 20-30 years, 31-40 years, 41-50 years, etc.).

2. DEMO_household_income

Income level: Quantified into categories such as low, medium, and high based on annual income brackets.

Visualizations:

Bar Plot: To compare average income levels across different age brackets, providing insights into how income tends to increase, plateau, or decrease with age.

Histogram: To display the distribution of age within the dataset, helping to understand the demographic structure of the sample and to visualize the prevalence of different age groups in the workforce.

## Brief justification of each type of visualization

Histogram: It gives an overview of the age distribution within the sample, offering insights into the predominant age groups and potential workforce imbalances or gaps. This can inform human resource strategies and broader demographic studies.

Bar plot: It provides a straightforward and visually impactful way to demonstrate age-related income trends to stakeholders, which can be crucial for discussions about wage policies, retirement planning, and economic forecasting.

## Simplified analysis plan

1. Linear Regression: To determine if there is a significant relationship between age and income, considering age as a continuous variable to explore linear or potential nonlinear trends (e.g., quadratic effects where income might peak at a certain age before declining).

2. General hypothesis testing:

Objective: To test if changes in age are associated with statistically significant changes in income levels.

Method: Use the results of the linear regression analysis. Specifically, test the null hypothesis that the coefficient of the age variable is equal to zero against the alternative hypothesis that it is not. This can be done by examining the p-values for the coefficients in the regression model output.

Hypothesis Statement:

Null Hypothesis (H0): The coefficient(s) for age in the income prediction model is zero, indicating no effect.

Alternative Hypothesis (H1): The coefficient(s) for age is not zero, indicating a significant effect of age on income.
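The coefficient test described above can be illustrated with a small sketch for the simplest case, a single linear age term. It fits income on age, computes the standard error of the slope, and tests H0: slope = 0. Because the standard library has no t-distribution, the p-value here uses a normal approximation, which is an assumption that is only reasonable for large samples. The ages, incomes (in thousands), and the function name `slope_test` are hypothetical. A quadratic term would be tested the same way once it is added to the model.

```python
import math
import statistics


def slope_test(x, y):
    """Fit y = b0 + b1*x and test H0: b1 = 0 (normal approximation to the t-test)."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(residual_ss / (n - 2) / sxx)
    statistic = slope / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - statistics.NormalDist().cdf(abs(statistic)))
    return slope, std_err, statistic, p_value


# Hypothetical ages (years) and household incomes (thousands of dollars).
ages = [22, 27, 31, 36, 40, 45, 49, 54, 58, 63]
incomes = [31, 38, 45, 52, 58, 63, 66, 67, 65, 60]

slope, std_err, statistic, p_value = slope_test(ages, incomes)
print(f"slope = {slope:.3f}, SE = {std_err:.3f}, statistic = {statistic:.2f}, p = {p_value:.4f}")
```

A small p-value leads to rejecting H0 in favour of H1. With real data, the residual plots should be checked first, since the test relies on the regression assumptions.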


## Assumptions

1. Linear regression

Linearity: The relationship between age and income is linear or well-modeled by included polynomial terms for non-linear trends.

Independence of Residuals: Residuals are independent across observations, with no data clustering that could impact results.

Homoscedasticity: Residual variance is constant across all age levels, without patterns of variability.

Normality of Residuals: Regression residuals are normally distributed, ensuring valid hypothesis testing for coefficients.

2. General Hypothesis Testing Using Regression

Model Specification Correctness: The regression model correctly includes the necessary predictors (age, age squared) and does not omit any variable that could cause omitted variable bias.

Normality of Errors: Similar to the regression assumptions, for the t-tests of the coefficients to be valid, the error terms in the regression model must be normally distributed, particularly when the sample size is small.

## Hypothesis about the possible results

Income levels increase with age up to a certain point, reflecting career development and peak earning years, and then potentially plateau or decline as individuals approach retirement.

## Potential results

The hypothesis states that income levels begin low for younger workers, increase with career advancement, peak during middle age, and then decline or stabilize as workers approach retirement due to fewer work hours or retirement.

## Relevance

1. Understanding economic lifecycle:

Career progression insight: Validating the hypothesis reveals how income varies with career stages, enhancing personal and retirement planning.

Policy formulation: Data informs policymakers to create age-tailored economic policies, such as wage protections and retraining for aging workers.

2. Social and economic equity:

Addressing disparities: Discovering age-related income differences could lead to targeted actions to reduce inequalities and support older workers.

3. Retirement and social security planning:

Informing public policy: Insights into income patterns over time can help develop realistic and supportive social security systems.

Individual planning: Enables more accurate retirement planning based on anticipated income changes due to aging.

# Topic 3: Examining the Impact of Social Media on Social Connectivity and Mental Well-being

## Research question and what might be interesting to study

Does the frequency and nature of social media usage influence perceptions of social connectivity, and how does this relationship affect mental well-being?

## Population parameter

Correlation coefficient between social media usage and mental well-being.

## Variables, the description of variables, why I chose them and visualizations

1. LIFESTYLE_time_use_balance_media

Social Media Usage Frequency: Measures how often individuals use social media daily, categorized as low, medium, or high.

2. ORIGINAL_social_life_evaluations_less_connected_than_preferred

Perceived Social Connectivity: Assessed through survey questions that evaluate how connected individuals feel to others through their social media interactions.

3. WELLNESS_self_rated_mental_health

Mental Well-being: Evaluated using psychological scales that measure aspects such as happiness, anxiety, and stress levels.

Visualizations:

Histogram: To show the distribution of social media usage frequency among participants.

Box Plot: To compare the distribution of mental well-being scores across different types of social media usage.

Bar Plot: To display average perceived social connectivity scores by frequency and type of social media usage.

## Brief justification of each type of visualization

Histogram: Helps identify common usage patterns and the prevalence of high vs. low usage, which is essential for analyzing how widespread certain behaviors are within the sample.

Box plot: Effective for showing medians, ranges, and outliers in well-being scores, highlighting how different engagement types might affect mental health.

Bar plot: Allows for a clear comparison of how different usage intensities and styles impact perceptions of social connectivity, illustrating potential trends in how social media can either bolster or hinder feelings of connectedness.

## Simplified analysis plan

1. Hypothesis Testing

• Data Collection: Gather data on frequency and type of social media use, social connectivity perceptions, and mental well-being using psychological scales.

• Hypothesis Formulation:

• Null Hypothesis (H0): There is no correlation between social media usage and social connectivity perceptions.

• Alternative Hypothesis (H1): There is a correlation between social media usage and social connectivity perceptions.

• Statistical Test: Conduct simulations to compare synthetic data generated under the null hypothesis with the observed data to assess if the observed correlation is statistically significant.
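One possible way to carry out the simulation is a permutation test, sketched below. Shuffling the connectivity scores breaks any link with usage, so each shuffle is one synthetic dataset under H0. Choosing a permutation scheme is an assumption here, since the plan does not fix how the null data are simulated. The usage and connectivity values and the function names `pearson` and `permutation_test` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_test(x, y, n_sim=10_000, seed=0):
    """Two-sided permutation p-value for H0: no correlation between x and y."""
    rng = random.Random(seed)
    observed = pearson(x, y)
    shuffled = list(y)
    extreme = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)  # one synthetic dataset under the null hypothesis
        if abs(pearson(x, shuffled)) >= abs(observed):
            extreme += 1
    # Adding 1 to numerator and denominator keeps the p-value above zero.
    return observed, (extreme + 1) / (n_sim + 1)


# Hypothetical daily usage (hours) and perceived connectivity scores.
usage = [1, 2, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9]
connectivity = [3, 4, 3, 5, 5, 6, 6, 7, 6, 8, 8, 9]

observed_r, p_value = permutation_test(usage, connectivity)
print(f"observed correlation = {observed_r:.3f}, permutation p-value = {p_value:.4f}")
```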

2. Confidence Interval Estimation

Data Collection: Use the same dataset as hypothesis testing.

Statistical Modeling: Compute mean and standard deviation of mental well-being scores by social connectivity level.

Confidence Interval Calculation: Construct 95% confidence intervals for each mean.

3. Linear Regression

Data Collection: Gather data on social media use frequency, type, social connectivity scores, and mental well-being.

Model Building: Create a linear regression model with mental well-being as the outcome and social media usage as predictors, including interactions.

Analysis: Predict mental well-being based on social media patterns, controlling for age, gender, and socioeconomic status.

## Assumptions

1. Hypothesis testing

Independence: Each data point is independent, ensuring no biases affect the simulated data under the null hypothesis.

Model Specification: The simulation model accurately mirrors the actual data’s structure, assuming no existing correlation.

Sufficient Sample Size: The sample is large enough to effectively simulate distributions and detect significant effects if present.

2. Confidence interval estimation

Normal Distribution of Sample Means: By the Central Limit Theorem, the sample means are approximately normally distributed when the samples are large, which justifies the construction of the confidence intervals.

Random Sampling: Data are randomly drawn, making each sample representative of the population and ensuring unbiased estimates.

3. Linear regression

Linearity: There is a linear relationship between mental well-being and the social media usage variables.

No Multicollinearity: The predictors, social media usage frequency and nature, are not highly correlated, ensuring stable estimates.

Homoscedasticity: The variance of residuals is constant across all predictor levels, which is needed for valid standard errors and hypothesis tests.

Normal Distribution of Errors: Errors are normally distributed, facilitating reliable hypothesis testing of regression coefficients.

Independence of Observations: Each observation is independent, vital for credible regression analysis.

## Hypothesis about the possible results

The frequency and nature of social media usage are hypothesized to impact perceptions of social connectivity and mental well-being positively. Active social media engagement, like commenting and sharing, is expected to boost social connectivity and improve mental health more effectively than passive browsing.

## Potential results

Active social media users who engage in posting, commenting, and sharing tend to report higher social connectivity than those who mainly browse passively. Higher social connectivity correlates with improved mental well-being, emphasizing the importance of interactive social media use. Conversely, passive users may experience lower social connectivity and mental well-being due to less interaction.

## Relevance

1. Guidance for mental health interventions: If confirmed, the hypothesis could lead mental health professionals to use social media as a tool in therapeutic settings, encouraging active participation in online communities to enhance social connectivity and mental health.

2. Policy and design recommendations: The findings could guide the design of social media platforms to boost social connectivity and mental well-being, and help policymakers develop guidelines and programs for healthier social media use.

## Teammate


I would like to be a teammate with Bowen Zhao.