# STA 130 Individual Course Project Proposal by Yiheng Wu

# 1. Research Question

Is there a statistically significant relationship between life satisfaction and loneliness? Specifically, I want to determine whether individuals with higher life satisfaction scores report lower loneliness, as measured by two loneliness scales (the UCLA Loneliness Scale and the De Jong Gierveld social loneliness sub-scale). Answering this question will help clarify the link between overall well-being and perceived loneliness. Any association found would motivate exploring whether improving one's life satisfaction might reduce loneliness.

# 2. Variables and Exploration Plan

Variables:


1. WELLNESS_life_satisfaction: A continuous variable representing the overall life satisfaction of the individuals.

2. LONELY_ucla_loneliness_scale_score: A continuous variable representing the loneliness level of individuals based on the UCLA Loneliness Scale.

3. LONELY_dejong_social_loneliness_sub_scale_score: A continuous variable representing social loneliness as measured by the De Jong Gierveld scale.

To explore these variables, I will compute summary statistics and create visualizations to better understand their relationships. I plan to use:

1. Scatter Plot: I will plot life satisfaction against each loneliness score. A downward trend in the scatter plot would indicate a negative association between life satisfaction and loneliness.

2. Histogram: I will use histograms to show the distributions of the life satisfaction and loneliness scores separately, revealing the spread, shape, and central tendency of each variable. This will show whether the data are skewed and what the typical values are.

3. Correlation Matrix: I will compute a matrix of Pearson correlation coefficients for the three variables. This will quantify the strength and direction of the linear relationships, providing numerical support for the patterns seen in the scatter plots.
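
The exploration steps above could be sketched as follows. This is a minimal illustration on synthetic data: the `make_synthetic` helper and its numeric ranges are hypothetical stand-ins for the real dataset, and only the three column names come from this proposal.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

SAT = "WELLNESS_life_satisfaction"
UCLA = "LONELY_ucla_loneliness_scale_score"
DEJONG = "LONELY_dejong_social_loneliness_sub_scale_score"


def make_synthetic(n=200, seed=0):
    """Hypothetical stand-in data; values and ranges are made up for illustration."""
    rng = np.random.default_rng(seed)
    sat = rng.uniform(1, 10, n)
    return pd.DataFrame({
        SAT: sat,
        UCLA: 8 - 0.4 * sat + rng.normal(0, 1, n),
        DEJONG: 3 - 0.2 * sat + rng.normal(0, 0.5, n),
    })


df = make_synthetic()

# Summary statistics (count, mean, std, quartiles) for each variable
summary = df.describe()

# Pearson correlation matrix for the three variables
corr = df.corr(method="pearson")

# Histograms on the top row, scatter plots on the bottom row
fig, axes = plt.subplots(2, 3, figsize=(12, 7))
for ax, col in zip(axes[0], [SAT, UCLA, DEJONG]):
    ax.hist(df[col].dropna(), bins=20)
    ax.set_title(col, fontsize=8)
for ax, col in zip(axes[1], [UCLA, DEJONG]):
    ax.scatter(df[SAT], df[col], s=8)
    ax.set_xlabel(SAT, fontsize=8)
    ax.set_ylabel(col, fontsize=8)
axes[1, 2].axis("off")  # unused panel
fig.tight_layout()

print(summary.round(2))
print(corr.round(2))
```

In a notebook the figure renders automatically at the end of the cell; with the real data, `df` would be replaced by the loaded survey DataFrame.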

# 3. Analysis Plan

To answer the research question, I plan to use the following analysis methods, with detailed steps provided for clarity:

1. One-sample Hypothesis Test

Objective: To determine if the average life satisfaction score significantly differs from a specified value (e.g., the midpoint of the scale, indicating average satisfaction).

Method: I will use bootstrapping to create a confidence interval for the mean of the WELLNESS_life_satisfaction variable.

Steps:

1. Calculate the sample mean of WELLNESS_life_satisfaction.

2. Draw 1,000 bootstrap samples from the observed values, sampling with replacement, each the same size as the original sample.

3. Compute the mean of each bootstrap sample to build a distribution of bootstrap means.

4. Calculate the 95% confidence interval as the 2.5th and 97.5th percentiles of the bootstrap distribution.

5. Compare the confidence interval to the hypothesized value: if the value falls outside the interval, the mean differs from it at the 5% significance level.

Code:

In [None]:
import numpy as np
import pandas as pd

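# Bootstrap confidence interval for mean life satisfaction
# Assumes `data` is a pandas DataFrame that already holds the survey data
np.random.seed(42)  # fix the seed so the resampling is reproducible
life_satisfaction = data['WELLNESS_life_satisfaction'].dropna()
sample_mean = life_satisfaction.mean()
n_boot = 1000
boot_means = []

for _ in range(n_boot):
    # Resample with replacement, same size as the original sample
    boot_sample = life_satisfaction.sample(len(life_satisfaction), replace=True)
    boot_means.append(boot_sample.mean())

# Percentile interval: the middle 95% of the bootstrap means
lower_ci, upper_ci = np.percentile(boot_means, [2.5, 97.5])
print(f"Sample mean: {sample_mean:.3f}")
print(f"95% Confidence Interval: ({lower_ci:.3f}, {upper_ci:.3f})")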
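
Step 5 of the procedure, comparing the interval with the hypothesized value, might look like the sketch below. The `bootstrap_ci` helper, the synthetic scores, and the value `hypothesized_mean = 5.0` are all assumptions made for illustration; the actual scale midpoint should be taken from the dataset's documentation.

```python
import numpy as np


def bootstrap_ci(values, n_boot=1000, level=0.95, seed=42):
    """Percentile bootstrap confidence interval for the mean (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]  # drop missing values
    # Each row of idx selects one bootstrap sample drawn with replacement
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    alpha = 1 - level
    lower, upper = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper


# Synthetic scores, made up for illustration only
rng = np.random.default_rng(0)
scores = rng.normal(loc=6.5, scale=2.0, size=300)

hypothesized_mean = 5.0  # assumed value, not taken from the dataset
lower, upper = bootstrap_ci(scores)
outside = not (lower <= hypothesized_mean <= upper)
print(f"95% CI: ({lower:.3f}, {upper:.3f}); hypothesized value outside interval: {outside}")
```

Drawing all resampling indices in one array avoids the Python loop, which can matter if the number of bootstrap samples is increased.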

2. Two-sample Hypothesis Test

Objective: To determine if there is a significant difference in loneliness scores between individuals with high and low life satisfaction.

Method: I will create two groups based on life satisfaction scores (e.g., split at the median value) and conduct a two-sample t-test to assess differences in LONELY_ucla_loneliness_scale_score.

Steps:

1. Split the sample into two groups: high life satisfaction (above the median) and low life satisfaction (at or below the median).

2. Calculate the mean loneliness score for each group.

3. Use a two-sample t-test to compare the means of the two groups.

Code:

In [None]:
from scipy.stats import ttest_ind

# Split data into high and low life satisfaction groups
# Rows with a missing life satisfaction score fall into neither group
median_satisfaction = data['WELLNESS_life_satisfaction'].median()
is_high = data['WELLNESS_life_satisfaction'] > median_satisfaction
is_low = data['WELLNESS_life_satisfaction'] <= median_satisfaction
high_satisfaction = data.loc[is_high, 'LONELY_ucla_loneliness_scale_score'].dropna()
low_satisfaction = data.loc[is_low, 'LONELY_ucla_loneliness_scale_score'].dropna()

# Two-sample t-test (the default assumes equal variances in the two groups;
# pass equal_var=False for Welch's t-test)
t_stat, p_value = ttest_ind(high_satisfaction, low_satisfaction)
print(f"t-statistic: {t_stat}, p-value: {p_value}")
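
Because the alternative hypothesis in Section 4 is directional, a one-sided Welch's t-test (which does not assume equal variances) is one possible variant of the test above. The sketch below uses synthetic data; the sample size, means, and spreads are invented for illustration, and the `alternative` argument requires SciPy 1.6 or later.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
n = 300
# Synthetic stand-in data, made up for illustration only
sat = rng.uniform(1, 10, n)
demo = pd.DataFrame({
    "WELLNESS_life_satisfaction": sat,
    "LONELY_ucla_loneliness_scale_score": 8 - 0.4 * sat + rng.normal(0, 1, n),
})

median_sat = demo["WELLNESS_life_satisfaction"].median()
is_high = demo["WELLNESS_life_satisfaction"] > median_sat
high = demo.loc[is_high, "LONELY_ucla_loneliness_scale_score"]
low = demo.loc[~is_high, "LONELY_ucla_loneliness_scale_score"]

# Welch's t-test; alternative="less" tests whether mean(high) < mean(low)
t_stat, p_value = ttest_ind(high, low, equal_var=False, alternative="less")
print(f"mean(high) = {high.mean():.2f}, mean(low) = {low.mean():.2f}")
print(f"t = {t_stat:.2f}, one-sided p = {p_value:.4g}")
```

A one-sided test should only be used if the direction is fixed before looking at the data, as it is in this proposal.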

3. Simple Linear Regression

Objective: To explore the relationship between WELLNESS_life_satisfaction and LONELY_dejong_social_loneliness_sub_scale_score and determine if higher life satisfaction is associated with lower social loneliness.

Method: I will fit a simple linear regression model to predict loneliness using life satisfaction as the predictor variable.

Steps:

1. Define WELLNESS_life_satisfaction as the independent variable (X) and LONELY_dejong_social_loneliness_sub_scale_score as the dependent variable (Y).

2. Fit a linear regression model and interpret the coefficient to determine if the relationship is positive or negative.

Code:

In [None]:
import statsmodels.api as sm

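# Prepare data for linear regression
# Drop rows missing either variable together so X and Y stay aligned
# (dropping NaNs from each column separately can leave different rows in each)
reg_data = data[['WELLNESS_life_satisfaction',
                 'LONELY_dejong_social_loneliness_sub_scale_score']].dropna()
Y = reg_data['LONELY_dejong_social_loneliness_sub_scale_score']
X = sm.add_constant(reg_data['WELLNESS_life_satisfaction'])  # Add constant term for intercept

# Fit the regression model
model = sm.OLS(Y, X).fit()
print(model.summary())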

# 4. Hypotheses and Expected Results

Null Hypothesis (H0): There is no relationship between life satisfaction and loneliness levels in the population (for example, the regression slope is zero).

Alternative Hypothesis (H1): There is a negative relationship between life satisfaction and loneliness levels (the regression slope is less than zero).

I expect to find that individuals with higher life satisfaction have lower loneliness scores, indicating a negative correlation between these variables. The results would be interpreted as follows:

1. A 95% confidence interval that does not include the hypothesized value would indicate that the mean life satisfaction is significantly different from the assumed value.

2. A p-value below the chosen significance level (e.g., 0.05) from the two-sample t-test would suggest that mean loneliness differs between the high and low life satisfaction groups.

3. A negative, statistically significant slope coefficient in the linear regression would indicate that higher life satisfaction is associated with lower loneliness scores.

These results would be consistent with life satisfaction playing a role in feelings of loneliness and overall well-being, although this analysis can establish an association rather than a causal effect.

# 5. Group Preferences

I have recently switched to this TUT session and am not yet familiar with the classmates around me. My only preference for forming a group is that my teammates are native Mandarin speakers, as it will facilitate smoother communication for us. Thank you very much for your consideration.