Analysis 1: The Impact of Demographic Factors on Greeting Strangers
Statement of the Research Question:
How do demographic factors—such as age, gender, and province—affect the frequency of greeting strangers?

This question aims to explore whether certain demographic characteristics correlate with the likelihood of greeting strangers. Understanding these associations can reveal whether specific social behaviors vary by age, gender, or region, which could provide insights into cultural or regional differences in social habits. By examining these factors, the analysis seeks to determine if demographic variables can predict how frequently people engage in friendly, spontaneous social interactions.

Variables and Exploration Plan

    Dependent Variable: Frequency of greeting strangers (CONNECTION_activities_greeted_stranger_p3m). This variable likely represents ordinal data, indicating how frequently individuals greet strangers (such as "monthly," "weekly," "daily"). Since it's ordinal, certain analyses (like ordinal regression) will be more appropriate to capture the relative ranking.

    Independent Variables:
    Age (DEMO_age): As a continuous variable, age is used to see if people in different age brackets have varying social habits. We will examine this by categorizing age groups (such as young adults, middle-aged, seniors) if necessary.
    Gender (DEMO_gender): This nominal categorical variable (such as Male, Female, Non-binary) will allow us to examine if greeting strangers varies across genders.
    Province (GEO_province): This nominal categorical variable indicates the region within the country. Cultural and regional differences might affect social behaviors, so this variable will help assess any geographical patterns.

Visualization:
    Bar Plots for gender and province will display the distribution of greeting frequency across each category. These plots will help identify any obvious variations between groups.
    Box Plots of age, grouped by greeting-frequency level, can reveal whether people who greet strangers more often tend to be older or younger. Each box shows the age distribution, median, and potential outliers within one frequency level, which suits a continuous variable like age.

Summary Statistics: Using df.groupby to compute means and counts within each demographic category will allow us to quickly observe central tendencies and variations, which could hint at associations before formal analysis.
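The summary step could look roughly like the sketch below. The toy rows, the ordering of the response levels, and the helper column greet_code are assumptions made for illustration; the real survey coding may differ.

```python
import pandas as pd

# Hypothetical toy data; the real survey responses may be coded differently.
df = pd.DataFrame({
    "DEMO_gender": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "CONNECTION_activities_greeted_stranger_p3m": [
        "Weekly", "Monthly", "Daily", "Weekly", "Daily", "Monthly",
    ],
})

# Assumed ordering of the response levels, from least to most frequent.
levels = ["Monthly", "Weekly", "Daily"]
greet = pd.Categorical(
    df["CONNECTION_activities_greeted_stranger_p3m"],
    categories=levels,
    ordered=True,
)
df["greet_code"] = greet.codes  # Monthly=0, Weekly=1, Daily=2

# Count, mean, and median of the coded frequency within each gender.
summary = df.groupby("DEMO_gender")["greet_code"].agg(["count", "mean", "median"])

# Raw counts of each response level within each gender.
counts = pd.crosstab(
    df["DEMO_gender"], df["CONNECTION_activities_greeted_stranger_p3m"]
)

print(summary)
print(counts)
```

Because the outcome is ordinal, the median and the cross-tabulated counts are usually easier to interpret than the mean of the integer codes.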

Analysis Plan and Assumptions

    Analysis Method:
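    An ordinal logistic regression (proportional odds model) will be used since greeting frequency is ordinal. This model will help determine whether demographic variables significantly predict the likelihood of greeting strangers at different frequencies. Note that smf.ols("greeting_frequency ~ age + gender + province", data=df).fit() fits an ordinary least squares model that treats the outcome as numeric, so it can serve only as a rough baseline. The ordinal model itself needs an ordered-outcome estimator, such as OrderedModel in statsmodels, with the same predictors.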

    Assumptions:
    Independence of Observations: Each observation (such as each person’s response) should be independent.
    Proportional Odds: Assumes that the effect of each predictor is the same across every pair of adjacent outcome levels (for example, the effect of age on moving from "monthly" to "weekly" is the same as on moving from "weekly" to "daily").
    Ordinal Scale Validity: Greeting frequency levels should have a meaningful order (such as "never" < "sometimes" < "often").
    If assumptions are violated (such as if proportional odds don’t hold), we could consider other approaches like separate logistic regressions for each pair of levels.

Hypothesis:

Age: Older individuals may greet strangers more frequently compared to younger participants, given that older adults often have more ingrained social habits.
Gender: There may be differences in greeting behavior based on gender, with women potentially greeting strangers more frequently due to sociocultural norms emphasizing social connectedness.
Province: We expect regional variations, with some provinces showing higher or lower frequencies of greeting behavior, reflecting cultural and social norms.

Relevance: If significant associations are found, this analysis would shed light on how demographic characteristics influence social behavior. It could provide insights into cultural differences or age-related shifts in social habits, which may be useful for understanding and fostering community engagement.

Analysis 2: The Influence of Personality Traits on Greeting Behavior
Statement of the Research Question:
Are personality traits, specifically agreeableness and extraversion, associated with a higher frequency of greeting strangers?

This question investigates whether individuals who score higher on certain personality traits, like agreeableness and extraversion, are more likely to greet strangers. Since these traits are often linked to sociability and friendliness, this analysis seeks to identify if people with higher scores in these areas are more inclined to engage in social behaviors, such as greeting unfamiliar people. If a strong association exists, it could indicate that personality traits play a significant role in shaping one’s likelihood to participate in social exchanges.

Variables and Exploration Plan

    Dependent Variable: Frequency of greeting strangers (CONNECTION_activities_greeted_stranger_p3m). We assume this is ordinal, as it represents the likelihood or frequency of greeting behavior.

    Independent Variables:
    Agreeableness (PSYCH_big_five_inventory_agreeable_score): This is a continuous variable that measures the respondent’s tendency toward kindness, empathy, and cooperativeness, traits commonly linked to social behavior.
    Extraversion (PSYCH_big_five_inventory_extraverted_score): Also continuous, this variable captures the respondent's tendency toward sociability, energy, and enthusiasm, which we hypothesize to be positively correlated with greeting strangers.

Visualization:
    Scatter Plots will be used to visualize potential linear relationships between greeting frequency and each personality trait, with each dot representing an individual's score. This will help assess whether higher agreeableness or extraversion scores correspond with increased greeting frequency.

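Correlation Analysis: Using df.corr() on the personality traits and greeting frequency will help measure the strength and direction of relationships. Because df.corr() computes Pearson correlations by default and greeting frequency is ordinal, passing method="spearman" is likely the better choice. We may use a heatmap to better visualize these correlations.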
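As a rough illustration, a rank-based correlation matrix can be computed as below. The short column names and the six toy rows are assumptions, with greet_code standing in for a numerically coded version of the greeting variable.

```python
import pandas as pd

# Hypothetical toy scores; greet_code is an assumed integer coding of greeting frequency.
df = pd.DataFrame({
    "agreeableness": [3.0, 4.5, 2.0, 5.0, 3.5, 4.0],
    "extraversion": [2.5, 4.0, 1.5, 4.5, 3.0, 5.0],
    "greet_code": [0, 2, 0, 2, 1, 2],
})

# Spearman works on ranks, so it respects the ordinal outcome.
corr = df.corr(method="spearman")
print(corr.round(2))
```

The resulting 3 x 3 matrix is symmetric with ones on the diagonal, and it can be passed directly to a heatmap function.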

Analysis Plan and Assumptions

    Analysis Method:
    We plan to use multiple linear regression to examine whether agreeableness and extraversion predict greeting frequency. If greeting frequency is ordinal, ordinal logistic regression would be more appropriate to respect the ordered nature of the outcome variable. The linear model will be structured as smf.ols("greeting_frequency ~ agreeableness + extraversion", data=df).fit(), where the three names are assumed to be shortened, numerically coded versions of the survey columns listed above.
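The linear model above can be tried out on synthetic data, as in this minimal sketch. The simulated scores and effect sizes are invented for illustration, and greeting_frequency is treated as a numeric score, which is itself an assumption about how the survey variable would be coded.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data for illustration only; the effect sizes are made up.
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "agreeableness": rng.normal(3.5, 0.7, n),
    "extraversion": rng.normal(3.0, 0.8, n),
})
df["greeting_frequency"] = (
    0.5 * df["agreeableness"]
    + 0.4 * df["extraversion"]
    + rng.normal(0, 0.5, n)
)

ols_fit = smf.ols(
    "greeting_frequency ~ agreeableness + extraversion", data=df
).fit()

# Coefficients, standard errors, and fit statistics.
print(ols_fit.summary())

# Residuals for checking the normality assumption, e.g. with a Q-Q plot.
residuals = ols_fit.resid
```

The residuals can then be used to check the linearity and normality assumptions before interpreting the coefficients.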

    Assumptions:
    Linearity: If using linear regression, we assume a linear relationship between agreeableness/extraversion and greeting frequency.
    Normality of Residuals: For linear regression, residuals should ideally be normally distributed. If this assumption doesn’t hold, we could use transformations or non-parametric alternatives.
    No Multicollinearity: The two predictors (agreeableness and extraversion) should not be highly correlated with each other, as this could inflate standard errors.
    If assumptions are not met, such as if residuals are not normally distributed, we might consider rank-based or non-parametric approaches.

Hypothesis:

Agreeableness: We expect a positive relationship, where higher agreeableness scores are associated with more frequent greetings, as agreeable individuals are typically cooperative and friendly.
Extraversion: We also hypothesize a positive relationship, with higher extraversion scores correlating with increased greeting frequency, as extraverts are naturally sociable and enjoy interacting with others.

Relevance: Understanding these associations could help identify personality traits that influence social engagement. This knowledge can inform interventions aimed at improving social connectivity or understanding personality-driven social behaviors.

Analysis 3: The Relationship Between Greeting Strangers and Psychological Well-Being
Statement of the Research Question:
Does the frequency of greeting strangers correlate with lower neuroticism scores, potentially reflecting a relationship between casual social interactions and emotional stability?

This question explores whether greeting strangers, a simple form of social engagement, relates to lower levels of neuroticism, a trait often associated with emotional instability and anxiety. By analyzing this potential correlation, the study aims to determine if individuals who frequently greet strangers also report lower neuroticism scores, suggesting that everyday social interactions might be associated with better psychological well-being. This could provide valuable insights into how social behavior and emotional health are related.

Variables and Exploration Plan
Dependent Variable: Neuroticism score (PSYCH_big_five_inventory_neurotic_score). This is a continuous variable measuring emotional instability and anxiety levels, relevant for studying the relationship with social engagement.

Independent Variable: Frequency of greeting strangers (CONNECTION_activities_greeted_stranger_p3m). This is assumed to be ordinal, capturing how often an individual greets strangers.

Visualization:
    Box Plots or Violin Plots will compare neuroticism scores across different levels of greeting frequency, showing the spread, median, and any potential outliers within each greeting frequency level.

Summary Statistics: Mean and median neuroticism scores within each greeting frequency group will help quantify differences and inform potential trends.
Spearman Correlation: If a linear relationship isn’t expected, Spearman’s rank correlation can quantify the association between neuroticism and greeting frequency without assuming linearity.

Analysis Plan and Assumptions

    Analysis Method:
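    Given the ordinal nature of the greeting frequency and continuous neuroticism, a Spearman correlation would quantify the association without assuming linearity or normality. Alternatively, because neuroticism is the continuous outcome here, a linear regression or a Kruskal-Wallis test with greeting frequency as a categorical predictor could assess whether neuroticism scores differ across frequency levels. Ordinal logistic regression would apply only if the roles were reversed and greeting frequency were the outcome.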
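A minimal sketch of the Spearman calculation, using scipy.stats.spearmanr on invented values, is shown below. The column names greet_code and neuroticism are hypothetical short names for the coded greeting variable and the neuroticism score.

```python
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical toy values; greet_code is an assumed integer coding of greeting frequency.
df = pd.DataFrame({
    "greet_code": [0, 0, 1, 1, 2, 2, 3, 3],
    "neuroticism": [4.2, 3.9, 3.5, 3.8, 3.0, 2.7, 2.9, 2.1],
})

# Rank-based correlation and its two-sided p-value.
rho, p_value = spearmanr(df["greet_code"], df["neuroticism"])
print(f"rho = {rho:.3f}, p = {p_value:.4f}")
```

A negative rho would be consistent with the hypothesis that more frequent greeting goes along with lower neuroticism.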

    Assumptions:
    Ordinal Validity: Assumes the greeting frequency levels are meaningfully ordered.
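    No Strong Outliers: Spearman's correlation is less sensitive to outliers than Pearson's, but extreme neuroticism scores can still affect group means and regression fits, so we'll check for and possibly address extreme values.
    Monotonic Relationship: Spearman's correlation assumes that neuroticism tends to move in one direction as greeting frequency increases. Because it already works on ranks, skewed neuroticism scores are not a problem in themselves; if the pattern is clearly non-monotonic, we may compare the frequency groups directly with a rank-based test such as Kruskal-Wallis.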

Hypothesis:

We expect that individuals who frequently greet strangers will have lower neuroticism scores, indicating better emotional stability. Casual social interactions may have a buffering effect on stress and anxiety, contributing to psychological well-being.

Relevance: If a significant negative correlation is found, it would suggest that simple social interactions, like greeting strangers, could play a role in enhancing emotional health, although a correlation alone cannot show which way the influence runs. This could have practical implications for promoting well-being through increased social engagement.