# Gender Identity vs. Height Dissatisfaction

## Research Question
- Is there a significant difference in height dissatisfaction among individuals of different gender identities?

## Variables
- **Independent Variable: Gender Identity (`DEMO_gender`)**
  - "Presented but no response" cannot be analyzed, and the "Non-binary" group is too small to support reliable inference. We therefore restrict **gender identity to "Men" or "Women"**.
  
- **Dependent Variable: Height Dissatisfaction (`PSYCH_body_self_image_questionnaire_height_dissatisfaction_score`)**
  - This variable is a numeric score, so we can compute summary statistics such as the mean, median, and standard deviation.
  
  *To visualize it, we can draw a **histogram, box plot, and violin plot**.*
  
   **Why choose these plots?**
   - **Box Plots**: Show spread and central tendency, highlighting median, interquartile range (IQR), and outliers to facilitate direct gender comparison.
   - **Violin Plots**: Reveal distribution shapes within each gender, helping to identify patterns like bimodal distributions that may not appear in box plots.
   - **Histograms**: Present the overall frequency distribution of dissatisfaction scores, offering insight into general trends.
   
## Analysis and Assumptions

### Data Cleaning
   - Ensure there are no invalid or missing entries in the two key columns. If there are, handle missing values by either dropping them or using imputation methods if appropriate.
   - Check for outliers in height dissatisfaction scores.
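
The cleaning steps above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the helper name `clean_gender_height`, the 1.5 × IQR outlier rule, and the toy values are all assumptions made for the example. Only the column names and the gender labels come from the plan itself.

```python
import numpy as np
import pandas as pd

GENDER = "DEMO_gender"
SCORE = "PSYCH_body_self_image_questionnaire_height_dissatisfaction_score"


def clean_gender_height(df: pd.DataFrame) -> pd.DataFrame:
    """Keep Men/Women rows with a valid score and flag IQR outliers.

    Hypothetical helper written for this sketch.
    """
    out = df[df[GENDER].isin(["Men", "Women"])].copy()
    # Coerce non-numeric entries to NaN, then drop rows missing the score.
    out[SCORE] = pd.to_numeric(out[SCORE], errors="coerce")
    out = out.dropna(subset=[SCORE])
    # Flag (rather than drop) scores beyond 1.5 * IQR from the quartiles.
    q1, q3 = out[SCORE].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["is_outlier"] = ~out[SCORE].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out


# Toy data for illustration only; these are not values from the real survey.
toy = pd.DataFrame({
    GENDER: ["Men", "Women", "Non-binary", "Presented but no response",
             "Men", "Women", "Men", "Women"],
    SCORE: [2.0, 3.0, 2.5, 1.0, np.nan, 2.5, 3.5, 12.0],
})
cleaned = clean_gender_height(toy)
print(cleaned[[GENDER, SCORE, "is_outlier"]])
```

Flagging outliers instead of deleting them keeps the decision about how to treat them visible in later steps.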

### Data Exploration
   - Calculate summary statistics for height dissatisfaction scores within each gender identity category.
   - Visualize the data distribution, using:
     - **Box plots or violin plots** to show score distributions across genders.
     - **Histograms** to inspect overall dissatisfaction score patterns.
     
      **Assumptions for Visualizations**

     - Visualizations like box plots and histograms assume that the data is **representative** of the populations they intend to describe.

     - Height dissatisfaction scores should be on a **comparable scale** across all groups.
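
One possible way to produce the summary statistics and the three plots is sketched below with pandas and matplotlib. The function names `summarize_by_gender` and `plot_distributions` are hypothetical, and the scores are randomly generated purely so the example runs; they say nothing about the real data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

GENDER = "DEMO_gender"
SCORE = "PSYCH_body_self_image_questionnaire_height_dissatisfaction_score"


def summarize_by_gender(df: pd.DataFrame) -> pd.DataFrame:
    """Per-group count, mean, median, and standard deviation."""
    return df.groupby(GENDER)[SCORE].agg(["count", "mean", "median", "std"])


def plot_distributions(df: pd.DataFrame):
    """Box plot and violin plot by gender, plus an overall histogram."""
    groups = sorted(df[GENDER].unique())
    data = [df.loc[df[GENDER] == g, SCORE].to_numpy() for g in groups]
    fig, (ax_box, ax_violin, ax_hist) = plt.subplots(1, 3, figsize=(12, 4))
    ax_box.boxplot(data)
    ax_violin.violinplot(data, showmedians=True)
    # Both plot types place the groups at x = 1, 2, ... by default.
    for ax, title in ((ax_box, "Box plot"), (ax_violin, "Violin plot")):
        ax.set_xticks(range(1, len(groups) + 1))
        ax.set_xticklabels(groups)
        ax.set_title(title)
    ax_hist.hist(df[SCORE], bins=10)
    ax_hist.set_title("Histogram (all participants)")
    return fig


# Synthetic scores for illustration only.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    GENDER: ["Men"] * 50 + ["Women"] * 50,
    SCORE: np.concatenate([rng.normal(2.5, 0.8, 50), rng.normal(2.8, 0.8, 50)]),
})
summary = summarize_by_gender(toy)
fig = plot_distributions(toy)
print(summary)
```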


### Statistical Analysis

   - **Independent two-sample t-test:** compares mean height dissatisfaction between the two groups (Men vs. Women).
   
     **Assumptions for t-tests**
   
     - The dissatisfaction scores within each gender identity group should ideally follow a **normal distribution**.
     
     - The variance in dissatisfaction scores should be **roughly equal** across gender identities; if it is not, Welch's t-test, which does not assume equal variances, is a common alternative.

     - Observations within each group (gender identity) must be **independent** of each other.
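
The assumption checks and the test itself could look roughly like this with `scipy.stats`. The sketch assumes Shapiro-Wilk for normality and Levene's test for equal variances, which are common choices rather than anything the plan prescribes. The helper `compare_two_groups`, the 0.05 threshold, and the simulated scores are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def compare_two_groups(a, b, alpha=0.05):
    """Check t-test assumptions, then run Student's or Welch's t-test.

    Hypothetical helper written for this sketch.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Shapiro-Wilk: a small p-value suggests departure from normality.
    shapiro_p = (stats.shapiro(a).pvalue, stats.shapiro(b).pvalue)
    # Levene: a small p-value suggests the group variances differ.
    levene_p = stats.levene(a, b).pvalue
    equal_var = bool(levene_p >= alpha)
    # equal_var=False switches ttest_ind to Welch's t-test.
    result = stats.ttest_ind(a, b, equal_var=equal_var)
    return {
        "shapiro_p": shapiro_p,
        "levene_p": levene_p,
        "test": "Student" if equal_var else "Welch",
        "t_statistic": result.statistic,
        "p_value": result.pvalue,
    }


# Simulated scores for illustration only.
rng = np.random.default_rng(42)
men = rng.normal(2.0, 0.5, 100)
women = rng.normal(3.0, 0.5, 100)
report = compare_two_groups(men, women)
print(report)
```

Independence cannot be tested this way; it depends on how the data were collected.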
     
## Hypothesis and Results

### Hypothesis
- **Null Hypothesis (H₀):** There is no difference in mean height dissatisfaction scores between men and women.
- **Alternative Hypothesis (H₁):** Mean height dissatisfaction scores differ between men and women.

### Possible Results
- **Rejecting the Null Hypothesis (Support for H₁):** If the analysis finds a statistically significant difference in height dissatisfaction scores between gender identities, we reject the null hypothesis, which suggests that one gender identity reports higher height dissatisfaction than the other.
- **Failing to Reject the Null Hypothesis:** If the analysis finds no statistically significant difference, we fail to reject the null hypothesis. This does not prove H₀; it means the data give no evidence that height dissatisfaction differs across gender identities.

# Age vs. Big Five Traits

## Research Question
- How do Big Five personality traits vary with age?

## Variables
- **Independent Variable: Age** (`DEMO_age`)
   - Represents participants' ages, which we’ll treat as a continuous variable.
- **Dependent Variables: Big Five Traits**:
   - `PSYCH_big_five_inventory_agreeable_score`
   - `PSYCH_big_five_inventory_conscientious_score`
   - `PSYCH_big_five_inventory_extraverted_score`
   - `PSYCH_big_five_inventory_neurotic_score`
   - `PSYCH_big_five_inventory_open_score`
   
   *To visualize them, we can draw a **line chart**.*
   
   **Why choose this plot?**
   - **Line chart**: Shows how each trait changes with age and displays all five traits in one chart.
   
## Analysis and Assumptions
 
### Data Cleaning and Exploration
   - **Handle Missing Values**: Check each trait and age column for missing values and decide whether to remove or impute them.
   - **Descriptive Statistics**: Calculate the mean, median, and standard deviation for each Big Five trait to get a sense of the typical scores.

### Visualization
   - **Line Chart**: Plot the mean score of each trait against age, with one line per trait.
   
     **Assumptions for Visualization (Line Chart)**
   - **Comparable Trait Scores**: The scores should be on a consistent scale across all age groups to make visual comparisons meaningful.
   - **Sufficient Sample Size per Group**: Each age group should have enough participants to produce reliable averages. If some age groups have very few observations, trends may be less accurate.
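
A rough sketch of the line chart follows, including a guard for the sample-size assumption. The helper names, the `min_n=5` cutoff, and the 1 to 5 score range in the toy data are assumptions for illustration; the real scale and a sensible cutoff should be checked against the dataset.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

AGE = "DEMO_age"
TRAITS = [
    "PSYCH_big_five_inventory_agreeable_score",
    "PSYCH_big_five_inventory_conscientious_score",
    "PSYCH_big_five_inventory_extraverted_score",
    "PSYCH_big_five_inventory_neurotic_score",
    "PSYCH_big_five_inventory_open_score",
]


def mean_traits_by_age(df: pd.DataFrame, min_n: int = 5) -> pd.DataFrame:
    """Mean trait score per age, keeping only ages with at least min_n rows."""
    counts = df.groupby(AGE).size()
    keep = counts[counts >= min_n].index
    return df[df[AGE].isin(keep)].groupby(AGE)[TRAITS].mean()


def plot_trait_trends(means: pd.DataFrame):
    """Draw one line per trait on a shared axis."""
    fig, ax = plt.subplots(figsize=(8, 4))
    for col in means.columns:
        label = col.replace("PSYCH_big_five_inventory_", "").replace("_score", "")
        ax.plot(means.index, means[col], label=label)
    ax.set_xlabel("Age")
    ax.set_ylabel("Mean score")
    ax.legend()
    return fig


# Toy data for illustration only: three well-sampled ages and one sparse age.
rng = np.random.default_rng(1)
ages = np.append(np.repeat([20, 30, 40], 6), 70)
toy = pd.DataFrame({AGE: ages})
for col in TRAITS:
    toy[col] = rng.uniform(1, 5, len(ages))

means = mean_traits_by_age(toy)
fig = plot_trait_trends(means)
print(means.round(2))
```

In the toy data the single 70-year-old is dropped, since one observation cannot give a reliable average.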

### Statistical Analysis
   - **Regression Analysis**: Run a linear regression analysis for each trait with age as the predictor variable to assess the effect of age on each trait’s score.
   - **Polynomial Regression**: If relationships aren’t linear, a polynomial regression might reveal a better fit.
   
     **Assumptions for Statistical Analysis**
   - **Normality:** Ideally, trait scores within each age group (equivalently, the regression residuals) should be approximately normally distributed.
   - **Homoscedasticity:** The variability in trait scores should be similar across age groups.
   - **Linearity:** If analyzing age and traits with linear regression, we assume a roughly linear relationship.
   - **Independence of Observations:** Each participant’s data should be independent, meaning responses from one person don’t influence others.
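
The linear and polynomial fits can be compared along these lines. This sketch uses `scipy.stats.linregress` for the linear model and `numpy.polyfit` for a quadratic one; the choice of degree 2, the helper name `fit_age_trend`, and the simulated data are assumptions made for the example. It would be run once per trait.

```python
import numpy as np
from scipy import stats


def fit_age_trend(age, score):
    """Fit linear and quadratic models of score on age and compare R^2.

    Hypothetical helper written for this sketch.
    """
    age = np.asarray(age, dtype=float)
    score = np.asarray(score, dtype=float)
    # Linear regression: slope, and the p-value for H0 "slope is zero".
    lin = stats.linregress(age, score)
    # Quadratic least-squares fit as a simple polynomial alternative.
    coeffs = np.polyfit(age, score, deg=2)
    residuals = score - np.polyval(coeffs, age)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((score - score.mean()) ** 2)
    return {
        "slope": lin.slope,
        "p_value": lin.pvalue,
        "r2_linear": lin.rvalue ** 2,
        "r2_quadratic": 1.0 - ss_res / ss_tot,
    }


# Simulated trait scores for illustration only.
rng = np.random.default_rng(7)
age = np.linspace(18, 80, 200)
score = 3.0 + 0.01 * age + rng.normal(0, 0.3, age.size)
fit = fit_age_trend(age, score)
print(fit)
```

Because the quadratic model contains the linear one, its R² can never be lower; a polynomial fit is only worth preferring when the improvement is substantial.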
     
## Hypothesis and Results

### Hypothesis
- **Null Hypothesis (H₀):** There is no relationship between age and Big Five trait scores.
- **Alternative Hypothesis (H₁):** There is a relationship between age and Big Five trait scores.

### Possible Results
- **Rejecting** the Null Hypothesis (support for H₁)
- **Failing to Reject** the Null Hypothesis (no evidence against H₀)

*Interpretations of the possible outcomes for each trait:*

| Trait           | Positive Trend with Age               | Negative Trend with Age                  | Non-linear Trend                               | No Significant Relationship                   |
|-----------------|--------------------------------------|------------------------------------------|-----------------------------------------------|-----------------------------------------------|
| **Agreeableness** | Increasing empathy and tolerance with age | Increased assertiveness over time         | Not commonly expected                         | Stable personality aspect across lifespan    |
| **Conscientiousness** | Increased responsibility and organization | Decreasing focus on structure            | Peaks in middle age, declines in older years  | Stable across age                            |
| **Extraversion** | More socially engaged in older age   | Less social engagement with age           | Not commonly expected                         | Stable personality trait                      |
| **Neuroticism** | Increased emotional reactivity       | Increased stability and resilience       | Not commonly expected                         | Emotional stability unchanged with age        |
| **Openness**    | Lifelong openness to experiences     | Decline in curiosity over time           | High in youth/mid-life, decline in later years| Consistent trait unaffected by aging          |


# Conscientiousness vs. Life Satisfaction

## Research Question
- How does conscientiousness influence life satisfaction among individuals?

## Variables
- **Independent Variable: Conscientiousness** (`PSYCH_big_five_inventory_conscientious_score`)
- **Dependent Variable: Life Satisfaction** (`WELLNESS_life_satisfaction`)


  *To visualize them, we can draw a **scatter plot with a regression line** and a **box plot**.*

  **Why choose these plots?**
   - **Scatter Plot with Regression Line**: Visualizes the relationship between conscientiousness and life satisfaction, with a fitted regression line to show the trend.
   - **Box Plot**: Compares life satisfaction across different levels of conscientiousness (for example, scores binned into low, medium, and high).
   
## Analysis and Assumptions

### Data Cleaning and Exploration
   - **Handle Missing Values**: Check the conscientiousness and life satisfaction columns for missing values and decide whether to remove or impute them.
   - **Descriptive Statistics**: Calculate the mean, median, and standard deviation of conscientiousness and life satisfaction to get a sense of the typical scores.

### Visualization
   - Draw the **Scatter Plot with Regression Line** and the **Box Plot** described above.
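
A possible implementation of both plots is sketched here. The box plot needs discrete levels, so this example splits conscientiousness into tertiles with `pd.qcut`; that binning, the helper name `plot_relationship`, and the simulated data are assumptions rather than part of the plan.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

CONSC = "PSYCH_big_five_inventory_conscientious_score"
LIFE = "WELLNESS_life_satisfaction"
LEVELS = ["Low", "Medium", "High"]


def plot_relationship(df: pd.DataFrame):
    """Scatter plot with a fitted line, plus box plots by tertile."""
    fig, (ax_scatter, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

    # Scatter plot with a least-squares regression line.
    ax_scatter.scatter(df[CONSC], df[LIFE], alpha=0.5)
    slope, intercept = np.polyfit(df[CONSC], df[LIFE], deg=1)
    xs = np.linspace(df[CONSC].min(), df[CONSC].max(), 50)
    ax_scatter.plot(xs, slope * xs + intercept, color="red")
    ax_scatter.set_xlabel("Conscientiousness")
    ax_scatter.set_ylabel("Life satisfaction")

    # Box plot of life satisfaction within each conscientiousness tertile.
    levels = pd.qcut(df[CONSC], q=3, labels=LEVELS)
    data = [df.loc[levels == lvl, LIFE].to_numpy() for lvl in LEVELS]
    ax_box.boxplot(data)
    ax_box.set_xticks([1, 2, 3])
    ax_box.set_xticklabels(LEVELS)
    ax_box.set_xlabel("Conscientiousness level")
    return fig, levels


# Simulated data for illustration only.
rng = np.random.default_rng(3)
consc = rng.uniform(1, 5, 120)
toy = pd.DataFrame({
    CONSC: consc,
    LIFE: 2.0 + 0.5 * consc + rng.normal(0, 0.5, 120),
})
fig, levels = plot_relationship(toy)
```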
   

### Statistical Analysis
   - **Correlation Analysis**: To determine the strength and direction of the relationship between conscientiousness and life satisfaction.

     **Assumptions**:
     - **Linearity**: The relationship between conscientiousness and life satisfaction should be linear.
     - **Independence**: Observations should be independent of each other.
     - **Homoscedasticity**: The variance of residuals should be constant across levels of conscientiousness.
     - **Normality**: The residuals of the model should be normally distributed.
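
The correlation analysis might be carried out as below. Pearson's r matches the linearity assumption listed above; Spearman's rank correlation is included as a hedge in case that assumption looks doubtful, which is an addition made for this sketch. The helper `correlate` and the simulated scores are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def correlate(x, y):
    """Pearson and Spearman correlations on rows where both values exist.

    Hypothetical helper written for this sketch.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Keep only pairs with no missing value on either side.
    mask = ~(np.isnan(x) | np.isnan(y))
    r, p = stats.pearsonr(x[mask], y[mask])
    rho, p_rank = stats.spearmanr(x[mask], y[mask])
    return {
        "n": int(mask.sum()),
        "pearson_r": r,
        "pearson_p": p,
        "spearman_rho": rho,
        "spearman_p": p_rank,
    }


# Simulated scores for illustration only, with one missing value.
rng = np.random.default_rng(11)
consc = rng.uniform(1, 5, 150)
life = 2.0 + 0.5 * consc + rng.normal(0, 0.5, 150)
life[0] = np.nan
result = correlate(consc, life)
print(result)
```

The sign of the coefficient gives the direction of the relationship, and the p-value indicates whether it differs significantly from zero.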
     
## Hypothesis and Results

### Hypotheses

- **Null Hypothesis (H₀)**: There is no significant relationship between conscientiousness and life satisfaction.
- **Alternative Hypothesis (H₁)**: There is a significant positive relationship between conscientiousness and life satisfaction.

### Possible Results and Interpretations

- **Positive Correlation (correlation coefficient > 0)**: indicates that higher levels of conscientiousness are associated with higher life satisfaction; a coefficient above about 0.5 would indicate a strong association.
   
- **Negative Correlation (correlation coefficient < 0)**: suggests that as conscientiousness increases, life satisfaction decreases, which could indicate that overly conscientious individuals experience stress or dissatisfaction due to their high standards.

- **No Significant Correlation (correlation coefficient close to zero)**: suggests no relationship, indicating that factors other than conscientiousness may be more influential in determining life satisfaction.