### Hello everyone! I know this is not a Tumblr post, but this format seemed more suitable to me!

# Code
Below you can find the code that I used as well as its output:

In [None]:
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi

data=pd.read_csv("nesarc.csv",low_memory=False)

#Data for Total Personal Income in the last 12 months (Ages 20-25) (Income between $0 and $25,000)
A1 = data[(data['AGE'] >= 20) & (data['AGE'] <= 25) & (data['S1Q10B'] >= 0) & (data['S1Q10B'] <= 7)]
B1 = A1.copy()

recode= {0:0,1:4999,2:7999,3:9999,4:12999,5:14999,6:19999,7:24999}
B1['INCOMERANGE']=B1['S1Q10B'].map(recode)
d1=B1['INCOMERANGE'].value_counts(sort=False)
B1['REGION']=B1['REGION'].apply(lambda x: f"Region_{x}")

#CONVERT TO NUMERIC (values that cannot be parsed become NaN)
B1['INCOMERANGE']=pd.to_numeric(B1['INCOMERANGE'], errors='coerce')
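To make the recoding step above easier to follow, here is a minimal sketch on a tiny made-up table. The `toy` DataFrame and its values are assumptions for illustration only; the `recode` dictionary is the same one used above.

```python
import pandas as pd

# Hypothetical toy data standing in for the NESARC S1Q10B income categories.
toy = pd.DataFrame({"S1Q10B": [0, 1, 3, 7, 9]})

# Same category-to-dollar mapping as in the cell above.
recode = {0: 0, 1: 4999, 2: 7999, 3: 9999, 4: 12999, 5: 14999, 6: 19999, 7: 24999}

# Keep only categories 0-7 (category 9 is dropped), then map each code to dollars.
low_income = toy[toy["S1Q10B"].between(0, 7)].copy()
low_income["INCOMERANGE"] = low_income["S1Q10B"].map(recode)

income_values = low_income["INCOMERANGE"].tolist()
print(income_values)  # [0, 4999, 9999, 24999]
```

Filtering before mapping matters here: any category missing from `recode` would otherwise be turned into `NaN` by `map`.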

<a id="cell1"></a>
## No Moderator

In [15]:
# using ols function for calculating the F-statistic and associated p value
model1 = smf.ols(formula='INCOMERANGE ~ C(DMAJORDEPSNI12)', data=B1)
results1 = model1.fit()
print (results1.summary())

B2 = B1[['INCOMERANGE', 'DMAJORDEPSNI12']].dropna()

print ('Means for INCOMERANGE by Major Depression status')
m1= B2.groupby('DMAJORDEPSNI12').mean()
print (m1)

print ('Standard Deviations for INCOMERANGE by Major Depression status')
sd1 = B2.groupby('DMAJORDEPSNI12').std()
print (sd1)

                            OLS Regression Results                            
Dep. Variable:            INCOMERANGE   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                 -0.000
Method:                 Least Squares   F-statistic:                    0.3300
Date:                Mon, 13 Nov 2023   Prob (F-statistic):              0.566
Time:                        17:28:07   Log-Likelihood:                -36390.
No. Observations:                3510   AIC:                         7.278e+04
Df Residuals:                    3508   BIC:                         7.280e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                             coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------------
Intercept               1.19

<a id="cell2"></a>
## Moderator = Region 1

In [25]:
sub2=B1[(B1['REGION']=='Region_1')]

print ('association between Major Depression and INCOMERANGE for Region 1')
model2 = smf.ols(formula='INCOMERANGE ~ C(DMAJORDEPSNI12)', data=sub2).fit()
print (model2.summary())

# select the column before aggregating so non-numeric columns are never averaged
print ("Means for INCOMERANGE by Major Depression status for Region 1")
m2= sub2.groupby('DMAJORDEPSNI12')["INCOMERANGE"].mean()
print (m2)

print ("Standard Deviation for INCOMERANGE by Major Depression status for Region 1")
sd2= sub2.groupby('DMAJORDEPSNI12')["INCOMERANGE"].std()
print (sd2)

association between Major Depression and INCOMERANGE for Region 1
                            OLS Regression Results                            
Dep. Variable:            INCOMERANGE   R-squared:                       0.004
Model:                            OLS   Adj. R-squared:                  0.002
Method:                 Least Squares   F-statistic:                     2.223
Date:                Mon, 13 Nov 2023   Prob (F-statistic):              0.137
Time:                        17:52:30   Log-Likelihood:                -5819.7
No. Observations:                 562   AIC:                         1.164e+04
Df Residuals:                     560   BIC:                         1.165e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                             coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------

<a id="cell3"></a>
## Moderator = Region 3

In [26]:
sub3=B1[(B1['REGION']=='Region_3')]

print ('association between Major Depression and INCOMERANGE for Region 3')
model3 = smf.ols(formula='INCOMERANGE ~ C(DMAJORDEPSNI12)', data=sub3).fit()
print (model3.summary())

# select the column before aggregating so non-numeric columns are never averaged
print ("Means for INCOMERANGE by Major Depression status for Region 3")
m3 = sub3.groupby('DMAJORDEPSNI12')["INCOMERANGE"].mean()
print (m3)

print ("Standard Deviation for INCOMERANGE by Major Depression status for Region 3")
sd3= sub3.groupby('DMAJORDEPSNI12')["INCOMERANGE"].std()
print (sd3)

association between Major Depression and INCOMERANGE for Region 3
                            OLS Regression Results                            
Dep. Variable:            INCOMERANGE   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                 -0.001
Method:                 Least Squares   F-statistic:                   0.06003
Date:                Mon, 13 Nov 2023   Prob (F-statistic):              0.806
Time:                        17:52:37   Log-Likelihood:                -13541.
No. Observations:                1305   AIC:                         2.709e+04
Df Residuals:                    1303   BIC:                         2.710e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                             coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------

# Examining the results

In [Part 1](#cell1), we examined the association between the income of low-income young adults (our quantitative response variable) and major depression in the last 12 months (our categorical explanatory variable). To do this, we performed an Analysis of Variance (ANOVA). Among the low-income young adults, those who reported experiencing major depression had a slightly lower income (Mean=11730.31, s.d. ±7593.49) than those who did not (Mean=11961.97, s.d. ±7712.05).

However, the F-statistic for this test is **0.33** and the p-value is **0.566**, which is much higher than our alpha (0.05). This means that **we cannot reject** the null hypothesis (Ho): the data provide **no evidence of an association** between income and major depression among low-income young adults.
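For readers who want to see where an F-statistic and its p-value come from, here is a minimal sketch that computes a one-way ANOVA by hand. The two tiny groups are made-up numbers, not NESARC data, and the p-value is taken from the F distribution via `scipy.stats.f.sf`.

```python
import numpy as np
from scipy import stats

# Made-up toy groups (NOT the NESARC data).
group_a = np.array([1.0, 2.0, 3.0])
group_b = np.array([2.0, 3.0, 4.0])
groups = [group_a, group_b]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)            # number of groups
n_total = all_values.size  # total number of observations

# Variation of the group means around the grand mean.
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Variation of the observations around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = k - 1
df_within = n_total - k
f_stat = (ss_between / df_between) / (ss_within / df_within)

# Upper-tail probability of the F distribution gives the p-value.
p_value = stats.f.sf(f_stat, df_between, df_within)

alpha = 0.05
reject_null = p_value < alpha
print(f"F = {f_stat:.3f}, p = {p_value:.3f}, reject H0: {reject_null}")
```

A small F means the group means differ little relative to the spread inside each group, which is the situation reported in Part 1.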

**Please remember that since our explanatory variable only has 2 levels (True or False), we do not need to perform post hoc tests.**
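One way to see why no post hoc test is needed: with only two groups there is a single pairwise comparison, and the ANOVA F-test is equivalent to a pooled-variance t-test (F equals t squared, with the same p-value). The sketch below checks this on synthetic data; the group sizes, means, and spreads are assumptions chosen only for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic incomes (made up, NOT NESARC data) for two groups.
rng = np.random.default_rng(42)
depressed = rng.normal(loc=11700, scale=7600, size=60)
not_depressed = rng.normal(loc=12000, scale=7700, size=200)

# One-way ANOVA with two groups.
f_stat, p_anova = stats.f_oneway(depressed, not_depressed)

# Independent-samples t-test (pooled variance is the default).
t_stat, p_ttest = stats.ttest_ind(depressed, not_depressed)

print(f"F = {f_stat:.4f}, t^2 = {t_stat**2:.4f}")
print(f"p (ANOVA) = {p_anova:.4f}, p (t-test) = {p_ttest:.4f}")
```

With three or more levels this equivalence no longer holds, and a post hoc procedure would be needed to find which pairs differ.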

In [Part 2](#cell2) and [Part 3](#cell3), we used **Region** as a *moderator* to check whether these results differ across subgroups defined by the region the young adults lived in.

In Region 1, there was a slight difference between the Income of the Young Adults that experienced major depression (Mean=10438.47, s.d. ±7508.10) and those who did not (Mean=11926.53, s.d. ±7630.89).
However, in Region 3, there was almost no difference between the Income of the Young Adults that experienced major depression (Mean=11658.65, s.d. ±7276.88) and those who did not (Mean=11488.82, s.d. ±7830.73).

In conclusion, the F-statistics were **2.22** for Region 1 and **0.060** for Region 3. Although the p-value for Region 1 (**0.14**) was much smaller than the one for Region 3 (**0.81**), both were still higher than our alpha (0.05). This means that **we cannot reject** the null hypothesis (Ho) in either region: there is **no evidence of an association** between income and major depression among low-income young adults, **even when using Region as a moderator**.
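The same subgroup analysis can be run for every region in one loop instead of one cell per region. The sketch below assumes a DataFrame shaped like `B1`; the `toy` data is randomly generated and not NESARC data, so its numbers mean nothing by themselves.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for B1: these values are made up, not NESARC data.
rng = np.random.default_rng(0)
n = 400
toy = pd.DataFrame({
    "INCOMERANGE": rng.choice([0, 4999, 9999, 14999, 24999], size=n),
    "DMAJORDEPSNI12": rng.integers(0, 2, size=n),
    "REGION": rng.choice(["Region_1", "Region_2", "Region_3", "Region_4"], size=n),
})

# Fit the same one-way model inside each level of the moderator.
rows = []
for region, sub in toy.groupby("REGION"):
    fit = smf.ols("INCOMERANGE ~ C(DMAJORDEPSNI12)", data=sub).fit()
    rows.append({"REGION": region, "n": int(fit.nobs),
                 "F": fit.fvalue, "p": fit.f_pvalue})

summary = pd.DataFrame(rows)
print(summary)
```

Collecting the F-statistic and p-value per region in one table makes it easier to compare the subgroups side by side.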