Notebook used to analyze Change My View (Tan et al. 2016) data. The following hypotheses will be tested:
    H1: Are arguments which use moral language (as detected using either paradigm) more persuasive than those that do not?
    H2: Other things being equal, a moralistic argument will be more likely to result in a delta if the argument is presented quickly after the initial post.
    H3: A positive relationship exists between the (Jaccard) similarity of OP and a responder in their MFT affinities and the persuasiveness of that responder.

In [57]:
import os
import pickle
import pandas as pd
import numpy as np
from scipy import stats
import statsmodels.api as sm
import statsmodels.formula.api as smf

In [58]:
os.chdir("/Users/amruch/Documents/Research/~ Projects/Cornell Research/SDL/A1 data/")
data_file = "preproc_data.p"
data = pickle.load(open(data_file,'rb'))

DATASET DESCRIPTION:

In [59]:
print("(rows, cols):", data.shape)
print("cell size:", data.size)
print("feature types:", '\n', data.dtypes)
print('\n')
print("Head values' content:", '\n', data.head())

(rows, cols): (1239691, 57)
cell size: 70662387
feature types: 
 comment_id                       object
comment_auth                     object
comment_depth                     int64
op_id                            object
op_content                       object
op_auth                          object
delta                             int64
delta_thread                      int64
time_diff                       float64
comment_harm_virtue               int64
op_harm_virtue                    int64
comment_harm_virtue_bin           int64
op_harm_virtue_bin                int64
comment_MFT_usage                 int64
comment_harm_vice                 int64
op_harm_vice                      int64
comment_harm_vice_bin             int64
op_harm_vice_bin                  int64
comment_fairness_virtue           int64
op_fairness_virtue                int64
comment_fairness_virtue_bin       int64
op_fairness_virtue_bin            int64
comment_fairness_vice             int64
op_fairness_vic

INITIALIZATIONS: OUTCOME LIST, LOCAL VARIABLE LISTS, DATAFRAME SUBSETS

In [62]:
outcomes = [] # Initialize outcomes list to store model outcomes

In [63]:
MFT_domains_bin = ['comment_MFT_usage_bin', 'op_harm_virtue_bin',
                   'op_harm_vice_bin', 'comment_harm_virtue_bin',
                   'comment_harm_vice_bin', 'op_fairness_virtue_bin',
                   'op_fairness_vice_bin', 'comment_fairness_virtue_bin',
                   'comment_fairness_vice_bin', 'op_ingroup_virtue_bin',
                   'op_ingroup_vice_bin', 'comment_ingroup_virtue_bin',
                   'comment_ingroup_vice_bin', 'op_authority_virtue_bin',
                   'op_authority_vice_bin', 'comment_authority_virtue_bin',
                   'comment_authority_vice_bin', 'op_purity_virtue_bin',
                   'op_purity_vice_bin', 'comment_purity_virtue_bin',
                   'comment_purity_vice_bin']

MFT_domains_n = ['comment_MFT_usage', 'op_harm_virtue',
                 'op_harm_vice', 'comment_harm_virtue',
                 'comment_harm_vice', 'op_fairness_virtue',
                 'op_fairness_vice', 'comment_fairness_virtue',
                 'comment_fairness_vice', 'op_ingroup_virtue',
                 'op_ingroup_vice', 'comment_ingroup_virtue',
                 'comment_ingroup_vice', 'op_authority_virtue',
                 'op_authority_vice', 'comment_authority_virtue',
                 'comment_authority_vice', 'op_purity_virtue',
                 'op_purity_vice', 'comment_purity_virtue',
                 'comment_purity_vice']

data['MFT_any_n'] = 0
for d in MFT_domains_n:
    data['MFT_any_n'] += data[d]

data['MFT_any_bin'] = 0
data.loc[data['MFT_any_n'] > 0, 'MFT_any_bin'] = 1

# print(data.MFT_any_bin.head())
# print(data.MFT_any_n.head())

MFT_domains_all = MFT_domains_bin + MFT_domains_n + ['MFT_any_bin', 'MFT_any_n']
# print(MFT_domains_all)

In [64]:
data_subset_deltas = data.query('delta_thread == 1') # Subset of delta-awarded threads
data_subset_moral = data.query('MFT_any_bin == 1') # Subset of moral threads/posts

# data_subset_deltas.head()
# data_subset_moral.head()

DATA SUMMARIZATION:

In [69]:
data.describe()

Unnamed: 0,comment_depth,delta,delta_thread,time_diff,comment_harm_virtue,op_harm_virtue,comment_harm_virtue_bin,op_harm_virtue_bin,comment_MFT_usage,comment_harm_vice,...,op_purity_vice_bin,comment_morality_general,op_morality_general,comment_morality_general_bin,op_morality_general_bin,comment_MFT_usage_bin,jaccard_sim_split,jaccard_sim_same,MFT_any_n,MFT_any_bin
count,1239691.0,1239691.0,1239691.0,1239691.0,1239691.0,1239691.0,1239691.0,1239691.0,1239691.0,1239691.0,...,1239691.0,1239691.0,1239691.0,1239691.0,1239691.0,1239691.0,1239691.0,1239691.0,1239691.0,1239691.0
mean,3.900733,0.008349661,0.3905328,78104.54,0.2440665,0.791415,0.1451442,0.3699454,3.659122,0.5250534,...,0.5393981,0.4433976,1.415558,0.2500583,0.5609019,0.7288945,0.4774898,0.3526817,18.23669,0.9943559
std,2.533883,0.09099424,0.48787,536993.8,0.7934,1.581287,0.3522463,0.4827898,5.653354,1.429614,...,0.4984456,1.080878,2.26887,0.4330465,0.4962773,0.4445306,0.1834957,0.2234096,15.73771,0.07491526
min,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,2.0,0.0,0.0,9462.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.375,0.2,8.0,1.0
50%,3.0,0.0,0.0,25354.0,0.0,0.0,0.0,0.0,2.0,0.0,...,1.0,0.0,1.0,0.0,1.0,1.0,0.4666667,0.3333333,14.0,1.0
75%,5.0,0.0,1.0,54667.5,0.0,1.0,0.0,1.0,5.0,0.0,...,1.0,1.0,2.0,1.0,1.0,1.0,0.5714286,0.5,23.0,1.0
max,10.0,1.0,1.0,34481400.0,33.0,42.0,1.0,1.0,169.0,83.0,...,1.0,46.0,63.0,1.0,1.0,1.0,1.0,1.0,411.0,1.0


In [61]:
print(data['delta_thread'].value_counts())
print(data['delta'].value_counts())

0    755551
1    484140
Name: delta_thread, dtype: int64
0    1229340
1      10351
Name: delta, dtype: int64


HYPOTHESIS 1 TESTS: Are arguments which use moral language (as detected using either paradigm) more persuasive than those that do not?

In [65]:
print('FULL DATA ANALYSES:')
for d in MFT_domains_all:
    print('MFT DOMAIN: ' + d)    
    print('ASSOCIATION WITH DELTA THREAD DATA (DELTA_THREAD):')
    # print(pd.crosstab(data.delta_thread, data[d]).apply(lambda r: r/r.sum(), axis=1))
    try:
        logit = smf.glm(formula = 'delta_thread ~ data[d]',
                        data = data,
                        family = sm.families.Binomial()).fit()
        print(logit.summary())
        print('\n', 'Coef. odds ratio:', '\n', np.exp(logit.params))
        outcomes.append(['H1', 'full data', 'delta_thread', d, logit.params[1], logit.params[1] > 0, logit.pvalues[1] < 0.05])
        print('\n', '\n')
    except:
        outcomes.append(['H1', 'full data', 'delta_thread', d, 'error', 'error', 'error'])
        pass
    
    
    print('ASSOCIATION WITH DELTA POST DATA (DELTA):')    
    # print(pd.crosstab(data.delta, data[d]).apply(lambda r: r/r.sum(), axis=1))
    try:
        logit = smf.glm(formula = 'delta_thread ~ data[d]',
                        data = data,
                        family = sm.families.Binomial()).fit()
        print(logit.summary())
        print('\n', 'Coef. odds ratio:', '\n', np.exp(logit.params))
        outcomes.append(['H1', 'full data', 'delta', d, logit.params[1], logit.params[1] > 0, logit.pvalues[1] < 0.05])
        print('', '-' * 80, '\n', '-' * 80, '\n', '-' * 80, '\n', '-' * 80)
        print('\n', '\n', '\n', '\n')
    except:
        outcomes.append(['H1', 'full data', 'delta', d, 'error', 'error', 'error'])
        pass
    
    
print('DATA SUBSET ANALYSES (DELTA-AWARDED SUBSET):')
for d in MFT_domains_all:
    # There are no delta_thread models here because the delta-awarded subset is perfectly collinear
    print('MFT DOMAIN: ' + d)    
    print('ASSOCIATION WITH DELTA POST DATA (DELTA):')    
    # print(pd.crosstab(data_subset_deltas.delta, data_subset_deltas[d]).apply(lambda r: r/r.sum(), axis=1))
    try:
        logit = smf.glm(formula = 'delta ~ data_subset_deltas[d]',
                        data = data_subset_deltas,
                        family = sm.families.Binomial()).fit()
        print(logit.summary())
        print('\n', 'Coef. odds ratio:', '\n', np.exp(logit.params))
        outcomes.append(['H1', 'delta subset', 'delta', d, logit.params[1], logit.params[1] > 0, logit.pvalues[1] < 0.05])
        print('', '-' * 80, '\n', '-' * 80, '\n', '-' * 80, '\n', '-' * 80)
        print('\n', '\n', '\n', '\n')
    except:
        outcomes.append(['H1', 'delta subset', 'delta', d, 'error', 'error', 'error'])
        pass

FULL DATA ANALYSES:
MFT DOMAIN: comment_MFT_usage_bin
ASSOCIATION WITH DELTA THREAD DATA (DELTA_THREAD):
                 Generalized Linear Model Regression Results                  
Dep. Variable:           delta_thread   No. Observations:              1239691
Model:                            GLM   Df Residuals:                  1239689
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:            -8.2924e+05
Date:                Fri, 08 Sep 2017   Deviance:                   1.6585e+06
Time:                        03:20:49   Pearson chi2:                 1.24e+06
No. Iterations:                     4                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.4025     

                 Generalized Linear Model Regression Results                  
Dep. Variable:           delta_thread   No. Observations:              1239691
Model:                            GLM   Df Residuals:                  1239689
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:            -8.2919e+05
Date:                Fri, 08 Sep 2017   Deviance:                   1.6584e+06
Time:                        03:21:00   Pearson chi2:                 1.24e+06
No. Iterations:                     4                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.4322      0.002   -217.367      0.000      -0.436      -0.428
data[d]       -0.0895      0.005    -16.993      0.0

                 Generalized Linear Model Regression Results                  
Dep. Variable:           delta_thread   No. Observations:              1239691
Model:                            GLM   Df Residuals:                  1239689
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:            -8.2906e+05
Date:                Fri, 08 Sep 2017   Deviance:                   1.6581e+06
Time:                        03:21:09   Pearson chi2:                 1.24e+06
No. Iterations:                     4                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.4644      0.002   -229.863      0.000      -0.468      -0.460
data[d]        0.1151      0.005     23.446      0.0

                 Generalized Linear Model Regression Results                  
Dep. Variable:           delta_thread   No. Observations:              1239691
Model:                            GLM   Df Residuals:                  1239689
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:            -8.2860e+05
Date:                Fri, 08 Sep 2017   Deviance:                   1.6572e+06
Time:                        03:21:19   Pearson chi2:                 1.24e+06
No. Iterations:                     5                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.6484      0.006   -114.631      0.000      -0.660      -0.637
data[d]        0.2282      0.006     38.151      0.0

                 Generalized Linear Model Regression Results                  
Dep. Variable:           delta_thread   No. Observations:              1239691
Model:                            GLM   Df Residuals:                  1239689
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:            -8.2930e+05
Date:                Fri, 08 Sep 2017   Deviance:                   1.6586e+06
Time:                        03:21:29   Pearson chi2:                 1.24e+06
No. Iterations:                     4                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.4410      0.002   -232.308      0.000      -0.445      -0.437
data[d]       -0.0684      0.008     -8.779      0.0

                 Generalized Linear Model Regression Results                  
Dep. Variable:           delta_thread   No. Observations:              1239691
Model:                            GLM   Df Residuals:                  1239689
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:            -8.2904e+05
Date:                Fri, 08 Sep 2017   Deviance:                   1.6581e+06
Time:                        03:21:38   Pearson chi2:                 1.24e+06
No. Iterations:                     4                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.4186      0.002   -195.740      0.000      -0.423      -0.414
data[d]       -0.1013      0.004    -24.096      0.0

                 Generalized Linear Model Regression Results                  
Dep. Variable:           delta_thread   No. Observations:              1239691
Model:                            GLM   Df Residuals:                  1239689
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:            -8.2908e+05
Date:                Fri, 08 Sep 2017   Deviance:                   1.6582e+06
Time:                        03:21:47   Pearson chi2:                 1.24e+06
No. Iterations:                     4                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.4904      0.003   -179.838      0.000      -0.496      -0.485
data[d]        0.0836      0.004     22.615      0.0

                 Generalized Linear Model Regression Results                  
Dep. Variable:           delta_thread   No. Observations:              1239691
Model:                            GLM   Df Residuals:                  1239689
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:            -8.2907e+05
Date:                Fri, 08 Sep 2017   Deviance:                   1.6581e+06
Time:                        03:21:56   Pearson chi2:                 1.24e+06
No. Iterations:                     4                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.4172      0.002   -189.691      0.000      -0.422      -0.413
data[d]       -0.0077      0.000    -22.972      0.0

                 Generalized Linear Model Regression Results                  
Dep. Variable:           delta_thread   No. Observations:              1239691
Model:                            GLM   Df Residuals:                  1239689
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:            -8.2925e+05
Date:                Fri, 08 Sep 2017   Deviance:                   1.6585e+06
Time:                        03:22:05   Pearson chi2:                 1.24e+06
No. Iterations:                     4                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.4377      0.002   -227.198      0.000      -0.442      -0.434
data[d]       -0.0304      0.002    -12.795      0.0

                 Generalized Linear Model Regression Results                  
Dep. Variable:           delta_thread   No. Observations:              1239691
Model:                            GLM   Df Residuals:                  1239689
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:            -8.2905e+05
Date:                Fri, 08 Sep 2017   Deviance:                   1.6581e+06
Time:                        03:22:14   Pearson chi2:                 1.24e+06
No. Iterations:                     4                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.4593      0.002   -237.231      0.000      -0.463      -0.455
data[d]        0.0473      0.002     23.933      0.0

                 Generalized Linear Model Regression Results                  
Dep. Variable:           delta_thread   No. Observations:              1239691
Model:                            GLM   Df Residuals:                  1239689
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:            -8.2870e+05
Date:                Fri, 08 Sep 2017   Deviance:                   1.6574e+06
Time:                        03:22:23   Pearson chi2:                 1.24e+06
No. Iterations:                     4                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.5103      0.003   -196.023      0.000      -0.515      -0.505
data[d]        0.0190      0.001     35.653      0.0

                 Generalized Linear Model Regression Results                  
Dep. Variable:           delta_thread   No. Observations:              1239691
Model:                            GLM   Df Residuals:                  1239689
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:            -8.2932e+05
Date:                Fri, 08 Sep 2017   Deviance:                   1.6586e+06
Time:                        03:22:32   Pearson chi2:                 1.24e+06
No. Iterations:                     4                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.4429      0.002   -236.181      0.000      -0.447      -0.439
data[d]       -0.0239      0.004     -6.118      0.0

                 Generalized Linear Model Regression Results                  
Dep. Variable:           delta_thread   No. Observations:              1239691
Model:                            GLM   Df Residuals:                  1239689
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:            -8.2915e+05
Date:                Fri, 08 Sep 2017   Deviance:                   1.6583e+06
Time:                        03:22:42   Pearson chi2:                 1.24e+06
No. Iterations:                     4                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.4310      0.002   -217.467      0.000      -0.435      -0.427
data[d]       -0.0260      0.001    -18.892      0.0

                 Generalized Linear Model Regression Results                  
Dep. Variable:           delta_thread   No. Observations:              1239691
Model:                            GLM   Df Residuals:                  1239689
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:            -8.2917e+05
Date:                Fri, 08 Sep 2017   Deviance:                   1.6583e+06
Time:                        03:22:51   Pearson chi2:                 1.24e+06
No. Iterations:                     4                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.4676      0.002   -211.238      0.000      -0.472      -0.463
data[d]        0.0191      0.001     18.378      0.0

                 Generalized Linear Model Regression Results                  
Dep. Variable:           delta_thread   No. Observations:              1239691
Model:                            GLM   Df Residuals:                  1239689
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:            -8.2934e+05
Date:                Fri, 08 Sep 2017   Deviance:                   1.6587e+06
Time:                        03:23:00   Pearson chi2:                 1.24e+06
No. Iterations:                     4                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.4592      0.025    -18.712      0.000      -0.507      -0.411
data[d]        0.0142      0.025      0.579      0.5

                 Generalized Linear Model Regression Results                  
Dep. Variable:                  delta   No. Observations:               484140
Model:                            GLM   Df Residuals:                   484138
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:                -50035.
Date:                Fri, 08 Sep 2017   Deviance:                   1.0007e+05
Time:                        03:23:07   Pearson chi2:                 4.84e+05
No. Iterations:                     7                                         
                            coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------------
Intercept                -3.7819      0.015   -255.716      0.000      -3.811      -3.753
data_subset_deltas[

                 Generalized Linear Model Regression Results                  
Dep. Variable:                  delta   No. Observations:               484140
Model:                            GLM   Df Residuals:                   484138
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:                -49580.
Date:                Fri, 08 Sep 2017   Deviance:                       99161.
Time:                        03:23:11   Pearson chi2:                 4.84e+05
No. Iterations:                     7                                         
                            coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------------
Intercept                -3.9697      0.012   -343.908      0.000      -3.992      -3.947
data_subset_deltas[

                 Generalized Linear Model Regression Results                  
Dep. Variable:                  delta   No. Observations:               484140
Model:                            GLM   Df Residuals:                   484138
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:                -49730.
Date:                Fri, 08 Sep 2017   Deviance:                       99460.
Time:                        03:23:15   Pearson chi2:                 4.84e+05
No. Iterations:                     7                                         
                            coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------------
Intercept                -3.8960      0.011   -367.613      0.000      -3.917      -3.875
data_subset_deltas[

                 Generalized Linear Model Regression Results                  
Dep. Variable:                  delta   No. Observations:               484140
Model:                            GLM   Df Residuals:                   484138
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:                -50041.
Date:                Fri, 08 Sep 2017   Deviance:                   1.0008e+05
Time:                        03:23:19   Pearson chi2:                 4.84e+05
No. Iterations:                     7                                         
                            coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------------
Intercept                -3.8167      0.011   -358.504      0.000      -3.838      -3.796
data_subset_deltas[

                 Generalized Linear Model Regression Results                  
Dep. Variable:                  delta   No. Observations:               484140
Model:                            GLM   Df Residuals:                   484138
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:                -50029.
Date:                Fri, 08 Sep 2017   Deviance:                   1.0006e+05
Time:                        03:23:23   Pearson chi2:                 4.84e+05
No. Iterations:                     7                                         
                            coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------------
Intercept                -3.7979      0.011   -342.428      0.000      -3.820      -3.776
data_subset_deltas[

                 Generalized Linear Model Regression Results                  
Dep. Variable:                  delta   No. Observations:               484140
Model:                            GLM   Df Residuals:                   484138
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:                -50026.
Date:                Fri, 08 Sep 2017   Deviance:                   1.0005e+05
Time:                        03:23:27   Pearson chi2:                 4.85e+05
No. Iterations:                     7                                         
                            coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------------
Intercept                -3.8048      0.010   -364.370      0.000      -3.825      -3.784
data_subset_deltas[

                 Generalized Linear Model Regression Results                  
Dep. Variable:                  delta   No. Observations:               484140
Model:                            GLM   Df Residuals:                   484138
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:                -48362.
Date:                Fri, 08 Sep 2017   Deviance:                       96724.
Time:                        03:23:30   Pearson chi2:                 4.59e+05
No. Iterations:                     7                                         
                            coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------------
Intercept                -4.1024      0.012   -352.291      0.000      -4.125      -4.080
data_subset_deltas[

                 Generalized Linear Model Regression Results                  
Dep. Variable:                  delta   No. Observations:               484140
Model:                            GLM   Df Residuals:                   484138
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:                -49921.
Date:                Fri, 08 Sep 2017   Deviance:                       99842.
Time:                        03:23:34   Pearson chi2:                 4.82e+05
No. Iterations:                     7                                         
                            coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------------
Intercept                -3.8488      0.010   -380.074      0.000      -3.869      -3.829
data_subset_deltas[

                 Generalized Linear Model Regression Results                  
Dep. Variable:                  delta   No. Observations:               484140
Model:                            GLM   Df Residuals:                   484138
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:                -50034.
Date:                Fri, 08 Sep 2017   Deviance:                   1.0007e+05
Time:                        03:23:38   Pearson chi2:                 4.84e+05
No. Iterations:                     7                                         
                            coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------------
Intercept                -4.4920      0.184    -24.467      0.000      -4.852      -4.132
data_subset_deltas[

HYPOTHESIS 2 TESTS: A positive relationship exists between the (Jaccard) similarity of OP and a responder in their MFT affinities and the persuasiveness of that responder.

In [66]:
print('FULL DATA ANALYSES:')
try:
    logit = smf.glm(formula = 'delta_thread ~ jaccard_sim_same',
                    data = data,
                    family = sm.families.Binomial()).fit()
    print(logit.summary())
    print('\n', 'Coef. odds ratio:', '\n', np.exp(logit.params))
    outcomes.append(['H2', 'full data', 'delta_thread', 'jaccard_sim_same', logit.params[1], logit.params[1] > 0, logit.pvalues[1] < 0.05])
    print('\n', '\n')
except:
    outcomes.append(['H2', 'full data', 'delta_thread', 'jaccard_sim_same', 'error', 'error', 'error'])
    pass

try:
    logit = smf.glm(formula = 'delta_thread ~ jaccard_sim_split',
                    data = data,
                    family = sm.families.Binomial()).fit()
    print(logit.summary())
    print('\n', 'Coef. odds ratio:', '\n', np.exp(logit.params))
    outcomes.append(['H2', 'full data', 'delta_thread', 'jaccard_sim_split', logit.params[1], logit.params[1] > 0, logit.pvalues[1] < 0.05])
    print('\n', '\n')
except:
    outcomes.append(['H2', 'full data', 'delta_thread', 'jaccard_sim_split', 'error', 'error', 'error'])
    pass

try:
    logit = smf.glm(formula = 'delta ~ jaccard_sim_same',
                    data = data,
                    family = sm.families.Binomial()).fit()
    print(logit.summary())
    print('\n', 'Coef. odds ratio:', '\n', np.exp(logit.params))
    outcomes.append(['H2', 'full data', 'delta', 'jaccard_sim_same', logit.params[1], logit.params[1] > 0, logit.pvalues[1] < 0.05])
    print('\n', '\n')
except:
    outcomes.append(['H2', 'full data', 'delta', 'jaccard_sim_same', 'error', 'error', 'error'])
    pass

try:
    logit = smf.glm(formula = 'delta ~ jaccard_sim_split',
                    data = data,
                    family = sm.families.Binomial()).fit()
    print(logit.summary())
    print('\n', 'Coef. odds ratio:', '\n', np.exp(logit.params))
    outcomes.append(['H2', 'full data', 'delta', 'jaccard_sim_split', logit.params[1], logit.params[1] > 0, logit.pvalues[1] < 0.05])
    print('', '-' * 80, '\n', '-' * 80, '\n', '-' * 80, '\n', '-' * 80)
    print('\n', '\n', '\n', '\n')
except:
    outcomes.append(['H2', 'full data', 'delta', 'jaccard_sim_split', 'error', 'error', 'error'])
    pass


print('DATA SUBSET ANALYSES (DELTA-AWARDED SUBSET):')
# Again, no delta_thread prediction models are here due to their collinearity
try:
    logit = smf.glm(formula = 'delta ~ jaccard_sim_same',
                    data = data_subset_deltas,
                    family = sm.families.Binomial()).fit()
    print(logit.summary())
    print('\n', 'Coef. odds ratio:', '\n', np.exp(logit.params))
    outcomes.append(['H2', 'delta subset', 'delta', 'jaccard_sim_same', logit.params[1], logit.params[1] > 0, logit.pvalues[1] < 0.05])
    print('\n', '\n')
except:
    outcomes.append(['H2', 'delta subset', 'delta', 'jaccard_sim_same', 'error', 'error', 'error'])
    pass

try:
    logit = smf.glm(formula = 'delta ~ jaccard_sim_split',
                    data = data_subset_deltas,
                    family = sm.families.Binomial()).fit()
    print(logit.summary())
    print('\n', 'Coef. odds ratio:', '\n', np.exp(logit.params))
    outcomes.append(['H2', 'delta subset', 'delta', 'jaccard_sim_split', logit.params[1], logit.params[1] > 0, logit.pvalues[1] < 0.05])
    print('', '-' * 80, '\n', '-' * 80, '\n', '-' * 80, '\n', '-' * 80)
    print('\n', '\n', '\n', '\n')
except:
    outcomes.append(['H2', 'delta subset', 'delta', 'jaccard_sim_split', 'error', 'error', 'error'])
    pass


print('DATA SUBSET ANALYSES (MORAL SUBSET):')
try:
    logit = smf.glm(formula = 'delta_thread ~ jaccard_sim_same',
                    data = data_subset_moral,
                    family = sm.families.Binomial()).fit()
    print(logit.summary())
    print('\n', 'Coef. odds ratio:', '\n', np.exp(logit.params))
    outcomes.append(['H2', 'moral subset', 'delta_thread', 'jaccard_sim_same', logit.params[1], logit.params[1] > 0, logit.pvalues[1] < 0.05])
    print('\n', '\n')
except:
    outcomes.append(['H2', 'moral subset', 'delta_thread', 'jaccard_sim_same', 'error', 'error', 'error'])
    pass

try:
    logit = smf.glm(formula = 'delta_thread ~ jaccard_sim_split',
                    data = data_subset_moral,
                    family = sm.families.Binomial()).fit()
    print(logit.summary())
    print('\n', 'Coef. odds ratio:', '\n', np.exp(logit.params))
    outcomes.append(['H2', 'moral subset', 'delta_thread', 'jaccard_sim_split', logit.params[1], logit.params[1] > 0, logit.pvalues[1] < 0.05])
    print('\n', '\n')
except:
    outcomes.append(['H2', 'moral subset', 'delta_thread', 'jaccard_sim_split', 'error', 'error', 'error'])
    pass

try:
    logit = smf.glm(formula = 'delta ~ jaccard_sim_same',
                    data = data_subset_moral,
                    family = sm.families.Binomial()).fit()
    print(logit.summary())
    print('\n', 'Coef. odds ratio:', '\n', np.exp(logit.params))
    outcomes.append(['H2', 'moral subset', 'delta', 'jaccard_sim_same', logit.params[1], logit.params[1] > 0, logit.pvalues[1] < 0.05])
    print('\n', '\n')
except:
    outcomes.append(['H2', 'moral subset', 'delta', 'jaccard_sim_same', 'error', 'error', 'error'])
    pass

try:
    logit = smf.glm(formula = 'delta ~ jaccard_sim_split',
                    data = data_subset_moral,
                    family = sm.families.Binomial()).fit()
    print(logit.summary())
    print('\n', 'Coef. odds ratio:', '\n', np.exp(logit.params))
    outcomes.append(['H2', 'moral subset', 'delta', 'jaccard_sim_split', logit.params[1], logit.params[1] > 0, logit.pvalues[1] < 0.05])
    print('', '-' * 80, '\n', '-' * 80, '\n', '-' * 80, '\n', '-' * 80)
    print('\n', '\n', '\n', '\n')
except:
    outcomes.append(['H2', 'moral subset', 'delta', 'jaccard_sim_split', 'error', 'error', 'error'])
    pass

FULL DATA ANALYSES:
                 Generalized Linear Model Regression Results                  
Dep. Variable:           delta_thread   No. Observations:              1239691
Model:                            GLM   Df Residuals:                  1239689
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:            -8.2894e+05
Date:                Fri, 08 Sep 2017   Deviance:                   1.6579e+06
Time:                        03:23:41   Pearson chi2:                 1.24e+06
No. Iterations:                     4                                         
                       coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------------
Intercept           -0.3631      0.003   -105.703      0.000      -0.370      -0.356
jaccard_sim_sa

                 Generalized Linear Model Regression Results                  
Dep. Variable:           delta_thread   No. Observations:              1232694
Model:                            GLM   Df Residuals:                  1232692
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:            -8.2424e+05
Date:                Fri, 08 Sep 2017   Deviance:                   1.6485e+06
Time:                        03:23:55   Pearson chi2:                 1.23e+06
No. Iterations:                     4                                         
                       coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------------
Intercept           -0.3592      0.003   -103.657      0.000      -0.366      -0.352
jaccard_sim_same    -0.2465      0

HYPOTHESIS 3 TESTS: Other things being equal, a moralistic argument will be more likely to result in a delta if the argument is presented quickly after the initial post.

In [67]:
print('FULL DATA ANALYSES:')
try:
    logit = smf.glm(formula = 'delta_thread ~ time_diff',
                    data = data,
                    family = sm.families.Binomial()).fit()
    print(logit.summary())
    print('\n', 'Coef. odds ratio:', '\n', np.exp(logit.params))
    outcomes.append(['H3', 'full data', 'delta_thread', 'time_diff', logit.params[1], logit.params[1] > 0, logit.pvalues[1] < 0.05])
    print('\n', '\n')
except:
    outcomes.append(['H3', 'full data', 'delta_thread', 'time_diff', 'error', 'error', 'error'])
    pass

try:
    logit = smf.glm(formula = 'delta ~ time_diff',
                    data = data,
                    family = sm.families.Binomial()).fit()
    print(logit.summary())
    print('\n', 'Coef. odds ratio:', '\n', np.exp(logit.params))
    outcomes.append(['H3', 'full data', 'delta', 'time_diff', logit.params[1], logit.params[1] > 0, logit.pvalues[1] < 0.05])
    print('', '-' * 80, '\n', '-' * 80, '\n', '-' * 80, '\n', '-' * 80)
    print('\n', '\n', '\n', '\n')
except:
    outcomes.append(['H3', 'full data', 'delta', 'time_diff', 'error', 'error', 'error'])
    pass


print('DATA SUBSET ANALYSES (DELTA-AWARDED SUBSET):')
# Again, no delta_thread prediction models are here due to their collinearity
try:
    logit = smf.glm(formula = 'delta ~ time_diff',
                    data = data_subset_deltas,
                    family = sm.families.Binomial()).fit()
    print(logit.summary())
    print('\n', 'Coef. odds ratio:', '\n', np.exp(logit.params))
    outcomes.append(['H3', 'delta subset', 'delta', 'time_diff', logit.params[1], logit.params[1] > 0, logit.pvalues[1] < 0.05])
    print('', '-' * 80, '\n', '-' * 80, '\n', '-' * 80, '\n', '-' * 80)
    print('\n', '\n', '\n', '\n')
except:
    outcomes.append(['H3', 'delta subset', 'delta', 'time_diff', 'error', 'error', 'error'])
    pass


print('DATA SUBSET ANALYSES (MORAL SUBSET):')
try:
    logit = smf.glm(formula = 'delta_thread ~ time_diff',
                    data = data_subset_moral,
                    family = sm.families.Binomial()).fit()
    print(logit.summary())
    print('\n', 'Coef. odds ratio:', '\n', np.exp(logit.params))
    outcomes.append(['H3', 'moral subset', 'delta_thread', 'time_diff', logit.params[1], logit.params[1] > 0, logit.pvalues[1] < 0.05])
    print('\n', '\n')
except:
    outcomes.append(['H3', 'moral subset', 'delta_thread', 'time_diff', 'error', 'error', 'error'])
    pass

try:
    logit = smf.glm(formula = 'delta ~ time_diff',
                    data = data_subset_moral,
                    family = sm.families.Binomial()).fit()
    print(logit.summary())
    print('\n', 'Coef. odds ratio:', '\n', np.exp(logit.params))
    outcomes.append(['H3', 'moral subset', 'delta', 'time_diff', logit.params[1], logit.params[1] > 0, logit.pvalues[1] < 0.05])
    print('', '-' * 80, '\n', '-' * 80, '\n', '-' * 80, '\n', '-' * 80)
    print('\n', '\n', '\n', '\n')
except:
    outcomes.append(['H3', 'moral subset', 'delta', 'time_diff', 'error', 'error', 'error'])
    pass

FULL DATA ANALYSES:
                 Generalized Linear Model Regression Results                  
Dep. Variable:           delta_thread   No. Observations:              1239691
Model:                            GLM   Df Residuals:                  1239689
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:            -8.2928e+05
Date:                Fri, 08 Sep 2017   Deviance:                   1.6586e+06
Time:                        03:24:04   Pearson chi2:                 1.24e+06
No. Iterations:                     4                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.4479      0.002   -240.766      0.000      -0.452      -0.444
time_diff   3.557e-08   3.36e-09

END: Export outcome list as .csv file

In [71]:
outcomes_df = pd.DataFrame(outcomes)
outcomes_df.to_csv("data_analysis_outcomes.csv")