# Investment crowdfunding has little faith in sustainability! At least for the moment.

## Journal:

*Venture Capital*, 25:1, 91-115, 2023, [DOI: 10.1080/13691066.2022.2129510](https://doi.org/10.1080/13691066.2022.2129510)

## Authors:

Carmen Mendoza

Isabel María Parra Oller

Álvaro Rezola (@alvarorezola)

Nuria Suárez

In [14]:
# Import libraries
import pandas as pd
import locale
import numpy as np
from scipy import stats
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
from sklearn.linear_model import LogisticRegression
from statsmodels.api import add_constant
from statsmodels.discrete.discrete_model import Probit
import math

In [15]:
# Function definition
def significance_stars(pvalue):
    if pvalue < 0.01:
        return '***'
    elif pvalue < 0.05:
        return '**'
    elif pvalue < 0.1:
        return '*'
    else:
        return '-'
    
def summary_with_stars(model):
    # get the summary table as a DataFrame
    summary_df = model.summary2().tables[1]
    
    # create a new column for stars
    summary_df['stars'] = summary_df['P>|z|'].apply(significance_stars)

    # return the modified summary table
    return summary_df

def confidence_interval(X1, X2):
    #calculate degrees of freedom
    df = len(X1) + len(X2) - 2

    #calculate standard error
    s1 = np.var(X1, ddof=1)
    s2 = np.var(X2, ddof=1)
    n1 = len(X1)
    n2 = len(X2)
    standard_error = np.sqrt((s1/n1) + (s2/n2))

    #calculate margin of error
    margin_of_error = stats.t.ppf(0.95, df) * standard_error

    #calculate confidence interval
    lower_limit = (np.mean(X1) - np.mean(X2)) - margin_of_error
    upper_limit = (np.mean(X1) - np.mean(X2)) + margin_of_error

    return lower_limit, upper_limit

# compute the logit of the propensity score used for matching
def logit(p):
    logit_value = math.log(p/(1 - p))
    return logit_value

# compute the standard error of the ATE
def ate_se(df):
    # number of observations in each group
    n_t = df[df.sustainable == 1].shape[0]
    n_c = df[df.sustainable == 0].shape[0]
    
    # variance of the outcome (exito) in each group
    s_t = df[df.sustainable == 1].exito.var()
    s_c = df[df.sustainable == 0].exito.var()
    
    # standard error of the difference in means with unequal variances
    se = np.sqrt((s_t / n_t) + (s_c / n_c))
    
    return se
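
The `logit` helper above fails when a propensity score is exactly 0 or 1: `math.log(0)` raises a `ValueError` and `p/(1 - p)` raises a `ZeroDivisionError` at `p = 1`. A minimal sketch of a clipped variant follows. The name `safe_logit` and the tolerance `eps` are hypothetical and are not part of the notebook.

```python
import math

def safe_logit(p, eps=1e-6):
    """Log-odds of p, with p clipped to [eps, 1 - eps] so the log is always defined.

    eps is an illustrative tolerance, not a value taken from the notebook.
    """
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))

print(safe_logit(0.5))   # 0.0
print(safe_logit(0.0))   # finite, roughly -13.8
```

Clipping changes the score only at the extremes, so it should leave the matching unaffected unless the propensity model predicts probabilities that close to 0 or 1.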

In [None]:
# Import data
stata_dataset = "/home/alvaro/Desktop/MendozaEtAl2023-VC/Data/CROWD_SUSTAINABILITY_FINAL.dta"
df = pd.read_stata(stata_dataset)

# Prepare data-set
df.drop(index=range(3679,len(df)), inplace=True)
df = df[df["form_c"] == 1]
df["deadline"] = pd.to_datetime(df["deadline"])
df["datestart"] = pd.to_datetime(df["datestart"])
df["dateincorporation"] = pd.to_datetime(df["dateincorporation"])
df = df[df["deadline"] <= "2019-10-01"]
df["totalassetsmostrecent1"] += 1e-6
df["logtotalassetsmostrecent1"] = np.log(df["totalassetsmostrecent1"])
df = df.dropna(subset=["desastre6meses"])
df['desastre6meses_indemnizaciones'].fillna(0, inplace=True)
df["desastre6meses_indemnizaciones"] += 1e-6
df["log_desastre6meses_ind"] = np.log(df["desastre6meses_indemnizaciones"])

# Bank Net Income to Total Assets as value of the ratio net income-to-total assets in the banking sector per state and year.
df["bank_netincome_assets"] = df["lagnetinc"]/df["lagasset"]*100
df['bank_netincome_assets'].fillna(0, inplace=True)
df['banknetincome_assets_number'] = df['bank_netincome_assets']*df['lagbanks']
df['banknetincome_assets_number'].fillna(0, inplace=True)

# Treatment and control dataframes
df_sust = df[df["sustainable"] == 1]
df_non_sust = df[df["sustainable"] == 0]

# Fixed effect variables
year_dummies = ["yr2c"]
for i in range(3, 19):
    year_dummies.append(f"yr{i}c")
industry_dummies = ["ind2c"]
for i in range(3, 69):
    industry_dummies.append(f"ind{i}c")
state_dummies = ["state2c"]
for i in range(3, 63):
    state_dummies.append(f"state{i}c")
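
The preparation cell adds `1e-6` to total assets before taking logs so that firms reporting zero assets stay in the sample. The sketch below shows what that offset does to a zero value. It uses the median and 75th-percentile asset values from Table 1 (a) as examples, and compares the result with `log1p`, which is shown only as a common alternative and is not what the notebook uses.

```python
import math

# Zero plus the median and 75th percentile of total assets from Table 1 (a)
assets = [0.0, 26561.0, 203207.0]

# Offset used in the preparation cell: zero becomes log(1e-6)
log_offset = [math.log(a + 1e-6) for a in assets]

# Alternative shown for comparison only (not used in the notebook): log(1 + a)
log_plus_one = [math.log1p(a) for a in assets]

for a, lo, lp in zip(assets, log_offset, log_plus_one):
    print(f"{a:>10.1f}  log(a + 1e-6) = {lo:8.3f}   log1p(a) = {lp:8.3f}")
```

With the offset, a zero maps to roughly -13.8, far from the positive values, which is worth keeping in mind when reading the means of `logtotalassetsmostrecent1`.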

There are 1,766 investment crowdfunding campaigns issued under the Form C exemption from May 1st, 2016 until September 10th, 2019.

### Table 1 (a): Descriptive statistics
This table shows the descriptive statistics – mean, standard deviation, 25th percentile, median, 75th percentile – of the main variables of interest.

In [17]:
df[["exito",
    "quick75relative",
    "sustainable",
    "totalassetsmostrecent1",
    "employees",
    "loglife",                     # Age as in np.log(date_diff(start, incorporation))              
    "equity",                     
    "asked",
    "lagbranches",                 # Bank branches
    "bank_netincome_assets",       # Ratio Net Income to Total Assets (after we will use banknetincome_assets_number)
    "lagvcfundraising",            # VC fundraising (not sure if then I need to do the np.log())
    "loglagnum_oper_por_platf_y",  # Number of offerings per platform
    ]].describe()

Unnamed: 0,exito,quick75relative,sustainable,totalassetsmostrecent1,employees,loglife,equity,asked,lagbranches,bank_netincome_assets,lagvcfundraising,loglagnum_oper_por_platf_y
count,1766.0,1766.0,1766.0,1766.0,1766.0,1765.0,1766.0,1766.0,1760.0,1766.0,1581.0,1765.0
mean,0.346546,0.026614,0.151755,334653.0,5.361835,6.102006,0.270102,70457.05,3968.019318,1.037196,8813.899414,67.960342
std,0.476001,0.160998,0.358885,1762435.0,10.314373,1.464374,0.44414,118504.8,2359.036538,0.558752,11128.952148,65.647285
min,0.0,0.0,0.0,1e-06,0.0,0.69,0.0,1000.0,117.0,0.0,0.62,2.0
25%,0.0,0.0,0.0,1e-06,1.0,5.32,0.0,10000.0,1484.0,0.818707,90.800003,9.0
50%,0.0,0.0,0.0,26561.0,3.0,6.35,0.0,25000.0,4260.0,0.885748,943.030029,37.0
75%,1.0,0.0,0.0,203207.0,6.0,7.15,1.0,78750.0,6728.0,1.091834,20763.099609,110.0
max,1.0,1.0,1.0,58129370.0,225.0,9.87,1.0,1070000.0,6868.0,4.925659,33174.179688,262.0


### Table 1 (b): Mean differences across subsamples
This table shows the mean values of the main variables across the two subsamples of offerings and the T-statistic for the mean differences. The T-statistics reported are obtained for the differences between the means across groups of offerings. All the variables are defined in Annex 1. ***, **, and * indicate statistical significance at 1, 5, and 10 percent, respectively.

In [18]:
data = {
    "success": (df_non_sust["exito"],df_sust["exito"]),
    "quick75relative": (df_non_sust["quick75relative"],df_sust["quick75relative"]),
    "totalassetsmostrecent1": (df_non_sust["totalassetsmostrecent1"],df_sust["totalassetsmostrecent1"]),
    "employees": (df_non_sust["employees"],df_sust["employees"]),
    "loglife": (df_non_sust["loglife"],df_sust["loglife"]),
    "equity": (df_non_sust["equity"],df_sust["equity"]),
    "asked": (df_non_sust["asked"],df_sust["asked"]),
    "lagbranches": (df_non_sust["lagbranches"],df_sust["lagbranches"]),
    "banknetincome_assets_number": (df_non_sust["banknetincome_assets_number"], df_sust["banknetincome_assets_number"]),
    "lagvcfundraising": (df_non_sust["lagvcfundraising"],df_sust["lagvcfundraising"]),
    "loglagnum_oper_por_platf_y": (df_non_sust["loglagnum_oper_por_platf_y"],df_sust["loglagnum_oper_por_platf_y"]),
}

results = {}

for key in data:
    group1 = data[key][0]
    group2 = data[key][1]

    # calculate t-statistic    
    t_statistic, p_value = stats.ttest_ind(group1, group2, nan_policy="omit")
    
    # Indicate statistical significance at different levels
    if p_value < 0.01:
        significance = "***"
    elif p_value < 0.05:
        significance = "**"
    elif p_value < 0.10:
        significance = "*"
    else:
        significance = "-"
        
    # calculate mean of each variable
    mean_group1 = np.mean(group1)
    mean_group2 = np.mean(group2)

    results[key]={'non sustainable':mean_group1,
                  'sustainable':mean_group2,
                  't-statistic':t_statistic,
                  'p-value':p_value,
                  'significance':significance}
# Display results
pd.DataFrame(results).T

# print(df_results)

Unnamed: 0,non sustainable,sustainable,t-statistic,p-value,significance
success,0.341121,0.376866,-1.132289,0.257667,-
quick75relative,0.02737,0.022388,0.46644,0.640958,-
totalassetsmostrecent1,327075.5,377007.5625,-0.427066,0.669383,-
employees,5.491989,4.634328,1.253924,0.210036,-
loglife,6.067301,6.295858,-2.356181,0.018573,**
equity,0.261682,0.317164,-1.884844,0.059615,*
asked,70123.804688,72319.789062,-0.279324,0.780029,-
lagbranches,4045.154722,3536.696629,3.252588,0.001165,***
banknetincome_assets_number,139.827187,107.90303,3.804976,0.000147,***
lagvcfundraising,8817.848633,8790.697266,0.034193,0.972728,-
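
With its default settings, `stats.ttest_ind` assumes equal variances in the two groups and reports Student's pooled-variance t-statistic. The sketch below computes the same statistic by hand on toy data. The function name `pooled_t_statistic` and the sample values are illustrative only.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(x, y):
    """Student's two-sample t-statistic with a pooled variance (equal variances assumed)."""
    n_x, n_y = len(x), len(y)
    # Weighted average of the two sample variances
    pooled_var = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / (n_x + n_y - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_x + 1 / n_y))
    return (mean(x) - mean(y)) / standard_error

# Toy data, not taken from the study
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
print(pooled_t_statistic(group_a, group_b))
```

The sign of the statistic depends on the order of the groups, which is why the same variables show opposite signs in Table 1 (b) and Table 2 (A).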


### Table 2: Propensity Score Matching 
This table shows the mean values of the main variables across the two subsamples of offerings and the t-statistics obtained for the differences between the means across groups of offerings, before matching and after matching using caliper, nearest 1-to-1 and nn-VBC methods. ***, ** and * indicate statistical significance at 1, 5, and 10%, respectively.

In [19]:
# Test for heteroskedasticity to understand if we should use robust standard errors

In [20]:
# Two probit functions to validate covariates
formula_treatment = f"sustainable ~ logtotalassetsmostrecent1 + logemployees1 + logasked1"
treatment_model = smf.probit(formula=formula_treatment, data=df).fit(cov_type='HC0')

# Get log-likelihood
log_likelihood = treatment_model.llf

# Get Wald chi-squared test statistic and p-value
wald_test = treatment_model.wald_test(treatment_model.model.exog_names)
wald_chi2 = wald_test.statistic[0][0]
wald_pvalue = wald_test.pvalue

print(f'Log-Likelihood: {log_likelihood:.4f}')
print(f'Wald Chi-Squared: {wald_chi2:.4f}')
print(f'Wald p-value: {wald_pvalue:.4f}')

summary_with_stars(treatment_model)

Optimization terminated successfully.
         Current function value: 0.424215
         Iterations 5
Log-Likelihood: -749.1635
Wald Chi-Squared: 801.8215
Wald p-value: 0.0000




Unnamed: 0,Coef.,Std.Err.,z,P>|z|,[0.025,0.975],stars
Intercept,-0.398424,0.355543,-1.120606,0.262456,-1.095275,0.298428,-
logtotalassetsmostrecent1,0.004643,0.003503,1.32546,0.185019,-0.002223,0.011509,-
logemployees1,-0.075966,0.045489,-1.669971,0.094925,-0.165123,0.013192,*
logasked1,-0.052412,0.033573,-1.561123,0.118495,-0.118213,0.01339,-


In [21]:
formula_outcome = f"exito ~ logtotalassetsmostrecent1 + logemployees1 + logasked1"
outcome_model = smf.probit(formula=formula_outcome, data=df).fit(cov_type='HC0')

# Get log-likelihood
log_likelihood = outcome_model.llf

# Get Wald chi-squared test statistic and p-value
wald_test = outcome_model.wald_test(outcome_model.model.exog_names)
wald_chi2 = wald_test.statistic[0][0]
wald_pvalue = wald_test.pvalue

print(f'Log-Likelihood: {log_likelihood:.4f}')
print(f'Wald Chi-Squared: {wald_chi2:.4f}')
print(f'Wald p-value: {wald_pvalue:.4f}')

summary_with_stars(outcome_model)

Optimization terminated successfully.
         Current function value: 0.626012
         Iterations 5
Log-Likelihood: -1105.5368
Wald Chi-Squared: 227.5048
Wald p-value: 0.0000




Unnamed: 0,Coef.,Std.Err.,z,P>|z|,[0.025,0.975],stars
Intercept,1.497462,0.289416,5.174074,2.290439e-07,0.930216,2.064708,***
logtotalassetsmostrecent1,0.003466,0.003018,1.148521,0.2507537,-0.002449,0.009382,-
logemployees1,0.127192,0.039222,3.242845,0.001183428,0.050318,0.204066,***
logasked1,-0.201891,0.027189,-7.425398,1.124414e-13,-0.255181,-0.148601,***


### Table 2 (A): Before Matching

In [22]:
df[["exito", "quick75relative", "logtotalassetsmostrecent1", "logemployees1", "logasked1"]].describe()
df_sust = df[df["sustainable"] == 1]
df_non_sust = df[df["sustainable"] == 0]

data = {
    "success": (df_sust["exito"], df_non_sust["exito"]),
    "quick75relative": (df_sust["quick75relative"], df_non_sust["quick75relative"]),
    "logtotalassetsmostrecent1": (df_sust["logtotalassetsmostrecent1"], df_non_sust["logtotalassetsmostrecent1"]),
    "logemployees1": (df_sust["logemployees1"], df_non_sust["logemployees1"]),
    "logasked1": (df_sust["logasked1"], df_non_sust["logasked1"])
}

results = {}

for key in data:
    group1 = data[key][0]
    group2 = data[key][1]

    # calculate t-statistic    
    t_statistic, p_value = stats.ttest_ind(group1, group2, nan_policy="omit")
    
    # Indicate statistical significance at different levels
    if p_value < 0.01:
        significance = "***"
    elif p_value < 0.05:
        significance = "**"
    elif p_value < 0.10:
        significance = "*"
    else:
        significance = "-"
        
    # calculate mean of each variable
    mean_group1 = np.mean(group1)
    mean_group2 = np.mean(group2)

    results[key]={'sustainable':mean_group1,
                  'non_sustainable':mean_group2,
                  "difference" :(mean_group1 - mean_group2),
                  't-statistic':t_statistic,
                  'p-value':p_value,
                  'significance':significance}
# Display results
pd.DataFrame(results).T

Unnamed: 0,sustainable,non_sustainable,difference,t-statistic,p-value,significance
success,0.376866,0.341121,0.035744,1.132289,0.257667,-
quick75relative,0.022388,0.02737,-0.004982,-0.46644,0.640958,-
logtotalassetsmostrecent1,4.733572,4.211653,0.52192,0.699538,0.484308,-
logemployees1,1.36893,1.434692,-0.065762,-1.174444,0.240376,-
logasked1,10.326214,10.433923,-0.107709,-1.415024,0.157238,-


### Table 2 (B): After Matching

In [28]:
# Logit model to estimate the propensity score (ps)
model = LogisticRegression()
df = df.dropna(axis = 1) # drop all variables that have empty values

# Independent variables
X = df[["logtotalassetsmostrecent1",
        "logemployees1",
        "logasked1"]]

# Dependent variable (treatment group)
y = df["sustainable"]

# Model adjustment & predicted probabilities
model.fit(X, y) 
pred_prob = model.predict_proba(X)
df["ps"] = pred_prob[:, 1]

df["ps_logit"] = df.ps.apply(logit)

# Implementing the caliper match
def caliper_match(df, threshold):
    # sort the data by ps_logit and store each row's position in the sorted frame as orig_index
    df_sorted = df.sort_values("ps_logit").reset_index()
    df_sorted["orig_index"] = df_sorted.index
    
    # empty lists to collect the matched and unmatched indices
    matched_index = []
    unmatched_index = []
    
    # iterate over the rows of the sorted dataframe
    for i in range(len(df_sorted)):
        row = df_sorted.iloc[i]
        if i not in matched_index: # if the row has not been matched yet
            # candidate rows: opposite treatment status and a ps_logit difference within the threshold
            potential_matches = df_sorted[(df_sorted.sustainable != row.sustainable) & (abs(df_sorted.ps_logit - row.ps_logit) <= threshold)]
            
            if len(potential_matches) > 0: # if there is at least one candidate
                # take the first candidate as the match
                closest_match_index = potential_matches.iloc[0].orig_index
                
                # add both indices to the matched list
                matched_index.append(i)
                matched_index.append(closest_match_index)
                
            else:
                # no candidate within the threshold: record the row as unmatched
                unmatched_index.append(i)
    return matched_index, unmatched_index

caliper_matched, caliper_unmatched = caliper_match(df[["sustainable", "logtotalassetsmostrecent1", "logemployees1", "logasked1", "ps_logit"]], 0.1) # caliper threshold: 0.1
caliper_df_matched = df.iloc[caliper_matched]

# mean of each variable in the treatment & control group
treatment_means_caliper = caliper_df_matched[caliper_df_matched["sustainable"] == 1]
control_means_caliper = caliper_df_matched[caliper_df_matched["sustainable"] == 0]

data = {
    "success": (treatment_means_caliper["exito"], control_means_caliper["exito"]),
    "quick75relative": (treatment_means_caliper["quick75relative"], control_means_caliper["quick75relative"]),
    "logtotalassetsmostrecent1": (treatment_means_caliper["logtotalassetsmostrecent1"], control_means_caliper["logtotalassetsmostrecent1"]),
    "logemployees1": (treatment_means_caliper["logemployees1"], control_means_caliper["logemployees1"]),
    "logasked1": (treatment_means_caliper["logasked1"], control_means_caliper["logasked1"])
}

results = {}

for key in data:
    group1 = data[key][0]
    group2 = data[key][1]

    # calculate t-statistic    
    t_statistic, p_value = stats.ttest_ind(group1, group2, nan_policy="omit")
    
    # Indicate statistical significance at different levels
    if p_value < 0.01:
        significance = "***"
    elif p_value < 0.05:
        significance = "**"
    elif p_value < 0.10:
        significance = "*"
    else:
        significance = "-"
        
    # calculate mean of each variable
    mean_group1 = np.mean(group1)
    mean_group2 = np.mean(group2)

    results[key]={'sustainable':mean_group1,
                  'non_sustainable':mean_group2,
                  "difference" :(mean_group1 - mean_group2),
                  't-statistic':t_statistic,
                  'p-value':p_value,
                  'significance':significance}
# Display results
pd.DataFrame(results).T

Unnamed: 0,sustainable,non_sustainable,difference,t-statistic,p-value,significance
success,0.340564,0.306615,0.033949,1.468236,0.142129,-
quick75relative,0.017354,0.020528,-0.003174,-0.45255,0.650901,-
logtotalassetsmostrecent1,5.069388,5.53946,-0.470072,-0.902258,0.366981,-
logemployees1,1.318171,1.482502,-0.16433,-4.057515,5.1e-05,***
logasked1,10.476681,10.427021,0.04966,0.890431,0.373295,-
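
For comparison with the caliper routine above, here is a minimal sketch of greedy 1-to-1 nearest-neighbour matching within a caliper, without replacement. It is not the notebook's implementation. It picks the nearest available control rather than the first candidate inside the caliper, and it returns positions in the input order. The function name and the toy scores are hypothetical.

```python
def caliper_match_nearest(scores, treated, caliper):
    """Greedy 1-to-1 matching without replacement.

    scores  : list of propensity-score logits
    treated : list of 0/1 flags, same length as scores
    caliper : maximum allowed absolute score difference
    Returns a list of (treated_index, control_index) pairs.
    """
    treated_ids = [i for i, t in enumerate(treated) if t == 1]
    available = {i for i, t in enumerate(treated) if t == 0}
    pairs = []
    # Process treated units in ascending score order
    for i in sorted(treated_ids, key=lambda k: scores[k]):
        # Controls that are still unmatched and inside the caliper
        candidates = [j for j in available if abs(scores[j] - scores[i]) <= caliper]
        if not candidates:
            continue  # no control close enough: the treated unit stays unmatched
        # Nearest control; ties are broken by the lower index
        best = min(candidates, key=lambda j: (abs(scores[j] - scores[i]), j))
        pairs.append((i, best))
        available.remove(best)
    return pairs

# Toy scores, not taken from the study
scores = [-1.90, -1.85, -1.50, -1.48, -0.90, -1.87]
treated = [1, 0, 1, 0, 1, 0]
print(caliper_match_nearest(scores, treated, caliper=0.1))
# [(0, 5), (2, 3)]
```

In the toy data, the treated unit at position 4 has no control within 0.1 and is dropped, which is the usual cost of a tight caliper.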


### Table 3: Average Treatment Effect on the Treated (ATET) 
This table shows the average treatment effect on the treated individuals (ATET) for each method: caliper, nearest 1-to-1 and nn-VBC methods. ***, ** and * indicate statistical significance at 1, 5, and 10%, respectively.

In [24]:
caliper_ate_success = caliper_df_matched.groupby("sustainable")["exito"].mean().diff().iloc[-1]
caliper_ate_quick = caliper_df_matched.groupby("sustainable")["quick75relative"].mean().diff().iloc[-1]
caliper_se = ate_se(caliper_df_matched)

# run an independent two-sample t-test and obtain the p-values and the 95% confidence intervals
caliper_success_tstat, caliper_success_pvalue, caliper_success_desc = sms.ttest_ind(
    caliper_df_matched[caliper_df_matched.sustainable == 1].exito.values,
    caliper_df_matched[caliper_df_matched.sustainable == 0].exito.values,
    usevar="unequal",
    alternative="larger",
    value=0
)
lower_limit_success, upper_limit_success = confidence_interval(caliper_df_matched[caliper_df_matched.sustainable == 1].exito.values,
                                                               caliper_df_matched[caliper_df_matched.sustainable == 0].exito.values)

# run an independent two-sample t-test and obtain the p-values and the 95% confidence intervals
caliper_quick_tstat, caliper_quick_pvalue, caliper_quick_desc = sms.ttest_ind(
    caliper_df_matched[caliper_df_matched.sustainable == 1].quick75relative.values,
    caliper_df_matched[caliper_df_matched.sustainable == 0].quick75relative.values,
    usevar="unequal",
    alternative="larger",
    value=0
)

lower_limit_quick, upper_limit_quick = confidence_interval(caliper_df_matched[caliper_df_matched.sustainable == 1].quick75relative.values,
                                                           caliper_df_matched[caliper_df_matched.sustainable == 0].quick75relative.values)

data = {"Caliper 1-to-1 model": ["success", "quickrelative75"],
        "ATE": [caliper_ate_success, caliper_ate_quick],
        "Standard error": [caliper_se, caliper_se],
        "p-value": [caliper_success_pvalue, caliper_quick_pvalue],
        "[95% confidence": [lower_limit_success, lower_limit_quick],
        "interval]": [upper_limit_success, upper_limit_quick]}

pd.DataFrame(data)

Unnamed: 0,Caliper 1-to-1 model,ATE,Standard error,p-value,[95% confidence,interval]
0,success,0.033949,0.023612,0.075504,-0.004899,0.072798
1,quickrelative75,-0.003174,0.023612,0.684516,-0.014041,0.007693
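
The quantile passed to the critical-value lookup determines the coverage of the interval. The `confidence_interval` helper defined earlier uses `stats.t.ppf(0.95, df)`, which corresponds to a two-sided 90% interval, while a two-sided 95% interval needs the 0.975 quantile. The sketch below illustrates this with a normal approximation, a reasonable stand-in for the t-distribution only when the sample is large. The function name and the toy outcomes are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance

def mean_difference_ci(treated, control, level=0.95):
    """Difference in means with an unequal-variance standard error and a
    two-sided confidence interval based on a normal approximation."""
    diff = mean(treated) - mean(control)
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    z = NormalDist().inv_cdf(0.5 + level / 2)   # 0.975 quantile for a 95% interval
    return diff, se, (diff - z * se, diff + z * se)

# The quantile decides the coverage: 0.95 gives a 90% two-sided interval
print(round(NormalDist().inv_cdf(0.95), 3))    # 1.645
print(round(NormalDist().inv_cdf(0.975), 3))   # 1.96

# Toy binary outcomes, not taken from the study
treated_outcomes = [1, 0, 1, 1, 0, 1, 0, 1]
control_outcomes = [0, 0, 1, 0, 1, 0, 0, 1]
diff, se, (lower, upper) = mean_difference_ci(treated_outcomes, control_outcomes)
print(diff, se, lower, upper)
```

The standard error here is computed separately for each outcome, so each row of a results table would carry its own value.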


### Table 4: Sustainability and success
This table presents IV results examining the effect of the sustainable orientation of investment crowdfunding offerings on the probability of success. The dependent variable in columns (1) and (2) is the dummy that identifies sustainable offerings (Sustainable). The dependent variable in columns (3) and (5) is SUCCESS. QUICK75 is the dependent variable in columns (4) and (6). Variable definitions are reported in Annex 1. Year, industry-year and state-year fixed effects are included but not reported. T-statistics are in parentheses. ***, ** and * indicate statistical significance at 1, 5, and 10%, respectively.

$$
SUCCESSFUL_{ijklt} = \beta_0 + \beta_1SUSTAINABLE\_predicted_{ijklt} + \beta_2FIRM_{jklt-1} + \beta_3OFFERING_{ijklt} + \theta_k + \delta_l + \lambda_t + \epsilon_{ijklt}
$$
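
The first-stage equation is not written out in this notebook. Judging from the code below, which fits a probit of `sustainable` on a disaster variable (`desastre6meses` or `log_desastre6meses_ind`) plus the firm and offering controls, a plausible form is sketched here. The label $DISASTER$, its state-year indexing, and the coefficients $\gamma$ are assumptions introduced for illustration.

$$
\Pr(SUSTAINABLE_{ijklt} = 1) = \Phi(\gamma_0 + \gamma_1DISASTER_{lt} + \gamma_2FIRM_{jklt-1} + \gamma_3OFFERING_{ijklt} + \theta_k + \delta_l + \lambda_t)
$$

Under this reading, $SUSTAINABLE\_predicted_{ijklt}$ in the second stage is the fitted probability from this probit.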

In [29]:
# First-Stage Dependent Variable:
y = df['sustainable']

# specifications table 4.1
columns = ['desastre6meses',
           'sizemostrecent1',
           'logemployees1',
           'edad',
           'equity',
           'logasked1',
           "ind_year",
           "state_id"] + year_dummies
X = df[columns]

# Drop dummies that have a constant value for all rows
nunique = X[year_dummies].nunique()
X = X.drop(columns = nunique[nunique == 1].index)

# fit model 4.1
model1 = Probit(y, X).fit(cov_type='HC0')
results1 = pd.DataFrame(model1.params, columns=["Sustainable 1"])
results1["pvalue"] = model1.pvalues
results1.loc['Year Dummies', 'Sustainable 1'] = 'Yes'
results1.loc['Industry-Y Dummies', 'Sustainable 1'] = 'Yes'
results1.loc['State-Y Dummies', 'Sustainable 1'] = 'Yes'
results1['stars'] = results1["pvalue"].apply(significance_stars)
results1.loc['log_likelihood', 'Sustainable 1'] = model1.llf
wald_test = model1.wald_test(model1.model.exog_names)
wald_chi2 = wald_test.statistic[0][0]
results1.loc['wald_pvalue', 'Sustainable 1'] = wald_test.pvalue

# specifications table 4.2
columns = ['log_desastre6meses_ind',
           'sizemostrecent1',
           'logemployees1',
           'edad',
           'equity',
           'logasked1',
           "ind_year",
           "state_id"] + year_dummies

X = df[columns]

# Drop dummies that have a constant value for all rows
nunique = X[year_dummies].nunique()
X = X.drop(columns = nunique[nunique == 1].index)

# fit model 4.2
mode42 = Probit(y, X).fit(cov_type='HC0')
results42 = pd.DataFrame(mode42.params, columns=["Sustainable 2"])
results42["pvalue"] = mode42.pvalues
results42.loc['Year Dummies', 'Sustainable 2'] = 'Yes'
results42.loc['Industry-Y Dummies', 'Sustainable 2'] = 'Yes'
results42.loc['State-Y Dummies', 'Sustainable 2'] = 'Yes'
results42['stars'] = results42["pvalue"].apply(significance_stars)
results42.loc['log_likelihood', 'Sustainable 2'] = mode42.llf
wald_test = mode42.wald_test(mode42.model.exog_names)
wald_chi2 = wald_test.statistic[0][0]
results42.loc['wald_pvalue', 'Sustainable 2'] = wald_test.pvalue

# Desired order
row_order = ["desastre6meses",
             "log_desastre6meses_ind",
             "sizemostrecent1",
             "logemployees1",
             "edad",
             "equity",
             "logasked1",
             "ind_year",
             "state_id",
             'Year Dummies',
             "Industry-Y Dummies",
             "State-Y Dummies",
             "log_likelihood",
             "wald_pvalue"]

pd.concat([results1, results42], axis=1).reindex(row_order).drop(['ind_year', 'state_id'])

Optimization terminated successfully.
         Current function value: 0.405450
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.407041
         Iterations 6




Unnamed: 0,Sustainable 1,pvalue,stars,Sustainable 2,pvalue.1,stars.1
desastre6meses,-0.292856,0.002204,***,,,
log_desastre6meses_ind,,,,-0.005874,0.058445,*
sizemostrecent1,-0.003814,0.674738,-,-0.003874,0.668866,-
logemployees1,-0.121083,0.011615,**,-0.115823,0.015152,**
edad,0.085368,0.007388,***,0.084627,0.007611,***
equity,0.17912,0.040536,**,0.177633,0.041371,**
logasked1,-0.039076,0.286853,-,-0.041432,0.256842,-
Year Dummies,Yes,,-,Yes,,-
Industry-Y Dummies,Yes,,-,Yes,,-
State-Y Dummies,Yes,,-,Yes,,-


In [26]:
# Second-Stage Dependent Variable:
 
# Specification 1

# compute predicted sustainable

# probit success

# probit quick

# Specification 2

# compute predicted sustainable

# probit success

# probit quick

### Table 5: Sustainability and success: the role of firm- and offering-level characteristics
This table presents results examining the effect of firm- and offering-level characteristics on the relationship between the sustainable orientation of investment crowdfunding offerings and the probability of success. The dependent variable is SUCCESS. Variable definitions are reported in Annex 1. Year, industry-year and state-year fixed effects are included but not reported. T-statistics are in parentheses. *** and ** indicate statistical significance at 1 and 5 percent, respectively.

$$
SUCCESSFUL_{ijklt} = \beta_0 + \beta_1SUSTAINABLE\_predicted_{ijklt} + \beta_2FIRM_{jklt-1} + \beta_3OFFERING_{ijklt} + \beta_4SUSTAINABLE\_predicted_{ijklt}*FIRM_{jklt-1} + \beta_5SUSTAINABLE\_predicted_{ijklt}*FIRM_{jklt-1}*OFFERING_{ijklt} + \theta_k + \delta_l + \lambda_t + \epsilon_{ijklt}
$$

In [None]:
# First stage: refit the probit for sustainable.
# Assumes y and X are still the objects left by the Table 4 cell (X holds specification 4.2).
result = Probit(y, X).fit(cov_type='HC0')

# Use the fitted first stage to predict the probability of being sustainable
df['sustainablep'] = result.predict(X)

# Interaction terms between predicted sustainable and each firm- or offering-level characteristic.
# Assumption: each suffix maps to the control column of the same meaning used below.
df['sustainablep_size'] = df['sustainablep']*df["sizemostrecent1"]
df['sustainablep_employees'] = df['sustainablep']*df["logemployees1"]
df['sustainablep_age'] = df['sustainablep']*df["edad"]
df['sustainablep_equity'] = df['sustainablep']*df["equity"]
df['sustainablep_asked'] = df['sustainablep']*df["logasked1"]

# Without desastre6meses
controls = ['sustainablep',
           'sizemostrecent1',
           'logemployees1',
           'edad',
           'equity',
           "logasked1",
           "ind_year",
           "state_id"] + year_dummies

# Each specification adds one interaction term (wrapped in a list so the concatenation works)
Table_51_controls = controls + ["sustainablep_size"]
Table_52_controls = controls + ["sustainablep_employees"]
Table_53_controls = controls + ["sustainablep_age"]
Table_54_controls = controls + ["sustainablep_equity"]
Table_55_controls = controls + ["sustainablep_asked"]

X2 = df[Table_51_controls]

# Drop dummies that have a constant value for all rows
nunique = X2[year_dummies].nunique()
X2 = X2.drop(columns = nunique[nunique == 1].index)
y2 = df['exito']

model2 = Probit(y2, X2)
summary_with_stars(model2.fit(cov_type='HC0'))

### Table 6: Sustainability and success: the role of the financing environment
This table presents results examining the effect of the characteristics of the financing environment on the relationship between the sustainable orientation of investment crowdfunding offerings and the probability of success. The dependent variable is SUCCESS. Variable definitions are reported in Annex 1. Firm and offering control variables, year, industry-year and state-year fixed effects are included but not reported. T-statistics are in parentheses. ***, ** and * indicate statistical significance at 1, 5, and 10 percent, respectively.

In [None]:
# First stage: refit the probit for sustainable.
# Assumes y and X are still the objects left by the Table 4 cell (X holds specification 4.2).
result = Probit(y, X).fit(cov_type='HC0')

# Use the fitted first stage to predict the probability of being sustainable
df['sustainablep'] = result.predict(X)

# Interaction terms between predicted sustainable and each financing-environment variable
df['sustainablep_lagbranches'] = df['sustainablep']*df["lagbranches"]
df['sustainablep_netincome_assetsB'] = df['sustainablep']*df["bank_netincome_assets"]
df['sustainablep_lagvcfundraising'] = df['sustainablep']*df["lagvcfundraising"]
df['sustainablep_loglagnum_op_per_platform_y'] = df['sustainablep']*df["loglagnum_oper_por_platf_y"]

# Without desastre6meses
controls = ['sustainablep',
           'sizemostrecent1',
           'logemployees1',
           'edad',
           'equity',
           "logasked1",
           "ind_year",
           "state_id"] + year_dummies

# Each specification adds one interaction term (wrapped in a list so the concatenation works)
Table_61_controls = controls + ["sustainablep_lagbranches"]
Table_62_controls = controls + ["sustainablep_netincome_assetsB"]
Table_63_controls = controls + ["sustainablep_lagvcfundraising"]
Table_64_controls = controls + ["sustainablep_loglagnum_op_per_platform_y"]

X2 = df[Table_61_controls]

# Drop dummies that have a constant value for all rows
nunique = X2[year_dummies].nunique()
X2 = X2.drop(columns = nunique[nunique == 1].index)
y2 = df['exito']

model2 = Probit(y2, X2)
summary_with_stars(model2.fit(cov_type='HC0'))

### Table 7: Sustainability and success: robustness tests
This table presents a set of robustness tests for the relationship between the sustainable orientation of investment crowdfunding offerings and the probability of success. In column (1), we report the results for the second-stage regression for the Heckman (1979) method. In column (2), we find that the results do not vary when controlling for the characteristics of the team in terms of gender and size. In columns (3) to (5), we control for the funding history of the company. In column (6), we control for the cost structure defined by the funding portal. The dependent variable is SUCCESS. Variable definitions are reported in Annex 1. Firm and offering control variables, year, industry-year and state-year fixed effects are included but not reported. T-statistics are in parentheses. ***, ** and * indicate statistical significance at 1, 5, and 10 percent, respectively.