This file contains all the scripts used to compute the statistics reported in our paper. Before running them, unzip Dataset/kept_lambda_dataset.zip and Dataset/removed_lambda_dataset.zip.
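
The unzip step can also be scripted. The following is a minimal sketch, not part of the original scripts: `unzip_datasets` is a hypothetical helper, and it assumes that each archive already contains a top-level folder named after itself, so that extracting into Dataset/ yields the two directories read by the first cell.

```python
import zipfile
from pathlib import Path


def unzip_datasets(dataset_dir="Dataset"):
    """Extract both dataset archives into dataset_dir (hypothetical helper)."""
    base = Path(dataset_dir)
    for name in ("kept_lambda_dataset", "removed_lambda_dataset"):
        archive = base / f"{name}.zip"
        # Assumption: the archive holds a top-level folder called <name>,
        # so extraction produces dataset_dir/<name>/.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(base)
```

Calling `unzip_datasets()` from the repository root should be enough if that layout assumption holds. If the archives store their files without a top-level folder, extract each one into its own subdirectory instead.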

In [12]:
import os
removed_num = len(os.listdir('Dataset/removed_lambda_dataset'))
kept_num = len(os.listdir('Dataset/kept_lambda_dataset'))

In [13]:
print('Number of removed lambdas: ' + str(removed_num))
print('Number of kept lambdas: ' + str(kept_num))

Number of removed lambdas: 3662
Number of kept lambdas: 31228


Lifetime of removed lambdas:

In [14]:
import csv

removed_in_one_month = 0
removed_after_one_year = 0
with open('Quantitative Analysis/Lifetime of lambdas/life_time.csv') as f:
    for row in csv.reader(f):
        # row[0] holds the lifetime (0 means removed within the first month,
        # 12 or more means removed after one year); row[1] holds the status.
        if row[1] != 'Removed':
            continue
        lifetime = int(row[0])
        if lifetime == 0:
            removed_in_one_month += 1
        if lifetime >= 12:
            removed_after_one_year += 1

# removed_num is the directory count computed in the first cell.
print("There are %d lambda expressions out of %d removed in one month, the percentage is %.2f%%." % (removed_in_one_month, removed_num, removed_in_one_month / removed_num * 100))
print("%.2f%% (%d out of %d) lambdas are removed after one year" % (removed_after_one_year / removed_num * 100, removed_after_one_year, removed_num))

There are 1401 lambda expressions out of 3662 removed in one month, the percentage is 38.26%.
20.23% (741 out of 3662) lambdas are removed after one year


Chi-square test of independence between lambda removal (removed vs. kept) and the kind of functional interface used (built-in vs. self-defined):

In [15]:
from scipy.stats import chi2_contingency
import math

removed_built_in = 1188
kept_built_in = 12828
removed_self_defined = 659
kept_self_defined = 4995

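# Rows: self-defined vs. built-in functional interfaces; columns: removed vs. kept.
table = [[removed_self_defined, kept_self_defined], [removed_built_in, kept_built_in]]
size = sum(table[0]) + sum(table[1])
# For a 2x2 table, chi2_contingency applies Yates' continuity correction by default.
chi_square, p, df, expected = chi2_contingency(table)
# Phi coefficient: effect size for a 2x2 table.
phi = math.sqrt(chi_square / size)
print("Chi-Square: " + str(chi_square))
print("p-value: " + str(p))
print("Degree of freedom: " + str(df))
print('Effect size: ' + str(phi))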

Chi-Square: 47.49414926448731
p-value: 5.516835530106593e-12
Degree of freedom: 1
Effect size: 0.04913804531036624
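
To make explicit what the call above computes, here is a dependency-free sketch of the same 2x2 calculation. It is an illustration, not the authors' code: `chi_square_2x2` is a hypothetical helper that applies Yates' continuity correction (the SciPy default for 2x2 tables), obtains the p-value from the one-degree-of-freedom identity p = erfc(sqrt(chi2 / 2)), and computes the phi effect size as sqrt(chi2 / N). On the table used above it should agree with the printed values up to floating-point rounding.

```python
import math


def chi_square_2x2(table, yates=True):
    """Chi-square statistic, p-value and phi for a 2x2 table (illustrative sketch)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates' continuity correction: shrink |ad - bc| by n/2, never below zero.
        diff = max(0.0, diff - n / 2)
    chi_square = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # With one degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi_square / 2))
    # Phi coefficient: effect size for a 2x2 table.
    phi = math.sqrt(chi_square / n)
    return chi_square, p, phi


# Same counts as the cell above: [self-defined, built-in] x [removed, kept].
chi_square, p, phi = chi_square_2x2([[659, 4995], [1188, 12828]])
print(chi_square, p, phi)
```

Passing `yates=False` gives the uncorrected statistic, which is slightly larger. With samples of this size the correction changes the result only marginally.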


Chi-square test of independence between lambda removal (removed vs. kept) and whether the lambda has parameters:

In [16]:
from scipy.stats import chi2_contingency
import math

removed_empty_para = 1404
removed_has_para = 2258
kept_empty_para = 9604
kept_has_para = 21624

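# Rows: empty vs. non-empty parameter lists; columns: removed vs. kept.
table = [[removed_empty_para, kept_empty_para], [removed_has_para, kept_has_para]]
size = sum(table[0]) + sum(table[1])
# For a 2x2 table, chi2_contingency applies Yates' continuity correction by default.
chi_square, p, df, expected = chi2_contingency(table)
# Phi coefficient: effect size for a 2x2 table.
phi = math.sqrt(chi_square / size)
print("Chi-Square: " + str(chi_square))
print("p-value: " + str(p))
print("Degree of freedom: " + str(df))
print('Effect size: ' + str(phi))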

Chi-Square: 86.97131133703881
p-value: 1.1010682017865398e-20
Degree of freedom: 1
Effect size: 0.04992723605868541


Mann-Whitney U test on the number of lines of removed vs. kept empty-argument lambdas:

In [17]:
from scipy import stats
import csv

kept_lines = []
removed_lines = []
with open('Quantitative Analysis/The complexity of lambda expressions/lines_of_empty_argument_lambdas/lines_empty_argument_lambda_raw.csv') as f:
    for row in csv.reader(f):
        if row[1] == 'Kept':
            kept_lines.append(int(row[0]))
        if row[1] == 'Removed':
            removed_lines.append(int(row[0]))

# One-sided test: do kept lambdas have stochastically more lines than removed ones?
statistic, p = stats.mannwhitneyu(kept_lines, removed_lines, alternative='greater')
print('p-value: ' + str(p))

p-value: 2.4407145638139196e-16
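
The `alternative` argument is always read relative to the first sample passed to `mannwhitneyu`: `'greater'` asks whether the first sample (here, the kept lambdas) tends to take larger values than the second, and `'less'` asks the opposite. The sketch below illustrates the underlying U statistic by direct pairwise comparison. It is an illustration only: `u_statistic` is a hypothetical helper, and the two small samples are made-up values, not taken from the dataset.

```python
def u_statistic(x, y):
    """U statistic of sample x versus sample y by pairwise comparison (O(n*m))."""
    u = 0.0
    for xi in x:
        for yi in y:
            if xi > yi:
                u += 1.0   # x wins the pair
            elif xi == yi:
                u += 0.5   # a tie counts as half a win
    return u


kept = [5, 7, 8, 9]      # made-up values for illustration
removed = [1, 2, 7]
u_kept = u_statistic(kept, removed)
u_removed = u_statistic(removed, kept)
print(u_kept, u_removed)  # prints 10.5 1.5
```

The two statistics always sum to `len(x) * len(y)` (12 here), so a U close to that product for the first sample is the pattern a `'greater'` test looks for, and a U close to zero is the pattern a `'less'` test looks for.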


Mann-Whitney U test on the body depth of removed vs. kept empty-argument lambdas:

In [18]:
from scipy import stats
import csv

kept_depth = []
removed_depth = []
with open('Quantitative Analysis/The complexity of lambda expressions/body_depth_of_empty_argument_lambdas/body_depth_empty_argument_lambda_raw.csv') as f:
    for row in csv.reader(f):
        if row[1] == 'Kept':
            kept_depth.append(int(row[0]))
        if row[1] == 'Removed':
            removed_depth.append(int(row[0]))

# One-sided test: do kept lambdas have stochastically deeper bodies than removed ones?
statistic, p = stats.mannwhitneyu(kept_depth, removed_depth, alternative='greater')
print('p-value: ' + str(p))

p-value: 8.909011334696569e-28


Mann-Whitney U test on the number of lines of removed vs. kept non-empty-argument lambdas:

In [19]:
from scipy import stats
import csv

kept_lines = []
removed_lines = []
with open('Quantitative Analysis/The complexity of lambda expressions/lines_of_non_empty_argument_lambdas/lines_non_empty_argument_lambda_raw.csv') as f:
    for row in csv.reader(f):
        if row[1] == 'Kept':
            kept_lines.append(int(row[0]))
        if row[1] == 'Removed':
            removed_lines.append(int(row[0]))

# One-sided test: do kept lambdas have stochastically fewer lines than removed ones?
statistic, p = stats.mannwhitneyu(kept_lines, removed_lines, alternative='less')
print('p-value: ' + str(p))

p-value: 2.469788400293504e-12


Mann-Whitney U test on the number of captured variables in removed vs. kept lambdas:

In [20]:
from scipy import stats
import csv
import numpy as np

kept_variable = []
removed_variable = []
with open('Quantitative Analysis/The complexity of lambda expressions/number_of_variables_used_in_lambda_bodies/variable_number_raw.csv') as f:
    for row in csv.reader(f):
        if row[1] == 'Kept':
            kept_variable.append(int(row[0]))
        if row[1] == 'Removed':
            removed_variable.append(int(row[0]))
    
# One-sided test: do kept lambdas capture stochastically fewer variables than removed ones?
statistic, p = stats.mannwhitneyu(kept_variable, removed_variable, alternative='less')
print('p-value: ' + str(p))

4.893492023297037
5.490438695163105
p-value: 0.003581436371616115
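
The four Mann-Whitney cells above repeat the same CSV-reading loop. One possible way to factor it out is sketched below. `load_groups` is a hypothetical helper, not part of the original scripts, and it assumes what the cells above imply: each raw CSV has no header row, an integer value in the first column, and the label `Kept` or `Removed` in the second.

```python
import csv


def load_groups(path):
    """Read a two-column CSV (value, label) into separate kept and removed lists."""
    kept, removed = [], []
    with open(path, newline='') as f:
        for row in csv.reader(f):
            # Assumption: column 0 is an integer measurement, column 1 the label.
            if row[1] == 'Kept':
                kept.append(int(row[0]))
            elif row[1] == 'Removed':
                removed.append(int(row[0]))
    return kept, removed
```

Each cell could then call `load_groups` with its own CSV path and pass the two returned lists straight to `stats.mannwhitneyu` with the appropriate `alternative`.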


Chi-square test of independence between lambda removal (removed vs. kept) and whether the lambda is passed to a built-in or a self-defined method:

In [21]:
from scipy.stats import chi2_contingency
import math

removed_built_in = 624
kept_built_in = 9361
removed_self_defined = 1289
kept_self_defined = 9482

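# Rows: self-defined vs. built-in methods; columns: removed vs. kept.
table = [[removed_self_defined, kept_self_defined], [removed_built_in, kept_built_in]]
size = sum(table[0]) + sum(table[1])
# For a 2x2 table, chi2_contingency applies Yates' continuity correction by default.
chi_square, p, df, expected = chi2_contingency(table)
# Phi coefficient: effect size for a 2x2 table.
phi = math.sqrt(chi_square / size)
print("Chi-Square: " + str(chi_square))
print("p-value: " + str(p))
print("Degree of freedom: " + str(df))
print('Effect size: ' + str(phi))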

Chi-Square: 201.78817255919424
p-value: 8.503957577377727e-46
Degree of freedom: 1
Effect size: 0.09859979741392562
