# Analysis of survey evaluations
This Jupyter notebook examines the evaluations recorded in `data/evaluations.csv`.

## The data
We start by loading the CSV file into a [pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) and printing some information on the size and structure of the dataset.

In [1]:
import pandas as pd
pd.options.mode.chained_assignment = None  # default='warn', hides SettingWithCopyWarning

file = 'data/evaluations.csv'
conversion_dict = {'research_type': lambda x: int(x == 'E')}
evaluation_data = pd.read_csv(file, sep=',', header=0, index_col=0, converters=conversion_dict)

print('Samples per conference\n{}'.format(evaluation_data.groupby('conference').size()), end='\n')

column_headers = evaluation_data.columns.values
print('\nColumn headers: {}'.format(column_headers))

Samples per conference
conference
AAAI 14     100
AAAI 16     100
IJCAI 13    100
IJCAI 16    100
dtype: int64

Column headers: ['title' 'research_type' 'result_outcome' 'affiliation'
 'problem_description' 'goal/objective' 'research_method'
 'research_question' 'hypothesis' 'prediction' 'contribution' 'pseudocode'
 'open_source_code' 'open_experiment_code' 'train' 'validation' 'test'
 'results' 'hardware_specification' 'software_dependencies'
 'third_party_citation' 'experiment_setup' 'evaluation_criteria' 'authors'
 'link' 'comments' 'conference']


The dataset has 400 samples and 27 columns. Four of these columns are not needed for further analysis: *title*, *authors*, *link*, and *comments*. Dropping them leaves a numerical index for each paper, the conference it was published at, and the survey-related data. The lambda function above converts the *research_type* values E (experimental) and T (theoretical) to 1 and 0 respectively, which makes them easier to work with in pandas.
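To make the conversion step concrete, here is a minimal sketch that applies the same converter to a made-up three-row CSV. The rows and the `csv_text` and `toy` names are assumptions for illustration only; they are not taken from `data/evaluations.csv`.

```python
import io
import pandas as pd

# Hypothetical stand-in for data/evaluations.csv (made-up rows)
csv_text = """index,title,research_type,conference
1,Paper A,E,IJCAI 16
2,Paper B,T,IJCAI 16
3,Paper C,E,AAAI 14
"""

# Same converter as in the notebook: 'E' -> 1, any other value -> 0
conversion_dict = {'research_type': lambda x: int(x == 'E')}
toy = pd.read_csv(io.StringIO(csv_text), index_col=0, converters=conversion_dict)

print(toy['research_type'].tolist())  # [1, 0, 1]
```

Note that this converter maps every value other than `'E'` to 0, so a misspelled or empty entry would silently be counted as theoretical.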

In [2]:
evaluation_data.drop(['title', 'authors', 'link', 'comments'], axis=1, inplace=True)
column_headers = evaluation_data.columns.values

evaluation_data.head(2)

| index | research_type | result_outcome | affiliation | problem_description | goal/objective | research_method | research_question | hypothesis | prediction | contribution | ... | train | validation | test | results | hardware_specification | software_dependencies | third_party_citation | experiment_setup | evaluation_criteria | conference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | IJCAI 16 |
| 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | IJCAI 16 |


The two rows above show an experimental paper (top row) and a theoretical paper (bottom row). For theoretical papers, all columns that are specific to experimental papers are NaN. In the *affiliation* column, 0 represents academia, 1 represents collaboration, and 2 represents industry authors. The remaining columns are boolean: 1 if documented and 0 if not. Note that some experimental papers also have no value (NaN) for training and/or validation data, because a train/validation/test split is not applicable to them.
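Because NaN marks "not applicable" rather than "not documented", it is worth knowing how pandas treats it in row-wise reductions. The sketch below uses made-up rows (the frame `toy` and its row labels are assumptions) and relies on the default `skipna=True` behaviour of `DataFrame.all` and `DataFrame.mean`, which ignore NaN entries.

```python
import numpy as np
import pandas as pd

# Made-up rows: one paper with a full split, one without an applicable
# train/validation split, and one that does not document its training data
toy = pd.DataFrame({
    'train':      [1.0, np.nan, 0.0],
    'validation': [1.0, np.nan, 1.0],
    'test':       [1.0, 1.0, 1.0],
}, index=['split_documented', 'no_split', 'train_missing'])

# With skipna=True (the default), NaN cells are skipped in both reductions
all_documented = toy.all(axis=1)
degree = toy.mean(axis=1)

print(all_documented)
print(degree)
```

Under these defaults, the `no_split` row counts as fully documented and gets a degree of 1.0, since only its non-NaN cell is considered.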

## Miscellaneous statistics

In [3]:
print('Samples per affiliation\n{}'.format(evaluation_data.groupby('affiliation').size()), end='\n\n')
print('Affiliation by conference\n{}'.format(evaluation_data.groupby(['conference', 'affiliation']).size()), end='\n\n')

print('Samples per research type\n{}'.format(evaluation_data.groupby('research_type').size()), end='\n\n')
print('Research type by conference\n{}'.format(evaluation_data.groupby(['conference', 'research_type']).size()), end='\n\n')

print('Samples per research outcome\n{}'.format(evaluation_data.groupby('result_outcome').size()), end='\n\n')
print('Research outcome by conference\n{}'.format(evaluation_data.groupby(['conference', 'result_outcome']).size()), end='\n\n')


Samples per affiliation
affiliation
0    331
1     58
2     11
dtype: int64

Affiliation by conference
conference  affiliation
AAAI 14     0              83
            1              14
            2               3
AAAI 16     0              79
            1              17
            2               4
IJCAI 13    0              89
            1              11
IJCAI 16    0              80
            1              16
            2               4
dtype: int64

Samples per research type
research_type
0     75
1    325
dtype: int64

Research type by conference
conference  research_type
AAAI 14     0                15
            1                85
AAAI 16     0                15
            1                85
IJCAI 13    0                29
            1                71
IJCAI 16    0                16
            1                84
dtype: int64

Samples per research outcome
result_outcome
0     23
1    377
dtype: int64

Research outcome by conference
conference  result_outcome

## Extracting experimental papers
Analysis of reproducibility is relevant for experimental papers, so we select only the experimental papers (*research_type* equal to 1) for the rest of the analysis.

In [4]:
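# Take an explicit copy so that columns added later do not write into a view of evaluation_data
experimental_data = evaluation_data[evaluation_data.research_type == 1].copy()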
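As a small illustration of this selection step, the sketch below filters a made-up frame with a boolean mask and takes an explicit copy before adding a column. The names `toy`, `experimental` and `flag` are hypothetical; the point is only that the copy keeps later assignments from touching the original frame.

```python
import pandas as pd

# Made-up data: two experimental papers (1) and one theoretical paper (0)
toy = pd.DataFrame({
    'research_type': [1, 0, 1],
    'conference': ['IJCAI 16', 'IJCAI 16', 'AAAI 14'],
})

# Boolean mask keeps the experimental rows; .copy() makes the result independent
experimental = toy[toy.research_type == 1].copy()
experimental['flag'] = True  # new column lives only on the copy

print(len(experimental))
print(list(toy.columns))
```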

## $R3(e) = Method(e)$

In [5]:
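# 'conference' is a grouping label, not a documentation variable, so it is
# left out of Method(e); string columns also break mean(axis=1) in newer pandas
method = ['problem_description', 'goal/objective', 'research_method',
          'research_question', 'contribution', 'pseudocode']
r3_columns = method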

print('Total R3(e) = {}'.format(experimental_data[r3_columns].all(axis=1).sum()))
experimental_data.loc[:, 'r3'] = experimental_data[r3_columns].all(axis=1)
print('\nR3(e) by conference\n{}\n'.format(
    experimental_data[['r3', 'conference']].groupby('conference').sum()['r3']))

print('Total R3D(e) = {}'.format(experimental_data[r3_columns].mean(axis=1).mean()))
experimental_data.loc[:, 'r3d'] = experimental_data[r3_columns].mean(axis=1)
print('\nR3d(e) by conference\n{}\n'.format(
    experimental_data[['r3d', 'conference']].groupby('conference').mean()['r3d']))

Total R3(e) = 0

R3(e) by conference
conference
AAAI 14     False
AAAI 16     False
IJCAI 13    False
IJCAI 16    False
Name: r3, dtype: bool

Total R3D(e) = 0.2964102564102564

R3d(e) by conference
conference
AAAI 14     0.307843
AAAI 16     0.274510
IJCAI 13    0.265258
IJCAI 16    0.333333
Name: r3d, dtype: float64
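The cell above reports two kinds of score: a strict one (`all(axis=1)`, true only if every variable is documented) and a degree (`mean(axis=1)`, the fraction of variables documented). The following sketch contrasts the two on three made-up papers; the values and the reduced column list `cols` are assumptions for illustration.

```python
import pandas as pd

# Made-up documentation flags for three papers
toy = pd.DataFrame({
    'problem_description': [1, 1, 0],
    'goal/objective':      [1, 0, 0],
    'pseudocode':          [1, 1, 0],
    'conference': ['AAAI 14', 'AAAI 14', 'IJCAI 13'],
})
cols = ['problem_description', 'goal/objective', 'pseudocode']

toy['r3'] = toy[cols].all(axis=1)    # strict: every variable documented
toy['r3d'] = toy[cols].mean(axis=1)  # degree: fraction of variables documented

print('Strict count:', toy['r3'].sum())
print('Mean degree:', toy['r3d'].mean())
print(toy.groupby('conference')['r3d'].mean())
```

Only the first paper passes the strict test, while the degree still gives partial credit to the second paper. This is why a strict total of 0 can coexist with a non-zero degree.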



## $R2(e) = Method(e) \land Data(e)$

In [6]:
data = ['train', 'validation', 'test', 'results']
r2_columns = r3_columns + data

print('Total Data(e) = {}'.format(experimental_data[data].all(axis=1).sum()))
experimental_data.loc[:, 'data'] = experimental_data[data].all(axis=1)
print('Data(e) by conference\n{}\n'.format(
    experimental_data[['data', 'conference']].groupby('conference').sum()['data']))

print('Total DataDegree(e) = {}'.format(experimental_data[data].mean(axis=1).mean()))
experimental_data.loc[:, 'dataD'] = experimental_data[data].mean(axis=1)
print('DataDegree(e) by conference\n{}\n'.format(
    experimental_data[['dataD', 'conference']].groupby('conference').mean()['dataD']))

print('Total R2(e) = {}'.format(experimental_data[r2_columns].all(axis=1).sum()))
experimental_data.loc[:, 'r2'] = experimental_data[r2_columns].all(axis=1)
print('\nR2(e) by conference\n{}\n'.format(
    experimental_data[['r2', 'conference']].groupby('conference').sum()['r2']))

print('Total R2D(e) = {}'.format(experimental_data[r2_columns].mean(axis=1).mean()))
experimental_data.loc[:, 'r2d'] = experimental_data[r2_columns].mean(axis=1)
print('\nR2d(e) by conference\n{}\n'.format(
    experimental_data[['r2d', 'conference']].groupby('conference').mean()['r2d']))

Total Data(e) = 9
Data(e) by conference
conference
AAAI 14     2.0
AAAI 16     1.0
IJCAI 13    0.0
IJCAI 16    6.0
Name: data, dtype: float64

Total DataDegree(e) = 0.2287179487179487
DataDegree(e) by conference
conference
AAAI 14     0.202941
AAAI 16     0.261765
IJCAI 13    0.131455
IJCAI 16    0.303571
Name: dataD, dtype: float64

Total R2(e) = 0

R2(e) by conference
conference
AAAI 14     False
AAAI 16     False
IJCAI 13    False
IJCAI 16    False
Name: r2, dtype: bool

Total R2D(e) = 0.277

R2d(e) by conference
conference
AAAI 14     0.275948
AAAI 16     0.272353
IJCAI 13    0.228991
IJCAI 16    0.323347
Name: r2d, dtype: float64



## $R1(e) = Method(e) \land Data(e) \land Exp(e)$

In [7]:
experiment = ['hypothesis', 'prediction',
        'open_source_code', 'open_experiment_code',
        'hardware_specification', 'software_dependencies',
        'experiment_setup', 'evaluation_criteria']
r1_columns = r2_columns + experiment

print('Total Exp(e) = {}'.format(experimental_data[experiment].all(axis=1).sum()))
experimental_data.loc[:, 'exp'] = experimental_data[experiment].all(axis=1)
print('Exp(e) by conference\n{}\n'.format(
    experimental_data[['exp', 'conference']].groupby('conference').sum()['exp']))

print('Total ExpDegree(e) = {}'.format(experimental_data[experiment].mean(axis=1).mean()))
experimental_data.loc[:, 'expD'] = experimental_data[experiment].mean(axis=1)
print('ExpDegree(e) by conference\n{}\n'.format(
    experimental_data[['expD', 'conference']].groupby('conference').mean()['expD']))

print('Total R1(e) = {}'.format(experimental_data[r1_columns].all(axis=1).sum()))
experimental_data.loc[:, 'r1'] = experimental_data[r1_columns].all(axis=1)
print('\nR1(e) by conference\n{}\n'.format(
    experimental_data[['r1', 'conference']].groupby('conference').sum()['r1']))

print('Total R1D(e) = {}'.format(experimental_data[r1_columns].mean(axis=1).mean()))
experimental_data.loc[:, 'r1d'] = experimental_data[r1_columns].mean(axis=1)
print('\nR1d(e) by conference\n{}\n'.format(
    experimental_data[['r1d', 'conference']].groupby('conference').mean()['r1d']))

Total Exp(e) = 0
Exp(e) by conference
conference
AAAI 14     False
AAAI 16     False
IJCAI 13    False
IJCAI 16    False
Name: exp, dtype: bool

Total ExpDegree(e) = 0.22346153846153846
ExpDegree(e) by conference
conference
AAAI 14     0.172059
AAAI 16     0.214706
IJCAI 13    0.197183
IJCAI 16    0.306548
Name: expD, dtype: float64

Total R1(e) = 0

R1(e) by conference
conference
AAAI 14     False
AAAI 16     False
IJCAI 13    False
IJCAI 16    False
Name: r1, dtype: bool

Total R1D(e) = 0.2519884364002011

R1d(e) by conference
conference
AAAI 14     0.227028
AAAI 16     0.245997
IJCAI 13    0.213552
IJCAI 16    0.315797
Name: r1d, dtype: float64
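The three cells for R3, R2 and R1 repeat the same pattern on growing column lists. One possible way to factor this out is sketched below. The helper `summarize` and the toy data are hypothetical and not part of the original notebook; each factor is reduced to a single column to keep the example short.

```python
import pandas as pd

def summarize(df, columns):
    """Return (strict count, mean degree, per-conference mean degree)."""
    strict = df[columns].all(axis=1)    # True if all variables are documented
    degree = df[columns].mean(axis=1)   # fraction of documented variables
    by_conference = degree.groupby(df['conference']).mean()
    return int(strict.sum()), float(degree.mean()), by_conference

# Made-up data with one variable per factor
toy = pd.DataFrame({
    'pseudocode': [1, 0, 1, 1],
    'test':       [1.0, 0.0, 1.0, 0.0],
    'hypothesis': [0, 0, 1, 1],
    'conference': ['AAAI 14', 'AAAI 14', 'IJCAI 16', 'IJCAI 16'],
})
method, data, experiment = ['pseudocode'], ['test'], ['hypothesis']

# R3 uses Method only, R2 adds Data, R1 adds Exp
levels = [('R3', method),
          ('R2', method + data),
          ('R1', method + data + experiment)]
for label, columns in levels:
    count, degree, by_conference = summarize(toy, columns)
    print('Total {}(e) = {}, degree = {:.3f}'.format(label, count, degree))
```

Since each level adds columns to the previous one, the strict count can only stay the same or decrease from R3 to R1, whereas the degree may move in either direction.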



## Versions
The following output records the software versions used to run this Jupyter notebook.

In [8]:
import IPython
import platform

print('Python version: {}'.format(platform.python_version()))
print('IPython version: {}'.format(IPython.__version__))
print('pandas version: {}'.format(pd.__version__))

Python version: 3.5.3
IPython version: 6.1.0
pandas version: 0.20.3
