# Analysis of survey evaluations
This Jupyter notebook examines the evaluations recorded in `data/evaluations.csv`.

## The data
We start by loading the CSV file into a [pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) and printing information about the size and structure of the dataset.

In [1]:
import pandas as pd
from IPython.display import display  # display() is also predefined inside Jupyter
pd.options.mode.chained_assignment = None  # default='warn', hides SettingWithCopyWarning

file = 'data/evaluations.csv'
conversion_dict = {'research_type': lambda x: int(x == 'E')}
evaluation_data = pd.read_csv(file, sep=',', header=0, index_col=0, converters=conversion_dict)

print('Samples per conference\n{}'.format(evaluation_data.groupby('conference').size()), end='\n')

column_headers = evaluation_data.columns.values
print('\nColumn headers: {}'.format(column_headers))

Samples per conference
conference
AAAI 14     100
AAAI 16     100
IJCAI 13    100
IJCAI 16    100
dtype: int64

Column headers: ['title' 'research_type' 'result_outcome' 'affiliation'
 'problem_description' 'goal/objective' 'research_method'
 'research_question' 'hypothesis' 'prediction' 'contribution' 'pseudocode'
 'open_source_code' 'open_experiment_code' 'train' 'validation' 'test'
 'results' 'hardware_specification' 'software_dependencies'
 'third_party_citation' 'experiment_setup' 'evaluation_criteria' 'authors'
 'link' 'comments' 'conference']


The dataset has 400 samples (100 per conference) and 27 columns. Some of these columns are not needed for the analysis: *title*, *authors*, *link* and *comments*. Dropping them leaves a numerical index for each paper, the conference it was published at, and the survey-related data. The lambda function in `conversion_dict` converts the *research_type* values E (experimental) and T (theoretical) to 1 and 0 respectively (any value other than E becomes 0), which makes the column easier to filter and aggregate in pandas.
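
A minimal sketch of the loading step on a hypothetical in-memory excerpt (the three rows and the `csv_text` string are made up; only `conversion_dict` and the idea of dropping descriptive columns come from the cells above). It shows that `converters` receives each raw cell string, so `'E'` becomes 1 and `'T'` becomes 0:

```python
import io

import pandas as pd

# Hypothetical excerpt in the same layout as data/evaluations.csv
# (index column first, then survey columns); the values are invented.
csv_text = """index,title,research_type,conference
1,Paper A,E,IJCAI 16
2,Paper B,T,IJCAI 16
3,Paper C,E,AAAI 14
"""

# Same converter as in the notebook: 'E' -> 1, anything else (e.g. 'T') -> 0.
conversion_dict = {'research_type': lambda x: int(x == 'E')}
sample = pd.read_csv(io.StringIO(csv_text), index_col=0, converters=conversion_dict)

# Drop the descriptive column that the analysis does not use.
sample = sample.drop(['title'], axis=1)

print(sample)
print(sample['research_type'].tolist())
```

Using a converter at load time keeps the 0/1 encoding in one place, so every later filter such as `research_type == 1` works on plain integers.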

In [2]:
evaluation_data.drop(['title', 'authors', 'link', 'comments'], axis=1, inplace=True)
column_headers = evaluation_data.columns.values

evaluation_data.head(2)

| index | research_type | result_outcome | affiliation | problem_description | goal/objective | research_method | research_question | hypothesis | prediction | contribution | ... | train | validation | test | results | hardware_specification | software_dependencies | third_party_citation | experiment_setup | evaluation_criteria | conference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | IJCAI 16 |
| 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | IJCAI 16 |


The two rows above show an experimental paper (top) and a theoretical paper (bottom). Theoretical papers have NaN in every column that applies only to experimental papers. In the *affiliation* column, 0 represents academia, 1 collaboration and 2 industry authors. The remaining survey columns are boolean: 1 if the item is documented and 0 if not. Some experimental papers also have NaN for the training and/or validation data when a train/validation/test split does not apply.
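
To make the encoding concrete, here is a small sketch on hypothetical rows: it maps the affiliation codes to readable labels and counts missing experiment-specific values per research type. The `affiliation_labels` dictionary and all toy values are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Made-up rows following the encoding described above: affiliation is 0/1/2 and
# experiment-specific columns are NaN for the theoretical paper (research_type 0).
# The third paper is experimental but has no training split (NaN).
rows = pd.DataFrame({
    'research_type': [1, 0, 1],
    'affiliation': [0, 0, 2],
    'train': [1.0, np.nan, np.nan],
    'test': [0.0, np.nan, 1.0],
})

# Assumed human-readable labels for the affiliation codes.
affiliation_labels = {0: 'academia', 1: 'collaboration', 2: 'industry'}
rows['affiliation_label'] = rows['affiliation'].map(affiliation_labels)

# Number of missing experiment-specific values, split by research type.
missing_per_type = rows[['train', 'test']].isnull().groupby(rows['research_type']).sum()
print(rows)
print(missing_per_type)
```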

## Miscellaneous statistics

In [3]:
print('Samples per affiliation\n{}'.format(evaluation_data.groupby('affiliation').size()), end='\n\n')
print('Affiliation by conference\n{}'.format(evaluation_data.groupby(['conference', 'affiliation']).size()), end='\n\n')

print('Samples per research type\n{}'.format(evaluation_data.groupby('research_type').size()), end='\n\n')
print('Research type by conference\n{}'.format(evaluation_data.groupby(['conference', 'research_type']).size()), end='\n\n')

print('Samples per research outcome\n{}'.format(evaluation_data.groupby('result_outcome').size()), end='\n\n')
print('Research outcome by conference\n{}'.format(evaluation_data.groupby(['conference', 'result_outcome']).size()), end='\n\n')

print('Samples with contribution\n{}'.format(evaluation_data.groupby('contribution').size()), end='\n\n')
print('Contribution by conference\n{}'.format(evaluation_data.groupby(['conference', 'contribution']).size()), end='\n\n')

Samples per affiliation
affiliation
0    331
1     58
2     11
dtype: int64

Affiliation by conference
conference  affiliation
AAAI 14     0              83
            1              14
            2               3
AAAI 16     0              79
            1              17
            2               4
IJCAI 13    0              89
            1              11
IJCAI 16    0              80
            1              16
            2               4
dtype: int64

Samples per research type
research_type
0     75
1    325
dtype: int64

Research type by conference
conference  research_type
AAAI 14     0                15
            1                85
AAAI 16     0                15
            1                85
IJCAI 13    0                29
            1                71
IJCAI 16    0                16
            1                84
dtype: int64

Samples per research outcome
result_outcome
0     23
1    377
dtype: int64

Research outcome by conference
conference  result_outcome
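
`groupby([...]).size()` only lists combinations that actually occur, which is why IJCAI 13 has no affiliation 2 row in the output above. As an alternative sketch on toy data (not the survey results), `pd.crosstab` lays the same counts out as a conference-by-category table and fills absent combinations with 0:

```python
import pandas as pd

# Hypothetical toy data; on the notebook data the same call would use evaluation_data.
toy = pd.DataFrame({
    'conference': ['AAAI 14', 'AAAI 14', 'IJCAI 13', 'IJCAI 13', 'IJCAI 13'],
    'affiliation': [0, 2, 0, 0, 1],
})

# One row per conference, one column per affiliation code; missing pairs become 0.
counts = pd.crosstab(toy['conference'], toy['affiliation'])
print(counts)
```

`pd.crosstab` also accepts `normalize='index'` to show per-conference proportions instead of raw counts.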

## Extracting experimental papers
Reproducibility is only relevant for experimental papers, so we keep those (*research_type* equal to 1) and discard the theoretical ones.

In [4]:
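# Keep only experimental papers; .copy() makes an independent DataFrame,
# so adding columns below does not trigger SettingWithCopyWarning
experimental_data = evaluation_data[evaluation_data.research_type == 1].copy()
# Boolean masks for the early (2013/14) and late (2016) conferences
early_years_index = experimental_data.conference.isin(['AAAI 14', 'IJCAI 13'])
late_years_index = experimental_data.conference.isin(['AAAI 16', 'IJCAI 16'])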
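
The two masks split the experimental papers into an early (2013/14) and a late (2016) period. A possible alternative, sketched here on made-up scores, is to store the period as a column and compute the mean and variance for both periods in one `groupby`; the `period_of` mapping and the `score` column are assumptions for illustration:

```python
import pandas as pd

# Hypothetical degree scores for four papers; conference names follow the notebook.
toy = pd.DataFrame({
    'conference': ['AAAI 14', 'IJCAI 13', 'AAAI 16', 'IJCAI 16'],
    'score': [0.2, 0.4, 0.6, 0.2],
})

# Assumed mapping: 2013/14 conferences form the early period, 2016 the late one.
period_of = {'AAAI 14': '2013/14', 'IJCAI 13': '2013/14',
             'AAAI 16': '2016', 'IJCAI 16': '2016'}
toy['period'] = toy['conference'].map(period_of)

# Mean and sample variance (ddof=1, the same default as Series.var) per period.
summary = toy.groupby('period')['score'].agg(['mean', 'var'])
print(summary)
```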

## $R3(e) = Method(e)$

In [5]:
# Method(e) survey items; 'conference' is a label, not a survey item, so it is left out
method = ['problem_description', 'goal/objective', 'research_method',
          'research_question', 'pseudocode']
r3_columns = method

experimental_data.loc[:, 'R3'] = experimental_data[r3_columns].all(axis=1)
print('R3(e)\nTotal = {}'.format(experimental_data['R3'].sum()))
display(experimental_data[['R3', 'conference']].groupby('conference').sum())


experimental_data.loc[:, 'R3D'] = experimental_data[r3_columns].mean(axis=1)
print('\n\nR3D\nTotal: {:.4f}, variance = {:.4f}\nBy conference, followed by variance'
      .format(experimental_data['R3D'].mean(), experimental_data['R3D'].var()))
display(experimental_data[['R3D', 'conference']].groupby('conference').mean())
display(experimental_data[['R3D', 'conference']].groupby('conference').var())

print('\n\nYear\tR3D\tVariance\n2013/14\t{:.4f}\t{:.4f}'.format(
    experimental_data[early_years_index].R3D.mean(),
    experimental_data[early_years_index].R3D.var()))
print('2016\t{:.4f}\t{:.4f}'.format(
    experimental_data[late_years_index].R3D.mean(),
    experimental_data[late_years_index].R3D.var()))

R3(e)
Total = 0

| conference | R3 |
|---|---|
| AAAI 14 | False |
| AAAI 16 | False |
| IJCAI 13 | False |
| IJCAI 16 | False |

R3D
Total: 0.2615, variance = 0.0342
By conference, followed by variance

| conference | R3D |
|---|---|
| AAAI 14 | 0.28 |
| AAAI 16 | 0.235294 |
| IJCAI 13 | 0.23662 |
| IJCAI 16 | 0.290476 |

| conference | R3D |
|---|---|
| AAAI 14 | 0.039238 |
| AAAI 16 | 0.034454 |
| IJCAI 13 | 0.02664 |
| IJCAI 16 | 0.034125 |

Year	R3D	Variance
2013/14	0.2603	0.0338
2016	0.2627	0.0349
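
R3(e) is all-or-nothing: a single undocumented Method item makes it false, which explains how the total can be 0 while R3D is about 0.26. R3D instead averages the 0/1 indicators, giving the share of Method items a paper documents. The sketch below contrasts the two on two hypothetical papers (the rows are invented; the column names are the five Method survey items from the cell above):

```python
import pandas as pd

# The five Method survey items; the two papers are invented for illustration.
method_columns = ['problem_description', 'goal/objective', 'research_method',
                  'research_question', 'pseudocode']
papers = pd.DataFrame([[1, 0, 1, 0, 0],
                       [1, 1, 1, 1, 1]], columns=method_columns)

r3 = papers.all(axis=1)    # True only if every Method item is documented
r3d = papers.mean(axis=1)  # share of Method items documented (the degree)
print(pd.DataFrame({'R3': r3, 'R3D': r3d}))
```

The first paper documents two of five items, so it fails R3 but still contributes 0.4 to R3D.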


## $R2(e) = Method(e) \land Data(e)$

In [6]:
data = ['train', 'validation', 'test', 'results']
r2_columns = r3_columns + data

experimental_data.loc[:, 'Data'] = experimental_data[data].all(axis=1)
print('Data(e)\nTotal = {:}'.format(experimental_data['Data'].sum()))
display(experimental_data[['Data', 'conference']].groupby('conference').sum())

experimental_data.loc[:, 'DataD'] = experimental_data[data].mean(axis=1)
print('\n\nDataDegree(e)\nTotal = {:.4f}, variance = {:.4f}\nBy conference, followed by variance'
      .format(experimental_data['DataD'].mean(), experimental_data['DataD'].var()))
display(experimental_data[['DataD', 'conference']].groupby('conference').mean())
display(experimental_data[['DataD', 'conference']].groupby('conference').var())

print('\n\nYear\tDataD\tVariance\n2013/14\t{:.4f}\t{:.4f}'.format(
    experimental_data[early_years_index].DataD.mean(),
    experimental_data[early_years_index].DataD.var()))
print('2016\t{:.4f}\t{:.4f}'.format(
    experimental_data[late_years_index].DataD.mean(),
    experimental_data[late_years_index].DataD.var()))


experimental_data.loc[:, 'R2'] = experimental_data[r2_columns].all(axis=1)
print('\n\nR2(e)\nTotal = {}'.format(experimental_data['R2'].sum()))
display(experimental_data[['R2', 'conference']].groupby('conference').sum())

experimental_data.loc[:, 'R2D'] = experimental_data[r2_columns].mean(axis=1)
print('\n\nR2D(e)\nTotal = {:.4f}, variance = {:.4f}\nBy conference, followed by variance'
      .format(experimental_data['R2D'].mean(), experimental_data['R2D'].var()))
display(experimental_data[['R2D', 'conference']].groupby('conference').mean())
display(experimental_data[['R2D', 'conference']].groupby('conference').var())

print('\n\nYear\tR2D\tVariance\n2013/14\t{:.4f}\t{:.4f}'.format(
    experimental_data[early_years_index].R2D.mean(),
    experimental_data[early_years_index].R2D.var()))
print('2016\t{:.4f}\t{:.4f}'.format(
    experimental_data[late_years_index].R2D.mean(),
    experimental_data[late_years_index].R2D.var()))

Data(e)
Total = 9

| conference | Data |
|---|---|
| AAAI 14 | 2.0 |
| AAAI 16 | 1.0 |
| IJCAI 13 | 0.0 |
| IJCAI 16 | 6.0 |

DataDegree(e)
Total = 0.2287, variance = 0.0763
By conference, followed by variance

| conference | DataD |
|---|---|
| AAAI 14 | 0.202941 |
| AAAI 16 | 0.261765 |
| IJCAI 13 | 0.131455 |
| IJCAI 16 | 0.303571 |

| conference | DataD |
|---|---|
| AAAI 14 | 0.064723 |
| AAAI 16 | 0.068312 |
| IJCAI 13 | 0.048346 |
| IJCAI 16 | 0.107035 |

Year	DataD	Variance
2013/14	0.1704	0.0582
2016	0.2825	0.0875


R2(e)
Total = 0

| conference | R2 |
|---|---|
| AAAI 14 | False |
| AAAI 16 | False |
| IJCAI 13 | False |
| IJCAI 16 | False |

R2D(e)
Total = 0.2525, variance = 0.0251
By conference, followed by variance

| conference | R2D |
|---|---|
| AAAI 14 | 0.254972 |
| AAAI 16 | 0.247246 |
| IJCAI 13 | 0.204924 |
| IJCAI 16 | 0.295517 |

| conference | R2D |
|---|---|
| AAAI 14 | 0.023186 |
| AAAI 16 | 0.02305 |
| IJCAI 13 | 0.017109 |
| IJCAI 16 | 0.033059 |

Year	R2D	Variance
2013/14	0.2322	0.0209
2016	0.2712	0.0284
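
One subtlety in Data(e) and DataD: pandas' `all` and `mean` skip NaN by default (`skipna=True`). An experimental paper without a validation split (NaN) can therefore still satisfy Data(e), and its DataD is averaged over the remaining items only. A sketch with two made-up papers:

```python
import numpy as np
import pandas as pd

# Two invented experimental papers; the second has no validation split (NaN).
data_columns = ['train', 'validation', 'test', 'results']
papers = pd.DataFrame([[1.0, 1.0, 1.0, 0.0],
                       [1.0, np.nan, 1.0, 1.0]], columns=data_columns)

# With the default skipna=True, the NaN is ignored by both reductions,
# so it neither blocks Data(e) nor lowers DataD.
data_flag = papers.all(axis=1)
data_degree = papers.mean(axis=1)
print(pd.DataFrame({'Data': data_flag, 'DataD': data_degree}))
```

Whether a non-applicable item should count as documented is a design choice; passing `skipna=False` to `mean` would instead give NaN for any paper with a missing item.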


## $R1(e) = Method(e) \land Data(e) \land Exp(e)$

In [7]:
experiment = ['hypothesis', 'prediction',
        'open_source_code', 'open_experiment_code',
        'hardware_specification', 'software_dependencies',
        'experiment_setup', 'evaluation_criteria']
r1_columns = r2_columns + experiment

experimental_data.loc[:, 'Exp'] = experimental_data[experiment].all(axis=1)
print('Exp(e)\nTotal = {:.4f}'.format(experimental_data['Exp'].sum()))
display(experimental_data[['Exp', 'conference']].groupby('conference').sum())

experimental_data.loc[:, 'ExpD'] = experimental_data[experiment].mean(axis=1)
print('\n\nExpDegree(e)\nTotal = {:.4f}, variance = {:.4f}\nBy conference, followed by variance'
      .format(experimental_data['ExpD'].mean(), experimental_data['ExpD'].var()))
display(experimental_data[['ExpD', 'conference']].groupby('conference').mean())
display(experimental_data[['ExpD', 'conference']].groupby('conference').var())

print('\n\nYear\tExpD\tVariance\n2013/14\t{:.4f}\t{:.4f}'.format(
    experimental_data[early_years_index].ExpD.mean(),
    experimental_data[early_years_index].ExpD.var()))
print('2016\t{:.4f}\t{:.4f}'.format(
    experimental_data[late_years_index].ExpD.mean(),
    experimental_data[late_years_index].ExpD.var()))


experimental_data.loc[:, 'R1'] = experimental_data[r1_columns].all(axis=1)
print('\n\nR1(e)\nTotal = {:.4f}'.format(experimental_data['R1'].sum()))
display(experimental_data[['R1', 'conference']].groupby('conference').sum())

experimental_data.loc[:, 'R1D'] = experimental_data[r1_columns].mean(axis=1)
print('\n\nR1D(e)\nTotal = {:.4f}, variance = {:.4f}\nBy conference, followed by variance'
      .format(experimental_data['R1D'].mean(), experimental_data['R1D'].var()))
display(experimental_data[['R1D', 'conference']].groupby('conference').mean())
display(experimental_data[['R1D', 'conference']].groupby('conference').var())

print('\n\nYear\tR1D\tVariance\n2013/14\t{:.4f}\t{:.4f}'.format(
    experimental_data[early_years_index].R1D.mean(),
    experimental_data[early_years_index].R1D.var()))
print('2016\t{:.4f}\t{:.4f}'.format(
    experimental_data[late_years_index].R1D.mean(),
    experimental_data[late_years_index].R1D.var()))

Exp(e)
Total = 0.0000

| conference | Exp |
|---|---|
| AAAI 14 | False |
| AAAI 16 | False |
| IJCAI 13 | False |
| IJCAI 16 | False |

ExpDegree(e)
Total = 0.2235, variance = 0.0219
By conference, followed by variance

| conference | ExpD |
|---|---|
| AAAI 14 | 0.172059 |
| AAAI 16 | 0.214706 |
| IJCAI 13 | 0.197183 |
| IJCAI 16 | 0.306548 |

| conference | ExpD |
|---|---|
| AAAI 14 | 0.018592 |
| AAAI 16 | 0.018085 |
| IJCAI 13 | 0.019046 |
| IJCAI 16 | 0.02199 |

Year	ExpD	Variance
2013/14	0.1835	0.0188
2016	0.2604	0.0220


R1(e)
Total = 0.0000

| conference | R1 |
|---|---|
| AAAI 14 | False |
| AAAI 16 | False |
| IJCAI 13 | False |
| IJCAI 16 | False |

R1D(e)
Total = 0.2383, variance = 0.0140
By conference, followed by variance

| conference | R1D |
|---|---|
| AAAI 14 | 0.213408 |
| AAAI 16 | 0.231972 |
| IJCAI 13 | 0.200977 |
| IJCAI 16 | 0.301436 |

| conference | R1D |
|---|---|
| AAAI 14 | 0.010715 |
| AAAI 16 | 0.0115 |
| IJCAI 13 | 0.008913 |
| IJCAI 16 | 0.018873 |

Year	R1D	Variance
2013/14	0.2078	0.0099
2016	0.2665	0.0163
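
The cells for R3, Data, R2, Exp and R1 repeat the same steps. A hypothetical helper, `summarize_factor` (not part of the original notebook), could bundle them; the sketch below runs it on a toy frame rather than on the survey data:

```python
import pandas as pd


def summarize_factor(df, columns, name, early_mask, late_mask):
    """Hypothetical helper mirroring the repeated steps in the cells above.

    Adds `name` (True if every item in `columns` is documented) and
    `name + 'D'` (mean of the items, NaN skipped) to a copy of `df`, and
    returns that copy, the number of papers satisfying `name`, and the mean
    and sample variance of the degree per conference and per period.
    """
    df = df.copy()
    degree = name + 'D'
    df[name] = df[columns].all(axis=1)
    df[degree] = df[columns].mean(axis=1)

    by_conference = df.groupby('conference')[degree].agg(['mean', 'var'])
    by_period = pd.DataFrame(
        {'mean': [df.loc[early_mask, degree].mean(), df.loc[late_mask, degree].mean()],
         'var': [df.loc[early_mask, degree].var(), df.loc[late_mask, degree].var()]},
        index=['2013/14', '2016'])
    return df, int(df[name].sum()), by_conference, by_period


# Toy usage on made-up data with two survey items 'a' and 'b'.
toy = pd.DataFrame({
    'conference': ['AAAI 14', 'AAAI 14', 'AAAI 16', 'AAAI 16'],
    'a': [1, 1, 0, 1],
    'b': [1, 0, 0, 1],
})
early = toy['conference'].isin(['AAAI 14', 'IJCAI 13'])
late = toy['conference'].isin(['AAAI 16', 'IJCAI 16'])
out, total, by_conf, by_period = summarize_factor(toy, ['a', 'b'], 'F', early, late)
print('Total =', total)
print(by_conf)
print(by_period)
```

On the survey data, a call such as `summarize_factor(experimental_data, data, 'Data', early_years_index, late_years_index)` should give the same Data(e) total and DataD statistics as the R2 cell, assuming `experimental_data` and the masks are defined as in the earlier cells.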


## Versions
The output below records the software versions used to run this Jupyter notebook.

In [8]:
import IPython
import platform

print('Python version: {}'.format(platform.python_version()))
print('IPython version: {}'.format(IPython.__version__))
print('pandas version: {}'.format(pd.__version__))

Python version: 3.5.3
IPython version: 6.1.0
pandas version: 0.20.3
